Re: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2)

2017-12-06 Thread INADA Naoki
> I care only about builtin open()'s behavior.
> PEP 538 doesn't change default error handler of open().
>
> I think PEP 538 and PEP 540 should behave almost identically, except for
> whether the locale is changed.  So I would need a very strong reason for
> PEP 540 to change the default error handler of open().
>

I just came up with a crazy idea: change the default error handler of open()
to "surrogateescape" only when the open mode is "w" or "a".

When reading, the "surrogateescape" error handler is dangerous because
it can silently produce arbitrarily broken Unicode strings.

On the other hand, the "surrogateescape" error handler is not so
dangerous for writing if the encoding is UTF-8.
Writing a normal Unicode string doesn't create broken data.
When writing a string containing surrogateescaped data, the data was
already (partially) broken before the write.

This idea allows the following code:

import os

with open("files.txt", "w") as f:
    for fn in os.listdir():  # may return surrogateescaped strings
        f.write(fn + '\n')

And it doesn't allow the following code:

with open("image.jpg", "r") as f:  # Binary data, not UTF-8
    return f.read()
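
To make the asymmetry between writing and reading concrete, here is a minimal sketch of how the "strict" and "surrogateescape" handlers behave in each direction. The byte strings are invented sample data, and the snippet works on bytes directly rather than through open(), so it only illustrates the codec behavior, not the proposed change itself.

```python
# Illustrative only: the byte strings below are invented sample data.

# A file name stored as Latin-1 on disk, so it is not valid UTF-8.
raw_name = b"caf\xe9.txt"

# On POSIX systems os.listdir() decodes names with "surrogateescape",
# so the bad byte becomes a lone surrogate (U+DCE9) instead of an error.
name = raw_name.decode("utf-8", "surrogateescape")

# Writing with the strict handler fails on that surrogate...
try:
    name.encode("utf-8", "strict")
    strict_write_ok = True
except UnicodeEncodeError:
    strict_write_ok = False

# ...while "surrogateescape" restores the original bytes exactly.
written = name.encode("utf-8", "surrogateescape")

# Reading is the dangerous direction: JPEG-like binary data decodes
# without any error, so a missing "b" in the mode goes unnoticed.
jpeg_header = b"\xff\xd8\xff\xe0\x00\x10JFIF"
text = jpeg_header.decode("utf-8", "surrogateescape")

# With the strict handler the same bytes are rejected immediately.
try:
    jpeg_header.decode("utf-8", "strict")
    strict_read_ok = True
except UnicodeDecodeError:
    strict_read_ok = False
```

On the write side the handler only round-trips bytes that were already undecodable, whereas on the read side it hides the error that would otherwise reveal the bug.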


I'm not sure whether this is a good idea.  And I don't know when it would be
right to change the write error handler: only when PEP 538 or PEP 540 is in use?
Or always when sys.getfilesystemencoding() is UTF-8?

Any thoughts?

INADA Naoki  
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 554 v4 (new interpreters module)

2017-12-06 Thread Eric Snow
On Dec 6, 2017 21:14, "Guido van Rossum"  wrote:

> OK, then please just change the PEP's Version: header to 3.8.


Will do.  Have a nice vacation! :)

-eric


Re: [Python-Dev] PEP 554 v4 (new interpreters module)

2017-12-06 Thread Guido van Rossum
OK, then please just change the PEP's Version: header to 3.8.

On Wed, Dec 6, 2017 at 7:57 PM, Eric Snow wrote:

>
>
> On Dec 6, 2017 20:31, "Guido van Rossum"  wrote:
>
> > If the point is just to be able to test the existing API better, no PEP is
> > needed, right? It would be an unsupported, undocumented API.
>
>
> In the short term that's one major goal.  In the long term the
> functionality provided by the PEP is a prerequisite for other
> concurrency-related features, and targeting 3.8 for that is fine. :)
>
> -eric
>



-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] PEP 554 v4 (new interpreters module)

2017-12-06 Thread Eric Snow
On Dec 6, 2017 20:31, "Guido van Rossum"  wrote:

> If the point is just to be able to test the existing API better, no PEP is
> needed, right? It would be an unsupported, undocumented API.


In the short term that's one major goal.  In the long term the
functionality provided by the PEP is a prerequisite for other
concurrency-related features, and targeting 3.8 for that is fine. :)

-eric


Re: [Python-Dev] PEP 554 v4 (new interpreters module)

2017-12-06 Thread Guido van Rossum
If the point is just to be able to test the existing API better, no PEP is
needed, right? It would be an unsupported, undocumented API.

On Wed, Dec 6, 2017 at 7:22 PM, Nick Coghlan  wrote:

> On 7 December 2017 at 12:46, Guido van Rossum  wrote:
> > So you're okay with putting this off till (at least) 3.8? That sounds good
> > to me, given that I'd like to go on vacation soon.
>
> Eric reminded me off-list that we'd like to at least add the lower
> level _interpreters API for the benefit of the test suite - right now,
> all of our subinterpreter testing needs to be run through either
> test_embed or test_capi, which is annoying enough that we end up
> simply not testing the subinterpreter functionality properly (in
> practice, we're relying heavily on the regression test suites for
> mod_wsgi and JEP to find any problems we inadvertently introduce when
> refactoring CPython's internals).
>
> If we were to put that under test.support._interpreters for 3.7, we'd
> be able to make it clear that we're in "Even more experimental than
> provisional API status would account for" territory, while still
> enabling the improved testing and accessibility for experimentation
> that we're after in order to make some better informed API design
> proposals for Python 3.8.
>
> Regards,
> Nick.
>
> --
> Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
>



-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] PEP 554 v4 (new interpreters module)

2017-12-06 Thread Nick Coghlan
On 7 December 2017 at 12:46, Guido van Rossum  wrote:
> So you're okay with putting this off till (at least) 3.8? That sounds good
> to me, given that I'd like to go on vacation soon.

Eric reminded me off-list that we'd like to at least add the lower
level _interpreters API for the benefit of the test suite - right now,
all of our subinterpreter testing needs to be run through either
test_embed or test_capi, which is annoying enough that we end up
simply not testing the subinterpreter functionality properly (in
practice, we're relying heavily on the regression test suites for
mod_wsgi and JEP to find any problems we inadvertently introduce when
refactoring CPython's internals).

If we were to put that under test.support._interpreters for 3.7, we'd
be able to make it clear that we're in "Even more experimental than
provisional API status would account for" territory, while still
enabling the improved testing and accessibility for experimentation
that we're after in order to make some better informed API design
proposals for Python 3.8.

Regards,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] PEP 554 v4 (new interpreters module)

2017-12-06 Thread Guido van Rossum
So you're okay with putting this off till (at least) 3.8? That sounds good
to me, given that I'd like to go on vacation soon.

On Wed, Dec 6, 2017 at 5:04 PM, Nick Coghlan  wrote:

> On 7 December 2017 at 01:50, Guido van Rossum  wrote:
> > Sorry to burst your bubble, but I have not followed any of the discussion
> > and I am actually very worried about this topic. I don't think I will be
> > able to make time for this before the 3.7b1 feature freeze.
>
> I think that will be OK, as it will encourage us to refactor Eric's
> branch into two distinct pieces in the meantime: exposing any needed C
> API elements that aren't currently visible as
> "nominally-private-but-linkable-if-you're-prepared-to-cope-with-potential-instability"
> interfaces, and then a pip-installable extension module that adds the
> Python level API.
>
> We won't be able to experiment with ideas like removing GIL sharing
> between subinterpreters that way, but we'll be able to work on the
> semantics of the user facing API design, and enable experimentation
> with things like CSP and Actor-based programming backed by stronger
> memory separation than is offered by Python threads.
>
> Cheers,
> Nick.
>
> --
> Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
>



-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] PEP 554 v4 (new interpreters module)

2017-12-06 Thread Nick Coghlan
On 7 December 2017 at 01:50, Guido van Rossum  wrote:
> Sorry to burst your bubble, but I have not followed any of the discussion
> and I am actually very worried about this topic. I don't think I will be
> able to make time for this before the 3.7b1 feature freeze.

I think that will be OK, as it will encourage us to refactor Eric's
branch into two distinct pieces in the meantime: exposing any needed C
API elements that aren't currently visible as
"nominally-private-but-linkable-if-you're-prepared-to-cope-with-potential-instability"
interfaces, and then a pip-installable extension module that adds the
Python level API.

We won't be able to experiment with ideas like removing GIL sharing
between subinterpreters that way, but we'll be able to work on the
semantics of the user facing API design, and enable experimentation
with things like CSP and Actor-based programming backed by stronger
memory separation than is offered by Python threads.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2)

2017-12-06 Thread Nick Coghlan
On 7 December 2017 at 08:20, Victor Stinner  wrote:
> 2017-12-06 23:07 GMT+01:00 Antoine Pitrou :
>> One question: how do you plan to test for the POSIX locale?
>
> I'm not sure. I will probably rely on Nick for that ;-) Nick already
> implemented this exact check for his PEP 538 which is already
> implemented in Python 3.7.
>
> I already implemented the PEP 540:
>
>https://bugs.python.org/issue29240
>https://github.com/python/cpython/pull/855
>
> Right now, my implementation uses:
>
>char *ctype = _PyMem_RawStrdup(setlocale(LC_CTYPE, ""));
>...
>if (strcmp(ctype, "C") == 0) ...

We have a private helper for this as a result of the PEP 538
implementation: _Py_LegacyLocaleDetected()

Details are in the source code at
https://github.com/python/cpython/blob/master/Python/pylifecycle.c#L345

As per my comment there, and Jakub Wilk's post to this thread, we're
missing a case to also check for the string "POSIX" (which will fix
several of the current locale coercion discrepancies between Linux and
*BSD systems).
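
As a rough Python analogue of the check being discussed, the sketch below queries the current LC_CTYPE setting and compares it against both names. It is not the C helper _Py_LegacyLocaleDetected(); the names LEGACY_LOCALE_NAMES, is_legacy_locale and current_ctype_locale are hypothetical and exist only for this illustration.

```python
import locale

# Names treated as the legacy locale in this sketch. A compliant
# setlocale() may report other strings, so this set is an assumption.
LEGACY_LOCALE_NAMES = frozenset({"C", "POSIX"})


def is_legacy_locale(name):
    """Return True if a setlocale() result names the C/POSIX locale."""
    return name in LEGACY_LOCALE_NAMES


def current_ctype_locale():
    """Query LC_CTYPE without changing it, like setlocale(LC_CTYPE, NULL)."""
    return locale.setlocale(locale.LC_CTYPE, None)
```

Calling is_legacy_locale(current_ctype_locale()) then answers the question for the running process, within the limits of the string comparison.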

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2)

2017-12-06 Thread Nick Coghlan
On 7 December 2017 at 01:59, Jakub Wilk  wrote:
> * Nick Coghlan , 2017-12-06, 16:15:
>> The one that's relevant to default locale detection is just the string
>> that "setlocale(LC_CTYPE, NULL)" returns.
>
> POSIX doesn't require any particular return value for setlocale() calls.
> It's only guaranteed that the returned string can be used in subsequent
> setlocale() calls to restore the original locale.
>
> So in the POSIX locale, a compliant setlocale() implementation could return
> "C", or "POSIX", or even something entirely different.

Thanks. I'd been wondering if we should also handle the "POSIX" case
in the legacy locale detection logic, and you've convinced me that we
should. Issue filed for that here: https://bugs.python.org/issue32238

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2)

2017-12-06 Thread Antoine Pitrou
On Thu, 7 Dec 2017 00:22:52 +0100
Victor Stinner  wrote:
> 2017-12-06 23:36 GMT+01:00 Antoine Pitrou :
> > Other than that, +1 on the PEP.  
> 
> Naoki doesn't seem to be comfortable with the usage of the
> surrogateescape error handler by default for open(). Are you ok with
> that? If yes, would you mind explaining why? :-)

Sorry, I had missed that objection.  I agree with Inada Naoki: it's
better to keep it strict.

Regards

Antoine.


Re: [Python-Dev] iso8601 parsing

2017-12-06 Thread Paul Ganssle
Here is the PR I've submitted:

https://github.com/python/cpython/pull/4699

The contract that I'm supporting (and, I think it can be argued, the only 
reasonable contract in the initial implementation) is the following:

dtstr = dt.isoformat(*args, **kwargs)
dt_rt = datetime.fromisoformat(dtstr)
assert dt_rt == dt  # The two points represent the same absolute time
assert dt_rt.replace(tzinfo=None) == dt.replace(tzinfo=None)  # And the same wall time

For all valid values of `dt`, `args` and `kwargs`.

A corollary of the `dt_rt == dt` invariant is that you can perfectly recreate 
the original `datetime` with the following additional step:

dt_rt = dt_rt if dt.tzinfo is None else dt_rt.astimezone(dt.tzinfo)

There is no way for us to guarantee that `dt_rt.tzinfo == dt.tzinfo` or that 
`dt_rt.tzinfo is dt.tzinfo`, because `isoformat()` is slightly lossy (it loses 
the political zone), but this is not an issue because lossless round trips just 
require you to serialize the political zone, which is generally simple enough.
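
A concrete instance of that contract might look like the sketch below. It assumes a Python version where datetime.fromisoformat() is available (3.7 or later), and the sample datetime and its "EST" label are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical sample value: a fixed -05:00 offset labelled "EST".
est = timezone(timedelta(hours=-5), "EST")
dt = datetime(2017, 12, 6, 19, 54, 30, 123456, tzinfo=est)

dtstr = dt.isoformat(sep=" ")
dt_rt = datetime.fromisoformat(dtstr)

# Same absolute time and same wall time.
assert dt_rt == dt
assert dt_rt.replace(tzinfo=None) == dt.replace(tzinfo=None)

# The zone label is not serialized, so it does not survive the trip.
assert dt.tzname() == "EST"
assert dt_rt.tzname() != "EST"

# Re-attaching the original tzinfo recreates the original datetime.
restored = dt_rt if dt.tzinfo is None else dt_rt.astimezone(dt.tzinfo)
assert restored.tzinfo is dt.tzinfo
```

Only the offset travels through the string; anything else about the zone has to be serialized separately and re-attached afterwards.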


On 12/06/2017 07:54 PM, Barry Scott wrote:
> 
> 
>> On 26 Oct 2017, at 17:45, Chris Barker  wrote:
>>
>> This is a key point that I hope is obvious:
>>
>> If an ISO string has NO offset or timezone indicator, then a naive datetime 
>> should be created.
>>
>> (I say, I "hope" it's obvious, because the numpy datetime64 implementation 
>> initially (and for years) would apply the machine local timezone to a bare 
>> iso string -- which was a f-ing nightmare!)
> 
> 
> I hope the other obvious thing is that if there is an offset then a datetime 
> that is *not* naive can be created
> as it describes an unambiguous point in time. We just cannot know what 
> political timezone to choose.
> I'd guess that it should use the UTC timezone in that case.
> 
> Barry





Re: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2)

2017-12-06 Thread Victor Stinner
2017-12-06 23:36 GMT+01:00 Antoine Pitrou :
> Other than that, +1 on the PEP.

Naoki doesn't seem to be comfortable with the usage of the
surrogateescape error handler by default for open(). Are you ok with
that? If yes, would you mind explaining why? :-)

Victor


Re: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2)

2017-12-06 Thread Antoine Pitrou
On Wed, 6 Dec 2017 23:20:41 +0100
Victor Stinner  wrote:
> 2017-12-06 23:07 GMT+01:00 Antoine Pitrou :
> > One question: how do you plan to test for the POSIX locale?  
> 
> I'm not sure. I will probably rely on Nick for that ;-) Nick already
> implemented this exact check for his PEP 538 which is already
> implemented in Python 3.7.

Other than that, +1 on the PEP.

Regards

Antoine.


Re: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2)

2017-12-06 Thread Victor Stinner
2017-12-06 23:07 GMT+01:00 Antoine Pitrou :
> One question: how do you plan to test for the POSIX locale?

I'm not sure. I will probably rely on Nick for that ;-) Nick already
implemented this exact check for his PEP 538 which is already
implemented in Python 3.7.

I already implemented the PEP 540:

   https://bugs.python.org/issue29240
   https://github.com/python/cpython/pull/855

Right now, my implementation uses:

   char *ctype = _PyMem_RawStrdup(setlocale(LC_CTYPE, ""));
   ...
   if (strcmp(ctype, "C") == 0) ...

Victor


Re: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2)

2017-12-06 Thread Antoine Pitrou
On Wed, 6 Dec 2017 01:49:41 +0100
Victor Stinner  wrote:
> Hi,
> 
> I knew that I had to rewrite my PEP 540, but I was too lazy. Since
> Guido explicitly requested a shorter PEP, here you have!
> 
> https://www.python.org/dev/peps/pep-0540/
> 
> Trust me, it's the same PEP, but focused on the most important
> information and with a shorter rationale ;-)

Congrats on the rewriting!  The shortening is appreciated :-)

One question: how do you plan to test for the POSIX locale?  Apparently
you need to check at least for the "C" and "POSIX" strings, but perhaps
other aliases as well?

Regards

Antoine.




Re: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2)

2017-12-06 Thread Greg Ewing

Victor Stinner wrote:
> Maybe the "UTF-8 Mode" should be renamed to "UTF-8 with
> surrogateescape, or backslashreplace for stderr, or surrogatepass for
> fsencode/fsdecode on Windows, or strict for Strict UTF-8 Mode"... But
> the PEP title would be too long, no? :-)


Relaxed UTF-8 Mode?

UTF8-Yeah-I'm-Fine-With-That mode?

--
Greg


Re: [Python-Dev] iso8601 parsing

2017-12-06 Thread Barry Scott


> On 26 Oct 2017, at 17:45, Chris Barker  wrote:
> 
> This is a key point that I hope is obvious:
> 
> If an ISO string has NO offset or timezone indicator, then a naive datetime 
> should be created.
> 
> (I say, I "hope" it's obvious, because the numpy datetime64 implementation 
> initially (and for years) would apply the machine local timezone to a bare 
> iso string -- which was a f-ing nightmare!)


I hope the other obvious thing is that if there is an offset then a datetime 
that is *not* naive can be created
as it describes an unambiguous point in time. We just cannot know what 
political timezone to choose.
I'd guess that it should use the UTC timezone in that case.
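
The two cases can be sketched as follows, assuming the datetime.fromisoformat() that shipped in Python 3.7 and later. In that implementation the offset is kept as a fixed-offset tzinfo rather than being converted to UTC, so the conversion shown at the end is an explicit extra step. The input strings are made-up examples.

```python
from datetime import datetime, timedelta, timezone

# No offset in the string: the result is naive (tzinfo is None).
naive = datetime.fromisoformat("2017-10-26T17:45:00")

# An offset in the string: the result is aware, with a fixed-offset tzinfo.
aware = datetime.fromisoformat("2017-10-26T17:45:00+01:00")

# Converting to UTC is an explicit, separate step.
as_utc = aware.astimezone(timezone.utc)
```

No machine-local timezone is applied in either case, which avoids the behavior described for the early numpy datetime64 implementation.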

Barry





Re: [Python-Dev] HP-UX pr not feeling the love

2017-12-06 Thread Rob Boehne
Thanks!  I’m personally comfortable with dropping support for systems that 
people can’t buy support for, like IRIX, ULTRIX, SCO etc.  It’s hard to 
envision those being anything other than hobbyist platforms.  Here at 
Datalogics, we are selling and supporting software on HP-UX, for both pa-risc 
and Itanium, and now that SCons is beginning to support Python 3.x, I am 
attempting to use some of my spare time to get this platform (and others we 
support) to build, and then to run, on the development branch for Python 3.   
So rather than taking a large block of my time to port and fix any problems, 
I’m going to submit pr’s in a trickle, as time permits.  I’m picking HP-UX 
because it’s probably the most obscure thing we use, and likely would take the 
most effort.


From: Lukasz Langa 
Date: Wednesday, December 6, 2017 at 11:45 AM
To: Rob Boehne 
Cc: "python-dev@python.org" 
Subject: Re: [Python-Dev] HP-UX pr not feeling the love

Hi Rob,
thanks for your patch. CPython core developers, as volunteers, have limited 
resources available to maintain Python. Those resources are not only time, they 
are also mental resources necessary to make a change in Python as well as 
actual physical resources. Supporting a platform requires all three:

1. You need time to make a platform work initially, and then continuous effort 
to keep it working, fixing regressions, including this platform in new 
features, etc.
2. You need mental resources to manage additional complexity that comes from 
#ifdef sprinkled through the code, cryptic configure/Makefile machinery, etc.
3. You need access to machines running the given operating system to be able to 
test if your changes are compatible.

This is why we are keeping the list of supported platforms relatively short. In 
fact, in time we're cutting support for less popular platforms that we couldn't 
keep running. Details in https://www.python.org/dev/peps/pep-0011/. Look, just 
in 3.7 we're dropping IRIX and systems without threads.

As you're saying, while your current PR is relatively innocent, more are needed 
to make it work. If those require more drastic changes in our codebase, we 
won't be able to accept them due to reasons stated above.

I understand where you're coming from. If you're serious about this, we would 
need to see the full extent of changes required to make Python 3.7 work on HP 
UX, preferably minimal. We would also need a buildbot added to our fleet (see 
http://buildbot.python.org/) that would ensure the build stays green. Finally, 
we would need you to think whether you could provide the patches that keep the 
build green for a significant period of time (counted in years).

- Ł




On Dec 6, 2017, at 7:22 AM, Rob Boehne wrote:

Hello,

Back in June I was fired up to get my diverse set of platforms all running 
Python 3, but quickly ran into issues and submitted a PR.

https://github.com/python/cpython/pull/2519

It seems as though this HP-UX specific change isn’t getting much consideration, 
which probably isn’t a big deal.  What may be more important is that I’ve 
stopped trying to contribute, and if I really need Python 3 on HP-UX, AIX, 
Sparc Solaris or other operating systems, I’ll have to hack it together myself 
and maintain  my own fork, while presumably others do the same.  At the same 
time I’m working hard to convince management that we shouldn’t create technical 
debt by maintaining patches to all the tools we use, and that we should get 
these changes accepted into the upstream repos.

Could someone have a look at this PR and possibly merge?

Thanks,

Rob Boehne



Re: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2)

2017-12-06 Thread Brett Cannon
On Wed, 6 Dec 2017 at 06:10 INADA Naoki  wrote:

> >> And I have one worrying point.
> >> With UTF-8 mode, open()'s default encoding/error handler is
> >> UTF-8/surrogateescape.
> >
> > The Strict UTF-8 Mode is for you if you prioritize correctness over
> > usability.
>
> Yes, but as I said, I care about inexperienced developers
> who don't know what UTF-8 mode is.
>
> >
> > In the very first version of my PEP/idea, I wanted to use
> > UTF-8/strict. But then I started to play with the implementation and I
> > got many "practical" issues. Using UTF-8/strict, you quickly get
> > encoding errors. For example, you become unable to read undecodable
> > bytes from stdin. stdin.read() only gives you an error, without
> > letting you decide how to handle these "invalid" data. Same issue with
> > stdout.
> >
>
> I don't care about stdio, because PEP 538 uses surrogateescape for
> stdio/error
>
> https://www.python.org/dev/peps/pep-0538/#changes-to-the-default-error-handling-on-the-standard-streams
>
> I care only about builtin open()'s behavior.
> PEP 538 doesn't change default error handler of open().
>
> I think PEP 538 and PEP 540 should behave almost identically, except for
> whether the locale is changed.  So I would need a very strong reason for
> PEP 540 to change the default error handler of open().
>

I don't have enough locale experience to weigh in as an expert, but I
already was leaning towards INADA-san's logic of not wanting to change
open() and this makes me really not want to change it.

-Brett


>
>
> > In the old long version of the PEP, I tried to explain UTF-8/strict
> > issues with very concrete examples, the removed "Use Cases" section:
> >
> https://github.com/python/peps/blob/f92b5fbdc2bcd9b182c1541da5a0f4ce32195fb6/pep-0540.txt#L490
> >
> > Tell me if I should rephrase the rationale of the PEP 540 to better
> > justify the usage of surrogateescape.
>
> OK, the "List a directory into a text file" example demonstrates why
> surrogateescape is used for open().  If os.listdir() returns
> surrogateescaped data, file.write() will fail.
> All other examples are about stdio.
>
> But we should achieve good balance between correctness and usability of
> default behavior.
>
> >
> > Maybe the "UTF-8 Mode" should be renamed to "UTF-8 with
> > surrogateescape, or backslashreplace for stderr, or surrogatepass for
> > fsencode/fsdecode on Windows, or strict for Strict UTF-8 Mode"... But
> > the PEP title would be too long, no? :-)
> >
>
> I feel short name is enough.
>
> >
> >> And opening binary file without "b" option is very common mistake of new
> >> developers.  If default error handler is surrogateescape, they lose a
> >> chance to notice their bug.
> >
> > When open() is used in text mode to read "binary data", usually the
> > developer would only notice when getting the POSIX locale (ASCII
> > encoding). But the PEP 538 already changed that by using the C.UTF-8
> > locale (and so the UTF-8 encoding, instead of the ASCII encoding).
> >
>
> With PEP 538 (C.UTF-8 locale), open() uses UTF-8/strict, not
> UTF-8/surrogateescape.
>
> For example, this code raises UnicodeDecodeError with PEP 538 if the
> file is a JPEG file.
>
> with open(fn) as f:
>     f.read()
>
>
> > I'm not sure that locales are the best way to detect such class of
> > bytes. I suggest to use -b or -bb option to detect such bugs without
> > having to care of the locale.
> >
>
> But many new developers don't use or know about the -b or -bb option.
>
> >
> >> On the other hand, it helps some use cases where the user wants
> >> byte-transparent behavior, without modifying code to use
> >> "surrogateescape" explicitly.
> >>
> >> Which is the more important scenario?  Does anyone have an opinion?
> >> Are there any rationales or use cases I'm missing?
> >
> > Usually users expect that Python 3 "just works" and don't bother them
> > with the locale (that nobody understands).
> >
> > The old version of the PEP contains a long list of issues:
> >
> https://github.com/python/peps/blob/f92b5fbdc2bcd9b182c1541da5a0f4ce32195fb6/pep-0540.txt#L924-L986
> >
> > I already replaced the strict error handler with surrogateescape for
> > sys.stdin and sys.stdout on the POSIX locale in Python 3.5:
> > https://bugs.python.org/issue19977
> >
> > For the rationale, read for example these comments:
> >
> [snip]
>
> OK, I'll read them and think again about open()'s default behavior.
> But I still hope open()'s behavior is consistent with PEP 538 and PEP 540.
>
> Regards,


Re: [Python-Dev] HP-UX pr not feeling the love

2017-12-06 Thread Lukasz Langa
Hi Rob,
thanks for your patch. CPython core developers, as volunteers, have limited 
resources available to maintain Python. Those resources are not only time, they 
are also mental resources necessary to make a change in Python as well as 
actual physical resources. Supporting a platform requires all three:

1. You need time to make a platform work initially, and then continuous effort 
to keep it working, fixing regressions, including this platform in new 
features, etc.
2. You need mental resources to manage additional complexity that comes from 
#ifdef sprinkled through the code, cryptic configure/Makefile machinery, etc.
3. You need access to machines running the given operating system to be able to 
test if your changes are compatible.

This is why we are keeping the list of supported platforms relatively short. In 
fact, in time we're cutting support for less popular platforms that we couldn't 
keep running. Details in https://www.python.org/dev/peps/pep-0011/. Look, just 
in 3.7 we're dropping IRIX and systems without threads.

As you're saying, while your current PR is relatively innocent, more are needed 
to make it work. If those require more drastic changes in our codebase, we 
won't be able to accept them due to reasons stated above.

I understand where you're coming from. If you're serious about this, we would 
need to see the full extent of changes required to make Python 3.7 work on HP 
UX, preferably minimal. We would also need a buildbot added to our fleet (see 
http://buildbot.python.org/) that would ensure 
the build stays green. Finally, we would need you to think whether you could 
provide the patches that keep the build green for a significant period of time 
(counted in years).

- Ł



> On Dec 6, 2017, at 7:22 AM, Rob Boehne  wrote:
> 
> Hello,
> 
> Back in June I was fired up to get my diverse set of platforms all running 
> Python 3, but quickly ran into issues and submitted a PR.
> 
> https://github.com/python/cpython/pull/2519 
> 
> 
> It seems as though this HP-UX specific change isn’t getting much 
> consideration, which probably isn’t a big deal.  What may be more important 
> is that I’ve stopped trying to contribute, and if I really need Python 3 on 
> HP-UX, AIX, Sparc Solaris or other operating systems, I’ll have to hack it 
> together myself and maintain  my own fork, while presumably others do the 
> same.  At the same time I’m working hard to convince management that we 
> shouldn’t create technical debt by maintaining patches to all the tools we 
> use, and that we should get these changes accepted into the upstream repos.
> 
> Could someone have a look at this PR and possibly merge?
> 
> Thanks,
> 
> Rob Boehne
> 





Re: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2)

2017-12-06 Thread Jakub Wilk

* Nick Coghlan , 2017-12-06, 16:15:
>>> Something I've just noticed that needs to be clarified: on Linux, "C"
>>> locale and "POSIX" locale are aliases, but this isn't true in general
>>> (e.g. it's not the case on *BSD systems, including Mac OS X).
>> For those of us with little to no BSD/MacOS experience, can you give a
>> quick run-down of the differences between "C" and "POSIX"?

POSIX says that "C" and "POSIX" are equivalent[0].

> The one that's relevant to default locale detection is just the string
> that "setlocale(LC_CTYPE, NULL)" returns.

POSIX doesn't require any particular return value for setlocale() calls.
It's only guaranteed that the returned string can be used in subsequent
setlocale() calls to restore the original locale.

So in the POSIX locale, a compliant setlocale() implementation could
return "C", or "POSIX", or even something entirely different.

> Beyond that, I don't know what the actual functional differences are.

I don't believe there are any.


[0] http://pubs.opengroup.org/onlinepubs/9699919799/functions/setlocale.html
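
To make the point about the return value concrete, here is a minimal Python sketch (not code from CPython; the helper names are made up for this example). It queries the LC_CTYPE locale without changing it and accepts either spelling of the default locale, since the exact string returned is not pinned down:

```python
import locale


def current_ctype_locale():
    """Return the name reported for the current LC_CTYPE locale (query only)."""
    # Passing None queries the locale without changing it, like
    # setlocale(LC_CTYPE, NULL) in C.
    return locale.setlocale(locale.LC_CTYPE, None)


def is_c_or_posix(name):
    """Hypothetical helper: treat "C" and "POSIX" as the same locale."""
    return name in ("C", "POSIX")


if __name__ == "__main__":
    name = current_ctype_locale()
    print(name, is_c_or_posix(name))
```

A check written as a plain comparison with "C" would miss an implementation that reports "POSIX" (or something else entirely) for the same locale.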

--
Jakub Wilk


[Python-Dev] HP-UX pr not feeling the love

2017-12-06 Thread Rob Boehne
Hello,

Back in June I was fired up to get my diverse set of platforms all running 
Python 3, but quickly ran into issues and submitted a PR.

https://github.com/python/cpython/pull/2519

It seems as though this HP-UX specific change isn’t getting much consideration, 
which probably isn’t a big deal.  What may be more important is that I’ve 
stopped trying to contribute, and if I really need Python 3 on HP-UX, AIX, 
Sparc Solaris or other operating systems, I’ll have to hack it together myself 
and maintain  my own fork, while presumably others do the same.  At the same 
time I’m working hard to convince management that we shouldn’t create technical 
debt by maintaining patches to all the tools we use, and that we should get 
these changes accepted into the upstream repos.

Could someone have a look at this PR and possibly merge?

Thanks,

Rob Boehne



Re: [Python-Dev] PEP 554 v4 (new interpreters module)

2017-12-06 Thread Guido van Rossum
Sorry to burst your bubble, but I have not followed any of the discussion
and I am actually very worried about this topic. I don't think I will be
able to make time for this before the 3.7b1 feature freeze.

On Tue, Dec 5, 2017 at 6:51 PM, Eric Snow 
wrote:

> Hi all,
>
> I've finally updated PEP 554.  Feedback would be most welcome.  The
> PEP is in a pretty good place now and I hope we're close to a
> decision to accept it. :)
>
> In addition to resolving the open questions, I've also made the
> following changes to the PEP:
>
> * put an API summary at the top and moved the full API description down
> * added the "is_shareable()" function to indicate if an object can be shared
> * added None as a shareable object
>
> Regarding the open questions:
>
>  * "Leaking exceptions across interpreters"
>
> I chose to go with an approach that effectively creates a
> traceback.TracebackException proxy of the original exception, wraps
> that in a RuntimeError, and raises that in the calling interpreter.
> Raising an exception that safely preserves the original exception and
> traceback seems like the most intuitive behavior (to me, as a user).
> The only alternative that made sense is to fully duplicate the
> exception and traceback (minus stack frames) in the calling
> interpreter, which is probably overkill and likely to be confusing.
>
>  * "Initial support for buffers in channels"
>
> I chose to add a "SendChannel.send_buffer(obj)" method for this.
> Supporting buffer objects from the beginning makes sense, opening good
> experimentation opportunities for a valuable set of users.  Supporting
> buffer objects separately and explicitly helps set clear expectations
> for users.  I decided not to go with a separate class (e.g.
> MemChannel) as it didn't seem like there's enough difference to
> warrant keeping them strictly separate.
>
> FWIW, I'm still strongly in favor of support for passing (copies of)
> bytes objects via channels.  Passing objects to SendChannel.send() is
> obvious.  Limiting it, for now, to bytes (and None) helps us avoid
> tying ourselves strongly to any particular implementation (it seems
> like all the reservations were relative to the implementation).  So I
> do not see a reason to wait.
>
>  * "Pass channels explicitly to run()?"
>
> I've applied the suggested solution (make "channels" an explicit
> keyword argument).
>
> -eric
>
>
> I've included the latest full text
> (https://www.python.org/dev/peps/pep-0554/) below:
>
> +
>
> PEP: 554
> Title: Multiple Interpreters in the Stdlib
> Author: Eric Snow 
> Status: Draft
> Type: Standards Track
> Content-Type: text/x-rst
> Created: 2017-09-05
> Python-Version: 3.7
> Post-History: 07-Sep-2017, 08-Sep-2017, 13-Sep-2017, 05-Dec-2017
>
>
> Abstract
> ========
>
> CPython has supported multiple interpreters in the same process (AKA
> "subinterpreters") since version 1.5.  The feature has been available
> via the C-API. [c-api]_ Subinterpreters operate in
> `relative isolation from one another `_, which
> provides the basis for an
> `alternative concurrency model `_.
>
> This proposal introduces the stdlib ``interpreters`` module.  The module
> will be `provisional `_.  It exposes the basic
> functionality of subinterpreters already provided by the C-API, along
> with new functionality for sharing data between interpreters.
>
>
> Proposal
> ========
>
> The ``interpreters`` module will be added to the stdlib.  It will
> provide a high-level interface to subinterpreters and wrap a new
> low-level ``_interpreters`` (in the same way as the ``threading``
> module).  See the `Examples`_ section for concrete usage and use cases.
>
> Along with exposing the existing (in CPython) subinterpreter support,
> the module will also provide a mechanism for sharing data between
> interpreters.  This mechanism centers around "channels", which are
> similar to queues and pipes.
>
> Note that *objects* are not shared between interpreters since they are
> tied to the interpreter in which they were created.  Instead, the
> objects' *data* is passed between interpreters.  See the `Shared data`_
> section for more details about sharing between interpreters.
>
> At first only the following types will be supported for sharing:
>
> * None
> * bytes
> * PEP 3118 buffer objects (via ``send_buffer()``)
>
> Support for other basic types (e.g. int, Ellipsis) will be added later.
>
> API summary for interpreters module
> -----------------------------------
>
> Here is a summary of the API for the ``interpreters`` module.  For a
> more in-depth explanation of the proposed classes and functions, see
> the `"interpreters" Module API`_ section below.
>
> For creating and using interpreters:
>
> +-----------+-------------+
> | signature | description |
> +-----------+-------------+

Re: [Python-Dev] Zero-width matching in regexes

2017-12-06 Thread Serhiy Storchaka

06.12.17 15:37, Paul Moore wrote:
> Behaviour (1) means that we'd get
>
> >>> regex.sub(r'\w*', 'x', 'hello world', flags=regex.VERSION1)
> 'xx xx'
>
> (because \w* matches the empty string after each word, as well as each
> word itself). I just tested in Perl, and that is indeed what happens
> there as well.


Yes, because in this case you need to use `\w+`, not `\w*`.

No CPython tests would fail if re.sub() were changed to behaviour (2),
except the tests just added in 3.7 and the one test written specifically
to guard the old behavior. But I don't know how much third-party code
would break if this change were made.



> On that basis, I have to say that I find behaviour (2) more intuitive
> and (arguably) "correct":
>
> >>> regex.sub(r'\w*', 'x', 'hello world', flags=regex.VERSION0)
> 'x x'
> >>> re.sub(r'\w*', 'x', 'hello world')
> 'x x'


The actual behavior of re.sub(), and of regex.sub() in the VERSION0 mode,
was the weird behavior (4):


>>> regex.sub(r'(\b|\w+)', r'[\1]', 'hello world', flags=regex.VERSION0)
'[]h[ello] []w[orld]'
>>> regex.sub(r'(\b|\w+)', r'[\1]', 'hello world', flags=regex.VERSION1)
'[][hello][] [][world][]'
>>> re.sub(r'(\b|\w+)', r'[\1]', 'hello world')  # 3.6, behavior (4)
'[]h[ello] []w[orld]'
>>> re.sub(r'(\b|\w+)', r'[\1]', 'hello world')  # 3.7, behavior (2)
'[][hello] [][world]'



Re: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2)

2017-12-06 Thread INADA Naoki
>> And I have one worrying point.
>> With UTF-8 mode, open()'s default encoding/error handler is
>> UTF-8/surrogateescape.
>
> The Strict UTF-8 Mode is for you if you prioritize correctness over usability.

Yes, but as I said, I care about inexperienced developers
who don't know what UTF-8 mode is.

>
> In the very first version of my PEP/idea, I wanted to use
> UTF-8/strict. But then I started to play with the implementation and I
> got many "practical" issues. Using UTF-8/strict, you quickly get
> encoding errors. For example, you become unable to read undecodable
> bytes from stdin. stdin.read() only gives you an error, without
> letting you decide how to handle these "invalid" data. Same issue with
> stdout.
>

I don't care about stdio, because PEP 538 uses surrogateescape for stdio/error
https://www.python.org/dev/peps/pep-0538/#changes-to-the-default-error-handling-on-the-standard-streams

I care only about builtin open()'s behavior.
PEP 538 doesn't change default error handler of open().

I think PEP 538 and PEP 540 should behave almost identically, apart from
whether the locale is changed.  So I need a very strong reason if PEP 540
changes the default error handler of open().


> In the old long version of the PEP, I tried to explain UTF-8/strict
> issues with very concrete examples, the removed "Use Cases" section:
> https://github.com/python/peps/blob/f92b5fbdc2bcd9b182c1541da5a0f4ce32195fb6/pep-0540.txt#L490
>
> Tell me if I should rephrase the rationale of the PEP 540 to better
> justify the usage of surrogateescape.

OK, the "List a directory into a text file" example demonstrates why
surrogateescape is used for open().  If os.listdir() returns surrogateescaped
data, file.write() will fail with the strict error handler.
All other examples are about stdio.
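
To spell out that failure mode, here is a small sketch. It is an illustration only: the file name bytes are invented, and the decode step stands in for what os.listdir() returns when the filesystem encoding is UTF-8.

```python
# A Latin-1 encoded file name; the byte 0xE9 is not valid UTF-8 in this position.
raw_name = b"caf\xe9.txt"

# Stand-in for what os.listdir() would return with a UTF-8 filesystem encoding:
# the undecodable byte becomes a lone surrogate (U+DCE9).
name = raw_name.decode("utf-8", "surrogateescape")


def encodes_strictly(text):
    """Return True if text can be encoded as UTF-8 with the strict handler."""
    try:
        text.encode("utf-8")  # errors="strict" is the default
        return True
    except UnicodeEncodeError:
        return False


# With surrogateescape on the way out, the original bytes come back unchanged.
round_tripped = name.encode("utf-8", "surrogateescape")

if __name__ == "__main__":
    print(repr(name), encodes_strictly(name), round_tripped == raw_name)
```

Writing such a name through a text file opened with the strict handler hits the same UnicodeEncodeError as the strict encode here.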

But we should achieve a good balance between correctness and usability of
the default behavior.

>
> Maybe the "UTF-8 Mode" should be renamed to "UTF-8 with
> surrogateescape, or backslashreplace for stderr, or surrogatepass for
> fsencode/fsdecode on Windows, or strict for Strict UTF-8 Mode"... But
> the PEP title would be too long, no? :-)
>

I feel the short name is enough.

>
>> And opening binary file without "b" option is very common mistake of new
>> developers.  If default error handler is surrogateescape, they lose a chance
>> to notice their bug.
>
> When open() is used in text mode to read "binary data", usually the
> developer would only notice when getting the POSIX locale (ASCII
> encoding). But the PEP 538 already changed that by using the C.UTF-8
> locale (and so the UTF-8 encoding, instead of the ASCII encoding).
>

With PEP 538 (C.UTF-8 locale), open() uses UTF-8/strict, not
UTF-8/surrogateescape.

For example, this code raises UnicodeDecodeError with PEP 538 if the
file is a JPEG file.

    with open(fn) as f:
        f.read()
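
A self-contained sketch of that difference follows. It assumes nothing about PEP 538 or PEP 540 themselves; it just passes the encoding and error handler explicitly, and the function names are made up for the example.

```python
import os
import tempfile


def read_as_text(path, errors):
    """Read a file in text mode as UTF-8 with the given error handler."""
    with open(path, encoding="utf-8", errors=errors) as f:
        return f.read()


def demo():
    """Return (strict_failed, text_read_with_surrogateescape)."""
    fd, path = tempfile.mkstemp(suffix=".jpg")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"\xff\xd8\xff\xe0")  # first bytes of a typical JPEG file
        try:
            read_as_text(path, "strict")
            strict_failed = False
        except UnicodeDecodeError:
            strict_failed = True  # the mistake is reported immediately
        # With surrogateescape the same read succeeds and returns lone
        # surrogates, so the missing "b" in the mode goes unnoticed.
        escaped = read_as_text(path, "surrogateescape")
        return strict_failed, escaped
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```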


> I'm not sure that locales are the best way to detect such class of
> bugs. I suggest using the -b or -bb option to detect such bugs without
> having to care about the locale.
>

But many new developers don't use or know about the -b or -bb option.

>
>> On the other hand, it helps some use cases when user want byte-transparent
>> behavior, without modifying code to use "surrogateescape" explicitly.
>>
>> Which is the more important scenario?  Does anyone have an opinion about it?
>> Are there any rationales and use cases I'm missing?
>
> Usually users expect that Python 3 "just works" and doesn't bother them
> with the locale (that nobody understands).
>
> The old version of the PEP contains a long list of issues:
> https://github.com/python/peps/blob/f92b5fbdc2bcd9b182c1541da5a0f4ce32195fb6/pep-0540.txt#L924-L986
>
> I already replaced the strict error handler with surrogateescape for
> sys.stdin and sys.stdout on the POSIX locale in Python 3.5:
> https://bugs.python.org/issue19977
>
> For the rationale, read for example these comments:
>
[snip]

OK, I'll read them and think again about open()'s default behavior.
But I still hope open()'s behavior is consistent between PEP 538 and PEP 540.

Regards,


Re: [Python-Dev] Zero-width matching in regexes

2017-12-06 Thread Paul Moore
On 6 December 2017 at 13:13, Serhiy Storchaka  wrote:
> 05.12.17 22:26, Terry Reedy wrote:
>>
>> On 12/4/2017 6:21 PM, MRAB wrote:
>>>
>>> I've finally come to a conclusion as to what the "correct" behaviour of
>>> zero-width matches should be: """always return the first match, but never a
>>> zero-width match that is joined to a previous zero-width match""".
>>
>>
>> Is this different from current re or regex?
>
>
> Partially. There are different ways of handling the problem of repeated
> zero-width searching.
>
> 1. The one formulated by Matthew. This is the behavior of findall() and
> finditer() in regex in both VERSION0 and VERSION1 modes, sub() in regex in
> the VERSION1 mode, and findall() and finditer() in re since 3.7.
>
> 2. Prohibit a zero-width match that is joined to a previous match
> (independent from its width). This is the behavior of sub() in re and in
> regex in the VERSION0 mode, and split() in regex in the VERSION1 mode. This
> is the only correctly documented and explicitly tested behavior in re.
>
> 3. Prohibit a zero-width match (always). This is the behavior of split() in
> re in 3.4 and older (deprecated since 3.5) and in regex in VERSION0 mode.
>
> 4. Exclude the character following a zero-width match from following
> matches. This is the behavior of findall() and finditer() in 3.6 and older.
>
> The case 4 is definitely incorrect. It leads to excluding characters from
> matching. re.findall(r'^|\w+', 'two words') returns ['', 'wo', 'words'].
>
> The case 3 is pretty useless. It disallows splitting on useful zero-width
> patterns like `\b` and makes `\s*` just equal to `\s+`.
>
> The difference between cases 1 and 2 is subtle. The case 1 looks more
> logical and matches the behavior of Perl and PCRE, but the case 2 is
> explicitly documented and tested. This behavior is kept for compatibility
> with an ancient re implementation.

Behaviour (1) means that we'd get

>>> regex.sub(r'\w*', 'x', 'hello world', flags=regex.VERSION1)
'xx xx'

(because \w* matches the empty string after each word, as well as each
word itself). I just tested in Perl, and that is indeed what happens
there as well.

On that basis, I have to say that I find behaviour (2) more intuitive
and (arguably) "correct":

>>> regex.sub(r'\w*', 'x', 'hello world', flags=regex.VERSION0)
'x x'
>>> re.sub(r'\w*', 'x', 'hello world')
'x x'

Paul


Re: [Python-Dev] Zero-width matching in regexes

2017-12-06 Thread Serhiy Storchaka

05.12.17 22:26, Terry Reedy wrote:
> On 12/4/2017 6:21 PM, MRAB wrote:
>> I've finally come to a conclusion as to what the "correct" behaviour
>> of zero-width matches should be: """always return the first match, but
>> never a zero-width match that is joined to a previous zero-width
>> match""".
>
> Is this different from current re or regex?


Partially. There are different ways of handling the problem of repeated 
zero-width searching.


1. The one formulated by Matthew. This is the behavior of findall() and 
finditer() in regex in both VERSION0 and VERSION1 modes, sub() in regex 
in the VERSION1 mode, and findall() and finditer() in re since 3.7.


2. Prohibit a zero-width match that is joined to a previous match 
(independent from its width). This is the behavior of sub() in re and in 
regex in the VERSION0 mode, and split() in regex in the VERSION1 mode. 
This is the only correctly documented and explicitly tested behavior in re.


3. Prohibit a zero-width match (always). This is the behavior of split() 
in re in 3.4 and older (deprecated since 3.5) and in regex in VERSION0 mode.


4. Exclude the character following a zero-width match from following 
matches. This is the behavior of findall() and finditer() in 3.6 and older.


The case 4 is definitely incorrect. It leads to excluding characters 
from matching. re.findall(r'^|\w+', 'two words') returns ['', 'wo', 
'words'].


The case 3 is pretty useless. It disallows splitting on useful zero-width 
patterns like `\b` and makes `\s*` just equal to `\s+`.


The difference between cases 1 and 2 is subtle. The case 1 looks more 
logical and matches the behavior of Perl and PCRE, but the case 2 is 
explicitly documented and tested. This behavior is kept for 
compatibility with an ancient re implementation.
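
A quick way to see which of these behaviors a given interpreter implements is to list the spans that finditer() reports. This is only a sketch; match_spans is a helper name made up here, and the result for the `^|\w+` pattern depends on the Python version, as described in cases 1 and 4 above.

```python
import re


def match_spans(pattern, text):
    """Return (start, end, matched_text) for every match finditer() reports."""
    return [(m.start(), m.end(), m.group()) for m in re.finditer(pattern, text)]


if __name__ == "__main__":
    # Under case 4 the 't' after the empty match at position 0 is skipped;
    # under case 1 the whole word 'two' is found.
    print(match_spans(r'^|\w+', 'two words'))
    print(re.findall(r'^|\w+', 'two words'))
```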




Re: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2)

2017-12-06 Thread Nick Coghlan
On 6 December 2017 at 20:38, Victor Stinner  wrote:
> Nick:
>> So if PEP 540 is going to implicitly trigger switching encodings, it
>> needs to specify whether it's going to look for the C locale or the
>> POSIX locale (I'd suggest C locale, since that's the actual default
>> that causes problems).
>
> I'm thinking of the test already used by check_force_ascii() (function
> checking if the LC_CTYPE uses the ASCII encoding or something else):
>
>     loc = setlocale(LC_CTYPE, NULL);
>     if (loc == NULL)
>         goto error;
>     if (strcmp(loc, "C") != 0) {
>         /* the LC_CTYPE locale is different than C */
>         return 0;
>     }

Yeah, the locale coercion code changes the locale multiple times to
make sure we have a coercion target that will actually work (and then
checks nl_langinfo as well, since that sometimes breaks on BSD
systems, even if the original setlocale() call claimed to work). Once
we've found a locale that appears to work though, then we configure
the LC_CTYPE environment variable, and reload the locale from the
environment.

It's all annoyingly convoluted and arcane, but it works well enough
for 
https://github.com/python/cpython/blob/master/Lib/test/test_c_locale_coercion.py
to pass across the full BuildBot fleet :)
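
For readers who prefer code to prose, the probing step described above can be sketched roughly as follows. This is not the CPython implementation (which is written in C); the function name is made up, and the candidate list is assumed to be the coercion targets listed in PEP 538.

```python
import locale

# Assumed coercion targets, tried in order.
CANDIDATES = ("C.UTF-8", "C.utf8", "UTF-8")


def find_coercion_target(candidates=CANDIDATES):
    """Return the first candidate locale that really yields UTF-8, or None."""
    original = locale.setlocale(locale.LC_CTYPE, None)  # remember the current locale
    try:
        for name in candidates:
            try:
                locale.setlocale(locale.LC_CTYPE, name)
            except locale.Error:
                continue  # this platform does not provide the locale
            # Double-check the codeset, since a successful setlocale() call
            # is not always enough. nl_langinfo() is not available everywhere.
            if hasattr(locale, "nl_langinfo"):
                codeset = locale.nl_langinfo(locale.CODESET)
                if codeset.replace("-", "").lower() == "utf8":
                    return name
        return None
    finally:
        # Put the process back into the locale it started with.
        locale.setlocale(locale.LC_CTYPE, original)


if __name__ == "__main__":
    print(find_coercion_target())
```

The sketch only probes and restores; setting the LC_CTYPE environment variable and reloading the locale from the environment is left out.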

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Zero-width matching in regexes

2017-12-06 Thread Serhiy Storchaka

05.12.17 01:21, MRAB wrote:
> I've finally come to a conclusion as to what the "correct" behaviour of
> zero-width matches should be: """always return the first match, but
> never a zero-width match that is joined to a previous zero-width match""".
>
> If it's about to return a zero-width match that's joined to a previous
> zero-width match, then backtrack and keep on looking for a match.


Isn't this how sub(), findall() and finditer() work in regex with 
VERSION1? I agree that this behavior looks most logical and self-consistent.


Unfortunately the different behavior of re.sub() is documented explicitly:

"Empty matches for the pattern are replaced only when not adjacent to a 
previous match, so sub('x*', '-', 'abc') returns '-a-b-c-'."


And there is a special-purpose test for this. The behavior was changed 
once, when the re implementation was changed from pre to sre, but the 
older behavior was restored. [1] [2]


[1] https://bugs.python.org/issue462270
[2] 
https://github.com/python/cpython/commit/21009b9c6fc40b25fcb30ee60d6108f235733e40
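
The documented example can be checked directly. In this small sketch, dashes() is a name made up for the illustration. The 'abc' case gives the documented result either way; the 'abxd' case is where the handling of an empty match next to a previous non-empty match becomes visible, and as far as I know its result differs between 3.6 and later releases.

```python
import re


def dashes(text):
    """Replace every match of x* (including empty matches) with '-'."""
    return re.sub('x*', '-', text)


if __name__ == "__main__":
    print(dashes('abc'))   # the documented example
    print(dashes('abxd'))  # an empty match directly follows the match of 'x'
```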




Re: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2)

2017-12-06 Thread Victor Stinner
Nick:
> So if PEP 540 is going to implicitly trigger switching encodings, it
> needs to specify whether it's going to look for the C locale or the
> POSIX locale (I'd suggest C locale, since that's the actual default
> that causes problems).

I'm thinking of the test already used by check_force_ascii() (function
checking if the LC_CTYPE uses the ASCII encoding or something else):

    loc = setlocale(LC_CTYPE, NULL);
    if (loc == NULL)
        goto error;
    if (strcmp(loc, "C") != 0) {
        /* the LC_CTYPE locale is different than C */
        return 0;
    }

Victor


Re: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2)

2017-12-06 Thread Victor Stinner
Hi Naoki,

2017-12-06 5:07 GMT+01:00 INADA Naoki :
> Oh, revised version is really short!
>
> And I have one worrying point.
> With UTF-8 mode, open()'s default encoding/error handler is
> UTF-8/surrogateescape.

The Strict UTF-8 Mode is for you if you prioritize correctness over usability.

In the very first version of my PEP/idea, I wanted to use
UTF-8/strict. But then I started to play with the implementation and I
got many "practical" issues. Using UTF-8/strict, you quickly get
encoding errors. For example, you become unable to read undecodable
bytes from stdin. stdin.read() only gives you an error, without
letting you decide how to handle these "invalid" data. Same issue with
stdout.

Compare encodings of the UTF-8 mode and the Strict UTF-8 Mode:
https://www.python.org/dev/peps/pep-0540/#encoding-and-error-handler

I tried to summarize all these kinds of issues in the second short
subsection of the rationale:
https://www.python.org/dev/peps/pep-0540/#passthough-undecodable-bytes-surrogateescape

In the old long version of the PEP, I tried to explain UTF-8/strict
issues with very concrete examples, the removed "Use Cases" section:
https://github.com/python/peps/blob/f92b5fbdc2bcd9b182c1541da5a0f4ce32195fb6/pep-0540.txt#L490

Tell me if I should rephrase the rationale of the PEP 540 to better
justify the usage of surrogateescape.

Maybe the "UTF-8 Mode" should be renamed to "UTF-8 with
surrogateescape, or backslashreplace for stderr, or surrogatepass for
fsencode/fsdecode on Windows, or strict for Strict UTF-8 Mode"... But
the PEP title would be too long, no? :-)


> And opening binary file without "b" option is very common mistake of new
> developers.  If default error handler is surrogateescape, they lose a chance
> to notice their bug.

When open() is used in text mode to read "binary data", usually the
developer would only notice when getting the POSIX locale (ASCII
encoding). But the PEP 538 already changed that by using the C.UTF-8
locale (and so the UTF-8 encoding, instead of the ASCII encoding).

I'm not sure that locales are the best way to detect such class of
bugs. I suggest using the -b or -bb option to detect such bugs without
having to care about the locale.


> On the other hand, it helps some use cases when user want byte-transparent
> behavior, without modifying code to use "surrogateescape" explicitly.
>
> Which is the more important scenario?  Does anyone have an opinion about it?
> Are there any rationales and use cases I'm missing?

Usually users expect that Python 3 "just works" and doesn't bother them
with the locale (that nobody understands).

The old version of the PEP contains a long list of issues:
https://github.com/python/peps/blob/f92b5fbdc2bcd9b182c1541da5a0f4ce32195fb6/pep-0540.txt#L924-L986

I already replaced the strict error handler with surrogateescape for
sys.stdin and sys.stdout on the POSIX locale in Python 3.5:
https://bugs.python.org/issue19977

For the rationale, read for example these comments:

* https://bugs.python.org/issue19846#msg205727 "As I would state it,
the problem is that python's boundary with the OS is not yet uniform.
(...) Note that currently, input() and sys.stdin.read() won't read
undecodable data so this is somewhat symmetrical but it seems to me
that saying "everything that interfaces with the OS except the
standard streams will use surrogateescape on undecodable bytes" is
drawing a line in an unintuitive location."

* https://bugs.python.org/issue19977#msg206141 "My impression was that
python3 was supposed to help get rid of UnicodeError tracebacks, not
mojibake.  If mojibake was the problem then we should never have gone
down the surrogateescape path for input."

* https://bugs.python.org/issue19846#msg205646 "For example I'm using
[LANG=C] for testcases to set the language uncomplicated to english."

In bug reports, to get the user expectations, just ignore all core
developers' comments :-)

Users set the locale to C to get messages in English and still expect
"Unicode" to work properly.

Only Python 3 is so strict about encodings. Most other programming
languages, like Python 2, "just work", since they process data as
bytes.

Victor


Re: [Python-Dev] PEP 565: Show DeprecationWarning in __main__

2017-12-06 Thread Victor Stinner
Let's discuss the -Xdev implementation issue at
https://bugs.python.org/issue32230

In short, -Xdev must add its warning filter at the end to respect
BytesWarning, whereas that's not possible with the -W option :-(

Victor



Re: [Python-Dev] PEP 565: Show DeprecationWarning in __main__

2017-12-06 Thread Nick Coghlan
On 6 December 2017 at 14:50, Nick Coghlan  wrote:
> On 6 December 2017 at 14:34, Nick Coghlan  wrote:
>> That said, I do agree we could offer easier to use APIs to app
>> developers that just want to hide warnings from their users, so I've
>> filed https://bugs.python.org/issue32229 to propose a straightforward
>> "warnings.hide_warnings()" API that encapsulates things like checking
>> for a non-empty sys.warnoptions list.
>
> I've updated the "Limitations" section of the PEP to mention that
> separate proposal:
> https://github.com/python/peps/commit/6e93c8d2e6ad698834578d4077b92a8fc84a70f5

Having rebased the PEP 565 patch atop the "-X dev" changes, I think
that if we don't change some of the details of how `-X dev` is
implemented, `warnings.hide_warnings` (or a comparable convenience
API) is going to be a requirement to help app developers effectively
manage their default warnings settings in 3.7+.

The problem is that devmode doesn't currently behave the same way
`-Wd` does when it comes to sys.warnoptions:

    $ ./python -Wd -c "import sys; print(sys.warnoptions); print(sys.flags.dev_mode)"
    ['d']
    False
    $ ./python -X dev -c "import sys; print(sys.warnoptions); print(sys.flags.dev_mode)"
    []
    True

As currently implemented, the warnings module actually checks
`sys.flags.dev_mode` directly during startup (or `sys._xoptions` in
the case of the pure Python fallback), and populates the warnings
filter differently depending on what it finds:

    $ ./python -c "import warnings; print('\n'.join(map(str, warnings.filters)))"
    ('default', None, , '__main__', 0)
    ('ignore', None, , None, 0)
    ('ignore', None, , None, 0)
    ('ignore', None, , None, 0)
    ('ignore', None, , None, 0)
    ('ignore', None, , None, 0)

    $ ./python -X dev -c "import warnings; print('\n'.join(map(str, warnings.filters)))"
    ('ignore', None, , None, 0)
    ('default', None, , None, 0)
    ('default', None, , None, 0)

    $ ./python -Wd -c "import warnings; print('\n'.join(map(str, warnings.filters)))"
    ('default', None, , None, 0)
    ('default', None, , '__main__', 0)
    ('ignore', None, , None, 0)
    ('ignore', None, , None, 0)
    ('ignore', None, , None, 0)
    ('ignore', None, , None, 0)
    ('ignore', None, , None, 0)

This means the app development snippet proposed in the PEP will no
longer do the right thing, since it will ignore the dev mode flag:

    if not sys.warnoptions:
        # This still runs for `-X dev`
        warnings.simplefilter("ignore")

My main suggested fix would be to adjust the way `-X dev` is
implemented to include `sys.warnoptions.append('default')` (and remove
the direct dev_mode query from the warnings module code).

However, another possible way to go would be to make the correct
Python 3.7+-only snippet look like this:

    import warnings
    warnings.hide_warnings()

And have the forward-compatible snippet look like this:

    import warnings
    if hasattr(warnings, "hide_warnings"):
        # Accounts for `-W`, `-X dev`, and any other implementation
        # specific settings
        warnings.hide_warnings()
    else:
        # Only accounts for `-W`
        import sys
        if not sys.warnoptions:
            warnings.simplefilter("ignore")

(We can also do both, of course)
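
In the meantime, an application could approximate the intended behavior by checking the dev mode flag itself. The following is only a sketch under that assumption: both function names are invented here, and `warnings.hide_warnings()` remains a proposal rather than an existing API.

```python
import sys
import warnings


def should_hide_warnings(warnoptions, dev_mode):
    """Hide warnings only if the user asked for neither -W nor -X dev."""
    return not warnoptions and not dev_mode


def configure_app_warnings():
    """Apply the application default; return True if warnings were hidden."""
    # sys.flags.dev_mode does not exist before 3.7, hence the getattr().
    dev_mode = getattr(sys.flags, "dev_mode", False)
    if should_hide_warnings(sys.warnoptions, dev_mode):
        warnings.simplefilter("ignore")
        return True
    return False


if __name__ == "__main__":
    print(configure_app_warnings())
```

Keeping the decision in a separate pure function makes it easy to test without touching the process-wide filter list.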

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] PEP 554 v4 (new interpreters module)

2017-12-06 Thread Eric Snow
On Dec 5, 2017 23:49, "Nick Coghlan"  wrote:

> Nice updates! I like this version.

Great! :)

> My one suggestion here would be to consider a dedicated exception type
> like "interpreters.SubinterpreterError", rather than re-using
> RuntimeError directly. That way you can put the extracted traceback on
> a named attribute, and retain the option of potentially adding
> subinterpreter awareness to the traceback module in the future.

Yeah, I already have a deferred idea item for this. :)  TBH, I was on the
fence about a dedicated exception type, so you've nudged me on board. :)
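
To give a feel for what such a dedicated exception type might look like, here is a rough single-interpreter sketch. It does not use the proposed interpreters module at all; the class layout, the `captured` attribute name and the run_and_wrap() helper are assumptions made for illustration only.

```python
import traceback


class SubinterpreterError(RuntimeError):
    """Hypothetical wrapper carrying a snapshot of the original exception."""

    def __init__(self, message, captured):
        super().__init__(message)
        # A traceback.TracebackException holds only strings and summaries,
        # not live frames or the original exception object.
        self.captured = captured


def run_and_wrap(func):
    """Call func(); re-raise any failure as a SubinterpreterError."""
    try:
        return func()
    except Exception as exc:
        captured = traceback.TracebackException.from_exception(exc)
        message = "uncaught " + type(exc).__name__
        # "from None" drops the implicit link to the original exception,
        # so only the captured snapshot travels with the new one.
        raise SubinterpreterError(message, captured) from None


if __name__ == "__main__":
    def fail():
        raise ValueError("boom")

    try:
        run_and_wrap(fail)
    except SubinterpreterError as err:
        print("".join(err.captured.format()))
```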

-eric