Re: [Python-Dev] Cutoff time for patches for upcoming releases

2016-06-10 Thread Benjamin Peterson
2016-06-11 18:00 UTC
 
 
On Fri, Jun 10, 2016, at 14:37, Terry Reedy wrote:
> A question for each of the three release managers:
> when is the earliest that you might tag your release and
> cutoff submission of further patches for the release?
>
> 2.7.12 ('6-12')?
>
> 3.5.2 ('6-12')?
>
> 3.6.0a2 ('6-13')?
>
> --
> Terry Jan Reedy
>


Re: [Python-Dev] Stop using timeit, use perf.timeit!

2016-06-10 Thread Steven D'Aprano
On Fri, Jun 10, 2016 at 11:22:42PM +0200, Victor Stinner wrote:
> 2016-06-10 20:47 GMT+02:00 Meador Inge :
> > Apologies in advance if this is answered in one of the links you posted, but
> > out of curiosity was geometric mean considered?
> >
> > In the compiler world this is a very common way of aggregating performance
> > results.
> 
> FYI I chose to store all timings in the JSON file. So later, you are
> free to recompute the average differently, compute other statistics,
> etc.
> 
> I saw that the CPython benchmark suite has an *option* to compute the
> geometric mean. I don't really understand how it differs from the
> arithmetic mean.
> 
> Is the geometric mean recommended to aggregate results of different
> (unrelated) benchmarks, or also for multiple runs of a single
> benchmark?

The Wikipedia article discusses this, but sits on the fence and can't 
decide whether using the gmean for performance results is a good or bad 
idea:

https://en.wikipedia.org/wiki/Geometric_mean#Properties

Geometric mean is usually used in finance for averaging rates of growth:

https://www.math.toronto.edu/mathnet/questionCorner/geomean.html

If you express your performances as speeds (as "calculations per 
second") then the harmonic mean is the right way to average them.


-- 
Steve


Re: [Python-Dev] Stop using timeit, use perf.timeit!

2016-06-10 Thread Steven D'Aprano
On Sat, Jun 11, 2016 at 12:06:31AM +0200, Victor Stinner wrote:

> > Victor if you could calculate the sample skewness of your results I think
> > that would be very interesting!
> 
> I'm good at copying and pasting code, but less so at computing statistics :-)
> Would you be interested in writing a pull request, or at least sending me a
> function computing the expected value?
> https://github.com/haypo/perf

I have some code and tests for calculating (population and sample) 
skewness and kurtosis. Do you think it will be useful to add it to the 
statistics module? I can polish it up and aim to have it ready by 3.6.0 
alpha 4.
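
For readers following along, one common definition such a function might
implement (a sketch only, not the code Steven is offering for the statistics
module) is the adjusted Fisher-Pearson estimator:

import math

def population_skewness(data):
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n   # second central moment
    m3 = sum((x - mean) ** 3 for x in data) / n   # third central moment
    return m3 / m2 ** 1.5

def sample_skewness(data):
    # Adjusted Fisher-Pearson estimator, the form most stats packages report.
    n = len(data)
    return population_skewness(data) * math.sqrt(n * (n - 1)) / (n - 2)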


-- 
Steve


Re: [Python-Dev] Smoothing the transition from Python 2 to 3

2016-06-10 Thread Neil Schemenauer
Nick Coghlan  wrote:
> It could be very interesting to add an "ascii-warn" codec to Python
> 2.7, and then set that as the default encoding when the -3 flag is
> set.

I don't think that can work.  The library code in Python would spew
out warnings even in the cases when nothing is wrong with the
application code.  I think warnings have to be added to a Python
where str and bytes have been properly separated.  Without extreme
backporting efforts, that means 3.x.

We don't want to saddle 3.x with a bunch of backwards compatibility
cruft.  Maybe some of my runtime warning changes could be merged
using a command line flag to enable them.  It would be nice to have
the stepping stone version just be normal 3.x with a command line
option.  However, for the sanity of people maintaining 3.x, I think
perhaps we don't want to do it.



[Python-Dev] Reminder: 3.6.0a2 snapshot 2016-06-13 12:00 UTC

2016-06-10 Thread Ned Deily
Just a quick reminder that the next alpha snapshot for the 3.6 release cycle is 
coming up in a couple of days.  This is the second of four alphas we have 
planned.  Alpha 2 follows the development sprints at the PyCon US 2016 in 
Portland.  Thanks to all of you who were able to be there and contribute!  And 
to all of you who continue to contribute from afar.  While there are still 
plenty of proposed patches awaiting review, nearly 300 commits have been pushed 
to the default branch (for 3.6.0) in the four weeks since alpha 1. 

As a reminder, alpha releases are intended to make it easier for the wider 
community to test the current state of new features and bug fixes for an 
upcoming Python release as a whole and for us to test the release process.  
During the alpha phase, features may be added, modified, or deleted up until 
the start of the beta phase.  Alpha users beware!

Also note that Larry has announced plans to do a 3.5.2 release candidate 
sometime this weekend and Benjamin plans to do a 2.7.12 release candidate.  So 
get important maintenance release fixes in ASAP.

Looking ahead, the next alpha release, 3.6.0a3, will follow in about a month on 
2016-07-11.

2016-06-13 ~12:00 UTC: code snapshot for 3.6.0 alpha 2

now to 2016-09-07: Alpha phase (unrestricted feature development)

2016-09-07: 3.6.0 feature code freeze, 3.7.0 feature development begins

2016-09-07 to 2016-12-04: 3.6.0 beta phase (bug and regression fixes, no new 
features)

2016-12-04 3.6.0 release candidate 1 (3.6.0 code freeze)

2016-12-16 3.6.0 release (3.6.0rc1 plus, if necessary, any dire emergency fixes)

--Ned

P.S. Just to be clear, this upcoming alpha snapshot will *not* contain a 
resolution for 3.6.0 of the current on-going discussions about the behavior of 
os.urandom(), the secrets module, and friends (Issue26839, Issue27288, et al).  
I think the focus should be on getting 3.5.2 settled and then we can decide on 
and implement any changes for 3.6.0 in an upcoming alpha prior to beta 1.

https://www.python.org/dev/peps/pep-0494/

--
  Ned Deily
  n...@python.org -- []



Re: [Python-Dev] Stop using timeit, use perf.timeit!

2016-06-10 Thread Victor Stinner
Hi,

2016-06-10 20:37 GMT+02:00 Kevin Modzelewski via Python-Dev:
> Hi all, I wrote a blog post about this.
> http://blog.kevmod.com/2016/06/benchmarking-minimum-vs-average/

Oh nice, it's even better to have different articles to explain the
problem of using the minimum ;-) I added it to my doc.


> We can rule out any argument that one (minimum or average) is strictly
> better than the other, since there are cases that make either one better.
> It comes down to our expectation of the underlying distribution.

Ah? In which cases do you prefer to use the minimum? Are you able to
get reliable benchmark results when using the minimum?


> Victor if you could calculate the sample skewness of your results I think
> that would be very interesting!

I'm good at copying and pasting code, but less so at computing statistics :-)
Would you be interested in writing a pull request, or at least sending me a
function computing the expected value?
https://github.com/haypo/perf

Victor


[Python-Dev] Cutoff time for patches for upcoming releases

2016-06-10 Thread Terry Reedy

A question for each of the three release managers:
when is the earliest that you might tag your release and
cutoff submission of further patches for the release?

2.7.12 ('6-12')?

3.5.2 ('6-12')?

3.6.0a2 ('6-13')?

--
Terry Jan Reedy



Re: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits?

2016-06-10 Thread Donald Stufft

> On Jun 10, 2016, at 5:21 PM, Tim Peters  wrote:
> 
> Isn't that precisely the purpose of the GRND_NONBLOCK flag?


It doesn’t behave exactly the same as /dev/urandom. If the pool hasn’t been 
initialized yet, /dev/urandom will return possibly predictable data, whereas 
getrandom(GRND_NONBLOCK) will fail with EAGAIN.

—
Donald Stufft





Re: [Python-Dev] Stop using timeit, use perf.timeit!

2016-06-10 Thread Victor Stinner
2016-06-10 20:47 GMT+02:00 Meador Inge :
> Apologies in advance if this is answered in one of the links you posted, but
> out of curiosity was geometric mean considered?
>
> In the compiler world this is a very common way of aggregating performance
> results.

FYI I chose to store all timings in the JSON file. So later, you are
free to recompute the average differently, compute other statistics,
etc.

I saw that the CPython benchmark suite has an *option* to compute the
geometric mean. I don't really understand how it differs from the
arithmetic mean.

Is the geometric mean recommended to aggregate results of different
(unrelated) benchmarks, or also for multiple runs of a single
benchmark?

Victor
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits?

2016-06-10 Thread Tim Peters
[Random832]
> So, I have a question. If this "weakness" in /dev/urandom is so
> unimportant to 99% of situations... why isn't there a flag that can be
> passed to getrandom() to allow the same behavior?

Isn't that precisely the purpose of the GRND_NONBLOCK flag?

http://man7.org/linux/man-pages/man2/getrandom.2.html


Re: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits?

2016-06-10 Thread Random832
On Fri, Jun 10, 2016, at 15:54, Theodore Ts'o wrote:
> So even on Python pre-3.5.0, realistically speaking, the "weakness" of
> os.random would only be an issue (a) if it is run within the first few
> seconds of boot, and (b) os.random is used to directly generate a
> long-term cryptographic secret.  If you fork openssl or ssh-keygen
> to generate a public/private keypair, then you aren't using os.random.

So, I have a question. If this "weakness" in /dev/urandom is so
unimportant to 99% of situations... why isn't there a flag that can be
passed to getrandom() to allow the same behavior?


Re: [Python-Dev] PEP 468

2016-06-10 Thread Franklin? Lee
I am. I was just wondering if there was an in-progress effort I should be
looking at, because I am interested in extensions to it.

P.S.: If anyone is missing the relevance, Raymond Hettinger's compact dicts
are inherently ordered until a delitem happens.[1] That could be "good
enough" for many purposes, including kwargs and class definition. If
CPython implements efficient compact dicts, it would be easier to propose
order-preserving (or initially-order-preserving) dicts in some places in
the standard.

[1] Whether delitem preserves order depends on whether you want to allow
gaps in your compact entry table. PyPy implemented compact dicts and
chose(?) to make dicts ordered.
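
For anyone who hasn't seen the proposal, a toy sketch of the layout being
discussed -- a sparse index table pointing into a dense, insertion-ordered
entries list -- looks roughly like this (illustrative only, not the CPython
or PyPy implementation):

class CompactDict:
    def __init__(self):
        self.indices = [None] * 8    # sparse table: hash slot -> entry position
        self.entries = []            # dense, insertion-ordered (hash, key, value)

    def _slot(self, key):
        mask = len(self.indices) - 1
        i = hash(key) & mask
        while True:
            pos = self.indices[i]
            if pos is None or self.entries[pos][1] == key:
                return i
            i = (i + 1) & mask       # linear probing; fine for a toy

    def __setitem__(self, key, value):
        i = self._slot(key)
        if self.indices[i] is None:
            if len(self.entries) >= len(self.indices) - 1:
                raise RuntimeError("toy table full; a real dict resizes here")
            self.indices[i] = len(self.entries)
            self.entries.append((hash(key), key, value))
        else:
            self.entries[self.indices[i]] = (hash(key), key, value)

    def __getitem__(self, key):
        pos = self.indices[self._slot(key)]
        if pos is None:
            raise KeyError(key)
        return self.entries[pos][2]

    def __iter__(self):
        # Iteration order is insertion order.  __delitem__ is omitted: deleting
        # either leaves a gap in `entries` or compacts it, which is exactly the
        # trade-off footnote [1] above is about.
        return (key for _, key, _ in self.entries)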

On Saturday, June 11, 2016, Eric Snow  wrote:

> On Fri, Jun 10, 2016 at 11:54 AM, Franklin? Lee
> > wrote:
> > Eric, have you any work in progress on compact dicts?
>
> Nope.  I presume you are talking the proposal Raymond made a while back.
>
> -eric
>


Re: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits?

2016-06-10 Thread Larry Hastings



On 06/10/2016 12:54 PM, Theodore Ts'o wrote:

So even on Python pre-3.5.0, realistically speaking, the "weakness" of
os.random would only be an issue (a) if it is run within the first few
seconds of boot, and (b) os.random is used to directly generate a
long-term cryptographic secret.  If you fork openssl or ssh-keygen
to generate a public/private keypair, then you aren't using os.random.


Just a gentle correction: wherever Mr. Ts'o says "os.random", he means 
"os.urandom()".  We don't have an "os.random" in Python.


My thanks to today's celebrity guest correspondent, Mr. Theodore Ts'o!


//arry/


Re: [Python-Dev] Stop using timeit, use perf.timeit!

2016-06-10 Thread Robert Collins
On 11 June 2016 at 04:09, Victor Stinner  wrote:
> [...] We should design a CLI command to do timeit+compare at once.

http://judge.readthedocs.io/en/latest/ might offer some inspiration

There's also ministat -
https://www.freebsd.org/cgi/man.cgi?query=ministat=0=0=FreeBSD+8-current=html


Re: [Python-Dev] Stop using timeit, use perf.timeit!

2016-06-10 Thread Terry Reedy

On 6/10/2016 12:09 PM, Victor Stinner wrote:

2016-06-10 17:09 GMT+02:00 Paul Moore :

Also, the way people commonly use
micro-benchmarks ("hey, look, this way of writing the expression goes
faster than that way") doesn't really address questions like "is the
difference statistically significant".


If you use the "python3 -m perf compare method1.json method2.json",
perf will check whether the difference is significant using the
is_significant() method:
http://perf.readthedocs.io/en/latest/api.html#perf.is_significant
"This uses a Student’s two-sample, two-tailed t-test with alpha=0.95."


Depending on the sampling design, a matched-pairs t-test may be more 
appropriate.
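
For anyone who wants to see what the quoted test actually computes, a
bare-bones version of the pooled two-sample Student's t statistic (an
illustration only, not perf's implementation; the timings are made up) is:

import math
import statistics

def two_sample_t(a, b):
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)   # sample variances
    pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))

# A large |t| relative to the t distribution with na+nb-2 degrees of freedom
# means the difference between the two timing samples is unlikely to be noise.
print(two_sample_t([0.101, 0.103, 0.102, 0.104], [0.098, 0.097, 0.099, 0.096]))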


--
Terry Jan Reedy




Re: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits?

2016-06-10 Thread M.-A. Lemburg
On 10.06.2016 20:55, Donald Stufft wrote:
> Ok, so you’re looking for how you would replicate the blocking behavior of 
> os.urandom that exists in 3.5.0 and 3.5.1?
> 
> In that case, it’s hard. I don’t think linux provides any way to externally 
> determine if /dev/urandom has been initialized or not. Probably the easiest 
> thing to do would be to interface with the getrandom() function using a 
> c-ext, CFFI, or ctypes. If you’re looking for a way of doing this without 
> calling the getrandom() function.. I believe the answer is you can’t.

Well, you can see the effect by running Python early in the boot process.

See e.g. http://bugs.python.org/issue26839#msg267749

and if you look at the system log file, you'll find a notice
entry "random: %s pool is initialized" which gets written once the
pool is initialized:

http://lxr.free-electrons.com/source/drivers/char/random.c#L684

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Experts (#1, Jun 10 2016)
>>> Python Projects, Coaching and Consulting ...  http://www.egenix.com/
>>> Python Database Interfaces ...   http://products.egenix.com/
>>> Plone/Zope Database Interfaces ...   http://zope.egenix.com/


::: We implement business ideas - efficiently in both time and costs :::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
   Registered at Amtsgericht Duesseldorf: HRB 46611
   http://www.egenix.com/company/contact/
  http://www.malemburg.com/



Re: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits?

2016-06-10 Thread Theodore Ts'o
I will observe that feelings have gotten a little heated, so without
making any suggestions to how the python-dev community should decide
things, let me offer some observations that might perhaps shed a
little light, and perhaps dispel a little bit of the heat.

As someone who has been working in security for a long time --- before
I started getting paid to hack Linux full-time, I worked on Kerberos,
was on the Security Area Directorate of the IETF, where among other
things I was one of the working group chairs for the IP Security
(ipsec) working group --- I tend to cringe a bit when people talk
about security in terms of absolutes.  For example, the phrase
"improving Python's security".  Security is something that is best
talked about given a specific threat environment, where the value of
what you are trying to protect, the capabilities and resources of the
attackers, etc., are all well known.

This gets hard for those of us who work on infrastructure which can
get used in many different arenas, and so that's something that
applies both to the Linux Kernel and to C-Python, because how people
will use the tools that we spend so much of our passion crafting is
largely out of our control, and we may not even know how they are
using it.

As far as /dev/urandom is concerned, it's true that it doesn't block
before it has been initialized.  If you are a security academic who
likes to write papers about how great you are at finding defects in
other people's work, this is definitely a weakness.

Is it a fatal weakness?  Well, first of all, on most server and
desktop deployments, we save 1 kilobyte or so of /dev/urandom output
during the shutdown sequence, and immediately after the init scripts
are completed.  This saved entropy is then piped back into the /dev/random
infrastructure and used to initialize /dev/random and /dev/urandom very
early in the init scripts.  On a freshly installed machine, this won't
help, true, but in practice, on most systems, /dev/urandom will get
initialized from interrupt timing sampling within a few seconds after
boot.  For example, on a sample Google Compute Engine VM which is
booted into Debian and then left idle, /dev/urandom was initialized
within 2.8 seconds after boot, while the root file system was
remounted read-only 1.6 seconds after boot.

So even on Python pre-3.5.0, realistically speaking, the "weakness" of
os.random would only be an issue (a) if it is run within the first few
seconds of boot, and (b) os.random is used to directly generate a
long-term cryptographic secret.  If you fork openssl or ssh-keygen
to generate a public/private keypair, then you aren't using os.random.

Furthermore, if you are running on a modern x86 system with RDRAND,
you'll also be fine, because we mix in randomness from the CPU chip
via the RDRAND instruction.

So this whole question of whether os.random should block *is*
important in certain very specific cases, and if you are generating
long-term cryptographic secrets in Python, maybe you should be worrying
about that.  But to be honest, there are lots of other things you
should be worrying about as well, and I would hope that people writing
cryptographic code would be asking questions of how the random number
stack is working, not just at the C-Python interpreter level, but also
at the OS level.

My preference would be that os.random should block, because the odds
that people would be trying to generate long-term cryptographic
secrets within seconds after boot is very small, and if you *do* block
for a second or two, it's not the end of the world.  The problem that
triggered this was specifically because systemd was trying to use
C-Python very early in the boot process to initialize the SIPHASH used
for the dictionary, and it's not clear that it really needed to be
extremely strong because it wasn't a long-term cryptographic secret ---
certainly not how systemd was using that specific script!

The reason why I think blocking is better is that once you've solved
the "don't hang the VM for 90 seconds until python has started up",
someone who is using os.random will almost certainly not be on the
blocking path of the system boot sequence, and so blocking for 2
seconds before generating a long-term cryptographic secret is not the
end of the world.

And if it does block by accident, in a security critical scenario it
will hopefully force the programmer to think, and in a non-security
critical scenario, it should be easy to switch to either a totally
non-blocking interface, or switch to a pseudo-random interface which
is more efficient.


*HOWEVER*, on the flip side, if os.random *doesn't* block, in 99.999
percent of the cases, the python script that is directly generating a
long-term secret will not be started 1.2 seconds after the root file
system is remounted read/write, so it is *also* not the end of the
world.  Realistically speaking, we do know which processes are likely
to be generating long-term cryptographic secrets immediately after

Re: [Python-Dev] PEP 520: Ordered Class Definition Namespace

2016-06-10 Thread Eric Snow
On Fri, Jun 10, 2016 at 11:29 AM, Nick Coghlan  wrote:
> On 10 June 2016 at 09:42, Eric Snow  wrote:
>> On Thu, Jun 9, 2016 at 2:39 PM, Nick Coghlan  wrote:
>>> That restriction would be comparable to what we do with __slots__ today:
>>>
>>> >>> class C:
>>> ... __slots__ = 1
>>> ...
>>> Traceback (most recent call last):
>>>  File "", line 1, in 
>>> TypeError: 'int' object is not iterable
>>
>> Are you suggesting that we require it be a tuple of identifiers (or
>> None) and raise TypeError otherwise, similar to __slots__?  The
>> difference is that __slots__ has specific type requirements that do
>> not apply to __definition_order__, as well as a different purpose.
>> __definition_order__ is about preserving definition-type info that we
>> are currently throwing away.
>
> If we don't enforce the tuple-of-identifiers restriction at type
> creation time, everyone that *doesn't* make it a tuple-of-identifiers
> is likely to have a subtle compatibility bug with class decorators and
> other code that assume the default tuple-of-identifiers format is the
> only possible format (aside from None). To put it in PEP 484 terms:
> regardless of what the PEP says, people are going to assume the type
> of __definition_order__ is Optional[Tuple[str]], as that's going to
> cover almost all class definitions they encounter.
>
> It makes sense to me to give class definitions and metaclasses the
> opportunity to change the *content* of the definition order: "Use
> these names in this order, not the names and order you would have
> calculated by default".
>
> It doesn't make sense to me to give them an opportunity to change the
> *form* of the definition order, since that makes it incredibly
> difficult to consume correctly: "Sure, it's *normally* a
> tuple-of-identifiers, but it *might* be a dictionary, or a complex
> number, or a set, or whatever the class author decided to make it".
>
> By contrast, if the class machinery enforces Optional[Tuple[str]],
> then it becomes a lot easier to consume reliably, and anyone violating
> the constraint gets an immediate exception when defining the offending
> class, rather than a potentially obscure exception from a class
> decorator or other piece of code that assumes __definition_order__
> could only be None or a tuple of strings.

That makes sense.  I'll adjust the PEP (and the implementation).

-eric
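
As an aside, a consumer written against the constraint argued for above
might look like the following sketch (illustrative only; __definition_order__
is the attribute *proposed* by PEP 520 and is not assumed to exist in any
released Python):

def report_definition_order(cls):
    # Per the constraint above, the value may be assumed to be either None
    # or a tuple of identifier strings (Optional[Tuple[str]]) -- nothing else.
    order = getattr(cls, '__definition_order__', None)
    if order is not None:
        print(cls.__name__, 'defines:', ', '.join(order))
    return cls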


Re: [Python-Dev] PEP 468

2016-06-10 Thread Eric Snow
On Fri, Jun 10, 2016 at 11:54 AM, Franklin? Lee
 wrote:
> Eric, have you any work in progress on compact dicts?

Nope.  I presume you are talking the proposal Raymond made a while back.

-eric


Re: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits?

2016-06-10 Thread Barry Warsaw
On Jun 10, 2016, at 12:05 PM, David Mertz wrote:

>OK.  My understanding is that Guido ruled out introducing an os.getrandom()
>API in 3.5.2.  But would you be happy if that interface is added to 3.6?

I would.

>It feels to me like the correct spelling in 3.6 should probably be
>secrets.getrandom() or something related to that.

ISTM that secrets is a somewhat higher level API while it makes sense that a
fairly simple plumbing of the underlying C call should go in os.  But I
wouldn't argue much if folks had strong opinions to the contrary.

Cheers,
-Barry


Re: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits?

2016-06-10 Thread Larry Hastings


On 06/10/2016 01:01 PM, David Mertz wrote:

So yes, I think 3.5.2 should restore the 2.6-3.4 behavior of os.urandom(),


That makes... five of us I think ;-) (Larry Guido Barry Tim David)


and the NEW APIs in secrets should use the "best available randomness 
(even if it blocks)"


I'm not particular about how the new API is spelled.  However, I do 
think os.getrandom() should be exposed as a thin wrapper over 
getrandom() in 3.6.   That would permit Python programmers to take 
maximal advantage of the features offered by their platform.  It would 
also permit the secrets module to continue to be written in pure Python.



//arry/


Re: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits?

2016-06-10 Thread Tim Peters
[Tim]
>> secrets.token_bytes() is already the way to spell "get a string of
>> messed-up bytes", and that's the dead obvious (according to me) place
>> to add the potentially blocking implementation.

[Sebastian Krause]
> I honestly didn't think that this was the dead obvious function to
> use. To me the naming kind of suggested that it would do some
> special magic that tokens needed, instead of just returning random
> bytes (even though the best token is probably just perfectly random
> data). If you want to provide a general function for secure random
> bytes I would suggest at least a better naming.

There was ample bikeshedding over the names of `secrets` functions at
the time.  If token_bytes wasn't the obvious function to you, I
suspect you have scant idea what _is_ in the `secrets` module.   The
naming is logical in context, where various "token_xxx" functions
supply random-ish bytes in different formats.  In that context,
xxx=bytes is the obvious way to get raw bytes.
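
For context, the token_* family reads like this in use (these functions are
documented in the 3.6 secrets module):

import secrets

raw = secrets.token_bytes(16)       # 16 raw random bytes
hexed = secrets.token_hex(16)       # the same idea, as 32 hex characters
urlish = secrets.token_urlsafe(16)  # URL-safe base64 text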


Re: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits?

2016-06-10 Thread Larry Hastings

On 06/10/2016 11:55 AM, Donald Stufft wrote:
Ok, so you’re looking for how you would replicate the blocking 
behavior of os.urandom that exists in 3.5.0 and 3.5.1?


In that case, it’s hard. I don’t think linux provides any way to 
externally determine if /dev/urandom has been initialized or not. 
Probably the easiest thing to do would be to interface with the 
getrandom() function using a c-ext, CFFI, or ctypes. If you’re looking 
for a way of doing this without calling the getrandom() function.. I 
believe the answer is you can’t.


I'm certain you're correct: you can't perform any operation on 
/dev/urandom to determine whether or not the urandom device has been 
initialized.  That's one of the reasons why Mr. Ts'o added 
getrandom()--you can use it to test exactly that (getrandom(GRND_NONBLOCK)).
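
As a concrete illustration of that probe, something along these lines should
work from pure Python via ctypes (a sketch, not code from this thread; the
syscall number 318 is specific to x86-64 Linux, and GRND_NONBLOCK is the flag
value from <linux/random.h>):

import ctypes, ctypes.util, errno, os

GRND_NONBLOCK = 0x0001
SYS_getrandom = 318   # x86-64 only; other architectures use different numbers

libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)

def urandom_pool_initialized():
    buf = ctypes.create_string_buffer(1)
    n = libc.syscall(SYS_getrandom, buf, ctypes.c_size_t(1),
                     ctypes.c_uint(GRND_NONBLOCK))
    if n == 1:
        return True                      # pool is ready; a byte came back
    if ctypes.get_errno() == errno.EAGAIN:
        return False                     # pool not yet initialized
    raise OSError(ctypes.get_errno(), os.strerror(ctypes.get_errno()))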


That's also why I proposed adding os.getrandom() in 3.5.2, to make it 
possible to block until urandom was initialized (without using ctypes 
etc as you suggest).  However, none of the cryptography guys jumped up 
and said they wanted it, and in any case it was overruled by Guido, so 
we're not adding it to 3.5.2.



//arry/


Re: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits?

2016-06-10 Thread David Mertz
On Fri, Jun 10, 2016 at 12:55 PM, Larry Hastings  wrote:

> On 06/10/2016 12:29 PM, David Mertz wrote:
>
> I believe that secrets.token_bytes() and secrets.SystemRandom() should be
> changed even for 3.5.1 to use getrandom() on Linux.
>
> Surely you meant 3.5.2?  3.5.1 shipped last December.
>

Yeah, that combines a couple thinkos even.  I had intended to write "for
3.5.2" ... but that is also an error, since the secrets module doesn't
exist until 3.6.  So yes, I think 3.5.2 should restore the 2.6-3.4 behavior
of os.urandom(), and the NEW APIs in secrets should use the "best available
randomness (even if it blocks)"

Donald is correct that we have the spelling secrets.token_bytes() available
in 3.6a1, so the spellings secrets.getrandom() or secrets.randbytes() are
not needed.  However, Sebastian's (adapted) suggestion to allow
secrets.token_bytes(k, *, nonblock=False) as the signature makes sense to me
(i.e. it's a choice of "block or raise exception", not an option to get
non-crypto bytes).

-- 
Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons.  Intellectual property is
to the 21st century what the slave trade was to the 16th.


Re: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits?

2016-06-10 Thread Donald Stufft

> On Jun 10, 2016, at 3:33 PM, Brett Cannon  wrote:
> 
> If that's the case then we should file a bug so we are sure this is the case, 
> and we need to decouple the secrets documentation from random so that they 
> can operate independently, with secrets always doing whatever is required to 
> be as secure as possible.


https://bugs.python.org/issue27288 

—
Donald Stufft





Re: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits?

2016-06-10 Thread Sebastian Krause
Tim Peters  wrote:
> secrets.token_bytes() is already the way to spell "get a string of
> messed-up bytes", and that's the dead obvious (according to me) place
> to add the potentially blocking implementation.

I honestly didn't think that this was the dead obvious function to
use. To me the naming kind of suggested that it would do some
special magic that tokens needed, instead of just returning random
bytes (even though the best token is probably just perfectly random
data). If you want to provide a general function for secure random
bytes I would suggest at least a better naming.

Sebastian


Re: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits?

2016-06-10 Thread Larry Hastings

On 06/10/2016 12:29 PM, David Mertz wrote:
I believe that secrets.token_bytes() and secrets.SystemRandom() should 
be changed even for 3.5.1 to use getrandom() on Linux.


Surely you meant 3.5.2?  3.5.1 shipped last December.


//arry/


Re: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits?

2016-06-10 Thread Sven R. Kunze

On 10.06.2016 21:17, Donald Stufft wrote:


On Jun 10, 2016, at 3:05 PM, David Mertz > wrote:


OK.  My understanding is that Guido ruled out introducing an 
os.getrandom() API in 3.5.2.  But would you be happy if that 
interface is added to 3.6?


It feels to me like the correct spelling in 3.6 should probably be 
secrets.getrandom() or something related to that.




I am not a security expert but your reply makes it clear to me. So, for 
me this makes:


os -> os-dependent and because of this varies from os to os (also quality-wise)
random -> pseudo-random, but it works for most non-critical use-cases
secret -> that's for crypto


If you don't need crypto, secret would be a waste of resources, but if you 
need crypto, then os and random are unsafe. I think that's simple 
enough. At least, I would understand it.



Just my 2 cents: if I need crypto, I would pay the price of blocking 
rather than get an exception (what are my alternatives? I need those 
bits!) or get insecure bits.



Sven


Well we have 
https://docs.python.org/dev/library/secrets.html#secrets.token_bytes so adding 
a getrandom() function to secrets would largely be the same as that 
function.


The problem of course is that the secrets library in 3.6 uses 
os.urandom under the covers, so its security rests on the security of 
os.urandom. To ensure that the secrets library is actually safe even 
in early boot it’ll need to stop using os.urandom on Linux and use the 
getrandom() function.


That same library exposes random.SystemRandom as secrets.SystemRandom 
[1], and of course SystemRandom uses os.urandom too. So if we want 
people to treat secrets.SystemRandom as “always secure” then it would 
need to stop using os.urandom and start using the getrandom() 
function on Linux as well.



[1] This is actually documented as "using the highest-quality sources 
provided by the operating system” in the secrets documentation, and 
I’d argue that it is not using the highest-quality source if it’s 
reading from /dev/urandom or getrandom(GRND_NONBLOCK) on Linux systems 
where getrandom() is available. Of course, it’s just an alias for 
random.SystemRandom, and that is documented as using os.urandom.


—
Donald Stufft







Re: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits?

2016-06-10 Thread Sebastian Krause
David Mertz  wrote:
> It feels to me like the correct spelling in 3.6 should probably be
> secrets.getrandom() or something related to that.

Since there already is a secrets.randbits(k), I would keep the
naming similar and suggest something like:

secrets.randbytes(k, *, nonblock=False)

With the argument "nonblock" you can control what happens when not
enough entropy is available: It either blocks or (if nonblock=True)
raises an exception. The third option, getting insecure random data,
is simply not available in this function.
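
A rough, self-contained sketch of that interface (not an existing API; it
goes straight to the Linux getrandom() syscall via ctypes, with the x86-64
syscall number 318 and flag value as stated assumptions) could look like:

import ctypes, ctypes.util, errno, os

_libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
_GRND_NONBLOCK = 0x0001
_SYS_getrandom = 318   # x86-64 only

def randbytes(k, *, nonblock=False):
    buf = ctypes.create_string_buffer(k)
    flags = _GRND_NONBLOCK if nonblock else 0
    n = _libc.syscall(_SYS_getrandom, buf, ctypes.c_size_t(k),
                      ctypes.c_uint(flags))
    if n < 0:
        err = ctypes.get_errno()
        if err == errno.EAGAIN:
            # nonblock=True and the pool is not yet initialized.
            raise BlockingIOError("entropy pool not yet initialized")
        raise OSError(err, os.strerror(err))
    return buf.raw[:n]   # may be short for very large k; loop in real code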

Then you can keep os.urandom() as it was in Python 3.4 and earlier,
but update the documentation to better warn about its behavior and
point developers to the secrets module.

Sebastian


Re: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits?

2016-06-10 Thread Brett Cannon
On Fri, 10 Jun 2016 at 12:20 Donald Stufft  wrote:

>
> On Jun 10, 2016, at 3:05 PM, David Mertz  wrote:
>
> OK.  My understanding is that Guido ruled out introducing an
> os.getrandom() API in 3.5.2.  But would you be happy if that interface is
> added to 3.6?
>
> It feels to me like the correct spelling in 3.6 should probably be
> secrets.getrandom() or something related to that.
>
>
>
> Well we have
> https://docs.python.org/dev/library/secrets.html#secrets.token_bytes so
> adding a getrandom() function to secrets would largely be the same as that
> function.
>
> The problem of course is that the secrets library in 3.6 uses os.urandom
> under the covers, so its security rests on the security of os.urandom. To
> ensure that the secrets library is actually safe even in early boot it’ll
> need to stop using os.urandom on Linux and use the getrandom() function.
>
> That same library exposes random.SystemRandom as secrets.SystemRandom [1],
> and of course SystemRandom uses os.urandom too. So if we want people to
> treat secrets.SystemRandom as “always secure” then it would need to stop
> using os.urandom and start using the getrandom() function on Linux as well.
>
>
> [1] This is actually documented as "using the highest-quality sources
> provided by the operating system” in the secrets documentation, and I’d
> argue that it is not using the highest-quality source if it’s reading from
> /dev/urandom or getrandom(GRND_NONBLOCK) on Linux systems where getrandom()
> is available. Of course, it’s just an alias for random.SystemRandom, and
> that is documented as using os.urandom.
>

If that's the case then we should file a bug so we are sure this is the
case, and we need to decouple the secrets documentation from random so that
they can operate independently, with secrets always doing whatever is
required to be as secure as possible.


Re: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits?

2016-06-10 Thread David Mertz
Ooops thinko there! Of course `secrets` won't exist in 3.5.1, so that's
a 3.6 matter instead.

On Fri, Jun 10, 2016 at 12:29 PM, David Mertz  wrote:

> I believe that secrets.token_bytes() and secrets.SystemRandom() should be
> changed even for 3.5.1 to use getrandom() on Linux.
>
> Thanks for fixing my spelling of the secrets API, Donald. :-)
>
> On Fri, Jun 10, 2016 at 12:17 PM, Donald Stufft  wrote:
>
>>
>> On Jun 10, 2016, at 3:05 PM, David Mertz  wrote:
>>
>> OK.  My understanding is that Guido ruled out introducing an
>> os.getrandom() API in 3.5.2.  But would you be happy if that interface is
>> added to 3.6?
>>
>> It feels to me like the correct spelling in 3.6 should probably be
>> secrets.getrandom() or something related to that.
>>
>>
>>
>> Well we have
>> https://docs.python.org/dev/library/secrets.html#secrets.token_bytes so
>> adding a getrandom() function to secrets would largely be the same as that
>> function.
>>
>> The problem of course is that the secrets library in 3.6 uses os.urandom
>> under the covers, so its security rests on the security of os.urandom. To
>> ensure that the secrets library is actually safe even in early boot it’ll
>> need to stop using os.urandom on Linux and use the getrandom() function.
>>
>> That same library exposes random.SystemRandom as secrets.SystemRandom
>> [1], and of course SystemRandom uses os.urandom too. So if we want people
>> to treat secrets.SystemRandom as “always secure” then it would need to stop
>> using os.urandom and start using the getrandom() function on Linux as well.
>>
>>
>> [1] This is actually documented as "using the highest-quality sources
>> provided by the operating system” in the secrets documentation, and I’d
>> argue that it is not using the highest-quality source if it’s reading from
>> /dev/urandom or getrandom(GRND_NONBLOCK) on Linux systems where getrandom()
>> is available. Of course, it’s just an alias for random.SystemRandom, and
>> that is documented as using os.urandom.
>>
>> —
>> Donald Stufft
>>
>>
>>
>>
>
>
> --
> Keeping medicines from the bloodstreams of the sick; food
> from the bellies of the hungry; books from the hands of the
> uneducated; technology from the underdeveloped; and putting
> advocates of freedom in prisons.  Intellectual property is
> to the 21st century what the slave trade was to the 16th.
>



-- 
Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons.  Intellectual property is
to the 21st century what the slave trade was to the 16th.


Re: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits?

2016-06-10 Thread David Mertz
I believe that secrets.token_bytes() and secrets.SystemRandom() should be
changed even for 3.5.1 to use getrandom() on Linux.

Thanks for fixing my spelling of the secrets API, Donald. :-)

On Fri, Jun 10, 2016 at 12:17 PM, Donald Stufft  wrote:

>
> On Jun 10, 2016, at 3:05 PM, David Mertz  wrote:
>
> OK.  My understanding is that Guido ruled out introducing an
> os.getrandom() API in 3.5.2.  But would you be happy if that interface is
> added to 3.6?
>
> It feels to me like the correct spelling in 3.6 should probably be
> secrets.getrandom() or something related to that.
>
>
>
> Well we have
> https://docs.python.org/dev/library/secrets.html#secrets.token_bytes so
> adding a getrandom() function to secrets would largely be the same as that
> function.
>
> The problem of course is that the secrets library in 3.6 uses os.urandom
> under the covers, so its security rests on the security of os.urandom. To
> ensure that the secrets library is actually safe even in early boot it’ll
> need to stop using os.urandom on Linux and use the getrandom() function.
>
> That same library exposes random.SystemRandom as secrets.SystemRandom [1],
> and of course SystemRandom uses os.urandom too. So if we want people to
> treat secrets.SystemRandom as “always secure” then it would need to stop
> using os.urandom and start using the getrandom() function on Linux as well.
>
>
> [1] This is actually documented as "using the highest-quality sources
> provided by the operating system” in the secrets documentation, and I’d
> argue that it is not using the highest-quality source if it’s reading from
> /dev/urandom or getrandom(GRND_NONBLOCK) on Linux systems where getrandom()
> is available. Of course, it’s just an alias for random.SystemRandom, and
> that is documented as using os.urandom.
>
> —
> Donald Stufft
>
>
>
>


-- 
Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons.  Intellectual property is
to the 21st century what the slave trade was to the 16th.


Re: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits?

2016-06-10 Thread Donald Stufft

> On Jun 10, 2016, at 3:05 PM, David Mertz  wrote:
> 
> OK.  My understanding is that Guido ruled out introducing an os.getrandom() 
> API in 3.5.2.  But would you be happy if that interface is added to 3.6? 
> 
> It feels to me like the correct spelling in 3.6 should probably be 
> secrets.getrandom() or something related to that.


Well we have 
https://docs.python.org/dev/library/secrets.html#secrets.token_bytes so 
adding a getrandom() function to secrets would largely be the same as that 
function.

The problem of course is that the secrets library in 3.6 uses os.urandom under 
the covers, so its security rests on the security of os.urandom. To ensure 
that the secrets library is actually safe even in early boot it’ll need to stop 
using os.urandom on Linux and use the getrandom() function.

That same library exposes random.SystemRandom as secrets.SystemRandom [1], and 
of course SystemRandom uses os.urandom too. So if we want people to treat 
secrets.SystemRandom as “always secure” then it would need to stop using 
os.urandom and start using the getrandom() function on Linux as well.


[1] This is actually documented as "using the highest-quality sources provided 
by the operating system” in the secrets documentation, and I’d argue that it is 
not using the highest-quality source if it’s reading from /dev/urandom or 
getrandom(GRND_NONBLOCK) on Linux systems where getrandom() is available. Of 
course, it’s just an alias for random.SystemRandom, and that is documented as 
using os.urandom.

—
Donald Stufft





Re: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits?

2016-06-10 Thread Tim Peters
[David Mertz]
> OK.  My understanding is that Guido ruled out introducing an os.getrandom()
> API in 3.5.2.  But would you be happy if that interface is added to 3.6?
>
> It feels to me like the correct spelling in 3.6 should probably be
> secrets.getrandom() or something related to that.

secrets.token_bytes() is already the way to spell "get a string of
messed-up bytes", and that's the dead obvious (according to me) place
to add the potentially blocking implementation.  Indeed, everything in
the `secrets` module should block when the OS thinks that's needed.


Re: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits?

2016-06-10 Thread David Mertz
OK.  My understanding is that Guido ruled out introducing an os.getrandom()
API in 3.5.2.  But would you be happy if that interface is added to 3.6?

It feels to me like the correct spelling in 3.6 should probably be
secrets.getrandom() or something related to that.

On Fri, Jun 10, 2016 at 11:55 AM, Donald Stufft  wrote:

> Ok, so you’re looking for how you would replicate the blocking behavior of
> os.urandom that exists in 3.5.0 and 3.5.1?
>
> In that case, it’s hard. I don’t think linux provides any way to
> externally determine if /dev/urandom has been initialized or not. Probably
> the easiest thing to do would be to interface with the getrandom() function
> using a c-ext, CFFI, or ctypes. If you’re looking for a way of doing this
> without calling the getrandom() function.. I believe the answer is you
> can’t.
>
> The closest thing you can get is checking
> the /proc/sys/kernel/random/entropy_avail file, but that tells you how much
> entropy the system currently thinks it has (which will go up and down over
> time) and corresponds to /dev/random on Linux not /dev/urandom.
>
> You could read from /dev/random, but that’s going to randomly block
> outside of the pool initialization whenever the kernel thinks it doesn’t
> have enough entropy. Cryptographers and security experts alike consider
> this to be pretty stupid behavior and don’t recommend using it because of
> this “randomly block throughout the use of your application” behavior.
>
> So really, out of the recommended solutions you really only have find a
> way to interface with the getrandom() function, or just consume
> /dev/urandom and hope it’s been initialized.
>
>
> On Jun 10, 2016, at 2:43 PM, David Mertz  wrote:
>
> My hypothetical is "Ensure good random bits (on Python 3.5.2 and Linux),
> and block rather than allow bad bits."
>
> I'm not quite sure I understand all of your question, Donald.  On Python
> 3.4—and by BDFL declaration on 3.5.2—os.urandom() *will not* block,
> although it might on 3.5.1.
>
> On Fri, Jun 10, 2016 at 11:33 AM, Donald Stufft  wrote:
>
>>
>> On Jun 10, 2016, at 2:29 PM, David Mertz  wrote:
>>
>> If I *were* someone who needed to write a Linux system initialization
>> script using Python 3.5.2, what would the code look like.  I think for this
>> use case, requiring something with a little bit of "code smell" is fine,
>> but I kinda hope it exists at all.
>>
>>
>> Do you mean if os.urandom blocked and you wanted to call os.urandom from
>> your boot script? Or if os.urandom doesn’t block and you wanted to ensure
>> you got good random numbers on boot?
>>
>> —
>> Donald Stufft
>>
>>
>>
>>
>
>
> --
> Keeping medicines from the bloodstreams of the sick; food
> from the bellies of the hungry; books from the hands of the
> uneducated; technology from the underdeveloped; and putting
> advocates of freedom in prisons.  Intellectual property is
> to the 21st century what the slave trade was to the 16th.
>
>
>
> —
> Donald Stufft
>
>
>
>


-- 
Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons.  Intellectual property is
to the 21st century what the slave trade was to the 16th.


Re: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits?

2016-06-10 Thread Donald Stufft

> On Jun 10, 2016, at 2:55 PM, Donald Stufft  wrote:
> 
> So really, out of the recommended solutions you really only have find a way 
> to interface with the getrandom() function, or just consume /dev/urandom and 
> hope it’s been initialized.

I’d note, this is one of the reasons why I felt like blocking (or raising an 
exception) on os.urandom was the right solution— because it’s hard to get that 
behavior on Linux otherwise. However, if we instead kept the blocking (or 
exception) behavior, getting the old behavior back on Linux is trivial, since 
it would only require open(“/dev/urandom”).read(…).

—
Donald Stufft





Re: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits?

2016-06-10 Thread Donald Stufft
Ok, so you’re looking for how you would replicate the blocking behavior of 
os.urandom that exists in 3.5.0 and 3.5.1?

In that case, it’s hard. I don’t think linux provides any way to externally 
determine if /dev/urandom has been initialized or not. Probably the easiest 
thing to do would be to interface with the getrandom() function using a c-ext, 
CFFI, or ctypes. If you’re looking for a way of doing this without calling the 
getrandom() function.. I believe the answer is you can’t.

The closest thing you can get is checking the 
/proc/sys/kernel/random/entropy_avail file, but that tells you how much entropy 
the system currently thinks it has (which will go up and down over time) and 
corresponds to /dev/random on Linux not /dev/urandom.
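
To make that concrete, the estimate is just a plain integer in procfs, e.g.:

with open("/proc/sys/kernel/random/entropy_avail") as f:
    print(int(f.read()))

but, as noted, it tracks /dev/random's pool estimate and says nothing
definite about whether /dev/urandom has been initialized.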

You could read from /dev/random, but that’s going to randomly block outside of 
the pool initialization whenever the kernel thinks it doesn’t have enough 
entropy. Cryptographers and security experts alike consider this to be pretty 
stupid behavior and don’t recommend using it because of this “randomly block 
throughout the use of your application” behavior.

So really, out of the recommended solutions you really only have find a way to 
interface with the getrandom() function, or just consume /dev/urandom and hope 
it’s been initialized.


> On Jun 10, 2016, at 2:43 PM, David Mertz  wrote:
> 
> My hypothetical is "Ensure good random bits (on Python 3.5.2 and Linux), and 
> block rather than allow bad bits."
> 
> I'm not quite sure I understand all of your question, Donald.  On Python 
> 3.4—and by BDFL declaration on 3.5.2—os.urandom() *will not* block, although 
> it might on 3.5.1.
> 
> On Fri, Jun 10, 2016 at 11:33 AM, Donald Stufft  > wrote:
> 
>> On Jun 10, 2016, at 2:29 PM, David Mertz > > wrote:
>> 
>> If I *were* someone who needed to write a Linux system initialization script 
>> using Python 3.5.2, what would the code look like.  I think for this use 
>> case, requiring something with a little bit of "code smell" is fine, but I 
>> kinda hope it exists at all.
> 
> 
> Do you mean if os.urandom blocked and you wanted to call os.urandom from your 
> boot script? Or if os.urandom doesn’t block and you wanted to ensure you got 
> good random numbers on boot?
> 
> —
> Donald Stufft
> 
> 
> 
> 
> 
> 
> -- 
> Keeping medicines from the bloodstreams of the sick; food 
> from the bellies of the hungry; books from the hands of the 
> uneducated; technology from the underdeveloped; and putting 
> advocates of freedom in prisons.  Intellectual property is
> to the 21st century what the slave trade was to the 16th.


—
Donald Stufft





Re: [Python-Dev] PEP 468

2016-06-10 Thread Franklin? Lee
Eric, have you any work in progress on compact dicts?

On Fri, Jun 10, 2016 at 12:54 PM, Eric Snow  wrote:
> On Thu, Jun 9, 2016 at 1:10 PM, Émanuel Barry  wrote:
>> As stated by Guido (and pointed out in the PEP):
>>
>> Making **kwds ordered is still open, but requires careful design and
>> implementation to avoid slowing down function calls that don't benefit.
>>
>> The PEP has not been updated in a while, though. Python 3.5 has been
>> released, and with it a C implementation of OrderedDict.
>>
>> Eric, are you still interested in this?
>
> Yes, but wasn't planning on dusting it off yet (i.e. in time for 3.6).
> I'm certainly not opposed to someone picking up the banner.
> 
>
>> IIRC that PEP was one of the
>> motivating use cases for implementing OrderedDict in C.
>
> Correct, though I'm not sure OrderedDict needs to be involved any more.
>
>> Maybe it's time for
>> a second round of discussion on Python-ideas?
>
> Fine with me, though I won't have a lot of time in the 3.6 timeframe
> to handle a high-volume discussion or push through an implementation.
>
> -eric


Re: [Python-Dev] Stop using timeit, use perf.timeit!

2016-06-10 Thread Meador Inge
On Fri, Jun 10, 2016 at 6:13 AM, Victor Stinner 
wrote:

> The second result is a new perf module which includes all "tricks"
> discovered in my research: compute average and standard deviation,
> spawn multiple worker child processes, automatically calibrate the
> number of outer-loop iterations, automatically pin worker processes
> to isolated CPUs, and more.
>

Apologies in advance if this is answered in one of the links you posted, but
out of curiosity was geometric mean considered?

In the compiler world this is a very common way of aggregating performance
results.
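
For what it's worth, a toy illustration of the difference (not tied to perf or
the CPython benchmark suite): the geometric mean of per-benchmark ratios is
symmetric under inversion, while the arithmetic mean is not.

import math

ratios = [2.0, 0.5, 1.0]   # new_time / old_time for three benchmarks

arithmetic = sum(ratios) / len(ratios)       # ~1.17, suggests a net slowdown
geometric = math.exp(math.fsum(map(math.log, ratios)) / len(ratios))   # 1.0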

-- Meador


Re: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits?

2016-06-10 Thread David Mertz
My hypothetical is "Ensure good random bits (on Python 3.5.2 and Linux),
and block rather than allow bad bits."

I'm not quite sure I understand all of your question, Donald.  On Python
3.4—and by BDFL declaration on 3.5.2—os.urandom() *will not* block,
although it might on 3.5.1.

On Fri, Jun 10, 2016 at 11:33 AM, Donald Stufft  wrote:

>
> On Jun 10, 2016, at 2:29 PM, David Mertz  wrote:
>
> If I *were* someone who needed to write a Linux system initialization
> script using Python 3.5.2, what would the code look like.  I think for this
> use case, requiring something with a little bit of "code smell" is fine,
> but I kinda hope it exists at all.
>
>
> Do you mean if os.urandom blocked and you wanted to call os.urandom from
> your boot script? Or if os.urandom doesn’t block and you wanted to ensure
> you got good random numbers on boot?
>
> —
> Donald Stufft
>
>
>
>


-- 
Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons.  Intellectual property is
to the 21st century what the slave trade was to the 16th.


Re: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits?

2016-06-10 Thread Chris Jerdonek
On Fri, Jun 10, 2016 at 11:29 AM, David Mertz  wrote:
> This is fairly academic, since I do not anticipate needing to do this
> myself, but I have a specific question.  I'll assume that Python 3.5.2 will
> go back to the 2.6-3.4 behavior in which os.urandom() never blocks on Linux.
> Moreover, I understand that the case where the insecure bits might be
> returned are limited to Python scripts that run on system initialization on
> Linux.
>
> If I *were* someone who needed to write a Linux system initialization script
> using Python 3.5.2, what would the code look like.  I think for this use
> case, requiring something with a little bit of "code smell" is fine, but I
> kinda hope it exists at all.

Good question.  And going back to Larry's original e-mail, where he said--

On Thu, Jun 9, 2016 at 4:25 AM, Larry Hastings  wrote:
> THE PROBLEM
> ...
> The issue author had already identified the cause: CPython was blocking on
> getrandom() in order to initialize hash randomization.  On this fresh
> virtual machine the entropy pool started out uninitialized.  And since the
> only thing running on the machine was CPython, and since CPython was blocked
> on initialization, the entropy pool was initializing very, very slowly.

it seems to me that you'd want such a solution to have code that
causes the initialization of the entropy pool to be sped up so that it
happens as quickly as possible (if that is even possible).  Is it
possible? (E.g. by causing the machine to start doing things other
than just CPython?)

--Chris


Re: [Python-Dev] Stop using timeit, use perf.timeit!

2016-06-10 Thread Kevin Modzelewski via Python-Dev
Hi all, I wrote a blog post about this.
http://blog.kevmod.com/2016/06/benchmarking-minimum-vs-average/

We can rule out any argument that one (minimum or average) is strictly
better than the other, since there are cases that make either one better.
It comes down to our expectation of the underlying distribution.

Victor, if you could calculate the sample skewness of your results I think
that would be very interesting!
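
For reference, a minimal sketch of the adjusted Fisher-Pearson sample skewness
(scipy.stats.skew(samples, bias=False) should give the same number); it needs
at least three samples:

import math

def sample_skewness(samples):
    n = len(samples)
    mean = sum(samples) / n
    m2 = sum((x - mean) ** 2 for x in samples) / n   # biased second moment
    m3 = sum((x - mean) ** 3 for x in samples) / n   # biased third moment
    g1 = m3 / m2 ** 1.5
    return math.sqrt(n * (n - 1)) / (n - 2) * g1     # small-sample adjustment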

kmod

On Fri, Jun 10, 2016 at 10:04 AM, Steven D'Aprano 
wrote:

> On Fri, Jun 10, 2016 at 05:07:18PM +0200, Victor Stinner wrote:
> > I started to work on visualisation. IMHO it helps to understand the
> problem.
> >
> > Let's create a large dataset: 500 samples (100 processes x 5 samples):
> > ---
> > $ python3 telco.py --json-file=telco.json -p 100 -n 5
> > ---
> >
> > Attached plot.py script creates an histogram:
> > ---
> > avg: 26.7 ms +- 0.2 ms; min = 26.2 ms
> >
> > 26.1 ms:   1 #
> > 26.2 ms:  12 ##
> > 26.3 ms:  34 #######
> > 26.4 ms:  44 #########
> > 26.5 ms: 109 ######################
> > 26.6 ms: 117 #######################
> > 26.7 ms:  86 #################
> > 26.8 ms:  50 ##########
> > 26.9 ms:  32 ######
> > 27.0 ms:  10 ##
> > 27.1 ms:   3 #
> > 27.2 ms:   1 #
> > 27.3 ms:   1 #
> >
> > minimum 26.1 ms: 0.2% (1) of 500 samples
> > ---
> [...]
> > The distribution looks like a Gaussian curve:
> > https://en.wikipedia.org/wiki/Gaussian_function
>
> Lots of distributions look a bit Gaussian, but they can be skewed, or
> truncated, or both. E.g. the average life-span of a lightbulb is
> approximately Gaussian with a central peak at some value (let's say 5000
> hours), but while it is conceivable that you might be really lucky and
> find a bulb that lasts 15000 hours, it isn't possible to find one that
> lasts -1 hours. The distribution is truncated on the left.
>
> To me, your graph looks like the distribution is skewed: the right-hand
> tail (shown at the bottom) is longer than the left-hand tail, six
> buckets compared to five buckets. There are actual statistical tests for
> detecting deviation from Gaussian curves, but I'd have to look them up.
> But as a really quick and dirty test, we can count the number of samples
> on either side of the central peak (the mode):
>
> left: 109+44+34+12+1 = 200
> centre: 117
> right: 500 - 200 - 117 = 183
>
> It certainly looks *close* to Gaussian, but with the crude tests we are
> using, we can't be sure. If you took more and more samples, I would
> expect that the right-hand tail would get longer and longer, but the
> left-hand tail would not.
>
>
> > The interesting thing is that only 1 sample on 500 are in the minimum
> > bucket (26.1 ms). If you say that the performance is 26.1 ms, only
> > 0.2% of your users will be able to reproduce this timing.
>
> Hmmm. Okay, that is a good point. In this case, you're not so much
> reporting your estimate of what the "true speed" of the code snippet
> would be in the absence of all noise, but your estimate of what your
> users should expect to experience "most of the time".
>
> Assuming they have exactly the same hardware, operating system, and load
> on their system as you have.
>
>
> > The average and std dev are 26.7 ms +- 0.2 ms, so numbers 26.5 ms ..
> > 26.9 ms: we got 109+117+86+50+32 samples in this range which gives us
> > 394/500 = 79%.
> >
> > IMHO saying "26.7 ms +- 0.2 ms" (79% of samples) is less a lie than
> > 26.1 ms (0.2%).
>
> I think I understand the point you are making. I'll have to think about
> it some more to decide if I agree with you.
>
> But either way, I think the work you have done on perf is fantastic and
> I think this will be a great tool. I really love the histogram. Can you
> draw a histogram of two functions side-by-side, for comparisons?
>
>
> --
> Steve


Re: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits?

2016-06-10 Thread Donald Stufft

> On Jun 10, 2016, at 2:29 PM, David Mertz  wrote:
> 
> If I *were* someone who needed to write a Linux system initialization script 
> using Python 3.5.2, what would the code look like.  I think for this use 
> case, requiring something with a little bit of "code smell" is fine, but I 
> kinda hope it exists at all.


Do you mean if os.urandom blocked and you wanted to call os.urandom from your 
boot script? Or if os.urandom doesn’t block and you wanted to ensure you got 
good random numbers on boot?

—
Donald Stufft





Re: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits?

2016-06-10 Thread David Mertz
This is fairly academic, since I do not anticipate needing to do this
myself, but I have a specific question.  I'll assume that Python 3.5.2 will
go back to the 2.6-3.4 behavior in which os.urandom() never blocks on
Linux.  Moreover, I understand that the case where the insecure bits might
be returned are limited to Python scripts that run on system initialization
on Linux.

If I *were* someone who needed to write a Linux system initialization
script using Python 3.5.2, what would the code look like.  I think for this
use case, requiring something with a little bit of "code smell" is fine,
but I kinda hope it exists at all.


-- 
Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons.  Intellectual property is
to the 21st century what the slave trade was to the 16th.


Re: [Python-Dev] PEP 520: Ordered Class Definition Namespace

2016-06-10 Thread Nick Coghlan
On 10 June 2016 at 09:42, Eric Snow  wrote:
> On Thu, Jun 9, 2016 at 2:39 PM, Nick Coghlan  wrote:
>> That restriction would be comparable to what we do with __slots__ today:
>>
>> >>> class C:
>> ... __slots__ = 1
>> ...
>> Traceback (most recent call last):
>>  File "<stdin>", line 1, in <module>
>> TypeError: 'int' object is not iterable
>
> Are you suggesting that we require it be a tuple of identifiers (or
> None) and raise TypeError otherwise, similar to __slots__?  The
> difference is that __slots__ has specific type requirements that do
> not apply to __definition_order__, as well as a different purpose.
> __definition_order__ is about preserving definition-type info that we
> are currently throwing away.

If we don't enforce the tuple-of-identifiers restriction at type
creation time, everyone that *doesn't* make it a tuple-of-identifiers
is likely to have a subtle compatibility bug with class decorators and
other code that assume the default tuple-of-identifiers format is the
only possible format (aside from None). To put it in PEP 484 terms:
regardless of what the PEP says, people are going to assume the type
of __definition_order__ is Optional[Tuple[str]], as that's going to
cover almost all class definitions they encounter.

It makes sense to me to give class definitions and metaclasses the
opportunity to change the *content* of the definition order: "Use
these names in this order, not the names and order you would have
calculated by default".

It doesn't make sense to me to give them an opportunity to change the
*form* of the definition order, since that makes it incredibly
difficult to consume correctly: "Sure, it's *normally* a
tuple-of-identifiers, but it *might* be a dictionary, or a complex
number, or a set, or whatever the class author decided to make it".

By contrast, if the class machinery enforces Optional[Tuple[str]],
then it becomes a lot easier to consume reliably, and anyone violating
the constraint gets an immediate exception when defining the offending
class, rather than a potentially obscure exception from a class
decorator or other piece of code that assumes __definition_order__
could only be None or a tuple of strings.
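
To make the failure mode concrete, here is a hedged sketch of the kind of
consumer being described, written against the attribute as proposed in the PEP
(so the names are assumptions, not an existing API): a class decorator that
generates __repr__ from the fields in definition order.

def auto_repr(cls):
    order = getattr(cls, '__definition_order__', None)
    if order is None:
        raise TypeError('%s does not record its definition order' % cls.__name__)
    # The next line is what quietly breaks if __definition_order__ may be
    # something other than a tuple of identifier strings:
    fields = [name for name in order if not name.startswith('_')]
    def __repr__(self):
        args = ', '.join('%s=%r' % (name, getattr(self, name, None))
                         for name in fields)
        return '%s(%s)' % (type(self).__name__, args)
    cls.__repr__ = __repr__
    return cls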

Cheers,
Nick.


-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Smoothing the transition from Python 2 to 3

2016-06-10 Thread Nick Coghlan
On 10 June 2016 at 11:00, Neil Schemenauer  wrote:
> On 6/10/2016 10:49 AM, Nick Coghlan wrote:
>> More -3 warnings in Python 2.7 are definitely welcome (since those can
>> pick up runtime behaviors that the static analysers miss), and if
>> there are things the existing code converters and static analysers
>> *could* detect but don't, that's a fruitful avenue for improvement as
>> well.
>
> We are really limited on what can be done with the bytes/string issue
> because in Python 2 there is no distinct type for bytes. Also, the standard
> library does all sorts of unclean mixing of str and unicode so a warning
> would spew a lot of noise.
>
> Likewise, a warning about comparison behavior (None, default ordering of
> types) would also not be useful because there is so much standard library
> code that would spew warnings.

Implicitly enabling those warnings universally with -3 might not be an
option then, but it may be feasible to have those warnings ignored by
default, and allow people to enable them selectively for their own
code via the warnings module.

Failing that, you may be right that there's value in a permissive
Python 3.x variant as an optional compatibility testing tool (I admit
I originally thought you were proposing such an environment as a
production deployment target for partially migrated code, which I'd be
thoroughly against, but as a tool for running a test suite or
experimentally migrated instance it would be closer in spirit to the
-3 switch and the static analysers - folks can use it if they think it
will help them, but they don't need to worry about it if they don't
need it themselves)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Smoothing the transition from Python 2 to 3

2016-06-10 Thread Neil Schemenauer

On 6/10/2016 10:49 AM, Nick Coghlan wrote:

> What Brett said is mostly accurate for me, except with one slight
> caveat: I've been explicitly trying to nudge you towards making the
> *existing tools better*, rather than introducing new tools. With
> modernize and futurize we have a fairly clear trade-off ("Do you want
> your code to look more like Python 2 or more like Python 3?"), and
> things like "pylint --py3k" and the static analyzers are purely
> additive to the migration process (so folks can take them or leave
> them), but alternate interpreter builds and new converters have really
> high barriers to adoption.


I agree with that idea.  If there is anything that is "clean" enough, it 
should be merged with either 2.7.x or 3.x.  There is nothing in my tree 
that can be usefully merged though.



> More -3 warnings in Python 2.7 are definitely welcome (since those can
> pick up runtime behaviors that the static analysers miss), and if
> there are things the existing code converters and static analysers
> *could* detect but don't, that's a fruitful avenue for improvement as
> well.

We are really limited on what can be done with the bytes/string issue 
because in Python 2 there is no distinct type for bytes. Also, the 
standard library does all sorts of unclean mixing of str and unicode so 
a warning would spew a lot of noise.


Likewise, a warning about comparison behavior (None, default ordering of 
types) would also not be useful because there is so much standard 
library code that would spew warnings.



Re: [Python-Dev] Smoothing the transition from Python 2 to 3

2016-06-10 Thread Nick Coghlan
On 10 June 2016 at 07:09, Cody Piersall  wrote:
>> One problem is that the str literals should be bytes
>> literals.  Comparison with None needs to be avoided.
>>
>> With Python 2 code runs successfully.  With Python 3 the code
>> crashes with a traceback.  With my modified Python 3.6, the code
>> runs successfully but generates the following warnings:
>>
>> test.py:13: DeprecationWarning: encoding bytes to str
>>   output.write('%d:' % len(s))
>> test.py:14: DeprecationWarning: encoding bytes to str
>>   output.write(s)
>> test.py:15: DeprecationWarning: encoding bytes to str
>>   output.write(',')
>> test.py:5: DeprecationWarning: encoding bytes to str
>>   if c == ':':
>> test.py:9: DeprecationWarning: encoding bytes to str
>>   size += c
>> test.py:24: DeprecationWarning: encoding bytes to str
>>   data = data + s
>> test.py:26: DeprecationWarning: encoding bytes to str
>>   if input.read(1) != ',':
>> test.py:31: DeprecationWarning: default compare is depreciated
>>   if a > 0:
>>
>
> This seems _very_ useful; I'm surprised that other people don't think
> so too.  Currently, the easiest way to find bytes/str errors in a big
> application is by running the program, finding where it crashes,
> fixing that one line (or hopefully wherever the data entered the
> system if you can find it), and repeating the process.

It could be very interesting to add an "ascii-warn" codec to Python
2.7, and then set that as the default encoding when the -3 flag is
set. The expressed lack of interest has been in the idea of
recommending people use an alternate interpreter build (which has
nothing to do with the usefulness of the added warnings, and
everything to do with the logistics of distributing and adopting
alternate runtimes), rather than in the concept of improving the
available runtime compatibility warnings.
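
To sketch the idea (purely illustrative -- a real version would presumably be
implemented in C and wired up to the -3 flag rather than registered from Python
code):

import codecs, warnings

def _search(name):
    if name not in ('ascii-warn', 'ascii_warn'):
        return None
    ascii_info = codecs.lookup('ascii')
    def encode(input, errors='strict'):
        warnings.warn('implicit ASCII encoding', DeprecationWarning, stacklevel=2)
        return ascii_info.encode(input, errors)
    def decode(input, errors='strict'):
        warnings.warn('implicit ASCII decoding', DeprecationWarning, stacklevel=2)
        return ascii_info.decode(input, errors)
    return codecs.CodecInfo(encode, decode, name='ascii-warn')

codecs.register(_search)
# ... plus something like sys.setdefaultencoding('ascii-warn') under -3,
# which today is only reachable through the reload(sys) trick.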

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Smoothing the transition from Python 2 to 3

2016-06-10 Thread Nick Coghlan
On 9 June 2016 at 16:43, Brett Cannon  wrote:
> That's not what I'm saying at all (nor what I think Nick is saying); more
> tooling to ease the transition is always welcomed.

What Brett said is mostly accurate for me, except with one slight
caveat: I've been explicitly trying to nudge you towards making the
*existing tools better*, rather than introducing new tools. With
modernize and futurize we have a fairly clear trade-off ("Do you want
your code to look more like Python 2 or more like Python 3?"), and
things like "pylint --py3k" and the static analyzers are purely
additive to the migration process (so folks can take them or leave
them), but alternate interpreter builds and new converters have really
high barriers to adoption.

More -3 warnings in Python 2.7 are definitely welcome (since those can
pick up runtime behaviors that the static analysers miss), and if
there are things the existing code converters and static analysers
*could* detect but don't, that's a fruitful avenue for improvement as
well.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Stop using timeit, use perf.timeit!

2016-06-10 Thread Brett Cannon
On Fri, 10 Jun 2016 at 10:11 Steven D'Aprano  wrote:

> On Fri, Jun 10, 2016 at 05:07:18PM +0200, Victor Stinner wrote:
> > I started to work on visualisation. IMHO it helps to understand the
> problem.
> >
> > Let's create a large dataset: 500 samples (100 processes x 5 samples):
> > ---
> > $ python3 telco.py --json-file=telco.json -p 100 -n 5
> > ---
> >
> > Attached plot.py script creates an histogram:
> > ---
> > avg: 26.7 ms +- 0.2 ms; min = 26.2 ms
> >
> > 26.1 ms:   1 #
> > 26.2 ms:  12 ##
> > 26.3 ms:  34 #######
> > 26.4 ms:  44 #########
> > 26.5 ms: 109 ######################
> > 26.6 ms: 117 #######################
> > 26.7 ms:  86 #################
> > 26.8 ms:  50 ##########
> > 26.9 ms:  32 ######
> > 27.0 ms:  10 ##
> > 27.1 ms:   3 #
> > 27.2 ms:   1 #
> > 27.3 ms:   1 #
> >
> > minimum 26.1 ms: 0.2% (1) of 500 samples
> > ---
> [...]
> > The distribution looks like a Gaussian curve:
> > https://en.wikipedia.org/wiki/Gaussian_function
>
> Lots of distributions look a bit Gaussian, but they can be skewed, or
> truncated, or both. E.g. the average life-span of a lightbulb is
> approximately Gaussian with a central peak at some value (let's say 5000
> hours), but while it is conceivable that you might be really lucky and
> find a bulb that lasts 15000 hours, it isn't possible to find one that
> lasts -1 hours. The distribution is truncated on the left.
>
> To me, your graph looks like the distribution is skewed: the right-hand
> tail (shown at the bottom) is longer than the left-hand tail, six
> buckets compared to five buckets. There are actual statistical tests for
> detecting deviation from Gaussian curves, but I'd have to look them up.
> But as a really quick and dirty test, we can count the number of samples
> on either side of the central peak (the mode):
>
> left: 109+44+34+12+1 = 200
> centre: 117
> right: 500 - 200 - 117 = 183
>
> It certainly looks *close* to Gaussian, but with the crude tests we are
> using, we can't be sure. If you took more and more samples, I would
> expect that the right-hand tail would get longer and longer, but the
> left-hand tail would not.
>
>
> > The interesting thing is that only 1 sample on 500 are in the minimum
> > bucket (26.1 ms). If you say that the performance is 26.1 ms, only
> > 0.2% of your users will be able to reproduce this timing.
>
> Hmmm. Okay, that is a good point. In this case, you're not so much
> reporting your estimate of what the "true speed" of the code snippet
> would be in the absence of all noise, but your estimate of what your
> users should expect to experience "most of the time".
>
>
I think the other way to think about why you don't want to use the minimum
is what if one run just happened to get lucky and ran when nothing else was
running (some random lull on the system), while the second run didn't get
so lucky on magically hitting an equivalent lull? Using the average helps
remove the "luck of the draw" potential of taking the minimum. This is why
the PyPy folks suggested to Victor to not consider the minimum but the
average instead; minimum doesn't measure typical system behaviour.



> Assuming they have exactly the same hardware, operating system, and load
> on their system as you have.
>

Sure, but that's true of any benchmarking. The only way to get accurate
measurements for one's own system is to run the benchmarks yourself.

-Brett


>
>
> > The average and std dev are 26.7 ms +- 0.2 ms, so numbers 26.5 ms ..
> > 26.9 ms: we got 109+117+86+50+32 samples in this range which gives us
> > 394/500 = 79%.
> >
> > IMHO saying "26.7 ms +- 0.2 ms" (79% of samples) is less a lie than
> > 26.1 ms (0.2%).
>
> I think I understand the point you are making. I'll have to think about
> it some more to decide if I agree with you.
>
> But either way, I think the work you have done on perf is fantastic and
> I think this will be a great tool. I really love the histogram. Can you
> draw a histogram of two functions side-by-side, for comparisons?
>
>
> --
> Steve


Re: [Python-Dev] PEP 467: Minor API improvements to bytes, bytearray, and memoryview

2016-06-10 Thread Nick Coghlan
On 9 June 2016 at 19:21, Barry Warsaw  wrote:
> On Jun 07, 2016, at 01:28 PM, Ethan Furman wrote:
>
>>Deprecation of current "zero-initialised sequence" behaviour
>>
>>
>>Currently, the ``bytes`` and ``bytearray`` constructors accept an integer
>>argument and interpret it as meaning to create a zero-initialised sequence of
>>the given size::
>>
>> >>> bytes(3)
>> b'\x00\x00\x00'
>> >>> bytearray(3)
>> bytearray(b'\x00\x00\x00')
>>
>>This PEP proposes to deprecate that behaviour in Python 3.6, and remove it
>>entirely in Python 3.7.
>>
>>No other changes are proposed to the existing constructors.
>
> Does it need to be *actually* removed?  That does break existing code for not
> a lot of benefit.  Yes, the default constructor is a little wonky, but with
> the addition of the new constructors, and the fact that you're not proposing
> to eventually change the default constructor, removal seems unnecessary.
> Besides, once it's removed, what would `bytes(3)` actually do?  The PEP
> doesn't say.

Raise TypeError, presumably. However, I agree this isn't worth the
hassle of breaking working code, especially since truly ludicrous
values will fail promptly with MemoryError - it's only a particular
range of values that fit within the limits of the machine, but also
push it into heavy swapping that are a potential problem.

> Also, since you're proposing to add `bytes.byte(3)` have you considered also
> adding an optional count argument?  E.g. `bytes.byte(3, count=7)` would yield
> b'\x03\x03\x03\x03\x03\x03\x03'.  That seems like it could be useful.

The purpose of bytes.byte() in the PEP is to provide a way to
roundtrip ord() calls with binary inputs, since the current spelling
is pretty unintuitive:

>>> ord("A")
65
>>> chr(ord("A"))
'A'
>>> ord(b"A")
65
>>> bytes([ord(b"A")])
b'A'

That said, perhaps it would make more sense for the corresponding
round-trip to be:

>>> bchr(ord("A"))
b'A'

With the "b" prefix on "chr" reflecting the "b" prefix on the output.
This also inverts the chr/unichr pairing that existed in Python 2
(replacing it with bchr/chr), and is hence very friendly to
compatibility modules like six and future (future.builtins already
provides a chr that behaves like the Python 3 one, and bchr would be
much easier to add to that than a new bytes object method).

In terms of an efficient memory-preallocation interface, the
equivalent NumPy operation to request a pre-filled array is
"ndarray.full":
http://docs.scipy.org/doc/numpy-1.10.1/reference/generated/numpy.full.html
(there's also an inplace mutation operation, "fill")

For bytes and bytearray though, that has an unfortunate name collision
with "zfill", which refers to zero-padding numeric values for fixed
width display.

If the PEP just added bchr() to complement chr(), and [bytes,
bytearray].zeros() as a more discoverable alternative to passing
integers to the default constructor, I think that would be a decent
step forward, and the question of pre-initialising with arbitrary
values can be deferred for now (and perhaps left to NumPy
indefinitely)
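
For illustration, the compatibility-module version could be as small as this
(bchr is just the proposed name here, not an existing builtin):

def bchr(i):
    """Return a length-1 bytes object for an integer in range(256)."""
    if not 0 <= i <= 255:
        raise ValueError('bchr() arg not in range(256)')
    return bytes([i])          # on Python 2 this would simply be chr(i)

assert bchr(ord(b'A')) == b'A'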

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Stop using timeit, use perf.timeit!

2016-06-10 Thread Steven D'Aprano
On Fri, Jun 10, 2016 at 05:07:18PM +0200, Victor Stinner wrote:
> I started to work on visualisation. IMHO it helps to understand the problem.
> 
> Let's create a large dataset: 500 samples (100 processes x 5 samples):
> ---
> $ python3 telco.py --json-file=telco.json -p 100 -n 5
> ---
> 
> Attached plot.py script creates an histogram:
> ---
> avg: 26.7 ms +- 0.2 ms; min = 26.2 ms
> 
> 26.1 ms:   1 #
> 26.2 ms:  12 ##
> 26.3 ms:  34 #######
> 26.4 ms:  44 #########
> 26.5 ms: 109 ######################
> 26.6 ms: 117 #######################
> 26.7 ms:  86 #################
> 26.8 ms:  50 ##########
> 26.9 ms:  32 ######
> 27.0 ms:  10 ##
> 27.1 ms:   3 #
> 27.2 ms:   1 #
> 27.3 ms:   1 #
> 
> minimum 26.1 ms: 0.2% (1) of 500 samples
> ---
[...] 
> The distribution looks like a Gaussian curve:
> https://en.wikipedia.org/wiki/Gaussian_function

Lots of distributions look a bit Gaussian, but they can be skewed, or 
truncated, or both. E.g. the average life-span of a lightbulb is 
approximately Gaussian with a central peak at some value (let's say 5000 
hours), but while it is conceivable that you might be really lucky and 
find a bulb that lasts 15000 hours, it isn't possible to find one that 
lasts -1 hours. The distribution is truncated on the left.

To me, your graph looks like the distribution is skewed: the right-hand 
tail (shown at the bottom) is longer than the left-hand tail, six 
buckets compared to five buckets. There are actual statistical tests for 
detecting deviation from Gaussian curves, but I'd have to look them up. 
But as a really quick and dirty test, we can count the number of samples 
on either side of the central peak (the mode):

left: 109+44+34+12+1 = 200
centre: 117
right: 500 - 200 - 117 = 183

It certainly looks *close* to Gaussian, but with the crude tests we are 
using, we can't be sure. If you took more and more samples, I would 
expect that the right-hand tail would get longer and longer, but the 
left-hand tail would not.
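
For what it's worth, one of those standard tests is readily available if scipy
is installed -- a hedged sketch, with made-up timings standing in for the real
500 samples:

from scipy import stats

timings = [26.1, 26.3, 26.4, 26.5, 26.5, 26.6, 26.6, 26.7, 26.8, 26.9, 27.0, 27.3]
statistic, pvalue = stats.skewtest(timings)  # H0: skewness matches a normal dist
# A small p-value (say < 0.05) is evidence that the long right tail is real;
# stats.normaltest() adds a kurtosis check but wants roughly 20+ samples.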


> The interesting thing is that only 1 sample on 500 are in the minimum
> bucket (26.1 ms). If you say that the performance is 26.1 ms, only
> 0.2% of your users will be able to reproduce this timing.

Hmmm. Okay, that is a good point. In this case, you're not so much 
reporting your estimate of what the "true speed" of the code snippet 
would be in the absence of all noise, but your estimate of what your 
users should expect to experience "most of the time".

Assuming they have exactly the same hardware, operating system, and load 
on their system as you have.


> The average and std dev are 26.7 ms +- 0.2 ms, so numbers 26.5 ms ..
> 26.9 ms: we got 109+117+86+50+32 samples in this range which gives us
> 394/500 = 79%.
> 
> IMHO saying "26.7 ms +- 0.2 ms" (79% of samples) is less a lie than
> 26.1 ms (0.2%).

I think I understand the point you are making. I'll have to think about 
it some more to decide if I agree with you.

But either way, I think the work you have done on perf is fantastic and 
I think this will be a great tool. I really love the histogram. Can you 
draw a histogram of two functions side-by-side, for comparisons?


-- 
Steve


Re: [Python-Dev] Stop using timeit, use perf.timeit!

2016-06-10 Thread Terry Reedy

On 6/10/2016 11:07 AM, Victor Stinner wrote:

> I started to work on visualisation. IMHO it helps to understand the problem.
>
> Let's create a large dataset: 500 samples (100 processes x 5 samples):


As I finished my response to Steven, I was thinking you should do 
something like this to get real data.



> ---
> $ python3 telco.py --json-file=telco.json -p 100 -n 5
> ---
>
> Attached plot.py script creates an histogram:
> ---
> avg: 26.7 ms +- 0.2 ms; min = 26.2 ms
>
> 26.1 ms:   1 #
> 26.2 ms:  12 ##
> 26.3 ms:  34 #######
> 26.4 ms:  44 #########
> 26.5 ms: 109 ######################
> 26.6 ms: 117 #######################
> 26.7 ms:  86 #################
> 26.8 ms:  50 ##########
> 26.9 ms:  32 ######
> 27.0 ms:  10 ##
> 27.1 ms:   3 #
> 27.2 ms:   1 #
> 27.3 ms:   1 #
>
> minimum 26.1 ms: 0.2% (1) of 500 samples
> ---
>
> Replace "if 1" with "if 0" to produce a graphical view, or just view
> the attached distribution.png, the numpy+scipy histogram.
>
> The distribution looks like a Gaussian curve:
> https://en.wikipedia.org/wiki/Gaussian_function


I am not too surprised.  If there are several somewhat independent 
sources of slowdown, their sum would tend to be normal.  I am also not 
surprised that there is also a bit of skewness, but probably not enough 
to worry about.



> The interesting thing is that only 1 sample on 500 are in the minimum
> bucket (26.1 ms). If you say that the performance is 26.1 ms, only
> 0.2% of your users will be able to reproduce this timing.
>
> The average and std dev are 26.7 ms +- 0.2 ms, so numbers 26.5 ms ..
> 26.9 ms: we got 109+117+86+50+32 samples in this range which gives us
> 394/500 = 79%.
>
> IMHO saying "26.7 ms +- 0.2 ms" (79% of samples) is less a lie than
> 26.1 ms (0.2%).


--
Terry Jan Reedy



Re: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits?

2016-06-10 Thread Alex Walters


> -Original Message-
> From: Python-Dev [mailto:python-dev-bounces+tritium-
> list=sdamon@python.org] On Behalf Of Sebastian Krause
> Sent: Friday, June 10, 2016 1:01 PM
> To: python-dev@python.org
> Subject: Re: [Python-Dev] BDFL ruling request: should we block forever
> waiting for high-quality random bits?
> 
> Guido van Rossum  wrote:
> > I just don't like the potentially blocking behavior, and experts' opinions
> > seem to widely vary on how insecure the fallback bits really are, how
> > likely you are to find yourself in that situation, and how probable an
> > exploit would be.
> 
> This is not just a theoretical problem being discussed by security
> experts that *could* be exploited, there have already been multiple
> real-life cases of devices (mostly embedded Linux machines)
> generating predictable SSH keys because they read from an
> uninitialized /dev/urandom at first boot. Most recently in the
> Raspbian distribution for the Raspberry Pi:
> https://www.raspberrypi.org/forums/viewtopic.php?f=66&t=126892
> 
> At least in 3.6 there should be an obvious way to get random data that
> *always* guarantees to be secure and either fails or blocks if it
> can't guarantee that.
> 
> Sebastian

And that should live in the secrets module.
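
Something along these lines, presumably -- a hypothetical sketch, assuming the
os.getrandom() wrapper being discussed for 3.6 actually lands (this is not a
description of what the secrets module currently provides):

import os

def strong_random_bytes(n):
    getrandom = getattr(os, 'getrandom', None)
    if getrandom is not None:
        return getrandom(n)   # no flags: blocks until the pool is initialized
    return os.urandom(n)      # best effort on other platforms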



Re: [Python-Dev] PEP 468

2016-06-10 Thread zreed
I would be super excited for this feature, so if there's a reasonable
chance of it being picked up I don't mind doing the implementation work.


On Fri, Jun 10, 2016, at 11:54 AM, Eric Snow wrote:
> On Thu, Jun 9, 2016 at 1:10 PM, Émanuel Barry  wrote:
> > As stated by Guido (and pointed out in the PEP):
> >
> > Making **kwds ordered is still open, but requires careful design and
> > implementation to avoid slowing down function calls that don't benefit.
> >
> > The PEP has not been updated in a while, though. Python 3.5 has been
> > released, and with it a C implementation of OrderedDict.
> >
> > Eric, are you still interested in this?
> 
> Yes, but wasn't planning on dusting it off yet (i.e. in time for 3.6).
> I'm certainly not opposed to someone picking up the banner.
> 
> 
> > IIRC that PEP was one of the
> > motivating use cases for implementing OrderedDict in C.
> 
> Correct, though I'm not sure OrderedDict needs to be involved any more.
> 
> > Maybe it's time for
> > a second round of discussion on Python-ideas?
> 
> Fine with me, though I won't have a lot of time in the 3.6 timeframe
> to handle a high-volume discussion or push through an implementation.
> 
> -eric


Re: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits?

2016-06-10 Thread Sebastian Krause
Guido van Rossum  wrote:
> I just don't like the potentially blocking behavior, and experts' opinions
> seem to widely vary on how insecure the fallback bits really are, how
> likely you are to find yourself in that situation, and how probable an
> exploit would be.

This is not just a theoretical problem being discussed by security
experts that *could* be exploited, there have already been multiple
real-life cases of devices (mostly embedded Linux machines)
generating predictable SSH keys because they read from an
uninitialized /dev/urandom at first boot. Most recently in the
Raspbian distribution for the Raspberry Pi:
https://www.raspberrypi.org/forums/viewtopic.php?f=66&t=126892

At least in 3.6 there should be an obvious way to get random data that
*always* guarantees to be secure and either fails or blocks if it
can't guarantee that.

Sebastian


Re: [Python-Dev] Stop using timeit, use perf.timeit!

2016-06-10 Thread Terry Reedy

On 6/10/2016 9:20 AM, Steven D'Aprano wrote:

> On Fri, Jun 10, 2016 at 01:13:10PM +0200, Victor Stinner wrote:
>
>> Hi,
>>
>> Last weeks, I made researchs on how to get stable and reliable
>> benchmarks, especially for the corner case of microbenchmarks. The
>> first result is a serie of article, here are the first three:
>
> Thank you for this! I am very interested in benchmarking.
>
>> https://haypo.github.io/journey-to-stable-benchmark-system.html
>> https://haypo.github.io/journey-to-stable-benchmark-deadcode.html
>> https://haypo.github.io/journey-to-stable-benchmark-average.html
>
> I strongly question your statement in the third:
>
> [quote]
> But how can we compare performances if results are random?
> Take the minimum?
>
> No! You must never (ever again) use the minimum for
> benchmarking! Compute the average and some statistics like
> the standard deviation:
> [end quote]
>
> While I'm happy to see a real-world use for the statistics module, I
> disagree with your logic.
>
> The problem is that random noise can only ever slow the code down, it
> cannot speed it up. To put it another way, the random errors in the
> timings are always positive.
>
> Suppose you micro-benchmark some code snippet and get a series of
> timings. We can model the measured times as:
>
> measured time t = T + ε
>
> where T is the unknown "true" timing we wish to estimate,


For comparative timings, we do not care about T.  So arguments about the 
best estimate of T miss the point.


What we do wish to estimate is the relationship between two Ts, T0 for 
'control', and T1 for 'treatment', in particular T1/T0.  I suspect 
Viktor is correct that mean(t1)/mean(t0) is better than min(t1)/min(t0) 
as an estimate of the true ratio T1/T0 (for a particular machine).


But given that we have matched pairs of measurements with the same 
hashseed and address, it may be better yet to estimate T1/T0 from the 
ratios t1i/t0i, where i indexes experimental conditions.  But it has 
been a long time since I have read about estimation of ratios.  What I 
remember is that this is a nasty subject.


It is also the case that while an individual with one machine wants the 
best ratio for that machine, we need to make CPython patch decisions for 
the universe of machines that run Python.



> and ε is some variable error due to noise in the system.

> But ε is always positive,  never negative,

lognormal might be a first guess. But what we really have is 
contributions from multiple factors,


--
Terry Jan Reedy




Re: [Python-Dev] PEP 468

2016-06-10 Thread Eric Snow
On Thu, Jun 9, 2016 at 1:10 PM, Émanuel Barry  wrote:
> As stated by Guido (and pointed out in the PEP):
>
> Making **kwds ordered is still open, but requires careful design and
> implementation to avoid slowing down function calls that don't benefit.
>
> The PEP has not been updated in a while, though. Python 3.5 has been
> released, and with it a C implementation of OrderedDict.
>
> Eric, are you still interested in this?

Yes, but wasn't planning on dusting it off yet (i.e. in time for 3.6).
I'm certainly not opposed to someone picking up the banner.


> IIRC that PEP was one of the
> motivating use cases for implementing OrderedDict in C.

Correct, though I'm not sure OrderedDict needs to be involved any more.

> Maybe it's time for
> a second round of discussion on Python-ideas?

Fine with me, though I won't have a lot of time in the 3.6 timeframe
to handle a high-volume discussion or push through an implementation.

-eric


Re: [Python-Dev] PEP 468

2016-06-10 Thread Eric Snow
On Thu, Jun 9, 2016 at 12:41 PM,   wrote:
> Is there any further thoughts on including this in 3.6?

I don't have any plans and I don't know of anyone willing to champion
the PEP for 3.6.  Note that the implementation itself shouldn't take
very long.

>  Similar to the
> recent discussion on OrderedDict namespaces for metaclasses, this would
> simplify / enable a number of type factory use cases where proper
> metaclasses are overkill. This feature would also be quite nice in say
> pandas where the (currently unspecified) field order used in the
> definition of frames is preserved in user-visible displays.

Good point.  One weakness of the PEP has been sufficient
justification.  The greater number of compelling use cases, the
better.  So thanks! :)
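
One small sketch of that kind of use case, assuming PEP 468 semantics (today
the field order below is whatever the kwargs dict happens to produce):

from collections import namedtuple

def make_record(typename, **fields):
    # With ordered **kwargs, list(fields) matches the call-site order.
    cls = namedtuple(typename, list(fields))
    return cls(**fields)

p = make_record('Point', x=1, y=2, z=3)
# Guaranteed to be Point(x=1, y=2, z=3) under PEP 468; currently the field
# order (and hence the repr and tuple layout) is unpredictable.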

-eric


Re: [Python-Dev] PEP 520: Ordered Class Definition Namespace

2016-06-10 Thread Eric Snow
On Thu, Jun 9, 2016 at 2:39 PM, Nick Coghlan  wrote:
> I'm guessing Ethan is suggesting defining it as:
>
> __definition_order__ = tuple(ns["__definition_order__"])
>
> When the attribute is present in the method body.

Ah.  I'd rather stick to "consenting adults" in the case that
__definition_order__ is explicitly set.  We'll strongly recommend
setting it to None or a tuple of identifier strings.

>
> That restriction would be comparable to what we do with __slots__ today:
>
> >>> class C:
> ... __slots__ = 1
> ...
> Traceback (most recent call last):
>  File "<stdin>", line 1, in <module>
> TypeError: 'int' object is not iterable

Are you suggesting that we require it be a tuple of identifiers (or
None) and raise TypeError otherwise, similar to __slots__?  The
difference is that __slots__ has specific type requirements that do
not apply to __definition_order__, as well as a different purpose.
__definition_order__ is about preserving definition-type info that we
are currently throwing away.

-eric


Re: [Python-Dev] Smoothing the transition from Python 2 to 3

2016-06-10 Thread Brett Cannon
On Thu, 9 Jun 2016 at 19:53 Mark Lawrence via Python-Dev <
python-dev@python.org> wrote:

> On 10/06/2016 00:43, Brett Cannon wrote:
> >
> > That's not what I'm saying at all (nor what I think Nick is saying);
> > more tooling to ease the transition is always welcomed. The point we are
> > trying to make is 2to3 is not considered best practice anymore, and so
> > targeting its specific output might not be the best use of your time.
> > I'm totally happy to have your fork work out and help give warnings for
> > situations where runtime semantics are the only way to know there will
> > be a problem that static analyzing tools can't handle and have the
> > porting HOWTO updated so that people can run their test suite with your
> > interpreter to help with that final bit of porting. I personally just
> > don't want to see you waste time on warnings that are handled by the
> > tools already or ignore the fact that six, modernize, and futurize can
> > help more than 2to3 typically can with the easy stuff when trying to
> > keep 2/3 compatibility. IOW some of us have become allergic to the word
> > "2to3" in regards to porting. :) But if you want to target 2to3 output
> > then by all means please do and your work will still be appreciated.
> >
>
> Given the above and that 2to3 appears to be unsupported* is there a case
> for deprecating it?
>

I don't think so because it's still a useful transpiler tool. Basically the
community has decided the standard rewriters included with 2to3 aren't how
people prefer to port, but 2to3 as a tool is the basis of both modernize
and futurize (as are some of those rewriters, but tweaked to do something
different).


>
> *  There are 46 outstanding issues on the bug tracker.  Is the above the
> reason for this, I don't know?
>

Typically the bugs are for the rewrite rules and they are for edge cases
that no one wants to try and tackle as they are tough to cover (although
this is based on what comes through my inbox so my generalization could be
wrong).

-Brett


>
> --
> My fellow Pythonistas, ask not what our language can do for you, ask
> what you can do for our language.
>
> Mark Lawrence
>


Re: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits?

2016-06-10 Thread Guido van Rossum
I somehow feel compelled to clarify that (perhaps unlike Larry) my concern
is not the strict rules of backwards compatibility (if that was the case I
would have objected to changing this in 3.5.2).

I just don't like the potentially blocking behavior, and experts' opinions
seem to widely vary on how insecure the fallback bits really are, how
likely you are to find yourself in that situation, and how probable an
exploit would be.

-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] Stop using timeit, use perf.timeit!

2016-06-10 Thread Victor Stinner
2016-06-10 17:09 GMT+02:00 Paul Moore :
> Also, the way people commonly use
> micro-benchmarks ("hey, look, this way of writing the expression goes
> faster than that way") doesn't really address questions like "is the
> difference statistically significant".

If you use the "python3 -m perf compare method1.json method2.json",
perf will check that the difference is significant using the
is_significant() method:
http://perf.readthedocs.io/en/latest/api.html#perf.is_significant
"This uses a Student’s two-sample, two-tailed t-test with alpha=0.95."

FYI at the beginning, this function comes from the Unladen Swallow
benchmark suite ;-)
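
For the curious, a rough sketch of the underlying check (a pooled two-sample
t-test; this is not perf's actual code, and scipy.stats.ttest_ind() gives a
p-value directly):

import math
from statistics import mean, variance

def t_score(a, b):
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))

# Compare abs(t_score(a, b)) against the critical value for na + nb - 2 degrees
# of freedom (roughly 1.97 for large samples at alpha = 0.05).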

We should design a CLI command to do timeit+compare at once.

Victor


[Python-Dev] Summary of Python tracker Issues

2016-06-10 Thread Python tracker

ACTIVITY SUMMARY (2016-06-03 - 2016-06-10)
Python tracker at http://bugs.python.org/

To view or respond to any of the issues listed below, click on the issue.
Do NOT respond to this message.

Issues counts and deltas:
  open5553 (+16)
  closed 33491 (+75)
  total  39044 (+91)

Open issues with patches: 2424 


Issues opened (69)
==

#16484: pydoc generates invalid docs.python.org link for xml.etree.Ele
http://bugs.python.org/issue16484  reopened by martin.panter

#26243: zlib.compress level as keyword argument
http://bugs.python.org/issue26243  reopened by serhiy.storchaka

#26839: Python 3.5 running on Linux kernel 3.17+ can block at startup 
http://bugs.python.org/issue26839  reopened by haypo

#27186: add os.fspath()
http://bugs.python.org/issue27186  reopened by brett.cannon

#27197: mock.patch interactions with "from" imports
http://bugs.python.org/issue27197  opened by clarkbreyman

#27198: Adding an assertClose() method to unittest.TestCase
http://bugs.python.org/issue27198  opened by ChrisBarker

#27199: TarFile expose copyfileobj bufsize to improve throughput
http://bugs.python.org/issue27199  opened by fried

#27200: make doctest in CPython has failures
http://bugs.python.org/issue27200  opened by Jelle Zijlstra

#27201: expose the ABI name as a config variable
http://bugs.python.org/issue27201  opened by doko

#27204: Failing doctests in Doc/howto/
http://bugs.python.org/issue27204  opened by Jelle Zijlstra

#27205: Failing doctests in Library/collections.rst
http://bugs.python.org/issue27205  opened by Jelle Zijlstra

#27206: Failing doctests in Doc/tutorial/
http://bugs.python.org/issue27206  opened by Jelle Zijlstra

#27207: Failing doctests in Doc/whatsnew/3.2.rst
http://bugs.python.org/issue27207  opened by Jelle Zijlstra

#27208: Failing doctests in Library/traceback.rst
http://bugs.python.org/issue27208  opened by Jelle Zijlstra

#27209: Failing doctests in Library/email.*.rst
http://bugs.python.org/issue27209  opened by Jelle Zijlstra

#27210: Failing doctests due to environmental dependencies in Lib/*lib
http://bugs.python.org/issue27210  opened by Jelle Zijlstra

#27212: Doc for itertools, 'islice()' implementation have unwanted beh
http://bugs.python.org/issue27212  opened by alex0307

#27213: Rework CALL_FUNCTION* opcodes
http://bugs.python.org/issue27213  opened by serhiy.storchaka

#27214: a potential future bug and an optimization that mostly undermi
http://bugs.python.org/issue27214  opened by Oren Milman

#27218: improve tracing performance with f_trace set to Py_None
http://bugs.python.org/issue27218  opened by xdegaye

#27219: turtle.fillcolor doesn't accept a tuple of floats
http://bugs.python.org/issue27219  opened by Jelle Zijlstra

#27220: Add a pure Python version of 'collections.defaultdict'
http://bugs.python.org/issue27220  opened by ebarry

#27221: multiprocessing documentation is outdated regarding method pic
http://bugs.python.org/issue27221  opened by memeplex

#27222: redundant checks and a weird use of goto statements in long_rs
http://bugs.python.org/issue27222  opened by Oren Milman

#27223: _ready_ready and _write_ready should respect _conn_lost
http://bugs.python.org/issue27223  opened by lukasz.langa

#27226: distutils: unable to compile both .opt-1.pyc and .opt2.pyc sim
http://bugs.python.org/issue27226  opened by mgorny

#27227: argparse fails to parse [] when using choices and nargs='*'
http://bugs.python.org/issue27227  opened by evan_

#27231: Support the fspath protocol in the posixpath module
http://bugs.python.org/issue27231  opened by Jelle Zijlstra

#27232: os.fspath() should not use repr() on error
http://bugs.python.org/issue27232  opened by Jelle Zijlstra

#27233: Missing documentation for PyOS_FSPath
http://bugs.python.org/issue27233  opened by Jelle Zijlstra

#27235: Heap overflow occurred due to the int overflow (Python-2.7.11/
http://bugs.python.org/issue27235  opened by madness

#27238: Bare except: usages in turtle.py
http://bugs.python.org/issue27238  opened by Jelle Zijlstra

#27240: 'UnstructuredTokenList' object has no attribute '_fold_as_ew'
http://bugs.python.org/issue27240  opened by touilleMan

#27241: Catch exceptions raised in pstats add (repl)
http://bugs.python.org/issue27241  opened by ll

#27242: Make the docs for NotImplemented & NotImplementedError unambig
http://bugs.python.org/issue27242  opened by ebarry

#27243: __aiter__ should return async iterator instead of awaitable
http://bugs.python.org/issue27243  opened by yselivanov

#27244: print(';;') fails in pdb with SyntaxError
http://bugs.python.org/issue27244  opened by cjw296

#27245: IDLE: Fix deletion of custom themes and key bindings
http://bugs.python.org/issue27245  opened by terry.reedy

#27248: Possible refleaks in PyType_Ready in error condition
http://bugs.python.org/issue27248  opened by xiang.zhang

#27250: Add os.urandom_block()
http://bugs.python.org/issue27250  opened by haypo

#27252: Make dict views copyable

Re: [Python-Dev] Stop using timeit, use perf.timeit!

2016-06-10 Thread Paul Moore
On 10 June 2016 at 15:34, David Malcolm  wrote:
>> The problem is that random noise can only ever slow the code down, it
>> cannot speed it up.
[...]
> Isn't it possible that under some circumstances the 2nd process could
> prefetch memory into the cache in such a way that the workload under
> test actually gets faster than if the 2nd process wasn't running?

My feeling is that it would be much rarer for random effects to speed
up the benchmark under test - possible in the sort of circumstance you
describe, but not common.

The conclusion I draw is "be careful how you interpret summary
statistics if you don't know the distribution of the underlying data
as an estimator of the value you are interested in".

In the case of Victor's article, he's specifically trying to
compensate for variations introduced by Python's hash randomisation
algorithm. And for that, you would get both positive and negative
effects on code speed, so the average makes sense. But only if you've
already eliminated the other common noise (such as other proceses,
etc). In Victor's articles, he sounds like he's done this, but he's
using very Linux-specific mechanisms, and I don't know if he's done
the same for other platforms. Also, the way people commonly use
micro-benchmarks ("hey, look, this way of writing the expression goes
faster than that way") doesn't really address questions like "is the
difference statistically significant".

Summary: Micro-benchmarking is hard. Victor looks like he's done some
really interesting work on it, but any "easy to use" timeit tool will
typically get used in an over-simplistic way in practice, and so you
probably shouldn't read too much into timing figures quoted in
isolation, no matter what tool was used to generate them.

Paul


Re: [Python-Dev] Smoothing the transition from Python 2 to 3

2016-06-10 Thread Paul Moore
On 10 June 2016 at 15:09, Cody Piersall  wrote:
>> One problem is that the str literals should be bytes
>> literals.  Comparison with None needs to be avoided.
>>
>> With Python 2 code runs successfully.  With Python 3 the code
>> crashes with a traceback.  With my modified Python 3.6, the code
>> runs successfully but generates the following warnings:
>>
>> test.py:13: DeprecationWarning: encoding bytes to str
>>   output.write('%d:' % len(s))
>> test.py:14: DeprecationWarning: encoding bytes to str
>>   output.write(s)
>> test.py:15: DeprecationWarning: encoding bytes to str
>>   output.write(',')
>> test.py:5: DeprecationWarning: encoding bytes to str
>>   if c == ':':
>> test.py:9: DeprecationWarning: encoding bytes to str
>>   size += c
>> test.py:24: DeprecationWarning: encoding bytes to str
>>   data = data + s
>> test.py:26: DeprecationWarning: encoding bytes to str
>>   if input.read(1) != ',':
>> test.py:31: DeprecationWarning: default compare is depreciated
>>   if a > 0:
>>
>
> This seems _very_ useful; I'm surprised that other people don't think
> so too.  Currently, the easiest way to find bytes/str errors in a big
> application is by running the program, finding where it crashes,
> fixing that one line (or hopefully wherever the data entered the
> system if you can find it), and repeating the process.

It *is* very nice. But...

> This is nice because you can get in "fix my encoding errors" mode for
> more than just one traceback at a time; the new method would be to run
> the program, look at the millions of bytes/str errors, and fix
> everything that showed up in this round at once.  That seems like a
> big win for productivity to me.

If you're fixing encoding errors at the point they occur, rather than
looking at the high-level design of the program's handling of textual
and bytestring data, you're likely to end up in a bit of a mess no
matter how you locate the issues. Most likely because at the point in
the code where the warning occurs, you no longer know what the correct
encoding to use should be.
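
The pattern that has worked for me is to decode at the input boundary,
work with str internally, and encode again only on output - a minimal
sketch (utf-8 is just an example encoding here):

    def read_record(raw):
        # raw comes from a file or socket opened in binary mode
        return raw.decode("utf-8")       # bytes -> str at the input boundary

    def write_record(out, text):
        # out is a binary-mode stream
        out.write(text.encode("utf-8"))  # str -> bytes at the output boundary

With that shape, warnings like the ones above point you at the
boundaries rather than at dozens of scattered call sites.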

But absolutely, anything that gives extra information about where the
encoding hotspots are in your code is of value.
Paul
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Smoothing the transition from Python 2 to 3

2016-06-10 Thread Cody Piersall
> One problem is that the str literals should be bytes
> literals.  Comparison with None needs to be avoided.
>
> With Python 2 code runs successfully.  With Python 3 the code
> crashes with a traceback.  With my modified Python 3.6, the code
> runs successfully but generates the following warnings:
>
> test.py:13: DeprecationWarning: encoding bytes to str
>   output.write('%d:' % len(s))
> test.py:14: DeprecationWarning: encoding bytes to str
>   output.write(s)
> test.py:15: DeprecationWarning: encoding bytes to str
>   output.write(',')
> test.py:5: DeprecationWarning: encoding bytes to str
>   if c == ':':
> test.py:9: DeprecationWarning: encoding bytes to str
>   size += c
> test.py:24: DeprecationWarning: encoding bytes to str
>   data = data + s
> test.py:26: DeprecationWarning: encoding bytes to str
>   if input.read(1) != ',':
> test.py:31: DeprecationWarning: default compare is depreciated
>   if a > 0:
>

This seems _very_ useful; I'm surprised that other people don't think
so too.  Currently, the easiest way to find bytes/str errors in a big
application is by running the program, finding where it crashes,
fixing that one line (or hopefully wherever the data entered the
system if you can find it), and repeating the process.

This is nice because you can get in "fix my encoding errors" mode for
more than just one traceback at a time; the new method would be to run
the program, look at the millions of bytes/str errors, and fix
everything that showed up in this round at once.  That seems like a
big win for productivity to me.

Cody
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits?

2016-06-10 Thread Sebastian Krause
Nathaniel Smith  wrote:
> (This is based on the assumption that the only time that explicitly
> calling os.urandom is the best option is when one cares about the
> cryptographic strength of the result -- I'm explicitly distinguishing
> here between the hash seeding issue that triggered the original bug
> report and explicit calls to os.urandom.)

I disagree with that assumption. I've often found myself using
os.urandom for non-secure random data and seen it as the best option
simply because it directly returns the type I wanted: bytes.

The last time I looked the random module didn't have a function to
directly give me bytes, so I would have to wrap it in something like:

bytearray(random.getrandbits(8) for _ in range(size))

Or maybe the function exists, but then it doesn't seem very
discoverable. Ideally I would only want to use the random module for
non-secure data and (in 3.6) the secrets module (which could block) for
secure random data and never bother with os.urandom (and knowing how
it behaves). But then those modules should probably get new
functions to directly return bytes.
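
For concreteness, this is roughly how the options look to me today (the
secrets call assumes Python 3.6; the to_bytes() line is just a faster
spelling of the bytearray loop):

    import random
    import secrets   # Python 3.6+

    size = 16

    # Non-secure random bytes via the random module:
    data = bytearray(random.getrandbits(8) for _ in range(size))

    # Equivalent, but with a single getrandbits() call:
    data = random.getrandbits(8 * size).to_bytes(size, "little")

    # Secure random bytes without touching os.urandom directly:
    data = secrets.token_bytes(size)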

Sebastian
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Stop using timeit, use perf.timeit!

2016-06-10 Thread David Malcolm
On Fri, 2016-06-10 at 23:20 +1000, Steven D'Aprano wrote:
> On Fri, Jun 10, 2016 at 01:13:10PM +0200, Victor Stinner wrote:
> > Hi,
> > 
> > Over the last few weeks, I did research on how to get stable and reliable
> > benchmarks, especially for the corner case of microbenchmarks. The
> > first result is a series of articles; here are the first three:
> 
> Thank you for this! I am very interested in benchmarking.
> 
> > https://haypo.github.io/journey-to-stable-benchmark-system.html
> > https://haypo.github.io/journey-to-stable-benchmark-deadcode.html
> > https://haypo.github.io/journey-to-stable-benchmark-average.html
> 
> I strongly question your statement in the third:
> 
> [quote]
> But how can we compare performances if results are random? 
> Take the minimum?
> 
> No! You must never (ever again) use the minimum for 
> benchmarking! Compute the average and some statistics like
> the standard deviation:
> [end quote]
> 
> 
> While I'm happy to see a real-world use for the statistics module, I 
> disagree with your logic.
> 
> The problem is that random noise can only ever slow the code down, it
> cannot speed it up. 

Consider a workload being benchmarked running on one core, which has a
particular pattern of cache hits and misses.  Now consider another
process running on a sibling core, sharing the same cache.

Isn't it possible that under some circumstances the 2nd process could
prefetch memory into the cache in such a way that the workload under
test actually gets faster than if the 2nd process wasn't running?

[...snip...]

Hope this is constructive
Dave
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Stop using timeit, use perf.timeit!

2016-06-10 Thread Steven D'Aprano
On Fri, Jun 10, 2016 at 01:13:10PM +0200, Victor Stinner wrote:
> Hi,
> 
> Over the last few weeks, I did research on how to get stable and reliable
> benchmarks, especially for the corner case of microbenchmarks. The
> first result is a series of articles; here are the first three:

Thank you for this! I am very interested in benchmarking.

> https://haypo.github.io/journey-to-stable-benchmark-system.html
> https://haypo.github.io/journey-to-stable-benchmark-deadcode.html
> https://haypo.github.io/journey-to-stable-benchmark-average.html

I strongly question your statement in the third:

[quote]
But how can we compare performances if results are random? 
Take the minimum?

No! You must never (ever again) use the minimum for 
benchmarking! Compute the average and some statistics like
the standard deviation:
[end quote]


While I'm happy to see a real-world use for the statistics module, I 
disagree with your logic.

The problem is that random noise can only ever slow the code down, it 
cannot speed it up. To put it another way, the random errors in the 
timings are always positive.

Suppose you micro-benchmark some code snippet and get a series of 
timings. We can model the measured times as:

measured time t = T + ε

where T is the unknown "true" timing we wish to estimate, and ε is some 
variable error due to noise in the system. But ε is always positive, 
never negative, and we always measure something larger than T.

Let's suppose we somehow (magically) know what the epsilons are:

measurements = [T + 0.01, T + 0.02, T + 0.04, T + 0.01]

The average is (4*T + 0.08)/4 = T + 0.02

But the minimum is T + 0.01, which is a better estimate than the 
average. Taking the average means that *worse* epsilons will effect your 
estimate, while the minimum means that only the smallest epsilon effects 
your estimate.


Taking the average is appropriate if the error terms can be positive 
or negative, e.g. if they are *measurement error* rather than noise:

measurements = [T + 0.01, T - 0.02, T + 0.04, T - 0.01]

The average is (4*T + 0.02)/4 = T + 0.005

The minimum is T - 0.02, which is worse than the average.


Unless you have good reason to think that the timing variation is mostly 
caused by some error which can be both positive and negative, the 
minimum is the right statistic to use, not the average. But ask 
yourself: what sort of error, noise or external influence will cause the 
code snippet to run FASTER than the fastest the CPU can execute it?
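
Here's a toy simulation (entirely made-up numbers) that illustrates the
point: with one-sided noise the minimum recovers T better, while with
symmetric measurement error the mean does:

    import random
    import statistics

    T = 1.0            # hypothetical "true" timing
    random.seed(0)

    # Noise that can only ever slow things down:
    one_sided = [T + random.expovariate(50) for _ in range(1000)]
    # the minimum should land almost exactly on T; the mean is ~0.02 high
    print(min(one_sided), statistics.mean(one_sided))

    # Measurement error that can go either way:
    symmetric = [T + random.gauss(0, 0.02) for _ in range(1000)]
    # now the mean is close to T and the minimum undershoots it
    print(min(symmetric), statistics.mean(symmetric))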



-- 
Steve
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] Stop using timeit, use perf.timeit!

2016-06-10 Thread Victor Stinner
Hi,

Over the last few weeks, I did research on how to get stable and reliable
benchmarks, especially for the corner case of microbenchmarks. The
first result is a series of articles; here are the first three:

https://haypo.github.io/journey-to-stable-benchmark-system.html
https://haypo.github.io/journey-to-stable-benchmark-deadcode.html
https://haypo.github.io/journey-to-stable-benchmark-average.html

The second result is a new perf module which includes all "tricks"
discovered in my research: compute average and standard deviation,
spawn multiple worker child processes, automatically calibrate the
number of outer-loop iterations, automatically pin worker processes
to isolated CPUs, and more.

The perf module can store benchmark results as JSON to analyze
them in depth later. It helps to configure a benchmark correctly and
to check manually whether it is reliable.

The perf documentation also explains how to get stable and reliable
benchmarks (ex: how to tune Linux to isolate CPUs).

perf has 3 builtin CLI commands:

* python -m perf: show and compare JSON results
* python -m perf.timeit: a new, more reliable implementation of timeit
* python -m perf.metadata: display collected metadata
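
For example, a quick microbenchmark run plus a later analysis looks
roughly like this (the exact perf.timeit options may differ, this is
just the shape of the workflow):
---
$ python3 -m perf.timeit --json-file=bench.json "sorted(list(range(1000)))"
$ python3 -m perf -v show bench.json
---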

Python 3 is recommended to get time.perf_counter(), the new accurate
statistics module, automatic CPU pinning (I will implement it on
Python 2 later), etc. But Python 2.7 is also supported; fallbacks
are implemented when needed.

Example with the patched telco benchmark (a benchmark for the decimal
module) on a Linux system with two isolated CPUs.

First run the benchmark:
---
$ python3 telco.py --json-file=telco.json
.
Average: 26.7 ms +- 0.2 ms
---


Then show the JSON content to see all details:
---
$ python3 -m perf -v show telco.json
Metadata:
- aslr: enabled
- cpu_affinity: 2, 3
- cpu_count: 4
- cpu_model_name: Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
- hostname: smithers
- loops: 10
- platform: Linux-4.4.9-300.fc23.x86_64-x86_64-with-fedora-23-Twenty_Three
- python_executable: /usr/bin/python3
- python_implementation: cpython
- python_version: 3.4.3

Run 1/25: warmup (1): 26.9 ms; samples (3): 26.8 ms, 26.8 ms, 26.7 ms
Run 2/25: warmup (1): 26.8 ms; samples (3): 26.7 ms, 26.7 ms, 26.7 ms
Run 3/25: warmup (1): 26.9 ms; samples (3): 26.8 ms, 26.9 ms, 26.8 ms
(...)
Run 25/25: warmup (1): 26.8 ms; samples (3): 26.7 ms, 26.7 ms, 26.7 ms

Average: 26.7 ms +- 0.2 ms (25 runs x 3 samples; 1 warmup)
---

Note: benchmarks can be analyzed with Python 2.

I'm posting this to python-dev because timeit results are
commonly requested in reviews of optimization patches.

The next step is to patch the CPython benchmark suite to use the perf
module. I already forked the repository and started to patch some
benchmarks.

If you are interested in Python performance in general, please join us
on the speed mailing list!
https://mail.python.org/mailman/listinfo/speed

Victor
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Smoothing the transition from Python 2 to 3

2016-06-10 Thread Paul Moore
On 10 June 2016 at 03:13, Barry Warsaw  wrote:
> In my own experience, and IIRC Amber had a similar experience, the ease of
> porting to Python 3 really comes down to how bytes/unicode clean your code
> base is.  Almost all the other pieces are either pretty manageable or fairly
> easily automated.  But if your code isn't bytes-clean you're in for a world
> of hurt because you first have to decide how to represent those things.
> Twisted's job is especially fun because it's all about wire protocols, which I
> think Amber described as (paraphrasing) bytes that happen to have contents
> that look like strings.

Although I have much less experience with porting than many others in
this thread, that's my experience as well. Get a clear and
well-understood separation of bytes and strings, and the rest of the
porting exercise is (relatively!) straightforward. But if you just
once think "I'm not quite sure, but I think I just need to decode here
to be safe", you'll be fighting Unicode errors for ever.

My hope is that static typing tools like MyPy could help here. I
typically review Python 2 code by mentally categorising which
functions (theoretically) take bytes, which take strings, and which
are confused. And sort things out from there. Type annotations seem
like they'd help that process. But I've yet to use typing in practice,
so it may not be that simple.
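
For example (purely illustrative, the functions are made up), even
minimal annotations make the intended bytes/str boundary explicit for a
checker:

    def encode_netstring(payload: bytes) -> bytes:
        # PEP 461-style bytes formatting (Python 3.5+)
        return b"%d:%s," % (len(payload), payload)

    def decode_header(raw: bytes, encoding: str = "ascii") -> str:
        return raw.decode(encoding)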

Paul
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Smoothing the transition from Python 2 to 3

2016-06-10 Thread Stephen J. Turnbull
Neil Schemenauer writes:

 > I have to wonder if you guys actually ported at lot of Python 2
 > code.

Python 3 (including stdlib) itself is quite a bit of code.

 > According to you guys, there is no problem

No, according to us, there are problems, but in the code, not in the
language or its implementation.  This is a Brooksian "no silver
bullet" problem: it's very hard to write reliable code that handles
multiple text representations (as pretty much everything does
nowadays), except by converting to internal text on input and back to
encoded text on output.  The warnings you quote (and presumably the
code that generates them) make assumptions (cf Barry's post) that are
frequently invalid.  I don't know about cross-type comparisons, but as
Barry and Brett both pointed out, mixtures of bytes and text are
*rarely* easy to fix, because it's often extremely difficult to know
which is the appropriate representation for a given variable unless
you do a complete refactoring as described above.  When I've tried to
fix such warnings one at a time, it's always been whack-a-mole.

The experience in GNU Emacs and Mailman 2 has been that it took about
ten years to get to the point where they went a whole year without an
encoding bug once non-Latin-1 encodings were being handled.  XEmacs
OTOH took only about 3 years from the proof-of-concept introduction of
multibyte characters to essentially no bugs (except in C code, of
course!) because we had the same policy as Python 3: bytes and text
don't mix, and in development we also would abort on mixing integers
and characters (in GNU Emacs, the character type was the same as the
integer type until very recently).  We affectionately referred to
those bugs as "Ebola" (not very polite, but it gets the point across
about how seriously we took the idea of making the internal text
representation completely opaque).  In Mailman 2, we still can't say
confidently that there are no Unicode bugs left even today.  We still
need an outer "except UnicodeError: quarantine_and_call_for_help(msg)"
handler, although AFAIK it hasn't been reported for a couple years.

It's not that you can't continue to run the potentially buggy code in
Python 2.  Mailman 2 does; you can, too.  What we don't support (and I
personally hope we never support) is running that code in Python 3
(warnings or no).  If you want to support that yourself, more power to
you, but I advise you that my experience suggests that it's not going
to be a panacea, and I do believe it's going to be more trouble than
biting the bullet and just thoroughly porting your code.  Even if that
takes as much time as it took Amber to port Twisted.

 > and we already have good enough tooling. ;-(

Nobody said that, just that the existing tooling is pretty good for
the problems that tools can help with, while no tool is likely to be
much help with some of the code your tool allows to run.  You're
welcome to try to prove that claim wrong -- if you do, it would indeed
be very valuable!  But I personally, based on my own experience, think
that the chance of success is too low to justify the cost.  (Granted,
I don't have to port Twisted, so in that sense I'm biased. :-/ )

BTW tools continue to be added, as well as language changes (PEP 461!)
There is no resistence to that.

What you're running into here is that several of us have substantial
experience with various of the issues raised, and that experience
convinces us that there's no silver bullet, just hard work, if you
face them.

Steve

___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com