[Python-Dev] Re: Restrict the type of __slots__

2022-03-18 Thread Raymond Hettinger
> I propose to restrict the type of __slots__.

-1 for adding a restriction.  This breaks code for no good reason.  This
API has been around for a very long time.  I've seen lists, tuples, dicts,
single strings, and occasionally something more exotic.  Why wreck stable
code?

Also, the inspect module will detect whether __slots__ is a dictionary and
will use it to display docstrings.  In the database world, data
dictionaries have proven value, so it would be a bummer to kill off this
functionality which is used in much the same way as docstrings for
properties.  It is still rarely used, but I'm hoping it will catch on (just 
like people are slowly growing more aware that they can add docstrings to 
fields in named tuples).

Raymond



On Fri, Mar 18, 2022 at 4:33 AM Serhiy Storchaka 
wrote:

> Currently __slots__ can be either string or an iterable of strings.
>
> 1. If it is a string, it is a name of a single slot. Third-party code
> which iterates __slots__ will be confused.
>
> 2. If it is an iterable, it should emit names of slots. Note that
> non-reiterable iterators are accepted too, but it causes weird bugs if
> __slots__ is iterated more than once. For example it breaks default
> pickling and copying.
>
> I propose to restrict the type of __slots__. Require that it always be a
> tuple of strings. Most __slots__ in real code are tuples. It is rare that
> we need only a single slot and set __slots__ to a string.
>
> It will break some code (there are 2 occurrences in the stdlib and 1 in
> scripts), but that code can be easily fixed.
>
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/E5ROGDNKI5FFPTXBQGHUQSVVHCAB7VUT/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Optimizing literal comparisons and contains

2021-11-27 Thread raymond . hettinger
For the benefit of the audience on python-dev, you should also mention that 
this proposal and its associated PR have been discussed and rejected twice on 
the tracker:

   https://bugs.python.org/issue45907
   https://bugs.python.org/issue45843

The response just given by Skip pretty much matches the comments already given 
by Batuhan, Pablo, and Serhiy.  So far, no one who has looked at this thinks 
this should be done.


[Python-Dev] Re: PEP 467 feedback from the Steering Council

2021-09-09 Thread raymond . hettinger
> I would rather keep `bchr` and lose the `.fromint()` methods.

For me, "bchr" isn't a readable name.  If I mentally expand it to 
"byte_character", it becomes an oxymoron that opposes what we try to teach 
about bytes and characters being different things.

Can you show examples in existing code of how this would be used? I'm unclear 
on how frequently users need to create a single byte from an integer.  For me, 
it is very rare.  Perhaps once in a large program I will search for a record 
separator in binary data. I would prefer to write it as:

RS = byte.fromint(30)
...
i = data.index(RS, start)
...
if RS in data:

Having this as bchr() wouldn't make the code better because it is less explicit 
about turning an integer into a byte.  Also, it doesn't look nice when in-lined 
without giving it a variable name:

i = data.index(bchr(30), start) # Yuck
...
if bchr(30) in data:# Yuck

Also keep in mind that we already have a way to spell it, "bytes([30])", so any 
new way needs to add significantly more clarity.  I think bytes.fromint() does 
that.
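For comparison, here is the spelling that works today (bytes.fromint() is only
a proposal and does not yet exist; the sample data is mine):

```python
RS = bytes([30])                  # today's spelling of a single byte
data = b"alpha\x1ebeta\x1egamma"  # records separated by byte 30 (0x1e)

print(RS)                         # b'\x1e'
print(data.split(RS))             # [b'alpha', b'beta', b'gamma']
print(RS in data)                 # True
```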

The number of use cases also matters.  The bar for adding a new builtin 
function is very high.

Raymond


[Python-Dev] Re: PEP 467 feedback from the Steering Council

2021-08-10 Thread raymond . hettinger
I recommend removing the "discouragement" from writing "bytes(10)".  That is 
merely stylistic.  As long as we support the API, it is valid Python.  In the 
contexts where it is currently used, it tends to be clear about what it is 
doing:  buffer = bytearray(bufsize). That doesn't need to be discouraged.

Also, I concur with the SC comment that the singular of bytearray() or bytes() 
is byte(), not bchr().  Practically, what people want here is an efficient 
literal that is easier to write than b'\x1F'.   I don't think bchr() meets 
that need. Neither bchr(0x1f) nor bytearray.fromint(0x1f) is fast (neither is 
a literal), nor are they easier to read or type.

The history of bytes/bytearray is a dual-purpose view.  It can be used in a 
string-like way to emulate Python 2 string handling (hence all the usual string 
methods and a repr that displays in a string-like fashion).  It can also be 
used as an array of numbers, 0 to 255 (hence the list methods and having an 
iterator of ints).  ISTM that the authors of this PEP reject or want to 
discourage the latter use cases.  
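The dual-purpose view described above can be sketched in a few lines:

```python
data = bytearray(b"abc")

# String-like view: the usual str-style methods are available
print(data.upper())      # bytearray(b'ABC')

# Array-of-ints view: indexing and iteration yield numbers 0..255
print(data[0])           # 97
print(list(data))        # [97, 98, 99]

# List-like mutation
data.append(0x64)
print(data)              # bytearray(b'abcd')
```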

This is disappointing because often the only reasonable way to manipulate 
binary data is with bytearrays.  A user could switch to array.array() or a 
numpy.array, but that is unnecessarily inconvenient given that we already have 
a nice builtin type that meets the need (for images, crypto hashes, 
compression, bloom filters, or anything where a C programmer would use an array 
of unsigned chars).

Given that bytes/bytearray is already an uncomfortable hybrid of string and 
list APIs for binary data, I don't think the competing views and APIs will be 
disentangled by adding methods that duplicate functionality that already 
exists.   Instead, I recommend that the PEP focus on one or two cases where 
methods could be added that simplify any common tasks that are currently 
awkward.  For example, creating a single byte with bytes([0x1f]) isn't 
pleasant, obvious, or fast.


[Python-Dev] Re: Announcing the CPython Docs Workgroup

2021-05-03 Thread Raymond Hettinger
That seems exclusionary.   Right now, anyone can contribute to documentation, 
anyone can comment on proposals, and any core dev can accept their patches.

In the interest of transparency, can you explain why the other initial members 
did not need to go through an application process?  ISTM the initial group 
excludes our most active documentation contributors and includes people who 
have only minimal contributions to existing documentation and mostly have not 
participated in any documentation reviews on the issue tracker. Did the SC 
approve all the initial members?

Raymond





[Python-Dev] Re: Announcing the CPython Docs Workgroup

2021-05-01 Thread Raymond Hettinger
Please add me to the list of members for the initial workgroup.

Thank you,


Raymond


[Python-Dev] Re: Please do not remove random bits of information from the tutorial

2020-11-09 Thread Raymond Hettinger



> On Nov 7, 2020, at 9:51 AM, Riccardo Polignieri via Python-Dev 
>  wrote:
> 
> My concern here is that if you start removing or simplifying some 
> "too-difficult-for-a-tutorial" bits of information on an occasional basis, 
> and without too much scrutiny or editorial guidance, you will end up loosing 
> something precious.

I concur with your sentiments and do not want the tutorial to be dumbed down.

Here are a few thoughts on the subject:

* The word "tutorial" does not imply "easy".  Instead, it is a self-paced, 
example-driven walk-through of the language.  That said, if the word "tutorial" 
doesn't sit well, then just rename the guide.

* The world is full of well-written guides for beginners.  The variety is 
especially important because "beginner" means many different things:  "never 
programmed before", "casually checking out what the language offers", "expert 
in some other language", "is a student in elementary school", "is a student in 
high school", "is an electrical engineer needing to write scripts",  etc.

* One thing that makes the current tutorial special is that much of it was 
written by Guido.  Delete this text and you lose one of the few places where 
his voice comes through.

* There is value in having non-trivial coverage of the language.  When people 
ask how __cause__ works, we can link to the tutorial.  Otherwise, we have to 
throw them to the wolves by linking to the unfriendly, highly technical 
reference guide or to a PEP.

* For many people, our tutorial serves as the only systematic walk-through of 
the language.  If you decide to drop the mention of complex numbers, the odds 
of a person ever finding out about that capability drop to almost zero.

* My suggestion is that we add a section to the beginning of the tutorial with 
external links elsewhere: "If you are ten years old, go here.  If you have 
never programmed before, go here," etc.

* If you think the word tutorial implies fluffy and easy, then let's just 
rename it to "Language walk-through with examples" or some such.

* FWIW, I've closely monitored the bug tracker daily for almost two decades.  
We almost never get a user complaint that the tutorial is too advanced.  For 
the most part, it has long been of good service to users.  Almost certainly it 
can be improved, but hopefully not by dropping content.


Raymond


[Python-Dev] Re: Drop Solaris, OpenSolaris, Illumos and OpenIndiana support in Python

2020-10-30 Thread Raymond Hettinger

> On Oct 30, 2020, at 4:51 PM, Gregory P. Smith  wrote:
> 
> On Fri, Oct 30, 2020 at 1:14 PM Raymond Hettinger 
>  wrote:
> FWIW, when the tracker issue landed with a PR, I became concerned that it 
> would be applied without further discussion and without consulting users.
> 
> An issue and a PR doesn't simply mean "it is happening".

There have been a number of issue/PR pairs this year that have followed 
exactly that path.

While we'll never know for sure, it is my belief that this would have been 
applied had I not drawn attention to it.  Very few people follow the bug 
tracker every day — the sparse Solaris community almost certainly would not have 
been aware of the tracker entry.   Likewise, I don't think there would have 
been a python-dev thread; otherwise, it would have happened *prior* to the PR, 
the tracker issue, and all of the comments from the people affected.  The call 
for helpers was made only *after* the user pleas not to pull the trigger.  

It's all fine now.  The decision is being broadly discussed.  That is what is 
important.


Raymond  



[Python-Dev] Re: Drop Solaris, OpenSolaris, Illumos and OpenIndiana support in Python

2020-10-30 Thread Raymond Hettinger
Here are a couple of comments from the Twitter thread that warrant your attention.

Apparently, this is being used by the European Space Agency on their 
spacecraft.
-- https://twitter.com/nikolaivk/status/1322094167980466178

"To be clear I will put some money where my mouth is.  If we need to invest 
resources either in the form of developers or dollars to keep the port alive we 
will. By we I mean RackTop and/or Staysail Systems." -- 
https://twitter.com/gedamore/status/1321959956199866369


Raymond


[Python-Dev] Re: Drop Solaris, OpenSolaris, Illumos and OpenIndiana support in Python

2020-10-30 Thread Raymond Hettinger
I vote against removal.

We have no compelling need to disrupt an entire community and ecosystem even 
though it is small.

To anyone chiming in to say, yes, drop the support: ask whether you've 
consulted any of the users — they should have a say in the matter. It is better 
for them to be a bit neglected than to be cut off entirely.

FWIW, when the tracker issue landed with a PR, I became concerned that it would 
be applied without further discussion and without consulting users.  So I asked 
on Twitter whether Solaris was being used.  If you're interested in the 
responses, see the thread at: https://twitter.com/i/status/1321917936668340227 
(Victor can't see it because he blocked my account a long time ago).  Also take 
a look at the user comments on the tracker: https://bugs.python.org/issue42173 
.  For those who don't follow links, here's a sample:

* "Platform genocide is both unnecessary and unwarranted." -- brett3
* "Please do not drop support." -- jm650
* "I just want to lend my voice in favor of maintaining "Solarish" support as 
well, and offer what help I may for resolving issues."-- robertfrench
* "No no no, please don't." -- tbalbers
* "Please do not drop support for SunOS." -- mariuspana
* "Please continue support for Solaris/IllumOS! This is very important for us." 
-- marcheschi
* "Please don't drop Solaris support, we still use it to this day." -- abarbu
* ... and many more with the same flavor

Given this kind of user response, I think it would be irresponsible to drop 
support.


Raymond


[Python-Dev] Re: Changing Python's string search algorithms

2020-10-18 Thread Raymond Hettinger



> On Oct 17, 2020, at 2:40 PM, Tim Peters  wrote:
> 
> Still waiting for someone who thinks string search speed is critical
> in their real app to give it a try.  In the absence of that, I endorse
> merging this.

Be bold.  Merge it.   :-)


Raymond


[Python-Dev] Re: PEP 620: Hide implementation details from the C API

2020-06-29 Thread Raymond Hettinger


> On Jun 29, 2020, at 5:46 PM, Victor Stinner  wrote:
> 
> You missed the point of the PEP: "It becomes possible to experiment
> with more advanced optimizations in CPython than just
> micro-optimizations, like tagged pointers."
> 
> IMHO it's time to stop wasting our limited developer resources on
> micro-optimizations and micro-benchmarks, but think about overall
> Python performance and major Python internals redesign to find a way
> to make Python overall 2x faster, rather than making a specific
> function 10% faster.

That is a really bold claim.  AFAICT there is zero evidence that this is 
actually possible.  Like the sandboxing project, these experiments may all 
prove to be dead-ends.

If we're going to bet the farm on this, there should at least be a 
proof-of-concept. Otherwise, it's just an expensive lottery ticket.

> I don't think that the performance of accessing namedtuple attributes
> is a known bottleneck of Python performance.

This time you missed the point.  Named tuple access was just one point of 
impact — it is not the only code that calls PyTuple_Check().   It looks like 
inlining did not work and that EVERY SINGLE type check in CPython was affected 
(including third party extensions).  Also, there was no review — we have a 
single developer pushing through hundreds of these changes at a rate where no 
one else can keep up.

> Measuring benchmarks which take less than 1 second requires being very
> careful.


Perhaps you don't want to believe the results, but the timings are careful, 
stable, repeatable, and backed-up by a disassembly that shows the exact cause.  
The builds used for the timings were the production macOS builds as distributed 
on python.org.

There is a certain irony in making repeated, unsubstantiated promises to make 
the language 2x faster and then checking in changes that make the 
implementation slower.


Raymond


P.S. What PyPy achieved was monumental.  But it took a decade even with a 
well-organized and partially-funded team of superstars. It always lagged 
CPython in features. And the results were entirely dependent on a single design 
decision: run a pure-Python interpreter written in RPython to take advantage 
of its tracing JIT.  I don't imagine CPython can hope to achieve anything like 
this.  Likely, the best we can do is replace reference counting with garbage 
collection.




[Python-Dev] Re: PEP 620: Hide implementation details from the C API

2020-06-29 Thread Raymond Hettinger

> On Jun 22, 2020, at 5:10 AM, Victor Stinner  wrote:
> 
> Introduce C API incompatible changes to hide implementation details.

How much of the existing C extension ecosystem do you expect to break as a 
result of these incompatible changes? 


> It will be way easier to add new features.

This isn't self-evident.  What is currently difficult that would be easier?


> It becomes possible to experiment with more advanced optimizations in CPython
> than just micro-optimizations, like tagged pointers.

Is there any proof-of-concept to suggest that it is in the realm of possibility 
that such an experiment would produce a favorable outcome?  Otherwise, it isn't 
a reasonable justification for an extensive and irrevocable series of sweeping 
changes that affect the entire ecosystem of existing extensions.


> **STATUS**: Completed (in Python 3.9)

I'm not sure that many people are monitoring the huge number of changes that 
have gone in mostly unreviewed.  Mark Shannon and Stefan Krah have both raised 
concerns.  It seems like one person has been given blanket authorization to 
revise nearly every aspect of the internals and to undo the design choices made 
by all the developers who've previously worked on the project.


> Converting macros to static inline functions should only impact very few
> C extensions which use macros in unusual ways.

These should be individually verified to make sure they actually get inlined by 
the compiler.  In https://bugs.python.org/issue39542 about nine PRs were 
applied without review or discussion.  One of those, 
https://github.com/python/cpython/pull/18364 , converted PyType_Check() to 
static inline function but I'm not sure that it actually does get inlined.  
That may be the reason named tuple attribute access slowed by about 25% between 
Python 3.8 and Python 3.9.¹  Presumably, that PR also affected every single 
type check in the entire C codebase and will affect third-party extensions as 
well.

FWIW, I do appreciate the devotion and amount of effort in this undertaking — 
that isn't in question.  However, as a community, this needs to be a conscious 
decision.  I'm unclear about whether any benefits will ever materialize.  I am 
clear that packages will be broken, that performance will be impacted, and that 
this is a one-way trip that can never be undone.  Most of the work is being 
done by one person. Many of the PRs aren't reviewed.  The rate and volume of 
PRs are so high that almost no one can keep track of what is happening. Mark 
and Stefan have pushed back but with no effect.


Raymond 


==

¹ Timings for attribute access

$ python3.8 -m timeit \
      -s 'from collections import namedtuple' \
      -s 'Point=namedtuple("Point", "x y")' \
      -s 'p=Point(10,20)' \
      'p.x; p.y; p.x; p.y; p.x; p.y'
200 loops, best of 5: 119 nsec per loop

$ python3.9 -m timeit \
      -s 'from collections import namedtuple' \
      -s 'Point=namedtuple("Point", "x y")' \
      -s 'p=Point(10,20)' \
      'p.x; p.y; p.x; p.y; p.x; p.y'
200 loops, best of 5: 152 nsec per loop

==

Python 3.8 disassembly (clean and fast)
---

_tuplegetter_descr_get:
        testq   %rsi, %rsi
        je      L299
        subq    $8, %rsp
        movq    8(%rsi), %rax
        movq    16(%rdi), %rdx
        testb   $4, 171(%rax)
        je      L300
        cmpq    16(%rsi), %rdx
        jnb     L301
        movq    24(%rsi,%rdx,8), %rax
        addq    $1, (%rax)
L290:
        addq    $8, %rsp
        ret


Python 3.9 disassembly (doesn't look in-lined)
---

_tuplegetter_descr_get:
        testq   %rsi, %rsi
        pushq   %r12                <-- new cost
        pushq   %rbp                <-- new cost
        pushq   %rbx                <-- new cost
        movq    %rdi, %rbx
        je      L382
        movq    16(%rdi), %r12
        movq    %rsi, %rbp
        movq    8(%rsi), %rdi
        call    _PyType_GetFlags    <-- new non-inlined function call
        testl   $67108864, %eax
        je      L383
        cmpq    16(%rbp), %r12
        jnb     L384
        movq    24(%rbp,%r12,8), %rax
        addq    $1, (%rax)
        popq    %rbx                <-- new cost
        popq    %rbp                <-- new cost
        popq    %r12                <-- new cost
        ret







[Python-Dev] Re: The Anti-PEP

2020-06-25 Thread Raymond Hettinger
> it is hard to make a decision between the pros and cons, 
> when the pros are in a single formal document and the 
> cons are scattered across the internet.

Mark, I support your idea.  It is natural for PEP authors to not fully 
articulate the voices of opposition or counter-proposals.   
The current process doesn't make it likely that a balanced document is created 
for decision-making purposes.


Raymond


[Python-Dev] Re: Latest PEP 554 updates.

2020-05-04 Thread Raymond Hettinger


> On May 4, 2020, at 10:30 AM, Eric Snow  wrote:
> 
> Further feedback is welcome, though I feel like the PR is ready (or
> very close to ready) for pronouncement.  Thanks again to all.


Congratulations.  Regardless of the outcome, you've certainly earned top marks 
for vision, tenacity, team play, and overcoming adversity.

May your sub-interpreters be plentiful,


Raymond


[Python-Dev] Re: Adding a "call_once" decorator to functools

2020-04-30 Thread Raymond Hettinger



> On Apr 30, 2020, at 10:44 AM, Carl Meyer  wrote:
> 
> On Wed, Apr 29, 2020 at 9:36 PM Raymond Hettinger
>  wrote:
>> Do you have some concrete examples we could look at?   I'm having trouble 
>> visualizing any real use cases and none have been presented so far.
> 
> This pattern occurs not infrequently in our Django server codebase at
> Instagram. A typical case would be that we need a client object to
> make queries to some external service, queries using the client can be
> made from various locations in the codebase (and new ones could be
> added any time), but there is noticeable overhead to the creation of
> the client (e.g. perhaps it does network work at creation to figure
> out which remote host can service the needed functionality) and so
> having multiple client objects for the same remote service existing in
> the same process is waste.
> 
> Or another similar case might be creation of a "client" object for
> querying a large on-disk data set.

Thanks for the concrete example.  AFAICT, it doesn't require (and probably 
shouldn't have) a lock held for the duration of the call.  Would it be fair 
to say that 100% of your needs would be met if we just added this to the 
functools module?

  call_once = lru_cache(maxsize=None)

That's discoverable, already works, has no risk of deadlock, would work with 
multiple argument functions, has instrumentation, and has the ability to clear 
or reset.

I'm still looking for an example that actually requires a lock to be held for a 
long duration.
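A sketch of the one-line addition being suggested; the example function and
names are mine for illustration:

```python
from functools import lru_cache

call_once = lru_cache(maxsize=None)

calls = []

@call_once
def get_client(service):
    calls.append(service)            # visible side effect: runs once per key
    return f"client-for-{service}"

print(get_client("db"))              # client-for-db
print(get_client("db"))              # client-for-db (cached, no second call)
print(calls)                         # ['db']
print(get_client.cache_info().hits)  # 1 -- the instrumentation mentioned above
```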


Raymond



[Python-Dev] Re: Adding a "call_once" decorator to functools

2020-04-30 Thread raymond . hettinger
Would either of the existing solutions work for you?

from functools import cached_property, lru_cache

class X:
    def __init__(self, name):
        self.name = name

    @cached_property
    def title(self):
        print("compute title once")
        return self.name.title()

    @property
    @lru_cache
    def upper(self):
        print("compute upper once")
        return self.name.upper()

obj = X("victor")
print(obj.title)
print(obj.title)
print(obj.upper)
print(obj.upper)


[Python-Dev] Re: Adding a "call_once" decorator to functools

2020-04-30 Thread Raymond Hettinger


> On Apr 30, 2020, at 6:32 AM, Joao S. O. Bueno  wrote:
> 
> Of course this is meant to be something simple - so there are no "real
> world use cases" that are "wow, it could not have
> been done without it".

The proposed implementation does something risky: it holds a non-reentrant 
lock across a call to an arbitrary user-defined function.  The only reason to 
do so is to absolutely guarantee the function will never be called twice.  We 
really should look for some concrete examples that require that guarantee, and 
it would be nice to see how that guarantee is being implemented currently (it 
isn't obvious to me).

Also, most initialization functions I've encountered take at least one 
argument, so the proposed call_once() implementation wouldn't be usable at all. 

> I was one of the first to reply to this on
> "python-ideas", as I often need the pattern, but seldom
> worry about reentrancy or parallel calling. Most of the uses are
> just that: initialize a resource lazily, and just
> "lru_cache" could work. My first thought was for something more
> light-weight than lru_cache (and a friendlier
> name).

Right.  Those cases could be solved trivially if we added:

call_once = lru_cache(maxsize=None)

which is lightweight, very fast, and has a clear name.  Further, it would work 
with multiple arguments and  would not fail if the underlying function turned 
out to be reentrant.

AFAICT, the *only* reason to not use the lru_cache() implementation is that in 
multithreaded code, it can't guarantee that the underlying function doesn't get 
called a second time while still executing the first time. If those are things 
you don't care about, then you don't need the proposed implementation; we can 
give you what you want by adding a single line to functools.

> So, one of the points I'd likely have used this is here:
> 
> https://github.com/jsbueno/terminedia/blob/d97976fb11ac54b527db4183497730883ba71515/terminedia/unicode.py#L30

Thanks — this is a nice example.  Here's what it tells us:

1) There exists at least one use case for a zero argument initialization 
function
2) Your current solution is trivially easy, clear, and fast.   "if CHAR_BASE: 
return".
3) This function returns None, so efforts by call_once() to block and await a 
result are wasted.
4) It would be inconsequential if this function were called twice.
5) A more common way to do this is to move the test into the lookup() function 
-- see below.


Raymond

-

import re
import unicodedata
from collections import namedtuple

# Stand-in for the Character class from the linked terminedia code:
Character = namedtuple("Character", "char code name category east_asian_width")

CHAR_BASE = {}

def _init_chars():
    for code in range(0, 0x10):
        char = chr(code)
        values = {}
        attrs = "name category east_asian_width"
        for attr in attrs.split():
            try:
                values[attr] = getattr(unicodedata, attr)(char)
            except ValueError:
                values[attr] = "undefined"
        CHAR_BASE[code] = Character(char, code, values["name"],
                                    values["category"],
                                    values["east_asian_width"])

def lookup(name_part, chars_only=False):
    if not CHAR_BASE:
        _init_chars()
    results = [char for char in CHAR_BASE.values()
               if re.search(name_part, char.name, re.IGNORECASE)]
    if not chars_only:
        return results
    return [char.char for char in results]


[Python-Dev] Re: Adding a "call_once" decorator to functools

2020-04-29 Thread Raymond Hettinger



> On Apr 29, 2020, at 4:20 PM, Antoine Pitrou  wrote:
> 
> On Wed, 29 Apr 2020 12:01:24 -0700
> Raymond Hettinger  wrote:
>> 
>> The call_once() decorator would need different logic:
>> 
>> 1) if the function has already been called and result is known, return the 
>> prior result  :-)
>> 2) if function has already been called, but the result is not yet known, 
>> either block or fail  :-(
> 
> It definitely needs to block.

Do you think it is safe to hold a non-reentrant lock across an arbitrary user 
function?

Traditionally, the best practice for locks was to acquire, briefly access a 
shared resource, and release promptly.
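That traditional pattern, in a minimal sketch (the names here are illustrative, not from any proposal):

```python
import threading

_lock = threading.Lock()
_shared = []

def record(value):
    # Acquire, briefly touch the shared resource, release promptly.
    # The lock is never held across arbitrary user code or I/O.
    with _lock:
        _shared.append(value)

def snapshot():
    with _lock:
        return list(_shared)
```

Contrast this with call_once(), which would have to hold its lock for the full duration of the wrapped user function.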

>> 3) call the function, this cannot be reentrant :-(
> 
> Right.  The typical use for such a function is lazy initialization of
> some resource, not recursive computation.

Do you have some concrete examples we could look at?   I'm having trouble 
visualizing any real use cases and none have been presented so far.

Presumably, the initialization function would have to take zero arguments, have 
a useful return value, must be called only once, not be idempotent, wouldn't 
fail if called in two different processes, can be called from multiple places, 
and can guarantee that a decref, gc, __del__, or weakref callback would never 
trigger a reentrant call.

Also, if you know of a real world use case, what solution is currently being 
used.  I'm not sure what alternative call_once() is competing against.

>> 
>> 6) does not have instrumentation for number of hits
>> 7) does not have a clearing or reset mechanism
> 
> Clearly, instrumentation and a clearing mechanism are not necessary.
> They might be "nice to have", but needn't hinder initial adoption of
> the API.

Agreed.  It is inevitable that those will be requested, but they are incidental 
to the core functionality.

Do you have any thoughts on what the semantics should be if the inner function 
raises an exception?  Would a retry be allowed?  Or does call_once() literally 
mean "can never be called again"?



Raymond
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/Y2MUKYDCV53PBWRRBU4ZAKB5XED4X4HX/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Adding a "call_once" decorator to functools

2020-04-29 Thread Raymond Hettinger

> On Apr 29, 2020, at 12:55 AM, Tom Forbes  wrote:
> 
> Hey Raymond,
> Thanks for your input here! A new method wouldn’t be worth adding purely for 
> performance reasons then, but there is still an issue around semantics and 
> locking.

Right.


> it doesn’t actually ensure the function is called once.

Let's be precise about this.  The lru_cache() logic is:

1) if the function has already been called and result is known, return the 
prior result  :-)
2) call the underlying function
3) add the question/answer pair to the cache dict. 

You are correct that a lru_cache() wrapped function can be called more than 
once if before step three happens, the wrapped function is called again, either 
by another thread or by a reentrant call.  This is by design and means that 
lru_cache() can be wrapped around almost anything, reentrant or not.  Also 
calls to lru_cache() don't block across the function call, nor do they fail 
because another call is in progress.  This makes lru_cache() easy to use and 
reliable, but it does allow the possibility that the function is called more 
than once.

The call_once() decorator would need different logic:

1) if the function has already been called and result is known, return the 
prior result  :-)
2) if function has already been called, but the result is not yet known, either 
block or fail  :-(
3) call the function, this cannot be reentrant :-(
4) record the result for future calls.

The good news is that call_once() can guarantee the function will not be called 
more than once.  The bad news is that task switches during step three will 
either get blocked for the duration of the function call or they will need to 
raise an exception.  Likewise, it would be a mistake to use call_once() when 
reentrancy is possible.
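To make the four steps concrete, here is one possible sketch of a blocking call_once() (hypothetical code, not a proposed implementation):

```python
import functools
import threading

def call_once(func):
    """Call *func* at most once; concurrent callers block for the result."""
    lock = threading.Lock()
    done = False
    result = None

    @functools.wraps(func)
    def wrapper():
        nonlocal done, result
        if done:                  # step 1: result already known
            return result
        with lock:                # step 2: other threads block here
            if not done:          # re-check after acquiring the lock
                result = func()   # step 3: a reentrant call would deadlock
                done = True       # step 4: record the result
        return result
    return wrapper
```

Note that because the Lock is non-reentrant, a recursive call from inside *func* deadlocks rather than failing cleanly.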

> The reason I bring this up is that I’ve seen several ad-hoc `call_once` 
> implementations recently, and creating one is surprisingly complex for 
> someone who’s not that experienced with Python.


Would it fair to describe call_once() like this?

call_once() is just like lru_cache() but:

1) guarantees that a function never gets called more than once
2) will block or fail if a thread-switch happens during a call
3) only works for functions that take zero arguments
4) only works for functions that can never be reentrant
5) cannot make the one call guarantee across multiple processes
6) does not have instrumentation for number of hits
7) does not have a clearing or reset mechanism


Raymond


___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/CTAGWXD7WRU3NAHLP5IZ75PM2E3TQTG2/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Adding a "call_once" decorator to functools

2020-04-28 Thread Raymond Hettinger

>  t...@tomforb.es wrote:
> 
> I would like to suggest adding a simple “once” method to functools. As the 
> name suggests, this would be a decorator that would call the decorated 
> function, cache the result and return it with subsequent calls.

It seems like you would get just about everything you want with one line:

call_once = lru_cache(maxsize=None)

which would be used like this:

    @call_once
    def welcome():
        len('hello')

> Using lru_cache like this works but it’s not as efficient as it could be - in 
> every case you’re adding lru_cache overhead despite not requiring it.


You're likely imagining more overhead than there actually is.  Used as shown 
above, the lru_cache() is astonishingly small and efficient.  Access time is 
slightly cheaper than writing d[()]  where d={(): some_constant}. The 
infinite_lru_cache_wrapper() just makes a single dict lookup and returns the 
value.¹ The lru_cache_make_key() function just increments the refcount of the 
empty args tuple and returns it.²  And because it is a C object, calling it will 
be faster than a Python function that just returns a constant, "lambda: 
some_constant".  This is very, very fast.
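The one-liner can be exercised like this; note that the hit counter and reset mechanism come along for free (a small illustrative sketch):

```python
from functools import lru_cache

call_once = lru_cache(maxsize=None)

@call_once
def config():
    # Stand-in for an expensive one-time initialization.
    return {'verbose': True}

first = config()      # miss: the function body runs
second = config()     # hit: answer comes straight from the cache dict
assert first is second
assert config.cache_info().misses == 1
# config.cache_clear() would reset it -- instrumentation included
```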


Raymond


¹ https://github.com/python/cpython/blob/master/Modules/_functoolsmodule.c#L870
² https://github.com/python/cpython/blob/master/Modules/_functoolsmodule.c#L809






> 
> Hello,
> After a great discussion in python-ideas[1][2] it was suggested that I 
> cross-post this proposal to python-dev to gather more comments from those who 
> don't follow python-ideas.
> 
> The proposal is to add a "call_once" decorator to the functools module that, 
> as the name suggests, calls a wrapped function once, caching the result and 
> returning it with subsequent invocations. The rationale behind this proposal 
> is that:
> 1. Developers are using "lru_cache" to achieve this right now, which is less 
> efficient than it could be
> 2. Special casing "lru_cache" to account for zero arity methods isn't trivial 
> and we shouldn't endorse lru_cache as a way of achieving "call_once" 
> semantics 
> 3. Implementing a thread-safe (or even non-thread safe) "call_once" method is 
> non-trivial
> 4. It complements the lru_cache and cached_property methods currently present 
> in functools.
> 
> The specifics of the method would be:
> 1. The wrapped method is guaranteed to only be called once when called for 
> the first time by concurrent threads
> 2. Only functions with no arguments can be wrapped, otherwise an exception is 
> thrown
> 3. There is a C implementation to keep speed parity with lru_cache
> 
> I've included a naive implementation below (that doesn't meet any of the 
> specifics listed above) to illustrate the general idea of the proposal:
> 
> ```
> def call_once(func):
>     sentinel = object()  # in case the wrapped method returns None
>     obj = sentinel
>     @functools.wraps(func)
>     def inner():
>         nonlocal obj, sentinel
>         if obj is sentinel:
>             obj = func()
>         return obj
>     return inner
> ```
> 
> I'd welcome any feedback on this proposal, and if the response is favourable 
> I'd love to attempt to implement it.
> 
> 1. 
> https://mail.python.org/archives/list/python-id...@python.org/thread/5OR3LJO7LOL6SC4OOGKFIVNNH4KADBPG/#5OR3LJO7LOL6SC4OOGKFIVNNH4KADBPG
> 2. 
> https://discuss.python.org/t/reduce-the-overhead-of-functools-lru-cache-for-functions-with-no-parameters/3956
> ___
> Python-Dev mailing list -- python-dev@python.org
> To unsubscribe send an email to python-dev-le...@python.org
> https://mail.python.org/mailman3/lists/python-dev.python.org/
> Message archived at 
> https://mail.python.org/archives/list/python-dev@python.org/message/5CFUCM4W3Z36U3GZ6Q3XBLDEVZLNFS63/
> Code of Conduct: http://python.org/psf/codeofconduct/
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/OYBYJ2373OTHALHTPQJV5EBX6N5M4DDL/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Accepting PEP 617: New PEG parser for CPython

2020-04-20 Thread Raymond Hettinger
This will be a nice improvement.


Raymond
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/C3MUSEKXCDL4HSIEIJNBHWQG5B7WCQLD/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: PEP 616 "String methods to remove prefixes and suffixes" accepted

2020-04-20 Thread Raymond Hettinger
Please consider adding underscores to the names:  remove_prefix() and 
remove_suffix().

The latter method causes a mental hiccup when first read as removes-uffix, 
forcing mental backtracking to get to remove-suffix. We had a similar problem 
with addinfourl initially being read as add-in-four-l before mentally 
backtracking to add-info-url.

The PEP says this alternative was considered, but I disagree with the rationale 
given in the PEP.  The reason that "startswith" and "endswith" don't have 
underscores is that they aren't needed to disambiguate the text.  Our rules are 
to add underscores and to spell-out words when it improves readability, which 
in this case it does.   Like casing conventions, our rules and preferences for 
naming evolved after the early modules were created -- the older the module, 
the more likely that it doesn't follow modern conventions.

We only have one chance to get this right (bugs can be fixed, but API choices 
persist for very long time).  Take it from someone with experience with this 
particular problem.  I created imap() but later regretted the naming pattern 
when it came to ifilter() and islice(), which sometimes cause mental hiccups 
initially being read as if-ilter and is-lice.


Raymond
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/ZMXSQ5T6L6CR5GUIBFEYLJJF7FE4B4US/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Improvement to SimpleNamespace

2020-04-15 Thread Raymond Hettinger
[Serhiy]
> As a workaround you can use
> 
> object_hook=lambda x: SimpleNamespace(**x)

That doesn't suffice because some valid JSON keys are not valid identifiers.  
You still need a way to get past those when they arise:  
catalog.books.fiction['Paradise Lost'].isbn  Also, it still leaves you with 
using setattr(ns, attrname, attrvalue) or tricks with vars() when doing 
updates.  The AttrDict recipe is popular for a reason.
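For context, the AttrDict recipe amounts to just a few lines (a common community recipe, not a stdlib API; the JSON data below is made-up):

```python
import json

class AttrDict(dict):
    """Dict whose string keys are also reachable as attributes."""
    def __getattr__(self, name):
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None
    def __setattr__(self, name, value):
        self[name] = value

data = '{"books": {"fiction": {"Paradise Lost": {"isbn": "123"}}}}'
catalog = json.loads(data, object_hook=AttrDict)

# Dotted access for identifier-like keys, subscripting for the rest:
assert catalog.books.fiction['Paradise Lost'].isbn == '123'
```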


Raymond
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/MNVWBEJI465QUODJEYPMAXPXOX3UDJ6Q/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Improvement to SimpleNamespace

2020-04-14 Thread Raymond Hettinger
[GvR]
> We should not try to import JavaScript's object model into Python.

Yes, I get that.  Just want to point-out that working with heavily nested 
dictionaries (typical for JSON) is no fun with square brackets and quotation 
marks.


Raymond
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/G5SJKRQ7S5VY3JKLAVOTCCA7RSDUNWXS/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Improvement to SimpleNamespace

2020-04-14 Thread Raymond Hettinger
SimpleNamespace() is really good at giving attribute style-access. I would like 
to make that functionality available to the JSON module (or just about anything 
else that accepts a custom dict) by adding the magic methods for mappings so 
that this works:

    catalog = json.load(f, object_hook=SimpleNamespace)
    print(catalog['clothing']['mens']['shoes']['extra_wide']['quantity'])  # currently possible with dict()
    print(catalog.clothing.mens.shoes.extra_wide.quantity)                 # proposed with SimpleNamespace()
    print(catalog.clothing.boys['3t'].tops.quantity)                       # would also be supported

I've already seen something like this in production; however, people are having 
to write custom subclasses to do it.  This is kind of bummer because the custom 
subclasses are a pain to write, are non-standard, and are generally somewhat 
slow.  I would like to see a high-quality version of this made more broadly 
available.

The core idea is keep the simple attribute access but make it easier to load 
data programmatically:

>>> ns = SimpleNamespace(roses='red', violets='blue')
>>> thing = input()
sugar
>>> quality = input()
sweet
>>> setattr(ns, thing, quality)    # current
>>> ns['sugar'] = 'sweet'          # proposed

If the PEP 584 __ior__ method were supported, updating a SimpleNamespace would 
be much cleaner:

  ns |= some_dict
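The whole idea can be mocked up today as a subclass (an illustrative sketch only; the proposal is to support this in SimpleNamespace itself):

```python
from types import SimpleNamespace

class Namespace(SimpleNamespace):
    """SimpleNamespace plus the mapping dunders from the proposal."""
    def __getitem__(self, key):
        return getattr(self, key)
    def __setitem__(self, key, value):
        setattr(self, key, value)
    def __ior__(self, other):
        self.__dict__.update(other)   # PEP 584 style in-place update
        return self

ns = Namespace(roses='red', violets='blue')
ns['sugar'] = 'sweet'           # item assignment, no setattr() needed
assert ns.sugar == 'sweet'
ns |= {'honey': 'golden'}
assert ns['honey'] == 'golden'
```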

I posted an issue on the tracker: https://bugs.python.org/issue40284 .  There 
was a suggestion to create a different type for this, but I don't see the point 
in substantially duplicating everything SimpleNamespace already does just so we 
can add some supporting dunder methods.   Please add more commentary so we can 
figure out the best way to offer this powerful functionality.


Raymond


___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/JOMND56PJGRN7FQQLLCWONE5Z7R2EKXW/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Are PyObject_RichCompareBool shortcuts part of Python or just CPython quirks?

2020-02-03 Thread Raymond Hettinger
I forget to mention that list.index() also uses PyObject_RichCompareBool().  
Given a non-empty list *s*:

s[0] = x
assert s.index(x) == 0   # We want this to always work

or:
 
s = [x]
assert s.index(x) == 0# Should not raise a ValueError  

If those two assertions aren't reliable, then it's hard to correctly reason 
about algorithms that use index() to find previously stored objects. This, of 
course, is the primary use case for index().

Likewise, list.remove() also uses PyObject_RichCompareBool():

s = []
...
s.append(x)
s.remove(x)

In a code review, would you suspect that the above code could fail?  If so, how 
would you mitigate the risk to prevent failure?  Off-hand, the simplest 
remediation I can think of is:

s = []
...
s.append(x)
if x == x:                   # New, perplexing code
    s.remove(x)              # Now, this is guaranteed not to fail
else:
    logging.debug(f"Removing the first occurrence of {x!r} the hard way")
    for i, y in enumerate(s):
        if x is y:
            del s[i]
            break

In summary, I think it is important to guarantee the identity-implies-equality 
step currently in PyObject_RichCompareBool().  It isn't just an optimization, 
it is necessary for writing correct application code without tricks such as the 
"if x == x: ..." test.


Raymond
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/NDBUPT6OWNLPLTD5MI3A3VYNNKLMA3ME/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Request to postpone some Python 3.9 incompatible changes to Python 3.10

2020-02-03 Thread Raymond Hettinger
> We propose to revert 5 changes:
> 
>   • Removed tostring/fromstring methods in array.array and base64 modules
>   • Removed collections aliases to ABC classes
>   • Removed fractions.gcd() function (which is similar to math.gcd())
>   • Remove "U" mode of open(): having to use io.open() just for Python 2 
> makes the code uglier
>   • Removed old plistlib API: 2.7 doesn't have the new API

+1 from me.  We don't gain anything by removing these in 3.9 instead of 3.10, 
so it is perfectly reasonable to ease the burden on users by deferring them for 
another release.


Raymond
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/52V6RP2WBC43OWTLBICS77MD3IGSV5CI/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Are PyObject_RichCompareBool shortcuts part of Python or just CPython quirks?

2020-02-03 Thread Raymond Hettinger
> PyObject_RichCompareBool(x, y, op) has a (valuable!) shortcut: if x and y are 
> the same object, then equality comparison returns True and inequality False. 
> No attempt is made to execute __eq__ or __ne__ methods in those cases.
> 
> This has visible consequences all over the place, but they don't appear to be 
> documented. For example,
> 
> ...
> despite that math.nan == math.nan is False.
> 
> It's usually clear which methods will be called, and when, but not really 
> here. Any _context_ that calls PyObject_RichCompareBool() under the covers, 
> for an equality or inequality test, may or may not invoke __eq__ or __ne__, 
> depending on whether the comparands are the same object. Also any context 
> that inlines these special cases to avoid the overhead of calling 
> PyObject_RichCompareBool() at all.
> 
> If it's intended that Python-the-language requires this, that needs to be 
> documented.

This has been slowly, but perhaps incompletely documented over the years and 
has become baked in the some of the collections ABCs as well.  For example, 
Sequence.__contains__() is defined as:

    def __contains__(self, value):
        for v in self:
            if v is value or v == value:  # note the identity test
                return True
        return False

Various collections need to assume reflexivity, not just for speed, but so that 
we can reason about them and so that they can maintain internal consistency. 
For example, MutableSet defines pop() as:

    def pop(self):
        """Return the popped value.  Raise KeyError if empty."""
        it = iter(self)
        try:
            value = next(it)
        except StopIteration:
            raise KeyError from None
        self.discard(value)
        return value

That pop() logic implicitly assumes an invariant between membership and 
iteration:

    assert all(x in collection for x in collection)

We really don't want to pop() a value *x* and then find that *x* is still in 
the container.   This would happen if iter() found the *x*, but discard() 
couldn't find the object because the object can't or won't recognize itself:

    s = {float('NaN')}
    s.pop()
    assert not s    # Do we want the language to guarantee that s is now empty?  I think we must.

The code for clear() depends on pop() working:

    def clear(self):
        """This is slow (creates N new iterators!) but effective."""
        try:
            while True:
                self.pop()
        except KeyError:
            pass

It would unfortunate if clear() could not guarantee a post-condition that the 
container is empty:

    s = {float('NaN')}
    s.clear()
    assert not s    # Can this be allowed to fail?

The case of count() is less clear-cut, but even there identity-implies-equality 
improves our ability to reason about code:  Given some list, *s*, possibly 
already populated, would you want the following code to always work:

    c = s.count(x)
    s.append(x)
    assert s.count(x) == c + 1    # To me, this is fundamental to what the word "count" means.

I can't find it now, but I remember a possibly related discussion where we 
collectively rejected a proposal for an __is__() method.  IIRC, the reasoning 
was that our ability to think about code correctly depended on this being true:

a = b
assert a is b

Back to the discussion at hand, I had thought our position was roughly:

* __eq__ can return anything it wants.

* Containers are allowed but not required to assume that 
identity-implies-equality.

* Python's core containers make that assumption so that we can keep
  the containers internally consistent and so that we can reason about
  the results of operations.

Also, I believe that even very early dict code (at least as far back as Py 
1.5.2) had logic for "v is value or v == value".

As far as NaNs go, the only question is how far to propagate their notion of 
irreflexivity. Should "x == x" return False for them? We've decided yes.  When 
it comes to containers, who makes the rules, the containers or their elements?  
Mostly, we let the elements rule, but containers are allowed to make useful 
assumptions about the elements when necessary.  This isn't much different than 
the rules for the "==" operator where __eq__() can return whatever it wants, 
but functions are still allowed to write "if x == y: ..." and assume that a 
meaningful boolean value has been returned (even if it wasn't).  Likewise, the 
rule for "<" is that it can return whatever it wants, but sorted() and min() 
are allowed to assume a meaningful total ordering (which might or might not be 
true).  In other words, containers and functions are allowed, when necessary or 
useful, to override the decisions made by their data.   This seems like a 
reasonable state of affairs.

The current docs make an effort to describe what we have now: 
https://docs.python.org/3/reference/expressions.html#value-comparisons 

Sorry for the lack of concision.  I'm 

[Python-Dev] Re: Should set objects maintain insertion order too?

2019-12-15 Thread Raymond Hettinger



> On Dec 15, 2019, at 6:48 PM, Larry Hastings  wrote:
> 
> As of 3.7, dict objects are guaranteed to maintain insertion order.  But set 
> objects make no such guarantee, and AFAIK in practice they don't maintain 
> insertion order either.  Should they?

I don't think they should.  

Several thoughts:

* The corresponding mathematical concept is unordered and it would be weird to 
impose such as order.

* You can already get membership testing while retaining insertion ordering by 
running dict.fromkeys(seq).

* Set operations have optimizations that preclude giving a guaranteed order 
(for example, set intersection loops over the smaller of the two input sets).

* To implement ordering, set objects would have to give-up their current 
membership testing optimization that exploits cache locality in lookups (it 
looks at several consecutive hashes at a time before jumping to the next random 
position in the table).

* The ordering we have for dicts uses a hash table that indexes into a 
sequence.  That works reasonably well for typical dict operations but is 
unsuitable for set operations where some common use cases make interspersed 
additions and deletions (that is why the LRU cache still uses a cheaply updated 
doubly-linked list rather than deleting and reinserting dict entries).

* This idea has been discussed a couple times before and we've decided not to 
go down this path.  I should document this prominently because it is inevitable that 
it will be suggested periodically because it is such an obvious thing to 
consider.
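For completeness, the dict.fromkeys() workaround from the second bullet:

```python
seq = ['pear', 'apple', 'pear', 'banana']
ordered = dict.fromkeys(seq)      # keys act as an insertion-ordered "set"

assert 'apple' in ordered                            # O(1) membership test
assert list(ordered) == ['pear', 'apple', 'banana']  # first-seen order kept
```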


Raymond
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/6CO2CZS4CPP6MSJKRZXXQYFLY5T3UVDU/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: [python-committers] PEP 581/588 RFC: Collecting feedback about GitHub Issues

2019-08-30 Thread Raymond Hettinger


> On Aug 27, 2019, at 10:44 AM, Mariatta  wrote:
> 
> (cross posting to python-committers, python-dev, core-workflow)
> 
> PEP 581: Using GitHub Issues has been accepted by the steering council, but 
> PEP 588: GitHub Issues Migration plan is still in progress.
> 
> I'd like to hear from core developers as well as heavy b.p.o users, the 
> following:
> 
>   • what features do they find lacking from GitHub issues, or
>   • what are the things you can do in b.p.o but not in GitHub, or
>   • Other workflow that will be blocked if we were to switch to GitHub 
> today
> By understanding your needs, we can be better prepared for the migration, and 
> we can start looking for solutions.

One other bit of workflow that would be blocked if there was a switch to GitHub 
today:

* On the tracker, we have long running conversations, sometimes spanning years. 
  We need to be able to continue those conversations even though the original 
participants may not have Github accounts (also, if they do have a Github 
account, we'll need to be able to link to the corresponding BPO account).  

* I believe some of the accounts are anonymous or have pseudonyms.  I'm not sure 
how those can be migrated; we know very little about the participants except for 
their recurring posts.


Raymond


___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/MT2PJFS2CVDSWQKCCRUR3BCNXR4OSEKU/


[Python-Dev] Re: [python-committers] PEP 581/588 RFC: Collecting feedback about GitHub Issues

2019-08-29 Thread Raymond Hettinger

> On Aug 27, 2019, at 10:44 AM, Mariatta  wrote:
> 
> (cross posting to python-committers, python-dev, core-workflow)
> 
> PEP 581: Using GitHub Issues has been accepted by the steering council, but 
> PEP 588: GitHub Issues Migration plan is still in progress.
> 
> I'd like to hear from core developers as well as heavy b.p.o users, the 
> following:
> 
>   • what features do they find lacking from GitHub issues, or
>   • what are the things you can do in b.p.o but not in GitHub, or
>   • Other workflow that will be blocked if we were to switch to GitHub 
> today
> By understanding your needs, we can be better prepared for the migration, and 
> we can start looking for solutions.

Thanks for soliciting input and working on this.

I'm a heavy BPO user (often visiting many times per day for almost two 
decades).  Here are some things that were working well that I would miss:

* We controlled the landing page, giving us

   - A professional, polished appearance
   - A prominent Python Logo
   - A search bar specific to the issue tracker
   - A link to Python Home and the Dev Guide
   - Hot links to Easy Issues, Issues created by You, Issues Assigned to You

* The display format was terse so we could easily view the 50 most recent 
active issues (this is important because of the high volume of activity)

  See https://mail.python.org/pipermail/python-bugs-list/2019-July/date.html 
for an idea of the monthly volume.

* The page used straight HTML anchor tags so my browser could mark which issue 
had been visited.  This is important when handing a lot of issues which are 
constantly being reordered.

* The input box allowed straight text input in a monospace font so it was easy 
to paste code snippets and traceback without incorporating markup.

* Our page didn't have advertising on it.

* Having a CSV download option was occasionally helpful.

* BPO was well optimized for a high level of activity and high information 
density.

* BPO existed for a very long time.  It contains extensive internal links 
between issues. There are also a huge number of external deep links to specific 
messages and whatnot.  Innumerable tweets, blog posts, code comments, design 
documents, and stack overflow questions all have deep links to the site. It 
would be a major bummer if these links were broken.  It is my hope that they be 
preserved basically forever.


Things that I look forward to with Github Issues:

* Single sign-on

* Better linkage between issues and PRs


What I really don't want:

* The typical Github project page prominently shows a list of files and 
directories before there is any description.  If the CPython issues pages looks 
like this, it will be a big step backwards, making it look more like a weekend 
project than a mature professional project.  It would be something I would not 
want to show to clients.  It would not give us the desired level of control 
over the end-user experience.

* If there are advertisements on the page that we don't control, that would be 
unprecedented and unwelcome.

* On the one hand, we want issues to be easier to file.  On the other hand, if 
the volume of low quality issues reports goes up, it will just add to the total 
labor and contribute to negativity (denying someone's request isn't fun for 
either the rejector or rejectee).

* We need to retain control over our data so that we're free to make other 
migration decisions in the future.  We can make a change now *because* we have 
the freedom.  The migration needs to avoid vendor lock-in.


I have high hopes for this being a successful migration but have to confess 
major disappointment that the steering committee approved this without talking 
with the heavy BPO users and without seeing what the new landing page would 
look like.

In the end, the success of the migration depends on how the site works for the 
most active issue responders.  If the workload goes up and becomes more awkward 
to do in volume, then heavy volunteer participation will necessarily decline.   
Perhaps a half-dozen individuals do more than half of the work on the tracker.

I have high hopes for the success of the migration but success isn't a given.


Raymond







___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/BNHMLY4YEXIG4VANOXSOGNXO5Y7OT3BO/


[Python-Dev] Re: Announcing the new Python triage team on GitHub

2019-08-21 Thread Raymond Hettinger
Thanks for doing this.  I hope it encourages more participation.

The capabilities of a triager mostly look good except for "closing PRs and 
issues".  This is a superpower that has traditionally been reserved for more 
senior developers because it grants the ability to shut down the work of 
another aspiring contributor.  Marking someone else's suggestion as rejected is 
the most perilous and least fun aspect of core development.  Submitters tend to 
expect their idea won't be rejected without a good deal of thought and expert 
consideration.   Our bar for becoming a triager is somewhat low, so I don't 
think it makes sense to give the authority to reject a PR or close an issue.

ISTM the primary value of having triagers is to tag issues appropriately, summon 
the appropriate experts, and make a first pass at review and/or improvements.  

FWIW, the definition of the word triage is "in medical use: the assignment of 
degrees of urgency to wounds or illnesses to decide the order of treatment of a 
large number of patients or casualties."  That doesn't imply making a final 
disposition.

Put another way, the only remaining distinction between a "triager" and a "core 
developer" is the ability to push the "commit" button.  In a way, that is the 
least interesting part of the process and is often a foregone conclusion by the 
time it happens.


Raymond
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/CIBKJSXQX5DZDKPA6TYTKNLHS4TA2LXM/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-07 Thread raymond . hettinger
This isn't about me.  As a heavy user of the 3.8 beta, I'm just the canary in 
the coal mine.

After many encounters with these warnings, I'm starting to believe that 
Python's long-standing behavior was convenient for users.  Effectively, "\-" 
wasn't an error; it was just a way of writing "\\-". For the most part, that 
worked out fine. Sure, we've all seen interactive prompt errors from having \t in 
a pathname, but not in production (likely because a FileNotFoundError would 
surface immediately).
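The long-standing behavior is easy to demonstrate.  In the sketch below (mine, 
not from the original message), the invalid escape is compiled at runtime so 
the 3.8+ SyntaxWarning can be silenced for the demonstration:

```python
import warnings

# Historically, an unrecognized escape such as "\-" passed through
# unchanged, exactly as if the backslash had been doubled.  Compiling
# the literal at runtime lets us silence the 3.8+ SyntaxWarning.
with warnings.catch_warnings():
    warnings.simplefilter('ignore')
    s = eval(r"'\-'")          # the invalid escape sequence

print(s == '\\-')   # -> True: "\-" was just a verbose way to write "\\-"
print(len(s))       # -> 2: one backslash, one hyphen
```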
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/4YNZYCOBWGMLC6BDXQFJJWLXEK47I5PU/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-07 Thread raymond . hettinger
For me, these warnings are continuing to arise almost daily.  See two recent 
examples below.  In both cases, the code previously had always worked without 
complaint.

- Example from yesterday's class 

''' How old-style formatting works with positional placeholders

print('The answer is %d today, but was %d yesterday' % (new, old))
                 \---------------------------------------- new
                              \---------------------------------- old
'''

SyntaxWarning: invalid escape sequence \-

- Example from today's class 

# Cut and pasted from: 
# https://en.wikipedia.org/wiki/VCard#vCard_2.1
vcard = '''
BEGIN:VCARD
VERSION:2.1
N:Gump;Forrest;;Mr.
FN:Forrest Gump
ORG:Bubba Gump Shrimp Co.
TITLE:Shrimp Man
PHOTO;GIF:http://www.example.com/dir_photos/my_photo.gif
TEL;WORK;VOICE:(111) 555-1212
TEL;HOME;VOICE:(404) 555-1212
ADR;WORK;PREF:;;100 Waters Edge;Baytown;LA;30314;United States of America
LABEL;WORK;PREF;ENCODING=QUOTED-PRINTABLE;CHARSET=UTF-8:100 Waters Edge=0D=
 =0ABaytown\, LA 30314=0D=0AUnited States of America
ADR;HOME:;;42 Plantation St.;Baytown;LA;30314;United States of America
LABEL;HOME;ENCODING=QUOTED-PRINTABLE;CHARSET=UTF-8:42 Plantation St.=0D=0A=
 Baytown, LA 30314=0D=0AUnited States of America
EMAIL:forrestg...@example.com
REV:20080424T195243Z
END:VCARD
'''

SyntaxWarning: invalid escape sequence \,
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/OYGRL5AWSJZ34MDLGIFTWJXQPLNSK23S/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-05 Thread raymond . hettinger
End-user experience isn't something that can just be argued away.  Steve and I 
are reporting a recurring annoyance.  The point of a beta release is to elicit 
these kinds of reports so they can be addressed before it is too late.  ISTM 
you are choosing not to believe the early feedback and don't want to provide a 
mitigation.

This decision reverses 25+ years of Python practice and is the software 
equivalent of telling users "you're holding it wrong".   Instead of an 
awareness campaign to use the silent-by-default warnings, we're going directly 
towards breaking working code.  That seems pretty user hostile to me.

Chris's language survey shows only one language, Lua, that treated this as an 
error.  For compiled languages that emit warnings, the end-user will never see 
those warnings, so there is no end-user consequence.  In our case though, 
end-users will see the messages and may not have an ability to do anything 
about it. 

I wish people with more product management experience would chime in; 
otherwise, 3.8 is going to ship with an intentional hard-to-ignore annoyance on 
the premise that we don't like the way people have been programming and that 
they need to change their code even if it was working just fine.
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/D4ETYYRD4RB37BFZ35STKTDKVT7WH3E2/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-05 Thread raymond . hettinger
> I broadly agree that the warning is very annoying, particularly 
> when it  comes from third-party packages (I see it from some
> of pip's vendored dependencies all the time),

The same here as well.

The other annoyance is that it pops up during live demos, student teaching 
sessions, and during ipython data analysis in a way that becomes a distractor 
and makes Python look and feel like it is broken.

I haven't found a single case where it improved the user experience.

> though I do also see many people bitten by 
> FileNotFoundError because of a '\n' in their filename.

Yes, I've seen that as well.  

Unfortunately, the syntax warning or error doesn't detect that case.  It only 
complains about invalid sequences, which weren't the actual problem we were 
trying to solve.  The new warning soon-to-be error breaks code that currently 
works but is otherwise innocuous.

> Raymond - a question if I may. How often do you see these 
> occurring from docstrings, compared to regular strings?

About half.

Thanks for weighing in.  I think this is an important usability discussion.  
IMO it is the number one issue affecting the end user experience with this 
release.   If we could get more people to actively use the beta release, the 
issue would stand-out front and center.  But if people don't use the beta in 
earnest, we won't have confirmation until it is too late.

We really don't have to go this path.  Arguably, the implicit conversion of 
'\latex' to '\\latex' is a feature that has existed for three decades, and now 
we're deciding to turn it off and define existing practices as errors.  I don't 
think any commercial product manager would allow this to occur without a lot of 
end user testing.


Raymond

P.S. In the world of C compilers, I suspect that if the relatively new compiler 
warnings were treated as errors, the breakage would be widespread. Presumably 
that's why they haven't gone down this road.
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/DH63MEQWGGJRMCDRC57F33DR7HH7HDIT/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-05 Thread raymond . hettinger
Thanks for looking at what other languages do.   It gives some hope that this 
won't end up being a usability fiasco.
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/OM3W7ABSARYDMUIEIGYUYAUSRVHLZ6T5/


[Python-Dev] What to do about invalid escape sequences

2019-08-04 Thread raymond . hettinger
We should revisit what we want to do (if anything) about invalid escape 
sequences.

For Python 3.8, the DeprecationWarning was converted to a SyntaxWarning which 
is visible by default.  The intention is to make it a SyntaxError in Python 3.9.

This once seemed like a reasonable and innocuous idea to me; however, I've been 
using the 3.8 beta heavily for a month and no longer think it is a good idea.  
The warning crops up frequently, often due to third-party packages (such as 
docutils and bottle) that users can't easily do anything about.  And during 
live demos and student workshops, it is especially distracting.  

I now think our cure is worse than the disease.  If code currently has a 
non-raw string with '\latex', do we really need Python to yelp about it (for 
3.8) or reject it entirely (for 3.9)?   If someone can't remember exactly which 
special characters need to be escaped, do we really need to stop them in their 
tracks during a data analysis session?  Do we really need to reject ASCII art 
in docstrings: ` \---> special case'?  

IIRC, the original problem to be solved was false positives rather than false 
negatives:  filename = '..\training\new_memo.doc'.  The warnings and errors 
don't do (and likely can't do) anything about this.
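The false-positive/false-negative split can be shown concretely (the path below 
is the example from the message, reused as a runnable sketch):

```python
# The motivating bug: "\t" and "\n" are *valid* escapes, so this path is
# silently corrupted and no warning is ever raised for it.
path = '..\training\new_memo.doc'

print('\t' in path)    # -> True: a tab where "\t" was meant literally
print('\n' in path)    # -> True: a newline inside the filename
print('\\' in path)    # -> False: both backslashes were consumed

# Meanwhile a harmless string like '\latex' -- which round-trips
# unchanged -- is exactly what the new warning complains about.
```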

If Python 3.8 goes out as-is, we may be punching our users in the nose and 
getting almost no gain from it.  ISTM this is a job best left for linters.  For 
a very long time, Python has been accepting the likes of 'more \latex markup' 
and has been silently converting it to 'more \\latex markup'.  I now think it 
should remain that way.  This issue in the 3.8 beta releases has been an almost 
daily annoyance for me and my customers. Depending on how you use Python, this 
may not affect you or it may arise multiple times per day.


Raymond

P.S.  Before responding, it would be a useful exercise to think for a moment 
about whether you remember exactly which characters must be escaped or whether 
you habitually put in an extra backslash when you aren't sure.  Then see:  
https://bugs.python.org/issue32912
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/ZX2JLOZDOXWVBQLKE4UCVTU5JABPQSLB/


[Python-Dev] Re: The order of operands in the comparison

2019-07-21 Thread raymond . hettinger
FWIW, the bisect_left and bisect_right functions have different argument order 
so that they can both use __lt__, making them consistent with sorting and with 
the heapq functions.
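The point can be checked with a type that defines nothing but __lt__ (the 
class name here is illustrative):

```python
from bisect import bisect_left, bisect_right

# A type supporting only __lt__ -- no __le__, __gt__, or __eq__.
class Score:
    def __init__(self, value):
        self.value = value
    def __lt__(self, other):
        return self.value < other.value

data = [Score(10), Score(20), Score(20), Score(30)]
x = Score(20)

# bisect_left effectively tests a[mid] < x while bisect_right tests
# x < a[mid]; swapping the comparison operands lets both functions be
# written purely in terms of __lt__.
print(bisect_left(data, x))    # -> 1  (leftmost insertion point)
print(bisect_right(data, x))   # -> 3  (rightmost insertion point)
```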

Raymond
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/O2AETKAY5EVLIQNRRSFV53NQ6K3TF5EN/


[Python-Dev] Re: What is a public API?

2019-07-13 Thread Raymond Hettinger

> On Jul 13, 2019, at 1:56 PM, Serhiy Storchaka  wrote:
> 
> Could we strictly define what is considered a public module interface in 
> Python?

The RealDefinition™ is that whatever we include in the docs is public, 
otherwise not.

Beyond that, there is a question of how users can deduce what is public when 
they run "import somemodule; print(dir(somemodule))".

In some modules, we've been careful to use both __all__ and to use an 
underscore prefix to indicate private variables and helper functions 
(collections and random for example).  IMO, when a module has shown that care, 
future maintainers should stick with that practice. 

The calendar module is an example of where that care was taken for many years 
and then a recent patch went against that practice.  This came to my attention 
when an end-user questioned which functions were for internal use only and 
posted their question on Twitter.  On the tracker, I then made a simple request 
to restore the module's convention but you seem steadfastly resistant to the 
suggestion.

When we do have evidence of user confusion (as in the case with the calendar 
module), we should just fix it.  IMO, it would be an undue burden on the user 
to have to check every method in dir() against the contents of __all__ to 
determine what is public (see below).  Also, as a maintainer of the module, I 
would not have found it obvious whether the functions were public or not.  The 
non-public functions look just like the public ones.
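To make that burden concrete, here is the cross-check a user would have to 
perform by hand (the helper names are mine, purely for illustration):

```python
import calendar

# Every non-underscore name from dir() has to be checked against
# __all__ before a user knows whether it is public.
exported = set(calendar.__all__)
visible = [name for name in dir(calendar) if not name.startswith('_')]

public = [name for name in visible if name in exported]
lookalikes = [name for name in visible if name not in exported]

print('isleap' in public)   # -> True: a documented, exported function
print(len(lookalikes) > 0)  # -> True: names that look public but aren't
```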

It's true that the practices across the standard library have historically been 
loose and varied (__all__ wasn't always used and wasn't always kept up-to-date, 
some modules took care with private underscore names and some didn't).  To me 
this has mostly worked out fine and didn't require a strict rule for all 
modules everywhere.  IMO, there is no need to sweep through the library and 
change long-standing policies on existing modules.


Raymond


--
>>> import calendar
>>> dir(calendar)
['Calendar', 'EPOCH', 'FRIDAY', 'February', 'HTMLCalendar', 
'IllegalMonthError', 'IllegalWeekdayError', 'January', 'LocaleHTMLCalendar', 
'LocaleTextCalendar', 'MONDAY', 'SATURDAY', 'SUNDAY', 'THURSDAY', 'TUESDAY', 
'TextCalendar', 'WEDNESDAY', '_EPOCH_ORD', '__all__', '__builtins__', 
'__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', 
'__spec__', '_colwidth', '_locale', '_localized_day', '_localized_month', 
'_spacing', 'c', 'calendar', 'datetime', 'day_abbr', 'day_name', 
'different_locale', 'error', 'firstweekday', 'format', 'formatstring', 
'isleap', 'leapdays', 'main', 'mdays', 'month', 'month_abbr', 'month_name', 
'monthcalendar', 'monthlen', 'monthrange', 'nextmonth', 'prcal', 'prevmonth', 
'prmonth', 'prweek', 'repeat', 'setfirstweekday', 'sys', 'timegm', 'week', 
'weekday', 'weekheader']


___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/ACNDSI6FN6DZKOASNZS4AEQJWWXL6F7Q/


Re: [Python-Dev] Is XML serialization output guaranteed to be bytewise identical forever?

2019-03-20 Thread Raymond Hettinger


> On Mar 20, 2019, at 6:07 PM, Victor Stinner  wrote:
> 
> what's the rationale of this backward incompatible change?

Please refrain from abusive mischaracterizations.  It is only backwards 
incompatible if there was a guaranteed behavior.  Whether there was or not is 
what this thread is about.  

My reading of this thread was that the various experts did not want to lock in 
the 3.7 behavior nor did they think the purpose of the XML modules is to 
produce an exact binary output.  The lxml maintainer is dropping sorting (its 
expensive and it overrides the order specified by the user). Other XML modules 
don't sort. It only made sense as a way to produce a deterministic output 
within a feature release back when there was no other way to do it.

For my part, any agreed-upon outcome is fine. I'm not willing to be debased 
further, so I am out of this discussion. It's up to you all to do the right 
thing.


Raymond



___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Is XML serialization output guaranteed to be bytewise identical forever?

2019-03-20 Thread Raymond Hettinger


> On Mar 20, 2019, at 5:22 PM, Victor Stinner  wrote:
> 
> I don't understand why such simple solution has been rejected.

It hasn't been rejected. That is above my pay grade.  Stefan and I recommended 
against going down this path. However, since you're in disagreement and have 
marked this as a release blocker, it is now time for the steering committee to 
earn their pay (which is at least double what I'm making) or defer to the 
principal module maintainer, Stefan.

To recap reasons for not going down this path:

1) The only known use case for a "sort=True" parameter is to perpetuate the 
practice of byte-by-byte output comparisons guaranteed to work across feature 
releases.  The various XML experts in this thread have opined that isn't 
something we should guarantee (and sorting isn't the only detail subject 
to change; Stefan listed others).

2) The intent of the XML modules is to implement the specification and be 
interoperable with other languages and other XML tools. It is not intended to 
be used to generate an exact binary output.  Per section 3.1 of the XML spec, 
"Note that the order of attribute specifications in a start-tag or 
empty-element tag is not significant."
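A small sketch (mine) of what "not significant" means in practice:

```python
import xml.etree.ElementTree as ET

# Two documents that differ only in attribute order: the serialized
# bytes may differ (3.8 preserves input order; 3.7 sorted), but the
# parsed content is identical per section 3.1 of the XML spec.
a = ET.fromstring('<root b="2" a="1"/>')
b = ET.fromstring('<root a="1" b="2"/>')

print(a.attrib == b.attrib)               # -> True: same element semantically
print(ET.tostring(a) == ET.tostring(b))   # may be False: a byte-level quirk
```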

3) Mitigating a test failure is a one-time problem. API expansions are forever.

4) The existing API is not small and presents a challenge for teaching. Making 
the API bigger will make it worse.

5) As far as I can tell, XML tools in other languages (such as Java) don't sort 
(and likely for good reason).  LXML is dropping its attribute sorting as well, 
so the standard library would become more of an outlier.


Raymond



Re: [Python-Dev] Is XML serialization output guaranteed to be bytewise identical forever?

2019-03-20 Thread Raymond Hettinger


> On Mar 19, 2019, at 4:53 AM, Ned Batchelder  wrote:
> 
> None of this is impossible, but please try not to preach to us maintainers 
> that we are doing it wrong, that it will be easy to fix, etc

There's no preaching and no judgment.  We can't have a conversation though if 
we can't state the crux of the problem: some existing tests in third-party 
modules depend on the XML serialization being byte-for-byte identical forever. 
The various respondents to this thread have indicated that the standard library 
should only make that guarantee within a single feature release and that it may 
vary across feature releases.

For docutils, it may end up being an easy fix (either with a semantic 
comparison or with regenerating the target files when point releases differ).  
For Coverage, I don't make any presumption that reengineering the tests will be 
easy or fun.  Several mitigation strategies have been proposed:

* alter the element creation code to create the attributes in the desired order
* use a canonicalization tool to create output that is guaranteed not to change
* generate new baseline files when a feature release changes
* apply Stefan's recipe for reordering attributes
* make a semantic level comparison
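For the canonicalization strategy, a standards-compliant tool did subsequently 
land in the standard library; assuming a Python that has 
xml.etree.ElementTree.canonicalize() (3.8+), a stable comparison looks like:

```python
import xml.etree.ElementTree as ET

# C14N canonicalization yields the same bytes regardless of the
# attribute order in the input, so tests can compare canonical forms
# instead of raw serializer output.
x = '<root b="2" a="1"/>'
y = '<root a="1" b="2"/>'

print(ET.canonicalize(x) == ET.canonicalize(y))   # -> True
print(ET.canonicalize(x))   # attributes emitted in sorted order
```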

Will any of these work for you?


Raymond









Re: [Python-Dev] Best way to specify docstrings for member objects

2019-03-20 Thread Raymond Hettinger


> On Mar 20, 2019, at 3:59 PM, Ethan Furman  wrote:
> 
> Hmm.  Said somewhat less snarkily, is there a more general solution to the 
> problem of absent docstrings or do we have to attack this problem 
> piece-by-piece?

I think this is the last piece.  The pydoc help() utility already knows how to 
find docstrings for other class level descriptors:  property, class method, 
staticmethod.

Enum() already has nice looking help() output because the class variables are 
assigned values that have a nice __repr__, making them self documenting.

By design, dataclasses aren't special -- they just make regular classes, 
similar to or better than you would write by hand.


Raymond


Re: [Python-Dev] Best way to specify docstrings for member objects

2019-03-20 Thread Raymond Hettinger


> On Mar 20, 2019, at 3:47 PM, Ivan Pozdeev via Python-Dev 
>  wrote:
> 
>> NormalDist.mu.__doc__ = 'Arithmetic mean'
>> NormalDist.sigma.__doc__ = 'Standard deviation'
> 
> IMO this is another manifestation of the problem that things in the class 
> definition have no access to the class object.
> Logically speaking, a definition item should be able to see everything that 
> is defined before it.

The member objects get created downstream by the type() metaclass.  So, there 
isn't a visibility issue because the objects don't exist yet.
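A quick illustration of that timing (the class name is illustrative):

```python
# Member objects for __slots__ are created by the type() metaclass
# *after* the class body runs; only then can they be touched.
class A:
    __slots__ = ('x',)
    # Inside this body, neither A nor the descriptor A.x exists yet.

print(type(A.x).__name__)   # -> 'member_descriptor'
print('x' in A.__dict__)    # -> True: created during class construction
```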


Raymond



Re: [Python-Dev] Best way to specify docstrings for member objects

2019-03-20 Thread Raymond Hettinger


> On Mar 20, 2019, at 3:30 PM, Gregory P. Smith  wrote:
> 
> I like the idea of documenting attributes, but we shouldn't force the user to 
> use __slots__ as that has significant side effects and is rarely something 
> people should bother to use.

Member objects are like property objects in that they exist at the class level 
and show up in the help whether you want them to or not.   AFAICT, they are the 
only such objects to not have a way to attach docstrings.

For instance level attributes created by __init__, the usual way to document 
them is in either the class docstring or the __init__ docstring.  This is 
because they don't actually exist until  __init__ is run.

No one is forcing anyone to use slots.  I'm just proposing that for classes 
that do use them, there is currently no way to annotate them like we do for 
property objects (which people aren't being forced to use either).  The goal is 
to make help() better for whatever people are currently doing.  That shouldn't 
be controversial.  

Someone not liking or recommending slots is quite different from not wanting 
them documented.  In the examples I posted (taken from the standard library), 
the help() is clearly better with the annotations than without.


Raymond






Re: [Python-Dev] Best way to specify docstrings for member objects

2019-03-19 Thread Raymond Hettinger



> On Mar 19, 2019, at 1:52 PM, MRAB  wrote:
> 
> Thinking ahead, could there ever be anything else that you might want also to 
> attach to member objects?

Our experience with property objects suggests that once docstrings are 
supported, there don't seem to be any other needs.   But then, you never can 
tell ;-)


Raymond


"Difficult to see. Always in motion is the future." -- Master Yoda




[Python-Dev] Best way to specify docstrings for member objects

2019-03-19 Thread Raymond Hettinger
I'm working on ways to improve help() by giving docstrings to member 
objects.

One way to do it is to wait until after the class definition and then make 
individual, direct assignments to __doc__ attributes.  This widely separates 
docstrings from their initial __slots__ definition.  Working downstream from 
the class definition feels awkward and doesn't look pretty.

There's another way I would like to propose¹.  The __slots__ definition already 
works with any iterable, including a dictionary (the dict values are ignored), 
so we could use the values for the docstrings.

This keeps all the relevant information in one place (much like we already do 
with property() objects).  This way already works, we just need a few lines in 
pydoc to check whether a dict is present.  This way also looks pretty and 
doesn't feel awkward.

I've included worked out examples below.  What do you all think about the 
proposal?


Raymond


¹ https://bugs.python.org/issue36326


== Desired help() output ==

>>> help(NormalDist)
Help on class NormalDist in module __main__:

class NormalDist(builtins.object)
 |  NormalDist(mu=0.0, sigma=1.0)
 |  
 |  Normal distribution of a random variable
 |  
 |  Methods defined here:
 |  
 |  __init__(self, mu=0.0, sigma=1.0)
 |  NormalDist where mu is the mean and sigma is the standard deviation.
 |  
 |  cdf(self, x)
 |  Cumulative distribution function.  P(X <= x)
 |  
 |  pdf(self, x)
 |  Probability density function.  P(x <= X < x+dx) / dx
 |  
 |  ----------------------------------------------------------------------
 |  Data descriptors defined here:
 |  
 |  mu
 |  Arithmetic mean.
 |  
 |  sigma
 |  Standard deviation.
 |  
 |  variance
 |  Square of the standard deviation.



== Example of assigning docstrings after the class definition ==

from math import erf, exp, sqrt, tau

class NormalDist:
    'Normal distribution of a random variable'

    __slots__ = ('mu', 'sigma')

    def __init__(self, mu=0.0, sigma=1.0):
        'NormalDist where mu is the mean and sigma is the standard deviation.'
        self.mu = mu
        self.sigma = sigma

    @property
    def variance(self):
        'Square of the standard deviation.'
        return self.sigma ** 2.0

    def pdf(self, x):
        'Probability density function.  P(x <= X < x+dx) / dx'
        variance = self.variance
        return exp((x - self.mu)**2.0 / (-2.0*variance)) / sqrt(tau * variance)

    def cdf(self, x):
        'Cumulative distribution function.  P(X <= x)'
        return 0.5 * (1.0 + erf((x - self.mu) / (self.sigma * sqrt(2.0))))

NormalDist.mu.__doc__ = 'Arithmetic mean'
NormalDist.sigma.__doc__ = 'Standard deviation'



== Example of assigning docstrings with a dict =

from math import erf, exp, sqrt, tau

class NormalDist:
    'Normal distribution of a random variable'

    __slots__ = {'mu': 'Arithmetic mean.', 'sigma': 'Standard deviation.'}

    def __init__(self, mu=0.0, sigma=1.0):
        'NormalDist where mu is the mean and sigma is the standard deviation.'
        self.mu = mu
        self.sigma = sigma

    @property
    def variance(self):
        'Square of the standard deviation.'
        return self.sigma ** 2.0

    def pdf(self, x):
        'Probability density function.  P(x <= X < x+dx) / dx'
        variance = self.variance
        return exp((x - self.mu)**2.0 / (-2.0*variance)) / sqrt(tau * variance)

    def cdf(self, x):
        'Cumulative distribution function.  P(X <= x)'
        return 0.5 * (1.0 + erf((x - self.mu) / (self.sigma * sqrt(2.0))))
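The "few lines in pydoc" might look roughly like this hedged sketch (the 
slot_doc helper and the Point class are mine, purely for illustration, not 
the actual pydoc code):

```python
# A dict-valued __slots__ already works today; this helper shows how a
# documentation tool could read the values back as docstrings.
class Point:
    __slots__ = {'x': 'abscissa', 'y': 'ordinate'}

def slot_doc(cls, name):
    'Return the docstring for a slot, if __slots__ is a dict.'
    slots = getattr(cls, '__slots__', ())
    return slots.get(name) if isinstance(slots, dict) else None

print(slot_doc(Point, 'x'))   # -> abscissa
print(slot_doc(Point, 'z'))   # -> None (no such slot)
```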



Re: [Python-Dev] Is XML serialization output guaranteed to be bytewise identical forever?

2019-03-18 Thread Raymond Hettinger


> On Mar 18, 2019, at 4:15 PM, Nathaniel Smith  wrote:
> 
> I noticed that your list doesn't include "add a DOM equality operator". That 
> seems potentially simpler to implement than canonical XML serialization, and 
> like a useful thing to have in any case. Would it make sense as an option?

Time machine!  Stéphane Wirtel just posted a basic semantic comparison between 
two streams.¹   Presumably, there would need to be a range of options for 
specifying what constitutes equivalence but this is a nice start.

Raymond


¹ https://bugs.python.org/file48217/test_xml_compare.py



[Python-Dev] Is XML serialization output guaranteed to be bytewise identical forever?

2019-03-18 Thread Raymond Hettinger
We're having a super interesting discussion on 
https://bugs.python.org/issue34160 .  It is now marked as a release blocker and 
warrants a broader discussion.

Our problem is that at least two distinct and important users have written 
tests that depend on exact byte-by-byte comparisons of the final serialization. 
 So any changes to the XML modules will break those tests (not the applications 
themselves, just the test cases that assume the output will be forever, 
byte-by-byte identical).  

In theory, the tests are incorrectly designed and should not treat the module 
output as a canonical normal form.  In practice, doing an equality test on the 
output is the simplest, most obvious approach, and likely is being done in 
other packages we don't know about yet.

With pickle, json, and __repr__, the usual way to write a test is to verify a 
roundtrip:  assert pickle.loads(pickle.dumps(data)) == data.  With XML, the 
problem is that the DOM doesn't have an equality operator.  The user is left 
with either testing specific fragments with element.find(xpath) or with using a 
standards compliant canonicalization package (not available from us). Neither 
option is pleasant.
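The contrast can be made concrete (the data values and element contents here 
are illustrative):

```python
import json
import pickle
import xml.etree.ElementTree as ET

# Round-trip tests are natural for pickle and json because the loaded
# objects support equality:
data = {'a': [1, 2, 3]}
assert pickle.loads(pickle.dumps(data)) == data
assert json.loads(json.dumps(data)) == data

# Element has no semantic equality operator, so even two identical
# documents compare unequal (default identity comparison):
e1 = ET.fromstring('<root a="1"/>')
e2 = ET.fromstring('<root a="1"/>')
print(e1 == e2)   # -> False
```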

The code in the current 3.8 alpha differs from 3.7 in that it removes attribute 
sorting and instead preserves the order the user specified when creating an 
element.  As far as I can tell, there is no objection to this as a feature.  
The problem is what to do about the existing tests in third-party code, what 
guarantees we want to make going forward, and what do we recommend as a best 
practice for testing XML generation.

Things we can do:

1) Revert to the 3.7 behavior. This, of course, makes all the tests pass :-) 
 The downside is that it perpetuates the practice of bytewise equality tests 
and locks in all implementation quirks forever.  I don't know of anyone 
advocating this option, but it is the simplest thing to do.

2) Go into every XML module and add attribute-sorting options to each function 
that generates XML.  This gives users a way to make their tests pass for now. 
There are several downsides. a) It grows the API in a way that is inconsistent 
with all the other XML packages I've seen. b) We'll have to test, maintain, and 
document the API forever -- the API is already large and time consuming to 
teach. c) It perpetuates the notion that bytewise equality tests are the right 
thing to do, so we'll have this problem again if we substitute in another code 
generator or alter any of the other implementation quirks (i.e. how CDATA 
sections are serialized).

3) Add a standards compliant canonicalization tool (see 
https://en.wikipedia.org/wiki/Canonical_XML ).  This is likely to be the 
right-way-to-do-it but takes time and energy.

4) Fix the tests in the third-party modules to be more focused on their actual 
test objectives, the semantics of the generated XML rather than the exact 
serialization.  This option would seem like the right-thing-to-do but it isn't 
trivial because the entire premise of the existing test is invalid.  For every 
case, we'll actually have to think through what the test objective really is.

Of these, option 2 is my least preferred.  Ideally, we don't guarantee bytewise 
identical output across releases, and ideally we don't grow a new API that 
perpetuates the issue. That said, I'm not wedded to any of these options and 
just want us to do what is best for the users in the long run.

Regardless of the option chosen, we should make explicit whether or not the Python 
standard library modules guarantee cross-release bytewise identical output for 
XML. That is really the core issue here.  Had we had an explicit notice one way 
or the other, there wouldn't be an issue now.

Any thoughts?



Raymond Hettinger


P.S.   Stefan Behnel is planning to remove attribute sorting from lxml.  On the 
bug tracker, he has clearly articulated his reasons.




Re: [Python-Dev] Possible performance regression

2019-02-26 Thread Raymond Hettinger
On Feb 26, 2019, at 2:28 PM, Neil Schemenauer  wrote:
> 
> Are you compiling with --enable-optimizations (i.e. PGO)?  In my
> experience, that is needed to get meaningful results.

I'm not, and I would worry that PGO would give less stable comparisons because 
it is highly sensitive to changes in its training set as well as in the actual 
CPython implementation (two moving targets instead of one).  That said, it 
doesn't really matter to the world how I build *my* Python.  We're trying to 
keep performant the ones that people actually use.  For the Mac, I think there 
are only four that matter:

1) The one we distribute on the python.org 
website at 
https://www.python.org/ftp/python/3.8.0/python-3.8.0a2-macosx10.9.pkg

2) The one installed by homebrew

3) The way folks typically roll their own:
$ ./configure && make   (or some variant of make install)

4) The one shipped by Apple and put in /usr/bin

Of the four, the ones I've been timing are #1 and #3.

I'm happy to drop this.  I was looking for independent confirmation and didn't 
get it. We can't move forward unless someone else also observes a consistently 
measurable regression for a benchmark they care about on a build that they care 
about.  If I'm the only one who notices, then it really doesn't matter.  Also, it 
was reassuring to not see the same effect on a GCC-8 build.

Since the effect seems to be compiler specific, it may be that we knocked it 
out of a local minimum and that performance will return the next time someone 
touches the eval-loop.


Raymond  










Re: [Python-Dev] Possible performance regression

2019-02-26 Thread Raymond Hettinger


On Feb 25, 2019, at 8:23 PM, Eric Snow  wrote:
> 
> So it looks like commit ef4ac967 is not responsible for a performance
> regression.

I did narrow it down to that commit and I can consistently reproduce the timing 
differences.

That said, I'm only observing the effect when building with the Mac default 
Clang (Apple LLVM version 10.0.0, clang-1000.11.45.5).  When building with GCC 
8.3.0, there is no change in performance.

I conclude this is only an issue for Mac builds.

> I ran the "performance" suite (https://github.com/python/performance),
> which has 57 different benchmarks. 

Many of those benchmarks don't measure eval-loop performance.  Instead, they 
exercise json, pickle, sqlite etc.  So, I would expect no change in many of 
those because they weren't touched.

Victor said he generally doesn't care about 5% regressions.  That makes sense 
for odd corners of Python.  The reason I was concerned about this one is that 
it hits the eval-loop and seems to affect every single opcode.  The regression 
applies somewhat broadly, increasing the cost of reading and writing local 
variables by about 20%.

That said, it seems to be compiler specific and only affects the Mac builds, so 
maybe we can decide that we don't care.


Raymond



Re: [Python-Dev] Compact ordered set

2019-02-26 Thread Raymond Hettinger
Quick summary of what I found when I last ran experiments with this idea:

* To get the same lookup performance, the density of the index table would need 
to go down to around 25%. Otherwise, there's no way to make up for the extra 
indirection and the loss of cache locality.

* There was a small win on iteration performance because it's cheaper to loop 
over a dense array than a sparse array (fewer memory accesses and elimination of 
the unpredictable branch).  This is nice because iteration performance matters 
in some key use cases.

* I gave up on ordering right away.  If we care about performance, keys can be 
stored in the order added; but no effort should be expended to maintain order 
if subsequent deletions occur.  Likewise, to keep set-to-set operations 
efficient (i.e. looping over the smaller input), no order guarantee should be 
given for those operations.  In general, we can let order happen, but we should 
not guarantee it, work to maintain it, or slow down essential operations to make 
them ordered.

* Compacting does make sets a little smaller but does cost an indirection and 
incurs a cost for switching index sizes between 1-byte arrays, 2-byte arrays, 
4-byte arrays, and 8-byte arrays.  Those don't seem very expensive; however, 
set lookups are already very cheap when the hash values are known (when they're 
not, the cost of computing the hash value tends to dominate anything done by 
the setobject itself).

* I couldn't find any existing application that would notice the benefit of 
making sets a bit smaller.  Most applications use dictionaries (directly or 
indirectly) everywhere, so compacting dicts was an overall win.  Sets tend to be used 
more sparsely (no pun intended) and tend to be only a small part of overall 
memory usage. I had to consider this when bumping the load factor down to 60%, 
prioritizing speed over space.


Raymond



Re: [Python-Dev] Compact ordered set

2019-02-26 Thread Raymond Hettinger



> On Feb 26, 2019, at 3:30 AM, INADA Naoki  wrote:
> 
> I'm working on compact and ordered set implementation.
> It has internal data structure similar to new dict from Python 3.6.


I've looked at this as well.  Some thoughts: 

* Set objects have a different and conflicting optimization that works better 
for a broad range of use cases.  In particular, there is a linear probing 
search step that gives excellent cache performance (multiple entries retrieved 
in a single cache line) and it reduces the cost of finding the next entry to a 
single increment (entry++). This greatly reduces the cost of collisions and 
makes it cheaper to verify an item is not in a set. 
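A toy Python model of that probing pattern may help. The constants and recurrence below are assumptions based on a reading of CPython's setobject.c (a LINEAR_PROBES run of adjacent slots, then a perturbed jump); the real code is C and differs in detail:

```python
LINEAR_PROBES = 9   # assumed from setobject.c

def probe_sequence(hash_value, mask, limit):
    """Yield the first `limit` table indices examined for a given hash."""
    perturb = hash_value
    i = hash_value & mask
    count = 0
    while count < limit:
        yield i
        count += 1
        # Scan the next LINEAR_PROBES adjacent slots -- these tend to
        # share cache lines, which is what makes collisions cheap.
        for j in range(1, LINEAR_PROBES + 1):
            if count >= limit:
                return
            yield (i + j) & mask
            count += 1
        # Only after the linear scan fails does the probe jump elsewhere.
        perturb >>= 5
        i = (i * 5 + 1 + perturb) & mask

first = list(probe_sequence(42, mask=63, limit=11))
# The ten probes after the initial slot are consecutive: 42, 43, ..., 51
```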

* The technique for compaction involves making the key/hash entry array dense 
and augmenting it with a sparse array of indices.  This necessarily involves 
adding a layer of indirection for every probe.

* With the cache misses, branching costs, and extra layer of indirection, 
collisions would stop being cheap, so we would need to work to avoid them 
altogether. To get anything like the current performance for a collision on the 
first probe, I suspect we would have to lower the table density down from 60% 
to 25%.  

* The intersection operation has an important optimization where it loops over 
the smaller of its two inputs.  To give a guaranteed order that preserves the 
order of the first input, you would have to forgo this optimization, possibly 
crippling any existing code that depends on it.
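That optimization can be sketched in pure Python (a simplification of mine; the real implementation is in C):

```python
def intersection(a, b):
    """Sketch of set intersection that loops over the smaller input.

    The work is bounded by min(len(a), len(b)) lookups.  Note that the
    iteration order of the result depends on which operand is smaller,
    which is why an order guarantee would forfeit this optimization.
    """
    if len(b) < len(a):
        a, b = b, a                     # ensure `a` is the smaller set
    return {x for x in a if x in b}

small, big = {2, 3}, set(range(1_000_000))
common = intersection(small, big)       # two lookups, not a million
```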

* Maintaining order in the face of deletions adds a new workload to sets that 
didn't exist before. You risk thrashing the set to support a feature that hasn't 
been asked for and that isn't warranted mathematically (where the notion of 
sets is unordered).

* It takes a lot of care and planning to avoid fooling yourself with benchmarks 
on sets.  Anything done with a small tight loop will tend to hide all branch 
prediction costs and cache miss costs, both of which are significant in real 
world uses of sets.

* For sets, we care much more about look-up performance than space.  And unlike 
dicts where we usually expect to find a key, sets are all about checking 
membership which means they have to be balanced for the case where the key is 
not in the set.

* Having and preserving order is one of the least important things a set can 
offer (it does have some value, but it was never contemplated by the original 
set PEP).

After the success of the compact dict, I can understand an almost irresistible 
urge to apply the same technique to sets. If it was clear that it was a win, I 
would have already done it long ago, even before dicts (it was much harder to 
get buy in to changing the dicts).  Please temper the enthusiasm with 
rationality and caution.  The existing setobject code has been finely tuned and 
micro-optimized over the years, giving it excellent performance on workloads we 
care about.  It would be easy to throw all of that away.


Raymond


Re: [Python-Dev] Possible performance regression

2019-02-25 Thread Raymond Hettinger


> On Feb 25, 2019, at 2:54 AM, Antoine Pitrou  wrote:
> 
> Have you tried bisecting to find out the offending changeset, if there
> any?

I got it down to two checkins before running out of time:

Between
git checkout 463572c8beb59fd9d6850440af48a5c5f4c0c0c9  

And:
git checkout 3b0abb019662e42070f1d6f7e74440afb1808f03  

So the subinterpreter patch was likely the trigger.

I can reproduce it over and over again on Clang, but not for a GCC-8 build, so 
it is compiler specific (and possibly macOS specific).

Will look at it more after work this evening.  I posted here to try to solicit 
independent confirmation.


Raymond


Re: [Python-Dev] Possible performance regression

2019-02-25 Thread Raymond Hettinger


> On Feb 24, 2019, at 10:06 PM, Eric Snow  wrote:
> 
> I'll look into it in more depth tomorrow.  FWIW, I have a few commits
> in the range you described, so I want to make sure I didn't slow
> things down for us. :)

Thanks for looking into it.

FWIW, I can consistently reproduce the results several times in row.  Here's 
the bash script I'm using:

#!/bin/bash

make clean
./configure
make                    # Apple LLVM version 10.0.0 (clang-1000.11.45.5)

for i in `seq 1 3`;
do
    git checkout d610116a2e48b55788b62e11f2e6956af06b3de0  # Go back to 2/23
    make                                                   # Rebuild
    sleep 30                                # Let the system get quiet and cool
    echo ' baseline ---' >> results.txt     # Label output
    ./python.exe Tools/scripts/var_access_benchmark.py >> results.txt  # Run benchmark

    git checkout 16323cb2c3d315e02637cebebdc5ff46be32ecdf  # Go to end-of-day 2/24
    make                                                   # Rebuild
    sleep 30                                # Let the system get quiet and cool
    echo ' end of day ---' >> results.txt   # Label output
    ./python.exe Tools/scripts/var_access_benchmark.py >> results.txt  # Run benchmark
done


> 
> -eric
> 
> 
> * commit 175421b58cc97a2555e474f479f30a6c5d2250b0 (HEAD)
> | Author: Pablo Galindo 
> | Date:   Sat Feb 23 03:02:06 2019 +
> |
> | bpo-36016: Add generation option to gc.getobjects() (GH-11909)
> 
> $ ./python Tools/scripts/var_access_benchmark.py
> Variable and attribute read access:
>  18.1 ns   read_local
>  19.4 ns   read_nonlocal

These timings are several times larger than they should be.  Perhaps you're 
running a debug build?  Or perhaps 32-bit?  Or in a VM or some such.  Something 
looks way off because I'm getting 4 and 5 ns on my 2013 Haswell laptop.



Raymond











[Python-Dev] Possible performance regression

2019-02-24 Thread Raymond Hettinger
I've been running benchmarks that have been stable for a while.  But between 
yesterday and today, there has been an almost across-the-board performance 
regression.  

It's possible that this is a measurement error or something unique to my system 
(my Mac installed the 10.14.3 release today), so I'm hoping other folks can run 
checks as well.


Raymond


-- Yesterday 


$ ./python.exe Tools/scripts/var_access_benchmark.py
Variable and attribute read access:
   4.0 ns   read_local
   4.5 ns   read_nonlocal
  13.1 ns   read_global
  17.4 ns   read_builtin
  17.4 ns   read_classvar_from_class
  15.8 ns   read_classvar_from_instance
  24.6 ns   read_instancevar
  19.7 ns   read_instancevar_slots
  18.5 ns   read_namedtuple
  26.3 ns   read_boundmethod

Variable and attribute write access:
   4.6 ns   write_local
   4.8 ns   write_nonlocal
  17.5 ns   write_global
  39.1 ns   write_classvar
  34.4 ns   write_instancevar
  25.3 ns   write_instancevar_slots

Data structure read access:
  17.5 ns   read_list
  18.4 ns   read_deque
  19.2 ns   read_dict

Data structure write access:
  19.0 ns   write_list
  22.0 ns   write_deque
  24.4 ns   write_dict

Stack (or queue) operations:
  55.5 ns   list_append_pop
  46.3 ns   deque_append_pop
  46.7 ns   deque_append_popleft

Timing loop overhead:
   0.3 ns   loop_overhead


-- Today ---

$ ./python.exe Tools/scripts/var_access_benchmark.py

Variable and attribute read access:
   5.0 ns   read_local
   5.3 ns   read_nonlocal
  14.7 ns   read_global
  18.6 ns   read_builtin
  19.9 ns   read_classvar_from_class
  17.7 ns   read_classvar_from_instance
  26.1 ns   read_instancevar
  21.0 ns   read_instancevar_slots
  21.7 ns   read_namedtuple
  27.8 ns   read_boundmethod

Variable and attribute write access:
   6.1 ns   write_local
   7.3 ns   write_nonlocal
  18.9 ns   write_global
  40.7 ns   write_classvar
  36.2 ns   write_instancevar
  26.1 ns   write_instancevar_slots

Data structure read access:
  19.1 ns   read_list
  19.6 ns   read_deque
  20.6 ns   read_dict

Data structure write access:
  22.8 ns   write_list
  23.5 ns   write_deque
  27.8 ns   write_dict

Stack (or queue) operations:
  54.8 ns   list_append_pop
  49.5 ns   deque_append_pop
  49.4 ns   deque_append_popleft

Timing loop overhead:
   0.3 ns   loop_overhead




Re: [Python-Dev] Add minimal information with a new issue?

2019-02-21 Thread Raymond Hettinger
On Feb 21, 2019, at 6:53 AM, Stephane Wirtel  wrote:
> 
> What do you think if we suggest a "template" for the new bugs?

99% of the time the template would be not applicable.  Historically, we asked 
for more information when needed and that wasn't very often.

I think that anything that raises the cost of filing a bug report will work to 
our detriment. Ideally, we want the barriers to reporting to be as low as 
possible.


Raymond



Re: [Python-Dev] Asking for reversion

2019-02-05 Thread Raymond Hettinger


> On Feb 5, 2019, at 9:52 AM, Giampaolo Rodola'  wrote:
> 
>  The main problem I have with this PR is that it seems to introduce 8 brand 
> new APIs, but since there is no doc, docstrings or tests it's unclear which 
> ones are supposed to be used, how or whether they are supposed to supersede 
> or deprecate older (slower) ones involving inter process communication.

The release manager already opined that if tests and docs get finished for the 
second alpha, he prefers not to have a reversion and would rather build on 
top of what already shipped in the first alpha.  FWIW, the absence of docs 
isn't desirable but it isn't atypical.  PEP 572 code landed without the docs. 
Docs for dataclasses arrived much after the code. The same was true for the 
decimal module. Hopefully, everyone will team up with Davin and help him get 
the ball over the goal line.

BTW, this is a feature we really want.  Our multicore story for Python isn't a 
good one.  Due to the GIL, threading usually can't exploit multiple cores for 
better performance.  Async has lower overhead than threading but achieves its 
gains by keeping all the data in a single process.  That leaves us with 
multiprocessing where the primary obstacle has been the heavy cost of moving 
data between processes.  If that cost can be reduced, we've got a winning story 
for multicore.
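For context, the feature under discussion is roughly what later shipped in Python 3.8 as multiprocessing.shared_memory. A minimal sketch of the idea (using the 3.8 API as it eventually landed, not necessarily the exact API in the PR being debated):

```python
from multiprocessing import shared_memory

# Create a block of shared memory, then attach to it through a second
# handle by name -- as a separate process would.  Both handles see the
# same buffer, so no data is copied or pickled between them.
shm = shared_memory.SharedMemory(create=True, size=16)
try:
    shm.buf[:5] = b'hello'
    other = shared_memory.SharedMemory(name=shm.name)  # "other process"
    data = bytes(other.buf[:5])
    other.close()
finally:
    shm.close()
    shm.unlink()   # free the block once all handles are done with it
```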

This patch is one of the better things that is happening to Python.  Aside from 
last week's procedural missteps and communication issues surrounding the 
commit, the many months of prior work on this have been stellar. How about we 
stop using a highly public forum to pile on Davin (being the subject of a 
thread like this can be a soul-crushing experience)?  Right now, he could 
really use some help and support from everyone on the team.


Raymond




Re: [Python-Dev] Asking for reversion

2019-02-04 Thread Raymond Hettinger

> On Feb 4, 2019, at 2:36 AM, Łukasz Langa  wrote:
> 
> @Raymond, would you be willing to work with Davin on finishing this work in 
> time for alpha2?

I would be happy to help, but this is beyond my technical ability.  The people 
who are qualified to work on this have already chimed in on the discussion.  
Fortunately, I think this is a feature that everyone wants. So it just a matter 
of getting the experts on the subject to team-up and help get it done.


Raymond







Re: [Python-Dev] Asking for reversion

2019-02-03 Thread Raymond Hettinger


> On Feb 3, 2019, at 5:40 PM, Terry Reedy  wrote:
> 
> On 2/3/2019 7:55 PM, Guido van Rossum wrote:
>> Also, did anyone ask Davin directly to roll it back?
> 
> Antoine posted on the issue, along with Robert O.  Robert reviewed and make 
> several suggestions.

I think the PR sat in a stable state for many months, and it looks like RO's 
review comments came *after* the commit.  

FWIW, with dataclasses we decided to get the PR committed early, long before 
most of the tests and all of the docs. The principle was that bigger changes 
needed to go in as early as possible in the release cycle so that we could 
thoroughly exercise it (something that almost never happens while something is 
in the PR stage).  It would be great if the same could happen here.  IIRC, 
shared memory has long been the holy grail for multiprocessing, helping to 
mitigate its principal disadvantage (the cost of moving data between 
processes).  It's something we really want.

But let's see what the 3.8 release manager has to say.


Raymond




Re: [Python-Dev] Asking for reversion

2019-02-03 Thread Raymond Hettinger



> On Feb 3, 2019, at 1:03 PM, Antoine Pitrou  wrote:
> 
> I'd like to ask for the reversion of the changes done in
> https://github.com/python/cpython/pull/11664

Please work *with* Davin on this one.

It was only recently that you edited his name out of the list of maintainers 
for multiprocessing even though that is what he's been working on for the last 
two years and at the last two sprints.  I'd like to see more team work here 
rather than applying social pressures via python-dev (which is a *very* public 
list). 


Raymond



Re: [Python-Dev] Fwd: How about updating OrderedDict in csv and configparser to regular dict?

2019-01-31 Thread Raymond Hettinger



> On Jan 31, 2019, at 3:06 AM, Steve Holden  wrote:
> 
> And I see that such a patch is now merged. Thanks,  Raymond!

And thank you for getting ordering into csv.DictReader.  That was a significant 
improvement in usability :-)


Raymond


Re: [Python-Dev] How to update namedtuple asdict() to use dict instead of OrderedDict

2019-01-30 Thread Raymond Hettinger



> On Jan 30, 2019, at 9:11 PM, Tim Delaney  wrote:
> 
> Alternatively, would it be viable to make OrderedDict work in a way that so 
> long as you don't use any reordering operations it's essentially just a very 
> thin layer on top of a dict,

There's all kinds of tricks we could do but none of them are worth it.  It took 
Eric Snow a long time to write the OrderedDict patch and it took years to get 
most of the bugs out of it.  I would really hate to go through a redesign and 
eat up our time for something that probably won't be much used any more.

I'm really just aiming for something as simple as s/OrderedDict/dict in 
namedtuple :-)  


Raymond


Re: [Python-Dev] How to update namedtuple asdict() to use dict instead of OrderedDict

2019-01-30 Thread Raymond Hettinger


> On Jan 30, 2019, at 6:00 PM, David Mertz  wrote:
> 
> Ditto +1 option 4
> 
> On Wed, Jan 30, 2019, 5:56 PM Paul Moore  wrote:
>> On Wed, 30 Jan 2019 at 22:35, Raymond Hettinger  wrote:
>>> My recommendation is Option 4 as being less disruptive and more beneficial
>>> than the other options.  In the unlikely event that anyone is currently
>>> depending on the reordering methods for the output of _asdict(), the
>>> remediation is trivially simple:  nt._asdict() -> OrderedDict(nt._asdict()).
>>>
>>> What do you all think?
>> 
>> +1 from me on option 4.
>> 
>> Paul


Thanks everyone.  I'll move forward with option 4.  In Barry's words, JFDI :-)

> On Jan 30, 2019, at 6:10 PM, Nathaniel Smith  wrote:
> 
> How viable would it be to make OrderedDict smaller, faster, and give
> it a cleaner looking repr?

Not so much.  The implementations are substantially different because they have 
different superpowers.  A regular dict is really good at being a dict while 
retaining order, but it isn't good at reordering operations such as 
popitem(False), popitem(True), move_to_end(), and whatnot.  An OrderedDict is a 
heavier-weight structure (a hash table augmented by a doubly-linked list) -- it 
is worse at being a dictionary but really good at intensive reordering 
operations typical in cache recency tracking and whatnot.  Also, there are 
long-standing API differences including weak references, the ability to assign 
attributes, an equality operation that requires exact order when compared to 
another ordered dict, etc., as well as the reordering methods.  If it were easy, 
clean, and desirable, it would have already been done :-)  Overall, I think the 
OrderedDict is increasingly irrelevant except for use cases requiring 
cross-version compatibility and for cases that need heavy reordering.  
Accordingly, I mostly expect to leave it alone and let it fall into the 
not-much-used category like UserDict, UserList, and UserString.

> On Jan 30, 2019, at 3:41 PM, Glenn Linderman  wrote:


> Would it be practical to add deprecated methods to regular dict for the 
> OrderedDict reordering methods that raise with an error suggesting "To use 
> this method, convert dict to OrderedDict." (or some better wording).

That's an interesting idea.  Regular dicts aren't well suited to the reordering 
operations (like lists, they aren't performant for repeated inserts at the 
front, unlike OrderedDict, which uses doubly-linked lists internally).  My 
instinct is to leave regular dicts alone so that they can focus on their 
primary task (being good at fast lookups).
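A small illustration (mine, not from the thread) of the reordering behavior and order-sensitive equality that set OrderedDict apart from a plain dict:

```python
from collections import OrderedDict

od = OrderedDict.fromkeys('abc')
od.move_to_end('a')                  # 'a' moves to the back: b, c, a
first = od.popitem(last=False)       # pops from the front: ('b', None)
order = list(od)                     # ['c', 'a']

# Equality is order-sensitive between OrderedDicts, but not for plain dicts:
ordered_eq = OrderedDict(a=1, b=2) == OrderedDict(b=2, a=1)   # False
plain_eq = dict(a=1, b=2) == dict(b=2, a=1)                   # True
```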


Raymond




[Python-Dev] How to update namedtuple asdict() to use dict instead of OrderedDict

2019-01-30 Thread Raymond Hettinger
Now that regular dicts are ordered and compact, it makes more sense for the 
_asdict() method to create a regular dict (as it did in its early days) rather 
than an OrderedDict.  The regular dict is much smaller, much faster, and has a 
much cleaner looking repr.  It would also help namedtuple() stay in sync with 
dataclasses which already take advantage of the ordering feature of regular 
dicts.

The question is how to go about making the change in a way that gives the most 
benefit to users as soon as possible and that creates the least disruption.

Option 1) Add a deprecation notice to 3.8, make no code change in 3.8, and then 
update the code in 3.9.  This has several issues: a) it doesn't provide an 
executable DeprecationWarning in 3.8, b) it isn't really a deprecation, and c) 
it defers the benefits of the change for another release.

Option 2) Add a deprecation notice to 3.8, add a DeprecationWarning to the 
_asdict() method, and make the actual improvement in 3.9.  The main issue here 
is that it will create a lot of noise for normal uses of the _asdict() method 
which are otherwise unaffected by the change. The typical use cases for 
_asdict() are to create keyword arguments and to pass named tuple data into 
functions or methods that expect regular dictionaries.  Those use cases would 
benefit from seeing the change made sooner and would suffer in the interim from 
their code slowing down for warnings that aren't useful.

Option 3). Add a deprecation notice to 3.8 and have the _asdict() method create 
a subclass of OrderedDict that issues warnings only for the methods and 
attributes that will change (move_to_end, popitem, __eq__, __dict__, 
__weakref__).  This is less noisy but it adds a lot of machinery just to make a 
notification of a minor change.  Also, it fails to warn that the data type will 
change.  And it may create more confusion than options 1 and 4 which are 
simpler.

Option 4) Just make the change directly in 3.8,  s/OrderedDict/dict/, and be 
done with it.  This gives users the benefits right away and doesn't annoy them 
with warnings that they likely don't care about.   There is some precedent for 
this.  To make namedtuple class creation faster, the *verbose* option was 
dropped without any deprecation period.  It looks like no one missed that 
feature at all, but they did get the immediate benefit of faster import times.  
In the case of using regular dicts in named tuples, people will get immediate 
and significant space savings as well as a speed benefit.

My recommendation is Option 4 as being less disruptive and more beneficial than 
the other options.  In the unlikely event that anyone is currently depending on 
the reordering methods for the output of _asdict(), the remediation is 
trivially simple:   nt._asdict() -> OrderedDict(nt._asdict()).
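A short illustration of option 4's effect and the remediation, assuming the change lands as described:

```python
from collections import namedtuple, OrderedDict

Point = namedtuple('Point', ['x', 'y'])
p = Point(1, 2)

d = p._asdict()        # a plain dict under option 4 (an OrderedDict before)
# Code that genuinely needs the reordering methods can wrap the result:
od = OrderedDict(p._asdict())
od.move_to_end('x')    # reordered: y, x
```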

What do you all think?


Raymond





Re: [Python-Dev] Lost sight

2019-01-20 Thread Raymond Hettinger


> On Jan 19, 2019, at 2:12 AM, Serhiy Storchaka  wrote:
> 
> I have virtually completely lost the sight of my right eye (and the loss is 
> quickly progresses) and the sight of my left eye is weak. 

I hope this only temporary.  Best wishes.


Raymond


[Python-Dev] General concerns about C API changes

2018-11-13 Thread Raymond Hettinger
Overall, I support the efforts to improve the C API, but over the last few 
weeks have become worried.  I don't want to hold up progress with fear, 
uncertainty, and doubt.  Yet, I would like to be more comfortable that we're 
all aware of what is occurring and what are the potential benefits and risks.

* Inline functions are great.  They provide true local variables, better 
separation of concerns, are far less kludgy than text-based macro substitution, 
and will typically generate the same code as the equivalent macro.  This is 
good tech when used within a single source file where it has predictable 
results.

However, I'm not at all confident about moving these into header files which 
are included in multiple target .c files that need to be compiled into separate 
.o files and linked to other existing libraries.

With a macro, I know for sure that the substitution is taking place.  This 
happens at all levels of optimization and in a debug mode.  The effects are 
100% predictable and have a well-established track record in our mature 
battle-tested code base.  With cross module function calls, I'm less confident 
about what is happening, partly because compilers are free to ignore inline 
directives and partly because the semantics of inlining are less clear when the 
crossing module boundaries.

* Other categories of changes that we make tend to have only a shallow reach.  
However, these C API changes will likely touch every C extension that has ever 
been written, some of which are highly tuned but not actively re-examined.  If 
any mistakes are made, they will likely be pervasive.  Accordingly, caution is 
warranted.

My expectation was that the changes would be conducted in experimental 
branches. But extensive changes are already being made (or about to be made) on 
the 3.8 master. If a year from now, we decide that the changes were 
destabilizing or that the promised benefits didn't materialize, they will be 
difficult to undo because there are so many of them and because they will be 
interleaved with other changes.

The original motivation was to achieve a 2x speedup in return for significantly 
churning the C API. However, the current rearranging of the include files and 
macro-to-inline-function changes only give us churn.  At the very best, they 
will be performance neutral.  At worst, formerly cheap macro calls will become 
expensive in places that we haven't thought to run timings on.  Given that 
compilers don't have to honor an inline directive, we can't really know for 
sure -- perhaps today it works out fine, and perhaps tomorrow the compilers opt 
for a different behavior.

Maybe everything that is going on is fine.  Maybe it's not. I am not expert 
enough to know for sure, but we should be careful before green-lighting such an 
extensive series of changes directly to master.  Reasonable questions to ask 
are: 1) What are the risks to third party modules, 2) Do we really know that 
the macro-to-inline-function transformations are semantically neutral. 3) If 
there is no performance benefit (none has been seen so far, nor is any promised 
in the pending PRs), is it worth it?  

We do know that PyPy folks have had their share of issues with the C API, but 
I'm not sure that we can make any of this go away without changing the 
foundations of the whole ecosystem.  It is inconvenient for a full GC 
environment to interact with the API for a reference-counted environment -- I 
don't think we can make this challenge go away without giving up reference 
counting.  It is inconvenient for a system that manifests objects on demand to 
interact with an API that assumes that objects have identity and never move 
once they are created -- I don't think we can make this go away either.  It is 
inconvenient for a system that uses unboxed data to interact with our API where 
everything is an object that includes a type pointer and reference count -- we 
have provided an API for boxing and unboxing, but the trip back-and-forth is 
inconveniently expensive -- I don't think we can make that go away either 
because too much of the ecosystem depends on that API.  There are some things 
that can be mitigated, such as challenges with borrowed references, but that 
doesn't seem to have been the focus of any of the PRs.

In short, I'm somewhat concerned about the extensive changes that are 
occurring.  I do know they will touch substantially every C module in the 
entire ecosystem.  I don't know whether they are safe or whether they will give 
any real benefit.

FWIW, none of this is a criticism of the work being done.  Someone needs to 
think deeply about the C API or else progress will never be made.  That said, 
it is a high risk project with many PRs going directly into master, so it does 
warrant having buy in that the churn isn't destabilizing and will actually 
produce a benefit that is worth it.


Raymond

___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] Postponed annotations break inspection of dataclasses

2018-09-23 Thread Raymond Hettinger



> On Sep 22, 2018, at 1:38 PM, Yury Selivanov  wrote:
> 
> On Sat, Sep 22, 2018 at 3:11 PM Guido van Rossum  wrote:
> [..]
>> Still, I wonder if there's a tweak possible of the globals and locals used 
>> when exec()'ing the function definitions in dataclasses.py, so that 
>> get_type_hints() gets the right globals for this use case.
>> 
>> It's really tough to be at the intersection of three PEPs...
> 
> If it's possible to fix exec() to accept any Mapping (not just dicts),
> then we can create a proxy mapping for "Dataclass.__init__.__module__"
> module and everything would work as expected

FWIW, the locals() dict for exec() already accepts any mapping (not just dicts):

>>> class M:
...     def __getitem__(self, key):
...         return key.upper()
...     def __setitem__(self, key, value):
...         print(f'{key!r}: {value!r}')
...
>>> exec('a=b', globals(), M())
'a': 'B'


Raymond


Re: [Python-Dev] Testing C API

2018-07-30 Thread Raymond Hettinger


> On Jul 30, 2018, at 12:06 AM, Serhiy Storchaka  wrote:
> 
> 30.07.18 09:46, Raymond Hettinger wrote:
>> I prefer the current organization that keeps the various tests together with 
>> the category being tested.  I almost never need to run the C API tests all 
>> at once, but I do need to see all the tests for an object in one place.  
>> When maintaining something like marshal, it would be easy to miss some of 
>> the tests if they are in a separate file.  IMO, the proposed change would 
>> hinder future maintenance and fly in the face of our traditional code 
>> organization.
> 
> What about moving just test_capi.py, test_getargs2.py and 
> test_structmembers.py into Lib/test/test_capi? They are not related to 
> specific types or modules

That would be reasonable.


Raymond


Re: [Python-Dev] Testing C API

2018-07-29 Thread Raymond Hettinger



> On Jul 29, 2018, at 4:53 AM, Serhiy Storchaka  wrote:
> 
> The benefit is that it will be easier to run all C API tests at once, and 
> only them, and it will be clearer what C API is covered by tests. The 
> disadvantage is that you will need to run several files for testing marshal 
> for example.
> 
> What are your thoughts?

I prefer the current organization that keeps the various tests together with 
the category being tested.  I almost never need to run the C API tests all at 
once, but I do need to see all the tests for an object in one place.  When 
maintaining something like marshal, it would be easy to miss some of the tests 
if they are in a separate file.  IMO, the proposed change would hinder future 
maintenance and fly in the face of our traditional code organization.


Raymond


Re: [Python-Dev] [issue34221] Any plans to combine collections.OrderedDict with dict

2018-07-26 Thread Raymond Hettinger

> On Jul 26, 2018, at 10:23 AM, Terry Reedy  wrote:
> 
> On python-idea,  Miro Hrončok asked today whether we can change the 
> OrderedDict repr from, for instance,
> 
> OrderedDict([('a', '1'), ('b', '2')]) # to
> OrderedDict({'a': '1', 'b': '2'})
> 
> I am not sure what our repr change policy is, as there is a 
> back-compatibility issue but I remember there being changes.

We are allowed to change the repr in future versions of the language.  Doing so 
does come at a cost though. There is a small performance penalty (see the 
timings below).  Some doctests will break.  And Python 3.8 printed output in 
books and blog posts would get shuffled if typed in to Python 3.5 -- this is 
problematic because one of the few remaining use cases for OrderedDict is to 
write code that is compatible with older Pythons.  

The proposed repr does look pretty but probably isn't worth the disruption.


Raymond

--

$ python3.7 -m timeit -r 7 'from collections import OrderedDict' 
"OrderedDict([('a', '1'), ('b', '2')])"
20 loops, best of 7: 1.12 usec per loop
$ python3.7 -m timeit -r 7 'from collections import OrderedDict' 
"OrderedDict({'a': '1', 'b': '2'})"
20 loops, best of 7: 1.22 usec per loop
$ python3.7 -m timeit -r 7 'from collections import OrderedDict' 
"OrderedDict([('a', '1'), ('b', '2')])"
20 loops, best of 7: 1.13 usec per loop
$ python3.7 -m timeit -r 7 'from collections import OrderedDict' 
"OrderedDict({'a': '1', 'b': '2'})"
20 loops, best of 7: 1.2 usec per loop
$ python3.7 -m timeit -r 7 'from collections import OrderedDict' 
"OrderedDict([('a', '1'), ('b', '2')])"
20 loops, best of 7: 1.12 usec per loop
$ python3.7 -m timeit -r 7 'from collections import OrderedDict' 
"OrderedDict({'a': '1', 'b': '2'})"
20 loops, best of 7: 1.2 usec per loop


Re: [Python-Dev] [issue34221] Any plans to combine collections.OrderedDict with dict

2018-07-25 Thread Raymond Hettinger


> On Jul 25, 2018, at 8:23 PM, INADA Naoki  wrote:
> 
> On Thu, Jul 26, 2018 at 12:04 PM Zhao Lee  wrote:
>> 
>> 
>> Since Python 3.7,dicts remember the order that items were inserted, so any 
>> plans to combine collections.OrderedDict with dict?
>> https://docs.python.org/3/library/collections.html?#collections.OrderedDict
>> https://docs.python.org/3/library/stdtypes.html#dict
> 
> No.  There are some major difference.
> 
> * d1 == d2 ignores order / od1 == od2 compares order
> * OrderedDict has move_to_end() method.
> * OrderedDict.pop() takes `last=True` keyword.

In addition to the API differences noted by Naoki, there are also 
implementation differences.  The regular dict implements a low-cost solution 
for common cases.  The OrderedDict has a more complex scheme that can handle 
frequent rearrangements (move_to_end operations) without touching, resizing, or 
reordering the underlying dictionary. Roughly speaking, regular dicts emphasize 
fast, space-efficient core dictionary operations over ordering requirements 
while OrderedDicts prioritize ordering operations over other considerations.

That said, now that regular dicts are ordered by default, the need for 
collections.OrderedDict() should diminish quite a bit.  Mostly, I think people 
will ignore OrderedDict unless their application heavily exercises move to end 
operations.


Raymond


Re: [Python-Dev] Add __reversed__ methods for dict

2018-05-26 Thread Raymond Hettinger

> On May 26, 2018, at 7:20 AM, INADA Naoki  wrote:
> 
> Because doubly linked list is very memory inefficient, every implementation
> would be forced to implement dict like PyPy (and CPython) for efficiency.
> But I don't know much about current MicroPython and other Python
> implementation's
> plan to catch Python 3.6 up.

FWIW, Python 3.7 is the first Python where the language guarantees that 
regular dicts are order preserving.  And the feature being discussed in this 
thread is for Python 3.8.

What potential implementation obstacles do you foresee?  Can you imagine any 
possible way that an implementation would have an order preserving dict but 
would be unable to trivially implement __reversed__?  How could an 
implementation have a __setitem__ that appends at the end, and a popitem() that 
pops from that same end, but still not be able to easily iterate in reverse?  
It really doesn't matter whether an implementer uses a dense array of keys or a 
doubly-linked-list; either way, looping backward is as easy as going forward. 
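A concrete illustration of the symmetry described above, assuming CPython 3.8+ (where this feature ultimately landed):

```python
# dict and its views support reversed(), and popitem() pops from the
# same end that reversed() starts from.
d = {'first': 1, 'second': 2, 'third': 3}

assert list(reversed(d)) == ['third', 'second', 'first']
assert next(reversed(d.items())) == ('third', 3)
assert d.popitem() == ('third', 3)
```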


Raymond


P.S. It isn't going to be hard to update MicroPython to have a compact and 
ordered dict (based on my review of their existing dict implementation).  This 
is something they are really going to want because of the improved memory 
efficiency.  Also, they're also already going to need it just to comply with 
guaranteed keyword argument ordering and guaranteed ordering of class 
dictionaries.


Re: [Python-Dev] PEP 574 (pickle 5) implementation and backport available

2018-05-25 Thread Raymond Hettinger


> On May 24, 2018, at 10:57 AM, Antoine Pitrou  wrote:
> 
> While PEP 574 (pickle protocol 5 with out-of-band data) is still in
> draft status, I've made available an implementation in branch "pickle5"
> in my GitHub fork of CPython:
> https://github.com/pitrou/cpython/tree/pickle5
> 
> Also I've published an experimental backport on PyPI, for Python 3.6
> and 3.7.  This should help people play with the new API and features
> without having to compile Python:
> https://pypi.org/project/pickle5/
> 
> Any feedback is welcome.

Thanks for doing this.

Hope it isn't too late, but I would like to suggest that protocol 5 support 
fast compression by default.  We normally pickle objects so that they can be 
transported (saved to a file or sent over a socket). Transport costs (reading 
and writing a file or socket) are generally proportional to size, so 
compression is likely to be a net win (much as it was for header compression in 
HTTP/2).

The PEP lists compression as a possible refinement only for large objects, 
but I expect it will be a win for most pickles to compress them in their 
entirety.
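For what it's worth, compression never became part of the protocol itself, but the compress-on-transport idea is easy to layer on by hand. A minimal sketch using zlib (names and payload are illustrative):

```python
import pickle
import zlib

# Compress the whole pickle before writing it to a file or socket.
data = {'values': list(range(1000)), 'label': 'measurement ' * 50}

raw = pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)
wire = zlib.compress(raw)
assert len(wire) < len(raw)          # repetitive payloads shrink

# Receiving side: decompress, then unpickle.
restored = pickle.loads(zlib.decompress(wire))
assert restored == data
```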


Raymond


Re: [Python-Dev] Add __reversed__ methods for dict

2018-05-25 Thread Raymond Hettinger


> On May 25, 2018, at 9:32 AM, Antoine Pitrou  wrote:
> 
> It's worth nothing that OrderedDict already supports reversed().
> The argument could go both ways:
> 
> 1. dict is similar to OrderedDict nowadays, so it should support
>   reversed() too;
> 
> 2. you can use OrderedDict to signal explicitly that you care about
>   ordering; no need to add anything to dict.

Those are both valid sentiments :-)

My thought is that guaranteed insertion order for regular dicts is brand new, 
so it will take a while for the notion settle in and become part of everyday 
thinking about dicts.  Once that happens, it is probably inevitable that use 
cases will emerge and that __reversed__ will get added at some point.  The 
implementation seems straightforward and it isn't much of a conceptual leap to 
expect that a finite ordered collection would be reversible.

Given that dicts now track insertion order, it seems reasonable to want to know 
the most recent insertions (i.e. looping over the most recently added tasks in 
a task dict).  Other possible use cases will likely correspond to how we use 
the Unix tail command.  

If those use cases arise, it would be nice for __reversed__ to already be 
supported so that people won't be tempted to implement an ugly workaround using 
popitem() calls followed by reinsertions. 
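The Unix-tail analogy above can be sketched with itertools.islice, assuming CPython 3.8+ (where dict grew __reversed__); the task dict here is hypothetical:

```python
from itertools import islice

# A tail-like view of the most recently inserted keys in a task dict.
tasks = {}
for n in range(10):
    tasks[f'task{n}'] = n

most_recent = list(islice(reversed(tasks), 3))
assert most_recent == ['task9', 'task8', 'task7']
```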


Raymond




Re: [Python-Dev] Hashes in Python3.5 for tuples and frozensets

2018-05-16 Thread Raymond Hettinger


> On May 16, 2018, at 5:48 PM, Anthony Flury via Python-Dev 
>  wrote:
> 
> However the frozen set hash, the same in both cases, as is the hash of the 
> tuples - suggesting that the vulnerability resolved in Python 3.3 wasn't 
> resolved across all potentially hashable values.

You are correct.  The hash randomization only applies to strings.  None of the 
other object hashes were altered.  Whether this is a vulnerability or not 
depends greatly on what is exposed to users (generally strings) and how it is 
used.

For the most part, it is considered a feature that integers hash to themselves. 
 That is very fast to compute :-) Also, it tends to prevent hash collisions for 
consecutive integers.
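A quick demonstration of these CPython behaviors (the hash(-1) detail is CPython-specific, since -1 is reserved as an error code in the C-level hash slot):

```python
# Hash randomization covers str (and bytes), not ints.
# Small ints hash to themselves, except -1, which maps to -2.
assert hash(42) == 42
assert hash(-1) == -2

# Consecutive integers get consecutive hashes, so they spread across
# hash-table slots without colliding.
assert [hash(i) for i in range(5)] == [0, 1, 2, 3, 4]
```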



Raymond


Re: [Python-Dev] PEP 572: Usage of assignment expressions in C

2018-04-30 Thread Raymond Hettinger


> On Apr 28, 2018, at 8:45 AM, Antoine Pitrou  wrote:
> 
>> I personally haven't written a lot of C, so have no personal experience,
>> but if this is at all a common approach among experienced C developers, it
>> tells us a lot.
> 
> I think it's a matter of taste and personal habit.  Some people will
> often do it, some less.  Note that C also has a tendency to make it
> more useful, because doesn't have exceptions, so functions need to
> (ab)use return values when they want to indicate an error.  When you're
> calling such functions (for example I/O functions), you routinely have
> to check for special values indicating an error, so it's common to see
> code such as:
> 
>  // Read up to n bytes from file descriptor
>  if ((bytes_read = read(fd, buf, n)) == -1) {
>  // Error occurred while reading, do something
>  }

Thanks Antoine, this is an important point that I hope doesn't get lost.
In a language with exceptions, assignment expressions are less needful.
Also, the pattern of having mutating methods return None
further limits the utility.
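The C idiom quoted above transliterates directly to the := operator that PEP 572 ultimately added in Python 3.8; a sketch (the BytesIO stream stands in for a real file descriptor):

```python
import io

stream = io.BytesIO(b'abcdefghij')
chunks = []
while (chunk := stream.read(4)):   # read up to 4 bytes per iteration
    chunks.append(chunk)

assert chunks == [b'abcd', b'efgh', b'ij']
```

Note that in Python an I/O error surfaces as an exception rather than a -1 return, which is exactly why embedded assignment carries less weight here than in C.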


Raymond


Re: [Python-Dev] PEP 572: A backward step in readability

2018-04-30 Thread Raymond Hettinger


> On Apr 30, 2018, at 9:37 AM, Steven D'Aprano  wrote:
> 
> On Mon, Apr 30, 2018 at 08:09:35AM +0100, Paddy McCarthy wrote:
> [...]
>> A PEP that can detract from readability; *readability*, a central
>> tenet of Python, should
>> be rejected, (on principle!), when such objections are treated so 
>> dismissively.
> 
> Unless you have an objective measurement of readability, that objection 
> is mere subjective personal preference, and not one that everyone agrees 
> with.

Sorry Steven, but that doesn't seem like it is being fair to Paddy.
Of course, readability can't be measured objectively with a ruler
(that is a false standard).  However, readability is still a real issue
that affects us daily even though objective measurements aren't possible.

All of us who do code reviews make assessments of readability
on a daily basis even though we have no objective measures.
We know hard-to-read code when we see it.

In this thread, several prominent and highly experienced devs
reported finding it difficult to parse some of the examples and
some mis-parsed the semantics of the examples.  It is an objective
fact that they reported readability issues.  That is of great concern
and shouldn't be blown off with a comment that readability,
"is a mere subjective personal preference".  At its heart, readability
is the number one concern in language design.

Also, there is another area where it looks like valid concerns
are being dismissed out of hand.  Several respondents worried
that the proposed feature will lead to writing bad code.  
Their comments seem to have been swept under the table with
responses along the lines of "well any feature can be used badly,
so we don't care about that, some people will write bad code no
matter what we do".  While that is true to some extent, there remains 
a valid issue concerning the propensity for misuse.

ISTM the proposed feature relies on users showing a good deal
of self-restraint and having a clear knowledge of the boundary
between the "clear-win" cases (like the regex match object example)
and the puzzling cases (assignments being used in and-operator
and or-operator chains).  It also relies on people not making
hard-to-find mistakes (like mistyping := when == was intended).

There is a real difference between a feature that could be abused
versus a feature that has a propensity for being misused, being
mistyped, or being misread (all of which have occurred multiple
times in these threads).


> The "not readable" objection has been made, extremely vehemently, 
> against nearly all major syntax changes to Python:

I think that is a false recollection of history.  Comprehensions were
welcomed and highly desired.  Decorators were also highly sought
after -- there was only a question of the best possible syntax. 
The ternary operator was clamored for by an enormous number
of users (though there was little agreement on the best spelling).
Likewise, the case for augmented assignments was somewhat strong
(eliminating having to spell the assignment target twice).

Each of those proposals had their debates, but none of them 
had a bunch of core devs flat-out opposed like we do now.
It really isn't the same at all.

However, even if the history had been recalled correctly, it would
still be a logical fallacy to posit "in the past, people opposed
syntax changes that later proved to be popular, therefore we
should ignore all concerns being expressed today".  To me,
that seems like a rhetorical trick for dismissing a bunch of
thoughtful posts.

Adding this new syntax is a one-way trip -- we don't get to express
regrets later.   Accordingly, it would be nice if the various concerns
being presented were addressed directly rather than being
dismissed with a turn of phrase.  Nor should it matter whether
concerns were articulately expressed (being articulate isn't
always correlated with being right).


Raymond




Re: [Python-Dev] (name := expression) doesn't fit the narrative of PEP 20

2018-04-25 Thread Raymond Hettinger

> On Apr 26, 2018, at 12:40 AM, Tim Peters  wrote:
> 
> [Raymond Hettinger ]
>> After re-reading all the proposed code samples, I believe that
>> adopting the PEP will make the language harder to teach to people
>> who are not already software engineers.
> 
> Can you elaborate on that?  

Just distinguishing between =, :=, and == will be a forever recurring
discussion, far more of a source of confusion than the occasional
question of why Python doesn't have embedded assignment.

Also, it is of concern that a number of prominent core dev
respondents to this thread have reported difficulty scanning
the posted code samples.

> I've used dozens of languages over the
> decades, most of which did have some form of embedded assignment.

Python is special, in part, because it is not one of those languages.
It has virtues that make it suitable even for elementary school children.
We can show well-written Python code to non-computer folks and walk
them through what it does without their brains melting (something I can't
do with many of the other languages I've used).  There is a virtue
in encouraging simple statements that read like English sentences
organized into English-like paragraphs, presenting itself like
"executable pseudocode".

"Perl does it" or "C++ does it" is unpersuasive.  Its omission from Python
was always something that I thought Guido had left out on purpose,
intentionally stepping away from constructs that would be of help
in an obfuscated Python contest.


> Yes, I'm a software engineer, but I've always pitched in on "help
> forums" too.

That's not really the same.  I've taught Python to many thousands
of professionals, almost every week for over six years.  That's given
me a keen sense of what is hard to teach.  It's okay to not agree
with my assessment, but I would like for fruits of my experience
to not be dismissed in a single wisp of a sentence.  Any one feature
in isolation is usually easy to explain, but showing how to combine
them into readable, expressive code is another matter.  And as
Yuri aptly noted, we spend more time reading code than writing code.
If some fraction of our users finds the code harder to scan
because the new syntax, then it would be a net loss for the language.

I hesitated to join this thread because you and Guido seemed to be
pushing back so hard against anyone whose design instincts didn't favor
the new syntax.  It would be nice to find some common ground and
perhaps stipulate that the grammar would grow in complexity, that a new
operator would add to the current zoo of operators, that the visual texture
of the language would change (and in a way that some including me
do not find pleasing), and that while the simplest cases may afford
a small net win, it is a certitude that the syntax will routinely be
pushed beyond our comfort zone.

While the regex conditional example looks like a win, it is very modest win
and IMHO not worth the overall net increase language complexity.

Like Yuri, I'll drop out now.  Hopefully, you all will find some value
in what I had to contribute to the conversation.


Raymond



Re: [Python-Dev] (name := expression) doesn't fit the narrative of PEP 20

2018-04-25 Thread Raymond Hettinger


> On Apr 25, 2018, at 8:11 PM, Yury Selivanov  wrote:
> 
> FWIW I started my thread for allowing '=' in expressions to make sure that
> we fully explore that path.  I don't like ':=' and I thought that using '='
> can make the idea more appealing to myself and others. It didn't, sorry if
> it caused any distraction. Although adding a new ':=' operator isn't my main
> concern.
> 
> I think it's a fact that PEP 572 makes Python more complex.
> Teaching/learning Python will inevitably become harder, simply because
> there's one more concept to learn.
> 
> Just yesterday this snippet was used on python-dev to show how great the
> new syntax is:
> 
>  my_func(arg, buffer=(buf := [None]*get_size()), size=len(buf))
> 
> To my eye this is an anti-pattern.  One line of code was saved, but the
> other line becomes less readable.  The fact that 'buf' can be used after
> that line means that it will be harder for a reader to trace the origin of
> the variable, as a top-level "buf = " statement would be more visible.
> 
> The PEP lists this example as an improvement:
> 
>  [(x, y, x/y) for x in input_data if (y := f(x)) > 0]
> 
> I'm an experienced Python developer and I can't read/understand this
> expression after one read. I have to read it 2-3 times before I trace where
> 'y' is set and how it's used.  Yes, an expanded form would be ~4 lines
> long, but it would be simple to read and therefore review, maintain, and
> update.
> 
> Assignment expressions seem to optimize the *writing code* part, while
> making *reading* part of the job harder for some of us.  I write a lot of
> Python, but I read more code than I write. If the PEP gets accepted I'll
> use
> the new syntax sparingly, sure.  My main concern, though, is that this PEP
> will likely make my job as a code maintainer harder in the end, not easier.
> 
> I hope I explained my -1 on the PEP without sounding emotional.

FWIW, I concur with all of Yuri's thoughtful comments.

After re-reading all the proposed code samples, I believe that
adopting the PEP will make the language harder to teach to people
who are not already software engineers.  To my eyes, the examples
give ample opportunity for being misunderstood and will create a
need to puzzle-out the intended semantics.

On the plus side, the proposal does address the occasional minor
irritant of writing an assignment on a separate line.  On the minus side,
the visual texture of the new code is less appealing. The proposal
also messes with my mental model for the distinction between
expressions and statements.

It probably doesn't matter at this point (minds already seem to be made up),
but put me down for -1.   This is a proposal we can all easily live without.


Raymond



Re: [Python-Dev] PEP 575: Unifying function/method classes

2018-04-15 Thread Raymond Hettinger


> On Apr 15, 2018, at 5:50 AM, Jeroen Demeyer  wrote:
> 
> On 2018-04-14 23:14, Guido van Rossum wrote:
>> That actually sounds like a pretty big problem. I'm sure there is lots
>> of code that doesn't *just* duck-type nor calls inspect but uses
>> isinstance() to decide how to extract the desired information.
> 
> In the CPython standard library, the *only* fixes that are needed because of 
> this are in:
> 
> - inspect (obviously)
> - doctest (to figure out the __module__ of an arbitrary object)
> - multiprocessing.reduction (something to do with pickling)
> - xml.etree.ElementTree (to determine whether a certain method was overridden)
> - GDB support
> 
> I've been told that there might also be a problem with Random._randbelow, 
> even though it doesn't cause test failures.

Don't worry about Random._randbelow, we're already working on it and it is an 
easy fix.  Instead, focus on Guido's comment. 

> The fact that there is so little breakage in the standard library makes 
> me confident that the problem is not so bad. And in the cases where it 
> does break, it's usually pretty easy to fix.

I don't think that confidence is warranted.  The world of Python is very 
large.  When public APIs (such as those in the venerable types module) get 
changed, it is virtually assured that some code will break.


Raymond


Re: [Python-Dev] PEP 575: Unifying function/method classes

2018-04-13 Thread Raymond Hettinger

> On Apr 12, 2018, at 9:12 AM, Jeroen Demeyer  wrote:
> 
> I would like to request a review of PEP 575, which is about changing the 
> classes used for built-in functions and Python functions and methods. The 
> text of the PEP can be found at
> 
> https://www.python.org/dev/peps/pep-0575/

Thanks for doing this work.  The PEP is well written and I'm +1 on the general 
idea of what it's trying to do (I'm still taking in all the details).

It would be nice to have a section that specifically discusses the implications 
with respect to other existing function-like tooling:  classmethod, 
staticmethod, partial, itemgetter, attrgetter, methodcaller, etc.

Also, please mention the backward compatibility issue that will arise for code 
that currently relies on types.MethodType, types.BuiltinFunctionType, 
types.BuiltinMethodType, etc.  For example, I would need to update the code in 
random._randbelow().  That code uses the existing builtin-vs-pure-python type 
distinctions to determine whether either the random() or getrandbits() methods 
have been overridden.   This is likely an easy change for me to make, but there 
may be code like it in the wild, code that would be broken if the distinction is 
lost.


Raymond



Re: [Python-Dev] Soliciting comments on the future of the cmd module (bpo-33233)

2018-04-06 Thread Raymond Hettinger


> On Apr 6, 2018, at 3:02 PM, Ned Deily  wrote:
> 
> We could be even bolder and officially deprecate "cmd" and consider closing 
> open enhancement issues for it on b.p.o."

FWIW, the pdb module depends on the cmd module.

Also, I still teach people how to use cmd and I think it still serves a useful 
purpose.  So, unless it is considered broken, I don't think it should be 
deprecated.


Raymond






Re: [Python-Dev] Replacing self.__dict__ in __init__

2018-03-25 Thread Raymond Hettinger
On Mar 25, 2018, at 8:08 AM, Tin Tvrtković  wrote:
> 
> That's reassuring, thanks.

I misspoke.  The object size is the same but the underlying dictionary loses 
key-sharing and doubles in size.

Raymond



Re: [Python-Dev] Replacing self.__dict__ in __init__

2018-03-24 Thread Raymond Hettinger

> On Mar 24, 2018, at 7:18 AM, Tin Tvrtković  wrote:
> 
> it's faster to do this:
> 
> self.__dict__ = {'a': a, 'b': b, 'c': c}
> 
> i.e. to replace the instance dictionary altogether. On PyPy, their core devs 
> inform me this is a bad idea because the instance dictionary is special 
> there, so we won't be doing this on PyPy. 
> 
> But is it safe to do on CPython?

This should work. I've seen it done in other production tools without any ill 
effect.

The dict can be replaced during __init__() and still get benefits of 
key-sharing.  That benefit is lost only when the instance dict keys are 
modified downstream from __init__().  So, from a dict size point of view, your 
optimization is fine.

Still, you should look at whether this would affect static type checkers, lint 
tools, and other tooling.
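A sketch of the technique under discussion, together with the size effect noted in the 2018-03-25 correction above (the class names are illustrative, and the exact byte counts are CPython implementation details):

```python
import sys

class SharedKeys:
    def __init__(self, a, b, c):
        self.a = a          # attribute-by-attribute: keeps key-sharing
        self.b = b
        self.c = c

class ReplacedDict:
    def __init__(self, a, b, c):
        # Wholesale replacement of the instance dict.
        self.__dict__ = {'a': a, 'b': b, 'c': c}

# Both approaches work functionally...
assert vars(SharedKeys(1, 2, 3)) == vars(ReplacedDict(1, 2, 3))

# ...but the replacement is an ordinary dict that gives up key-sharing,
# so it is at least as large as the shared-keys instance dict.
shared = sys.getsizeof(vars(SharedKeys(1, 2, 3)))
replaced = sys.getsizeof(vars(ReplacedDict(1, 2, 3)))
assert replaced >= shared
```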


Raymond


Re: [Python-Dev] Symmetry arguments for API expansion

2018-03-13 Thread Raymond Hettinger

> On Mar 13, 2018, at 12:07 PM, Guido van Rossum  wrote:
> 
> OK, please make it so.

Will do.  I'll create a tracker issue right away.

Since this one looks easy (as many things do at first), I would like to assign 
it to Nofar Schnider (one of my mentees).


Raymond



> 
> On Tue, Mar 13, 2018 at 11:39 AM, Raymond Hettinger 
>  wrote:
> 
> 
> > On Mar 13, 2018, at 10:43 AM, Guido van Rossum  wrote:
> >
> > So let's make as_integer_ratio() the standard protocol for "how to make a 
> > Fraction out of a number that doesn't implement numbers.Rational". We 
> > already have two examples of this (float and Decimal) and perhaps numpy or 
> > the sometimes proposed fixed-width decimal type can benefit from it too. If 
> > this means we should add it to int, that's fine with me.
> 
> I would like that outcome.
> 
> The signature x.as_integer_ratio() -> (int, int) is pleasant to work with.  
> The output is easy to explain, and the denominator isn't tied to powers of 
> two or ten. Since Python ints are exact and unbounded, there isn't any worry 
> about range or rounding issues.
> 
> In contrast, math.frexp(float) -> (float, int) is a bit of a pain because it 
> still leaves you in the domain of floats rather than letting you decompose to 
> more basic types.  It's nice to have a way to move down the chain from 
> ℚ, ℝ, or ℂ to the more basic ℤ (of course, that only works because floats and 
> complex are implemented in a way that precludes exact irrationals).
> 
> 
> Raymond
> 
> 
> 
> 
> 
> -- 
> --Guido van Rossum (python.org/~guido)



Re: [Python-Dev] Symmetry arguments for API expansion

2018-03-13 Thread Raymond Hettinger


> On Mar 13, 2018, at 10:43 AM, Guido van Rossum  wrote:
> 
> So let's make as_integer_ratio() the standard protocol for "how to make a 
> Fraction out of a number that doesn't implement numbers.Rational". We already 
> have two examples of this (float and Decimal) and perhaps numpy or the 
> sometimes proposed fixed-width decimal type can benefit from it too. If this 
> means we should add it to int, that's fine with me.

I would like that outcome.  

The signature x.as_integer_ratio() -> (int, int) is pleasant to work with.  The 
output is easy to explain, and the denominator isn't tied to powers of two or 
ten. Since Python ints are exact and unbounded, there isn't any worry about 
range or rounding issues.

In contrast, math.frexp(float) -> (float, int) is a bit of a pain because it still 
leaves you in the domain of floats rather than letting you decompose to more 
basic types.  It's nice to have a way to move down the chain from ℚ, ℝ, or 
ℂ to the more basic ℤ (of course, that only works because floats and complex 
are implemented in a way that precludes exact irrationals).
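A small sketch of the protocol in action (Decimal.as_integer_ratio() exists as of Python 3.6; the ratios below are exact because the inputs are exactly representable):

```python
from decimal import Decimal
from fractions import Fraction

# float -> exact ratio in lowest terms (0.75 is exactly representable in binary)
assert (0.75).as_integer_ratio() == (3, 4)

# Decimal -> also reduced to lowest terms, not tied to powers of ten
assert Decimal('1.5').as_integer_ratio() == (3, 2)

# Any type supplying the method can feed the Fraction constructor
assert Fraction(*(2.5).as_integer_ratio()) == Fraction(5, 2)
```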


Raymond




Re: [Python-Dev] Symmetry arguments for API expansion

2018-03-12 Thread Raymond Hettinger

> On Mar 12, 2018, at 12:15 PM, Guido van Rossum  wrote:
> 
> There's a reason why adding this to int feels right to me. In mypy we treat 
> int as a sub*type* of float, even though technically it isn't a sub*class*. 
> The absence of an is_integer() method on int means that this code has a bug 
> that mypy doesn't catch:
> 
> def f(x: float):
> if x.is_integer():
> "do something"
> else:
> "do something else"
> 
> f(12)

Do you have any thoughts about the other non-corresponding float methods?

>>> set(dir(float)) - set(dir(int))
   {'as_integer_ratio', 'hex', '__getformat__', 'is_integer', '__setformat__', 
'fromhex'}

In general, would you prefer that functionality like is_integer() be a math 
module function or that it should be a method on all numeric types except 
Complex?  I expect questions like this to recur over time.

Also, do you have any thoughts on the feature itself?  Serhiy ran a GitHub 
search and found that it was baiting people into worrisome code like:  
(x/5).is_integer() or (x**0.5).is_integer()
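A hedged illustration of why that pattern is worrisome: every float whose magnitude reaches 2**52 is integer-valued, so the divisibility "test" silently stops discriminating for large arguments:

```python
# The traditional, portable test still works:
assert int(10.0) == 10.0
assert int(10.5) != 10.5

# The trap: x is NOT divisible by 5, but the float check says it is,
# because x / 5 lands far above 2**52 and is therefore integer-valued.
x = 10**23 + 1
assert x % 5 != 0
assert (x / 5).is_integer()
```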

> So I think the OP of the bug has a valid point, 27 years without this feature 
> notwithstanding.

Okay, I'll ask the OP to update his patch :-)


Raymond


[Python-Dev] Symmetry arguments for API expansion

2018-03-12 Thread Raymond Hettinger
There is a feature request and patch to propagate the float.is_integer() API 
through rest of the numeric types ( https://bugs.python.org/issue26680 ).

While I don't think it is a good idea, the OP has been persistent and wants his 
patch to go forward.  

It may be worthwhile to discuss on this list to help resolve this particular 
request and to address the more general, recurring design questions. Once a 
feature with a marginally valid use case is added to an API, it is common for 
us to get downstream requests to propagate that API to other places where it 
makes less sense but does restore a sense of symmetry or consistency.  In cases 
where an abstract base class is involved, acceptance of the request is usually 
automatic (e.g. range() and tuple() objects growing index() and count() 
methods).  However, when our hand hasn't been forced, there is still an 
opportunity to decline.  That said, proponents of symmetry requests tend to 
feel strongly about it and tend to never fully accept such a request being 
declined (it leaves them with a sense that Python is disordered and unbalanced).


Raymond


--- My thoughts on the feature request ---

What is the proposal?
* Add an is_integer() method to int(), Decimal(), Fraction(), and Real(). 
Modify Rational() to provide a default implementation.

Starting point: Do we need this?
* We already have a simple, traditional, portable, and readable way to make the 
test:  int(x) == x
* In the context of ints, the test x.is_integer() always returns True.  This 
isn't very useful.
* Aside from the OP, this behavior has never been requested in Python's 27 year 
history.

Does it cost us anything?
* Yes, adding a method to the numeric tower makes it a requirement for every 
class that ever has or ever will register or inherit from the tower ABCs.
* Adding methods to a core object such as int() increases the cognitive load 
for everyday users who look at dir(), call help(), or read the main docs.
* It conflicts with a design goal for the decimal module to not invent new 
functionality beyond the spec unless essential for integration with the rest of 
the language.  The reasons included portability with other implementations and 
not trying to guess what the committee would have decided in the face of tricky 
questions such as whether Decimal('1.01').is_integer()
should return True when the context precision is only three decimal places 
(i.e. whether context precision and rounding traps should be applied before the 
test and whether context flags should change after the test).

Shouldn't everything in a concrete class also be in an ABC and all its 
subclasses?
* In general, the answer is no.  The ABCs are intended to span only basic 
functionality.  For example, GvR intentionally omitted update() from the Set() 
ABC because the need was fulfilled by __ior__().
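As a sketch of that point: a class supplying only the five abstract MutableSet methods still gets in-place union, so an update()-style operation is already covered by the ABC's __ior__ mixin (the class name here is invented for illustration):

```python
from collections.abc import MutableSet

class ObservableSet(MutableSet):
    """Minimal MutableSet: only the five abstract methods are supplied."""
    def __init__(self, iterable=()):
        self._data = set(iterable)
    def __contains__(self, item):
        return item in self._data
    def __iter__(self):
        return iter(self._data)
    def __len__(self):
        return len(self._data)
    def add(self, item):
        self._data.add(item)
    def discard(self, item):
        self._data.discard(item)

s = ObservableSet({1, 2})
s |= {3, 4}          # __ior__ comes for free from the ABC mixin
assert sorted(s) == [1, 2, 3, 4]
```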

But int() already has real, imag, numerator, and denominator, why is this 
different?
* Those attributes are central to the functioning of the numeric tower.
* In contrast, the is_integer() method is a peripheral and incidental concept.

What does "API Parsimony" mean?
* Avoidance of feature creep.
* Preference for only one obvious way to do things.
* Practicality (not craving things you don't really need) beats purity 
(symmetry and foolish consistency).
* YAGNI suggests holding off in the absence of clear need.
* Recognition that smaller APIs are generally better for users.

Are there problems with symmetry/consistency arguments?
* The need for guard rails on an overpass doesn't imply the same need on an 
underpass even though both are in the category of grade-changing byways.
* "In for a penny, in for a pound" isn't a principle of good design; rather, it 
is a slippery slope whereby the acceptance of a questionable feature in one 
place seems to compel later decisions to propagate the feature to other places 
where the cost / benefit trade-offs are less favorable.

Should float.is_integer() have ever been added in the first place?
* Likely, it should have been a math module function like isclose() and isinf() 
so that it would not have been type specific.
* However, that ship has sailed; instead, the question is whether we now have 
to double down and have to dispatch other ships as well.
* There is some question as to whether it is even a good idea to be testing the 
results of floating point calculations for exact values. It may be useful for 
testing inputs, but is likely a trap for people using it in other contexts.

Have we ever had problems with just accepting requests solely based on symmetry?
* Yes.  The str.startswith() and str.endswith() methods were given optional 
start/end arguments to be consistent with str.index(), not because there were 
known use cases where code was made better with the new feature.   This ended 
up conflicting with a later feature request that did have valid use cases 
(supporting multiple test prefixes/suffixes).  As a result, we ended up with an 
awkward and error-prone API.

[Python-Dev] Should the dataclass frozen property apply to subclasses?

2018-02-21 Thread Raymond Hettinger
When working on the docs for dataclasses, something unexpected came up.  If a 
dataclass is specified to be frozen, that characteristic is inherited by 
subclasses which prevents them from assigning additional attributes:

>>> @dataclass(frozen=True)
... class D:
...     x: int = 10

>>> class S(D):
...     pass

>>> s = S()
>>> s.cached = True
Traceback (most recent call last):
  File "", line 1, in 
s.cached = True
  File 
"/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/dataclasses.py",
 line 448, in _frozen_setattr
raise FrozenInstanceError(f'cannot assign to field {name!r}')
dataclasses.FrozenInstanceError: cannot assign to field 'cached'

Other immutable classes in Python don't behave the same way:


>>> class T(tuple):
...     pass

>>> t = T([10, 20, 30])
>>> t.cached = True

>>> class F(frozenset):
...     pass

>>> f = F([10, 20, 30])
>>> f.cached = True

>>> class B(bytes):
...     pass

>>> b = B()
>>> b.cached = True


Raymond


Re: [Python-Dev] Dataclasses, frozen and __post_init__

2018-02-20 Thread Raymond Hettinger


> On Feb 20, 2018, at 2:38 PM, Guido van Rossum  wrote:
> 
> But then the class would also inherit a bunch of misfeatures from tuple (like 
> being indexable and having a length). It would be nicer if it used __slots__ 
> instead.


FWIW, George Sakkis made a tool like this about nine years ago.  
https://code.activestate.com/recipes/576555-records  It would need to be 
modernized to include default arguments, type annotations and whatnot, but 
otherwise it has great performance and low API complexity.

> (Also, the problem with __slots__ is the same as the problem with inheriting 
> from tuple, and it should just be solved right, somehow.)

Perhaps a new variant of __init_subclass__ would work.



Raymond




Re: [Python-Dev] Is static typing still optional?

2018-01-29 Thread Raymond Hettinger


> On Jan 28, 2018, at 11:52 PM, Eric V. Smith  wrote:
> 
> I think it would be a bad design to have to opt-in to hashability if using 
> frozen=True. 

I respect that you see it that way, but it doesn't make sense to me. You can 
have either one without the other.  It seems to me that it is clearer and more 
explicit to just say what you want rather than having implicit logic guess at 
what you meant.  Otherwise, when something goes wrong, it is difficult to debug.

The tooltips for the dataclass decorator are essentially a checklist of 
features that can be turned on or off.  That list of features is mostly 
easy-to-use except for hash=None which has three possible values, only one of 
which is self-evident.

We haven't had much in the way of user testing, so it is a significant data 
point that one of your first users (me) was confounded by this API.  I 
recommend putting various correct and incorrect examples in front of other 
users (preferably experienced Python programmers) and asking them to predict 
what the code does based on the source code.


Raymond







Re: [Python-Dev] Is static typing still optional?

2018-01-28 Thread Raymond Hettinger

>>> 2) Change the default value for "hash" from "None" to "False".  This might 
>>> take a little effort because there is currently an oddity where setting 
>>> hash=False causes it to be hashable.  I'm pretty sure this wasn't intended 
>>> ;-)
>> I haven't looked at this yet.
> 
> I think the hashing logic explained in 
> https://bugs.python.org/issue32513#msg310830 is correct. It uses hash=None as 
> the default, so that frozen=True objects are hashable, which they would not 
> be if hash=False were the default.

Wouldn't it be simpler to make the options orthogonal?  Frozen need not imply 
hashable.  I would think if a user wants frozen and hashable, they could just 
write frozen=True and hashable=True.  That would be more explicit and clearer 
than just having frozen=True imply that hashability gets turned on implicitly 
whether you want it or not.
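(For the record, the API that eventually shipped in 3.7 dropped the tri-state hash parameter in favor of unsafe_hash, and frozen=True together with the default eq=True does make instances hashable. A minimal sketch, with an invented class name:)

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Point:
    x: int
    y: int

# frozen=True plus the default eq=True yields a usable __hash__
assert hash(Point(1, 2)) == hash(Point(1, 2))
assert Point(1, 2) == Point(1, 2)
```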

> If there's some case there that you disagree with, I'd be interested in 
> hearing about it.
> 
> That logic is what is currently scheduled to go in to 3.7 beta 1. I have not 
> updated the PEP yet, mostly because it's so difficult to explain.

That might be a strong hint that this part of the API needs to be simplified :-)

"If the implementation is hard to explain, it's a bad idea." -- Zen

If for some reason, dataclasses really do need tri-state logic, it may be 
better off with enum values (NOT_HASHABLE, VALUE_HASHABLE, IDENTITY_HASHABLE, 
HASHABLE_IF_FROZEN or some such) rather than with None, True, and False which 
don't communicate enough information to understand what the decorator is doing.

> What's the case where setting hash=False causes it to be hashable? I don't 
> think that was ever the case, and I hope it's not the case now.

Python 3.7.0a4+ (heads/master:631fd38dbf, Jan 28 2018, 16:20:11) 
[GCC 7.2.0] on darwin
Type "copyright", "credits" or "license()" for more information.

>>> from dataclasses import dataclass
>>> @dataclass(hash=False)
... class A:
...     x: int

>>> hash(A(1))
285969507


I'm hoping that this part of the API gets thought through before it gets set in 
stone.  Since dataclasses code never got a chance to live in the wild (on PyPI 
or some such), it behooves us to think through all the usability issues.  To me 
at least, the tri-state hashability was entirely unexpected and hard to debug 
-- I had to do a close reading of the source to figure-out what was happening.


Raymond




Re: [Python-Dev] Concerns about method overriding and subclassing with dataclasses

2017-12-30 Thread Raymond Hettinger

> On Dec 29, 2017, at 4:52 PM, Guido van Rossum  wrote:
> 
> I still think it should override anything that's just inherited but nothing 
> that's defined in the class being decorated.

This has the virtue of being easy to explain, and it will help with debugging 
by honoring the code proximate to the decorator :-)

For what it is worth, the functools.total_ordering class decorator does 
something similar -- though not exactly the same.  A root comparison method is 
considered user-specified if it is different than the default method provided 
by object: 

def total_ordering(cls):
    """Class decorator that fills in missing ordering methods"""
    # Find user-defined comparisons (not those inherited from object).
    roots = {op for op in _convert
             if getattr(cls, op, None) is not getattr(object, op, None)}
    ...
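A minimal usage sketch of that decorator (class and fields invented for illustration): supply __eq__ plus one ordering method, and the rest are filled in because they differ from object's defaults.

```python
from functools import total_ordering

@total_ordering
class Version:
    def __init__(self, major, minor):
        self.major, self.minor = major, minor
    def __eq__(self, other):
        return (self.major, self.minor) == (other.major, other.minor)
    def __lt__(self, other):
        return (self.major, self.minor) < (other.major, other.minor)

# __le__, __gt__, and __ge__ are synthesized from __eq__ and __lt__
assert Version(1, 2) <= Version(1, 3)
assert Version(2, 0) > Version(1, 9)
```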

The @dataclass decorator has a much broader mandate and we have almost no 
experience with it, so it is hard to know what legitimate use cases will arise.


Raymond



Re: [Python-Dev] pep-0557 dataclasses top level module vs part of collections?

2017-12-21 Thread Raymond Hettinger


> On Dec 21, 2017, at 3:21 PM, Gregory P. Smith  wrote:
> 
> It seems a suggested use is "from dataclasses import dataclass"
> 
> But people are already familiar with "from collections import namedtuple" 
> which suggests to me that "from collections import dataclass" would be a more 
> natural sounding API addition.

This might make sense if it were a single self-contained function.  But 
dataclasses are their own little ecosystem that warrants its own module 
namespace:

>>> import dataclasses
>>> dataclasses.__all__
['dataclass', 'field', 'FrozenInstanceError', 'InitVar', 'fields', 'asdict', 
'astuple', 'make_dataclass', 'replace']

Also, remember that dataclasses have a dual role as a data holder (which is 
collection-like) and as a generator of boilerplate code (which is more like 
functools.total_ordering).

I support Eric's decision to make this a separate module.


Raymond




Re: [Python-Dev] Guarantee ordered dict literals in v3.7?

2017-12-15 Thread Raymond Hettinger


> On Dec 15, 2017, at 1:47 PM, Guido van Rossum  wrote:
> 
> On Fri, Dec 15, 2017 at 12:45 PM, Raymond Hettinger 
>  wrote:
> 
> > On Dec 15, 2017, at 7:53 AM, Guido van Rossum  wrote:
> >
> > Make it so. "Dict keeps insertion order" is the ruling.
> 
> On Twitter, someone raised an interesting question.
> 
> Is the guarantee just for 3.7 and later?  Or will the blessing also cover 3.6 
> where it is already true.
> 
> The 3.6 guidance is to use OrderedDict() when ordering is required.  As of 
> now, that guidance seems superfluous and may no longer be a sensible 
> practice.  For example, it would be nice for Eric Smith when he does his 3.6 
> dataclasses backport to not have to put OrderedDict back in the code.
> 
> For 3.6 we can't change the language specs, we can just document how it works 
> in CPython. I don't know what other Python implementations do in their 
> version that's supposed to be compatible with 3.6 but I don't want to 
> retroactively declare them non-conforming. (However for 3.7 they have to 
> follow suit.) I also don't think that the "it stays ordered across deletions" 
> part of the ruling is true in CPython 3.6.

FWIW, the regular dict does stay ordered across deletions in CPython 3.6:

>>> d = dict(a=1, b=2, c=3, d=4)
>>> del d['b']
>>> d['b'] = 5
>>> d
{'a': 1, 'c': 3, 'd': 4, 'b': 5}

Here's a more interesting demonstration:

from random import randrange, shuffle
from collections import OrderedDict

population = 100
s = list(range(population // 4))
shuffle(s)
d = dict.fromkeys(s)
od = OrderedDict.fromkeys(s)
for i in range(50):
    k = randrange(population)
    d[k] = i
    od[k] = i
    k = randrange(population)
    if k in d:
        del d[k]
        del od[k]
    assert list(d.items()) == list(od.items())

The dict object insertion logic just appends to the arrays of keys, values, and 
hashvalues.  When the number of usable elements decreases to zero (reaching the 
limit of the most recent array allocation), the dict is resized (compacted) 
left-to-right so that order is preserved.

Here are some of the relevant sections from the 3.6 source tree:

Objects/dictobject.c line 89:

Preserving insertion order

It's simple for combined table.  Since dk_entries is mostly append only, we can
get insertion order by just iterating dk_entries.

One exception is .popitem().  It removes last item in dk_entries and decrement
dk_nentries to achieve amortized O(1).  Since there are DKIX_DUMMY remains in
dk_indices, we can't increment dk_usable even though dk_nentries is
decremented.

In split table, inserting into pending entry is allowed only for dk_entries[ix]
where ix == mp->ma_used. Inserting into other index and deleting item cause
converting the dict to the combined table.

Objects/dictobject.c::insertdict() line 1140:

    if (mp->ma_keys->dk_usable <= 0) {
        /* Need to resize. */
        if (insertion_resize(mp) < 0) {
            Py_DECREF(value);
            return -1;
        }
        hashpos = find_empty_slot(mp->ma_keys, key, hash);
    }

Objects/dictobject.c::dictresize() line 1282:

    PyDictKeyEntry *ep = oldentries;
    for (Py_ssize_t i = 0; i < numentries; i++) {
        while (ep->me_value == NULL)
            ep++;
        newentries[i] = *ep++;
    }

> 
> I don't know what guidance to give Eric, because I don't know what other 
> implementations do nor whether Eric cares about being compatible with those. 
> IIUC micropython does not guarantee this currently, but I don't know if they 
> claim Python 3.6 compatibility -- in fact I can't find any document that 
> specifies the Python version they're compatible with more precisely than 
> "Python 3".


I did a little research and here's what I found:

"MicroPython aims to implement the Python 3.4 standard (with selected features 
from later versions)" 
-- http://docs.micropython.org/en/latest/pyboard/reference/index.html

"PyPy is a fast, compliant alternative implementation of the Python language 
(2.7.13 and 3.5.3)."
-- http://pypy.org/

"Jython 2.7.0 Final Released (May 2015)"
-- http://www.jython.org/

"IronPython 2.7.7 released on 2016-12-07"
-- http://ironpython.net/

So, it looks like you could say 3.6 does whatever CPython 3.6 already does and 
not worry about leaving other implementations behind.  (And PyPy is actually 
ahead of us here, having had compact and order-preserving dicts for quite a while.)

Cheers,


Raymond



Re: [Python-Dev] Guarantee ordered dict literals in v3.7?

2017-12-15 Thread Raymond Hettinger

> On Dec 15, 2017, at 7:53 AM, Guido van Rossum  wrote:
> 
> Make it so. "Dict keeps insertion order" is the ruling.

On Twitter, someone raised an interesting question.  

Is the guarantee just for 3.7 and later?  Or will the blessing also cover 3.6 
where it is already true.

The 3.6 guidance is to use OrderedDict() when ordering is required.  As of now, 
that guidance seems superfluous and may no longer be a sensible practice.  For 
example, it would be nice for Eric Smith when he does his 3.6 dataclasses 
backport to not have to put OrderedDict back in the code.  

Do you still have the keys to the time machine?


Raymond




Re: [Python-Dev] New crash in test_embed on macOS 10.12

2017-12-15 Thread Raymond Hettinger


> On Dec 15, 2017, at 11:55 AM, Barry Warsaw  wrote:
> 
> I haven’t bisected this yet, but with git head, built and tested on macOS 
> 10.12.6 and Xcode 9.2, I’m seeing this crash in test_embed:
> 
> ==
> FAIL: test_bpo20891 (test.test_embed.EmbeddingTests)
> --
> Traceback (most recent call last):
>  File "/Users/barry/projects/python/cpython/Lib/test/test_embed.py", line 
> 207, in test_bpo20891
>out, err = self.run_embedded_interpreter("bpo20891")
>  File "/Users/barry/projects/python/cpython/Lib/test/test_embed.py", line 59, 
> in run_embedded_interpreter
>(p.returncode, err))
> AssertionError: -6 != 0 : bad returncode -6, stderr is 'Fatal Python error: 
> PyEval_SaveThread: NULL tstate\n\nCurrent thread 0x7fffcb58a3c0 (most 
> recent call first):\n'
> 
> Seems reproducible across different machines (all running 10.12.6 and Xcode 
> 9.2), even after a make clean and configure.  I don’t see the same failure on 
> Debian, and I don’t see the crashes on the buildbots.
> 
> Can anyone verify?

I saw this same test failure.  After a "make distclean", it went away.


Raymond


  1   2   3   4   5   6   7   8   9   10   >