[Python-Dev] Re: Restrict the type of __slots__
> I propose to restrict the type of __slots__.

-1 for adding a restriction. This breaks code for no good reason. This API has been around for a very long time. I've seen lists, tuples, dicts, single strings, and occasionally something more exotic. Why wreck stable code?

Also, the inspect module will detect whether __slots__ is a dictionary and will use it to display docstrings. In the database world, data dictionaries have proven value, so it would be a bummer to kill off this functionality, which is used in much the same way as docstrings for properties. It is still rarely used, but I'm hoping it will catch on (just as people are slowly growing more aware that they can add docstrings to fields in named tuples).

Raymond

On Fri, Mar 18, 2022 at 4:33 AM Serhiy Storchaka wrote:
> Currently __slots__ can be either a string or an iterable of strings.
>
> 1. If it is a string, it is the name of a single slot. Third-party code
> which iterates __slots__ will be confused.
>
> 2. If it is an iterable, it should emit names of slots. Note that
> non-reiterable iterators are accepted too, but this causes weird bugs if
> __slots__ is iterated more than once. For example, it breaks default
> pickling and copying.
>
> I propose to restrict the type of __slots__. Require it to always be a
> tuple of strings. Most __slots__ in real code are tuples. It is rare
> that we need only a single slot and set __slots__ to a string.
>
> It will break some code (there are 2 occurrences in the stdlib and 1 in
> scripts), but that code can be easily fixed.
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/E5ROGDNKI5FFPTXBQGHUQSVVHCAB7VUT/
Code of Conduct: http://python.org/psf/codeofconduct/
[Python-Dev] Re: Optimizing literal comparisons and contains
For the benefit of the audience on python-dev, you should also mention that this proposal and associated PR have been twice discussed and rejected on the tracker:

https://bugs.python.org/issue45907
https://bugs.python.org/issue45843

The response just given by Skip pretty much matches the comments already given by Batuhan, Pablo, and Serhiy. So far, no one who has looked at this thinks this should be done.
[Python-Dev] Re: PEP 467 feedback from the Steering Council
> I would rather keep `bchr` and lose the `.fromint()` methods.

For me, "bchr" isn't a readable name. If I mentally expand it to "byte_character", it becomes an oxymoron that opposes what we try to teach about bytes and characters being different things.

Can you show examples in existing code of how this would be used? I'm unclear on how frequently users need to create a single byte from an integer. For me, it is very rare. Perhaps once in a large program will I search for a record separator in binary data. I would prefer to write it as:

    RS = byte.fromint(30)
    ...
    i = data.index(RS, start)
    ...
    if RS in data:

Having this as bchr() wouldn't make the code better because it is less explicit about turning an integer into a byte. Also, it doesn't look nice when in-lined without giving it a variable name:

    i = data.index(bchr(30), start)    # Yuck
    ...
    if bchr(30) in data:               # Yuck

Also keep in mind that we already have a way to spell it, "bytes([30])", so any new way needs to add significantly more clarity. I think bytes.fromint() does that. The number of use cases also matters. The bar for adding a new builtin function is very high.

Raymond
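Since bytes.fromint() does not exist yet, the comparison can be sketched with a stand-in helper (the helper and the sample data below are made up for illustration):

```python
def fromint(i):
    """Hypothetical stand-in for the proposed bytes.fromint()."""
    return bytes([i])                  # today's spelling

RS = fromint(30)                       # ASCII record separator
data = b'field1\x1efield2'

i = data.index(RS)                     # reads better than data.index(bchr(30), ...)
assert i == 6
assert RS in data
```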
[Python-Dev] Re: PEP 467 feedback from the Steering Council
I recommend removing the "discouragement" from writing "bytes(10)". That is merely stylistic. As long as we support the API, it is valid Python. In the contexts where it is currently used, it tends to be clear about what it is doing: buffer = bytearray(bufsize). That doesn't need to be discouraged.

Also, I concur with the SC comment that the singular of bytearray() or bytes() is byte(), not bchr(). Practically, what people want here is an efficient literal that is easier to write than b'\x1F'. I don't think bchr() meets that need. Neither bchr(0x1f) nor bytearray.fromint(0x1f) is fast (neither is a literal), nor is either easier to read or type.

The history of bytes/bytearray is a dual-purpose view. It can be used in a string-like way to emulate Python 2 string handling (hence all the usual string methods and a repr that displays in a string-like fashion). It can also be used as an array of numbers, 0 to 255 (hence the list methods and having an iterator of ints). ISTM that the authors of this PEP reject or want to discourage the latter use cases. This is disappointing because often the only reasonable way to manipulate binary data is with bytearrays. A user could switch to array.array() or a numpy.array, but that is unnecessarily inconvenient given that we already have a nice builtin type that meets the need (for images, crypto hashes, compression, bloom filters, or anything where a C programmer would use an array of unsigned chars).

Given that bytes/bytearray is already an uncomfortable hybrid of string and list APIs for binary data, I don't think the competing views and APIs will be disentangled by adding methods that duplicate functionality that already exists. Instead, I recommend that the PEP focus on one or two cases where methods could be added to simplify any common tasks that are currently awkward. For example, creating a single byte with bytes([0x1f]) isn't pleasant, obvious, or fast.
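The dual string-like/array-like nature described above is easy to demonstrate (a small illustration of existing behavior, not from the original post):

```python
buf = bytearray(4)                     # the bytearray(bufsize) zero-fill form
assert buf == bytearray(b'\x00\x00\x00\x00')

buf[0] = 0x1f                          # array-of-ints view: values 0..255
assert list(buf) == [0x1f, 0, 0, 0]

assert bytes([0x1f]) == b'\x1f'        # current spelling of a single byte
assert b'\x1f'.hex() == '1f'           # string-like methods also available
```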
[Python-Dev] Re: Announcing the CPython Docs Workgroup
That seems exclusionary. Right now, anyone can contribute to documentation, anyone can comment on proposals, and any core dev can accept their patches. In the interest of transparency, can you explain why the other initial members did not need to go through an application process? ISTM the initial group excludes our most active documentation contributors and includes people who have only minimal contributions to existing documentation and mostly have not participated in any documentation reviews on the issue tracker. Did the SC approve all the initial members?

Raymond
[Python-Dev] Re: Announcing the CPython Docs Workgroup
Please add me to the list of members for the initial workgroup.

Thank you,
Raymond
[Python-Dev] Re: Please do not remove random bits of information from the tutorial
> On Nov 7, 2020, at 9:51 AM, Riccardo Polignieri via Python-Dev wrote:
>
> My concern here is that if you start removing or simplifying some
> "too-difficult-for-a-tutorial" bits of information on an occasional basis,
> and without too much scrutiny or editorial guidance, you will end up losing
> something precious.

I concur with your sentiments and do not want the tutorial to be dumbed down. Here are a few thoughts on the subject:

* The word "tutorial" does not imply "easy". Instead, it is a self-paced, example-driven walk-through of the language. That said, if the word "tutorial" doesn't sit well, then just rename the guide.

* The world is full of well-written guides for beginners. The variety is especially important because "beginner" means many different things: "never programmed before", "casually checking out what the language offers", "expert in some other language", "is a student in elementary school", "is a student in high school", "is an electrical engineer needing to write scripts", etc.

* One thing that makes the current tutorial special is that much of it was written by Guido. Delete this text and you lose one of the few places where his voice comes through.

* There is value in having non-trivial coverage of the language. When people ask how __cause__ works, we can link to the tutorial. Otherwise, we have to throw them to the wolves by linking to the unfriendly, highly technical reference guide or to a PEP.

* For many people, our tutorial serves as the only systematic walk-through of the language. If you decide to drop the mention of complex numbers, the odds of a person ever finding out about that capability drop to almost zero.

* My suggestion is that we add a section to the beginning of the tutorial with external links elsewhere: "If you are ten years old, go here. If you have never programmed before, go here," etc.

* If you think the word "tutorial" implies fluffy and easy, then let's just rename it to "Language walk-through with examples" or some such.
* FWIW, I've closely monitored the bug tracker daily for almost two decades. We almost never get a user complaint that the tutorial is too advanced. For the most part, it has long been of good service to users. Almost certainly it can be improved, but hopefully not by dropping content.

Raymond
[Python-Dev] Re: Drop Solaris, OpenSolaris, Illumos and OpenIndiana support in Python
> On Oct 30, 2020, at 4:51 PM, Gregory P. Smith wrote:
>
> On Fri, Oct 30, 2020 at 1:14 PM Raymond Hettinger wrote:
>> FWIW, when the tracker issue landed with a PR, I became concerned that it
>> would be applied without further discussion and without consulting users.
>
> An issue and a PR doesn't simply mean "it is happening".

There have been a number of issue/PR pairs this year that have followed exactly that path. While we'll never know for sure, it is my belief that this would have been applied had I not drawn attention to it. Very few people follow the bug tracker every day; the sparse Solaris community almost certainly would not have been aware of the tracker entry.

Likewise, I don't think there would have been a python-dev thread; otherwise, it would have happened *prior* to the PR, the tracker issue, and all of the comments from the people affected. The call for helpers was made only *after* the user pleas not to pull the trigger.

It's all fine now. The decision is being broadly discussed. That is what is important.

Raymond
[Python-Dev] Re: Drop Solaris, OpenSolaris, Illumos and OpenIndiana support in Python
Here are a couple of comments on the Twitter thread that warrant your attention.

Apparently, this is being used by the European Space Agency on their spacecraft.
-- https://twitter.com/nikolaivk/status/1322094167980466178

"To be clear I will put some money where my mouth is. If we need to invest resources either in the form of developers or dollars to keep the port alive we will. By we I mean RackTop and/or Staysail Systems."
-- https://twitter.com/gedamore/status/1321959956199866369

Raymond
[Python-Dev] Re: Drop Solaris, OpenSolaris, Illumos and OpenIndiana support in Python
I vote against removal. We have no compelling need to disrupt an entire community and ecosystem even though it is small. To anyone chiming in to say "yes, drop the support": ask whether you've consulted any of the users; they should have a say in the matter. It is better for them to be a bit neglected than to be cut off entirely.

FWIW, when the tracker issue landed with a PR, I became concerned that it would be applied without further discussion and without consulting users. So I asked on Twitter whether Solaris was being used. If you're interested in the responses, see the thread at: https://twitter.com/i/status/1321917936668340227 (Victor can't see it because he blocked my account a long time ago). Also take a look at the user comments on the tracker: https://bugs.python.org/issue42173

For those who don't follow links, here's a sample:

* "Platform genocide is both unnecessary and unwarranted." -- brett3
* "Please do not drop support." -- jm650
* "I just want to lend my voice in favor of maintaining "Solarish" support as well, and offer what help I may for resolving issues." -- robertfrench
* "No no no, please don't." -- tbalbers
* "Please do not drop support for SunOS." -- mariuspana
* "Please continue support for Solaris/IllumOS! This is very important for us." -- marcheschi
* "Please don't drop Solaris support, we still use it to this day." -- abarbu
* ... and many more with the same flavor

Given this kind of user response, I think it would be irresponsible to drop support.

Raymond
[Python-Dev] Re: Changing Python's string search algorithms
> On Oct 17, 2020, at 2:40 PM, Tim Peters wrote:
>
> Still waiting for someone who thinks string search speed is critical
> in their real app to give it a try. In the absence of that, I endorse
> merging this.

Be bold. Merge it. :-)

Raymond
[Python-Dev] Re: PEP 620: Hide implementation details from the C API
> On Jun 29, 2020, at 5:46 PM, Victor Stinner wrote:
>
> You missed the point of the PEP: "It becomes possible to experiment
> with more advanced optimizations in CPython than just
> micro-optimizations, like tagged pointers."
>
> IMHO it's time to stop wasting our limited developer resources on
> micro-optimizations and micro-benchmarks, but think about overall
> Python performance and major Python internals redesign to find a way
> to make Python overall 2x faster, rather than making a specific
> function 10% faster.

That is a really bold claim. AFAICT, there is zero evidence that this is actually possible. Like the sandboxing project, these experiments may all prove to be dead ends. If we're going to bet the farm on this, there should at least be a proof of concept. Otherwise, it's just an expensive lottery ticket.

> I don't think that the performance of accessing namedtuple attributes
> is a known bottleneck of Python performance.

This time you missed the point. Named tuple access was just one point of impact; it is not the only code that calls PyTuple_Check(). It looks like inlining did not work and that EVERY SINGLE type check in CPython was affected (including third-party extensions). Also, there was no review: we have a single developer pushing through hundreds of these changes at a rate where no one else can keep up.

> Measuring benchmarks which take less than 1 second requires being very
> careful.

Perhaps you don't want to believe the results, but the timings are careful, stable, repeatable, and backed up by a disassembly that shows the exact cause. The builds used for the timings were the production macOS builds as distributed on python.org.

There is a certain irony in making repeated, unsubstantiated promises to make the language 2x faster and then checking in changes that make the implementation slower.

Raymond

P.S. What PyPy achieved was monumental. But it took a decade, even with a well-organized and partially-funded team of superstars.
It always lagged CPython in features. And the results were entirely dependent on a single design decision: to run a pure Python interpreter written in RPython to take advantage of its tracing JIT. I don't imagine CPython can hope to achieve anything like this. Likely, the best we can do is replace reference counting with garbage collection.
[Python-Dev] Re: PEP 620: Hide implementation details from the C API
> On Jun 22, 2020, at 5:10 AM, Victor Stinner wrote:
>
> Introduce C API incompatible changes to hide implementation details.

How much of the existing C extension ecosystem do you expect to break as a result of these incompatible changes?

> It will be way easier to add new features.

This isn't self-evident. What is currently difficult that would be easier?

> It becomes possible to experiment with more advanced optimizations in CPython
> than just micro-optimizations, like tagged pointers.

Is there any proof of concept to suggest that it is in the realm of possibility that such an experiment would produce a favorable outcome? Otherwise, it isn't a reasonable justification for an extensive and irrevocable series of sweeping changes that affect the entire ecosystem of existing extensions.

> **STATUS**: Completed (in Python 3.9)

I'm not sure that many people are monitoring the huge number of changes that have gone in mostly unreviewed. Mark Shannon and Stefan Krah have both raised concerns. It seems like one person has been given blanket authorization to revise nearly every aspect of the internals and to undo the design choices made by all the developers who've previously worked on the project.

> Converting macros to static inline functions should only impact very few
> C extensions which use macros in unusual ways.

These should be individually verified to make sure they actually get inlined by the compiler. In https://bugs.python.org/issue39542 about nine PRs were applied without review or discussion. One of those, https://github.com/python/cpython/pull/18364 , converted PyType_Check() to a static inline function, but I'm not sure that it actually does get inlined. That may be the reason named tuple attribute access slowed by about 25% between Python 3.8 and Python 3.9.¹ Presumably, that PR also affected every single type check in the entire C codebase and will affect third-party extensions as well.
FWIW, I do appreciate the devotion and amount of effort in this undertaking; that isn't in question. However, as a community, this needs to be a conscious decision. I'm unclear about whether any benefits will ever materialize. I am clear that packages will be broken, that performance will be impacted, and that this is a one-way trip that can never be undone. Most of the work is being done by one person. Many of the PRs aren't reviewed. The rate and volume of PRs are so high that almost no one can keep track of what is happening. Mark and Stefan have pushed back, but with no effect.

Raymond

==

¹ Timings for attribute access:

$ python3.8 -m timeit -s 'from collections import namedtuple' -s 'Point=namedtuple("Point", "x y")' -s 'p=Point(10,20)' 'p.x; p.y; p.x; p.y; p.x; p.y'
200 loops, best of 5: 119 nsec per loop

$ python3.9 -m timeit -s 'from collections import namedtuple' -s 'Point=namedtuple("Point", "x y")' -s 'p=Point(10,20)' 'p.x; p.y; p.x; p.y; p.x; p.y'
200 loops, best of 5: 152 nsec per loop

==

Python 3.8 disassembly (clean and fast)
---
_tuplegetter_descr_get:
        testq   %rsi, %rsi
        je      L299
        subq    $8, %rsp
        movq    8(%rsi), %rax
        movq    16(%rdi), %rdx
        testb   $4, 171(%rax)
        je      L300
        cmpq    16(%rsi), %rdx
        jnb     L301
        movq    24(%rsi,%rdx,8), %rax
        addq    $1, (%rax)
L290:
        addq    $8, %rsp
        ret

Python 3.9 disassembly (doesn't look in-lined)
---
_tuplegetter_descr_get:
        testq   %rsi, %rsi
        pushq   %r12                      <-- new cost
        pushq   %rbp                      <-- new cost
        pushq   %rbx                      <-- new cost
        movq    %rdi, %rbx
        je      L382
        movq    16(%rdi), %r12
        movq    %rsi, %rbp
        movq    8(%rsi), %rdi
        call    _PyType_GetFlags          <-- new non-inlined function call
        testl   $67108864, %eax
        je      L383
        cmpq    16(%rbp), %r12
        jnb     L384
        movq    24(%rbp,%r12,8), %rax
        addq    $1, (%rax)
        popq    %rbx                      <-- new cost
        popq    %rbp                      <-- new cost
        popq    %r12                      <-- new cost
        ret
[Python-Dev] Re: The Anti-PEP
> it is hard to make a decision between the pros and cons,
> when the pros are in a single formal document and the
> cons are scattered across the internet.

Mark, I support your idea. It is natural for PEP authors to not fully articulate the voices of opposition or counter-proposals. The current process doesn't make it likely that a balanced document is created for decision-making purposes.

Raymond
[Python-Dev] Re: Latest PEP 554 updates.
> On May 4, 2020, at 10:30 AM, Eric Snow wrote:
>
> Further feedback is welcome, though I feel like the PR is ready (or
> very close to ready) for pronouncement. Thanks again to all.

Congratulations. Regardless of the outcome, you've certainly earned top marks for vision, tenacity, team play, and overcoming adversity.

May your sub-interpreters be plentiful,

Raymond
[Python-Dev] Re: Adding a "call_once" decorator to functools
> On Apr 30, 2020, at 10:44 AM, Carl Meyer wrote:
>
> On Wed, Apr 29, 2020 at 9:36 PM Raymond Hettinger wrote:
>> Do you have some concrete examples we could look at? I'm having trouble
>> visualizing any real use cases and none have been presented so far.
>
> This pattern occurs not infrequently in our Django server codebase at
> Instagram. A typical case would be that we need a client object to
> make queries to some external service. Queries using the client can be
> made from various locations in the codebase (and new ones could be
> added any time), but there is noticeable overhead to the creation of
> the client (e.g. perhaps it does network work at creation to figure
> out which remote host can service the needed functionality), and so
> having multiple client objects for the same remote service existing in
> the same process is waste.
>
> Or another similar case might be creation of a "client" object for
> querying a large on-disk data set.

Thanks for the concrete example. AFAICT, it doesn't require (and probably shouldn't have) a lock to be held for the duration of the call. Would it be fair to say that 100% of your needs would be met if we just added this to the functools module?

    call_once = lru_cache(maxsize=None)

That's discoverable, already works, has no risk of deadlock, would work with multiple-argument functions, has instrumentation, and has the ability to clear or reset. I'm still looking for an example that actually requires a lock to be held for a long duration.

Raymond
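The suggested one-liner works today; here is a quick single-threaded demonstration (illustrative names; as noted in this thread, it does not guarantee a single underlying call under concurrency):

```python
from functools import lru_cache

call_once = lru_cache(maxsize=None)    # the proposed one-line addition

calls = []

@call_once
def get_client():
    calls.append(1)                    # stands in for expensive setup work
    return object()

a = get_client()
b = get_client()
assert a is b                          # same cached object every time
assert len(calls) == 1                 # underlying function ran only once
assert get_client.cache_info().hits == 1   # built-in instrumentation
get_client.cache_clear()               # built-in reset mechanism
```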
[Python-Dev] Re: Adding a "call_once" decorator to functools
Would either of the existing solutions work for you?

    from functools import cached_property, lru_cache

    class X:
        def __init__(self, name):
            self.name = name

        @cached_property
        def title(self):
            print("compute title once")
            return self.name.title()

        @property
        @lru_cache
        def upper(self):
            print("compute upper once")
            return self.name.upper()

    obj = X("victor")
    print(obj.title)
    print(obj.title)
    print(obj.upper)
    print(obj.upper)
[Python-Dev] Re: Adding a "call_once" decorator to functools
> On Apr 30, 2020, at 6:32 AM, Joao S. O. Bueno wrote:
>
> Of course this is meant to be something simple - so there are no "real
> world use cases" that are "wow, it could not have
> been done without it".

The proposed implementation does something risky: it holds a non-reentrant lock across a call to an arbitrary user-defined function. The only reason to do so is to absolutely guarantee the function will never be called twice. We really should look for some concrete examples that require that guarantee, and it would be nice to see how that guarantee is being implemented currently (it isn't obvious to me). Also, most initialization functions I've encountered take at least one argument, so the proposed call_once() implementation wouldn't be usable at all.

> I was one of the first to reply to this on
> "python-ideas", as I often need the pattern, but seldom
> worry about reentrancy or parallel calling. Most of the uses are
> just that: initialize a resource lazily, and just
> "lru_cache" could work. My first thought was for something more
> light-weight than lru_cache (and a friendlier
> name).

Right. Those cases could be solved trivially if we added:

    call_once = lru_cache(maxsize=None)

which is lightweight, very fast, and has a clear name. Further, it would work with multiple arguments and would not fail if the underlying function turned out to be reentrant. AFAICT, the *only* reason not to use the lru_cache() implementation is that in multithreaded code it can't guarantee that the underlying function doesn't get called a second time while still executing the first time. If those are things you don't care about, then you don't need the proposed implementation; we can give you what you want by adding a single line to functools.

> So, one of the points I'd likely have used this is here:
>
> https://github.com/jsbueno/terminedia/blob/d97976fb11ac54b527db4183497730883ba71515/terminedia/unicode.py#L30

Thanks — this is a nice example.
Here's what it tells us:

1) There exists at least one use case for a zero-argument initialization function.
2) Your current solution is trivially easy, clear, and fast: "if CHAR_BASE: return".
3) This function returns None, so efforts by call_once() to block and await a result are wasted.
4) It would be inconsequential if this function were called twice.
5) A more common way to do this is to move the test into the lookup() function -- see below.

Raymond

-

    import re
    import unicodedata

    CHAR_BASE = {}

    def _init_chars():
        for code in range(0, 0x10):
            char = chr(code)
            values = {}
            attrs = "name category east_asian_width"
            for attr in attrs.split():
                try:
                    values[attr] = getattr(unicodedata, attr)(char)
                except ValueError:
                    values[attr] = "undefined"
            CHAR_BASE[code] = Character(char, code, values["name"],
                                        values["category"],
                                        values["east_asian_width"])

    def lookup(name_part, chars_only=False):
        if not CHAR_BASE:
            _init_chars()
        results = [char for char in CHAR_BASE.values()
                   if re.search(name_part, char.name, re.IGNORECASE)]
        if not chars_only:
            return results
        return [char.char for char in results]
[Python-Dev] Re: Adding a "call_once" decorator to functools
> On Apr 29, 2020, at 4:20 PM, Antoine Pitrou wrote:
>
> On Wed, 29 Apr 2020 12:01:24 -0700
> Raymond Hettinger wrote:
>>
>> The call_once() decorator would need different logic:
>>
>> 1) if the function has already been called and the result is known, return
>> the prior result :-)
>> 2) if the function has already been called, but the result is not yet known,
>> either block or fail :-(
>
> It definitely needs to block.

Do you think it is safe to hold a non-reentrant lock across an arbitrary user function? Traditionally, the best practice for locks was to acquire, briefly access a shared resource, and release promptly.

>> 3) call the function; this cannot be reentrant :-(
>
> Right. The typical use for such a function is lazy initialization of
> some resource, not recursive computation.

Do you have some concrete examples we could look at? I'm having trouble visualizing any real use cases, and none have been presented so far. Presumably, the initialization function would have to take zero arguments, have a useful return value, be called only once, not be idempotent, not fail if called in two different processes, be callable from multiple places, and guarantee that a decref, gc, __del__, or weakref callback would never trigger a reentrant call.

Also, if you know of a real-world use case, what solution is currently being used? I'm not sure what alternative call_once() is competing against.

>> 6) does not have instrumentation for number of hits
>> 7) does not have a clearing or reset mechanism
>
> Clearly, instrumentation and a clearing mechanism are not necessary.
> They might be "nice to have", but needn't hinder initial adoption of
> the API.

Agreed. It is inevitable that those will be requested, but they are incidental to the core functionality.

Do you have any thoughts on what the semantics should be if the inner function raises an exception? Would a retry be allowed? Or does call_once() literally mean "can never be called again"?
Raymond ___ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/Y2MUKYDCV53PBWRRBU4ZAKB5XED4X4HX/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-Dev] Re: Adding a "call_once" decorator to functools
> On Apr 29, 2020, at 12:55 AM, Tom Forbes wrote:
>
> Hey Raymond,
> Thanks for your input here! A new method wouldn’t be worth adding purely for performance reasons then, but there is still an issue around semantics and locking.

Right.

> it doesn’t actually ensure the function is called once.

Let's be precise about this. The lru_cache() logic is:

1) if the function has already been called and the result is known, return the prior result :-)
2) call the underlying function
3) add the question/answer pair to the cache dict

You are correct that an lru_cache() wrapped function can be called more than once if, before step three happens, the wrapped function is called again, either by another thread or by a reentrant call. This is by design and means that lru_cache() can be wrapped around almost anything, reentrant or not. Also, calls to lru_cache() don't block across the function call, nor do they fail because another call is in progress. This makes lru_cache() easy to use and reliable, but it does allow the possibility that the function is called more than once.

The call_once() decorator would need different logic:

1) if the function has already been called and the result is known, return the prior result :-)
2) if the function has already been called, but the result is not yet known, either block or fail :-(
3) call the function; this cannot be reentrant :-(
4) record the result for future calls

The good news is that call_once() can guarantee the function will not be called more than once. The bad news is that task switches during step three will either get blocked for the duration of the function call or they will need to raise an exception. Likewise, it would be a mistake to use call_once() when reentrancy is possible.

> The reason I bring this up is that I’ve seen several ad-hoc `call_once` implementations recently, and creating one is surprisingly complex for someone who’s not that experienced with Python.

Would it be fair to describe call_once() like this?
call_once() is just like lru_cache() but:

1) guarantees that a function never gets called more than once
2) will block or fail if a thread-switch happens during a call
3) only works for functions that take zero arguments
4) only works for functions that can never be reentrant
5) cannot make the one-call guarantee across multiple processes
6) does not have instrumentation for number of hits
7) does not have a clearing or reset mechanism

Raymond

___ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/CTAGWXD7WRU3NAHLP5IZ75PM2E3TQTG2/ Code of Conduct: http://python.org/psf/codeofconduct/
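The contrast with points (1) and (4) above is observable in CPython today: lru_cache() tolerates a reentrant call by simply letting the underlying function run a second time. A small hypothetical toy demonstrates this:

```python
from functools import lru_cache

calls = []

@lru_cache(maxsize=None)
def f():
    calls.append(1)
    if len(calls) == 1:
        f()              # reentrant call arrives before the result is cached
    return "ready"

assert f() == "ready"
assert len(calls) == 2   # the body ran twice; no deadlock, no error
```

A true call_once() could not behave this way: the reentrant inner call would have to either deadlock or raise.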
[Python-Dev] Re: Adding a "call_once" decorator to functools
> t...@tomforb.es wrote:
>
> I would like to suggest adding a simple “once” method to functools. As the name suggests, this would be a decorator that would call the decorated function, cache the result and return it with subsequent calls.

It seems like you would get just about everything you want with one line:

    call_once = lru_cache(maxsize=None)

which would be used like this:

    @call_once
    def welcome():
        len('hello')

> Using lru_cache like this works but it’s not as efficient as it could be - in every case you’re adding lru_cache overhead despite not requiring it.

You're likely imagining more overhead than there actually is. Used as shown above, the lru_cache() is astonishingly small and efficient. Access time is slightly cheaper than writing d[()] where d = {(): some_constant}. The infinite_lru_cache_wrapper() just makes a single dict lookup and returns the value.¹ The lru_cache_make_key() function just increments the refcount on the empty args tuple and returns it.² And because it is a C object, calling it will be faster than a Python function that just returns a constant, "lambda: some_constant". This is very, very fast.

Raymond

¹ https://github.com/python/cpython/blob/master/Modules/_functoolsmodule.c#L870
² https://github.com/python/cpython/blob/master/Modules/_functoolsmodule.c#L809

> > Hello,
> After a great discussion in python-ideas[1][2] it was suggested that I cross-post this proposal to python-dev to gather more comments from those who don't follow python-ideas.
>
> The proposal is to add a "call_once" decorator to the functools module that, as the name suggests, calls a wrapped function once, caching the result and returning it with subsequent invocations. The rationale behind this proposal is that:
> 1. Developers are using "lru_cache" to achieve this right now, which is less efficient than it could be
> 2.
Special casing "lru_cache" to account for zero arity methods isn't trivial and we shouldn't endorse lru_cache as a way of achieving "call_once" semantics
> 3. Implementing a thread-safe (or even non-thread safe) "call_once" method is non-trivial
> 4. It complements the lru_cache and cached_property methods currently present in functools.
>
> The specifics of the method would be:
> 1. The wrapped method is guaranteed to only be called once when called for the first time by concurrent threads
> 2. Only functions with no arguments can be wrapped, otherwise an exception is thrown
> 3. There is a C implementation to keep speed parity with lru_cache
>
> I've included a naive implementation below (that doesn't meet any of the specifics listed above) to illustrate the general idea of the proposal:
>
> ```
> def call_once(func):
>     sentinel = object()  # in case the wrapped method returns None
>     obj = sentinel
>     @functools.wraps(func)
>     def inner():
>         nonlocal obj, sentinel
>         if obj is sentinel:
>             obj = func()
>         return obj
>     return inner
> ```
>
> I'd welcome any feedback on this proposal, and if the response is favourable I'd love to attempt to implement it.
>
> 1. https://mail.python.org/archives/list/python-id...@python.org/thread/5OR3LJO7LOL6SC4OOGKFIVNNH4KADBPG/#5OR3LJO7LOL6SC4OOGKFIVNNH4KADBPG
> 2.
> https://discuss.python.org/t/reduce-the-overhead-of-functools-lru-cache-for-functions-with-no-parameters/3956 > ___ > Python-Dev mailing list -- python-dev@python.org > To unsubscribe send an email to python-dev-le...@python.org > https://mail.python.org/mailman3/lists/python-dev.python.org/ > Message archived at > https://mail.python.org/archives/list/python-dev@python.org/message/5CFUCM4W3Z36U3GZ6Q3XBLDEVZLNFS63/ > Code of Conduct: http://python.org/psf/codeofconduct/ ___ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/OYBYJ2373OTHALHTPQJV5EBX6N5M4DDL/ Code of Conduct: http://python.org/psf/codeofconduct/
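The one-liner suggested in the reply above can be exercised directly. Note that it caches a single result but, unlike a true call_once(), makes no once-only guarantee under threads (the instrumented body below is a hypothetical toy):

```python
from functools import lru_cache

call_once = lru_cache(maxsize=None)    # the suggested one-liner

calls = []

@call_once
def welcome():
    calls.append(1)
    return len('hello')

assert welcome() == 5
assert welcome() == 5     # cached; the body did not run again
assert calls == [1]
```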
[Python-Dev] Re: Accepting PEP 617: New PEG parser for CPython
This will be a nice improvement. Raymond ___ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/C3MUSEKXCDL4HSIEIJNBHWQG5B7WCQLD/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-Dev] Re: PEP 616 "String methods to remove prefixes and suffixes" accepted
Please consider adding underscores to the names: remove_prefix() and remove_suffix(). The latter method causes a mental hiccup when first read as removes-uffix, forcing mental backtracking to get to remove-suffix. We had a similar problem with addinfourl initially being read as add-in-four-l before mentally backtracking to add-info-url.

The PEP says this alternative was considered, but I disagree with the rationale given in the PEP. The reason that "startswith" and "endswith" don't have underscores is that they aren't needed to disambiguate the text. Our rules are to add underscores and to spell out words when it improves readability, which in this case it does.

Like casing conventions, our rules and preferences for naming evolved after the early modules were created -- the older the module, the more likely that it doesn't follow modern conventions. We only have one chance to get this right (bugs can be fixed, but API choices persist for a very long time).

Take it from someone with experience with this particular problem. I created imap() but later regretted the naming pattern when it came to ifilter() and islice(), which sometimes cause mental hiccups, initially being read as if-ilter and is-lice.

Raymond

___ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/ZMXSQ5T6L6CR5GUIBFEYLJJF7FE4B4US/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-Dev] Re: Improvement to SimpleNamespace
[Serhiy]
> As a workaround you can use
>
> object_hook=lambda x: SimpleNamespace(**x)

That doesn't suffice because some valid JSON keys are not valid identifiers. You still need a way to get past those when they arise:

    catalog.books.fiction['Paradise Lost'].isbn

Also, it still leaves you with using setattr(ns, attrname, attrvalue) or tricks with vars() when doing updates. The AttrDict recipe is popular for a reason.

Raymond

___ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/MNVWBEJI465QUODJEYPMAXPXOX3UDJ6Q/ Code of Conduct: http://python.org/psf/codeofconduct/
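The non-identifier-key problem can be sketched with made-up data (the book data and ISBN below are hypothetical):

```python
import json
from types import SimpleNamespace

# Serhiy's workaround applied to made-up data whose key is not an identifier
data = '{"books": {"fiction": {"Paradise Lost": {"isbn": "123-4-56"}}}}'
catalog = json.loads(data, object_hook=lambda x: SimpleNamespace(**x))

fiction = catalog.books.fiction      # attribute access works up to here ...
# ... but 'Paradise Lost' cannot follow a dot, forcing vars()/getattr() tricks
assert vars(fiction)['Paradise Lost'].isbn == '123-4-56'
assert getattr(fiction, 'Paradise Lost').isbn == '123-4-56'
```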
[Python-Dev] Re: Improvement to SimpleNamespace
[GvR] > We should not try to import JavaScript's object model into Python. Yes, I get that. Just want to point-out that working with heavily nested dictionaries (typical for JSON) is no fun with square brackets and quotation marks. Raymond ___ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/G5SJKRQ7S5VY3JKLAVOTCCA7RSDUNWXS/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-Dev] Improvement to SimpleNamespace
SimpleNamespace() is really good at giving attribute style-access. I would like to make that functionality available to the JSON module (or just about anything else that accepts a custom dict) by adding the magic methods for mappings so that this works:

    catalog = json.load(f, object_hook=SimpleNamespace)
    print(catalog['clothing']['mens']['shoes']['extra_wide']['quantity'])  # currently possible with dict()
    print(catalog.clothing.mens.shoes.extra_wide.quantity)                 # proposed with SimpleNamespace()
    print(catalog.clothing.boys['3t'].tops.quantity)                       # would also be supported

I've already seen something like this in production; however, people are having to write custom subclasses to do it. This is kind of a bummer because the custom subclasses are a pain to write, are non-standard, and are generally somewhat slow. I would like to see a high-quality version of this made more broadly available.

The core idea is to keep the simple attribute access but make it easier to load data programmatically:

    >>> ns = SimpleNamespace(roses='red', violets='blue')
    >>> thing = input()
    sugar
    >>> quality = input()
    sweet
    >>> setattr(ns, thing, quality)   # current
    >>> ns['sugar'] = 'sweet'         # proposed

If the PEP 584 __ior__ method were supported, updating a SimpleNamespace would be much cleaner:

    ns |= some_dict

I posted an issue on the tracker: https://bugs.python.org/issue40284 . There was a suggestion to create a different type for this, but I don't see the point in substantially duplicating everything SimpleNamespace already does just so we can add some supporting dunder methods. Please add more commentary so we can figure out the best way to offer this powerful functionality.
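The kind of custom subclass described above might look roughly like this hypothetical sketch, which bolts the proposed mapping dunders onto SimpleNamespace (the JSON data is made up to mirror the example):

```python
import json
from types import SimpleNamespace

class Namespace(SimpleNamespace):
    """Hypothetical sketch of the proposal: SimpleNamespace plus the
    mapping dunders, similar to the custom subclasses seen in production."""
    def __getitem__(self, key):
        return self.__dict__[key]
    def __setitem__(self, key, value):
        self.__dict__[key] = value

catalog = json.loads(
    '{"clothing": {"boys": {"3t": {"tops": {"quantity": 7}}}}}',
    object_hook=lambda d: Namespace(**d))

assert catalog['clothing']['boys']['3t']['tops']['quantity'] == 7  # dict style
assert catalog.clothing.boys['3t'].tops.quantity == 7              # mixed style
catalog.clothing.boys['3t'].tops['quantity'] = 8                   # item assignment
assert catalog.clothing.boys['3t'].tops.quantity == 8
```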
Raymond ___ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/JOMND56PJGRN7FQQLLCWONE5Z7R2EKXW/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-Dev] Re: Are PyObject_RichCompareBool shortcuts part of Python or just CPython quirks?
I forgot to mention that list.index() also uses PyObject_RichCompareBool(). Given a non-empty list *s*:

    s[0] = x
    assert s.index(x) == 0     # We want this to always work

or:

    s = [x]
    assert s.index(x) == 0     # Should not raise a ValueError

If those two assertions aren't reliable, then it's hard to correctly reason about algorithms that use index() to find previously stored objects. This, of course, is the primary use case for index().

Likewise, list.remove() also uses PyObject_RichCompareBool():

    s = []
    ...
    s.append(x)
    s.remove(x)

In a code review, would you suspect that the above code could fail? If so, how would you mitigate the risk to prevent failure? Off-hand, the simplest remediation I can think of is:

    s = []
    ...
    s.append(x)
    if x == x:          # New, perplexing code
        s.remove(x)     # Now, this is guaranteed not to fail
    else:
        logging.debug(f"Removing the first occurrence of {x!r} the hard way")
        for i, y in enumerate(s):
            if x is y:
                del s[i]
                break

In summary, I think it is important to guarantee the identity-implies-equality step currently in PyObject_RichCompareBool(). It isn't just an optimization, it is necessary for writing correct application code without tricks such as the "if x == x: ..." test.

Raymond

___ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/NDBUPT6OWNLPLTD5MI3A3VYNNKLMA3ME/ Code of Conduct: http://python.org/psf/codeofconduct/
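The guarantee argued for above is observable today with NaN, the canonical non-reflexive object:

```python
x = float('nan')
s = [x]

assert not (x == x)      # NaN equality is irreflexive ...
assert s.index(x) == 0   # ... yet index() finds it via the identity shortcut
assert x in s

s.remove(x)              # remove() also works, no "if x == x" trick needed
assert s == []
```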
[Python-Dev] Re: Request to postpone some Python 3.9 incompatible changes to Python 3.10
> We propose to revert 5 changes:
>
> • Removed tostring/fromstring methods in array.array and base64 modules
> • Removed collections aliases to ABC classes
> • Removed fractions.gcd() function (which is similar to math.gcd())
> • Remove "U" mode of open(): having to use io.open() just for Python 2 makes the code uglier
> • Removed old plistlib API: 2.7 doesn't have the new API

+1 from me. We don't gain anything by removing these in 3.9 instead of 3.10, so it is perfectly reasonable to ease the burden on users by deferring them for another release.

Raymond

___ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/52V6RP2WBC43OWTLBICS77MD3IGSV5CI/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-Dev] Re: Are PyObject_RichCompareBool shortcuts part of Python or just CPython quirks?
> PyObject_RichCompareBool(x, y, op) has a (valuable!) shortcut: if x and y are the same object, then equality comparison returns True and inequality False. No attempt is made to execute __eq__ or __ne__ methods in those cases.
>
> This has visible consequences all over the place, but they don't appear to be documented. For example,
>
> ...
> despite that math.nan == math.nan is False.
>
> It's usually clear which methods will be called, and when, but not really here. Any _context_ that calls PyObject_RichCompareBool() under the covers, for an equality or inequality test, may or may not invoke __eq__ or __ne__, depending on whether the comparands are the same object. Also any context that inlines these special cases to avoid the overhead of calling PyObject_RichCompareBool() at all.
>
> If it's intended that Python-the-language requires this, that needs to be documented.

This has been slowly, but perhaps incompletely, documented over the years and has become baked into some of the collections ABCs as well. For example, Sequence.__contains__() is defined as:

    def __contains__(self, value):
        for v in self:
            if v is value or v == value:   # note the identity test
                return True
        return False

Various collections need to assume reflexivity, not just for speed, but so that we can reason about them and so that they can maintain internal consistency. For example, MutableSet defines pop() as:

    def pop(self):
        """Return the popped value. Raise KeyError if empty."""
        it = iter(self)
        try:
            value = next(it)
        except StopIteration:
            raise KeyError from None
        self.discard(value)
        return value

That pop() logic implicitly assumes an invariant between membership and iteration:

    assert all(x in collection for x in collection)

We really don't want to pop() a value *x* and then find that *x* is still in the container.
This would happen if iter() found the *x*, but discard() couldn't find the object because the object can't or won't recognize itself:

    s = {float('NaN')}
    s.pop()
    assert not s   # Do we want the language to guarantee that s is now empty?  I think we must.

The code for clear() depends on pop() working:

    def clear(self):
        """This is slow (creates N new iterators!) but effective."""
        try:
            while True:
                self.pop()
        except KeyError:
            pass

It would be unfortunate if clear() could not guarantee a post-condition that the container is empty:

    s = {float('NaN')}
    s.clear()
    assert not s   # Can this be allowed to fail?

The case of count() is less clear-cut, but even there identity-implies-equality improves our ability to reason about code. Given some list, *s*, possibly already populated, would you want the following code to always work:

    c = s.count(x)
    s.append(x)
    assert s.count(x) == c + 1   # To me, this is fundamental to what the word "count" means.

I can't find it now, but I remember a possibly related discussion where we collectively rejected a proposal for an __is__() method. IIRC, the reasoning was that our ability to think about code correctly depended on this being true:

    a = b
    assert a is b

Back to the discussion at hand, I had thought our position was roughly:

* __eq__ can return anything it wants.
* Containers are allowed but not required to assume that identity-implies-equality.
* Python's core containers make that assumption so that we can keep the containers internally consistent and so that we can reason about the results of operations.

Also, I believe that even very early dict code (at least as far back as Py 1.5.2) had logic for "v is value or v == value".

As far as NaNs go, the only question is how far to propagate their notion of irreflexivity. Should "x == x" return False for them? We've decided yes. When it comes to containers, who makes the rules, the containers or their elements?
Mostly, we let the elements rule, but containers are allowed to make useful assumptions about the elements when necessary. This isn't much different than the rules for the "==" operator, where __eq__() can return whatever it wants, but functions are still allowed to write "if x == y: ..." and assume that a meaningful boolean value has been returned (even if it wasn't). Likewise, the rule for "<" is that it can return whatever it wants, but sorted() and min() are allowed to assume a meaningful total ordering (which might or might not be true).

In other words, containers and functions are allowed, when necessary or useful, to override the decisions made by their data. This seems like a reasonable state of affairs. The current docs make an effort to describe what we have now: https://docs.python.org/3/reference/expressions.html#value-comparisons

Sorry for the lack of concision. I'm
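The container-side consequences described in this message can be checked directly in CPython:

```python
nan = float('nan')

s = {nan}
assert nan in s          # membership relies on identity-implies-equality
s.discard(nan)           # discard() finds the element the same way
assert not s             # so the pop()/clear() post-conditions hold

s = {float('nan')}
s.clear()
assert not s             # the container is guaranteed empty, as argued above
```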
[Python-Dev] Re: Should set objects maintain insertion order too?
> On Dec 15, 2019, at 6:48 PM, Larry Hastings wrote:
>
> As of 3.7, dict objects are guaranteed to maintain insertion order. But set objects make no such guarantee, and AFAIK in practice they don't maintain insertion order either. Should they?

I don't think they should. Several thoughts:

* The corresponding mathematical concept is unordered and it would be weird to impose such an order.

* You can already get membership testing while retaining insertion order by running dict.fromkeys(seq).

* Set operations have optimizations that preclude giving a guaranteed order (for example, set intersection loops over the smaller of the two input sets).

* To implement ordering, set objects would have to give up their current membership-testing optimization that exploits cache locality in lookups (it looks at several consecutive hashes at a time before jumping to the next random position in the table).

* The ordering we have for dicts uses a hash table that indexes into a sequence. That works reasonably well for typical dict operations but is unsuitable for set operations where some common use cases make interspersed additions and deletions (that is why the LRU cache still uses a cheaply updated doubly-linked list rather than deleting and reinserting dict entries).

* This idea has been discussed a couple of times before and we've decided not to go down this path. I should document this prominently because it is inevitable that it will be suggested periodically, as it is such an obvious thing to consider.

Raymond

___ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/6CO2CZS4CPP6MSJKRZXXQYFLY5T3UVDU/ Code of Conduct: http://python.org/psf/codeofconduct/
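The dict.fromkeys() workaround mentioned above in action:

```python
# Ordered membership testing via dict keys: insertion order is kept
# and duplicates are dropped, with no change to set semantics needed
seen = dict.fromkeys(['b', 'a', 'b', 'c'])

assert list(seen) == ['b', 'a', 'c']   # insertion order preserved
assert 'a' in seen                     # constant-time membership test
```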
[Python-Dev] Re: [python-committers] PEP 581/588 RFC: Collecting feedback about GitHub Issues
> On Aug 27, 2019, at 10:44 AM, Mariatta wrote:
>
> (cross posting to python-committers, python-dev, core-workflow)
>
> PEP 581: Using GitHub Issues has been accepted by the steering council, but PEP 588: GitHub Issues Migration plan is still in progress.
>
> I'd like to hear from core developers as well as heavy b.p.o users, the following:
>
> • what features do they find lacking from GitHub issues, or
> • what are the things you can do in b.p.o but not in GitHub, or
> • Other workflow that will be blocked if we were to switch to GitHub today
>
> By understanding your needs, we can be better prepared for the migration, and we can start looking for solutions.

One other bit of workflow that would be blocked if there was a switch to GitHub today:

* On the tracker, we have long-running conversations, sometimes spanning years. We need to be able to continue those conversations even though the original participants may not have GitHub accounts (also, if they do have a GitHub account, we'll need to be able to link it to the corresponding BPO account).

* I believe some of the accounts are anonymous or have pseudonyms. I'm not sure how those can be migrated; we know very little about the participants except for their recurring posts.

Raymond

___ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/MT2PJFS2CVDSWQKCCRUR3BCNXR4OSEKU/
[Python-Dev] Re: [python-committers] PEP 581/588 RFC: Collecting feedback about GitHub Issues
> On Aug 27, 2019, at 10:44 AM, Mariatta wrote:
>
> (cross posting to python-committers, python-dev, core-workflow)
>
> PEP 581: Using GitHub Issues has been accepted by the steering council, but PEP 588: GitHub Issues Migration plan is still in progress.
>
> I'd like to hear from core developers as well as heavy b.p.o users, the following:
>
> • what features do they find lacking from GitHub issues, or
> • what are the things you can do in b.p.o but not in GitHub, or
> • Other workflow that will be blocked if we were to switch to GitHub today
>
> By understanding your needs, we can be better prepared for the migration, and we can start looking for solutions.

Thanks for soliciting input and working on this. I'm a heavy BPO user (often visiting many times per day for almost two decades). Here are some things that were working well that I would miss:

* We controlled the landing page, giving us:
  - A professional, polished appearance
  - A prominent Python logo
  - A search bar specific to the issue tracker
  - A link to Python Home and the Dev Guide
  - Hot links to Easy Issues, Issues Created by You, Issues Assigned to You

* The display format was terse so we could easily view the 50 most recent active issues (this is important because of the high volume of activity). See https://mail.python.org/pipermail/python-bugs-list/2019-July/date.html for an idea of the monthly volume.

* The page used straight HTML anchor tags so my browser could mark which issues had been visited. This is important when handling a lot of issues which are constantly being reordered.

* The input box allowed straight text input in a monospace font so it was easy to paste code snippets and tracebacks without incorporating markup.

* Our page didn't have advertising on it.

* Having a CSV download option was occasionally helpful.

* BPO was well optimized for a high level of activity and high information density.

* BPO existed for a very long time. It contains extensive internal links between issues.
There are also a huge number of external deep links to specific messages and whatnot. Innumerable tweets, blog posts, code comments, design documents, and Stack Overflow questions all have deep links to the site. It would be a major bummer if these links were broken. It is my hope that they be preserved basically forever.

Things that I look forward to with GitHub Issues:

* Single sign-on
* Better linkage between issues and PRs

What I really don't want:

* The typical GitHub project page prominently shows a list of files and directories before there is any description. If the CPython issues page looks like this, it will be a big step backwards, making it look more like a weekend project than a mature professional project. It would be something I would not want to show to clients. It would not give us the desired level of control over the end-user experience.

* If there are advertisements on the page that we don't control, that would be unprecedented and unwelcome.

* On the one hand, we want issues to be easier to file. On the other hand, if the volume of low-quality issue reports goes up, it will just add to the total labor and contribute to negativity (denying someone's request isn't fun for either the rejector or the rejectee).

* We need to retain control over our data so that we're free to make other migration decisions in the future. We can make a change now *because* we have that freedom. The migration needs to avoid vendor lock-in.

I have high hopes for this being a successful migration but have to confess major disappointment that the steering committee approved this without talking with the heavy BPO users and without seeing what the new landing page would look like. In the end, the success of the migration depends on how the site works for the most active issue responders. If the workload goes up and becomes more awkward to do in volume, then heavy volunteer participation will necessarily decline.
Perhaps a half-dozen individuals do more than half of the work on the tracker. I have high hopes for the success of the migration but success isn't a given. Raymond ___ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/BNHMLY4YEXIG4VANOXSOGNXO5Y7OT3BO/
[Python-Dev] Re: Announcing the new Python triage team on GitHub
Thanks for doing this. I hope it encourages more participation.

The capabilities of a triager mostly look good, except for "closing PRs and issues". This is a superpower that has traditionally been reserved for more senior developers because it grants the ability to shut down the work of another aspiring contributor. Marking someone else's suggestion as rejected is the most perilous and least fun aspect of core development. Submitters tend to expect that their idea won't be rejected without a good deal of thought and expert consideration. Our bar for becoming a triager is somewhat low, so I don't think it makes sense to grant the authority to reject a PR or close an issue.

ISTM the primary value of having triagers is to tag issues appropriately, summon the appropriate experts, and make a first pass at review and/or improvements. FWIW, the definition of the word triage is "in medical use: the assignment of degrees of urgency to wounds or illnesses to decide the order of treatment of a large number of patients or casualties." That doesn't imply making a final disposition.

Put another way, the only remaining distinction between a "triager" and a "core developer" is the ability to push the "commit" button. In a way, that is the least interesting part of the process and is often a foregone conclusion by the time it happens.

Raymond

___ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/CIBKJSXQX5DZDKPA6TYTKNLHS4TA2LXM/
[Python-Dev] Re: What to do about invalid escape sequences
This isn't about me. As a heavy user of the 3.8 beta, I'm just the canary in the coal mine. After many encounters with these warnings, I'm starting to believe that Python's long-standing behavior was convenient for users. Effectively, "\-" wasn't an error, it was just a way of writing "\\-". For the most part, that worked out fine. Sure, we've all seen interactive prompt errors from having \t in a pathname, but not in production (likely because a FileNotFoundError would surface immediately).

___ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/4YNZYCOBWGMLC6BDXQFJJWLXEK47I5PU/
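The pass-through behavior described above is easy to verify; this sketch compiles the two-character literal at runtime so the warning can be captured rather than printed:

```python
import warnings

# Source text containing the invalid escape "\-"; compiled at runtime
# so the compiler's warning is capturable here.
src = r'"\-"'
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    value = eval(compile(src, "<demo>", "eval"))

assert value == "\\-"    # the backslash passed through: "\-" meant "\\-"
assert len(value) == 2
assert len(caught) == 1  # but the compiler now warns about it
assert issubclass(caught[0].category, (SyntaxWarning, DeprecationWarning))
```

(The warning category is SyntaxWarning on 3.8+, DeprecationWarning on the 3.6/3.7 releases that preceded it.)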
[Python-Dev] Re: What to do about invalid escape sequences
For me, these warnings are continuing to arise almost daily. See two recent examples below. In both cases, the code previously had always worked without complaint.

- Example from yesterday's class:

    '''
    How old-style formatting works with positional placeholders

    print('The answer is %d today, but was %d yesterday' % (new, old))
    \o \o
    '''

    SyntaxWarning: invalid escape sequence \-

- Example from today's class:

    # Cut and pasted from:
    # https://en.wikipedia.org/wiki/VCard#vCard_2.1
    vcard = '''
    BEGIN:VCARD
    VERSION:2.1
    N:Gump;Forrest;;Mr.
    FN:Forrest Gump
    ORG:Bubba Gump Shrimp Co.
    TITLE:Shrimp Man
    PHOTO;GIF:http://www.example.com/dir_photos/my_photo.gif
    TEL;WORK;VOICE:(111) 555-1212
    TEL;HOME;VOICE:(404) 555-1212
    ADR;WORK;PREF:;;100 Waters Edge;Baytown;LA;30314;United States of America
    LABEL;WORK;PREF;ENCODING=QUOTED-PRINTABLE;CHARSET=UTF-8:100 Waters Edge=0D=
    =0ABaytown\, LA 30314=0D=0AUnited States of America
    ADR;HOME:;;42 Plantation St.;Baytown;LA;30314;United States of America
    LABEL;HOME;ENCODING=QUOTED-PRINTABLE;CHARSET=UTF-8:42 Plantation St.=0D=0A=
    Baytown, LA 30314=0D=0AUnited States of America
    EMAIL:forrestg...@example.com
    REV:20080424T195243Z
    END:VCARD
    '''

    SyntaxWarning: invalid escape sequence \,

___ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/OYGRL5AWSJZ34MDLGIFTWJXQPLNSK23S/
[Python-Dev] Re: What to do about invalid escape sequences
End-user experience isn't something that can just be argued away. Steve and I are reporting a recurring annoyance. The point of a beta release is to elicit these kinds of reports so they can be addressed before it is too late. ISTM you are choosing not to believe the early feedback and don't want to provide a mitigation. This decision reverses 25+ years of Python practice and is the software equivalent of telling users "you're holding it wrong". Instead of an awareness campaign to use the silent-by-default warnings, we're going directly towards breaking working code. That seems pretty user hostile to me. Chris's language survey shows only one language, Lua, that treated this as an error. For compiled languages that emit warnings, the end-user will never see those warnings, so there is no end-user consequence. In our case though, end-users will see the messages and may not have an ability to do anything about it. I wish people with more product management experience would chime in; otherwise, 3.8 is going to ship with an intentional hard-to-ignore annoyance on the premise that we don't like the way people have been programming and that they need to change their code even if it was working just fine.
[Python-Dev] Re: What to do about invalid escape sequences
> I broadly agree that the warning is very annoying, particularly > when it comes from third-party packages (I see it from some > of pip's vendored dependencies all the time), The same here as well. The other annoyance is that it pops up during live demos, student teaching sessions, and during ipython data analysis in a way that becomes a distractor and makes Python look and feel like it is broken. I haven't found a single case where it improved the user experience. > though I do also see many people bitten by > FileNotFoundError because of a '\n' in their filename. Yes, I've seen that as well. Unfortunately, the syntax warning or error doesn't detect that case. It only complains about invalid sequences, which weren't the actual problem we were trying to solve. The new warning (soon to be an error) breaks code that currently works but is otherwise innocuous. > Raymond - a question if I may. How often do you see these > occurring from docstrings, compared to regular strings? About half. Thanks for weighing in. I think this is an important usability discussion. IMO it is the number one issue affecting the end user experience with this release. If we could get more people to actively use the beta release, the issue would stand out front and center. But if people don't use the beta in earnest, we won't have confirmation until it is too late. We really don't have to go down this path. Arguably, the implicit conversion of '\latex' to '\\latex' is a feature that has existed for three decades, and now we're deciding to turn it off, defining existing practice as an error. I don't think any commercial product manager would allow this to occur without a lot of end user testing. Raymond P.S. In the world of C compilers, I suspect that if the relatively new compiler warnings were treated as errors, the breakage would be widespread. Presumably that's why they haven't gone down this road.
[Python-Dev] Re: What to do about invalid escape sequences
Thanks for looking at what other languages do. It gives some hope that this won't end up being a usability fiasco.
[Python-Dev] What to do about invalid escape sequences
We should revisit what we want to do (if anything) about invalid escape sequences. For Python 3.8, the DeprecationWarning was converted to a SyntaxWarning which is visible by default. The intention is to make it a SyntaxError in Python 3.9. This once seemed like a reasonable and innocuous idea to me; however, I've been using the 3.8 beta heavily for a month and no longer think it is a good idea. The warning crops up frequently, often due to third-party packages (such as docutils and bottle) that users can't easily do anything about. And during live demos and student workshops, it is especially distracting. I now think our cure is worse than the disease. If code currently has a non-raw string with '\latex', do we really need Python to yelp about it (for 3.8) or reject it entirely (for 3.9)? If someone can't remember exactly which special characters need to be escaped, do we really need to stop them in their tracks during a data analysis session? Do we really need to reject ASCII art in docstrings: '\---> special case'? IIRC, the original problem to be solved was valid-but-unintended escapes that silently corrupt strings: filename = '..\training\new_memo.doc'. The warnings and errors don't do (and likely can't do) anything about this. If Python 3.8 goes out as-is, we may be punching our users in the nose and getting almost no gain from it. ISTM this is a job best left for linters. For a very long time, Python has been accepting the likes of 'more \latex markup' and has been silently converting it to 'more \\latex markup'. I now think it should remain that way. This issue in the 3.8 beta releases has been an almost daily annoyance for me and my customers. Depending on how you use Python, this may not affect you or it may arise multiple times per day. Raymond P.S. Before responding, it would be a useful exercise to think for a moment about whether you remember exactly which characters must be escaped or whether you habitually put in an extra backslash when you aren't sure.
Then see: https://bugs.python.org/issue32912
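The unfixable case mentioned above takes only a couple of lines to demonstrate (the path is illustrative):

```python
# The hazard the new warning cannot catch: "\t" and "\n" are *valid*
# escape sequences, so this Windows-style path compiles silently and is
# simply wrong -- it contains a tab and a newline, not backslashes.
filename = '..\training\new_memo.doc'

print('\t' in filename, '\n' in filename)   # True True
print(filename == '..' + '\t' + 'raining' + '\n' + 'ew_memo.doc')   # True
```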
[Python-Dev] Re: The order of operands in the comparison
FWIW, the bisect_left and bisect_right functions have different argument order so that they can both use __lt__, making them consistent with sorting and with the heapq functions. Raymond
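The operand-order point can be sketched in pure Python (roughly the shape of the stdlib's own pure-Python fallback; flipping which side of `<` the search key sits on is what lets both functions rely solely on __lt__):

```python
def bisect_left(a, x, lo=0, hi=None):
    """Leftmost insertion point: a[mid] is on the left of <."""
    hi = len(a) if hi is None else hi
    while lo < hi:
        mid = (lo + hi) // 2
        if a[mid] < x:          # only __lt__, a[mid] first
            lo = mid + 1
        else:
            hi = mid
    return lo

def bisect_right(a, x, lo=0, hi=None):
    """Rightmost insertion point: x is on the left of <."""
    hi = len(a) if hi is None else hi
    while lo < hi:
        mid = (lo + hi) // 2
        if x < a[mid]:          # only __lt__, x first
            hi = mid
        else:
            lo = mid + 1
    return lo

print(bisect_left([1, 2, 2, 3], 2))    # 1
print(bisect_right([1, 2, 2, 3], 2))   # 3
```

Because only __lt__ is ever invoked, any type that sorts correctly also bisects correctly.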
[Python-Dev] Re: What is a public API?
> On Jul 13, 2019, at 1:56 PM, Serhiy Storchaka wrote: > > Could we strictly define what is considered a public module interface in > Python? The RealDefinition™ is that whatever we include in the docs is public, otherwise not. Beyond that, there is a question of how users can deduce what is public when they run "import somemodule; print(dir(somemodule))". In some modules, we've been careful to use both __all__ and an underscore prefix to indicate private variables and helper functions (collections and random, for example). IMO, when a module has shown that care, future maintainers should stick with that practice. The calendar module is an example of where that care was taken for many years and then a recent patch went against that practice. This came to my attention when an end-user questioned which functions were for internal use only and posted their question on Twitter. On the tracker, I then made a simple request to restore the module's convention, but you seem steadfastly resistant to the suggestion. When we do have evidence of user confusion (as in the case with the calendar module), we should just fix it. IMO, it would be an undue burden on the user to have to check every method in dir() against the contents of __all__ to determine what is public (see below). Also, as a maintainer of the module, I would not have found it obvious whether the functions were public or not. The non-public functions look just like the public ones. It's true that the practices across the standard library have historically been loose and varied (__all__ wasn't always used and wasn't always kept up-to-date, and some modules took care with private underscore names while others didn't). To me this has mostly worked out fine and didn't require a strict rule for all modules everywhere. IMO, there is no need to sweep through the library and change long-standing policies on existing modules.
Raymond -- >>> import calendar >>> dir(calendar) ['Calendar', 'EPOCH', 'FRIDAY', 'February', 'HTMLCalendar', 'IllegalMonthError', 'IllegalWeekdayError', 'January', 'LocaleHTMLCalendar', 'LocaleTextCalendar', 'MONDAY', 'SATURDAY', 'SUNDAY', 'THURSDAY', 'TUESDAY', 'TextCalendar', 'WEDNESDAY', '_EPOCH_ORD', '__all__', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', '_colwidth', '_locale', '_localized_day', '_localized_month', '_spacing', 'c', 'calendar', 'datetime', 'day_abbr', 'day_name', 'different_locale', 'error', 'firstweekday', 'format', 'formatstring', 'isleap', 'leapdays', 'main', 'mdays', 'month', 'month_abbr', 'month_name', 'monthcalendar', 'monthlen', 'monthrange', 'nextmonth', 'prcal', 'prevmonth', 'prmonth', 'prweek', 'repeat', 'setfirstweekday', 'sys', 'timegm', 'week', 'weekday', 'weekheader']
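The convention being argued for can be checked mechanically (a small sketch against the real calendar module):

```python
import calendar

# The care described above: public names are advertised in __all__, and
# internal helpers carry a leading underscore so a dir() reader can tell
# them apart at a glance.
print('isleap' in calendar.__all__)                        # True
print(any(n.startswith('_') for n in calendar.__all__))    # False
```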
Re: [Python-Dev] Is XML serialization output guaranteed to be bytewise identical forever?
> On Mar 20, 2019, at 6:07 PM, Victor Stinner wrote: > > what's the rationale of this backward incompatible change? Please refrain from abusive mischaracterizations. It is only backwards incompatible if there was a guaranteed behavior. Whether there was or not is what this thread is about. My reading of this thread was that the various experts did not want to lock in the 3.7 behavior, nor did they think the purpose of the XML modules is to produce an exact binary output. The lxml maintainer is dropping sorting (it's expensive and it overrides the order specified by the user). Other XML modules don't sort. It only made sense as a way to produce a deterministic output within a feature release back when there was no other way to do it. For my part, any agreed upon outcome is fine. I'm not willing to be debased further, so I am out of this discussion. It's up to you all to do the right thing. Raymond
Re: [Python-Dev] Is XML serialization output guaranteed to be bytewise identical forever?
> On Mar 20, 2019, at 5:22 PM, Victor Stinner wrote: > > I don't understand why such simple solution has been rejected. It hasn't been rejected. That is above my pay grade. Stefan and I recommended against going down this path. However, since you're in disagreement and have marked this as a release blocker, it is now time for the steering committee to earn their pay (which is at least double what I'm making) or defer to the principal module maintainer, Stefan. To recap the reasons for not going down this path: 1) The only known use case for a "sort=True" parameter is to perpetuate the practice of byte-by-byte output comparisons guaranteed to work across feature releases. The various XML experts in this thread have opined that isn't something we should guarantee (and sorting isn't the only detail subject to change; Stefan listed others). 2) The intent of the XML modules is to implement the specification and be interoperable with other languages and other XML tools. It is not intended to be used to generate an exact binary output. Per section 3.1 of the XML spec, "Note that the order of attribute specifications in a start-tag or empty-element tag is not significant." 3) Mitigating a test failure is a one-time problem. API expansions are forever. 4) The existing API is not small and presents a challenge for teaching. Making the API bigger will make it worse. 5) As far as I can tell, XML tools in other languages (such as Java) don't sort (and likely for good reason). lxml is dropping its attribute sorting as well, so the standard library would become more of an outlier. Raymond
Re: [Python-Dev] Is XML serialization output guaranteed to be bytewise identical forever?
> On Mar 19, 2019, at 4:53 AM, Ned Batchelder wrote: > > None of this is impossible, but please try not to preach to us maintainers > that we are doing it wrong, that it will be easy to fix, etc There's no preaching and no judgment. We can't have a conversation though if we can't state the crux of the problem: some existing tests in third-party modules depend on the XML serialization being byte-for-byte identical forever. The various respondents to this thread have indicated that the standard library should only make that guarantee within a single feature release and that it may vary across feature releases. For docutils, it may end up being an easy fix (either with a semantic comparison or with regenerating the target files when point releases differ). For Coverage, I don't make any presumption that reengineering the tests will be easy or fun. Several mitigation strategies have been proposed: * alter the element creation code to create the attributes in the desired order * use a canonicalization tool to create output that is guaranteed not to change * generate new baseline files when a feature release changes * apply Stefan's recipe for reordering attributes * make a semantic level comparison Will any of these work for you? Raymond
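The last strategy, a semantic-level comparison, can be approximated in a few lines with ElementTree. This is a rough sketch that ignores tails and surrounding whitespace; a production-quality comparison would need to handle more cases:

```python
import xml.etree.ElementTree as ET

def elements_equal(e1, e2):
    """Recursively compare tag, text, attributes, and children,
    ignoring attribute order and incidental whitespace."""
    return (e1.tag == e2.tag
            and (e1.text or '').strip() == (e2.text or '').strip()
            and e1.attrib == e2.attrib
            and len(e1) == len(e2)
            and all(elements_equal(c1, c2) for c1, c2 in zip(e1, e2)))

a = ET.fromstring('<r x="1" y="2"><c/></r>')
b = ET.fromstring('<r y="2" x="1"><c/></r>')   # different attribute order
print(elements_equal(a, b))   # True
```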
Re: [Python-Dev] Best way to specify docstrings for member objects
> On Mar 20, 2019, at 3:59 PM, Ethan Furman wrote: > > Hmm. Said somewhat less snarkily, is there a more general solution to the > problem of absent docstrings or do we have to attack this problem > piece-by-piece? I think this is the last piece. The pydoc help() utility already knows how to find docstrings for other class level descriptors: property, classmethod, staticmethod. Enum() already has nice looking help() output because the class variables are assigned values that have a nice __repr__, making them self documenting. By design, dataclasses aren't special -- they just make regular classes, similar to or better than you would write by hand. Raymond
Re: [Python-Dev] Best way to specify docstrings for member objects
> On Mar 20, 2019, at 3:47 PM, Ivan Pozdeev via Python-Dev > wrote: > >> NormalDist.mu.__doc__ = 'Arithmetic mean' >> NormalDist.sigma.__doc__ = 'Standard deviation' > > IMO this is another manifestation of the problem that things in the class > definition have no access to the class object. > Logically speaking, a definition item should be able to see everything that > is defined before it. The member objects get created downstream by the type() metaclass. So, there isn't a visibility issue because the objects don't exist yet. Raymond
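The timing can be seen directly: the slot descriptors only come into existence once type() has built the class, which is why nothing inside the class body could have referred to them:

```python
class A:
    __slots__ = ('x',)

# The descriptor was created by the metaclass, after the class body ran.
print(type(A.x).__name__)   # member_descriptor
print('x' in A.__dict__)    # True
```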
Re: [Python-Dev] Best way to specify docstrings for member objects
> On Mar 20, 2019, at 3:30 PM, Gregory P. Smith wrote: > > I like the idea of documenting attributes, but we shouldn't force the user to > use __slots__ as that has significant side effects and is rarely something > people should bother to use. Member objects are like property objects in that they exist at the class level and show up in the help whether you want them to or not. AFAICT, they are the only such objects to not have a way to attach docstrings. For instance level attributes created by __init__, the usual way to document them is in either the class docstring or the __init__ docstring. This is because they don't actually exist until __init__ is run. No one is forcing anyone to use slots. I'm just proposing that for classes that do use them that there is currently no way to annotate them like we do for property objects (which people aren't being forced to use either). The goal is to make help() better for whatever people are currently doing. That shouldn't be controversial. Someone not liking or recommending slots is quite different from not wanting them documented. In the examples I posted (taken from the standard library), the help() is clearly better with the annotations than without. Raymond
Re: [Python-Dev] Best way to specify docstrings for member objects
> On Mar 19, 2019, at 1:52 PM, MRAB wrote: > > Thinking ahead, could there ever be anything else that you might want also to > attach to member objects? Our experience with property objects suggests that once docstrings are supported, there don't seem to be any other needs. But then, you never can tell ;-) Raymond "Difficult to see. Always in motion is the future." -- Master Yoda
[Python-Dev] Best way to specify docstrings for member objects
I'm working on ways to improve help() by giving docstrings to member objects. One way to do it is to wait until after the class definition and then make individual, direct assignments to __doc__ attributes. This widely separates the docstrings from their initial __slots__ definition. Working downstream from the class definition feels awkward and doesn't look pretty. There's another way I would like to propose¹. The __slots__ definition already works with any iterable including a dictionary (the dict values are ignored), so we could use the values for the docstrings. This keeps all the relevant information in one place (much like we already do with property() objects). This way already works; we just need a few lines in pydoc to check whether a dict is present. This way also looks pretty and doesn't feel awkward. I've included worked-out examples below. What do you all think about the proposal? Raymond ¹ https://bugs.python.org/issue36326 == Desired help() output == >>> help(NormalDist) Help on class NormalDist in module __main__: class NormalDist(builtins.object) | NormalDist(mu=0.0, sigma=1.0) | | Normal distribution of a random variable | | Methods defined here: | | __init__(self, mu=0.0, sigma=1.0) | NormalDist where mu is the mean and sigma is the standard deviation. | | cdf(self, x) | Cumulative distribution function. P(X <= x) | | pdf(self, x) | Probability density function. P(x <= X < x+dx) / dx | | -- | Data descriptors defined here: | | mu | Arithmetic mean. | | sigma | Standard deviation. | | variance | Square of the standard deviation. == Example of assigning docstrings after the class definition == class NormalDist: 'Normal distribution of a random variable' __slots__ = ('mu', 'sigma') def __init__(self, mu=0.0, sigma=1.0): 'NormalDist where mu is the mean and sigma is the standard deviation.' self.mu = mu self.sigma = sigma @property def variance(self): 'Square of the standard deviation.' return self.sigma ** 2.
def pdf(self, x): 'Probability density function. P(x <= X < x+dx) / dx' variance = self.variance return exp((x - self.mu)**2.0 / (-2.0*variance)) / sqrt(tau * variance) def cdf(self, x): 'Cumulative distribution function. P(X <= x)' return 0.5 * (1.0 + erf((x - self.mu) / (self.sigma * sqrt(2.0)))) NormalDist.mu.__doc__ = 'Arithmetic mean' NormalDist.sigma.__doc__ = 'Standard deviation' == Example of assigning docstrings with a dict == class NormalDist: 'Normal distribution of a random variable' __slots__ = {'mu' : 'Arithmetic mean.', 'sigma': 'Standard deviation.'} def __init__(self, mu=0.0, sigma=1.0): 'NormalDist where mu is the mean and sigma is the standard deviation.' self.mu = mu self.sigma = sigma @property def variance(self): 'Square of the standard deviation.' return self.sigma ** 2. def pdf(self, x): 'Probability density function. P(x <= X < x+dx) / dx' variance = self.variance return exp((x - self.mu)**2.0 / (-2.0*variance)) / sqrt(tau * variance) def cdf(self, x): 'Cumulative distribution function. P(X <= x)' return 0.5 * (1.0 + erf((x - self.mu) / (self.sigma * sqrt(2.0))))
Re: [Python-Dev] Is XML serialization output guaranteed to be bytewise identical forever?
> On Mar 18, 2019, at 4:15 PM, Nathaniel Smith wrote: > > I noticed that your list doesn't include "add a DOM equality operator". That > seems potentially simpler to implement than canonical XML serialization, and > like a useful thing to have in any case. Would it make sense as an option? Time machine! Stéphane Wirtel just posted a basic semantic comparison between two streams.¹ Presumably, there would need to be a range of options for specifying what constitutes equivalence but this is a nice start. Raymond ¹ https://bugs.python.org/file48217/test_xml_compare.py
[Python-Dev] Is XML serialization output guaranteed to be bytewise identical forever?
We're having a super interesting discussion on https://bugs.python.org/issue34160 . It is now marked as a release blocker and warrants a broader discussion. Our problem is that at least two distinct and important users have written tests that depend on exact byte-by-byte comparisons of the final serialization. So any changes to the XML modules will break those tests (not the applications themselves, just the test cases that assume the output will be forever, byte-by-byte identical). In theory, the tests are incorrectly designed and should not treat the module output as a canonical normal form. In practice, doing an equality test on the output is the simplest, most obvious approach, and likely is being done in other packages we don't know about yet. With pickle, json, and __repr__, the usual way to write a test is to verify a roundtrip: assert pickle.loads(pickle.dumps(data)) == data. With XML, the problem is that the DOM doesn't have an equality operator. The user is left with either testing specific fragments with element.find(xpath) or with using a standards-compliant canonicalization package (not available from us). Neither option is pleasant. The code in the current 3.8 alpha differs from 3.7 in that it removes attribute sorting and instead preserves the order the user specified when creating an element. As far as I can tell, there is no objection to this as a feature. The problem is what to do about the existing tests in third-party code, what guarantees we want to make going forward, and what we recommend as a best practice for testing XML generation. Things we can do: 1) Revert back to the 3.7 behavior. This, of course, makes all the tests pass :-) The downside is that it perpetuates the practice of bytewise equality tests and locks in all implementation quirks forever. I don't know of anyone advocating this option, but it is the simplest thing to do. 2) Go into every XML module and add attribute sorting options to each function that generates XML.
This gives users a way to make their tests pass for now. There are several downsides. a) It grows the API in a way that is inconsistent with all the other XML packages I've seen. b) We'll have to test, maintain, and document the API forever -- the API is already large and time-consuming to teach. c) It perpetuates the notion that bytewise equality tests are the right thing to do, so we'll have this problem again if we substitute in another code generator or alter any of the other implementation quirks (i.e. how CDATA sections are serialized). 3) Add a standards-compliant canonicalization tool (see https://en.wikipedia.org/wiki/Canonical_XML ). This is likely to be the right-way-to-do-it but takes time and energy. 4) Fix the tests in the third-party modules to be more focused on their actual test objectives, the semantics of the generated XML rather than the exact serialization. This option would seem like the right-thing-to-do, but it isn't trivial because the entire premise of the existing test is invalid. For every case, we'll actually have to think through what the test objective really is. Of these, option 2 is my least preferred. Ideally, we don't guarantee bytewise identical output across releases, and ideally we don't grow a new API that perpetuates the issue. That said, I'm not wedded to any of these options and just want us to do what is best for the users in the long run. Regardless of the option chosen, we should make explicit whether or not the Python standard library modules guarantee cross-release bytewise identical output for XML. That is really the core issue here. Had we had an explicit notice one way or the other, there wouldn't be an issue now. Any thoughts? Raymond Hettinger P.S. Stefan Behnel is planning to remove attribute sorting from lxml. On the bug tracker, he has clearly articulated his reasons.
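For what it's worth, a C14N helper of the kind described in option 3 did eventually land as xml.etree.ElementTree.canonicalize (Python 3.8+), and it makes the attribute-order question moot for tests:

```python
from xml.etree.ElementTree import canonicalize  # Python 3.8+

# C14N mandates sorted attributes, so two documents that differ only in
# attribute order canonicalize to identical text.
a = canonicalize('<r b="2" a="1"/>')
b = canonicalize('<r a="1" b="2"/>')
print(a == b)   # True
```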
Re: [Python-Dev] Possible performance regression
On Feb 26, 2019, at 2:28 PM, Neil Schemenauer wrote: > > Are you compiling with --enable-optimizations (i.e. PGO)? In my > experience, that is needed to get meaningful results. I'm not, and I would worry that PGO would give less stable comparisons because it is highly sensitive to changes in its training set as well as in the actual CPython implementation (two moving targets instead of one). That said, it doesn't really matter to the world how I build *my* Python. We're trying to keep performant the ones that people actually use. For the Mac, I think there are only four that matter: 1) The one we distribute on the python.org website at https://www.python.org/ftp/python/3.8.0/python-3.8.0a2-macosx10.9.pkg 2) The one installed by homebrew 3) The way folks typically roll their own: $ ./configure && make (or some variant of make install) 4) The one shipped by Apple and put in /usr/bin Of the four, the ones I've been timing are #1 and #3. I'm happy to drop this. I was looking for independent confirmation and didn't get it. We can't move forward unless someone else also observes a consistently measurable regression for a benchmark they care about on a build that they care about. If I'm the only one who notices, then it really doesn't matter. Also, it was reassuring to not see the same effect on a GCC-8 build. Since the effect seems to be compiler specific, it may be that we knocked it out of a local minimum and that performance will return the next time someone touches the eval-loop. Raymond
Re: [Python-Dev] Possible performance regression
On Feb 25, 2019, at 8:23 PM, Eric Snow wrote: > > So it looks like commit ef4ac967 is not responsible for a performance > regression. I did narrow it down to that commit, and I can consistently reproduce the timing differences. That said, I'm only observing the effect when building with the Mac default Clang (Apple LLVM version 10.0.0 (clang-1000.11.45.5)). When building with GCC 8.3.0, there is no change in performance. I conclude this is only an issue for Mac builds. > I ran the "performance" suite (https://github.com/python/performance), > which has 57 different benchmarks. Many of those benchmarks don't measure eval-loop performance. Instead, they exercise json, pickle, sqlite, etc. So, I would expect no change in many of those because they weren't touched. Victor said he generally doesn't care about 5% regressions. That makes sense for odd corners of Python. The reason I was concerned about this one is that it hits the eval-loop and seems to affect every single opcode. The regression applies somewhat broadly, increasing the cost of reading and writing local variables by about 20%. That said, it seems to be compiler specific and only affects the Mac builds, so maybe we can decide that we don't care. Raymond
Re: [Python-Dev] Compact ordered set
Quick summary of what I found when I last ran experiments with this idea: * To get the same lookup performance, the density of the index table would need to go down to around 25%. Otherwise, there's no way to make up for the extra indirection and the loss of cache locality. * There was a small win on iteration performance because it's cheaper to loop over a dense array than a sparse array (fewer memory accesses and elimination of the unpredictable branch). This is nice because iteration performance matters in some key use cases. * I gave up on ordering right away. If we care about performance, keys can be stored in the order added; but no effort should be expended to maintain order if subsequent deletions occur. Likewise, to keep set-to-set operations efficient (i.e. looping over the smaller input), no order guarantee should be given for those operations. In general, we can let order happen but should not guarantee it, work to maintain it, or slow down essential operations to make them ordered. * Compacting does make sets a little smaller but does cost an indirection and incurs a cost for switching index sizes between 1-byte arrays, 2-byte arrays, 4-byte arrays, and 8-byte arrays. Those don't seem very expensive; however, set lookups are already very cheap when the hash values are known (when they're not, the cost of computing the hash value tends to dominate anything done by the setobject itself). * I couldn't find any existing application that would notice the benefit of making sets a bit smaller. Most applications use dictionaries (directly or indirectly) everywhere, so compaction was an overall win there. Sets tend to be used more sparsely (no pun intended) and tend to be only a small part of overall memory usage. I had to consider this when bumping the load factor down to 60%, prioritizing speed over space.
Raymond
Re: [Python-Dev] Compact ordered set
> On Feb 26, 2019, at 3:30 AM, INADA Naoki wrote: > > I'm working on compact and ordered set implementation. > It has internal data structure similar to new dict from Python 3.6.

I've looked at this as well. Some thoughts:

* Set objects have a different and conflicting optimization that works better for a broad range of use cases. In particular, there is a linear probing search step that gives excellent cache performance (multiple entries retrieved in a single cache line) and it reduces the cost of finding the next entry to a single increment (entry++). This greatly reduces the cost of collisions and makes it cheaper to verify an item is not in a set.

* The technique for compaction involves making the key/hash entry array dense and augmenting it with a sparse array of indices. This necessarily involves adding a layer of indirection for every probe.

* With the cache misses, branching costs, and extra layer of indirection, collisions would stop being cheap, so we would need to work to avoid them altogether. To get anything like the current performance for a collision of the first probe, I suspect we would have to lower the table density down from 60% to 25%.

* The intersection operation has an important optimization where it loops over the smaller of its two inputs. To give a guaranteed order that preserves the order of the first input, you would have to forgo this optimization, possibly crippling any existing code that depends on it.

* Maintaining order in the face of deletions adds a new workload to sets that didn't exist before. You risk thrashing the set to support a feature that hasn't been asked for and that isn't warranted mathematically (where the notion of sets is unordered).

* It takes a lot of care and planning to avoid fooling yourself with benchmarks on sets.
Anything done with a small tight loop will tend to hide all branch prediction costs and cache miss costs, both of which are significant in real world uses of sets.

* For sets, we care much more about look-up performance than space. And unlike dicts where we usually expect to find a key, sets are all about checking membership, which means they have to be balanced for the case where the key is not in the set.

* Having and preserving order is one of the least important things a set can offer (it does have some value, but it is the least important feature, one that was never contemplated by the original set PEP).

After the success of the compact dict, I can understand an almost irresistible urge to apply the same technique to sets. If it were clear that it was a win, I would have already done it long ago, even before dicts (it was much harder to get buy-in to changing the dicts). Please temper the enthusiasm with rationality and caution. The existing setobject code has been finely tuned and micro-optimized over the years, giving it excellent performance on workloads we care about. It would be easy to throw all of that away. Raymond
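The linear-probing step being defended can be modeled roughly in Python. This is a deliberately simplified approximation of what Objects/setobject.c does in C — the real constants, perturbation recurrence, and hash/key comparison details differ — but it shows the cheap entry++ scan before falling back to a randomized jump:

```python
LINEAR_PROBES = 9  # illustrative constant; the C code scans a cache-line-sized run

class ProbeSet:
    """Toy open-addressing set with a linear-probe inner loop."""

    def __init__(self, size=64):
        assert size & (size - 1) == 0   # power of two, so masking works
        self.table = [None] * size

    def _find(self, key):
        # Return the slot holding key, or the first free slot for it.
        mask = len(self.table) - 1
        h = hash(key)
        perturb = h
        i = h & mask
        while True:
            if self.table[i] is None or self.table[i] == key:
                return i
            # Cheap linear run: consecutive slots stay in nearby cache lines.
            limit = min(i + LINEAR_PROBES, mask)
            for j in range(i + 1, limit + 1):
                if self.table[j] is None or self.table[j] == key:
                    return j
            # Fall back to a perturbed jump elsewhere in the table.
            perturb >>= 5
            i = (i * 5 + perturb + 1) & mask

    def add(self, key):
        self.table[self._find(key)] = key

    def __contains__(self, key):
        return self.table[self._find(key)] == key
```

The point of the sketch: a miss usually terminates on a nearby free slot, which is why verifying that an item is *not* in a set stays cheap without any extra indirection.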
Re: [Python-Dev] Possible performance regression
> On Feb 25, 2019, at 2:54 AM, Antoine Pitrou wrote: > > Have you tried bisecting to find out the offending changeset, if there > is any? I got it down to two checkins before running out of time. Between: git checkout 463572c8beb59fd9d6850440af48a5c5f4c0c0c9 And: git checkout 3b0abb019662e42070f1d6f7e74440afb1808f03 So the subinterpreter patch was likely the trigger. I can reproduce it over and over again on Clang, but not for a GCC-8 build, so it is compiler specific (and possibly macOS specific). Will look at it more after work this evening. I posted here to try to solicit independent confirmation. Raymond
Re: [Python-Dev] Possible performance regression
> On Feb 24, 2019, at 10:06 PM, Eric Snow wrote: > > I'll look into it in more depth tomorrow. FWIW, I have a few commits > in the range you described, so I want to make sure I didn't slow > things down for us. :)

Thanks for looking into it. FWIW, I can consistently reproduce the results several times in a row. Here's the bash script I'm using:

#!/bin/bash
make clean
./configure
make    # Apple LLVM version 10.0.0 (clang-1000.11.45.5)
for i in `seq 1 3`; do
    git checkout d610116a2e48b55788b62e11f2e6956af06b3de0   # Go back to 2/23
    make                                                    # Rebuild
    sleep 30                                # Let the system get quiet and cool
    echo ' baseline ---' >> results.txt     # Label output
    ./python.exe Tools/scripts/var_access_benchmark.py >> results.txt   # Run benchmark
    git checkout 16323cb2c3d315e02637cebebdc5ff46be32ecdf   # Go to end-of-day 2/24
    make                                                    # Rebuild
    sleep 30                                # Let the system get quiet and cool
    echo ' end of day ---' >> results.txt   # Label output
    ./python.exe Tools/scripts/var_access_benchmark.py >> results.txt   # Run benchmark
done

> > -eric > > > * commit 175421b58cc97a2555e474f479f30a6c5d2250b0 (HEAD) > | Author: Pablo Galindo > | Date: Sat Feb 23 03:02:06 2019 + > | > | bpo-36016: Add generation option to gc.getobjects() (GH-11909) > > $ ./python Tools/scripts/var_access_benchmark.py > Variable and attribute read access: > 18.1 ns read_local > 19.4 ns read_nonlocal

These timings are several times larger than they should be. Perhaps you're running a debug build? Or perhaps 32-bit? Or in a VM or some such? Something looks way off because I'm getting 4 and 5 ns on my 2013 Haswell laptop. Raymond
[Python-Dev] Possible performance regression
I've been running benchmarks that have been stable for a while. But between today and yesterday, there has been an almost across-the-board performance regression. It's possible that this is a measurement error or something unique to my system (my Mac installed the 10.14.3 release today), so I'm hoping other folks can run checks as well. Raymond

-- Yesterday --

$ ./python.exe Tools/scripts/var_access_benchmark.py
Variable and attribute read access:
 4.0 ns read_local
 4.5 ns read_nonlocal
13.1 ns read_global
17.4 ns read_builtin
17.4 ns read_classvar_from_class
15.8 ns read_classvar_from_instance
24.6 ns read_instancevar
19.7 ns read_instancevar_slots
18.5 ns read_namedtuple
26.3 ns read_boundmethod

Variable and attribute write access:
 4.6 ns write_local
 4.8 ns write_nonlocal
17.5 ns write_global
39.1 ns write_classvar
34.4 ns write_instancevar
25.3 ns write_instancevar_slots

Data structure read access:
17.5 ns read_list
18.4 ns read_deque
19.2 ns read_dict

Data structure write access:
19.0 ns write_list
22.0 ns write_deque
24.4 ns write_dict

Stack (or queue) operations:
55.5 ns list_append_pop
46.3 ns deque_append_pop
46.7 ns deque_append_popleft

Timing loop overhead:
 0.3 ns loop_overhead

-- Today --

$ ./python.exe Tools/scripts/var_access_benchmark.py
Variable and attribute read access:
 5.0 ns read_local
 5.3 ns read_nonlocal
14.7 ns read_global
18.6 ns read_builtin
19.9 ns read_classvar_from_class
17.7 ns read_classvar_from_instance
26.1 ns read_instancevar
21.0 ns read_instancevar_slots
21.7 ns read_namedtuple
27.8 ns read_boundmethod

Variable and attribute write access:
 6.1 ns write_local
 7.3 ns write_nonlocal
18.9 ns write_global
40.7 ns write_classvar
36.2 ns write_instancevar
26.1 ns write_instancevar_slots

Data structure read access:
19.1 ns read_list
19.6 ns read_deque
20.6 ns read_dict

Data structure write access:
22.8 ns write_list
23.5 ns write_deque
27.8 ns write_dict

Stack (or queue) operations:
54.8 ns list_append_pop
49.5 ns deque_append_pop
49.4 ns deque_append_popleft

Timing loop overhead:
 0.3 ns loop_overhead
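For anyone who wants to reproduce this style of measurement without the Tools/scripts benchmark, a minimal best-of-several-runs timer can be built from the stdlib. The helper name and constants here are illustrative, not the actual benchmark script:

```python
from timeit import repeat

def ns_per_op(stmt, setup='pass', number=1_000_000, repeats=7):
    # Take the minimum across repeats: the fastest run has the least
    # interference from the OS scheduler and CPU frequency scaling.
    best = min(repeat(stmt, setup, number=number, repeat=repeats))
    return best / number * 1e9   # nanoseconds per operation

loop_overhead = ns_per_op('pass')
dict_read = ns_per_op('d["key"]', setup='d = {"key": 1}')
```

As in the listings above, the loop overhead should be reported alongside (and mentally subtracted from) the per-operation figures.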
Re: [Python-Dev] Add minimal information with a new issue?
On Feb 21, 2019, at 6:53 AM, Stephane Wirtel wrote: > > What do you think if we suggest a "template" for the new bugs? 99% of the time the template would not be applicable. Historically, we asked for more information when needed and that wasn't very often. I think that anything that raises the cost of filing a bug report will work to our detriment. Ideally, we want the barriers to reporting to be as low as possible. Raymond
Re: [Python-Dev] Asking for reversion
> On Feb 5, 2019, at 9:52 AM, Giampaolo Rodola' wrote: > > The main problem I have with this PR is that it seems to introduce 8 brand > new APIs, but since there is no doc, docstrings or tests it's unclear which > ones are supposed to be used, how or whether they are supposed to supersede > or deprecate older (slower) ones involving inter process communication. The release manager already opined that if tests and docs get finished for the second alpha, he prefers not to have a reversion and would rather build on top of what already shipped in the first alpha. FWIW, the absence of docs isn't desirable but it isn't atypical. PEP 572 code landed without the docs. Docs for dataclasses arrived much after the code. The same was true for the decimal module. Hopefully, everyone will team up with Davin and help him get the ball over the goal line. BTW, this is a feature we really want. Our multicore story for Python isn't a good one. Due to the GIL, threading usually can't exploit multiple cores for better performance. Async has lower overhead than threading but achieves its gains by keeping all the data in a single process. That leaves us with multiprocessing where the primary obstacle has been the heavy cost of moving data between processes. If that cost can be reduced, we've got a winning story for multicore. This patch is one of the better things that is happening to Python. Aside from last week's procedural missteps and communication issues surrounding the commit, the many months of prior work on this have been stellar. How about we stop using a highly public forum to pile on Davin (being the subject of a thread like this can be a soul crushing experience). Right now, he could really use some help and support from everyone on the team. Raymond
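The shared-memory API that landed in the 3.8 alpha lets two processes view the same buffer without copying. A single-process sketch of the attach-by-name round trip, assuming the multiprocessing.shared_memory API as shipped in the alpha:

```python
from multiprocessing import shared_memory

# Create a named block, write into it, then attach to it by name the way
# a second process would.
shm = shared_memory.SharedMemory(create=True, size=16)
try:
    shm.buf[:5] = b'hello'
    attached = shared_memory.SharedMemory(name=shm.name)
    roundtrip = bytes(attached.buf[:5])  # read with no pickling or copying of the block
    attached.close()
finally:
    shm.close()
    shm.unlink()   # free the block once no process needs it
```

In a real use, only the creating process calls unlink(); every attached process calls close().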
Re: [Python-Dev] Asking for reversion
> On Feb 4, 2019, at 2:36 AM, Łukasz Langa wrote: > > @Raymond, would you be willing to work with Davin on finishing this work in > time for alpha2? I would be happy to help, but this is beyond my technical ability. The people who are qualified to work on this have already chimed in on the discussion. Fortunately, I think this is a feature that everyone wants. So it's just a matter of getting the experts on the subject to team up and help get it done. Raymond
Re: [Python-Dev] Asking for reversion
> On Feb 3, 2019, at 5:40 PM, Terry Reedy wrote: > > On 2/3/2019 7:55 PM, Guido van Rossum wrote: >> Also, did anyone ask Davin directly to roll it back? > > Antoine posted on the issue, along with Robert O. Robert reviewed and made > several suggestions. I think the PR sat in a stable state for many months, and it looks like RO's review comments came *after* the commit. FWIW, with dataclasses we decided to get the PR committed early, long before most of the tests and all of the docs. The principle was that bigger changes needed to go in as early as possible in the release cycle so that we could thoroughly exercise it (something that almost never happens while something is in the PR stage). It would be great if the same can happen here. IIRC, shared memory has long been the holy grail for multiprocessing, helping to mitigate its principal disadvantage (the cost of moving data between processes). It's something we really want. But let's see what the 3.8 release manager has to say. Raymond
Re: [Python-Dev] Asking for reversion
> On Feb 3, 2019, at 1:03 PM, Antoine Pitrou wrote: > > I'd like to ask for the reversion of the changes done in > https://github.com/python/cpython/pull/11664 Please work *with* Davin on this one. It was only recently that you edited his name out of the list of maintainers for multiprocessing even though that is what he's been working on for the last two years and at the last two sprints. I'd like to see more teamwork here rather than applying social pressures via python-dev (which is a *very* public list). Raymond
Re: [Python-Dev] Fwd: How about updating OrderedDict in csv and configparser to regular dict?
> On Jan 31, 2019, at 3:06 AM, Steve Holden wrote: > > And I see that such a patch is now merged. Thanks, Raymond! And thank you for getting ordering into csv.DictReader. That was a significant improvement in usability :-) Raymond
Re: [Python-Dev] How to update namedtuple asdict() to use dict instead of OrderedDict
> On Jan 30, 2019, at 9:11 PM, Tim Delaney wrote: > > Alternatively, would it be viable to make OrderedDict work in a way that so > long as you don't use any reordering operations it's essentially just a very > thin layer on top of a dict, There's all kinds of tricks we could do but none of them are worth it. It took Eric Snow a long time to write the OrderedDict patch and it took years to get most of the bugs out of it. I would really hate to go through a redesign and eat up our time for something that probably won't be much used any more. I'm really just aiming for something as simple as s/OrderedDict/dict/ in namedtuple :-) Raymond
Re: [Python-Dev] How to update namedtuple asdict() to use dict instead of OrderedDict
> On Jan 30, 2019, at 6:00 PM, David Mertz wrote: > > Ditto +1 option 4 > > On Wed, Jan 30, 2019, 5:56 PM Paul Moore On Wed, 30 Jan 2019 at 22:35, Raymond Hettinger > wrote: > > My recommendation is Option 4 as being less disruptive and more beneficial > > than the other options. In the unlikely event that anyone is currently > > depending on the reordering methods for the output of _asdict(), the > > remediation is trivially simple: nt._asdict() -> > > OrderedDict(nt._asdict()). > > > > What do you all think? >> >> +1 from me on option 4. >> >> Paul Thanks everyone. I'll move forward with option 4. In Barry's words, JFDI :-) > On Jan 30, 2019, at 6:10 PM, Nathaniel Smith wrote: > > How viable would it be to make OrderedDict smaller, faster, and give > it a cleaner looking repr? Not so much. The implementations are substantially different because they have different superpowers. A regular dict is really good at being a dict while retaining order but it isn't good at reordering operations such as popitem(False), popitem(True), move_to_end(), and whatnot. An OrderedDict is a heavier weight structure (a hash table augmented by a doubly-linked list) -- it is worse at being a dictionary but really good at intensive reordering operations typical in cache recency tracking and whatnot. Also, there are long-standing API differences including weak references, ability to assign attributes, an equality operation that requires exact order when compared to another ordered dict etc, as well as the reordering methods. If it was easy, clean, and desirable, it would have already been done :-) Overall, I think the OrderedDict is increasingly irrelevant except for use cases requiring cross-version compatibility and for cases that need heavy reordering. Accordingly, I mostly expect to leave it alone and let it fall into the not-much-used category like UserDict, UserList, and UserString.
> On Jan 30, 2019, at 3:41 PM, Glenn Linderman wrote: > Would it be practical to add deprecated methods to regular dict for the > OrderedDict reordering methods that raise with an error suggesting "To use > this method, convert dict to OrderedDict." (or some better wording). That's an interesting idea. Regular dicts aren't well suited to the reordering operations (like lists, repeated inserts at the front of the sequence wouldn't be performant relative to OrderedDict which uses doubly-linked lists internally). My instinct is to leave regular dicts alone so that they can focus on their primary task (being good at fast lookups). Raymond
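The "cache recency tracking" workload where OrderedDict shines can be sketched as a toy LRU cache (the class and method names here are illustrative, not a stdlib API):

```python
from collections import OrderedDict

class LRUCache:
    """Toy LRU cache built on OrderedDict's reordering superpowers."""

    def __init__(self, maxsize=4):
        self.maxsize = maxsize
        self.data = OrderedDict()

    def get(self, key, default=None):
        if key in self.data:
            self.data.move_to_end(key)       # mark as most recently used
            return self.data[key]
        return default

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)           # refresh recency on update
        if len(self.data) > self.maxsize:
            self.data.popitem(last=False)    # evict least recently used
```

Both move_to_end() and popitem(last=False) are O(1) thanks to the internal doubly-linked list; a plain dict has no cheap equivalent.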
[Python-Dev] How to update namedtuple asdict() to use dict instead of OrderedDict
Now that regular dicts are ordered and compact, it makes more sense for the _asdict() method to create a regular dict (as it did in its early days) rather than an OrderedDict. The regular dict is much smaller, much faster, and has a much cleaner looking repr. It would also help namedtuple() stay in sync with dataclasses which already take advantage of the ordering feature of regular dicts. The question is how to go about making the change in a way that gives the most benefit to users as soon as possible and that creates the least disruption.

Option 1) Add a deprecation notice to 3.8, make no code change in 3.8, and then update the code in 3.9. This has several issues: a) it doesn't provide an executable DeprecationWarning in 3.8, b) it isn't really a deprecation, and c) it defers the benefits of the change for another release.

Option 2) Add a deprecation notice to 3.8, add a DeprecationWarning to the _asdict() method, and make the actual improvement in 3.9. The main issue here is that it will create a lot of noise for normal uses of the _asdict() method which are otherwise unaffected by the change. The typical use cases for _asdict() are to create keyword arguments and to pass named tuple data into functions or methods that expect regular dictionaries. Those use cases would benefit from seeing the change made sooner and would suffer in the interim from their code slowing down for warnings that aren't useful.

Option 3) Add a deprecation notice to 3.8 and have the _asdict() method create a subclass of OrderedDict that issues warnings only for the methods and attributes that will change (move_to_end, popitem, __eq__, __dict__, __weakref__). This is less noisy but it adds a lot of machinery just to make a notification of a minor change. Also, it fails to warn that the data type will change. And it may create more confusion than options 1 and 4 which are simpler.

Option 4) Just make the change directly in 3.8, s/OrderedDict/dict/, and be done with it.
This gives users the benefits right away and doesn't annoy them with warnings that they likely don't care about. There is some precedent for this. To make namedtuple class creation faster, the *verbose* option was dropped without any deprecation period. It looks like no one missed that feature at all, but they did get the immediate benefit of faster import times. In the case of using regular dicts in named tuples, people will get immediate and significant space savings as well as a speed benefit. My recommendation is Option 4 as being less disruptive and more beneficial than the other options. In the unlikely event that anyone is currently depending on the reordering methods for the output of _asdict(), the remediation is trivially simple: nt._asdict() -> OrderedDict(nt._asdict()). What do you all think? Raymond
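The effect of Option 4, and the one-line remediation, can be shown in a few lines that work on both the old and new behavior:

```python
from collections import namedtuple, OrderedDict

Point = namedtuple('Point', ['x', 'y'])
p = Point(1, 2)

d = p._asdict()                 # dict on 3.8+, OrderedDict on earlier versions
assert dict(d) == {'x': 1, 'y': 2}

# Remediation for code that needs the OrderedDict reordering methods:
od = OrderedDict(p._asdict())
od.move_to_end('x')             # reordering still available after the change
assert list(od) == ['y', 'x']
```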
Re: [Python-Dev] Lost sight
> On Jan 19, 2019, at 2:12 AM, Serhiy Storchaka wrote: > > I have virtually completely lost the sight of my right eye (and the loss is > quickly progresses) and the sight of my left eye is weak. I hope this is only temporary. Best wishes. Raymond
[Python-Dev] General concerns about C API changes
Overall, I support the efforts to improve the C API, but over the last few weeks have become worried. I don't want to hold up progress with fear, uncertainty, and doubt. Yet, I would like to be more comfortable that we're all aware of what is occurring and what are the potential benefits and risks. * Inline functions are great. They provide true local variables, better separation of concerns, are far less kludgy than text based macro substitution, and will typically generate the same code as the equivalent macro. This is good tech when used within a single source file where it has predictable results. However, I'm not at all confident about moving these into header files which are included in multiple target .c files which need to be compiled into separate .o files and linked to other existing libraries. With a macro, I know for sure that the substitution is taking place. This happens at all levels of optimization and in a debug mode. The effects are 100% predictable and have a well-established track record in our mature battle-tested code base. With cross module function calls, I'm less confident about what is happening, partly because compilers are free to ignore inline directives and partly because the semantics of inlining are less clear when crossing module boundaries. * Other categories of changes that we make tend to have only a shallow reach. However, these C API changes will likely touch every C extension that has ever been written, some of which are highly tuned but not actively re-examined. If any mistakes are made, they will likely be pervasive. Accordingly, caution is warranted. My expectation was that the changes would be conducted in experimental branches. But extensive changes are already being made (or about to be made) on the 3.8 master.
If a year from now, we decide that the changes were destabilizing or that the promised benefits didn't materialize, they will be difficult to undo because there are so many of them and because they will be interleaved with other changes. The original motivation was to achieve a 2x speedup in return for significantly churning the C API. However, the current rearranging of the include files and macro-to-inline-function changes only give us churn. At the very best, they will be performance neutral. At worst, formerly cheap macro calls will become expensive in places that we haven't thought to run timings on. Given that compilers don't have to honor an inline directive, we can't really know for sure -- perhaps today it works out fine, and perhaps tomorrow the compilers opt for a different behavior. Maybe everything that is going on is fine. Maybe it's not. I am not expert enough to know for sure, but we should be careful before green-lighting such an extensive series of changes directly to master. Reasonable questions to ask are: 1) What are the risks to third party modules? 2) Do we really know that the macro-to-inline-function transformations are semantically neutral? 3) If there is no performance benefit (none has been seen so far, nor is any promised in the pending PRs), is it worth it? We do know that PyPy folks have had their share of issues with the C API, but I'm not sure that we can make any of this go away without changing the foundations of the whole ecosystem. It is inconvenient for a full GC environment to interact with the API for a reference counted environment -- I don't think we can make this challenge go away without giving up reference counting. It is inconvenient for a system that manifests objects on demand to interact with an API that assumes that objects have identity and never move once they are created -- I don't think we can make this go away either.
It is inconvenient for a system that uses unboxed data to interact with our API where everything is an object that includes a type pointer and reference count -- We have provided an API for boxing and unboxing, but the trip back-and-forth is inconveniently expensive -- I don't think we can make that go away either because too much of the ecosystem depends on that API. There are some things that can be mitigated such as challenges with borrowed references but that doesn't seem to have been the focus of any of the PRs. In short, I'm somewhat concerned about the extensive changes that are occurring. I do know they will touch substantially every C module in the entire ecosystem. I don't know whether they are safe or whether they will give any real benefit. FWIW, none of this is a criticism of the work being done. Someone needs to think deeply about the C API or else progress will never be made. That said, it is a high risk project with many PRs going directly into master, so it does warrant having buy-in that the churn isn't destabilizing and will actually produce a benefit that is worth it. Raymond
Re: [Python-Dev] Postponed annotations break inspection of dataclasses
> On Sep 22, 2018, at 1:38 PM, Yury Selivanov wrote: > > On Sat, Sep 22, 2018 at 3:11 PM Guido van Rossum wrote: > [..] >> Still, I wonder if there's a tweak possible of the globals and locals used >> when exec()'ing the function definitions in dataclasses.py, so that >> get_type_hints() gets the right globals for this use case. >> >> It's really tough to be at the intersection of three PEPs... > > If it's possible to fix exec() to accept any Mapping (not just dicts), > then we can create a proxy mapping for "Dataclass.__init__.__module__" > module and everything would work as expected

FWIW, the locals argument for exec() already accepts any mapping (not just dicts):

>>> class M:
...     def __getitem__(self, key):
...         return key.upper()
...     def __setitem__(self, key, value):
...         print(f'{key!r}: {value!r}')
...
>>> exec('a=b', globals(), M())
'a': 'B'

Raymond
Re: [Python-Dev] Testing C API
> On Jul 30, 2018, at 12:06 AM, Serhiy Storchaka wrote: > > 30.07.18 09:46, Raymond Hettinger пише: >> I prefer the current organization that keeps the various tests together with >> the category being tested. I almost never need to run the C API tests all >> at once, but I do need to see all the tests for an object in one place. >> When maintaining something like marshal, it would be easy to miss some of >> the tests if they are in a separate file. IMO, the proposed change would >> hinder future maintenance and fly in the face of our traditional code >> organization. > > What about moving just test_capi.py, test_getargs2.py and > test_structmembers.py into Lib/test/test_capi? They are not related to > specific types or modules That would be reasonable. Raymond
Re: [Python-Dev] Testing C API
> On Jul 29, 2018, at 4:53 AM, Serhiy Storchaka wrote: > > The benefit is that it will be easier to run all C API tests at once, and > only them, and it will be clearer what C API is covered by tests. The > disadvantage is that you will need to run several files for testing marshal > for example. > > What are your thoughts? I prefer the current organization that keeps the various tests together with the category being tested. I almost never need to run the C API tests all at once, but I do need to see all the tests for an object in one place. When maintaining something like marshal, it would be easy to miss some of the tests if they are in a separate file. IMO, the proposed change would hinder future maintenance and fly in the face of our traditional code organization. Raymond
Re: [Python-Dev] [issue34221] Any plans to combine collections.OrderedDict with dict
> On Jul 26, 2018, at 10:23 AM, Terry Reedy wrote: > > On python-idea, Miro Hrončok asked today whether we can change the > OrderedDict repr from, for instance, > > OrderedDict([('a', '1'), ('b', '2')]) # to > OrderedDict({'a': '1', 'b': '2'}) > > I am not sure what our repr change policy is, as there is a > back-compatibility issue but I remember there being changes. We are allowed to change the repr in future versions of the language. Doing so does come at a cost though. There is a small performance penalty (see the timings below). Some doctests will break. And Python 3.8 printed output in books and blog posts would get shuffled if typed into Python 3.5 -- this is problematic because one of the few remaining use cases for OrderedDict is to write code that is compatible with older Pythons. The proposed repr does look pretty but probably isn't worth the disruption. Raymond

--

$ python3.7 -m timeit -r 7 'from collections import OrderedDict' "OrderedDict([('a', '1'), ('b', '2')])"
20 loops, best of 7: 1.12 usec per loop
$ python3.7 -m timeit -r 7 'from collections import OrderedDict' "OrderedDict({'a': '1', 'b': '2'})"
20 loops, best of 7: 1.22 usec per loop
$ python3.7 -m timeit -r 7 'from collections import OrderedDict' "OrderedDict([('a', '1'), ('b', '2')])"
20 loops, best of 7: 1.13 usec per loop
$ python3.7 -m timeit -r 7 'from collections import OrderedDict' "OrderedDict({'a': '1', 'b': '2'})"
20 loops, best of 7: 1.2 usec per loop
$ python3.7 -m timeit -r 7 'from collections import OrderedDict' "OrderedDict([('a', '1'), ('b', '2')])"
20 loops, best of 7: 1.12 usec per loop
$ python3.7 -m timeit -r 7 'from collections import OrderedDict' "OrderedDict({'a': '1', 'b': '2'})"
20 loops, best of 7: 1.2 usec per loop
Re: [Python-Dev] [issue34221] Any plans to combine collections.OrderedDict with dict
> On Jul 25, 2018, at 8:23 PM, INADA Naoki wrote: > > On Thu, Jul 26, 2018 at 12:04 PM Zhao Lee wrote: >> >> >> Since Python 3.7, dicts remember the order that items were inserted, so any >> plans to combine collections.OrderedDict with dict? >> https://docs.python.org/3/library/collections.html?#collections.OrderedDict >> https://docs.python.org/3/library/stdtypes.html#dict > > No. There are some major differences. > > * d1 == d2 ignores order / od1 == od2 compares order > * OrderedDict has move_to_end() method. > * OrderedDict.pop() takes `last=True` keyword. In addition to the API differences noted by Naoki, there are also implementation differences. The regular dict implements a low-cost solution for common cases. The OrderedDict has a more complex scheme that can handle frequent rearrangements (move_to_end operations) without touching, resizing, or reordering the underlying dictionary. Roughly speaking, regular dicts emphasize fast, space-efficient core dictionary operations over ordering requirements while OrderedDicts prioritize ordering operations over other considerations. That said, now that regular dicts are ordered by default, the need for collections.OrderedDict() should diminish quite a bit. Mostly, I think people will ignore OrderedDict unless their application heavily exercises move-to-end operations. Raymond
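The API differences listed above are easy to demonstrate (Python 3.7+ semantics for regular dicts):

```python
from collections import OrderedDict

# Regular dicts compare equal regardless of insertion order.
d1 = {'a': 1, 'b': 2}
d2 = {'b': 2, 'a': 1}
assert d1 == d2

# OrderedDict equality is order-sensitive.
od1 = OrderedDict(a=1, b=2)
od2 = OrderedDict(b=2, a=1)
assert od1 != od2

# Reordering APIs that plain dicts lack:
od1.move_to_end('a')
assert list(od1) == ['b', 'a']
assert od1.popitem(last=False) == ('b', 2)   # pop from the front
```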
Re: [Python-Dev] Add __reversed__ methods for dict
> On May 26, 2018, at 7:20 AM, INADA Naoki wrote: > > Because doubly linked list is very memory inefficient, every implementation > would be forced to implement dict like PyPy (and CPython) for efficiency. > But I don't know much about current MicroPython and other Python > implementation's > plan to catch Python 3.6 up. FWIW, Python 3.7 is the first Python where the language guarantees that regular dicts are order preserving. And the feature being discussed in this thread is for Python 3.8. What potential implementation obstacles do you foresee? Can you imagine any possible way that an implementation would have an order preserving dict but would be unable to trivially implement __reversed__? How could an implementation have a __setitem__ that appends at the end, and a popitem() that pops from that same end, but still not be able to easily iterate in reverse? It really doesn't matter whether an implementer uses a dense array of keys or a doubly-linked-list; either way, looping backward is as easy as going forward. Raymond P.S. It isn't going to be hard to update MicroPython to have a compact and ordered dict (based on my review of their existing dict implementation). This is something they are really going to want because of the improved memory efficiency. Also, they're already going to need it just to comply with guaranteed keyword argument ordering and guaranteed ordering of class dictionaries.
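The feature under discussion did land in Python 3.8. A short sketch of what it enables (the task-dict example mirrors the use case mentioned in the neighboring thread):

```python
# Assumes Python 3.8+, where reversed() works on dict and its views.
tasks = {'build': 1, 'test': 2, 'deploy': 3}

# Keys in reverse insertion order:
print(list(reversed(tasks)))          # ['deploy', 'test', 'build']

# Most recently inserted item, without any popitem()/reinsert workaround:
print(next(reversed(tasks.items())))  # ('deploy', 3)
```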
Re: [Python-Dev] PEP 574 (pickle 5) implementation and backport available
> On May 24, 2018, at 10:57 AM, Antoine Pitrou wrote: > > While PEP 574 (pickle protocol 5 with out-of-band data) is still in > draft status, I've made available an implementation in branch "pickle5" > in my GitHub fork of CPython: > https://github.com/pitrou/cpython/tree/pickle5 > > Also I've published an experimental backport on PyPI, for Python 3.6 > and 3.7. This should help people play with the new API and features > without having to compile Python: > https://pypi.org/project/pickle5/ > > Any feedback is welcome. Thanks for doing this. Hope it isn't too late, but I would like to suggest that protocol 5 support fast compression by default. We normally pickle objects so that they can be transported (saved to a file or sent over a socket). Transport costs (reading and writing a file or socket) are generally proportional to size, so compression is likely to be a net win (much as it was for header compression in HTTP/2). The PEP lists compression as a possible refinement only for large objects, but I expect it will be a win for most pickles to compress them in their entirety. Raymond
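Protocol 5 did not end up compressing by default, but the suggestion above is easy to do by hand. A hedged sketch of compressing a pickle in its entirety (the payload is an arbitrary example):

```python
import pickle
import zlib

data = {'values': list(range(10_000))}

# Pickle, then compress the whole byte stream before transport.
raw = pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)
packed = zlib.compress(raw)
print(len(packed) < len(raw))  # True for this highly regular payload

# Reverse the steps on the receiving side.
restored = pickle.loads(zlib.decompress(packed))
assert restored == data
```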
Re: [Python-Dev] Add __reversed__ methods for dict
> On May 25, 2018, at 9:32 AM, Antoine Pitrou wrote: > > It's worth noting that OrderedDict already supports reversed(). > The argument could go both ways: > > 1. dict is similar to OrderedDict nowadays, so it should support > reversed() too; > > 2. you can use OrderedDict to signal explicitly that you care about > ordering; no need to add anything to dict. Those are both valid sentiments :-) My thought is that guaranteed insertion order for regular dicts is brand new, so it will take a while for the notion to settle in and become part of everyday thinking about dicts. Once that happens, it is probably inevitable that use cases will emerge and that __reversed__ will get added at some point. The implementation seems straightforward and it isn't much of a conceptual leap to expect that a finite ordered collection would be reversible. Given that dicts now track insertion order, it seems reasonable to want to know the most recent insertions (i.e. looping over the most recently added tasks in a task dict). Other possible use cases will likely correspond to how we use the Unix tail command. If those use cases arise, it would be nice for __reversed__ to already be supported so that people won't be tempted to implement an ugly workaround using popitem() calls followed by reinsertions. Raymond
Re: [Python-Dev] Hashes in Python3.5 for tuples and frozensets
> On May 16, 2018, at 5:48 PM, Anthony Flury via Python-Dev > wrote: > > However the frozen set hash is the same in both cases, as is the hash of the > tuples - suggesting that the vulnerability resolved in Python 3.3 wasn't > resolved across all potentially hashable values. You are correct. The hash randomization only applies to strings. None of the other object hashes were altered. Whether this is a vulnerability or not depends greatly on what is exposed to users (generally strings) and how it is used. For the most part, it is considered a feature that integers hash to themselves. That is very fast to compute :-) Also, it tends to prevent hash collisions for consecutive integers. Raymond
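A quick demonstration of the point about integer hashes (hash randomization covers str and bytes; small ints hash to themselves regardless of PYTHONHASHSEED):

```python
# Small ints hash to themselves, which is fast to compute and
# collision-free for consecutive keys.
assert hash(42) == 42
assert [hash(i) for i in range(5)] == [0, 1, 2, 3, 4]

# Consequently, hashes of tuples and frozensets of ints are stable
# across runs, which is the behavior observed in the question above.
print(hash((1, 2, 3)) == hash((1, 2, 3)))  # True
```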
Re: [Python-Dev] PEP 572: Usage of assignment expressions in C
> On Apr 28, 2018, at 8:45 AM, Antoine Pitrou wrote: > >> I personally haven't written a lot of C, so have no personal experience, >> but if this is at all a common approach among experienced C developers, it >> tells us a lot. > > I think it's a matter of taste and personal habit. Some people will > often do it, some less. Note that C also has a tendency to make it > more useful, because doesn't have exceptions, so functions need to > (ab)use return values when they want to indicate an error. When you're > calling such functions (for example I/O functions), you routinely have > to check for special values indicating an error, so it's common to see > code such as: > > // Read up to n bytes from file descriptor > if ((bytes_read = read(fd, buf, n)) == -1) { > // Error occurred while reading, do something > } Thanks Antoine, this is an important point that I hope doesn't get lost. In a language with exceptions, assignment expressions are less needful. Also, the pattern of having mutating methods return None further limits the utility. Raymond
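For comparison, the Python 3.8 analogue of the C read-loop idiom quoted above. It illustrates the point in the thread: the walrus binds and tests in one expression, but errors raise OSError rather than returning a sentinel like -1 (the BytesIO source here is a stand-in for a real file descriptor):

```python
import io

# Stand-in for an open file; read(4) returns b'' at end of stream.
src = io.BytesIO(b'abcdefghij')

chunks = []
# Python 3.8+ equivalent of `while ((n = read(fd, buf, 4)) > 0)`.
while (chunk := src.read(4)):
    chunks.append(chunk)

print(chunks)  # [b'abcd', b'efgh', b'ij']
```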
Re: [Python-Dev] PEP 572: A backward step in readability
> On Apr 30, 2018, at 9:37 AM, Steven D'Aprano wrote: > > On Mon, Apr 30, 2018 at 08:09:35AM +0100, Paddy McCarthy wrote: > [...] >> A PEP that can detract from readability; *readability*, a central >> tenet of Python, should >> be rejected, (on principle!), when such objections are treated so >> dismissively. > > Unless you have an objective measurement of readability, that objection > is mere subjective personal preference, and not one that everyone agrees > with. Sorry Steven, but that doesn't seem like it is being fair to Paddy. Of course, readability can't be measured objectively with a ruler (that is a false standard). However, readability is still a real issue that affects us daily even though objective measurements aren't possible. All of us who do code reviews make assessments of readability on a daily basis even though we have no objective measures. We know "hard to read" when we see it. In this thread, several prominent and highly experienced devs reported finding it difficult to parse some of the examples and some mis-parsed the semantics of the examples. It is an objective fact that they reported readability issues. That is of great concern and shouldn't be blown off with a comment that readability, "is a mere subjective personal preference". At its heart, readability is the number one concern in language design. Also, there is another area where it looks like valid concerns are being dismissed out of hand. Several respondents worried that the proposed feature will lead to writing bad code. Their comments seem to have been swept under the table with responses along the lines of "well any feature can be used badly, so we don't care about that, some people will write bad code no matter what we do". While that is true to some extent, there remains a valid issue concerning the propensity for misuse. 
ISTM the proposed feature relies on users showing a good deal of self-restraint and having a clear knowledge of the boundary between the "clear-win" cases (like the regex match object example) and the puzzling cases (assignments being used in and-operator and or-operator chains). It also relies on people not making hard-to-find mistakes (like mistyping := when == was intended). There is a real difference between a feature that could be abused versus a feature that has a propensity for being misused, being mistyped, or being misread (all of which have occurred multiple times in these threads). > The "not readable" objection has been made, extremely vehemently, > against nearly all major syntax changes to Python: I think that is a false recollection of history. Comprehensions were welcomed and highly desired. Decorators were also highly sought after -- there was only a question of the best possible syntax. The ternary operator was clamored for by an enormous number of users (though there was little agreement on the best spelling). Likewise, the case for augmented assignments was somewhat strong (eliminating having to spell the assignment target twice). Each of those proposals had their debates, but none of them had a bunch of core devs flat-out opposed like we do now. It really isn't the same at all. However, even if the history had been recalled correctly, it would still be a logical fallacy to posit "in the past, people opposed syntax changes that later proved to be popular, therefore we should ignore all concerns being expressed today". To me, that seems like a rhetorical trick for dismissing a bunch of thoughtful posts. Adding this new syntax is a one-way trip -- we don't get to express regrets later. Accordingly, it would be nice if the various concerns being presented were addressed directly rather than being dismissed with a turn of phrase. 
Nor should it matter whether concerns were articulately expressed (being articulate isn't always correlated with being right). Raymond
Re: [Python-Dev] (name := expression) doesn't fit the narrative of PEP 20
> On Apr 26, 2018, at 12:40 AM, Tim Peters wrote: > > [Raymond Hettinger ] >> After re-reading all the proposed code samples, I believe that >> adopting the PEP will make the language harder to teach to people >> who are not already software engineers. > > Can you elaborate on that? Just distinguishing between =, :=, and == will be a forever recurring discussion, far more of a source of confusion than the occasional question of why Python doesn't have embedded assignment. Also, it is of concern that a number of prominent core dev respondents to this thread have reported difficulty scanning the posted code samples. > I've used dozens of languages over the > decades, most of which did have some form of embedded assignment. Python is special, in part, because it is not one of those languages. It has virtues that make it suitable even for elementary school children. We can show well-written Python code to non-computer folks and walk them through what it does without their brains melting (something I can't do with many of the other languages I've used). There is a virtue in encouraging simple statements that read like English sentences organized into English-like paragraphs, presenting itself like "executable pseudocode". "Perl does it" or "C++ does it" is unpersuasive. Its omission from Python was always something that I thought Guido had left out on purpose, intentionally stepping away from constructs that would be of help in an obfuscated Python contest. > Yes, I'm a software engineer, but I've always pitched in on "help > forums" too. That's not really the same. I've taught Python to many thousands of professionals, almost every week for over six years. That's given me a keen sense of what is hard to teach. It's okay to not agree with my assessment, but I would like for the fruits of my experience to not be dismissed in a single wisp of a sentence. 
Any one feature in isolation is usually easy to explain, but showing how to combine them into readable, expressive code is another matter. And as Yuri aptly noted, we spend more time reading code than writing code. If some fraction of our users finds the code harder to scan because of the new syntax, then it would be a net loss for the language. I hesitated to join this thread because you and Guido seemed to be pushing back so hard against anyone whose design instincts didn't favor the new syntax. It would be nice to find some common ground and perhaps stipulate that the grammar would grow in complexity, that a new operator would add to the current zoo of operators, that the visual texture of the language would change (and in a way that some including me do not find pleasing), and that while the simplest cases may afford a small net win, it is a certitude that the syntax will routinely be pushed beyond our comfort zone. While the regex conditional example looks like a win, it is a very modest win and IMHO not worth the overall net increase in language complexity. Like Yuri, I'll drop out now. Hopefully, you all will find some value in what I had to contribute to the conversation. Raymond
Re: [Python-Dev] (name := expression) doesn't fit the narrative of PEP 20
> On Apr 25, 2018, at 8:11 PM, Yury Selivanov wrote: > > FWIW I started my thread for allowing '=' in expressions to make sure that > we fully explore that path. I don't like ':=' and I thought that using '=' > can make the idea more appealing to myself and others. It didn't, sorry if > it caused any distraction. Although adding a new ':=' operator isn't my main > concern. > > I think it's a fact that PEP 572 makes Python more complex. > Teaching/learning Python will inevitably become harder, simply because > there's one more concept to learn. > > Just yesterday this snippet was used on python-dev to show how great the > new syntax is: > > my_func(arg, buffer=(buf := [None]*get_size()), size=len(buf)) > > To my eye this is an anti-pattern. One line of code was saved, but the > other line becomes less readable. The fact that 'buf' can be used after > that line means that it will be harder for a reader to trace the origin of > the variable, as a top-level "buf = " statement would be more visible. > > The PEP lists this example as an improvement: > > [(x, y, x/y) for x in input_data if (y := f(x)) > 0] > > I'm an experienced Python developer and I can't read/understand this > expression after one read. I have to read it 2-3 times before I trace where > 'y' is set and how it's used. Yes, an expanded form would be ~4 lines > long, but it would be simple to read and therefore review, maintain, and > update. > > Assignment expressions seem to optimize the *writing code* part, while > making *reading* part of the job harder for some of us. I write a lot of > Python, but I read more code than I write. If the PEP gets accepted I'll > use > the new syntax sparingly, sure. My main concern, though, is that this PEP > will likely make my job as a code maintainer harder in the end, not easier. > > I hope I explained my -1 on the PEP without sounding emotional. FWIW, I concur with all of Yuri's thoughtful comments. 
After re-reading all the proposed code samples, I believe that adopting the PEP will make the language harder to teach to people who are not already software engineers. To my eyes, the examples give ample opportunity for being misunderstood and will create a need to puzzle-out the intended semantics. On the plus side, the proposal does address the occasional minor irritant of writing an assignment on a separate line. On the minus side, the visual texture of the new code is less appealing. The proposal also messes with my mental model for the distinction between expressions and statements. It probably doesn't matter at this point (minds already seem to be made up), but put me down for -1. This is a proposal we can all easily live without. Raymond
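The comprehension example Yury quotes, side by side with the expanded form he argues for. This is a sketch only; f() and input_data are hypothetical stand-ins, since the thread does not define them:

```python
def f(x):
    # Hypothetical filter function for illustration.
    return x - 2

input_data = [1, 2, 3, 4]

# One-line walrus form from the thread (requires Python 3.8+):
result = [(x, y, x/y) for x in input_data if (y := f(x)) > 0]

# The expanded form: a few more lines, but each step is visible
# and the origin of 'y' is obvious to a reader.
expanded = []
for x in input_data:
    y = f(x)
    if y > 0:
        expanded.append((x, y, x/y))

assert result == expanded  # [(3, 1, 3.0), (4, 2, 2.0)]
```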
Re: [Python-Dev] PEP 575: Unifying function/method classes
> On Apr 15, 2018, at 5:50 AM, Jeroen Demeyer wrote: > > On 2018-04-14 23:14, Guido van Rossum wrote: >> That actually sounds like a pretty big problem. I'm sure there is lots >> of code that doesn't *just* duck-type nor calls inspect but uses >> isinstance() to decide how to extract the desired information. > > In the CPython standard library, the *only* fixes that are needed because of > this are in: > > - inspect (obviously) > - doctest (to figure out the __module__ of an arbitrary object) > - multiprocessing.reduction (something to do with pickling) > - xml.etree.ElementTree (to determine whether a certain method was overridden) > - GDB support > > I've been told that there might also be a problem with Random._randbelow, > even though it doesn't cause test failures. Don't worry about Random._randbelow, we're already working on it and it is an easy fix. Instead, focus on Guido's comment. > The fact that there is so little breakage in the standard library makes > me confident that the problem is not so bad. And in the cases where it > does break, it's usually pretty easy to fix. I don't think that confidence is warranted. The world of Python is very large. When public APIs (such as that in the venerable types module) get changed, it is virtually assured that some code will break. Raymond
Re: [Python-Dev] PEP 575: Unifying function/method classes
> On Apr 12, 2018, at 9:12 AM, Jeroen Demeyer wrote: > > I would like to request a review of PEP 575, which is about changing the > classes used for built-in functions and Python functions and methods. The > text of the PEP can be found at > > https://www.python.org/dev/peps/pep-0575/ Thanks for doing this work. The PEP is well written and I'm +1 on the general idea of what it's trying to do (I'm still taking in all the details). It would be nice to have a section that specifically discusses the implications with respect to other existing function-like tooling: classmethod, staticmethod, partial, itemgetter, attrgetter, methodcaller, etc. Also, please mention the backward compatibility issue that will arise for code that currently relies on types.MethodType, types.BuiltinFunctionType, types.BuiltinMethodType, etc. For example, I would need to update the code in random._randbelow(). That code uses the existing builtin-vs-pure-python type distinctions to determine whether either the random() or getrandbits() methods have been overridden. This is likely an easy change for me to make, but there may be code like it in the wild, code that would be broken if the distinction is lost. Raymond
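A sketch of the kind of builtin-vs-pure-python distinction PEP 575 would affect. This is an illustration in the spirit of random._randbelow's check, not the actual stdlib code; the PurePython subclass is hypothetical:

```python
import random
import types

class PurePython(random.Random):
    def random(self):
        # Hypothetical Python override of the C-implemented method.
        return 0.5

# On a plain instance, random() comes from the C base class, so the
# bound method is a builtin; on the subclass it is a Python method.
# Code can key off this difference to detect overrides.
print(isinstance(random.Random().random, types.BuiltinMethodType))  # True
print(isinstance(PurePython().random, types.MethodType))            # True
```

If PEP 575 unified these classes, an isinstance() check like this would stop distinguishing the two cases, which is the backward compatibility concern raised above.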
Re: [Python-Dev] Soliciting comments on the future of the cmd module (bpo-33233)
> On Apr 6, 2018, at 3:02 PM, Ned Deily wrote: > > We could be even bolder and officially deprecate "cmd" and consider closing > open enhancement issues for it on b.p.o." FWIW, the pdb module depends on the cmd module. Also, I still teach people how to use cmd and I think it still serves a useful purpose. So, unless it is considered broken, I don't think it should be deprecated. Raymond
Re: [Python-Dev] Replacing self.__dict__ in __init__
On Mar 25, 2018, at 8:08 AM, Tin Tvrtković wrote: > > That's reassuring, thanks. I misspoke. The object size is the same but the underlying dictionary loses key-sharing and doubles in size. Raymond
Re: [Python-Dev] Replacing self.__dict__ in __init__
> On Mar 24, 2018, at 7:18 AM, Tin Tvrtković wrote: > > it's faster to do this: > > self.__dict__ = {'a': a, 'b': b, 'c': c} > > i.e. to replace the instance dictionary altogether. On PyPy, their core devs > inform me this is a bad idea because the instance dictionary is special > there, so we won't be doing this on PyPy. > > But is it safe to do on CPython? This should work. I've seen it done in other production tools without any ill effect. The dict can be replaced during __init__() and still get benefits of key-sharing. That benefit is lost only when the instance dict keys are modified downstream from __init__(). So, from a dict size point of view, your optimization is fine. Still, you should look at whether this would affect static type checkers, lint tools, and other tooling. Raymond
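For reference, the two initialization styles under discussion, shown side by side (class names are illustrative). This sketch only demonstrates that the two are behaviorally equivalent on CPython; the key-sharing effects discussed above are internal and would need sys.getsizeof() inspection to observe:

```python
class Conventional:
    def __init__(self, a, b, c):
        # Attribute-by-attribute assignment.
        self.a = a
        self.b = b
        self.c = c

class Replacing:
    def __init__(self, a, b, c):
        # Replace the instance dict wholesale -- the CPython-only
        # micro-optimization being asked about.
        self.__dict__ = {'a': a, 'b': b, 'c': c}

x, y = Conventional(1, 2, 3), Replacing(1, 2, 3)
print(vars(x) == vars(y))  # True
```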
Re: [Python-Dev] Symmetry arguments for API expansion
> On Mar 13, 2018, at 12:07 PM, Guido van Rossum wrote: > > OK, please make it so. Will do. I'll create a tracker issue right away. Since this one looks easy (as many things do at first), I would like to assign it to Nofar Schnider (one of my mentees). Raymond > > On Tue, Mar 13, 2018 at 11:39 AM, Raymond Hettinger > wrote: > > > > On Mar 13, 2018, at 10:43 AM, Guido van Rossum wrote: > > > > So let's make as_integer_ratio() the standard protocol for "how to make a > > Fraction out of a number that doesn't implement numbers.Rational". We > > already have two examples of this (float and Decimal) and perhaps numpy or > > the sometimes proposed fixed-width decimal type can benefit from it too. If > > this means we should add it to int, that's fine with me. > > I would like that outcome. > > The signature x.as_integer_ratio() -> (int, int) is pleasant to work with. > The output is easy to explain, and the denominator isn't tied to powers of > two or ten. Since Python ints are exact and unbounded, there isn't worry > about range or rounding issues. > > In contrast, math.frexp(float) -> (float, int) is a bit of a pain because it > still leaves you in the domain of floats rather than letting you decompose to > more basic types. It's nice to have a way to move down the chain from > ℚ, ℝ, or ℂ to the more basic ℤ (of course, that only works because floats and > complex are implemented in a way that precludes exact irrationals). > > > Raymond > > > > > > -- > --Guido van Rossum (python.org/~guido)
Re: [Python-Dev] Symmetry arguments for API expansion
> On Mar 13, 2018, at 10:43 AM, Guido van Rossum wrote: > > So let's make as_integer_ratio() the standard protocol for "how to make a > Fraction out of a number that doesn't implement numbers.Rational". We already > have two examples of this (float and Decimal) and perhaps numpy or the > sometimes proposed fixed-width decimal type can benefit from it too. If this > means we should add it to int, that's fine with me. I would like that outcome. The signature x.as_integer_ratio() -> (int, int) is pleasant to work with. The output is easy to explain, and the denominator isn't tied to powers of two or ten. Since Python ints are exact and unbounded, there isn't worry about range or rounding issues. In contrast, math.frexp(float) -> (float, int) is a bit of a pain because it still leaves you in the domain of floats rather than letting you decompose to more basic types. It's nice to have a way to move down the chain from ℚ, ℝ, or ℂ to the more basic ℤ (of course, that only works because floats and complex are implemented in a way that precludes exact irrationals). Raymond
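A brief demonstration of the protocol being discussed. Note the value recovered is the exact binary value of the float, which is why Fraction(0.1) is not the same as Fraction(1, 10):

```python
from fractions import Fraction

# x.as_integer_ratio() -> (int, int): the exact value of a binary
# float expressed as a ratio of unbounded Python ints.
print((0.5).as_integer_ratio())  # (1, 2)

num, den = (0.1).as_integer_ratio()
print(Fraction(num, den) == Fraction(0.1))   # True -- exact round trip
print(Fraction(num, den) == Fraction(1, 10)) # False -- 0.1 isn't 1/10
```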
Re: [Python-Dev] Symmetry arguments for API expansion
> On Mar 12, 2018, at 12:15 PM, Guido van Rossum wrote: > > There's a reason why adding this to int feels right to me. In mypy we treat > int as a sub*type* of float, even though technically it isn't a sub*class*. > The absence of an is_integer() method on int means that this code has a bug > that mypy doesn't catch: > > def f(x: float): > if x.is_integer(): > "do something" > else: > "do something else" > > f(12) Do you have any thoughts about the other non-corresponding float methods? >>> set(dir(float)) - set(dir(int)) {'as_integer_ratio', 'hex', '__getformat__', 'is_integer', '__setformat__', 'fromhex'} In general, would you prefer that functionality like is_integer() be a math module function or that it should be a method on all numeric types except Complex? I expect questions like this to recur over time. Also, do you have any thoughts on the feature itself? Serhiy ran a Github search and found that it was baiting people into worrisome code like: (x/5).is_integer() or (x**0.5).is_integer() > So I think the OP of the bug has a valid point, 27 years without this feature > notwithstanding. Okay, I'll ask the OP to update his patch :-) Raymond
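A sketch of why the patterns from Serhiy's search are worrisome: is_integer() works as advertised on exact values, but testing the result of a floating point calculation for an exact value is fragile:

```python
# For floats, int(x) == x already answers the question:
assert (5.0).is_integer() and int(5.0) == 5.0
assert not (5.5).is_integer()

# The trap: a calculation that "should" land on an integer often
# doesn't, due to binary rounding.
x = (0.1 + 0.2) * 10
print(x)               # 3.0000000000000004
print(x.is_integer())  # False, even though the exact math gives 3
```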
[Python-Dev] Symmetry arguments for API expansion
There is a feature request and patch to propagate the float.is_integer() API through rest of the numeric types ( https://bugs.python.org/issue26680 ). While I don't think it is a good idea, the OP has been persistent and wants his patch to go forward. It may be worthwhile to discuss on this list to help resolve this particular request and to address the more general, recurring design questions. Once a feature with a marginally valid use case is added to an API, it is common for us to get downstream requests to propagate that API to other places where it makes less sense but does restore a sense of symmetry or consistency. In cases where an abstract base class is involved, acceptance of the request is usually automatic (i.e. range() and tuple() objects growing index() and count() methods). However, when our hand hasn't been forced, there is still an opportunity to decline. That said, proponents of symmetry requests tend to feel strongly about it and tend to never fully accept such a request being declined (it leaves them with a sense that Python is disordered and unbalanced). Raymond My thoughts on the feature request - What is the proposal? * Add an is_integer() method to int(), Decimal(), Fraction(), and Real(). Modify Rational() to provide a default implementation. Starting point: Do we need this? * We already have a simple, traditional, portable, and readable way to make the test: int(x) == x * In the context of ints, the test x.is_integer() always returns True. This isn't very useful. * Aside from the OP, this behavior has never been requested in Python's 27 year history. Does it cost us anything? * Yes, adding a method to the numeric tower makes it a requirement for every class that ever has or ever will register or inherit from the tower ABCs. * Adding methods to a core object such as int() increases the cognitive load for everyday users who look at dir(), call help(), or read the main docs. 
* It conflicts with a design goal for the decimal module to not invent new functionality beyond the spec unless essential for integration with the rest of the language. The reasons included portability with other implementations and not trying to guess what the committee would have decided in the face of tricky questions such as whether Decimal('1.01').is_integer() should return True when the context precision is only three decimal places (i.e. whether context precision and rounding traps should be applied before the test and whether context flags should change after the test). Shouldn't everything in a concrete class also be in an ABC and all its subclasses? * In general, the answer is no. The ABCs are intended to span only basic functionality. For example, GvR intentionally omitted update() from the Set() ABC because the need was fulfilled by __ior__(). But int() already has real, imag, numerator, and denominator, why is this different? * Those attributes are central to the functioning of the numeric tower. * In contrast, the is_integer() method is a peripheral and incidental concept. What does "API Parsimony" mean? * Avoidance of feature creep. * Preference for only one obvious way to do things. * Practicality (not craving things you don't really need) beats purity (symmetry and foolish consistency). * YAGNI suggests holding off in the absence of clear need. * Recognition that smaller APIs are generally better for users. Are there problems with symmetry/consistency arguments? * The need for guard rails on an overpass doesn't imply the same need on an underpass even though both are in the category of grade changing byways. * "In for a penny, in for a pound" isn't a principle of good design; rather, it is a slippery slope whereby the acceptance of a questionable feature in one place seems to compel later decisions to propagate the feature to other places where the cost / benefit trade-offs are less favorable. 
Should float.is_integer() have ever been added in the first place? * Likely, it should have been a math module function like isclose() and isinf() so that it would not have been type specific. * However, that ship has sailed; instead, the question is whether we now have to double down and have to dispatch other ships as well. * There is some question as to whether it is even a good idea to be testing the results of floating point calculations for exact values. It may be useful for testing inputs, but is likely a trap for people using it in other contexts. Have we ever had problems with just accepting requests solely based on symmetry? * Yes. The str.startswith() and str.endswith() methods were given optional start/end arguments to be consistent with str.index(), not because there were known use cases where code was made better with the new feature. This ended up conflicting with a later feature request that did have valid use cases (supporting multiple test prefixes/suffixes). As a result, we ended up with an awkward and error-prone API.
[Python-Dev] Should the dataclass frozen property apply to subclasses?
When working on the docs for dataclasses, something unexpected came up. If a dataclass is specified to be frozen, that characteristic is inherited by subclasses which prevents them from assigning additional attributes: >>> @dataclass(frozen=True) class D: x: int = 10 >>> class S(D): pass >>> s = S() >>> s.cached = True Traceback (most recent call last): File "", line 1, in s.cached = True File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/dataclasses.py", line 448, in _frozen_setattr raise FrozenInstanceError(f'cannot assign to field {name!r}') dataclasses.FrozenInstanceError: cannot assign to field 'cached' Other immutable classes in Python don't behave the same way: >>> class T(tuple): pass >>> t = T([10, 20, 30]) >>> t.cached = True >>> class F(frozenset): pass >>> f = F([10, 20, 30]) >>> f.cached = True >>> class B(bytes): pass >>> b = B() >>> b.cached = True Raymond
Re: [Python-Dev] Dataclasses, frozen and __post_init__
> On Feb 20, 2018, at 2:38 PM, Guido van Rossum wrote:
>
> But then the class would also inherit a bunch of misfeatures from tuple (like being indexable and having a length). It would be nicer if it used __slots__ instead.

FWIW, George Sakkis made a tool like this about nine years ago:

    https://code.activestate.com/recipes/576555-records

It would need to be modernized to include default arguments, type annotations, and whatnot, but otherwise it has great performance and low API complexity.

> (Also, the problem with __slots__ is the same as the problem with inheriting from tuple, and it should just be solved right, somehow.)

Perhaps a new variant of __init_subclass__ would work.

Raymond
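To make the idea concrete, here is a minimal sketch (not George Sakkis's actual recipe) of a slots-based record factory with default values, in the spirit described above; the factory name and signature are illustrative only:

```python
def record(typename, **fields):
    """Build a __slots__-based record class with per-field defaults."""
    defaults = dict(fields)

    def __init__(self, **kwargs):
        for name in self.__slots__:
            setattr(self, name, kwargs.get(name, defaults[name]))

    def __repr__(self):
        args = ', '.join(f'{n}={getattr(self, n)!r}' for n in self.__slots__)
        return f'{typename}({args})'

    return type(typename, (), {
        '__slots__': tuple(fields),   # no per-instance __dict__
        '__init__': __init__,
        '__repr__': __repr__,
    })

Point = record('Point', x=0, y=0)
p = Point(x=3)
print(p)  # Point(x=3, y=0)
```

Because the class defines only __slots__, instances carry no __dict__, which is where the memory savings over tuple subclasses and ordinary classes come from.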
Re: [Python-Dev] Is static typing still optional?
> On Jan 28, 2018, at 11:52 PM, Eric V. Smith wrote:
>
> I think it would be a bad design to have to opt-in to hashability if using frozen=True.

I respect that you see it that way, but it doesn't make sense to me. You can have either one without the other. It seems to me that it is clearer and more explicit to just say what you want rather than having implicit logic guess at what you meant. Otherwise, when something goes wrong, it is difficult to debug.

The tooltips for the dataclass decorator are essentially a checklist of features that can be turned on or off. That list of features is mostly easy to use, except for hash=None, which has three possible values, only one of which is self-evident. We haven't had much in the way of user testing, so it is a significant data point that one of your first users (me) was confounded by this API. I recommend putting various correct and incorrect examples in front of other users (preferably experienced Python programmers) and asking them to predict what the code does based on the source code.

Raymond
Re: [Python-Dev] Is static typing still optional?
>>> 2) Change the default value for "hash" from "None" to "False". This might
>>> take a little effort because there is currently an oddity where setting
>>> hash=False causes it to be hashable. I'm pretty sure this wasn't intended ;-)
>>
>> I haven't looked at this yet.
>
> I think the hashing logic explained in https://bugs.python.org/issue32513#msg310830 is correct. It uses hash=None as the default, so that frozen=True objects are hashable, which they would not be if hash=False were the default.

Wouldn't it be simpler to make the options orthogonal? Frozen need not imply hashable. I would think if a user wants frozen and hashable, they could just write frozen=True and hashable=True. That would be more explicit and clear than having frozen=True imply that hashability gets turned on implicitly whether you want it or not.

> If there's some case there that you disagree with, I'd be interested in hearing about it.
>
> That logic is what is currently scheduled to go into 3.7 beta 1. I have not updated the PEP yet, mostly because it's so difficult to explain.

That might be a strong hint that this part of the API needs to be simplified :-)

    "If the implementation is hard to explain, it's a bad idea." -- Zen

If for some reason dataclasses really do need tri-state logic, it may be better off with enum values (NOT_HASHABLE, VALUE_HASHABLE, IDENTITY_HASHABLE, HASHABLE_IF_FROZEN, or some such) rather than with None, True, and False, which don't communicate enough information to understand what the decorator is doing.

> What's the case where setting hash=False causes it to be hashable? I don't think that was ever the case, and I hope it's not the case now.

Python 3.7.0a4+ (heads/master:631fd38dbf, Jan 28 2018, 16:20:11)
[GCC 7.2.0] on darwin
Type "copyright", "credits" or "license()" for more information.
    >>> from dataclasses import dataclass
    >>> @dataclass(hash=False)
    ... class A:
    ...     x: int
    >>> hash(A(1))
    285969507

I'm hoping that this part of the API gets thought through before it gets set in stone. Since the dataclasses code never got a chance to live in the wild (on PyPI or some such), it behooves us to think through all the usability issues. To me at least, the tri-state hashability was entirely unexpected and hard to debug -- I had to do a close reading of the source to figure out what was happening.

Raymond
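For context, a sketch of how this corner of the API ended up in 3.7 final: the tri-state hash= parameter was replaced by unsafe_hash=, and the common cases became implicit -- frozen=True with eq=True yields a class hashable by field values, while eq=True without frozen sets __hash__ to None (unhashable):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class A:          # frozen + eq (default True) => hashable by value
    x: int

@dataclass
class B:          # mutable + eq => __hash__ is set to None
    x: int

print(hash(A(1)) == hash(A(1)))  # True: equal field values hash alike
print(B.__hash__ is None)        # True: instances are unhashable
```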
Re: [Python-Dev] Concerns about method overriding and subclassing with dataclasses
> On Dec 29, 2017, at 4:52 PM, Guido van Rossum wrote:
>
> I still think it should override anything that's just inherited but nothing that's defined in the class being decorated.

This has the virtue of being easy to explain, and it will help with debugging by honoring the code proximate to the decorator :-)

For what it is worth, the functools.total_ordering class decorator does something similar -- though not exactly the same. A root comparison method is considered user-specified if it is different than the default method provided by object:

    def total_ordering(cls):
        """Class decorator that fills in missing ordering methods"""
        # Find user-defined comparisons (not those inherited from object).
        roots = {op for op in _convert
                 if getattr(cls, op, None) is not getattr(object, op, None)}
        ...

The @dataclass decorator has a much broader mandate and we have almost no experience with it, so it is hard to know what legitimate use cases will arise.

Raymond
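A short illustration of the total_ordering behavior described above (the Version class here is a made-up example): because the class itself defines __lt__ and __eq__ (different from object's defaults), the decorator treats them as user-specified roots and fills in only the missing methods:

```python
from functools import total_ordering

@total_ordering
class Version:
    def __init__(self, major, minor):
        self.major, self.minor = major, minor

    def __eq__(self, other):
        return (self.major, self.minor) == (other.major, other.minor)

    def __lt__(self, other):
        return (self.major, self.minor) < (other.major, other.minor)

# __le__, __gt__, and __ge__ are synthesized from __lt__ and __eq__:
print(Version(1, 2) >= Version(1, 1))  # True
```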
Re: [Python-Dev] pep-0557 dataclasses top level module vs part of collections?
> On Dec 21, 2017, at 3:21 PM, Gregory P. Smith wrote:
>
> It seems a suggested use is "from dataclasses import dataclass"
>
> But people are already familiar with "from collections import namedtuple" which suggests to me that "from collections import dataclass" would be a more natural sounding API addition.

This might make sense if it were a single self-contained function. But dataclasses are their own little ecosystem that warrants its own module namespace:

    >>> import dataclasses
    >>> dataclasses.__all__
    ['dataclass', 'field', 'FrozenInstanceError', 'InitVar', 'fields',
     'asdict', 'astuple', 'make_dataclass', 'replace']

Also, remember that dataclasses have a dual role as a data holder (which is collection-like) and as a generator of boilerplate code (which is more like functools.total_ordering).

I support Eric's decision to make this a separate module.

Raymond
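A quick tour of that "little ecosystem" beyond the decorator itself, as a sketch of why the module namespace earns its keep:

```python
from dataclasses import make_dataclass, field, fields, asdict, astuple

# Dynamic class creation, introspection, and converters all live together:
Point = make_dataclass('Point', [('x', int), ('y', int, field(default=0))])
p = Point(3)
print(asdict(p))                    # {'x': 3, 'y': 0}
print(astuple(p))                   # (3, 0)
print([f.name for f in fields(p)])  # ['x', 'y']
```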
Re: [Python-Dev] Guarantee ordered dict literals in v3.7?
> On Dec 15, 2017, at 1:47 PM, Guido van Rossum wrote:
>
> On Fri, Dec 15, 2017 at 12:45 PM, Raymond Hettinger wrote:
>> On Dec 15, 2017, at 7:53 AM, Guido van Rossum wrote:
>>>
>>> Make it so. "Dict keeps insertion order" is the ruling.
>>
>> On Twitter, someone raised an interesting question.
>>
>> Is the guarantee just for 3.7 and later? Or will the blessing also cover 3.6 where it is already true.
>>
>> The 3.6 guidance is to use OrderedDict() when ordering is required. As of now, that guidance seems superfluous and may no longer be a sensible practice. For example, it would be nice for Eric Smith when he does his 3.6 dataclasses backport to not have to put OrderedDict back in the code.
>
> For 3.6 we can't change the language specs, we can just document how it works in CPython. I don't know what other Python implementations do in their version that's supposed to be compatible with 3.6 but I don't want to retroactively declare them non-conforming. (However for 3.7 they have to follow suit.) I also don't think that the "it stays ordered across deletions" part of the ruling is true in CPython 3.6.

FWIW, the regular dict does stay ordered across deletions in CPython 3.6:

    >>> d = dict(a=1, b=2, c=3, d=4)
    >>> del d['b']
    >>> d['b'] = 5
    >>> d
    {'a': 1, 'c': 3, 'd': 4, 'b': 5}

Here's a more interesting demonstration:

    from random import randrange, shuffle
    from collections import OrderedDict

    population = 100
    s = list(range(population // 4))
    shuffle(s)
    d = dict.fromkeys(s)
    od = OrderedDict.fromkeys(s)
    for i in range(50):
        k = randrange(population)
        d[k] = i
        od[k] = i
        k = randrange(population)
        if k in d:
            del d[k]
            del od[k]
    assert list(d.items()) == list(od.items())

The dict object insertion logic just appends to the arrays of keys, values, and hash values. When the number of usable elements decreases to zero (reaching the limit of the most recent array allocation), the dict is resized (compacted) left-to-right so that order is preserved.
Here are some of the relevant sections from the 3.6 source tree:

Objects/dictobject.c line 89:

    Preserving insertion order

    It's simple for combined table.  Since dk_entries is mostly append only,
    we can get insertion order by just iterating dk_entries.  One exception
    is .popitem().  It removes last item in dk_entries and decrement
    dk_nentries to achieve amortized O(1).  Since there are DKIX_DUMMY
    remains in dk_indices, we can't increment dk_usable even though
    dk_nentries is decremented.

    In split table, inserting into pending entry is allowed only for
    dk_entries[ix] where ix == mp->ma_used.  Inserting into other index and
    deleting item cause converting the dict to the combined table.

Objects/dictobject.c::insertdict() line 1140:

    if (mp->ma_keys->dk_usable <= 0) {
        /* Need to resize. */
        if (insertion_resize(mp) < 0) {
            Py_DECREF(value);
            return -1;
        }
        hashpos = find_empty_slot(mp->ma_keys, key, hash);
    }

Objects/dictobject.c::dictresize() line 1282:

    PyDictKeyEntry *ep = oldentries;
    for (Py_ssize_t i = 0; i < numentries; i++) {
        while (ep->me_value == NULL)
            ep++;
        newentries[i] = *ep++;
    }

> I don't know what guidance to give Eric, because I don't know what other implementations do nor whether Eric cares about being compatible with those. IIUC micropython does not guarantee this currently, but I don't know if they claim Python 3.6 compatibility -- in fact I can't find any document that specifies the Python version they're compatible with more precisely than "Python 3".

I did a little research and here's what I found:

    "MicroPython aims to implement the Python 3.4 standard (with selected
    features from later versions)"
    -- http://docs.micropython.org/en/latest/pyboard/reference/index.html

    "PyPy is a fast, compliant alternative implementation of the Python
    language (2.7.13 and 3.5.3)."
    -- http://pypy.org/

    "Jython 2.7.0 Final Released (May 2015)" -- http://www.jython.org/

    "IronPython 2.7.7 released on 2016-12-07" -- http://ironpython.net/

So, it looks like you could say 3.6 does whatever CPython 3.6 already does and not worry about leaving other implementations behind. (And PyPy is actually ahead of us here, having had compact and order-preserving dicts for quite a while.)

Cheers,

Raymond
Re: [Python-Dev] Guarantee ordered dict literals in v3.7?
> On Dec 15, 2017, at 7:53 AM, Guido van Rossum wrote:
>
> Make it so. "Dict keeps insertion order" is the ruling.

On Twitter, someone raised an interesting question.

Is the guarantee just for 3.7 and later? Or will the blessing also cover 3.6, where it is already true?

The 3.6 guidance is to use OrderedDict() when ordering is required. As of now, that guidance seems superfluous and may no longer be a sensible practice. For example, it would be nice for Eric Smith when he does his 3.6 dataclasses backport to not have to put OrderedDict back in the code.

Do you still have the keys to the time machine?

Raymond
Re: [Python-Dev] New crash in test_embed on macOS 10.12
> On Dec 15, 2017, at 11:55 AM, Barry Warsaw wrote:
>
> I haven't bisected this yet, but with git head, built and tested on macOS 10.12.6 and Xcode 9.2, I'm seeing this crash in test_embed:
>
> ==
> FAIL: test_bpo20891 (test.test_embed.EmbeddingTests)
> --
> Traceback (most recent call last):
>   File "/Users/barry/projects/python/cpython/Lib/test/test_embed.py", line 207, in test_bpo20891
>     out, err = self.run_embedded_interpreter("bpo20891")
>   File "/Users/barry/projects/python/cpython/Lib/test/test_embed.py", line 59, in run_embedded_interpreter
>     (p.returncode, err))
> AssertionError: -6 != 0 : bad returncode -6, stderr is 'Fatal Python error: PyEval_SaveThread: NULL tstate\n\nCurrent thread 0x7fffcb58a3c0 (most recent call first):\n'
>
> Seems reproducible across different machines (all running 10.12.6 and Xcode 9.2), even after a make clean and configure. I don't see the same failure on Debian, and I don't see the crashes on the buildbots.
>
> Can anyone verify?

I saw this same test failure. After a "make distclean", it went away.

Raymond