Re: [Python-Dev] Store startup modules as C structures for 20%+ startup speed improvement?
On Fri, Sep 14, 2018 at 6:08 PM Larry Hastings wrote: > I can suggest that, based on conversation from Carl, that adding the stat > calls back in costs you half the startup. So any mechanism where we're > talking to the disk _at all_ simply isn't going to be as fast. Is that cost for when the stat calls are done in parallel with the new loading mechanism? ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] SEC: Spectre variant 2: GCC: -mindirect-branch=thunk -mindirect-branch-register
There is much reason to believe it won't help, and you have given little reason to believe that it will help, or that you even properly understand how the flag mitigates Spectre in programs where it does help. I am not undermining your attempts for the sake of performance. I am trying to help you understand why no one is taking you seriously, and suggesting what you can do so that they will. One way is to explain why an unsandboxed interpreter needs to protect against a sandbox escape. On Sep 21, 2018 3:12 AM, "Wes Turner" wrote: I feel like you are actively undermining attempts to prevent exploitation of known vulnerabilities because the software in question is currently too slow. For a 4-8% performance penalty, we could just add the CFLAGS to the build now and not worry about it. I give up. On Friday, September 21, 2018, Franklin? Lee wrote: > On Thu, Sep 20, 2018 at 2:10 PM Wes Turner wrote: > > > > On Thursday, September 20, 2018, Stefan Ring > wrote: > >> > >> On Tue, Sep 18, 2018 at 8:38 AM INADA Naoki > wrote: > >> > >> > I think this topic should split to two topics: (1) Guard Python > >> > process from Spectre/Meltdown > >> > attack from other process, (2) Prohibit Python code attack other > >> > processes by using > >> > Spectre/Meltdown. > >> > >> (3) Guard Python from performance degradation by overly aggressive > >> Spectre "mitigation". > > > > > > > Spectre has the potential of having a greater impact on cloud > providers than Meltdown. Whereas Meltdown allows unauthorized applications > to read from privileged memory to obtain sensitive data from processes > running on the same cloud server, Spectre can allow malicious programs to > induce a hypervisor to transmit the data to a guest system running on top > of it. > > - Private SSL certs > > - Cached keys and passwords in non-zeroed RAM > > - [...] > > > > https://en.wikipedia.org/wiki/Spectre_(security_vulnerability) > > It's true that the attacks should worry cloud providers. 
Doesn't that > mean that companies like Amazon, Microsoft (Steve), and Docker should > have done analyses on CPython's vulnerability to these exploits? > Has/should/can anyone officially representing Python contact the > companies and ask them? > > When I followed your quote to find the context, I found it uses, as > its source, a Forbes article. The source cited by THAT article is > Daniel Gruss, who was one of the researchers. Should someone from the > PSF contact the researchers? Steve says he spoke to some of them to > judge whether the proposed compiler flags would help, and decided > against it. > > Absent of expert input, here's my non-expert take: That quote requires > an OS-level fix. A Python program without the proper permissions can't > do such things unless there is a vulnerability with the OS, and it is > extremely unlikely for anyone to update Python for Spectre but not > update the OS (and they'd be screwed in any case). And even if there > is a vulnerability in the OS, maybe the way to exploit it is by using > arbitrary Python execution (which you need before you can TRY to use > Spectre) on this Python interpreter. You can then write a new binary > file and run THAT, and it will be fast enough. That's not something > you can fix about CPython. > > Also, (again with my understanding) the problem of Spectre and > Meltdown are that you can escape sandboxes and the like, such as the > user/kernel divide, or a software sandbox like that provided by a > JavaScript VM. For CPython to be "vulnerable" to these attacks, it > needs to have some kind of sandbox or protection to break out of. > Instead, we sometimes have sandboxes AROUND CPython (like Jupyter) or > WITHIN CPython. I don't see how it makes sense to talk about a sandbox > escape FOR CPython (yet). > > Your original post linked to a discussion about Linux using those > build flags. Linux is a kernel, and has such protections that can be > bypassed, so it has something to worry about. 
Malicious code can be > native code, which (to my understanding) will be fast enough to > exploit the cache miss time. Here's Google's article about the > retpoline and why it helps: > https://support.google.com/faqs/answer/7625886 > > As of yet, you have quoted passages that have little relevance to > interpreter devs, especially non-JIT interpreters, and you have linked > to entire articles for non-experts with little relevance to > interpreter devs. This doesn't show that you have any better of an > understanding than I have, which is less than the understanding that > some of the Python devs have, and much less than what Steve has.
Re: [Python-Dev] SEC: Spectre variant 2: GCC: -mindirect-branch=thunk -mindirect-branch-register
On Thu, Sep 20, 2018 at 2:10 PM Wes Turner wrote: > > On Thursday, September 20, 2018, Stefan Ring wrote: >> >> On Tue, Sep 18, 2018 at 8:38 AM INADA Naoki wrote: >> >> > I think this topic should split to two topics: (1) Guard Python >> > process from Spectre/Meltdown >> > attack from other process, (2) Prohibit Python code attack other >> > processes by using >> > Spectre/Meltdown. >> >> (3) Guard Python from performance degradation by overly aggressive >> Spectre "mitigation". > > > > Spectre has the potential of having a greater impact on cloud providers > > than Meltdown. Whereas Meltdown allows unauthorized applications to read > > from privileged memory to obtain sensitive data from processes running on > > the same cloud server, Spectre can allow malicious programs to induce a > > hypervisor to transmit the data to a guest system running on top of it. > - Private SSL certs > - Cached keys and passwords in non-zeroed RAM > - [...] > > https://en.wikipedia.org/wiki/Spectre_(security_vulnerability) It's true that the attacks should worry cloud providers. Doesn't that mean that companies like Amazon, Microsoft (Steve), and Docker should have done analyses on CPython's vulnerability to these exploits? Has/should/can anyone officially representing Python contact the companies and ask them? When I followed your quote to find the context, I found it uses, as its source, a Forbes article. The source cited by THAT article is Daniel Gruss, who was one of the researchers. Should someone from the PSF contact the researchers? Steve says he spoke to some of them to judge whether the proposed compiler flags would help, and decided against it. Absent of expert input, here's my non-expert take: That quote requires an OS-level fix. A Python program without the proper permissions can't do such things unless there is a vulnerability with the OS, and it is extremely unlikely for anyone to update Python for Spectre but not update the OS (and they'd be screwed in any case). 
And even if there is a vulnerability in the OS, maybe the way to exploit it is by using arbitrary Python execution (which you need before you can TRY to use Spectre) on this Python interpreter. You can then write a new binary file and run THAT, and it will be fast enough. That's not something you can fix about CPython. Also, (again with my understanding) the problem of Spectre and Meltdown is that you can escape sandboxes and the like, such as the user/kernel divide, or a software sandbox like that provided by a JavaScript VM. For CPython to be "vulnerable" to these attacks, it needs to have some kind of sandbox or protection to break out of. Instead, we sometimes have sandboxes AROUND CPython (like Jupyter) or WITHIN CPython. I don't see how it makes sense to talk about a sandbox escape FOR CPython (yet). Your original post linked to a discussion about Linux using those build flags. Linux is a kernel, and has such protections that can be bypassed, so it has something to worry about. Malicious code can be native code, which (to my understanding) will be fast enough to exploit the cache miss time. Here's Google's article about the retpoline and why it helps: https://support.google.com/faqs/answer/7625886 As of yet, you have quoted passages that have little relevance to interpreter devs, especially non-JIT interpreters, and you have linked to entire articles for non-experts with little relevance to interpreter devs. This doesn't show that you have a better understanding than I have, which is less than the understanding that some of the Python devs have, and much less than what Steve has. In short, it LOOKS like you don't know what you're talking about. If you have a different and deeper understanding of the problem, then you need to show it, and say why there is a problem for CPython specifically. Or find someone who can do that for you.
> Here's one: > https://github.com/Eugnis/spectre-attack/blob/master/Source.c > > Is this too slow in CPython with: > - Coroutines (asyncio (tulip)) > - PyPy JIT * > - Numba JIT * > - C Extensions * > - Cython * > > * Not anyone here's problem. C extensions are obviously fast enough. I think most of the other starred examples are fast enough, but it's probably more subtle than I think and requires further analysis by their devs. I also think there's something important I'm still missing about what's required and what it can do. I don't see what coroutines have to do with it. Coroutines are still Python code, and they're subject to the GIL.
Re: [Python-Dev] SEC: Spectre variant 2: GCC: -mindirect-branch=thunk -mindirect-branch-register
On Tue, Sep 18, 2018 at 2:40 AM INADA Naoki wrote: > > On Tue, Sep 18, 2018 at 7:08 AM Wes Turner wrote: > > > > To summarize: > > > > - CPython may be vulnerable to speculative execution vulnerabilities, but > > none are known. > > - In general, CPython is currently too slow for speculative execution > > exploitation to be practical. > > - Sandboxed, JIT'ed JS is not too slow for speculative execution > > exploitation to be practical > > - (Not otherwise discussed here: PyPy's sandboxed JIT may not be too > > slow for speculative execution exploitation to be practical.) > > > > As far as I know, execution speed is important for attacker, not victim. > In case of JavaScript, browser may load attacking code and run it while > user watching websites. > Browsers provides sandbox for JS, but attacker code may be able to > bypass the sandbox by Spectre or Meltdown. So browsers disabled > high precision timer until OSes are updated. > > This topic is totally unrelated to compiler options: these compiler options > doesn't prohibit running attacking code, it just guard branches from > branch target injection. > > Does my understanding collect? Why should we discuss about execution speed? According to this article, the malicious program needs to act in the amount of time it takes for the CPU to load a value from memory and invalidate a branch prediction: https://hackernoon.com/timing-is-everything-understanding-the-meltdown-and-spectre-attacks-5e1946e44f9f
Re: [Python-Dev] SEC: Spectre variant 2: GCC: -mindirect-branch=thunk -mindirect-branch-register
I believe this is the article Wes wanted to link to: https://www.thomas-krenn.com/en/wiki/Safety_instructions_for_Meltdown_and_Spectre On Mon, Sep 17, 2018 at 6:10 PM Wes Turner wrote: > > To summarize: > > - CPython may be vulnerable to speculative execution vulnerabilities, but > none are known. > - In general, CPython is currently too slow for speculative execution > exploitation to be practical. > - Sandboxed, JIT'ed JS is not too slow for speculative execution > exploitation to be practical > - (Not otherwise discussed here: PyPy's sandboxed JIT may not be too slow > for speculative execution exploitation to be practical.) I'm no security researcher, and I barely remember much about Spectre/Meltdown, but I think the idea is that, if Python takes about 2 milliseconds to run your code, then a difference of +- 10 microseconds is indistinguishable from noise. Try to write software that can learn to distinguish two similar computers using the running time of certain functions. Javascript can be crafted to get close enough to some C programs. ASM.js and WebAssembly might help. PyPy's need for mitigation is independent of CPython's. > - Because there is no exploit provided (or currently thought possible with > just CPython), this security-related dialogue is regarded as a nuisance. More than that. Steve says that he looked into it and decided there wasn't really anything to worry about. He believes that any exploit of it will also imply an easier exploit is possible. He also says that this particular fix won't really help. Nathaniel is annoyed because Spectre is tricky to understand, and he assumes you don't understand it as well as you think because you haven't shown him that you have the expertise to understand it. > Here's a good write-up: > Safety_instructions_for_Meltdown_and_Spectre But how does that apply to CPython? What specifically about CPython makes the interpreter vulnerable to the attack? 
Under what conditions would CPython be vulnerable to this attack, but not an easier attack of at least the same severity? The article I linked at the top of this email does not give advice for interpreter writers at all. It only says what end users and chip manufacturers ought to do. It is not relevant.
Re: [Python-Dev] Looking for examples: proof that a list comp is a function
On Mon, May 14, 2018, 03:36 Chris Angelico wrote: > Guido has stated that this parallel is desired and important: > > result = [f(x) for x in iter if g(x)] > result = list(f(x) for x in iter if g(x)) > > Obviously the genexp has to be implemented with a nested function, > since there's no guarantee that it'll be iterated over in this way. > With current semantics, you can easily prove that a list comp is > implemented with a function by looking at how it interacts with other > scopes (mainly class scope), but Tim's proposal may change that. > > So I'm looking for examples that prove that a list comp is executed > inside an implicit function. Ideally, examples that are supported by > language guarantees, but something that's "CPython has done it this > way since 3.0" is important too. > > I'm aware of just two: the name lookup interaction that may be > changing, and the fact that there's an extra line in a traceback. And > the latter, as far as I know, is not guaranteed (and I doubt anyone > would care if it changed). Are there any other provable points? > Related to the traceback one: the extra stack frame shows up in a debugger, and a profiler counts the extra frame separately. The first often confuses me because I don't immediately see which frame I'm in just by seeing the line of code. There are odd interactions between `yield`/`yield from` and comprehensions that were discussed some months ago: "[Python-Dev] Tricky way of of creating a generator via a comprehension expression". Wait, is this a continuation of that discussion?
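For reference, the class-scope interaction mentioned above is easy to demonstrate (a sketch with made-up class names): in Python 3 the comprehension body runs inside the implicit function, and function scopes skip enclosing class scope, while the outermost iterable is evaluated in the enclosing scope and passed in as an argument.

```python
class C:
    xs = [1, 2, 3]
    # The iterable `xs` is evaluated in class scope and passed as an
    # argument to the implicit function, so this works:
    doubled = [x * 2 for x in xs]

try:
    class D:
        n = 10
        # `n` is looked up from inside the implicit function, and function
        # scopes cannot see the enclosing class scope:
        ys = [n for _ in range(3)]
except NameError as caught:
    err = caught

assert C.doubled == [2, 4, 6]
assert isinstance(err, NameError)
```

The extra traceback frame (named `<listcomp>`) is the other observable, though, as noted, it is not guaranteed by the language.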
Re: [Python-Dev] (no subject)
On Tue, Dec 26, 2017 at 2:01 AM, Yogev Hendel wrote: > > I don't know if this is the right place to put this, > but I've found the following lines of code results in an incredibly long > processing time. > Perhaps it might be of use to someone. > > import re > pat = re.compile('^/?(?:\\w+)/(?:[%\\w-]+/?)+/?$') > pat.match('/t/a-aa-aa-a-aa-aa-aa-aa-aa-aa./') (I think the correct place is python-list. python-dev is primarily for the developers of Python itself. python-ideas is for proposing new features and changes to the language. python-list is for general discussion. Bug reports and feature requests belong in https://bugs.python.org/ (where your post could also have gone).) The textbook regular expression algorithm (which I believe grep uses) runs in linear time with respect to the text length. The algorithm used by Perl, Java, Python, JavaScript, Ruby, and many other languages instead use a backtracking algorithm, which can run up to exponential time with respect to text length. This worst-case is in fact necessary (assuming P != NP): Perl allows (introduced?) backreferences, which are NP-hard[1]. Perl also added some other features which complicate things, but backreferences are enough. The user-level solution is to understand how regexes are executed, and to work around it. Here are library-level solutions for your example: 1. Perl now has a regex optimizer, which will eliminate some redundancies. Something similar can be added to Python, at first as a third-party library. 2. In theory, we can use the textbook algorithm when possible, and the backtracking algorithm when necessary. However, the textbook version won't necessarily be faster, and may take more time to create, so there's a tradeoff here. 3. To go even further, I believe it's possible to use the textbook algorithm for subexpressions, while the overall expression uses backtracking, internally iterating through the matches of the textbook algorithm. 
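The blow-up in the reported pattern comes from ambiguity in `(?:[%\w-]+/?)+`: because the `/?` is optional, a run of word characters can be split between repetitions at any point, and on a non-matching input the backtracking engine retries every split. A sketch of a user-level workaround follows; the rewritten pattern is my own suggestion, not from the report, and it is slightly stricter (segments must be separated by exactly one `/`).

```python
import re

# The reported pattern. `(?:[%\w-]+/?)+` is ambiguous: on a failing input
# the engine tries roughly 2**n splits of a run of n [%\w-] characters.
# (Don't call slow.match() on long pathological inputs.)
slow = re.compile(r'^/?(?:\w+)/(?:[%\w-]+/?)+/?$')

# A hypothetical unambiguous rewrite: each repetition must consume exactly
# one '/' plus one segment, so there is only one way to partition the input.
fast = re.compile(r'^/?\w+(?:/[%\w-]+)+/?$')

# Matching inputs succeed quickly under both patterns:
assert slow.match('/t/a-aa/bb-cc/')
assert fast.match('/t/a-aa/bb-cc/')

# The pathological input from the report fails fast under the rewrite,
# while the original takes exponential time as the run of dashes grows:
assert fast.match('/t/a-aa-aa-a-aa-aa-aa-aa-aa-aa./') is None
```

The general recipe is the same as Perl's folklore advice: make sure two adjacent quantified subpatterns cannot consume the same characters in more than one way.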
There's a series of articles by Russ Cox that try to get us back to the textbook (see [2]). He and others implemented the ideas in the C++ library RE2[3], which has Python bindings[4]. RE2 was made for and used on Google Code Search[5] (described in his articles), a (now discontinued) search engine for open-source repos which allowed regular expressions in the queries. You can get a whiff of the limitations of the textbook algorithm by checking out RE2's syntax[6] and seeing what features aren't supported, though some features may be unsupported for different reasons (such as being redundant syntax). - Backreferences and lookaround assertions don't have a known solution.[7] - Bounded repetition is only supported up to a limit (1000), because each possible repetition needs its own set of states. - Possessive quantifiers aren't supported. Greedy and reluctant quantifiers are. - Groups and named groups _are_ supported. See the second and third Russ Cox articles, with the term "submatch".[2] (Apologies: I am making up reference syntax on-the-fly.) [1] "Perl Regular Expression Matching is NP-Hard" https://perl.plover.com/NPC/ [2] "Regular Expression Matching Can Be Simple And Fast" https://swtch.com/~rsc/regexp/regexp1.html "Regular Expression Matching: the Virtual Machine Approach" https://swtch.com/~rsc/regexp/regexp2.html "Regular Expression Matching in the Wild" https://swtch.com/~rsc/regexp/regexp3.html "Regular Expression Matching with a Trigram Index" https://swtch.com/~rsc/regexp/regexp4.html [3] RE2: https://github.com/google/re2 [4] pyre2: https://github.com/facebook/pyre2/ Also see re2 and re3 on PyPI, which intend to be a drop-in replacement. re3 is a Py3-compatible fork of re2, which last updated in 2015. [5] https://en.wikipedia.org/wiki/Google_Code_Search [6] https://github.com/google/re2/wiki/Syntax [7] Quote: "As a matter of principle, RE2 does not support constructs for which only backtracking solutions are known to exist. 
Thus, backreferences and look-around assertions are not supported." https://github.com/google/re2/wiki/WhyRE2
Re: [Python-Dev] re performance
On Thu, Jan 26, 2017 at 4:13 PM, Sven R. Kunze wrote: > Hi folks, > > I recently refreshed regular expressions theoretical basics *indulging in > reminiscences* So, I read https://swtch.com/~rsc/regexp/regexp1.html > > However, reaching the chart in the lower third of the article, I saw Python > 2.4 measured against a naive Thompson matching implementation. And I was > surprised about how bad it performed compared to an unoptimized version of > an older than dirt algorithm. > From my perspective, I can say, that regular expressions might worth > optimizing especially for web applications (url matching usually uses > regexes) but also for other applications where I've seen many tight loops > using regexes as well. So, I am probing interest on this topic here. What I (think I) know: - Both re and regex use the same C backend, which is not based on NFA. - The re2 library, which the writer of that article made, allows capture groups (but only up to a limit) and bounded repetitions (up to a limit). - Perl has started to optimize some regex patterns. What I think: - The example is uncharacteristic of what people write, like URL matching. - But enabling naive code to perform well is usually a good thing. The fewer details newbies need to know to write code, the better. - It's possible for Python to optimize a large category of patterns, even if not all. - It's also possible to optimize large parts of patterns with backrefs. E.g. If there's a backref, the group that the backref refers to can still be made into an NFA. - To do the above, you'd need a way to generate all possible matches. - Optimization can be costly. The full NFA construction could be generated only upon request, or maybe the code automatically tries to optimize after 100 uses (like a JIT). This should only be considered if re2's construction really is costly. If people want NFAs, I think the "easiest" way is to use re2. Jakub Wilk mentioned it before, but here it is again. 
https://github.com/google/re2 re2 features: https://github.com/google/re2/wiki/Syntax Named references aren't supported, but I think that can be worked around with some renumbering. It's just a matter of translating the pattern. It also doesn't like lookarounds, which AFAIK are perfectly doable with NFAs. Looks like lookaheads and lookbehinds are hard to do without a lot of extra space (https://github.com/google/re2/issues/5). Facebook has a Python wrapper for re2. https://github.com/facebook/pyre2/ In a post linked to from this thread, Serhiy mentioned another Python wrapper for re2. This wrapper is designed to be like re, and should probably be the basis of any efforts. It's not been updated for 11 months, though. https://pypi.python.org/pypi/re2/ https://github.com/axiak/pyre2/
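For the curious, the core of the "textbook" (Thompson) approach discussed in this thread is small enough to sketch. Instead of recursing into one alternative at a time, the simulation tracks the *set* of NFA states alive after each character, so the work per character is bounded by the pattern size, with no backtracking. This toy version is my own illustration and supports only literal characters plus `*` on a single literal, nothing like a real engine:

```python
def nfa_match(pattern, text):
    """Thompson-style simulation for a tiny regex subset:
    literal characters, plus '*' applied to a single literal.
    Runs in O(len(pattern) * len(text)) with no backtracking."""
    # Parse the pattern into (char, starred) units.
    units = []
    i = 0
    while i < len(pattern):
        starred = i + 1 < len(pattern) and pattern[i + 1] == '*'
        units.append((pattern[i], starred))
        i += 2 if starred else 1

    def closure(states):
        # A starred unit may match zero times, so its successor is reachable.
        out, stack = set(), list(states)
        while stack:
            s = stack.pop()
            if s not in out:
                out.add(s)
                if s < len(units) and units[s][1]:
                    stack.append(s + 1)
        return out

    # A state is the index of the next unit; len(units) is the accept state.
    current = closure({0})
    for ch in text:
        nxt = set()
        for s in current:
            if s < len(units) and units[s][0] == ch:
                # A starred unit loops on itself; a plain unit advances.
                nxt.add(s if units[s][1] else s + 1)
        current = closure(nxt)
        if not current:          # no live states: fail early
            return False
    return len(units) in current

assert nfa_match('ab*c', 'abbbc')
assert nfa_match('ab*c', 'ac')
assert not nfa_match('ab*c', 'abd')
# The classic backtracking killer stays linear here:
assert not nfa_match('a*a*a*a*a*c', 'a' * 50 + 'b')
```

A production engine like RE2 compiles this state-set simulation lazily into a DFA; the sketch just shows why the worst case stays polynomial rather than exponential.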
Re: [Python-Dev] Check dict implementation details
On Sat, Oct 8, 2016 at 6:01 AM, Serhiy Storchaka wrote: > Since dict is ordered in CPython 3.6, it can be used instead of OrderedDict > in some places (e.g. for implementing simple limited caches). But since this > is implementation detail, it can't be used in the stdlib unconditionally. > Needed a way to check whether dict is ordered. > > Actually there are two levels of "ordering". > > 1. Dict without deletions is iterated in the order of adding items. > Raymond's original compact dict implementation satisfied this claim. > > 2. In addition the order is preserved after deletion operations. Naoki's > implementation satisfies this more strong claim. Sidenote: OrderedDict, unlike dict, is a sequential container (though not a Sequence), so order matters when doing comparisons, and OrderedDicts can be reverse-iterated. That might keep dict from replacing OrderedDict in some cases. Something to keep in mind if this topic is revisited.
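Both differences mentioned in the sidenote are easy to demonstrate (behavior as of the 3.6-era interpreters under discussion; plain dict only gained `reversed()` support later, in CPython 3.8):

```python
from collections import OrderedDict

# 1. OrderedDict equality is order-sensitive; dict equality is not.
assert {'a': 1, 'b': 2} == {'b': 2, 'a': 1}
assert OrderedDict([('a', 1), ('b', 2)]) != OrderedDict([('b', 2), ('a', 1)])

# 2. OrderedDict supports reverse iteration.
od = OrderedDict([('a', 1), ('b', 2), ('c', 3)])
assert list(reversed(od)) == ['c', 'b', 'a']

# It also has order-manipulating methods that plain dict lacks:
od.move_to_end('a')
assert list(od) == ['b', 'c', 'a']
```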
Re: [Python-Dev] Python 3.6 dict becomes compact and gets a private version; and keywords become ordered
On Sep 14, 2016 8:29 AM, "Paul Moore" wrote: > > On 14 September 2016 at 13:18, Franklin? Lee > wrote: > > On Sep 9, 2016 1:35 AM, "Benjamin Peterson" wrote: > >> On Thu, Sep 8, 2016, at 22:33, Tim Delaney wrote: > >> > Are sets also ordered by default now? None of the PEPs appear to mention > >> > it. > >> > >> No. > > > > Is there anyone working to move sets in the same direction for 3.6? > > It won't happen for 3.6, as we're now in feature freeze. So it'd be > 3.7 at the earliest. > > What exactly do you mean by "in the same direction" anyway? Remember > that ordering isn't guaranteed (it's an implementation detail), so are > you just saying "can sets benefit from the improvements this change > provided to dicts"? If you *are* hoping for ordered sets, what's your > use case? (Dictionaries had particular use cases - retaining ordering > of keyword arguments and class definitions, the existence of > OrderedDict, ...) > > Paul I mean using a compact representation, if not an ordered one. I have no particular use case in mind. As far as I understand the compact implementation, sets can do it just as well. The original discussion proposed trying to implement it for sets first. Like dict, they would (probably) use less memory, and would usually have a more readable (i.e. less jarring to read) print order.
Re: [Python-Dev] Python 3.6 dict becomes compact and gets a private version; and keywords become ordered
On Sep 9, 2016 1:35 AM, "Benjamin Peterson" wrote: > On Thu, Sep 8, 2016, at 22:33, Tim Delaney wrote: > > Are sets also ordered by default now? None of the PEPs appear to mention > > it. > > No. Is there anyone working to move sets in the same direction for 3.6?
Re: [Python-Dev] Compact ordered dict is not ordered for split table. (was: PEP XXX: Compact ordered dict
On Jun 21, 2016 11:12 AM, "INADA Naoki" wrote:
> I'm sorry, but I hadn't realized which compact ordered dict is
> not ordered for split table.
>
> For example:
> >>> class A:
> ...     ...
> ...
> >>> a = A()
> >>> b = A()
> >>> a.a = 1
> >>> a.b = 2
> >>> b.b = 3
> >>> b.a = 4
> >>> a.__dict__.items()
> dict_items([('a', 1), ('b', 2)])
> >>> b.__dict__.items()
> dict_items([('a', 4), ('b', 3)])
>
> This doesn't affects to **kwargs and class namespace.
>
> But if we change the language spec to dict preserves insertion order,
> this should be addressed.

Is that really how it works? From my understanding of PEP 412, they should have different keysets, because one diverged in keys from the other at an intermediate step. Another idea (though it has several issues and seems like a step backward): a split-table dict can have a separate iteration list, indexing into the entry table. There are ways to share iteration lists, and make it so that adding the same keys in the same order each time results in the same iteration list each time, but this costs overhead. There might be ways of reducing the overhead, or the overhead might be replacing bigger overhead, but we should decide if the behavior is what we want in the first place.
Re: [Python-Dev] Compact dict implementations (was: PEP 468
In the original discussion, I think they decided to reimplement set before dict. The original discussion is here, for anyone else: https://mail.python.org/pipermail/python-dev/2012-December/123028.html On Jun 18, 2016 3:15 AM, "INADA Naoki" wrote: > If builtin dict in both of PyPy and CPython is ordered, many people > will relying it. > It will force other Python implementations to implement it for compatibility. > In other words, it may be de-facto "Python Language", even if Python > Language spec > say it's an implementation detail. > > Is it OK? Ordered, or just initially ordered? I mean, "ordered if no deletion". They discussed scrambling the order. (Subdiscussion was here: https://mail.python.org/pipermail/python-dev/2012-December/123041.html)
Re: [Python-Dev] PEP 468
Compact OrderedDicts can leave gaps, and once in a while compactify. For example, whenever the entry table is full, it can decide whether to resize (and only copy non-gaps), or just compactify. Compact regular dicts can swap from the back and have no gaps. I don't see the point of discussing these details. Isn't it enough to say that these are solvable problems, which we can worry about if/when someone actually decides to sit down and implement compact dicts? P.S.: Sorry about the repeated emails. I think it was the iOS Gmail app. On Jun 13, 2016 10:23 PM, "Ethan Furman" wrote: > > On 06/13/2016 05:47 PM, Larry Hastings wrote: >> >> On 06/13/2016 05:05 PM, MRAB wrote: > > >>> This could be avoided by expanding the items to include the index of >>> the 'previous' and 'next' item, so that they could be handled like a >>> doubly-linked list. >>> >>> The disadvantage would be that it would use more memory. >> >> >> Another, easier technique: don't fill holes. Same disadvantage >> (increased memory use), but easier to write and maintain. > > > I hope this is just an academic discussion: suddenly having Python's dicts grow continuously is going to have nasty consequences somewhere. > > -- > ~Ethan~
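A minimal sketch of the "swap from the back" idea for an unordered compact dict (a hypothetical illustration, not CPython's actual layout; a plain dict stands in for the real open-addressed index table):

```python
class CompactDict:
    """Toy compact dict: a sparse index mapping keys to slots in a
    dense entries array. Deletion swaps the last entry into the hole,
    so the entries array never has gaps -- at the cost of perturbing
    iteration order."""

    def __init__(self):
        self.index = {}      # stand-in for the real hash-to-slot table
        self.entries = []    # dense [key, value] pairs

    def __setitem__(self, key, value):
        i = self.index.get(key)
        if i is None:
            self.index[key] = len(self.entries)
            self.entries.append([key, value])
        else:
            self.entries[i][1] = value

    def __getitem__(self, key):
        return self.entries[self.index[key]][1]

    def __delitem__(self, key):
        i = self.index.pop(key)
        last = self.entries.pop()
        if i < len(self.entries):
            # Move the former last entry into the hole and fix its slot.
            self.entries[i] = last
            self.index[last[0]] = i

    def keys(self):
        return [k for k, _ in self.entries]

d = CompactDict()
for k in 'abcd':
    d[k] = k.upper()
del d['b']                       # 'd' is swapped into b's old slot
assert d.keys() == ['a', 'd', 'c']
assert d['d'] == 'D'
```

The gap-leaving OrderedDict variant would instead tombstone slot `i` and compactify later, which is exactly the trade-off discussed above.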
Re: [Python-Dev] PEP 468
I am. I was just wondering if there was an in-progress effort I should be looking at, because I am interested in extensions to it. P.S.: If anyone is missing the relevance, Raymond Hettinger's compact dicts are inherently ordered until a delitem happens.[1] That could be "good enough" for many purposes, including kwargs and class definition. If CPython implements efficient compact dicts, it would be easier to propose order-preserving (or initially-order-preserving) dicts in some places in the standard. [1] Whether delitem preserves order depends on whether you want to allow gaps in your compact entry table. PyPy implemented compact dicts and chose(?) to make dicts ordered. On Saturday, June 11, 2016, Eric Snow wrote: > On Fri, Jun 10, 2016 at 11:54 AM, Franklin? Lee > > wrote: > > Eric, have you any work in progress on compact dicts? > > Nope. I presume you are talking the proposal Raymond made a while back. > > -eric
Re: [Python-Dev] PEP 468
Eric, have you any work in progress on compact dicts? On Fri, Jun 10, 2016 at 12:54 PM, Eric Snow wrote: > On Thu, Jun 9, 2016 at 1:10 PM, Émanuel Barry wrote: >> As stated by Guido (and pointed out in the PEP): >> >> Making **kwds ordered is still open, but requires careful design and >> implementation to avoid slowing down function calls that don't benefit. >> >> The PEP has not been updated in a while, though. Python 3.5 has been >> released, and with it a C implementation of OrderedDict. >> >> Eric, are you still interested in this? > > Yes, but wasn't planning on dusting it off yet (i.e. in time for 3.6). > I'm certainly not opposed to someone picking up the banner. > > >> IIRC that PEP was one of the >> motivating use cases for implementing OrderedDict in C. > > Correct, though I'm not sure OrderedDict needs to be involved any more. > >> Maybe it's time for >> a second round of discussion on Python-ideas? > > Fine with me, though I won't have a lot of time in the 3.6 timeframe > to handle a high-volume discussion or push through an implementation. > > -eric > ___ > Python-Dev mailing list > Python-Dev@python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/leewangzhong%2Bpython%40gmail.com ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
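For context: PEP 468 was ultimately accepted for Python 3.6, so on a current CPython the behavior under discussion can be demonstrated directly. A small illustration (the `tag` helper is hypothetical, not from the thread) that relies on `**kwargs` preserving call-site order:

```python
def tag(name, **attrs):
    # With PEP 468 semantics, attrs preserves the order in which the
    # keyword arguments were passed at the call site.
    rendered = ' '.join(f'{k}="{v}"' for k, v in attrs.items())
    return f'<{name} {rendered}>'

# On Python 3.6+ the attributes render in call order:
print(tag('img', src='x.png', width=10, alt='photo'))
```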
Re: [Python-Dev] PEP 467: Minor API improvements to bytes, bytearray, and memoryview
On Jun 8, 2016 8:13 AM, "Paul Sokolovsky" wrote: > > Hello, > > On Wed, 8 Jun 2016 14:45:22 +0300 > Serhiy Storchaka wrote: > > [] > > > > $ ./run-bench-tests bench/bytealloc* > > > bench/bytealloc: > > > 3.333s (+00.00%) bench/bytealloc-1-bytes_n.py > > > 11.244s (+237.35%) bench/bytealloc-2-repeat.py > > > > If the performance of creating an immutable array of n zero bytes is > > important in MicroPython, it is worth to optimize b"\0" * n. > > No matter how you optimize calloc + something, it's always slower than > just calloc. `bytes(n)` *is* calloc + something. It's a lookup of and call to a global function. (Unless MicroPython optimizes away lookups for builtins, in which case it can theoretically optimize b"\0".__mul__.) On the other hand, b"\0" is a constant, and * is an operator lookup that succeeds on the first argument (meaning, perhaps, a successful branch prediction). As a constant, it is only created once, so there's no intermediate object created. AFAICT, the first requires optimizing global function lookups + calls, and the second requires optimizing lookup and *successful* application of __mul__ (versus failure + fallback to some __rmul__), and repetitions of a particular `bytes` object (which can be interned and checked against). That means there is room for either to win, depending on the efforts of the implementers. (However, `bytearray` has no syntax for literals (and therefore easy constants), and is a more valid and, AFAIK, more practical concern.) ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
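The two spellings under discussion can be compared directly with `timeit`. Absolute numbers will vary by implementation (the thread is about MicroPython, but the sketch runs on CPython too), so the only hard claim made here is that both spellings produce the same value:

```python
import timeit

n = 10 ** 6
assert bytes(n) == b"\0" * n  # same result either way

# Compare the constructor call against the constant-repeat spelling.
t_ctor = timeit.timeit('bytes(n)', globals={'n': n}, number=100)
t_mul = timeit.timeit(r'b"\0" * n', globals={'n': n}, number=100)
print(f'bytes(n)  : {t_ctor:.4f}s')
print(f'b"\\0" * n: {t_mul:.4f}s')
```

Which spelling wins depends on exactly the factors described above (global-lookup cost vs. operator dispatch), so the measurement is the point, not a predicted winner.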
Re: [Python-Dev] PEP 520: Ordered Class Definition Namespace
On Jun 7, 2016 8:52 PM, "Eric Snow" wrote: > * the default class *definition* namespace is now ``OrderedDict`` > * the order in which class attributes are defined is preserved in the

By using an OrderedDict, names are ordered by their first definition point, rather than by the location of the binding that actually takes effect. For example, the definition order of the following will be "x, y", even though the definitions actually bound to the names occur in the order "y, x".

class C:
    x = 0
    def y(self): return 'y'
    def x(self): return 'x'

Is that okay? ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
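The first-binding behavior asked about above can be reproduced today with a metaclass whose `__prepare__` returns an `OrderedDict`. The `OrderedMeta` name and the `_definition_order` attribute are illustrative inventions, not part of PEP 520:

```python
from collections import OrderedDict

class OrderedMeta(type):
    @classmethod
    def __prepare__(mcls, name, bases):
        # The class body executes against this mapping, so assignment
        # order is recorded; a reassigned name keeps its first position.
        return OrderedDict()

    def __new__(mcls, name, bases, ns):
        cls = super().__new__(mcls, name, bases, dict(ns))
        cls._definition_order = [k for k in ns if not k.startswith('__')]
        return cls

class C(metaclass=OrderedMeta):
    x = 0
    def y(self): return 'y'
    def x(self): return 'x'

# 'x' keeps the position of its *first* binding, even though the
# binding that wins (the method) comes after 'y'.
print(C._definition_order)  # ['x', 'y']
```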
Re: [Python-Dev] devinabox has moved to GitHub
It's just that I don't know whether any of them require particular versions. If you say the latest is fine, then okay. On Wed, May 25, 2016 at 1:37 PM, Brett Cannon wrote: > > > On Wed, 25 May 2016 at 10:24 Franklin? Lee > wrote: >> >> Should these notes come with version requirements/minimums? >> >> "OS X users should be told to download XCode from the Apple App Store >> ahead of time." >> "If new contributors think they may be doing C development, suggest >> the use of LLVM + clang as this provides better error reporting than >> gcc." >> "For Windows users, ask them to download and install Visual Studio >> Community edition ahead of time." > > > If you want to submit a PR to say "the latest" that's fine, but I don't know > if any of those tools really promote downloading older versions such that > one has to worry about it. > >> >> >> On Sun, May 8, 2016 at 4:41 PM, Brett Cannon wrote: >> > https://github.com/python/devinabox >> > >> > The single issue for devinabox has moved to its own issue tracker, so >> > there's no need to worry about those issues cluttering b.p.o in the >> > future. >> > I have made the Python core team I created on GitHub last week have >> > write >> > privileges and Nick and I as admins on the repository. I have also >> > turned on >> > the CLA bot for the repository (FYI >> > https://github.com/python/the-knights-who-say-ni has that bot's code). >> > >> > I've asked Georg, Antoine, and Benjamin to tell me what I need to do to >> > shut >> > off -- either by making it read-only or just deleting -- >> > hg.python.org/devinabox. >> > >> > Now that the migration has seriously begun, the next repos will be the >> > peps >> > and devguide (slightly more complicated thanks to needing to update >> > commands >> > for building their online versions). There's also the benchmarks repo, >> > but >> > that might not get migrated if we start from scratch (see the speed@ ML >> > about that). 
>> > >> > As for cpython, I've been asked to talk about it at the language summit >> > where I will start a conversation about what the minimal feature set >> > will >> > need to be to migrate. Once we have settled on what has to be in place >> > to >> > migrate the cpython repo then we can start working on that TODO list. >> > >> > ___ >> > Python-Dev mailing list >> > Python-Dev@python.org >> > https://mail.python.org/mailman/listinfo/python-dev >> > Unsubscribe: >> > >> > https://mail.python.org/mailman/options/python-dev/leewangzhong%2Bpython%40gmail.com >> > ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] devinabox has moved to GitHub
Should these notes come with version requirements/minimums? "OS X users should be told to download XCode from the Apple App Store ahead of time." "If new contributors think they may be doing C development, suggest the use of LLVM + clang as this provides better error reporting than gcc." "For Windows users, ask them to download and install Visual Studio Community edition ahead of time." On Sun, May 8, 2016 at 4:41 PM, Brett Cannon wrote: > https://github.com/python/devinabox > > The single issue for devinabox has moved to its own issue tracker, so > there's no need to worry about those issues cluttering b.p.o in the future. > I have made the Python core team I created on GitHub last week have write > privileges and Nick and I as admins on the repository. I have also turned on > the CLA bot for the repository (FYI > https://github.com/python/the-knights-who-say-ni has that bot's code). > > I've asked Georg, Antoine, and Benjamin to tell me what I need to do to shut > off -- either by making it read-only or just deleting -- > hg.python.org/devinabox. > > Now that the migration has seriously begun, the next repos will be the peps > and devguide (slightly more complicated thanks to needing to update commands > for building their online versions). There's also the benchmarks repo, but > that might not get migrated if we start from scratch (see the speed@ ML > about that). > > As for cpython, I've been asked to talk about it at the language summit > where I will start a conversation about what the minimal feature set will > need to be to migrate. Once we have settled on what has to be in place to > migrate the cpython repo then we can start working on that TODO list. 
> > ___ > Python-Dev mailing list > Python-Dev@python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/leewangzhong%2Bpython%40gmail.com > ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Terminal console
On Apr 26, 2016 4:02 AM, "Paul Moore" wrote: > > On 25 April 2016 at 23:55, Franklin? Lee wrote: > > FWIW, Gmail's policies require: > [...] > > That link is currently the only obvious way to unsubscribe. > > I'm not sure why gmail's policies should apply to this list. They're Gmail's policies on how not to get your messages filtered by Gmail as spam. I am not clear on whether they're descriptive (i.e. users will mark you as spam) or prescriptive (i.e. Google's algorithms will determine that you're spam). ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Terminal console
FWIW, Gmail's policies require: """ A user must be able to unsubscribe from your mailing list through one of the following means: * A prominent link in the body of an email leading users to a page confirming his or her unsubscription (no input from the user, other than confirmation, should be required). * By replying to your email with an unsubscribe request. """ (https://support.google.com/mail/answer/81126) That link is currently the only obvious way to unsubscribe. On Mon, Apr 25, 2016 at 6:12 PM, Zachary Ware wrote: > On Apr 25, 2016 17:08, "Brett Cannon" wrote: >> >> Good point. Hopefully that's all it was then. > > Is there any particular reason we include that link in python-dev emails? We > don't for any other list as far as I know. > > -- > Zach > (On a phone) > > > ___ > Python-Dev mailing list > Python-Dev@python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/leewangzhong%2Bpython%40gmail.com > ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Python should be easily compilable on Windows with MinGW
For this particular case, is there someone generous enough (or, can someone apply for a PSF grant) to ship Mathieu a DVD/two/flash drive? On Feb 26, 2016 12:18 PM, "Mathieu Dupuy" wrote: > Hi. > I am currently working on adding some functionality to a standard > library module (http://bugs.python.org/issue15873). The Python part > went fine, but now I have to do the C counterpart, and I have run into > several problems which, stacked up, are a huge obstacle to contributing > further. Currently, although I could keep working, I can't go further > on my patch. > > I am currently working with very limited network, CPU and time > resources*, which are quite uncommon in the western world, but much > less so in the rest of the world. I have a 2GB/month mobile data > plan and a 100KB/s speed. For the C part of my patch, I would need to > download Visual Studio. The Express Edition 2015 is roughly 9GB. I > can't afford that. > > I downloaded VirtualBox and two Linux netinstalls (Ubuntu 15.10 and > Fedora 23). In short, I couldn't get something working quickly and > simply (quickly = less than 2 hours, downloading time NOT included, > which is already far too much). What went wrong and why it went > wrong could be a whole new thread and is outside of the scope of this > message. > Let me be clear: at my work I use many VirtualBox instances > automatically fired and run in parallel to test new deployments and > run unit tests. I like this tool, > but despite its simple look, it (most of the time) cannot be used > simply by a layperson. The concepts it requires you to understand are > not intuitive at first sight and there is *always* something that goes > wrong (guest additions, mostly). (For example: Ubuntu and VirtualBox > shipped for a while a broken version of mount.vboxsf, preventing > shared folders from mounting. Although it's fixed, the broken releases > spread everywhere and you may encounter them a lot in various Ubuntu > and VirtualBox versions.
I downloaded the latest versions of both and I > am still affected. https://www.virtualbox.org/ticket/12879). I could start a > whole new thread on why you can't ask newcomers to use VirtualBox > (currently, at least). > > What I ran into is a whole patch set to make CPython compile on MinGW > (https://bugs.python.org/issue3871#msg199695). But there is no denying > it's very experimental, and I know I would again spend useless hours > trying to get it to work rather than joyfully improving Python, and > that's exactly what I do not want to happen. > > Getting ready to contribute to CPython's pure Python modules from a > standard, average, mr-everyone Windows PC, for a beginner-to-medium > contributor, only requires a few megabytes of internet and a few minutes of their > time: a tarball of the CPython sources (or a clone of the GitHub CPython > mirror)**, a basic text editor and msys-git. The step further, if doing > some - even basic - C code is required, implies downloading 9GB of Visual > Studio and countless hours for it to be ready to use. > I think downloading the whole Visual Studio suite is a huge stopper to > contributing further for an average medium-or-below contributor. > > I think (and I must not be the only one, since CPython is to be moved > to GitHub) that barriers to contributing to CPython should be set as > low as possible. > Of course my situation is a bit special, but I think it represents the > daily struggle of a *lot* of non-western programmers (at least regarding > limited internet)(even here in Australia, capped landline internet > connections are very common). > It's not a big deal if the MinGW build is twenty times slower or > if some of the most advanced modules can't be built. But every > programmer should be able to easily make some C hacks and get them to > work. > > Hoping you'll be receptive to my pleas, > Cheers > > > * I am currently picking fruit in regional Australia. I live in a van > and have internet through a smartphone over an EDGE connection.
I can > plug the laptop in at the farm but not in the van. > ** No fresh programmer uses Mercurial unless he has a gun pointed at his > head. > ___ > Python-Dev mailing list > Python-Dev@python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/leewangzhong%2Bpython%40gmail.com > ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Regular expression bytecode
I think it would be nice for manipulating (e.g. optimizing, possibly with JIT-like analysis) and comparing regexes. It can also be useful as a teaching tool, e.g. exercises in optimizing and comparing regexes. I think the discussion should be on python-ideas, though. On Feb 14, 2016 2:01 PM, "Jonathan Goble" wrote: > I'm new to Python's mailing lists, so please forgive me if I'm sending > this to the wrong list. :) > > I filed http://bugs.python.org/issue26336 a few days ago, but now I > think this list might be a better place to get discussion going. > Basically, I'd like to see the bytecode of a compiled regex object > exposed as a public (probably read-only) attribute of the object. > > Currently, although compiled in pure Python through modules > sre_compile and sre_parse, the list of opcodes is then passed into C > and copied into an array in a C struct, without being publicly exposed > in any way. The only way for a user to get an internal representation > of the regex is the re.DEBUG flag, which only produces an intermediate > representation rather than the actual bytecode and only goes to > stdout, which makes it useless for someone who wants to examine it > programmatically. > > I'm sure others can think of other potential use cases for this, but > one in particular would be that someone could write a debugger that > can allow a user to step through a regex one opcode at a time to see > exactly where it is failing. It would also perhaps be nice to have a > public constructor for the regex object type, which would enable users > to modify the bytecode and directly create a new regex object from it, > similar to what is currently possible through the types.FunctionType > and types.CodeType constructors. 
> > In addition to exposing the code in a public attribute, a helper > module written in Python similar to the dis module (which is for > Python's own bytecode) would be very helpful, allowing the code to be > easily disassembled and examined at a higher level. > > Is this a good idea, or am I barking up the wrong tree? I think it's a > great idea, but I'm open to being told this is a horrible idea. :) I > welcome any and all comments both here and on the bug tracker. > > Jonathan Goble > ___ > Python-Dev mailing list > Python-Dev@python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/leewangzhong%2Bpython%40gmail.com > ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
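As a stopgap for the stdout-only limitation described above, the `re.DEBUG` dump can at least be captured programmatically. This sketch (the `regex_debug` helper is an invention for illustration) only recovers the intermediate representation, not the actual bytecode the message asks to have exposed:

```python
import contextlib
import io
import re

def regex_debug(pattern):
    """Capture the re.DEBUG dump of a pattern as a string.

    re caches compiled patterns, so purge first to force a recompile;
    a cache hit would print nothing.
    """
    re.purge()
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        re.compile(pattern, re.DEBUG)
    return buf.getvalue()

dump = regex_debug(r'a+b')
print(dump)
```

This is clearly a workaround; a public attribute plus a `dis`-style helper module, as proposed, would make the real opcodes inspectable without tricks.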
Re: [Python-Dev] Idea: Dictionary references
On Fri, Dec 18, 2015 at 2:32 PM, Andrew Barnert via Python-Dev wrote: > (Also, either way, it seems more like a thread for -ideas than -dev...) I said this early on in this thread! Should I try to write up my idea as a single thing, instead of a bunch of responses, and post it in -ideas? Should I call them "parent scope" and "parent refcell"? On Fri, Dec 18, 2015 at 7:56 AM, Steven D'Aprano wrote: > I'm not quite sure about this. In principle, every name lookup looks in > four scopes, LEGB as you describe above: > > - locals > - non-locals, a.k.a. enclosing or lexical scope(s) > - globals (i.e. the module) > - builtins > > although Python can (usually?) optimise away some of those lookups. The > relationship of locals to enclosing scopes, and to globals in turn, > involves actual nesting of indented blocks in Python, but that's not > necessarily the case. As I understand it, L vs. E vs. GB is known at compile time. That is, your exec example doesn't work for me in Python 3, because all names are scoped at compile time:

x = 5
def f():
    exec('x = 111')
    print(x)
f()       # prints 5
print(x)  # prints 5

This means that my idea only really works for GB lookups. > On Thu, Dec 17, 2015 at 09:30:24AM -0800, Andrew Barnert via Python-Dev wrote: >> So, trying to generalize global vs. builtin to a general notion of >> "nested scope" that isn't necessary for builtins and doesn't work for >> anything else seems like overcomplicating things for no benefit. > > Well, putting aside the question of whether this is useful or not, and > putting aside efficiency concerns, let's just imagine a hypothetical > implementation where name lookups used ChainMaps instead of using > separate LOAD_* lookups of special dicts.
Then a function could set up a > ChainMap: > > function.__scopes__ = ChainMap(locals, enclosing, globals, builtins) > > and a name lookup for (say) "x" would always be a simple: > > function.__scopes__["x"] > > Of course this would be harder to optimize, and hence probably slower, > than the current arrangement, This is where the ChainRefCell idea comes in. If a ChainRefCell is empty, it would ask its parent dicts for a value. If it finds a value in parent n, it would replace parent n with a refcell into parent n, and similarly for parents 0, 1, ... n-1. It won't need to do hash lookups in those parents again, while allowing for those parents to acquire names. (This means parent n+1 won't need to create refcells, so we don't make unnecessary refcells in `object` and `__builtin__`.) Unfortunately, classes are more complicated than nested scopes. 1. We skip MRO if we define classes as having their direct supers as parents. (Solution: Define classes as having all supers as parents, and make non-recursive Refcell.resolve() requests.) (Objects have their class as a parent, always.) 2. Classes can replace their bases. (I have some ideas for this, but see #3.) 3. I get the impression that attribute lookups are already pretty optimized. On Fri, Dec 18, 2015 at 2:32 PM, Andrew Barnert via Python-Dev wrote: > I think it kind of _has_ to optimize away, or at least tweak, some of those > things, rather than just acting as if globals and builtins were just two more > enclosing scopes. For example, global to builtins has to go through > globals()['__builtins__'], or act as if it does, or code that relies on, say, > the documented behavior of exec can be broken. It would or could, in my idea of __builtins__ being a parent scope of globals() (though I'm not sure whether it'd be the case for any other kind of nesting). 
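The hypothetical `__scopes__` ChainMap quoted above can be played with directly, since `collections.ChainMap` already implements first-match-wins chain lookup (shadowing included). A minimal sketch with invented namespace dicts standing in for the four LEGB scopes:

```python
from collections import ChainMap

builtins_ns = {'len': len}
globals_ns = {'x': 'global-x'}
enclosing_ns = {}
locals_ns = {}

# LEGB as a single chained lookup: the first mapping holding the key wins.
scopes = ChainMap(locals_ns, enclosing_ns, globals_ns, builtins_ns)

print(scopes['x'])       # falls through L and E, found in globals
print(scopes['len'])     # falls all the way through to builtins
locals_ns['x'] = 'local-x'
print(scopes['x'])       # now shadowed by the local binding
```

As the message notes, this is easy to state but hard to optimize: every lookup walks the chain, which is exactly the cost the refcell idea tries to avoid.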
Each refcell in globals() will hold a reference to __builtins__ (if they didn't successfully look it up yet) or to a refcell in __builtins__ (if there was once a successful lookup). Since globals() knows when globals()['__builtins__'] is modified, it can invalidate all its refcells' parent cells (by making them hold references to the new __builtins__). This will be O(len(table) + (# of refcells)), but swapping out __builtins__ shouldn't be something you keep doing. Even if it is a concern, I have More Ideas to remove the "len(table) +" (but with Raymond Hettinger's compact dicts, it wouldn't be necessary). It would be worse for classes, because it would require potentially many notifications. (But it would also save future lookups. And I have More Ideas.) This idea (of the owner dict "knowing" about its changed parent) also applies to general chained scopes, but flattenings like MRO would mess it up. Again, though, More Ideas. And more importantly, from what I understand of Victor's response, the current implementation would probably be efficient enough, or more efficient. > And you have to be able to modify the global scope after compile time and > have that modification be effective, which means you'd have to allow the same > things on locals and closures if they were to act the same. Not sure what you mean, but since I demand (possibly empty) refcells from
Re: [Python-Dev] Idea: Dictionary references
On Thu, Dec 17, 2015 at 6:41 PM, Franklin? Lee wrote: > Then Descriptors are in the dict, so MIGHT benefit from refcells. The > memory cost might be higher, though. Might be worse than the savings, I mean. ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Idea: Dictionary references
I already know that we can't use recursion, because it bypasses the MRO. I'm also not yet sure whether it makes sense to use refs for classes in the first place. As I understood it, an attribute resolves in this order:

- __getattribute__ up the MRO. (raises AttributeError)
- __dict__ up the MRO. (raises KeyError)
- __getattr__ up the MRO. (raises AttributeError)

My new understanding:

- __getattribute__. (raises AttributeError)
- (default implementation:) __dict__.__getitem__. (raises KeyError)
- __getattr__ up the MRO. (raises AttributeError)

If this is the case, then (the default) __getattribute__ will be making the repeated lookups, and might be the one requesting the refcells (for the ones it wants). Descriptors seem to be implemented as: store a Descriptor object as an attribute; when the Descriptor is accessed, if it is being accessed through its owner, unbox it and use its methods; otherwise, it's a normal attribute. Descriptors are then in the dict, so they MIGHT benefit from refcells. The memory cost might be higher, though. On Thu, Dec 17, 2015 at 5:17 PM, Andrew Barnert wrote: > On Dec 17, 2015, at 13:37, Andrew Barnert via Python-Dev > wrote: >> >> On Thursday, December 17, 2015 11:19 AM, Franklin? Lee >> wrote: >> >> >>> ... >>> as soon as I figure out how descriptors actually work... >> >> >> I think you need to learn what LOAD_ATTR and the machinery around it >> actually does before I can explain why trying to optimize it like >> globals-vs.-builtins doesn't make sense. Maybe someone who's better at >> explaining than me can come up with something clearer than the existing >> documentation, but I can't. > > I take that back. First, it was harsher than I intended. Second, I think I > can explain things. I appreciate it! Tracking function definitions in the source can make one want to do something else. > First, for non-attribute lookups: > > (Non-shared) locals just load and save from an array.
> > Free variables and shared locals load and save by going through an extra > dereference on a cell object in an array. In retrospect, of course they do. It sounds like the idea is what's already used there, except the refs are synced to the locals array instead of a hash table. > Globals do a single dict lookup. A single dict lookup per function definition per name used? That's what I'm proposing. For example, (and I only just remembered that comprehensions and gen expressions create scope) [f(x) for x in range(1)] would look up the name `f` at most twice (once in globals(), once in builtins() if needed), and will always have the latest version of `f`. And if it's in a function, the refcell(s) would be saved by the function. > Builtins do two dict lookups. Two? > Class attributes (including normal methods, @property, etc.) do two or more > dict lookups--first the instance, then the class, then each class on the > class's MRO. Then, if the result has a __get__ method, it's called with the > instance and class to get the actual value. This is how bound methods get > created, property lookup functions get called, etc. The result of the > descriptor call can't get cached (that would mean, for example, that every > time you access the same @property on an instance, you'd get the same value). Yeah, I would only try to save in a dict lookup to get the descriptor, and I'm not sure it's worth it. (Victor's response says that class attributes are already efficient, though.) > So, if the globals->builtins optimization is worth doing, don't tie it to > another optimization that's much more complicated and less useful like this, > or we'll never get your simple and useful idea. Sure. I couldn't figure out where to even save the refcells for attributes, so I only really saw an opportunity for name lookups. Since locals and nonlocals don't require dict lookups, this means globals() and __builtin__. 
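The lookup order being discussed (instance `__dict__`, then each class `__dict__` along the MRO) can be mimicked in pure Python. The `plain_lookup` helper is an invention for illustration; it deliberately ignores descriptors and the data-descriptor precedence that real `object.__getattribute__` checks first:

```python
class Base:
    attr = 'from Base'

class Derived(Base):
    pass

def plain_lookup(obj, name):
    # Simplified view of object.__getattribute__: instance dict first,
    # then each class __dict__ along the MRO. No descriptor handling,
    # so this is only the "dict lookup" part a refcell could shortcut.
    if name in vars(obj):
        return vars(obj)[name]
    for klass in type(obj).__mro__:
        if name in vars(klass):
            return vars(klass)[name]
    raise AttributeError(name)

obj = Derived()
print(plain_lookup(obj, 'attr'))   # found via the MRO: 'from Base'
obj.attr = 'from instance'
print(plain_lookup(obj, 'attr'))   # instance dict now wins
```

Each `name in vars(...)` test here is one of the repeated dict lookups that the refcell proposal would cache per lookup site.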
___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Idea: Dictionary references
On Thu, Dec 17, 2015 at 12:30 PM, Andrew Barnert wrote: > On Dec 17, 2015, at 07:38, Franklin? Lee > wrote: >> >> The nested dictionaries are only for nested scopes (and inner >> functions don't create nested scopes). Nested scopes will already >> require multiple lookups in parents. > > I think I understand what you're getting at here, but it's a really confusing > use of terminology. In Python, and in programming in general, nested scopes > refer to exactly inner functions (and classes) being lexically nested and > doing lookup through outer scopes. The fact that this is optimized at compile > time to FAST vs. CELL vs. GLOBAL/NAME, cells are optimized at > function-creation time, and only global and name have to be resolved at the > last second doesn't mean that there's no scoping, or some other form of > scoping besides lexical. The actual semantics are LEGB, even if L vs. E vs. > GB and E vs. further-out E can be optimized. Oh, I've never actually read the Python scoping rules spelled out. I wasn't sure if there were other cases. The other two cases I thought of as "nesting" were: object to its class, and class to its superclasses. > Also, reading your earlier post, it sounds like you're trying to treat > attribute lookup as a special case of global lookup, only with a chain of > superclasses beyond the class instead of just a single builtins. But they're > totally different. Class lookup doesn't just look in a series of dicts, it > calls __getattribute__ which usually calls __getattr__ which may or may not > look in the __dict__s (which may not even exist) to find a descriptor and > then calls its __get__ method to get the value. 
You'd have to somehow handle > the case where the search only went through object.__getattribute__ and > __getattr__ and found a result by looking in a dict, to make a RefCell to > that dict which is marked in some way that says "I'm not a value, I'm a > descriptor you have to call each time", and then apply some guards that will > detect whether that class or any intervening class dict touched that key, > whether the MRO changed, whether that class or any intervening class added or > changed implementations for __getattribute__ or __getattr__, and probably more things I haven't thought of. What do those guards look like? (Also, you need a different set of rules to cache, and guard for, special method lookup--you could just ignore that, but I think those are the lookups that would benefit most from optimization.) Doesn't __getattr__ only get called if all the MRO __dict__ lookups failed? I forgot about __getattribute__. That might be the point at which refs are optimized. As for descriptors versus RefCells, I'm guessing that can be resolved, as soon as I figure out how descriptors actually work... If descriptors don't modify the __dict__, then RefCells shouldn't get involved. If they do, then there's some unwrapping going on there, and RefCells should fit right in (though whether they'll improve anything is a different question). RefCells are just a shortcut for dict lookups. For guards, I think Victor Stinner's idea could supplement this. Alternatively, in my other email, I said there could be a rule of, "Create intermediate RefCells for anything BEFORE a successful lookup." So if we look in A, B, C, D, and find it in C, then we create and save RefCells in A, B, C, but not D (where D = object). This MIGHT result in a lot of intermediate RefCells, but I'd guess most things aren't looked up just once, and it's saying, "It's possible for B to gain member B.x and catch me on my way to C.x." > So, trying to generalize global vs.
builtin to a general notion of "nested > scope" that isn't necessary for builtins and doesn't work for anything else > seems like overcomplicating things for no benefit. Probably. The globals() and __builtin__ case is simpler than the class case.
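The lookup-order questions in this exchange can be checked empirically with plain Python (no assumptions about internals): a descriptor like @property is re-invoked on every access, so its result can't be cached, and __getattr__ runs only after the normal search fails:

```python
calls = []

class Base:
    @property
    def p(self):                     # descriptor: __get__ runs per access
        calls.append("property")
        return len(calls)

    def __getattr__(self, name):     # fallback: only when lookup fails
        calls.append("getattr:" + name)
        return None

b = Base()
b.p
b.p
assert calls == ["property", "property"]   # re-invoked, so not cacheable
b.missing
assert calls[-1] == "getattr:missing"      # reached only on failed lookup
b.p
assert calls[-1] == "property"             # found attrs never hit __getattr__
```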
Re: [Python-Dev] Idea: Dictionary references
On Thu, Dec 17, 2015 at 11:50 AM, Victor Stinner wrote: > I don't understand how you plan to avoid guards. The purpose of guards > is to respect the Python semantic by falling back to the "slow" > bytecode if something changes. So I don't understand your idea of > completely avoiding guards. Again, that's why I guess that it's > unrelated to FAT Python... Yeah, I guess it is. Maybe we could've moved to python-ideas. As far as I understand, this concept can be put into CPython. (Note to self: Look up how PyPy handles repeated lookups.) >> Guards would also have an issue >> with nested scopes. You have a note on your website about it: >> (https://faster-cpython.readthedocs.org/fat_python.html#call-pure-builtins) >> >> "The optimization is disabled when the builtin function is >> modified or if a variable with the same name is added to the global >> namespace of the function." > > FAT Python doesn't emit a specialized version if it requires a builtin > function, but a local variable with the same name is found. > > The check is done in the current function but also in the parent > namespaces, up to the global namespace. I'm talking about the static > analysis of the code. > > If the specialized version is built, a guard is created on the builtin > namespace and another on the global namespace. > > Sorry, I don't understand the "problem" you are referring to. Can you > maybe show an example of code where FAT Python doesn't work or where > your idea can help? Suppose we have to look in scopes A, B, C in order, where the name is in C but never in B, and there's no nesting relationship between B and C. In that case, I do not create a refcell in B chained to C (because there's no relationship), so I keep doing lookups in B. That's the problem. For that, guards and versioning can prevent unnecessary lookups in B. Though I think a better solution might be: If a name is found in C, create empty refcells in B and A (i.e. in all previous dicts).
(There's a nesting relationship between globals() and __builtin__, so that's fine.) >> RefCells "know" whether they're part of a living dict. (The dict marks >> them as such upon its death.) If they are not, they will decref their >> value upon their death. They do not hold a reference to their owner >> dict. > The dict contains the list of all its "own" RefCell objects, right? It contains a table of pointers. The pointers are to RefCell objects or NULL. The refs table is exactly the same size as the internal hash table. This makes indexing it efficient: to find the pointer to a refcell, find the index of the key in the hash table, then use that SAME index on the refs table. You never need to find a refcell without also finding its hash index, so this is cheap. > In short, RefCell is a view on a single dict entry, to be able to > "watch" a dict entry without the cost of a lookup? And a RefCell can > be created, even if the dict entry doesn't exist right? My "implementation", which had nesting and recursion in mind, had a "create_if_none" parameter, which meant that the requester can ask for it to be created even if the key didn't exist in the table. Pre-creation is useful for functions which refer to globals() names before they're defined. No-creation is useful in... I can only think of nesting as a use (globals() -> __builtin__ shouldn't create empty cells in __builtin__). See `getref` in here: https://mail.python.org/pipermail/python-dev/2015-December/142489.html > Hum, creating many RefCell introduces an overhead somewhere. For > example, deleting a key has to update N RefCell objects linked to this > key, right? So del dict[x] takes O(number of RefCell), right? There are no "outside" updates, except when a dict moves to a different internal table or deletes its internal table. In that case, the dict has to move and update each exposed RefCell. For each dict, for each key, there is at most one RefCell.
As long as the dict is alive, that RefCell will hold a pointer to the pointer to the value (enforced by the dict). When the dict dies, it makes the RefCell point to the object, and tells the RefCell it's free (so it's in charge of cleaning up its value). Dict entries look like this:

    typedef struct {
        /* Cached hash code of me_key. */
        Py_hash_t me_hash;
        PyObject *me_key;

        /* This field is only meaningful for combined tables */
        PyObject *me_value;
    } PyDictKeyEntry;

The internal table (which the ref table will sync with) is an array of PyDictKeyEntrys. (Raymond Hettinger made a design with a smaller table, and the hash lookup would be into an array of indices. This would make synced metadata tables both easier and smaller. See https://mail.python.org/pipermail/python-dev/2012-December/123028.html and latest relevant discussion https://mail.python.org/pipermail/python-ideas/2015-December/037468.html ) The refcell will hold this:

    RefCell(&PyDictKeyEntry.me_value)

A pointer to the field, not to the value itself.
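A pure-Python sketch of this layout (hypothetical names; a two-element list plays the role of the entry so Python can share the value slot the way C shares a pointer to me_value):

```python
class RefCell:
    """Sees updates to the entry's value slot, like C's &entry->me_value."""
    __slots__ = ("_entry",)
    def __init__(self, entry):
        self._entry = entry            # [key, value]; value None == C NULL
    def resolve(self):
        key, value = self._entry
        if value is None:
            raise KeyError(key)        # deleted: the slot was NULLed
        return value

class RefDict:
    """Entry table plus a same-length refs table, indexed identically."""
    def __init__(self):
        self._entries = []             # stand-in for the PyDictKeyEntry array
        self._refs = []                # parallel table: RefCell or None
    def _index(self, key):             # stand-in for the C hash probe
        for i, entry in enumerate(self._entries):
            if entry[0] == key:
                return i
        raise KeyError(key)
    def __setitem__(self, key, value):
        try:
            self._entries[self._index(key)][1] = value
        except KeyError:
            self._entries.append([key, value])
            self._refs.append(None)    # refcell created only on request
    def __delitem__(self, key):
        self._entries[self._index(key)][1] = None  # key stays, value NULLed
    def getref(self, key):
        i = self._index(key)           # ONE probe locates both table slots
        if self._refs[i] is None:
            self._refs[i] = RefCell(self._entries[i])
        return self._refs[i]

d = RefDict()
d["spam"] = 1
ref = d.getref("spam")
d["spam"] = 2
assert ref.resolve() == 2              # sees the update with no re-lookup
assert d.getref("spam") is ref         # at most one RefCell per entry
del d["spam"]
try:
    ref.resolve()                      # the NULLed slot shows through
except KeyError:
    pass
```

The toy still scans linearly where C would hash; the point is only the shape: one index serves both the entry table and the refs table, and the cell watches the value slot rather than a copied value.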
Re: [Python-Dev] Idea: Dictionary references
(Previous thread was here, by the way: https://mail.python.org/pipermail/python-dev/2015-December/142437.html) On Thu, Dec 17, 2015 at 8:48 AM, Steven D'Aprano wrote: > On Thu, Dec 17, 2015 at 12:53:13PM +0100, Victor Stinner quoted: >> 2015-12-17 11:54 GMT+01:00 Franklin? Lee : > >> > Each function keeps an indirect, automagically updated >> > reference to the current value of the names they use, > > Isn't that a description of globals()? If you want to look up a name > "spam", you grab an indirect reference to it: > > globals()["spam"] > > which returns the current value of the name "spam". The *current value*. I'm proposing that we instead do this:

    spamref = globals().getref('spam')

Every time we want to find the current, updated value of 'spam', we just do

    spam = spamref.resolve()

which will skip the hash lookup and go directly to the value. >> > and will never need to look things up again.[*] > How will this work? > Naively, it sounds to me like Franklin is suggesting that on every > global assignment, the interpreter will have to touch every single > function in the module to update that name. A refcell holds a pointer to the location in the dict itself where the value pointer is. When the value is updated in the dict, the refcell will not need to be updated. My original proposal wanted to keep cells in the "real" dict, and update them. Like so:

    # NULL is a unique sentinel object, standing in for C's NULL.
    class RefCell:
        __slots__ = ['value']

    class ScopeDict(dict):
        def __getitem__(self, key):
            value = super().__getitem__(key).value  # may raise
            if value is NULL:
                raise KeyError(key)
            return value

        def __setitem__(self, key, value):
            if super().__contains__(key):
                super().__getitem__(key).value = value
            else:
                cell = RefCell()
                cell.value = value
                super().__setitem__(key, cell)

        def __delitem__(self, key):
            cell = super().__getitem__(key)  # may raise
            if cell.value is NULL:
                raise KeyError(key)
            cell.value = NULL

I realized later that this isn't necessary.
Most dict operations don't need to know about the indirection, so I make the inner dict a normal dict (with a few more holes than normal). But this would show how you can avoid manual updates for references. > And besides, you *still* need to deal with the case that the name isn't > a global at all, but in the built-ins namespace. globals() to __builtin__ is a nesting relationship. At the bottom of the following email, I have a pseudocode implementation which knows how to deal with nested scopes. https://mail.python.org/pipermail/python-dev/2015-December/142489.html
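The contrast between Steven's reading and the proposal fits in a few lines. This toy `Ref` still pays a hash lookup on every resolve (pure Python can't hold a pointer into a dict entry); the proposed C version would cache that pointer, but the *semantics* are:

```python
g = {"spam": 1}

snapshot = g["spam"]       # globals()["spam"]: grabs the current value once

class Ref:                 # refcell-like indirection (toy: re-looks-up)
    def __init__(self, d, key):
        self._d, self._key = d, key
    def resolve(self):
        return self._d[self._key]

spamref = Ref(g, "spam")
g["spam"] = 2              # rebind the name
assert snapshot == 1       # the plain value is stale
assert spamref.resolve() == 2   # the ref always sees the current binding
```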
Re: [Python-Dev] Idea: Dictionary references
On Thu, Dec 17, 2015 at 6:53 AM, Victor Stinner wrote: > 2015-12-17 11:54 GMT+01:00 Franklin? Lee : >> My suggestion should improve *all* function calls which refer to >> outside names. > > Ok, I now think that you should stop hijacking the FAT Python thread. > I start a new thread. IMHO your dictionary reference idea is completly > unrelated to FAT Python. > > FAT Python is about guards and specialized bytecode. Yeah, maybe it's out of the scope of bytecode optimization. But I think this would make guards unnecessary, since once a name is found, there's a quick way to refer to it. >> Each function keeps an indirect, automagically updated >> reference to the current value of the names they use, and will never >> need to look things up again.[*] > > Indirections, nested dictionaries, creation of new "reference" > objects... IMHO you are going to have major implementation issues :-/ > The design looks *very* complex. I'm quite sure that you are going to > make namespace lookups *slower*. The nested dictionaries are only for nested scopes (and inner functions don't create nested scopes). Nested scopes will already require multiple lookups in parents. I think this is strictly an improvement, except perhaps in memory. Guards would also have an issue with nested scopes. You have a note on your website about it: (https://faster-cpython.readthedocs.org/fat_python.html#call-pure-builtins) "The optimization is disabled when the builtin function is modified or if a variable with the same name is added to the global namespace of the function." With a NestedRefCell, it would check globals() (a simple dereference and `pointer != NULL`) and then check __builtin__. If it finds it in __builtin__, it will save a reference to that. It will only do repeated lookups in __builtin__ if each of the previous lookups fail. As far as I know, these repeated lookups are already necessary, and anything that can be used to avoid them (e.g. guards) can be used for repeated failed lookups, too. 
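The NestedRefCell behavior described here can be sketched in plain Python (hypothetical names; dicts stand in for globals() and __builtin__). The globals check stays cheap, the builtins reference is remembered after the first success, and only repeated *failed* lookups stay slow, as the text says:

```python
class NestedRef:
    def __init__(self, globals_d, builtins_d, name):
        self._g, self._b, self._name = globals_d, builtins_d, name
        self._builtin_hit = False      # True once found in builtins

    def resolve(self):
        if self._name in self._g:      # C version: one pointer != NULL test
            return self._g[self._name]
        if not self._builtin_hit:
            if self._name not in self._b:
                raise NameError(self._name)   # failed lookups repeat
            self._builtin_hit = True   # remember: no more failed probes
        return self._b[self._name]

g, b = {}, {"len": len}
r = NestedRef(g, b, "len")
assert r.resolve() is len          # found in builtins, remembered
g["len"] = lambda x: 42            # a global with the same name shadows it
assert r.resolve()(object()) == 42
del g["len"]
assert r.resolve() is len          # falls back to the builtin again
```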
For non-nested scopes, it will look things up once, costing an extra RefCell creation if necessary, and the only other costs are on resizing, deletion of the dict, and working with a larger dict in general. The important parts of the design are pretty much in the code that I posted. We keep an extra hash table for refs, and keep it the same size as the original hash table, so that we pay a single lookup cost to get the index in both. > It reminds me Python before the with statement and PyPy garbage > collector. Many applications relied on the exact behaviour of CPython > garbage collector. For example, they expected that a file is written > on disk when the last reference to the file object is destroyed. In > PyPy, it wasn't (it isn't) true, the write can be delayed. It should not affect the semantic. Things should still happen as they used to, as far as I can figure. Or at least as far as the rules of the interpreter are concerned. (That is, values might live a little longer in PyPy, but can't be forced to live longer than they were formerly allowed to.) > I guess that with all your complex machinery for dict lookups, you > will have similar issues of object lifetime. It's unclear to me when > and how "reference" objects are destroyed, nor when dict values are > destroyed. The only cost to a normal getitem (again, other than from looking it up in a bigger dict) is to make sure the return value isn't NULL. The machinery is involved in function creation and resolution of names: On function creation, get refs to each name used. When the name is used, try to resolve the refs. RefCells are ref-counted PyObjects. That is not an issue. A RefCell will live while it is useful (= it has an outside reference) or while it's not useful but its owner dict hasn't been resized/deleted yet (at which time RefCells without outside references will be deleted). RefCells "know" whether they're part of a living dict. (The dict marks them as such upon its death.)
If they are not, they will decref their value upon their death. They do not hold a reference to their owner dict. If it's part of a living dict, we have a choice: the dict can be responsible for deletion, or the RefCell can be responsible for deletion. It doesn't really matter which design we go with. > What happens if a dict key is removed and a reference object is still > alive? Is the dict value immediately destroyed? Does the reference > object contain a strong or a weak reference to the value? If a dict key is removed, the inner dict will still have the key (which is true in the current implementation), but the value will be decref'd and the value pointer will be NULL'd. The RefCell will not need to change: it still points at the same value slot, and resolving it will simply find the NULL.
Re: [Python-Dev] Third milestone of FAT Python
On Wed, Dec 16, 2015 at 2:01 AM, Victor Stinner wrote: > Le mercredi 16 décembre 2015, Franklin? Lee > a écrit : >> >> I am confident that the time overhead and the savings will beat the >> versioning dict. The versioning dict method has to save a reference to >> the variable value and a reference to the name, and regularly test >> whether the dict has changed. > > The performance of guards matters less than the performance of regular usage > of dict. If we have to make a choice, I prefer "slow" guard but no impact on > regular dict methods. It's very important that enabling FAT mode doesn't > kill performances. Remember that FAT python is a static optimizer and so can > only optimize some patterns, not all Python code. > > In my current implementation, a lookup is only needed when a guard is checked > if the dict was modified. The dict version doesn't change if a mutable > object was modified in place for example. I didn't benchmark, but I expect > that the lookup is avoided in most cases. You should try FAT python and > implement statistics before going too far in your idea. My suggestion should improve *all* function calls which refer to outside names. Each function keeps an indirect, automagically updated reference to the current value of the names they use, and will never need to look things up again.[*] There wouldn't be a need to save global names as default arguments (`def f(x, list=list):`). [*]: Not exactly true. Nested scopes can cause an issue. I'm not sure what happens if you redefine the __builtin__ name after these functions are defined. My suggestion would not know that the __builtin__ was switched out, since it saves a ref into the original __builtin__. I'm not sure about how to deal with nested scopes (for example, class inheritance). I think the "chained RefCells" idea works. Chained RefCell idea: There are three cases where we can have nested scopes:

1. An object's __dict__ nests its class.
2. A class's __dict__ nests its superclasses.
3.
globals() nests __builtin__.
4? Does a package nest its modules?
5? Does `from module import *` nest the module's __dict__?

(Nonlocal variables in nested functions, and probably nested classes, are not a case of nested scope, since the scope of each name is determined during compilation of the function/class.) RefCells of nested scopes will have a pointer to their value (or NULL), and an array/linkedlist of pointers to refcells from their parent dicts (or to their parent dicts if a refcell wasn't successfully acquired yet). When you request a RefCell from a nested scope, it will return its value if it exists. Otherwise, it requests refcells from each parent (parents won't create refcells unless there's a value) until it gets one. When you ask a RefCell to resolve, it will check its own value, then ask each parent for a value (creating intermediate refcells *if* values exist). It will not need to do lookups in parents if it got a refcell before (though the refcell might be null). Problem: If you have class A(B, C), and you resolve a refcell for a name which exists in C but not B, you will look things up through B's dict every single time. It will fail, every single time. We can't skip B, since B is allowed to get such a name later, but I don't want to add refs to names that might never exist. This can be solved either through versioning or by checking whether a dict is read-only (such as for built-in types). In fact, in the code I wrote at the end of this email, RefCell.resolve() might even look things up in a shared ancestor multiple times. However, this would be incorrect anyway, since it doesn't respect Python's MRO resolution. So we can just fix that. RefCell.resolve would need a `search_parents: bool` parameter. >> I've read it again. By subclass, I mean that it implements the same >> interface. But at the C level, I want to have it be a fork(?) of the >> current dict implementation.
>> As for `exec`, I think it might be okay >> for it to be slower at the early stages of this game. > > Be careful, dict methods are hardcoded in the C code. If your type is not a > subtype, there is risk of crashes. Not exactly, and this is important. Many functions are called via pointer. It's like C++'s virtual methods, but more dynamic, since they can be changed per object. See https://github.com/python/cpython/blob/master/Objects/dict-common.h#L17. For example, the lookup method for str-only dicts swaps itself for a general object lookup method if it finds a non-string key. See https://github.com/python/cpython/blob/master/Objects/dictobject.c#L549. I'm now suggesting that there be additional space in the dict object itself to hold more function pointers (get_ref and get_hash_table_index), which would slightly increase the size of every dict.
Re: [Python-Dev] Third milestone of FAT Python
I realized yet another thing, which will reduce overhead: the original array can store values directly, and you maintain the refs by repeatedly updating them when moving refs around. RefCells will point to a pointer to the value cell (which already exists in the table).

- `getitem` will be almost the same as a normal dict: it has to check if the value is valid (which it already checked, but it will be invalid a lot more often).
- `setitem` will be the same as a normal dict (since the RefCells will just point to the _address_ of the value pointer in the main table), except that the dict will be bigger, and compaction/expansion has more overhead. No creation of refcells here.
- `delitem` will just null the value pointer, which shouldn't cost much more, if it doesn't cost less.
- Expansion and compaction will cost more, since we have to copy over the RefCell pointers and also check whether they should be copied.
- Deletion of the dict will cost more, due to the additional logic for deciding what to delete, and the RefCells can no longer point into the entry table. They would have to point at the value (requiring logic, or the replacement of a function pointer) or at a new allocated object (requiring an allocation of sizeof(PyObject*) bytes).

On Tue, Dec 15, 2015 at 5:38 PM, Victor Stinner wrote: > Sorry, I didn't read carefully your email, but I don't think that it's > acceptable to make Python namespaces slower. In FAT mode, we need > versionned dictionaries for module namespace, type namespace, global > namespace, etc. It was actually more "it might be a problem" than "it will be a problem". I don't know if the overhead will be high enough to worry about. It might be dominated by whatever savings there would be by not having to look up names more than once. (Unless builtins get mixed with globals? I think that's solvable, though. It's just that the solutions I can think of have different tradeoffs.)
I am confident that the time overhead and the savings will beat the versioning dict. The versioning dict method has to save a reference to the variable value and a reference to the name, and regularly test whether the dict has changed. This method only has to save a reference to a reference to the value (though it might need the name to allow debugging), doesn't care if it's changed, will be an identity (to NULL?) test if it's deleted (and only if it's not replaced after), and absolutely doesn't care if the dict had other updates (which might increase the version number). >>> Do you have an estimation of the cost of the "extra pointer"? Impact >>> on memory and CPU. dict is really a very important type for the >>> performance of Python. If you make dict slower, I'm sure that Python >>> overall will be slower. >> >> I'm proposing it as a subclass. > > Please read the "Versionned dictionary" section of my email: > https://mail.python.org/pipermail/python-dev/2015-December/142397.html > > I explained why using a subclass doesn't work in practice. I've read it again. By subclass, I mean that it implements the same interface. But at the C level, I want to have it be a fork(?) of the current dict implementation. As for `exec`, I think it might be okay for it to be slower at the early stages of this game. Here's the lookup function for a string-only dict (used both for setting and getting): https://github.com/python/cpython/blob/master/Objects/dictobject.c#L443 I want to break that up into two parts:

- Figure out the index of the {hash, *key, *val} entry in the array.
- Do whatever to it. (In the original: make *value_addr point to the value pointer.)

I want to do this so that I can use that index to point into ANOTHER array, which will be the metadata for the refcells (whatever it ends up being). This will mean that there's no second lookup. This has to be done at the C level, because the dict object doesn't expose the index of the {hash, *key, *val} entries on lookup.
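The "find the index, then do whatever to it" split can be illustrated with a toy open-addressing probe (linear probing here; CPython's real probe sequence uses perturbation, and this sketch ignores resizing and deletion):

```python
def lookup_index(keys, key):
    """Return the slot index holding `key`, or the first empty slot."""
    n = len(keys)
    i = hash(key) % n
    while keys[i] is not None and keys[i] != key:
        i = (i + 1) % n                # linear probe
    return i

size = 8
keys = [None] * size
values = [None] * size
refcells = [None] * size               # metadata table, same indices

i = lookup_index(keys, "spam")         # ONE probe...
keys[i], values[i] = "spam", 1
refcells[i] = object()                 # ...serves every parallel table

j = lookup_index(keys, "spam")
assert j == i                          # same key, same slot
assert values[j] == 1 and refcells[j] is not None
```

Once the probe returns an index, the value table and the refcell table are both addressed with it for free, which is the "no second lookup" claim above.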
If you don't want to make it a subclass, then we can propose a new function `get_ref` (or something) for dict's C API (probably a hard sell), which returns RefCell objects, and an additional pointer in `dict` to the RefCells table (so a total of two pointers). When `get_ref` is first called, it will:

- calloc the RefCell table (which will be the same length as the entry table),
- replace all of the dict's functions with ones that know how to deal with the RefCells,
- replace itself with a function that knows how to return these refs,
- call its replacement.

If the dict never gets RefCells, you only pay a few pointers in size, and a few creation/deletion values. This is possible now that the dictionary itself will store values as normal. There might be more necessary. For example, the replaced functions might need to keep pointers to their originals (so that you can slip additional deep C subclasses in). And it might be nice if the `get_index` function could be internally relied upon by the C-level subclasses, because "keeping a metadata table in sync with the entry table" is a trick that other subclasses could use, too.
Re: [Python-Dev] Third milestone of FAT Python
More thoughts. (Stealing your style of headers.)

Just store a pointer to value
=============================

Instead of having the inner dict store k_v pairs. In C, the values in our hash tables will be:

    struct refcell {
        PyObject *value;  // NULL if deleted
    };

It's not necessary to store the key. I think I only had it so I could mark it None in the Python implementation, to denote a deleted key. But a deleted entry could just have `cell.value is ScopeDict.NULL` (C: cell.value == NULL). The scope dict will own all values which don't have exposed references, and all exposed references (which own the value they reference). (Alternatively, store the value directly in the hash table. If something asks for a reference to it, replace the value with a PyObject that refers to it, and flag that entry in the auxiliary hash table.) This might be what PyCellObject is, which is how I realized that I didn't need the key: https://docs.python.org/3.5/c-api/cell.html

Deleting from scope
===================

When deleting a key, don't remove the key from the inner dict, and just set the reference to NULL. Outside code should never get the raw C `refcell`, only a Python object. This makes it possible to clean up unused references when the dict expands or contracts: for each `refcell`, if it has no Pair object or its Pair object is not referenced by anything else, and if its value is NULL (i.e. deleted), don't store it in the new hash table.

Get pairs before their keys are defined
=======================================

When the interpreter compiles a function, it can request references which _don't exist yet_. The scope dict would simply create the entry in its inner dict and fill it in when needed. That means that each name only needs to be looked up when a function is created.

    scope = Scopedict()
    f = scope.ref('f')
    scope['f'] = float
    f.value('NaN')

This would be a memory issue if many functions are created with typo'd names. But

- You're not making a gigantic amount of functions in the first place.
- You'll eventually remove these unused entries when you resize the inner dict. (See previous section.)

I was concerned about which scope would be responsible for creating the entry, but it turns out that if you use a name in a function, every use of that name has to be for the same scope. So the following causes a NameError:

    def f():
        def g(x):
            return abs(x)
        for i in range(30):
            print(g(i))
            if i == 10:
                def abs(x):
                    return "abs" + str(x)
            if i == 20:
                del abs

and you can tell which scope to ask for the reference during function compilation. Does this simplify closures? (I haven't yet looked at Python's closure implementation.) A refcell (C struct) will be exposed as a RefCell (PyObject), which owns it. This means that RefCell is reference-counted, and if something saved a reference to it, it will persist even after its owning dict is deleted. Thus, when a scope dict is deleted, each refcell without a RefCell object is deleted (and its value is DecRef'd), and the other ones just have their RefCell object decrement a reference. Then closures are free: each inner function has refcounted references to the cells that it uses, and it doesn't need to know whether its parent is alive. (The implementation of closures involves cell objects.)

Overhead
========

If inner functions are being created a lot, that's extra work. But I guess you should expect a lot of overhead if you're doing such a thing.

Readonly refs
=============

It might be desirable to have refs that are allowed to write (e.g. from `global` and `nonlocal`) and refs that aren't. The CellObject would just hold a count for the number of writing refs, separate from the number of refs. This might result in some optimizations for values with no writing refs. For example, it's possible to implement copying of dicts as a shallow copy if it's KNOWN that any modification would result in a call to its set/del functions, which would initiate a deep copy.

On Tue, Dec 15, 2015 at 3:29 PM, Victor Stinner wrote: > 2015-12-15 12:23 GMT+01:00 Franklin?
Lee : >> I was thinking (as an alternative to versioning dicts) about a >> dictionary which would be able to return name/value pairs, which would >> also be internally used by the dictionary. This would be way less >> sensitive to irrelevant changes in the scope dictionary, but cost an >> extra pointer to each key. > > Do you have an estimation of the cost of the "extra pointer"? Impact > on memory and CPU. dict is really a very important type for the > performance of Python. If you make dict slower, I'm sure that Python > overall will be slower.
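The "get refs before their keys are defined" behavior from this email can be made concrete with a toy pure-Python ScopeDict (hypothetical API; None stands in for C's NULL):

```python
class Cell:
    __slots__ = ("value",)
    def __init__(self):
        self.value = None              # None == C NULL (undefined/deleted)

class ScopeDict:
    def __init__(self):
        self._cells = {}
    def ref(self, name):               # may create an entry *before* binding
        return self._cells.setdefault(name, Cell())
    def __setitem__(self, name, value):
        self.ref(name).value = value
    def __getitem__(self, name):
        cell = self._cells.get(name)
        if cell is None or cell.value is None:
            raise KeyError(name)       # NULLed slot reads as missing
        return cell.value
    def __delitem__(self, name):
        self[name]                     # KeyError if not currently bound
        self._cells[name].value = None # key stays; value slot is NULLed

scope = ScopeDict()
f = scope.ref("f")                     # reference acquired before definition
assert f.value is None
scope["f"] = float
assert f.value is float                # the cell sees the later binding
del scope["f"]
assert f.value is None                 # and the deletion
```

Anything holding the Cell keeps it alive after the scope dies, which is the "closures are free" observation: the cell, not the dict, owns the binding.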
[Python-Dev] Third milestone of FAT Python
On Sat, Dec 04, 2015 at 7:49 AM, Victor Stinner wrote: > Versionned dictionary > = > > In the previous milestone of FAT Python, the versionned dictionary was a > new type inherited from the builtin dict type which added a __version__ > read-only (global "version" of dictionary, incremented at each change), > a getversion(key) method (version of a one key) and it added support for > weak references. I was thinking (as an alternative to versioning dicts) about a dictionary which would be able to return name/value pairs, which would also be internally used by the dictionary. This would be way less sensitive to irrelevant changes in the scope dictionary, but cost an extra pointer to each key. Here's how it would work:

    pair = scope.item(name)
    scope[name] = newval
    assert pair.value is newval
    assert pair.key is name
    assert pair is scope.item(name)
    # Alternatively, to only create pair objects when `item` is
    # called, have `==` compare the underlying pair.
    del scope[name]
    assert pair.key is None    # name-dicts can't have `None` keys
    assert pair.value is None  # Alternatively, pair.value is scope.NULL

This dict will allow one to hold references to its entries (with the caller promising not to change them, enforced by exceptions). You won't have to keep looking up keys (unless the name is deleted), and functions are allowed to change. For inlining, you can detect whether the function has been redefined by testing the saved pair.value against the saved function, and go into the slow path if needed (or recompile the inlining). I am not sure whether deleting from the dict and then readding the same key should reuse the pair container. I think the only potential issue for the Python version is memory use. There aren't going to be THAT many names being deleted, right? So I say that deleted things in the scope dict should not be removed from the inner dict.
I predict that this will simplify a lot of other things, especially when deleting and re-adding the same name: if you save a pair and it becomes invalid, you don't have to do another lookup to make sure that it's REALLY gone. If memory is a real concern, deleted pairs can be weakref'd (and saved in a second dict?) until they are reused. That way, pairs which aren't saved by something outside will be removed.

For implementation, a Python implementation of the idea has probably already been done. Here are some details:
- set: Internally store d._d[k] = k,v.
- get: Reject k if d._d[k].key is None. (Names must be strings.)
- del: Set d._d[k].key = None and .val = d.NULL to invalidate this entry.

For the CPython version, CPython's dict already stores its entries as PyDictKeyEntry (hash, *key, *value), but those entries can move around on resizing. Two possible implementations:
1. Fork dict to store {hash, *kv_pair}.
2. Use an inner dict (like in the Python suggestion) where values are kv_pair. Write the indirection code in C, because scope dicts must be fast.

For exposing a pair to Python code, here are two possibilities:
1. Make them Python objects in the first place.
2. Keep a second hash table in lockstep with the first (so that you can do a lookup to find the index in the first, and then use that same index with the second). In this second table, store the pair objects that have been created. (They can be weakref'd, as before. Unless it's impossible to weakref something you're returning?) This will save memory for pairs that aren't ever exposed. If compact dictionaries are implemented, the second hash table would become a second array instead.

To use this kind of scope dict, functions would have to store a list of used names, which is memory overhead. But for what you're doing, some overhead will be necessary anyway.
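The Python-level sketch above (set/get/del on an inner dict of pairs) can be spelled out in a few lines. This is only an illustration of the idea from the email, not an actual CPython patch; the names `PairDict`, `Pair`, and the `NULL` sentinel are made up here (the email calls the sentinel `scope.NULL`):

```python
class Pair:
    """A key/value cell a caller may hold onto; invalidated on delete."""
    __slots__ = ('key', 'value')
    def __init__(self, key, value):
        self.key = key
        self.value = value

NULL = object()  # stands in for the email's scope.NULL sentinel

class PairDict:
    def __init__(self):
        self._d = {}  # inner dict: name -> Pair; deleted pairs are kept

    def __setitem__(self, name, value):
        pair = self._d.get(name)
        if pair is not None and pair.key is not None:
            pair.value = value  # update in place: held pairs observe it
        else:
            # key absent or previously deleted: make a fresh pair
            self._d[name] = Pair(name, value)

    def __getitem__(self, name):
        pair = self._d[name]
        if pair.key is None:  # entry was invalidated by a delete
            raise KeyError(name)
        return pair.value

    def __delitem__(self, name):
        pair = self._d[name]
        pair.key = None    # names must be strings, so None marks "deleted"
        pair.value = NULL
        # note: the invalidated pair stays in self._d on purpose

    def item(self, name):
        """Return the live pair for `name`, as in the email's scope.item()."""
        return self._d[name]

# Usage mirroring the asserts in the email:
scope = PairDict()
scope['f'] = 1
pair = scope.item('f')
scope['f'] = 2
assert pair.value == 2             # held pair sees the rebinding
assert pair is scope.item('f')
del scope['f']
assert pair.key is None            # pair is invalidated, not dropped
assert pair.value is NULL
```

Here re-adding a deleted name creates a fresh pair rather than reusing the old cell, which is exactly the open question raised above; either choice fits the scheme.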
Re: [Python-Dev] Python semantic: Is it ok to replace not x == y with x != y? (no)
On Tue, Dec 15, 2015 at 8:04 AM, Victor Stinner wrote:
> Is it expected that "not x.__eq__(y)" can be different than
> "x.__ne__(y)"? Is it part of the Python semantic?

In Numpy, `x != y` returns an array of bools, while `not x == y` creates an array of bools and then tries to convert it to a bool, which fails, because a non-singleton Numpy array cannot be converted to a bool. But in the context of `if`, both `not x == y` and `x != y` will fail.

From the docs, on implementing comparison:
https://docs.python.org/3/reference/datamodel.html#object.__ne__

"""
By default, __ne__() delegates to __eq__() and inverts the result unless it is NotImplemented. There are no other implied relationships among the comparison operators, for example, the truth of (x<y or x==y) does not imply x<=y.
"""
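The docs quote above can be demonstrated directly: a class may define `__ne__` so that it is not the inverse of `__eq__`, in which case `not x == y` and `x != y` genuinely disagree, while a class defining only `__eq__` gets the default delegating `__ne__`. A small sketch (the class names here are made up for illustration):

```python
class Inconsistent:
    """__ne__ deliberately not the logical inverse of __eq__."""
    def __eq__(self, other):
        return True
    def __ne__(self, other):
        return True  # legal: no implied relationship with __eq__
    __hash__ = None  # defining __eq__ suppresses hashing anyway

class EqOnly:
    """Only __eq__: the default __ne__ delegates and inverts."""
    def __eq__(self, other):
        return True
    __hash__ = None

a, b = Inconsistent(), Inconsistent()
assert (a == b) is True
assert (a != b) is True           # NOT the inverse of ==
assert (not a == b) is False      # so `not x == y` differs from `x != y`

c, d = EqOnly(), EqOnly()
assert (c == d) is True
assert (c != d) is False          # default __ne__ inverted __eq__'s result
```

So the optimizer question in the subject line is answered by construction: rewriting `not x == y` as `x != y` changes behavior for any type like `Inconsistent`, Numpy arrays being the prominent real-world case.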