Re: [Python-Dev] Store startup modules as C structures for 20%+ startup speed improvement?
On Fri, Sep 14, 2018 at 6:08 PM Larry Hastings wrote: > I can suggest that, based on conversation from Carl, that adding the stat > calls back in costs you half the startup. So any mechanism where we're > talking to the disk _at all_ simply isn't going to be as fast. Is that cost for when the stat calls are done in parallel with the new loading mechanism? ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] SEC: Spectre variant 2: GCC: -mindirect-branch=thunk -mindirect-branch-register
There is much reason to believe it won't help, and you have given little reason to believe that it will help, or that you even properly understand how the flag mitigates Spectre in programs where it does help. I am not undermining your attempts for the sake of performance. I am trying to help you understand why no one is taking you seriously, and suggesting what you can do so that they will. One way is to explain why an unsandboxed interpreter needs to protect against a sandbox escape. On Sep 21, 2018 3:12 AM, "Wes Turner" wrote: I feel like you are actively undermining attempts to prevent exploitation of known vulnerabilities because the software in question is currently too slow. For a 4-8% performance penalty, we could just add the CFLAGS to the build now and not worry about it. I give up. On Friday, September 21, 2018, Franklin? Lee wrote: > On Thu, Sep 20, 2018 at 2:10 PM Wes Turner wrote: > > > > On Thursday, September 20, 2018, Stefan Ring > wrote: > >> > >> On Tue, Sep 18, 2018 at 8:38 AM INADA Naoki > wrote: > >> > >> > I think this topic should split to two topics: (1) Guard Python > >> > process from Spectre/Meltdown > >> > attack from other process, (2) Prohibit Python code attack other > >> > processes by using > >> > Spectre/Meltdown. > >> > >> (3) Guard Python from performance degradation by overly aggressive > >> Spectre "mitigation". > > > > > > > Spectre has the potential of having a greater impact on cloud > providers than Meltdown. Whereas Meltdown allows unauthorized applications > to read from privileged memory to obtain sensitive data from processes > running on the same cloud server, Spectre can allow malicious programs to > induce a hypervisor to transmit the data to a guest system running on top > of it. > > - Private SSL certs > > - Cached keys and passwords in non-zeroed RAM > > - [...] > > > > https://en.wikipedia.org/wiki/Spectre_(security_vulnerability) > > It's true that the attacks should worry cloud providers. 
Doesn't that > mean that companies like Amazon, Microsoft (Steve), and Docker should > have done analyses on CPython's vulnerability to these exploits? > Has/should/can anyone officially representing Python contact the > companies and ask them? > > When I followed your quote to find the context, I found it uses, as > its source, a Forbes article. The source cited by THAT article is > Daniel Gruss, who was one of the researchers. Should someone from the > PSF contact the researchers? Steve says he spoke to some of them to > judge whether the proposed compiler flags would help, and decided > against it. > > Absent of expert input, here's my non-expert take: That quote requires > an OS-level fix. A Python program without the proper permissions can't > do such things unless there is a vulnerability with the OS, and it is > extremely unlikely for anyone to update Python for Spectre but not > update the OS (and they'd be screwed in any case). And even if there > is a vulnerability in the OS, maybe the way to exploit it is by using > arbitrary Python execution (which you need before you can TRY to use > Spectre) on this Python interpreter. You can then write a new binary > file and run THAT, and it will be fast enough. That's not something > you can fix about CPython. > > Also, (again with my understanding) the problem of Spectre and > Meltdown are that you can escape sandboxes and the like, such as the > user/kernel divide, or a software sandbox like that provided by a > JavaScript VM. For CPython to be "vulnerable" to these attacks, it > needs to have some kind of sandbox or protection to break out of. > Instead, we sometimes have sandboxes AROUND CPython (like Jupyter) or > WITHIN CPython. I don't see how it makes sense to talk about a sandbox > escape FOR CPython (yet). > > Your original post linked to a discussion about Linux using those > build flags. Linux is a kernel, and has such protections that can be > bypassed, so it has something to worry about. 
Malicious code can be > native code, which (to my understanding) will be fast enough to > exploit the cache miss time. Here's Google's article about the > retpoline and why it helps: > https://support.google.com/faqs/answer/7625886 > > As of yet, you have quoted passages that have little relevance to > interpreter devs, especially non-JIT interpreters, and you have linked > to entire articles for non-experts with little relevance to > interpreter devs. This doesn't show that you have any better of an > understanding than I have, which is less than the understanding that > some of the Python devs have, and much less than what Steve has.
Re: [Python-Dev] SEC: Spectre variant 2: GCC: -mindirect-branch=thunk -mindirect-branch-register
On Thu, Sep 20, 2018 at 2:10 PM Wes Turner wrote: > > On Thursday, September 20, 2018, Stefan Ring wrote: >> >> On Tue, Sep 18, 2018 at 8:38 AM INADA Naoki wrote: >> >> > I think this topic should split to two topics: (1) Guard Python >> > process from Spectre/Meltdown >> > attack from other process, (2) Prohibit Python code attack other >> > processes by using >> > Spectre/Meltdown. >> >> (3) Guard Python from performance degradation by overly aggressive >> Spectre "mitigation". > > > > Spectre has the potential of having a greater impact on cloud providers > > than Meltdown. Whereas Meltdown allows unauthorized applications to read > > from privileged memory to obtain sensitive data from processes running on > > the same cloud server, Spectre can allow malicious programs to induce a > > hypervisor to transmit the data to a guest system running on top of it. > - Private SSL certs > - Cached keys and passwords in non-zeroed RAM > - [...] > > https://en.wikipedia.org/wiki/Spectre_(security_vulnerability) It's true that the attacks should worry cloud providers. Doesn't that mean that companies like Amazon, Microsoft (Steve), and Docker should have done analyses on CPython's vulnerability to these exploits? Has/should/can anyone officially representing Python contact the companies and ask them? When I followed your quote to find the context, I found it uses, as its source, a Forbes article. The source cited by THAT article is Daniel Gruss, who was one of the researchers. Should someone from the PSF contact the researchers? Steve says he spoke to some of them to judge whether the proposed compiler flags would help, and decided against it. Absent of expert input, here's my non-expert take: That quote requires an OS-level fix. A Python program without the proper permissions can't do such things unless there is a vulnerability with the OS, and it is extremely unlikely for anyone to update Python for Spectre but not update the OS (and they'd be screwed in any case). 
And even if there is a vulnerability in the OS, maybe the way to exploit it is by using arbitrary Python execution (which you need before you can TRY to use Spectre) on this Python interpreter. You can then write a new binary file and run THAT, and it will be fast enough. That's not something you can fix about CPython. Also, (again with my understanding) the problem of Spectre and Meltdown is that you can escape sandboxes and the like, such as the user/kernel divide, or a software sandbox like that provided by a JavaScript VM. For CPython to be "vulnerable" to these attacks, it needs to have some kind of sandbox or protection to break out of. Instead, we sometimes have sandboxes AROUND CPython (like Jupyter) or WITHIN CPython. I don't see how it makes sense to talk about a sandbox escape FOR CPython (yet). Your original post linked to a discussion about Linux using those build flags. Linux is a kernel, and has such protections that can be bypassed, so it has something to worry about. Malicious code can be native code, which (to my understanding) will be fast enough to exploit the cache miss time. Here's Google's article about the retpoline and why it helps: https://support.google.com/faqs/answer/7625886 As of yet, you have quoted passages that have little relevance to interpreter devs, especially non-JIT interpreters, and you have linked to entire articles for non-experts with little relevance to interpreter devs. This doesn't show that you have a better understanding than I have, which is less than the understanding that some of the Python devs have, and much less than what Steve has. In short, it LOOKS like you don't know what you're talking about. If you have a different and deeper understanding of the problem, then you need to show it, and say why there is a problem for CPython specifically. Or find someone who can do that for you.
> Here's one: > https://github.com/Eugnis/spectre-attack/blob/master/Source.c > > Is this too slow in CPython with: > - Coroutines (asyncio (tulip)) > - PyPy JIT * > - Numba JIT * > - C Extensions * > - Cython * > > * Not anyone here's problem. C extensions are obviously fast enough. I think most of the other starred examples are fast enough, but it's probably more subtle than I think and requires further analysis by their devs. I also think there's something important I'm still missing about what's required and what it can do. I don't see what coroutines have to do with it. Coroutines are still Python code, and they're subject to the GIL.
Re: [Python-Dev] SEC: Spectre variant 2: GCC: -mindirect-branch=thunk -mindirect-branch-register
On Tue, Sep 18, 2018 at 2:40 AM INADA Naoki wrote: > > On Tue, Sep 18, 2018 at 7:08 AM Wes Turner wrote: > > > > To summarize: > > > > - CPython may be vulnerable to speculative execution vulnerabilities, but > > none are known. > > - In general, CPython is currently too slow for speculative execution > > exploitation to be practical. > > - Sandboxed, JIT'ed JS is not too slow for speculative execution > > exploitation to be practical > > - (Not otherwise discussed here: PyPy's sandboxed JIT may not be too > > slow for speculative execution exploitation to be practical.) > > > > As far as I know, execution speed is important for attacker, not victim. > In case of JavaScript, browser may load attacking code and run it while > user watching websites. > Browsers provides sandbox for JS, but attacker code may be able to > bypass the sandbox by Spectre or Meltdown. So browsers disabled > high precision timer until OSes are updated. > > This topic is totally unrelated to compiler options: these compiler options > doesn't prohibit running attacking code, it just guard branches from > branch target injection. > > Does my understanding collect? Why should we discuss about execution speed? According to this article, the malicious program needs to act in the amount of time it takes for the CPU to load a value from memory and invalidate a branch prediction: https://hackernoon.com/timing-is-everything-understanding-the-meltdown-and-spectre-attacks-5e1946e44f9f
Re: [Python-Dev] SEC: Spectre variant 2: GCC: -mindirect-branch=thunk -mindirect-branch-register
I believe this is the article Wes wanted to link to: https://www.thomas-krenn.com/en/wiki/Safety_instructions_for_Meltdown_and_Spectre On Mon, Sep 17, 2018 at 6:10 PM Wes Turner wrote: > > To summarize: > > - CPython may be vulnerable to speculative execution vulnerabilities, but > none are known. > - In general, CPython is currently too slow for speculative execution > exploitation to be practical. > - Sandboxed, JIT'ed JS is not too slow for speculative execution > exploitation to be practical > - (Not otherwise discussed here: PyPy's sandboxed JIT may not be too slow > for speculative execution exploitation to be practical.) I'm no security researcher, and I barely remember much about Spectre/Meltdown, but I think the idea is that, if Python takes about 2 milliseconds to run your code, then a difference of +- 10 microseconds is indistinguishable from noise. Try to write software that can learn to distinguish two similar computers using the running time of certain functions. Javascript can be crafted to get close enough to some C programs. ASM.js and WebAssembly might help. PyPy's need for mitigation is independent of CPython's. > - Because there is no exploit provided (or currently thought possible with > just CPython), this security-related dialogue is regarded as a nuisance. More than that. Steve says that he looked into it and decided there wasn't really anything to worry about. He believes that any exploit of it will also imply an easier exploit is possible. He also says that this particular fix won't really help. Nathaniel is annoyed because Spectre is tricky to understand, and he assumes you don't understand it as well as you think because you haven't shown him that you have the expertise to understand it. > Here's a good write-up: > Safety_instructions_for_Meltdown_and_Spectre But how does that apply to CPython? What specifically about CPython makes the interpreter vulnerable to the attack? 
Under what conditions would CPython be vulnerable to this attack, but not an easier attack of at least the same severity? The article I linked at the top of this email does not give advice for interpreter writers at all. It only says what end users and chip manufacturers ought to do. It is not relevant.
Re: [Python-Dev] Looking for examples: proof that a list comp is a function
On Mon, May 14, 2018, 03:36 Chris Angelico wrote: > Guido has stated that this parallel is desired and important: > > result = [f(x) for x in iter if g(x)] > result = list(f(x) for x in iter if g(x)) > > Obviously the genexp has to be implemented with a nested function, > since there's no guarantee that it'll be iterated over in this way. > With current semantics, you can easily prove that a list comp is > implemented with a function by looking at how it interacts with other > scopes (mainly class scope), but Tim's proposal may change that. > > So I'm looking for examples that prove that a list comp is executed > inside an implicit function. Ideally, examples that are supported by > language guarantees, but something that's "CPython has done it this > way since 3.0" is important too. > > I'm aware of just two: the name lookup interaction that may be > changing, and the fact that there's an extra line in a traceback. And > the latter, as far as I know, is not guaranteed (and I doubt anyone > would care if it changed). Are there any other provable points? > Related to the traceback one: the extra stack frame shows up in a debugger, and a profiler counts the extra frame separately. The first often confuses me because I don't immediately see which frame I'm in just by seeing the line of code. There are odd interactions between `yield`/`yield from` and comprehensions that were discussed some months ago: "[Python-Dev] Tricky way of of creating a generator via a comprehension expression". Wait, is this a continuation of that discussion?
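For reference, the class-scope interaction mentioned above is easy to demonstrate (a sketch with made-up class names): in Python 3 the comprehension body runs inside the implicit function, and function scopes skip enclosing class scope, while the outermost iterable is evaluated in the enclosing scope and passed in as an argument.

```python
class C:
    xs = [1, 2, 3]
    # The iterable `xs` is evaluated in class scope and passed as an
    # argument to the implicit function, so this works:
    doubled = [x * 2 for x in xs]

try:
    class D:
        n = 10
        # `n` is looked up from inside the implicit function, and function
        # scopes cannot see the enclosing class scope:
        ys = [n for _ in range(3)]
except NameError as caught:
    err = caught

assert C.doubled == [2, 4, 6]
assert isinstance(err, NameError)
```

The extra traceback frame (named `<listcomp>`) is the other observable, though, as noted, it is not guaranteed by the language.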
Re: [Python-Dev] (no subject)
On Tue, Dec 26, 2017 at 2:01 AM, Yogev Hendel wrote: > > I don't know if this is the right place to put this, > but I've found the following lines of code results in an incredibly long > processing time. > Perhaps it might be of use to someone. > > import re > pat = re.compile('^/?(?:\\w+)/(?:[%\\w-]+/?)+/?$') > pat.match('/t/a-aa-aa-a-aa-aa-aa-aa-aa-aa./') (I think the correct place is python-list. python-dev is primarily for the developers of Python itself. python-ideas is for proposing new features and changes to the language. python-list is for general discussion. Bug reports and feature requests belong in https://bugs.python.org/ (where your post could also have gone).) The textbook regular expression algorithm (which I believe grep uses) runs in linear time with respect to the text length. The algorithm used by Perl, Java, Python, JavaScript, Ruby, and many other languages instead use a backtracking algorithm, which can run up to exponential time with respect to text length. This worst-case is in fact necessary (assuming P != NP): Perl allows (introduced?) backreferences, which are NP-hard[1]. Perl also added some other features which complicate things, but backreferences are enough. The user-level solution is to understand how regexes are executed, and to work around it. Here are library-level solutions for your example: 1. Perl now has a regex optimizer, which will eliminate some redundancies. Something similar can be added to Python, at first as a third-party library. 2. In theory, we can use the textbook algorithm when possible, and the backtracking algorithm when necessary. However, the textbook version won't necessarily be faster, and may take more time to create, so there's a tradeoff here. 3. To go even further, I believe it's possible to use the textbook algorithm for subexpressions, while the overall expression uses backtracking, internally iterating through the matches of the textbook algorithm. 
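The blow-up in the reported pattern comes from ambiguity in `(?:[%\w-]+/?)+`: because the `/?` is optional, a run of word characters can be split between repetitions at any point, and on a non-matching input the backtracking engine retries every split. A sketch of a user-level workaround follows; the rewritten pattern is my own suggestion, not from the report, and it is slightly stricter (segments must be separated by exactly one `/`).

```python
import re

# The reported pattern. `(?:[%\w-]+/?)+` is ambiguous: on a failing input
# the engine tries roughly 2**n splits of a run of n [%\w-] characters.
# (Don't call slow.match() on long pathological inputs.)
slow = re.compile(r'^/?(?:\w+)/(?:[%\w-]+/?)+/?$')

# A hypothetical unambiguous rewrite: each repetition must consume exactly
# one '/' plus one segment, so there is only one way to partition the input.
fast = re.compile(r'^/?\w+(?:/[%\w-]+)+/?$')

# Matching inputs succeed quickly under both patterns:
assert slow.match('/t/a-aa/bb-cc/')
assert fast.match('/t/a-aa/bb-cc/')

# The pathological input from the report fails fast under the rewrite,
# while the original takes exponential time as the run of dashes grows:
assert fast.match('/t/a-aa-aa-a-aa-aa-aa-aa-aa-aa./') is None
```

The general recipe is the same as Perl's folklore advice: make sure two adjacent quantified subpatterns cannot consume the same characters in more than one way.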
There's a series of articles by Russ Cox that try to get us back to the textbook (see [2]). He and others implemented the ideas in the C++ library RE2[3], which has Python bindings[4]. RE2 was made for and used on Google Code Search[5] (described in his articles), a (now discontinued) search engine for open-source repos which allowed regular expressions in the queries. You can get a whiff of the limitations of the textbook algorithm by checking out RE2's syntax[6] and seeing what features aren't supported, though some features may be unsupported for different reasons (such as being redundant syntax). - Backreferences and lookaround assertions don't have a known solution.[7] - Bounded repetition is only supported up to a limit (1000), because each possible repetition needs its own set of states. - Possessive quantifiers aren't supported. Greedy and reluctant quantifiers are. - Groups and named groups _are_ supported. See the second and third Russ Cox articles, with the term "submatch".[2] (Apologies: I am making up reference syntax on-the-fly.) [1] "Perl Regular Expression Matching is NP-Hard" https://perl.plover.com/NPC/ [2] "Regular Expression Matching Can Be Simple And Fast" https://swtch.com/~rsc/regexp/regexp1.html "Regular Expression Matching: the Virtual Machine Approach" https://swtch.com/~rsc/regexp/regexp2.html "Regular Expression Matching in the Wild" https://swtch.com/~rsc/regexp/regexp3.html "Regular Expression Matching with a Trigram Index" https://swtch.com/~rsc/regexp/regexp4.html [3] RE2: https://github.com/google/re2 [4] pyre2: https://github.com/facebook/pyre2/ Also see re2 and re3 on PyPI, which intend to be a drop-in replacement. re3 is a Py3-compatible fork of re2, which last updated in 2015. [5] https://en.wikipedia.org/wiki/Google_Code_Search [6] https://github.com/google/re2/wiki/Syntax [7] Quote: "As a matter of principle, RE2 does not support constructs for which only backtracking solutions are known to exist. 
Thus, backreferences and look-around assertions are not supported." https://github.com/google/re2/wiki/WhyRE2
Re: [Python-Dev] re performance
On Thu, Jan 26, 2017 at 4:13 PM, Sven R. Kunze wrote: > Hi folks, > > I recently refreshed regular expressions theoretical basics *indulging in > reminiscences* So, I read https://swtch.com/~rsc/regexp/regexp1.html > > However, reaching the chart in the lower third of the article, I saw Python > 2.4 measured against a naive Thompson matching implementation. And I was > surprised about how bad it performed compared to an unoptimized version of > an older than dirt algorithm. > From my perspective, I can say, that regular expressions might worth > optimizing especially for web applications (url matching usually uses > regexes) but also for other applications where I've seen many tight loops > using regexes as well. So, I am probing interest on this topic here. What I (think I) know: - Both re and regex use the same C backend, which is not based on NFA. - The re2 library, which the writer of that article made, allows capture groups (but only up to a limit) and bounded repetitions (up to a limit). - Perl has started to optimize some regex patterns. What I think: - The example is uncharacteristic of what people write, like URL matching. - But enabling naive code to perform well is usually a good thing. The fewer details newbies need to know to write code, the better. - It's possible for Python to optimize a large category of patterns, even if not all. - It's also possible to optimize large parts of patterns with backrefs. E.g. If there's a backref, the group that the backref refers to can still be made into an NFA. - To do the above, you'd need a way to generate all possible matches. - Optimization can be costly. The full NFA construction could be generated only upon request, or maybe the code automatically tries to optimize after 100 uses (like a JIT). This should only be considered if re2's construction really is costly. If people want NFAs, I think the "easiest" way is to use re2. Jakub Wilk mentioned it before, but here it is again. 
https://github.com/google/re2 re2 features: https://github.com/google/re2/wiki/Syntax Named references aren't supported, but I think that can be worked around with some renumbering. It's just a matter of translating the pattern. It also doesn't like lookarounds, which AFAIK are perfectly doable with NFAs. Looks like lookaheads and lookbehinds are hard to do without a lot of extra space (https://github.com/google/re2/issues/5). Facebook has a Python wrapper for re2. https://github.com/facebook/pyre2/ In a post linked to from this thread, Serhiy mentioned another Python wrapper for re2. This wrapper is designed to be like re, and should probably be the basis of any efforts. It's not been updated for 11 months, though. https://pypi.python.org/pypi/re2/ https://github.com/axiak/pyre2/
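For the curious, the core of the "textbook" (Thompson) approach discussed in this thread is small enough to sketch. Instead of recursing into one alternative at a time, the simulation tracks the *set* of NFA states alive after each character, so the work per character is bounded by the pattern size, with no backtracking. This toy version is my own illustration and supports only literal characters plus `*` on a single literal, nothing like a real engine:

```python
def nfa_match(pattern, text):
    """Thompson-style simulation for a tiny regex subset:
    literal characters, plus '*' applied to a single literal.
    Runs in O(len(pattern) * len(text)) with no backtracking."""
    # Parse the pattern into (char, starred) units.
    units = []
    i = 0
    while i < len(pattern):
        starred = i + 1 < len(pattern) and pattern[i + 1] == '*'
        units.append((pattern[i], starred))
        i += 2 if starred else 1

    def closure(states):
        # A starred unit may match zero times, so its successor is reachable.
        out, stack = set(), list(states)
        while stack:
            s = stack.pop()
            if s not in out:
                out.add(s)
                if s < len(units) and units[s][1]:
                    stack.append(s + 1)
        return out

    # A state is the index of the next unit; len(units) is the accept state.
    current = closure({0})
    for ch in text:
        nxt = set()
        for s in current:
            if s < len(units) and units[s][0] == ch:
                # A starred unit loops on itself; a plain unit advances.
                nxt.add(s if units[s][1] else s + 1)
        current = closure(nxt)
        if not current:          # no live states: fail early
            return False
    return len(units) in current

assert nfa_match('ab*c', 'abbbc')
assert nfa_match('ab*c', 'ac')
assert not nfa_match('ab*c', 'abd')
# The classic backtracking killer stays linear here:
assert not nfa_match('a*a*a*a*a*c', 'a' * 50 + 'b')
```

A production engine like RE2 compiles this state-set simulation lazily into a DFA; the sketch just shows why the worst case stays polynomial rather than exponential.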
Re: [Python-Dev] Check dict implementation details
On Sat, Oct 8, 2016 at 6:01 AM, Serhiy Storchaka wrote: > Since dict is ordered in CPython 3.6, it can be used instead of OrderedDict > in some places (e.g. for implementing simple limited caches). But since this > is implementation detail, it can't be used in the stdlib unconditionally. > Needed a way to check whether dict is ordered. > > Actually there are two levels of "ordering". > > 1. Dict without deletions is iterated in the order of adding items. > Raymond's original compact dict implementation satisfied this claim. > > 2. In addition the order is preserved after deletion operations. Naoki's > implementation satisfies this more strong claim. Sidenote: OrderedDict, unlike dict, is a sequential container (though not a Sequence), so order matters when doing comparisons, and OrderedDicts can be reverse-iterated. That might keep dict from replacing OrderedDict in some cases. Something to keep in mind if this topic is revisited.
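Both differences mentioned in the sidenote are easy to demonstrate (behavior as of the 3.6-era interpreters under discussion; plain dict only gained `reversed()` support later, in CPython 3.8):

```python
from collections import OrderedDict

# 1. OrderedDict equality is order-sensitive; dict equality is not.
assert {'a': 1, 'b': 2} == {'b': 2, 'a': 1}
assert OrderedDict([('a', 1), ('b', 2)]) != OrderedDict([('b', 2), ('a', 1)])

# 2. OrderedDict supports reverse iteration.
od = OrderedDict([('a', 1), ('b', 2), ('c', 3)])
assert list(reversed(od)) == ['c', 'b', 'a']

# It also has order-manipulating methods that plain dict lacks:
od.move_to_end('a')
assert list(od) == ['b', 'c', 'a']
```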
Re: [Python-Dev] Python 3.6 dict becomes compact and gets a private version; and keywords become ordered
On Sep 14, 2016 8:29 AM, "Paul Moore" wrote: > > On 14 September 2016 at 13:18, Franklin? Lee > wrote: > > On Sep 9, 2016 1:35 AM, "Benjamin Peterson" wrote: > >> On Thu, Sep 8, 2016, at 22:33, Tim Delaney wrote: > >> > Are sets also ordered by default now? None of the PEPs appear to mention > >> > it. > >> > >> No. > > > > Is there anyone working to move sets in the same direction for 3.6? > > It won't happen for 3.6, as we're now in feature freeze. So it'd be > 3.7 at the earliest. > > What exactly do you mean by "in the same direction" anyway? Remember > that ordering isn't guaranteed (it's an implementation detail), so are > you just saying "can sets benefit from the improvements this change > provided to dicts"? If you *are* hoping for ordered sets, what's your > use case? (Dictionaries had particular use cases - retaining ordering > of keyword arguments and class definitions, the existence of > OrderedDict, ...) > > Paul I mean using a compact representation, if not an ordered one. I have no particular use case in mind. As far as I understand the compact implementation, sets can do it just as well. The original discussion proposed trying to implement it for sets first. Like dict, they would (probably) use less memory, and would usually have a more readable (i.e. less jarring to read) print order.
Re: [Python-Dev] Python 3.6 dict becomes compact and gets a private version; and keywords become ordered
On Sep 9, 2016 1:35 AM, "Benjamin Peterson" wrote: > On Thu, Sep 8, 2016, at 22:33, Tim Delaney wrote: > > Are sets also ordered by default now? None of the PEPs appear to mention > > it. > > No. Is there anyone working to move sets in the same direction for 3.6?
Re: [Python-Dev] Compact ordered dict is not ordered for split table. (was: PEP XXX: Compact ordered dict
On Jun 21, 2016 11:12 AM, "INADA Naoki" wrote:
> I'm sorry, but I hadn't realized which compact ordered dict is
> not ordered for split table.
>
> For example:
> >>> class A:
> ...     ...
> ...
> >>> a = A()
> >>> b = A()
> >>> a.a = 1
> >>> a.b = 2
> >>> b.b = 3
> >>> b.a = 4
> >>> a.__dict__.items()
> dict_items([('a', 1), ('b', 2)])
> >>> b.__dict__.items()
> dict_items([('a', 4), ('b', 3)])
>
> This doesn't affects to **kwargs and class namespace.
>
> But if we change the language spec to dict preserves insertion order,
> this should be addressed.

Is that really how it works? From my understanding of PEP 412, they should have different keysets, because one diverged in keys from the other at an intermediate step. Another idea (though it has several issues and seems like a step backward): a split-table dict can have a separate iteration list, indexing into the entry table. There are ways to share iteration lists, and make it so that adding the same keys in the same order each time results in the same iteration list each time, but this costs overhead. There might be ways of reducing the overhead, or the overhead might be replacing bigger overhead, but we should decide if the behavior is what we want in the first place.
Re: [Python-Dev] Compact dict implementations (was: PEP 468
In the original discussion, I think they decided to reimplement set before dict. The original discussion is here, for anyone else: https://mail.python.org/pipermail/python-dev/2012-December/123028.html On Jun 18, 2016 3:15 AM, "INADA Naoki" wrote: > If builtin dict in both of PyPy and CPython is ordered, many people > will relying it. > It will force other Python implementations to implement it for compatibility. > In other words, it may be de-facto "Python Language", even if Python > Language spec > say it's an implementation detail. > > Is it OK? Ordered, or just initially ordered? I mean, "ordered if no deletion". They discussed scrambling the order. (Subdiscussion was here: https://mail.python.org/pipermail/python-dev/2012-December/123041.html)
Re: [Python-Dev] PEP 468
Compact OrderedDicts can leave gaps, and once in a while compactify. For example, whenever the entry table is full, it can decide whether to resize (and only copy non-gaps), or just compactify. Compact regular dicts can swap from the back and have no gaps. I don't see the point of discussing these details. Isn't it enough to say that these are solvable problems, which we can worry about if/when someone actually decides to sit down and implement compact dicts? P.S.: Sorry about the repeated emails. I think it was the iOS Gmail app. On Jun 13, 2016 10:23 PM, "Ethan Furman" wrote: > > On 06/13/2016 05:47 PM, Larry Hastings wrote: >> >> On 06/13/2016 05:05 PM, MRAB wrote: > > >>> This could be avoided by expanding the items to include the index of >>> the 'previous' and 'next' item, so that they could be handled like a >>> doubly-linked list. >>> >>> The disadvantage would be that it would use more memory. >> >> >> Another, easier technique: don't fill holes. Same disadvantage >> (increased memory use), but easier to write and maintain. > > > I hope this is just an academic discussion: suddenly having Python's dicts grow continuously is going to have nasty consequences somewhere. > > -- > ~Ethan~
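A minimal sketch of the "swap from the back" idea for an unordered compact dict (a hypothetical illustration, not CPython's actual layout; a plain dict stands in for the real open-addressed index table):

```python
class CompactDict:
    """Toy compact dict: a sparse index mapping keys to slots in a
    dense entries array. Deletion swaps the last entry into the hole,
    so the entries array never has gaps -- at the cost of perturbing
    iteration order."""

    def __init__(self):
        self.index = {}      # stand-in for the real hash-to-slot table
        self.entries = []    # dense [key, value] pairs

    def __setitem__(self, key, value):
        i = self.index.get(key)
        if i is None:
            self.index[key] = len(self.entries)
            self.entries.append([key, value])
        else:
            self.entries[i][1] = value

    def __getitem__(self, key):
        return self.entries[self.index[key]][1]

    def __delitem__(self, key):
        i = self.index.pop(key)
        last = self.entries.pop()
        if i < len(self.entries):
            # Move the former last entry into the hole and fix its slot.
            self.entries[i] = last
            self.index[last[0]] = i

    def keys(self):
        return [k for k, _ in self.entries]

d = CompactDict()
for k in 'abcd':
    d[k] = k.upper()
del d['b']                       # 'd' is swapped into b's old slot
assert d.keys() == ['a', 'd', 'c']
assert d['d'] == 'D'
```

The gap-leaving OrderedDict variant would instead tombstone slot `i` and compactify later, which is exactly the trade-off discussed above.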
Re: [Python-Dev] PEP 468
I am. I was just wondering if there was an in-progress effort I should be looking at, because I am interested in extensions to it. P.S.: If anyone is missing the relevance, Raymond Hettinger's compact dicts are inherently ordered until a delitem happens.[1] That could be "good enough" for many purposes, including kwargs and class definition. If CPython implements efficient compact dicts, it would be easier to propose order-preserving (or initially-order-preserving) dicts in some places in the standard. [1] Whether delitem preserves order depends on whether you want to allow gaps in your compact entry table. PyPy implemented compact dicts and chose(?) to make dicts ordered. On Saturday, June 11, 2016, Eric Snow wrote: > On Fri, Jun 10, 2016 at 11:54 AM, Franklin? Lee > > wrote: > > Eric, have you any work in progress on compact dicts? > > Nope. I presume you are talking the proposal Raymond made a while back. > > -eric
Re: [Python-Dev] PEP 468
Eric, have you any work in progress on compact dicts? On Fri, Jun 10, 2016 at 12:54 PM, Eric Snow wrote: > On Thu, Jun 9, 2016 at 1:10 PM, Émanuel Barry wrote: >> As stated by Guido (and pointed out in the PEP): >> >> Making **kwds ordered is still open, but requires careful design and >> implementation to avoid slowing down function calls that don't benefit. >> >> The PEP has not been updated in a while, though. Python 3.5 has been >> released, and with it a C implementation of OrderedDict. >> >> Eric, are you still interested in this? > > Yes, but wasn't planning on dusting it off yet (i.e. in time for 3.6). > I'm certainly not opposed to someone picking up the banner. > > >> IIRC that PEP was one of the >> motivating use cases for implementing OrderedDict in C. > > Correct, though I'm not sure OrderedDict needs to be involved any more. > >> Maybe it's time for >> a second round of discussion on Python-ideas? > > Fine with me, though I won't have a lot of time in the 3.6 timeframe > to handle a high-volume discussion or push through an implementation. > > -eric > ___ > Python-Dev mailing list > Python-Dev@python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/leewangzhong%2Bpython%40gmail.com ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
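For context: PEP 468 was ultimately accepted for Python 3.6, so on a current CPython the behavior under discussion can be demonstrated directly. A small illustration (the `tag` helper is hypothetical, not from the thread) that relies on `**kwargs` preserving call-site order:

```python
def tag(name, **attrs):
    # With PEP 468 semantics, attrs preserves the order in which the
    # keyword arguments were passed at the call site.
    rendered = ' '.join(f'{k}="{v}"' for k, v in attrs.items())
    return f'<{name} {rendered}>'

# On Python 3.6+ the attributes render in call order:
print(tag('img', src='x.png', width=10, alt='photo'))
```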
Re: [Python-Dev] PEP 467: Minor API improvements to bytes, bytearray, and memoryview
On Jun 8, 2016 8:13 AM, "Paul Sokolovsky" wrote: > > Hello, > > On Wed, 8 Jun 2016 14:45:22 +0300 > Serhiy Storchaka wrote: > > [] > > > > $ ./run-bench-tests bench/bytealloc* > > > bench/bytealloc: > > > 3.333s (+00.00%) bench/bytealloc-1-bytes_n.py > > > 11.244s (+237.35%) bench/bytealloc-2-repeat.py > > > > If the performance of creating an immutable array of n zero bytes is > > important in MicroPython, it is worth to optimize b"\0" * n. > > No matter how you optimize calloc + something, it's always slower than > just calloc. `bytes(n)` *is* calloc + something. It's a lookup of and call to a global function. (Unless MicroPython optimizes away lookups for builtins, in which case it can theoretically optimize b"\0".__mul__.) On the other hand, b"\0" is a constant, and * is an operator lookup that succeeds on the first argument (meaning, perhaps, a successful branch prediction). As a constant, it is only created once, so there's no intermediate object created. AFAICT, the first requires optimizing global function lookups + calls, and the second requires optimizing lookup and *successful* application of __mul__ (versus failure + fallback to some __rmul__), and repetitions of a particular `bytes` object (which can be interned and checked against). That means there is room for either to win, depending on the efforts of the implementers. (However, `bytearray` has no syntax for literals (and therefore easy constants), and is a more valid and, AFAIK, more practical concern.) ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
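The two spellings under discussion can be compared directly with `timeit`. Absolute numbers will vary by implementation (the thread is about MicroPython, but the sketch runs on CPython too), so the only hard claim made here is that both spellings produce the same value:

```python
import timeit

n = 10 ** 6
assert bytes(n) == b"\0" * n  # same result either way

# Compare the constructor call against the constant-repeat spelling.
t_ctor = timeit.timeit('bytes(n)', globals={'n': n}, number=100)
t_mul = timeit.timeit(r'b"\0" * n', globals={'n': n}, number=100)
print(f'bytes(n)  : {t_ctor:.4f}s')
print(f'b"\\0" * n: {t_mul:.4f}s')
```

Which spelling wins depends on exactly the factors described above (global-lookup cost vs. operator dispatch), so the measurement is the point, not a predicted winner.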
Re: [Python-Dev] PEP 520: Ordered Class Definition Namespace
On Jun 7, 2016 8:52 PM, "Eric Snow" wrote: > * the default class *definition* namespace is now ``OrderedDict`` > * the order in which class attributes are defined is preserved in the

By using an OrderedDict, names are ordered by their first definition point, rather than by the location of the binding that actually takes effect. For example, the definition order of the following will be "x, y", even though the definitions actually bound to the names occur in the order "y, x".

class C:
    x = 0
    def y(self): return 'y'
    def x(self): return 'x'

Is that okay? ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
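The first-binding behavior asked about above can be reproduced today with a metaclass whose `__prepare__` returns an `OrderedDict`. The `OrderedMeta` name and the `_definition_order` attribute are illustrative inventions, not part of PEP 520:

```python
from collections import OrderedDict

class OrderedMeta(type):
    @classmethod
    def __prepare__(mcls, name, bases):
        # The class body executes against this mapping, so assignment
        # order is recorded; a reassigned name keeps its first position.
        return OrderedDict()

    def __new__(mcls, name, bases, ns):
        cls = super().__new__(mcls, name, bases, dict(ns))
        cls._definition_order = [k for k in ns if not k.startswith('__')]
        return cls

class C(metaclass=OrderedMeta):
    x = 0
    def y(self): return 'y'
    def x(self): return 'x'

# 'x' keeps the position of its *first* binding, even though the
# binding that wins (the method) comes after 'y'.
print(C._definition_order)  # ['x', 'y']
```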
Re: [Python-Dev] devinabox has moved to GitHub
It's just that I don't know whether any of them require particular versions. If you say the latest is fine, then okay. On Wed, May 25, 2016 at 1:37 PM, Brett Cannon wrote: > > > On Wed, 25 May 2016 at 10:24 Franklin? Lee > wrote: >> >> Should these notes come with version requirements/minimums? >> >> "OS X users should be told to download XCode from the Apple App Store >> ahead of time." >> "If new contributors think they may be doing C development, suggest >> the use of LLVM + clang as this provides better error reporting than >> gcc." >> "For Windows users, ask them to download and install Visual Studio >> Community edition ahead of time." > > > If you want to submit a PR to say "the latest" that's fine, but I don't know > if any of those tools really promote downloading older versions such that > one has to worry about it. > >> >> >> On Sun, May 8, 2016 at 4:41 PM, Brett Cannon wrote: >> > https://github.com/python/devinabox >> > >> > The single issue for devinabox has moved to its own issue tracker, so >> > there's no need to worry about those issues cluttering b.p.o in the >> > future. >> > I have made the Python core team I created on GitHub last week have >> > write >> > privileges and Nick and I as admins on the repository. I have also >> > turned on >> > the CLA bot for the repository (FYI >> > https://github.com/python/the-knights-who-say-ni has that bot's code). >> > >> > I've asked Georg, Antoine, and Benjamin to tell me what I need to do to >> > shut >> > off -- either by making it read-only or just deleting -- >> > hg.python.org/devinabox. >> > >> > Now that the migration has seriously begun, the next repos will be the >> > peps >> > and devguide (slightly more complicated thanks to needing to update >> > commands >> > for building their online versions). There's also the benchmarks repo, >> > but >> > that might not get migrated if we start from scratch (see the speed@ ML >> > about that). 
>> > >> > As for cpython, I've been asked to talk about it at the language summit >> > where I will start a conversation about what the minimal feature set >> > will >> > need to be to migrate. Once we have settled on what has to be in place >> > to >> > migrate the cpython repo then we can start working on that TODO list. >> > >> > ___ >> > Python-Dev mailing list >> > Python-Dev@python.org >> > https://mail.python.org/mailman/listinfo/python-dev >> > Unsubscribe: >> > >> > https://mail.python.org/mailman/options/python-dev/leewangzhong%2Bpython%40gmail.com >> > ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] devinabox has moved to GitHub
Should these notes come with version requirements/minimums? "OS X users should be told to download XCode from the Apple App Store ahead of time." "If new contributors think they may be doing C development, suggest the use of LLVM + clang as this provides better error reporting than gcc." "For Windows users, ask them to download and install Visual Studio Community edition ahead of time." On Sun, May 8, 2016 at 4:41 PM, Brett Cannon wrote: > https://github.com/python/devinabox > > The single issue for devinabox has moved to its own issue tracker, so > there's no need to worry about those issues cluttering b.p.o in the future. > I have made the Python core team I created on GitHub last week have write > privileges and Nick and I as admins on the repository. I have also turned on > the CLA bot for the repository (FYI > https://github.com/python/the-knights-who-say-ni has that bot's code). > > I've asked Georg, Antoine, and Benjamin to tell me what I need to do to shut > off -- either by making it read-only or just deleting -- > hg.python.org/devinabox. > > Now that the migration has seriously begun, the next repos will be the peps > and devguide (slightly more complicated thanks to needing to update commands > for building their online versions). There's also the benchmarks repo, but > that might not get migrated if we start from scratch (see the speed@ ML > about that). > > As for cpython, I've been asked to talk about it at the language summit > where I will start a conversation about what the minimal feature set will > need to be to migrate. Once we have settled on what has to be in place to > migrate the cpython repo then we can start working on that TODO list. 
> > ___ > Python-Dev mailing list > Python-Dev@python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/leewangzhong%2Bpython%40gmail.com > ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Terminal console
On Apr 26, 2016 4:02 AM, "Paul Moore" wrote: > > On 25 April 2016 at 23:55, Franklin? Lee wrote: > > FWIW, Gmail's policies require: > [...] > > That link is currently the only obvious way to unsubscribe. > > I'm not sure why gmail's policies should apply to this list. They're Gmail's policies on how not to get your messages filtered by Gmail as spam. I am not clear on whether they're descriptive (i.e. users will mark you as spam) or prescriptive (i.e. Google's algorithms will determine that you're spam). ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Terminal console
FWIW, Gmail's policies require: """ A user must be able to unsubscribe from your mailing list through one of the following means: * A prominent link in the body of an email leading users to a page confirming his or her unsubscription (no input from the user, other than confirmation, should be required). * By replying to your email with an unsubscribe request. """ (https://support.google.com/mail/answer/81126) That link is currently the only obvious way to unsubscribe. On Mon, Apr 25, 2016 at 6:12 PM, Zachary Ware wrote: > On Apr 25, 2016 17:08, "Brett Cannon" wrote: >> >> Good point. Hopefully that's all it was then. > > Is there any particular reason we include that link in python-dev emails? We > don't for any other list as far as I know. > > -- > Zach > (On a phone) > > > ___ > Python-Dev mailing list > Python-Dev@python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/leewangzhong%2Bpython%40gmail.com > ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Python should be easily compilable on Windows with MinGW
For this particular case, is there someone generous enough (or, can someone apply for a PSF grant) to ship Mathieu a DVD/two/flash drive? On Feb 26, 2016 12:18 PM, "Mathieu Dupuy" wrote: > Hi. > I am currently working on adding some functionality to a standard > library module (http://bugs.python.org/issue15873). The Python part > went fine, but now I have to do the C counterpart, and I have run into > several problems which, stacked up, are a huge obstacle to contributing > further. Currently, although I could keep working, I can't go further > on my patch. > > I am currently working with very limited network, CPU and time > resources*, which are quite uncommon in the western world, but much > less so in the rest of the world. I have a 2GB/month mobile data > plan and a 100KB/s speed. For the C part of my patch, I would need to > download Visual Studio. The Express Edition 2015 is roughly 9GB. I > can't afford that. > > I downloaded VirtualBox and two Linux netinstalls (Ubuntu 15.10 and > Fedora 23). In short, I couldn't get something working quickly and > simply (quickly = less than 2 hours, downloading time NOT included, > which is already far too much). What went wrong and why it went > wrong could be a whole new thread and is outside of the scope of this > message. > Let me be clear: at my work I use many VirtualBox instances > automatically fired and run in parallel to test new deployments and > run unit tests. I like this tool, > but despite its simple look, it (most of the time) cannot be used > simply by a layperson. The concepts it requires you to understand are > not intuitive at first sight and there is *always* something that goes > wrong (guest additions, mostly). (For example: Ubuntu and VirtualBox > shipped for a while a broken version of mount.vboxsf, preventing > shared folders from mounting. Although it's fixed, the broken releases > spread everywhere and you may encounter them a lot in various Ubuntu > and VirtualBox versions.
I downloaded the latest versions of both and I > am still affected. https://www.virtualbox.org/ticket/12879). I could start a > whole new thread on why you can't ask newcomers to use VirtualBox > (currently, at least). > > What I ran into is a whole patch set to make CPython compile on MinGW > (https://bugs.python.org/issue3871#msg199695). But there is no denying > it's very experimental, and I know I would again spend useless hours > trying to get it to work rather than joyfully improving Python, and > that's exactly what I do not want to happen. > > Getting ready to contribute to CPython's pure Python modules from a > standard, average, mr-everyone Windows PC, for a beginner-to-medium > contributor, only requires a few megabytes of internet and a few minutes of their > time: a tarball of the CPython sources (or a clone of the GitHub CPython > mirror)**, a basic text editor and msys-git. The step further, if doing > some - even basic - C code is required, implies downloading 9GB of Visual > Studio and countless hours for it to be ready to use. > I think downloading the whole Visual Studio suite is a huge stopper to > contributing further for an average medium-or-below contributor. > > I think (and I must not be the only one, since CPython is to be moved > to GitHub) that barriers to contributing to CPython should be set as > low as possible. > Of course my situation is a bit special, but I think it represents the > daily struggle of a *lot* of non-western programmers (at least regarding > limited internet)(even here in Australia, capped landline internet > connections are very common). > It's not a big deal if the MinGW build is twenty times slower or > if some of the most advanced modules can't be built. But every > programmer should be able to easily make some C hacks and get them to > work. > > Hoping you'll be receptive to my pleas, > Cheers > > > * I am currently picking fruit in regional Australia. I live in a van > and have internet through a smartphone over an EDGE connection.
I can > plug the laptop in at the farm but not in the van. > ** No fresh programmer uses Mercurial unless he has a gun pointed at his > head. > ___ > Python-Dev mailing list > Python-Dev@python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/leewangzhong%2Bpython%40gmail.com > ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Regular expression bytecode
I think it would be nice for manipulating (e.g. optimizing, possibly with JIT-like analysis) and comparing regexes. It can also be useful as a teaching tool, e.g. exercises in optimizing and comparing regexes. I think the discussion should be on python-ideas, though. On Feb 14, 2016 2:01 PM, "Jonathan Goble" wrote: > I'm new to Python's mailing lists, so please forgive me if I'm sending > this to the wrong list. :) > > I filed http://bugs.python.org/issue26336 a few days ago, but now I > think this list might be a better place to get discussion going. > Basically, I'd like to see the bytecode of a compiled regex object > exposed as a public (probably read-only) attribute of the object. > > Currently, although compiled in pure Python through modules > sre_compile and sre_parse, the list of opcodes is then passed into C > and copied into an array in a C struct, without being publicly exposed > in any way. The only way for a user to get an internal representation > of the regex is the re.DEBUG flag, which only produces an intermediate > representation rather than the actual bytecode and only goes to > stdout, which makes it useless for someone who wants to examine it > programmatically. > > I'm sure others can think of other potential use cases for this, but > one in particular would be that someone could write a debugger that > can allow a user to step through a regex one opcode at a time to see > exactly where it is failing. It would also perhaps be nice to have a > public constructor for the regex object type, which would enable users > to modify the bytecode and directly create a new regex object from it, > similar to what is currently possible through the types.FunctionType > and types.CodeType constructors. 
> > In addition to exposing the code in a public attribute, a helper > module written in Python similar to the dis module (which is for > Python's own bytecode) would be very helpful, allowing the code to be > easily disassembled and examined at a higher level. > > Is this a good idea, or am I barking up the wrong tree? I think it's a > great idea, but I'm open to being told this is a horrible idea. :) I > welcome any and all comments both here and on the bug tracker. > > Jonathan Goble > ___ > Python-Dev mailing list > Python-Dev@python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/leewangzhong%2Bpython%40gmail.com > ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
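As a stopgap for the stdout-only limitation described above, the `re.DEBUG` dump can at least be captured programmatically. This sketch (the `regex_debug` helper is an invention for illustration) only recovers the intermediate representation, not the actual bytecode the message asks to have exposed:

```python
import contextlib
import io
import re

def regex_debug(pattern):
    """Capture the re.DEBUG dump of a pattern as a string.

    re caches compiled patterns, so purge first to force a recompile;
    a cache hit would print nothing.
    """
    re.purge()
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        re.compile(pattern, re.DEBUG)
    return buf.getvalue()

dump = regex_debug(r'a+b')
print(dump)
```

This is clearly a workaround; a public attribute plus a `dis`-style helper module, as proposed, would make the real opcodes inspectable without tricks.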
Re: [Python-Dev] Idea: Dictionary references
On Fri, Dec 18, 2015 at 2:32 PM, Andrew Barnert via Python-Dev wrote: > (Also, either way, it seems more like a thread for -ideas than -dev...) I said this early on in this thread! Should I try to write up my idea as a single thing, instead of a bunch of responses, and post it in -ideas? Should I call them "parent scope" and "parent refcell"? On Fri, Dec 18, 2015 at 7:56 AM, Steven D'Aprano wrote: > I'm not quite sure about this. In principle, every name lookup looks in > four scopes, LEGB as you describe above: > > - locals > - non-locals, a.k.a. enclosing or lexical scope(s) > - globals (i.e. the module) > - builtins > > although Python can (usually?) optimise away some of those lookups. The > relationship of locals to enclosing scopes, and to globals in turn, > involves actual nesting of indented blocks in Python, but that's not > necessarily the case. As I understand it, L vs. E vs. GB is known at compile time. That is, your exec example doesn't work for me in Python 3, because all names are scoped at compile time:

x = 5
def f():
    exec('x = 111')
    print(x)
f()       # prints 5
print(x)  # prints 5

This means that my idea only really works for GB lookups. > On Thu, Dec 17, 2015 at 09:30:24AM -0800, Andrew Barnert via Python-Dev wrote: >> So, trying to generalize global vs. builtin to a general notion of >> "nested scope" that isn't necessary for builtins and doesn't work for >> anything else seems like overcomplicating things for no benefit. > > Well, putting aside the question of whether this is useful or not, and > putting aside efficiency concerns, let's just imagine a hypothetical > implementation where name lookups used ChainMaps instead of using > separate LOAD_* lookups of special dicts.
Then a function could set up a > ChainMap: > > function.__scopes__ = ChainMap(locals, enclosing, globals, builtins) > > and a name lookup for (say) "x" would always be a simple: > > function.__scopes__["x"] > > Of course this would be harder to optimize, and hence probably slower, > than the current arrangement, This is where the ChainRefCell idea comes in. If a ChainRefCell is empty, it would ask its parent dicts for a value. If it finds a value in parent n, it would replace parent n with a refcell into parent n, and similarly for parents 0, 1, ... n-1. It won't need to do hash lookups in those parents again, while allowing for those parents to acquire names. (This means parent n+1 won't need to create refcells, so we don't make unnecessary refcells in `object` and `__builtin__`.) Unfortunately, classes are more complicated than nested scopes. 1. We skip MRO if we define classes as having their direct supers as parents. (Solution: Define classes as having all supers as parents, and make non-recursive Refcell.resolve() requests.) (Objects have their class as a parent, always.) 2. Classes can replace their bases. (I have some ideas for this, but see #3.) 3. I get the impression that attribute lookups are already pretty optimized. On Fri, Dec 18, 2015 at 2:32 PM, Andrew Barnert via Python-Dev wrote: > I think it kind of _has_ to optimize away, or at least tweak, some of those > things, rather than just acting as if globals and builtins were just two more > enclosing scopes. For example, global to builtins has to go through > globals()['__builtins__'], or act as if it does, or code that relies on, say, > the documented behavior of exec can be broken. It would or could, in my idea of __builtins__ being a parent scope of globals() (though I'm not sure whether it'd be the case for any other kind of nesting). 
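The hypothetical `__scopes__` ChainMap quoted above can be played with directly, since `collections.ChainMap` already implements first-match-wins chain lookup (shadowing included). A minimal sketch with invented namespace dicts standing in for the four LEGB scopes:

```python
from collections import ChainMap

builtins_ns = {'len': len}
globals_ns = {'x': 'global-x'}
enclosing_ns = {}
locals_ns = {}

# LEGB as a single chained lookup: the first mapping holding the key wins.
scopes = ChainMap(locals_ns, enclosing_ns, globals_ns, builtins_ns)

print(scopes['x'])       # falls through L and E, found in globals
print(scopes['len'])     # falls all the way through to builtins
locals_ns['x'] = 'local-x'
print(scopes['x'])       # now shadowed by the local binding
```

As the message notes, this is easy to state but hard to optimize: every lookup walks the chain, which is exactly the cost the refcell idea tries to avoid.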
Each refcell in globals() will hold a reference to __builtins__ (if they didn't successfully look it up yet) or to a refcell in __builtins__ (if there was once a successful lookup). Since globals() knows when globals()['__builtins__'] is modified, it can invalidate all its refcells' parent cells (by making them hold references to the new __builtins__). This will be O(len(table) + (# of refcells)), but swapping out __builtins__ shouldn't be something you keep doing. Even if it is a concern, I have More Ideas to remove the "len(table) +" (but with Raymond Hettinger's compact dicts, it wouldn't be necessary). It would be worse for classes, because it would require potentially many notifications. (But it would also save future lookups. And I have More Ideas.) This idea (of the owner dict "knowing" about its changed parent) also applies to general chained scopes, but flattenings like MRO would mess it up. Again, though, More Ideas. And more importantly, from what I understand of Victor's response, the current implementation would probably be efficient enough, or more efficient. > And you have to be able to modify the global scope after compile time and > have that modification be effective, which means you'd have to allow the same > things on locals and closures if they were to act the same. Not sure what you mean, but since I demand (possibly empty) refcells from
Re: [Python-Dev] Idea: Dictionary references
On Thu, Dec 17, 2015 at 6:41 PM, Franklin? Lee wrote: > Then Descriptors are in the dict, so MIGHT benefit from refcells. The > memory cost might be higher, though. Might be worse than the savings, I mean. ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Idea: Dictionary references
I already know that we can't use recursion, because it bypasses the MRO. I'm also not yet sure whether it makes sense to use refs for classes in the first place. As I understood it, an attribute resolves in this order:

- __getattribute__ up the MRO. (raises AttributeError)
- __dict__ up the MRO. (raises KeyError)
- __getattr__ up the MRO. (raises AttributeError)

My new understanding:

- __getattribute__. (raises AttributeError)
- (default implementation:) __dict__.__getitem__. (raises KeyError)
- __getattr__ up the MRO. (raises AttributeError)

If this is the case, then (the default) __getattribute__ will be making the repeated lookups, and might be the one requesting the refcells (for the ones it wants). Descriptors seem to be implemented as: store a Descriptor object as an attribute; when the Descriptor is accessed, if it is being accessed through its owner, unbox it and use its methods; otherwise, it's a normal attribute. Descriptors are then in the dict, so they MIGHT benefit from refcells. The memory cost might be higher, though. On Thu, Dec 17, 2015 at 5:17 PM, Andrew Barnert wrote: > On Dec 17, 2015, at 13:37, Andrew Barnert via Python-Dev > wrote: >> >> On Thursday, December 17, 2015 11:19 AM, Franklin? Lee >> wrote: >> >> >>> ... >>> as soon as I figure out how descriptors actually work... >> >> >> I think you need to learn what LOAD_ATTR and the machinery around it >> actually does before I can explain why trying to optimize it like >> globals-vs.-builtins doesn't make sense. Maybe someone who's better at >> explaining than me can come up with something clearer than the existing >> documentation, but I can't. > > I take that back. First, it was harsher than I intended. Second, I think I > can explain things. I appreciate it! Tracking function definitions in the source can make one want to do something else. > First, for non-attribute lookups: > > (Non-shared) locals just load and save from an array.
> > Free variables and shared locals load and save by going through an extra > dereference on a cell object in an array. In retrospect, of course they do. It sounds like the idea is what's already used there, except the refs are synced to the locals array instead of a hash table. > Globals do a single dict lookup. A single dict lookup per function definition per name used? That's what I'm proposing. For example, (and I only just remembered that comprehensions and gen expressions create scope) [f(x) for x in range(1)] would look up the name `f` at most twice (once in globals(), once in builtins() if needed), and will always have the latest version of `f`. And if it's in a function, the refcell(s) would be saved by the function. > Builtins do two dict lookups. Two? > Class attributes (including normal methods, @property, etc.) do two or more > dict lookups--first the instance, then the class, then each class on the > class's MRO. Then, if the result has a __get__ method, it's called with the > instance and class to get the actual value. This is how bound methods get > created, property lookup functions get called, etc. The result of the > descriptor call can't get cached (that would mean, for example, that every > time you access the same @property on an instance, you'd get the same value). Yeah, I would only try to save in a dict lookup to get the descriptor, and I'm not sure it's worth it. (Victor's response says that class attributes are already efficient, though.) > So, if the globals->builtins optimization is worth doing, don't tie it to > another optimization that's much more complicated and less useful like this, > or we'll never get your simple and useful idea. Sure. I couldn't figure out where to even save the refcells for attributes, so I only really saw an opportunity for name lookups. Since locals and nonlocals don't require dict lookups, this means globals() and __builtin__. 
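The lookup order being discussed (instance `__dict__`, then each class `__dict__` along the MRO) can be mimicked in pure Python. The `plain_lookup` helper is an invention for illustration; it deliberately ignores descriptors and the data-descriptor precedence that real `object.__getattribute__` checks first:

```python
class Base:
    attr = 'from Base'

class Derived(Base):
    pass

def plain_lookup(obj, name):
    # Simplified view of object.__getattribute__: instance dict first,
    # then each class __dict__ along the MRO. No descriptor handling,
    # so this is only the "dict lookup" part a refcell could shortcut.
    if name in vars(obj):
        return vars(obj)[name]
    for klass in type(obj).__mro__:
        if name in vars(klass):
            return vars(klass)[name]
    raise AttributeError(name)

obj = Derived()
print(plain_lookup(obj, 'attr'))   # found via the MRO: 'from Base'
obj.attr = 'from instance'
print(plain_lookup(obj, 'attr'))   # instance dict now wins
```

Each `name in vars(...)` test here is one of the repeated dict lookups that the refcell proposal would cache per lookup site.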
___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Idea: Dictionary references
On Thu, Dec 17, 2015 at 12:30 PM, Andrew Barnert wrote: > On Dec 17, 2015, at 07:38, Franklin? Lee > wrote: >> >> The nested dictionaries are only for nested scopes (and inner >> functions don't create nested scopes). Nested scopes will already >> require multiple lookups in parents. > > I think I understand what you're getting at here, but it's a really confusing > use of terminology. In Python, and in programming in general, nested scopes > refer to exactly inner functions (and classes) being lexically nested and > doing lookup through outer scopes. The fact that this is optimized at compile > time to FAST vs. CELL vs. GLOBAL/NAME, cells are optimized at > function-creation time, and only global and name have to be resolved at the > last second doesn't mean that there's no scoping, or some other form of > scoping besides lexical. The actual semantics are LEGB, even if L vs. E vs. > GB and E vs. further-out E can be optimized. Oh, I've never actually read the Python scoping rules spelled out. I wasn't sure if there were other cases. The other two cases I thought of as "nesting" were: object to its class, and class to its superclasses. > Also, reading your earlier post, it sounds like you're trying to treat > attribute lookup as a special case of global lookup, only with a chain of > superclasses beyond the class instead of just a single builtins. But they're > totally different. Class lookup doesn't just look in a series of dicts, it > calls __getattribute__ which usually calls __getattr__ which may or may not > look in the __dict__s (which may not even exist) to find a descriptor and > then calls its __get__ method to get the value. 
You'd have to somehow handle > the case where the search only went through object.__getattribute__ and > __getattr__ and found a result by looking in a dict, to make a RefCell to > that dict which is marked in some way that says "I'm not a value, I'm a > descriptor you have to call each time", and then apply some guards that will > detect whether that class or any intervening class dict touched that key, > whether the MRO changed, whether that class or any intervening class added or > changed implementations for __getattribute__ or __getattr__, and probably more things I haven't thought of. What do those guards look like? (Also, you need a different set of rules to cache, and guard for, special method lookup--you could just ignore that, but I think those are the lookups that would benefit most from optimization.) Doesn't __getattr__ only get called if all the MRO __dict__ lookups failed? I forgot about __getattribute__. That might be the point at which refs are optimized. As for descriptors versus RefCells, I'm guessing that can be resolved, as soon as I figure out how descriptors actually work... If descriptors don't modify the __dict__, then RefCells shouldn't get involved. If they do, then there's some unwrapping going on there, and RefCells should fit right in (though whether they'll improve anything is a different question). RefCells are just a shortcut for dict lookups. For guards, I think Victor Stinner's idea could supplement this. Alternatively, in my other email, I said there could be a rule of, "Create intermediate RefCells for anything BEFORE a successful lookup." So if we look in A, B, C, D, and find it in C, then we create and save RefCells in A, B, C, but not D (where D = object). This MIGHT result in a lot of intermediate RefCells, but I'd guess most things aren't looked up just once, and it's saying, "It's possible for B to gain member B.x and catch me on my way to C.x." > So, trying to generalize global vs.
builtin to a general notion of "nested > scope" that isn't necessary for builtins and doesn't work for anything else > seems like overcomplicating things for no benefit. Probably. The globals() and __builtin__ case is simpler than the class case.
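The lookup-order questions in this exchange can be checked empirically with plain Python (no assumptions about internals): a descriptor like @property is re-invoked on every access, so its result can't be cached, and __getattr__ runs only after the normal search fails:

```python
calls = []

class Base:
    @property
    def p(self):                     # descriptor: __get__ runs per access
        calls.append("property")
        return len(calls)

    def __getattr__(self, name):     # fallback: only when lookup fails
        calls.append("getattr:" + name)
        return None

b = Base()
b.p
b.p
assert calls == ["property", "property"]   # re-invoked, so not cacheable
b.missing
assert calls[-1] == "getattr:missing"      # reached only on failed lookup
b.p
assert calls[-1] == "property"             # found attrs never hit __getattr__
```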
Re: [Python-Dev] Idea: Dictionary references
On Thu, Dec 17, 2015 at 11:50 AM, Victor Stinner wrote: > I don't understand how you plan to avoid guards. The purpose of guards > is to respect the Python semantic by falling back to the "slow" > bytecode if something changes. So I don't understand your idea of > completely avoiding guards. Again, that's why I guess that it's > unrelated to FAT Python... Yeah, I guess it is. Maybe we could've moved to python-ideas. As far as I understand, this concept can be put into CPython. (Note to self: Look up how PyPy handles repeated lookups.) >> Guards would also have an issue >> with nested scopes. You have a note on your website about it: >> (https://faster-cpython.readthedocs.org/fat_python.html#call-pure-builtins) >> >> "The optimization is disabled when the builtin function is >> modified or if a variable with the same name is added to the global >> namespace of the function." > > FAT Python doesn't emit a specialized version if it requires a builtin > function, but a local variable with the same name is found. > > The check is done in the current function but also in the parent > namespaces, up to the global namespace. I'm talking about the static > analysis of the code. > > If the specialized version is built, a guard is created on the builtin > namespace and another on the global namespace. > > Sorry, I don't understand the "problem" you are referring to. Can you > maybe show an example of code where FAT Python doesn't work or where > your idea can help? Suppose we have to look in scopes A, B, C in order, where the name is in C but never in B, and there's no nesting relationship between B and C. In that case, I do not create a refcell in B chained to C (because there's no relationship), so I keep doing lookups in B. That's the problem. For that, guards and versioning can prevent unnecessary lookups in B. Though I think a better solution might be: If a name is found in C, create empty refcells in B and A (i.e. in all previous dicts).
(There's a nesting relationship between globals() and __builtin__, so that's fine.) >> RefCells "know" whether they're part of a living dict. (The dict marks >> them as such upon its death.) If they are not, they will decref their >> value upon their death. They do not hold a reference to their owner >> dict. > The dict contains the list of all its "own" RefCell objects, right? It contains a table of pointers. The pointers are to RefCell objects or NULL. The refs table is exactly the same size as the internal hash table. This makes indexing it efficient: to find the pointer to a refcell, find the index of the key in the hash table, then use that SAME index on the refs table. You never need to find a refcell without also finding its hash index, so this is cheap. > In short, RefCell is a view on a single dict entry, to be able to > "watch" a dict entry without the cost of a lookup? And a RefCell can > be created, even if the dict entry doesn't exist right? My "implementation", which had nesting and recursion in mind, had a "create_if_none" parameter, which meant that the requester can ask for it to be created even if the key didn't exist in the table. Pre-creation is useful for functions which refer to globals() names before they're defined. No-creation is useful in... I can only think of nesting as a use (globals() -> __builtin__ shouldn't create empty cells in __builtin__). See `getref` in here: https://mail.python.org/pipermail/python-dev/2015-December/142489.html > Hum, creating many RefCell introduces an overhead somewhere. For > example, deleting a key has to update N RefCell objects linked to this > key, right? So del dict[x] takes O(number of RefCell), right? There are no "outside" updates, except when a dict moves to a different internal table or deletes its internal table. In that case, the dict has to move and update each exposed RefCell. For each dict, for each key, there is at most one RefCell.
As long as the dict is alive, that RefCell will hold a pointer to the pointer to the value (enforced by the dict). When the dict dies, it makes the RefCell point to the object, and tells the RefCell it's free (so it's in charge of cleaning up its value). Dict entries look like this:

    typedef struct {
        /* Cached hash code of me_key. */
        Py_hash_t me_hash;
        PyObject *me_key;

        /* This field is only meaningful for combined tables */
        PyObject *me_value;
    } PyDictKeyEntry;

The internal table (which the ref table will sync with) is an array of PyDictKeyEntrys. (Raymond Hettinger made a design with a smaller table, and the hash lookup would be into an array of indices. This would make synced metadata tables both easier and smaller. See https://mail.python.org/pipermail/python-dev/2012-December/123028.html and latest relevant discussion https://mail.python.org/pipermail/python-ideas/2015-December/037468.html ) The refcell will hold this:

    RefCell(&PyDictKeyEntry.me_value)

A pointer to the field, not to the value itself.
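A pure-Python sketch of this layout (hypothetical names; a two-element list plays the role of the entry so Python can share the value slot the way C shares a pointer to me_value):

```python
class RefCell:
    """Sees updates to the entry's value slot, like C's &entry->me_value."""
    __slots__ = ("_entry",)
    def __init__(self, entry):
        self._entry = entry            # [key, value]; value None == C NULL
    def resolve(self):
        key, value = self._entry
        if value is None:
            raise KeyError(key)        # deleted: the slot was NULLed
        return value

class RefDict:
    """Entry table plus a same-length refs table, indexed identically."""
    def __init__(self):
        self._entries = []             # stand-in for the PyDictKeyEntry array
        self._refs = []                # parallel table: RefCell or None
    def _index(self, key):             # stand-in for the C hash probe
        for i, entry in enumerate(self._entries):
            if entry[0] == key:
                return i
        raise KeyError(key)
    def __setitem__(self, key, value):
        try:
            self._entries[self._index(key)][1] = value
        except KeyError:
            self._entries.append([key, value])
            self._refs.append(None)    # refcell created only on request
    def __delitem__(self, key):
        self._entries[self._index(key)][1] = None  # key stays, value NULLed
    def getref(self, key):
        i = self._index(key)           # ONE probe locates both table slots
        if self._refs[i] is None:
            self._refs[i] = RefCell(self._entries[i])
        return self._refs[i]

d = RefDict()
d["spam"] = 1
ref = d.getref("spam")
d["spam"] = 2
assert ref.resolve() == 2              # sees the update with no re-lookup
assert d.getref("spam") is ref         # at most one RefCell per entry
del d["spam"]
try:
    ref.resolve()                      # the NULLed slot shows through
except KeyError:
    pass
```

The toy still scans linearly where C would hash; the point is only the shape: one index serves both the entry table and the refs table, and the cell watches the value slot rather than a copied value.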
Re: [Python-Dev] Idea: Dictionary references
(Previous thread was here, by the way: https://mail.python.org/pipermail/python-dev/2015-December/142437.html) On Thu, Dec 17, 2015 at 8:48 AM, Steven D'Aprano wrote: > On Thu, Dec 17, 2015 at 12:53:13PM +0100, Victor Stinner quoted: >> 2015-12-17 11:54 GMT+01:00 Franklin? Lee : > >> > Each function keeps an indirect, automagically updated >> > reference to the current value of the names they use, > > Isn't that a description of globals()? If you want to look up a name > "spam", you grab an indirect reference to it: > > globals()["spam"] > > which returns the current value of the name "spam". The *current value*. I'm proposing that we instead do this:

    spamref = globals().getref('spam')

Every time we want to find the current, updated value of 'spam', we just do

    spam = spamref.resolve()

which will skip the hash lookup and go directly to the value. >> > and will never need to look things up again.[*] > How will this work? > Naively, it sounds to me like Franklin is suggesting that on every > global assignment, the interpreter will have to touch every single > function in the module to update that name. A refcell holds a pointer to the location in the dict itself where the value pointer is. When the value is updated in the dict, the refcell will not need to be updated. My original proposal wanted to keep cells in the "real" dict, and update them. Like so:

    # NULL is a unique sentinel object, standing in for C's NULL.
    class RefCell:
        __slots__ = ['value']

    class ScopeDict(dict):
        def __getitem__(self, key):
            value = super().__getitem__(key).value  # may raise
            if value is NULL:
                raise KeyError(key)
            return value

        def __setitem__(self, key, value):
            if super().__contains__(key):
                super().__getitem__(key).value = value
            else:
                cell = RefCell()
                cell.value = value
                super().__setitem__(key, cell)

        def __delitem__(self, key):
            cell = super().__getitem__(key)  # may raise
            if cell.value is NULL:
                raise KeyError(key)
            cell.value = NULL

I realized later that this isn't necessary.
Most dict operations don't need to know about the indirection, so I make the inner dict a normal dict (with a few more holes than normal). But this would show how you can avoid manual updates for references. > And besides, you *still* need to deal with the case that the name isn't > a global at all, but in the built-ins namespace. globals() to __builtin__ is a nesting relationship. At the bottom of the following email, I have a pseudocode implementation which knows how to deal with nested scopes. https://mail.python.org/pipermail/python-dev/2015-December/142489.html
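The contrast between Steven's reading and the proposal fits in a few lines. This toy `Ref` still pays a hash lookup on every resolve (pure Python can't hold a pointer into a dict entry); the proposed C version would cache that pointer, but the *semantics* are:

```python
g = {"spam": 1}

snapshot = g["spam"]       # globals()["spam"]: grabs the current value once

class Ref:                 # refcell-like indirection (toy: re-looks-up)
    def __init__(self, d, key):
        self._d, self._key = d, key
    def resolve(self):
        return self._d[self._key]

spamref = Ref(g, "spam")
g["spam"] = 2              # rebind the name
assert snapshot == 1       # the plain value is stale
assert spamref.resolve() == 2   # the ref always sees the current binding
```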
Re: [Python-Dev] Idea: Dictionary references
On Thu, Dec 17, 2015 at 6:53 AM, Victor Stinner wrote: > 2015-12-17 11:54 GMT+01:00 Franklin? Lee : >> My suggestion should improve *all* function calls which refer to >> outside names. > > Ok, I now think that you should stop hijacking the FAT Python thread. > I start a new thread. IMHO your dictionary reference idea is completly > unrelated to FAT Python. > > FAT Python is about guards and specialized bytecode. Yeah, maybe it's out of the scope of bytecode optimization. But I think this would make guards unnecessary, since once a name is found, there's a quick way to refer to it. >> Each function keeps an indirect, automagically updated >> reference to the current value of the names they use, and will never >> need to look things up again.[*] > > Indirections, nested dictionaries, creation of new "reference" > objects... IMHO you are going to have major implementation issues :-/ > The design looks *very* complex. I'm quite sure that you are going to > make namespace lookups *slower*. The nested dictionaries are only for nested scopes (and inner functions don't create nested scopes). Nested scopes will already require multiple lookups in parents. I think this is strictly an improvement, except perhaps in memory. Guards would also have an issue with nested scopes. You have a note on your website about it: (https://faster-cpython.readthedocs.org/fat_python.html#call-pure-builtins) "The optimization is disabled when the builtin function is modified or if a variable with the same name is added to the global namespace of the function." With a NestedRefCell, it would check globals() (a simple dereference and `pointer != NULL`) and then check __builtin__. If it finds it in __builtin__, it will save a reference to that. It will only do repeated lookups in __builtin__ if each of the previous lookups fail. As far as I know, these repeated lookups are already necessary, and anything that can be used to avoid them (e.g. guards) can be used for repeated failed lookups, too. 
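The NestedRefCell behavior described here can be sketched in plain Python (hypothetical names; dicts stand in for globals() and __builtin__). The globals check stays cheap, the builtins reference is remembered after the first success, and only repeated *failed* lookups stay slow, as the text says:

```python
class NestedRef:
    def __init__(self, globals_d, builtins_d, name):
        self._g, self._b, self._name = globals_d, builtins_d, name
        self._builtin_hit = False      # True once found in builtins

    def resolve(self):
        if self._name in self._g:      # C version: one pointer != NULL test
            return self._g[self._name]
        if not self._builtin_hit:
            if self._name not in self._b:
                raise NameError(self._name)   # failed lookups repeat
            self._builtin_hit = True   # remember: no more failed probes
        return self._b[self._name]

g, b = {}, {"len": len}
r = NestedRef(g, b, "len")
assert r.resolve() is len          # found in builtins, remembered
g["len"] = lambda x: 42            # a global with the same name shadows it
assert r.resolve()(object()) == 42
del g["len"]
assert r.resolve() is len          # falls back to the builtin again
```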
For non-nested scopes, it will look things up once, costing an extra RefCell creation if necessary, and the only other costs are on resizing, deletion of the dict, and working with a larger dict in general. The important parts of the design are pretty much in the code that I posted. We keep an extra hash table for refs, and keep it the same size as the original hash table, so that we pay a single lookup cost to get the index in both. > It reminds me Python before the with statement and PyPy garbage > collector. Many applications relied on the exact behaviour of CPython > garbage collector. For example, they expected that a file is written > on disk when the last reference to the file object is destroyed. In > PyPy, it wasn't (it isn't) true, the write can be delayed. It should not affect the semantic. Things should still happen as they used to, as far as I can figure. Or at least as far as the rules of the interpreter are concerned. (That is, values might live a little longer in PyPy, but can't be forced to live longer than they were formerly allowed to.) > I guess that with all your complex machinery for dict lookups, you > will have similar issues of object lifetime. It's unclear to me when > and how "reference" objects are destroyed, nor when dict values are > destroyed. The only cost to a normal getitem (again, other than from looking it up in a bigger dict) is to make sure the return value isn't NULL. The machinery is involved in function creation and resolution of names: On function creation, get refs to each name used. When the name is used, try to resolve the refs. RefCells are ref-counted PyObjects. That is not an issue. A RefCell will live while it is useful (= it has an outside reference) or while it's not useful but its owner dict hasn't been resized/deleted yet (at which time RefCells without outside references will be deleted). RefCells "know" whether they're part of a living dict. (The dict marks them as such upon its death.)
If they are not, they will decref their value upon their death. They do not hold a reference to their owner dict. If it's part of a living dict, we have a choice: the dict can be responsible for deletion, or the RefCell can be responsible for deletion. It doesn't really matter which design we go with. > What happens if a dict key is removed and a reference object is still > alive? Is the dict value immediately destroyed? Does the reference > object contain a strong or a weak reference to the value? If a dict key is removed, the inner dict will still have the key (which is true in the current implementation), but the value will be decref'd and the value pointer will be NULL'd. The RefCell will not need to change: it still points at the same value slot, and resolving it will simply find the NULL.
Re: [Python-Dev] Third milestone of FAT Python
On Wed, Dec 16, 2015 at 2:01 AM, Victor Stinner wrote: > Le mercredi 16 décembre 2015, Franklin? Lee > a écrit : >> >> I am confident that the time overhead and the savings will beat the >> versioning dict. The versioning dict method has to save a reference to >> the variable value and a reference to the name, and regularly test >> whether the dict has changed. > > The performance of guards matters less than the performance of regular usage > of dict. If we have to make a choice, I prefer "slow" guard but no impact on > regular dict methods. It's very important that enabling FAT mode doesn't > kill performances. Remember that FAT python is a static optimizer and so can > only optimize some patterns, not all Python code. > > In my current implementation, a lookup is only needed when a guard is checked > if the dict was modified. The dict version doesn't change if a mutable > object was modified in place for example. I didn't benchmark, but I expect > that the lookup is avoided in most cases. You should try FAT python and > implement statistics before going too far in your idea. My suggestion should improve *all* function calls which refer to outside names. Each function keeps an indirect, automagically updated reference to the current value of the names they use, and will never need to look things up again.[*] There wouldn't be a need to save global names as default arguments (`def f(x, list=list):`). [*]: Not exactly true. Nested scopes can cause an issue. I'm not sure what happens if you redefine the __builtin__ name after these functions are defined. My suggestion would not know that the __builtin__ was switched out, since it saves a ref into the original __builtin__. I'm not sure about how to deal with nested scopes (for example, class inheritance). I think the "chained RefCells" idea works. Chained RefCell idea: There are three cases where we can have nested scopes:

1. An object's __dict__ nests its class.
2. A class's __dict__ nests its superclasses.
3.
globals() nests __builtin__.
4? Does a package nest its modules?
5? Does `from module import *` nest the module's __dict__?

(Nonlocal variables in nested functions, and probably nested classes, are not a case of nested scope, since the scope of each name is determined during compilation of the function/class.) RefCells of nested scopes will have a pointer to their value (or NULL), and an array/linkedlist of pointers to refcells from their parent dicts (or to their parent dicts if a refcell wasn't successfully acquired yet). When you request a RefCell from a nested scope, it will return its value if it exists. Otherwise, it requests refcells from each parent (parents won't create refcells unless there's a value) until it gets one. When you ask a RefCell to resolve, it will check its own value, then ask each parent for a value (creating intermediate refcells *if* values exist). It will not need to do lookups in parents if it got a refcell before (though the refcell might be null). Problem: If you have class A(B, C), and you resolve a refcell for a name which exists in C but not B, you will look things up through B's dict every single time. It will fail, every single time. We can't skip B, since B is allowed to get such a name later, but I don't want to add refs to names that might never exist. This can be solved either through versioning or by checking whether a dict is read-only (such as for built-in types). In fact, in the code I wrote at the end of this email, RefCell.resolve() might even look things up in a shared ancestor multiple times. However, this would be incorrect anyway, since it doesn't respect Python's MRO resolution. So we can just fix that. RefCell.resolve would need a `search_parents: bool` parameter. >> I've read it again. By subclass, I mean that it implements the same >> interface. But at the C level, I want to have it be a fork(?) of the >> current dict implementation.
>> As for `exec`, I think it might be okay >> for it to be slower at the early stages of this game. > > Be careful, dict methods are hardcoded in the C code. If your type is not a > subtype, there is risk of crashes. Not exactly, and this is important. Many functions are called via pointer. It's like C++'s virtual methods, but more dynamic, since they can be changed per object. See https://github.com/python/cpython/blob/master/Objects/dict-common.h#L17. For example, the lookup method for str-only dicts swaps itself for a general object lookup method if it finds a non-string key. See https://github.com/python/cpython/blob/master/Objects/dictobject.c#L549. I'm now suggesting that there be additional space in the dict object itself to hold more function pointers (get_ref and get_hash_table_index), which would slightly increase the size of every dict.
Re: [Python-Dev] Third milestone of FAT Python
I realized yet another thing, which will reduce overhead: the original array can store values directly, and you maintain the refs by repeatedly updating them when moving refs around. RefCells will point to a pointer to the value cell (which already exists in the table).

- `getitem` will be almost the same as a normal dict: it has to check if the value is valid (which it already checked, but it will be invalid a lot more often).
- `setitem` will be the same as a normal dict (since the RefCells will just point to the _address_ of the value pointer in the main table), except that the dict will be bigger, and compaction/expansion has more overhead. No creation of refcells here.
- `delitem` will just null the value pointer, which shouldn't cost much more, if it doesn't cost less.
- Expansion and compaction will cost more, since we have to copy over the RefCell pointers and also check whether they should be copied.
- Deletion of the dict will cost more, due to the additional logic for deciding what to delete, and the RefCells can no longer point into the entry table. They would have to point at the value (requiring logic, or the replacement of a function pointer) or at a new allocated object (requiring an allocation of sizeof(PyObject*) bytes).

On Tue, Dec 15, 2015 at 5:38 PM, Victor Stinner wrote: > Sorry, I didn't read carefully your email, but I don't think that it's > acceptable to make Python namespaces slower. In FAT mode, we need > versionned dictionaries for module namespace, type namespace, global > namespace, etc. It was actually more "it might be a problem" than "it will be a problem". I don't know if the overhead will be high enough to worry about. It might be dominated by whatever savings there would be by not having to look up names more than once. (Unless builtins get mixed with globals? I think that's solvable, though. It's just that the solutions I can think of have different tradeoffs.)
I am confident that the time overhead and the savings will beat the versioning dict. The versioning dict method has to save a reference to the variable value and a reference to the name, and regularly test whether the dict has changed. This method only has to save a reference to a reference to the value (though it might need the name to allow debugging), doesn't care if it's changed, will be an identity (to NULL?) test if it's deleted (and only if it's not replaced after), and absolutely doesn't care if the dict had other updates (which might increase the version number). >>> Do you have an estimation of the cost of the "extra pointer"? Impact >>> on memory and CPU. dict is really a very important type for the >>> performance of Python. If you make dict slower, I'm sure that Python >>> overall will be slower. >> >> I'm proposing it as a subclass. > > Please read the "Versionned dictionary" section of my email: > https://mail.python.org/pipermail/python-dev/2015-December/142397.html > > I explained why using a subclass doesn't work in practice. I've read it again. By subclass, I mean that it implements the same interface. But at the C level, I want to have it be a fork(?) of the current dict implementation. As for `exec`, I think it might be okay for it to be slower at the early stages of this game. Here's the lookup function for a string-only dict (used both for setting and getting): https://github.com/python/cpython/blob/master/Objects/dictobject.c#L443 I want to break that up into two parts:

- Figure out the index of the {hash, *key, *val} entry in the array.
- Do whatever to it. (In the original: make *value_addr point to the value pointer.)

I want to do this so that I can use that index to point into ANOTHER array, which will be the metadata for the refcells (whatever it ends up being). This will mean that there's no second lookup. This has to be done at the C level, because the dict object doesn't expose the index of the {hash, *key, *val} entries on lookup.
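The "find the index, then do whatever to it" split can be illustrated with a toy open-addressing probe (linear probing here; CPython's real probe sequence uses perturbation, and this sketch ignores resizing and deletion):

```python
def lookup_index(keys, key):
    """Return the slot index holding `key`, or the first empty slot."""
    n = len(keys)
    i = hash(key) % n
    while keys[i] is not None and keys[i] != key:
        i = (i + 1) % n                # linear probe
    return i

size = 8
keys = [None] * size
values = [None] * size
refcells = [None] * size               # metadata table, same indices

i = lookup_index(keys, "spam")         # ONE probe...
keys[i], values[i] = "spam", 1
refcells[i] = object()                 # ...serves every parallel table

j = lookup_index(keys, "spam")
assert j == i                          # same key, same slot
assert values[j] == 1 and refcells[j] is not None
```

Once the probe returns an index, the value table and the refcell table are both addressed with it for free, which is the "no second lookup" claim above.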
If you don't want to make it a subclass, then we can propose a new function `get_ref` (or something) for dict's C API (probably a hard sell), which returns RefCell objects, and an additional pointer in `dict` to the RefCells table (so a total of two pointers). When `get_ref` is first called, it will:

- calloc the RefCell table (which will be the same length as the entry table),
- replace all of the dict's functions with ones that know how to deal with the RefCells,
- replace itself with a function that knows how to return these refs,
- call its replacement.

If the dict never gets RefCells, you only pay a few pointers in size, and a few creation/deletion values. This is possible now that the dictionary itself will store values as normal. There might be more necessary. For example, the replaced functions might need to keep pointers to their originals (so that you can slip additional deep C subclasses in). And it might be nice if the `get_index` function could be internally relied upon by the C-level subclasses, because "keeping a metadata table in sync with the entry table" is a trick that other subclasses could use, too.
Re: [Python-Dev] Third milestone of FAT Python
More thoughts. (Stealing your style of headers.)

Just store a pointer to value
=============================

Instead of having the inner dict store k_v pairs. In C, the values in our hash tables will be:

    struct refcell {
        PyObject *value;  // NULL if deleted
    };

It's not necessary to store the key. I think I only had it so I could mark it None in the Python implementation, to denote a deleted key. But a deleted entry could just have `cell.value is ScopeDict.NULL` (C: cell.value == NULL). The scope dict will own all values which don't have exposed references, and all exposed references (which own the value they reference). (Alternatively, store the value directly in the hash table. If something asks for a reference to it, replace the value with a PyObject that refers to it, and flag that entry in the auxiliary hash table.) This might be what PyCellObject is, which is how I realized that I didn't need the key: https://docs.python.org/3.5/c-api/cell.html

Deleting from scope
===================

When deleting a key, don't remove the key from the inner dict, and just set the reference to NULL. Outside code should never get the raw C `refcell`, only a Python object. This makes it possible to clean up unused references when the dict expands or contracts: for each `refcell`, if it has no Pair object or its Pair object is not referenced by anything else, and if its value is NULL (i.e. deleted), don't store it in the new hash table.

Get pairs before their keys are defined
=======================================

When the interpreter compiles a function, it can request references which _don't exist yet_. The scope dict would simply create the entry in its inner dict and fill it in when needed. That means that each name only needs to be looked up when a function is created.

    scope = Scopedict()
    f = scope.ref('f')
    scope['f'] = float
    f.value('NaN')

This would be a memory issue if many functions are created with typo'd names. But

- You're not making a gigantic amount of functions in the first place.
- You'll eventually remove these unused entries when you resize the inner dict. (See previous section.)

I was concerned about which scope would be responsible for creating the entry, but it turns out that if you use a name in a function, every use of that name has to be for the same scope. So the following causes a NameError:

    def f():
        def g(x):
            return abs(x)
        for i in range(30):
            print(g(i))
            if i == 10:
                def abs(x):
                    return "abs" + str(x)
            if i == 20:
                del abs

and you can tell which scope to ask for the reference during function compilation. Does this simplify closures? (I haven't yet looked at Python's closure implementation.) A refcell (C struct) will be exposed as a RefCell (PyObject), which owns it. This means that RefCell is reference-counted, and if something saved a reference to it, it will persist even after its owning dict is deleted. Thus, when a scope dict is deleted, each refcell without a RefCell object is deleted (and its value is DecRef'd), and the other ones just have their RefCell object decrement a reference. Then closures are free: each inner function has refcounted references to the cells that it uses, and it doesn't need to know whether its parent is alive. (The implementation of closures involves cell objects.)

Overhead
========

If inner functions are being created a lot, that's extra work. But I guess you should expect a lot of overhead if you're doing such a thing.

Readonly refs
=============

It might be desirable to have refs that are allowed to write (e.g. from `global` and `nonlocal`) and refs that aren't. The CellObject would just hold a count for the number of writing refs, separate from the number of refs. This might result in some optimizations for values with no writing refs. For example, it's possible to implement copying of dicts as a shallow copy if it's KNOWN that any modification would result in a call to its set/del functions, which would initiate a deep copy.

On Tue, Dec 15, 2015 at 3:29 PM, Victor Stinner wrote: > 2015-12-15 12:23 GMT+01:00 Franklin?
Lee : >> I was thinking (as an alternative to versioning dicts) about a >> dictionary which would be able to return name/value pairs, which would >> also be internally used by the dictionary. This would be way less >> sensitive to irrelevant changes in the scope dictionary, but cost an >> extra pointer to each key. > > Do you have an estimation of the cost of the "extra pointer"? Impact > on memory and CPU. dict is really a very important type for the > performance of Python. If you make dict slower, I'm sure that Python > overall will be slower.
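The "get refs before their keys are defined" behavior from this email can be made concrete with a toy pure-Python ScopeDict (hypothetical API; None stands in for C's NULL):

```python
class Cell:
    __slots__ = ("value",)
    def __init__(self):
        self.value = None              # None == C NULL (undefined/deleted)

class ScopeDict:
    def __init__(self):
        self._cells = {}
    def ref(self, name):               # may create an entry *before* binding
        return self._cells.setdefault(name, Cell())
    def __setitem__(self, name, value):
        self.ref(name).value = value
    def __getitem__(self, name):
        cell = self._cells.get(name)
        if cell is None or cell.value is None:
            raise KeyError(name)       # NULLed slot reads as missing
        return cell.value
    def __delitem__(self, name):
        self[name]                     # KeyError if not currently bound
        self._cells[name].value = None # key stays; value slot is NULLed

scope = ScopeDict()
f = scope.ref("f")                     # reference acquired before definition
assert f.value is None
scope["f"] = float
assert f.value is float                # the cell sees the later binding
del scope["f"]
assert f.value is None                 # and the deletion
```

Anything holding the Cell keeps it alive after the scope dies, which is the "closures are free" observation: the cell, not the dict, owns the binding.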
[Python-Dev] Third milestone of FAT Python
On Sat, Dec 04, 2015 at 7:49 AM, Victor Stinner wrote: > Versionned dictionary > = > > In the previous milestone of FAT Python, the versionned dictionary was a > new type inherited from the builtin dict type which added a __version__ > read-only (global "version" of dictionary, incremented at each change), > a getversion(key) method (version of a one key) and it added support for > weak references. I was thinking (as an alternative to versioning dicts) about a dictionary which would be able to return name/value pairs, which would also be internally used by the dictionary. This would be way less sensitive to irrelevant changes in the scope dictionary, but cost an extra pointer to each key. Here's how it would work:

    pair = scope.item(name)
    scope[name] = newval
    assert pair.value is newval
    assert pair.key is name
    assert pair is scope.item(name)
    # Alternatively, to only create pair objects when `item` is
    # called, have `==` compare the underlying pair.
    del scope[name]
    assert pair.key is None    # name-dicts can't have `None` keys
    assert pair.value is None  # Alternatively, pair.value is scope.NULL

This dict will allow one to hold references to its entries (with the caller promising not to change them, enforced by exceptions). You won't have to keep looking up keys (unless the name is deleted), and functions are allowed to change. For inlining, you can detect whether the function has been redefined by testing the saved pair.value against the saved function, and go into the slow path if needed (or recompile the inlining). I am not sure whether deleting from the dict and then readding the same key should reuse the pair container. I think the only potential issue for the Python version is memory use. There aren't going to be THAT many names being deleted, right? So I say that deleted things in the scope dict should not be removed from the inner dict.
I predict that this will simplify a lot of other things, especially when deleting and re-adding the same name: if you save a pair and it becomes invalid, you don't have to do another lookup to make sure that it's REALLY gone. If memory is a real concern, deleted pairs can be weakref'd (and saved in a second dict?) until they are reused. That way, pairs which aren't saved by something outside will be removed.

For implementation, a Python implementation of the idea has probably already been done. Here are some details:
- set: Internally store d._d[k] = k,v.
- get: Reject k if d._d[k].key is None. (Names must be strings.)
- del: Set d._d[k].key = None and .val = d.NULL to invalidate this entry.

For the CPython version, CPython's dict already stores its entries as PyDictKeyEntry (hash, *key, *value), but those entries can move around on resizing. Two possible implementations:
1. Fork dict to store {hash, *kv_pair}.
2. Use an inner dict (like in the Python suggestion) where values are kv_pair. Write the indirection code in C, because scope dicts must be fast.

For exposing a pair to Python code, here are two possibilities:
1. Make them Python objects in the first place.
2. Keep a second hash table in lockstep with the first (so that you can do a lookup to find the index in the first, and then use that same index with the second). In this second table, store the pair objects that have been created. (They can be weakref'd, as before. Unless it's impossible to weakref something you're returning?) This will save memory for pairs that aren't ever exposed. If compact dictionaries are implemented, the second hash table would become a second array instead.

To use this kind of scope dict, functions would have to store a list of used names, which is memory overhead. But for what you're doing, some overhead will be necessary anyway.
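The Python-level sketch above (set/get/del on an inner dict of pairs) can be spelled out in a few lines. This is only an illustration of the idea from the email, not an actual CPython patch; the names `PairDict`, `Pair`, and the `NULL` sentinel are made up here (the email calls the sentinel `scope.NULL`):

```python
class Pair:
    """A key/value cell a caller may hold onto; invalidated on delete."""
    __slots__ = ('key', 'value')
    def __init__(self, key, value):
        self.key = key
        self.value = value

NULL = object()  # stands in for the email's scope.NULL sentinel

class PairDict:
    def __init__(self):
        self._d = {}  # inner dict: name -> Pair; deleted pairs are kept

    def __setitem__(self, name, value):
        pair = self._d.get(name)
        if pair is not None and pair.key is not None:
            pair.value = value  # update in place: held pairs observe it
        else:
            # key absent or previously deleted: make a fresh pair
            self._d[name] = Pair(name, value)

    def __getitem__(self, name):
        pair = self._d[name]
        if pair.key is None:  # entry was invalidated by a delete
            raise KeyError(name)
        return pair.value

    def __delitem__(self, name):
        pair = self._d[name]
        pair.key = None    # names must be strings, so None marks "deleted"
        pair.value = NULL
        # note: the invalidated pair stays in self._d on purpose

    def item(self, name):
        """Return the live pair for `name`, as in the email's scope.item()."""
        return self._d[name]

# Usage mirroring the asserts in the email:
scope = PairDict()
scope['f'] = 1
pair = scope.item('f')
scope['f'] = 2
assert pair.value == 2             # held pair sees the rebinding
assert pair is scope.item('f')
del scope['f']
assert pair.key is None            # pair is invalidated, not dropped
assert pair.value is NULL
```

Here re-adding a deleted name creates a fresh pair rather than reusing the old cell, which is exactly the open question raised above; either choice fits the scheme.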
Re: [Python-Dev] Python semantic: Is it ok to replace not x == y with x != y? (no)
On Tue, Dec 15, 2015 at 8:04 AM, Victor Stinner wrote:
> Is it expected that "not x.__eq__(y)" can be different than
> "x.__ne__(y)"? Is it part of the Python semantic?

In Numpy, `x != y` returns an array of bools, while `not x == y` creates an array of bools and then tries to convert it to a bool, which fails, because a non-singleton Numpy array cannot be converted to a bool. But in the context of `if`, both `not x == y` and `x != y` will fail.

From the docs, on implementing comparison:
https://docs.python.org/3/reference/datamodel.html#object.__ne__

"""
By default, __ne__() delegates to __eq__() and inverts the result unless it is NotImplemented. There are no other implied relationships among the comparison operators, for example, the truth of (x<y or x==y) does not imply x<=y.
"""
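The docs quote above can be demonstrated directly: a class may define `__ne__` so that it is not the inverse of `__eq__`, in which case `not x == y` and `x != y` genuinely disagree, while a class defining only `__eq__` gets the default delegating `__ne__`. A small sketch (the class names here are made up for illustration):

```python
class Inconsistent:
    """__ne__ deliberately not the logical inverse of __eq__."""
    def __eq__(self, other):
        return True
    def __ne__(self, other):
        return True  # legal: no implied relationship with __eq__
    __hash__ = None  # defining __eq__ suppresses hashing anyway

class EqOnly:
    """Only __eq__: the default __ne__ delegates and inverts."""
    def __eq__(self, other):
        return True
    __hash__ = None

a, b = Inconsistent(), Inconsistent()
assert (a == b) is True
assert (a != b) is True           # NOT the inverse of ==
assert (not a == b) is False      # so `not x == y` differs from `x != y`

c, d = EqOnly(), EqOnly()
assert (c == d) is True
assert (c != d) is False          # default __ne__ inverted __eq__'s result
```

So the optimizer question in the subject line is answered by construction: rewriting `not x == y` as `x != y` changes behavior for any type like `Inconsistent`, Numpy arrays being the prominent real-world case.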