Re: [Python-Dev] PEP 509: Add a private version to dict (version 3)

2016-04-20 Thread Victor Stinner
Hi,

Guido van Rossum and Jim J. Jewett suggested me to *not require* to
always increase the dict version if a dict method does not modify its
content. I modified the Changes section to only require that the
version is increased when the dictionary content is modified.

I also explained the nice side effect of having an unique identifier
for two empty dictionaries: it avoids a strong reference when checking
if a namespace (dictionary) was replaced. Yury Selivanov's opcode
cache uses this property to avoid a strong refence on builtin and
global namespaces. It's also a written reply to Armin Rigo's
suggestion to use the version 0 for empty dictionaries (new empty dict
or for dict.clear()).

The modified Changes section:

Changes
===

Add a ``ma_version_tag`` field to the ``PyDictObject`` structure with
the C type ``PY_UINT64_T``, 64-bit unsigned integer. Add also a global
dictionary version.

Each time a dictionary is created, the global version is incremented and
the dictionary version is initialized to the global version.

Each time the dictionary content is modified, the global version must be
incremented and copied to the dictionary version. Dictionary methods
which can modify its content:

* ``clear()``
* ``pop(key)``
* ``popitem()``
* ``setdefault(key, value)``
* ``__delitem__(key)``
* ``__setitem__(key, value)``
* ``update(...)``

The choice of increasing or not the version when a dictionary method
does not change its content is left to the Python implementation. A
Python implementation can decide to not increase the version to avoid
dictionary lookups in guards. Examples of cases when dictionary methods
don't modify its content:

* ``clear()`` if the dict is already empty
* ``pop(key)`` if the key does not exist
* ``popitem()`` if the dict is empty
* ``setdefault(key, value)`` if the key already exists
* ``__delitem__(key)`` if the key does not exist
* ``__setitem__(key, value)`` if the new value is identical to the
  current value
* ``update()`` if called without argument or if new values are identical
  to current values

Setting a key to a new value equals to the old value is also considered
as an operation modifying the dictionary content.

Two different empty dictionaries must have a different version to be
able to identify a dictionary just by its version. It allows to verify
in a guard that a namespace was not replaced without storing a strong
reference to the dictionary. Using a borrowed reference does not work:
if the old dictionary is destroyed, it is possible that a new dictionary
is allocated at the same memory address. By the way, dictionaries don't
support weak references.

The version increase must be atomic. In CPython, the Global Interpreter
Lock (GIL) already protects ``dict`` methods to make changes atomic.

Example using an hypothetical ``dict_get_version(dict)`` function::

>>> d = {}
>>> dict_get_version(d)
100
>>> d['key'] = 'value'
>>> dict_get_version(d)
101
>>> d['key'] = 'new value'
>>> dict_get_version(d)
102
>>> del d['key']
>>> dict_get_version(d)
103

The field is called ``ma_version_tag``, rather than ``ma_version``, to
suggest to compare it using ``version_tag == old_version_tag``, rather
than ``version <= old_version`` which becomes wrong after an integer
overflow.

--
Victor
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 509: Add a private version to dict (version 3)

2016-04-19 Thread Victor Stinner
Hi,

> Backwards Compatibility
> ===
>
> Since the ``PyDictObject`` structure is not part of the stable ABI and
> the new dictionary version not exposed at the Python scope, changes are
> backward compatible.

My current implementation inserts the new ma_version_tag field in the
middle of the PyDictObject structure, so it obviously changes the ABI.

Can someone please confirm (double check) that the PyDictObject
structure is explicitly excluded from the stable ABI? I'm talking
about about the "#ifndef Py_LIMITED_API" in Include/dictobject.h.

I understood what is an ABI in the hard way. When I ran the perf.py
benchmark, I got a crash in ctypes on django_v3. The ctypes module
uses a C type which inherits from the dict type. I compiled Python
with and without my patch in the same directory and then I renamed the
./python binary, but the _ctypes.so was shared between the two
binaries.

Victor
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] PEP 509: Add a private version to dict (version 3)

2016-04-19 Thread Victor Stinner
13, Serhiy Storchaka proposed `Guard against changing dict during
iteration (issue #19332) <https://bugs.python.org/issue19332>`_ which
adds a ``ma_count`` field to the ``PyDictObject`` structure (``dict``
type), the field has the C type ``size_t``.  This field is incremented
when the dictionary is modified.


PySizer
---

`PySizer <http://pysizer.8325.org/>`_: a memory profiler for Python,
Google Summer of Code 2005 project by Nick Smallbone.

This project has a patch for CPython 2.4 which adds ``key_time`` and
``value_time`` fields to dictionary entries. It uses a global
process-wide counter for dictionaries, incremented each time that a
dictionary is modified. The times are used to decide when child objects
first appeared in their parent objects.


Discussion
==

Thread on the mailing lists:

* python-dev: `Updated PEP 509
  <https://mail.python.org/pipermail/python-dev/2016-April/144250.html>`_
* python-dev: `RFC: PEP 509: Add a private version to dict
  <https://mail.python.org/pipermail/python-dev/2016-April/144137.html>`_
* python-dev: `PEP 509: Add a private version to dict
  <https://mail.python.org/pipermail/python-dev/2016-January/142685.html>`_
  (january 2016)
* python-ideas: `RFC: PEP: Add dict.__version__
  <https://mail.python.org/pipermail/python-ideas/2016-January/037702.html>`_
  (january 2016)


Copyright
=

This document has been placed in the public domain.
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 509: Add a private version to dict

2016-01-21 Thread Victor Stinner
2016-01-21 2:54 GMT+01:00 Andrew Barnert :
> This idea worries me. I'm not sure why, but I think because of threading. 
> After all, it's pretty rare for two threads to both want to work on the same 
> dict, but very, very common for two threads to both want to work on _any_ 
> dict. So, imagine someone manages to remove the GIL from CPython by using 
> STM: (...)

That's a huge project :-) PyPy works one this, but PyPy is not CPython.

> (...) now most transactions are bumping that global counter, meaning most 
> transactions fail and have to be retried,

Python has atomic types which work well with multiple concurrent threads.

> And that also affects something like PyPy being able to use FAT-Python-style 
> AoT optimizations via cpyext.

I don't think that using a static optimizer with PyPy  makes sense.
Reminder that a dynamic optimize (JIT compiler) beats a static
compiler on performance ;-)

PyPy has a very different design, it's a very bad idea to implement
optimizations on cpyext which is emulated and known to be slow.

> Is there a way to define this loosely enough so that the implementation _can_ 
> be a single global counter, if that turns out to be most efficient, but can 
> also be a counter per dictionary and a globally-unique ID per dictionary?

I don't see how a single global counter for all dictionary can be used
to implement fast guards on namespaces. See the rationale of the PEP
509: I wrote it to implement fast guards on namespaces.

Victor
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 509: Add a private version to dict

2016-01-21 Thread Victor Stinner
2016-01-21 6:08 GMT+01:00 Yury Selivanov :
> Yeah, I think that's what we agreed on:
> https://mail.python.org/pipermail/python-dev/2016-January/142837.html
>
> The only advantage of ma_extra pointer is that it allows to add more stuff
> to dicts in the future.

I don't agree on ma_extra since I don't understand it :-) What is the
advantage compared to a simple integer? If it's a pointer, it requires
to dereference the pointer? You say that it can be NULL, does it mean
that we also have to first test if the pointer is NULL. Does it mean
that we have to allocate a second memory block to store a version tag
object? When do you create a version tag or not?

Note: The PEP 509 proposes to use 64-bit integer for the version on
32-bit systems to avoid integer overflow after a few seconds. I first
proposed to use the size_t type (which has the size of a pointer) but
it doesn't work.

I tried to fix FAT Python to handle your use case: function defined in
a namespace and run in a different namespace (especially for the
builtin dictionary). It looks like I don't need the discussion change
to use a global version, FAT Python guards have a different design
than your cache.

But if a global counter doesn't make the slow more complex and opens
new kinds of optimization, I agree to change my PEP 509.

Victor
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 509: Add a private version to dict

2016-01-21 Thread Yury Selivanov



On 2016-01-21 3:34 AM, Victor Stinner wrote:
[..]

But if a global counter doesn't make the slow more complex and opens
new kinds of optimization, I agree to change my PEP 509.


Please.  That would allow us to use ma_version to
implement caches in CPython itself.


Yury
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 509: Add a private version to dict

2016-01-20 Thread Yury Selivanov



On 2016-01-20 8:54 PM, Andrew Barnert via Python-Dev wrote:

>I think Glenn was assuming we had a single, global version # that all dicts 
shared without having a per-dict version ID. The key thing here is that we have a 
global counter that tracks the number of mutations for all dictionaries but whose 
value we store as a per-dictionary value. That ends up making the version ID 
inherently both a token representing the state of any dict but also the uniqueness 
of the dict since no two dictionaries will ever have the same version ID.

This idea worries me. I'm not sure why, but I think because of threading. After 
all, it's pretty rare for two threads to both want to work on the same dict, 
but very, very common for two threads to both want to work on_any_  dict. So, 
imagine someone manages to remove the GIL from CPython by using STM: now most 
transactions are bumping that global counter, meaning most transactions fail 
and have to be retried, so you end up with 8 cores each running at 1/64th the 
speed of a single core but burning 100% CPU. Obviously a real-life 
implementation wouldn't be_that_  stupid; you'd special-case the 
version-bumping (maybe unconditionally bump it N times before starting the 
transaction, and then as long as you don't bump more than N times during the 
transaction, you can commit without touching it), but there's still going to be 
a lot of contention.


Well, PEP 509 proposes to add ma_version only for CPython.  It's an 
implementation detail of CPython.  Victor's FAT optimizer is also 
tailored specifically for CPython, and making it work on PyPy would 
require a completely different set of hacks.


To remove the GIL or implement an efficient STM one will have to rewrite 
(and potentially break) so much code in CPython, that ma_version won't 
be a concern.


For now, though, ma_version will be protected by GIL, so threading 
shouldn't be a problem.




And that also affects something like PyPy being able to use FAT-Python-style 
AoT optimizations via cpyext. At first glance that sounds like a stupid 
idea--why would you want to run an optimizer through a slow emulator? But the 
optimizer only runs once and transforms the function code, which runs a zillion 
times, so who cares how slow the optimizer is? Of course it may still be true 
that many of the AoT optimizations that FAT makes don't apply very well to 
PyPy, in which case it doesn't matter. But I don't think we can assume that a 
priori.


The idea of FAT is that it will also generate optimized code objects 
with guards.  I doubt it would make any sense to use it under PyPy or 
any jitted Python implementation.  JITs have a far better understanding 
of the code than any static optimizer.




Is there a way to define this loosely enough so that the implementation_can_  
be a single global counter, if that turns out to be most efficient, but can 
also be a counter per dictionary and a globally-unique ID per dictionary?


Defining it "loosely" means that you can't trust it.  I'd just 
explicitly say that:


- ma_version is an implementation detail of CPython and may not be 
implemented on other platforms;

- ma_version can be removed from future CPython releases;
- ma_version can be used by code optimizers tailored specifically for 
CPython and CPython itself.


Yury
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 509: Add a private version to dict

2016-01-20 Thread Andrew Barnert via Python-Dev
On Wednesday, January 20, 2016 4:10 PM, Brett Cannon  wrote:


>I think Glenn was assuming we had a single, global version # that all dicts 
>shared without having a per-dict version ID. The key thing here is that we 
>have a global counter that tracks the number of mutations for all dictionaries 
>but whose value we store as a per-dictionary value. That ends up making the 
>version ID inherently both a token representing the state of any dict but also 
>the uniqueness of the dict since no two dictionaries will ever have the same 
>version ID.

This idea worries me. I'm not sure why, but I think because of threading. After 
all, it's pretty rare for two threads to both want to work on the same dict, 
but very, very common for two threads to both want to work on _any_ dict. So, 
imagine someone manages to remove the GIL from CPython by using STM: now most 
transactions are bumping that global counter, meaning most transactions fail 
and have to be retried, so you end up with 8 cores each running at 1/64th the 
speed of a single core but burning 100% CPU. Obviously a real-life 
implementation wouldn't be _that_ stupid; you'd special-case the 
version-bumping (maybe unconditionally bump it N times before starting the 
transaction, and then as long as you don't bump more than N times during the 
transaction, you can commit without touching it), but there's still going to be 
a lot of contention.

And that also affects something like PyPy being able to use FAT-Python-style 
AoT optimizations via cpyext. At first glance that sounds like a stupid 
idea--why would you want to run an optimizer through a slow emulator? But the 
optimizer only runs once and transforms the function code, which runs a zillion 
times, so who cares how slow the optimizer is? Of course it may still be true 
that many of the AoT optimizations that FAT makes don't apply very well to 
PyPy, in which case it doesn't matter. But I don't think we can assume that a 
priori.

Is there a way to define this loosely enough so that the implementation _can_ 
be a single global counter, if that turns out to be most efficient, but can 
also be a counter per dictionary and a globally-unique ID per dictionary?
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 509: Add a private version to dict

2016-01-20 Thread Yury Selivanov



On 2016-01-20 5:01 PM, Greg Ewing wrote:

Yury Selivanov wrote:

What I propose is to add a pointer "ma_extra" (same 64bits),
which will be set to NULL for most dict instances (instead of
ma_version).  "ma_extra" can then point to a struct that has a
globally unique dict ID (uint64), and a version tag (unit64).


Why not just use a single global counter for allocating
dict version tags, instead of one per dict?




Yeah, I think that's what we agreed on: 
https://mail.python.org/pipermail/python-dev/2016-January/142837.html


The only advantage of ma_extra pointer is that it allows to add more 
stuff to dicts in the future.


Yury
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 509: Add a private version to dict

2016-01-20 Thread Brett Cannon
On Wed, 20 Jan 2016, 17:54 Andrew Barnert  wrote:

> On Wednesday, January 20, 2016 4:10 PM, Brett Cannon 
> wrote:
>
>
> >I think Glenn was assuming we had a single, global version # that all
> dicts shared without having a per-dict version ID. The key thing here is
> that we have a global counter that tracks the number of mutations for all
> dictionaries but whose value we store as a per-dictionary value. That ends
> up making the version ID inherently both a token representing the state of
> any dict but also the uniqueness of the dict since no two dictionaries will
> ever have the same version ID.
>
> This idea worries me. I'm not sure why, but I think because of threading.
> After all, it's pretty rare for two threads to both want to work on the
> same dict, but very, very common for two threads to both want to work on
> _any_ dict. So, imagine someone manages to remove the GIL from CPython by
> using STM: now most transactions are bumping that global counter, meaning
> most transactions fail and have to be retried, so you end up with 8 cores
> each running at 1/64th the speed of a single core but burning 100% CPU.
> Obviously a real-life implementation wouldn't be _that_ stupid; you'd
> special-case the version-bumping (maybe unconditionally bump it N times
> before starting the transaction, and then as long as you don't bump more
> than N times during the transaction, you can commit without touching it),
> but there's still going to be a lot of contention.
>

This is all being regarded as an implementation detail of CPython, so in
this hypothetical STM world we can drop all of this (or lock it).


> And that also affects something like PyPy being able to use
> FAT-Python-style AoT optimizations via cpyext. At first glance that sounds
> like a stupid idea--why would you want to run an optimizer through a slow
> emulator? But the optimizer only runs once and transforms the function
> code, which runs a zillion times, so who cares how slow the optimizer is?
> Of course it may still be true that many of the AoT optimizations that FAT
> makes don't apply very well to PyPy, in which case it doesn't matter. But I
> don't think we can assume that a priori.
>
> Is there a way to define this loosely enough so that the implementation
> _can_ be a single global counter, if that turns out to be most efficient,
> but can also be a counter per dictionary and a globally-unique ID per
> dictionary?
>

There's no need to if this is all under the hood and in no way affects
anyone but the eval loop and those who choose to use it. We can make sure
to preface all of this with underscores so it's obvious they are private
and so use at your own peril.

>
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 509: Add a private version to dict

2016-01-20 Thread Greg Ewing

Andrew Barnert via Python-Dev wrote:

imagine someone manages to remove the GIL from CPython by using
STM: now most transactions are bumping that global counter, meaning most
transactions fail and have to be retried,


If this becomes a problem, the tag could be split into two
parts of m and n bits, with m + n = 64. Use a global counter
for allocating the high half, and increment the low half
locally. When the low half overflows, allocate a new high
half.

A value of n = 16 or so ought to reduce contention for the
global counter to something fairly negligible, I would
think, without much risk of the high half ever wrapping
around.

--
Greg
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 509: Add a private version to dict

2016-01-20 Thread Maciej Fijalkowski
The easiest version is to have global numbering (as opposed to local).

Anyway, I would strongly suggest getting some benchmarks done and
showing performance benefits first, because you don't want PEPs to be
final when you don't exactly know the details.

On Wed, Jan 20, 2016 at 7:02 PM, Yury Selivanov  wrote:
> On 2016-01-18 5:43 PM, Victor Stinner wrote:
>>
>> Is someone opposed to this PEP 509?
>>
>> The main complain was the change on the public Python API, but the PEP
>> doesn't change the Python API anymore.
>>
>> I'm not aware of any remaining issue on this PEP.
>
>
> Victor,
>
> I've been experimenting with the PEP to implement a per-opcode
> cache in ceval loop (I'll share my progress on that in a few
> days).  This allows to significantly speedup LOAD_GLOBAL and
> LOAD_METHOD opcodes, to the point, where they don't require
> any dict lookups at all.  Some macro-benchmarks (such as
> chameleon_v2) demonstrate impressive ~10% performance boost.
>
> I rely on your dict->ma_version to implement cache invalidation.
>
> However, besides guarding against version change, I also want
> to guard against the dict being swapped for another dict, to
> avoid situations like this:
>
>
> def foo():
> print(bar)
>
> exec(foo.__code__, {'bar': 1}, {})
> exec(foo.__code__, {'bar': 2}, {})
>
>
> What I propose is to add a pointer "ma_extra" (same 64bits),
> which will be set to NULL for most dict instances (instead of
> ma_version).  "ma_extra" can then point to a struct that has a
> globally unique dict ID (uint64), and a version tag (unit64).
> A macro like PyDict_GET_ID and PyDict_GET_VERSION could then
> efficiently fetch the version/unique ID of the dict for guards.
>
> "ma_extra" would also make it easier for us to extend dicts
> in the future.
>
> Yury
>
> ___
> Python-Dev mailing list
> Python-Dev@python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> https://mail.python.org/mailman/options/python-dev/fijall%40gmail.com
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 509: Add a private version to dict

2016-01-20 Thread Maciej Fijalkowski
On Wed, Jan 20, 2016 at 7:22 PM, Brett Cannon  wrote:
>
>
> On Wed, 20 Jan 2016 at 10:11 Yury Selivanov  wrote:
>>
>> On 2016-01-18 5:43 PM, Victor Stinner wrote:
>> > Is someone opposed to this PEP 509?
>> >
>> > The main complain was the change on the public Python API, but the PEP
>> > doesn't change the Python API anymore.
>> >
>> > I'm not aware of any remaining issue on this PEP.
>>
>> Victor,
>>
>> I've been experimenting with the PEP to implement a per-opcode
>> cache in ceval loop (I'll share my progress on that in a few
>> days).  This allows to significantly speedup LOAD_GLOBAL and
>> LOAD_METHOD opcodes, to the point, where they don't require
>> any dict lookups at all.  Some macro-benchmarks (such as
>> chameleon_v2) demonstrate impressive ~10% performance boost.
>
>
> Ooh, now my brain is trying to figure out the design of the cache. :)
>
>>
>>
>> I rely on your dict->ma_version to implement cache invalidation.
>>
>> However, besides guarding against version change, I also want
>> to guard against the dict being swapped for another dict, to
>> avoid situations like this:
>>
>>
>>  def foo():
>>  print(bar)
>>
>>  exec(foo.__code__, {'bar': 1}, {})
>>  exec(foo.__code__, {'bar': 2}, {})
>>
>>
>> What I propose is to add a pointer "ma_extra" (same 64bits),
>> which will be set to NULL for most dict instances (instead of
>> ma_version).  "ma_extra" can then point to a struct that has a
>> globally unique dict ID (uint64), and a version tag (unit64).
>> A macro like PyDict_GET_ID and PyDict_GET_VERSION could then
>> efficiently fetch the version/unique ID of the dict for guards.
>>
>> "ma_extra" would also make it easier for us to extend dicts
>> in the future.
>
>
> Why can't you simply use the id of the dict object as the globally unique
> dict ID? It's already globally unique amongst all Python objects which makes
> it inherently unique amongst dicts.
>
> ___
> Python-Dev mailing list
> Python-Dev@python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> https://mail.python.org/mailman/options/python-dev/fijall%40gmail.com
>

Brett, you need two things - the ID of the dict and the version tag.
What we do in pypy is we have a small object (called, surprisingly,
VersionTag()) and we use the ID of that. That way you can change the
version id of an existing dict and have only one field.
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 509: Add a private version to dict

2016-01-20 Thread Yury Selivanov

Brett,

On 2016-01-20 1:22 PM, Brett Cannon wrote:



On Wed, 20 Jan 2016 at 10:11 Yury Selivanov > wrote:


On 2016-01-18 5:43 PM, Victor Stinner wrote:
> Is someone opposed to this PEP 509?
>
> The main complain was the change on the public Python API, but
the PEP
> doesn't change the Python API anymore.
>
> I'm not aware of any remaining issue on this PEP.

Victor,

I've been experimenting with the PEP to implement a per-opcode
cache in ceval loop (I'll share my progress on that in a few
days).  This allows to significantly speedup LOAD_GLOBAL and
LOAD_METHOD opcodes, to the point, where they don't require
any dict lookups at all.  Some macro-benchmarks (such as
chameleon_v2) demonstrate impressive ~10% performance boost.


Ooh, now my brain is trying to figure out the design of the cache. :)


Yeah, it's tricky.  I'll need some time to draft a comprehensible
overview.  And I want to implement a couple more optimizations and
benchmark it better.

BTW, I've some updates (html5lib benchmark for py3, new benchmarks
for calling C methods, and I want to port some PyPy benchmakrs)
to the benchmarks suite.  Should I just commit them, or should I
use bugs.python.org?



I rely on your dict->ma_version to implement cache invalidation.

However, besides guarding against version change, I also want
to guard against the dict being swapped for another dict, to
avoid situations like this:


 def foo():
 print(bar)

 exec(foo.__code__, {'bar': 1}, {})
 exec(foo.__code__, {'bar': 2}, {})


What I propose is to add a pointer "ma_extra" (same 64bits),
which will be set to NULL for most dict instances (instead of
ma_version).  "ma_extra" can then point to a struct that has a
globally unique dict ID (uint64), and a version tag (unit64).
A macro like PyDict_GET_ID and PyDict_GET_VERSION could then
efficiently fetch the version/unique ID of the dict for guards.

"ma_extra" would also make it easier for us to extend dicts
in the future.


Why can't you simply use the id of the dict object as the globally 
unique dict ID? It's already globally unique amongst all Python 
objects which makes it inherently unique amongst dicts.


We have a freelist for dicts -- so if the dict dies, there
could be a new dict in its place, with the same ma_version.

While the probability of such hiccups is low, we still have
to account for it.

Yury

___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 509: Add a private version to dict

2016-01-20 Thread Yury Selivanov

On 2016-01-18 5:43 PM, Victor Stinner wrote:

Is someone opposed to this PEP 509?

The main complain was the change on the public Python API, but the PEP
doesn't change the Python API anymore.

I'm not aware of any remaining issue on this PEP.


Victor,

I've been experimenting with the PEP to implement a per-opcode
cache in ceval loop (I'll share my progress on that in a few
days).  This allows to significantly speedup LOAD_GLOBAL and
LOAD_METHOD opcodes, to the point, where they don't require
any dict lookups at all.  Some macro-benchmarks (such as
chameleon_v2) demonstrate impressive ~10% performance boost.

I rely on your dict->ma_version to implement cache invalidation.

However, besides guarding against version change, I also want
to guard against the dict being swapped for another dict, to
avoid situations like this:


def foo():
print(bar)

exec(foo.__code__, {'bar': 1}, {})
exec(foo.__code__, {'bar': 2}, {})


What I propose is to add a pointer "ma_extra" (same 64bits),
which will be set to NULL for most dict instances (instead of
ma_version).  "ma_extra" can then point to a struct that has a
globally unique dict ID (uint64), and a version tag (unit64).
A macro like PyDict_GET_ID and PyDict_GET_VERSION could then
efficiently fetch the version/unique ID of the dict for guards.

"ma_extra" would also make it easier for us to extend dicts
in the future.

Yury
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 509: Add a private version to dict

2016-01-20 Thread Brett Cannon
On Wed, 20 Jan 2016 at 10:11 Yury Selivanov  wrote:

> On 2016-01-18 5:43 PM, Victor Stinner wrote:
> > Is someone opposed to this PEP 509?
> >
> > The main complain was the change on the public Python API, but the PEP
> > doesn't change the Python API anymore.
> >
> > I'm not aware of any remaining issue on this PEP.
>
> Victor,
>
> I've been experimenting with the PEP to implement a per-opcode
> cache in ceval loop (I'll share my progress on that in a few
> days).  This allows to significantly speedup LOAD_GLOBAL and
> LOAD_METHOD opcodes, to the point, where they don't require
> any dict lookups at all.  Some macro-benchmarks (such as
> chameleon_v2) demonstrate impressive ~10% performance boost.
>

Ooh, now my brain is trying to figure out the design of the cache. :)


>
> I rely on your dict->ma_version to implement cache invalidation.
>
> However, besides guarding against version change, I also want
> to guard against the dict being swapped for another dict, to
> avoid situations like this:
>
>
>  def foo():
>  print(bar)
>
>  exec(foo.__code__, {'bar': 1}, {})
>  exec(foo.__code__, {'bar': 2}, {})
>
>
> What I propose is to add a pointer "ma_extra" (same 64bits),
> which will be set to NULL for most dict instances (instead of
> ma_version).  "ma_extra" can then point to a struct that has a
> globally unique dict ID (uint64), and a version tag (unit64).
> A macro like PyDict_GET_ID and PyDict_GET_VERSION could then
> efficiently fetch the version/unique ID of the dict for guards.
>
> "ma_extra" would also make it easier for us to extend dicts
> in the future.
>

Why can't you simply use the id of the dict object as the globally unique
dict ID? It's already globally unique amongst all Python objects which
makes it inherently unique amongst dicts.
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 509: Add a private version to dict

2016-01-20 Thread Maciej Fijalkowski
On Wed, Jan 20, 2016 at 8:00 PM, Yury Selivanov  wrote:
>
>
> On 2016-01-20 1:36 PM, Maciej Fijalkowski wrote:
>>
>> On Wed, Jan 20, 2016 at 7:22 PM, Brett Cannon  wrote:
>>>
>>>
>>> On Wed, 20 Jan 2016 at 10:11 Yury Selivanov 
>>> wrote:
>
> [..]

 "ma_extra" would also make it easier for us to extend dicts
 in the future.
>>>
>>>
>>> Why can't you simply use the id of the dict object as the globally unique
>>> dict ID? It's already globally unique amongst all Python objects which
>>> makes
>>> it inherently unique amongst dicts.
>>>
>>>
>> Brett, you need two things - the ID of the dict and the version tag.
>> What we do in pypy is we have a small object (called, surprisingly,
>> VersionTag()) and we use the ID of that. That way you can change the
>> version id of an existing dict and have only one field.
>
>
>
> Yeah, that's essentially what I propose with ma_extra.
>
> Yury

The trick is we use only one field :-)

you're proposing to have both fields - version tag and dict id. Why
not just use the id of the object (without any fields)?
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 509: Add a private version to dict

2016-01-20 Thread Glenn Linderman

On 1/20/2016 10:36 AM, Maciej Fijalkowski wrote:

Why can't you simply use the id of the dict object as the globally unique
>dict ID? It's already globally unique amongst all Python objects which makes
>it inherently unique amongst dicts.
>
>___
>Python-Dev mailing list
>Python-Dev@python.org
>https://mail.python.org/mailman/listinfo/python-dev
>Unsubscribe:
>https://mail.python.org/mailman/options/python-dev/fijall%40gmail.com
>

Brett, you need two things - the ID of the dict and the version tag.
What we do in pypy is we have a small object (called, surprisingly,
VersionTag()) and we use the ID of that. That way you can change the
version id of an existing dict and have only one field.
For the reuse case, can't you simply keep the ma_version "live" in dict 
items on the free list, rather than starting over at (presumably) 0 ?  
Then if the dict is reused, it bumps the ma_version, and the fallback 
code goes on with (presumably) relocating the original dict (oops, it's 
gone), and dealing with the fallout.


Then you can use the regular dict id as the globally unique dict id?  
And only need the one uint64, rather than a separately allocated extra 
pair of uint64?
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 509: Add a private version to dict

2016-01-20 Thread Yury Selivanov



On 2016-01-20 1:36 PM, Maciej Fijalkowski wrote:

On Wed, Jan 20, 2016 at 7:22 PM, Brett Cannon  wrote:


On Wed, 20 Jan 2016 at 10:11 Yury Selivanov  wrote:

[..]

"ma_extra" would also make it easier for us to extend dicts
in the future.


Why can't you simply use the id of the dict object as the globally unique
dict ID? It's already globally unique amongst all Python objects which makes
it inherently unique amongst dicts.



Brett, you need two things - the ID of the dict and the version tag.
What we do in pypy is we have a small object (called, surprisingly,
VersionTag()) and we use the ID of that. That way you can change the
version id of an existing dict and have only one field.



Yeah, that's essentially what I propose with ma_extra.

Yury
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 509: Add a private version to dict

2016-01-20 Thread Yury Selivanov



On 2016-01-20 2:09 PM, Maciej Fijalkowski wrote:

>

You don't free a version tag that's stored in the guard. You store the
object and not id


Ah, got it.  Although for my current cache design it would be
more memory efficient to use the dict itself to store its own
unique id and tag, hence my "ma_extra" proposal.  In any case,
the current "ma_version" proposal is flawed :(

Yury
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 509: Add a private version to dict

2016-01-20 Thread Maciej Fijalkowski
there is also the problem that you don't want it on all dicts. So
having two extra words is more to pay than having extra objects (also,
comparison is cheaper for guards)

On Wed, Jan 20, 2016 at 8:23 PM, Yury Selivanov  wrote:
>
>
> On 2016-01-20 2:09 PM, Maciej Fijalkowski wrote:
>>>
>>> >
>>
>> You don't free a version tag that's stored in the guard. You store the
>> object and not id
>
>
> Ah, got it.  Although for my current cache design it would be
> more memory efficient to use the dict itself to store its own
> unique id and tag, hence my "ma_extra" proposal.  In any case,
> the current "ma_version" proposal is flawed :(
>
> Yury
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 509: Add a private version to dict

2016-01-20 Thread Yury Selivanov


On 2016-01-20 2:02 PM, Maciej Fijalkowski wrote:

On Wed, Jan 20, 2016 at 8:00 PM, Yury Selivanov  wrote:


[..]

Brett, you need two things - the ID of the dict and the version tag.
What we do in pypy is we have a small object (called, surprisingly,
VersionTag()) and we use the ID of that. That way you can change the
version id of an existing dict and have only one field.

Yeah, that's essentially what I propose with ma_extra.

Yury

The trick is we use only one field :-)

you're proposing to have both fields - version tag and dict id. Why
not just use the id of the object (without any fields)?


What if your old dict is GCed, its "VersionTag()" (1) object is
freed, and you have a new dict, for which a new "VersionTag()" (2)
object happens to be allocated at the same address as (1)?

Yury

___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 509: Add a private version to dict

2016-01-20 Thread Maciej Fijalkowski
On Wed, Jan 20, 2016 at 8:08 PM, Yury Selivanov  wrote:
>
> On 2016-01-20 2:02 PM, Maciej Fijalkowski wrote:
>>
>> On Wed, Jan 20, 2016 at 8:00 PM, Yury Selivanov 
>> wrote:
>>
> [..]

 Brett, you need two things - the ID of the dict and the version tag.
 What we do in pypy is we have a small object (called, surprisingly,
 VersionTag()) and we use the ID of that. That way you can change the
 version id of an existing dict and have only one field.
>>>
>>> Yeah, that's essentially what I propose with ma_extra.
>>>
>>> Yury
>>
>> The trick is we use only one field :-)
>>
>> you're proposing to have both fields - version tag and dict id. Why
>> not just use the id of the object (without any fields)?
>
>
> What if your old dict is GCed, its "VersionTag()" (1) object is
> freed, and you have a new dict, for which a new "VersionTag()" (2)
> object happens to be allocated at the same address as (1)?
>
> Yury
>

You don't free a version tag that's stored in the guard. You store the
object and not id
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 509: Add a private version to dict

2016-01-20 Thread Mark Shannon



On 11/01/16 16:49, Victor Stinner wrote:

Hi,

After a first round on python-ideas, here is the second version of my
PEP. The main changes since the first version are that the dictionary
version is no more exposed at the Python level and the field type now
also has a size of 64-bit on 32-bit platforms.

The PEP is part of a serie of 3 PEP adding an API to implement a
static Python optimizer specializing functions with guards. The second
PEP is currently discussed on python-ideas and I'm still working on
the third PEP.


If anyone wants to experiment (at the C, not Python, level) with dict 
versioning to optimise load-global/builtins, then you can do so without 
adding a version number.


A "version" can created by splitting the dict with "make_keys_shared" 
and then making the keys-object immutable by setting "dk_usable" to zero.
This means that any change to the keys will force a keys-object change, 
but changes to the values will not.

For many optimisations, this is want you want.

Using this trick:
To read a global, check that the keys is the expected keys and read the 
value straight out of the values array at the known index.


To read a builtins, check that the module keys is the expected keys and 
thus cannot shadow the builtins, then read the builtins as above.


I don't know how much help this will be for a static optimiser, but it 
could work well for a dynamic optimiser. I used this optimisation in 
HotPy for optimising object attribute lookups.



Cheers,
Mark.


___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 509: Add a private version to dict

2016-01-20 Thread Glenn Linderman

On 1/20/2016 12:50 PM, Brett Cannon wrote:


A global (shared between all dicts) unit64 ma_version is actually
quite
reliable -- if a program does 1,000,000 dict modifications per second,
it would take it 600,000 years till wrap-around.



But would invalidate everything, instead of just a fraction of things, 
on every update to anything that is monitored...
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 509: Add a private version to dict

2016-01-20 Thread Brett Cannon
On Wed, 20 Jan 2016 at 10:46 Yury Selivanov  wrote:

> Brett,
>
> On 2016-01-20 1:22 PM, Brett Cannon wrote:
> >
> >
> > On Wed, 20 Jan 2016 at 10:11 Yury Selivanov  > > wrote:
> >
> > On 2016-01-18 5:43 PM, Victor Stinner wrote:
> > > Is someone opposed to this PEP 509?
> > >
> > > The main complain was the change on the public Python API, but
> > the PEP
> > > doesn't change the Python API anymore.
> > >
> > > I'm not aware of any remaining issue on this PEP.
> >
> > Victor,
> >
> > I've been experimenting with the PEP to implement a per-opcode
> > cache in ceval loop (I'll share my progress on that in a few
> > days).  This allows to significantly speedup LOAD_GLOBAL and
> > LOAD_METHOD opcodes, to the point, where they don't require
> > any dict lookups at all.  Some macro-benchmarks (such as
> > chameleon_v2) demonstrate impressive ~10% performance boost.
> >
> >
> > Ooh, now my brain is trying to figure out the design of the cache. :)
>
> Yeah, it's tricky.  I'll need some time to draft a comprehensible
> overview.  And I want to implement a couple more optimizations and
> benchmark it better.
>
> BTW, I've some updates (html5lib benchmark for py3, new benchmarks
> for calling C methods, and I want to port some PyPy benchmakrs)
> to the benchmarks suite.  Should I just commit them, or should I
> use bugs.python.org?
>

I actually emailed speed@ to see if people were interested in finally
sitting down with all the various VM implementations at PyCon and trying to
come up with a reasonable base set of benchmarks that better reflect modern
Python usage, but I never heard back.

Anyway, issues on bugs.python.org are probably best to talk about new
benchmarks before adding them (fixes and updates to pre-existing benchmarks
can just go in).


>
> >
> > I rely on your dict->ma_version to implement cache invalidation.
> >
> > However, besides guarding against version change, I also want
> > to guard against the dict being swapped for another dict, to
> > avoid situations like this:
> >
> >
> >  def foo():
> >  print(bar)
> >
> >  exec(foo.__code__, {'bar': 1}, {})
> >  exec(foo.__code__, {'bar': 2}, {})
> >
> >
> > What I propose is to add a pointer "ma_extra" (same 64bits),
> > which will be set to NULL for most dict instances (instead of
> > ma_version).  "ma_extra" can then point to a struct that has a
> > globally unique dict ID (uint64), and a version tag (unit64).
> > A macro like PyDict_GET_ID and PyDict_GET_VERSION could then
> > efficiently fetch the version/unique ID of the dict for guards.
> >
> > "ma_extra" would also make it easier for us to extend dicts
> > in the future.
> >
> >
> > Why can't you simply use the id of the dict object as the globally
> > unique dict ID? It's already globally unique amongst all Python
> > objects which makes it inherently unique amongst dicts.
>
> We have a freelist for dicts -- so if the dict dies, there
> could be a new dict in its place, with the same ma_version.
>

Ah, I figured it would be too simple to use something we already had.


>
> While the probability of such hiccups is low, we still have
> to account for it.
>

Yep.
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 509: Add a private version to dict

2016-01-20 Thread Yury Selivanov



On 2016-01-20 2:45 PM, Glenn Linderman wrote:
For the reuse case, can't you simply keep the ma_version "live" in 
dict items on the free list, rather than starting over at (presumably) 
0 ?  Then if the dict is reused, it bumps the ma_version, and the 
fallback code goes on with (presumably) relocating the original dict 
(oops, it's gone), and dealing with the fallout.


Not all dicts are created from a freelist, and not all dicts go to the 
freelist when they are GCed.


You still can have this situation:

- dict "A" is used as f_locals for a frame, its ma_version is set to X
- dict "A" is GCed, but the freelist is full, so it's just freed
- after a while, you call the code object, a new dict "B" is allocated 
with malloc (since now the freelist happens to be empty) for f_locals
- it happens that "B" is allocated where "A" was, and its ma_version 
happens to be X by an accident




Then you can use the regular dict id as the globally unique dict id?  
And only need the one uint64, rather than a separately allocated extra 
pair of uint64?


In my design only namespace dicts will have a non-NULL ma_extra, which 
means that only a fraction of dicts will actually have a separated pair 
of uint64s.


I think that we should either use one global ma_version (as Maciej 
suggested) or ma_extra, as it gives more flexibility for us to extend 
dicts in the future.


A global (shared between all dicts) unit64 ma_version is actually quite 
reliable -- if a program does 1,000,000 dict modifications per second, 
it would take it 600,000 years till wrap-around.


Yury
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 509: Add a private version to dict

2016-01-20 Thread Brett Cannon
On Wed, 20 Jan 2016 at 12:27 Yury Selivanov  wrote:

>
>
> On 2016-01-20 2:45 PM, Glenn Linderman wrote:
> > For the reuse case, can't you simply keep the ma_version "live" in
> > dict items on the free list, rather than starting over at (presumably)
> > 0 ?  Then if the dict is reused, it bumps the ma_version, and the
> > fallback code goes on with (presumably) relocating the original dict
> > (oops, it's gone), and dealing with the fallout.
>
> Not all dicts are created from a freelist, and not all dicts go to the
> freelist when they are GCed.
>
> You still can have this situation:
>
> - dict "A" is used as f_locals for a frame, its ma_version is set to X
> - dict "A" is GCed, but the freelist is full, so it's just freed
> - after a while, you call the code object, a new dict "B" is allocated
> with malloc (since now the freelist happens to be empty) for f_locals
> - it happens that "B" is allocated where "A" was, and its ma_version
> happens to be X by an accident
>
> >
> > Then you can use the regular dict id as the globally unique dict id?
> > And only need the one uint64, rather than a separately allocated extra
> > pair of uint64?
>
> In my design only namespace dicts will have a non-NULL ma_extra, which
> means that only a fraction of dicts will actually have a separated pair
> of uint64s.
>
> I think that we should either use one global ma_version (as Maciej
> suggested) or ma_extra, as it gives more flexibility for us to extend
> dicts in the future.
>
> A global (shared between all dicts) unit64 ma_version is actually quite
> reliable -- if a program does 1,000,000 dict modifications per second,
> it would take it 600,000 years till wrap-around.
>

Since you're going to need to hold the GIL for the modifications there
won't be any locking or contention problems, so it sounds like the global
value is the best since that's simple, uses the least amount of memory, and
will be easiest to use as a modification check since that will be a simple
uint64 comparison instead of comparing a GUID + version.
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 509: Add a private version to dict

2016-01-20 Thread Greg Ewing

Yury Selivanov wrote:

What I propose is to add a pointer "ma_extra" (same 64bits),
which will be set to NULL for most dict instances (instead of
ma_version).  "ma_extra" can then point to a struct that has a
globally unique dict ID (uint64), and a version tag (unit64).


Why not just use a single global counter for allocating
dict version tags, instead of one per dict?

--
Greg
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 509: Add a private version to dict

2016-01-20 Thread Yury Selivanov

On 2016-01-20 1:15 PM, Maciej Fijalkowski wrote:

[..]

Anyway, I would strongly suggest getting some benchmarks done and
showing performance benefits first, because you don't want PEPs to be
final when you don't exactly know the details.


I agree 100%.

Yury

___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 509: Add a private version to dict

2016-01-20 Thread Victor Stinner
Hi,

2016-01-20 22:18 GMT+01:00 Glenn Linderman :
> On 1/20/2016 12:50 PM, Brett Cannon wrote:
>>
>> A global (shared between all dicts) unit64 ma_version is actually quite
>> reliable -- if a program does 1,000,000 dict modifications per second,
>> it would take it 600,000 years till wrap-around.

I think that Yury found a bug in FAT Python. I didn't test the case
when the builtins dictionary is replaced after the definition of the
function. To be more concrete: when a function is executed in a
different namespace using exec(code, namespace). That's why I like the
PEP process, it helps to find all issues before going too far :-)

I like the idea of global counter for dictionary versions. It means
that the dictionary constructor increases this counter instead of
always starting to 0.

FYI a fat.GuardDict keeps a strong reference to the dictionary. For
some guards, I hesitated to store the object identifier and/or using a
weak reference. An object identifier is not reliable because the
object can be destroyed and a new object, completly different, or of
the same type, can get the same identifier.

> But would invalidate everything, instead of just a fraction of things, on
> every update to anything that is monitored...

I don't understand this point.

In short, the guard only has to compare two 64 bit integers in the
fast-path, when nothing changed. For a namespace, it means that no
value was replaced in this namespace.

If a different namespace is modified, the version of the watched
namespace does not change, so we are still in the fast-path.

If a value is replaced in the watched namespace, but not the watched
variable, we have to take a slow-path, hopefully only once.

The worst case is when a value different than the watched value is
modified between each guard check. In this case, we always need a dict
lookup. An heuristic can be chosen to decide to give up after N tries.
Currently, fat.GuardDict always retries.

Victor
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 509: Add a private version to dict

2016-01-20 Thread Brett Cannon
On Wed, 20 Jan 2016 at 15:46 Victor Stinner 
wrote:

> Hi,
>
> 2016-01-20 22:18 GMT+01:00 Glenn Linderman :
> > On 1/20/2016 12:50 PM, Brett Cannon wrote:
> >>
> >> A global (shared between all dicts) unit64 ma_version is actually quite
> >> reliable -- if a program does 1,000,000 dict modifications per second,
> >> it would take it 600,000 years till wrap-around.
>
> I think that Yury found a bug in FAT Python. I didn't test the case
> when the builtins dictionary is replaced after the definition of the
> function. To be more concrete: when a function is executed in a
> different namespace using exec(code, namespace). That's why I like the
> PEP process, it helps to find all issues before going too far :-)
>
> I like the idea of global counter for dictionary versions. It means
> that the dictionary constructor increases this counter instead of
> always starting to 0.
>
> FYI a fat.GuardDict keeps a strong reference to the dictionary. For
> some guards, I hesitated to store the object identifier and/or using a
> weak reference. An object identifier is not reliable because the
> object can be destroyed and a new object, completly different, or of
> the same type, can get the same identifier.
>
> > But would invalidate everything, instead of just a fraction of things, on
> > every update to anything that is monitored...
>
> I don't understand this point.
>

I think Glenn was assuming we had a single, global version # that all dicts
shared *without* having a per-dict version ID. The key thing here is that
we have a global counter that tracks the number of mutations for *all*
dictionaries
but whose value we store as a *per-dictionary* value. That ends up making
the version ID inherently both a token representing the state of any dict
but also the uniqueness of the dict since no two dictionaries will ever
have the same version ID.


>
> In short, the guard only has to compare two 64 bit integers in the
> fast-path, when nothing changed. For a namespace, it means that no
> value was replaced in this namespace.
>
> If a different namespace is modified, the version of the watched
> namespace does not change, so we are still in the fast-path.
>
> If a value is replaced in the watched namespace, but not the watched
> variable, we have to take a slow-path, hopefully only once.
>
> The worst case is when a value different than the watched value is
> modified between each guard check. In this case, we always need a dict
> lookup. An heuristic can be chosen to decide to give up after N tries.
> Currently, fat.GuardDict always retries.
>

Does "retries" mean "check if the value really changed, and if it hasn't
then just update the version ID the guard checks"?
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 509: Add a private version to dict

2016-01-20 Thread Glenn Linderman

On 1/20/2016 4:08 PM, Brett Cannon wrote:



On Wed, 20 Jan 2016 at 15:46 Victor Stinner > wrote:


Hi,

2016-01-20 22:18 GMT+01:00 Glenn Linderman >:
> On 1/20/2016 12:50 PM, Brett Cannon wrote:
>>
>> A global (shared between all dicts) unit64 ma_version is
actually quite
>> reliable -- if a program does 1,000,000 dict modifications per
second,
>> it would take it 600,000 years till wrap-around.

I think that Yury found a bug in FAT Python. I didn't test the case
when the builtins dictionary is replaced after the definition of the
function. To be more concrete: when a function is executed in a
different namespace using exec(code, namespace). That's why I like the
PEP process, it helps to find all issues before going too far :-)

I like the idea of global counter for dictionary versions. It means
that the dictionary constructor increases this counter instead of
always starting to 0.

FYI a fat.GuardDict keeps a strong reference to the dictionary. For
some guards, I hesitated to store the object identifier and/or using a
weak reference. An object identifier is not reliable because the
object can be destroyed and a new object, completly different, or of
the same type, can get the same identifier.

> But would invalidate everything, instead of just a fraction of
things, on
> every update to anything that is monitored...

I don't understand this point.


I think Glenn was assuming we had a single, global version # that all 
dicts shared *without* having a per-dict version ID. The key thing 
here is that we have a global counter that tracks the number of 
mutations for *all* dictionaries but whose value we store as a 
*per-dictionary* value. That ends up making the version ID inherently 
both a token representing the state of any dict but also the 
uniqueness of the dict since no two dictionaries will ever have the 
same version ID.


This would work. You were correct about my assumptions.
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 509: Add a private version to dict

2016-01-20 Thread Victor Stinner
2016-01-21 1:08 GMT+01:00 Brett Cannon :
> On Wed, 20 Jan 2016 at 15:46 Victor Stinner 
>> The worst case is when a value different than the watched value is
>> modified between each guard check. In this case, we always need a dict
>> lookup. An heuristic can be chosen to decide to give up after N tries.
>> Currently, fat.GuardDict always retries.
>
> Does "retries" mean "check if the value really changed, and if it hasn't
> then just update the version ID the guard checks"?

If the dict version changes (because a value different than the
watched value is modified) each time that the guard is checked, the
guard always require a dict lookup to check if the watched value
changed.

Victor
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 509: Add a private version to dict

2016-01-18 Thread Victor Stinner
Is someone opposed to this PEP 509?

The main complain was the change on the public Python API, but the PEP
doesn't change the Python API anymore.

I'm not aware of any remaining issue on this PEP.

Victor

2016-01-11 17:49 GMT+01:00 Victor Stinner :
> Hi,
>
> After a first round on python-ideas, here is the second version of my
> PEP. The main changes since the first version are that the dictionary
> version is no more exposed at the Python level and the field type now
> also has a size of 64-bit on 32-bit platforms.
>
> The PEP is part of a serie of 3 PEP adding an API to implement a
> static Python optimizer specializing functions with guards. The second
> PEP is currently discussed on python-ideas and I'm still working on
> the third PEP.
>
> Thanks to Red Hat for giving me time to experiment on this.
>
>
> HTML version:
> https://www.python.org/dev/peps/pep-0509/
>
>
> PEP: 509
> Title: Add a private version to dict
> Version: $Revision$
> Last-Modified: $Date$
> Author: Victor Stinner 
> Status: Draft
> Type: Standards Track
> Content-Type: text/x-rst
> Created: 4-January-2016
> Python-Version: 3.6
>
>
> Abstract
> 
>
> Add a new private version to builtin ``dict`` type, incremented at each
> change, to implement fast guards on namespaces.
>
>
> Rationale
> =
>
> In Python, the builtin ``dict`` type is used by many instructions. For
> example, the ``LOAD_GLOBAL`` instruction searchs for a variable in the
> global namespace, or in the builtins namespace (two dict lookups).
> Python uses ``dict`` for the builtins namespace, globals namespace, type
> namespaces, instance namespaces, etc. The local namespace (namespace of
> a function) is usually optimized to an array, but it can be a dict too.
>
> Python is hard to optimize because almost everything is mutable: builtin
> functions, function code, global variables, local variables, ... can be
> modified at runtime. Implementing optimizations respecting the Python
> semantics requires to detect when "something changes": we will call
> these checks "guards".
>
> The speedup of optimizations depends on the speed of guard checks. This
> PEP proposes to add a version to dictionaries to implement fast guards
> on namespaces.
>
> Dictionary lookups can be skipped if the version does not change which
> is the common case for most namespaces. The performance of a guard does
> not depend on the number of watched dictionary entries, complexity of
> O(1), if the dictionary version does not change.
>
> Example of optimization: copy the value of a global variable to function
> constants.  This optimization requires a guard on the global variable to
> check if it was modified. If the variable is modified, the variable must
> be loaded at runtime when the function is called, instead of using the
> constant.
>
> See the `PEP 510 -- Specialized functions with guards
> `_ for the concrete usage of
> guards to specialize functions and for the rationale on Python static
> optimizers.
>
>
> Guard example
> =
>
> Pseudo-code of an fast guard to check if a dictionary entry was modified
> (created, updated or deleted) using an hypothetical
> ``dict_get_version(dict)`` function::
>
> UNSET = object()
>
> class GuardDictKey:
> def __init__(self, dict, key):
> self.dict = dict
> self.key = key
> self.value = dict.get(key, UNSET)
> self.version = dict_get_version(dict)
>
> def check(self):
> """Return True if the dictionary entry did not changed."""
>
> # read the version field of the dict structure
> version = dict_get_version(self.dict)
> if version == self.version:
> # Fast-path: dictionary lookup avoided
> return True
>
> # lookup in the dictionary
> value = self.dict.get(self.key, UNSET)
> if value is self.value:
> # another key was modified:
> # cache the new dictionary version
> self.version = version
> return True
>
> # the key was modified
> return False
>
>
> Usage of the dict version
> =
>
> Specialized functions using guards
> --
>
> The `PEP 510 -- Specialized functions with guards
> `_ proposes an API to support
> specialized functions with guards. It allows to implement static
> optimizers for Python without breaking the Python semantics.
>
> Example of a static Python optimizer: the astoptimizer of the `FAT
> Python `_ project
> implements many optimizations which require guards on namespaces.
> Examples:
>
> * Call pure builtins: to replace ``len("abc")`` with ``3``, guards on
>   ``builtins.__dict__['len']`` and 

Re: [Python-Dev] PEP 509: Add a private version to dict

2016-01-18 Thread Barry Warsaw
On Jan 18, 2016, at 11:43 PM, Victor Stinner wrote:

>Is someone opposed to this PEP 509?
>
>The main complain was the change on the public Python API, but the PEP
>doesn't change the Python API anymore.
>
>I'm not aware of any remaining issue on this PEP.

Have you done any performance analysis for a wide range of Python
applications?  Did you address my suggestion on python-ideas to make the new C
API optionally compiled in?

I still think this is maintenance and potential performance overhead we don't
want to commit to long term unless it enables significant optimization.  Since
you probably can't prove that without some experimentation, this API should be
provisional.

Cheers,
-Barry
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 509

2016-01-14 Thread Stephen J. Turnbull
Terry Reedy writes:

 > While I understand the rationale against __version__, it strikes me as a 
 > better description of what it is, and easier on the brain than 
 > __cache_token__.  Maybe there is something even better, such as 
 > __seqnum__.

Or __generation__, as in "generational garbage collector"?

___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 509: Add a private version to dict

2016-01-12 Thread Victor Stinner
Well, it was just a remark.

2016-01-12 0:35 GMT+01:00 Andrew Barnert :
> Are you saying that d[key] = d[key] may or may not increment the version, so 
> any optimizer can't rely on the fact that it doesn't?

Optimizers don't have to rely on this exactly behaviour. Not
incrementing the version on such case avoids dictionary lookups in the
guard.

My current patch does not increment if the value is the same, and I'm
unable to see any performance regression on *micro* benchmarks:
https://bugs.python.org/issue26058

So I'm in favor of making guards as efficient as possible and not
increment the version in dict ;-)

Victor
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 509

2016-01-12 Thread Victor Stinner
2016-01-12 19:34 GMT+01:00 Jim J. Jewett :
> (1)  Please make it clear within the abstract what counts as a change.

I don't think that an abstract must give the long list of cases when
the version is modified or not. It's explained in detail at:
https://www.python.org/dev/peps/pep-0509/#changes

> (1b)  Is there a way to force a version update?

No. Why would you do that? (What is your use case.)

FYI there is a private API in _testcapi to set the version, for unit tests.

> (2)  I would like to see a .get on the guard object, so that it could
> be used in place of the dict lookup even from python.  If this doesn't
> make sense (e.g., doesn't really save time since the guard has to be
> used from python), please mention that in the Guard Example text.

Optimizations are out of the scope of this PEP.

Please see https://www.python.org/dev/peps/pep-0510/ to more examples
of specialization with guards.

See also https://faster-cpython.readthedocs.org/fat_python.html for
concrete optimizations using specialization with guards.

> (3)  It would be possible to define the field as reserved in the main
> header, and require another header to use it even from C.
>
> (3a)  This level of privacy might be overkill, but I would prefer that
> the decision be explicit.
>
> (3b)  The change should almost certainly be hidden from the ABI / 
> Py_LIMITED_API

Oh, the PyDictObject structure is not part of the stable ABI. It seems
worth to mention it in the PEP.

Victor
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 509

2016-01-12 Thread Victor Stinner
2016-01-12 19:52 GMT+01:00 Ethan Furman :
> [1] We're not going to call it __version__ are we?  Seems like
> __cache_token__ is a much better name.

See the online version to the most recent version of the PEP:
https://www.python.org/dev/peps/pep-0509/

In the first version I proposed to expose the version, but I changed
my mind, it's not private (only exposed in the C API).

Victor
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 509

2016-01-12 Thread Victor Stinner
2016-01-12 23:24 GMT+01:00 Ethan Furman :
> Even if not exposed at the Python layer, it's still exposed when working at
> the C layer.  Is __version__ any less confusing there?  (I only work in C
> when working on Python, and only occasionally, so my question is real.)

Fields of the PyDictObject must be prefixed with "ma_". If you read
the prior art of the PEP, you will see that Antoine Pitrou also
proposed the "ma_version" name. The existing version on types used the
"version_tag" name. Maybe I should pick this one.

Dunder names like __version__ is not used in the C language.

Do you expect "__version__" to be somehow "protected" or "private"?
The field is definitely public in the Python C API, and I don't think
that it's a problem. It's not really a choice, there is no way to hide
a field of a C structure. If you start to do random things on the C
API, it's your responsability :-)

Victor
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 509

2016-01-12 Thread Terry Reedy

On 1/12/2016 5:24 PM, Ethan Furman wrote:

On 01/12/2016 01:34 PM, Victor Stinner wrote:

2016-01-12 19:52 GMT+01:00 Ethan Furman :

[1] We're not going to call it __version__ are we?  Seems like
__cache_token__ is a much better name.


While I understand the rationale against __version__, it strikes me as a 
better description of what it is, and easier on the brain than 
__cache_token__.  Maybe there is something even better, such as 
__seqnum__.  This is literally what the attribute
is, a sequence/revision number, as with the Python repository, without 
the connotations of 'version', as in 'Python version'.  Each commit to 
CPython changes the repository state without changing the 'version'. A 
dict.update may run thru 1000s of sequence numbers, but only the final 
result is a 'version' from the programmers point of view.


--
Terry Jan Reedy

___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 509

2016-01-12 Thread Ethan Furman

On 01/12/2016 01:34 PM, Victor Stinner wrote:

2016-01-12 19:52 GMT+01:00 Ethan Furman :

[1] We're not going to call it __version__ are we?  Seems like
__cache_token__ is a much better name.


See the online version to the most recent version of the PEP:
https://www.python.org/dev/peps/pep-0509/

In the first version I proposed to expose the version, but I changed
my mind, it's not private (only exposed in the C API).


Even if not exposed at the Python layer, it's still exposed when working 
at the C layer.  Is __version__ any less confusing there?  (I only work 
in C when working on Python, and only occasionally, so my question is real.)


--
~Ethan~
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] PEP 509

2016-01-12 Thread Jim J. Jewett
(1)  Please make it clear within the abstract what counts as a change.

(1a)  E.g., a second paragraph such as "Adding or removing a key, or
replacing a value, counts as a change.  Modifying an object in place,
or replacing it with itself may not be picked up."

(1b)  Is there a way to force a version update?

d[k]=d[k] seems like it should do that (absent the optimization to
prevent it), but I confess that I can't come up with a good use case
that doesn't start seeming internal to a specific optimizer.

(1c)  Section "Guard against changing dict during iteration" says
"Sadly, the dictionary version proposed in this PEP doesn't help to
detect dictionary mutation."  Why not?  Wouldn't that mutation involve
replacing a value, which ought to trigger a version change?


(2)  I would like to see a .get on the guard object, so that it could
be used in place of the dict lookup even from python.  If this doesn't
make sense (e.g., doesn't really save time since the guard has to be
used from python), please mention that in the Guard Example text.


(3)  It would be possible to define the field as reserved in the main
header, and require another header to use it even from C.

(3a)  This level of privacy might be overkill, but I would prefer that
the decision be explicit.

(3b)  The change should almost certainly be hidden from the ABI / Py_LIMITED_API

-jJ
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 509

2016-01-12 Thread Ethan Furman

On 01/12/2016 10:34 AM, Jim J. Jewett wrote:


(1c)  Section "Guard against changing dict during iteration" says
"Sadly, the dictionary version proposed in this PEP doesn't help to
detect dictionary mutation."  Why not?  Wouldn't that mutation involve
replacing a value, which ought to trigger a version change?



Yes it would, but mutating a dictionary value during iteration is legal, 
so we cannot use the __version__ [1] change to tell us that something 
illegal happened.


[1] We're not going to call it __version__ are we?  Seems like 
__cache_token__ is a much better name.


--
~Ethan~
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 509

2016-01-12 Thread Ethan Furman

On 01/12/2016 03:24 PM, Victor Stinner wrote:
> 2016-01-12 23:24 GMT+01:00 Ethan Furman wrote:

>> Even if not exposed at the Python layer, it's still exposed when
>> working at the C layer.  Is __version__ any less confusing there?
>> (I only work in C when working on Python, and only occasionally, so
>> my question is real.)
>
> Fields of the PyDictObject must be prefixed with "ma_". If you read
> the prior art of the PEP, you will see that Antoine Pitrou also
> proposed the "ma_version" name. The existing version on types used the
> "version_tag" name. Maybe I should pick this one.
>
> Dunder names like __version__ is not used in the C language.
>
> Do you expect "__version__" to be somehow "protected" or "private"?

Nope, I just don't want to be misdirected when I see the name. :)  I 
think ma_version (or ma_seqnum) will be fine.


--
~Ethan~


___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 509: Add a private version to dict

2016-01-11 Thread Gregory P. Smith
On Mon, Jan 11, 2016 at 8:50 AM Victor Stinner 
wrote:

> Hi,
>
> After a first round on python-ideas, here is the second version of my
> PEP. The main changes since the first version are that the dictionary
> version is no more exposed at the Python level and the field type now
> also has a size of 64-bit on 32-bit platforms.
>
> The PEP is part of a serie of 3 PEP adding an API to implement a
> static Python optimizer specializing functions with guards. The second
> PEP is currently discussed on python-ideas and I'm still working on
> the third PEP.
>
> Thanks to Red Hat for giving me time to experiment on this.
>
>
> HTML version:
> https://www.python.org/dev/peps/pep-0509/
>
>
> PEP: 509
> Title: Add a private version to dict
> Version: $Revision$
> Last-Modified: $Date$
> Author: Victor Stinner 
> Status: Draft
> Type: Standards Track
> Content-Type: text/x-rst
> Created: 4-January-2016
> Python-Version: 3.6
>
>
> Abstract
> 
>
> Add a new private version to builtin ``dict`` type, incremented at each
> change, to implement fast guards on namespaces.
>
>
> Rationale
> =
>
> In Python, the builtin ``dict`` type is used by many instructions. For
> example, the ``LOAD_GLOBAL`` instruction searchs for a variable in the
> global namespace, or in the builtins namespace (two dict lookups).
> Python uses ``dict`` for the builtins namespace, globals namespace, type
> namespaces, instance namespaces, etc. The local namespace (namespace of
> a function) is usually optimized to an array, but it can be a dict too.
>
> Python is hard to optimize because almost everything is mutable: builtin
> functions, function code, global variables, local variables, ... can be
> modified at runtime. Implementing optimizations respecting the Python
> semantics requires to detect when "something changes": we will call
> these checks "guards".
>
> The speedup of optimizations depends on the speed of guard checks. This
> PEP proposes to add a version to dictionaries to implement fast guards
> on namespaces.
>
> Dictionary lookups can be skipped if the version does not change which
> is the common case for most namespaces. The performance of a guard does
> not depend on the number of watched dictionary entries, complexity of
> O(1), if the dictionary version does not change.
>
> Example of optimization: copy the value of a global variable to function
> constants.  This optimization requires a guard on the global variable to
> check if it was modified. If the variable is modified, the variable must
> be loaded at runtime when the function is called, instead of using the
> constant.
>
> See the `PEP 510 -- Specialized functions with guards
> `_ for the concrete usage of
> guards to specialize functions and for the rationale on Python static
> optimizers.
>
>
> Guard example
> =
>
> Pseudo-code of an fast guard to check if a dictionary entry was modified
> (created, updated or deleted) using an hypothetical
> ``dict_get_version(dict)`` function::
>
> UNSET = object()
>
> class GuardDictKey:
> def __init__(self, dict, key):
> self.dict = dict
> self.key = key
> self.value = dict.get(key, UNSET)
> self.version = dict_get_version(dict)
>
> def check(self):
> """Return True if the dictionary entry did not changed."""
>
> # read the version field of the dict structure
> version = dict_get_version(self.dict)
> if version == self.version:
> # Fast-path: dictionary lookup avoided
> return True
>
> # lookup in the dictionary
> value = self.dict.get(self.key, UNSET)
> if value is self.value:
> # another key was modified:
> # cache the new dictionary version
> self.version = version
> return True
>
> # the key was modified
> return False
>
>
> Usage of the dict version
> =
>
> Specialized functions using guards
> --
>
> The `PEP 510 -- Specialized functions with guards
> `_ proposes an API to support
> specialized functions with guards. It allows to implement static
> optimizers for Python without breaking the Python semantics.
>
> Example of a static Python optimizer: the astoptimizer of the `FAT
> Python `_ project
> implements many optimizations which require guards on namespaces.
> Examples:
>
> * Call pure builtins: to replace ``len("abc")`` with ``3``, guards on
>   ``builtins.__dict__['len']`` and ``globals()['len']`` are required
> * Loop unrolling: to unroll the loop ``for i in range(...): ...``,
>   guards on ``builtins.__dict__['range']`` and ``globals()['range']``
>   are required
>
>
> Pyjion

Re: [Python-Dev] PEP 509: Add a private version to dict

2016-01-11 Thread Victor Stinner
2016-01-12 0:07 GMT+01:00 Gregory P. Smith :
>> Changes
>> ===
>>
>> (...)
>
> Please be more explicit about what tests you are performing on the values.
> setitem's "if the value is different" really should mean "if value is not
> dict['key']".  similarly for update, there should never be equality checks
> performed on the values.  just an "is" test of it they are the same object
> or not.

Ok, done. By the way, it's also explained below: values are compared
by their identify, not by their content.

For best dict efficiency, we can not implement this micro-optimization
(to avoid a potential branch misprediction in the CPU) and always
increase the version. But for guards, the micro-optimization can avoid
a lot of dictionary lookups, especially when a guard watches for a
large number of keys.

Victor
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 509: Add a private version to dict

2016-01-11 Thread Andrew Barnert via Python-Dev
On Jan 11, 2016, at 15:24, Victor Stinner  wrote:
> 
> 2016-01-12 0:07 GMT+01:00 Gregory P. Smith :
>>> Changes
>>> ===
>>> 
>>> (...)
>> 
>> Please be more explicit about what tests you are performing on the values.
>> setitem's "if the value is different" really should mean "if value is not
>> dict['key']".  similarly for update, there should never be equality checks
>> performed on the values.  just an "is" test of it they are the same object
>> or not.
> 
> Ok, done. By the way, it's also explained below: values are compared
> by their identify, not by their content.
> 
> For best dict efficiency, we can not implement this micro-optimization
> (to avoid a potential branch misprediction in the CPU) and always
> increase the version. But for guards, the micro-optimization can avoid
> a lot of dictionary lookups, especially when a guard watches for a
> large number of keys.

Are you saying that d[key] = d[key] may or may not increment the version, so 
any optimizer can't rely on the fact that it doesn't?

If so, that seems reasonable. (The worst case in incrementing the version 
unnecessarily is that you miss an optimization that would have been safe, 
right?).
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] PEP 509: Add a private version to dict

2016-01-11 Thread Victor Stinner
Hi,

After a first round on python-ideas, here is the second version of my
PEP. The main changes since the first version are that the dictionary
version is no more exposed at the Python level and the field type now
also has a size of 64-bit on 32-bit platforms.

The PEP is part of a serie of 3 PEP adding an API to implement a
static Python optimizer specializing functions with guards. The second
PEP is currently discussed on python-ideas and I'm still working on
the third PEP.

Thanks to Red Hat for giving me time to experiment on this.


HTML version:
https://www.python.org/dev/peps/pep-0509/


PEP: 509
Title: Add a private version to dict
Version: $Revision$
Last-Modified: $Date$
Author: Victor Stinner 
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 4-January-2016
Python-Version: 3.6


Abstract


Add a new private version to builtin ``dict`` type, incremented at each
change, to implement fast guards on namespaces.


Rationale
=

In Python, the builtin ``dict`` type is used by many instructions. For
example, the ``LOAD_GLOBAL`` instruction searchs for a variable in the
global namespace, or in the builtins namespace (two dict lookups).
Python uses ``dict`` for the builtins namespace, globals namespace, type
namespaces, instance namespaces, etc. The local namespace (namespace of
a function) is usually optimized to an array, but it can be a dict too.

Python is hard to optimize because almost everything is mutable: builtin
functions, function code, global variables, local variables, ... can be
modified at runtime. Implementing optimizations respecting the Python
semantics requires to detect when "something changes": we will call
these checks "guards".

The speedup of optimizations depends on the speed of guard checks. This
PEP proposes to add a version to dictionaries to implement fast guards
on namespaces.

Dictionary lookups can be skipped if the version does not change which
is the common case for most namespaces. The performance of a guard does
not depend on the number of watched dictionary entries, complexity of
O(1), if the dictionary version does not change.

Example of optimization: copy the value of a global variable to function
constants.  This optimization requires a guard on the global variable to
check if it was modified. If the variable is modified, the variable must
be loaded at runtime when the function is called, instead of using the
constant.

See the `PEP 510 -- Specialized functions with guards
`_ for the concrete usage of
guards to specialize functions and for the rationale on Python static
optimizers.


Guard example
=

Pseudo-code of an fast guard to check if a dictionary entry was modified
(created, updated or deleted) using an hypothetical
``dict_get_version(dict)`` function::

UNSET = object()

class GuardDictKey:
def __init__(self, dict, key):
self.dict = dict
self.key = key
self.value = dict.get(key, UNSET)
self.version = dict_get_version(dict)

def check(self):
"""Return True if the dictionary entry did not changed."""

# read the version field of the dict structure
version = dict_get_version(self.dict)
if version == self.version:
# Fast-path: dictionary lookup avoided
return True

# lookup in the dictionary
value = self.dict.get(self.key, UNSET)
if value is self.value:
# another key was modified:
# cache the new dictionary version
self.version = version
return True

# the key was modified
return False


Usage of the dict version
=

Specialized functions using guards
--

The `PEP 510 -- Specialized functions with guards
`_ proposes an API to support
specialized functions with guards. It allows to implement static
optimizers for Python without breaking the Python semantics.

Example of a static Python optimizer: the astoptimizer of the `FAT
Python `_ project
implements many optimizations which require guards on namespaces.
Examples:

* Call pure builtins: to replace ``len("abc")`` with ``3``, guards on
  ``builtins.__dict__['len']`` and ``globals()['len']`` are required
* Loop unrolling: to unroll the loop ``for i in range(...): ...``,
  guards on ``builtins.__dict__['range']`` and ``globals()['range']``
  are required


Pyjion
--

According of Brett Cannon, one of the two main developers of Pyjion,
Pyjion can also benefit from dictionary version to implement
optimizations.

Pyjion is a JIT compiler for Python based upon CoreCLR (Microsoft .NET
Core runtime).


Unladen Swallow
---

Even if dictionary version was not 

Re: [Python-Dev] PEP 509: Add a private version to dict

2016-01-11 Thread Maciej Fijalkowski
Hi Victor.

You know that pypy does this stuff without changing and exposing
python semantics right? We have a version dict that does not leak
abstractions to the user.

In general, doing stuff like that where there is a public API that
leaks details of certain optimizations makes it harder and harder for
optimizing compilers to do their job properly, if you want to do
something slightly different.

Can we make this happen (as you noted in the prior art) WITHOUT
changing ANY of the things exposed to the user?

On Mon, Jan 11, 2016 at 6:49 PM, Victor Stinner
 wrote:
> Hi,
>
> After a first round on python-ideas, here is the second version of my
> PEP. The main changes since the first version are that the dictionary
> version is no more exposed at the Python level and the field type now
> also has a size of 64-bit on 32-bit platforms.
>
> The PEP is part of a serie of 3 PEP adding an API to implement a
> static Python optimizer specializing functions with guards. The second
> PEP is currently discussed on python-ideas and I'm still working on
> the third PEP.
>
> Thanks to Red Hat for giving me time to experiment on this.
>
>
> HTML version:
> https://www.python.org/dev/peps/pep-0509/
>
>
> PEP: 509
> Title: Add a private version to dict
> Version: $Revision$
> Last-Modified: $Date$
> Author: Victor Stinner 
> Status: Draft
> Type: Standards Track
> Content-Type: text/x-rst
> Created: 4-January-2016
> Python-Version: 3.6
>
>
> Abstract
> 
>
> Add a new private version to builtin ``dict`` type, incremented at each
> change, to implement fast guards on namespaces.
>
>
> Rationale
> =
>
> In Python, the builtin ``dict`` type is used by many instructions. For
> example, the ``LOAD_GLOBAL`` instruction searchs for a variable in the
> global namespace, or in the builtins namespace (two dict lookups).
> Python uses ``dict`` for the builtins namespace, globals namespace, type
> namespaces, instance namespaces, etc. The local namespace (namespace of
> a function) is usually optimized to an array, but it can be a dict too.
>
> Python is hard to optimize because almost everything is mutable: builtin
> functions, function code, global variables, local variables, ... can be
> modified at runtime. Implementing optimizations respecting the Python
> semantics requires to detect when "something changes": we will call
> these checks "guards".
>
> The speedup of optimizations depends on the speed of guard checks. This
> PEP proposes to add a version to dictionaries to implement fast guards
> on namespaces.
>
> Dictionary lookups can be skipped if the version does not change which
> is the common case for most namespaces. The performance of a guard does
> not depend on the number of watched dictionary entries, complexity of
> O(1), if the dictionary version does not change.
>
> Example of optimization: copy the value of a global variable to function
> constants.  This optimization requires a guard on the global variable to
> check if it was modified. If the variable is modified, the variable must
> be loaded at runtime when the function is called, instead of using the
> constant.
>
> See the `PEP 510 -- Specialized functions with guards
> `_ for the concrete usage of
> guards to specialize functions and for the rationale on Python static
> optimizers.
>
>
> Guard example
> =
>
> Pseudo-code of an fast guard to check if a dictionary entry was modified
> (created, updated or deleted) using an hypothetical
> ``dict_get_version(dict)`` function::
>
> UNSET = object()
>
> class GuardDictKey:
> def __init__(self, dict, key):
> self.dict = dict
> self.key = key
> self.value = dict.get(key, UNSET)
> self.version = dict_get_version(dict)
>
> def check(self):
> """Return True if the dictionary entry did not changed."""
>
> # read the version field of the dict structure
> version = dict_get_version(self.dict)
> if version == self.version:
> # Fast-path: dictionary lookup avoided
> return True
>
> # lookup in the dictionary
> value = self.dict.get(self.key, UNSET)
> if value is self.value:
> # another key was modified:
> # cache the new dictionary version
> self.version = version
> return True
>
> # the key was modified
> return False
>
>
> Usage of the dict version
> =
>
> Specialized functions using guards
> --
>
> The `PEP 510 -- Specialized functions with guards
> `_ proposes an API to support
> specialized functions with guards. It allows to implement static
> optimizers for Python without breaking the Python semantics.
>
> Example of a static 

Re: [Python-Dev] PEP 509: Add a private version to dict

2016-01-11 Thread Victor Stinner
Le 11 janv. 2016 8:09 PM, "Maciej Fijalkowski"  a écrit :
> Hi Victor.
>
> You know that pypy does this stuff without changing and exposing
> python semantics right? We have a version dict that does not leak
> abstractions to the user.

The PEP adds a field to the C structure PyDictObject. Are you asking me to
hide it from the C structure?

The first version of my PEP added a public read-only property at Python
level, but I changed the PEP. See the alternatives section for more detail.

Victor

> In general, doing stuff like that where there is a public API that
> leaks details of certain optimizations makes it harder and harder for
> optimizing compilers to do their job properly, if you want to do
> something slightly different.
>
> Can we make this happen (as you noted in the prior art) WITHOUT
> changing ANY of the things exposed to the user?
>
> On Mon, Jan 11, 2016 at 6:49 PM, Victor Stinner
>  wrote:
> > Hi,
> >
> > After a first round on python-ideas, here is the second version of my
> > PEP. The main changes since the first version are that the dictionary
> > version is no more exposed at the Python level and the field type now
> > also has a size of 64-bit on 32-bit platforms.
> >
> > The PEP is part of a serie of 3 PEP adding an API to implement a
> > static Python optimizer specializing functions with guards. The second
> > PEP is currently discussed on python-ideas and I'm still working on
> > the third PEP.
> >
> > Thanks to Red Hat for giving me time to experiment on this.
> >
> >
> > HTML version:
> > https://www.python.org/dev/peps/pep-0509/
> >
> >
> > PEP: 509
> > Title: Add a private version to dict
> > Version: $Revision$
> > Last-Modified: $Date$
> > Author: Victor Stinner 
> > Status: Draft
> > Type: Standards Track
> > Content-Type: text/x-rst
> > Created: 4-January-2016
> > Python-Version: 3.6
> >
> >
> > Abstract
> > 
> >
> > Add a new private version to builtin ``dict`` type, incremented at each
> > change, to implement fast guards on namespaces.
> >
> >
> > Rationale
> > =
> >
> > In Python, the builtin ``dict`` type is used by many instructions. For
> > example, the ``LOAD_GLOBAL`` instruction searchs for a variable in the
> > global namespace, or in the builtins namespace (two dict lookups).
> > Python uses ``dict`` for the builtins namespace, globals namespace, type
> > namespaces, instance namespaces, etc. The local namespace (namespace of
> > a function) is usually optimized to an array, but it can be a dict too.
> >
> > Python is hard to optimize because almost everything is mutable: builtin
> > functions, function code, global variables, local variables, ... can be
> > modified at runtime. Implementing optimizations respecting the Python
> > semantics requires to detect when "something changes": we will call
> > these checks "guards".
> >
> > The speedup of optimizations depends on the speed of guard checks. This
> > PEP proposes to add a version to dictionaries to implement fast guards
> > on namespaces.
> >
> > Dictionary lookups can be skipped if the version does not change which
> > is the common case for most namespaces. The performance of a guard does
> > not depend on the number of watched dictionary entries, complexity of
> > O(1), if the dictionary version does not change.
> >
> > Example of optimization: copy the value of a global variable to function
> > constants.  This optimization requires a guard on the global variable to
> > check if it was modified. If the variable is modified, the variable must
> > be loaded at runtime when the function is called, instead of using the
> > constant.
> >
> > See the `PEP 510 -- Specialized functions with guards
> > `_ for the concrete usage of
> > guards to specialize functions and for the rationale on Python static
> > optimizers.
> >
> >
> > Guard example
> > =
> >
> > Pseudo-code of an fast guard to check if a dictionary entry was modified
> > (created, updated or deleted) using an hypothetical
> > ``dict_get_version(dict)`` function::
> >
> > UNSET = object()
> >
> > class GuardDictKey:
> > def __init__(self, dict, key):
> > self.dict = dict
> > self.key = key
> > self.value = dict.get(key, UNSET)
> > self.version = dict_get_version(dict)
> >
> > def check(self):
> > """Return True if the dictionary entry did not changed."""
> >
> > # read the version field of the dict structure
> > version = dict_get_version(self.dict)
> > if version == self.version:
> > # Fast-path: dictionary lookup avoided
> > return True
> >
> > # lookup in the dictionary
> > value = self.dict.get(self.key, UNSET)
> > if value is self.value:
> > # another key was modified:
> > # cache the new 

Re: [Python-Dev] PEP 509: Add a private version to dict

2016-01-11 Thread Maciej Fijalkowski
On Mon, Jan 11, 2016 at 9:56 PM, Victor Stinner
 wrote:
> Le 11 janv. 2016 8:09 PM, "Maciej Fijalkowski"  a écrit :
>> Hi Victor.
>>
>> You know that pypy does this stuff without changing and exposing
>> python semantics right? We have a version dict that does not leak
>> abstractions to the user.
>
> The PEP adds a field to the C structure PyDictObject. Are you asking me to
> hide it from the C structure?
>
> The first version of my PEP added a public read-only property at Python
> level, but I changed the PEP. See the alternatives section for more detail.
>
> Victor

I asked you to hide it from python, read the wrong version :-)

Cool!
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com