[issue46713] Provide a C implementation of collections.abc.KeysView and friends

2022-02-11 Thread Raymond Hettinger


Raymond Hettinger  added the comment:

- arbitrary mappings supports by the view ABCs
+ arbitrary mappings supported by the view ABCs

- A first look,
+ At first glance,

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46713] Provide a C implementation of collections.abc.KeysView and friends

2022-02-11 Thread Raymond Hettinger


Raymond Hettinger  added the comment:

Some thoughts:

* Other than set operations, most of the pure python code in the dict view ABCs 
are fast pass throughs.  There is no point in rewriting these in C:

def __contains__(self, key):
return key in self._mapping

def __iter__(self):
yield from self._mapping

def __len__(self):
return len(self._mapping)

* For the set operations, the pure python code has only a simple for-loop and 
contains test.  There isn't much fat to be cut:

def __ge__(self, other):
if not isinstance(other, Set):
return NotImplemented
if len(self) < len(other):
return False
for elem in other:
if elem not in self:
return False
return True

* Possibly the eval-loop overhead can be eliminated with Python by using 
itertools:

def __ge__(self, other):
if not isinstance(other, Set):
return NotImplemented
if len(self) < len(other):
return False
for elem in filterfalse(self.__contains__, other):
return False
return True

* That leaves the question of why the dict views are so much faster (presuming 
that the posted timings are representative).  I haven't looked in detail, but 
the first candidate that comes to mind is that dictviews_to_set() has a fast 
path for exact dicts.  That lets it bypass the mapping proxy and exploit the 
fast path in PySet_New for exact dicts.  That fast path reuses the stored hash 
values and exploits knowing that the input has no duplicates.  Off-hand, I 
don't see how that can generalize to the arbitrary mappings supports by the 
view ABCs.  

* Summary:  A first look, it doesn't seem like a C rewrite of the view ABCs 
would bear fruit.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46713] Provide a C implementation of collections.abc.KeysView and friends

2022-02-10 Thread Alex Waygood


Change by Alex Waygood :


--
nosy: +rhettinger
type:  -> performance
versions: +Python 3.11

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46713] Provide a C implementation of collections.abc.KeysView and friends

2022-02-10 Thread Joshua Bronson

New submission from Joshua Bronson :

As suggested by @rhettinger in https://bugs.python.org/msg409443, I'm creating 
a feature request for C implementations of collections.abc.KeysView, 
ValuesView, and ItemsView.

Because these do not currently benefit from C speedups, they're a lot slower 
than their dict_keys, dict_values, and dict_items counterparts. As a result, 
libraries that implement custom Mapping types that are backed by dicts are 
incentivized to override the implementations of keys(), values(), and items() 
they inherit from collections.abc.Mapping to instead return their backing 
dicts' mapping views, causing a potential abstraction leak.

An example can be found in https://github.com/jab/bidict, which implements 
bidirectional mapping types that wrap a forward and an inverse dict which are 
kept in sync with one another.

>>> from bidict import *
>>> bi = bidict({1: 'one', 2: 'two'})
>>> bi.items()  # Overridden for performance:
dict_items([(1, 'one'), (2, 'two')])

Ditto for OrderedBidict:

>>> OrderedBidict(bi).keys()
_OrderedBidictItemsView(OrderedBidict([(1, 'one'), (2, 'two')]))


(The _OrderedBidictItemsView is a custom view whose __iter__ uses the 
implementation inherited by its collections.abc.ItemsView base class so that 
the correct order is respected, but proxies other method calls through to the 
backing dict's dict_items object: 
https://github.com/jab/bidict/blob/2ab42a/bidict/_orderedbidict.py#L90-L150)


Here is a microbenchmark of calling __eq__ on an _OrderedBidictItemsView vs. a 
collections.abc.ItemsView, to estimate the performance impact (using Python 
3.10):

❯ set setup '
from collections.abc import ItemsView
from bidict import OrderedBidict
d = dict(zip(range(), range()))
ob = OrderedBidict(d)'

❯ python -m pyperf timeit -s $setup 'ob.items() == d.items()' -o 1.json

❯ python -m pyperf timeit -s $setup 'ItemsView(ob) == d.items()' -o 2.json

❯ pyperf compare_to 2.json 1.json
Mean +- std dev: [2] 4.21 ms +- 1.10 ms -> [1] 168 us +- 6 us: 25.13x faster


This demonstrates a potentially significant speedup. Similar microbenchmarks 
for ItemsView vs. dict_items, as well as KeysView vs. both dict_keys and 
_OrderedBidictKeysView, also indicate similarly significant potential.

Note that the performance benefits of this may propagate to other code as well. 
For example, bidicts' __eq__ methods are implemented in terms of their 
itemsviews (see 
https://github.com/jab/bidict/blob/2ab42a/bidict/_base.py#L285-L286), so 
speeding up bidict.items().__eq__ speeds up bidict.__eq__ commensurately.

--
messages: 413020
nosy: jab
priority: normal
severity: normal
status: open
title: Provide a C implementation of collections.abc.KeysView and friends

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com