Raymond Hettinger <raymond.hettin...@gmail.com> added the comment:

Thanks, I see what you're trying to do now:

1) Given a slow function 
2) that takes a complex argument 
   2a)  that includes a hashable unique identifier 
   2b)  and some unhashable data
3) Cache the function result using only the unique identifier

The lru_cache() currently can't be used directly because
all the function arguments must be hashable.

The proposed solution:
1) Write a helper function
   1a) that has the same signature as the original function
   1b) that returns only the hashable unique identifier
2) With a single @decorator application, connect
   2a) the original function
   2b) the helper function
   2c) and the lru_cache logic
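
If I've understood the proposal, the wiring would look roughly like this (a 
minimal sketch, not a worked-out API; the names lru_cache_with_key and 
_KeyedArgs are made up for illustration):

    from functools import lru_cache, wraps

    class _KeyedArgs:
        # Carries the real call arguments, but hashes and compares
        # only by the unique key, so lru_cache never touches the
        # unhashable data.
        __slots__ = ('key', 'args', 'kwargs')

        def __init__(self, key, args, kwargs):
            self.key, self.args, self.kwargs = key, args, kwargs

        def __hash__(self):
            return hash(self.key)

        def __eq__(self, other):
            return self.key == other.key

    def lru_cache_with_key(key_func, maxsize=128):
        def decorator(func):
            @lru_cache(maxsize=maxsize)
            def cached(keyed):
                return func(*keyed.args, **keyed.kwargs)

            @wraps(func)
            def wrapper(*args, **kwargs):
                key = key_func(*args, **kwargs)
                return cached(_KeyedArgs(key, args, kwargs))

            return wrapper
        return decorator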


A few areas of concern come to mind:

* People have come to expect cached calls to be very cheap, but it is easy to 
write input transformations that aren't cheap (e.g. looping over all of the 
inputs as in your example, or converting entire mutable structures to 
immutable structures).
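
For example, a key function that freezes a dict walks the entire structure on 
every call (a sketch; freezing via sorted items is just one common pattern):

    def freeze(d):
        # O(n log n) on every single call, typically dwarfing
        # the cost of the cache lookup it is meant to speed up
        return tuple(sorted(d.items()))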

* While key functions are relatively well understood, everywhere else we use 
them (sorted, min, max, etc.) the key function gets called only once per 
element.  Here, the lru_cache() would call the key function on every call, 
even when the arguments are identical.  This will be surprising to some users.
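
Contrast with sorted(), where the key function runs once per element; a keyed 
cache would rerun it on every call (a sketch reusing the hypothetical 
decorator above, with extract_id standing in for the user's key function):

    sorted(records, key=extract_id)      # extract_id runs once per record

    @lru_cache_with_key(extract_id)
    def process(record):
        ...

    process(r); process(r); process(r)   # extract_id runs once per call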

* The helper function's signature needs to exactly match that of the wrapped 
function.  Any change to the signature would have to be made in both places.

* It would be hard to debug if the helper function's return values ever stop 
being unique.  For example, if the timestamps start getting rounded to the 
nearest second, they will sporadically become non-unique.

* The lru_cache signature makes it awkward to add more arguments.  That is why 
your examples had to explicitly specify a maxsize of 128 even though 128 is the 
default. 

* API simplicity was an early design goal.  Already, I made a mistake by 
accepting the "typed" argument, which is almost never used but regularly 
causes confusion and hurts learnability.

* The use case is predicated on having a large unhashable dataset accompanied 
by a hashable identifier that is assumed to be unique.  This probably isn't 
common enough to warrant an API extension.  

Out of curiosity, what are you doing now without the proposed extension?  

As a first try, I would likely write a dataclass to be explicit about the types 
and about which fields are used in hashing and equality testing:

    from dataclasses import dataclass, field

    @dataclass(unsafe_hash=True)
    class ItemsList:
        unique_id: float                               # hashable identifier
        data: dict = field(hash=False, compare=False)  # excluded from hash/eq

I expect that dataclasses like this will emerge as the standard solution 
whenever people need a mapping or dict to work with keys that have a mix of 
hashable and unhashable components.  This will work with the lru_cache(), 
dict(), defaultdict(), ChainMap(), set(), frozenset(), etc.
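
For instance, with the ItemsList above (a sketch; summarize() stands in for 
whatever the slow function actually computes):

    from functools import lru_cache

    @lru_cache(maxsize=128)
    def summarize(items):
        # Only unique_id participates in hashing and equality,
        # so the cache key is effectively the identifier alone.
        return sum(items.data.values())

    a = ItemsList(1594166917.0, {'x': 1, 'y': 2})
    b = ItemsList(1594166917.0, {'x': 1, 'y': 2})
    summarize(a)    # computed and cached
    summarize(b)    # cache hit: equal unique_id, data never hashed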

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue41220>
_______________________________________