Itay azolay <itayazo...@gmail.com> added the comment:

Thanks, you have some very good points.
Let me try to address them:

* Key functions really are expected to be cheap, but what they really need to 
be is *cheaper than the cached computation*. If my computation is expensive 
enough, I might be okay with a key computation that is cheaper, even if it is 
still somewhat expensive. I believe that trade-off is for the developer to 
decide.

* `key` is usually used in sequence-of-elements contexts, where it runs once 
per element; here it would run on every call. I believe that is what users 
would expect anyway (what else could someone expect to happen?), and that it 
is solvable through good docs (or by changing the name of the parameter?).

* I believe that a key function matching the signature of the cached function 
is a good thing. It forces the user to revisit the key function whenever he 
changes the behaviour of the cached function, and to rethink the cache; 
otherwise it will not work.

* I can't argue about API simplicity; you probably have much more experience 
there. However, I believe that if we can agree that this is a useful feature, 
we can find a way to make the API clear and welcoming.
BTW, I agree about the problems with the `typed` argument; I never quite 
understood when it can be useful.
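For reference, the only effect of `typed` I know of is that it keeps equal 
but differently-typed arguments in separate cache entries; a toy sketch 
(`f` is a made-up example):

from functools import lru_cache

@lru_cache(maxsize=None, typed=True)
def f(x):
    return x * 2

f(3)    # cached under the int 3
f(3.0)  # a separate cache entry: typed=True treats 3 and 3.0 as distinct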

I'd like to compare the `key` argument suggested here to the `key` argument 
of other Python functions; let's take `sorted` as an example.
`sorted` supports `key` so that it can sort other kinds of data structures.
Even though I like your suggestion to use a dataclass, I believe that if that 
argument is applicable here, we could say the same thing about `sorted`.
We could require `sorted` to work the same way:

from dataclasses import dataclass
from functools import total_ordering

@total_ordering  # if I'm not mistaken
@dataclass
class MyData:
    ...
    # fields
    ...
    def __gt__(self, other):
        return self.field > other.field

sorted(MyData(item) for item in my_data)


I think we both see why this wouldn't be optimal in some cases.
Without the key function, `sorted` would not support a big part of Python 
objects.
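For contrast, a minimal sketch of the same sort using the existing `key` 
parameter, reusing the made-up MyData/field names from above:

from operator import attrgetter

sorted(my_data, key=attrgetter('field'))  # no wrapper class needed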
I think the same applies to lru_cache. Right now, we just can't use it with 
all Python objects: we have to change the API, the way we move data around, 
and the way we keep our objects, just so that lru_cache will work.
And after all that, lru_cache will still break if someone sends some data in 
a list instead of a tuple. I think that causes a lot of developers to give up 
on the default stdlib lru_cache.
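To illustrate the breakage (this is today's documented behaviour, since 
lru_cache hashes its arguments; `total` is a toy example):

from functools import lru_cache

@lru_cache(maxsize=None)
def total(events):
    return sum(events)

total((1, 2, 3))  # fine, tuples are hashable
total([1, 2, 3])  # TypeError: unhashable type: 'list'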

In my case, I have a few lists of lists; each inner list describes an event 
that happened, and each event contains a unique timestamp.
I have an object that holds a few such event lists:

from typing import List

class Myobj:
    events_1: List[list]
    events_2: List[list]

I have a small, esoteric function that currently looks like this:

def calc(list_of_events):
    # calculation
    pass

It is called from multiple places in the code, and those calls take a lot of 
time:

calc(myobj.events_1)  # multiple times
calc(myobj.events_2)  # multiple times

I wanted to cache calc, but for now I have to do something like this:

@lru_cache
def calc_events_1(myobj):
    return calc(myobj.events_1)

@lru_cache
def calc_events_2(myobj):
    return calc(myobj.events_2)

Right now I can't change the API of the lists, because they are used in 
multiple places; some of these lists (I have multiple event lists) are 
converted to numpy arrays, and some are not.

Regarding the API, we could make it simpler either with a keyword-only 
argument, like

lru_cache(maxsize, typed, *, key=None)

or, like the property setter/getter case:

@lru_cache
def function(args, ...):
    pass

@function.make_key  # or key, whatever name is good
def _(args, ...):
    return new_key

However, I like the second option less.
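To make the first option concrete, here is a sketch of how it might look for 
the calc example above, assuming the proposed (non-existent today) key 
parameter receives the same arguments as the cached function and returns a 
hashable key:

from functools import lru_cache  # hypothetical: today's lru_cache has no key parameter

@lru_cache(maxsize=128, key=lambda list_of_events: tuple(map(tuple, list_of_events)))
def calc(list_of_events):
    # expensive calculation over a list of event lists
    ...

calc(myobj.events_1)  # the key function turns the (unhashable) lists into tuples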

Thanks

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue41220>
_______________________________________