Raymond Hettinger <[email protected]> added the comment:
Antti Haapala, I agree that this situation is catastrophic and that we need
some way to avoid blocking parallel calculations of cached values for distinct
instances of the same class.
Here's an idea that might work. Hold one lock only briefly, using it to
atomically test and set a variable that tracks which instances are actively
being updated.
If another thread is performing the update, use a separate condition
variable to wait for the update to complete.
If no other thread is doing the update, we don't need to hold a lock while
calling the I/O-bound underlying function. And when we're done updating
this specific instance, atomically update the set of instances being actively
updated and notify threads waiting on the condition variable to wake up.
The key idea is to hold the lock only for variable updates (which are fast)
rather than for the duration of the underlying function call (which is slow).
Only when this specific instance is being updated do we use a separate lock
(wrapped in a condition variable) to block until the slow function call is
complete.
The logic is hairy, so I've added Serhiy and Tim to the nosy list to help think
it through.
--- Untested sketch ---------------------------------
from threading import Condition, RLock

_NOT_FOUND = object()

class cached_property:

    def __init__(self, func):
        self.func = func
        self.attrname = None
        self.update_lock = RLock()
        self.instances_being_updated = set()
        self.cv = Condition(RLock())
        ...

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        if self.attrname is None:
            raise TypeError(
                "Cannot use cached_property instance without calling"
                " __set_name__ on it.")
        try:
            cache = instance.__dict__
        except AttributeError:
            # not all objects have __dict__ (e.g. class defines slots)
            msg = (
                f"No '__dict__' attribute on {type(instance).__name__!r} "
                f"instance to cache {self.attrname!r} property."
            )
            raise TypeError(msg) from None
        val = cache.get(self.attrname, _NOT_FOUND)
        if val is not _NOT_FOUND:
            return val
        # Now we need to either call the function
        # or wait for another thread to do it.
        with self.update_lock:
            # Invariant: at most one thread can claim a given
            # instance as actively being updated.
            other_thread_is_updating = instance in self.instances_being_updated
            if not other_thread_is_updating:
                self.instances_being_updated.add(instance)
        # ONLY if this EXACT instance is being updated
        # will we block and wait for the computed result.
        # Threads working on other instances won't have to wait.
        if other_thread_is_updating:
            with self.cv:
                while instance in self.instances_being_updated:
                    self.cv.wait()
            return cache[self.attrname]
        # Let this thread do the update.
        val = self.func(instance)
        try:
            cache[self.attrname] = val
        except TypeError:
            msg = (
                f"The '__dict__' attribute on {type(instance).__name__!r} "
                f"instance does not support item assignment for caching "
                f"{self.attrname!r} property."
            )
            raise TypeError(msg) from None
        with self.update_lock:
            self.instances_being_updated.discard(instance)
        with self.cv:
            self.cv.notify_all()
        return val
        # XXX What to do about waiting threads when an exception occurs?
        # We don't want them to hang.  Should they repeat the underlying
        # function call to get their own exception to propagate upwards?
----------
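To sanity-check the claim/wait/notify logic in isolation, here is a minimal, runnable reduction of the same pattern. The names `PerKeyOnce` and `slow_compute` are illustrative only, not part of the sketch, and this reduction makes one simplifying design choice: it constructs the Condition around the claim lock itself, so a single lock guards both the claim set and the wait, instead of the sketch's two-lock arrangement.

```python
import time
from threading import Condition, Lock, Thread

class PerKeyOnce:
    """Run an expensive function at most once per key, without
    holding any lock during the expensive call itself."""

    def __init__(self, func):
        self.func = func
        self.cache = {}
        self.claim_lock = Lock()              # guards cache and updating
        self.updating = set()                 # keys some thread has claimed
        self.cv = Condition(self.claim_lock)  # single-lock simplification

    def get(self, key):
        with self.claim_lock:
            if key in self.cache:
                return self.cache[key]
            if key in self.updating:
                # Another thread claimed this key: wait for its result.
                while key in self.updating:
                    self.cv.wait()
                return self.cache[key]
            # Claim the key; the lock is released before the slow call.
            self.updating.add(key)
        val = self.func(key)                  # slow call, no lock held
        with self.claim_lock:
            self.cache[key] = val
            self.updating.discard(key)
            self.cv.notify_all()
        return val

calls = []

def slow_compute(key):
    calls.append(key)
    time.sleep(0.05)
    return key * 10

once = PerKeyOnce(slow_compute)
threads = [Thread(target=once.get, args=(k,)) for k in (1, 1, 2, 2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(calls))  # each key computed exactly once -> [1, 2]
```

Note that threads asking for key 2 never wait on the thread computing key 1, which is the behavior we want for distinct instances.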
nosy: +serhiy.storchaka, tim.peters
_______________________________________
Python tracker <[email protected]>
<https://bugs.python.org/issue43468>
_______________________________________