On 23 Jan 2014, at 08:41, Joshua Harlow <harlo...@yahoo-inc.com> wrote:

> So to me memoizing is typically a premature optimization in a lot of cases. 
> And doing it incorrectly leads to overfilling the Python process's memory 
> (your global dict will have objects in it that can't be garbage collected, 
> and with enough keys+values being stored will act just like a memory leak; 
> basically it acts as a new GC root object in a way) or more cache 
> invalidation races/inconsistencies than just recomputing the initial value…

I agree with your concerns here. At the same time, I don't think they should
rule out deliberate, well-considered use of caching techniques. A decent cache
implementation would help solve a lot of performance problems (which
eventually become a concern for any project).
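
To make the memory concern concrete: the unbounded growth described above goes
away once the cache has a size limit and an eviction policy. The following is
only a minimal sketch of a least-recently-used cache; the class name
BoundedCache and the max_entries parameter are made up for illustration and are
not taken from any of the libraries mentioned in this thread.

```python
import collections
import threading


class BoundedCache(object):
    """Size-bounded LRU cache (hypothetical helper, for illustration only)."""

    def __init__(self, max_entries=128):
        self._max_entries = max_entries
        self._data = collections.OrderedDict()
        self._lock = threading.Lock()

    def get(self, key, default=None):
        with self._lock:
            if key not in self._data:
                return default
            # Re-insert the entry so it becomes the most recently used one.
            value = self._data.pop(key)
            self._data[key] = value
            return value

    def put(self, key, value):
        with self._lock:
            self._data.pop(key, None)
            self._data[key] = value
            # Evict least recently used entries once the bound is exceeded.
            while len(self._data) > self._max_entries:
                self._data.popitem(last=False)

    def __len__(self):
        with self._lock:
            return len(self._data)
```

The sketch uses a plain threading.Lock; whether that is adequate under green
threads is exactly the kind of question that would still need to be answered.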

> Overall though there are a few caching libraries I've seen being used, any of 
> which could be used for memoization.
> 
> - 
> https://github.com/openstack/oslo-incubator/tree/master/openstack/common/cache
> - 
> https://github.com/openstack/oslo-incubator/blob/master/openstack/common/memorycache.py

I looked at the code. I have a lot of questions about the implementation (such
as its cache eviction policies and whether it works well with green threads),
but I think that is a subject for a separate discussion. Could you please share
your experience of using it? Were there specific problems that you could point
to? Maybe they are already described somewhere?

> - dogpile cache @ https://pypi.python.org/pypi/dogpile.cache

This one looks really interesting in terms of its claimed feature set. It seems
to be compatible with Python 2.7; I'm not sure about 2.6. As above, it would be
great if you could share your experience with it.


> I am personally wary of using them for memoization; for which expensive method 
> calls do you see the complexity of this being useful? I didn't think that many 
> method calls being done in OpenStack warranted the complexity added by doing 
> this (premature optimization is the root of all evil...). Do you have data 
> showing where it would be applicable/beneficial?

I believe there are a great many use cases, such as caching DB objects or, more
generally, caching any heavy objects whose retrieval involves interprocess
communication. For instance, API clients could cache objects that are known to
be immutable on the server side.
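
As a rough illustration of the decorator style Shawn proposes in the quoted
message below, a time-limited memoizer might be sketched like this. It is a
sketch under assumptions, not a proposal for the final API: the clock parameter
and the cache_clear attribute are my own additions for testability, arguments
must be hashable, and nothing here bounds the cache size or purges expired
keys.

```python
import functools
import time


def memoize(seconds, clock=time.time):
    """Cache results per argument set for at most `seconds` (illustrative)."""
    def decorator(func):
        cache = {}  # key -> (expiry_timestamp, value)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            key = (args, tuple(sorted(kwargs.items())))
            now = clock()
            hit = cache.get(key)
            if hit is not None and hit[0] > now:
                # Entry is still fresh: skip the expensive call.
                return hit[1]
            value = func(*args, **kwargs)
            cache[key] = (now + seconds, value)
            return value

        # Lets tests and debugging sessions reset the cache explicitly.
        wrapper.cache_clear = cache.clear
        return wrapper
    return decorator
```

With something like this, a call repeated within the time window returns the
stored value, and the first call after the window recomputes it.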


> 
> Sent from my really tiny device...
> 
>> On Jan 23, 2014, at 8:19 AM, "Shawn Hartsock" <harts...@acm.org> wrote:
>> 
>> I would like to have us adopt a memoizing caching library of some kind
>> for use with OpenStack projects. I have no strong preference at this
>> time and I would like suggestions on what to use.
>> 
>> I have seen a number of patches where people have begun to implement
>> their own caches in dictionaries. This typically confuses the code and
>> mixes issues of correctness and performance in code.
>> 
>> Here's an example:
>> 
>> We start with:
>> 
>> def my_thing_method(some_args):
>>   # do expensive work
>>   return value
>> 
>> ... but a performance problem is detected... maybe the method is
>> called 15 times in 10 seconds but then not again for 5 minutes and the
>> return value can only logically change every minute or two... so we
>> end up with ...
>> 
>> _GLOBAL_THING_CACHE = {}
>> 
>> def my_thing_method(some_args):
>>     key = key_from(some_args)
>>     if key in _GLOBAL_THING_CACHE:
>>         return _GLOBAL_THING_CACHE[key]
>>     else:
>>         # do expensive work
>>         _GLOBAL_THING_CACHE[key] = value
>>         return value
>> 
>> ... which is all well and good... but now as a maintenance programmer
>> I need to comprehend the cache mechanism, when cached values are
>> invalidated, and if I need to debug the "do expensive work" part I
>> need to tease out some test that prevents the cache from being hit.
>> Plus I've introduced a new global variable. We love globals right?
>> 
>> I would like us to be able to say:
>> 
>> @memoize(seconds=10)
>> def my_thing_method(some_args):
>>   # do expensive work
>>   return value
>> 
>> ... where we're clearly addressing the performance issue by
>> introducing a cache and limiting its possible impact to 10 seconds
>> which allows for the idea that "do expensive work" has network calls
>> to systems that may change state outside of this Python process.
>> 
>> I'd like to see this done because I would like to have a place to
>> point developers to during reviews... to say: use "common/memoizer" or
>> use "Bob's awesome memoizer" because Bob has worked out all the cache
>> problems already and you can just use it instead of worrying about
>> introducing new bugs by building your own cache.
>> 
>> Does this make sense? I'd love to contribute something... but I wanted
>> to understand why this state of affairs has persisted for a number of
>> years... is there something I'm missing?
>> 
>> -- 
>> # Shawn.Hartsock - twitter: @hartsock - plus.google.com/+ShawnHartsock
>> 
>> _______________________________________________
>> OpenStack-dev mailing list
>> OpenStack-dev@lists.openstack.org
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev