Re: [openstack-dev] [oslo] memoizer aka cache

2014-01-24 Thread Renat Akhmerov
Joining in to share my background: I'd be happy to help here too since I 
have a pretty solid background in using and developing caching solutions, 
though mostly in the Java world (expertise in GemFire and Coherence, and 
development work on GridGain's distributed cache). 

Renat Akhmerov
@ Mirantis Inc.



On 23 Jan 2014, at 18:38, Joshua Harlow harlo...@yahoo-inc.com wrote:

 Same here; I've done pretty big memcache (and similar technologies) scale 
 caching & invalidations at Y! before so here to help…
 
 From: Morgan Fainberg m...@metacloud.com
 Reply-To: OpenStack Development Mailing List (not for usage questions) 
 openstack-dev@lists.openstack.org
 Date: Thursday, January 23, 2014 at 4:17 PM
 To: OpenStack Development Mailing List (not for usage questions) 
 openstack-dev@lists.openstack.org
 Subject: Re: [openstack-dev] [oslo] memoizer aka cache
 
 Yes! There is a reason Keystone has a very small footprint of 
 caching/invalidation done so far.  It really needs to be correct when it 
 comes to proper invalidation logic.  I am happy to offer some help in 
 determining logic for caching/invalidation with Dogpile.cache in mind as we 
 get it into oslo and available for all to use.
 
 --Morgan
 
 
 
 On Thu, Jan 23, 2014 at 2:54 PM, Joshua Harlow harlo...@yahoo-inc.com wrote:
 Sure, no cancelling cases of conscious usage, but we need to be careful
 here and make sure it's really appropriate. Caching and invalidation
 techniques are right up there in terms of problems that appear easy and
 simple to initially do/use, but doing it correctly is really really hard
 (especially at any type of scale).
 
 -Josh
 
 On 1/23/14, 1:35 PM, Renat Akhmerov rakhme...@mirantis.com wrote:
 
 
 On 23 Jan 2014, at 08:41, Joshua Harlow harlo...@yahoo-inc.com wrote:
 
  So to me memoizing is typically a premature optimization in a lot of
 cases. And doing it incorrectly leads to overfilling the Python
 process's memory (your global dict will have objects in it that can't be
 garbage collected, and with enough keys+values being stored it will act
 just like a memory leak; basically it acts as a new GC root object in a
 way) or more cache invalidation races/inconsistencies than just
 recomputing the initial value…
 
 I agree with your concerns here. At the same time, I think this thinking
 shouldn't cancel cases of conscious usage of caching techniques. A decent
 cache implementation would help to solve lots of performance problems
 (which eventually become a concern for any project).
 
  Overall though there are a few caching libraries I've seen being used,
 any of which could be used for memoization.
 
  -
 https://github.com/openstack/oslo-incubator/tree/master/openstack/common/cache
  -
 https://github.com/openstack/oslo-incubator/blob/master/openstack/common/memorycache.py
 
 I looked at the code. I have lots of questions about the implementation
 (like cache eviction policies and whether or not it works well with green
 threads, but I think that's a subject for a separate discussion). Could you
 please share your experience of using it? Were there specific problems
 that you could point to? Maybe they are already described somewhere?
 
  - dogpile cache @ https://pypi.python.org/pypi/dogpile.cache
 
 This one looks really interesting in terms of its claimed feature set. It
 seems to be compatible with Python 2.7, though I'm not sure about 2.6. As
 above, it would be cool if you told us about your experience with it.
 
 
  I am personally wary of using them for memoization; for what expensive
 method calls do you see the complexity of this being useful? I didn't
 think that many method calls being done in OpenStack warranted the
 complexity added by doing this (premature optimization is the root of
 all evil...). Do you have data showing where it would be
 applicable/beneficial?
 
 I believe there are a great many use cases, like caching db objects or,
 more generally, caching any heavy objects involving interprocess
 communication. For instance, API clients may be caching objects that are
 known to be immutable on the server side.
 
 
 
  Sent from my really tiny device...
 
  On Jan 23, 2014, at 8:19 AM, Shawn Hartsock harts...@acm.org wrote:
 
  I would like to have us adopt a memoizing caching library of some kind
  for use with OpenStack projects. I have no strong preference at this
  time and I would like suggestions on what to use.
 
  I have seen a number of patches where people have begun to implement
  their own caches in dictionaries. This typically confuses the code and
  mixes issues of correctness and performance in code.
 
  Here's an example:
 
  We start with:
 
  def my_thing_method(some_args):
      # do expensive work
      return value
 
  ... but a performance problem is detected... maybe the method is
  called 15 times in 10 seconds but then not again for 5 minutes and the
  return value can only logically change every minute or two... so we
  end up with ...
 
  _GLOBAL_THING_CACHE = {}
 
  def my_thing_method(some_args):
      key = key_from(some_args)

[openstack-dev] [oslo] memoizer aka cache

2014-01-23 Thread Shawn Hartsock
I would like to have us adopt a memoizing caching library of some kind
for use with OpenStack projects. I have no strong preference at this
time and I would like suggestions on what to use.

I have seen a number of patches where people have begun to implement
their own caches in dictionaries. This typically confuses the code and
mixes issues of correctness and performance in code.

Here's an example:

We start with:

def my_thing_method(some_args):
    # do expensive work
    return value

... but a performance problem is detected... maybe the method is
called 15 times in 10 seconds but then not again for 5 minutes and the
return value can only logically change every minute or two... so we
end up with ...

_GLOBAL_THING_CACHE = {}

def my_thing_method(some_args):
    key = key_from(some_args)
    if key in _GLOBAL_THING_CACHE:
        return _GLOBAL_THING_CACHE[key]
    else:
        # do expensive work
        _GLOBAL_THING_CACHE[key] = value
        return value

... which is all well and good... but now as a maintenance programmer
I need to comprehend the cache mechanism, when cached values are
invalidated, and if I need to debug the "do expensive work" part I
need to tease out some test that prevents the cache from being hit.
Plus I've introduced a new global variable. We love globals, right?

I would like us to be able to say:

@memoize(seconds=10)
def my_thing_method(some_args):
    # do expensive work
    return value

... where we're clearly addressing the performance issue by
introducing a cache and limiting its possible impact to 10 seconds,
which allows for the idea that "do expensive work" has network calls
to systems that may change state outside of this Python process.
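
For illustration, a minimal sketch of what such a decorator might look
like (hypothetical code, not a proposed implementation: there is no
locking, no size bound, and keys are assumed to be hashable positional
arguments only):

import functools
import time

def memoize(seconds):
    def decorator(func):
        cache = {}  # maps args -> (expiry timestamp, value)

        @functools.wraps(func)
        def wrapper(*args):
            now = time.time()
            hit = cache.get(args)
            if hit is not None and hit[0] > now:
                return hit[1]  # still fresh; skip the expensive work
            value = func(*args)
            cache[args] = (now + seconds, value)
            return value
        return wrapper
    return decorator

Even this toy version shows where the hard parts hide: the cache is
still unbounded, and nothing coordinates concurrent callers recomputing
the same key.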

I'd like to see this done because I would like to have a place to
point developers to during reviews... to say: "use common/memoizer" or
"use Bob's awesome memoizer, because Bob has worked out all the cache
problems already and you can just use it" instead of worrying about
introducing new bugs by building your own cache.

Does this make sense? I'd love to contribute something... but I wanted
to understand why this state of affairs has persisted for a number of
years... is there something I'm missing?

-- 
# Shawn.Hartsock - twitter: @hartsock - plus.google.com/+ShawnHartsock

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [oslo] memoizer aka cache

2014-01-23 Thread Henry Gessau
Top posting to point out that:

In Python 3 there is a generic memoizer in functools called lru_cache.
And here is a backport to Python 2.7:
  https://pypi.python.org/pypi/functools32

That leaves Python 2.6. Maybe some clever wrapping in Oslo can make it
available to all versions?
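
For example (a sketch; it assumes the functools32 package from PyPI,
which backports the same lru_cache API to Python 2.7):

try:
    from functools32 import lru_cache  # Python 2.7 backport
except ImportError:
    from functools import lru_cache    # Python 3.2+

@lru_cache(maxsize=128)
def expensive_lookup(key):
    # stand-in for real expensive work
    return key * 2

Note that lru_cache only bounds the cache size; it has no time-based
expiry, so by itself it doesn't cover the @memoize(seconds=10) case
proposed elsewhere in this thread.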

On Thu, Jan 23, 2014, at 11:07 am, Shawn Hartsock harts...@acm.org wrote:

 I would like to have us adopt a memoizing caching library of some kind
 for use with OpenStack projects. I have no strong preference at this
 time and I would like suggestions on what to use.
 
 I have seen a number of patches where people have begun to implement
 their own caches in dictionaries. This typically confuses the code and
 mixes issues of correctness and performance in code.
 
 Here's an example:
 
 We start with:
 
 def my_thing_method(some_args):
     # do expensive work
     return value
 
 ... but a performance problem is detected... maybe the method is
 called 15 times in 10 seconds but then not again for 5 minutes and the
 return value can only logically change every minute or two... so we
 end up with ...
 
 _GLOBAL_THING_CACHE = {}
 
 def my_thing_method(some_args):
     key = key_from(some_args)
     if key in _GLOBAL_THING_CACHE:
         return _GLOBAL_THING_CACHE[key]
     else:
         # do expensive work
         _GLOBAL_THING_CACHE[key] = value
         return value
 
 ... which is all well and good... but now as a maintenance programmer
 I need to comprehend the cache mechanism, when cached values are
 invalidated, and if I need to debug the "do expensive work" part I
 need to tease out some test that prevents the cache from being hit.
 Plus I've introduced a new global variable. We love globals, right?
 
 I would like us to be able to say:
 
 @memoize(seconds=10)
 def my_thing_method(some_args):
     # do expensive work
     return value
 
 ... where we're clearly addressing the performance issue by
 introducing a cache and limiting its possible impact to 10 seconds,
 which allows for the idea that "do expensive work" has network calls
 to systems that may change state outside of this Python process.
 
 I'd like to see this done because I would like to have a place to
 point developers to during reviews... to say: "use common/memoizer" or
 "use Bob's awesome memoizer, because Bob has worked out all the cache
 problems already and you can just use it" instead of worrying about
 introducing new bugs by building your own cache.
 
 Does this make sense? I'd love to contribute something... but I wanted
 to understand why this state of affairs has persisted for a number of
 years... is there something I'm missing?
 

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [oslo] memoizer aka cache

2014-01-23 Thread Joshua Harlow
So to me memoizing is typically a premature optimization in a lot of cases. And 
doing it incorrectly leads to overfilling the Python process's memory (your 
global dict will have objects in it that can't be garbage collected, and with 
enough keys+values being stored it will act just like a memory leak; basically 
it acts as a new GC root object in a way) or more cache invalidation 
races/inconsistencies than just recomputing the initial value...
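
A sketch of that failure mode (with a stand-in for the expensive call):

def expensive(key):
    return key * 2  # stand-in for real expensive work

_CACHE = {}  # module-level dict: reachable from the module forever

def lookup(key):
    if key not in _CACHE:
        _CACHE[key] = expensive(key)  # nothing is ever evicted
    return _CACHE[key]

With an unbounded key space (say, one entry per request), _CACHE only
grows, and every cached value stays pinned for the life of the process.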

Overall though there are a few caching libraries I've seen being used, any of 
which could be used for memoization.

- https://github.com/openstack/oslo-incubator/tree/master/openstack/common/cache
- 
https://github.com/openstack/oslo-incubator/blob/master/openstack/common/memorycache.py
- dogpile cache @ https://pypi.python.org/pypi/dogpile.cache
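
For reference, dogpile.cache's documented decorator-style usage looks
roughly like this (a sketch using its in-memory backend):

from dogpile.cache import make_region

region = make_region().configure(
    'dogpile.cache.memory',  # backend; swappable for memcached, redis, etc.
    expiration_time=10,      # seconds before a cached value is recomputed
)

@region.cache_on_arguments()
def my_thing_method(some_args):
    # do expensive work
    return some_args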

I am personally wary of using them for memoization; for what expensive method 
calls do you see the complexity of this being useful? I didn't think that many 
method calls being done in OpenStack warranted the complexity added by doing 
this (premature optimization is the root of all evil...). Do you have data 
showing where it would be applicable/beneficial?

Sent from my really tiny device...

 On Jan 23, 2014, at 8:19 AM, Shawn Hartsock harts...@acm.org wrote:
 
 I would like to have us adopt a memoizing caching library of some kind
 for use with OpenStack projects. I have no strong preference at this
 time and I would like suggestions on what to use.
 
 I have seen a number of patches where people have begun to implement
 their own caches in dictionaries. This typically confuses the code and
 mixes issues of correctness and performance in code.
 
 Here's an example:
 
 We start with:
 
 def my_thing_method(some_args):
     # do expensive work
     return value
 
 ... but a performance problem is detected... maybe the method is
 called 15 times in 10 seconds but then not again for 5 minutes and the
 return value can only logically change every minute or two... so we
 end up with ...
 
 _GLOBAL_THING_CACHE = {}
 
 def my_thing_method(some_args):
     key = key_from(some_args)
     if key in _GLOBAL_THING_CACHE:
         return _GLOBAL_THING_CACHE[key]
     else:
         # do expensive work
         _GLOBAL_THING_CACHE[key] = value
         return value
 
 ... which is all well and good... but now as a maintenance programmer
 I need to comprehend the cache mechanism, when cached values are
 invalidated, and if I need to debug the "do expensive work" part I
 need to tease out some test that prevents the cache from being hit.
 Plus I've introduced a new global variable. We love globals, right?
 
 I would like us to be able to say:
 
 @memoize(seconds=10)
 def my_thing_method(some_args):
     # do expensive work
     return value
 
 ... where we're clearly addressing the performance issue by
 introducing a cache and limiting its possible impact to 10 seconds,
 which allows for the idea that "do expensive work" has network calls
 to systems that may change state outside of this Python process.
 
 I'd like to see this done because I would like to have a place to
 point developers to during reviews... to say: "use common/memoizer" or
 "use Bob's awesome memoizer, because Bob has worked out all the cache
 problems already and you can just use it" instead of worrying about
 introducing new bugs by building your own cache.
 
 Does this make sense? I'd love to contribute something... but I wanted
 to understand why this state of affairs has persisted for a number of
 years... is there something I'm missing?
 
 -- 
 # Shawn.Hartsock - twitter: @hartsock - plus.google.com/+ShawnHartsock
 
 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [oslo] memoizer aka cache

2014-01-23 Thread Stephen Gran

Hi,

First, I think common routines are great.  More DRY is always good.

Second, my personal feeling is that when you see a hard-coded in-memory 
cache like this, it's probably something that should be moved behind a 
more generic caching framework that allows for different backends, such 
as memcache for larger deployments.
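
To illustrate (a sketch assuming dogpile.cache; only the region
configuration changes with deployment size, while the decorated calling
code stays the same):

from dogpile.cache import make_region

# Small deployment: in-process memory backend.
region = make_region().configure(
    'dogpile.cache.memory',
    expiration_time=60,
)

# Larger deployment: same interface, memcached behind it (assumes the
# python-memcached client and a memcached server at 127.0.0.1:11211):
#
# region = make_region().configure(
#     'dogpile.cache.memcached',
#     expiration_time=60,
#     arguments={'url': ['127.0.0.1:11211']},
# )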


If you're interested in something like that, I'm sure it would be useful.

Cheers,

On 23/01/14 16:07, Shawn Hartsock wrote:

I would like to have us adopt a memoizing caching library of some kind
for use with OpenStack projects. I have no strong preference at this
time and I would like suggestions on what to use.

I have seen a number of patches where people have begun to implement
their own caches in dictionaries. This typically confuses the code and
mixes issues of correctness and performance in code.

Here's an example:

We start with:

def my_thing_method(some_args):
    # do expensive work
    return value

... but a performance problem is detected... maybe the method is
called 15 times in 10 seconds but then not again for 5 minutes and the
return value can only logically change every minute or two... so we
end up with ...

_GLOBAL_THING_CACHE = {}

def my_thing_method(some_args):
    key = key_from(some_args)
    if key in _GLOBAL_THING_CACHE:
        return _GLOBAL_THING_CACHE[key]
    else:
        # do expensive work
        _GLOBAL_THING_CACHE[key] = value
        return value

... which is all well and good... but now as a maintenance programmer
I need to comprehend the cache mechanism, when cached values are
invalidated, and if I need to debug the "do expensive work" part I
need to tease out some test that prevents the cache from being hit.
Plus I've introduced a new global variable. We love globals, right?

I would like us to be able to say:

@memoize(seconds=10)
def my_thing_method(some_args):
    # do expensive work
    return value

... where we're clearly addressing the performance issue by
introducing a cache and limiting its possible impact to 10 seconds,
which allows for the idea that "do expensive work" has network calls
to systems that may change state outside of this Python process.

I'd like to see this done because I would like to have a place to
point developers to during reviews... to say: "use common/memoizer" or
"use Bob's awesome memoizer, because Bob has worked out all the cache
problems already and you can just use it" instead of worrying about
introducing new bugs by building your own cache.

Does this make sense? I'd love to contribute something... but I wanted
to understand why this state of affairs has persisted for a number of
years... is there something I'm missing?




--
Stephen Gran
Senior Systems Integrator - theguardian.com


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [oslo] memoizer aka cache

2014-01-23 Thread Doug Hellmann
The fact that it is already in the requirements list makes it a top
contender in my mind, unless we find some major issue with it.

Doug


On Thu, Jan 23, 2014 at 4:56 PM, Morgan Fainberg m...@metacloud.com wrote:

 Keystone uses dogpile.cache and I am making an effort to add it into the
 oslo incubator cache library that was recently merged.

 Cheers,
 --Morgan


 On Thu, Jan 23, 2014 at 1:35 PM, Renat Akhmerov rakhme...@mirantis.com wrote:


 On 23 Jan 2014, at 08:41, Joshua Harlow harlo...@yahoo-inc.com wrote:

  So to me memoizing is typically a premature optimization in a lot of
 cases. And doing it incorrectly leads to overfilling the Python process's
 memory (your global dict will have objects in it that can't be garbage
 collected, and with enough keys+values being stored it will act just like a
 memory leak; basically it acts as a new GC root object in a way) or more
 cache invalidation races/inconsistencies than just recomputing the initial
 value...

 I agree with your concerns here. At the same time, I think this thinking
 shouldn't cancel cases of conscious usage of caching techniques. A decent
 cache implementation would help to solve lots of performance problems
 (which eventually become a concern for any project).

  Overall though there are a few caching libraries I've seen being used,
 any of which could be used for memoization.
 
  -
 https://github.com/openstack/oslo-incubator/tree/master/openstack/common/cache
  -
 https://github.com/openstack/oslo-incubator/blob/master/openstack/common/memorycache.py

 I looked at the code. I have lots of questions about the implementation
 (like cache eviction policies and whether or not it works well with green
 threads, but I think that's a subject for a separate discussion). Could you
 please share your experience of using it? Were there specific problems that
 you could point to? Maybe they are already described somewhere?

  - dogpile cache @ https://pypi.python.org/pypi/dogpile.cache

 This one looks really interesting in terms of its claimed feature set. It
 seems to be compatible with Python 2.7, though I'm not sure about 2.6. As
 above, it would be cool if you told us about your experience with it.


  I am personally wary of using them for memoization; for what expensive
 method calls do you see the complexity of this being useful? I didn't think
 that many method calls being done in OpenStack warranted the complexity
 added by doing this (premature optimization is the root of all evil...). Do
 you have data showing where it would be applicable/beneficial?

 I believe there are a great many use cases, like caching db objects or,
 more generally, caching any heavy objects involving interprocess
 communication. For instance, API clients may be caching objects that are
 known to be immutable on the server side.


 
  Sent from my really tiny device...
 
  On Jan 23, 2014, at 8:19 AM, Shawn Hartsock harts...@acm.org
 wrote:
 
  I would like to have us adopt a memoizing caching library of some kind
  for use with OpenStack projects. I have no strong preference at this
  time and I would like suggestions on what to use.
 
  I have seen a number of patches where people have begun to implement
  their own caches in dictionaries. This typically confuses the code and
  mixes issues of correctness and performance in code.
 
  Here's an example:
 
  We start with:
 
  def my_thing_method(some_args):
      # do expensive work
      return value
 
  ... but a performance problem is detected... maybe the method is
  called 15 times in 10 seconds but then not again for 5 minutes and the
  return value can only logically change every minute or two... so we
  end up with ...
 
  _GLOBAL_THING_CACHE = {}
 
  def my_thing_method(some_args):
      key = key_from(some_args)
      if key in _GLOBAL_THING_CACHE:
          return _GLOBAL_THING_CACHE[key]
      else:
          # do expensive work
          _GLOBAL_THING_CACHE[key] = value
          return value
 
  ... which is all well and good... but now as a maintenance programmer
  I need to comprehend the cache mechanism, when cached values are
  invalidated, and if I need to debug the "do expensive work" part I
  need to tease out some test that prevents the cache from being hit.
  Plus I've introduced a new global variable. We love globals, right?
 
  I would like us to be able to say:
 
  @memoize(seconds=10)
  def my_thing_method(some_args):
      # do expensive work
      return value
 
  ... where we're clearly addressing the performance issue by
  introducing a cache and limiting its possible impact to 10 seconds,
  which allows for the idea that "do expensive work" has network calls
  to systems that may change state outside of this Python process.
 
  I'd like to see this done because I would like to have a place to
  point developers to during reviews... to say: "use common/memoizer" or
  "use Bob's awesome memoizer, because Bob has worked out all the cache
  problems already and you can just use it" instead of worrying about
  introducing new bugs by 

Re: [openstack-dev] [oslo] memoizer aka cache

2014-01-23 Thread Joshua Harlow
Sure, no cancelling cases of conscious usage, but we need to be careful
here and make sure it's really appropriate. Caching and invalidation
techniques are right up there in terms of problems that appear easy and
simple to initially do/use, but doing it correctly is really really hard
(especially at any type of scale).

-Josh

On 1/23/14, 1:35 PM, Renat Akhmerov rakhme...@mirantis.com wrote:


On 23 Jan 2014, at 08:41, Joshua Harlow harlo...@yahoo-inc.com wrote:

 So to me memoizing is typically a premature optimization in a lot of
cases. And doing it incorrectly leads to overfilling the Python
process's memory (your global dict will have objects in it that can't be
garbage collected, and with enough keys+values being stored it will act
just like a memory leak; basically it acts as a new GC root object in a
way) or more cache invalidation races/inconsistencies than just
recomputing the initial value…

I agree with your concerns here. At the same time, I think this thinking
shouldn't cancel cases of conscious usage of caching techniques. A decent
cache implementation would help to solve lots of performance problems
(which eventually become a concern for any project).

 Overall though there are a few caching libraries I've seen being used,
any of which could be used for memoization.
 
 - 
https://github.com/openstack/oslo-incubator/tree/master/openstack/common/cache
 - 
https://github.com/openstack/oslo-incubator/blob/master/openstack/common/memorycache.py

I looked at the code. I have lots of questions about the implementation (like
cache eviction policies and whether or not it works well with green threads,
but I think that's a subject for a separate discussion). Could you
please share your experience of using it? Were there specific problems
that you could point to? Maybe they are already described somewhere?

 - dogpile cache @ https://pypi.python.org/pypi/dogpile.cache

This one looks really interesting in terms of its claimed feature set. It
seems to be compatible with Python 2.7, though I'm not sure about 2.6. As
above, it would be cool if you told us about your experience with it.


 I am personally wary of using them for memoization; for what expensive
method calls do you see the complexity of this being useful? I didn't
think that many method calls being done in OpenStack warranted the
complexity added by doing this (premature optimization is the root of
all evil...). Do you have data showing where it would be
applicable/beneficial?

I believe there are a great many use cases, like caching db objects or,
more generally, caching any heavy objects involving interprocess
communication. For instance, API clients may be caching objects that are
known to be immutable on the server side.
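
A sketch of that last case (hypothetical client code; it assumes the
fetched objects really are immutable once created server-side, so there
is no invalidation problem, only a size bound to think about):

_IMMUTABLE_CACHE = {}

def get_object(client, object_id):
    obj = _IMMUTABLE_CACHE.get(object_id)
    if obj is None:
        obj = client.get(object_id)  # one network round-trip, first time only
        _IMMUTABLE_CACHE[object_id] = obj
    return obj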


 
 Sent from my really tiny device...
 
 On Jan 23, 2014, at 8:19 AM, Shawn Hartsock harts...@acm.org wrote:
 
 I would like to have us adopt a memoizing caching library of some kind
 for use with OpenStack projects. I have no strong preference at this
 time and I would like suggestions on what to use.
 
 I have seen a number of patches where people have begun to implement
 their own caches in dictionaries. This typically confuses the code and
 mixes issues of correctness and performance in code.
 
 Here's an example:
 
 We start with:
 
 def my_thing_method(some_args):
     # do expensive work
     return value
 
 ... but a performance problem is detected... maybe the method is
 called 15 times in 10 seconds but then not again for 5 minutes and the
 return value can only logically change every minute or two... so we
 end up with ...
 
 _GLOBAL_THING_CACHE = {}
 
 def my_thing_method(some_args):
     key = key_from(some_args)
     if key in _GLOBAL_THING_CACHE:
         return _GLOBAL_THING_CACHE[key]
     else:
         # do expensive work
         _GLOBAL_THING_CACHE[key] = value
         return value
 
 ... which is all well and good... but now as a maintenance programmer
 I need to comprehend the cache mechanism, when cached values are
 invalidated, and if I need to debug the "do expensive work" part I
 need to tease out some test that prevents the cache from being hit.
 Plus I've introduced a new global variable. We love globals, right?
 
 I would like us to be able to say:
 
 @memoize(seconds=10)
 def my_thing_method(some_args):
     # do expensive work
     return value
 
 ... where we're clearly addressing the performance issue by
 introducing a cache and limiting its possible impact to 10 seconds,
 which allows for the idea that "do expensive work" has network calls
 to systems that may change state outside of this Python process.
 
 I'd like to see this done because I would like to have a place to
 point developers to during reviews... to say: "use common/memoizer" or
 "use Bob's awesome memoizer, because Bob has worked out all the cache
 problems already and you can just use it" instead of worrying about
 introducing new bugs by building your own cache.
 
 Does this make sense? I'd love to contribute something... but I wanted
 to understand why this state of 

Re: [openstack-dev] [oslo] memoizer aka cache

2014-01-23 Thread Joshua Harlow
Same here; I've done pretty big memcache (and similar technologies) scale 
caching & invalidations at Y! before so here to help…

From: Morgan Fainberg m...@metacloud.com
Reply-To: OpenStack Development Mailing List (not for usage questions) 
openstack-dev@lists.openstack.org
Date: Thursday, January 23, 2014 at 4:17 PM
To: OpenStack Development Mailing List (not for usage questions) 
openstack-dev@lists.openstack.org
Subject: Re: [openstack-dev] [oslo] memoizer aka cache

Yes! There is a reason Keystone has a very small footprint of 
caching/invalidation done so far.  It really needs to be correct when it comes 
to proper invalidation logic.  I am happy to offer some help in determining 
logic for caching/invalidation with Dogpile.cache in mind as we get it into 
oslo and available for all to use.

--Morgan



On Thu, Jan 23, 2014 at 2:54 PM, Joshua Harlow harlo...@yahoo-inc.com wrote:
Sure, no cancelling cases of conscious usage, but we need to be careful
here and make sure it's really appropriate. Caching and invalidation
techniques are right up there in terms of problems that appear easy and
simple to initially do/use, but doing it correctly is really really hard
(especially at any type of scale).

-Josh

On 1/23/14, 1:35 PM, Renat Akhmerov rakhme...@mirantis.com wrote:


On 23 Jan 2014, at 08:41, Joshua Harlow harlo...@yahoo-inc.com wrote:

 So to me memoizing is typically a premature optimization in a lot of
cases. And doing it incorrectly leads to overfilling the Python
process's memory (your global dict will have objects in it that can't be
garbage collected, and with enough keys+values being stored it will act
just like a memory leak; basically it acts as a new GC root object in a
way) or more cache invalidation races/inconsistencies than just
recomputing the initial value…

I agree with your concerns here. At the same time, I think this thinking
shouldn't cancel cases of conscious usage of caching techniques. A decent
cache implementation would help to solve lots of performance problems
(which eventually become a concern for any project).

 Overall though there are a few caching libraries I've seen being used,
any of which could be used for memoization.

 -
https://github.com/openstack/oslo-incubator/tree/master/openstack/common/cache
 -
https://github.com/openstack/oslo-incubator/blob/master/openstack/common/memorycache.py

I looked at the code. I have lots of questions about the implementation (like
cache eviction policies and whether or not it works well with green threads,
but I think that's a subject for a separate discussion). Could you
please share your experience of using it? Were there specific problems
that you could point to? Maybe they are already described somewhere?

 - dogpile cache @ https://pypi.python.org/pypi/dogpile.cache

This one looks really interesting in terms of its claimed feature set. It
seems to be compatible with Python 2.7, though I'm not sure about 2.6. As
above, it would be cool if you told us about your experience with it.


 I am personally wary of using them for memoization; for what expensive
method calls do you see the complexity of this being useful? I didn't
think that many method calls being done in OpenStack warranted the
complexity added by doing this (premature optimization is the root of
all evil...). Do you have data showing where it would be
applicable/beneficial?

I believe there are a great many use cases, like caching db objects or,
more generally, caching any heavy objects involving interprocess
communication. For instance, API clients may be caching objects that are
known to be immutable on the server side.



 Sent from my really tiny device...

 On Jan 23, 2014, at 8:19 AM, Shawn Hartsock harts...@acm.org wrote:

 I would like to have us adopt a memoizing caching library of some kind
 for use with OpenStack projects. I have no strong preference at this
 time and I would like suggestions on what to use.

 I have seen a number of patches where people have begun to implement
 their own caches in dictionaries. This typically confuses the code and
 mixes issues of correctness and performance in code.

 Here's an example:

 We start with:

 def my_thing_method(some_args):
     # do expensive work
     return value

 ... but a performance problem is detected... maybe the method is
 called 15 times in 10 seconds but then not again for 5 minutes and the
 return value can only logically change every minute or two... so we
 end up with ...

 _GLOBAL_THING_CACHE = {}

 def my_thing_method(some_args):
     key = key_from(some_args)
     if key in _GLOBAL_THING_CACHE:
         return _GLOBAL_THING_CACHE[key]
     else:
         # do expensive work
         _GLOBAL_THING_CACHE[key] = value
         return value

 ... which is all well and good... but now as a maintenance programmer
 I