Re: [Components] Cache: Design for hierarchical caching

Tobias Schlitt Wed, 09 Apr 2008 06:59:10 -0700

Hi Derick!

On 04/09/2008 02:13 PM Derick Rethans wrote:
> On Mon, 7 Apr 2008, Tobias Schlitt wrote:


>> I just finished the first draft of my design for hierarchical (multi
>> level) caching in the Cache component. Please take some minutes and
>> review my proposal. You can find the RST in SVN under
>>
>> trunk/Cache/design/design-1.4.txt

> Some comments:

>> Hierarchical stacking
>> ---------------------

> [snip]

>> - **Bubbling restored data up through the cache stack**
>>   According to the replacement strategy, cache items need to be placed into
>>   higher levels of the hierarchy, as soon as they get restored from a deeper
>>   level, to make them available faster on subsequent restore requests. For
>>   more information see the `Replacement strategies`_ section.

> Sometimes you might not want the items stored in a higher stack level,
> for example because of data size reasons. As an example, you want a 1mb
> PDF file stored in a cache on disk, but definitely not in the memory
> cache. How are we going to handle this?

You need to create a dedicated cache storage for these files.

>> Propagate on replacement
>> ^^^^^^^^^^^^^^^^^^^^^^^^
>>
>> Using this strategy, a newly stored cache item is only put into the top most
>> storage in the hierarchy. As soon as it needs to be replaced there, it is
>> propagated down one level, before being removed from the higher level cache.
>>
>> Pros:
>>
>> - The initial storage of a cache item is faster, since it only affects one
>>   storage. In addition, this should be the fastest storage (the top most).
>> - Purging an item affects only a single cache.
>> - If all storages reached their maximum number of stored items, only a single
>>   item is bubbled down to the lowest level.
>>
>> Cons:
>>
>> - Additional work is to be done on each replacement of a cache item.

> Another con: 
> 
> - If the cache storage disappears (for an in-memory storage) then the
>   item hasn't been cached in the lower levels yet.

Yes, exactly.

> I think we should not implement this at all, but just the "Propagate on
> store" variant. It can also make things simpler in the API and thus
> provide more performance.

I also thought about this and agree with you. We can then leave out the
complete storage strategy class layer. I will still keep my eyes open
during implementation so that we can add it later if needed.
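
To illustrate what "Propagate on store" means in practice, here is a rough
sketch. SketchStorage, sketchStackStore() and sketchStackRestore() are
made-up names for illustration only and are not part of the Cache component
or the design draft. The point is that every store hits all levels, so an
item survives when the in-memory level disappears.

```php
<?php
// Hypothetical in-memory stand-in for a cache storage (illustration only).
class SketchStorage
{
    public $items = array();

    public function store( $id, $data )
    {
        $this->items[$id] = $data;
    }

    public function restore( $id )
    {
        return isset( $this->items[$id] ) ? $this->items[$id] : false;
    }
}

// "Propagate on store": every write goes to all levels of the stack.
function sketchStackStore( array $storages, $id, $data )
{
    foreach ( $storages as $storage )
    {
        $storage->store( $id, $data );
    }
}

// Restore from the first level that has the item, top most level first.
function sketchStackRestore( array $storages, $id )
{
    foreach ( $storages as $storage )
    {
        $data = $storage->restore( $id );
        if ( $data !== false )
        {
            return $data;
        }
    }
    return false;
}

$memory = new SketchStorage();
$disk   = new SketchStorage();
sketchStackStore( array( $memory, $disk ), 'page-1', 'cached content' );

// Simulate the in-memory level disappearing: the item is still on disk.
$memory->items = array();
$restored = sketchStackRestore( array( $memory, $disk ), 'page-1' );
```

The price is that each store touches every level, which is the trade-off
already listed for this variant.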

>> ezcCacheStack
>> -------------
>>
>> An object of the ezcCacheStack class is the main instance to provide the
>> hierarchical stack mechanism. The stack object takes care of managing several
>> cache storages, the unified access for storing and restoring cache items and
>> the associated objects needed to realize this.

> [snip]

>>     class ezcCacheStack extends ezcCacheStorage
>>     {
>>         public function __construct( $location, $options );
>>         public function store( $id, $data, $attributes = array() );
>>         public function restore( $id, $attributes, $search );
>>         public function delete( $id, $attributes, $search );
>>         public function countDataItems( $id, $attributes );
>>         public function getRemainingLifetime( $id, $attributes );
>>
>>         public function getStackedCaches();
>>     }

> I miss a method to add a new storage to the stack. I know you mention
> this as an option in "ezcCacheStackOptions", but I don't think this fits
> as an option easily. I would go for a method that adds a new storage
> configuration to the bottom of the "stack" here.

I don't see the point here. In general you want to configure the stack
once and never change it afterwards. Could you give me an example of a
use case for such a method?

>> ezcCacheStackableStorage
>> ------------------------
>>
>> The interface ezcCacheStackableStorage is used to ensure that storage
>> classes that can be stacked implement the necessary functionality. The
>> following methods are needed: ::
>>
>>     interface ezcCacheStackableStorage
>>     {
>>         restoreMetaInfo();
>>         storeMetaInfo( array $metaInfo );
>>
>>         purge();

> Maybe we can add an option to purge() to clear out the whole cache?

I would suggest adding a separate method for this, named empty() or
something similar.

> [snip]

>> The purge method is needed to make the storage purge all outdated items. In
>> case a cache storage runs full (determined by the replacement strategy),
>> first all outdated items will be purged, before items are deleted using the
>> original strategy. The purge() method needs to return the IDs, attributes
>> and data of the purged items to allow the replacement and storage strategy
>> objects to update their information.

> I am not sure that it's very wise to return the data as well. This
> can be a lot of stuff...

Ah, I think I added this sentence in the wrong place. There is no need
to return the data of outdated cache items. Only the IDs need to be
returned so that the meta data can be updated accordingly. The data and
attributes are only needed if an item has to be bubbled downwards because
it is replaced due to the replacement strategy. Since we don't want to
implement a bubble-down strategy, we don't need such a method at all.
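
A minimal sketch of what I mean, with a purge that hands back nothing but
IDs. The function sketchPurge(), the array layout and the per-ID counters in
$meta are assumptions made up for this example; the real storages and meta
data will look different.

```php
<?php
// Hypothetical helper: removes outdated items and returns only their IDs.
// $items maps ID => array( 'data' => ..., 'stored' => timestamp ).
function sketchPurge( array &$items, $ttl, $now )
{
    $purgedIds = array();
    foreach ( array_keys( $items ) as $id )
    {
        if ( $now - $items[$id]['stored'] > $ttl )
        {
            unset( $items[$id] );
            $purgedIds[] = $id;
        }
    }
    return $purgedIds;
}

$items = array(
    'old' => array( 'data' => 'a', 'stored' => 100 ),
    'new' => array( 'data' => 'b', 'stored' => 190 ),
);
// Assumed meta data: one access counter per ID.
$meta = array( 'old' => 3, 'new' => 7 );

// Only the IDs come back; they are enough to clean up the meta data.
foreach ( sketchPurge( $items, 60, 200 ) as $id )
{
    unset( $meta[$id] );
}
```

The possibly large item data never leaves the storage this way.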

>> ezcCacheStackOptions
>> --------------------
>>
>> An object of this class is used to configure the cache stack. It extends the
>> ezcCacheStorageOptions class, to be compatible with all other mechanisms. The
>> 'ttl' and 'extension' options are ignored, because each of the stacked caches
>> must be able to implement its own set of options. The following options are
>> part of this class:
>>
>> 'storageStrategy'
>>     This option contains a class name, which is to be instantiated to perform
>>     storage operations in the stack. The class must extend the abstract
>>     ezcCacheStackStorageStrategy class.
>> 'storages'
>>     This option is an array of ezcCacheStackStorageConfiguration objects
>>     that will be used to define the cache storages contained in the stack. Per
>>     default, no storages will be defined. In this case, a call to any of the
>>     methods defined by ezcCacheStorage will result in an exception. 

> I don't think we should have either of those options. As I just mentioned,
> storages should be maintained through methods and the first one,
> storageStrategy I suggest not to have at all to make the implementation
> easier. That means no different stack-storage-strategies.

If we go this way, we cannot support the delayed initialization
mechanism provided by ezcCacheManager for the stack. If this is OK, I
don't have a problem with it. ezcCacheStack would then not extend
ezcCacheStorage and we can design its API from scratch.

>> ezcCacheStackStorageConfiguration
>> ---------------------------------
>>
>> An instance of this struct-like class is used in the 'storages' option of
>> ezcCacheStackOptions to define the configuration of a single cache storage.
>> It contains the 3 parameters necessary to instantiate a new cache storage, as
>> well as additional information, needed by the ezcCacheStack instance and its
>> aggregated objects. The properties are contained in the class:

> [snip]

>> 'freeRate'
>>     This option is an integer that indicates a percentage value. In case the
>>     cache storage runs full, this amount of items will be removed. This
>>     mechanism ensures that running full of a cache does not occur too often.

> I don't think this should be an integer in percent, but instead just a
> floating point number from 0 to 1. This is also mentioned a bit further here:

Sure, I forgot to update this.
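
With a float between 0 and 1, the number of items to free could be computed
roughly like this. sketchItemsToFree() is a made-up helper; rounding up and
freeing at least one item are my assumptions here and are not fixed in the
design yet.

```php
<?php
// Hypothetical helper: how many items to free when the storage runs full.
// $limit is the maximum item count, $freeRate a float between 0 and 1.
function sketchItemsToFree( $limit, $freeRate )
{
    if ( $freeRate < 0 || $freeRate > 1 )
    {
        throw new InvalidArgumentException( 'freeRate must be between 0 and 1.' );
    }
    // Assumption: round up and always free at least one item, so that a
    // full storage gets room again even with a very small free rate.
    return max( 1, (int) ceil( $limit * $freeRate ) );
}

$toFree = sketchItemsToFree( 100, 0.25 );
```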

>> ezcCacheReplacementStrategy
>> ---------------------------

> [snip]

>> The constructor of a replacement strategy receives a ready to use storage
>> object, which fulfills all needs by implementing the
>> ezcCacheStackableStorage interface. The $limit parameter indicates the
>> maximum number of items to be stored in the storage. The free rate is a
>> percentage value, that indicates how many items are to be purged, whenever a
>> cache runs full.

> [snip]

>> The restore() method is less complex than the store() method. Its
>> algorithm is defined by the following pseudo code: ::
>>
>>     $storage->lock();
>>     $item = $storage->restore(...);
>>
>>     if ( $item !== false )
>>     {
>>         // Notice access to this item in meta information
>>         $meta = $storage->restoreMetaInfo();
>>         update( $meta );
>>         $storage->storeMetaInfo( $meta );

> If $meta would be an object, or returned by reference, there would be no
> need for storeMetaInfo() right here.

I think we should keep it. The problem is that we need to restore the
meta info every time and store it again while the storage is locked. If
we omit the storeMetaInfo() method, there is no way for the storage to
know when the meta info needs to be stored again, except to store it
automatically inside unlock(). This is quite dirty and non-transparent,
so I prefer to keep the explicit calls.
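
Spelled out as a small sketch with explicit calls: the class
SketchStackableStorage, the function sketchRestore() and the access counter
used as meta info are made up for illustration. Only the method names
lock(), unlock(), restoreMetaInfo() and storeMetaInfo() come from the
discussion above.

```php
<?php
// Hypothetical in-memory stand-in for a stackable storage (illustration only).
class SketchStackableStorage
{
    public $items = array();
    private $meta = array();
    private $locked = false;

    public function lock()
    {
        $this->locked = true;
    }

    public function unlock()
    {
        $this->locked = false;
    }

    public function restore( $id )
    {
        return isset( $this->items[$id] ) ? $this->items[$id] : false;
    }

    public function restoreMetaInfo()
    {
        return $this->meta;
    }

    public function storeMetaInfo( array $metaInfo )
    {
        $this->meta = $metaInfo;
    }
}

function sketchRestore( SketchStackableStorage $storage, $id )
{
    $storage->lock();
    $item = $storage->restore( $id );
    if ( $item !== false )
    {
        // Count the access, then write the meta info back explicitly
        // while the storage is still locked.
        $meta = $storage->restoreMetaInfo();
        $meta[$id] = isset( $meta[$id] ) ? $meta[$id] + 1 : 1;
        $storage->storeMetaInfo( $meta );
    }
    $storage->unlock();
    return $item;
}

$storage = new SketchStackableStorage();
$storage->items['a'] = 'x';
$first  = sketchRestore( $storage, 'a' );
$second = sketchRestore( $storage, 'a' );
$miss   = sketchRestore( $storage, 'missing' );
$meta   = $storage->restoreMetaInfo();
```

A cache miss never writes meta info here, which would be harder to see if
the write were hidden inside unlock().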

> One thing that I miss in the design is the performance impact on
> restoring items from the cache, perhaps you could add something about
> that?

You mean the overhead in using the stack at all?

Thanks for your input!
Regards,
Toby
-- 
Mit freundlichen Grüßen / Med vennlig hilsen / With kind regards

Tobias Schlitt (GPG: 0xC462BC14) eZ Components Developer

[EMAIL PROTECTED] | eZ Systems AS | ez.no