[ 
https://issues.apache.org/jira/browse/SAMZA-424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155646#comment-14155646
 ] 

Sriram Subramanian commented on SAMZA-424:
------------------------------------------

Copying my response into the JIRA.

The differences between the cache and the store are subtle. There are a few
options here:

1. Merge the configs and do not differentiate explicitly between the key value 
store and the cache. The new cache configs are just extended configs for the 
key value store. The pro of this approach is that it avoids creating a new API 
and lets us use the existing store types. The con is that it becomes unclear 
which configs to use and whether the configs can be made to work across all 
underlying key value stores. Also, what does caching mean here? It looks like 
we just want to provide an eviction policy for the store, and we should avoid 
calling it an explicit cache mode.

2. Have a cache API that only exposes get and flush. In this model, we would 
define the cache store explicitly in the config and also provide the cache 
factory. For example, if we just wanted a cache backed by Voldemort, we could 
simply have a VoldemortCacheStore that populates the store on demand. It could 
have an option to write the changes to the changelog, which would help avoid a 
cold start. The pro of this approach is that everything happens behind the API 
and the framework user simply calls get when they need a value. This is not 
possible in option 1, since users would have to explicitly put messages into 
the store after reading them from a remote store. With option 1, that code also 
could not be shared, even if a cache store backed by a remote store turns out 
to be a common use case. The con is that it introduces a new store that is 
strictly read only and limits the caching functionality to just reads.

3. We could also do both 1 and 2. The key value store could have an eviction 
policy to bound its memory, and the cache store would be used explicitly for 
cases where we want a backing store and want the store to do all the heavy 
lifting of populating the cache.
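
To make option 2 a bit more concrete, here is a rough sketch of what a 
get/flush-only cache that populates itself on demand might look like. None of 
these names exist in Samza: CacheStore, ReadThroughCacheStore, the Function 
standing in for a remote store such as Voldemort, and the BiConsumer standing 
in for a changelog writer are all made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.function.Function;

/** Hypothetical read-only cache API: only get and flush are exposed. */
interface CacheStore<K, V> {
    V get(K key);
    void flush();
}

/** Hypothetical cache that fills itself from a backing store on a miss. */
class ReadThroughCacheStore<K, V> implements CacheStore<K, V> {
    private final Map<K, V> local = new HashMap<>();
    // Entries loaded since the last flush, waiting to go to the changelog.
    private final Map<K, V> pending = new HashMap<>();
    private final Function<K, V> backingStore;  // stand-in for a remote lookup
    private final BiConsumer<K, V> changelog;   // stand-in for a changelog writer
    private int misses = 0;

    ReadThroughCacheStore(Function<K, V> backingStore, BiConsumer<K, V> changelog) {
        this.backingStore = backingStore;
        this.changelog = changelog;
    }

    @Override
    public V get(K key) {
        V value = local.get(key);
        if (value == null) {
            // Miss: the store, not the caller, fetches from the backing store.
            misses++;
            value = backingStore.apply(key);
            if (value != null) {
                local.put(key, value);
                pending.put(key, value);
            }
        }
        return value;
    }

    @Override
    public void flush() {
        // Write newly loaded entries to the changelog so that a restart
        // could warm the cache instead of starting cold.
        pending.forEach(changelog);
        pending.clear();
    }

    int misses() {
        return misses;
    }
}
```

The caller only ever calls get; the put into the local map happens behind the 
API, which is the part that option 1 would leave to user code.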

> Add a Cache state API to the Samza container
> --------------------------------------------
>
>                 Key: SAMZA-424
>                 URL: https://issues.apache.org/jira/browse/SAMZA-424
>             Project: Samza
>          Issue Type: New Feature
>          Components: container
>            Reporter: Chinmay Soman
>            Assignee: Chinmay Soman
>         Attachments: SAMZA-424-Cache-API_0.pdf
>
>
> There are cases when the user code needs access to a 'cache' which can be 
> used to store custom data. This cache is different from the KeyValue store in 
> the following ways:
> * At the very least, needs to support LRU (Least Recently Used) and TTL (Time 
> To Live) eviction strategies
> * May not support all() and range() operations (since these wreak havoc with 
> the eviction operation)
> * Needs to exist at a per task or a per container level.
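
For reference, a minimal sketch of the two eviction strategies named above, 
LRU and TTL, applied to an in-memory map. This is only an illustration built 
on java.util.LinkedHashMap, not anything from the Samza code base; the class 
name LruTtlCache, its constructor parameters, and the injected clock are 
assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongSupplier;

/** Hypothetical bounded cache with LRU eviction and a per-entry TTL. */
class LruTtlCache<K, V> {
    /** A value together with the time at which it expires. */
    private static final class Timed<V> {
        final V value;
        final long expiresAtMillis;

        Timed(V value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final long ttlMillis;
    private final LongSupplier clock;  // injected so that time can be controlled
    private final LinkedHashMap<K, Timed<V>> map;

    LruTtlCache(final int maxEntries, long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
        // accessOrder = true keeps entries ordered from least to most recently used.
        this.map = new LinkedHashMap<K, Timed<V>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, Timed<V>> eldest) {
                // LRU: drop the least recently used entry once over capacity.
                return size() > maxEntries;
            }
        };
    }

    V get(K key) {
        Timed<V> entry = map.get(key);  // also marks the key as recently used
        if (entry == null) {
            return null;
        }
        if (clock.getAsLong() >= entry.expiresAtMillis) {
            // TTL: expired entries are removed lazily on access.
            map.remove(key);
            return null;
        }
        return entry.value;
    }

    void put(K key, V value) {
        map.put(key, new Timed<>(value, clock.getAsLong() + ttlMillis));
    }

    int size() {
        return map.size();
    }
}
```

Note that the sketch deliberately offers no all() or range(): iterating an 
access-ordered map while entries are being evicted or expired is exactly the 
kind of interaction the description warns about.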



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)