The differences between the cache and the store are subtle. There are a few options here:

1. Merge the configs and do not differentiate explicitly between the key-value store and the cache. The new cache configs are just extended configs for the key-value store. The advantage of this approach is that it avoids creating a new API and lets us reuse the existing store types. The disadvantage is that it becomes unclear which configs to use, and whether the configs can be made to work across all underlying key-value stores. Also, what does caching mean here? It looks like we just want to provide an eviction policy for the store, so we should avoid calling it an explicit cache mode.

2. Have a cache API that exposes only get and flush. In this model, we would define the cache store explicitly in the config and also provide the cache factory. For example, if we just wanted a cache backed by Voldemort, we could simply have a VoldemortCacheStore that populates the store on demand. It could have an option to write the changes to the changelog, which would help avoid a cold start. The advantage of this approach is that everything happens behind the API, and the framework user simply calls get when needed. This is not possible in option 1, since users would have to explicitly put messages into the store after reading them from a remote store. With option 1, that population code also cannot be shared, even if a cache backed by a remote store is a common use case. The disadvantage is that it introduces a new store that is strictly read-only and limits the caching functionality to reads.

3. Do both 1 and 2. The key-value store could have an eviction policy to bound memory, and the cache store would be used explicitly for cases where we want a backing store and want the store to do all the heavy lifting of populating the cache.
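
To make option 2 more concrete, here is a rough sketch of what a read-through cache store might look like. Everything in it is an assumption for illustration: CacheStore and ReadThroughCacheStore are hypothetical names, not existing Samza APIs, and a plain function stands in for a real remote client such as Voldemort. It uses LRU eviction to bound memory; flush is left as a no-op where a real implementation might write to the changelog.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical API for option 2: callers only see get and flush.
interface CacheStore<K, V> {
    V get(K key);
    void flush();
}

// Read-through cache with LRU eviction. The remoteLoader is a stand-in
// for a remote store lookup (an assumption, not a real Voldemort client).
final class ReadThroughCacheStore<K, V> implements CacheStore<K, V> {
    private final Function<K, V> remoteLoader;
    private final Map<K, V> entries;

    ReadThroughCacheStore(final int maxEntries, Function<K, V> remoteLoader) {
        this.remoteLoader = remoteLoader;
        // accessOrder=true keeps the least recently used entry eldest.
        this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries;
            }
        };
    }

    @Override
    public V get(K key) {
        V value = entries.get(key);
        if (value == null) {
            // Cache miss: populate on demand from the backing store.
            value = remoteLoader.apply(key);
            if (value != null) {
                entries.put(key, value);
            }
        }
        return value;
    }

    @Override
    public void flush() {
        // No-op in this sketch. A real implementation might write
        // pending entries to the changelog here to avoid a cold start.
    }

    // Number of entries currently held in memory.
    int cachedEntryCount() {
        return entries.size();
    }
}
```

With this shape, the task code would only ever call cache.get(key). The put after a remote read, which option 1 would push onto the user, happens inside the store.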

On Wed, Oct 1, 2014 at 10:50 AM, Chris Riccomini (JIRA) <[email protected]> wrote:

>
> [ https://issues.apache.org/jira/browse/SAMZA-424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155191#comment-14155191 ]
>
> Chris Riccomini commented on SAMZA-424:
> ---------------------------------------
>
> I think what we have now is composable config-based assembly, which is
> what we want to avoid. Composability in code is probably OK, but a more
> drastic change. If we could find a way to get pre-defined template config
> working in a good way for the stores, I think that's ideal, since it's the
> closest to what we've already got.
>
> > Add a Cache state API to the Samza container
> > --------------------------------------------
> >
> >                 Key: SAMZA-424
> >                 URL: https://issues.apache.org/jira/browse/SAMZA-424
> >             Project: Samza
> >          Issue Type: New Feature
> >          Components: container
> >            Reporter: Chinmay Soman
> >            Assignee: Chinmay Soman
> >         Attachments: SAMZA-424-Cache-API_0.pdf
> >
> >
> > There are cases when the user code needs access to a 'cache' which can
> > be used to store custom data. This cache is different from the KeyValue
> > store in the following ways:
> > * At the very least, needs to support LRU (Least Recently Used) and TTL
> > (Time To Live) eviction strategies
> > * May not support all() and range() operations (since this wreaks havoc
> > with the eviction operation)
> > * Needs to exist at a per task or a per container level.
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v6.3.4#6332)
>
