[ 
https://issues.apache.org/jira/browse/SAMZA-424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14165519#comment-14165519
 ] 

Chinmay Soman commented on SAMZA-424:
-------------------------------------

Sorry for creating that confusion, [~martinkl]. I should've added an 
introductory paragraph in the doc. 

bq. Is the idea to separate the caching concern from the storage engine 
concern, so that the same caching layer can be reused with different storage 
engines?

Yes, in a way. You're right in the sense that we could change the underlying 
storage engine to something like a 'VoldemortBackingStore' (which can fetch the 
records from a remote DB). You could also choose not to have any underlying 
storage engine at all. In addition, we need to support an optional expiration 
property on this cache. This might be important for time-series style data or 
other TTL-based data. 

bq. In the _2 proposal, you define two stores (page-key-counts and 
member-cache) but I don't understand how they are being composed. Is there 
supposed to be something in the configuration that tells the cache to wrap the 
other store?

Hey [~martinkl]: this was just an example to indicate that config-based 
composition can get confusing. These two 'states' are not related at all. The 
job might need such different 'states' for different use cases. My point was 
that it might be easy for the job owner to make mistakes while configuring this.

bq. 1) where does the user input values to the cache layer in a real use case?
Well, one option could be that you configure your stack to use a TTL-based 
caching layer with a VoldemortBackingStore underneath (ignoring the logged and 
k-v store layers). So the user does not explicitly enter any values. If there 
is a cache miss, the backing store will fetch the value from the remote DB.
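
To make the read-through idea a bit more concrete, here is a minimal Java 
sketch of a TTL-based cache layer sitting on top of a backing store. Every name 
in it (TtlReadThroughCache, the loader function standing in for something like 
a VoldemortBackingStore, the injected clock) is hypothetical and not part of 
Samza or the attached proposal; it only illustrates the cache-miss path 
described above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical sketch: none of these names come from Samza.
class TtlReadThroughCache<K, V> {
    // A cached value together with the time at which it stops being valid.
    private static final class Entry<V> {
        final V value;
        final long expiresAtMs;

        Entry(V value, long expiresAtMs) {
            this.value = value;
            this.expiresAtMs = expiresAtMs;
        }
    }

    private final Map<K, Entry<V>> entries = new HashMap<>();
    private final Function<K, V> backingStore; // stands in for a remote DB lookup
    private final long ttlMs;
    private final LongSupplier clock;          // injected so expiry can be tested

    TtlReadThroughCache(Function<K, V> backingStore, long ttlMs, LongSupplier clock) {
        this.backingStore = backingStore;
        this.ttlMs = ttlMs;
        this.clock = clock;
    }

    // Returns the cached value if it is still fresh; otherwise asks the
    // backing store and caches whatever it returns for ttlMs milliseconds.
    V get(K key) {
        long now = clock.getAsLong();
        Entry<V> cached = entries.get(key);
        if (cached != null && now < cached.expiresAtMs) {
            return cached.value;
        }
        V loaded = backingStore.apply(key);
        if (loaded == null) {
            entries.remove(key); // drop a stale entry the backing store no longer has
            return null;
        }
        entries.put(key, new Entry<>(loaded, now + ttlMs));
        return loaded;
    }
}
```

The user code only ever calls get(); values enter the cache as a side effect of 
misses, which is why no explicit put is needed in this configuration.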

bq. 2) can the cache layer share the same keyValueStore as the store? e.g. the 
cache and the store both are using the same RocksDB.
Good question! This is where Jay's comment might be of use 
(https://issues.apache.org/jira/browse/SAMZA-428). But in general, I think you 
can come up with an implementation of the caching layer which uses RocksDB, and 
the K-V store layer will be a no-op, although I haven't thought about this in 
detail. 

bq. There is some elegance to composing layers, but it only really works if the 
things you're composing all have the same API, e.g. KeyValueStore
That is correct. But you can define a new API for whatever storage engine you 
need underneath. You will also have to implement a corresponding cache layer 
for this API. The same goes for the backing store layer, as asked by 
[~closeuris]. In general, what the framework sees is a 'StorageEngine' stack, 
and what the user sees is whatever API we implement (which at the moment is 
KeyValueStore). Hope this does not add to the confusion :)
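
As a rough illustration of layers that compose because they share one API, 
here is a small Java sketch. SimpleStore is a deliberately tiny, hypothetical 
stand-in (Samza's actual KeyValueStore has more operations), and MapStore and 
LruCachedStore are made-up names; the LRU behaviour relies on 
java.util.LinkedHashMap in access order.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical minimal store API shared by every layer in the stack.
interface SimpleStore<K, V> {
    V get(K key);
    void put(K key, V value);
}

// Bottom layer: a plain in-memory store that counts reads for illustration.
class MapStore<K, V> implements SimpleStore<K, V> {
    final Map<K, V> data = new HashMap<>();
    int reads = 0;

    public V get(K key) {
        reads++;
        return data.get(key);
    }

    public void put(K key, V value) {
        data.put(key, value);
    }
}

// Cache layer: same API as the layer it wraps, so it can sit on top of any
// SimpleStore, including another cache layer.
class LruCachedStore<K, V> implements SimpleStore<K, V> {
    private final SimpleStore<K, V> underlying;
    private final LinkedHashMap<K, V> cache;

    LruCachedStore(SimpleStore<K, V> underlying, final int capacity) {
        this.underlying = underlying;
        // accessOrder = true keeps the least recently used entry first.
        this.cache = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > capacity;
            }
        };
    }

    public V get(K key) {
        V value = cache.get(key);
        if (value == null) {
            value = underlying.get(key); // miss: fall through to the next layer
            if (value != null) {
                cache.put(key, value);
            }
        }
        return value;
    }

    public void put(K key, V value) {
        underlying.put(key, value); // write-through to the next layer
        cache.put(key, value);
    }
}
```

Swapping MapStore for a different SimpleStore implementation leaves 
LruCachedStore untouched, which is the benefit of keeping a single API across 
layers; a storage engine with a different API would need its own cache layer.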

> Add a Cache state API to the Samza container
> --------------------------------------------
>
>                 Key: SAMZA-424
>                 URL: https://issues.apache.org/jira/browse/SAMZA-424
>             Project: Samza
>          Issue Type: New Feature
>          Components: container
>            Reporter: Chinmay Soman
>            Assignee: Chinmay Soman
>         Attachments: SAMZA-424-Cache-API_0.pdf, SAMZA-424-Cache-API_1.md, 
> SAMZA-424-Cache-API_2.md, SAMZA-424-Cache-API_2.pdf, samza-424-cache-api_1.pdf
>
>
> There are cases when the user code needs access to a 'cache' which can be 
> used to store custom data. This cache is different from the KeyValue store in 
> the following ways:
> * At the very least, needs to support LRU (Least Recently Used) and TTL (Time 
> To Live) eviction strategies
> * May not support all() and range() operations (since this wreaks havoc with 
> the eviction operation)
> * Needs to exist at a per task or a per container level.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
