[
https://issues.apache.org/jira/browse/SAMZA-424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14165000#comment-14165000
]
Martin Kleppmann commented on SAMZA-424:
----------------------------------------
I must admit I'm confused — I've not quite managed to follow this discussion,
and even in the latest design doc I'm not sure I understand exactly what is
being proposed. Is the idea to separate the caching concern from the storage
engine concern, so that the same caching layer can be reused with different
storage engines?
If so, would it be possible to make the cache a shared class that storage
engine implementations can use (e.g. if you're implementing a Voldemort-backed
storage engine, you don't have to reinvent the cache), but which is not exposed
in the job configuration? In that case it would be the storage engine's
responsibility to instantiate the cache and pass through any parameters from
the configuration.
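If it helps, here's roughly what I have in mind. All class names below (SharedLruCache, VoldemortBackedStore) are invented for illustration — the point is just that the cache is an ordinary shared class the engine instantiates itself, passing through parameters it read from the configuration:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// A plain shared class, not something wired up through job config.
class SharedLruCache<K, V> {
    private final Map<K, V> entries;

    SharedLruCache(final int maxSize) {
        // An access-ordered LinkedHashMap gives LRU eviction for free.
        this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxSize;
            }
        };
    }

    V get(K key) { return entries.get(key); }
    void put(K key, V value) { entries.put(key, value); }
    int size() { return entries.size(); }
}

// The storage engine owns the cache; nothing in the job config
// knows the cache exists.
class VoldemortBackedStore {
    private final SharedLruCache<String, String> cache;

    VoldemortBackedStore(int cacheSize) {
        this.cache = new SharedLruCache<>(cacheSize);
    }

    String get(String key) {
        String cached = cache.get(key);
        if (cached != null) return cached;
        String value = fetchFromBackend(key);
        cache.put(key, value);
        return value;
    }

    private String fetchFromBackend(String key) {
        return "value-for-" + key; // stand-in for a remote Voldemort lookup
    }
}
```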
If we definitely want a composing approach (where the storage engine knows
nothing about the fact that it's being wrapped by a caching layer), then I
agree with what has been said: composition in config can get ugly, but using a
builder pattern inside a StreamTask initializer doesn't really make sense,
especially when shared stores are involved. Composition through code seems fine
to me if that code runs during job submission or job startup (so the builder
code is just a nicer way of specifying a configuration).
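To make concrete what I mean by "the builder code is just a nicer way of specifying a configuration" — a sketch, with entirely made-up names (nothing here is an existing Samza API), where the builder does nothing but emit config entries:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical builder: each call just records a config entry,
// so running it at job submission is equivalent to writing the
// properties by hand.
class StoreConfigBuilder {
    private final Map<String, String> config = new HashMap<>();
    private final String name;

    StoreConfigBuilder(String name) { this.name = name; }

    StoreConfigBuilder factory(String factoryClass) {
        config.put("stores." + name + ".factory", factoryClass);
        return this;
    }

    StoreConfigBuilder cached(int maxEntries) {
        config.put("stores." + name + ".cache.size", String.valueOf(maxEntries));
        return this;
    }

    Map<String, String> build() { return config; }
}
```

So a job could say `new StoreConfigBuilder("member-cache").factory("org.example.CacheStoreFactory").cached(10000).build()` and the result is an ordinary config map, not a live object graph inside a StreamTask.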
[~cpsoman]: In the _2 proposal, you define two stores (page-key-counts and
member-cache) but I don't understand how they are being composed. Is there
supposed to be something in the configuration that tells the cache to wrap the
other store?
There is some elegance to composing layers, but it only really works if the
things you're composing all have the same API, e.g. KeyValueStore. I can't
quite see how you could wrap, say, a Lucene storage engine with a
key-value-oriented cache. On the other hand, a storage engine that exposes a
full-text search API may be able to benefit from a caching layer, but it would
be that storage engine's responsibility to use the cache appropriately.
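For the case where the APIs do line up, the composition I'm imagining looks something like this. The KeyValueStore interface here is a simplified stand-in for Samza's, and CachingKeyValueStore is hypothetical — the point is that wrapper and engine only know each other through the shared interface:

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for Samza's key-value API.
interface KeyValueStore<K, V> {
    V get(K key);
    void put(K key, V value);
}

class InMemoryStore<K, V> implements KeyValueStore<K, V> {
    private final Map<K, V> map = new HashMap<>();
    public V get(K key) { return map.get(key); }
    public void put(K key, V value) { map.put(key, value); }
}

// The wrapper caches reads and writes through to the underlying
// store; the underlying engine is unaware it is wrapped.
class CachingKeyValueStore<K, V> implements KeyValueStore<K, V> {
    private final KeyValueStore<K, V> underlying;
    private final Map<K, V> cache = new HashMap<>();

    CachingKeyValueStore(KeyValueStore<K, V> underlying) {
        this.underlying = underlying;
    }

    public V get(K key) {
        V v = cache.get(key);
        if (v == null) {
            v = underlying.get(key);
            if (v != null) cache.put(key, v);
        }
        return v;
    }

    public void put(K key, V value) {
        cache.put(key, value);     // keep the cache coherent on writes
        underlying.put(key, value);
    }
}
```

This only typechecks because both layers speak KeyValueStore; there's no analogous wrapper you could write around a full-text search API without inventing a new interface for it.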
Apologies if I've totally misunderstood and none of this makes any sense. I'm
just a bit confused about this whole discussion.
> Add a Cache state API to the Samza container
> --------------------------------------------
>
> Key: SAMZA-424
> URL: https://issues.apache.org/jira/browse/SAMZA-424
> Project: Samza
> Issue Type: New Feature
> Components: container
> Reporter: Chinmay Soman
> Assignee: Chinmay Soman
> Attachments: SAMZA-424-Cache-API_0.pdf, SAMZA-424-Cache-API_1.md,
> SAMZA-424-Cache-API_2.md, SAMZA-424-Cache-API_2.pdf, samza-424-cache-api_1.pdf
>
>
> There are cases when the user code needs access to a 'cache' which can be
> used to store custom data. This cache is different from the KeyValue store in
> the following ways:
> * At the very least, needs to support LRU (Least Recently Used) and TTL
> (Time To Live) eviction strategies
> * May not support all() and range() operations (since this wreaks havoc with
> the eviction operation)
> * Needs to exist at a per-task or per-container level.
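A minimal illustration of the TTL half of the requirement above — the TtlCache class and its explicit clock parameter are invented for illustration, not a proposed API:

```java
import java.util.HashMap;
import java.util.Map;

// Entries older than ttlMs are treated as evicted on read.
// Time is passed in explicitly to keep the sketch deterministic.
class TtlCache<K, V> {
    private static final class Entry<V> {
        final V value;
        final long insertedAtMs;
        Entry(V value, long insertedAtMs) {
            this.value = value;
            this.insertedAtMs = insertedAtMs;
        }
    }

    private final Map<K, Entry<V>> map = new HashMap<>();
    private final long ttlMs;

    TtlCache(long ttlMs) { this.ttlMs = ttlMs; }

    void put(K key, V value, long nowMs) {
        map.put(key, new Entry<>(value, nowMs));
    }

    V get(K key, long nowMs) {
        Entry<V> e = map.get(key);
        if (e == null) return null;
        if (nowMs - e.insertedAtMs > ttlMs) {
            map.remove(key); // lazily evict expired entries
            return null;
        }
        return e.value;
    }
}
```

Note that supporting all() or range() on such a cache would mean deciding what to do with expired-but-not-yet-evicted entries mid-iteration, which is presumably the havoc the description refers to.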
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)