[ https://issues.apache.org/jira/browse/SAMZA-256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14051414#comment-14051414 ]
Martin Kleppmann commented on SAMZA-256:
----------------------------------------
Sounds good!
bq. The in-memory KV store can reside within the samza-kv module
IMHO the main value of splitting Samza into multiple modules is that any
particular job only needs to bring in those transitive dependencies that it
needs, and no more. That's why the JSON serde is in a separate module (it
depends on Jackson) but the String and Integer serdes are in samza-core, for
example.
This suggests it would be better to put the in-memory KV store in
samza-core, since it doesn't have any library dependencies (apart from perhaps
Guava, if we choose to use that). By the same logic, the LevelDB storage engine
should be in a samza-leveldb module, the RocksDB storage engine in a
samza-rocksdb module, etc. Perhaps we can remove the samza-kv module in 0.8.0
in favour of modules that make the dependencies more explicit.
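To illustrate the dependency point, here is a minimal sketch of an in-memory store that needs nothing beyond java.util, so placing it in samza-core would add no transitive dependencies. The SimpleKeyValueStore interface is hypothetical and much smaller than Samza's actual KeyValueStore; the TreeMap backing is just one possible choice.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical minimal interface for illustration; not Samza's actual
// org.apache.samza.storage.kv.KeyValueStore API.
interface SimpleKeyValueStore<K, V> {
    V get(K key);

    void put(K key, V value);

    void delete(K key);

    // Returns a copy of all entries with from <= key < to, in key order.
    NavigableMap<K, V> range(K from, K to);
}

// In-memory store built only on java.util, so it pulls in no third-party
// (transitive) dependencies.
class InMemoryKeyValueStore<K extends Comparable<K>, V> implements SimpleKeyValueStore<K, V> {
    // TreeMap keeps keys sorted, so range scans return keys in order,
    // similar to iterating over an ordered on-disk store.
    private final TreeMap<K, V> data = new TreeMap<>();

    @Override
    public V get(K key) {
        return data.get(key);
    }

    @Override
    public void put(K key, V value) {
        // Treat a null value as a delete rather than storing a null.
        if (value == null) {
            data.remove(key);
        } else {
            data.put(key, value);
        }
    }

    @Override
    public void delete(K key) {
        data.remove(key);
    }

    @Override
    public NavigableMap<K, V> range(K from, K to) {
        // subMap is a live view; copying it gives the caller a snapshot that
        // later puts and deletes on the store do not affect.
        return new TreeMap<>(data.subMap(from, true, to, false));
    }
}
```

A LevelDB or RocksDB engine would implement the same kind of interface but live in its own module, since it drags in the native library binding.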
What do you think?
bq. Modify Samza container (I think) to map something like:
org.apache.samza.storage.kv.KeyValueStorageEngineFactory =>
org.apache.samza.storage.kv.LevelDBKeyValueStorageEngineFactory
Rather than aliasing this in SamzaContainer, I think it would be OK to just
keep the KeyValueStorageEngineFactory class and have it delegate to
LevelDBKeyValueStorageEngineFactory.
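A rough sketch of the delegation approach, assuming a job's store config names the factory class: the old class name keeps working because the class simply forwards to the LevelDB factory, so SamzaContainer needs no special case. The StorageEngine and StorageEngineFactory interfaces here are hypothetical and heavily simplified; Samza's real factory method takes more arguments (serdes, metrics, changelog partition, and so on), and the package prefix org.apache.samza.storage.kv is omitted.

```java
import java.io.File;

// Hypothetical, simplified contracts for illustration only.
interface StorageEngine {
    String describe();
}

interface StorageEngineFactory {
    StorageEngine getStorageEngine(String storeName, File storeDir);
}

// Stand-in for the LevelDB-backed factory. A real implementation would open
// a LevelDB database under storeDir; this one only records what it was given.
class LevelDBKeyValueStorageEngineFactory implements StorageEngineFactory {
    @Override
    public StorageEngine getStorageEngine(String storeName, File storeDir) {
        return () -> "leveldb:" + storeName + "@" + storeDir.getPath();
    }
}

// Keeps the existing class name valid for jobs whose config still points at
// KeyValueStorageEngineFactory by forwarding every call to the LevelDB factory.
class KeyValueStorageEngineFactory implements StorageEngineFactory {
    private final StorageEngineFactory delegate = new LevelDBKeyValueStorageEngineFactory();

    @Override
    public StorageEngine getStorageEngine(String storeName, File storeDir) {
        return delegate.getStorageEngine(storeName, storeDir);
    }
}
```

Compared with an alias table in the container, this keeps the compatibility shim next to the storage code, and it can later be deprecated or removed without touching SamzaContainer.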
> Provide in-memory data store implementation
> -------------------------------------------
>
> Key: SAMZA-256
> URL: https://issues.apache.org/jira/browse/SAMZA-256
> Project: Samza
> Issue Type: Improvement
> Components: kv
> Affects Versions: 0.6.0
> Reporter: Jakob Homan
> Assignee: Chinmay Soman
> Fix For: 0.8.0
>
>
> The only current KV store, LevelDbKeyValueStore, works well when the amount
> of data to be stored is too large to keep in memory.
> However, in cases where the state is small enough to comfortably fit in
> whatever memory is available, it would be better to provide an in-memory
> implementation. This can be backed by either a native Java class, or perhaps
> a Guava class, if that is found to scale better (or, of course, the backing
> implementation could be configurable).
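One way the "configurable backing implementation" idea could look, as a sketch only: a config value selects which JDK Map backs the store. The BackingMaps helper, the option names ("hash", "sorted", "concurrent-sorted") and the store class are all hypothetical. A Guava-backed option is left out so the sketch stays dependency-free; it could be added as another case.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.function.Supplier;

// Hypothetical helper: maps a config value to the JDK Map that backs an
// in-memory store. The option names are illustrative, not real Samza config.
final class BackingMaps {
    private BackingMaps() {
    }

    static <K, V> Supplier<Map<K, V>> forName(String name) {
        switch (name) {
            case "hash":
                // Fast point lookups, but no key ordering (no range scans).
                return HashMap::new;
            case "sorted":
                // Keys kept in natural order (keys must be Comparable).
                return TreeMap::new;
            case "concurrent-sorted":
                // Ordered and safe for concurrent access.
                return ConcurrentSkipListMap::new;
            default:
                throw new IllegalArgumentException("Unknown backing map: " + name);
        }
    }
}

// In-memory store whose backing Map is chosen at construction time.
final class ConfigurableInMemoryStore<K, V> {
    private final Map<K, V> data;

    ConfigurableInMemoryStore(Supplier<Map<K, V>> backing) {
        this.data = backing.get();
    }

    V get(K key) {
        return data.get(key);
    }

    void put(K key, V value) {
        data.put(key, value);
    }

    void delete(K key) {
        data.remove(key);
    }

    // Exposed only so the choice of backing class is easy to inspect.
    String backingClassName() {
        return data.getClass().getSimpleName();
    }
}
```

Passing a Supplier rather than a Map instance means each store gets its own fresh map, and new backings can be added without changing the store class.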
--
This message was sent by Atlassian JIRA
(v6.2#6252)