[ https://issues.apache.org/jira/browse/SAMZA-256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14048322#comment-14048322 ]
Chinmay Soman commented on SAMZA-256:
-------------------------------------
Copying this over from the internal email thread (sorry for that):
I like option (2). One thing to consider is the submodule structure in this
approach. If we have KeyValueStorageEngineFactory in samza-kv, but we then have
samza-kv-leveldb and samza-kv-rocksdb components, then samza-kv must depend on
both (i.e., pull in dependencies from both projects--even if you only use
RocksDB, you'd still have LevelDB junk on your classpath). This is a little
ugly from a hygiene perspective, but I don't think it should cause any
problems. Alternatively, we could just have one samza-kv submodule, and put all
implementations in there, but that seems even nastier than separate
submodules. Alternatively^2, we could have samza-kv do a
Class.forName().newInstance to create the actual StorageEngine, but this seems
likely to introduce even more runtime errors due to improper dependencies.
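
For reference, the reflective approach could look roughly like the sketch
below. This is only an illustration: the StorageEngine interface and the
InMemoryEngine class here are hypothetical stand-ins, not the real Samza types,
and load() is a made-up helper. It shows where the runtime errors would
surface: a missing jar only fails when the class name is resolved, not at
compile time.

```java
public class ReflectiveEngineLoader {
    /** Hypothetical stand-in for a storage engine interface (not the Samza API). */
    public interface StorageEngine {
        String name();
    }

    /** Hypothetical implementation that would normally live in its own submodule. */
    public static class InMemoryEngine implements StorageEngine {
        public String name() { return "in-memory"; }
    }

    /** Instantiates the named class reflectively via its no-arg constructor. */
    public static StorageEngine load(String className) {
        try {
            Class<?> cls = Class.forName(className);
            Object instance = cls.getDeclaredConstructor().newInstance();
            return (StorageEngine) instance;
        } catch (ClassNotFoundException e) {
            // The implementation's jar is not on the classpath.
            throw new IllegalStateException(
                "Storage engine class not on classpath: " + className, e);
        } catch (ReflectiveOperationException | ClassCastException e) {
            // No usable no-arg constructor, or the class is not a StorageEngine.
            throw new IllegalStateException(
                "Cannot instantiate storage engine: " + className, e);
        }
    }
}
```

With this, samza-kv would not need a compile-time dependency on either
implementation, at the cost of turning a misconfigured classpath into an
exception at job startup.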
Other than that, I don't see an immediate problem with approach (2). It seems
preferable to approach (1) in your list below. That said, let's move over to
JIRA. I'm sure other folks will have feedback as well.
Cheers,
Chris
----------------
Currently, it seems the Samza job config has something like this:
org.apache.samza.storage.kv.KeyValueStorageEngineFactory
Today this defaults to a LevelDbKeyValueStore. To make this more pluggable,
I think we can take one of two approaches:
i) Separate Factory:
Have something like:
org.apache.samza.storage.kv.PersistentKeyValueStorageEngineFactory
org.apache.samza.storage.kv.InMemoryKeyValueStorageEngineFactory
ii) Additional factory config:
In this case, we can keep the same factory:
org.apache.samza.storage.kv.KeyValueStorageEngineFactory,
but have additional parameters to determine the type. Example:
stores.*.factory.persistent=true / false
To add to that, I think option (ii) might be better since we can abstract all
key-value stores (RocksDB / LevelDB / In-memory / blah) with one factory and
use the additional config parameter to determine the type.
This way, the different storage engines can be categorized by their types in a
hierarchy (KeyValue / BitMap / Document-structured / blah ...).
Personally, I'm biased towards option (ii) since the existing jobs don't need
to change their configs (and we default to LevelDB). What do you guys think?
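
A rough sketch of what option (ii) might look like, to make the idea concrete.
Everything here is an assumption for illustration: the KeyValueStore interface,
the two store classes, and create() are hypothetical, and the config key simply
follows the stores.*.factory.persistent example above. The point it shows is
that a missing key falls back to the persistent store, so existing job configs
keep working unchanged.

```java
import java.util.Map;

public class KeyValueFactorySketch {
    /** Hypothetical stand-in for a key-value store interface (not the Samza API). */
    public interface KeyValueStore {
        String describe();
    }

    /** Placeholder for a disk-backed store such as LevelDB or RocksDB. */
    static class PersistentStore implements KeyValueStore {
        public String describe() { return "persistent"; }
    }

    /** Placeholder for an in-memory store. */
    static class InMemoryStore implements KeyValueStore {
        public String describe() { return "in-memory"; }
    }

    /**
     * Picks the store type from a per-store config flag. When the flag is
     * absent, the persistent store is used, matching today's default.
     */
    public static KeyValueStore create(String storeName, Map<String, String> config) {
        String key = "stores." + storeName + ".factory.persistent";
        boolean persistent = Boolean.parseBoolean(config.getOrDefault(key, "true"));
        return persistent ? new PersistentStore() : new InMemoryStore();
    }
}
```

A boolean flag only distinguishes two cases; choosing between LevelDB and
RocksDB would need a richer value (for example a type name), which this sketch
leaves out.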
C
> Provide in-memory data store implementation
> -------------------------------------------
>
> Key: SAMZA-256
> URL: https://issues.apache.org/jira/browse/SAMZA-256
> Project: Samza
> Issue Type: Improvement
> Components: kv
> Affects Versions: 0.6.0
> Reporter: Jakob Homan
> Assignee: Jakob Homan
> Fix For: 0.8.0
>
>
> The sole current kv store, LevelDbKeyValueStore, works well when the amount
> of data to be stored is too large to keep it all in memory.
> However, in cases where the state is small enough to comfortably fit in
> whatever memory is available, it would be better to provide an in-memory
> implementation. This can be backed by either a native Java class, or perhaps
> a Guava class, if that is found to scale better (or, of course, the backing
> implementation could be configurable).