[ 
https://issues.apache.org/jira/browse/SAMZA-256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14048322#comment-14048322
 ] 

Chinmay Soman commented on SAMZA-256:
-------------------------------------

Copying over from the internal email thread (sorry for that)

I like option (2). One thing to consider is the submodule structure in this 
approach. If we have KeyValueStorageEngineFactory in samza-kv, but we then have 
samza-kv-leveldb and samza-kv-rocksdb components, then samza-kv must depend on 
both (i.e., pull in dependencies from both projects--even if you only use 
RocksDB, you'd still have LevelDB junk in your classpath). This is a little 
ugly from a hygiene perspective, but I don't think it should cause any 
problems. Alternatively, we could just have one samza-kv submodule, and put all 
implementations in there, but that seems a bit nastier even than separate 
submodules. Alternatively^2, we could have samza-kv do a 
Class.forName().newInstance to create the actual StorageEngine, but this seems 
likely to introduce even more runtime errors due to improper dependencies.

Other than that, I don't see an immediate problem with approach (2). It seems 
preferable to approach (1) in your list below. That said, let's move over to 
JIRA. I'm sure other folks will have feedback as well.

Cheers,
Chris
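
For reference, the Class.forName().newInstance alternative mentioned above 
could look roughly like the sketch below. This is only an illustration: 
StorageEngine here is a stand-in interface, and ReflectiveEngineSketch, 
InMemoryEngine and load() are hypothetical names, not Samza classes. It shows 
why the approach keeps samza-kv free of compile-time dependencies on the 
implementations, and also where the runtime errors would surface if a 
submodule is missing from the classpath.

```java
// Hypothetical sketch; none of these types are Samza's real classes.
class ReflectiveEngineSketch {

    /** Stand-in for a storage engine interface (assumed for illustration). */
    interface StorageEngine {
        String name();
    }

    /** Stand-in implementation that would live in a separate submodule. */
    static final class InMemoryEngine implements StorageEngine {
        public InMemoryEngine() {}
        public String name() { return "in-memory"; }
    }

    /**
     * Loads an engine by class name. Nothing is checked at compile time,
     * so a missing or wrong dependency only shows up here, at runtime.
     */
    static StorageEngine load(String className) {
        try {
            Class<?> cls = Class.forName(className);
            return cls.asSubclass(StorageEngine.class)
                      .getDeclaredConstructor()
                      .newInstance();
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(
                "Storage engine class not on classpath: " + className, e);
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalStateException(
                "Cannot instantiate storage engine: " + className, e);
        }
    }

    public static void main(String[] args) {
        // Works because the class is present in this file.
        StorageEngine engine = load(InMemoryEngine.class.getName());
        System.out.println(engine.name());

        // Fails at runtime, as it would if a submodule jar were missing.
        try {
            load("no.such.Engine");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The sketch uses getDeclaredConstructor().newInstance() rather than 
Class.newInstance(), since the latter is deprecated in recent Java versions.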


----------------

Currently, it seems the Samza job config has something like this:

org.apache.samza.storage.kv.KeyValueStorageEngineFactory

which today defaults to a LevelDbKeyValueStore. To make this more pluggable, 
I think we can use one of two approaches:

i) Separate Factory:

Have something like:
org.apache.samza.storage.kv.PersistentKeyValueStorageEngineFactory
org.apache.samza.storage.kv.InMemoryKeyValueStorageEngineFactory

ii) Additional factory config:

In this case, we can keep the same factory: 
org.apache.samza.storage.kv.KeyValueStorageEngineFactory,

but have additional parameters to determine the type. Example:

stores.*.factory.persistent=true / false

To add to that, I think option (ii) might be better, since we can abstract all 
key-value stores (RocksDB / LevelDB / In-memory / blah) behind one factory and 
use the additional config parameter to determine the type.
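
A minimal sketch of what option (ii) might look like, assuming the proposed 
(not yet existing) stores.*.factory.persistent flag and a default of true so 
existing jobs keep their current behavior. The types below 
(KeyValueFactorySketch, KeyValueStore, InMemoryStore, PersistentStoreStub) are 
hypothetical stand-ins, not Samza's actual classes, and the persistent store 
is only a stub where a LevelDB-backed store would go.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; none of these types are Samza's real classes.
class KeyValueFactorySketch {

    /** Minimal stand-in for a key-value store interface. */
    interface KeyValueStore {
        void put(String key, String value);
        String get(String key);
        String describe();
    }

    /** Stand-in for an in-memory store, backed by a sorted map. */
    static final class InMemoryStore implements KeyValueStore {
        private final Map<String, String> data = new TreeMap<>();
        public void put(String key, String value) { data.put(key, value); }
        public String get(String key) { return data.get(key); }
        public String describe() { return "in-memory"; }
    }

    /** Stub for a persistent store; a real one would wrap LevelDB. */
    static final class PersistentStoreStub implements KeyValueStore {
        private final Map<String, String> data = new HashMap<>();
        public void put(String key, String value) { data.put(key, value); }
        public String get(String key) { return data.get(key); }
        public String describe() { return "persistent"; }
    }

    /**
     * Single factory entry point: the store type is chosen by the
     * (assumed) config key stores.<name>.factory.persistent.
     * A missing key defaults to persistent, so old configs still work.
     */
    static KeyValueStore getStore(String storeName, Map<String, String> config) {
        String key = "stores." + storeName + ".factory.persistent";
        boolean persistent =
            Boolean.parseBoolean(config.getOrDefault(key, "true"));
        return persistent ? new PersistentStoreStub() : new InMemoryStore();
    }

    public static void main(String[] args) {
        Map<String, String> config = new HashMap<>();
        config.put("stores.my-store.factory.persistent", "false");

        // Explicitly configured as non-persistent.
        System.out.println(getStore("my-store", config).describe());
        // No flag set, so the default (persistent) applies.
        System.out.println(getStore("other-store", config).describe());
    }
}
```

A boolean flag only distinguishes two types; covering RocksDB as well would 
probably need a string-valued parameter instead, which is a design choice 
left open here.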

This way, the different storage engines can be categorized by their types in a 
hierarchy ( KeyValue / BitMap / Document structured / blah ...)

Personally, I'm biased towards option (ii), since existing jobs don't need 
to change their configs (and we default to LevelDB). What do you guys think?

C



> Provide in-memory data store implementation
> -------------------------------------------
>
>                 Key: SAMZA-256
>                 URL: https://issues.apache.org/jira/browse/SAMZA-256
>             Project: Samza
>          Issue Type: Improvement
>          Components: kv
>    Affects Versions: 0.6.0
>            Reporter: Jakob Homan
>            Assignee: Jakob Homan
>             Fix For: 0.8.0
>
>
> The sole current kv store, LevelDbKeyValueStore, works well when the amount 
> of data to be stored is too large to keep it all in memory.  
> However, in cases where the state is small enough to comfortably fit in 
> whatever memory is available, it would be better to provide an in-memory 
> implementation.  This can be backed by either a native Java class, or perhaps 
> a Guava class, if that is found to scale better (or, of course, the backing 
> implementation could be configurable).



--
This message was sent by Atlassian JIRA
(v6.2#6252)
