I am fine with having the RocksDB state store as a built-in state store.
The proposal to have it as an external module was actually meant to avoid
the concerns raised in the previous effort.

Marking it as experimental doesn't necessarily require making it an
external module, I think; they are two separate things. So the risk doesn't
depend much on whether it is an external module or a built-in one, unless
we make the state store the default from the beginning. If it is not the
default, and we explicitly document it as an experimental feature, the risk
is not very different between an external module and a built-in one; a
built-in one simply makes it easier for users to try.

That said, even though the incoming RocksDB state store has been supported
for years, I think it is safer to treat it as an experimental feature first
when it lands in OSS Spark.

Anyway, I think it is okay to add the RocksDB state store to the built-in
state stores alongside HDFSBackedStateStore.

I also feel that we could simply adopt RocksDB and replace LevelDB with it,
but that is another story.


Liang-Chi


Jungtaek Lim wrote:
> I think adding the RocksDB state store to sql/core directly would be
> OK. Personally, I also voted "either way is fine with me" on the RocksDB
> state store implementation in the Spark ecosystem. My overall stance
> hasn't changed, but I'd like to point out that the risk is now
> considerably lower than before, given that we can leverage Databricks'
> RocksDB state store implementation.
> 
> I feel there were two major reasons to put the RocksDB state store in an
> external module:
> 
> 1. stability
> 
> The Databricks RocksDB state store implementation has been supported for
> years, so it won't require more time to incubate. We may want to review
> it thoughtfully to ensure the open-sourced proposal fits Apache Spark and
> retains its stability, but this is much better than the previously
> considered candidates, which may not have been tested in production for
> years.
> 
> That makes me think we don't have to put it in the external module and
> treat it as experimental.
> 
> 2. dependency
> 
> From Yuanjian's mail, the JNI library is the only dependency, which seems
> fine to add by default. We already have LevelDB as one of the core
> dependencies and aren't too concerned about a JNI library dependency.
> Someone might even find that there are substantial benefits to replacing
> LevelDB with RocksDB, in which case RocksDB could become one of the core
> dependencies.
> 
> On Tue, Apr 27, 2021 at 6:41 PM Yuanjian Li <xyliyuanjian@> wrote:
> 
>> Hi all,
>>
>> Following the latest comments in SPARK-34198
>> <https://issues.apache.org/jira/browse/SPARK-34198>, Databricks decided
>> to donate the commercial implementation of the RocksDBStateStore.
>> Compared with the original decision, there is only one topic we want to
>> raise again for discussion: can we directly add the
>> RocksDBStateStoreProvider to the sql/core module? This suggestion is
>> based on the following reasons:
>>
>>    1. The RocksDBStateStore aims to solve the problems of the original
>>       HDFSBackedStateStore, which is built-in.
>>    2. End users can conveniently set a config to use the new
>>       implementation (see the sketch right after this list).
>>    3. We can make the RocksDB one the default in the future.
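>>
>> As a minimal sketch of that opt-in: the snippet below assumes the
>> existing spark.sql.streaming.stateStore.providerClass config key and a
>> hypothetical fully qualified class name for the new provider (the final
>> package and class name may differ once the donation lands):
>>
>>   import org.apache.spark.sql.SparkSession
>>
>>   val spark = SparkSession.builder()
>>     .appName("rocksdb-state-store-sanity-check")
>>     // Hypothetical provider class; swap in the actual name once merged.
>>     .config("spark.sql.streaming.stateStore.providerClass",
>>       "org.apache.spark.sql.execution.streaming.state.RocksDBStateStoreProvider")
>>     .getOrCreate()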
>>
>> Regarding the dependency, I also checked the rocksdbjni package we
>> would introduce. As a JNI package
>> <https://repo1.maven.org/maven2/org/rocksdb/rocksdbjni/6.2.2/rocksdbjni-6.2.2.pom>,
>> it should not have any dependency conflicts with Apache Spark.
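>>
>> For reference, a one-line sbt sketch of that dependency, using the
>> coordinates and version from the POM linked above:
>>
>>   // build.sbt: RocksDB JNI bindings; the POM declares no transitive
>>   // compile dependencies, so conflicts with Spark are unlikely.
>>   libraryDependencies += "org.rocksdb" % "rocksdbjni" % "6.2.2"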
>>
>> Any suggestions are welcome!
>>
>> Best,
>>
>> Yuanjian
