[ 
https://issues.apache.org/jira/browse/KAFKA-12748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-12748:
----------------------------------
    Labels: rocksdb  (was: )

> Explore new RocksDB options to consider enabling by default
> -----------------------------------------------------------
>
>                 Key: KAFKA-12748
>                 URL: https://issues.apache.org/jira/browse/KAFKA-12748
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: A. Sophie Blee-Goldman
>            Priority: Major
>              Labels: rocksdb
>
> With the RocksDB version bump come a lot of new options, some of which look 
> interesting enough to explore for use in Streams. We should try setting 
> these as default options and run the benchmarks to look for any performance 
> benefit (or regression). See the javadocs for all Options 
> [here|https://javadoc.io/doc/org.rocksdb/rocksdbjni/latest/org/rocksdb/Options.html].
> Options.setAvoidUnnecessaryBlockingIO:
>     - As the name suggests, avoids blocking/long-latency tasks by scheduling a 
> background job to do them
> Options.setSkipCheckingSstFileSizesOnDbOpen:
>     - Speeds up startup time if there are many SST files, which could mean less 
> overhead from things like rebalancing, where tasks are migrated between 
> clients or threads. Not sure how many SST files count as "many"; this may be 
> less useful now that we've disabled bulk loading
> Options.setBestEffortsRecovery:
>     - Interesting feature to allow recovering missing files without the use 
> of the WAL. Could be useful if the on-disk state is corrupted (e.g. a user 
> deletes a file), since it would avoid rebuilding state from scratch. Though I'd 
> want to dig in further to understand what exactly it does and does not do. 
> Not a performance improvement, but we should run the benchmarks to make sure 
> it doesn't make performance worse.
> Options.setWriteDbidToManifest:
>     - Should be set to true if/when we ever need to rely on the DB id, e.g. for 
> backups. Also not a performance improvement, but we should still benchmark 
> this.
> Options.optimizeForSmallDb:
>     - This one is definitely not something we should set by default, as 
> "small" here means under 1 GB. But it's probably worth at least calling out in 
> the docs for those users who know their data set size (per store) is under 
> 1 GB.
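
A rough sketch of how the benchmark comparison described above could be laid out; this is not the project's actual benchmark harness. Only the option names come from the ticket. The class name `RocksDbOptionBenchmarkPlan`, the one-option-at-a-time design, and the assumption that every candidate is currently off are illustrative choices. `optimizeForSmallDb` is omitted because the ticket rules it out as a default.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: enumerates the settings to use for each benchmark run.
final class RocksDbOptionBenchmarkPlan {

    // Boolean setters on org.rocksdb.Options that the ticket lists as
    // candidates for new defaults.
    static final List<String> CANDIDATES = List.of(
        "setAvoidUnnecessaryBlockingIO",
        "setSkipCheckingSstFileSizesOnDbOpen",
        "setBestEffortsRecovery",
        "setWriteDbidToManifest");

    // Returns one settings map per run: a baseline with no candidate enabled,
    // then one run per candidate with only that option set to true, so any
    // change in the results can be attributed to a single option.
    static List<Map<String, Boolean>> plan() {
        List<Map<String, Boolean>> runs = new ArrayList<>();
        runs.add(settingsWith(null)); // baseline (assumes all candidates are off today)
        for (String candidate : CANDIDATES) {
            runs.add(settingsWith(candidate));
        }
        return runs;
    }

    // Builds a map of every candidate to false, except the one named in
    // `enabled` (or none when `enabled` is null).
    private static Map<String, Boolean> settingsWith(String enabled) {
        Map<String, Boolean> settings = new LinkedHashMap<>();
        for (String name : CANDIDATES) {
            settings.put(name, name.equals(enabled));
        }
        return settings;
    }

    public static void main(String[] args) {
        for (Map<String, Boolean> run : plan()) {
            System.out.println(run);
        }
    }
}
```

In an actual Streams run, each map would presumably be applied to the store's `Options` object from a custom `RocksDBConfigSetter`, for example by calling `options.setBestEffortsRecovery(true)` for an entry that is true. Toggling one option per run keeps the comparison simple but does not capture interactions between options.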



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
