[ https://issues.apache.org/jira/browse/FLINK-19710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yun Tang resolved FLINK-19710.
------------------------------
    Resolution: Information Provided

As written in the [announcement email|https://lists.apache.org/thread.html/rf74c500b73d469bc1e45739d6a17689d40f1dec9a5058ea94a90b6c4%40%3Cuser.flink.apache.org%3E], we have already helped improve the performance of newer RocksDB versions, but the gap is not fully closed. After weighing the pros and cons, we decided to upgrade to the newer RocksDB 6.20.3.

> Fix performance regression to rebase FRocksDB with higher version RocksDB
> -------------------------------------------------------------------------
>
>                 Key: FLINK-19710
>                 URL: https://issues.apache.org/jira/browse/FLINK-19710
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Runtime / State Backends
>            Reporter: Yun Tang
>            Assignee: Yun Tang
>            Priority: Minor
>              Labels: auto-deprioritized-major, auto-unassigned
>             Fix For: 1.14.0
>
>
> We planned to bump the base RocksDB version from 5.17.2 to 6.11.x. However, we
> observed a performance regression when comparing 5.17.2 with 5.18.3 in our own
> flink-benchmarks, and reported it to the RocksDB community in
> [rocksdb#5774|https://github.com/facebook/rocksdb/issues/5774]. Since
> RocksDB 5.18.3 is fairly old for the RocksDB community, and RocksDB's built-in
> db_bench tool cannot easily reproduce this regression, we did not get any
> effective help from the RocksDB community.
> Since the code freeze of Flink release 1.12 is close, we had to figure it out
> ourselves. We first tried RocksDB's built-in db_bench tool to binary search
> the 160 commits between RocksDB 5.17.2 and 5.18.3, but the regression did not
> show up clearly. After switching to our own flink-benchmarks, we finally
> found the commit that introduced the nearly 10% performance regression:
> [replaced __thread with thread_local keyword|https://github.com/facebook/rocksdb/commit/d6ec288703c8fc53b54be9e3e3f3ffd6a7487c63].
> As far as we know, the performance penalty of {{thread_local}} is documented
> in the [gcc-4.8 changes|https://gcc.gnu.org/gcc-4.8/changes.html#cxx] and
> becomes more serious when used in dynamic modules ([dynamic modules usage|http://david-grs.github.io/tls_performance_overhead_cost_linux/],
> [tls benchmark|https://testbit.eu/2015/thread-local-storage-benchmark]). That
> could explain why RocksDB's built-in db_bench tool cannot reproduce this
> regression, as it is compiled in static mode as recommended.
>
> We plan to fix this in our FRocksDB branch first by reverting the related
> changes. In my current local experiments, that revert proved effective at
> avoiding the performance regression.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
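
For readers unfamiliar with the two keywords named in the quoted description, the following is a minimal sketch of how each declares a per-thread variable. It is not taken from RocksDB or FRocksDB. The names gnu_counter, std_counter, bump_gnu and bump_std are hypothetical, and the sketch assumes a GCC or Clang toolchain, since __thread is a compiler extension.

```cpp
#include <cstdint>

// Hypothetical names for illustration only; not RocksDB code.

// GCC/Clang extension: only constant (static) initialization is allowed.
static __thread std::uint64_t gnu_counter = 0;

// C++11 keyword: dynamic initialization and destructors are also allowed.
static thread_local std::uint64_t std_counter = 0;

// Increment the __thread counter n times; return its value on this thread.
std::uint64_t bump_gnu(int n) {
  for (int i = 0; i < n; ++i) ++gnu_counter;
  return gnu_counter;
}

// Same loop, using the thread_local counter.
std::uint64_t bump_std(int n) {
  for (int i = 0; i < n; ++i) ++std_counter;
  return std_counter;
}
```

In this single-file form, with constant initializers, compilers will usually emit equivalent code for both counters, so it does not reproduce the regression. The penalty described in the linked gcc-4.8 notes applies to non-function-local thread_local variables defined in a different translation unit.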