Hi, Nico and yun: 
       Thanks for your great work and detail description on rocksdb version 
upgrade and performance. About 800 jobs are using rocksdb state backend in our 
production environment, and we
plan to upgrade more aim to solve the gc problems caused by large 
states.Because of non-restrict memory control on rocksdb, we have to spend a 
lot of time to solve the problem of memory usage beyond the physical 
memory.With the support of strict block cache, things will become much easier. 
Also, delete range api is useful for us too, so we prefer to upgrade the 
rocksdb to the new release version and +1(non-binding). best, zlzhang0122

At 2021-08-05 01:50:07, "Yun Tang" <myas...@live.com> wrote:
>Hi Yuval,
>
>Upgrading RocksDB version is a long story since Flink-1.10.
>When we first plan to introduce write buffer manager to help control the 
>memory usage of RocksDB, we actually wanted to bump up to RocksDB-5.18 from 
>current RocksDB-5.17. However, we found performance regression in our micro 
>benchmark on state operations [1] if bumped to RocksDB-5.18. We did not figure 
>the root cause at that time and decide to cherry pick the commits of write 
>buffer manager to our own FRocksDB [2]. And we finally released our own 
>frocksdbjni-5.17.2-artisans-2.0 at that time.
>
>As time goes no, more and more bugs or missed features have been reported in 
>the old RocksDB version. Such as:
>
>  1.  Cannot support ARM platform [3]
>  2.  Dose not have stable deleteRange API, which is useful for Flink scale 
> out [4]
>  3.  Cannot support strict block cache [5]
>  4.  Checkpoint might stuck if using UNIVERSVAL compaction strategy [6]
>  5.  Uncontrolled log size make us disabled the RocksDB internal LOG [7]
>  6.  RocksDB's optimizeForPointLookup option might cause data lost [8]
>  7.  Current dummy entry used for memory control in RocksDB-5.17 is too 
> large, leading performance problem [9]
>  8.  Cannot support alpine-based images.
>  9.  .......
>
>Some of the bugs are walked around, and some are still open.
>
>And we decide to make some changes from Flink-1.12. First of all, we reported 
>the performance regression problem compared with RocksDB-5.18 and RocksDB-5.17 
>to RocksDB community [10]. However, as RocksDB-5.x versions are a bit older 
>for the community, and RocksJava usage might not be the core part for facebook 
>guys, we did not get useful replies. Thus, we decide to figure out the root 
>cause of performance regression by ourself.
>Fortunately, we find the cause via binary search the commits among 
>RocksDB-5.18 and RocksDB-5.17, and updated in the original thread [10]. To be 
>short, the performance regression is due to different implementation of 
>`__thread` and `thread_local` in gcc and would have more impact on dynamic 
>loading [11], which is also what current RocksJava jar package does. With my 
>patch [12], the performance regression would disappear if comparing 
>RocksDB-5.18 with RocksDB-5.17.
>
>Unfortunately, RocksDB-5.18 still has many bugs and we want to bump to 
>RocksDB-6.x. However, another performance regression appeared even with my 
>patch [12]. With previous knowledge, we know that we must verify the built .so 
>files with our java-based benchmark instead of using RocksDB built-in 
>db-bench. I started to search the 1340+ commits from RocksDB-5.18 to 
>RocksDB-6.11 to find the performance problem. However, I did not figure out 
>the root cause after spending several weeks this time. The performance behaves 
>up and down in those commits and I cannot get the commit which lead the 
>performance regression. Take this commit of integrating block cache tracer in 
>block-based table reader [13] for example, I noticed that this commit would 
>cause a bit performance regression and that might be the useless usage 
>accounting in operations, however, the problematic code was changed in later 
>commits. Thus, after several weeks digging, I have to give up for the endless 
>searching in the thousand commits temporarily. As RocksDB community seems not 
>make the project management system public, unlike Apache's open JIRA systems, 
>we do not know what benchmark they actually run before releasing each version 
>to guarantee the performance.
>
>With my patch [10] on latest RocksDB-6.20.3, we could get the results on 
>nexmark in the original thread sent by Stephan, and we can see the performance 
>behaves closely in many real-world cases. And we also hope new features, such 
>as direct buffer supporting [14] in RocksJava could help improve RocksDB's 
>performance in the future.
>
>Hope this could explain what we already did.
>
>
>[1] https://github.com/apache/flink-benchmarks
>[2] https://github.com/ververica/frocksdb/tree/FRocksDB-5.17.2
>[3] https://issues.apache.org/jira/browse/FLINK-13598
>[4] https://issues.apache.org/jira/browse/FLINK-21321
>[5] https://github.com/facebook/rocksdb/issues/6247
>[6] https://issues.apache.org/jira/browse/FLINK-21726
>[7] https://issues.apache.org/jira/browse/FLINK-15068
>[8] https://issues.apache.org/jira/browse/FLINK-17800
>[9] https://github.com/facebook/rocksdb/pull/5175
>[10] https://github.com/facebook/rocksdb/issues/5774
>[11] http://david-grs.github.io/tls_performance_overhead_cost_linux/
>[12] https://github.com/ververica/frocksdb/pull/19
>[13] https://github.com/facebook/rocksdb/pull/5441/
>[14] https://github.com/facebook/rocksdb/pull/2283
>
>
>Best,
>Yun Tang
>
>
>On Wed, Aug 4, 2021 at 2:36 PM Yuval Itzchakov 
><yuva...@gmail.com<mailto:yuva...@gmail.com>> wrote:
>We are heavy users of RocksDB and have had several issues with memory managed 
>in Kubernetes, most of them actually went away when we upgraded from Flink 1.9 
>to 1.13.
>
>Do we know why there's such a huge performance regression? Can we improve this 
>somehow with some flag tweaking? It would be great if we see a more in depth 
>explanation of the gains vs losses of upgrading.
>
>On Wed, Aug 4, 2021 at 3:08 PM Stephan Ewen 
><se...@apache.org<mailto:se...@apache.org>> wrote:
>
>
>________________________________
>From: Nico Kruber <n...@apache.org>
>Sent: Thursday, August 5, 2021 0:10
>To: u...@flink.apache.org <u...@flink.apache.org>; dev <dev@flink.apache.org>
>Subject: Re: [ANNOUNCE] RocksDB Version Upgrade and Performance
>
>That's actually also what I'm seeing most of the time and what I'd expect to
>improve with the newer RocksDB version.
>Hence, I'd also favour the upgrade even if there is a slight catch with
>respect to performance - we should, however, continue to investigate this
>together with the RocksDB community.
>
>
>Nico
>
>On Wednesday, 4 August 2021 14:26:32 CEST David Anderson wrote:
>> I am hearing quite often from users who are struggling to manage memory
>> usage, and these are all users using RocksDB. While I don't know for
>> certain that RocksDB is the cause in every case, from my perspective,
>> getting the better memory stability of version 6.20 in place is critical.
>>
>> Regards,
>> David
>>
>> On Wed, Aug 4, 2021 at 8:08 AM Stephan Ewen <se...@apache.org> wrote:
>> > Hi all!
>> >
>> > *!!!  If you are a big user of the Embedded RocksDB State Backend and have
>> > performance sensitive workloads, please read this !!!*
>> >
>> > I want to quickly raise some awareness for a RocksDB version upgrade we
>> > plan to do, and some possible impact on application performance.
>> >
>> > *We plan to upgrade RocksDB to version 6.20.* That version of RocksDB
>> > unfortunately introduces some non-trivial performance regression. In our
>> > Nexmark Benchmark, at least one query is up to 13% slower.
>> > With some fixes, this can be improved, but even then there is an overall
>> > *regression up to 6% in some queries*. (See attached table for results
>> > from relevant Nexmark Benchmark queries).
>> >
>> > We would do this update nonetheless, because we need to get new features
>> > and bugfixes from RocksDB in.
>> >
>> > Please respond to this mail thread if you have major concerns about this.
>> >
>> >
>> > *### Fallback Plan*
>> >
>> > Optionally, we could fall back to Plan B, which is to upgrade RocksDB only
>> > to version 5.18.4.
>> > Which has no performance regression (after applying a custom patch).
>> >
>> > While this spares us the performance degradation of RocksDB 6.20.x, this
>> >
>> > has multiple disadvantages:
>> >   - Does not include the better memory stability (strict cache control)
>> >   - Misses out on some new features which some users asked about
>> >   - Does not have the latest RocksDB bugfixes
>> >
>> > The latest point is especially bad in my opinion. While we can cherry-pick
>> > some bugfixes back (and have done this in the past), users typically run
>> > into an issue first and need to trace it back to RocksDB, then one of the
>> > committers can find the relevant patch from RocksDB master and backport
>> > it.
>> > That isn't the greatest user experience.
>> >
>> > Because of those disadvantages, we would prefer to do the upgrade to the
>> > newer RocksDB version despite the unfortunate performance regression.
>> >
>> > Best,
>> > Stephan
>
>
>--
>Dr. Nico Kruber | Solutions Architect
>
>Follow us @VervericaData Ververica
>--
>Join Flink Forward - The Apache Flink Conference
>Stream Processing | Event Driven | Real Time
>--
>Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany
>--
>Ververica GmbH
>Registered at Amtsgericht Charlottenburg: HRB 158244 B
>Managing Directors: Yip Park Tung Jason, Jinwei (Kevin) Zhang, Karl Anton
>Wehner
>
>

Reply via email to