lgo edited a comment on pull request #12345:
URL: https://github.com/apache/flink/pull/12345#issuecomment-774844384


   Sorry for letting progress on these changes stall, @qinjunjerry 
@Myasuka (life got the better of me for quite a while). I did not end up 
getting a good production benchmark for the changes here.
   
   I recently had the time to rebase these changes onto the latest master. I 
also carved out the change to use `deleteRange` into a separate PR 
(https://github.com/apache/flink/pull/14893), along with an early benchmark 
for it.
   
   While I was doing a pass on that PR, I stumbled on an earlier ticket and 
discussion where @sihuazhou previously investigated using the external ingest 
API to speed these paths up. This is documented in 
[FLINK-8845](https://issues.apache.org/jira/browse/FLINK-8845).
   
   Summarizing the details from there, @sihuazhou's initial work and 
investigation found that RocksDB's Java API for the SST writer had some key 
performance issues. Specifically, the interface was limited to `put(byte[] key, 
byte[] value)` and internally copied memory to construct the RocksDB 
`DirectSlice`. This added a non-trivial overhead, causing the Java 
`SstFileWriter` to perform poorly. As a result, they implemented the 
`RocksDBWriteBatchWrapper` for bulk writes rather than SST file ingestion.
   
   I found the RocksDB issue with a detailed write-up outlining this problem 
with `SstFileWriter` performance: 
https://github.com/facebook/rocksdb/issues/2668.
   
   A PR, https://github.com/facebook/rocksdb/pull/2283, was made to address 
this issue, but it had been sitting open since 2017 and was only merged in 
Feb 2020. The change was released as part of the Java API in RocksDB 6.8.0; see 
the [6.8.0 release 
notes](https://github.com/facebook/rocksdb/blob/master/HISTORY.md#680-02242020).
   
   Based on @sihuazhou's earlier investigation, I suspect this branch may not 
show much of an improvement without upgrading RocksDB to 6.8.0. Given the state 
of the RocksDB upgrade in 
[FLINK-14482](https://issues.apache.org/jira/browse/FLINK-14482), I suspect 
it'll be quite some time (and work) before we get there.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

