[jira] [Commented] (FLINK-35578) Release Frocksdb-8.10.0 official products
[ https://issues.apache.org/jira/browse/FLINK-35578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17884943#comment-17884943 ] Yue Ma commented on FLINK-35578: [~zakelly] It seems that almost all the tickets that needed development have been completed. May I ask when the official version of FRocksDB 8.10 can be released? > Release Frocksdb-8.10.0 official products > - > > Key: FLINK-35578 > URL: https://issues.apache.org/jira/browse/FLINK-35578 > Project: Flink > Issue Type: Sub-task > Components: Runtime / State Backends >Affects Versions: 2.0.0 >Reporter: Yue Ma >Assignee: Zakelly Lan >Priority: Major > Fix For: 2.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-35574) Setup base branch for FrocksDB-8.10
[ https://issues.apache.org/jira/browse/FLINK-35574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17884942#comment-17884942 ] Yue Ma commented on FLINK-35574: [~Zakelly] Can you help mark this ticket as Resolved? > Setup base branch for FrocksDB-8.10 > --- > > Key: FLINK-35574 > URL: https://issues.apache.org/jira/browse/FLINK-35574 > Project: Flink > Issue Type: Sub-task > Components: Runtime / State Backends >Affects Versions: 2.0.0 >Reporter: Yue Ma >Assignee: Yue Ma >Priority: Major > Fix For: 2.0.0 > > > As the first part of FLINK-35573, we need to prepare a base branch for > FRocksDB-8.10.0 first. Mainly, it needs to be checked out from version 8.10.0 > of the RocksDB community. Then cherry-pick the commits used by Flink from > FRocksDB-6.20.3 into 8.10.0. > *Details:* > |*JIRA*|*FrocksDB-6.20.3*|*Commit ID in FrocksDB-8.10.0*|*Plan*| > |[[FLINK-10471] Add Apache Flink specific compaction filter to evict expired > state which has > time-to-live|https://github.com/ververica/frocksdb/commit/3da8249d50c8a3a6ea229f43890d37e098372786]|3da8249d50c8a3a6ea229f43890d37e098372786|d606c9450bef7d2a22c794f406d7940d9d2f29a4|Already > in *FrocksDB-8.10.0*| > |+[[FLINK-19710] Revert implementation of PerfContext back to __thread to > avoid performance > regression|https://github.com/ververica/frocksdb/commit/d6f50f33064f1d24480dfb3c586a7bd7a7dbac01]+|d6f50f33064f1d24480dfb3c586a7bd7a7dbac01| > |Fix in FLINK-35575| > |[FRocksDB release guide and helping > scripts|https://github.com/ververica/frocksdb/commit/2673de8e5460af8d23c0c7e1fb0c3258ea283419]|2673de8e5460af8d23c0c7e1fb0c3258ea283419|b58ba05a380d9bf0c223bc707f14897ce392ce1b|Already > in *FrocksDB-8.10.0*| > |+[Add content related to ARM building in the FROCKSDB-RELEASE > documentation|https://github.com/ververica/frocksdb/commit/ec27ca01db5ff579dd7db1f70cf3a4677b63d589]+|ec27ca01db5ff579dd7db1f70cf3a4677b63d589|6cae002662a45131a0cd90dd84f5d3d3cb958713|Already > in *FrocksDB-8.10.0*| > |[[FLINK-23756] Update FrocksDB release document with more > info|https://github.com/ververica/frocksdb/commit/f75e983045f4b64958dc0e93e8b94a7cfd7663be]|f75e983045f4b64958dc0e93e8b94a7cfd7663be|bac6aeb6e012e19d9d5e3a5ee22b84c1e4a1559c|Already > in *FrocksDB-8.10.0*| > |[Add support for Apple Silicon to RocksJava > (#9254)|https://github.com/ververica/frocksdb/commit/dac2c60bc31b596f445d769929abed292878cac1]|dac2c60bc31b596f445d769929abed292878cac1|#9254|Already > in *FrocksDB-8.10.0*| > |[Fix RocksJava releases for macOS > (#9662)|https://github.com/ververica/frocksdb/commit/22637e11968a627a06a3ac8aa78126e3ae6d1368]|22637e11968a627a06a3ac8aa78126e3ae6d1368|#9662|Already > in *FrocksDB-8.10.0*| > |+[Fix clang13 build error > (#9374)|https://github.com/ververica/frocksdb/commit/a20fb9fa96af7b18015754cf44463e22fc123222]+|a20fb9fa96af7b18015754cf44463e22fc123222|#9374|Already > in *FrocksDB-8.10.0*| > |+[[hotfix] Resolve brken make > format|https://github.com/ververica/frocksdb/commit/cf0acdc08fb1b8397ef29f3b7dc7e0400107555e]+|7a87e0bf4d59cc48f40ce69cf7b82237c5e8170c| > |Already in *FrocksDB-8.10.0*| > |+[Update circleci xcode version > (#9405)|https://github.com/ververica/frocksdb/commit/f24393bdc8d44b79a9be7a58044e5fd01cf50df7]+|cf0acdc08fb1b8397ef29f3b7dc7e0400107555e|#9405|Already > in *FrocksDB-8.10.0*| > |+[Upgrade to Ubuntu 20.04 in our CircleCI > config|https://github.com/ververica/frocksdb/commit/1fecfda040745fc508a0ea0bcbb98c970f89ee3e]+|1fecfda040745fc508a0ea0bcbb98c970f89ee3e| > |Fix in > 
[FLINK-35577|https://github.com/facebook/rocksdb/pull/9481/files#diff-78a8a19706dbd2a4425dd72bdab0502ed7a2cef16365ab7030a5a0588927bf47] > fixed in > https://github.com/facebook/rocksdb/pull/9481/files#diff-78a8a19706dbd2a4425dd72bdab0502ed7a2cef16365ab7030a5a0588927bf47| > |[Disable useless broken tests due to ci-image > upgraded|https://github.com/ververica/frocksdb/commit/9fef987e988c53a33b7807b85a56305bd9dede81]|9fef987e988c53a33b7807b85a56305bd9dede81| > |Fix in FLINK-35577| > |[[hotfix] Use zlib's fossils page to replace > web.archive|https://github.com/ververica/frocksdb/commit/cbc35db93f312f54b49804177ca11dea44b4d98e]|cbc35db93f312f54b49804177ca11dea44b4d98e|8fff7bb9947f9036021f99e3463c9657e80b71ae|Already > in *FrocksDB-8.10.0*| > |+[[hotfix] Change the resource request when running > CI|https://github.com/ververica/frocksdb/commit/2ec1019fd0433cb8ea5365b58faa2262ea0014e9]+|2ec1019fd0433cb8ea5365b58faa2262ea0014e9|174639cf1e6080a8f8f37aec132b3a500428f913|Already > in *FrocksDB-8.10.0*| > |{+}[[FLINK-30321] Upgrade ZLIB of FRocksDB to 1.2.13 > (|https://github.com/ververica/frocksdb/commit/3eac409606fcd9ce44a4bf7686db29c06c205039]{+}[#56|https://github.com/ververica/frocksdb/pull/56] > > [)|https://github.com/verve
[jira] [Commented] (FLINK-36186) Speed up RocksDB close during manual compaction
[ https://issues.apache.org/jira/browse/FLINK-36186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17877960#comment-17877960 ] Yue Ma commented on FLINK-36186: I found that RocksDB does not have a JNI method for DisableManualCompaction. If this optimization is needed, it needs to be added to the RocksDB JNI layer. > Speed up RocksDB close during manual compaction > > > Key: FLINK-36186 > URL: https://issues.apache.org/jira/browse/FLINK-36186 > Project: Flink > Issue Type: Bug > Components: Runtime / State Backends >Affects Versions: 2.0.0 >Reporter: Yue Ma >Priority: Major > Fix For: 2.0.0 > > > After https://issues.apache.org/jira/browse/FLINK-26050, Flink RocksDB may > schedule manual compaction asynchronously, but if a failover occurs at this > time, RocksDB will need to wait for the manual compaction to complete before > it can close. This may result in a very long task cancellation time, > delaying task recovery. > {code:java} > // After this function call, CompactRange() or CompactFiles() will not > // run compactions and fail. Calling this function will tell outstanding > // manual compactions to abort and will wait for them to finish or abort > // before returning. > virtual void DisableManualCompaction() = 0; {code} > The solution is relatively simple. We can manually call > _DisableManualCompaction_ during db close to abort the running manual > compaction, which speeds up the db close. -- This message was sent by Atlassian Jira (v8.20.10#820010)
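For reference, the C++ side of this change is already available: DisableManualCompaction() is part of the public rocksdb::DB API quoted above, and only the JNI binding is missing. Below is a minimal sketch of the close sequence the comment proposes, assuming direct access to the native DB handle; it is an illustration, not the actual Flink or FRocksDB implementation.

{code:cpp}
#include <cassert>

#include <rocksdb/db.h>

// Sketch: abort any outstanding manual compaction before closing, so the
// close no longer blocks behind a CompactRange()/CompactFiles() call that
// was scheduled asynchronously (the FLINK-26050 situation described above).
void CloseWithoutWaitingForManualCompaction(rocksdb::DB* db) {
  // Tells outstanding manual compactions to abort, and waits for them to
  // finish or abort before returning (see the API comment quoted above).
  db->DisableManualCompaction();

  // The close path now returns quickly instead of waiting for the manual
  // compaction to run to completion.
  rocksdb::Status s = db->Close();
  assert(s.ok());
  delete db;
}
{code}

A RocksJava method would then just be a thin native wrapper forwarding to this same DisableManualCompaction() call, which is the JNI addition the comment refers to.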
[jira] [Created] (FLINK-36186) Speed up RocksDB close during manual compaction
Yue Ma created FLINK-36186: -- Summary: Speed up RocksDB close during manual compaction Key: FLINK-36186 URL: https://issues.apache.org/jira/browse/FLINK-36186 Project: Flink Issue Type: Bug Components: Runtime / State Backends Affects Versions: 2.0.0 Reporter: Yue Ma Fix For: 2.0.0 After https://issues.apache.org/jira/browse/FLINK-26050, Flink RocksDB may schedule manual compaction asynchronously, but if a failover occurs at this time, RocksDB will need to wait for the manual compaction to complete before it can close. This may result in a very long task cancellation time, delaying task recovery. {code:java} // After this function call, CompactRange() or CompactFiles() will not // run compactions and fail. Calling this function will tell outstanding // manual compactions to abort and will wait for them to finish or abort // before returning. virtual void DisableManualCompaction() = 0; {code} The solution is relatively simple. We can manually call _DisableManualCompaction_ during db close to abort the running manual compaction, which speeds up the db close. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-35581) Remove comments from the code related to ingestDB
[ https://issues.apache.org/jira/browse/FLINK-35581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17877629#comment-17877629 ] Yue Ma commented on FLINK-35581: Now that we've updated the FRocksDB version to v8.10, I think we can remove the comments in the code to make ingestDB available. [~roman] [~srichter], can you please take a look at this PR [https://github.com/apache/flink/pull/25263]? > Remove comments from the code related to ingestDB > - > > Key: FLINK-35581 > URL: https://issues.apache.org/jira/browse/FLINK-35581 > Project: Flink > Issue Type: Sub-task > Components: Runtime / State Backends >Affects Versions: 2.0.0 >Reporter: Yue Ma >Priority: Major > Labels: pull-request-available > Fix For: 2.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-35575) FRocksDB supports disabling perf context during compilation
[ https://issues.apache.org/jira/browse/FLINK-35575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17871338#comment-17871338 ] Yue Ma commented on FLINK-35575: [~Zakelly] Can you please help to take a look at this PR [https://github.com/ververica/frocksdb/pull/76|https://github.com/ververica/frocksdb/pull/76]? I want to add some default compilation flags to the FRocksDB compile script, but I'm not sure whether it's the best way. Do you have any suggestions? > FRocksDB supports disabling perf context during compilation > --- > > Key: FLINK-35575 > URL: https://issues.apache.org/jira/browse/FLINK-35575 > Project: Flink > Issue Type: Sub-task > Components: Runtime / State Backends >Affects Versions: 2.0.0 >Reporter: Yue Ma >Priority: Major > Fix For: 2.0.0 > > > In FRocksDB 6, the thread-local perf context is disabled by reverting a specific > commit (FLINK-19710). However, this creates conflicts and makes upgrading > more difficult. We found that disabling *PERF_CONTEXT* can improve the > performance of the state benchmark by about 5%, and it doesn't create any > conflicts. So we plan to support disabling the perf context at compile time > in the new FRocksDB version. -- This message was sent by Atlassian Jira (v8.20.10#820010)
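For context, the proposal is a compile-time switch rather than a code revert. RocksDB already wraps its perf-context updates in macros that a build-time define (NPERF_CONTEXT) turns into no-ops; the snippet below is a simplified illustration of that guard pattern, and the exact flag wired into the FRocksDB compile script by the PR above may differ.

{code:cpp}
// Simplified illustration of RocksDB's perf-context guard pattern
// (see monitoring/perf_context_imp.h in the RocksDB sources).
// Building with -DNPERF_CONTEXT compiles the instrumentation away entirely,
// so the state-backend hot path pays no timer or counter-update cost.
#ifdef NPERF_CONTEXT
#define PERF_TIMER_GUARD(metric)  // expands to nothing: zero overhead
#else
#define PERF_TIMER_GUARD(metric)                                  \
  PerfStepTimer perf_step_timer_##metric(&(perf_context.metric)); \
  perf_step_timer_##metric.Start();
#endif
{code}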
[jira] [Comment Edited] (FLINK-35576) FRocksDB cherry pick IngestDB related commits
[ https://issues.apache.org/jira/browse/FLINK-35576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17867681#comment-17867681 ] Yue Ma edited comment on FLINK-35576 at 7/22/24 8:10 AM: - [~roman] [~zakelly] I've updated the pull requests [https://github.com/ververica/frocksdb/pull/78] [https://github.com/ververica/frocksdb/pull/79] was (Author: mayuehappy): update the pull request [https://github.com/ververica/frocksdb/pull/78] https://github.com/ververica/frocksdb/pull/79 > FRocksDB cherry pick IngestDB related commits > - > > Key: FLINK-35576 > URL: https://issues.apache.org/jira/browse/FLINK-35576 > Project: Flink > Issue Type: Sub-task > Components: Runtime / State Backends >Affects Versions: 2.0.0 >Reporter: Yue Ma >Assignee: Yue Ma >Priority: Major > Fix For: 2.0.0 > > > We support the API related to ingestDB in FRocksDB-8.10.0, but many of the > fixes related to ingestDB were only integrated in the latest RocksDB > version. So we need to cherry-pick these fix commits into FRocksDB. > Mainly include: > |*RocksDB Main Branch*|*Commit ID in FrocksDB-8.10.0*|*Plan*| > |https://github.com/facebook/rocksdb/pull/11646|44f0ff31c21164685a6cd25a2beb944767c39e46| > | > |[https://github.com/facebook/rocksdb/pull/11868]|8e1adab5cecad129131a4eceabe645b9442acb9c| > | > |https://github.com/facebook/rocksdb/pull/11811|3c27f56d0b7e359defbc25bf90061214c889f40b| > | > |https://github.com/facebook/rocksdb/pull/11381|4d72f48e57cb0a95b67ff82c6e971f826750334e| > | > |https://github.com/facebook/rocksdb/pull/11379|8d8eb0e77e13a3902d23fbda742dc47aa7bc418f| > | > |https://github.com/facebook/rocksdb/pull/11378|fa878a01074fe039135e37720f669391d1663525| > | > |https://github.com/facebook/rocksdb/pull/12219|183d80d7dc4ce339ab1b6796661d5879b7a40d6a| > | > |https://github.com/facebook/rocksdb/pull/12328|ef430fc72407950f94ca2a4fbb2b15de7ae8ff4f| > | > |https://github.com/facebook/rocksdb/pull/12526| |Fix in > https://issues.apache.org/jira/browse/FLINK-35576| > |https://github.com/facebook/rocksdb/pull/12602| |Fix in > https://issues.apache.org/jira/browse/FLINK-35576| -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-35576) FRocksDB cherry pick IngestDB related commits
[ https://issues.apache.org/jira/browse/FLINK-35576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17867681#comment-17867681 ] Yue Ma commented on FLINK-35576: Updated the pull requests [https://github.com/ververica/frocksdb/pull/78] https://github.com/ververica/frocksdb/pull/79 > FRocksDB cherry pick IngestDB related commits > - > > Key: FLINK-35576 > URL: https://issues.apache.org/jira/browse/FLINK-35576 > Project: Flink > Issue Type: Sub-task > Components: Runtime / State Backends >Affects Versions: 2.0.0 >Reporter: Yue Ma >Assignee: Yue Ma >Priority: Major > Fix For: 2.0.0 > > > We support the API related to ingestDB in FRocksDB-8.10.0, but many of the > fixes related to ingestDB were only integrated in the latest RocksDB > version. So we need to cherry-pick these fix commits into FRocksDB. > Mainly include: > |*RocksDB Main Branch*|*Commit ID in FrocksDB-8.10.0*|*Plan*| > |https://github.com/facebook/rocksdb/pull/11646|44f0ff31c21164685a6cd25a2beb944767c39e46| > | > |[https://github.com/facebook/rocksdb/pull/11868]|8e1adab5cecad129131a4eceabe645b9442acb9c| > | > |https://github.com/facebook/rocksdb/pull/11811|3c27f56d0b7e359defbc25bf90061214c889f40b| > | > |https://github.com/facebook/rocksdb/pull/11381|4d72f48e57cb0a95b67ff82c6e971f826750334e| > | > |https://github.com/facebook/rocksdb/pull/11379|8d8eb0e77e13a3902d23fbda742dc47aa7bc418f| > | > |https://github.com/facebook/rocksdb/pull/11378|fa878a01074fe039135e37720f669391d1663525| > | > |https://github.com/facebook/rocksdb/pull/12219|183d80d7dc4ce339ab1b6796661d5879b7a40d6a| > | > |https://github.com/facebook/rocksdb/pull/12328|ef430fc72407950f94ca2a4fbb2b15de7ae8ff4f| > | > |https://github.com/facebook/rocksdb/pull/12526| |Fix in > https://issues.apache.org/jira/browse/FLINK-35576| > |https://github.com/facebook/rocksdb/pull/12602| |Fix in > https://issues.apache.org/jira/browse/FLINK-35576| -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-35576) FRocksDB cherry pick IngestDB related commits
[ https://issues.apache.org/jira/browse/FLINK-35576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17867252#comment-17867252 ] Yue Ma commented on FLINK-35576: [~roman] [~zakelly] I have read the descriptions of these two PRs, and these two commits are mainly used to fix API issues with *getColumnFamilyMetaData* and {*}IngestExternalFile{*}. Since these two APIs have not yet been used in Flink, they do not have a significant impact on the current usage of Flink. But I think they may be used in the future, so cherry-picking these two commits is fine with me. > FRocksDB cherry pick IngestDB related commits > - > > Key: FLINK-35576 > URL: https://issues.apache.org/jira/browse/FLINK-35576 > Project: Flink > Issue Type: Sub-task > Components: Runtime / State Backends >Affects Versions: 2.0.0 >Reporter: Yue Ma >Assignee: Yue Ma >Priority: Major > Fix For: 2.0.0 > > > We support the API related to ingestDB in FRocksDB-8.10.0, but many of the > fixes related to ingestDB were only integrated in the latest RocksDB > version. So we need to cherry-pick these fix commits into FRocksDB. > Mainly include: > |*RocksDB Main Branch*|*Commit ID in FrocksDB-8.10.0*|*Plan*| > |https://github.com/facebook/rocksdb/pull/11646|44f0ff31c21164685a6cd25a2beb944767c39e46| > | > |[https://github.com/facebook/rocksdb/pull/11868]|8e1adab5cecad129131a4eceabe645b9442acb9c| > | > |https://github.com/facebook/rocksdb/pull/11811|3c27f56d0b7e359defbc25bf90061214c889f40b| > | > |https://github.com/facebook/rocksdb/pull/11381|4d72f48e57cb0a95b67ff82c6e971f826750334e| > | > |https://github.com/facebook/rocksdb/pull/11379|8d8eb0e77e13a3902d23fbda742dc47aa7bc418f| > | > |https://github.com/facebook/rocksdb/pull/11378|fa878a01074fe039135e37720f669391d1663525| > | > |https://github.com/facebook/rocksdb/pull/12219|183d80d7dc4ce339ab1b6796661d5879b7a40d6a| > | > |https://github.com/facebook/rocksdb/pull/12328|ef430fc72407950f94ca2a4fbb2b15de7ae8ff4f| > | > |https://github.com/facebook/rocksdb/pull/12526| |Fix in > https://issues.apache.org/jira/browse/FLINK-35576| > |https://github.com/facebook/rocksdb/pull/12602| |Fix in > https://issues.apache.org/jira/browse/FLINK-35576| -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-35576) FRocksDB cherry pick IngestDB related commits
[ https://issues.apache.org/jira/browse/FLINK-35576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17866259#comment-17866259 ] Yue Ma commented on FLINK-35576: [~roman] I've finished the cherry-pick and fixed all the UTs. Can you please help review the PR [https://github.com/ververica/frocksdb/pull/77]? > FRocksDB cherry pick IngestDB related commits > - > > Key: FLINK-35576 > URL: https://issues.apache.org/jira/browse/FLINK-35576 > Project: Flink > Issue Type: Sub-task > Components: Runtime / State Backends >Affects Versions: 2.0.0 >Reporter: Yue Ma >Priority: Major > Fix For: 2.0.0 > > > We support the API related to ingestDB in FRocksDB-8.10.0, but many of the > fixes related to ingestDB were only integrated in the latest RocksDB > version. So we need to cherry-pick these fix commits into FRocksDB. > Mainly include: > |*RocksDB Main Branch*|*Commit ID in FrocksDB-8.10.0*|*Plan*| > |https://github.com/facebook/rocksdb/pull/11646|44f0ff31c21164685a6cd25a2beb944767c39e46| > | > |[https://github.com/facebook/rocksdb/pull/11868]|8e1adab5cecad129131a4eceabe645b9442acb9c| > | > |https://github.com/facebook/rocksdb/pull/11811|3c27f56d0b7e359defbc25bf90061214c889f40b| > | > |https://github.com/facebook/rocksdb/pull/11381|4d72f48e57cb0a95b67ff82c6e971f826750334e| > | > |https://github.com/facebook/rocksdb/pull/11379|8d8eb0e77e13a3902d23fbda742dc47aa7bc418f| > | > |https://github.com/facebook/rocksdb/pull/11378|fa878a01074fe039135e37720f669391d1663525| > | > |https://github.com/facebook/rocksdb/pull/12219|183d80d7dc4ce339ab1b6796661d5879b7a40d6a| > | > |https://github.com/facebook/rocksdb/pull/12328|ef430fc72407950f94ca2a4fbb2b15de7ae8ff4f| > | > |https://github.com/facebook/rocksdb/pull/12526| |Fix in > https://issues.apache.org/jira/browse/FLINK-35576| > |https://github.com/facebook/rocksdb/pull/12602| |Fix in > https://issues.apache.org/jira/browse/FLINK-35576| -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-21436) Speed up the restore of UnionListState
[ https://issues.apache.org/jira/browse/FLINK-21436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17865323#comment-17865323 ] Yue Ma commented on FLINK-21436: [~fanrui] Yes, I think you are right: most of the operators using UnionState are legacy sources and sinks. But there are still some users who also use UnionState themselves. My question is: if we plan to deprecate UnionState in the future, can we add some notice in the documentation (https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/fault-tolerance/state/)? And if we don't have this intention, isn't it necessary to optimize the recovery time of UnionState? > Speed up the restore of UnionListState > > > Key: FLINK-21436 > URL: https://issues.apache.org/jira/browse/FLINK-21436 > Project: Flink > Issue Type: Improvement > Components: Runtime / State Backends >Affects Versions: 1.13.0 >Reporter: Rui Fan >Priority: Minor > Labels: auto-deprioritized-major > Attachments: JM 启动火焰图.svg, akka timeout Exception.png > > > h1. 1. Problem introduction and cause analysis > Problem description: The duration of UnionListState restore under large > concurrency is more than 2 minutes. > h2. The reason: > 2000 subtasks write 2000 files during checkpoint, and each subtask needs to > read 2000 files during restore. > 2000*2000 = 4 million, so 4 million small files need to be read from HDFS > during restore. HDFS becomes a bottleneck, causing restore to be > particularly time-consuming. > h1. 2. Optimization ideas > Under normal circumstances, the UnionListState state is relatively small. > Typical usage scenario: Kafka offset information. > When restoring, the JM can directly read all 2000 small files, merge > UnionListState into a byte array and send it to all TMs to avoid frequent > access to HDFS by TMs. > h1. 3. Benefits after optimization > Before optimization: 2000 concurrent, Kafka offset restore takes 90~130 s. > After optimization: 2000 concurrent, Kafka offset restore takes less than 1s. > h1. 4. Risk points > An overly large UnionListState puts too much pressure on the JM. > Solution 1: > Add a configuration option to decide whether to enable this feature. The default is > false, which means the old plan is used. When it is set to true, the JM > will merge. > Solution 2: > The above configuration is not required, which is equivalent to enabling > merge by default. > The JM detects the size of the state before the merge; if it is less than the > threshold, the state is considered relatively small, and the state is > sent to all TMs through ByteStreamStateHandle. > If the threshold is exceeded, the state is considered large. In this > case, write an HDFS file, send a FileStateHandle to all TMs, and each TM can > read this file. > > Note: Most of the scenarios where Flink uses UnionListState are Kafka offset > (small state). In theory, most jobs are risk-free. -- This message was sent by Atlassian Jira (v8.20.10#820010)
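To make Solution 2 from the description concrete, here is a hypothetical sketch of the JobManager-side decision. All names are invented for illustration, and the two send functions stand in for shipping Flink's ByteStreamStateHandle (inline bytes) or FileStateHandle (a path to one shared file) to the TaskManagers.

{code:cpp}
#include <cstddef>
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// Assumed cutoff between "small" state shipped inline and "large" state
// written to a single shared file; a real implementation would make this
// configurable.
constexpr std::size_t kInlineThresholdBytes = 1 << 20;  // 1 MiB

std::string ReadWholeFile(const std::string& path) {
  std::ifstream in(path, std::ios::binary);
  std::ostringstream buf;
  buf << in.rdbuf();
  return buf.str();
}

// Stand-in for sending a ByteStreamStateHandle to every TM.
void SendBytesToAllTasks(const std::string& bytes) {
  std::cout << "inline handle, " << bytes.size() << " bytes\n";
}

// Stand-in for sending a FileStateHandle to every TM.
void SendFilePathToAllTasks(const std::string& path) {
  std::cout << "file handle: " << path << "\n";
}

void DistributeUnionState(const std::vector<std::string>& shard_paths,
                          const std::string& merged_path) {
  // The JM reads the N small shard files exactly once, instead of each of
  // the N TMs reading all N files (the 2000 x 2000 = 4 million reads above).
  std::string merged;
  for (const std::string& p : shard_paths) merged += ReadWholeFile(p);

  if (merged.size() < kInlineThresholdBytes) {
    SendBytesToAllTasks(merged);  // small state, e.g. Kafka offsets
  } else {
    // Large state: write one file (HDFS in Flink) and ship only its path.
    std::ofstream(merged_path, std::ios::binary) << merged;
    SendFilePathToAllTasks(merged_path);
  }
}
{code}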
[jira] [Comment Edited] (FLINK-21436) Speed up the restore of UnionListState
[ https://issues.apache.org/jira/browse/FLINK-21436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17864966#comment-17864966 ] Yue Ma edited comment on FLINK-21436 at 7/11/24 8:45 AM: - [~fanrui] [~yunta] I just found this ticket, and I have one small question: when is UnionState planned to be deprecated? Since some source and sink operators still rely on UnionState, it may still make sense to speed up the UnionState restore time currently? was (Author: mayuehappy): [~fanrui] [~yunta] I just found this ticket , and i have one small question . When does Union State plan to deprecated? Since some source and sink operator still rely on UnionState, so it may still makes sense to resume acceleration in UnionState currently ? > Speed up the restore of UnionListState > > > Key: FLINK-21436 > URL: https://issues.apache.org/jira/browse/FLINK-21436 > Project: Flink > Issue Type: Improvement > Components: Runtime / State Backends >Affects Versions: 1.13.0 >Reporter: Rui Fan >Priority: Minor > Labels: auto-deprioritized-major > Attachments: JM 启动火焰图.svg, akka timeout Exception.png > > > h1. 1. Problem introduction and cause analysis > Problem description: The duration of UnionListState restore under large > concurrency is more than 2 minutes. > h2. The reason: > 2000 subtasks write 2000 files during checkpoint, and each subtask needs to > read 2000 files during restore. > 2000*2000 = 4 million, so 4 million small files need to be read from HDFS > during restore. HDFS becomes a bottleneck, causing restore to be > particularly time-consuming. > h1. 2. Optimization ideas > Under normal circumstances, the UnionListState state is relatively small. > Typical usage scenario: Kafka offset information. > When restoring, the JM can directly read all 2000 small files, merge > UnionListState into a byte array and send it to all TMs to avoid frequent > access to HDFS by TMs. > h1. 3. Benefits after optimization > Before optimization: 2000 concurrent, Kafka offset restore takes 90~130 s. > After optimization: 2000 concurrent, Kafka offset restore takes less than 1s. > h1. 4. Risk points > An overly large UnionListState puts too much pressure on the JM. > Solution 1: > Add a configuration option to decide whether to enable this feature. The default is > false, which means the old plan is used. When it is set to true, the JM > will merge. > Solution 2: > The above configuration is not required, which is equivalent to enabling > merge by default. > The JM detects the size of the state before the merge; if it is less than the > threshold, the state is considered relatively small, and the state is > sent to all TMs through ByteStreamStateHandle. > If the threshold is exceeded, the state is considered large. In this > case, write an HDFS file, send a FileStateHandle to all TMs, and each TM can > read this file. > > Note: Most of the scenarios where Flink uses UnionListState are Kafka offset > (small state). In theory, most jobs are risk-free. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Comment Edited] (FLINK-21436) Speed up the restore of UnionListState
[ https://issues.apache.org/jira/browse/FLINK-21436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17864966#comment-17864966 ] Yue Ma edited comment on FLINK-21436 at 7/11/24 8:42 AM: - [~fanrui] [~yunta] I just found this ticket, and I have one small question: when is UnionState planned to be deprecated? Since some source and sink operators still rely on UnionState, it may still make sense to speed up the UnionState restore currently? was (Author: mayuehappy): [~fanrui] [~yunta] I just found this ticket , and i have one small question . When does Union State plan to deprecated? Since some source and sink operator still rely on UnionState, so it still makes sense to resume acceleration in UnionState currently ? > Speed up the restore of UnionListState > > > Key: FLINK-21436 > URL: https://issues.apache.org/jira/browse/FLINK-21436 > Project: Flink > Issue Type: Improvement > Components: Runtime / State Backends >Affects Versions: 1.13.0 >Reporter: Rui Fan >Priority: Minor > Labels: auto-deprioritized-major > Attachments: JM 启动火焰图.svg, akka timeout Exception.png > > > h1. 1. Problem introduction and cause analysis > Problem description: The duration of UnionListState restore under large > concurrency is more than 2 minutes. > h2. The reason: > 2000 subtasks write 2000 files during checkpoint, and each subtask needs to > read 2000 files during restore. > 2000*2000 = 4 million, so 4 million small files need to be read from HDFS > during restore. HDFS becomes a bottleneck, causing restore to be > particularly time-consuming. > h1. 2. Optimization ideas > Under normal circumstances, the UnionListState state is relatively small. > Typical usage scenario: Kafka offset information. > When restoring, the JM can directly read all 2000 small files, merge > UnionListState into a byte array and send it to all TMs to avoid frequent > access to HDFS by TMs. > h1. 3. Benefits after optimization > Before optimization: 2000 concurrent, Kafka offset restore takes 90~130 s. > After optimization: 2000 concurrent, Kafka offset restore takes less than 1s. > h1. 4. Risk points > An overly large UnionListState puts too much pressure on the JM. > Solution 1: > Add a configuration option to decide whether to enable this feature. The default is > false, which means the old plan is used. When it is set to true, the JM > will merge. > Solution 2: > The above configuration is not required, which is equivalent to enabling > merge by default. > The JM detects the size of the state before the merge; if it is less than the > threshold, the state is considered relatively small, and the state is > sent to all TMs through ByteStreamStateHandle. > If the threshold is exceeded, the state is considered large. In this > case, write an HDFS file, send a FileStateHandle to all TMs, and each TM can > read this file. > > Note: Most of the scenarios where Flink uses UnionListState are Kafka offset > (small state). In theory, most jobs are risk-free. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Comment Edited] (FLINK-21436) Speed up the restore of UnionListState
[ https://issues.apache.org/jira/browse/FLINK-21436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17864966#comment-17864966 ] Yue Ma edited comment on FLINK-21436 at 7/11/24 8:42 AM: - [~fanrui] [~yunta] I just found this ticket, and I have one small question: when is UnionState planned to be deprecated? Since some source and sink operators still rely on UnionState, it still makes sense to speed up the UnionState restore currently? was (Author: mayuehappy): [~fanrui] [~yunta] I just found this ticket , and has one small question . When does Union State plan to deprecated? Since some source and sink operator still rely on UnionState, so it still makes sense to resume acceleration in UnionState currently ? > Speed up the restore of UnionListState > > > Key: FLINK-21436 > URL: https://issues.apache.org/jira/browse/FLINK-21436 > Project: Flink > Issue Type: Improvement > Components: Runtime / State Backends >Affects Versions: 1.13.0 >Reporter: Rui Fan >Priority: Minor > Labels: auto-deprioritized-major > Attachments: JM 启动火焰图.svg, akka timeout Exception.png > > > h1. 1. Problem introduction and cause analysis > Problem description: The duration of UnionListState restore under large > concurrency is more than 2 minutes. > h2. The reason: > 2000 subtasks write 2000 files during checkpoint, and each subtask needs to > read 2000 files during restore. > 2000*2000 = 4 million, so 4 million small files need to be read from HDFS > during restore. HDFS becomes a bottleneck, causing restore to be > particularly time-consuming. > h1. 2. Optimization ideas > Under normal circumstances, the UnionListState state is relatively small. > Typical usage scenario: Kafka offset information. > When restoring, the JM can directly read all 2000 small files, merge > UnionListState into a byte array and send it to all TMs to avoid frequent > access to HDFS by TMs. > h1. 3. Benefits after optimization > Before optimization: 2000 concurrent, Kafka offset restore takes 90~130 s. > After optimization: 2000 concurrent, Kafka offset restore takes less than 1s. > h1. 4. Risk points > An overly large UnionListState puts too much pressure on the JM. > Solution 1: > Add a configuration option to decide whether to enable this feature. The default is > false, which means the old plan is used. When it is set to true, the JM > will merge. > Solution 2: > The above configuration is not required, which is equivalent to enabling > merge by default. > The JM detects the size of the state before the merge; if it is less than the > threshold, the state is considered relatively small, and the state is > sent to all TMs through ByteStreamStateHandle. > If the threshold is exceeded, the state is considered large. In this > case, write an HDFS file, send a FileStateHandle to all TMs, and each TM can > read this file. > > Note: Most of the scenarios where Flink uses UnionListState are Kafka offset > (small state). In theory, most jobs are risk-free. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-21436) Speed up the restore of UnionListState
[ https://issues.apache.org/jira/browse/FLINK-21436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17864966#comment-17864966 ] Yue Ma commented on FLINK-21436: [~fanrui] [~yunta] I just found this ticket, and I have one small question: when is UnionState planned to be deprecated? Since some source and sink operators still rely on UnionState, it still makes sense to speed up the UnionState restore currently? > Speed up the restore of UnionListState > > > Key: FLINK-21436 > URL: https://issues.apache.org/jira/browse/FLINK-21436 > Project: Flink > Issue Type: Improvement > Components: Runtime / State Backends >Affects Versions: 1.13.0 >Reporter: Rui Fan >Priority: Minor > Labels: auto-deprioritized-major > Attachments: JM 启动火焰图.svg, akka timeout Exception.png > > > h1. 1. Problem introduction and cause analysis > Problem description: The duration of UnionListState restore under large > concurrency is more than 2 minutes. > h2. The reason: > 2000 subtasks write 2000 files during checkpoint, and each subtask needs to > read 2000 files during restore. > 2000*2000 = 4 million, so 4 million small files need to be read from HDFS > during restore. HDFS becomes a bottleneck, causing restore to be > particularly time-consuming. > h1. 2. Optimization ideas > Under normal circumstances, the UnionListState state is relatively small. > Typical usage scenario: Kafka offset information. > When restoring, the JM can directly read all 2000 small files, merge > UnionListState into a byte array and send it to all TMs to avoid frequent > access to HDFS by TMs. > h1. 3. Benefits after optimization > Before optimization: 2000 concurrent, Kafka offset restore takes 90~130 s. > After optimization: 2000 concurrent, Kafka offset restore takes less than 1s. > h1. 4. Risk points > An overly large UnionListState puts too much pressure on the JM. > Solution 1: > Add a configuration option to decide whether to enable this feature. The default is > false, which means the old plan is used. When it is set to true, the JM > will merge. > Solution 2: > The above configuration is not required, which is equivalent to enabling > merge by default. > The JM detects the size of the state before the merge; if it is less than the > threshold, the state is considered relatively small, and the state is > sent to all TMs through ByteStreamStateHandle. > If the threshold is exceeded, the state is considered large. In this > case, write an HDFS file, send a FileStateHandle to all TMs, and each TM can > read this file. > > Note: Most of the scenarios where Flink uses UnionListState are Kafka offset > (small state). In theory, most jobs are risk-free. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-35576) FRocksDB cherry pick IngestDB related commits
[ https://issues.apache.org/jira/browse/FLINK-35576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17864949#comment-17864949 ] Yue Ma commented on FLINK-35576: [https://github.com/ververica/frocksdb/pull/77] [~Zakelly] Can you please help review this PR? I cherry-picked some commits to support the ingestDB-related API and fixed some broken CI cases. > FRocksDB cherry pick IngestDB related commits > - > > Key: FLINK-35576 > URL: https://issues.apache.org/jira/browse/FLINK-35576 > Project: Flink > Issue Type: Sub-task > Components: Runtime / State Backends >Affects Versions: 2.0.0 >Reporter: Yue Ma >Priority: Major > Fix For: 2.0.0 > > > We support the API related to ingestDB in FRocksDB-8.10.0, but many of the > fixes related to ingestDB were only integrated in the latest RocksDB > version. So we need to cherry-pick these fix commits into FRocksDB. > Mainly include: > |*RocksDB Main Branch*|*Commit ID in FrocksDB-8.10.0*|*Plan*| > |https://github.com/facebook/rocksdb/pull/11646|44f0ff31c21164685a6cd25a2beb944767c39e46| > | > |[https://github.com/facebook/rocksdb/pull/11868]|8e1adab5cecad129131a4eceabe645b9442acb9c| > | > |https://github.com/facebook/rocksdb/pull/11811|3c27f56d0b7e359defbc25bf90061214c889f40b| > | > |https://github.com/facebook/rocksdb/pull/11381|4d72f48e57cb0a95b67ff82c6e971f826750334e| > | > |https://github.com/facebook/rocksdb/pull/11379|8d8eb0e77e13a3902d23fbda742dc47aa7bc418f| > | > |https://github.com/facebook/rocksdb/pull/11378|fa878a01074fe039135e37720f669391d1663525| > | > |https://github.com/facebook/rocksdb/pull/12219|183d80d7dc4ce339ab1b6796661d5879b7a40d6a| > | > |https://github.com/facebook/rocksdb/pull/12328|ef430fc72407950f94ca2a4fbb2b15de7ae8ff4f| > | > |https://github.com/facebook/rocksdb/pull/12526| |Fix in > https://issues.apache.org/jira/browse/FLINK-35576| > |https://github.com/facebook/rocksdb/pull/12602| |Fix in > https://issues.apache.org/jira/browse/FLINK-35576| -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Comment Edited] (FLINK-35576) FRocksDB cherry pick IngestDB related commits
[ https://issues.apache.org/jira/browse/FLINK-35576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17864520#comment-17864520 ] Yue Ma edited comment on FLINK-35576 at 7/10/24 12:00 PM: -- [~roman] When I finished this cherry-pick, I found that the UT *_AssignEpochNumberToMultipleCF_* can't pass, mainly because some related changes have not been cherry-picked. In order to fix these unit tests, I cherry-picked all the other commits, mainly including [https://github.com/facebook/rocksdb/pull/12306] [https://github.com/facebook/rocksdb/pull/12236] [https://github.com/facebook/rocksdb/pull/12130] [https://github.com/facebook/rocksdb/pull/12526] [https://github.com/facebook/rocksdb/pull/12602] was (Author: mayuehappy): [~roman] When I finished this cherry-pick, I found that UT *_AssignEpochNumberToMultipleCF_* can't pass. Mainly because there are still some related changes in the commit that have not been cherrypicked. In order to solve these single tests, I cherry-picked all other commits. Mainly include [https://github.com/facebook/rocksdb/pull/12306] [https://github.com/facebook/rocksdb/pull/12236] [https://github.com/facebook/rocksdb/pull/12130] [https://github.com/facebook/rocksdb/pull/12526] [https://github.com/facebook/rocksdb/pull/12602] > FRocksDB cherry pick IngestDB related commits > - > > Key: FLINK-35576 > URL: https://issues.apache.org/jira/browse/FLINK-35576 > Project: Flink > Issue Type: Sub-task > Components: Runtime / State Backends >Affects Versions: 2.0.0 >Reporter: Yue Ma >Priority: Major > Fix For: 2.0.0 > > > We support the API related to ingestDB in FRocksDB-8.10.0, but many of the > fixes related to ingestDB were only integrated in the latest RocksDB > version. So we need to cherry-pick these fix commits into FRocksDB. > Mainly include: > |*RocksDB Main Branch*|*Commit ID in FrocksDB-8.10.0*|*Plan*| > |https://github.com/facebook/rocksdb/pull/11646|44f0ff31c21164685a6cd25a2beb944767c39e46| > | > |[https://github.com/facebook/rocksdb/pull/11868]|8e1adab5cecad129131a4eceabe645b9442acb9c| > | > |https://github.com/facebook/rocksdb/pull/11811|3c27f56d0b7e359defbc25bf90061214c889f40b| > | > |https://github.com/facebook/rocksdb/pull/11381|4d72f48e57cb0a95b67ff82c6e971f826750334e| > | > |https://github.com/facebook/rocksdb/pull/11379|8d8eb0e77e13a3902d23fbda742dc47aa7bc418f| > | > |https://github.com/facebook/rocksdb/pull/11378|fa878a01074fe039135e37720f669391d1663525| > | > |https://github.com/facebook/rocksdb/pull/12219|183d80d7dc4ce339ab1b6796661d5879b7a40d6a| > | > |https://github.com/facebook/rocksdb/pull/12328|ef430fc72407950f94ca2a4fbb2b15de7ae8ff4f| > | > |https://github.com/facebook/rocksdb/pull/12526| |Fix in > https://issues.apache.org/jira/browse/FLINK-35576| > |https://github.com/facebook/rocksdb/pull/12602| |Fix in > https://issues.apache.org/jira/browse/FLINK-35576| -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Comment Edited] (FLINK-35576) FRocksDB cherry pick IngestDB related commits
[ https://issues.apache.org/jira/browse/FLINK-35576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17864520#comment-17864520 ] Yue Ma edited comment on FLINK-35576 at 7/10/24 7:54 AM: - [~roman] When I finished this cherry-pick, I found that the UT *_AssignEpochNumberToMultipleCF_* didn't pass, mainly because some related changes in the commit had not been cherry-picked. In order to fix these unit tests, I cherry-picked all the other commits, mainly including [https://github.com/facebook/rocksdb/pull/12306] [https://github.com/facebook/rocksdb/pull/12236] [https://github.com/facebook/rocksdb/pull/12130] [https://github.com/facebook/rocksdb/pull/12526] https://github.com/facebook/rocksdb/pull/12602 was (Author: mayuehappy): [~roman] When I finished this cherry-pick, I found that UT *_AssignEpochNumberToMultipleCF_* didn't pass. Mainly because there are still some related changes in the commit that have not been cherrypicked. In order to solve these single tests, I cherry-picked all other commits. Mainly include [https://github.com/facebook/rocksdb/pull/12306] [https://github.com/facebook/rocksdb/pull/12236] [https://github.com/facebook/rocksdb/pull/12130] https://github.com/facebook/rocksdb/pull/12526 > FRocksDB cherry pick IngestDB related commits > - > > Key: FLINK-35576 > URL: https://issues.apache.org/jira/browse/FLINK-35576 > Project: Flink > Issue Type: Sub-task > Components: Runtime / State Backends >Affects Versions: 2.0.0 >Reporter: Yue Ma >Priority: Major > Fix For: 2.0.0 > > > We support the API related to ingestDB in FRocksDB-8.10.0, but many of the > fixes related to ingestDB were only integrated in the latest RocksDB > version. So we need to cherry-pick these fix commits into FRocksDB. > Mainly include: > |*RocksDB Main Branch*|*Commit ID in FrocksDB-8.10.0*|*Plan*| > |https://github.com/facebook/rocksdb/pull/11646|44f0ff31c21164685a6cd25a2beb944767c39e46| > | > |[https://github.com/facebook/rocksdb/pull/11868]|8e1adab5cecad129131a4eceabe645b9442acb9c| > | > |https://github.com/facebook/rocksdb/pull/11811|3c27f56d0b7e359defbc25bf90061214c889f40b| > | > |https://github.com/facebook/rocksdb/pull/11381|4d72f48e57cb0a95b67ff82c6e971f826750334e| > | > |https://github.com/facebook/rocksdb/pull/11379|8d8eb0e77e13a3902d23fbda742dc47aa7bc418f| > | > |https://github.com/facebook/rocksdb/pull/11378|fa878a01074fe039135e37720f669391d1663525| > | > |https://github.com/facebook/rocksdb/pull/12219|183d80d7dc4ce339ab1b6796661d5879b7a40d6a| > | > |https://github.com/facebook/rocksdb/pull/12328|ef430fc72407950f94ca2a4fbb2b15de7ae8ff4f| > | > |https://github.com/facebook/rocksdb/pull/12526| |Fix in > https://issues.apache.org/jira/browse/FLINK-35576| > |https://github.com/facebook/rocksdb/pull/12602| |Fix in > https://issues.apache.org/jira/browse/FLINK-35576| -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Comment Edited] (FLINK-35576) FRocksDB cherry pick IngestDB related commits
[ https://issues.apache.org/jira/browse/FLINK-35576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17864520#comment-17864520 ] Yue Ma edited comment on FLINK-35576 at 7/10/24 7:46 AM: - [~roman] When I finished this cherry-pick, I found that the UT *_AssignEpochNumberToMultipleCF_* didn't pass, mainly because some related changes in the commit had not been cherry-picked. In order to fix these unit tests, I cherry-picked all the other commits, mainly including [https://github.com/facebook/rocksdb/pull/12306] [https://github.com/facebook/rocksdb/pull/12236] [https://github.com/facebook/rocksdb/pull/12130] [https://github.com/facebook/rocksdb/pull/12526] https://github.com/facebook/rocksdb/pull/12602 was (Author: mayuehappy): [~roman] When I finished this cherry-pick, I found that UT *_AssignEpochNumberToMultipleCF_* didn't pass. Mainly because there are still some related changes in the commit that have not been cherrypicked. In order to solve these single tests, I cherry-picked all other commits. Mainly include [https://github.com/facebook/rocksdb/pull/12306] [https://github.com/facebook/rocksdb/pull/12236] [https://github.com/facebook/rocksdb/pull/12130] https://github.com/facebook/rocksdb/pull/12526 > FRocksDB cherry pick IngestDB related commits > - > > Key: FLINK-35576 > URL: https://issues.apache.org/jira/browse/FLINK-35576 > Project: Flink > Issue Type: Sub-task > Components: Runtime / State Backends >Affects Versions: 2.0.0 >Reporter: Yue Ma >Priority: Major > Fix For: 2.0.0 > > > We support the API related to ingestDB in FRocksDB-8.10.0, but many of the > fixes related to ingestDB were only integrated in the latest RocksDB > version. So we need to cherry-pick these fix commits into FRocksDB. > Mainly include: > |*RocksDB Main Branch*|*Commit ID in FrocksDB-8.10.0*|*Plan*| > |https://github.com/facebook/rocksdb/pull/11646|44f0ff31c21164685a6cd25a2beb944767c39e46| > | > |[https://github.com/facebook/rocksdb/pull/11868]|8e1adab5cecad129131a4eceabe645b9442acb9c| > | > |https://github.com/facebook/rocksdb/pull/11811|3c27f56d0b7e359defbc25bf90061214c889f40b| > | > |https://github.com/facebook/rocksdb/pull/11381|4d72f48e57cb0a95b67ff82c6e971f826750334e| > | > |https://github.com/facebook/rocksdb/pull/11379|8d8eb0e77e13a3902d23fbda742dc47aa7bc418f| > | > |https://github.com/facebook/rocksdb/pull/11378|fa878a01074fe039135e37720f669391d1663525| > | > |https://github.com/facebook/rocksdb/pull/12219|183d80d7dc4ce339ab1b6796661d5879b7a40d6a| > | > |https://github.com/facebook/rocksdb/pull/12328|ef430fc72407950f94ca2a4fbb2b15de7ae8ff4f| > | > |https://github.com/facebook/rocksdb/pull/12526| |Fix in > https://issues.apache.org/jira/browse/FLINK-35576| > |https://github.com/facebook/rocksdb/pull/12602| |Fix in > https://issues.apache.org/jira/browse/FLINK-35576| -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-35576) FRocksDB cherry pick IngestDB related commits
[ https://issues.apache.org/jira/browse/FLINK-35576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17864520#comment-17864520 ] Yue Ma commented on FLINK-35576: [~roman] When I finished this cherry-pick, I found that the UT *_AssignEpochNumberToMultipleCF_* didn't pass, mainly because some related changes in the commit had not been cherry-picked. In order to fix these unit tests, I cherry-picked all the other commits, mainly including [https://github.com/facebook/rocksdb/pull/12306] [https://github.com/facebook/rocksdb/pull/12236] [https://github.com/facebook/rocksdb/pull/12130] https://github.com/facebook/rocksdb/pull/12526 > FRocksDB cherry pick IngestDB related commits > - > > Key: FLINK-35576 > URL: https://issues.apache.org/jira/browse/FLINK-35576 > Project: Flink > Issue Type: Sub-task > Components: Runtime / State Backends >Affects Versions: 2.0.0 >Reporter: Yue Ma >Priority: Major > Fix For: 2.0.0 > > > We support the API related to ingestDB in FRocksDB-8.10.0, but many of the > fixes related to ingestDB were only integrated in the latest RocksDB > version. So we need to cherry-pick these fix commits into FRocksDB. > Mainly include: > |*RocksDB Main Branch*|*Commit ID in FrocksDB-8.10.0*|*Plan*| > |https://github.com/facebook/rocksdb/pull/11646|44f0ff31c21164685a6cd25a2beb944767c39e46| > | > |[https://github.com/facebook/rocksdb/pull/11868]|8e1adab5cecad129131a4eceabe645b9442acb9c| > | > |https://github.com/facebook/rocksdb/pull/11811|3c27f56d0b7e359defbc25bf90061214c889f40b| > | > |https://github.com/facebook/rocksdb/pull/11381|4d72f48e57cb0a95b67ff82c6e971f826750334e| > | > |https://github.com/facebook/rocksdb/pull/11379|8d8eb0e77e13a3902d23fbda742dc47aa7bc418f| > | > |https://github.com/facebook/rocksdb/pull/11378|fa878a01074fe039135e37720f669391d1663525| > | > |https://github.com/facebook/rocksdb/pull/12219|183d80d7dc4ce339ab1b6796661d5879b7a40d6a| > | > |https://github.com/facebook/rocksdb/pull/12328|ef430fc72407950f94ca2a4fbb2b15de7ae8ff4f| > | > |https://github.com/facebook/rocksdb/pull/12526| |Fix in > https://issues.apache.org/jira/browse/FLINK-35576| > |https://github.com/facebook/rocksdb/pull/12602| |Fix in > https://issues.apache.org/jira/browse/FLINK-35576| -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-35576) FRocksDB cherry pick IngestDB related commits
[ https://issues.apache.org/jira/browse/FLINK-35576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yue Ma updated FLINK-35576: --- Description: We support the API related to ingestDB in FRocksDB-8.10.0, but many of the fixes related to ingestDB were only integrated in the latest RocksDB version. So we need to cherry-pick these fix commits into FRocksDB. Mainly include: |*RocksDB Main Branch*|*Commit ID in FrocksDB-8.10.0*|*Plan*| |https://github.com/facebook/rocksdb/pull/11646|44f0ff31c21164685a6cd25a2beb944767c39e46| | |[https://github.com/facebook/rocksdb/pull/11868]|8e1adab5cecad129131a4eceabe645b9442acb9c| | |https://github.com/facebook/rocksdb/pull/11811|3c27f56d0b7e359defbc25bf90061214c889f40b| | |https://github.com/facebook/rocksdb/pull/11381|4d72f48e57cb0a95b67ff82c6e971f826750334e| | |https://github.com/facebook/rocksdb/pull/11379|8d8eb0e77e13a3902d23fbda742dc47aa7bc418f| | |https://github.com/facebook/rocksdb/pull/11378|fa878a01074fe039135e37720f669391d1663525| | |https://github.com/facebook/rocksdb/pull/12219|183d80d7dc4ce339ab1b6796661d5879b7a40d6a| | |https://github.com/facebook/rocksdb/pull/12328|ef430fc72407950f94ca2a4fbb2b15de7ae8ff4f| | |https://github.com/facebook/rocksdb/pull/12526| |Fix in https://issues.apache.org/jira/browse/FLINK-35576| |https://github.com/facebook/rocksdb/pull/12602| |Fix in https://issues.apache.org/jira/browse/FLINK-35576| was: We support the API related to ingest DB in FRocksDb-8.10.0, but many of the fixes related to ingest DB were only integrated in the latest RocksDB version. So we need to add these fixed commit cherryclicks to FRocksDB. Mainly include: |*RocksDB Main Branch*|*Commit ID in FrocksDB-8.10.0*|*Plan*| |https://github.com/facebook/rocksdb/pull/11646|44f0ff31c21164685a6cd25a2beb944767c39e46| | |[https://github.com/facebook/rocksdb/pull/11868]|8e1adab5cecad129131a4eceabe645b9442acb9c| | |https://github.com/facebook/rocksdb/pull/11811|3c27f56d0b7e359defbc25bf90061214c889f40b| | |https://github.com/facebook/rocksdb/pull/11381|4d72f48e57cb0a95b67ff82c6e971f826750334e| | |https://github.com/facebook/rocksdb/pull/11379|8d8eb0e77e13a3902d23fbda742dc47aa7bc418f| | |https://github.com/facebook/rocksdb/pull/11378|fa878a01074fe039135e37720f669391d1663525| | |https://github.com/facebook/rocksdb/pull/12219|183d80d7dc4ce339ab1b6796661d5879b7a40d6a| | |https://github.com/facebook/rocksdb/pull/12328|ef430fc72407950f94ca2a4fbb2b15de7ae8ff4f| | |https://github.com/facebook/rocksdb/pull/12602| |Fix in https://issues.apache.org/jira/browse/FLINK-35576| > FRocksDB cherry pick IngestDB related commits > - > > Key: FLINK-35576 > URL: https://issues.apache.org/jira/browse/FLINK-35576 > Project: Flink > Issue Type: Sub-task > Components: Runtime / State Backends >Affects Versions: 2.0.0 >Reporter: Yue Ma >Priority: Major > Fix For: 2.0.0 > > > We support the API related to ingestDB in FRocksDB-8.10.0, but many of the > fixes related to ingestDB were only integrated in the latest RocksDB > version. So we need to cherry-pick these fix commits into FRocksDB. 
> Mainly include: > |*RocksDB Main Branch*|*Commit ID in FrocksDB-8.10.0*|*Plan*| > |https://github.com/facebook/rocksdb/pull/11646|44f0ff31c21164685a6cd25a2beb944767c39e46| > | > |[https://github.com/facebook/rocksdb/pull/11868]|8e1adab5cecad129131a4eceabe645b9442acb9c| > | > |https://github.com/facebook/rocksdb/pull/11811|3c27f56d0b7e359defbc25bf90061214c889f40b| > | > |https://github.com/facebook/rocksdb/pull/11381|4d72f48e57cb0a95b67ff82c6e971f826750334e| > | > |https://github.com/facebook/rocksdb/pull/11379|8d8eb0e77e13a3902d23fbda742dc47aa7bc418f| > | > |https://github.com/facebook/rocksdb/pull/11378|fa878a01074fe039135e37720f669391d1663525| > | > |https://github.com/facebook/rocksdb/pull/12219|183d80d7dc4ce339ab1b6796661d5879b7a40d6a| > | > |https://github.com/facebook/rocksdb/pull/12328|ef430fc72407950f94ca2a4fbb2b15de7ae8ff4f| > | > |https://github.com/facebook/rocksdb/pull/12526| |Fix in > https://issues.apache.org/jira/browse/FLINK-35576| > |https://github.com/facebook/rocksdb/pull/12602| |Fix in > https://issues.apache.org/jira/browse/FLINK-35576| -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-35576) FRocksDB cherry pick IngestDB related commits
[ https://issues.apache.org/jira/browse/FLINK-35576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17862415#comment-17862415 ] Yue Ma commented on FLINK-35576: https://github.com/ververica/frocksdb/pull/77 > FRocksDB cherry pick IngestDB related commits > - > > Key: FLINK-35576 > URL: https://issues.apache.org/jira/browse/FLINK-35576 > Project: Flink > Issue Type: Sub-task > Components: Runtime / State Backends >Affects Versions: 2.0.0 >Reporter: Yue Ma >Priority: Major > Fix For: 2.0.0 > > > We support the API related to ingestDB in FRocksDB-8.10.0, but many of the > fixes related to ingestDB were only integrated in the latest RocksDB > version. So we need to cherry-pick these fix commits into FRocksDB. > Mainly include: > |*RocksDB Main Branch*|*Commit ID in FrocksDB-8.10.0*|*Plan*| > |https://github.com/facebook/rocksdb/pull/11646|44f0ff31c21164685a6cd25a2beb944767c39e46| > | > |[https://github.com/facebook/rocksdb/pull/11868]|8e1adab5cecad129131a4eceabe645b9442acb9c| > | > |https://github.com/facebook/rocksdb/pull/11811|3c27f56d0b7e359defbc25bf90061214c889f40b| > | > |https://github.com/facebook/rocksdb/pull/11381|4d72f48e57cb0a95b67ff82c6e971f826750334e| > | > |https://github.com/facebook/rocksdb/pull/11379|8d8eb0e77e13a3902d23fbda742dc47aa7bc418f| > | > |https://github.com/facebook/rocksdb/pull/11378|fa878a01074fe039135e37720f669391d1663525| > | > |https://github.com/facebook/rocksdb/pull/12219|183d80d7dc4ce339ab1b6796661d5879b7a40d6a| > | > |https://github.com/facebook/rocksdb/pull/12328|ef430fc72407950f94ca2a4fbb2b15de7ae8ff4f| > | > |https://github.com/facebook/rocksdb/pull/12602| |Fix in > https://issues.apache.org/jira/browse/FLINK-35576| -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-35575) FRocksDB supports disabling perf context during compilation
[ https://issues.apache.org/jira/browse/FLINK-35575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17862410#comment-17862410 ] Yue Ma commented on FLINK-35575: https://github.com/ververica/frocksdb/pull/76 > FRocksDB supports disabling perf context during compilation > --- > > Key: FLINK-35575 > URL: https://issues.apache.org/jira/browse/FLINK-35575 > Project: Flink > Issue Type: Sub-task > Components: Runtime / State Backends >Affects Versions: 2.0.0 >Reporter: Yue Ma >Priority: Major > Fix For: 2.0.0 > > > In FRocksDB 6, the thread-local perf context is disabled by reverting a specific > commit (FLINK-19710). However, this creates conflicts and makes upgrading > more difficult. We found that disabling *PERF_CONTEXT* can improve the > performance of the state benchmark by about 5%, and it doesn't create any > conflicts. So we plan to support disabling the perf context at compile time > in the new FRocksDB version. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-35574) Setup base branch for FrocksDB-8.10
[ https://issues.apache.org/jira/browse/FLINK-35574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17860327#comment-17860327 ] Yue Ma commented on FLINK-35574: [~Zakelly] Thanks for the review. Can you please assign this ticket to me and mark it as Resolved? I'm moving forward to the next subtask. > Setup base branch for FrocksDB-8.10 > --- > > Key: FLINK-35574 > URL: https://issues.apache.org/jira/browse/FLINK-35574 > Project: Flink > Issue Type: Sub-task > Components: Runtime / State Backends >Affects Versions: 2.0.0 >Reporter: Yue Ma >Priority: Major > Fix For: 2.0.0 > > > As the first part of FLINK-35573, we need to prepare a base branch for > FRocksDB-8.10.0 first. Mainly, it needs to be checked out from version 8.10.0 > of the RocksDB community. Then cherry-pick the commits used by Flink from > FRocksDB-6.20.3 into 8.10.0. > *Details:* > |*JIRA*|*FrocksDB-6.20.3*|*Commit ID in FrocksDB-8.10.0*|*Plan*| > |[[FLINK-10471] Add Apache Flink specific compaction filter to evict expired > state which has > time-to-live|https://github.com/ververica/frocksdb/commit/3da8249d50c8a3a6ea229f43890d37e098372786]|3da8249d50c8a3a6ea229f43890d37e098372786|d606c9450bef7d2a22c794f406d7940d9d2f29a4|Already > in *FrocksDB-8.10.0*| > |+[[FLINK-19710] Revert implementation of PerfContext back to __thread to > avoid performance > regression|https://github.com/ververica/frocksdb/commit/d6f50f33064f1d24480dfb3c586a7bd7a7dbac01]+|d6f50f33064f1d24480dfb3c586a7bd7a7dbac01| > |Fix in FLINK-35575| > |[FRocksDB release guide and helping > scripts|https://github.com/ververica/frocksdb/commit/2673de8e5460af8d23c0c7e1fb0c3258ea283419]|2673de8e5460af8d23c0c7e1fb0c3258ea283419|b58ba05a380d9bf0c223bc707f14897ce392ce1b|Already > in *FrocksDB-8.10.0*| > |+[Add content related to ARM building in the FROCKSDB-RELEASE > documentation|https://github.com/ververica/frocksdb/commit/ec27ca01db5ff579dd7db1f70cf3a4677b63d589]+|ec27ca01db5ff579dd7db1f70cf3a4677b63d589|6cae002662a45131a0cd90dd84f5d3d3cb958713|Already > in *FrocksDB-8.10.0*| > |[[FLINK-23756] Update FrocksDB release document with more > info|https://github.com/ververica/frocksdb/commit/f75e983045f4b64958dc0e93e8b94a7cfd7663be]|f75e983045f4b64958dc0e93e8b94a7cfd7663be|bac6aeb6e012e19d9d5e3a5ee22b84c1e4a1559c|Already > in *FrocksDB-8.10.0*| > |[Add support for Apple Silicon to RocksJava > (#9254)|https://github.com/ververica/frocksdb/commit/dac2c60bc31b596f445d769929abed292878cac1]|dac2c60bc31b596f445d769929abed292878cac1|#9254|Already > in *FrocksDB-8.10.0*| > |[Fix RocksJava releases for macOS > (#9662)|https://github.com/ververica/frocksdb/commit/22637e11968a627a06a3ac8aa78126e3ae6d1368]|22637e11968a627a06a3ac8aa78126e3ae6d1368|#9662|Already > in *FrocksDB-8.10.0*| > |+[Fix clang13 build error > (#9374)|https://github.com/ververica/frocksdb/commit/a20fb9fa96af7b18015754cf44463e22fc123222]+|a20fb9fa96af7b18015754cf44463e22fc123222|#9374|Already > in *FrocksDB-8.10.0*| > |+[[hotfix] Resolve brken make > format|https://github.com/ververica/frocksdb/commit/cf0acdc08fb1b8397ef29f3b7dc7e0400107555e]+|7a87e0bf4d59cc48f40ce69cf7b82237c5e8170c| > |Already in *FrocksDB-8.10.0*| > |+[Update circleci xcode version > (#9405)|https://github.com/ververica/frocksdb/commit/f24393bdc8d44b79a9be7a58044e5fd01cf50df7]+|cf0acdc08fb1b8397ef29f3b7dc7e0400107555e|#9405|Already > in *FrocksDB-8.10.0*| > |+[Upgrade to Ubuntu 20.04 in our CircleCI > 
config|https://github.com/ververica/frocksdb/commit/1fecfda040745fc508a0ea0bcbb98c970f89ee3e]+|1fecfda040745fc508a0ea0bcbb98c970f89ee3e| > |Fix in > [FLINK-35577|https://github.com/facebook/rocksdb/pull/9481/files#diff-78a8a19706dbd2a4425dd72bdab0502ed7a2cef16365ab7030a5a0588927bf47] > fixed in > https://github.com/facebook/rocksdb/pull/9481/files#diff-78a8a19706dbd2a4425dd72bdab0502ed7a2cef16365ab7030a5a0588927bf47| > |[Disable useless broken tests due to ci-image > upgraded|https://github.com/ververica/frocksdb/commit/9fef987e988c53a33b7807b85a56305bd9dede81]|9fef987e988c53a33b7807b85a56305bd9dede81| > |Fix in FLINK-35577| > |[[hotfix] Use zlib's fossils page to replace > web.archive|https://github.com/ververica/frocksdb/commit/cbc35db93f312f54b49804177ca11dea44b4d98e]|cbc35db93f312f54b49804177ca11dea44b4d98e|8fff7bb9947f9036021f99e3463c9657e80b71ae|Already > in *FrocksDB-8.10.0*| > |+[[hotfix] Change the resource request when running > CI|https://github.com/ververica/frocksdb/commit/2ec1019fd0433cb8ea5365b58faa2262ea0014e9]+|2ec1019fd0433cb8ea5365b58faa2262ea0014e9|174639cf1e6080a8f8f37aec132b3a500428f913|Already > in *FrocksDB-8.10.0*| > |{+}[[FLINK-30321] Upgrade ZLIB of FRocksDB to 1.2.13 > (|https://github.com/ververica/frocksdb/commit/3eac409606fcd9ce44a4bf7686db29c06c205039]{+}[#56|https://github.com
[jira] [Commented] (FLINK-35574) Setup base branch for FrocksDB-8.10
[ https://issues.apache.org/jira/browse/FLINK-35574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17859542#comment-17859542 ] Yue Ma commented on FLINK-35574: [~roman] If I don't fix these CIs inside this PR, I'm worried it can't be merged because CI didn't pass. Can you help merge this PR, so I can move on to the next subtask? > Setup base branch for FrocksDB-8.10 > --- > > Key: FLINK-35574 > URL: https://issues.apache.org/jira/browse/FLINK-35574 > Project: Flink > Issue Type: Sub-task > Components: Runtime / State Backends >Affects Versions: 2.0.0 >Reporter: Yue Ma >Priority: Major > Fix For: 2.0.0 > > > As the first part of FLINK-35573, we need to prepare a base branch for > FRocksDB-8.10.0 first. Mainly, it needs to be checked out from version 8.10.0 > of the RocksDB community. Then cherry-pick the commits used by Flink from > FRocksDB-6.20.3 to 8.10.0.
[jira] [Comment Edited] (FLINK-35574) Setup base branch for FrocksDB-8.10
[ https://issues.apache.org/jira/browse/FLINK-35574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17859542#comment-17859542 ] Yue Ma edited comment on FLINK-35574 at 6/24/24 3:26 AM: - [~roman] Thanks for the review! If I don't fix these CIs inside this PR, I'm worried it can't be merged because CI didn't pass. Can you help merge this PR, so I can move on to the next subtask? was (Author: mayuehappy): [~roman] If I don't fix these CIs inside this PR. I'm worried about this PR can't be merged. because ci didn't pass . Can you help merge this PR ? So I can move on to the next subtask. > Setup base branch for FrocksDB-8.10 > --- > > Key: FLINK-35574 > URL: https://issues.apache.org/jira/browse/FLINK-35574 > Project: Flink > Issue Type: Sub-task > Components: Runtime / State Backends >Affects Versions: 2.0.0 >Reporter: Yue Ma >Priority: Major > Fix For: 2.0.0 > > > As the first part of FLINK-35573, we need to prepare a base branch for > FRocksDB-8.10.0 first. Mainly, it needs to be checked out from version 8.10.0 > of the RocksDB community. Then cherry-pick the commits used by Flink from > FRocksDB-6.20.3 to 8.10.0.
[jira] [Updated] (FLINK-35574) Setup base branch for FrocksDB-8.10
[ https://issues.apache.org/jira/browse/FLINK-35574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yue Ma updated FLINK-35574: --- Description: As the first part of FLINK-35573, we need to prepare a base branch for FRocksDB-8.10.0 first. Mainly, it needs to be checked out from version 8.10.0 of the RocksDB community. Then cherry-pick the commits used by Flink from FRocksDB-6.20.3 to 8.10.0.
*Details:*
|*JIRA*|*FrocksDB-6.20.3*|*Commit ID in FrocksDB-8.10.0*|*Plan*|
|[[FLINK-10471] Add Apache Flink specific compaction filter to evict expired state which has time-to-live|https://github.com/ververica/frocksdb/commit/3da8249d50c8a3a6ea229f43890d37e098372786]|3da8249d50c8a3a6ea229f43890d37e098372786|d606c9450bef7d2a22c794f406d7940d9d2f29a4|Already in *FrocksDB-8.10.0*|
|+[[FLINK-19710] Revert implementation of PerfContext back to __thread to avoid performance regression|https://github.com/ververica/frocksdb/commit/d6f50f33064f1d24480dfb3c586a7bd7a7dbac01]+|d6f50f33064f1d24480dfb3c586a7bd7a7dbac01| |Fix in FLINK-35575|
|[FRocksDB release guide and helping scripts|https://github.com/ververica/frocksdb/commit/2673de8e5460af8d23c0c7e1fb0c3258ea283419]|2673de8e5460af8d23c0c7e1fb0c3258ea283419|b58ba05a380d9bf0c223bc707f14897ce392ce1b|Already in *FrocksDB-8.10.0*|
|+[Add content related to ARM building in the FROCKSDB-RELEASE documentation|https://github.com/ververica/frocksdb/commit/ec27ca01db5ff579dd7db1f70cf3a4677b63d589]+|ec27ca01db5ff579dd7db1f70cf3a4677b63d589|6cae002662a45131a0cd90dd84f5d3d3cb958713|Already in *FrocksDB-8.10.0*|
|[[FLINK-23756] Update FrocksDB release document with more info|https://github.com/ververica/frocksdb/commit/f75e983045f4b64958dc0e93e8b94a7cfd7663be]|f75e983045f4b64958dc0e93e8b94a7cfd7663be|bac6aeb6e012e19d9d5e3a5ee22b84c1e4a1559c|Already in *FrocksDB-8.10.0*|
|[Add support for Apple Silicon to RocksJava (#9254)|https://github.com/ververica/frocksdb/commit/dac2c60bc31b596f445d769929abed292878cac1]|dac2c60bc31b596f445d769929abed292878cac1|#9254|Already in *FrocksDB-8.10.0*|
|[Fix RocksJava releases for macOS (#9662)|https://github.com/ververica/frocksdb/commit/22637e11968a627a06a3ac8aa78126e3ae6d1368]|22637e11968a627a06a3ac8aa78126e3ae6d1368|#9662|Already in *FrocksDB-8.10.0*|
|+[Fix clang13 build error (#9374)|https://github.com/ververica/frocksdb/commit/a20fb9fa96af7b18015754cf44463e22fc123222]+|a20fb9fa96af7b18015754cf44463e22fc123222|#9374|Already in *FrocksDB-8.10.0*|
|+[[hotfix] Resolve brken make format|https://github.com/ververica/frocksdb/commit/cf0acdc08fb1b8397ef29f3b7dc7e0400107555e]+|7a87e0bf4d59cc48f40ce69cf7b82237c5e8170c| |Already in *FrocksDB-8.10.0*|
|+[Update circleci xcode version (#9405)|https://github.com/ververica/frocksdb/commit/f24393bdc8d44b79a9be7a58044e5fd01cf50df7]+|cf0acdc08fb1b8397ef29f3b7dc7e0400107555e|#9405|Already in *FrocksDB-8.10.0*|
|+[Upgrade to Ubuntu 20.04 in our CircleCI config|https://github.com/ververica/frocksdb/commit/1fecfda040745fc508a0ea0bcbb98c970f89ee3e]+|1fecfda040745fc508a0ea0bcbb98c970f89ee3e| |Fix in [FLINK-35577|https://github.com/facebook/rocksdb/pull/9481/files#diff-78a8a19706dbd2a4425dd72bdab0502ed7a2cef16365ab7030a5a0588927bf47] fixed in https://github.com/facebook/rocksdb/pull/9481/files#diff-78a8a19706dbd2a4425dd72bdab0502ed7a2cef16365ab7030a5a0588927bf47|
|[Disable useless broken tests due to ci-image upgraded|https://github.com/ververica/frocksdb/commit/9fef987e988c53a33b7807b85a56305bd9dede81]|9fef987e988c53a33b7807b85a56305bd9dede81| |Fix in FLINK-35577|
|[[hotfix] Use zlib's fossils page to replace web.archive|https://github.com/ververica/frocksdb/commit/cbc35db93f312f54b49804177ca11dea44b4d98e]|cbc35db93f312f54b49804177ca11dea44b4d98e|8fff7bb9947f9036021f99e3463c9657e80b71ae|Already in *FrocksDB-8.10.0*|
|+[[hotfix] Change the resource request when running CI|https://github.com/ververica/frocksdb/commit/2ec1019fd0433cb8ea5365b58faa2262ea0014e9]+|2ec1019fd0433cb8ea5365b58faa2262ea0014e9|174639cf1e6080a8f8f37aec132b3a500428f913|Already in *FrocksDB-8.10.0*|
|{+}[[FLINK-30321] Upgrade ZLIB of FRocksDB to 1.2.13 (|https://github.com/ververica/frocksdb/commit/3eac409606fcd9ce44a4bf7686db29c06c205039]{+}[#56|https://github.com/ververica/frocksdb/pull/56] [)|https://github.com/ververica/frocksdb/commit/3eac409606fcd9ce44a4bf7686db29c06c205039]|3eac409606fcd9ce44a4bf7686db29c06c205039| |*FrocksDB-8.10.0 has upgraded to 1.3*|
|[fix(CompactionFilter): avoid expensive ToString call when not in Debug`|https://github.com/ververica/frocksdb/commit/698c9ca2c419c72145a2e6f5282a7860225b27a0]|698c9ca2c419c72145a2e6f5282a7860225b27a0|927b17e10d2112270ac30c4566238950baba4b7b|Already in *FrocksDB-8.10.0*|
|[[FLINK-30457] Add periodic_compaction_seconds option to RocksJava|https://github.com/ververica/frocksdb/commit/ebed4b1326ca4c5c684b46813bdcb1164a669da1]|ebed4b1326ca4c5c684b46813bdcb1164a669da1|#8579|Already in *FrocksDB-8.10.0*|
|[
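The first row of the table, the Flink-specific compaction filter for state with time-to-live (FLINK-10471), is the core Flink-owned patch being carried forward. As a rough illustration of the idea only (this is not Flink's actual FlinkCompactionFilter), a TTL filter built on RocksDB's public C++ API could look like the sketch below; the value layout with a leading 8-byte write timestamp is an assumption made for the example.

{code:cpp}
// Illustrative TTL compaction filter: drop entries whose embedded write
// timestamp is older than the configured time-to-live. A simplified sketch
// of the FLINK-10471 idea, not Flink's actual FlinkCompactionFilter.
#include <cstdint>
#include <cstring>
#include <string>
#include <rocksdb/compaction_filter.h>
#include <rocksdb/env.h>

class TtlCompactionFilter : public rocksdb::CompactionFilter {
 public:
  explicit TtlCompactionFilter(uint64_t ttl_micros) : ttl_micros_(ttl_micros) {}

  // Returning true tells the compaction to evict the key-value pair.
  bool Filter(int /*level*/, const rocksdb::Slice& /*key*/,
              const rocksdb::Slice& value, std::string* /*new_value*/,
              bool* /*value_changed*/) const override {
    if (value.size() < sizeof(uint64_t)) {
      return false;  // no timestamp prefix; keep the entry
    }
    uint64_t write_time_micros = 0;
    std::memcpy(&write_time_micros, value.data(), sizeof(write_time_micros));
    const uint64_t now = rocksdb::Env::Default()->NowMicros();
    return now - write_time_micros > ttl_micros_;  // expired -> evict
  }

  const char* Name() const override { return "TtlCompactionFilter"; }

 private:
  const uint64_t ttl_micros_;
};
{code}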
[jira] [Comment Edited] (FLINK-35574) Setup base branch for FrocksDB-8.10
[ https://issues.apache.org/jira/browse/FLINK-35574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17854641#comment-17854641 ] Yue Ma edited comment on FLINK-35574 at 6/13/24 6:41 AM: - [~roman] can you please review this PR: [https://github.com/ververica/frocksdb/pull/74] was (Author: mayuehappy): [~roman] can you please take a view this PR https://github.com/ververica/frocksdb/pull/74 > Setup base branch for FrocksDB-8.10 > --- > > Key: FLINK-35574 > URL: https://issues.apache.org/jira/browse/FLINK-35574 > Project: Flink > Issue Type: Sub-task > Components: Runtime / State Backends >Affects Versions: 2.0.0 >Reporter: Yue Ma >Priority: Major > Fix For: 2.0.0 > > > As the first part of FLINK-35573, we need to prepare a base branch for > FRocksDB-8.10.0 first. Mainly, it needs to be checked out from version 8.10.0 > of the RocksDB community. Then cherry-pick the commits used by Flink from > FRocksDB-6.20.3 to 8.10.0.
[jira] [Commented] (FLINK-35574) Setup base branch for FrocksDB-8.10
[ https://issues.apache.org/jira/browse/FLINK-35574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17854641#comment-17854641 ] Yue Ma commented on FLINK-35574: [~roman] can you please review this PR: https://github.com/ververica/frocksdb/pull/74 > Setup base branch for FrocksDB-8.10 > --- > > Key: FLINK-35574 > URL: https://issues.apache.org/jira/browse/FLINK-35574 > Project: Flink > Issue Type: Sub-task > Components: Runtime / State Backends >Affects Versions: 2.0.0 >Reporter: Yue Ma >Priority: Major > Fix For: 2.0.0 > > > As the first part of FLINK-35573, we need to prepare a base branch for > FRocksDB-8.10.0 first. Mainly, it needs to be checked out from version 8.10.0 > of the RocksDB community. Then cherry-pick the commits used by Flink from > FRocksDB-6.20.3 to 8.10.0.
[jira] [Updated] (FLINK-35574) Setup base branch for FrocksDB-8.10
[ https://issues.apache.org/jira/browse/FLINK-35574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yue Ma updated FLINK-35574: --- Description: As the first part of FLINK-35573, we need to prepare a base branch for FRocksDB-8.10.0 first. Mainly, it needs to be checked out from version 8.10.0 of the RocksDB community. Then cherry-pick the commits used by Flink from FRocksDB-6.20.3 to 8.10.0.
[jira] [Updated] (FLINK-35573) [FLIP-447] Upgrade FRocksDB from 6.20.3 to 8.10.0
[ https://issues.apache.org/jira/browse/FLINK-35573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yue Ma updated FLINK-35573: --- Description: The FLIP: [https://cwiki.apache.org/confluence/display/FLINK/FLIP-447%3A+Upgrade+FRocksDB+from+6.20.3++to+8.10.0] RocksDBStateBackend is widely used by Flink users in large state scenarios. The last upgrade of FRocksDB was in version Flink-1.14, which mainly added features such as ARM platform support, the deleteRange API, periodic compaction, etc. It has been a long time since then, and RocksDB has now been released to version 8.x. The main motivation for this upgrade is to leverage the features of higher versions of RocksDB to make Flink RocksDBStateBackend more powerful. While RocksDB is also continuously optimizing and fixing bugs, we hope to keep FRocksDB more or less in sync with RocksDB and upgrade it periodically. The plan is as follows:
*Release FRocksDB-8.10.0 official products*
# Prepare the compiled FRocksDB branch
# Support disabling PERF_CONTEXT during compilation
# Cherry-pick the commits that ingestDB requires
# Set up the CI environment for FRocksDB-8.10
# Release the FRocksDB-8.10.0 official products
# Update the FRocksDB dependency in the pom file of the Flink RocksDB state backend
*Make ingestDB available*
# Fix ingestDB-related bugs in the Flink code
# Remove comments from the code related to ingestDB and mark the functionality of ingestDB as available (perhaps we can mark ingestDB as an experimental feature first)
# Use ingestDB as the default recovery mode for rescaling
(edited)
was: The FLIP: [https://cwiki.apache.org/confluence/display/FLINK/FLIP-447%3A+Upgrade+FRocksDB+from+6.20.3++to+8.10.0] RocksDBStateBackend is widely used by Flink users in large state scenarios.The last upgrade of FRocksDB was in version Flink-1.14, which mainly supported features such as support arm platform, deleteRange API, period compaction, etc. It has been a long time since then, and RocksDB has now been released to version 8.x. The main motivation for this upgrade is to leverage the features of higher versions of Rocksdb to make Flink RocksDBStateBackend more powerful. While RocksDB is also continuously optimizing and bug fixing, we hope to keep FRocksDB more or less in sync with RocksDB and upgrade it periodically. The plan is as follows *Release Frocksdb-8.10.0 official products* # Prepare the compiled frocksdb branch # Support Disable PERF-CONTEXT in compilation # Cherry pick IngestDB requires commit # Setup the CI environment for FRocksDB-8.10 # Release Frocksdb-8.10.0 official products # Update the dependency of FRocksDB in pom file of Flink-RocksDB-Statebackend *Make ingestDB available* # Flink ingestDB code related bug fixes # Remove comments from the code related to ingestDB and mark the functionality of ingestDB as available (perhaps we can mark ingestDB as an experimental feature first) # Using ingestDB as the default recovery mode for rescaling (edited)
> [FLIP-447] Upgrade FRocksDB from 6.20.3 to 8.10.0 > - > > Key: FLINK-35573 > URL: https://issues.apache.org/jira/browse/FLINK-35573 > Project: Flink > Issue Type: Improvement > Components: Runtime / State Backends >Affects Versions: 2.0.0 >Reporter: Yue Ma >Assignee: Yue Ma >Priority: Major > Fix For: 2.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
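The features the description cites from the previous upgrade, the deleteRange API and periodic compaction, are both exposed through RocksDB's public API. A minimal C++ usage sketch follows; the database path and the 30-day period are arbitrary example values, not Flink's actual configuration.

{code:cpp}
// Minimal usage sketch of two features mentioned above: periodic compaction
// and DeleteRange. Example values only.
#include <cassert>
#include <rocksdb/db.h>
#include <rocksdb/options.h>

int main() {
  rocksdb::Options options;
  options.create_if_missing = true;
  // Recompact SST files older than ~30 days so compaction filters
  // (e.g. a TTL filter) eventually visit cold data too.
  options.periodic_compaction_seconds = 30 * 24 * 60 * 60;

  rocksdb::DB* db = nullptr;
  rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/frocksdb-demo", &db);
  assert(s.ok());

  // DeleteRange drops the whole interval [begin, end) with one range
  // tombstone instead of issuing a Delete per key.
  s = db->DeleteRange(rocksdb::WriteOptions(), db->DefaultColumnFamily(),
                      "key-000", "key-999");
  assert(s.ok());

  delete db;
  return 0;
}
{code}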
[jira] [Updated] (FLINK-35573) [FLIP-447] Upgrade FRocksDB from 6.20.3 to 8.10.0
[ https://issues.apache.org/jira/browse/FLINK-35573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yue Ma updated FLINK-35573: --- Description: The FLIP: [https://cwiki.apache.org/confluence/display/FLINK/FLIP-447%3A+Upgrade+FRocksDB+from+6.20.3++to+8.10.0] RocksDBStateBackend is widely used by Flink users in large state scenarios. The last upgrade of FRocksDB was in version Flink-1.14, which mainly added features such as ARM platform support, the deleteRange API, periodic compaction, etc. It has been a long time since then, and RocksDB has now been released to version 8.x. The main motivation for this upgrade is to leverage the features of higher versions of RocksDB to make Flink RocksDBStateBackend more powerful. While RocksDB is also continuously optimizing and fixing bugs, we hope to keep FRocksDB more or less in sync with RocksDB and upgrade it periodically. The plan is as follows:
*Release FRocksDB-8.10.0 official products*
# Prepare the compiled FRocksDB branch
# Support disabling PERF_CONTEXT during compilation
# Cherry-pick the commits that ingestDB requires
# Set up the CI environment for FRocksDB-8.10
# Release the FRocksDB-8.10.0 official products
# Update the FRocksDB dependency in the pom file of the Flink RocksDB state backend
*Make ingestDB available*
# Fix ingestDB-related bugs in the Flink code
# Remove comments from the code related to ingestDB and mark the functionality of ingestDB as available (perhaps we can mark ingestDB as an experimental feature first)
# Use ingestDB as the default recovery mode for rescaling
(edited)
was: The FLIP: [https://cwiki.apache.org/confluence/display/FLINK/FLIP-447%3A+Upgrade+FRocksDB+from+6.20.3++to+8.10.0] _RocksDBStateBackend is widely used by Flink users in large state scenarios.The last upgrade of FRocksDB was in version Flink-1.14, which mainly supported features such as support arm platform, deleteRange API, period compaction, etc. It has been a long time since then, and RocksDB has now been released to version 8.x. The main motivation for this upgrade is to leverage the features of higher versions of Rocksdb to make Flink RocksDBStateBackend more powerful. While RocksDB is also continuously optimizing and bug fixing, we hope to keep FRocksDB more or less in sync with RocksDB and upgrade it periodically._
> [FLIP-447] Upgrade FRocksDB from 6.20.3 to 8.10.0 > - > > Key: FLINK-35573 > URL: https://issues.apache.org/jira/browse/FLINK-35573 > Project: Flink > Issue Type: Improvement > Components: Runtime / State Backends >Affects Versions: 2.0.0 >Reporter: Yue Ma >Assignee: Yue Ma >Priority: Major > Fix For: 2.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-35576) FRocksDB cherry pick IngestDB related commits
[ https://issues.apache.org/jira/browse/FLINK-35576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yue Ma updated FLINK-35576: --- Summary: FRocksDB cherry pick IngestDB related commits (was: FRocksdb cherry pick IngestDB related commits) > FRocksDB cherry pick IngestDB related commits > - > > Key: FLINK-35576 > URL: https://issues.apache.org/jira/browse/FLINK-35576 > Project: Flink > Issue Type: Sub-task > Components: Runtime / State Backends >Affects Versions: 2.0.0 >Reporter: Yue Ma >Priority: Major > Fix For: 2.0.0 > > > We support the ingestDB-related API in FRocksDB-8.10.0, but many of the > ingestDB-related fixes were only integrated in the latest RocksDB > version. So we need to cherry-pick these fix commits into FRocksDB. > They mainly include:
> |*RocksDB Main Branch*|*Commit ID in FrocksDB-8.10.0*|*Plan*|
> |https://github.com/facebook/rocksdb/pull/11646|44f0ff31c21164685a6cd25a2beb944767c39e46| |
> |[https://github.com/facebook/rocksdb/pull/11868]|8e1adab5cecad129131a4eceabe645b9442acb9c| |
> |https://github.com/facebook/rocksdb/pull/11811|3c27f56d0b7e359defbc25bf90061214c889f40b| |
> |https://github.com/facebook/rocksdb/pull/11381|4d72f48e57cb0a95b67ff82c6e971f826750334e| |
> |https://github.com/facebook/rocksdb/pull/11379|8d8eb0e77e13a3902d23fbda742dc47aa7bc418f| |
> |https://github.com/facebook/rocksdb/pull/11378|fa878a01074fe039135e37720f669391d1663525| |
> |https://github.com/facebook/rocksdb/pull/12219|183d80d7dc4ce339ab1b6796661d5879b7a40d6a| |
> |https://github.com/facebook/rocksdb/pull/12328|ef430fc72407950f94ca2a4fbb2b15de7ae8ff4f| |
> |https://github.com/facebook/rocksdb/pull/12602| |Fix in https://issues.apache.org/jira/browse/FLINK-35576|
-- This message was sent by Atlassian Jira (v8.20.10#820010)
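For readers unfamiliar with the ingest path these commits touch: ingestDB builds on RocksDB's long-standing external-file ingestion, where pre-sorted SST files are linked into a live DB without passing through the memtable or WAL. Below is a minimal sketch using the public C++ API, with example file paths; the ingestDB-specific APIs added by the cherry-picked PRs above are not shown here.

{code:cpp}
// Build an external SST file with SstFileWriter, then ingest it atomically.
// A minimal sketch of the ingest path that the ingestDB work extends.
#include <cassert>
#include <string>
#include <vector>
#include <rocksdb/db.h>
#include <rocksdb/options.h>
#include <rocksdb/sst_file_writer.h>

int main() {
  rocksdb::Options options;
  options.create_if_missing = true;

  // 1) Write an external SST file (keys must be added in sorted order).
  rocksdb::SstFileWriter writer(rocksdb::EnvOptions(), options);
  rocksdb::Status s = writer.Open("/tmp/batch.sst");
  assert(s.ok());
  s = writer.Put("a", "1");
  assert(s.ok());
  s = writer.Put("b", "2");
  assert(s.ok());
  s = writer.Finish();
  assert(s.ok());

  // 2) Ingest it into a live DB, bypassing the memtable and WAL.
  rocksdb::DB* db = nullptr;
  s = rocksdb::DB::Open(options, "/tmp/frocksdb-ingest-demo", &db);
  assert(s.ok());
  rocksdb::IngestExternalFileOptions ingest_opts;
  s = db->IngestExternalFile({"/tmp/batch.sst"}, ingest_opts);
  assert(s.ok());

  delete db;
  return 0;
}
{code}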
[jira] [Updated] (FLINK-35574) Setup base branch for FrocksDB-8.10
[ https://issues.apache.org/jira/browse/FLINK-35574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yue Ma updated FLINK-35574: --- Description: As the first part of FLINK-35573, we need to prepare a base branch for FRocksDB-8.10.0 first. Mainly, it needs to be checked out from version 8.10.0 of the RocksDB community. Then cherry-pick the commits used by Flink from FRocksDB-6.20.3 to 8.10.0.
[jira] [Updated] (FLINK-35576) FRocksdb cherry pick IngestDB related commits
[ https://issues.apache.org/jira/browse/FLINK-35576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yue Ma updated FLINK-35576: --- Description: We support the API related to ingest DB in FRocksDb-8.10.0, but many of the fixes related to ingest DB were only integrated in the latest RocksDB version. So we need to cherry-pick these fix commits into FRocksDB. They mainly include:
|*RocksDB Main Branch*|*Commit ID in FrocksDB-8.10.0*|*Plan*|
|https://github.com/facebook/rocksdb/pull/11646|44f0ff31c21164685a6cd25a2beb944767c39e46| |
|[https://github.com/facebook/rocksdb/pull/11868]|8e1adab5cecad129131a4eceabe645b9442acb9c| |
|https://github.com/facebook/rocksdb/pull/11811|3c27f56d0b7e359defbc25bf90061214c889f40b| |
|https://github.com/facebook/rocksdb/pull/11381|4d72f48e57cb0a95b67ff82c6e971f826750334e| |
|https://github.com/facebook/rocksdb/pull/11379|8d8eb0e77e13a3902d23fbda742dc47aa7bc418f| |
|https://github.com/facebook/rocksdb/pull/11378|fa878a01074fe039135e37720f669391d1663525| |
|https://github.com/facebook/rocksdb/pull/12219|183d80d7dc4ce339ab1b6796661d5879b7a40d6a| |
|https://github.com/facebook/rocksdb/pull/12328|ef430fc72407950f94ca2a4fbb2b15de7ae8ff4f| |
|https://github.com/facebook/rocksdb/pull/12602| |Fix in https://issues.apache.org/jira/browse/FLINK-35576|
was: We support the API related to ingest DB in FRocksDb-8.10.0, but many of the fixes related to ingest DB were only integrated in the latest RocksDB version. So we need to cherry-pick these fix commits into FRocksDB. They mainly include: [https://github.com/facebook/rocksdb/pull/11646] [https://github.com/facebook/rocksdb/pull/11868] [https://github.com/facebook/rocksdb/pull/11811] [https://github.com/facebook/rocksdb/pull/11381] [https://github.com/facebook/rocksdb/pull/11379] [https://github.com/facebook/rocksdb/pull/11378] > FRocksdb cherry pick IngestDB related commits > - > > Key: FLINK-35576 > URL: https://issues.apache.org/jira/browse/FLINK-35576 > Project: Flink > Issue Type: Sub-task > Components: Runtime / State Backends >Affects Versions: 2.0.0 >Reporter: Yue Ma >Priority: Major > Fix For: 2.0.0 > > > We support the API related to ingest DB in FRocksDb-8.10.0, but many of the > fixes related to ingest DB were only integrated in the latest RocksDB > version. So we need to cherry-pick these fix commits into FRocksDB. > Mainly include: > |*RocksDB Main Branch*|*Commit ID in FrocksDB-8.10.0*|*Plan*| > |https://github.com/facebook/rocksdb/pull/11646|44f0ff31c21164685a6cd25a2beb944767c39e46| > | > |[https://github.com/facebook/rocksdb/pull/11868]|8e1adab5cecad129131a4eceabe645b9442acb9c| > | > |https://github.com/facebook/rocksdb/pull/11811|3c27f56d0b7e359defbc25bf90061214c889f40b| > | > |https://github.com/facebook/rocksdb/pull/11381|4d72f48e57cb0a95b67ff82c6e971f826750334e| > | > |https://github.com/facebook/rocksdb/pull/11379|8d8eb0e77e13a3902d23fbda742dc47aa7bc418f| > | > |https://github.com/facebook/rocksdb/pull/11378|fa878a01074fe039135e37720f669391d1663525| > | > |https://github.com/facebook/rocksdb/pull/12219|183d80d7dc4ce339ab1b6796661d5879b7a40d6a| > | > |https://github.com/facebook/rocksdb/pull/12328|ef430fc72407950f94ca2a4fbb2b15de7ae8ff4f| > | > |https://github.com/facebook/rocksdb/pull/12602| |Fix in > https://issues.apache.org/jira/browse/FLINK-35576| -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-35574) Setup base branch for FrocksDB-8.10
[ https://issues.apache.org/jira/browse/FLINK-35574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yue Ma updated FLINK-35574: --- Description: As the first part of FLINK-35573, we need to prepare a base branch for FRocksDB-8.10.0 first. Mainly, it needs to be checked out from version 8.10.0 of the RocksDB community. Then cherry-pick the commits used by Flink from FRocksDB-6.20.3 onto 8.10.0.
|*JIRA*|*FrocksDB-6.20.3*|*Commit ID in FrocksDB-8.10.0*|*Plan*|
|[[FLINK-10471] Add Apache Flink specific compaction filter to evict expired state which has time-to-live|https://github.com/ververica/frocksdb/commit/3da8249d50c8a3a6ea229f43890d37e098372786]|3da8249d50c8a3a6ea229f43890d37e098372786|d606c9450bef7d2a22c794f406d7940d9d2f29a4|Already in *FrocksDB-8.10.0*|
|+[[FLINK-19710] Revert implementation of PerfContext back to __thread to avoid performance regression|https://github.com/ververica/frocksdb/commit/d6f50f33064f1d24480dfb3c586a7bd7a7dbac01]+|d6f50f33064f1d24480dfb3c586a7bd7a7dbac01| |Fix in FLINK-35575|
|[FRocksDB release guide and helping scripts|https://github.com/ververica/frocksdb/commit/2673de8e5460af8d23c0c7e1fb0c3258ea283419]|2673de8e5460af8d23c0c7e1fb0c3258ea283419|b58ba05a380d9bf0c223bc707f14897ce392ce1b|Already in *FrocksDB-8.10.0*|
|+[Add content related to ARM building in the FROCKSDB-RELEASE documentation|https://github.com/ververica/frocksdb/commit/ec27ca01db5ff579dd7db1f70cf3a4677b63d589]+|ec27ca01db5ff579dd7db1f70cf3a4677b63d589|6cae002662a45131a0cd90dd84f5d3d3cb958713|Already in *FrocksDB-8.10.0*|
|[[FLINK-23756] Update FrocksDB release document with more info|https://github.com/ververica/frocksdb/commit/f75e983045f4b64958dc0e93e8b94a7cfd7663be]|f75e983045f4b64958dc0e93e8b94a7cfd7663be|bac6aeb6e012e19d9d5e3a5ee22b84c1e4a1559c|Already in *FrocksDB-8.10.0*|
|[Add support for Apple Silicon to RocksJava (#9254)|https://github.com/ververica/frocksdb/commit/dac2c60bc31b596f445d769929abed292878cac1]|dac2c60bc31b596f445d769929abed292878cac1|#9254|Already in *FrocksDB-8.10.0*|
|[Fix RocksJava releases for macOS (#9662)|https://github.com/ververica/frocksdb/commit/22637e11968a627a06a3ac8aa78126e3ae6d1368]|22637e11968a627a06a3ac8aa78126e3ae6d1368|#9662|Already in *FrocksDB-8.10.0*|
|+[Fix clang13 build error (#9374)|https://github.com/ververica/frocksdb/commit/a20fb9fa96af7b18015754cf44463e22fc123222]+|a20fb9fa96af7b18015754cf44463e22fc123222|#9374|Already in *FrocksDB-8.10.0*|
|+[[hotfix] Resolve brken make format|https://github.com/ververica/frocksdb/commit/cf0acdc08fb1b8397ef29f3b7dc7e0400107555e]+|7a87e0bf4d59cc48f40ce69cf7b82237c5e8170c| |Already in *FrocksDB-8.10.0*|
|+[Update circleci xcode version (#9405)|https://github.com/ververica/frocksdb/commit/f24393bdc8d44b79a9be7a58044e5fd01cf50df7]+|cf0acdc08fb1b8397ef29f3b7dc7e0400107555e|#9405|Already in *FrocksDB-8.10.0*|
|+[Upgrade to Ubuntu 20.04 in our CircleCI config|https://github.com/ververica/frocksdb/commit/1fecfda040745fc508a0ea0bcbb98c970f89ee3e]+|1fecfda040745fc508a0ea0bcbb98c970f89ee3e| |Fix in FLINK-35577|
|[Disable useless broken tests due to ci-image upgraded|https://github.com/ververica/frocksdb/commit/9fef987e988c53a33b7807b85a56305bd9dede81]|9fef987e988c53a33b7807b85a56305bd9dede81| |Fix in FLINK-35577|
|[[hotfix] Use zlib's fossils page to replace web.archive|https://github.com/ververica/frocksdb/commit/cbc35db93f312f54b49804177ca11dea44b4d98e]|cbc35db93f312f54b49804177ca11dea44b4d98e|8fff7bb9947f9036021f99e3463c9657e80b71ae|Already in *FrocksDB-8.10.0*|
|+[[hotfix] Change the resource request when running CI|https://github.com/ververica/frocksdb/commit/2ec1019fd0433cb8ea5365b58faa2262ea0014e9]+|2ec1019fd0433cb8ea5365b58faa2262ea0014e9|174639cf1e6080a8f8f37aec132b3a500428f913|Already in *FrocksDB-8.10.0*|
|{+}[[FLINK-30321] Upgrade ZLIB of FRocksDB to 1.2.13 (|https://github.com/ververica/frocksdb/commit/3eac409606fcd9ce44a4bf7686db29c06c205039]{+}[#56|https://github.com/ververica/frocksdb/pull/56] [)|https://github.com/ververica/frocksdb/commit/3eac409606fcd9ce44a4bf7686db29c06c205039]|3eac409606fcd9ce44a4bf7686db29c06c205039| |Fix in FLINK-35574|
|[fix(CompactionFilter): avoid expensive ToString call when not in Debug`|https://github.com/ververica/frocksdb/commit/698c9ca2c419c72145a2e6f5282a7860225b27a0]|698c9ca2c419c72145a2e6f5282a7860225b27a0|927b17e10d2112270ac30c4566238950baba4b7b|Already in *FrocksDB-8.10.0*|
|[[FLINK-30457] Add periodic_compaction_seconds option to RocksJava|https://github.com/ververica/frocksdb/commit/ebed4b1326ca4c5c684b46813bdcb1164a669da1]|ebed4b1326ca4c5c684b46813bdcb1164a669da1|#8579|Already in *FrocksDB-8.10.0*|
|[[hotfix] Add docs of how to upload ppc64le artifacts to s3|https://github.com/ververica/frocksdb/commit/de2ffe6ef0a11f856b89fb69a34bcdb4782130eb]|de2ffe6ef0a11f856b89fb69a34bcdb4782130eb|174639cf1e6080a8f8f37aec132b3a500428f913|Already in *FrocksDB-8.10.0*|
|[[FLINK-33811] Fix the b
[jira] [Commented] (FLINK-33338) Bump FRocksDB version
[ https://issues.apache.org/jira/browse/FLINK-33338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17854337#comment-17854337 ] Yue Ma commented on FLINK-33338: [~pnowojski] [~roman] [~srichter] As we discussed before, upgrading FRocksDB involves many steps, so I have created a new ticket to track the upgrade process. https://issues.apache.org/jira/browse/FLINK-35573 > Bump FRocksDB version > - > > Key: FLINK-33338 > URL: https://issues.apache.org/jira/browse/FLINK-33338 > Project: Flink > Issue Type: Sub-task > Components: Runtime / State Backends >Reporter: Piotr Nowojski >Assignee: Roman Khachatryan >Priority: Major > > We need to bump RocksDB in order to be able to use new IngestDB and ClipDB > commands. > If some of the required changes haven't been merged to Facebook/RocksDB, we > should cherry-pick and include them in our FRocksDB fork. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35582) Marking ingestDB as the default recovery mode for rescaling
Yue Ma created FLINK-35582: -- Summary: Marking ingestDB as the default recovery mode for rescaling Key: FLINK-35582 URL: https://issues.apache.org/jira/browse/FLINK-35582 Project: Flink Issue Type: Sub-task Components: Runtime / State Backends Affects Versions: 2.0.0 Reporter: Yue Ma Fix For: 2.0.0 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35580) Fix ingestDB recovery mode related bugs
Yue Ma created FLINK-35580: -- Summary: Fix ingestDB recovery mode related bugs Key: FLINK-35580 URL: https://issues.apache.org/jira/browse/FLINK-35580 Project: Flink Issue Type: Sub-task Components: Runtime / State Backends Affects Versions: 2.0.0 Reporter: Yue Ma Fix For: 2.0.0 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35581) Remove comments from the code related to ingestDB
Yue Ma created FLINK-35581: -- Summary: Remove comments from the code related to ingestDB Key: FLINK-35581 URL: https://issues.apache.org/jira/browse/FLINK-35581 Project: Flink Issue Type: Sub-task Components: Runtime / State Backends Affects Versions: 2.0.0 Reporter: Yue Ma Fix For: 2.0.0 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35579) Update the FrocksDB version in FLINK
Yue Ma created FLINK-35579: -- Summary: Update the FrocksDB version in FLINK Key: FLINK-35579 URL: https://issues.apache.org/jira/browse/FLINK-35579 Project: Flink Issue Type: Sub-task Components: Runtime / State Backends Affects Versions: 2.0.0 Reporter: Yue Ma Fix For: 2.0.0 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35578) Release Frocksdb-8.10.0 official products
Yue Ma created FLINK-35578: -- Summary: Release Frocksdb-8.10.0 official products Key: FLINK-35578 URL: https://issues.apache.org/jira/browse/FLINK-35578 Project: Flink Issue Type: Sub-task Components: Runtime / State Backends Affects Versions: 2.0.0 Reporter: Yue Ma Fix For: 2.0.0 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-35576) FRocksdb cherry pick IngestDB related commits
[ https://issues.apache.org/jira/browse/FLINK-35576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yue Ma updated FLINK-35576: --- Summary: FRocksdb cherry pick IngestDB related commits (was: FRocksdb Cherry pick IngestDB requires commit) > FRocksdb cherry pick IngestDB related commits > - > > Key: FLINK-35576 > URL: https://issues.apache.org/jira/browse/FLINK-35576 > Project: Flink > Issue Type: Sub-task > Components: Runtime / State Backends >Affects Versions: 2.0.0 >Reporter: Yue Ma >Priority: Major > Fix For: 2.0.0 > > > We support the API related to ingest DB in FRocksDb-8.10.0, but many of the > fixes related to ingest DB were only integrated in the latest RocksDB > version. So we need to cherry-pick these fix commits into FRocksDB. > Mainly include: > [https://github.com/facebook/rocksdb/pull/11646] > [https://github.com/facebook/rocksdb/pull/11868] > [https://github.com/facebook/rocksdb/pull/11811] > [https://github.com/facebook/rocksdb/pull/11381] > [https://github.com/facebook/rocksdb/pull/11379] > [https://github.com/facebook/rocksdb/pull/11378] -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-35574) Setup base branch for FrocksDB-8.10
[ https://issues.apache.org/jira/browse/FLINK-35574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yue Ma updated FLINK-35574: --- Summary: Setup base branch for FrocksDB-8.10 (was: Set up base branch for FrocksDB-8.10) > Setup base branch for FrocksDB-8.10 > --- > > Key: FLINK-35574 > URL: https://issues.apache.org/jira/browse/FLINK-35574 > Project: Flink > Issue Type: Sub-task > Components: Runtime / State Backends >Affects Versions: 2.0.0 >Reporter: Yue Ma >Priority: Major > Fix For: 2.0.0 > > > As the first part of FLINK-35573, we need to prepare a base branch for > FRocksDB-8.10.0 first. Mainly, it needs to be checked out from version 8.10.0 > of the RocksDB community. Then cherry-pick the commits used by Flink from > FRocksDB-6.20.3 onto 8.10.0. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35577) Setup the CI environment for FRocksDB-8.10
Yue Ma created FLINK-35577: -- Summary: Setup the CI environment for FRocksDB-8.10 Key: FLINK-35577 URL: https://issues.apache.org/jira/browse/FLINK-35577 Project: Flink Issue Type: Sub-task Components: Runtime / State Backends Affects Versions: 2.0.0 Reporter: Yue Ma Fix For: 2.0.0 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35576) FRocksdb Cherry pick IngestDB requires commit
Yue Ma created FLINK-35576: -- Summary: FRocksdb Cherry pick IngestDB requires commit Key: FLINK-35576 URL: https://issues.apache.org/jira/browse/FLINK-35576 Project: Flink Issue Type: Sub-task Components: Runtime / State Backends Affects Versions: 2.0.0 Reporter: Yue Ma Fix For: 2.0.0 We support the API related to ingest DB in FRocksDb-8.10.0, but many of the fixes related to ingest DB were only integrated in the latest RocksDB version. So we need to cherry-pick these fix commits into FRocksDB. They mainly include: [https://github.com/facebook/rocksdb/pull/11646] [https://github.com/facebook/rocksdb/pull/11868] [https://github.com/facebook/rocksdb/pull/11811] [https://github.com/facebook/rocksdb/pull/11381] [https://github.com/facebook/rocksdb/pull/11379] [https://github.com/facebook/rocksdb/pull/11378] -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35575) FRocksDB supports disabling perf context during compilation
Yue Ma created FLINK-35575: -- Summary: FRocksDB supports disabling perf context during compilation Key: FLINK-35575 URL: https://issues.apache.org/jira/browse/FLINK-35575 Project: Flink Issue Type: Sub-task Components: Runtime / State Backends Affects Versions: 2.0.0 Reporter: Yue Ma Fix For: 2.0.0 In FRocksDB 6, thread-local perf-context is disabled by reverting a specific commit (FLINK-19710). However, this creates conflicts and makes upgrading more difficult. We found that disabling *PERF_CONTEXT* can improve the performance of the state benchmark by about 5%, and it doesn't create any conflicts. So we plan to support disabling perf context during compilation in the new FRocksDB version. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35574) Set up base branch for FrocksDB-8.10
Yue Ma created FLINK-35574: -- Summary: Set up base branch for FrocksDB-8.10 Key: FLINK-35574 URL: https://issues.apache.org/jira/browse/FLINK-35574 Project: Flink Issue Type: Sub-task Components: Runtime / State Backends Affects Versions: 2.0.0 Reporter: Yue Ma Fix For: 2.0.0 As the first part of FLINK-35573, we need to prepare a base branch for FRocksDB-8.10.0 first. Mainly, it needs to be checked out from version 8.10.0 of the RocksDB community. Then cherry-pick the commits used by Flink from FRocksDB-6.20.3 onto 8.10.0. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35573) [FLIP-447] Upgrade FRocksDB from 6.20.3 to 8.10.0
Yue Ma created FLINK-35573: -- Summary: [FLIP-447] Upgrade FRocksDB from 6.20.3 to 8.10.0 Key: FLINK-35573 URL: https://issues.apache.org/jira/browse/FLINK-35573 Project: Flink Issue Type: Improvement Components: Runtime / State Backends Affects Versions: 2.0.0 Reporter: Yue Ma Fix For: 2.0.0 The FLIP: [https://cwiki.apache.org/confluence/display/FLINK/FLIP-447%3A+Upgrade+FRocksDB+from+6.20.3++to+8.10.0|https://cwiki.apache.org/confluence/display/FLINK/FLIP-447%3A+Upgrade+FRocksDB+from+6.20.3++to+8.10.0] *_This FLIP proposes upgrading the version of FRocksDB in the Flink Project from 6.20.3 to 8.10.0._* _RocksDBStateBackend is widely used by Flink users in large state scenarios. The last upgrade of FRocksDB was in version Flink-1.14, which mainly added features such as ARM platform support, the deleteRange API, periodic compaction, etc. It has been a long time since then, and RocksDB has now been released to version 8.x. The main motivation for this upgrade is to leverage the features of higher versions of RocksDB to make Flink RocksDBStateBackend more powerful. While RocksDB is also continuously optimizing and bug fixing, we hope to keep FRocksDB more or less in sync with RocksDB and upgrade it periodically._ -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-35573) [FLIP-447] Upgrade FRocksDB from 6.20.3 to 8.10.0
[ https://issues.apache.org/jira/browse/FLINK-35573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yue Ma updated FLINK-35573: --- Description: The FLIP: [https://cwiki.apache.org/confluence/display/FLINK/FLIP-447%3A+Upgrade+FRocksDB+from+6.20.3++to+8.10.0] _RocksDBStateBackend is widely used by Flink users in large state scenarios. The last upgrade of FRocksDB was in version Flink-1.14, which mainly added features such as ARM platform support, the deleteRange API, periodic compaction, etc. It has been a long time since then, and RocksDB has now been released to version 8.x. The main motivation for this upgrade is to leverage the features of higher versions of RocksDB to make Flink RocksDBStateBackend more powerful. While RocksDB is also continuously optimizing and bug fixing, we hope to keep FRocksDB more or less in sync with RocksDB and upgrade it periodically._ was: The FLIP: [https://cwiki.apache.org/confluence/display/FLINK/FLIP-447%3A+Upgrade+FRocksDB+from+6.20.3++to+8.10.0|https://cwiki.apache.org/confluence/display/FLINK/FLIP-447%3A+Upgrade+FRocksDB+from+6.20.3++to+8.10.0] *_This FLIP proposes upgrading the version of FRocksDB in the Flink Project from 6.20.3 to 8.10.0._* _RocksDBStateBackend is widely used by Flink users in large state scenarios. The last upgrade of FRocksDB was in version Flink-1.14, which mainly added features such as ARM platform support, the deleteRange API, periodic compaction, etc. It has been a long time since then, and RocksDB has now been released to version 8.x. The main motivation for this upgrade is to leverage the features of higher versions of RocksDB to make Flink RocksDBStateBackend more powerful. While RocksDB is also continuously optimizing and bug fixing, we hope to keep FRocksDB more or less in sync with RocksDB and upgrade it periodically._ > [FLIP-447] Upgrade FRocksDB from 6.20.3 to 8.10.0 > - > > Key: FLINK-35573 > URL: https://issues.apache.org/jira/browse/FLINK-35573 > Project: Flink > Issue Type: Improvement > Components: Runtime / State Backends >Affects Versions: 2.0.0 >Reporter: Yue Ma >Priority: Major > Fix For: 2.0.0 > > > The FLIP: > [https://cwiki.apache.org/confluence/display/FLINK/FLIP-447%3A+Upgrade+FRocksDB+from+6.20.3++to+8.10.0] > > _RocksDBStateBackend is widely used by Flink users in large state > scenarios. The last upgrade of FRocksDB was in version Flink-1.14, which > mainly added features such as ARM platform support, the deleteRange API, > periodic compaction, etc. It has been a long time since then, and RocksDB has > now been released to version 8.x. The main motivation for this upgrade is to > leverage the features of higher versions of RocksDB to make Flink > RocksDBStateBackend more powerful. While RocksDB is also continuously > optimizing and bug fixing, we hope to keep FRocksDB more or less in sync with > RocksDB and upgrade it periodically._ -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-34430) Akka frame size exceeded with many ByteStreamStateHandle being used
[ https://issues.apache.org/jira/browse/FLINK-34430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17818238#comment-17818238 ] Yue Ma commented on FLINK-34430: [~pnowojski] The reason for so many SSTs here is that RocksDB cannot delete these files in auto compaction. Can we trigger a manual compaction at the appropriate time to avoid having too many SST files and solve this problem at the root? [~srichter] Just like we chose to trigger an async manual compaction after ingestDB rescaling to delete extra keys. > Akka frame size exceeded with many ByteStreamStateHandle being used > --- > > Key: FLINK-34430 > URL: https://issues.apache.org/jira/browse/FLINK-34430 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.16.3, 1.17.2, 1.19.0, 1.18.1 >Reporter: Piotr Nowojski >Assignee: Chesnay Schepler >Priority: Major > > The following error can happen > {noformat} > Discarding oversized payload sent to > Actor[akka.tcp://flink@/user/rpc/taskmanager_0#-]: max allowed size > 10485760 bytes, actual size of encoded class > org.apache.flink.runtime.rpc.messages.RemoteRpcInvocation was 11212046 bytes. > > error.stack_trace > akka.remote.OversizedPayloadException: Discarding oversized payload sent to > Actor[akka.tcp://flink@/user/rpc/taskmanager_0#-]: max allowed size > 10485760 bytes, actual size of encoded class > org.apache.flink.runtime.rpc.messages.RemoteRpcInvocation was 11212046 bytes. > {noformat} > when https://issues.apache.org/jira/browse/FLINK-26050 is causing large > amount of small sst files to be created and never deleted. If those files are > small enough to be handled by {{ByteStreamStateHandle}} akka frame size can > be exceeded. -- This message was sent by Atlassian Jira (v8.20.10#820010)
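For concreteness, the manual compaction suggested in the comment above could look roughly like the RocksJava sketch below. This is a minimal illustration, not Flink code: the class name is invented, and when and from which thread Flink would trigger it is exactly the open question in the discussion.
{code:java}
import org.rocksdb.ColumnFamilyHandle;
import org.rocksdb.CompactRangeOptions;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;

public final class ManualCompactionTrigger {
    // Compact the full key range of each column family so that small SST
    // files that auto compaction never picks up become eligible for deletion.
    public static void compactAll(RocksDB db, Iterable<ColumnFamilyHandle> handles)
            throws RocksDBException {
        try (CompactRangeOptions options =
                new CompactRangeOptions().setExclusiveManualCompaction(false)) {
            for (ColumnFamilyHandle handle : handles) {
                // null begin/end means "compact everything" in this column family.
                db.compactRange(handle, null, null, options);
            }
        }
    }
}
{code}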
[jira] [Updated] (FLINK-34210) DefaultExecutionGraphBuilder#isCheckpointingEnabled may return Wrong Value when checkpoint disabled
[ https://issues.apache.org/jira/browse/FLINK-34210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yue Ma updated FLINK-34210: --- Description: The *DefaultExecutionGraphBuilder* will call _{*}isCheckpointingEnabled{*}(JobGraph jobGraph)_ to determine whether the job has enabled checkpointing and whether to initialize CheckpointCoordinator-related components such as CheckpointCoordinator, CheckpointIDCounter, etc.
{code:java}
// DefaultExecutionGraphBuilder#isCheckpointingEnabled
public static boolean isCheckpointingEnabled(JobGraph jobGraph) {
    return jobGraph.isCheckpointingEnabled();
}
{code}
The problem is that the logic for determining isCheckpointingEnabled here is inaccurate, as *jobGraph.getCheckpointingSettings()* will not be NULL when checkpointing is not enabled; it will instead carry CheckpointCoordinatorConfiguration.DISABLED_CHECKPOINT_INTERVAL as its interval.
{code:java}
// JobGraph#isCheckpointingEnabled
public boolean isCheckpointingEnabled() {
    if (snapshotSettings == null) {
        return false;
    }
    return snapshotSettings.getCheckpointCoordinatorConfiguration().isCheckpointingEnabled();
}
{code}
The method to fix this problem is also quite clear. We need to directly reuse the result of jobGraph.isCheckpointingEnabled() here. was: The *DefaultExecutionGraphBuilder* will call _isCheckpointingEnabled(JobGraph jobGraph)_ to determine whether the job has enabled checkpointing and whether to initialize CheckpointCoordinator-related components such as CheckpointCoordinator, CheckpointIDCounter, etc. The problem is that the logic for determining isCheckpointingEnabled here is inaccurate, as *jobGraph.getCheckpointingSettings()* will not be NULL when checkpointing is not enabled; it will instead carry CheckpointCoordinatorConfiguration.DISABLED_CHECKPOINT_INTERVAL as its interval. The method to fix this problem is also quite clear. We need to directly reuse the result of jobGraph.isCheckpointingEnabled() here. > DefaultExecutionGraphBuilder#isCheckpointingEnabled may return Wrong Value > when checkpoint disabled > --- > > Key: FLINK-34210 > URL: https://issues.apache.org/jira/browse/FLINK-34210 > Project: Flink > Issue Type: Bug > Components: Runtime / Checkpointing >Affects Versions: 1.19.0, 1.18.1 >Reporter: Yue Ma >Priority: Major > Labels: pull-request-available > > The *DefaultExecutionGraphBuilder* will call > _{*}isCheckpointingEnabled{*}(JobGraph jobGraph)_ > to determine whether the job has enabled checkpointing and whether to initialize > CheckpointCoordinator-related components such as CheckpointCoordinator, > CheckpointIDCounter, etc. > > {code:java} > // DefaultExecutionGraphBuilder#isCheckpointingEnabled > public static boolean isCheckpointingEnabled(JobGraph jobGraph) { > return jobGraph.isCheckpointingEnabled(); > }{code} > > The problem is that the logic for determining isCheckpointingEnabled here is > inaccurate, as *jobGraph.getCheckpointingSettings()* will not be NULL when > checkpointing is not enabled; it will instead carry > CheckpointCoordinatorConfiguration.DISABLED_CHECKPOINT_INTERVAL as its interval > > {code:java} > // JobGraph#isCheckpointingEnabled > public boolean isCheckpointingEnabled() { > if (snapshotSettings == null) { > return false; > } > return > snapshotSettings.getCheckpointCoordinatorConfiguration().isCheckpointingEnabled(); > } {code} > > > The method to fix this problem is also quite clear. We need to directly reuse > the result of jobGraph.isCheckpointingEnabled() here > -- This message was sent by Atlassian Jira (v8.20.10#820010)
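To make the bug concrete: the description implies the pre-fix check only tested for the presence of snapshot settings. The following is a hypothetical reconstruction of that check (not the actual source), for contrast with the corrected version quoted above.
{code:java}
// Hypothetical pre-fix check, reconstructed from the description: the
// snapshot settings object is non-null even when the configured interval is
// DISABLED_CHECKPOINT_INTERVAL, so this wrongly reports "enabled".
public static boolean isCheckpointingEnabled(JobGraph jobGraph) {
    return jobGraph.getCheckpointingSettings() != null;
}
{code}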
[jira] [Created] (FLINK-34210) DefaultExecutionGraphBuilder#isCheckpointingEnabled may return Wrong Value when checkpoint disabled
Yue Ma created FLINK-34210: -- Summary: DefaultExecutionGraphBuilder#isCheckpointingEnabled may return Wrong Value when checkpoint disabled Key: FLINK-34210 URL: https://issues.apache.org/jira/browse/FLINK-34210 Project: Flink Issue Type: Bug Components: Runtime / Checkpointing Affects Versions: 1.18.1, 1.19.0 Reporter: Yue Ma The *DefaultExecutionGraphBuilder* will call _isCheckpointingEnabled(JobGraph jobGraph)_ to determine whether the job has enabled checkpointing and whether to initialize CheckpointCoordinator-related components such as CheckpointCoordinator, CheckpointIDCounter, etc. The problem is that the logic for determining isCheckpointingEnabled here is inaccurate, as *jobGraph.getCheckpointingSettings()* will not be NULL when checkpointing is not enabled; it will instead carry CheckpointCoordinatorConfiguration.DISABLED_CHECKPOINT_INTERVAL as its interval. The method to fix this problem is also quite clear. We need to directly reuse the result of jobGraph.isCheckpointingEnabled() here. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-33819) Support setting CompressType in RocksDBStateBackend
[ https://issues.apache.org/jira/browse/FLINK-33819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17809419#comment-17809419 ] Yue Ma commented on FLINK-33819: [~pnowojski] [~masteryhx] I conducted benchmark tests on Snappy and No_Compression. In the value state test, I increased setupKeyCounts to 1000 because the default value state test is too small, so most of the data may sit in the memtable. The conclusions are as follows: point-lookup operations improve performance by over *130%*; scan-type operations improve performance by about *60%*; write performance is essentially unchanged; the size of valueState increased from *78M to 96M* after disabling compression; the size of mapState increased from *58M to 86M* (perhaps this data is for reference only, as the actual compression ratio depends on the characteristics of the business data).
|Benchmark|(backendType)|Mode|Score (Snappy)|Score (NoCompression)|Units|performance benefit|
|MapStateBenchmark.mapAdd|ROCKSDB|thrpt|654.362|679.276|ops/ms|{color:#FF}3.80737267750878%{color}|
|MapStateBenchmark.mapContains|ROCKSDB|thrpt|104.57|297.213|ops/ms|{color:#FF}184.223964808262%{color}|
|MapStateBenchmark.mapEntries|ROCKSDB|thrpt|573.153|933.967|ops/ms|{color:#FF}62.9524751680616%{color}|
|MapStateBenchmark.mapGet|ROCKSDB|thrpt|106.288|330.821|ops/ms|{color:#FF}211.249623664007%{color}|
|MapStateBenchmark.mapIsEmpty|ROCKSDB|thrpt|88.642|207.76|ops/ms|{color:#FF}134.380993208637%{color}|
|MapStateBenchmark.mapIterator|ROCKSDB|thrpt|572.848|912.097|ops/ms|{color:#FF}59.2214688713236%{color}|
|MapStateBenchmark.mapKeys|ROCKSDB|thrpt|580.244|949.094|ops/ms|{color:#FF}63.568085150385%{color}|
|MapStateBenchmark.mapPutAll|ROCKSDB|thrpt|129.965|130.054|ops/ms|{color:#FF}0.0684799753779853%{color}|
|MapStateBenchmark.mapRemove|ROCKSDB|thrpt|723.835|785.637|ops/ms|{color:#FF}8.53813369068916%{color}|
|MapStateBenchmark.mapUpdate|ROCKSDB|thrpt|697.409|652.893|ops/ms|{color:#FF}-6.38305499355471%{color}|
|MapStateBenchmark.mapValues|ROCKSDB|thrpt|579.399|935.651|ops/ms|{color:#FF}61.4864713263226%{color}|
|ValueStateBenchmark.valueAdd|ROCKSDB|thrpt|645.081|636.098|ops/ms|{color:#FF}-1.39253830139162%{color}|
|ValueStateBenchmark.valueGet|ROCKSDB|thrpt|103.393|297.646|ops/ms|{color:#FF}187.87828963276%{color}|
|ValueStateBenchmark.valueUpdate|ROCKSDB|thrpt|560.153|621.502|ops/ms|{color:#FF}10.9521862776777%{color}|
| |DBSize (SnappyCompression)|DBSize (NoCompression)|
|ValueStateBenchMark|78M|96M|
|MapStateBenchMark|58M|86M|
> Support setting CompressType in RocksDBStateBackend > --- > > Key: FLINK-33819 > URL: https://issues.apache.org/jira/browse/FLINK-33819 > Project: Flink > Issue Type: Improvement > Components: Runtime / State Backends >Affects Versions: 1.18.0 >Reporter: Yue Ma >Assignee: Yue Ma >Priority: Major > Labels: pull-request-available > Fix For: 1.19.0 > > Attachments: image-2023-12-14-11-32-32-968.png, > image-2023-12-14-11-35-22-306.png > > > Currently, RocksDBStateBackend does not support setting the compression > level, and Snappy is used for compression by default. But we have some > scenarios where compression will use a lot of CPU resources. Turning off > compression can significantly reduce CPU overhead. So we may need to support > a parameter for users to set the CompressType of Rocksdb. > !image-2023-12-14-11-35-22-306.png! -- This message was sent by Atlassian Jira (v8.20.10#820010)
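For readers who want to reproduce the comparison: compression can already be switched per column family through Flink's pluggable options factory. A minimal sketch, assuming the {{RocksDBOptionsFactory}} interface as shipped in Flink 1.18; the class name is illustrative.
{code:java}
import java.util.Collection;
import org.apache.flink.contrib.streaming.state.RocksDBOptionsFactory;
import org.rocksdb.ColumnFamilyOptions;
import org.rocksdb.CompressionType;
import org.rocksdb.DBOptions;

// Illustrative factory that turns block compression off entirely,
// trading disk and checkpoint size for lower CPU usage.
public class NoCompressionOptionsFactory implements RocksDBOptionsFactory {
    @Override
    public DBOptions createDBOptions(
            DBOptions currentOptions, Collection<AutoCloseable> handlesToClose) {
        // Compression is a column-family-level setting; nothing to do here.
        return currentOptions;
    }

    @Override
    public ColumnFamilyOptions createColumnOptions(
            ColumnFamilyOptions currentOptions, Collection<AutoCloseable> handlesToClose) {
        // Snappy is the default in Flink's RocksDB backend;
        // NO_COMPRESSION disables it.
        return currentOptions.setCompressionType(CompressionType.NO_COMPRESSION);
    }
}
{code}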
[jira] [Commented] (FLINK-34050) Rocksdb state has space amplification after rescaling with DeleteRange
[ https://issues.apache.org/jira/browse/FLINK-34050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17805392#comment-17805392 ] Yue Ma commented on FLINK-34050: Hi [~lijinzhong] Thanks for reporting this issue. We have also encountered it before, and I think this is a great suggestion. Overall, this is still a trade-off between time and space. If recovery time is the most important, then we can use deleteRange. If we want both good recovery time and low space amplification, then we can use deleteRange+deleteFilesInRanges. If space amplification is very important, then we can consider deleteRange+deleteFilesInRanges+CompactRanges. (Of course, perhaps we can see if there are other ways to make space reclamation an asynchronous process.) > Rocksdb state has space amplification after rescaling with DeleteRange > -- > > Key: FLINK-34050 > URL: https://issues.apache.org/jira/browse/FLINK-34050 > Project: Flink > Issue Type: Bug > Components: Runtime / State Backends >Reporter: Jinzhong Li >Priority: Major > Attachments: image-2024-01-10-21-23-48-134.png, > image-2024-01-10-21-24-10-983.png, image-2024-01-10-21-28-24-312.png > > > FLINK-21321 use deleteRange to speed up rocksdb rescaling, however it will > cause space amplification in some case. > We can reproduce this problem using wordCount job: > 1) before rescaling, state operator in wordCount job has 2 parallelism and > 4G+ full checkpoint size; > !image-2024-01-10-21-24-10-983.png|width=266,height=130! > 2) then restart job with 4 parallelism (for state operator), the full > checkpoint size of new job will be 8G+ ; > 3) after many successful checkpoints, the full checkpoint size is still 8G+; > !image-2024-01-10-21-28-24-312.png|width=454,height=111! > > The root cause of this issue is that the deleted keyGroupRange does not > overlap with current DB keyGroupRange, so new data written into rocksdb after > rescaling almost never do LSM compaction with the deleted data (belonging to > other keyGroupRange.) > > And the space amplification may affect Rocksdb read performance and disk > space usage after rescaling. It looks like a regression due to the > introduction of deleteRange for rescaling optimization. > > To solve this problem, I think maybe we can invoke > Rocksdb.deleteFilesInRanges after deleteRange? > {code:java} > public static void clipDBWithKeyGroupRange() { > //... > List<byte[]> ranges = new ArrayList<>(); > //... > deleteRange(db, columnFamilyHandles, beginKeyGroupBytes, endKeyGroupBytes); > ranges.add(beginKeyGroupBytes); > ranges.add(endKeyGroupBytes); > // > for (ColumnFamilyHandle columnFamilyHandle : columnFamilyHandles) { > db.deleteFilesInRanges(columnFamilyHandle, ranges, false); > } > } > {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010)
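A self-contained RocksJava sketch of the deleteRange + deleteFilesInRanges combination discussed above, with the key-group boundary keys supplied by the caller; the class and method names are illustrative, not Flink code.
{code:java}
import java.util.Arrays;
import java.util.List;
import org.rocksdb.ColumnFamilyHandle;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;

public final class KeyGroupClipper {
    // Logically drop a key-group range, then ask RocksDB to physically
    // delete the SST files that lie entirely inside that range.
    public static void clip(
            RocksDB db, List<ColumnFamilyHandle> handles, byte[] begin, byte[] end)
            throws RocksDBException {
        List<byte[]> ranges = Arrays.asList(begin, end);
        for (ColumnFamilyHandle handle : handles) {
            db.deleteRange(handle, begin, end);
            // 'false': files straddling a range boundary are kept, so no
            // live data outside [begin, end) is ever removed.
            db.deleteFilesInRanges(handle, ranges, false);
        }
    }
}
{code}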
[jira] [Updated] (FLINK-33337) Expose IngestDB and ClipDB in the official RocksDB API
[ https://issues.apache.org/jira/browse/FLINK-33337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yue Ma updated FLINK-33337: --- Attachment: image-2024-01-11-12-03-14-308.png > Expose IngestDB and ClipDB in the official RocksDB API > -- > > Key: FLINK-33337 > URL: https://issues.apache.org/jira/browse/FLINK-33337 > Project: Flink > Issue Type: Sub-task > Components: Runtime / State Backends >Reporter: Piotr Nowojski >Assignee: Yue Ma >Priority: Major > Attachments: image-2024-01-11-12-03-14-308.png > > > Remaining open PRs: > None :) > Already merged PRs: > https://github.com/facebook/rocksdb/pull/11646 > https://github.com/facebook/rocksdb/pull/11868 > https://github.com/facebook/rocksdb/pull/11811 > https://github.com/facebook/rocksdb/pull/11381 > https://github.com/facebook/rocksdb/pull/11379 > https://github.com/facebook/rocksdb/pull/11378 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-33337) Expose IngestDB and ClipDB in the official RocksDB API
[ https://issues.apache.org/jira/browse/FLINK-33337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17805364#comment-17805364 ] Yue Ma commented on FLINK-33337: Update: https://github.com/facebook/rocksdb/pull/12219 This PR fixes a bug that may cause incorrect ClipDB results > Expose IngestDB and ClipDB in the official RocksDB API > -- > > Key: FLINK-33337 > URL: https://issues.apache.org/jira/browse/FLINK-33337 > Project: Flink > Issue Type: Sub-task > Components: Runtime / State Backends >Reporter: Piotr Nowojski >Assignee: Yue Ma >Priority: Major > Attachments: image-2024-01-11-12-03-14-308.png > > > Remaining open PRs: > None :) > Already merged PRs: > https://github.com/facebook/rocksdb/pull/11646 > https://github.com/facebook/rocksdb/pull/11868 > https://github.com/facebook/rocksdb/pull/11811 > https://github.com/facebook/rocksdb/pull/11381 > https://github.com/facebook/rocksdb/pull/11379 > https://github.com/facebook/rocksdb/pull/11378 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-33946) RocksDb sets setAvoidFlushDuringShutdown to true to speed up Task Cancel
[ https://issues.apache.org/jira/browse/FLINK-33946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17805355#comment-17805355 ] Yue Ma commented on FLINK-33946: [~masteryhx] Thanks, I would like to take this and I'll draft the PR soon. > RocksDb sets setAvoidFlushDuringShutdown to true to speed up Task Cancel > > > Key: FLINK-33946 > URL: https://issues.apache.org/jira/browse/FLINK-33946 > Project: Flink > Issue Type: Improvement > Components: Runtime / State Backends >Affects Versions: 1.19.0 >Reporter: Yue Ma >Priority: Major > Fix For: 1.19.0 > > > When a Job fails, the task needs to be canceled and re-deployed. > RocksDBStateBackend will call RocksDB.close when disposing. > {code:java} > if (!shutting_down_.load(std::memory_order_acquire) && > has_unpersisted_data_.load(std::memory_order_relaxed) && > !mutable_db_options_.avoid_flush_during_shutdown) { > if (immutable_db_options_.atomic_flush) { > autovector<ColumnFamilyData*> cfds; > SelectColumnFamiliesForAtomicFlush(&cfds); > mutex_.Unlock(); > Status s = > AtomicFlushMemTables(cfds, FlushOptions(), FlushReason::kShutDown); > s.PermitUncheckedError(); //**TODO: What to do on error? > mutex_.Lock(); > } else { > for (auto cfd : *versions_->GetColumnFamilySet()) { > if (!cfd->IsDropped() && cfd->initialized() && !cfd->mem()->IsEmpty()) { > cfd->Ref(); > mutex_.Unlock(); > Status s = FlushMemTable(cfd, FlushOptions(), FlushReason::kShutDown); > s.PermitUncheckedError(); //**TODO: What to do on error? > mutex_.Lock(); > cfd->UnrefAndTryDelete(); > } > } > } {code} > By default (avoid_flush_during_shutdown=false), RocksDB flushes the memtable > on Close. When the disk pressure is high or the memtable is large, this > process will be more time-consuming, which will cause the Task to get stuck > in the Canceling stage and affect the speed of job failover. > In fact, it is completely unnecessary to flush the memtable when a Flink Task > closes, because the data can be replayed from the checkpoint. So we can set > avoid_flush_during_shutdown to true to speed up Task failover. -- This message was sent by Atlassian Jira (v8.20.10#820010)
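The option itself is already exposed through RocksJava, so the change described here is essentially one line of configuration. A minimal sketch (the class name is illustrative):
{code:java}
import org.rocksdb.DBOptions;

public final class FastShutdownOptions {
    // Skip the memtable flush RocksDB otherwise performs in close(); this is
    // safe for Flink because state is restored from checkpoints on recovery.
    public static DBOptions create() {
        return new DBOptions()
                .setCreateIfMissing(true)
                .setAvoidFlushDuringShutdown(true);
    }
}
{code}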
[jira] [Commented] (FLINK-33819) Support setting CompressType in RocksDBStateBackend
[ https://issues.apache.org/jira/browse/FLINK-33819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17805354#comment-17805354 ] Yue Ma commented on FLINK-33819: [~masteryhx] thanks , I would like to take this and I'll draft the pr soon > Support setting CompressType in RocksDBStateBackend > --- > > Key: FLINK-33819 > URL: https://issues.apache.org/jira/browse/FLINK-33819 > Project: Flink > Issue Type: Improvement > Components: Runtime / State Backends >Affects Versions: 1.18.0 >Reporter: Yue Ma >Priority: Major > Fix For: 1.19.0 > > Attachments: image-2023-12-14-11-32-32-968.png, > image-2023-12-14-11-35-22-306.png > > > Currently, RocksDBStateBackend does not support setting the compression > level, and Snappy is used for compression by default. But we have some > scenarios where compression will use a lot of CPU resources. Turning off > compression can significantly reduce CPU overhead. So we may need to support > a parameter for users to set the CompressType of Rocksdb. > !image-2023-12-14-11-35-22-306.png! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-33946) RocksDb sets setAvoidFlushDuringShutdown to true to speed up Task Cancel
[ https://issues.apache.org/jira/browse/FLINK-33946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17805000#comment-17805000 ] Yue Ma commented on FLINK-33946: [~masteryhx] Could you please take a look at this ticket? > RocksDb sets setAvoidFlushDuringShutdown to true to speed up Task Cancel > > > Key: FLINK-33946 > URL: https://issues.apache.org/jira/browse/FLINK-33946 > Project: Flink > Issue Type: Improvement > Components: Runtime / State Backends >Affects Versions: 1.19.0 >Reporter: Yue Ma >Priority: Major > Fix For: 1.19.0 > > > When a Job fails, the task needs to be canceled and re-deployed. > RocksDBStateBackend will call RocksDB.close when disposing. > {code:java} > if (!shutting_down_.load(std::memory_order_acquire) && > has_unpersisted_data_.load(std::memory_order_relaxed) && > !mutable_db_options_.avoid_flush_during_shutdown) { > if (immutable_db_options_.atomic_flush) { > autovector<ColumnFamilyData*> cfds; > SelectColumnFamiliesForAtomicFlush(&cfds); > mutex_.Unlock(); > Status s = > AtomicFlushMemTables(cfds, FlushOptions(), FlushReason::kShutDown); > s.PermitUncheckedError(); //**TODO: What to do on error? > mutex_.Lock(); > } else { > for (auto cfd : *versions_->GetColumnFamilySet()) { > if (!cfd->IsDropped() && cfd->initialized() && !cfd->mem()->IsEmpty()) { > cfd->Ref(); > mutex_.Unlock(); > Status s = FlushMemTable(cfd, FlushOptions(), FlushReason::kShutDown); > s.PermitUncheckedError(); //**TODO: What to do on error? > mutex_.Lock(); > cfd->UnrefAndTryDelete(); > } > } > } {code} > By default (avoid_flush_during_shutdown=false), RocksDB flushes the memtable > on Close. When the disk pressure is high or the memtable is large, this > process will be more time-consuming, which will cause the Task to get stuck > in the Canceling stage and affect the speed of job failover. > In fact, it is completely unnecessary to flush the memtable when a Flink Task > closes, because the data can be replayed from the checkpoint. So we can set > avoid_flush_during_shutdown to true to speed up Task failover. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-33819) Support setting CompressType in RocksDBStateBackend
[ https://issues.apache.org/jira/browse/FLINK-33819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17804999#comment-17804999 ] Yue Ma commented on FLINK-33819: [~masteryhx] [~pnowojski] [~srichter] could you please take a look at this ticket ? > Support setting CompressType in RocksDBStateBackend > --- > > Key: FLINK-33819 > URL: https://issues.apache.org/jira/browse/FLINK-33819 > Project: Flink > Issue Type: Improvement > Components: Runtime / State Backends >Affects Versions: 1.18.0 >Reporter: Yue Ma >Priority: Major > Fix For: 1.19.0 > > Attachments: image-2023-12-14-11-32-32-968.png, > image-2023-12-14-11-35-22-306.png > > > Currently, RocksDBStateBackend does not support setting the compression > level, and Snappy is used for compression by default. But we have some > scenarios where compression will use a lot of CPU resources. Turning off > compression can significantly reduce CPU overhead. So we may need to support > a parameter for users to set the CompressType of Rocksdb. > !image-2023-12-14-11-35-22-306.png! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-33946) RocksDb sets setAvoidFlushDuringShutdown to true to speed up Task Cancel
Yue Ma created FLINK-33946: -- Summary: RocksDb sets setAvoidFlushDuringShutdown to true to speed up Task Cancel Key: FLINK-33946 URL: https://issues.apache.org/jira/browse/FLINK-33946 Project: Flink Issue Type: Improvement Components: Runtime / State Backends Affects Versions: 1.19.0 Reporter: Yue Ma Fix For: 1.19.0 When a Job fails, the task needs to be canceled and re-deployed. RocksDBStateBackend will call RocksDB.close when disposing.
{code:java}
if (!shutting_down_.load(std::memory_order_acquire) &&
    has_unpersisted_data_.load(std::memory_order_relaxed) &&
    !mutable_db_options_.avoid_flush_during_shutdown) {
  if (immutable_db_options_.atomic_flush) {
    autovector<ColumnFamilyData*> cfds;
    SelectColumnFamiliesForAtomicFlush(&cfds);
    mutex_.Unlock();
    Status s =
        AtomicFlushMemTables(cfds, FlushOptions(), FlushReason::kShutDown);
    s.PermitUncheckedError();  //**TODO: What to do on error?
    mutex_.Lock();
  } else {
    for (auto cfd : *versions_->GetColumnFamilySet()) {
      if (!cfd->IsDropped() && cfd->initialized() && !cfd->mem()->IsEmpty()) {
        cfd->Ref();
        mutex_.Unlock();
        Status s = FlushMemTable(cfd, FlushOptions(), FlushReason::kShutDown);
        s.PermitUncheckedError();  //**TODO: What to do on error?
        mutex_.Lock();
        cfd->UnrefAndTryDelete();
      }
    }
  }
{code}
By default (avoid_flush_during_shutdown=false), RocksDB flushes the memtable on Close. When the disk pressure is high or the memtable is large, this process will be more time-consuming, which will cause the Task to get stuck in the Canceling stage and affect the speed of job failover. In fact, it is completely unnecessary to flush the memtable when a Flink Task closes, because the data can be replayed from the checkpoint. So we can set avoid_flush_during_shutdown to true to speed up Task failover. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-33819) Support setting CompressType in RocksDBStateBackend
[ https://issues.apache.org/jira/browse/FLINK-33819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17798083#comment-17798083 ] Yue Ma commented on FLINK-33819: [~masteryhx] Thanks for replying. {quote}Linked FLINK-20684 as it has been discussed before. {quote} Sorry for missing the previous ticket and creating the duplicate one. {quote}Linked FLINK-11313 which talks about the LZ4 Compression which should be more usable than Snappy. {quote} We did not use LZ4 in the production environment; for small-state jobs, we directly turn off compression, and for jobs with large states, we adopted a compression algorithm based on Snappy optimization. {quote}Do you have some test results on it ? {quote} Yes, we ran some benchmarks comparing *SnappyCompression* and *NoCompression*. The results show that, after turning off compression, state benchmark read performance improves by *80% to 100%*. We also conducted end-to-end online job testing, and after turning off compression, {*}the CPU usage of the job decreased by 16%{*}, while the checkpoint total size increased by *4-5 times.* It is obvious that disabling compression is not only about benefits; it also brings some space amplification. What I want to express is that we may need to provide such a configuration for users, so they can decide how to trade space for time. {quote}BTW, If we'd like to introduce such an option, it's better to guarantee the compatibility. {quote} Sorry, I didn't understand the compatibility issue here; I understand that it is compatible. After switching the compression type, newly generated files will be compressed using the new type, and existing files can still be read and written with the old type. > Support setting CompressType in RocksDBStateBackend > --- > > Key: FLINK-33819 > URL: https://issues.apache.org/jira/browse/FLINK-33819 > Project: Flink > Issue Type: Improvement > Components: Runtime / State Backends >Affects Versions: 1.18.0 >Reporter: Yue Ma >Priority: Major > Fix For: 1.19.0 > > Attachments: image-2023-12-14-11-32-32-968.png, > image-2023-12-14-11-35-22-306.png > > > Currently, RocksDBStateBackend does not support setting the compression > level, and Snappy is used for compression by default. But we have some > scenarios where compression will use a lot of CPU resources. Turning off > compression can significantly reduce CPU overhead. So we may need to support > a parameter for users to set the CompressType of Rocksdb. > !image-2023-12-14-11-35-22-306.png! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-33819) Support setting CompressType in RocksDBStateBackend
[ https://issues.apache.org/jira/browse/FLINK-33819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yue Ma updated FLINK-33819: --- Summary: Support setting CompressType in RocksDBStateBackend (was: Suppor setting CompressType in RocksDBStateBackend) > Support setting CompressType in RocksDBStateBackend > --- > > Key: FLINK-33819 > URL: https://issues.apache.org/jira/browse/FLINK-33819 > Project: Flink > Issue Type: Improvement > Components: Runtime / State Backends >Affects Versions: 1.18.0 >Reporter: Yue Ma >Priority: Major > Fix For: 1.19.0 > > Attachments: image-2023-12-14-11-32-32-968.png, > image-2023-12-14-11-35-22-306.png > > > Currently, RocksDBStateBackend does not support setting the compression > level, and Snappy is used for compression by default. But we have some > scenarios where compression will use a lot of CPU resources. Turning off > compression can significantly reduce CPU overhead. So we may need to support > a parameter for users to set the CompressType of Rocksdb. > !image-2023-12-14-11-35-22-306.png! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-27681) Improve the availability of Flink when the RocksDB file is corrupted.
[ https://issues.apache.org/jira/browse/FLINK-27681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17796527#comment-17796527 ] Yue Ma commented on FLINK-27681: [~pnowojski] Thanks for your suggestion, I think this is a perfect solution. But it sounds like there is still a long way to go to implement this plan. Do we have any specific plans to do this? > Improve the availability of Flink when the RocksDB file is corrupted. > - > > Key: FLINK-27681 > URL: https://issues.apache.org/jira/browse/FLINK-27681 > Project: Flink > Issue Type: Improvement > Components: Runtime / State Backends >Reporter: Ming Li >Assignee: Yue Ma >Priority: Critical > Labels: pull-request-available > Attachments: image-2023-08-23-15-06-16-717.png > > > We have encountered several times when the RocksDB checksum does not match or > the block verification fails when the job is restored. The reason for this > situation is generally that there are some problems with the machine where > the task is located, which causes the files uploaded to HDFS to be incorrect, > but it has been a long time (a dozen minutes to half an hour) when we found > this problem. I'm not sure if anyone else has had a similar problem. > Since this file is referenced by incremental checkpoints for a long time, > when the maximum number of checkpoints reserved is exceeded, we can only use > this file until it is no longer referenced. When the job failed, it cannot be > recovered. > Therefore we consider: > 1. Can RocksDB periodically check whether all files are correct and find the > problem in time? > 2. Can Flink automatically roll back to the previous checkpoint when there is > a problem with the checkpoint data, because even with manual intervention, it > just tries to recover from the existing checkpoint or discard the entire > state. > 3. Can we increase the maximum number of references to a file based on the > maximum number of checkpoints reserved? When the number of references exceeds > the maximum number of checkpoints -1, the Task side is required to upload a > new file for this reference. Not sure if this way will ensure that the new > file we upload will be correct. -- This message was sent by Atlassian Jira (v8.20.10#820010)
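On suggestion (1) in the quoted description: RocksDB's C++ API offers DB::VerifyChecksum(), and, assuming the corresponding RocksJava binding verifyChecksum() is available in the version in use, a periodic background check could look like the sketch below. Everything here (class name, schedule, error handling) is illustrative only, and the verification re-reads every SST block, so it is expensive.
{code:java}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;

public final class PeriodicChecksumVerifier {
    // Hypothetical periodic integrity check: runs off the processing
    // critical path and surfaces corruption before it reaches a checkpoint
    // that recovery would later depend on.
    public static ScheduledExecutorService start(RocksDB db) {
        ScheduledExecutorService executor =
                Executors.newSingleThreadScheduledExecutor();
        executor.scheduleAtFixedRate(
                () -> {
                    try {
                        db.verifyChecksum();
                    } catch (RocksDBException e) {
                        // Report early instead of failing at restore time.
                        System.err.println("RocksDB corruption detected: " + e);
                    }
                },
                1, 1, TimeUnit.HOURS);
        return executor;
    }
}
{code}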
[jira] [Updated] (FLINK-33819) Suppor setting CompressType in RocksDBStateBackend
[ https://issues.apache.org/jira/browse/FLINK-33819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yue Ma updated FLINK-33819: --- Description: Currently, RocksDBStateBackend does not support setting the compression level, and Snappy is used for compression by default. But we have some scenarios where compression will use a lot of CPU resources. Turning off compression can significantly reduce CPU overhead. So we may need to support a parameter for users to set the CompressType of Rocksdb. !image-2023-12-14-11-35-22-306.png! was: Currently, RocksDBStateBackend does not support setting the compression level, and Snappy is used for compression by default. But we have some scenarios where compression will use a lot of CPU resources. Turning off compression can significantly reduce CPU overhead. So we may need to support a parameter for users to set the CompressType of Rocksdb. !https://internal-api-drive-stream.larkoffice.com/space/api/box/stream/download/preview/ALADbWTMGoD6WexSFGecz2Olnrb/?preview_type=16! > Suppor setting CompressType in RocksDBStateBackend > -- > > Key: FLINK-33819 > URL: https://issues.apache.org/jira/browse/FLINK-33819 > Project: Flink > Issue Type: Improvement > Components: Runtime / State Backends >Affects Versions: 1.18.0 >Reporter: Yue Ma >Priority: Major > Fix For: 1.19.0 > > Attachments: image-2023-12-14-11-32-32-968.png, > image-2023-12-14-11-35-22-306.png > > > Currently, RocksDBStateBackend does not support setting the compression > level, and Snappy is used for compression by default. But we have some > scenarios where compression will use a lot of CPU resources. Turning off > compression can significantly reduce CPU overhead. So we may need to support > a parameter for users to set the CompressType of Rocksdb. > !image-2023-12-14-11-35-22-306.png! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-33819) Suppor setting CompressType in RocksDBStateBackend
[ https://issues.apache.org/jira/browse/FLINK-33819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yue Ma updated FLINK-33819: --- Attachment: image-2023-12-14-11-35-22-306.png > Suppor setting CompressType in RocksDBStateBackend > -- > > Key: FLINK-33819 > URL: https://issues.apache.org/jira/browse/FLINK-33819 > Project: Flink > Issue Type: Improvement > Components: Runtime / State Backends >Affects Versions: 1.18.0 >Reporter: Yue Ma >Priority: Major > Fix For: 1.19.0 > > Attachments: image-2023-12-14-11-32-32-968.png, > image-2023-12-14-11-35-22-306.png > > > Currently, RocksDBStateBackend does not support setting the compression > level, and Snappy is used for compression by default. But we have some > scenarios where compression will use a lot of CPU resources. Turning off > compression can significantly reduce CPU overhead. So we may need to support > a parameter for users to set the CompressType of Rocksdb. > > !https://internal-api-drive-stream.larkoffice.com/space/api/box/stream/download/preview/ALADbWTMGoD6WexSFGecz2Olnrb/?preview_type=16! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-33819) Suppor setting CompressType in RocksDBStateBackend
Yue Ma created FLINK-33819: -- Summary: Suppor setting CompressType in RocksDBStateBackend Key: FLINK-33819 URL: https://issues.apache.org/jira/browse/FLINK-33819 Project: Flink Issue Type: Improvement Components: Runtime / State Backends Affects Versions: 1.18.0 Reporter: Yue Ma Fix For: 1.19.0 Attachments: image-2023-12-14-11-32-32-968.png Currently, RocksDBStateBackend does not support setting the compression level, and Snappy is used for compression by default. But we have some scenarios where compression will use a lot of CPU resources. Turning off compression can significantly reduce CPU overhead. So we may need to support a parameter for users to set the CompressType of Rocksdb. !https://internal-api-drive-stream.larkoffice.com/space/api/box/stream/download/preview/ALADbWTMGoD6WexSFGecz2Olnrb/?preview_type=16! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-33338) Bump FRocksDB version
[ https://issues.apache.org/jira/browse/FLINK-33338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17794507#comment-17794507 ] Yue Ma commented on FLINK-33338: [~Yanfei Lei] Thank you very much for your help. What I want to know is whether the performance regression is caused by the commit from https://issues.apache.org/jira/browse/FLINK-19710 not having been cherry-picked. I am willing to help analyze the reasons for the performance regression. > Bump FRocksDB version > - > > Key: FLINK-33338 > URL: https://issues.apache.org/jira/browse/FLINK-33338 > Project: Flink > Issue Type: Sub-task > Components: Runtime / State Backends >Reporter: Piotr Nowojski >Priority: Major > > We need to bump RocksDB in order to be able to use new IngestDB and ClipDB > commands. > If some of the required changes haven't been merged to Facebook/RocksDB, we > should cherry-pick and include them in our FRocksDB fork. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-27681) Improve the availability of Flink when the RocksDB file is corrupted.
[ https://issues.apache.org/jira/browse/FLINK-27681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17794505#comment-17794505 ] Yue Ma commented on FLINK-27681: {quote}And the corrupted file maybe just uploaded to remote storage without any check like reading block checksum when checkpoint if we don't check it manually. {quote} So I understand that we can do this manual check in this ticket first. If the file is detected to be corrupted, we can fail the job. Is this a good choice? > Improve the availability of Flink when the RocksDB file is corrupted. > - > > Key: FLINK-27681 > URL: https://issues.apache.org/jira/browse/FLINK-27681 > Project: Flink > Issue Type: Improvement > Components: Runtime / State Backends >Reporter: Ming Li >Assignee: Yue Ma >Priority: Critical > Labels: pull-request-available > Attachments: image-2023-08-23-15-06-16-717.png > > > We have encountered several times when the RocksDB checksum does not match or > the block verification fails when the job is restored. The reason for this > situation is generally that there are some problems with the machine where > the task is located, which causes the files uploaded to HDFS to be incorrect, > but it has been a long time (a dozen minutes to half an hour) when we found > this problem. I'm not sure if anyone else has had a similar problem. > Since this file is referenced by incremental checkpoints for a long time, > when the maximum number of checkpoints reserved is exceeded, we can only use > this file until it is no longer referenced. When the job failed, it cannot be > recovered. > Therefore we consider: > 1. Can RocksDB periodically check whether all files are correct and find the > problem in time? > 2. Can Flink automatically roll back to the previous checkpoint when there is > a problem with the checkpoint data, because even with manual intervention, it > just tries to recover from the existing checkpoint or discard the entire > state. > 3. Can we increase the maximum number of references to a file based on the > maximum number of checkpoints reserved? When the number of references exceeds > the maximum number of checkpoints -1, the Task side is required to upload a > new file for this reference. Not sure if this way will ensure that the new > file we upload will be correct. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-27681) Improve the availability of Flink when the RocksDB file is corrupted.
[ https://issues.apache.org/jira/browse/FLINK-27681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17793488#comment-17793488 ] Yue Ma commented on FLINK-27681: [~masteryhx] [~pnowojski] [~fanrui] Thanks for the discussion. So, IIUC, the current consensus is that we need to make the job fail if file corruption is found during the check, right? For now, a failure in the checkpoint asynchronous phase will not cause the job to fail. Should we open another ticket to support the ability to "fail the job if some special exception occurs during the checkpoint asynchronous phase"? > Improve the availability of Flink when the RocksDB file is corrupted. > - > > Key: FLINK-27681 > URL: https://issues.apache.org/jira/browse/FLINK-27681 > Project: Flink > Issue Type: Improvement > Components: Runtime / State Backends >Reporter: Ming Li >Assignee: Yue Ma >Priority: Critical > Labels: pull-request-available > Attachments: image-2023-08-23-15-06-16-717.png > > > We have encountered several times when the RocksDB checksum does not match or > the block verification fails when the job is restored. The reason for this > situation is generally that there are some problems with the machine where > the task is located, which causes the files uploaded to HDFS to be incorrect, > but it has been a long time (a dozen minutes to half an hour) when we found > this problem. I'm not sure if anyone else has had a similar problem. > Since this file is referenced by incremental checkpoints for a long time, > when the maximum number of checkpoints reserved is exceeded, we can only use > this file until it is no longer referenced. When the job failed, it cannot be > recovered. > Therefore we consider: > 1. Can RocksDB periodically check whether all files are correct and find the > problem in time? > 2. Can Flink automatically roll back to the previous checkpoint when there is > a problem with the checkpoint data, because even with manual intervention, it > just tries to recover from the existing checkpoint or discard the entire > state. > 3. Can we increase the maximum number of references to a file based on the > maximum number of checkpoints reserved? When the number of references exceeds > the maximum number of checkpoints -1, the Task side is required to upload a > new file for this reference. Not sure if this way will ensure that the new > file we upload will be correct. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-33672) Use MapState.entries() instead of keys() and get() in over window
[ https://issues.apache.org/jira/browse/FLINK-33672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17790464#comment-17790464 ] Yue Ma commented on FLINK-33672: Thanks to [~Zakelly] for discovering this optimization point. I think this is effective and can reduce the time consumption of rocksdb.get() > Use MapState.entries() instead of keys() and get() in over window > - > > Key: FLINK-33672 > URL: https://issues.apache.org/jira/browse/FLINK-33672 > Project: Flink > Issue Type: Improvement > Components: Table SQL / Runtime >Reporter: Zakelly Lan >Priority: Major > > In code logic related with over windows, such as > org.apache.flink.table.runtime.operators.over.ProcTimeRangeBoundedPrecedingFunction > {code:java} > private transient MapState<Long, List<RowData>> inputState; > public void onTimer( > long timestamp, > KeyedProcessFunction<K, RowData, RowData>.OnTimerContext ctx, > Collector<RowData> out) > throws Exception { > //... > Iterator<Long> iter = inputState.keys().iterator(); > //... > while (iter.hasNext()) { > Long elementKey = iter.next(); > if (elementKey < limit) { > // element key outside of window. Retract values > List<RowData> elementsRemove = inputState.get(elementKey); > // ... > } > } > //... > } {code} > As we can see, there is a combination of key iteration and get the value for > iterated key from inputState. However for RocksDB, the key iteration calls > entry iteration, which means actually we could replace it by entry iteration > without introducing any extra overhead. And as a result, we could save a > function call of get() by using getValue() of iterated entry at very low cost. -- This message was sent by Atlassian Jira (v8.20.10#820010)
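For reference, the suggested rewrite amounts to the following sketch against the onTimer() fragment quoted above (inputState, limit, and the row types come from that fragment; imports are elided as in the quote):
{code:java}
// Iterate entries instead of keys() + get(): for the RocksDB state backend
// the key iterator already deserializes whole entries, so taking the value
// from the entry saves one state lookup per key at no extra cost.
Iterator<Map.Entry<Long, List<RowData>>> iter = inputState.entries().iterator();
while (iter.hasNext()) {
    Map.Entry<Long, List<RowData>> entry = iter.next();
    Long elementKey = entry.getKey();
    if (elementKey < limit) {
        // element key outside of window. Retract values without a second get()
        List<RowData> elementsRemove = entry.getValue();
        // ...
    }
}
{code}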
[jira] [Commented] (FLINK-27681) Improve the availability of Flink when the RocksDB file is corrupted.
[ https://issues.apache.org/jira/browse/FLINK-27681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17790359#comment-17790359 ] Yue Ma commented on FLINK-27681: {quote}Fail job directly is fine for me, but I guess the PR doesn't fail the job, it just fails the current checkpoint, right? {quote} I think it can be used together with {*}execution.checkpointing.tolerable-failed-checkpoints{*}; generally speaking, for critical jobs users also pay attention to whether checkpoints are produced successfully. {quote}could you provide some simple benchmark here? {quote} I did some testing on my local machine. It takes about 60 to 70 ms to check a 64 MB SST file, and checking a 10 GB RocksDB instance takes about 10 seconds. More detailed testing may be needed later. > Improve the availability of Flink when the RocksDB file is corrupted. > - > > Key: FLINK-27681 > URL: https://issues.apache.org/jira/browse/FLINK-27681 > Project: Flink > Issue Type: Improvement > Components: Runtime / State Backends >Reporter: Ming Li >Assignee: Yue Ma >Priority: Critical > Labels: pull-request-available > Attachments: image-2023-08-23-15-06-16-717.png > > > We have encountered several times when the RocksDB checksum does not match or > the block verification fails when the job is restored. The reason for this > situation is generally that there are some problems with the machine where > the task is located, which causes the files uploaded to HDFS to be incorrect, > but it has been a long time (a dozen minutes to half an hour) when we found > this problem. I'm not sure if anyone else has had a similar problem. > Since this file is referenced by incremental checkpoints for a long time, > when the maximum number of checkpoints reserved is exceeded, we can only use > this file until it is no longer referenced. When the job failed, it cannot be > recovered. > Therefore we consider: > 1. Can RocksDB periodically check whether all files are correct and find the > problem in time? > 2. Can Flink automatically roll back to the previous checkpoint when there is > a problem with the checkpoint data, because even with manual intervention, it > just tries to recover from the existing checkpoint or discard the entire > state. > 3. Can we increase the maximum number of references to a file based on the > maximum number of checkpoints reserved? When the number of references exceeds > the maximum number of checkpoints -1, the Task side is required to upload a > new file for this reference. Not sure if this way will ensure that the new > file we upload will be correct. -- This message was sent by Atlassian Jira (v8.20.10#820010)
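For context, the tolerable-failed-checkpoints knob mentioned above can be set in the configuration (execution.checkpointing.tolerable-failed-checkpoints) or programmatically. A minimal sketch; the threshold of 3 and the 60-second interval are arbitrary example values:
{code:java}
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class TolerableFailuresExample {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(60_000L); // checkpoint every 60 seconds
        // Tolerate up to 3 consecutive checkpoint failures before failing the
        // job, so corruption detected in the async phase eventually surfaces
        // as a job failure rather than only as failed checkpoints.
        env.getCheckpointConfig().setTolerableCheckpointFailureNumber(3);
    }
}
{code}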
[jira] [Commented] (FLINK-27681) Improve the availability of Flink when the RocksDB file is corrupted.
[ https://issues.apache.org/jira/browse/FLINK-27681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17789994#comment-17789994 ] Yue Ma commented on FLINK-27681: [~masteryhx] Thanks for the explanation; I agree with it. > Improve the availability of Flink when the RocksDB file is corrupted. > - > > Key: FLINK-27681 > URL: https://issues.apache.org/jira/browse/FLINK-27681 > Project: Flink > Issue Type: Improvement > Components: Runtime / State Backends >Reporter: Ming Li >Assignee: Yue Ma >Priority: Critical > Labels: pull-request-available > Attachments: image-2023-08-23-15-06-16-717.png > > > We have encountered several times when the RocksDB checksum does not match or > the block verification fails when the job is restored. The reason for this > situation is generally that there are some problems with the machine where > the task is located, which causes the files uploaded to HDFS to be incorrect, > but it has been a long time (a dozen minutes to half an hour) when we found > this problem. I'm not sure if anyone else has had a similar problem. > Since this file is referenced by incremental checkpoints for a long time, > when the maximum number of checkpoints reserved is exceeded, we can only use > this file until it is no longer referenced. When the job failed, it cannot be > recovered. > Therefore we consider: > 1. Can RocksDB periodically check whether all files are correct and find the > problem in time? > 2. Can Flink automatically roll back to the previous checkpoint when there is > a problem with the checkpoint data, because even with manual intervention, it > just tries to recover from the existing checkpoint or discard the entire > state. > 3. Can we increase the maximum number of references to a file based on the > maximum number of checkpoints reserved? When the number of references exceeds > the maximum number of checkpoints -1, the Task side is required to upload a > new file for this reference. Not sure if this way will ensure that the new > file we upload will be correct. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-27681) Improve the availability of Flink when the RocksDB file is corrupted.
[ https://issues.apache.org/jira/browse/FLINK-27681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17789892#comment-17789892 ] Yue Ma commented on FLINK-27681: Hi [~pnowojski] and [~fanrui], thanks for your replies. {quote}If one file is uploaded to hdfs in the previous checkpoint, and it's corrupted now {quote} If the file uploaded to HDFS is good but is later corrupted on the local disk after download, during data processing, can this problem be solved by scheduling the TM to another machine after failover? Isn't it more important to ensure that the checkpoint data on HDFS is available? BTW, we don't seem to have encountered this situation in our actual production environment. I don't know whether you have actually encountered it, or whether we still need to consider this case. > Improve the availability of Flink when the RocksDB file is corrupted. > - > > Key: FLINK-27681 > URL: https://issues.apache.org/jira/browse/FLINK-27681 > Project: Flink > Issue Type: Improvement > Components: Runtime / State Backends >Reporter: Ming Li >Assignee: Yue Ma >Priority: Critical > Labels: pull-request-available > Attachments: image-2023-08-23-15-06-16-717.png > > > We have encountered several times when the RocksDB checksum does not match or > the block verification fails when the job is restored. The reason for this > situation is generally that there are some problems with the machine where > the task is located, which causes the files uploaded to HDFS to be incorrect, > but it has been a long time (a dozen minutes to half an hour) when we found > this problem. I'm not sure if anyone else has had a similar problem. > Since this file is referenced by incremental checkpoints for a long time, > when the maximum number of checkpoints reserved is exceeded, we can only use > this file until it is no longer referenced. When the job failed, it cannot be > recovered. > Therefore we consider: > 1. Can RocksDB periodically check whether all files are correct and find the > problem in time? > 2. Can Flink automatically roll back to the previous checkpoint when there is > a problem with the checkpoint data, because even with manual intervention, it > just tries to recover from the existing checkpoint or discard the entire > state. > 3. Can we increase the maximum number of references to a file based on the > maximum number of checkpoints reserved? When the number of references exceeds > the maximum number of checkpoints -1, the Task side is required to upload a > new file for this reference. Not sure if this way will ensure that the new > file we upload will be correct. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-27681) Improve the availability of Flink when the RocksDB file is corrupted.
[ https://issues.apache.org/jira/browse/FLINK-27681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17789390#comment-17789390 ] Yue Ma commented on FLINK-27681: [~fanrui] {quote}But I still don't know why the file is corrupted, would you mind describing it in detail? {quote} In our production environment, most files are damaged by hardware failures on the machine where the file is written, such as memory CE or SSD hardware faults. Under the default RocksDB options, after a damaged SST file is created, the DB keeps running normally as long as no compaction or Get/Iterator accesses that file. But when the task fails and recovers from a checkpoint, other Get requests or compactions may read this file, and the task fails at that point. {quote} Is it possible that file corruption occurs after flink check but before uploading the file to hdfs? {quote} Strictly speaking, I think file corruption is possible while the file is being uploaded, or after it is downloaded to the local disk. It might be better if Flink added a file verification mechanism to the checkpoint upload and download processes. But as far as I know, most DFSs have their own data verification mechanisms, so at least we have not encountered this situation in our production environment; most file corruption occurs before the file is uploaded to HDFS. > Improve the availability of Flink when the RocksDB file is corrupted. > - > > Key: FLINK-27681 > URL: https://issues.apache.org/jira/browse/FLINK-27681 > Project: Flink > Issue Type: Improvement > Components: Runtime / State Backends >Reporter: Ming Li >Assignee: Yue Ma >Priority: Critical > Labels: pull-request-available > Attachments: image-2023-08-23-15-06-16-717.png > > > We have encountered several times when the RocksDB checksum does not match or > the block verification fails when the job is restored. The reason for this > situation is generally that there are some problems with the machine where > the task is located, which causes the files uploaded to HDFS to be incorrect, > but it has been a long time (a dozen minutes to half an hour) when we found > this problem. I'm not sure if anyone else has had a similar problem. > Since this file is referenced by incremental checkpoints for a long time, > when the maximum number of checkpoints reserved is exceeded, we can only use > this file until it is no longer referenced. When the job failed, it cannot be > recovered. > Therefore we consider: > 1. Can RocksDB periodically check whether all files are correct and find the > problem in time? > 2. Can Flink automatically roll back to the previous checkpoint when there is > a problem with the checkpoint data, because even with manual intervention, it > just tries to recover from the existing checkpoint or discard the entire > state. > 3. Can we increase the maximum number of references to a file based on the > maximum number of checkpoints reserved? When the number of references exceeds > the maximum number of checkpoints -1, the Task side is required to upload a > new file for this reference. Not sure if this way will ensure that the new > file we upload will be correct. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-33338) Bump FRocksDB version
[ https://issues.apache.org/jira/browse/FLINK-33338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17789345#comment-17789345 ] Yue Ma commented on FLINK-33338: [~pnowojski] Update: Now all the changes we need are released in *8.9.0 (11/17/2023)* [HISTORY.md|https://github.com/facebook/rocksdb/blob/main/HISTORY.md] > Bump FRocksDB version > - > > Key: FLINK-33338 > URL: https://issues.apache.org/jira/browse/FLINK-33338 > Project: Flink > Issue Type: Sub-task > Components: Runtime / State Backends >Reporter: Piotr Nowojski >Priority: Major > > We need to bump RocksDB in order to be able to use new IngestDB and ClipDB > commands. > If some of the required changes haven't been merged to Facebook/RocksDB, we > should cherry-pick and include them in our FRocksDB fork. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Comment Edited] (FLINK-33338) Bump FRocksDB version
[ https://issues.apache.org/jira/browse/FLINK-33338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17789345#comment-17789345 ] Yue Ma edited comment on FLINK-33338 at 11/24/23 6:49 AM: -- [~pnowojski] Update: Now all the changes we need are released in *8.9.0 (11/17/2023) [HISTORY.md|https://github.com/facebook/rocksdb/blob/main/HISTORY.md]* was (Author: mayuehappy): [~pnowojski] Update: Now All the changes we need are released in {*}8.9.0(11/17/203) [HISTORY.md|{*}{*}https://github.com/facebook/rocksdb/blob/main/HISTORY.md{*}{*}]{*} > Bump FRocksDB version > - > > Key: FLINK-33338 > URL: https://issues.apache.org/jira/browse/FLINK-33338 > Project: Flink > Issue Type: Sub-task > Components: Runtime / State Backends >Reporter: Piotr Nowojski >Priority: Major > > We need to bump RocksDB in order to be able to use new IngestDB and ClipDB > commands. > If some of the required changes haven't been merged to Facebook/RocksDB, we > should cherry-pick and include them in our FRocksDB fork. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-27681) Improve the availability of Flink when the RocksDB file is corrupted.
[ https://issues.apache.org/jira/browse/FLINK-27681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17788339#comment-17788339 ] Yue Ma commented on FLINK-27681: [~masteryhx] Sorry for the late reply, I submitted a draft PR, please take a look when you have time. > Improve the availability of Flink when the RocksDB file is corrupted. > - > > Key: FLINK-27681 > URL: https://issues.apache.org/jira/browse/FLINK-27681 > Project: Flink > Issue Type: Improvement > Components: Runtime / State Backends >Reporter: Ming Li >Assignee: Yue Ma >Priority: Critical > Labels: pull-request-available > Attachments: image-2023-08-23-15-06-16-717.png > > > We have encountered several times when the RocksDB checksum does not match or > the block verification fails when the job is restored. The reason for this > situation is generally that there are some problems with the machine where > the task is located, which causes the files uploaded to HDFS to be incorrect, > but it has been a long time (a dozen minutes to half an hour) when we found > this problem. I'm not sure if anyone else has had a similar problem. > Since this file is referenced by incremental checkpoints for a long time, > when the maximum number of checkpoints reserved is exceeded, we can only use > this file until it is no longer referenced. When the job failed, it cannot be > recovered. > Therefore we consider: > 1. Can RocksDB periodically check whether all files are correct and find the > problem in time? > 2. Can Flink automatically roll back to the previous checkpoint when there is > a problem with the checkpoint data, because even with manual intervention, it > just tries to recover from the existing checkpoint or discard the entire > state. > 3. Can we increase the maximum number of references to a file based on the > maximum number of checkpoints reserved? When the number of references exceeds > the maximum number of checkpoints -1, the Task side is required to upload a > new file for this reference. Not sure if this way will ensure that the new > file we upload will be correct. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-33337) Expose IngestDB and ClipDB in the official RocksDB API
[ https://issues.apache.org/jira/browse/FLINK-33337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17783984#comment-17783984 ] Yue Ma commented on FLINK-33337: [~pnowojski] Yes, all of the required changes were merged into facebook/rocksdb. I plan to create a version of FRocksDB with the latest RocksDB code to test the functionality and performance of ingestDB. > Expose IngestDB and ClipDB in the official RocksDB API > -- > > Key: FLINK-33337 > URL: https://issues.apache.org/jira/browse/FLINK-33337 > Project: Flink > Issue Type: Sub-task > Components: Runtime / State Backends >Reporter: Piotr Nowojski >Assignee: Yue Ma >Priority: Major > > Remaining open PRs: > None :) > Already merged PRs: > https://github.com/facebook/rocksdb/pull/11646 > https://github.com/facebook/rocksdb/pull/11868 > https://github.com/facebook/rocksdb/pull/11811 > https://github.com/facebook/rocksdb/pull/11381 > https://github.com/facebook/rocksdb/pull/11379 > https://github.com/facebook/rocksdb/pull/11378 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-33337) Expose IngestDB and ClipDB in the official RocksDB API
[ https://issues.apache.org/jira/browse/FLINK-33337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17783463#comment-17783463 ] Yue Ma commented on FLINK-33337: Update: [https://github.com/facebook/rocksdb/pull/11646] merged in [{{19768a9}}|https://github.com/facebook/rocksdb/commit/19768a923a814a7510423b57329c50587362541e]. > Expose IngestDB and ClipDB in the official RocksDB API > -- > > Key: FLINK-33337 > URL: https://issues.apache.org/jira/browse/FLINK-33337 > Project: Flink > Issue Type: Sub-task > Components: Runtime / State Backends >Reporter: Piotr Nowojski >Assignee: Yue Ma >Priority: Major > > Remaining open PRs: > https://github.com/facebook/rocksdb/pull/11646 > Already merged PRs: > https://github.com/facebook/rocksdb/pull/11868 > https://github.com/facebook/rocksdb/pull/11811 > https://github.com/facebook/rocksdb/pull/11381 > https://github.com/facebook/rocksdb/pull/11379 > https://github.com/facebook/rocksdb/pull/11378 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-33337) Expose IngestDB and ClipDB in the official RocksDB API
[ https://issues.apache.org/jira/browse/FLINK-33337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17783089#comment-17783089 ] Yue Ma commented on FLINK-33337: Update: [https://github.com/facebook/rocksdb/pull/11868] merged in [https://github.com/facebook/rocksdb/commit/8e1adab5cecad129131a4eceabe645b9442acb9c] > Expose IngestDB and ClipDB in the official RocksDB API > -- > > Key: FLINK-33337 > URL: https://issues.apache.org/jira/browse/FLINK-33337 > Project: Flink > Issue Type: Sub-task > Components: Runtime / State Backends >Reporter: Piotr Nowojski >Assignee: Yue Ma >Priority: Major > > Remaining open PRs: > https://github.com/facebook/rocksdb/pull/11646 > https://github.com/facebook/rocksdb/pull/11868 > Already merged PRs: > https://github.com/facebook/rocksdb/pull/11811 > https://github.com/facebook/rocksdb/pull/11381 > https://github.com/facebook/rocksdb/pull/11379 > https://github.com/facebook/rocksdb/pull/11378 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-26050) Too many small sst files in rocksdb state backend when using processing time window
[ https://issues.apache.org/jira/browse/FLINK-26050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17780918#comment-17780918 ] Yue Ma commented on FLINK-26050: [~wzqiang1332] I think maybe you can try Periodic compaction: [https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/fault-tolerance/state/] {noformat} Periodic compaction could speed up expired state entries cleanup, especially for state entries rarely accessed. Files older than this value will be picked up for compaction, and re-written to the same level as they were before. It makes sure a file goes through compaction filters periodically. You can change it and pass a custom value to StateTtlConfig.newBuilder(...).cleanupInRocksdbCompactFilter(long queryTimeAfterNumEntries, Time periodicCompactionTime) method. The default value of Periodic compaction seconds is 30 days. You could set it to 0 to turn off periodic compaction or set a small value to speed up expired state entries cleanup, but it would trigger more compactions.{noformat} > Too many small sst files in rocksdb state backend when using processing time > window > --- > > Key: FLINK-26050 > URL: https://issues.apache.org/jira/browse/FLINK-26050 > Project: Flink > Issue Type: Bug > Components: Runtime / State Backends >Affects Versions: 1.10.2, 1.14.3 >Reporter: shen >Priority: Major > Attachments: image-2022-02-09-21-22-13-920.png, > image-2022-02-11-10-32-14-956.png, image-2022-02-11-10-36-46-630.png, > image-2022-02-14-13-04-52-325.png > > > When using processing time window, in some workload, there will be a lot of > small sst files (several KB) in rocksdb local directory and may cause "Too > many files error". > Use rocksdb tool ldb to find out content in sst files: > * column family of these small sst files is "processing_window-timers". > * most sst files are in level-1. > * records in sst files are almost kTypeDeletion. > * creation time of sst file correspond to checkpoint interval. > These small sst files seem to be generated when flink checkpoint is > triggered. Although all content in sst are delete tags, they are not > compacted and deleted in rocksdb compaction because of not intersecting with > each other(rocksdb [compaction trivial > move|https://github.com/facebook/rocksdb/wiki/Compaction-Trivial-Move]). And > there seems to be no chance to delete them because of small size and not > intersect with other sst files. > > I will attach a simple program to reproduce the problem. > > Since timer in processing time window is generated in strictly ascending > order(both put and delete). So If workload of job happen to generate level-0 > sst files not intersect with each other.(for example: processing window size > much smaller than checkpoint interval, and no window content cross checkpoint > interval or no new data in window crossing checkpoint interval). There will > be many small sst files generated until job restored from savepoint, or > incremental checkpoint is disabled. > > May be similar problem exists when user use timer in operators with same > workload.
> > Code to reproduce the problem: > {code:java} > package org.apache.flink.jira; > import lombok.extern.slf4j.Slf4j; > import org.apache.flink.configuration.Configuration; > import org.apache.flink.configuration.RestOptions; > import org.apache.flink.configuration.TaskManagerOptions; > import org.apache.flink.contrib.streaming.state.RocksDBStateBackend; > import org.apache.flink.streaming.api.TimeCharacteristic; > import org.apache.flink.streaming.api.checkpoint.ListCheckpointed; > import org.apache.flink.streaming.api.datastream.DataStreamSource; > import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; > import org.apache.flink.streaming.api.functions.source.SourceFunction; > import > org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction; > import > org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows; > import org.apache.flink.streaming.api.windowing.time.Time; > import org.apache.flink.streaming.api.windowing.windows.TimeWindow; > import org.apache.flink.util.Collector; > import java.util.Collections; > import java.util.List; > import java.util.Random; > @Slf4j > public class StreamApp { > public static void main(String[] args) throws Exception { > Configuration config = new Configuration(); > config.set(RestOptions.ADDRESS, "127.0.0.1"); > config.set(RestOptions.PORT, 10086); > config.set(TaskManagerOptions.NUM_TASK_SLOTS, 6); > new > StreamApp().configureApp(StreamExecutionEnvironment.createLocalEnvironm
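To make the periodic-compaction suggestion above concrete: a minimal sketch of enabling it through the TTL compaction filter, assuming the StateTtlConfig API quoted from the documentation (the one-hour TTL and 30-minute period are arbitrary example values; the documented default period is 30 days):
{code:java}
import org.apache.flink.api.common.state.StateTtlConfig;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.time.Time;

public class PeriodicCompactionExample {
    public static ValueStateDescriptor<Long> descriptor() {
        StateTtlConfig ttlConfig = StateTtlConfig.newBuilder(Time.hours(1))
                // Re-query the current time in the compaction filter once per
                // 1000 processed entries, and force files older than 30 minutes
                // through compaction so stale delete tombstones in rarely
                // touched SST files are eventually dropped.
                .cleanupInRocksdbCompactFilter(1000L, Time.minutes(30))
                .build();
        ValueStateDescriptor<Long> descriptor =
                new ValueStateDescriptor<>("window-state", Long.class);
        descriptor.enableTimeToLive(ttlConfig);
        return descriptor;
    }
}
{code}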
[jira] [Commented] (FLINK-27681) Improve the availability of Flink when the RocksDB file is corrupted.
[ https://issues.apache.org/jira/browse/FLINK-27681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17780811#comment-17780811 ] Yue Ma commented on FLINK-27681: [~masteryhx] If we only want to check the checksums of the incremental SST files, can we use SstFileReader.verifyChecksum()? https://github.com/ververica/frocksdb/blob/FRocksDB-6.20.3/java/src/main/java/org/rocksdb/SstFileReader.java > Improve the availability of Flink when the RocksDB file is corrupted. > - > > Key: FLINK-27681 > URL: https://issues.apache.org/jira/browse/FLINK-27681 > Project: Flink > Issue Type: Improvement > Components: Runtime / State Backends >Reporter: Ming Li >Priority: Critical > Attachments: image-2023-08-23-15-06-16-717.png > > > We have encountered several times when the RocksDB checksum does not match or > the block verification fails when the job is restored. The reason for this > situation is generally that there are some problems with the machine where > the task is located, which causes the files uploaded to HDFS to be incorrect, > but it has been a long time (a dozen minutes to half an hour) when we found > this problem. I'm not sure if anyone else has had a similar problem. > Since this file is referenced by incremental checkpoints for a long time, > when the maximum number of checkpoints reserved is exceeded, we can only use > this file until it is no longer referenced. When the job failed, it cannot be > recovered. > Therefore we consider: > 1. Can RocksDB periodically check whether all files are correct and find the > problem in time? > 2. Can Flink automatically roll back to the previous checkpoint when there is > a problem with the checkpoint data, because even with manual intervention, it > just tries to recover from the existing checkpoint or discard the entire > state. > 3. Can we increase the maximum number of references to a file based on the > maximum number of checkpoints reserved? When the number of references exceeds > the maximum number of checkpoints -1, the Task side is required to upload a > new file for this reference. Not sure if this way will ensure that the new > file we upload will be correct. -- This message was sent by Atlassian Jira (v8.20.10#820010)
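For illustration, such an incremental check might look roughly like the sketch below, assuming RocksJava's SstFileReader as linked above (the file path is a hypothetical placeholder):
{code:java}
import org.rocksdb.Options;
import org.rocksdb.RocksDBException;
import org.rocksdb.SstFileReader;

public final class SstChecksumCheck {
    /** Reads all blocks of one local SST file and validates their checksums. */
    public static void verifySstFile(String sstFilePath) throws RocksDBException {
        try (Options options = new Options();
             SstFileReader reader = new SstFileReader(options)) {
            reader.open(sstFilePath);
            reader.verifyChecksum(); // throws RocksDBException on corruption
        }
    }

    public static void main(String[] args) throws RocksDBException {
        verifySstFile("/tmp/rocksdb/000042.sst"); // hypothetical path
    }
}
{code}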
[jira] [Created] (FLINK-33126) Fix EventTimeAllWindowCheckpointingITCase jobName typo
Yue Ma created FLINK-33126: -- Summary: Fix EventTimeAllWindowCheckpointingITCase jobName typo Key: FLINK-33126 URL: https://issues.apache.org/jira/browse/FLINK-33126 Project: Flink Issue Type: Improvement Components: Tests Affects Versions: 1.17.1 Reporter: Yue Ma Fix EventTimeAllWindowCheckpointingITCase jobName Typo -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-33126) Fix EventTimeAllWindowCheckpointingITCase jobName typo
[ https://issues.apache.org/jira/browse/FLINK-33126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yue Ma updated FLINK-33126: --- Issue Type: Bug (was: Improvement) > Fix EventTimeAllWindowCheckpointingITCase jobName typo > -- > > Key: FLINK-33126 > URL: https://issues.apache.org/jira/browse/FLINK-33126 > Project: Flink > Issue Type: Bug > Components: Tests >Affects Versions: 1.17.1 >Reporter: Yue Ma >Priority: Minor > > Fix EventTimeAllWindowCheckpointingITCase jobName Typo -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-27681) Improve the availability of Flink when the RocksDB file is corrupted.
[ https://issues.apache.org/jira/browse/FLINK-27681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17757850#comment-17757850 ] Yue Ma commented on FLINK-27681: [~masteryhx] I'd like to contribute it. I think there are two ways to check whether the file data is correct. One is to set DBOptions#setParanoidChecks so that the file is read back and verified after it is generated, but this may cause some read amplification and CPU overhead. The other is to manually call db.verifyChecksum() to check the correctness of the files when making a checkpoint. WDYT? > Improve the availability of Flink when the RocksDB file is corrupted. > - > > Key: FLINK-27681 > URL: https://issues.apache.org/jira/browse/FLINK-27681 > Project: Flink > Issue Type: Improvement > Components: Runtime / State Backends >Reporter: Ming Li >Priority: Critical > Attachments: image-2023-08-23-15-06-16-717.png > > > We have encountered several times when the RocksDB checksum does not match or > the block verification fails when the job is restored. The reason for this > situation is generally that there are some problems with the machine where > the task is located, which causes the files uploaded to HDFS to be incorrect, > but it has been a long time (a dozen minutes to half an hour) when we found > this problem. I'm not sure if anyone else has had a similar problem. > Since this file is referenced by incremental checkpoints for a long time, > when the maximum number of checkpoints reserved is exceeded, we can only use > this file until it is no longer referenced. When the job failed, it cannot be > recovered. > Therefore we consider: > 1. Can RocksDB periodically check whether all files are correct and find the > problem in time? > 2. Can Flink automatically roll back to the previous checkpoint when there is > a problem with the checkpoint data, because even with manual intervention, it > just tries to recover from the existing checkpoint or discard the entire > state. > 3. Can we increase the maximum number of references to a file based on the > maximum number of checkpoints reserved? When the number of references exceeds > the maximum number of checkpoints -1, the Task side is required to upload a > new file for this reference. Not sure if this way will ensure that the new > file we upload will be correct. -- This message was sent by Atlassian Jira (v8.20.10#820010)
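To locate the two knobs discussed above: the read-back-after-write behavior corresponds to RocksDB's paranoid file checks (exposed in RocksJava on ColumnFamilyOptions), while the manual full check is a single call on the DB handle. A minimal sketch, assuming both RocksJava methods behave like their C++ counterparts:
{code:java}
import org.rocksdb.ColumnFamilyOptions;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;

public final class CorruptionCheckSketch {

    // Option 1: verify every SST file right after a flush or compaction
    // writes it, at the cost of extra read amplification and CPU.
    public static ColumnFamilyOptions enableFileChecks(ColumnFamilyOptions options) {
        return options.setParanoidFileChecks(true);
    }

    // Option 2: verify on demand, e.g. while taking a checkpoint. Scans the
    // live SST files of the instance and validates their block checksums.
    public static void verifyBeforeCheckpoint(RocksDB db) throws RocksDBException {
        db.verifyChecksum();
    }
}
{code}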
[jira] [Updated] (FLINK-27681) Improve the availability of Flink when the RocksDB file is corrupted.
[ https://issues.apache.org/jira/browse/FLINK-27681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yue Ma updated FLINK-27681: --- Attachment: image-2023-08-23-15-06-16-717.png > Improve the availability of Flink when the RocksDB file is corrupted. > - > > Key: FLINK-27681 > URL: https://issues.apache.org/jira/browse/FLINK-27681 > Project: Flink > Issue Type: Improvement > Components: Runtime / State Backends >Reporter: Ming Li >Priority: Critical > Attachments: image-2023-08-23-15-06-16-717.png > > > We have encountered several times when the RocksDB checksum does not match or > the block verification fails when the job is restored. The reason for this > situation is generally that there are some problems with the machine where > the task is located, which causes the files uploaded to HDFS to be incorrect, > but it has been a long time (a dozen minutes to half an hour) when we found > this problem. I'm not sure if anyone else has had a similar problem. > Since this file is referenced by incremental checkpoints for a long time, > when the maximum number of checkpoints reserved is exceeded, we can only use > this file until it is no longer referenced. When the job failed, it cannot be > recovered. > Therefore we consider: > 1. Can RocksDB periodically check whether all files are correct and find the > problem in time? > 2. Can Flink automatically roll back to the previous checkpoint when there is > a problem with the checkpoint data, because even with manual intervention, it > just tries to recover from the existing checkpoint or discard the entire > state. > 3. Can we increase the maximum number of references to a file based on the > maximum number of checkpoints reserved? When the number of references exceeds > the maximum number of checkpoints -1, the Task side is required to upload a > new file for this reference. Not sure if this way will ensure that the new > file we upload will be correct. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-32833) Rocksdb CacheIndexAndFilterBlocks must be true when using shared memory
[ https://issues.apache.org/jira/browse/FLINK-32833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17753650#comment-17753650 ] Yue Ma commented on FLINK-32833: Hi [~yunta], thank you for your reply. I understand the original purpose of these codes: these configurations help us limit memory usage better. What I want to say is that users may want to configure these parameters themselves instead of having them hardcoded in the Flink code. In our previous tests, we found that indexAndFilters has a great impact on performance, especially in HDD environments. In the current Flink version, if a user uses shared memory and also wants to ensure high performance, they may additionally need to set an appropriate WRITE_BUFFER_RATIO or HIGH_PRIORITY_POOL_RATIO, which may be difficult for users to reason about. In other words, it also sounds reasonable for a user to cache only the data blocks while keeping the index-and-filter metadata resident in memory. What do you think? > Rocksdb CacheIndexAndFilterBlocks must be true when using shared memory > --- > > Key: FLINK-32833 > URL: https://issues.apache.org/jira/browse/FLINK-32833 > Project: Flink > Issue Type: Improvement > Components: Runtime / State Backends >Affects Versions: 1.17.1 >Reporter: Yue Ma >Priority: Major > > Currently in RocksDBResourceContainer#getColumnOptions, if sharedResources is > used, blockBasedTableConfig will add the following configuration by default. > {code:java} > blockBasedTableConfig.setBlockCache(blockCache); > blockBasedTableConfig.setCacheIndexAndFilterBlocks(true); > blockBasedTableConfig.setCacheIndexAndFilterBlocksWithHighPriority(true); > blockBasedTableConfig.setPinL0FilterAndIndexBlocksInCache(true);{code} > In my understanding, these configurations can help flink better manage the > memory of rocksdb and save some memory overhead in some scenarios. But this > may not be the best practice, mainly for the following reasons: > 1. After CacheIndexAndFilterBlocks is set to true, it may cause index and > filter miss when reading, resulting in performance degradation. > 2. These parameters may not be bound together with whether shared memory is > used, or some configurations should be supported separately to decide whether > to enable these features -- This message was sent by Atlassian Jira (v8.20.10#820010)
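Note that when managed (shared) memory is enabled, RocksDBResourceContainer applies the hardcoded settings quoted above after any user-provided options, which is exactly the limitation this ticket describes. With managed memory for RocksDB disabled (state.backend.rocksdb.memory.managed: false), the behavior can already be chosen through an options factory; a minimal sketch, assuming the RocksDBOptionsFactory interface:
{code:java}
import java.util.Collection;
import org.apache.flink.contrib.streaming.state.RocksDBOptionsFactory;
import org.rocksdb.BlockBasedTableConfig;
import org.rocksdb.ColumnFamilyOptions;
import org.rocksdb.DBOptions;

public class DataBlockOnlyCacheOptionsFactory implements RocksDBOptionsFactory {
    @Override
    public DBOptions createDBOptions(DBOptions currentOptions, Collection<AutoCloseable> handlesToClose) {
        return currentOptions;
    }

    @Override
    public ColumnFamilyOptions createColumnOptions(ColumnFamilyOptions currentOptions, Collection<AutoCloseable> handlesToClose) {
        // Keep index/filter blocks resident outside the block cache so that
        // only data blocks compete for cache capacity. The index/filter
        // memory is then no longer bounded by Flink's memory budget.
        BlockBasedTableConfig tableConfig = new BlockBasedTableConfig()
                .setCacheIndexAndFilterBlocks(false)
                .setPinL0FilterAndIndexBlocksInCache(false);
        return currentOptions.setTableFormatConfig(tableConfig);
    }
}
{code}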
[jira] [Created] (FLINK-32833) Rocksdb CacheIndexAndFilterBlocks must be true when using shared memory
Yue Ma created FLINK-32833: -- Summary: Rocksdb CacheIndexAndFilterBlocks must be true when using shared memory Key: FLINK-32833 URL: https://issues.apache.org/jira/browse/FLINK-32833 Project: Flink Issue Type: Improvement Components: Runtime / State Backends Affects Versions: 1.17.1 Reporter: Yue Ma Currently in RocksDBResourceContainer#getColumnOptions, if sharedResources is used, blockBasedTableConfig will add the following configuration by default. blockBasedTableConfig.setBlockCache(blockCache); blockBasedTableConfig.setCacheIndexAndFilterBlocks(true); blockBasedTableConfig.setCacheIndexAndFilterBlocksWithHighPriority(true); blockBasedTableConfig.setPinL0FilterAndIndexBlocksInCache(true); In my understanding, these configurations can help flink better manage the memory of rocksdb and save some memory overhead in some scenarios. But this may not be the best practice, mainly for the following reasons: 1. After CacheIndexAndFilterBlocks is set to true, it may cause index and filter miss when reading, resulting in performance degradation. 2. These parameters may not be bound together with whether shared memory is used, or some configurations should be supported separately to decide whether to enable these features -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-32833) Rocksdb CacheIndexAndFilterBlocks must be true when using shared memory
[ https://issues.apache.org/jira/browse/FLINK-32833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yue Ma updated FLINK-32833: --- Description: Currently in RocksDBResourceContainer#getColumnOptions, if sharedResources is used, blockBasedTableConfig will add the following configuration by default. {code:java} blockBasedTableConfig.setBlockCache(blockCache); blockBasedTableConfig.setCacheIndexAndFilterBlocks(true); blockBasedTableConfig.setCacheIndexAndFilterBlocksWithHighPriority(true); blockBasedTableConfig.setPinL0FilterAndIndexBlocksInCache(true);{code} In my understanding, these configurations can help flink better manage the memory of rocksdb and save some memory overhead in some scenarios. But this may not be the best practice, mainly for the following reasons: 1. After CacheIndexAndFilterBlocks is set to true, it may cause index and filter miss when reading, resulting in performance degradation. 2. These parameters may not be bound together with whether shared memory is used, or some configurations should be supported separately to decide whether to enable these features was: Currently in RocksDBResourceContainer#getColumnOptions, if sharedResources is used, blockBasedTableConfig will add the following configuration by default. blockBasedTableConfig.setBlockCache(blockCache); blockBasedTableConfig.setCacheIndexAndFilterBlocks(true); blockBasedTableConfig.setCacheIndexAndFilterBlocksWithHighPriority(true); blockBasedTableConfig.setPinL0FilterAndIndexBlocksInCache(true); In my understanding, these configurations can help flink better manage the memory of rocksdb and save some memory overhead in some scenarios. But this may not be the best practice, mainly for the following reasons: 1. After CacheIndexAndFilterBlocks is set to true, it may cause index and filter miss when reading, resulting in performance degradation. 2. These parameters may not be bound together with whether shared memory is used, or some configurations should be supported separately to decide whether to enable these features > Rocksdb CacheIndexAndFilterBlocks must be true when using shared memory > --- > > Key: FLINK-32833 > URL: https://issues.apache.org/jira/browse/FLINK-32833 > Project: Flink > Issue Type: Improvement > Components: Runtime / State Backends >Affects Versions: 1.17.1 >Reporter: Yue Ma >Priority: Major > > Currently in RocksDBResourceContainer#getColumnOptions, if sharedResources is > used, blockBasedTableConfig will add the following configuration by default. > {code:java} > blockBasedTableConfig.setBlockCache(blockCache); > blockBasedTableConfig.setCacheIndexAndFilterBlocks(true); > blockBasedTableConfig.setCacheIndexAndFilterBlocksWithHighPriority(true); > blockBasedTableConfig.setPinL0FilterAndIndexBlocksInCache(true);{code} > In my understanding, these configurations can help flink better manage the > memory of rocksdb and save some memory overhead in some scenarios. But this > may not be the best practice, mainly for the following reasons: > 1. After CacheIndexAndFilterBlocks is set to true, it may cause index and > filter miss when reading, resulting in performance degradation. > 2. These parameters may not be bound together with whether shared memory is > used, or some configurations should be supported separately to decide whether > to enable these features -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Comment Edited] (FLINK-31238) Use IngestDB to speed up Rocksdb rescaling recovery
[ https://issues.apache.org/jira/browse/FLINK-31238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17752055#comment-17752055 ] Yue Ma edited comment on FLINK-31238 at 8/8/23 1:45 PM: [~pnowojski] [~srichter] [~yunta] [~masteryhx] Sorry for the late update . I've submitted a draft pr ([https://github.com/apache/flink/pull/23169]). I haven't published the frocksdb jni in this PR to the public maven repository, so this code can only run on my local machine. I have passed all the statebackend unit tests . The frocksdb branch used in the PR is [https://github.com/mayuehappy/rocksdb/tree/Frocksdb-8.4.0-ingest-db]. (Depends on rocksdb-8.4.0) .This is just a preview version, in order to facilitate everyone to see flink-related changes and further discussions. was (Author: mayuehappy): [~pnowojski] [~srichter] [~yunta] [~masteryhx] Sorry for the late update . I've submitted a draft pr ([https://github.com/apache/flink/pull/23169]). I haven't published the frocksdb jni in this PR to the public maven repository, so this code can only run on my local machine. I have passed all the statebackend unit tests . The frocksdb branch used in the PR is [https://github.com/mayuehappy/rocksdb/tree/Frocksdb-8.4.0-ingest-db]. (This is just a preview version, in order to facilitate everyone to see flink-related changes and further discussions) > Use IngestDB to speed up Rocksdb rescaling recovery > > > Key: FLINK-31238 > URL: https://issues.apache.org/jira/browse/FLINK-31238 > Project: Flink > Issue Type: Improvement > Components: Runtime / State Backends >Affects Versions: 1.16.1 >Reporter: Yue Ma >Assignee: Yue Ma >Priority: Major > Labels: pull-request-available > Attachments: image-2023-02-27-16-41-18-552.png, > image-2023-02-27-16-57-18-435.png, image-2023-03-07-14-27-10-260.png, > image-2023-03-09-15-23-30-581.png, image-2023-03-09-15-26-12-314.png, > image-2023-03-09-15-28-32-363.png, image-2023-03-09-15-41-03-074.png, > image-2023-03-09-15-41-08-379.png, image-2023-03-09-15-45-56-081.png, > image-2023-03-09-15-46-01-176.png, image-2023-03-09-15-50-04-281.png, > image-2023-03-29-15-25-21-868.png, image-2023-07-17-14-37-38-864.png, > image-2023-07-17-14-38-56-946.png, image-2023-07-22-14-16-31-856.png, > image-2023-07-22-14-19-01-390.png, image-2023-08-08-21-32-43-783.png, > image-2023-08-08-21-34-39-008.png, image-2023-08-08-21-39-39-135.png, > screenshot-1.png > > > (The detailed design is in this document > [https://docs.google.com/document/d/10MNVytTsyiDLZQSR89kDkVdmK_YjbM6jh0teerfDFfI|https://docs.google.com/document/d/10MNVytTsyiDLZQSR89kDkVdmK_YjbM6jh0teerfDFfI]) > There have been many discussions and optimizations in the community about > optimizing rocksdb scaling and recovery. > https://issues.apache.org/jira/browse/FLINK-17971 > https://issues.apache.org/jira/browse/FLINK-8845 > https://issues.apache.org/jira/browse/FLINK-21321 > We hope to discuss some of our explorations under this ticket > The process of scaling and recovering in rocksdb simply requires two steps > # Insert the valid keyGroup data of the new task. > # Delete the invalid data in the old stateHandle. > The current method for data writing is to specify the main Db first and then > insert data using writeBatch.In addition, the method of deleteRange is > currently used to speed up the ClipDB. But in our production environment, we > found that the speed of rescaling is still very slow, especially when the > state of a single Task is large. 
> > We hope that the previous sst file can be reused directly when restoring > state, instead of retraversing the data. So we made some attempts to optimize > it in our internal version of flink and frocksdb. > > We added two APIs *ClipDb* and *IngestDb* in frocksdb. > * ClipDB is used to clip the data of a DB. Different from db.DeteleRange and > db.Delete, DeleteValue and RangeTombstone will not be generated for parts > beyond the key range. We will iterate over the FileMetaData of db. Process > each sst file. There are three situations here. > If all the keys of a file are required, we will keep the sst file and do > nothing > If all the keys of the sst file exceed the specified range, we will delete > the file directly. > If we only need some part of the sst file, we will rewrite the required keys > to generate a new sst file。 > All sst file changes will be placed in a VersionEdit, and the current > versions will LogAndApply this edit to ensure that these changes can take > effect > * IngestDb is used to directly ingest all sst files of one DB into another > DB. But it is necessary to strictly ensure that the keys of the two DBs do > not overlap, which is easy to do in the Fl
[jira] [Comment Edited] (FLINK-31238) Use IngestDB to speed up Rocksdb rescaling recovery
[ https://issues.apache.org/jira/browse/FLINK-31238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17752055#comment-17752055 ] Yue Ma edited comment on FLINK-31238 at 8/8/23 1:43 PM: [~pnowojski] [~srichter] [~yunta] [~masteryhx] Sorry for the late update . I've submitted a draft pr ([https://github.com/apache/flink/pull/23169]). I haven't published the frocksdb jni in this PR to the public maven repository, so this code can only run on my local machine. I have passed all the statebackend unit tests . The frocksdb branch used in the PR is [https://github.com/mayuehappy/rocksdb/tree/Frocksdb-8.4.0-ingest-db]. (This is just a preview version, in order to facilitate everyone to see flink-related changes and further discussions) was (Author: mayuehappy): [~pnowojski] [~srichter] [~yunta] [~masteryhx] Sorry for the late update . I've submitted a draft pr ([https://github.com/apache/flink/pull/23169|https://github.com/apache/flink/pull/23169]). I haven't published the frocksdb jni in this PR to the public maven repository, so this code can only run on my local machine. Can pass all unit tests. The frocksdb branch used in the PR is [https://github.com/mayuehappy/rocksdb/tree/Frocksdb-8.4.0-ingest-db|https://github.com/mayuehappy/rocksdb/tree/Frocksdb-8.4.0-ingest-db]. (This is just a preview version, in order to facilitate everyone to see flink-related changes and further discussions) > Use IngestDB to speed up Rocksdb rescaling recovery > > > Key: FLINK-31238 > URL: https://issues.apache.org/jira/browse/FLINK-31238 > Project: Flink > Issue Type: Improvement > Components: Runtime / State Backends >Affects Versions: 1.16.1 >Reporter: Yue Ma >Assignee: Yue Ma >Priority: Major > Labels: pull-request-available > Attachments: image-2023-02-27-16-41-18-552.png, > image-2023-02-27-16-57-18-435.png, image-2023-03-07-14-27-10-260.png, > image-2023-03-09-15-23-30-581.png, image-2023-03-09-15-26-12-314.png, > image-2023-03-09-15-28-32-363.png, image-2023-03-09-15-41-03-074.png, > image-2023-03-09-15-41-08-379.png, image-2023-03-09-15-45-56-081.png, > image-2023-03-09-15-46-01-176.png, image-2023-03-09-15-50-04-281.png, > image-2023-03-29-15-25-21-868.png, image-2023-07-17-14-37-38-864.png, > image-2023-07-17-14-38-56-946.png, image-2023-07-22-14-16-31-856.png, > image-2023-07-22-14-19-01-390.png, image-2023-08-08-21-32-43-783.png, > image-2023-08-08-21-34-39-008.png, image-2023-08-08-21-39-39-135.png, > screenshot-1.png > > > (The detailed design is in this document > [https://docs.google.com/document/d/10MNVytTsyiDLZQSR89kDkVdmK_YjbM6jh0teerfDFfI|https://docs.google.com/document/d/10MNVytTsyiDLZQSR89kDkVdmK_YjbM6jh0teerfDFfI]) > There have been many discussions and optimizations in the community about > optimizing rocksdb scaling and recovery. > https://issues.apache.org/jira/browse/FLINK-17971 > https://issues.apache.org/jira/browse/FLINK-8845 > https://issues.apache.org/jira/browse/FLINK-21321 > We hope to discuss some of our explorations under this ticket > The process of scaling and recovering in rocksdb simply requires two steps > # Insert the valid keyGroup data of the new task. > # Delete the invalid data in the old stateHandle. > The current method for data writing is to specify the main Db first and then > insert data using writeBatch.In addition, the method of deleteRange is > currently used to speed up the ClipDB. 
> But in our production environment, we found that the speed of rescaling is still very slow, especially when the state of a single Task is large.
> We hope that the previous sst files can be reused directly when restoring state, instead of re-traversing the data, so we made some attempts to optimize this in our internal versions of Flink and FRocksDB.
> We added two APIs, *ClipDb* and *IngestDb*, to FRocksDB.
> * ClipDb is used to clip the data of a DB to a key range. Unlike db.DeleteRange and db.Delete, no DeleteValue or RangeTombstone is generated for the parts beyond the key range. Instead, we iterate over the FileMetaData of the DB and process each sst file. There are three situations:
> # If all the keys of the file are within the required range, we keep the sst file and do nothing.
> # If all the keys of the sst file are outside the specified range, we delete the file directly.
> # If only part of the sst file is needed, we rewrite the required keys into a new sst file.
> All sst file changes are recorded in a VersionEdit, and the current Version will LogAndApply this edit so that the changes take effect.
> * IngestDb is used to directly ingest all sst files of one DB into another DB. It must be strictly guaranteed that the key ranges of the two DBs do not overlap, which is easy to ensure in the Flink scenario. Hard links are used when ingesting the files, so ingestion is very fast. At the same time, the file numbers of the main DB are incremented sequentially, and the SequenceNumber of the main DB is updated to the larger SequenceNumber of the two DBs.
> When IngestDb and ClipDb are supported, the state restoration logic is as follows (see the sketch after this description):
> * Open the first StateHandle as the main DB and pause compaction.
> * Clip the main DB to the KeyGroup range of the Task with ClipDb.
> * Open the other StateHandles in sequence as temporary DBs and clip each of them to the KeyGroup range with ClipDb.
> * Ingest every temporary DB into the main DB once it has been clipped.
> * Resume compaction on the main DB.
> !screenshot-1.png|width=923,height=243!
> We have run some benchmarks on our internal Flink version. The results show that, compared with the WriteBatch method, IngestDb speeds up rescaling recovery by 5 to 10 times (SstFileWriter denotes the recovery method that generates sst files in parallel through SstFileWriter):
> * parallelism changes from 4 to 2
> |*TaskStateSize*|*Write_Batch*|*SST_File_Writer*|*Ingest_DB*|
> |500M|Iteration 1: 8.018 s/o
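Putting the five steps together, a rough C++ sketch of the restoration sequence might look as follows. ClipDb and IngestDb are hypothetical stand-ins for the new FRocksDB natives described in this ticket (stubbed out here so the sketch compiles; the published frocksdb API may differ), while PauseBackgroundWork/ContinueBackgroundWork are the stock RocksDB calls for suspending and resuming background compaction.

{code:cpp}
#include <cassert>
#include <string>
#include <vector>

#include "rocksdb/db.h"
#include "rocksdb/options.h"
#include "rocksdb/slice.h"
#include "rocksdb/status.h"

// Hypothetical stand-ins for the FRocksDB-internal ClipDb/IngestDb natives;
// names, signatures, and bodies are assumptions made for this sketch only.
rocksdb::Status ClipDb(rocksdb::DB* db, const rocksdb::Slice& begin,
                       const rocksdb::Slice& end) {
  (void)db; (void)begin; (void)end;  // placeholder body
  return rocksdb::Status::OK();
}
rocksdb::Status IngestDb(rocksdb::DB* main_db, rocksdb::DB* tmp_db) {
  (void)main_db; (void)tmp_db;  // placeholder body (hard-links sst files)
  return rocksdb::Status::OK();
}

// Restore one task's state from several state handles (local DB paths here),
// keeping only the task's key-group range [begin, end).
rocksdb::Status RestoreWithIngestDb(const std::vector<std::string>& handles,
                                    const rocksdb::Slice& begin,
                                    const rocksdb::Slice& end,
                                    rocksdb::DB** out) {
  assert(!handles.empty());
  rocksdb::Options options;

  // 1. Open the first state handle as the main DB and pause compaction.
  rocksdb::DB* main_db = nullptr;
  rocksdb::Status s = rocksdb::DB::Open(options, handles[0], &main_db);
  if (!s.ok()) return s;
  main_db->PauseBackgroundWork();

  // 2. Clip the main DB to the task's key-group range.
  s = ClipDb(main_db, begin, end);

  // 3+4. Open the remaining handles as temporary DBs, clip each one,
  //      then ingest it into the main DB.
  for (size_t i = 1; s.ok() && i < handles.size(); ++i) {
    rocksdb::DB* tmp_db = nullptr;
    s = rocksdb::DB::Open(options, handles[i], &tmp_db);
    if (!s.ok()) break;
    s = ClipDb(tmp_db, begin, end);
    if (s.ok()) {
      s = IngestDb(main_db, tmp_db);
    }
    delete tmp_db;
  }

  // 5. Resume compaction on the main DB.
  main_db->ContinueBackgroundWork();
  *out = main_db;
  return s;
}
{code}

On the Flink side, begin/end would come from the task's KeyGroupRange: key-group ids are serialized as fixed-length big-endian prefixes, so a key-group interval maps directly onto a lexicographic key range.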
[jira] [Commented] (FLINK-31238) Use IngestDB to speed up Rocksdb rescaling recovery
[ https://issues.apache.org/jira/browse/FLINK-31238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17752055#comment-17752055 ] Yue Ma commented on FLINK-31238:
[~pnowojski] [~srichter] [~yunta] [~masteryhx] Sorry for the late update. I've submitted a draft PR ([https://github.com/apache/flink/pull/23169]). I haven't published the frocksdb JNI from this PR to the public Maven repository, so this code can only run on my local machine. It passes all the unit tests. The frocksdb branch used in the PR is [https://github.com/mayuehappy/rocksdb/tree/Frocksdb-8.4.0-ingest-db]. (This is just a preview version, meant to make the Flink-related changes easy to review and to support further discussion.)
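As a point of comparison, the *SST_File_Writer* baseline from the benchmark table in the issue description above can be approximated with stock RocksDB APIs alone: stream a key range through an SstFileWriter and ingest the resulting file. A minimal single-threaded sketch follows (paths and options are illustrative; the real recovery runs one such job per key range in parallel):

{code:cpp}
#include <cstdint>
#include <memory>
#include <string>
#include <vector>

#include "rocksdb/db.h"
#include "rocksdb/options.h"
#include "rocksdb/sst_file_writer.h"

// Copy one key range [begin, end) out of a source DB into a fresh sst file,
// then ingest that file into the target DB.
rocksdb::Status CopyRangeViaSstFile(rocksdb::DB* source, rocksdb::DB* target,
                                    const rocksdb::Slice& begin,
                                    const rocksdb::Slice& end,
                                    const std::string& sst_path) {
  rocksdb::Options options;
  rocksdb::SstFileWriter writer(rocksdb::EnvOptions(), options);
  rocksdb::Status s = writer.Open(sst_path);
  if (!s.ok()) return s;

  // SstFileWriter requires keys in ascending order; a DB iterator
  // naturally yields them that way.
  std::unique_ptr<rocksdb::Iterator> it(
      source->NewIterator(rocksdb::ReadOptions()));
  uint64_t entries = 0;
  for (it->Seek(begin); it->Valid() && it->key().compare(end) < 0;
       it->Next()) {
    s = writer.Put(it->key(), it->value());
    if (!s.ok()) return s;
    ++entries;
  }
  if (entries == 0) {
    return rocksdb::Status::OK();  // Finish() fails on an empty file
  }
  s = writer.Finish();
  if (!s.ok()) return s;

  // Ingestion can move (or hard-link) the finished file into the target DB.
  rocksdb::IngestExternalFileOptions ingest_opts;
  ingest_opts.move_files = true;
  return target->IngestExternalFile({sst_path}, ingest_opts);
}
{code}

Compared with WriteBatch, this path bypasses the memtable and WAL entirely; IngestDb goes one step further by hard-linking the existing files instead of rewriting the data at all.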
[jira] [Updated] (FLINK-31238) Use IngestDB to speed up Rocksdb rescaling recovery
[ https://issues.apache.org/jira/browse/FLINK-31238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yue Ma updated FLINK-31238: --- Attachment: image-2023-08-08-21-39-39-135.png
[jira] [Updated] (FLINK-31238) Use IngestDB to speed up Rocksdb rescaling recovery
[ https://issues.apache.org/jira/browse/FLINK-31238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yue Ma updated FLINK-31238: --- Attachment: image-2023-08-08-21-34-39-008.png