[jira] [Commented] (FLINK-15318) RocksDBWriteBatchPerformanceTest.benchMark fails on ppc64le
[ https://issues.apache.org/jira/browse/FLINK-15318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17279285#comment-17279285 ]

Yun Tang commented on FLINK-15318:
----------------------------------

[~maguowei] These tests were dropped in FLINK-18373, so I will close this ticket as it is fixed as of 1.12.0.

> RocksDBWriteBatchPerformanceTest.benchMark fails on ppc64le
> -----------------------------------------------------------
>
> Key: FLINK-15318
> URL: https://issues.apache.org/jira/browse/FLINK-15318
> Project: Flink
> Issue Type: Bug
> Components: Benchmarks, Runtime / State Backends
> Environment: arch: ppc64le
> os: rhel7.6, ubuntu 18.04
> jdk: 8, 11
> mvn: 3.3.9, 3.6.2
> Reporter: Siddhesh Ghadi
> Priority: Major
> Attachments: surefire-report.txt
>
>
> RocksDBWriteBatchPerformanceTest.benchMark fails due to TestTimedOut; however,
> when the test timeout is increased from 2s to 5s in
> org/apache/flink/contrib/streaming/state/benchmark/RocksDBWriteBatchPerformanceTest.java:75,
> it passes. Is this an acceptable solution?
> Note: Tests are run inside a container.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Commented] (FLINK-15318) RocksDBWriteBatchPerformanceTest.benchMark fails on ppc64le
[ https://issues.apache.org/jira/browse/FLINK-15318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17278547#comment-17278547 ]

Guowei Ma commented on FLINK-15318:
-----------------------------------

Another case:
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=12891=logs=f0ac5c25-1168-55a5-07ff-0e88223afed9=39a61cac-5c62-532f-d2c1-dea450a66708
[jira] [Commented] (FLINK-15318) RocksDBWriteBatchPerformanceTest.benchMark fails on ppc64le
[ https://issues.apache.org/jira/browse/FLINK-15318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17139349#comment-17139349 ]

Stephan Ewen commented on FLINK-15318:
--------------------------------------

Fair enough, I agree with your conclusion, Yun Tang.

+1 to drop these benchmark unit tests.
[jira] [Commented] (FLINK-15318) RocksDBWriteBatchPerformanceTest.benchMark fails on ppc64le
[ https://issues.apache.org/jira/browse/FLINK-15318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17139243#comment-17139243 ]

Yun Tang commented on FLINK-15318:
----------------------------------

[~sewen] After digging into those tests, I think all of them can be dropped:

* {{RocksDBListStatePerformanceTest}} targets the performance of the "stringappendtest" merge operator, which is already covered by [ListStateBenchmark#add|https://github.com/apache/flink-benchmarks/blob/8b449865cf733dbb3c01e997fe44b1a5b6f82cdc/src/main/java/org/apache/flink/state/benchmark/ListStateBenchmark.java#L118].
* {{RocksDBWriteBatchPerformanceTest}} targets the performance of WriteBatch, which should be covered by [MapStateBenchmark#mapPutAll|https://github.com/apache/flink-benchmarks/blob/8b449865cf733dbb3c01e997fe44b1a5b6f82cdc/src/main/java/org/apache/flink/state/benchmark/MapStateBenchmark.java#L160].
* {{RocksDBPerformanceTest}} targets the performance of merge and of iterator seek and next, which are already covered by [ListStateBenchmark#add|https://github.com/apache/flink-benchmarks/blob/8b449865cf733dbb3c01e997fe44b1a5b6f82cdc/src/main/java/org/apache/flink/state/benchmark/ListStateBenchmark.java#L118] and [MapStateBenchmark#mapIterator|https://github.com/apache/flink-benchmarks/blob/8b449865cf733dbb3c01e997fe44b1a5b6f82cdc/src/main/java/org/apache/flink/state/benchmark/MapStateBenchmark.java#L143].

Most importantly, a unit test cannot expose performance issues clearly. If the execution time grows from 2 seconds to 2.5 seconds, that is a performance regression of about 25%, yet a timeout limit of 3 seconds cannot detect such a regression.
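
To make the arithmetic in the last paragraph concrete, here is a minimal sketch in plain Java. It is not taken from the Flink code base; the class and method names are hypothetical, and the numbers are the ones quoted in the comment (a 2 s baseline, a 2.5 s run, a 3 s timeout).

```java
// Hypothetical illustration only: none of these names exist in Flink.
class TimeoutVsBaseline {

    // Models a pass/fail timeout check: only the limit matters,
    // not how close the run came to it.
    static boolean passesTimeout(long elapsedMillis, long timeoutMillis) {
        return elapsedMillis <= timeoutMillis;
    }

    // Relative slowdown against a recorded baseline, as a fraction (0.25 means 25%).
    static double regression(long baselineMillis, long elapsedMillis) {
        return (elapsedMillis - baselineMillis) / (double) baselineMillis;
    }

    public static void main(String[] args) {
        long baselineMillis = 2000; // earlier run
        long elapsedMillis = 2500;  // slower run
        long timeoutMillis = 3000;  // test timeout

        // The slower run still passes the timeout check...
        System.out.println("passes timeout: " + passesTimeout(elapsedMillis, timeoutMillis));
        // ...although it is a quarter slower than the baseline.
        System.out.println("regression: " + regression(baselineMillis, elapsedMillis) * 100 + "%");
    }
}
```

A timeout only answers "was it slower than the limit", while a benchmark suite that records results over time can answer "how much slower than before", which is the distinction the comment draws.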
[jira] [Commented] (FLINK-15318) RocksDBWriteBatchPerformanceTest.benchMark fails on ppc64le
[ https://issues.apache.org/jira/browse/FLINK-15318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17136697#comment-17136697 ]

Stephan Ewen commented on FLINK-15318:
--------------------------------------

The purpose of this test is to guard against the "quadratic concatenation complexity bug" that RocksDB had a few versions ago. In that case, the benchmark took 50s or so.

We can probably increase the timeout to 5s without a problem.

How about this?
- we add it to the benchmarks suite to monitor regressions more precisely
- we keep it in the codebase with a timeout of 5 seconds or so, as a rough guard
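
The "rough guard" idea can be sketched in plain Java as follows. This is only an illustration under stated assumptions: the class name, the helper method, and the stand-in workload are hypothetical and do not come from the Flink test, which uses a JUnit timeout instead. The point is that a generous budget catches a blow-up from seconds to tens of seconds, but says nothing about small slowdowns.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical illustration only: none of these names exist in Flink.
class RoughGuard {

    // Returns true if the workload finishes within the budget,
    // false if it is still running when the budget expires.
    static boolean finishesWithin(Runnable workload, long budgetMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<?> result = executor.submit(workload);
            result.get(budgetMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            return false;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("workload failed", e);
        } finally {
            // Interrupts the worker thread if it is still running.
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Stand-in workload (an assumption); the real test writes to RocksDB.
        Runnable workload = () -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 100_000; i++) {
                sb.append("value").append(',');
            }
        };
        System.out.println(finishesWithin(workload, 5_000) ? "within budget" : "too slow");
    }
}
```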
[jira] [Commented] (FLINK-15318) RocksDBWriteBatchPerformanceTest.benchMark fails on ppc64le
[ https://issues.apache.org/jira/browse/FLINK-15318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17128915#comment-17128915 ]

Robert Metzger commented on FLINK-15318:
----------------------------------------

I don't really have an opinion here, because I'm not very familiar with the RocksDB code. Let's wait for Stephan's response.

Another case in {{[ERROR] RocksDBPerformanceTest.testRocksDbRangeGetPerformance:146 » TestTimedOut test ...}}:
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=2973=logs=3b6ec2fd-a816-5e75-c775-06fb87cb6670=2aff8966-346f-518f-e6ce-de64002a5034
[jira] [Commented] (FLINK-15318) RocksDBWriteBatchPerformanceTest.benchMark fails on ppc64le
[ https://issues.apache.org/jira/browse/FLINK-15318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17127940#comment-17127940 ]

Yun Tang commented on FLINK-15318:
----------------------------------

Since we already have the repo [https://github.com/apache/flink-benchmarks], I question the value of those {{RocksDB*PerformanceTest}}s: unit test performance is easily affected by the state of the host the tests run on.

I prefer to remove them all ({{RocksDBListStatePerformanceTest}}, {{RocksDBWriteBatchPerformanceTest}} and {{RocksDBPerformanceTest}}), and we could also add cases to flink-benchmarks if we think some area is covered only by those tests.

What do you think of this, [~sewen], [~rmetzger]?
[jira] [Commented] (FLINK-15318) RocksDBWriteBatchPerformanceTest.benchMark fails on ppc64le
[ https://issues.apache.org/jira/browse/FLINK-15318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17124616#comment-17124616 ]

Robert Metzger commented on FLINK-15318:
----------------------------------------

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=2587=logs=f0ac5c25-1168-55a5-07ff-0e88223afed9=39a61cac-5c62-532f-d2c1-dea450a66708
[jira] [Commented] (FLINK-15318) RocksDBWriteBatchPerformanceTest.benchMark fails on ppc64le
[ https://issues.apache.org/jira/browse/FLINK-15318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17099879#comment-17099879 ]

Robert Metzger commented on FLINK-15318:
----------------------------------------

Another instance:
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=610=logs=0da23115-68bb-5dcd-192c-bd4c8adebde1=4ed44b66-cdd6-5dcf-5f6a-88b07dda665d
[jira] [Commented] (FLINK-15318) RocksDBWriteBatchPerformanceTest.benchMark fails on ppc64le
[ https://issues.apache.org/jira/browse/FLINK-15318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17014379#comment-17014379 ]

Ronald O. Edmark commented on FLINK-15318:
------------------------------------------

[~yunta] thanks again for helping. This is a Red Hat 7.6 KVM-hosted virtual machine, ppc64le, with 16 GB of memory, 8 GB of swap and 4 CPUs.

Currently our only solution is to change the timeout of the write-batch performance test from 2 seconds to 3 seconds in

```
./flink-state-backends/flink-statebackend-rocksdb/src/test/java/org/apache/flink/contrib/streaming/state/benchmark/RocksDBWriteBatchPerformanceTest.java

@Test(timeout = 2000)
change to
@Test(timeout = 3000)
```
[jira] [Commented] (FLINK-15318) RocksDBWriteBatchPerformanceTest.benchMark fails on ppc64le
[ https://issues.apache.org/jira/browse/FLINK-15318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17014361#comment-17014361 ]

Yun Tang commented on FLINK-15318:
----------------------------------

[~redmark-ibm] thanks for your feedback. What hardware are you running on? It seems no one has ever compared the performance of RocksDB on amd64 vs. ppc64le on equivalent hardware, so we could open an issue in the RocksDB community.

As for Flink, we could increase the timeout once a RocksDB performance issue on ppc64le is confirmed.
[jira] [Commented] (FLINK-15318) RocksDBWriteBatchPerformanceTest.benchMark fails on ppc64le
[ https://issues.apache.org/jira/browse/FLINK-15318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17014313#comment-17014313 ]

Ronald O. Edmark commented on FLINK-15318:
------------------------------------------

Yun, thank you for the commit changes.

I've modified the `pom` file, applied the commit changes and ran a clean test build with RocksDB, but we still see the failure.

Changing the timeout from 2 to 3 seconds does work around the problem. Is 3 seconds an acceptable timeout?

-Ron

```
[redmark@p006vm23 flink]$ mvn clean test -rf :flink-statebackend-rocksdb_2.11
[INFO] Scanning for projects...
..
[INFO]
[INFO] Building flink-statebackend-rocksdb 1.8.3
[INFO]
[INFO]
[INFO] --- maven-clean-plugin:3.1.0:clean (default-clean) @ flink-statebackend-rocksdb_2.11 ---
[INFO] Deleting /root/flink/flink-state-backends/flink-statebackend-rocksdb/target
..
[ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 2.473 s <<< FAILURE! - in org.apache.flink.contrib.streaming.state.benchmark.RocksDBWriteBatchPerformanceTest
[ERROR] benchMark(org.apache.flink.contrib.streaming.state.benchmark.RocksDBWriteBatchPerformanceTest) Time elapsed: 2.073 s <<< ERROR!
org.junit.runners.model.TestTimedOutException: test timed out after 2000 milliseconds
    at org.apache.flink.contrib.streaming.state.benchmark.RocksDBWriteBatchPerformanceTest.benchMarkHelper(RocksDBWriteBatchPerformanceTest.java:118)
    at org.apache.flink.contrib.streaming.state.benchmark.RocksDBWriteBatchPerformanceTest.benchMark(RocksDBWriteBatchPerformanceTest.java:96)
```
[jira] [Commented] (FLINK-15318) RocksDBWriteBatchPerformanceTest.benchMark fails on ppc64le
[ https://issues.apache.org/jira/browse/FLINK-15318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17013995#comment-17013995 ]

Yun Tang commented on FLINK-15318:
----------------------------------

[~redmark-ibm] You can try this [commit|https://github.com/Myasuka/flink/commit/484fffb08620ab177175405c53d64faaeb585d01].
[jira] [Commented] (FLINK-15318) RocksDBWriteBatchPerformanceTest.benchMark fails on ppc64le
[ https://issues.apache.org/jira/browse/FLINK-15318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17012283#comment-17012283 ]

Ronald O. Edmark commented on FLINK-15318:
------------------------------------------

Yun,

I would like to point out that the test only just fails; changing the timeout from 2 seconds to 3 seconds works. Most of the time the failures are just over 2 seconds, e.g. *Time elapsed: 2.095*.

I made these changes to the pom.

Current Flink 1.8.3 has:
com.data-artisans *frocksdbjni* *5.17.2-artisans-1.0*

Changed to:
org.rocksdb *rocksdbjni* *5.17.2*

I worked on removing *org.rocksdb.FlinkCompactionFilter* and *org.rocksdb.FlinkCompactionFilter.FlinkCompactionFilterFactory*, but I was hitting issues getting them cleanly removed. If you can provide a modified version of *RocksDbTtlCompactFiltersManager.java*, that will help. Otherwise I'll work on it tomorrow when I have more time.

Thanks for helping,
Ron
[jira] [Commented] (FLINK-15318) RocksDBWriteBatchPerformanceTest.benchMark fails on ppc64le
[ https://issues.apache.org/jira/browse/FLINK-15318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010885#comment-17010885 ]

Yun Tang commented on FLINK-15318:
----------------------------------

[~redmark-ibm] can you retry with the official RocksDB to see whether the problem still occurs?
* Edit {{flink-state-backends/flink-statebackend-rocksdb/pom.xml}} to use RocksDB instead of FRocksDB:
{code:xml}
<dependency>
    <groupId>org.rocksdb</groupId>
    <artifactId>rocksdbjni</artifactId>
    <version>5.17.2</version>
</dependency>
{code}
* Edit {{flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/ttl/RocksDbTtlCompactFiltersManager.java}} to drop all usages of {{org.rocksdb.FlinkCompactionFilter}} and {{org.rocksdb.FlinkCompactionFilter.FlinkCompactionFilterFactory}}. Removing them will not prevent you from running that test.

By doing so, you can verify whether the problem still exists with the official RocksDB.
[jira] [Commented] (FLINK-15318) RocksDBWriteBatchPerformanceTest.benchMark fails on ppc64le
[ https://issues.apache.org/jira/browse/FLINK-15318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17009686#comment-17009686 ]

Ronald O. Edmark commented on FLINK-15318:
------------------------------------------

I'm hitting the same problem in Flink 1.8.3. Did anyone find a fix for this issue? I have a ppc64le environment to help debug the issue.

Red Hat 7.6 Linux ppc64le
Java 1.8.0.232
Maven 3.2.5
[jira] [Commented] (FLINK-15318) RocksDBWriteBatchPerformanceTest.benchMark fails on ppc64le
[ https://issues.apache.org/jira/browse/FLINK-15318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17002716#comment-17002716 ]

Yun Tang commented on FLINK-15318:
----------------------------------

[~siddheshghadi] I noticed that you also hit this on release-1.8, which uses an older version of FRocksDB. In my experience, FRocksDB performs worse on ppc64le than on other platforms, and I have not actually met anyone using Flink in production in a ppc64le environment. In a nutshell, the timeout is too short for FRocksDB on ppc64le. Do you use Flink in production on ppc64le? I am afraid the Flink community lacks experience with ppc64le, especially regarding FRocksDB performance.

By the way, can you try running the tests with RocksDB instead of FRocksDB? (Remember to remove all usages of {{org.rocksdb.FlinkCompactionFilter}} so that you can build against the official RocksDB.)
[jira] [Commented] (FLINK-15318) RocksDBWriteBatchPerformanceTest.benchMark fails on ppc64le
[ https://issues.apache.org/jira/browse/FLINK-15318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17000605#comment-17000605 ]

Siddhesh Ghadi commented on FLINK-15318:
----------------------------------------

I verified it on the master, release-1.10, release-1.9 and release-1.8 branches; RocksDBWriteBatchPerformanceTest.benchMark fails on all of them. I also came across this error for the first time when I tried it on ppc64le.
[jira] [Commented] (FLINK-15318) RocksDBWriteBatchPerformanceTest.benchMark fails on ppc64le
[ https://issues.apache.org/jira/browse/FLINK-15318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16999114#comment-16999114 ]

Yun Tang commented on FLINK-15318:
----------------------------------

Which version of Flink did you verify? Did you observe stable performance before, with the test suddenly failing after syncing new commits, or did you come across this error for the first time when trying it on the ppc64le platform?

Actually, the Flink community lacks benchmarks on ppc64le, and I have noticed that RocksDB does not perform as well on ppc64le as on linux64.