Hi Nicholas,

Thanks a lot for identifying and fixing those issues. I just saw that RATIS-2164, RATIS-2151, and RATIS-2173 have already been fixed. Does that mean we can consider releasing a version from the Ratis master branch, e.g. 3.2.0?

Thanks,
Duong

On 2024/10/09 00:38:16 Tsz Wo Sze wrote:
> It turns out that there are quite a few bugs in zero-copy. For a list of
> bugs, see https://github.com/apache/ratis/pull/1156
>
> Note that the list is likely incomplete. After the fix, it can pass all
> tests (with a few retries).
>
> Since the fix is quite big and non-trivial, I am currently splitting it
> into several JIRAs. After merging them, I will continue debugging zero-copy.
>
> Tsz-Wo
>
>
> On Tue, Sep 10, 2024 at 5:07 PM Tsz Wo Sze <[email protected]> wrote:
>
> > Hi Wei-Chiu,
> >
> > Thanks for reporting this!
> >
> > The failure of TestRaftWithGrpc is related to RATIS-2129 and zero-copy;
> > see https://issues.apache.org/jira/browse/RATIS-2151 "TestRaftWithGrpc
> > may fail after RATIS-2129".
> >
> > - It does not fail before RATIS-2129 with zero-copy.
> > - It does not fail after RATIS-2129 without zero-copy.
> >
> > However,
> > - It fails frequently after RATIS-2129 with zero-copy.
> >
> > Tsz-Wo
> >
> >
> > On Tue, Sep 10, 2024 at 4:57 PM Wei-Chiu Chuang
> > <[email protected]> wrote:
> >
> >> Hi it looks like TestRaftWithGrpc is failing consistently.
> >>
> >> Looking at git history, https://github.com/apache/ratis/commits/master/
> >> the failure has been there since
> >> RATIS-2129 <https://issues.apache.org/jira/browse/RATIS-2129>. Low
> >> replication performance because LogAppender is often blocked by RaftLog's
> >> readLock. (#1141 <https://github.com/apache/ratis/pull/1141>)
> >> <https://github.com/apache/ratis/commit/781d61d37411b374f104eb0806e1e2c4090fb35e>
> >>
> >> Here is one example:
> >>
> >> Error:
> >> org.apache.ratis.grpc.TestRaftWithGrpc.testUpdateViaHeartbeat(Boolean)[2]
> >> Time elapsed: 6.801 s <<< ERROR!
> >> <https://github.com/apache/ratis/actions/runs/10786817671/job/29914349737#step:5:1002>
> >> java.lang.IllegalStateException: allLeaks.size = 15
> >>   at org.apache.ratis.util.Preconditions.assertTrue(Preconditions.java:77)
> >>   at org.apache.ratis.util.LeakDetector.assertNoLeaks(LeakDetector.java:107)
> >>   at org.apache.ratis.server.impl.MiniRaftCluster.shutdown(MiniRaftCluster.java:869)
> >>   at org.apache.ratis.grpc.MiniRaftClusterWithGrpc.shutdown(MiniRaftClusterWithGrpc.java:93)
> >>   at org.apache.ratis.server.impl.MiniRaftCluster$Factory$Get.runWithNewCluster(MiniRaftCluster.java:149)
> >>   at org.apache.ratis.server.impl.MiniRaftCluster$Factory$Get.runWithNewCluster(MiniRaftCluster.java:121)
> >>   at org.apache.ratis.grpc.TestRaftWithGrpc.testUpdateViaHeartbeat(TestRaftWithGrpc.java:76)
> >>
> >> Not sure if it's a production code issue or test issue.
> >>
> >
>
