Hi Nicholas,

Thanks a lot for identifying and fixing those issues.
I see that RATIS-2164, RATIS-2151, and RATIS-2173 have already been fixed.
Does that mean we can consider releasing a version from the Ratis master
branch, e.g. 3.2.0?

Thanks,
Duong

On 2024/10/09 00:38:16 Tsz Wo Sze wrote:
> It turns out that there are quite a few bugs in zero-copy.  For a list of
> bugs, see https://github.com/apache/ratis/pull/1156
> 
> Note that the list is likely incomplete.  After the fix, it can pass all
> tests (with a few retries).
> 
> Since the fix is quite big and non-trivial, I am currently splitting it
> into several JIRAs.  After merging them, I will continue debugging zero-copy.
> 
> Tsz-Wo
> 
> 
> 
> On Tue, Sep 10, 2024 at 5:07 PM Tsz Wo Sze <[email protected]> wrote:
> 
> > Hi Wei-Chiu,
> >
> > Thanks for reporting this!
> >
> > The failure of TestRaftWithGrpc is related to RATIS-2129 and zero-copy;
> > see https://issues.apache.org/jira/browse/RATIS-2151 "TestRaftWithGrpc
> > may fail after RATIS-2129".
> >
> > - It does not fail before RATIS-2129 with zero-copy.
> > - It does not fail after RATIS-2129 without zero-copy.
> >
> > However,
> > - It fails frequently after RATIS-2129 with zero-copy.
> >
> > Tsz-Wo
> >
> >
> > On Tue, Sep 10, 2024 at 4:57 PM Wei-Chiu Chuang
> > <[email protected]> wrote:
> >
> >> Hi, it looks like TestRaftWithGrpc is failing consistently.
> >>
> >> Looking at git history, https://github.com/apache/ratis/commits/master/
> >> the failure has been there since RATIS-2129
> >> <https://issues.apache.org/jira/browse/RATIS-2129> "Low replication
> >> performance because LogAppender is often blocked by RaftLog's readLock"
> >> (#1141 <https://github.com/apache/ratis/pull/1141>):
> >> https://github.com/apache/ratis/commit/781d61d37411b374f104eb0806e1e2c4090fb35e
> >>
> >>
> >> Here is one example:
> >>
> >> Error:
> >> org.apache.ratis.grpc.TestRaftWithGrpc.testUpdateViaHeartbeat(Boolean)[2]
> >> Time elapsed: 6.801 s <<< ERROR!
> >> java.lang.IllegalStateException: allLeaks.size = 15
> >>   at org.apache.ratis.util.Preconditions.assertTrue(Preconditions.java:77)
> >>   at org.apache.ratis.util.LeakDetector.assertNoLeaks(LeakDetector.java:107)
> >>   at org.apache.ratis.server.impl.MiniRaftCluster.shutdown(MiniRaftCluster.java:869)
> >>   at org.apache.ratis.grpc.MiniRaftClusterWithGrpc.shutdown(MiniRaftClusterWithGrpc.java:93)
> >>   at org.apache.ratis.server.impl.MiniRaftCluster$Factory$Get.runWithNewCluster(MiniRaftCluster.java:149)
> >>   at org.apache.ratis.server.impl.MiniRaftCluster$Factory$Get.runWithNewCluster(MiniRaftCluster.java:121)
> >>   at org.apache.ratis.grpc.TestRaftWithGrpc.testUpdateViaHeartbeat(TestRaftWithGrpc.java:76)
> >>
> >> Log:
> >> https://github.com/apache/ratis/actions/runs/10786817671/job/29914349737#step:5:1002
> >>
> >> Not sure if it's a production code issue or a test issue.
> >>
> >
> 
