[jira] [Commented] (IGNITE-15568) Striped Disruptor doesn't work with JRaft event handlers properly
[ https://issues.apache.org/jira/browse/IGNITE-15568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17849779#comment-17849779 ]

Vladislav Pyatkov commented on IGNITE-15568:

Merged ced0ebba0969ad1b75ee94ca5a252aef15d97955

> Striped Disruptor doesn't work with JRaft event handlers properly
> -----------------------------------------------------------------
>
> Key: IGNITE-15568
> URL: https://issues.apache.org/jira/browse/IGNITE-15568
> Project: Ignite
> Issue Type: Bug
> Reporter: Alexey Scherbakov
> Assignee: Vladislav Pyatkov
> Priority: Major
> Labels: ignite-3, performance
> Fix For: 3.0.0-beta2
> Attachments: InsertBenchmark.java, MyInsertBenchmarkWithMetrics.java
> Time Spent: 3h 20m
> Remaining Estimate: 0h
>
> The following scenario is broken:
> # Two raft groups are started and mapped to the same stripe.
> # Two LogEntryAndClosure events are added in quick succession so that they form
> a disruptor batch: the first for group 1, the second for group 2.
>
> The first event is delivered to group 1 with endOfBatch=false, so it is cached in
> org.apache.ignite.raft.jraft.core.NodeImpl.LogEntryAndClosureHandler#tasks
> and is not processed.
> The second event is delivered to group 2 with endOfBatch=true and is processed, but
> the first event remains in the queue unprocessed forever, because each raft group
> has its own LogEntryAndClosureHandler instance.
>
> A possible workaround is to set
> org.apache.ignite.raft.jraft.option.RaftOptions#applyBatch=1
>
> Reproducible by
> org.apache.ignite.internal.table.TxDistributedTest_1_1_1#testCrossTable +
> applyBatch=32 in the ignite-15085 branch.
>
> *Implementation notes*
> My proposal is limited to the Disruptor. The striped disruptor implementation has a
> common interceptor that passes each event to a group-specific interceptor. Only the
> last event in the batch carries the batch-completion flag. For the other RAFT groups
> that were notified in the same striped disruptor batch, an additional event must be
> created to complete the batch for each specific group. The new event will be created
> in the common striped disruptor interceptor and sent to the group-specific
> interceptor with the batch-completion flag set.
> The rule for handling the new event differs between interceptors:
> {code:java|title=ApplyTaskHandler (FSMCallerImpl#runApplyTask)}
> if (maxCommittedIndex >= 0) {
>     doCommitted(maxCommittedIndex);
>     return -1;
> }
> {code}
> {code:java|title=LogEntryAndClosureHandler(LogEntryAndClosureHandler#onEvent)}
> if (this.tasks.size() > 0) {
>     executeApplyingTasks(this.tasks);
>     this.tasks.clear();
> }
> {code}
> {code:java|title=ReadIndexEventHandler(ReadIndexEventHandler#onEvent)}
> if (this.events.size() > 0) {
>     executeReadIndexEvents(this.events);
>     this.events.clear();
> }
> {code}
> {code:java|title=StableClosureEventHandler(StableClosureEventHandler#onEvent)}
> if (this.ab.size > 0) {
>     this.lastId = this.ab.flush();
>     setDiskId(this.lastId);
> }
> {code}
> Also, as part of this issue, the benchmarks need to be rerun. They are expected
> to show an improvement under high parallelism within one partition.
> There is [an example of the
> benchmark|https://github.com/gridgain/apache-ignite-3/tree/4b9de922caa4aef97a5e8e159d5db76a3fc7a3ad/modules/runner/src/test/java/org/apache/ignite/internal/benchmark].

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
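The broken scenario and the proposed fix from the issue description can be sketched roughly as follows. This is a simplified model, not the Ignite or JRaft code: `StripedBatchSketch`, `GroupHandler`, `Event`, `dispatchNaive` and `dispatchWithCompletion` are hypothetical names, and a `null` task is an assumed stand-in for the extra event that carries only the batch-completion flag.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical, simplified model of one disruptor stripe shared by several Raft groups. */
final class StripedBatchSketch {
    /** Stand-in for a per-group handler such as LogEntryAndClosureHandler. */
    static final class GroupHandler {
        final List<String> pending = new ArrayList<>();
        final List<String> applied = new ArrayList<>();

        /** Caches the task (null means "no payload") and applies the cache on endOfBatch. */
        void onEvent(String task, boolean endOfBatch) {
            if (task != null) {
                pending.add(task);
            }
            if (endOfBatch && !pending.isEmpty()) {
                applied.addAll(pending);
                pending.clear();
            }
        }
    }

    /** A task addressed to the handler of one group. */
    static final class Event {
        final GroupHandler target;
        final String task;

        Event(GroupHandler target, String task) {
            this.target = target;
            this.task = task;
        }
    }

    /** Broken dispatch: only the last event of the stripe batch sees endOfBatch=true. */
    static void dispatchNaive(List<Event> batch) {
        for (int i = 0; i < batch.size(); i++) {
            Event e = batch.get(i);
            e.target.onEvent(e.task, i == batch.size() - 1);
        }
    }

    /** Proposed dispatch: other groups touched by the batch get an extra completion event. */
    static void dispatchWithCompletion(List<Event> batch) {
        Set<GroupHandler> touched = new LinkedHashSet<>();
        GroupHandler last = null;
        for (int i = 0; i < batch.size(); i++) {
            Event e = batch.get(i);
            boolean endOfBatch = i == batch.size() - 1;
            e.target.onEvent(e.task, endOfBatch);
            touched.add(e.target);
            if (endOfBatch) {
                last = e.target;
            }
        }
        touched.remove(last); // this group already saw endOfBatch=true
        for (GroupHandler handler : touched) {
            handler.onEvent(null, true); // synthetic event carrying only the completion flag
        }
    }

    public static void main(String[] args) {
        GroupHandler g1 = new GroupHandler();
        GroupHandler g2 = new GroupHandler();
        dispatchNaive(List.of(new Event(g1, "a"), new Event(g2, "b")));
        System.out.println("naive: group 1 pending=" + g1.pending + ", applied=" + g1.applied);
    }
}
```

With the naive dispatch, the task for group 1 stays in `pending`, which mirrors the stuck event in the description. With the completion event, every group that appeared in the batch applies its cached tasks.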
[jira] [Commented] (IGNITE-15568) Striped Disruptor doesn't work with JRaft event handlers properly
[ https://issues.apache.org/jira/browse/IGNITE-15568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17849768#comment-17849768 ]

Vladislav Pyatkov commented on IGNITE-15568:

As part of this ticket, IGNITE-20536 is also fixed.

> Striped Disruptor doesn't work with JRaft event handlers properly
> -----------------------------------------------------------------
>
> Key: IGNITE-15568
> URL: https://issues.apache.org/jira/browse/IGNITE-15568
> Project: Ignite
> Issue Type: Bug
> Reporter: Alexey Scherbakov
> Assignee: Vladislav Pyatkov
> Priority: Major
> Labels: ignite-3, performance
> Fix For: 3.0.0-beta2
> Attachments: InsertBenchmark.java, MyInsertBenchmarkWithMetrics.java
> Time Spent: 3h 20m
> Remaining Estimate: 0h
[jira] [Commented] (IGNITE-15568) Striped Disruptor doesn't work with JRaft event handlers properly
[ https://issues.apache.org/jira/browse/IGNITE-15568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17847377#comment-17847377 ]

Vladislav Pyatkov commented on IGNITE-15568:

{code}
New
Raft metrics:
raft.logmanager.disruptor.Batch: [ 0_10:2487232, 10_20:5693, 20_30:2096, 30_40:574, 40_50:295, 50_inf:4]

Benchmark                              (clusterSize)  (fsync)  (partitionCount)  Mode  Cnt     Score     Error  Units
MyInsertBenchmarkWithMetrics.kvInsert              1    false                 2  avgt  200  6723,882 ± 639,991  us/op
MyInsertBenchmarkWithMetrics.kvInsert              1     true                 2  avgt  200  7722,169 ± 504,716  us/op

Old
raft.logmanager.disruptor.Batch: [ 0_10:2788769, 10_20:8218, 20_30:4532, 30_40:1579, 40_50:782, 50_inf:61]
raft.nodeimpl.disruptor.Batch: [ 0_10:3274036, 10_20:2066, 20_30:446, 30_40:128, 40_50:35, 50_inf:8]
raft.readonlyservice.disruptor.Batch: [ 0_10:2, 10_20:0, 20_30:0, 30_40:0, 40_50:0, 50_inf:0]
raft.fsmcaller.disruptor.Batch: [ 0_10:9135, 10_20:6197, 20_30:4795, 30_40:6800, 40_50:80328, 50_inf:73]

Benchmark                              (clusterSize)  (fsync)  (partitionCount)  Mode  Cnt     Score     Error  Units
MyInsertBenchmarkWithMetrics.kvInsert              1    false                 2  avgt  200  7611,808 ± 695,469  us/op
MyInsertBenchmarkWithMetrics.kvInsert              1     true                 2  avgt  200  7681,789 ± 433,490  us/op
{code}

> Striped Disruptor doesn't work with JRaft event handlers properly
> -----------------------------------------------------------------
>
> Key: IGNITE-15568
> URL: https://issues.apache.org/jira/browse/IGNITE-15568
> Project: Ignite
> Issue Type: Bug
> Reporter: Alexey Scherbakov
> Assignee: Vladislav Pyatkov
> Priority: Major
> Labels: ignite-3, performance
> Fix For: 3.0.0-beta2
> Attachments: InsertBenchmark.java, MyInsertBenchmarkWithMetrics.java
> Time Spent: 1h 20m
> Remaining Estimate: 0h
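The `raft.*.disruptor.Batch` lines in the results above read like bucketed histograms of batch sizes. A minimal sketch of how such counters could be collected is given below. It is not the Ignite metrics API: the class name is hypothetical, and the bucket boundaries (lower bound inclusive, upper bound exclusive) are an assumption inferred only from the labels `0_10` ... `50_inf`.

```java
import java.util.concurrent.atomic.AtomicLongArray;

/** Hypothetical bucketed histogram of disruptor batch sizes (not the Ignite metrics API). */
final class BatchSizeHistogramSketch {
    /** Assumed exclusive upper bounds of buckets 0_10 ... 40_50; larger sizes go to 50_inf. */
    private static final int[] BOUNDS = {10, 20, 30, 40, 50};

    private final AtomicLongArray counts = new AtomicLongArray(BOUNDS.length + 1);

    /** Records one observed batch size into its bucket. */
    void record(int batchSize) {
        int bucket = 0;
        while (bucket < BOUNDS.length && batchSize >= BOUNDS[bucket]) {
            bucket++;
        }
        counts.incrementAndGet(bucket);
    }

    /** Renders the counters in the same shape as the log lines in the comment. */
    String render() {
        StringBuilder sb = new StringBuilder("[ ");
        int lower = 0;
        for (int i = 0; i < counts.length(); i++) {
            String upper = i < BOUNDS.length ? Integer.toString(BOUNDS[i]) : "inf";
            if (i > 0) {
                sb.append(", ");
            }
            sb.append(lower).append('_').append(upper).append(':').append(counts.get(i));
            if (i < BOUNDS.length) {
                lower = BOUNDS[i];
            }
        }
        return sb.append(']').toString();
    }

    public static void main(String[] args) {
        BatchSizeHistogramSketch histogram = new BatchSizeHistogramSketch();
        for (int size : new int[] {1, 1, 12, 64}) {
            histogram.record(size);
        }
        System.out.println(histogram.render());
    }
}
```

Read this way, a distribution that shifts toward the higher buckets means more events were handled per batch.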
[jira] [Commented] (IGNITE-15568) Striped Disruptor doesn't work with JRaft event handlers properly
[ https://issues.apache.org/jira/browse/IGNITE-15568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17847319#comment-17847319 ]

Vladislav Pyatkov commented on IGNITE-15568:

It is exactly as it was. I did not edit the old code.

> Striped Disruptor doesn't work with JRaft event handlers properly
> -----------------------------------------------------------------
>
> Key: IGNITE-15568
> URL: https://issues.apache.org/jira/browse/IGNITE-15568
> Project: Ignite
> Issue Type: Bug
> Reporter: Alexey Scherbakov
> Assignee: Vladislav Pyatkov
> Priority: Major
> Labels: ignite-3, performance
> Fix For: 3.0.0-beta2
> Attachments: InsertBenchmark.java, MyInsertBenchmarkWithMetrics.java
> Time Spent: 1h 20m
> Remaining Estimate: 0h
[jira] [Commented] (IGNITE-15568) Striped Disruptor doesn't work with JRaft event handlers properly
[ https://issues.apache.org/jira/browse/IGNITE-15568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17847276#comment-17847276 ]

Alexey Scherbakov commented on IGNITE-15568:

This result doesn't make sense to me either. The "old disruptor" (without the optimization) should only have batches of size 1.

> Striped Disruptor doesn't work with JRaft event handlers properly
> -----------------------------------------------------------------
>
> Key: IGNITE-15568
> URL: https://issues.apache.org/jira/browse/IGNITE-15568
> Project: Ignite
> Issue Type: Bug
> Reporter: Alexey Scherbakov
> Assignee: Vladislav Pyatkov
> Priority: Major
> Labels: ignite-3, performance
> Fix For: 3.0.0-beta2
> Attachments: InsertBenchmark.java, MyInsertBenchmarkWithMetrics.java
> Time Spent: 1h 20m
> Remaining Estimate: 0h
[jira] [Commented] (IGNITE-15568) Striped Disruptor doesn't work with JRaft event handlers properly
[ https://issues.apache.org/jira/browse/IGNITE-15568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17847263#comment-17847263 ]

Vladislav Pyatkov commented on IGNITE-15568:

I think the previous one also makes sense, because it shows that we do not degrade performance.
I have attached a new one: [^MyInsertBenchmarkWithMetrics.java]
Here is the result from my laptop:
{code}
Old
Benchmark                 (clusterSize)  (fsync)  (partitionCount)  Mode  Cnt     Score      Error  Units
InsertBenchmark.kvInsert              1    false                 2  avgt  200  6821,523 ± 1190,279  us/op
InsertBenchmark.kvInsert              1     true                 2  avgt  200  8172,433 ±  294,077  us/op

raft.fsmcaller.disruptor.Batch:[ 0_10:29890, 10_20:71825, 20_30:37548, 30_40:9062, 40_50:1428, 50_inf:2]
raft.logmanager.disruptor.Batch:[ 0_10:32324, 10_20:48196, 20_30:53466, 30_40:8831, 40_50:1081, 50_inf:26]
raft.nodeimpl.disruptor.Batch:[ 0_10:1804447, 10_20:1205, 20_30:122, 30_40:27, 40_50:14, 50_inf:0]
raft.readonlyservice.disruptor.Batch:[ 0_10:6, 10_20:0, 20_30:0, 30_40:0, 40_50:0, 50_inf:0]

New
Benchmark                 (clusterSize)  (fsync)  (partitionCount)  Mode  Cnt     Score      Error  Units
InsertBenchmark.kvInsert              1    false                 2  avgt  200  7357,067 ±  640,983  us/op
InsertBenchmark.kvInsert              1     true                 2  avgt  200  8015,733 ±  469,096  us/op

raft:
raft.fsmcaller.disruptor.Batch:[ 0_10:177419, 10_20:78244, 20_30:2549, 30_40:8, 40_50:0, 50_inf:0]
raft.logmanager.disruptor.Batch:[ 0_10:71704, 10_20:76075, 20_30:26090, 30_40:1540, 40_50:44, 50_inf:0]
raft.nodeimpl.disruptor.Batch:[ 0_10:2283394, 10_20:516, 20_30:52, 30_40:4, 40_50:0, 50_inf:0]
raft.readonlyservice.disruptor.Batch:[ 0_10:6, 10_20:0, 20_30:0, 30_40:0, 40_50:0, 50_inf:0]
{code}

> Striped Disruptor doesn't work with JRaft event handlers properly
> -----------------------------------------------------------------
>
> Key: IGNITE-15568
> URL: https://issues.apache.org/jira/browse/IGNITE-15568
> Project: Ignite
> Issue Type: Bug
> Reporter: Alexey Scherbakov
> Assignee: Vladislav Pyatkov
> Priority: Major
> Labels: ignite-3, performance
> Fix For: 3.0.0-beta2
> Attachments: InsertBenchmark.java, MyInsertBenchmarkWithMetrics.java
> Time Spent: 1h 20m
> Remaining Estimate: 0h
[jira] [Commented] (IGNITE-15568) Striped Disruptor doesn't work with JRaft event handlers properly
[ https://issues.apache.org/jira/browse/IGNITE-15568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17847257#comment-17847257 ]

Alexey Scherbakov commented on IGNITE-15568:

The proposed test scenario doesn't make sense to me. We need at least 2 partitions and one stripe to see the benefit of the patch.

> Striped Disruptor doesn't work with JRaft event handlers properly
> -----------------------------------------------------------------
>
> Key: IGNITE-15568
> URL: https://issues.apache.org/jira/browse/IGNITE-15568
> Project: Ignite
> Issue Type: Bug
> Reporter: Alexey Scherbakov
> Assignee: Vladislav Pyatkov
> Priority: Major
> Labels: ignite-3, performance
> Fix For: 3.0.0-beta2
> Attachments: InsertBenchmark.java
> Time Spent: 1h 20m
> Remaining Estimate: 0h
[jira] [Commented] (IGNITE-15568) Striped Disruptor doesn't work with JRaft event handlers properly
[ https://issues.apache.org/jira/browse/IGNITE-15568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17847113#comment-17847113 ]

Vladislav Pyatkov commented on IGNITE-15568:

After two passes (the benchmark is attached), it showed a weak positive impact.
{code:java}
New disruptor
Benchmark                 (clusterSize)  (fsync)  (partitionCount)  Mode  Cnt     Score      Error  Units
InsertBenchmark.kvInsert              1    false                 1  avgt  200  6891,786 ±  480,532  us/op
InsertBenchmark.kvInsert              1     true                 1  avgt  200  7615,249 ±  462,971  us/op

Benchmark                 (clusterSize)  (fsync)  (partitionCount)  Mode  Cnt     Score      Error  Units
InsertBenchmark.kvInsert              1    false                 1  avgt  200  6676,231 ±  435,272  us/op
InsertBenchmark.kvInsert              1     true                 1  avgt  200  7656,038 ±  460,172  us/op

Old disruptor
Benchmark                 (clusterSize)  (fsync)  (partitionCount)  Mode  Cnt     Score      Error  Units
InsertBenchmark.kvInsert              1    false                 1  avgt  200  7398,135 ±  895,617  us/op
InsertBenchmark.kvInsert              1     true                 1  avgt  200  7965,185 ±  443,870  us/op

Benchmark                 (clusterSize)  (fsync)  (partitionCount)  Mode  Cnt     Score      Error  Units
InsertBenchmark.kvInsert              1    false                 1  avgt  200  6618,169 ± 1093,236  us/op
InsertBenchmark.kvInsert              1     true                 1  avgt  200  8136,877 ±  292,777  us/op
{code}

> Striped Disruptor doesn't work with JRaft event handlers properly
> -----------------------------------------------------------------
>
> Key: IGNITE-15568
> URL: https://issues.apache.org/jira/browse/IGNITE-15568
> Project: Ignite
> Issue Type: Bug
> Reporter: Alexey Scherbakov
> Assignee: Vladislav Pyatkov
> Priority: Major
> Labels: ignite-3, performance
> Fix For: 3.0.0-beta2
> Attachments: StripedDisruptor.java
> Time Spent: 1h 20m
> Remaining Estimate: 0h
[jira] [Commented] (IGNITE-15568) Striped Disruptor doesn't work with JRaft event handlers properly
[ https://issues.apache.org/jira/browse/IGNITE-15568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17741528#comment-17741528 ]

Ivan Bessonov commented on IGNITE-15568:

The actual implementation here differs from the description. First of all, only the log manager is affected. Second, instead of having a single event that would notify all listeners, I re-use the "endOfBatch" flag. This solution is not as general, and it can later be re-implemented using an additional event, but I decided not to change the API too much for now.

The log manager internally has the information about the stripe it belongs to. Using that information, it's possible to perform a single write into a shared log storage, even when the batch consists of data from several different replication groups.

> Striped Disruptor doesn't work with JRaft event handlers properly
> -----------------------------------------------------------------
>
> Key: IGNITE-15568
> URL: https://issues.apache.org/jira/browse/IGNITE-15568
> Project: Ignite
> Issue Type: Bug
> Reporter: Alexey Scherbakov
> Assignee: Ivan Bessonov
> Priority: Major
> Labels: ignite-3
> Time Spent: 50m
> Remaining Estimate: 0h
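The idea in Ivan Bessonov's comment, a single write into a shared log storage for a batch that spans several replication groups, might look roughly like the sketch below. It is an illustration under assumptions, not the merged code: `SharedLogStorage`, `StripeBuffer` and `GroupLogManager` are hypothetical names, and entries are modelled as plain strings.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of several log managers on one stripe writing to a shared log storage. */
final class SharedLogWriteSketch {
    /** Stand-in for the shared log storage; counts physical writes. */
    static final class SharedLogStorage {
        int writes;
        final List<String> persisted = new ArrayList<>();

        void write(List<String> entries) {
            writes++;
            persisted.addAll(entries);
        }
    }

    /** Per-stripe buffer that all log managers of the stripe append to. */
    static final class StripeBuffer {
        private final SharedLogStorage storage;
        private final List<String> buffer = new ArrayList<>();

        StripeBuffer(SharedLogStorage storage) {
            this.storage = storage;
        }

        void append(String entry) {
            buffer.add(entry);
        }

        /** Writes everything buffered so far in one storage call. */
        void flush() {
            if (!buffer.isEmpty()) {
                storage.write(new ArrayList<>(buffer));
                buffer.clear();
            }
        }
    }

    /** Stand-in for a per-group log manager that knows which stripe it belongs to. */
    static final class GroupLogManager {
        private final String groupId;
        private final StripeBuffer stripe;

        GroupLogManager(String groupId, StripeBuffer stripe) {
            this.groupId = groupId;
            this.stripe = stripe;
        }

        /** Buffers the entry; the manager that sees endOfBatch flushes for the whole stripe. */
        void onEvent(String entry, boolean endOfBatch) {
            stripe.append(groupId + ":" + entry);
            if (endOfBatch) {
                stripe.flush();
            }
        }
    }

    public static void main(String[] args) {
        SharedLogStorage storage = new SharedLogStorage();
        StripeBuffer stripe = new StripeBuffer(storage);
        new GroupLogManager("g1", stripe).onEvent("e1", false);
        new GroupLogManager("g2", stripe).onEvent("e2", true);
        System.out.println(storage.writes + " write(s): " + storage.persisted);
    }
}
```

Because the buffer belongs to the stripe rather than to a group, it does not matter which group receives `endOfBatch=true`: entries of every group in the batch are written together.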