Ivan Artiukhov created IGNITE-20691:
---------------------------------------
Summary: Excessive heap utilization under TPC-H benchmark
Key: IGNITE-20691
URL: https://issues.apache.org/jira/browse/IGNITE-20691
Project: Ignite
Issue Type: Bug
Components: sql
Reporter: Ivan Artiukhov
Attachments: GCeasy-report-gc-poc-tester-SERVER-192.pdf,
ignite-config.json, node_logs.zip, tpch_2023-10-17_08-51-49.config.xml,
tpch_2023-10-17_08-51-49.summary.json
Ignite 3, rev.
c962aa1bbb94e73a4d8ce2403aad3d629dd55666
Ignite 3 fails under [TPC-H benchmark|https://www.tpc.org/tpch/] which sends
requests via JDBC. The code of the benchmark:
[https://github.com/cmu-db/benchbase/tree/main/src/main/resources/benchmarks/tpch]
Steps:
* Start an Ignite 3 node with the attached bootstrap config
[^ignite-config.json]
* Start a single instance of the benchmark with the following config:
[^tpch_2023-10-17_08-51-49.config.xml]
*Actual result:*
The node fails with exceptions like the following:
{noformat}
2023-10-17 08:36:48:269 +0300
[WARNING][%poc-tester-SERVER-192.168.1.80-id-0%metastorage-watch-executor-3][ReplicaManager]
Failed to process replica request [request=ReadWriteMultiRowReplicaRequestImpl
[binaryRowMessages=ArrayList [BinaryRowMessageImpl
[binaryTuple=java.nio.HeapByteBuffer[pos=0 lim=99 cap=99], schemaVersion=1]],
commitPartitionId=TablePartitionIdMessageImpl [partitionId=20, tableId=26],
full=false, groupId=26_part_21, requestType=RW_INSERT_ALL, skipDelayedAck=true,
term=111248573589356783, timestampLong=111248730707264072,
transactionId=018b3c21-8952-00a1-0000-000091e0d952]]
java.util.concurrent.CompletionException:
org.apache.ignite.internal.replicator.exception.PrimaryReplicaMissException:
IGN-REP-6 TraceId:fe81fc3b-bf42-4433-94f3-b460e1542523 The primary replica has
changed [expectedLeaseholder=poc-tester-SERVER-192.168.1.80-id-0,
currentLeaseholder=null]
at
java.base/java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:367)
at
java.base/java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:376)
at
java.base/java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1074)
at
java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
at
java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2073)
at
org.apache.ignite.internal.util.PendingComparableValuesTracker.lambda$completeWaitersOnUpdate$0(PendingComparableValuesTracker.java:169)
at
java.base/java.util.concurrent.ConcurrentMap.forEach(ConcurrentMap.java:122)
at
org.apache.ignite.internal.util.PendingComparableValuesTracker.completeWaitersOnUpdate(PendingComparableValuesTracker.java:169)
at
org.apache.ignite.internal.util.PendingComparableValuesTracker.update(PendingComparableValuesTracker.java:103)
at
org.apache.ignite.internal.metastorage.server.time.ClusterTimeImpl.updateSafeTime(ClusterTimeImpl.java:146)
at
org.apache.ignite.internal.metastorage.impl.MetaStorageManagerImpl.onSafeTimeAdvanced(MetaStorageManagerImpl.java:849)
at
org.apache.ignite.internal.metastorage.impl.MetaStorageManagerImpl$1.onSafeTimeAdvanced(MetaStorageManagerImpl.java:456)
at
org.apache.ignite.internal.metastorage.server.WatchProcessor.lambda$advanceSafeTime$7(WatchProcessor.java:281)
at
java.base/java.util.concurrent.CompletableFuture$UniRun.tryFire(CompletableFuture.java:783)
at
java.base/java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:478)
at
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by:
org.apache.ignite.internal.replicator.exception.PrimaryReplicaMissException:
IGN-REP-6 TraceId:fe81fc3b-bf42-4433-94f3-b460e1542523 The primary replica has
changed [expectedLeaseholder=poc-tester-SERVER-192.168.1.80-id-0,
currentLeaseholder=null]
at
org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.lambda$ensureReplicaIsPrimary$182(PartitionReplicaListener.java:2666)
at
java.base/java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1072)
... 15 more
{noformat}
The benchmark shows 0 completed requests. See
[^tpch_2023-10-17_08-51-49.summary.json]
[https://gceasy.io/] reports that ~20% of the time the node was in GC pause.
See [^GCeasy-report-gc-poc-tester-SERVER-192.pdf]
Logs from the node (including GC log): [^node_logs.zip]
--
This message was sent by Atlassian Jira
(v8.20.10#820010)