Ivan Artiukhov created IGNITE-20691:
---------------------------------------

             Summary: Excessive heap utilization under TPC-H benchmark
                 Key: IGNITE-20691
                 URL: https://issues.apache.org/jira/browse/IGNITE-20691
             Project: Ignite
          Issue Type: Bug
          Components: sql
            Reporter: Ivan Artiukhov
         Attachments: GCeasy-report-gc-poc-tester-SERVER-192.pdf, 
ignite-config.json, node_logs.zip, tpch_2023-10-17_08-51-49.config.xml, 
tpch_2023-10-17_08-51-49.summary.json

Ignite 3, rev. 

c962aa1bbb94e73a4d8ce2403aad3d629dd55666

 

Ignite 3 fails under [TPC-H benchmark|https://www.tpc.org/tpch/] which sends 
requests via JDBC. The code of the benchmark: 
[https://github.com/cmu-db/benchbase/tree/main/src/main/resources/benchmarks/tpch]
 

 

Steps:
 * Start an Ignite 3 node with the attached bootstrap config 
[^ignite-config.json]
 * Start a single instance of the benchmark with the following config: 
[^tpch_2023-10-17_08-51-49.config.xml]

*Actual result:*

The node fails with exceptions like the following:
{noformat}
2023-10-17 08:36:48:269 +0300 
[WARNING][%poc-tester-SERVER-192.168.1.80-id-0%metastorage-watch-executor-3][ReplicaManager]
 Failed to process replica request [request=ReadWriteMultiRowReplicaRequestImpl 
[binaryRowMessages=ArrayList [BinaryRowMessageImpl 
[binaryTuple=java.nio.HeapByteBuffer[pos=0 lim=99 cap=99], schemaVersion=1]], 
commitPartitionId=TablePartitionIdMessageImpl [partitionId=20, tableId=26], 
full=false, groupId=26_part_21, requestType=RW_INSERT_ALL, skipDelayedAck=true, 
term=111248573589356783, timestampLong=111248730707264072, 
transactionId=018b3c21-8952-00a1-0000-000091e0d952]]
java.util.concurrent.CompletionException: 
org.apache.ignite.internal.replicator.exception.PrimaryReplicaMissException: 
IGN-REP-6 TraceId:fe81fc3b-bf42-4433-94f3-b460e1542523 The primary replica has 
changed [expectedLeaseholder=poc-tester-SERVER-192.168.1.80-id-0, 
currentLeaseholder=null]
    at 
java.base/java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:367)
    at 
java.base/java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:376)
    at 
java.base/java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1074)
    at 
java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
    at 
java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2073)
    at 
org.apache.ignite.internal.util.PendingComparableValuesTracker.lambda$completeWaitersOnUpdate$0(PendingComparableValuesTracker.java:169)
    at 
java.base/java.util.concurrent.ConcurrentMap.forEach(ConcurrentMap.java:122)
    at 
org.apache.ignite.internal.util.PendingComparableValuesTracker.completeWaitersOnUpdate(PendingComparableValuesTracker.java:169)
    at 
org.apache.ignite.internal.util.PendingComparableValuesTracker.update(PendingComparableValuesTracker.java:103)
    at 
org.apache.ignite.internal.metastorage.server.time.ClusterTimeImpl.updateSafeTime(ClusterTimeImpl.java:146)
    at 
org.apache.ignite.internal.metastorage.impl.MetaStorageManagerImpl.onSafeTimeAdvanced(MetaStorageManagerImpl.java:849)
    at 
org.apache.ignite.internal.metastorage.impl.MetaStorageManagerImpl$1.onSafeTimeAdvanced(MetaStorageManagerImpl.java:456)
    at 
org.apache.ignite.internal.metastorage.server.WatchProcessor.lambda$advanceSafeTime$7(WatchProcessor.java:281)
    at 
java.base/java.util.concurrent.CompletableFuture$UniRun.tryFire(CompletableFuture.java:783)
    at 
java.base/java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:478)
    at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: 
org.apache.ignite.internal.replicator.exception.PrimaryReplicaMissException: 
IGN-REP-6 TraceId:fe81fc3b-bf42-4433-94f3-b460e1542523 The primary replica has 
changed [expectedLeaseholder=poc-tester-SERVER-192.168.1.80-id-0, 
currentLeaseholder=null]
    at 
org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.lambda$ensureReplicaIsPrimary$182(PartitionReplicaListener.java:2666)
    at 
java.base/java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1072)
    ... 15 more
{noformat}
 The benchmark shows 0 completed requests. See 
[^tpch_2023-10-17_08-51-49.summary.json]

[https://gceasy.io/] reports that ~20% of the time the node was in GC pause. 
See [^GCeasy-report-gc-poc-tester-SERVER-192.pdf]

Logs from the node (including GC log): [^node_logs.zip]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to