mdkhusro opened a new issue, #13218: URL: https://github.com/apache/ignite/issues/13218
Description:

We are testing Apache Ignite 3.1.0 on a 3-node cluster under heavy JMeter load (~3 million records).

Environment:

- Ignite 3.1.0
- Java 17
- 3 nodes
- Xms/Xmx = 16GB
- G1GC

Problem:

Only node3 experiences severe JVM heap pressure, while node1 and node2 remain relatively stable.

Observed JVM Usage:

- node1 → ~50-60%
- node2 → ~60-70%
- node3 → ~99.96% Old Gen, along with high CPU usage

<img width="710" height="158" alt="Image" src="https://github.com/user-attachments/assets/c2b86dc6-b0bd-4239-b02f-d36e1148b7f7" />

GC Statistics on node3:

- Full GC Count = 2189
- Full GC Time = 16220 sec

Errors Observed:

1. IGN-TX-4 Failed to acquire a lock due to a possible deadlock
2. Replication is timed out
3. SYSTEM_WORKER_BLOCKED
   Example: A critical thread is blocked for 11772 ms: node3-network-worker-5
4. JRaft PreVote timeout / unsuccessful election rounds

Heap Histogram Findings on node3:

Very large accumulation of:

- CompletableFuture (~36 million)
- PartitionReplicaListener$OperationId (~36 million)
- TxCleanupReadyFutureList (~36 million)

We also observed very high raft activity for Zone 20 partitions (13k+ events in logs).

Example workload:

- requestType=RW_GET_ALL
- primaryKeys.size=39

Questions:

- Is this expected under a heavy concurrent RW_GET_ALL workload?
- Could this indicate a replication/transaction cleanup backlog or a future accumulation issue?
- Are there recommended tuning settings for this workload pattern?
- Has this behavior improved in newer Ignite 3 versions?

I can provide GC logs, a heap histogram, thread dumps, JVM graphs, and additional logs if needed.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
