gswcomputing opened a new issue, #12623:
URL: https://github.com/apache/ignite/issues/12623

   Hello guys, I’m using Apache Ignite 2.16.0/2.17.0 in a production 
environment with a cluster of 15 server nodes.
   
   A deadlock occurred while one of the nodes (referred to below as ip1) was 
executing 
`org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy#query(org.apache.ignite.cache.query.SqlFieldsQuery)`.
   
   Thread stack is as follows:
   "xxx" Id=317 TIMED_WAITING on java.util.concurrent.CountDownLatch$Sync@9342695
       at [email protected]/jdk.internal.misc.Unsafe.park(Native Method)
       -  waiting on java.util.concurrent.CountDownLatch$Sync@9342695
       at [email protected]/java.util.concurrent.locks.LockSupport.parkNanos(Unknown Source)
       at [email protected]/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(Unknown Source)
       at [email protected]/java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(Unknown Source)
       at [email protected]/java.util.concurrent.CountDownLatch.await(Unknown Source)
       at org.apache.ignite.internal.util.IgniteUtils.await(IgniteUtils.java:8228)
       at org.apache.ignite.internal.processors.query.h2.twostep.ReduceQueryRun.tryMapToSources(ReduceQueryRun.java:218)
       at org.apache.ignite.internal.processors.query.h2.twostep.GridReduceQueryExecutor.awaitAllReplies(GridReduceQueryExecutor.java:1065)
       at org.apache.ignite.internal.processors.query.h2.twostep.GridReduceQueryExecutor.query(GridReduceQueryExecutor.java:448)
       at org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing$5.iterator(IgniteH2Indexing.java:1447)
       at org.apache.ignite.internal.processors.cache.QueryCursorImpl.iter(QueryCursorImpl.java:102)
       at org.apache.ignite.internal.processors.query.h2.RegisteredQueryCursor.iter(RegisteredQueryCursor.java:91)
       at org.apache.ignite.internal.processors.cache.QueryCursorImpl.iterator(QueryCursorImpl.java:92)
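
   To illustrate the thread state in the trace above, here is a minimal 
sketch that uses only JDK classes. It assumes a caller that retries a timed 
`CountDownLatch.await` in a loop while the latch is never counted down; that 
loop is a guess for illustration and is not taken from the Ignite source. 
The names `LatchWaitDemo` and `stateWhileWaiting` are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class LatchWaitDemo {
    /** Starts a thread that waits on a latch nobody counts down and returns the state it is observed in. */
    static Thread.State stateWhileWaiting() {
        // Stands in for "all replies received"; it is never counted down in this demo.
        CountDownLatch replies = new CountDownLatch(1);

        Thread waiter = new Thread(() -> {
            try {
                // Timed wait, retried until the latch opens (which never happens here).
                while (!replies.await(500, TimeUnit.MILLISECONDS)) {
                    // A caller that wants to give up would check a deadline here.
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // leave the loop when interrupted
            }
        }, "latch-waiter");
        waiter.setDaemon(true);
        waiter.start();

        // Poll until the waiter is parked inside await(), with a 5 second safety limit.
        Thread.State state = waiter.getState();
        long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
        while (state != Thread.State.TIMED_WAITING && System.nanoTime() < deadline) {
            Thread.yield();
            state = waiter.getState();
        }

        waiter.interrupt(); // release the waiter so the demo terminates
        return state;
    }

    public static void main(String[] args) {
        System.out.println(stateWhileWaiting());
    }
}
```

   A thread in such a loop never leaves the wait on its own, so a thread dump 
taken at any moment shows it parked on the latch in much the same way as the 
trace above.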
   
   The logs show that one of the nodes in the cluster rebooted while the query 
was being executed:
   reboot   system boot  5.10.0-136.12.0. Mon Mar  4 19:51 - 15:10 (3+19:19)
   
   Checking the latest baseline topology at that point, I found that it 
contained only the node where the thread was stuck (ip1):
   globalState=DiscoveryDataClusterState [state=ACTIVE, 
lastStateChangeTime=xxx, baselineTopology=BaselineTopology [id=0, 
branchingHash=-708844738, branchingType='New BaselineTopology', 
baselineNodes=[ip1:port1]]
   
   My Ignite configuration is as follows:
   IgniteConfiguration igniteCfg = new IgniteConfiguration();
   // Static IP finder; addressList holds the addresses of all 15 nodes.
   TcpDiscoveryVmIpFinder ipFinder = new TcpDiscoveryVmIpFinder();
   ipFinder.setAddresses(addressList).setShared(false);
   TcpDiscoverySpi spi = new TcpDiscoverySpi();
   spi.setIpFinder(ipFinder);
   // In-memory data region without native persistence.
   DataRegionConfiguration dataRegionConfiguration = new DataRegionConfiguration();
   dataRegionConfiguration.setPersistenceEnabled(false);
   // setDataStorageConfiguration takes a DataStorageConfiguration, so the region
   // is wrapped here (assumed to be the default data region).
   igniteCfg.setDiscoverySpi(spi).setDataStorageConfiguration(
       new DataStorageConfiguration().setDefaultDataRegionConfiguration(dataRegionConfiguration));
   // Partitioned cache with no backups and rebalancing disabled.
   CacheConfiguration<Integer, AlarmRecord> cacheCfg = new CacheConfiguration<>(cacheName);
   cacheCfg.setCacheMode(CacheMode.PARTITIONED)
       .setBackups(0)
       .setIndexedTypes(Integer.class, AlarmRecord.class)
       .setSqlFunctionClasses(ExtIgniteFunctions.class)
       .setRebalanceDelay(-1)
       .setOnheapCacheEnabled(false)
       .setSqlOnheapCacheEnabled(false)
       .setQueryParallelism(2)
       .setRebalanceMode(CacheRebalanceMode.NONE)
       .setAffinity(affFunc);
   
   Finally, I would appreciate guidance on:
   Recommended production configuration
   Any known limitations or best practices to ensure cluster stability and 
avoid full outages
   How should I configure the cluster so that queries that are already running 
when some nodes restart do not get stuck as described above?
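
   For context on the last question, below is a minimal sketch of a generic 
client-side guard: it runs a blocking call on a worker thread and gives up 
after a deadline instead of waiting indefinitely. It uses only JDK classes; 
`BoundedCall` and `callWithDeadline` are hypothetical names, and it assumes 
the blocking call reacts to thread interruption, which has not been verified 
for the Ignite query path. Ignite also offers `SqlFieldsQuery.setTimeout(int, 
TimeUnit)`; whether that timeout covers the hang described above is not 
established here.

```java
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class BoundedCall {
    /**
     * Runs blockingCall on a daemon worker thread and waits at most the given time.
     * Returns the result, or Optional.empty() on timeout (a null result also maps to empty).
     */
    static <T> Optional<T> callWithDeadline(Callable<T> blockingCall, long timeout, TimeUnit unit) {
        ExecutorService worker = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "bounded-call");
            t.setDaemon(true); // do not keep the JVM alive if the call never returns
            return t;
        });
        Future<T> result = worker.submit(blockingCall);
        try {
            return Optional.ofNullable(result.get(timeout, unit));
        } catch (TimeoutException e) {
            result.cancel(true); // interrupts the worker; only helps if the call honors interrupts
            return Optional.empty();
        } catch (InterruptedException e) {
            result.cancel(true);
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            return Optional.empty();
        } catch (ExecutionException e) {
            throw new IllegalStateException("blocking call failed", e.getCause());
        } finally {
            worker.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // A call that never completes, standing in for a stuck query.
        Optional<String> stuck = callWithDeadline(() -> {
            new CountDownLatch(1).await();
            return "rows";
        }, 200, TimeUnit.MILLISECONDS);
        System.out.println(stuck.isPresent() ? "completed" : "timed out");
    }
}
```

   In real use the `Callable` would wrap something like 
`cache.query(qry).getAll()`. This only frees the calling thread; it does not 
release whatever resources the query holds on the cluster side.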
   Thank you for your guidance.


