I have reproduced the possible bug I reported in my earlier email. Given a running grid, when a client node in the grid attempts to create a cache using a DataRegionName that does not exist in the grid, the client node fails immediately with the following log output:

2023-08-27 17:08:48,520 [44] INF [ImmutableClientServer] Completed partition exchange [localNode=15122bd7-bf81-44e6-a548-e70dbd9334c0, exchange=GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion [topVer=15, minorTopVer=0], evt=NODE_FAILED, evtNode=TcpDiscoveryNode [id=9d5ed68d-38bb-447d-aed5-189f52660716, consistentId=9d5ed68d-38bb-447d-aed5-189f52660716, addrs=ArrayList [127.0.0.1], sockAddrs=null, discPort=0, order=8, intOrder=8, lastExchangeTime=1693112858024, loc=false, ver=2.15.0#20230425-sha1:f98f7f35, isClient=true], rebalanced=false, done=true, newCrdFut=null], topVer=AffinityTopologyVersion [topVer=15, minorTopVer=0]]
2023-08-27 17:08:48,520 [44] INF [ImmutableClientServer] Exchange timings [startVer=AffinityTopologyVersion [topVer=15, minorTopVer=0], resVer=AffinityTopologyVersion [topVer=15, minorTopVer=0], stage="Waiting in exchange queue" (14850 ms), stage="Exchange parameters initialization" (2 ms), stage="Determine exchange type" (3 ms), stage="Exchange done" (4 ms), stage="Total time" (14859 ms)]
2023-08-27 17:08:48,522 [44] INF [ImmutableClientServer] Exchange longest local stages [startVer=AffinityTopologyVersion [topVer=15, minorTopVer=0], resVer=AffinityTopologyVersion [topVer=15, minorTopVer=0]]
2023-08-27 17:08:48,524 [44] INF [ImmutableClientServer] Finished exchange init [topVer=AffinityTopologyVersion [topVer=15, minorTopVer=0], crd=false]
2023-08-27 17:08:48,525 [44] INF [ImmutableClientServer] AffinityTopologyVersion [topVer=15, minorTopVer=0], evt=NODE_FAILED, evtNode=9d5ed68d-38bb-447d-aed5-189f52660716, client=true]

Unhandled exception: Apache.Ignite.Core.Cache.CacheException: class org.apache.ignite.IgniteCheckedException: Failed to complete exchange process.
 ---> Apache.Ignite.Core.Common.IgniteException: Failed to complete exchange process.
 ---> Apache.Ignite.Core.Common.JavaException: javax.cache.CacheException: class org.apache.ignite.IgniteCheckedException: Failed to complete exchange process.
    at org.apache.ignite.internal.processors.cache.GridCacheUtils.convertToCacheException(GridCacheUtils.java:1272)
    at org.apache.ignite.internal.IgniteKernal.getOrCreateCache0(IgniteKernal.java:2278)
    at org.apache.ignite.internal.IgniteKernal.getOrCreateCache(IgniteKernal.java:2242)
    at org.apache.ignite.internal.processors.platform.PlatformProcessorImpl.processInStreamOutObject(PlatformProcessorImpl.java:643)
    at org.apache.ignite.internal.processors.platform.PlatformTargetProxyImpl.inStreamOutObject(PlatformTargetProxyImpl.java:79)
Caused by: class org.apache.ignite.IgniteCheckedException: Failed to complete exchange process.
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.createExchangeException(GridDhtPartitionsExchangeFuture.java:3709)
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.sendExchangeFailureMessage(GridDhtPartitionsExchangeFuture.java:3737)
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.finishExchangeOnCoordinator(GridDhtPartitionsExchangeFuture.java:3832)
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onAllReceived(GridDhtPartitionsExchangeFuture.java:3813)
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.distributedExchange(GridDhtPartitionsExchangeFuture.java:1796)
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:1053)
    at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body0(GridCachePartitionExchangeManager.java:3348)
    at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:3182)
    at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:125)
    at java.base/java.lang.Thread.run(Thread.java:829)
    Suppressed: class org.apache.ignite.IgniteCheckedException: Failed to initialize exchange locally [locNodeId=e9325b04-00fa-452e-9796-989b47b860ea]
        at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onCacheChangeRequest(GridDhtPartitionsExchangeFuture.java:1483)
        at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:979)
        ... 4 more
    Caused by: class org.apache.ignite.IgniteCheckedException: Requested DataRegion is not configured: Default-Mutable
        at org.apache.ignite.internal.processors.cache.persistence.IgniteCacheDatabaseSharedManager.dataRegion(IgniteCacheDatabaseSharedManager.java:896)
        at org.apache.ignite.internal.processors.cache.GridCacheProcessor.startCacheGroup(GridCacheProcessor.java:2463)
        at org.apache.ignite.internal.processors.cache.GridCacheProcessor.getOrCreateCacheGroupContext(GridCacheProcessor.java:2181)
        at org.apache.ignite.internal.processors.cache.GridCacheProcessor.prepareCacheContext(GridCacheProcessor.java:1991)
        at org.apache.ignite.internal.processors.cache.GridCacheProcessor.prepareCacheStart(GridCacheProcessor.java:1926)
        at org.apache.ignite.internal.processors.cache.GridCacheProcessor.lambda$prepareStartCaches$55a0e703$1(GridCacheProcessor.java:1801)
        at org.apache.ignite.internal.processors.cache.GridCacheProcessor.lambda$prepareStartCachesIfPossible$16(GridCacheProcessor.java:1771)
        at org.apache.ignite.internal.processors.cache.GridCacheProcessor.prepareStartCaches(GridCacheProcessor.java:1798)
        at org.apache.ignite.internal.processors.cache.GridCacheProcessor.prepareStartCachesIfPossible(GridCacheProcessor.java:1769)
        at org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager.processCacheStartRequests(CacheAffinitySharedManager.java:1000)
        at org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager.onCacheChangeRequest(CacheAffinitySharedManager.java:886)
        at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onCacheChangeRequest(GridDhtPartitionsExchangeFuture.java:1472)
        ... 5 more

   at Apache.Ignite.Core.Impl.Unmanaged.Jni.Env.ExceptionCheck()
   at Apache.Ignite.Core.Impl.Unmanaged.Jni.Env.CallObjectMethod(GlobalRef obj, IntPtr methodId, Int64* argsPtr)
   at Apache.Ignite.Core.Impl.Unmanaged.UnmanagedUtils.TargetInStreamOutObject(GlobalRef target, Int32 opType, Int64 inMemPtr)
   at Apache.Ignite.Core.Impl.PlatformJniTarget.InStreamOutObject(Int32 type, Action`1 writeAction)
   --- End of inner exception stack trace ---
   --- End of inner exception stack trace ---
   at Apache.Ignite.Core.Impl.PlatformJniTarget.InStreamOutObject(Int32 type, Action`1 writeAction)
   at Apache.Ignite.Core.Impl.PlatformTargetAdapter.DoOutOpObject(Int32 type, Action`1 action)
   at Apache.Ignite.Core.Impl.Ignite.GetOrCreateCache[TK,TV](CacheConfiguration configuration, NearCacheConfiguration nearConfiguration, PlatformCacheConfiguration platformCacheConfiguration, Op op)
   at Apache.Ignite.Core.Impl.Ignite.GetOrCreateCache[TK,TV](CacheConfiguration configuration, NearCacheConfiguration nearConfiguration, PlatformCacheConfiguration platformCacheConfiguration)
   at Apache.Ignite.Core.Impl.Ignite.GetOrCreateCache[TK,TV](CacheConfiguration configuration, NearCacheConfiguration nearConfiguration)
   at Apache.Ignite.Core.Impl.Ignite.GetOrCreateCache[TK,TV](CacheConfiguration configuration)

This failure also causes problems for the server nodes in the grid, which now fail to restart with errors such as the one below (shown for the incorrectly created cache), repeated for every cache defined in the grid:

2023-08-27 17:11:36,882 [42] INF [ImmutableCacheComputeServer] Can not finish proxy initialization because proxy does not exist, cacheName=SiteModelMetadata, localNodeId=3d4a75e8-174d-4947-877e-e45784d8d08d 2

At this point the grid is unusable.

To summarise: attempting to create a cache with an unknown DataRegionName causes an immediate failure of the entire grid, from which it does not recover.

Raymond.

On Fri, Aug 25, 2023 at 7:47 PM Raymond Wilson <raymond_wil...@trimble.com> wrote:

> We believe we had some code on a dev environment attempt to create a cache
> that was intended for another Ignite.
>
> The creation of this cache would have failed (at least) because the data
> region referenced in the cache configuration does not exist on that
> environment.
>
> A subsequent restart of the environment some time later started failing to
> initialise nodes on which the failed cache would have been stored had it
> succeeded.
>
> The failing nodes report this in the log:
>
> 2023-08-25 04:20:24,540 [44] WRN [ImmutableCacheComputeServer] Cache can not be started : cache=SiteModelMetadata
>
> 2023-08-25 04:20:11,265 [1] WRN [ImmutableCacheComputeServer] WAL segment tail reached. [idx=414, isWorkDir=true, serVer=org.apache.ignite.internal.processors.cache.persistence.wal.serializer.RecordV2Serializer@c3719e5, actualFilePtr=WALPointer [idx=414, fileOff=452480679, len=0]]
>
> This error implies that (somehow) Ignite considers this to be a cache
> existing in the grid and is attempting to set it up.
>
> Raymond.
>

-- 
<http://www.trimble.com/>
Raymond Wilson
Trimble Distinguished Engineer, Civil Construction Software (CCS)
11 Birmingham Drive | Christchurch, New Zealand
raymond_wil...@trimble.com

<https://worksos.trimble.com/?utm_source=Trimble&utm_medium=emailsign&utm_campaign=Launch>