[ 
https://issues.apache.org/jira/browse/IGNITE-20299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17760119#comment-17760119
 ] 

Raymond Wilson commented on IGNITE-20299:
-----------------------------------------

[~ptupitsyn]

As some additional background information, the process that executed the cache 
creation command hosts two separate Ignite client node JVMs where each is a 
client node of a separate Ignite grid.

The data region in question is a valid data region in one of the grids, but not 
in the one that was asked to create the cache.

The code creating the cache is pretty simple:

{noformat}
    public SiteModelMetadataManager(StorageMutability mutability)
    {
      // Obtain the ignite reference for the primary grid orientation of 
SiteModels
      var ignite = DIContext.Obtain<ITRexGridFactory>()?.Grid(mutability);

      _metaDataCache = ignite?.GetOrCreateCache<Guid, 
ISiteModelMetadata>(ConfigureCache());

      if (_metaDataCache == null)
        throw new TRexException($"Failed to get or create Ignite cache 
{TRexCaches.SiteModelMetadataCacheName()}, ignite reference is {ignite}");
    }
{noformat}

We do not observe the message "Failed to get or create Ignite cache" in our 
logs, implying GetOrCreate cache threw an exception we are not capturing.

The configuration for the cache in question is set up by this code:


{noformat}
    private CacheConfiguration ConfigureCache()
    {
      return new CacheConfiguration
      {
        Name = TRexCaches.SiteModelMetadataCacheName(),
        KeepBinaryInStore = true,
        CacheMode = CacheMode.Replicated,
        DataRegionName = DataRegions.MUTABLE_NONSPATIAL_DATA_REGION
      };
    }
{noformat}




> Creating a cache with an unknown data region name causes total unrecoverable 
> failure of the grid
> ------------------------------------------------------------------------------------------------
>
>                 Key: IGNITE-20299
>                 URL: https://issues.apache.org/jira/browse/IGNITE-20299
>             Project: Ignite
>          Issue Type: Bug
>          Components: cache
>    Affects Versions: 2.15
>         Environment: Observed in:
> C# client and grid running on Linux in a container
> C# client and grid running on Windows
>  
>            Reporter: Raymond Wilson
>            Priority: Major
>
> Using the Ignite C# client.
>  
> Given a running grid, having a client (and perhaps server) node in the grid 
> attempt to create a cache using a DataRegionName that does not exist in the 
> grid causes immediate failure in the client node with the following log 
> output. 
>  
> 2023-08-27 17:08:48,520 [44] INF [ImmutableClientServer]   Completed 
> partition exchange [localNode=15122bd7-bf81-44e6-a548-e70dbd9334c0, 
> exchange=GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion 
> [topVer=15, minorTopVer=0], evt=NODE_FAILED, evtNode=TcpDiscoveryNode 
> [id=9d5ed68d-38bb-447d-aed5-189f52660716, 
> consistentId=9d5ed68d-38bb-447d-aed5-189f52660716, addrs=ArrayList 
> [127.0.0.1], sockAddrs=null, discPort=0, order=8, intOrder=8, 
> lastExchangeTime=1693112858024, loc=false, ver=2.15.0#20230425-sha1:f98f7f35, 
> isClient=true], rebalanced=false, done=true, newCrdFut=null], 
> topVer=AffinityTopologyVersion [topVer=15, minorTopVer=0]]
> 2023-08-27 17:08:48,520 [44] INF [ImmutableClientServer]   Exchange timings 
> [startVer=AffinityTopologyVersion [topVer=15, minorTopVer=0], 
> resVer=AffinityTopologyVersion [topVer=15, minorTopVer=0], stage="Waiting in 
> exchange queue" (14850 ms), stage="Exchange parameters initialization" (2 
> ms), stage="Determine exchange type" (3 ms), stage="Exchange done" (4 ms), 
> stage="Total time" (14859 ms)]
> 2023-08-27 17:08:48,522 [44] INF [ImmutableClientServer]   Exchange longest 
> local stages [startVer=AffinityTopologyVersion [topVer=15, minorTopVer=0], 
> resVer=AffinityTopologyVersion [topVer=15, minorTopVer=0]]
> 2023-08-27 17:08:48,524 [44] INF [ImmutableClientServer]   Finished exchange 
> init [topVer=AffinityTopologyVersion [topVer=15, minorTopVer=0], crd=false]
> 2023-08-27 17:08:48,525 [44] INF [ImmutableClientServer]   
> AffinityTopologyVersion [topVer=15, minorTopVer=0], evt=NODE_FAILED, 
> evtNode=9d5ed68d-38bb-447d-aed5-189f52660716, client=true]
> Unhandled exception: Apache.Ignite.Core.Cache.CacheException: class 
> org.apache.ignite.IgniteCheckedException: Failed to complete exchange process.
>  ---> Apache.Ignite.Core.Common.IgniteException: Failed to complete exchange 
> process.
>  ---> Apache.Ignite.Core.Common.JavaException: javax.cache.CacheException: 
> class org.apache.ignite.IgniteCheckedException: Failed to complete exchange 
> process.
>         at 
> org.apache.ignite.internal.processors.cache.GridCacheUtils.convertToCacheException(GridCacheUtils.java:1272)
>         at 
> org.apache.ignite.internal.IgniteKernal.getOrCreateCache0(IgniteKernal.java:2278)
>         at 
> org.apache.ignite.internal.IgniteKernal.getOrCreateCache(IgniteKernal.java:2242)
>         at 
> org.apache.ignite.internal.processors.platform.PlatformProcessorImpl.processInStreamOutObject(PlatformProcessorImpl.java:643)
>         at 
> org.apache.ignite.internal.processors.platform.PlatformTargetProxyImpl.inStreamOutObject(PlatformTargetProxyImpl.java:79)
> Caused by: class org.apache.ignite.IgniteCheckedException: Failed to complete 
> exchange process.
>         at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.createExchangeException(GridDhtPartitionsExchangeFuture.java:3709)
>         at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.sendExchangeFailureMessage(GridDhtPartitionsExchangeFuture.java:3737)
>         at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.finishExchangeOnCoordinator(GridDhtPartitionsExchangeFuture.java:3832)
>         at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onAllReceived(GridDhtPartitionsExchangeFuture.java:3813)
>         at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.distributedExchange(GridDhtPartitionsExchangeFuture.java:1796)
>         at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:1053)
>         at 
> org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body0(GridCachePartitionExchangeManager.java:3348)
>         at 
> org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:3182)
>         at 
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:125)
>         at java.base/java.lang.Thread.run(Thread.java:829)
>         Suppressed: class org.apache.ignite.IgniteCheckedException: Failed to 
> initialize exchange locally [locNodeId=e9325b04-00fa-452e-9796-989b47b860ea]
>                 at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onCacheChangeRequest(GridDhtPartitionsExchangeFuture.java:1483)
>                 at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:979)
>                 ... 4 more
>         Caused by: class org.apache.ignite.IgniteCheckedException: Requested 
> DataRegion is not configured: Default-Mutable
>                 at 
> org.apache.ignite.internal.processors.cache.persistence.IgniteCacheDatabaseSharedManager.dataRegion(IgniteCacheDatabaseSharedManager.java:896)
>                 at 
> org.apache.ignite.internal.processors.cache.GridCacheProcessor.startCacheGroup(GridCacheProcessor.java:2463)
>                 at 
> org.apache.ignite.internal.processors.cache.GridCacheProcessor.getOrCreateCacheGroupContext(GridCacheProcessor.java:2181)
>                 at 
> org.apache.ignite.internal.processors.cache.GridCacheProcessor.prepareCacheContext(GridCacheProcessor.java:1991)
>                 at 
> org.apache.ignite.internal.processors.cache.GridCacheProcessor.prepareCacheStart(GridCacheProcessor.java:1926)
>                 at 
> org.apache.ignite.internal.processors.cache.GridCacheProcessor.lambda$prepareStartCaches$55a0e703$1(GridCacheProcessor.java:1801)
>                 at 
> org.apache.ignite.internal.processors.cache.GridCacheProcessor.lambda$prepareStartCachesIfPossible$16(GridCacheProcessor.java:1771)
>                 at 
> org.apache.ignite.internal.processors.cache.GridCacheProcessor.prepareStartCaches(GridCacheProcessor.java:1798)
>                 at 
> org.apache.ignite.internal.processors.cache.GridCacheProcessor.prepareStartCachesIfPossible(GridCacheProcessor.java:1769)
>                 at 
> org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager.processCacheStartRequests(CacheAffinitySharedManager.java:1000)
>                 at 
> org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager.onCacheChangeRequest(CacheAffinitySharedManager.java:886)
>                 at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onCacheChangeRequest(GridDhtPartitionsExchangeFuture.java:1472)
>                 ... 5 more
>    at Apache.Ignite.Core.Impl.Unmanaged.Jni.Env.ExceptionCheck()
>    at Apache.Ignite.Core.Impl.Unmanaged.Jni.Env.CallObjectMethod(GlobalRef 
> obj, IntPtr methodId, Int64* argsPtr)
>    at 
> Apache.Ignite.Core.Impl.Unmanaged.UnmanagedUtils.TargetInStreamOutObject(GlobalRef
>  target, Int32 opType, Int64 inMemPtr)
>    at Apache.Ignite.Core.Impl.PlatformJniTarget.InStreamOutObject(Int32 type, 
> Action`1 writeAction)
>    --- End of inner exception stack trace —
>    --- End of inner exception stack trace —
>    at Apache.Ignite.Core.Impl.PlatformJniTarget.InStreamOutObject(Int32 type, 
> Action`1 writeAction)
>    at Apache.Ignite.Core.Impl.PlatformTargetAdapter.DoOutOpObject(Int32 type, 
> Action`1 action)
>    at 
> Apache.Ignite.Core.Impl.Ignite.GetOrCreateCache[TK,TV](CacheConfiguration 
> configuration, NearCacheConfiguration nearConfiguration, 
> PlatformCacheConfiguration platformCacheConfiguration, Op op)
>    at 
> Apache.Ignite.Core.Impl.Ignite.GetOrCreateCache[TK,TV](CacheConfiguration 
> configuration, NearCacheConfiguration nearConfiguration, 
> PlatformCacheConfiguration platformCacheConfiguration)
>    at 
> Apache.Ignite.Core.Impl.Ignite.GetOrCreateCache[TK,TV](CacheConfiguration 
> configuration, NearCacheConfiguration nearConfiguration)
>    at 
> Apache.Ignite.Core.Impl.Ignite.GetOrCreateCache[TK,TV](CacheConfiguration 
> configuration)
>  
>  
> This failure causes issues in the server nodes in the grid which now fail to 
> restart with errors such as below (for the incorrectly create cache) but 
> which are repeated for every defined cache in the grid:
>  
> 2023-08-27 17:11:36,882 [42] INF [ImmutableCacheComputeServer]   Can not 
> finish proxy initialization because proxy does not exist, 
> cacheName=SiteModelMetadata, localNodeId=3d4a75e8-174d-4947-877e-e45784d8d08d
> 2
>  
> At this point the grid is now unusable.
>  
> In summary: Attempted creation of a cache with an unknown DataRegionName 
> causes immediate and unrecoverable failure in the entire grid. 
>  
>  On attempted restarted Ignite notes all caches (including system caches) as 
> being "Started cache in recovery mode" in the log and then scans the WAL. The 
> incorrectly created cache is not mentioned at this point.
> At the point the cluster is activated, (ie: this appears in the log "Started 
> state transition: activate cluster"), it states the incorrectly created cache 
> ("SiteModelMetadata") can not be started, log entry is: "Cache can not be 
> started : cache=SiteModelMetadata"
> This is followed by multiple messages like this: "Finished recovery for cache 
> [cache=ignite-sys-cache, grp=ignite-sys-cache, 
> startVer=AffinityTopologyVersion [topVer=12, minorTopVer=1]] "
>  
> This is followed by errors relating to proxy initialisation, eg: "Can not 
> finish proxy initialization because proxy does not exist, 
> cacheName=ignite-sys-cache, localNodeId=4d44108f-cd96-4953-94db-6365f998a91b"
>  
> All caches are then stopped, eg: "Stopped cache 
> [cacheName=ignite-sys-cache]", and the grid enters a relatively dormant 
> inactivated state  where it only emits Ignite heartbeat messages.
>  
> This bug appears to be trivially easy to reproduce by creating a cache with 
> an unknown data region.
>  
> Attempting to destroy the bad cache in the grid with the Control.sh tool 
> results in this output:
>  
> {noformat}
> # ./control.sh --cache destroy --caches SiteModelMetadata
> WARNING: An illegal reflective access operation has occurred
> WARNING: Illegal reflective access by 
> org.apache.ignite.internal.util.GridUnsafe$2 
> (file:/trex/libs/ignite-core-2.15.0.jar) to field java.nio.Buffer.address
> WARNING: Please consider reporting this to the maintainers of 
> org.apache.ignite.internal.util.GridUnsafe$2
> WARNING: Use --illegal-access=warn to enable warnings of further illegal 
> reflective access operations
> WARNING: All illegal access operations will be denied in a future 
> releaseWarning! 
> The command will destroy 1 caches: SiteModelMetadata.
> If you continue, the cache data will be impossible to recover.
> Press 'y' to continue . . . y
> [01:53:38,925][SEVERE][session=24175683][CommandHandlerLog] Connection to 
> cluster failed. Latest topology update failed.Control utility [ver. 
> 2.15.0#20230425-sha1:f98f7f35]
> 2023 Copyright(C) Apache Software FoundationUser: rootTime: 
> 2023-08-29T01:53:26.048
> Command [CACHE] startedArguments: --cache destroy --caches 
> SiteModelMetadata--------------------------------------------------------------------------------
> Connection to cluster failed. 
> Latest topology update failed.Command [CACHE] finished with code: 
> Control utility has completed execution at: 2023-08-29T01:53:38.926
> {noformat}
>  
>  The Control.sh comment to list caches also fails.
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to