The cache-SiteModelMetaData folder in the persistence folder for a server
node contains a "cache_data" file 6 KB in size. No other cache folders
contain this file.
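
For anyone wanting to repeat this check on their own nodes, a minimal sketch
of the scan follows. It is only an illustration, not an Ignite tool: it
assumes each cache has a folder named "cache-<name>" directly under the
node's persistence folder, and the function name is hypothetical.

```python
from pathlib import Path


def find_cache_data_files(node_dir):
    """Return {cache folder name: size in bytes} for folders holding a cache_data file.

    Assumption: cache folders sit directly under node_dir and are named "cache-<name>".
    """
    found = {}
    for folder in sorted(Path(node_dir).glob("cache-*")):
        candidate = folder / "cache_data"
        # Only report real cache folders that actually contain the file.
        if folder.is_dir() and candidate.is_file():
            found[folder.name] = candidate.stat().st_size
    return found
```

Calling find_cache_data_files() with the path to a node's persistence folder
returns one entry per cache folder that holds a "cache_data" file, which
makes an unexpected one easy to spot.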

As an experiment I renamed this file to "cache_dataxxx". This appeared to
be sufficient to permit the grid to restart. Similarly, renaming the cache
folder to "xxxcache-SiteModelMetaData" also permitted the grid to restart;
we will test this further to verify.
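
The file rename can be sketched as a small script. This is a hedged
illustration of the experiment, not a verified recovery procedure: it assumes
the node is stopped, that the cache folder is named "cache-<name>" under the
node's persistence folder, and the function name is hypothetical. It renames
rather than deletes so the change can be undone.

```python
from pathlib import Path


def sideline_cache_data(node_dir, cache_name, suffix="xxx"):
    """Rename <node_dir>/cache-<cache_name>/cache_data to cache_data<suffix>.

    Returns the new path, or None if there is no cache_data file to rename.
    Assumption: the node is stopped while this runs.
    """
    source = Path(node_dir) / f"cache-{cache_name}" / "cache_data"
    if not source.is_file():
        return None
    target = source.with_name(source.name + suffix)
    # Never clobber an earlier sidelined copy.
    if target.exists():
        raise FileExistsError(f"refusing to overwrite {target}")
    source.rename(target)
    return target
```

Renaming the returned path back to "cache_data" restores the original state
if the workaround turns out not to help.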

Raymond.


On Sun, Aug 27, 2023 at 5:20 PM Raymond Wilson <raymond_wil...@trimble.com>
wrote:

> I have reproduced the possible bug I reported in my earlier email.
>
> Given a running grid, having a client node in the grid attempt to create a
> cache using a DataRegionName that does not exist in the grid causes
> immediate failure in the client node with the following log output.
>
> 2023-08-27 17:08:48,520 [44] INF [ImmutableClientServer]   Completed
> partition exchange [localNode=15122bd7-bf81-44e6-a548-e70dbd9334c0,
> exchange=GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion
> [topVer=15, minorTopVer=0], evt=NODE_FAILED, evtNode=TcpDiscoveryNode
> [id=9d5ed68d-38bb-447d-aed5-189f52660716,
> consistentId=9d5ed68d-38bb-447d-aed5-189f52660716, addrs=ArrayList
> [127.0.0.1], sockAddrs=null, discPort=0, order=8, intOrder=8,
> lastExchangeTime=1693112858024, loc=false,
> ver=2.15.0#20230425-sha1:f98f7f35, isClient=true], rebalanced=false,
> done=true, newCrdFut=null], topVer=AffinityTopologyVersion [topVer=15,
> minorTopVer=0]]
> 2023-08-27 17:08:48,520 [44] INF [ImmutableClientServer]   Exchange
> timings [startVer=AffinityTopologyVersion [topVer=15, minorTopVer=0],
> resVer=AffinityTopologyVersion [topVer=15, minorTopVer=0], stage="Waiting
> in exchange queue" (14850 ms), stage="Exchange parameters initialization"
> (2 ms), stage="Determine exchange type" (3 ms), stage="Exchange done" (4
> ms), stage="Total time" (14859 ms)]
> 2023-08-27 17:08:48,522 [44] INF [ImmutableClientServer]   Exchange
> longest local stages [startVer=AffinityTopologyVersion [topVer=15,
> minorTopVer=0], resVer=AffinityTopologyVersion [topVer=15, minorTopVer=0]]
> 2023-08-27 17:08:48,524 [44] INF [ImmutableClientServer]   Finished
> exchange init [topVer=AffinityTopologyVersion [topVer=15, minorTopVer=0],
> crd=false]
> 2023-08-27 17:08:48,525 [44] INF [ImmutableClientServer]
> AffinityTopologyVersion [topVer=15, minorTopVer=0], evt=NODE_FAILED,
> evtNode=9d5ed68d-38bb-447d-aed5-189f52660716, client=true]
> Unhandled exception: Apache.Ignite.Core.Cache.CacheException: class
> org.apache.ignite.IgniteCheckedException: Failed to complete exchange
> process.
>  ---> Apache.Ignite.Core.Common.IgniteException: Failed to complete
> exchange process.
>  ---> Apache.Ignite.Core.Common.JavaException: javax.cache.CacheException:
> class org.apache.ignite.IgniteCheckedException: Failed to complete exchange
> process.
>         at
> org.apache.ignite.internal.processors.cache.GridCacheUtils.convertToCacheException(GridCacheUtils.java:1272)
>         at
> org.apache.ignite.internal.IgniteKernal.getOrCreateCache0(IgniteKernal.java:2278)
>         at
> org.apache.ignite.internal.IgniteKernal.getOrCreateCache(IgniteKernal.java:2242)
>         at
> org.apache.ignite.internal.processors.platform.PlatformProcessorImpl.processInStreamOutObject(PlatformProcessorImpl.java:643)
>         at
> org.apache.ignite.internal.processors.platform.PlatformTargetProxyImpl.inStreamOutObject(PlatformTargetProxyImpl.java:79)
> Caused by: class org.apache.ignite.IgniteCheckedException: Failed to
> complete exchange process.
>         at
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.createExchangeException(GridDhtPartitionsExchangeFuture.java:3709)
>         at
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.sendExchangeFailureMessage(GridDhtPartitionsExchangeFuture.java:3737)
>         at
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.finishExchangeOnCoordinator(GridDhtPartitionsExchangeFuture.java:3832)
>         at
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onAllReceived(GridDhtPartitionsExchangeFuture.java:3813)
>         at
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.distributedExchange(GridDhtPartitionsExchangeFuture.java:1796)
>         at
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:1053)
>         at
> org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body0(GridCachePartitionExchangeManager.java:3348)
>         at
> org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:3182)
>         at
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:125)
>         at java.base/java.lang.Thread.run(Thread.java:829)
>         Suppressed: class org.apache.ignite.IgniteCheckedException: Failed
> to initialize exchange locally
> [locNodeId=e9325b04-00fa-452e-9796-989b47b860ea]
>                 at
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onCacheChangeRequest(GridDhtPartitionsExchangeFuture.java:1483)
>                 at
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:979)
>                 ... 4 more
>         Caused by: class org.apache.ignite.IgniteCheckedException:
> Requested DataRegion is not configured: Default-Mutable
>                 at
> org.apache.ignite.internal.processors.cache.persistence.IgniteCacheDatabaseSharedManager.dataRegion(IgniteCacheDatabaseSharedManager.java:896)
>                 at
> org.apache.ignite.internal.processors.cache.GridCacheProcessor.startCacheGroup(GridCacheProcessor.java:2463)
>                 at
> org.apache.ignite.internal.processors.cache.GridCacheProcessor.getOrCreateCacheGroupContext(GridCacheProcessor.java:2181)
>                 at
> org.apache.ignite.internal.processors.cache.GridCacheProcessor.prepareCacheContext(GridCacheProcessor.java:1991)
>                 at
> org.apache.ignite.internal.processors.cache.GridCacheProcessor.prepareCacheStart(GridCacheProcessor.java:1926)
>                 at
> org.apache.ignite.internal.processors.cache.GridCacheProcessor.lambda$prepareStartCaches$55a0e703$1(GridCacheProcessor.java:1801)
>                 at
> org.apache.ignite.internal.processors.cache.GridCacheProcessor.lambda$prepareStartCachesIfPossible$16(GridCacheProcessor.java:1771)
>                 at
> org.apache.ignite.internal.processors.cache.GridCacheProcessor.prepareStartCaches(GridCacheProcessor.java:1798)
>                 at
> org.apache.ignite.internal.processors.cache.GridCacheProcessor.prepareStartCachesIfPossible(GridCacheProcessor.java:1769)
>                 at
> org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager.processCacheStartRequests(CacheAffinitySharedManager.java:1000)
>                 at
> org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager.onCacheChangeRequest(CacheAffinitySharedManager.java:886)
>                 at
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onCacheChangeRequest(GridDhtPartitionsExchangeFuture.java:1472)
>                 ... 5 more
>
>    at Apache.Ignite.Core.Impl.Unmanaged.Jni.Env.ExceptionCheck()
>    at Apache.Ignite.Core.Impl.Unmanaged.Jni.Env.CallObjectMethod(GlobalRef
> obj, IntPtr methodId, Int64* argsPtr)
>    at
> Apache.Ignite.Core.Impl.Unmanaged.UnmanagedUtils.TargetInStreamOutObject(GlobalRef
> target, Int32 opType, Int64 inMemPtr)
>    at Apache.Ignite.Core.Impl.PlatformJniTarget.InStreamOutObject(Int32
> type, Action`1 writeAction)
>    --- End of inner exception stack trace ---
>    --- End of inner exception stack trace ---
>    at Apache.Ignite.Core.Impl.PlatformJniTarget.InStreamOutObject(Int32
> type, Action`1 writeAction)
>    at Apache.Ignite.Core.Impl.PlatformTargetAdapter.DoOutOpObject(Int32
> type, Action`1 action)
>    at
> Apache.Ignite.Core.Impl.Ignite.GetOrCreateCache[TK,TV](CacheConfiguration
> configuration, NearCacheConfiguration nearConfiguration,
> PlatformCacheConfiguration platformCacheConfiguration, Op op)
>    at
> Apache.Ignite.Core.Impl.Ignite.GetOrCreateCache[TK,TV](CacheConfiguration
> configuration, NearCacheConfiguration nearConfiguration,
> PlatformCacheConfiguration platformCacheConfiguration)
>    at
> Apache.Ignite.Core.Impl.Ignite.GetOrCreateCache[TK,TV](CacheConfiguration
> configuration, NearCacheConfiguration nearConfiguration)
>    at
> Apache.Ignite.Core.Impl.Ignite.GetOrCreateCache[TK,TV](CacheConfiguration
> configuration)
>
>
> This failure causes issues in the server nodes in the grid, which now fail
> to restart with errors such as the one below (shown for the incorrectly
> created cache) that are repeated for every defined cache in the grid:
>
> 2023-08-27 17:11:36,882 [42] INF [ImmutableCacheComputeServer]   Can not
> finish proxy initialization because proxy does not exist,
> cacheName=SiteModelMetadata,
> localNodeId=3d4a75e8-174d-4947-877e-e45784d8d08d
> 2
>
> At this point the grid is now unusable.
>
> To summarise: Attempted creation of a cache with an unknown DataRegionName
> causes immediate and unrecovered failure in the entire grid.
>
> Raymond.
>
>
> On Fri, Aug 25, 2023 at 7:47 PM Raymond Wilson <raymond_wil...@trimble.com>
> wrote:
>
>> We believe we had some code on a dev environment attempt to create a
>> cache that was intended for another Ignite environment.
>>
>> The creation of this cache would have failed (at least) because the data
>> region referenced in the cache configuration does not exist on that
>> environment.
>>
>> A subsequent restart of the environment some time later started failing
>> to initialise nodes on which the failed cache would have been stored had it
>> succeeded.
>>
>> The failing nodes report this in the log:
>>
>> 2023-08-25 04:20:24,540 [44] WRN [ImmutableCacheComputeServer]   Cache
>> can not be started : cache=SiteModelMetadata
>>
>> 2023-08-25 04:20:11,265 [1] WRN [ImmutableCacheComputeServer]   WAL
>> segment tail reached. [idx=414, isWorkDir=true,
>> serVer=org.apache.ignite.internal.processors.cache.persistence.wal.serializer.RecordV2Serializer@c3719e5,
>> actualFilePtr=WALPointer [idx=414, fileOff=452480679, len=0]]
>>
>> This error implies that (somehow) Ignite considers this to be a cache
>> existing in the grid and is attempting to set it up.
>>
>> Raymond.
>>
>>
>
> --
> <http://www.trimble.com/>
> Raymond Wilson
> Trimble Distinguished Engineer, Civil Construction Software (CCS)
> 11 Birmingham Drive | Christchurch, New Zealand
> raymond_wil...@trimble.com
>
>
>


-- 
<http://www.trimble.com/>
Raymond Wilson
Trimble Distinguished Engineer, Civil Construction Software (CCS)
11 Birmingham Drive | Christchurch, New Zealand
raymond_wil...@trimble.com
