[ https://issues.apache.org/jira/browse/JCS-242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Thomas Vandahl resolved JCS-242.
--------------------------------
Fix Version/s: jcs-4.0
Resolution: Fixed
> Lateral Cache init timing bug
> -----------------------------
>
> Key: JCS-242
> URL: https://issues.apache.org/jira/browse/JCS-242
> Project: Commons JCS
> Issue Type: Bug
> Components: TCP Lateral Cache
> Affects Versions: jcs-3.2.1
> Reporter: Lukas Doros
> Assignee: Thomas Vandahl
> Priority: Major
> Fix For: jcs-4.0
>
> Attachments: lateral-bug.zip
>
>
> When a Lateral Cache starts up while a remote node is unavailable, the
> connection to that node is never retried.
> *Scenario:*
> Two nodes, A and B. Both are shut down.
> A starts; B is not available yet, so A's connection attempt fails.
> B starts and connects to A.
> A never tries to connect to B again.
> *Reason/Problem:*
> LateralTCPCacheFactory (line 278 following)
> {code:java}
> newService = new LateralTCPService<>(lca, elementSerializer);
> }
> catch ( final IOException ex )
> {
> // Failed to connect to the lateral server.
> // Configure this LateralCacheManager instance to use the
> // "zombie" services.
> log.error( "Failure, lateral instance will use zombie service", ex );
> newService = new ZombieCacheServiceNonLocal<>(lca.getZombieQueueMaxSize());
> // Notify the cache monitor about the error, and kick off
> // the recovery process.
> monitor.notifyError();
> } {code}
> Creating the new LateralTCPService fails, the monitor is notified of the
> error and is expected to retry the connection.
> However, when the monitor immediately tries to reconnect, the attempt fails.
>
> LateralCacheMonitor (line 113 following)
> {code:java}
> caches.forEach((cacheName, cache) -> {
> if (cache.getStatus() == CacheStatus.ERROR)
> {
> log.info( "Found LateralCacheNoWait in error, " + cacheName );
> final ITCPLateralCacheAttributes lca =
> (ITCPLateralCacheAttributes)
> cache.getAuxiliaryCacheAttributes();
> // Get service instance
> final ICacheServiceNonLocal<Object, Object> cacheService =
> factory.getCSNLInstance(lca, cache.getElementSerializer());
> // If we can't fix them, just skip and re-try in the
> // next round.
> if (!(cacheService instanceof ZombieCacheServiceNonLocal))
> {
> cache.fixCache(cacheService);
> }
> }
> }); {code}
> At this point 'caches' is still empty, so nothing is done and 'allright' is
> set to true.
>
> Back to LateralTCPCacheFactory (line 111 following).
> At line 114 'caches' is populated, but that is too late.
> {code:java}
> final LateralCacheNoWait<K, V> lateralNoWait = createCacheNoWait(lacClone,
> cacheEventLogger, elementSerializer); // <-- exception is caught in here and the monitor is notified
> addListenerIfNeeded( lacClone, cacheMgr, elementSerializer );
> monitorCache(lateralNoWait); // <-- here 'caches' is populated.
> noWaits.add( lateralNoWait ); {code}
>
> *Possible Solution:*
> {code:java}
> final LateralCacheNoWait<K, V> lateralNoWait = createCacheNoWait(lacClone,
> cacheEventLogger, elementSerializer);
> addListenerIfNeeded( lacClone, cacheMgr, elementSerializer );
> monitorCache(lateralNoWait);
> noWaits.add( lateralNoWait );
> // CHANGE START
> if (lateralNoWait.getStatus() == CacheStatus.ERROR) {
> monitor.notifyError();
> }
> // CHANGE END{code}
> This notifies the monitor only after 'caches' has been populated, so the
> cache in error is visible to the monitor when it runs.
>
> *Addendum:*
> I've attached a project with a test case for this problem.
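
The ordering problem described in the quoted report can be reproduced in isolation. The following is a minimal sketch, not JCS code: FakeCache, FakeMonitor and repairedCaches are hypothetical stand-ins for LateralCacheNoWait, LateralCacheMonitor and the factory logic. It assumes the monitor runs synchronously and that a repair always succeeds once a cache in error is found, whereas the real monitor runs in its own thread and a repair depends on the remote node being reachable.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class LateralInitOrderSketch {

    enum Status { ALIVE, ERROR }

    /** Hypothetical stand-in for LateralCacheNoWait. */
    static final class FakeCache {
        final String name;
        Status status;

        FakeCache(String name, Status status) {
            this.name = name;
            this.status = status;
        }
    }

    /** Hypothetical stand-in for LateralCacheMonitor (synchronous for simplicity). */
    static final class FakeMonitor {
        private final Map<String, FakeCache> caches = new ConcurrentHashMap<>();
        int repaired;

        void addCache(FakeCache cache) {
            caches.put(cache.name, cache);
        }

        /** Scans the registered caches and "repairs" those in error. */
        void notifyError() {
            caches.forEach((name, cache) -> {
                if (cache.status == Status.ERROR) {
                    cache.status = Status.ALIVE; // assumption: repair always succeeds
                    repaired++;
                }
            });
        }
    }

    /**
     * Simulates cache creation with a failed initial connection.
     *
     * @param notifyAfterRegistration false = order from the report,
     *                                true = order from the proposed change
     * @return number of caches the monitor repaired
     */
    public static int repairedCaches(boolean notifyAfterRegistration) {
        FakeMonitor monitor = new FakeMonitor();
        FakeCache cache = new FakeCache("testCache", Status.ERROR);

        if (!notifyAfterRegistration) {
            // Reported order: the monitor is notified while 'caches' is still empty.
            monitor.notifyError();
        }

        monitor.addCache(cache); // corresponds to monitorCache(lateralNoWait)

        if (notifyAfterRegistration && cache.status == Status.ERROR) {
            // Proposed order: notify once the cache is registered.
            monitor.notifyError();
        }
        return monitor.repaired;
    }

    public static void main(String[] args) {
        System.out.println("notify before registration: " + repairedCaches(false));
        System.out.println("notify after registration:  " + repairedCaches(true));
    }
}
```

With notification before registration the sketch repairs no caches; with notification after registration it repairs the one cache in error. This mirrors the proposed change only in ordering and says nothing about how the fix was actually implemented for jcs-4.0.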