[ 
https://issues.apache.org/jira/browse/JCS-242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Vandahl resolved JCS-242.
--------------------------------
    Fix Version/s: jcs-4.0
       Resolution: Fixed

> Lateral Cache init timing bug
> -----------------------------
>
>                 Key: JCS-242
>                 URL: https://issues.apache.org/jira/browse/JCS-242
>             Project: Commons JCS
>          Issue Type: Bug
>          Components: TCP Lateral Cache
>    Affects Versions: jcs-3.2.1
>            Reporter: Lukas Doros
>            Assignee: Thomas Vandahl
>            Priority: Major
>             Fix For: jcs-4.0
>
>         Attachments: lateral-bug.zip
>
>
> When Lateral Cache starts up and a remote node is not available, the 
> connection to that node is never retried.
> *Scenario:*
> 2 nodes, A and B. Both are shutdown.
> A is starting, B is not available yet, therefore connecting fails.
> B starts, can connect to A.
> A will not try again.
> *Reason/Problem:*
> LateralTCPCacheFactory (line 278 following)
> {code:java}
>     newService = new LateralTCPService<>(lca, elementSerializer);
> }
> catch ( final IOException ex )
> {
>     // Failed to connect to the lateral server.
>     // Configure this LateralCacheManager instance to use the
>     // "zombie" services.
>     log.error( "Failure, lateral instance will use zombie service", ex );
>     newService = new ZombieCacheServiceNonLocal<>(lca.getZombieQueueMaxSize());
>     // Notify the cache monitor about the error, and kick off
>     // the recovery process.
>     monitor.notifyError();
> } {code}
> new LateralTCPService fails, monitor is notified about the issue and is 
> expected to retry the connect.
> BUT when the monitor immediately tries to reconnect, it fails.
>  
> LateralCacheMonitor (line 113 following)
> {code:java}
> caches.forEach((cacheName, cache) -> {
>     if (cache.getStatus() == CacheStatus.ERROR)
>     {
>         log.info( "Found LateralCacheNoWait in error, " + cacheName );
>         final ITCPLateralCacheAttributes lca =
>                 (ITCPLateralCacheAttributes) cache.getAuxiliaryCacheAttributes();
>         // Get service instance
>         final ICacheServiceNonLocal<Object, Object> cacheService =
>                 factory.getCSNLInstance(lca, cache.getElementSerializer());
>         // If we can't fix them, just skip and re-try in the
>         // next round.
>         if (!(cacheService instanceof ZombieCacheServiceNonLocal))
>         {
>             cache.fixCache(cacheService);
>         }
>     }
> }); {code}
> At this time, "caches" is empty, nothing is done and 'allright' is set to 
> true.
>  
> Back to LateralTCPCacheFactory (line 111 following).
> At line 114 'caches' is populated, but that's too late.
> {code:java}
> // inside createCacheNoWait() the exception is caught and the monitor notified
> final LateralCacheNoWait<K, V> lateralNoWait =
>         createCacheNoWait(lacClone, cacheEventLogger, elementSerializer);
> addListenerIfNeeded( lacClone, cacheMgr, elementSerializer );
> monitorCache(lateralNoWait); // <-- here 'caches' is populated.
> noWaits.add( lateralNoWait ); {code}
>  
> *Possible Solution:*
> {code:java}
> final LateralCacheNoWait<K, V> lateralNoWait =
>         createCacheNoWait(lacClone, cacheEventLogger, elementSerializer);
> addListenerIfNeeded( lacClone, cacheMgr, elementSerializer );
> monitorCache(lateralNoWait);
> noWaits.add( lateralNoWait );
> // CHANGE START
> if (lateralNoWait.getStatus() == CacheStatus.ERROR) {
>     monitor.notifyError();
> } 
> // CHANGE END{code}
> This notifies the monitor only after 'caches' has been populated.
>  
> *Addendum:*
> I've attached a project with a test case for this problem.
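
The ordering problem described above can be reproduced in isolation. The following is only a minimal sketch, not the JCS code: FakeCache, FakeMonitor, and both helper methods are hypothetical stand-ins, and the repair pass runs synchronously here, whereas the real monitor does its work on a background thread. It shows that a notification sent before the cache is registered finds nothing to repair, while one sent after registration does.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class LateralInitOrderSketch {
    enum Status { ALIVE, ERROR }

    /** Hypothetical stand-in for LateralCacheNoWait. */
    static final class FakeCache {
        volatile Status status;
        FakeCache(Status status) { this.status = status; }
    }

    /** Hypothetical stand-in for LateralCacheMonitor. */
    static final class FakeMonitor {
        final Map<String, FakeCache> caches = new ConcurrentHashMap<>();
        int repaired = 0;

        void register(String name, FakeCache cache) {
            caches.put(name, cache);
        }

        // Assumption: one inline repair pass per notification.
        void notifyError() {
            caches.forEach((name, cache) -> {
                if (cache.status == Status.ERROR) {
                    cache.status = Status.ALIVE; // pretend the peer is reachable now
                    repaired++;
                }
            });
        }
    }

    /** Order from the report: notify during creation, register afterwards. */
    public static int originalOrder() {
        FakeMonitor monitor = new FakeMonitor();
        FakeCache cache = new FakeCache(Status.ERROR); // initial connect failed
        monitor.notifyError();                         // 'caches' is still empty
        monitor.register("region", cache);
        return monitor.repaired;
    }

    /** Proposed order: register first, then notify if the cache is in error. */
    public static int proposedOrder() {
        FakeMonitor monitor = new FakeMonitor();
        FakeCache cache = new FakeCache(Status.ERROR);
        monitor.register("region", cache);
        if (cache.status == Status.ERROR) {
            monitor.notifyError();
        }
        return monitor.repaired;
    }

    public static void main(String[] args) {
        System.out.println("original order repaired: " + originalOrder());
        System.out.println("proposed order repaired: " + proposedOrder());
    }
}
```

In this sketch the original order repairs 0 caches and the proposed order repairs 1, which mirrors the behaviour reported for node A.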



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
