Re: [Discuss] Cache.close synchronous is not synchronous, but code still expects it to be....

Bruce Schuchardt Thu, 16 Apr 2020 07:51:02 -0700

I've run into the CacheExistsException problem that John mentioned many times, 
or the associated "A connection to a distributed system already exists in this 
VM" exception thrown by InternalDistributedSystem.


On 4/14/20, 4:02 PM, "John Blum" <jb...@pivotal.io> wrote:

    Among other problems I encountered, 1 problem that seemed to affect
    *Integration
    Tests* as I described was that the *Singleton* cache reference was not
    cleaned up in a timely manner. Therefore, starting a fresh cache instance
    (without coordination) in another *Integration Tests* would occasionally
    throw a CacheExistsException (IIRC).
    
    It would be roughly equivalent to ...
    
    Cache cache = new CacheFactory().create();
    // create more cache objects (Regions, Indexes, etc)
    cache.close();
    cache = new CacheFactory().create();  // EXCEPTION!!!
    
    There is a lot of stuff (even asynchronous things) going on inside cache
    close that can take time.  Even deallocation of system resources does not
    always happen in a timely manner.
    
    Kirk will undoubtedly remember other things he encountered as well.
    
    -j
    
    
    On Tue, Apr 14, 2020 at 3:45 PM Mark Hanson <mhan...@pivotal.io> wrote:
    
    > I believe it is because of disk persistence among other things. Kirk would
    > know for sure. He fixed the issue and his PR got shutdown.
    > I totally support just fixing the bug.
    >
    > Let’s give Kirk a chance to chime in.
    >
    > Thanks,
    > Mark
    >
    > > On Apr 14, 2020, at 3:39 PM, Dan Smith <dsm...@pivotal.io> wrote:
    > >
    > > IMO if it's not currently synchronous, that's just a bug that needs to 
be
    > > fixed. If folks can provide details on what actually was asynchronous or
    > > the particular test failures they saw, that would be helpful.
    > >
    > > Previously, when this came up it looks like Kirk found that close would
    > not
    > > wait for a different call to close() issued by a different thread [1]. 
Is
    > > that still the bug we are running into? One that thread, it seems like 
we
    > > also had a consensus we should just fix bugs with Cache.close:
    > >
    > > -Dan
    > >
    > > 1.
    > >
    > 
https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.apache.org%2Fx%2Fthread.html%2Ff385a6dd51209e2706c68c9821412a6f22ebef3e809636060ac0bf55%40%253Cdev.geode.apache.org%253E&amp;data=02%7C01%7Chansonm%40vmware.com%7C7a43463ab53c416234d908d7e0c4cc6b%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637225008165230328&amp;sdata=GD77kAubDDWfP93zjYsNP61VMN4%2FKBAHcq95GwjeMBc%3D&amp;reserved=0
    > >
    > > On Tue, Apr 14, 2020 at 3:23 PM John Blum <jb...@pivotal.io> wrote:
    > >
    > >> My first thought is cache close (i.e. RegionService.close()  should be
    > >> synchronous (by default), perhaps, with non-blocking options or options
    > to
    > >> wait for a set timeout as Jake suggested.
    > >>
    > >> This is a problem for *Integration Tests* (that start a peer cache
    > >> instance, in-process or standalone) in general and not simply just
    > >> "distributed" tests!  This is the reason I built support for this in
    > >> *Spring
    > >> Test for Apache Geode* (STDG) since subsequent tests requiring a peer
    > cache
    > >> instance (or CacheServer) may conflict with each other, especially
    > given 1)
    > >> the cache instance is a *Singleton* and 2) it is ideal to not have to
    > >> restart the JVM between, even for *Integration Tests*, however, turns
    > out
    > >> to be a really common practice. *#ugh*  However, without proper
    > >> coordination this can be a real problem, hence STDG's support.  Even
    > when
    > >> forking JVMs, you still have to be careful to wait in certain cases, as
    > you
    > >> could run into other conflicts (e.g. BindExceptions if not varying port
    > >> numbers and such).  For all these reasons and more, it is important 
that
    > >> the cache has fully shutdown and released all its resources.
    > >>
    > >> Also, while we are on this topic, I think it would be useful to have a
    > >> dedicated interface for the cache instance lifecycle.  It's unfortunate
    > the
    > >> CacheListener interface is already taken for Region events. *#sigh*
    > >>
    > >> The interface might be something like...
    > >>
    > >> interface CacheLifecycleListener {
    > >>
    > >>  default void isStarting(CacheEvent event) {}
    > >>
    > >>  default void onStart(CacheEvent event) {}
    > >>
    > >>  default void isClosing(CacheEvent event) {}
    > >>
    > >>  default void onClose(CacheEvent event) {}
    > >>
    > >>  ...
    > >>
    > >> }
    > >>
    > >> -j
    > >>
    > >> On Tue, Apr 14, 2020 at 3:21 PM Jason Huynh <jhu...@pivotal.io> wrote:
    > >>
    > >>> The isClosed flag and method might be used currently more as an
    > >> isClosing.
    > >>> The GemFireCacheImpl.isClosed() method is actually returning 
isClosing.
    > >>> Whatever change to isClosed that will be made, will have to properly
    > >> handle
    > >>> cases where it's meant to be treated as isClosing().
    > >>>
    > >>> On Tue, Apr 14, 2020 at 3:09 PM Mark Hanson <hans...@vmware.com>
    > wrote:
    > >>>
    > >>>> Hi Jake,
    > >>>>
    > >>>> For Option 6: We could fix isClosed as well. That is a great
    > >> suggestion.
    > >>>> Currently, it returns almost immediately.
    > >>>> I like your options though....
    > >>>>
    > >>>> Any other thoughts?
    > >>>>
    > >>>> Any preferences? It think any of the current options seem better than
    > >> the
    > >>>> current situation as long as we fix isClosed.
    > >>>>
    > >>>> Thanks,
    > >>>> Mark
    > >>>> ________________________________
    > >>>> From: Jacob Barrett <jbarr...@pivotal.io>
    > >>>> Sent: Tuesday, April 14, 2020 2:30 PM
    > >>>> To: dev@geode.apache.org <dev@geode.apache.org>
    > >>>> Subject: Re: [Discuss] Cache.close synchronous is not synchronous, 
but
    > >>>> code still expects it to be....
    > >>>>
    > >>>> Option 4: Cache.closeAndWait(long timeout, TimeUnit unit) - Closes 
and
    > >>>> waits until it is really closed.
    > >>>> Option 5: Cache.close(Runnable closedCalleback) - Runs callback after
    > >>>> cache is really close.
    > >>>> Option 6: cache.close(); while (!cache.isClosed());
    > >>>>
    > >>>>
    > >>>>> On Apr 14, 2020, at 2:11 PM, Mark Hanson <mhan...@pivotal.io> wrote:
    > >>>>>
    > >>>>> Hi All,
    > >>>>>
    > >>>>> I know that we have discussed this once before, but I think it bears
    > >>>> repeating. We have test code that assumes cache.close is synchronous.
    > >> It
    > >>> is
    > >>>> not. Not even close. I would like discuss some possible changes.
    > >>>>>
    > >>>>> Option 1. Call it what it is.  Deprecate Cache.close and create a 
new
    > >>>> method called asyncClose to replace it. Simple and descriptive.
    > >>>>> Option 2. Fix cache close so it is synchronous. Some might say that
    > >> we
    > >>>> are going to break behavior, but I doubt any user relies on the fact
    > >> that
    > >>>> it is asynchronous. That would be dangerous in and of itself.
    > >> Obviously,
    > >>> we
    > >>>> don’t want to change behavior, but there have been a number of
    > >>> distributed
    > >>>> tests that have failed for this. If internal to the code devs don’t
    > get
    > >>> it
    > >>>> right, where does that leave users.
    > >>>>> Option 3. Status quo.
    > >>>>>
    > >>>>> What do you think? Are there other options I am missing?
    > >>>>>
    > >>>>> Thanks,
    > >>>>> Mark
    > >>>>>
    > >>>>
    > >>>>
    > >>>
    > >>
    > >>
    > >> --
    > >> -John
    > >> Spring Data Team
    > >>
    >
    >
    
    -- 
    -John
    Spring Data Team

Re: [Discuss] Cache.close synchronous is not synchronous, but code still expects it to be....

Reply via email to