Among other problems I encountered, 1 problem that seemed to affect
*Integration
Tests* as I described was that the *Singleton* cache reference was not
cleaned up in a timely manner. Therefore, starting a fresh cache instance
(without coordination) in another *Integration Tests* would occasionally
throw a CacheExistsException (IIRC).

It would be roughly equivalent to ...

Cache cache = new CacheFactory().create();
// create more cache objects (Regions, Indexes, etc)
cache.close();
cache = new CacheFactory().create();  // EXCEPTION!!!

There is a lot of stuff (even asynchronous things) going on inside cache
close that can take time.  Even deallocation of system resources does not
always happen in a timely manner.

Kirk will undoubtedly remember other things he encountered as well.

-j


On Tue, Apr 14, 2020 at 3:45 PM Mark Hanson <mhan...@pivotal.io> wrote:

> I believe it is because of disk persistence among other things. Kirk would
> know for sure. He fixed the issue and his PR got shutdown.
> I totally support just fixing the bug.
>
> Let’s give Kirk a chance to chime in.
>
> Thanks,
> Mark
>
> > On Apr 14, 2020, at 3:39 PM, Dan Smith <dsm...@pivotal.io> wrote:
> >
> > IMO if it's not currently synchronous, that's just a bug that needs to be
> > fixed. If folks can provide details on what actually was asynchronous or
> > the particular test failures they saw, that would be helpful.
> >
> > Previously, when this came up it looks like Kirk found that close would
> not
> > wait for a different call to close() issued by a different thread [1]. Is
> > that still the bug we are running into? One that thread, it seems like we
> > also had a consensus we should just fix bugs with Cache.close:
> >
> > -Dan
> >
> > 1.
> >
> https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.apache.org%2Fx%2Fthread.html%2Ff385a6dd51209e2706c68c9821412a6f22ebef3e809636060ac0bf55%40%253Cdev.geode.apache.org%253E&amp;data=02%7C01%7Chansonm%40vmware.com%7C7a43463ab53c416234d908d7e0c4cc6b%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637225008165230328&amp;sdata=GD77kAubDDWfP93zjYsNP61VMN4%2FKBAHcq95GwjeMBc%3D&amp;reserved=0
> >
> > On Tue, Apr 14, 2020 at 3:23 PM John Blum <jb...@pivotal.io> wrote:
> >
> >> My first thought is cache close (i.e. RegionService.close()  should be
> >> synchronous (by default), perhaps, with non-blocking options or options
> to
> >> wait for a set timeout as Jake suggested.
> >>
> >> This is a problem for *Integration Tests* (that start a peer cache
> >> instance, in-process or standalone) in general and not simply just
> >> "distributed" tests!  This is the reason I built support for this in
> >> *Spring
> >> Test for Apache Geode* (STDG) since subsequent tests requiring a peer
> cache
> >> instance (or CacheServer) may conflict with each other, especially
> given 1)
> >> the cache instance is a *Singleton* and 2) it is ideal to not have to
> >> restart the JVM between, even for *Integration Tests*, however, turns
> out
> >> to be a really common practice. *#ugh*  However, without proper
> >> coordination this can be a real problem, hence STDG's support.  Even
> when
> >> forking JVMs, you still have to be careful to wait in certain cases, as
> you
> >> could run into other conflicts (e.g. BindExceptions if not varying port
> >> numbers and such).  For all these reasons and more, it is important that
> >> the cache has fully shutdown and released all its resources.
> >>
> >> Also, while we are on this topic, I think it would be useful to have a
> >> dedicated interface for the cache instance lifecycle.  It's unfortunate
> the
> >> CacheListener interface is already taken for Region events. *#sigh*
> >>
> >> The interface might be something like...
> >>
> >> interface CacheLifecycleListener {
> >>
> >>  default void isStarting(CacheEvent event) {}
> >>
> >>  default void onStart(CacheEvent event) {}
> >>
> >>  default void isClosing(CacheEvent event) {}
> >>
> >>  default void onClose(CacheEvent event) {}
> >>
> >>  ...
> >>
> >> }
> >>
> >> -j
> >>
> >> On Tue, Apr 14, 2020 at 3:21 PM Jason Huynh <jhu...@pivotal.io> wrote:
> >>
> >>> The isClosed flag and method might be used currently more as an
> >> isClosing.
> >>> The GemFireCacheImpl.isClosed() method is actually returning isClosing.
> >>> Whatever change to isClosed that will be made, will have to properly
> >> handle
> >>> cases where it's meant to be treated as isClosing().
> >>>
> >>> On Tue, Apr 14, 2020 at 3:09 PM Mark Hanson <hans...@vmware.com>
> wrote:
> >>>
> >>>> Hi Jake,
> >>>>
> >>>> For Option 6: We could fix isClosed as well. That is a great
> >> suggestion.
> >>>> Currently, it returns almost immediately.
> >>>> I like your options though....
> >>>>
> >>>> Any other thoughts?
> >>>>
> >>>> Any preferences? It think any of the current options seem better than
> >> the
> >>>> current situation as long as we fix isClosed.
> >>>>
> >>>> Thanks,
> >>>> Mark
> >>>> ________________________________
> >>>> From: Jacob Barrett <jbarr...@pivotal.io>
> >>>> Sent: Tuesday, April 14, 2020 2:30 PM
> >>>> To: dev@geode.apache.org <dev@geode.apache.org>
> >>>> Subject: Re: [Discuss] Cache.close synchronous is not synchronous, but
> >>>> code still expects it to be....
> >>>>
> >>>> Option 4: Cache.closeAndWait(long timeout, TimeUnit unit) - Closes and
> >>>> waits until it is really closed.
> >>>> Option 5: Cache.close(Runnable closedCalleback) - Runs callback after
> >>>> cache is really close.
> >>>> Option 6: cache.close(); while (!cache.isClosed());
> >>>>
> >>>>
> >>>>> On Apr 14, 2020, at 2:11 PM, Mark Hanson <mhan...@pivotal.io> wrote:
> >>>>>
> >>>>> Hi All,
> >>>>>
> >>>>> I know that we have discussed this once before, but I think it bears
> >>>> repeating. We have test code that assumes cache.close is synchronous.
> >> It
> >>> is
> >>>> not. Not even close. I would like discuss some possible changes.
> >>>>>
> >>>>> Option 1. Call it what it is.  Deprecate Cache.close and create a new
> >>>> method called asyncClose to replace it. Simple and descriptive.
> >>>>> Option 2. Fix cache close so it is synchronous. Some might say that
> >> we
> >>>> are going to break behavior, but I doubt any user relies on the fact
> >> that
> >>>> it is asynchronous. That would be dangerous in and of itself.
> >> Obviously,
> >>> we
> >>>> don’t want to change behavior, but there have been a number of
> >>> distributed
> >>>> tests that have failed for this. If internal to the code devs don’t
> get
> >>> it
> >>>> right, where does that leave users.
> >>>>> Option 3. Status quo.
> >>>>>
> >>>>> What do you think? Are there other options I am missing?
> >>>>>
> >>>>> Thanks,
> >>>>> Mark
> >>>>>
> >>>>
> >>>>
> >>>
> >>
> >>
> >> --
> >> -John
> >> Spring Data Team
> >>
>
>

-- 
-John
Spring Data Team

Reply via email to