I've run into the CacheExistsException problem that John mentioned many times, or the associated "A connection to a distributed system already exists in this VM" exception thrown by InternalDistributedSystem.
On 4/14/20, 4:02 PM, "John Blum" <jb...@pivotal.io> wrote: Among other problems I encountered, 1 problem that seemed to affect *Integration Tests* as I described was that the *Singleton* cache reference was not cleaned up in a timely manner. Therefore, starting a fresh cache instance (without coordination) in another *Integration Tests* would occasionally throw a CacheExistsException (IIRC). It would be roughly equivalent to ... Cache cache = new CacheFactory().create(); // create more cache objects (Regions, Indexes, etc) cache.close(); cache = new CacheFactory().create(); // EXCEPTION!!! There is a lot of stuff (even asynchronous things) going on inside cache close that can take time. Even deallocation of system resources does not always happen in a timely manner. Kirk will undoubtedly remember other things he encountered as well. -j On Tue, Apr 14, 2020 at 3:45 PM Mark Hanson <mhan...@pivotal.io> wrote: > I believe it is because of disk persistence among other things. Kirk would > know for sure. He fixed the issue and his PR got shutdown. > I totally support just fixing the bug. > > Let’s give Kirk a chance to chime in. > > Thanks, > Mark > > > On Apr 14, 2020, at 3:39 PM, Dan Smith <dsm...@pivotal.io> wrote: > > > > IMO if it's not currently synchronous, that's just a bug that needs to be > > fixed. If folks can provide details on what actually was asynchronous or > > the particular test failures they saw, that would be helpful. > > > > Previously, when this came up it looks like Kirk found that close would > not > > wait for a different call to close() issued by a different thread [1]. Is > > that still the bug we are running into? One that thread, it seems like we > > also had a consensus we should just fix bugs with Cache.close: > > > > -Dan > > > > 1. > > > https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.apache.org%2Fx%2Fthread.html%2Ff385a6dd51209e2706c68c9821412a6f22ebef3e809636060ac0bf55%40%253Cdev.geode.apache.org%253E&data=02%7C01%7Chansonm%40vmware.com%7C7a43463ab53c416234d908d7e0c4cc6b%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637225008165230328&sdata=GD77kAubDDWfP93zjYsNP61VMN4%2FKBAHcq95GwjeMBc%3D&reserved=0 > > > > On Tue, Apr 14, 2020 at 3:23 PM John Blum <jb...@pivotal.io> wrote: > > > >> My first thought is cache close (i.e. RegionService.close() should be > >> synchronous (by default), perhaps, with non-blocking options or options > to > >> wait for a set timeout as Jake suggested. > >> > >> This is a problem for *Integration Tests* (that start a peer cache > >> instance, in-process or standalone) in general and not simply just > >> "distributed" tests! This is the reason I built support for this in > >> *Spring > >> Test for Apache Geode* (STDG) since subsequent tests requiring a peer > cache > >> instance (or CacheServer) may conflict with each other, especially > given 1) > >> the cache instance is a *Singleton* and 2) it is ideal to not have to > >> restart the JVM between, even for *Integration Tests*, however, turns > out > >> to be a really common practice. *#ugh* However, without proper > >> coordination this can be a real problem, hence STDG's support. Even > when > >> forking JVMs, you still have to be careful to wait in certain cases, as > you > >> could run into other conflicts (e.g. BindExceptions if not varying port > >> numbers and such). For all these reasons and more, it is important that > >> the cache has fully shutdown and released all its resources. > >> > >> Also, while we are on this topic, I think it would be useful to have a > >> dedicated interface for the cache instance lifecycle. It's unfortunate > the > >> CacheListener interface is already taken for Region events. *#sigh* > >> > >> The interface might be something like... > >> > >> interface CacheLifecycleListener { > >> > >> default void isStarting(CacheEvent event) {} > >> > >> default void onStart(CacheEvent event) {} > >> > >> default void isClosing(CacheEvent event) {} > >> > >> default void onClose(CacheEvent event) {} > >> > >> ... > >> > >> } > >> > >> -j > >> > >> On Tue, Apr 14, 2020 at 3:21 PM Jason Huynh <jhu...@pivotal.io> wrote: > >> > >>> The isClosed flag and method might be used currently more as an > >> isClosing. > >>> The GemFireCacheImpl.isClosed() method is actually returning isClosing. > >>> Whatever change to isClosed that will be made, will have to properly > >> handle > >>> cases where it's meant to be treated as isClosing(). > >>> > >>> On Tue, Apr 14, 2020 at 3:09 PM Mark Hanson <hans...@vmware.com> > wrote: > >>> > >>>> Hi Jake, > >>>> > >>>> For Option 6: We could fix isClosed as well. That is a great > >> suggestion. > >>>> Currently, it returns almost immediately. > >>>> I like your options though.... > >>>> > >>>> Any other thoughts? > >>>> > >>>> Any preferences? It think any of the current options seem better than > >> the > >>>> current situation as long as we fix isClosed. > >>>> > >>>> Thanks, > >>>> Mark > >>>> ________________________________ > >>>> From: Jacob Barrett <jbarr...@pivotal.io> > >>>> Sent: Tuesday, April 14, 2020 2:30 PM > >>>> To: dev@geode.apache.org <dev@geode.apache.org> > >>>> Subject: Re: [Discuss] Cache.close synchronous is not synchronous, but > >>>> code still expects it to be.... > >>>> > >>>> Option 4: Cache.closeAndWait(long timeout, TimeUnit unit) - Closes and > >>>> waits until it is really closed. > >>>> Option 5: Cache.close(Runnable closedCalleback) - Runs callback after > >>>> cache is really close. > >>>> Option 6: cache.close(); while (!cache.isClosed()); > >>>> > >>>> > >>>>> On Apr 14, 2020, at 2:11 PM, Mark Hanson <mhan...@pivotal.io> wrote: > >>>>> > >>>>> Hi All, > >>>>> > >>>>> I know that we have discussed this once before, but I think it bears > >>>> repeating. We have test code that assumes cache.close is synchronous. > >> It > >>> is > >>>> not. Not even close. I would like discuss some possible changes. > >>>>> > >>>>> Option 1. Call it what it is. Deprecate Cache.close and create a new > >>>> method called asyncClose to replace it. Simple and descriptive. > >>>>> Option 2. Fix cache close so it is synchronous. Some might say that > >> we > >>>> are going to break behavior, but I doubt any user relies on the fact > >> that > >>>> it is asynchronous. That would be dangerous in and of itself. > >> Obviously, > >>> we > >>>> don’t want to change behavior, but there have been a number of > >>> distributed > >>>> tests that have failed for this. If internal to the code devs don’t > get > >>> it > >>>> right, where does that leave users. > >>>>> Option 3. Status quo. > >>>>> > >>>>> What do you think? Are there other options I am missing? > >>>>> > >>>>> Thanks, > >>>>> Mark > >>>>> > >>>> > >>>> > >>> > >> > >> > >> -- > >> -John > >> Spring Data Team > >> > > -- -John Spring Data Team