The problem is in-flight watchers and async background calls. There’s no way to cancel these and they can take time to occur - even after a recipe instance is closed.
-Jordan > On May 31, 2016, at 5:11 PM, Cameron McKenzie <[email protected]> wrote: > > Ok, running it again now. > > Is the problem that the watcher clean up for the recipes is done > asynchronously after they are closed? > > On Wed, Jun 1, 2016 at 1:35 AM, Jordan Zimmerman <[email protected] >> wrote: > >> OK - please try now. I added a loop in the “no watchers” checker. If there >> are remaining watchers, it will sleep a bit and try again. >> >> -Jordan >> >>> On May 31, 2016, at 1:33 AM, Cameron McKenzie <[email protected]> >> wrote: >>> >>> Looks like these failures are intermittent. Running them directly in >>> Eclipse they seem to be passing. I will run the whole thing again in the >>> morning and see how it goes. >>> >>> On Tue, May 31, 2016 at 2:29 PM, Cameron McKenzie < >> [email protected]> >>> wrote: >>> >>>> There are still 2 tests failing for me: >>>> >>>> FAILURE! - in >>>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache >>>> >> testKilledSession(org.apache.curator.framework.recipes.cache.TestPathChildrenCache) >>>> Time elapsed: 17.488 sec <<< FAILURE! >>>> java.lang.AssertionError: One or more child watchers are still >> registered: >>>> [/test] >>>> at >>>> >> org.apache.curator.framework.imps.TestCleanState.closeAndTestClean(TestCleanState.java:53) >>>> at >>>> >> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testKilledSession(TestPathChildrenCache.java:707) >>>> >>>> FAILURE! - in >>>> >> org.apache.curator.framework.recipes.locks.TestInterProcessSemaphoreCluster >>>> >> testCluster(org.apache.curator.framework.recipes.locks.TestInterProcessSemaphoreCluster) >>>> Time elapsed: 96.641 sec <<< FAILURE! >>>> java.lang.AssertionError: expected [true] but found [false] >>>> at org.testng.Assert.fail(Assert.java:94) >>>> at org.testng.Assert.failNotEquals(Assert.java:494) >>>> at org.testng.Assert.assertTrue(Assert.java:42) >>>> at org.testng.Assert.assertTrue(Assert.java:52) >>>> at >>>> >> org.apache.curator.framework.recipes.locks.TestInterProcessSemaphoreCluster.testCluster(TestInterProcessSemaphoreCluster.java:294) >>>> >>>> Failed tests: >>>> >>>> >> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testKilledSession(org.apache.curator.framework.recipes.cache.TestPathChildrenCache) >>>> Run 1: TestPathChildrenCache.testKilledSession:707 One or more child >>>> watchers are still registered: [/test] >>>> Run 2: PASS >>>> >>>> TestInterProcessSemaphoreCluster.testCluster:294 expected [true] but >>>> found [false] >>>> >>>> Tests run: 495, Failures: 2, Errors: 0, Skipped: 0 >>>> >>>> >>>> >>>> >>>> >>>> On Tue, May 31, 2016 at 12:52 PM, Cameron McKenzie < >> [email protected] >>>>> wrote: >>>> >>>>> Thanks, CURATOR-332 wasn't pushed. I will run the tests against that, >> and >>>>> if it's all good will merge into CURATOR-3.0 >>>>> cheers >>>>> >>>>> On Tue, May 31, 2016 at 12:32 PM, Jordan Zimmerman < >>>>> [email protected]> wrote: >>>>> >>>>>> Actually - I don’t remember if branch CURATOR-332 is merged yet. I >>>>>> made/pushed my changes in CURATOR-332 >>>>>> >>>>>> -jordan >>>>>> >>>>>>> On May 26, 2016, at 10:04 PM, Cameron McKenzie < >> [email protected]> >>>>>> wrote: >>>>>>> >>>>>>> I'm still seeing 6 failed tests that seem related to the same stuff >>>>>> after >>>>>>> merging your fix: >>>>>>> >>>>>>> Failed tests: >>>>>>> >>>>>> >> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testBasics(org.apache.curator.framework.recipes.cache.TestPathChildrenCache) >>>>>>> Run 1: TestPathChildrenCache.testBasics:863 One or more child >> watchers >>>>>>> are still registered: [/test] >>>>>>> Run 2: TestPathChildrenCache.testBasics:863 One or more child >> watchers >>>>>>> are still registered: [/test] >>>>>>> >>>>>>> >>>>>> >> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testBasicsOnTwoCachesWithSameExecutor(org.apache.curator.framework.recipes.cache.TestPathChildrenCache) >>>>>>> Run 1: >> TestPathChildrenCache.testBasicsOnTwoCachesWithSameExecutor:934 >>>>>>> One or more child watchers are still registered: [/test] >>>>>>> Run 2: >> TestPathChildrenCache.testBasicsOnTwoCachesWithSameExecutor:934 >>>>>>> One or more child watchers are still registered: [/test] >>>>>>> >>>>>>> >>>>>> >> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testEnsurePath(org.apache.curator.framework.recipes.cache.TestPathChildrenCache) >>>>>>> Run 1: TestPathChildrenCache.testEnsurePath:363 One or more child >>>>>>> watchers are still registered: [/one/two/three] >>>>>>> Run 2: TestPathChildrenCache.testEnsurePath:363 One or more child >>>>>>> watchers are still registered: [/one/two/three] >>>>>>> >>>>>>> TestInterProcessSemaphoreCluster.testCluster:294 expected [true] but >>>>>>> found [false] >>>>>>> >>>>>> >> org.apache.curator.framework.recipes.shared.TestSharedCount.testMultiClientVersioned(org.apache.curator.framework.recipes.shared.TestSharedCount) >>>>>>> Run 1: PASS >>>>>>> Run 2: TestSharedCount.testMultiClientVersioned:256 One or more data >>>>>>> watchers are still registered: [/count] >>>>>>> >>>>>>> >>>>>> >> org.apache.curator.framework.recipes.shared.TestSharedCount.testSimple(org.apache.curator.framework.recipes.shared.TestSharedCount) >>>>>>> Run 1: TestSharedCount.testSimple:174 One or more data watchers are >>>>>> still >>>>>>> registered: [/count] >>>>>>> Run 2: TestSharedCount.testSimple:174 One or more data watchers are >>>>>> still >>>>>>> registered: [/count] >>>>>>> >>>>>>> >>>>>>> Tests run: 491, Failures: 6, Errors: 0, Skipped: 0 >>>>>>> >>>>>>> >>>>>>> On Fri, May 27, 2016 at 3:30 AM, Jordan Zimmerman < >>>>>>> [email protected]> wrote: >>>>>>> >>>>>>>> I see the problem. The fix is not simple though so I’ll spend some >>>>>> time on >>>>>>>> it. The TL;DR is that exists watchers are still supposed to get set >>>>>> when >>>>>>>> there is a KeeperException.NoNode and the code isn’t handling it. >> But, >>>>>>>> while I was looking at the code I realized there are some >> significant >>>>>>>> additional problems. Curator, here, is trying to mirror what >>>>>> ZooKeeper does >>>>>>>> internally which is insanely complicated. In hindsight, the whole ZK >>>>>>>> watcher mechanism should’ve been decoupled from the mutator APIs. >>>>>> But, of >>>>>>>> course, that’s easy for me to say now. >>>>>>>> >>>>>>>> -Jordan >>>>>>>> >>>>>>>>> On May 26, 2016, at 1:10 AM, Cameron McKenzie < >>>>>> [email protected]> >>>>>>>> wrote: >>>>>>>>> >>>>>>>>> Thanks Scott, >>>>>>>>> Those tests are now passing for me. >>>>>>>>> >>>>>>>>> Jordan, testNodeCache:testBasics() is failing consistently on the >> 3.0 >>>>>>>>> branch. It appears that this is actually potentially a bug in the >>>>>>>>> NodeCache. It ends up leaking a Watcher reference. I've had a quick >>>>>> look >>>>>>>>> through, but I haven't dived in in any detail. It's the >>>>>>>>> WatcherRemovalManager stuff I think. If you've got time, can you >>>>>> have a >>>>>>>>> look? If not, let me know and I'll do some more digging. >>>>>>>>> >>>>>>>>> cheers >>>>>>>>> >>>>>>>>> On Thu, May 26, 2016 at 11:47 AM, Cameron McKenzie < >>>>>>>> [email protected]> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> Thanks Scott. >>>>>>>>>> >>>>>>>>>> Push the fix to master and merge it into 3.0. >>>>>>>>>> >>>>>>>>>> Then I guess, I'll push new versions of 2.11 and 3.2 onto Nexus. >>>>>>>>>> cheers >>>>>>>>>> >>>>>>>>>> On Thu, May 26, 2016 at 11:44 AM, Scott Blum < >> [email protected] >>>>>>> >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> Alright, I have a fix, but it wants to be applied to both master >>>>>> and >>>>>>>> 3.0. >>>>>>>>>>> Where should I push the fix? >>>>>>>>>>> >>>>>>>>>>> On Wed, May 25, 2016 at 6:10 PM, Cameron McKenzie < >>>>>>>> [email protected] >>>>>>>>>>>> >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>>> Thanks Scott, >>>>>>>>>>>> If you just checkout the CURATOR-3.0 branch, they are failing >>>>>> there. >>>>>>>>>>>> cheers >>>>>>>>>>>> >>>>>>>>>>>> On Thu, May 26, 2016 at 2:06 AM, Scott Blum < >>>>>> [email protected]> >>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Sure, what SHA are they failing at Cam? >>>>>>>>>>>>> >>>>>>>>>>>>> On Wed, May 25, 2016 at 9:36 AM, Jordan Zimmerman < >>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Scott can you take a look? >>>>>>>>>>>>>> >>>>>>>>>>>>>> -Jordan >>>>>>>>>>>>>> >>>>>>>>>>>>>>> On May 25, 2016, at 4:35 AM, Cameron McKenzie < >>>>>>>>>>>> [email protected]> >>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Tree cache tests are still failing. I've tried a few times >> but >>>>>> no >>>>>>>>>>>> love: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>> TestTreeCacheEventOrdering>TestEventOrdering.testEventOrdering:151 >>>>>>>>>>>>>> actual 6 >>>>>>>>>>>>>>> expected -31: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I will have a look into what's going on in the morning. Given >>>>>> that >>>>>>>>>>>>> these >>>>>>>>>>>>>>> may take some messing about to fix up, do we just want to >> vote >>>>>> on >>>>>>>>>>>>> 2.11.0 >>>>>>>>>>>>>>> separately, as that is all ready to go? >>>>>>>>>>>>>>> cheers >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Wed, May 25, 2016 at 5:34 PM, Jordan Zimmerman < >>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Great news. Thanks. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> ==================== >>>>>>>>>>>>>>>> Jordan Zimmerman >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On May 25, 2016, at 12:37 AM, Cameron McKenzie < >>>>>>>>>>>>> [email protected] >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I have fixed up the test case, all good now. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 1:45 PM, Cameron McKenzie < >>>>>>>>>>>>>>>> [email protected]> >>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Looks like it was introduced with the schema validation >>>>>> stuff. >>>>>>>>>>> It >>>>>>>>>>>>> now >>>>>>>>>>>>>>>> does >>>>>>>>>>>>>>>>>> an ACL check prior to the backgrounding call. Because the >>>>>> unit >>>>>>>>>>>> test >>>>>>>>>>>>>>>> uses a >>>>>>>>>>>>>>>>>> bogus ACL provider it just throws an exception >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> final String adjustedPath = >>>>>>>>>>>>>>>>>> adjustPath(client.fixForNamespace(givenPath, >>>>>>>>>>>>>>>> createMode.isSequential())); >>>>>>>>>>>>>>>>>> List<ACL> aclList = acling.getAclList(adjustedPath); >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>> client.getSchemaSet().getSchema(givenPath).validateCreate(createMode, >>>>>>>>>>>>>>>> data, >>>>>>>>>>>>>>>>>> aclList); >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> String returnPath = null; >>>>>>>>>>>>>>>>>> if ( backgrounding.inBackground() ) >>>>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>>>> pathInBackground(adjustedPath, data, givenPath); >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> So, I guess the answer is to get the test to force a >> failure >>>>>>>>>>> in a >>>>>>>>>>>>>>>>>> different way. With the UnhandledErrorListener, the >>>>>>>>>>> expectation is >>>>>>>>>>>>>> that >>>>>>>>>>>>>>>>>> this only gets called on backgrounding operations? >>>>>>>>>>>>>>>>>> cheers >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 1:39 PM, Cameron McKenzie < >>>>>>>>>>>>>>>> [email protected]> >>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Same on the master branch, but it passes there, so maybe >>>>>>>>>>>> something >>>>>>>>>>>>>> has >>>>>>>>>>>>>>>>>>> legitimately broken the test. Will let you know if I get >>>>>>>>>>> stuck. >>>>>>>>>>>>>>>>>>> cheers >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 1:36 PM, Jordan Zimmerman < >>>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Let me know if you need help. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> It might be a bad merge. Have you compared it to the >>>>>> master >>>>>>>>>>>>> branch? >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> -JZ >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> On May 24, 2016, at 10:31 PM, Cameron McKenzie < >>>>>>>>>>>>>>>> [email protected]> >>>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Guys, >>>>>>>>>>>>>>>>>>>>> There's a test >> TestFrameworkBackground:testErrorListener >>>>>>>>>>> that >>>>>>>>>>>> is >>>>>>>>>>>>>>>>>>>> failing >>>>>>>>>>>>>>>>>>>>> consistently on the CURATOR-3.0 branch. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> I can't see how it has ever worked. It seems to try and >>>>>>>>>>> provoke >>>>>>>>>>>>> an >>>>>>>>>>>>>>>>>>>> error >>>>>>>>>>>>>>>>>>>>> via a bad ACL provider. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> But this ACL provider is called by the >> CreateBuilderImpl >>>>>>>>>>> prior >>>>>>>>>>>> to >>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>>>>>> backgrounding call, which means that the exception that >>>>>> it >>>>>>>>>>>> throws >>>>>>>>>>>>>>>>>>>> happens >>>>>>>>>>>>>>>>>>>>> in the main Thread of the unit test. So, it just throws >>>>>> an >>>>>>>>>>>>>>>>>>>>> UnsupportedOperationException which is propogated up >> the >>>>>>>>>>> stack >>>>>>>>>>>> at >>>>>>>>>>>>>>>> which >>>>>>>>>>>>>>>>>>>>> point the unit test fails. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> So, I will look at fixing this, but I just don't >>>>>> understand >>>>>>>>>>> how >>>>>>>>>>>>> it >>>>>>>>>>>>>>>> ever >>>>>>>>>>>>>>>>>>>>> worked? >>>>>>>>>>>>>>>>>>>>> cheers >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>> >>>>>>>> >>>>>> >>>>>> >>>>> >>>> >> >>
