Ok, running it again now. Is the problem that the watcher clean up for the recipes is done asynchronously after they are closed?
On Wed, Jun 1, 2016 at 1:35 AM, Jordan Zimmerman <[email protected] > wrote: > OK - please try now. I added a loop in the “no watchers” checker. If there > are remaining watchers, it will sleep a bit and try again. > > -Jordan > > > On May 31, 2016, at 1:33 AM, Cameron McKenzie <[email protected]> > wrote: > > > > Looks like these failures are intermittent. Running them directly in > > Eclipse they seem to be passing. I will run the whole thing again in the > > morning and see how it goes. > > > > On Tue, May 31, 2016 at 2:29 PM, Cameron McKenzie < > [email protected]> > > wrote: > > > >> There are still 2 tests failing for me: > >> > >> FAILURE! - in > >> org.apache.curator.framework.recipes.cache.TestPathChildrenCache > >> > testKilledSession(org.apache.curator.framework.recipes.cache.TestPathChildrenCache) > >> Time elapsed: 17.488 sec <<< FAILURE! > >> java.lang.AssertionError: One or more child watchers are still > registered: > >> [/test] > >> at > >> > org.apache.curator.framework.imps.TestCleanState.closeAndTestClean(TestCleanState.java:53) > >> at > >> > org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testKilledSession(TestPathChildrenCache.java:707) > >> > >> FAILURE! - in > >> > org.apache.curator.framework.recipes.locks.TestInterProcessSemaphoreCluster > >> > testCluster(org.apache.curator.framework.recipes.locks.TestInterProcessSemaphoreCluster) > >> Time elapsed: 96.641 sec <<< FAILURE! > >> java.lang.AssertionError: expected [true] but found [false] > >> at org.testng.Assert.fail(Assert.java:94) > >> at org.testng.Assert.failNotEquals(Assert.java:494) > >> at org.testng.Assert.assertTrue(Assert.java:42) > >> at org.testng.Assert.assertTrue(Assert.java:52) > >> at > >> > org.apache.curator.framework.recipes.locks.TestInterProcessSemaphoreCluster.testCluster(TestInterProcessSemaphoreCluster.java:294) > >> > >> Failed tests: > >> > >> > org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testKilledSession(org.apache.curator.framework.recipes.cache.TestPathChildrenCache) > >> Run 1: TestPathChildrenCache.testKilledSession:707 One or more child > >> watchers are still registered: [/test] > >> Run 2: PASS > >> > >> TestInterProcessSemaphoreCluster.testCluster:294 expected [true] but > >> found [false] > >> > >> Tests run: 495, Failures: 2, Errors: 0, Skipped: 0 > >> > >> > >> > >> > >> > >> On Tue, May 31, 2016 at 12:52 PM, Cameron McKenzie < > [email protected] > >>> wrote: > >> > >>> Thanks, CURATOR-332 wasn't pushed. I will run the tests against that, > and > >>> if it's all good will merge into CURATOR-3.0 > >>> cheers > >>> > >>> On Tue, May 31, 2016 at 12:32 PM, Jordan Zimmerman < > >>> [email protected]> wrote: > >>> > >>>> Actually - I don’t remember if branch CURATOR-332 is merged yet. I > >>>> made/pushed my changes in CURATOR-332 > >>>> > >>>> -jordan > >>>> > >>>>> On May 26, 2016, at 10:04 PM, Cameron McKenzie < > [email protected]> > >>>> wrote: > >>>>> > >>>>> I'm still seeing 6 failed tests that seem related to the same stuff > >>>> after > >>>>> merging your fix: > >>>>> > >>>>> Failed tests: > >>>>> > >>>> > org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testBasics(org.apache.curator.framework.recipes.cache.TestPathChildrenCache) > >>>>> Run 1: TestPathChildrenCache.testBasics:863 One or more child > watchers > >>>>> are still registered: [/test] > >>>>> Run 2: TestPathChildrenCache.testBasics:863 One or more child > watchers > >>>>> are still registered: [/test] > >>>>> > >>>>> > >>>> > org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testBasicsOnTwoCachesWithSameExecutor(org.apache.curator.framework.recipes.cache.TestPathChildrenCache) > >>>>> Run 1: > TestPathChildrenCache.testBasicsOnTwoCachesWithSameExecutor:934 > >>>>> One or more child watchers are still registered: [/test] > >>>>> Run 2: > TestPathChildrenCache.testBasicsOnTwoCachesWithSameExecutor:934 > >>>>> One or more child watchers are still registered: [/test] > >>>>> > >>>>> > >>>> > org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testEnsurePath(org.apache.curator.framework.recipes.cache.TestPathChildrenCache) > >>>>> Run 1: TestPathChildrenCache.testEnsurePath:363 One or more child > >>>>> watchers are still registered: [/one/two/three] > >>>>> Run 2: TestPathChildrenCache.testEnsurePath:363 One or more child > >>>>> watchers are still registered: [/one/two/three] > >>>>> > >>>>> TestInterProcessSemaphoreCluster.testCluster:294 expected [true] but > >>>>> found [false] > >>>>> > >>>> > org.apache.curator.framework.recipes.shared.TestSharedCount.testMultiClientVersioned(org.apache.curator.framework.recipes.shared.TestSharedCount) > >>>>> Run 1: PASS > >>>>> Run 2: TestSharedCount.testMultiClientVersioned:256 One or more data > >>>>> watchers are still registered: [/count] > >>>>> > >>>>> > >>>> > org.apache.curator.framework.recipes.shared.TestSharedCount.testSimple(org.apache.curator.framework.recipes.shared.TestSharedCount) > >>>>> Run 1: TestSharedCount.testSimple:174 One or more data watchers are > >>>> still > >>>>> registered: [/count] > >>>>> Run 2: TestSharedCount.testSimple:174 One or more data watchers are > >>>> still > >>>>> registered: [/count] > >>>>> > >>>>> > >>>>> Tests run: 491, Failures: 6, Errors: 0, Skipped: 0 > >>>>> > >>>>> > >>>>> On Fri, May 27, 2016 at 3:30 AM, Jordan Zimmerman < > >>>>> [email protected]> wrote: > >>>>> > >>>>>> I see the problem. The fix is not simple though so I’ll spend some > >>>> time on > >>>>>> it. The TL;DR is that exists watchers are still supposed to get set > >>>> when > >>>>>> there is a KeeperException.NoNode and the code isn’t handling it. > But, > >>>>>> while I was looking at the code I realized there are some > significant > >>>>>> additional problems. Curator, here, is trying to mirror what > >>>> ZooKeeper does > >>>>>> internally which is insanely complicated. In hindsight, the whole ZK > >>>>>> watcher mechanism should’ve been decoupled from the mutator APIs. > >>>> But, of > >>>>>> course, that’s easy for me to say now. > >>>>>> > >>>>>> -Jordan > >>>>>> > >>>>>>> On May 26, 2016, at 1:10 AM, Cameron McKenzie < > >>>> [email protected]> > >>>>>> wrote: > >>>>>>> > >>>>>>> Thanks Scott, > >>>>>>> Those tests are now passing for me. > >>>>>>> > >>>>>>> Jordan, testNodeCache:testBasics() is failing consistently on the > 3.0 > >>>>>>> branch. It appears that this is actually potentially a bug in the > >>>>>>> NodeCache. It ends up leaking a Watcher reference. I've had a quick > >>>> look > >>>>>>> through, but I haven't dived in in any detail. It's the > >>>>>>> WatcherRemovalManager stuff I think. If you've got time, can you > >>>> have a > >>>>>>> look? If not, let me know and I'll do some more digging. > >>>>>>> > >>>>>>> cheers > >>>>>>> > >>>>>>> On Thu, May 26, 2016 at 11:47 AM, Cameron McKenzie < > >>>>>> [email protected]> > >>>>>>> wrote: > >>>>>>> > >>>>>>>> Thanks Scott. > >>>>>>>> > >>>>>>>> Push the fix to master and merge it into 3.0. > >>>>>>>> > >>>>>>>> Then I guess, I'll push new versions of 2.11 and 3.2 onto Nexus. > >>>>>>>> cheers > >>>>>>>> > >>>>>>>> On Thu, May 26, 2016 at 11:44 AM, Scott Blum < > [email protected] > >>>>> > >>>>>>>> wrote: > >>>>>>>> > >>>>>>>>> Alright, I have a fix, but it wants to be applied to both master > >>>> and > >>>>>> 3.0. > >>>>>>>>> Where should I push the fix? > >>>>>>>>> > >>>>>>>>> On Wed, May 25, 2016 at 6:10 PM, Cameron McKenzie < > >>>>>> [email protected] > >>>>>>>>>> > >>>>>>>>> wrote: > >>>>>>>>> > >>>>>>>>>> Thanks Scott, > >>>>>>>>>> If you just checkout the CURATOR-3.0 branch, they are failing > >>>> there. > >>>>>>>>>> cheers > >>>>>>>>>> > >>>>>>>>>> On Thu, May 26, 2016 at 2:06 AM, Scott Blum < > >>>> [email protected]> > >>>>>>>>> wrote: > >>>>>>>>>> > >>>>>>>>>>> Sure, what SHA are they failing at Cam? > >>>>>>>>>>> > >>>>>>>>>>> On Wed, May 25, 2016 at 9:36 AM, Jordan Zimmerman < > >>>>>>>>>>> [email protected]> wrote: > >>>>>>>>>>> > >>>>>>>>>>>> Scott can you take a look? > >>>>>>>>>>>> > >>>>>>>>>>>> -Jordan > >>>>>>>>>>>> > >>>>>>>>>>>>> On May 25, 2016, at 4:35 AM, Cameron McKenzie < > >>>>>>>>>> [email protected]> > >>>>>>>>>>>> wrote: > >>>>>>>>>>>>> > >>>>>>>>>>>>> Tree cache tests are still failing. I've tried a few times > but > >>>> no > >>>>>>>>>> love: > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>> TestTreeCacheEventOrdering>TestEventOrdering.testEventOrdering:151 > >>>>>>>>>>>> actual 6 > >>>>>>>>>>>>> expected -31: > >>>>>>>>>>>>> > >>>>>>>>>>>>> I will have a look into what's going on in the morning. Given > >>>> that > >>>>>>>>>>> these > >>>>>>>>>>>>> may take some messing about to fix up, do we just want to > vote > >>>> on > >>>>>>>>>>> 2.11.0 > >>>>>>>>>>>>> separately, as that is all ready to go? > >>>>>>>>>>>>> cheers > >>>>>>>>>>>>> > >>>>>>>>>>>>> On Wed, May 25, 2016 at 5:34 PM, Jordan Zimmerman < > >>>>>>>>>>>>> [email protected]> wrote: > >>>>>>>>>>>>> > >>>>>>>>>>>>>> Great news. Thanks. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> ==================== > >>>>>>>>>>>>>> Jordan Zimmerman > >>>>>>>>>>>>>> > >>>>>>>>>>>>>>> On May 25, 2016, at 12:37 AM, Cameron McKenzie < > >>>>>>>>>>> [email protected] > >>>>>>>>>>>>> > >>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> I have fixed up the test case, all good now. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> On Wed, May 25, 2016 at 1:45 PM, Cameron McKenzie < > >>>>>>>>>>>>>> [email protected]> > >>>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Looks like it was introduced with the schema validation > >>>> stuff. > >>>>>>>>> It > >>>>>>>>>>> now > >>>>>>>>>>>>>> does > >>>>>>>>>>>>>>>> an ACL check prior to the backgrounding call. Because the > >>>> unit > >>>>>>>>>> test > >>>>>>>>>>>>>> uses a > >>>>>>>>>>>>>>>> bogus ACL provider it just throws an exception > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> final String adjustedPath = > >>>>>>>>>>>>>>>> adjustPath(client.fixForNamespace(givenPath, > >>>>>>>>>>>>>> createMode.isSequential())); > >>>>>>>>>>>>>>>> List<ACL> aclList = acling.getAclList(adjustedPath); > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > >>>>>>>>>>> > >>>> client.getSchemaSet().getSchema(givenPath).validateCreate(createMode, > >>>>>>>>>>>>>> data, > >>>>>>>>>>>>>>>> aclList); > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> String returnPath = null; > >>>>>>>>>>>>>>>> if ( backgrounding.inBackground() ) > >>>>>>>>>>>>>>>> { > >>>>>>>>>>>>>>>> pathInBackground(adjustedPath, data, givenPath); > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> So, I guess the answer is to get the test to force a > failure > >>>>>>>>> in a > >>>>>>>>>>>>>>>> different way. With the UnhandledErrorListener, the > >>>>>>>>> expectation is > >>>>>>>>>>>> that > >>>>>>>>>>>>>>>> this only gets called on backgrounding operations? > >>>>>>>>>>>>>>>> cheers > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 1:39 PM, Cameron McKenzie < > >>>>>>>>>>>>>> [email protected]> > >>>>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> Same on the master branch, but it passes there, so maybe > >>>>>>>>>> something > >>>>>>>>>>>> has > >>>>>>>>>>>>>>>>> legitimately broken the test. Will let you know if I get > >>>>>>>>> stuck. > >>>>>>>>>>>>>>>>> cheers > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 1:36 PM, Jordan Zimmerman < > >>>>>>>>>>>>>>>>> [email protected]> wrote: > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> Let me know if you need help. > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> It might be a bad merge. Have you compared it to the > >>>> master > >>>>>>>>>>> branch? > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> -JZ > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> On May 24, 2016, at 10:31 PM, Cameron McKenzie < > >>>>>>>>>>>>>> [email protected]> > >>>>>>>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> Guys, > >>>>>>>>>>>>>>>>>>> There's a test > TestFrameworkBackground:testErrorListener > >>>>>>>>> that > >>>>>>>>>> is > >>>>>>>>>>>>>>>>>> failing > >>>>>>>>>>>>>>>>>>> consistently on the CURATOR-3.0 branch. > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> I can't see how it has ever worked. It seems to try and > >>>>>>>>> provoke > >>>>>>>>>>> an > >>>>>>>>>>>>>>>>>> error > >>>>>>>>>>>>>>>>>>> via a bad ACL provider. > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> But this ACL provider is called by the > CreateBuilderImpl > >>>>>>>>> prior > >>>>>>>>>> to > >>>>>>>>>>>> the > >>>>>>>>>>>>>>>>>>> backgrounding call, which means that the exception that > >>>> it > >>>>>>>>>> throws > >>>>>>>>>>>>>>>>>> happens > >>>>>>>>>>>>>>>>>>> in the main Thread of the unit test. So, it just throws > >>>> an > >>>>>>>>>>>>>>>>>>> UnsupportedOperationException which is propogated up > the > >>>>>>>>> stack > >>>>>>>>>> at > >>>>>>>>>>>>>> which > >>>>>>>>>>>>>>>>>>> point the unit test fails. > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> So, I will look at fixing this, but I just don't > >>>> understand > >>>>>>>>> how > >>>>>>>>>>> it > >>>>>>>>>>>>>> ever > >>>>>>>>>>>>>>>>>>> worked? > >>>>>>>>>>>>>>>>>>> cheers > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>> > >>>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>> > >>>>>> > >>>> > >>>> > >>> > >> > >
