Maybe we need to look at some way of providing a hook for tests to wait
reliably for asynch tasks to finish?

The latest round of tests ran OK. One test failed on an unrelated thing
(ConnectionLoss), but this appears to be a transient thing as it's worked
ok the next time around.

I will start getting a release together. Thanks for you help with the
updated tests.
cheers

On Wed, Jun 1, 2016 at 9:12 AM, Jordan Zimmerman <[email protected]
> wrote:

> The problem is in-flight watchers and async background calls. There’s no
> way to cancel these and they can take time to occur - even after a recipe
> instance is closed.
>
> -Jordan
>
> > On May 31, 2016, at 5:11 PM, Cameron McKenzie <[email protected]>
> wrote:
> >
> > Ok, running it again now.
> >
> > Is the problem that the watcher clean up for the recipes is done
> > asynchronously after they are closed?
> >
> > On Wed, Jun 1, 2016 at 1:35 AM, Jordan Zimmerman <
> [email protected]
> >> wrote:
> >
> >> OK - please try now. I added a loop in the “no watchers” checker. If
> there
> >> are remaining watchers, it will sleep a bit and try again.
> >>
> >> -Jordan
> >>
> >>> On May 31, 2016, at 1:33 AM, Cameron McKenzie <[email protected]>
> >> wrote:
> >>>
> >>> Looks like these failures are intermittent. Running them directly in
> >>> Eclipse they seem to be passing. I will run the whole thing again in
> the
> >>> morning and see how it goes.
> >>>
> >>> On Tue, May 31, 2016 at 2:29 PM, Cameron McKenzie <
> >> [email protected]>
> >>> wrote:
> >>>
> >>>> There are still 2 tests failing for me:
> >>>>
> >>>> FAILURE! - in
> >>>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache
> >>>>
> >>
> testKilledSession(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
> >>>> Time elapsed: 17.488 sec  <<< FAILURE!
> >>>> java.lang.AssertionError: One or more child watchers are still
> >> registered:
> >>>> [/test]
> >>>> at
> >>>>
> >>
> org.apache.curator.framework.imps.TestCleanState.closeAndTestClean(TestCleanState.java:53)
> >>>> at
> >>>>
> >>
> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testKilledSession(TestPathChildrenCache.java:707)
> >>>>
> >>>> FAILURE! - in
> >>>>
> >>
> org.apache.curator.framework.recipes.locks.TestInterProcessSemaphoreCluster
> >>>>
> >>
> testCluster(org.apache.curator.framework.recipes.locks.TestInterProcessSemaphoreCluster)
> >>>> Time elapsed: 96.641 sec  <<< FAILURE!
> >>>> java.lang.AssertionError: expected [true] but found [false]
> >>>> at org.testng.Assert.fail(Assert.java:94)
> >>>> at org.testng.Assert.failNotEquals(Assert.java:494)
> >>>> at org.testng.Assert.assertTrue(Assert.java:42)
> >>>> at org.testng.Assert.assertTrue(Assert.java:52)
> >>>> at
> >>>>
> >>
> org.apache.curator.framework.recipes.locks.TestInterProcessSemaphoreCluster.testCluster(TestInterProcessSemaphoreCluster.java:294)
> >>>>
> >>>> Failed tests:
> >>>>
> >>>>
> >>
> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testKilledSession(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
> >>>> Run 1: TestPathChildrenCache.testKilledSession:707 One or more child
> >>>> watchers are still registered: [/test]
> >>>> Run 2: PASS
> >>>>
> >>>> TestInterProcessSemaphoreCluster.testCluster:294 expected [true] but
> >>>> found [false]
> >>>>
> >>>> Tests run: 495, Failures: 2, Errors: 0, Skipped: 0
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> On Tue, May 31, 2016 at 12:52 PM, Cameron McKenzie <
> >> [email protected]
> >>>>> wrote:
> >>>>
> >>>>> Thanks, CURATOR-332 wasn't pushed. I will run the tests against that,
> >> and
> >>>>> if it's all good will merge into CURATOR-3.0
> >>>>> cheers
> >>>>>
> >>>>> On Tue, May 31, 2016 at 12:32 PM, Jordan Zimmerman <
> >>>>> [email protected]> wrote:
> >>>>>
> >>>>>> Actually - I don’t remember if branch CURATOR-332 is merged yet. I
> >>>>>> made/pushed my changes in CURATOR-332
> >>>>>>
> >>>>>> -jordan
> >>>>>>
> >>>>>>> On May 26, 2016, at 10:04 PM, Cameron McKenzie <
> >> [email protected]>
> >>>>>> wrote:
> >>>>>>>
> >>>>>>> I'm still seeing 6 failed tests that seem related to the same stuff
> >>>>>> after
> >>>>>>> merging your fix:
> >>>>>>>
> >>>>>>> Failed tests:
> >>>>>>>
> >>>>>>
> >>
> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testBasics(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
> >>>>>>> Run 1: TestPathChildrenCache.testBasics:863 One or more child
> >> watchers
> >>>>>>> are still registered: [/test]
> >>>>>>> Run 2: TestPathChildrenCache.testBasics:863 One or more child
> >> watchers
> >>>>>>> are still registered: [/test]
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>
> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testBasicsOnTwoCachesWithSameExecutor(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
> >>>>>>> Run 1:
> >> TestPathChildrenCache.testBasicsOnTwoCachesWithSameExecutor:934
> >>>>>>> One or more child watchers are still registered: [/test]
> >>>>>>> Run 2:
> >> TestPathChildrenCache.testBasicsOnTwoCachesWithSameExecutor:934
> >>>>>>> One or more child watchers are still registered: [/test]
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>
> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testEnsurePath(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
> >>>>>>> Run 1: TestPathChildrenCache.testEnsurePath:363 One or more child
> >>>>>>> watchers are still registered: [/one/two/three]
> >>>>>>> Run 2: TestPathChildrenCache.testEnsurePath:363 One or more child
> >>>>>>> watchers are still registered: [/one/two/three]
> >>>>>>>
> >>>>>>> TestInterProcessSemaphoreCluster.testCluster:294 expected [true]
> but
> >>>>>>> found [false]
> >>>>>>>
> >>>>>>
> >>
> org.apache.curator.framework.recipes.shared.TestSharedCount.testMultiClientVersioned(org.apache.curator.framework.recipes.shared.TestSharedCount)
> >>>>>>> Run 1: PASS
> >>>>>>> Run 2: TestSharedCount.testMultiClientVersioned:256 One or more
> data
> >>>>>>> watchers are still registered: [/count]
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>
> org.apache.curator.framework.recipes.shared.TestSharedCount.testSimple(org.apache.curator.framework.recipes.shared.TestSharedCount)
> >>>>>>> Run 1: TestSharedCount.testSimple:174 One or more data watchers are
> >>>>>> still
> >>>>>>> registered: [/count]
> >>>>>>> Run 2: TestSharedCount.testSimple:174 One or more data watchers are
> >>>>>> still
> >>>>>>> registered: [/count]
> >>>>>>>
> >>>>>>>
> >>>>>>> Tests run: 491, Failures: 6, Errors: 0, Skipped: 0
> >>>>>>>
> >>>>>>>
> >>>>>>> On Fri, May 27, 2016 at 3:30 AM, Jordan Zimmerman <
> >>>>>>> [email protected]> wrote:
> >>>>>>>
> >>>>>>>> I see the problem. The fix is not simple though so I’ll spend some
> >>>>>> time on
> >>>>>>>> it. The TL;DR is that exists watchers are still supposed to get
> set
> >>>>>> when
> >>>>>>>> there is a KeeperException.NoNode and the code isn’t handling it.
> >> But,
> >>>>>>>> while I was looking at the code I realized there are some
> >> significant
> >>>>>>>> additional problems. Curator, here, is trying to mirror what
> >>>>>> ZooKeeper does
> >>>>>>>> internally which is insanely complicated. In hindsight, the whole
> ZK
> >>>>>>>> watcher mechanism should’ve been decoupled from the mutator APIs.
> >>>>>> But, of
> >>>>>>>> course, that’s easy for me to say now.
> >>>>>>>>
> >>>>>>>> -Jordan
> >>>>>>>>
> >>>>>>>>> On May 26, 2016, at 1:10 AM, Cameron McKenzie <
> >>>>>> [email protected]>
> >>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>> Thanks Scott,
> >>>>>>>>> Those tests are now passing for me.
> >>>>>>>>>
> >>>>>>>>> Jordan, testNodeCache:testBasics() is failing consistently on the
> >> 3.0
> >>>>>>>>> branch. It appears that this is actually potentially a bug in the
> >>>>>>>>> NodeCache. It ends up leaking a Watcher reference. I've had a
> quick
> >>>>>> look
> >>>>>>>>> through, but I haven't dived in in any detail. It's the
> >>>>>>>>> WatcherRemovalManager stuff I think. If you've got time, can you
> >>>>>> have a
> >>>>>>>>> look? If not, let me know and I'll do some more digging.
> >>>>>>>>>
> >>>>>>>>> cheers
> >>>>>>>>>
> >>>>>>>>> On Thu, May 26, 2016 at 11:47 AM, Cameron McKenzie <
> >>>>>>>> [email protected]>
> >>>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> Thanks Scott.
> >>>>>>>>>>
> >>>>>>>>>> Push the fix to master and merge it into 3.0.
> >>>>>>>>>>
> >>>>>>>>>> Then I guess, I'll push new versions of 2.11 and 3.2 onto Nexus.
> >>>>>>>>>> cheers
> >>>>>>>>>>
> >>>>>>>>>> On Thu, May 26, 2016 at 11:44 AM, Scott Blum <
> >> [email protected]
> >>>>>>>
> >>>>>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Alright, I have a fix, but it wants to be applied to both
> master
> >>>>>> and
> >>>>>>>> 3.0.
> >>>>>>>>>>> Where should I push the fix?
> >>>>>>>>>>>
> >>>>>>>>>>> On Wed, May 25, 2016 at 6:10 PM, Cameron McKenzie <
> >>>>>>>> [email protected]
> >>>>>>>>>>>>
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Thanks Scott,
> >>>>>>>>>>>> If you just checkout the CURATOR-3.0 branch, they are failing
> >>>>>> there.
> >>>>>>>>>>>> cheers
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Thu, May 26, 2016 at 2:06 AM, Scott Blum <
> >>>>>> [email protected]>
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> Sure, what SHA are they failing at Cam?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Wed, May 25, 2016 at 9:36 AM, Jordan Zimmerman <
> >>>>>>>>>>>>> [email protected]> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> Scott can you take a look?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> -Jordan
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On May 25, 2016, at 4:35 AM, Cameron McKenzie <
> >>>>>>>>>>>> [email protected]>
> >>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Tree cache tests are still failing. I've tried a few times
> >> but
> >>>>>> no
> >>>>>>>>>>>> love:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>> TestTreeCacheEventOrdering>TestEventOrdering.testEventOrdering:151
> >>>>>>>>>>>>>> actual 6
> >>>>>>>>>>>>>>> expected -31:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I will have a look into what's going on in the morning.
> Given
> >>>>>> that
> >>>>>>>>>>>>> these
> >>>>>>>>>>>>>>> may take some messing about to fix up, do we just want to
> >> vote
> >>>>>> on
> >>>>>>>>>>>>> 2.11.0
> >>>>>>>>>>>>>>> separately, as that is all ready to go?
> >>>>>>>>>>>>>>> cheers
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On Wed, May 25, 2016 at 5:34 PM, Jordan Zimmerman <
> >>>>>>>>>>>>>>> [email protected]> wrote:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Great news. Thanks.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> ====================
> >>>>>>>>>>>>>>>> Jordan Zimmerman
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> On May 25, 2016, at 12:37 AM, Cameron McKenzie <
> >>>>>>>>>>>>> [email protected]
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> I have fixed up the test case, all good now.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 1:45 PM, Cameron McKenzie <
> >>>>>>>>>>>>>>>> [email protected]>
> >>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Looks like it was introduced with the schema validation
> >>>>>> stuff.
> >>>>>>>>>>> It
> >>>>>>>>>>>>> now
> >>>>>>>>>>>>>>>> does
> >>>>>>>>>>>>>>>>>> an ACL check prior to the backgrounding call. Because
> the
> >>>>>> unit
> >>>>>>>>>>>> test
> >>>>>>>>>>>>>>>> uses a
> >>>>>>>>>>>>>>>>>> bogus ACL provider it just throws an exception
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>   final String adjustedPath =
> >>>>>>>>>>>>>>>>>> adjustPath(client.fixForNamespace(givenPath,
> >>>>>>>>>>>>>>>> createMode.isSequential()));
> >>>>>>>>>>>>>>>>>>   List<ACL> aclList = acling.getAclList(adjustedPath);
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>
> client.getSchemaSet().getSchema(givenPath).validateCreate(createMode,
> >>>>>>>>>>>>>>>> data,
> >>>>>>>>>>>>>>>>>> aclList);
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>   String returnPath = null;
> >>>>>>>>>>>>>>>>>>   if ( backgrounding.inBackground() )
> >>>>>>>>>>>>>>>>>>   {
> >>>>>>>>>>>>>>>>>>       pathInBackground(adjustedPath, data, givenPath);
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> So, I guess the answer is to get the test to force a
> >> failure
> >>>>>>>>>>> in a
> >>>>>>>>>>>>>>>>>> different way. With the UnhandledErrorListener, the
> >>>>>>>>>>> expectation is
> >>>>>>>>>>>>>> that
> >>>>>>>>>>>>>>>>>> this only gets called on backgrounding operations?
> >>>>>>>>>>>>>>>>>> cheers
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 1:39 PM, Cameron McKenzie <
> >>>>>>>>>>>>>>>> [email protected]>
> >>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Same on the master branch, but it passes there, so
> maybe
> >>>>>>>>>>>> something
> >>>>>>>>>>>>>> has
> >>>>>>>>>>>>>>>>>>> legitimately broken the test. Will let you know if I
> get
> >>>>>>>>>>> stuck.
> >>>>>>>>>>>>>>>>>>> cheers
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 1:36 PM, Jordan Zimmerman <
> >>>>>>>>>>>>>>>>>>> [email protected]> wrote:
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Let me know if you need help.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> It might be a bad merge. Have you compared it to the
> >>>>>> master
> >>>>>>>>>>>>> branch?
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> -JZ
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> On May 24, 2016, at 10:31 PM, Cameron McKenzie <
> >>>>>>>>>>>>>>>> [email protected]>
> >>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Guys,
> >>>>>>>>>>>>>>>>>>>>> There's a test
> >> TestFrameworkBackground:testErrorListener
> >>>>>>>>>>> that
> >>>>>>>>>>>> is
> >>>>>>>>>>>>>>>>>>>> failing
> >>>>>>>>>>>>>>>>>>>>> consistently on the CURATOR-3.0 branch.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> I can't see how it has ever worked. It seems to try
> and
> >>>>>>>>>>> provoke
> >>>>>>>>>>>>> an
> >>>>>>>>>>>>>>>>>>>> error
> >>>>>>>>>>>>>>>>>>>>> via a bad ACL provider.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> But this ACL provider is called by the
> >> CreateBuilderImpl
> >>>>>>>>>>> prior
> >>>>>>>>>>>> to
> >>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>> backgrounding call, which means that the exception
> that
> >>>>>> it
> >>>>>>>>>>>> throws
> >>>>>>>>>>>>>>>>>>>> happens
> >>>>>>>>>>>>>>>>>>>>> in the main Thread of the unit test. So, it just
> throws
> >>>>>> an
> >>>>>>>>>>>>>>>>>>>>> UnsupportedOperationException which is propogated up
> >> the
> >>>>>>>>>>> stack
> >>>>>>>>>>>> at
> >>>>>>>>>>>>>>>> which
> >>>>>>>>>>>>>>>>>>>>> point the unit test fails.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> So, I will look at fixing this, but I just don't
> >>>>>> understand
> >>>>>>>>>>> how
> >>>>>>>>>>>>> it
> >>>>>>>>>>>>>>>> ever
> >>>>>>>>>>>>>>>>>>>>> worked?
> >>>>>>>>>>>>>>>>>>>>> cheers
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>
> >>
>
>

Reply via email to