I’ll try to address this as part of CURATOR-333
> On May 31, 2016, at 7:08 PM, Cameron McKenzie <[email protected]> wrote:
>
> Maybe we need to look at some way of providing a hook for tests to wait
> reliably for asynch tasks to finish?
>
> The latest round of tests ran OK. One test failed on an unrelated thing
> (ConnectionLoss), but this appears to be a transient thing as it's worked
> ok the next time around.
>
> I will start getting a release together. Thanks for you help with the
> updated tests.
> cheers
>
> On Wed, Jun 1, 2016 at 9:12 AM, Jordan Zimmerman <[email protected]
>> wrote:
>
>> The problem is in-flight watchers and async background calls. There’s no
>> way to cancel these and they can take time to occur - even after a recipe
>> instance is closed.
>>
>> -Jordan
>>
>>> On May 31, 2016, at 5:11 PM, Cameron McKenzie <[email protected]>
>> wrote:
>>>
>>> Ok, running it again now.
>>>
>>> Is the problem that the watcher clean up for the recipes is done
>>> asynchronously after they are closed?
>>>
>>> On Wed, Jun 1, 2016 at 1:35 AM, Jordan Zimmerman <
>> [email protected]
>>>> wrote:
>>>
>>>> OK - please try now. I added a loop in the “no watchers” checker. If
>> there
>>>> are remaining watchers, it will sleep a bit and try again.
>>>>
>>>> -Jordan
>>>>
>>>>> On May 31, 2016, at 1:33 AM, Cameron McKenzie <[email protected]>
>>>> wrote:
>>>>>
>>>>> Looks like these failures are intermittent. Running them directly in
>>>>> Eclipse they seem to be passing. I will run the whole thing again in
>> the
>>>>> morning and see how it goes.
>>>>>
>>>>> On Tue, May 31, 2016 at 2:29 PM, Cameron McKenzie <
>>>> [email protected]>
>>>>> wrote:
>>>>>
>>>>>> There are still 2 tests failing for me:
>>>>>>
>>>>>> FAILURE! - in
>>>>>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache
>>>>>>
>>>>
>> testKilledSession(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>>>>>> Time elapsed: 17.488 sec <<< FAILURE!
>>>>>> java.lang.AssertionError: One or more child watchers are still
>>>> registered:
>>>>>> [/test]
>>>>>> at
>>>>>>
>>>>
>> org.apache.curator.framework.imps.TestCleanState.closeAndTestClean(TestCleanState.java:53)
>>>>>> at
>>>>>>
>>>>
>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testKilledSession(TestPathChildrenCache.java:707)
>>>>>>
>>>>>> FAILURE! - in
>>>>>>
>>>>
>> org.apache.curator.framework.recipes.locks.TestInterProcessSemaphoreCluster
>>>>>>
>>>>
>> testCluster(org.apache.curator.framework.recipes.locks.TestInterProcessSemaphoreCluster)
>>>>>> Time elapsed: 96.641 sec <<< FAILURE!
>>>>>> java.lang.AssertionError: expected [true] but found [false]
>>>>>> at org.testng.Assert.fail(Assert.java:94)
>>>>>> at org.testng.Assert.failNotEquals(Assert.java:494)
>>>>>> at org.testng.Assert.assertTrue(Assert.java:42)
>>>>>> at org.testng.Assert.assertTrue(Assert.java:52)
>>>>>> at
>>>>>>
>>>>
>> org.apache.curator.framework.recipes.locks.TestInterProcessSemaphoreCluster.testCluster(TestInterProcessSemaphoreCluster.java:294)
>>>>>>
>>>>>> Failed tests:
>>>>>>
>>>>>>
>>>>
>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testKilledSession(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>>>>>> Run 1: TestPathChildrenCache.testKilledSession:707 One or more child
>>>>>> watchers are still registered: [/test]
>>>>>> Run 2: PASS
>>>>>>
>>>>>> TestInterProcessSemaphoreCluster.testCluster:294 expected [true] but
>>>>>> found [false]
>>>>>>
>>>>>> Tests run: 495, Failures: 2, Errors: 0, Skipped: 0
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, May 31, 2016 at 12:52 PM, Cameron McKenzie <
>>>> [email protected]
>>>>>>> wrote:
>>>>>>
>>>>>>> Thanks, CURATOR-332 wasn't pushed. I will run the tests against that,
>>>> and
>>>>>>> if it's all good will merge into CURATOR-3.0
>>>>>>> cheers
>>>>>>>
>>>>>>> On Tue, May 31, 2016 at 12:32 PM, Jordan Zimmerman <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> Actually - I don’t remember if branch CURATOR-332 is merged yet. I
>>>>>>>> made/pushed my changes in CURATOR-332
>>>>>>>>
>>>>>>>> -jordan
>>>>>>>>
>>>>>>>>> On May 26, 2016, at 10:04 PM, Cameron McKenzie <
>>>> [email protected]>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> I'm still seeing 6 failed tests that seem related to the same stuff
>>>>>>>> after
>>>>>>>>> merging your fix:
>>>>>>>>>
>>>>>>>>> Failed tests:
>>>>>>>>>
>>>>>>>>
>>>>
>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testBasics(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>>>>>>>>> Run 1: TestPathChildrenCache.testBasics:863 One or more child
>>>> watchers
>>>>>>>>> are still registered: [/test]
>>>>>>>>> Run 2: TestPathChildrenCache.testBasics:863 One or more child
>>>> watchers
>>>>>>>>> are still registered: [/test]
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>
>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testBasicsOnTwoCachesWithSameExecutor(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>>>>>>>>> Run 1:
>>>> TestPathChildrenCache.testBasicsOnTwoCachesWithSameExecutor:934
>>>>>>>>> One or more child watchers are still registered: [/test]
>>>>>>>>> Run 2:
>>>> TestPathChildrenCache.testBasicsOnTwoCachesWithSameExecutor:934
>>>>>>>>> One or more child watchers are still registered: [/test]
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>
>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testEnsurePath(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>>>>>>>>> Run 1: TestPathChildrenCache.testEnsurePath:363 One or more child
>>>>>>>>> watchers are still registered: [/one/two/three]
>>>>>>>>> Run 2: TestPathChildrenCache.testEnsurePath:363 One or more child
>>>>>>>>> watchers are still registered: [/one/two/three]
>>>>>>>>>
>>>>>>>>> TestInterProcessSemaphoreCluster.testCluster:294 expected [true]
>> but
>>>>>>>>> found [false]
>>>>>>>>>
>>>>>>>>
>>>>
>> org.apache.curator.framework.recipes.shared.TestSharedCount.testMultiClientVersioned(org.apache.curator.framework.recipes.shared.TestSharedCount)
>>>>>>>>> Run 1: PASS
>>>>>>>>> Run 2: TestSharedCount.testMultiClientVersioned:256 One or more
>> data
>>>>>>>>> watchers are still registered: [/count]
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>
>> org.apache.curator.framework.recipes.shared.TestSharedCount.testSimple(org.apache.curator.framework.recipes.shared.TestSharedCount)
>>>>>>>>> Run 1: TestSharedCount.testSimple:174 One or more data watchers are
>>>>>>>> still
>>>>>>>>> registered: [/count]
>>>>>>>>> Run 2: TestSharedCount.testSimple:174 One or more data watchers are
>>>>>>>> still
>>>>>>>>> registered: [/count]
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Tests run: 491, Failures: 6, Errors: 0, Skipped: 0
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Fri, May 27, 2016 at 3:30 AM, Jordan Zimmerman <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> I see the problem. The fix is not simple though so I’ll spend some
>>>>>>>> time on
>>>>>>>>>> it. The TL;DR is that exists watchers are still supposed to get
>> set
>>>>>>>> when
>>>>>>>>>> there is a KeeperException.NoNode and the code isn’t handling it.
>>>> But,
>>>>>>>>>> while I was looking at the code I realized there are some
>>>> significant
>>>>>>>>>> additional problems. Curator, here, is trying to mirror what
>>>>>>>> ZooKeeper does
>>>>>>>>>> internally which is insanely complicated. In hindsight, the whole
>> ZK
>>>>>>>>>> watcher mechanism should’ve been decoupled from the mutator APIs.
>>>>>>>> But, of
>>>>>>>>>> course, that’s easy for me to say now.
>>>>>>>>>>
>>>>>>>>>> -Jordan
>>>>>>>>>>
>>>>>>>>>>> On May 26, 2016, at 1:10 AM, Cameron McKenzie <
>>>>>>>> [email protected]>
>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Thanks Scott,
>>>>>>>>>>> Those tests are now passing for me.
>>>>>>>>>>>
>>>>>>>>>>> Jordan, testNodeCache:testBasics() is failing consistently on the
>>>> 3.0
>>>>>>>>>>> branch. It appears that this is actually potentially a bug in the
>>>>>>>>>>> NodeCache. It ends up leaking a Watcher reference. I've had a
>> quick
>>>>>>>> look
>>>>>>>>>>> through, but I haven't dived in in any detail. It's the
>>>>>>>>>>> WatcherRemovalManager stuff I think. If you've got time, can you
>>>>>>>> have a
>>>>>>>>>>> look? If not, let me know and I'll do some more digging.
>>>>>>>>>>>
>>>>>>>>>>> cheers
>>>>>>>>>>>
>>>>>>>>>>> On Thu, May 26, 2016 at 11:47 AM, Cameron McKenzie <
>>>>>>>>>> [email protected]>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Thanks Scott.
>>>>>>>>>>>>
>>>>>>>>>>>> Push the fix to master and merge it into 3.0.
>>>>>>>>>>>>
>>>>>>>>>>>> Then I guess, I'll push new versions of 2.11 and 3.2 onto Nexus.
>>>>>>>>>>>> cheers
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, May 26, 2016 at 11:44 AM, Scott Blum <
>>>> [email protected]
>>>>>>>>>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Alright, I have a fix, but it wants to be applied to both
>> master
>>>>>>>> and
>>>>>>>>>> 3.0.
>>>>>>>>>>>>> Where should I push the fix?
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, May 25, 2016 at 6:10 PM, Cameron McKenzie <
>>>>>>>>>> [email protected]
>>>>>>>>>>>>>>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks Scott,
>>>>>>>>>>>>>> If you just checkout the CURATOR-3.0 branch, they are failing
>>>>>>>> there.
>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, May 26, 2016 at 2:06 AM, Scott Blum <
>>>>>>>> [email protected]>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Sure, what SHA are they failing at Cam?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 9:36 AM, Jordan Zimmerman <
>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Scott can you take a look?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> -Jordan
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On May 25, 2016, at 4:35 AM, Cameron McKenzie <
>>>>>>>>>>>>>> [email protected]>
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Tree cache tests are still failing. I've tried a few times
>>>> but
>>>>>>>> no
>>>>>>>>>>>>>> love:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>> TestTreeCacheEventOrdering>TestEventOrdering.testEventOrdering:151
>>>>>>>>>>>>>>>> actual 6
>>>>>>>>>>>>>>>>> expected -31:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I will have a look into what's going on in the morning.
>> Given
>>>>>>>> that
>>>>>>>>>>>>>>> these
>>>>>>>>>>>>>>>>> may take some messing about to fix up, do we just want to
>>>> vote
>>>>>>>> on
>>>>>>>>>>>>>>> 2.11.0
>>>>>>>>>>>>>>>>> separately, as that is all ready to go?
>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 5:34 PM, Jordan Zimmerman <
>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Great news. Thanks.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> ====================
>>>>>>>>>>>>>>>>>> Jordan Zimmerman
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On May 25, 2016, at 12:37 AM, Cameron McKenzie <
>>>>>>>>>>>>>>> [email protected]
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I have fixed up the test case, all good now.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 1:45 PM, Cameron McKenzie <
>>>>>>>>>>>>>>>>>> [email protected]>
>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Looks like it was introduced with the schema validation
>>>>>>>> stuff.
>>>>>>>>>>>>> It
>>>>>>>>>>>>>>> now
>>>>>>>>>>>>>>>>>> does
>>>>>>>>>>>>>>>>>>>> an ACL check prior to the backgrounding call. Because
>> the
>>>>>>>> unit
>>>>>>>>>>>>>> test
>>>>>>>>>>>>>>>>>> uses a
>>>>>>>>>>>>>>>>>>>> bogus ACL provider it just throws an exception
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> final String adjustedPath =
>>>>>>>>>>>>>>>>>>>> adjustPath(client.fixForNamespace(givenPath,
>>>>>>>>>>>>>>>>>> createMode.isSequential()));
>>>>>>>>>>>>>>>>>>>> List<ACL> aclList = acling.getAclList(adjustedPath);
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>
>> client.getSchemaSet().getSchema(givenPath).validateCreate(createMode,
>>>>>>>>>>>>>>>>>> data,
>>>>>>>>>>>>>>>>>>>> aclList);
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> String returnPath = null;
>>>>>>>>>>>>>>>>>>>> if ( backgrounding.inBackground() )
>>>>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>>>> pathInBackground(adjustedPath, data, givenPath);
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> So, I guess the answer is to get the test to force a
>>>> failure
>>>>>>>>>>>>> in a
>>>>>>>>>>>>>>>>>>>> different way. With the UnhandledErrorListener, the
>>>>>>>>>>>>> expectation is
>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>>> this only gets called on backgrounding operations?
>>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 1:39 PM, Cameron McKenzie <
>>>>>>>>>>>>>>>>>> [email protected]>
>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Same on the master branch, but it passes there, so
>> maybe
>>>>>>>>>>>>>> something
>>>>>>>>>>>>>>>> has
>>>>>>>>>>>>>>>>>>>>> legitimately broken the test. Will let you know if I
>> get
>>>>>>>>>>>>> stuck.
>>>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 1:36 PM, Jordan Zimmerman <
>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Let me know if you need help.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> It might be a bad merge. Have you compared it to the
>>>>>>>> master
>>>>>>>>>>>>>>> branch?
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> -JZ
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> On May 24, 2016, at 10:31 PM, Cameron McKenzie <
>>>>>>>>>>>>>>>>>> [email protected]>
>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Guys,
>>>>>>>>>>>>>>>>>>>>>>> There's a test
>>>> TestFrameworkBackground:testErrorListener
>>>>>>>>>>>>> that
>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>>>>>>>>> failing
>>>>>>>>>>>>>>>>>>>>>>> consistently on the CURATOR-3.0 branch.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> I can't see how it has ever worked. It seems to try
>> and
>>>>>>>>>>>>> provoke
>>>>>>>>>>>>>>> an
>>>>>>>>>>>>>>>>>>>>>> error
>>>>>>>>>>>>>>>>>>>>>>> via a bad ACL provider.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> But this ACL provider is called by the
>>>> CreateBuilderImpl
>>>>>>>>>>>>> prior
>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>> backgrounding call, which means that the exception
>> that
>>>>>>>> it
>>>>>>>>>>>>>> throws
>>>>>>>>>>>>>>>>>>>>>> happens
>>>>>>>>>>>>>>>>>>>>>>> in the main Thread of the unit test. So, it just
>> throws
>>>>>>>> an
>>>>>>>>>>>>>>>>>>>>>>> UnsupportedOperationException which is propogated up
>>>> the
>>>>>>>>>>>>> stack
>>>>>>>>>>>>>> at
>>>>>>>>>>>>>>>>>> which
>>>>>>>>>>>>>>>>>>>>>>> point the unit test fails.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> So, I will look at fixing this, but I just don't
>>>>>>>> understand
>>>>>>>>>>>>> how
>>>>>>>>>>>>>>> it
>>>>>>>>>>>>>>>>>> ever
>>>>>>>>>>>>>>>>>>>>>>> worked?
>>>>>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>
>>>>
>>
>>