My current testing suggests that the problem is the call to:
    client.removeWatchers();
in InterProcessSemaphoreV2. If I comment out that line, your test has yet to
fail for me. Maybe you can verify. I’ll also look at why this is causing the
failure.
> On Jun 1, 2016, at 10:59 PM, Cameron McKenzie <[email protected]> wrote:
>
> The counter is just being used to check if semaphores are still being
> acquired. Essentially it just runs in a loop acquiring semaphores (and
> incrementing the counter when they are acquired).
>
> Then it shuts down the server, waits until the session is lost, then
> restarts the server and then checks that semaphores are being acquired
> correctly again (by checking that the counter is being incremented).
>
> This is just a simplified version of the test that is failing.
>
> When the test fails, all of the threads are attempting to get a lease on
> the semaphore, but none of them get it, then the test times out while
> waiting.
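For anyone following along without the branch checked out, the shape of the check described above can be sketched without ZooKeeper at all. This is only an illustration under stated assumptions: java.util.concurrent.Semaphore stands in for InterProcessSemaphoreV2, holding the only permit stands in for the server outage, and the names CounterProgressSketch, startWorkers and counterAdvances are made up for this note and are not taken from the test on the branch.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

class CounterProgressSketch {
    // Returns true if the counter moves past its starting value within timeoutMs.
    static boolean counterAdvances(AtomicInteger counter, long timeoutMs) {
        int start = counter.get();
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (System.nanoTime() < deadline) {
            if (counter.get() > start) {
                return true;
            }
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return counter.get() > start;
    }

    // Starts daemon workers that repeatedly take a lease, bump the counter, and give the lease back.
    static void startWorkers(int n, Semaphore leases, AtomicInteger counter, AtomicBoolean running) {
        for (int i = 0; i < n; i++) {
            Thread worker = new Thread(() -> {
                while (running.get()) {
                    try {
                        if (leases.tryAcquire(50, TimeUnit.MILLISECONDS)) {
                            try {
                                counter.incrementAndGet();
                            } finally {
                                leases.release();
                            }
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                }
            });
            worker.setDaemon(true);
            worker.start();
        }
    }

    public static void main(String[] args) throws Exception {
        Semaphore leases = new Semaphore(1, true); // fair, so the "outage" below is not starved
        AtomicInteger counter = new AtomicInteger();
        AtomicBoolean running = new AtomicBoolean(true);
        startWorkers(4, leases, counter, running);

        System.out.println("acquiring before outage: " + counterAdvances(counter, 2000));

        leases.acquire(); // stand-in for the outage: no worker can get a lease
        System.out.println("acquiring during outage: " + counterAdvances(counter, 300));
        leases.release(); // stand-in for the restart

        System.out.println("acquiring after restart: " + counterAdvances(counter, 2000));
        running.set(false);
    }
}
```

In these terms, the failure being reported is the last check coming back false: the workers are all still trying to acquire, but the counter never moves again.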
>
>
>
> On Thu, Jun 2, 2016 at 1:29 PM, Jordan Zimmerman <[email protected]
>> wrote:
>
>> I also had to add:
>>
>> while(!lost.get() && (counter.get() > 0))
>> {
>>     Thread.sleep(1000);
>> }
>> Which seems more correct to me.
>>
>>> On Jun 1, 2016, at 9:07 PM, Cameron McKenzie <[email protected]>
>> wrote:
>>>
>>> I have just pushed an interprocess_mutex_issue branch. The test case is
>> in
>>> TestInterprocessMutexNotReconnecting
>>>
>>> For me it's failing around 20% of the time.
>>> cheers
>>>
>>> On Thu, Jun 2, 2016 at 11:17 AM, Cameron McKenzie <
>> [email protected]>
>>> wrote:
>>>
>>>> Yep, just let me confirm that it's actually getting the same problem.
>> I'm
>>>> sure it was before, but I've just run it a bunch of times and
>> everything's
>>>> been fine.
>>>>
>>>> On Thu, Jun 2, 2016 at 11:15 AM, Jordan Zimmerman <
>>>> [email protected]> wrote:
>>>>
>>>>> Can you push your unit test somewhere?
>>>>>
>>>>>> On Jun 1, 2016, at 7:37 PM, Cameron McKenzie <[email protected]>
>>>>> wrote:
>>>>>>
>>>>>> Indeed. There seems to be a problem with InterProcessSemaphoreV2
>> though.
>>>>>> I've written a simplified unit test that just has a bunch of clients
>>>>>> attempting to grab a lease on the semaphore. When I shutdown and
>>>>> restart ZK
>>>>>> about 25% of the time, none of the clients can reacquire the
>> semaphore.
>>>>>>
>>>>>> Still trying to work out what's going on, but I'm probably not going
>> to
>>>>>> have a lot of time today to look at it.
>>>>>> cheers
>>>>>>
>>>>>> On Thu, Jun 2, 2016 at 10:30 AM, Jordan Zimmerman <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> Odd - SemaphoreClient does seem wrong.
>>>>>>>
>>>>>>>> On Jun 1, 2016, at 1:43 AM, Cameron McKenzie <
>> [email protected]>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> It looks like under some circumstances (which I haven't worked out
>>>>> yet)
>>>>>>>> that the InterprocessMutex acquire() is not working correctly when
>>>>>>>> reconnecting to ZK. Still digging into why this is.
>>>>>>>>
>>>>>>>> There also seems to be a bug in the SemaphoreClient, unless I'm
>>>>> missing
>>>>>>>> something. At lines 126 and 140 it does compareAndSet() calls but
>>>>> throws
>>>>>>> an
>>>>>>>> exception if they return true. As far as I can work out, this means
>>>>> that
>>>>>>>> whenever the lock is acquired, an exception gets thrown indicating
>>>>> that
>>>>>>>> there are Multiple acquirers.
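To make the compareAndSet() point concrete, here is a small standalone sketch. It is not the SemaphoreClient source: the class, field and method names are hypothetical, and it only relies on the documented behaviour that AtomicReference.compareAndSet returns true when the swap succeeded. A guard that throws on true therefore rejects the one client that legitimately took the slot, while the negated guard only rejects a second client.

```java
import java.util.concurrent.atomic.AtomicReference;

class CompareAndSetGuardSketch {
    // Hypothetical slot recording which client currently holds the lock.
    private final AtomicReference<String> activeClient = new AtomicReference<>(null);

    // Guard as described in the message: throws when compareAndSet returns true,
    // i.e. exactly when this client successfully claimed the empty slot.
    void acquireInverted(String client) {
        if (activeClient.compareAndSet(null, client)) {
            throw new IllegalStateException("Multiple acquirers");
        }
    }

    // Guard with the check negated: throws only when the slot was already taken.
    void acquireChecked(String client) {
        if (!activeClient.compareAndSet(null, client)) {
            throw new IllegalStateException("Multiple acquirers");
        }
    }

    // Helper for the demo: reports whether the action threw IllegalStateException.
    static boolean throwsIllegalState(Runnable action) {
        try {
            action.run();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        CompareAndSetGuardSketch inverted = new CompareAndSetGuardSketch();
        System.out.println("inverted guard, sole acquirer throws: "
            + throwsIllegalState(() -> inverted.acquireInverted("client-1")));

        CompareAndSetGuardSketch checked = new CompareAndSetGuardSketch();
        System.out.println("negated guard, sole acquirer throws: "
            + throwsIllegalState(() -> checked.acquireChecked("client-1")));
        System.out.println("negated guard, second acquirer throws: "
            + throwsIllegalState(() -> checked.acquireChecked("client-2")));
    }
}
```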
>>>>>>>>
>>>>>>>> This test is failing fairly consistently. It seems to be the
>> remaining
>>>>>>> test
>>>>>>>> that keeps failing in the Jenkins build also
>>>>>>>> cheers
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Jun 1, 2016 at 3:10 PM, Cameron McKenzie <
>>>>> [email protected]
>>>>>>>>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Looks like I was incorrect. The NoWatcherException is being thrown
>> on
>>>>>>>>> success as well, and the problem is not in the cluster restart.
>> Will
>>>>>>> keep
>>>>>>>>> digging.
>>>>>>>>>
>>>>>>>>> On Wed, Jun 1, 2016 at 2:52 PM, Cameron McKenzie <
>>>>>>> [email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> TestInterProcessSemaphoreCluster.testCluster() is failing (assertion at
>>>>>>>>>> line 294). Again, it seems like some sort of race condition with the
>>>>>>>>>> watcher removal.
>>>>>>>>>>
>>>>>>>>>> When I run it in Eclipse, it fails maybe 25% of the time. When it
>>>>> fails
>>>>>>>>>> it seems that it's got something to do with watcher removal. When
>>>>> the
>>>>>>> test
>>>>>>>>>> passes, this error is not logged.
>>>>>>>>>>
>>>>>>>>>> org.apache.zookeeper.KeeperException$NoWatcherException: KeeperErrorCode = No such watcher for /foo/bar/lock/leases
>>>>>>>>>> at org.apache.zookeeper.ZooKeeper$ZKWatchManager.containsWatcher(ZooKeeper.java:377)
>>>>>>>>>> at org.apache.zookeeper.ZooKeeper$ZKWatchManager.removeWatcher(ZooKeeper.java:252)
>>>>>>>>>> at org.apache.zookeeper.WatchDeregistration.unregister(WatchDeregistration.java:58)
>>>>>>>>>> at org.apache.zookeeper.ClientCnxn.finishPacket(ClientCnxn.java:712)
>>>>>>>>>> at org.apache.zookeeper.ClientCnxn.access$1500(ClientCnxn.java:97)
>>>>>>>>>> at org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:948)
>>>>>>>>>> at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:99)
>>>>>>>>>> at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
>>>>>>>>>> at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1236)
>>>>>>>>>>
>>>>>>>>>> Is it possible it's something to do with the way that the cluster
>> is
>>>>>>>>>> restarted at line 282? The old cluster is not shutdown, a new one
>> is
>>>>>>> just
>>>>>>>>>> created.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Wed, Jun 1, 2016 at 10:44 AM, Jordan Zimmerman <
>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> I’ll try to address this as part of CURATOR-333
>>>>>>>>>>>
>>>>>>>>>>>> On May 31, 2016, at 7:08 PM, Cameron McKenzie <
>>>>>>> [email protected]>
>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Maybe we need to look at some way of providing a hook for tests
>> to
>>>>>>> wait
>>>>>>>>>>>> reliably for asynch tasks to finish?
>>>>>>>>>>>>
>>>>>>>>>>>> The latest round of tests ran OK. One test failed on an
>> unrelated
>>>>>>> thing
>>>>>>>>>>>> (ConnectionLoss), but this appears to be a transient thing as
>> it's
>>>>>>>>>>> worked
>>>>>>>>>>>> ok the next time around.
>>>>>>>>>>>>
>>>>>>>>>>>> I will start getting a release together. Thanks for your help
>> with
>>>>> the
>>>>>>>>>>>> updated tests.
>>>>>>>>>>>> cheers
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Jun 1, 2016 at 9:12 AM, Jordan Zimmerman <
>>>>>>>>>>> [email protected]
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> The problem is in-flight watchers and async background calls.
>>>>>>> There’s
>>>>>>>>>>> no
>>>>>>>>>>>>> way to cancel these and they can take time to occur - even
>> after
>>>>> a
>>>>>>>>>>> recipe
>>>>>>>>>>>>> instance is closed.
>>>>>>>>>>>>>
>>>>>>>>>>>>> -Jordan
>>>>>>>>>>>>>
>>>>>>>>>>>>>> On May 31, 2016, at 5:11 PM, Cameron McKenzie <
>>>>>>>>>>> [email protected]>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Ok, running it again now.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Is the problem that the watcher clean up for the recipes is
>> done
>>>>>>>>>>>>>> asynchronously after they are closed?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, Jun 1, 2016 at 1:35 AM, Jordan Zimmerman <
>>>>>>>>>>>>> [email protected]
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> OK - please try now. I added a loop in the “no watchers”
>>>>> checker.
>>>>>>> If
>>>>>>>>>>>>> there
>>>>>>>>>>>>>>> are remaining watchers, it will sleep a bit and try again.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -Jordan
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On May 31, 2016, at 1:33 AM, Cameron McKenzie <
>>>>>>>>>>> [email protected]>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Looks like these failures are intermittent. Running them
>>>>> directly
>>>>>>>>>>> in
>>>>>>>>>>>>>>>> Eclipse they seem to be passing. I will run the whole thing
>>>>> again
>>>>>>>>>>> in
>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>> morning and see how it goes.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Tue, May 31, 2016 at 2:29 PM, Cameron McKenzie <
>>>>>>>>>>>>>>> [email protected]>
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> There are still 2 tests failing for me:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> FAILURE! - in org.apache.curator.framework.recipes.cache.TestPathChildrenCache
>>>>>>>>>>>>>>>>> testKilledSession(org.apache.curator.framework.recipes.cache.TestPathChildrenCache) Time elapsed: 17.488 sec <<< FAILURE!
>>>>>>>>>>>>>>>>> java.lang.AssertionError: One or more child watchers are still registered: [/test]
>>>>>>>>>>>>>>>>> at org.apache.curator.framework.imps.TestCleanState.closeAndTestClean(TestCleanState.java:53)
>>>>>>>>>>>>>>>>> at org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testKilledSession(TestPathChildrenCache.java:707)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> FAILURE! - in org.apache.curator.framework.recipes.locks.TestInterProcessSemaphoreCluster
>>>>>>>>>>>>>>>>> testCluster(org.apache.curator.framework.recipes.locks.TestInterProcessSemaphoreCluster) Time elapsed: 96.641 sec <<< FAILURE!
>>>>>>>>>>>>>>>>> java.lang.AssertionError: expected [true] but found [false]
>>>>>>>>>>>>>>>>> at org.testng.Assert.fail(Assert.java:94)
>>>>>>>>>>>>>>>>> at org.testng.Assert.failNotEquals(Assert.java:494)
>>>>>>>>>>>>>>>>> at org.testng.Assert.assertTrue(Assert.java:42)
>>>>>>>>>>>>>>>>> at org.testng.Assert.assertTrue(Assert.java:52)
>>>>>>>>>>>>>>>>> at org.apache.curator.framework.recipes.locks.TestInterProcessSemaphoreCluster.testCluster(TestInterProcessSemaphoreCluster.java:294)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Failed tests:
>>>>>>>>>>>>>>>>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testKilledSession(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>>>>>>>>>>>>>>>>> Run 1: TestPathChildrenCache.testKilledSession:707 One or more child watchers are still registered: [/test]
>>>>>>>>>>>>>>>>> Run 2: PASS
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> TestInterProcessSemaphoreCluster.testCluster:294 expected [true] but found [false]
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Tests run: 495, Failures: 2, Errors: 0, Skipped: 0
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Tue, May 31, 2016 at 12:52 PM, Cameron McKenzie <
>>>>>>>>>>>>>>> [email protected]
>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks, CURATOR-332 wasn't pushed. I will run the tests
>>>>> against
>>>>>>>>>>> that,
>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>> if it's all good will merge into CURATOR-3.0
>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Tue, May 31, 2016 at 12:32 PM, Jordan Zimmerman <
>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Actually - I don’t remember if branch CURATOR-332 is
>> merged
>>>>>>>>>>> yet. I
>>>>>>>>>>>>>>>>>>> made/pushed my changes in CURATOR-332
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> -jordan
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On May 26, 2016, at 10:04 PM, Cameron McKenzie <
>>>>>>>>>>>>>>> [email protected]>
>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I'm still seeing 6 failed tests that seem related to the
>>>>> same
>>>>>>>>>>> stuff
>>>>>>>>>>>>>>>>>>> after
>>>>>>>>>>>>>>>>>>>> merging your fix:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Failed tests:
>>>>>>>>>>>>>>>>>>>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testBasics(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>>>>>>>>>>>>>>>>>>>> Run 1: TestPathChildrenCache.testBasics:863 One or more child watchers are still registered: [/test]
>>>>>>>>>>>>>>>>>>>> Run 2: TestPathChildrenCache.testBasics:863 One or more child watchers are still registered: [/test]
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testBasicsOnTwoCachesWithSameExecutor(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>>>>>>>>>>>>>>>>>>>> Run 1: TestPathChildrenCache.testBasicsOnTwoCachesWithSameExecutor:934 One or more child watchers are still registered: [/test]
>>>>>>>>>>>>>>>>>>>> Run 2: TestPathChildrenCache.testBasicsOnTwoCachesWithSameExecutor:934 One or more child watchers are still registered: [/test]
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testEnsurePath(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>>>>>>>>>>>>>>>>>>>> Run 1: TestPathChildrenCache.testEnsurePath:363 One or more child watchers are still registered: [/one/two/three]
>>>>>>>>>>>>>>>>>>>> Run 2: TestPathChildrenCache.testEnsurePath:363 One or more child watchers are still registered: [/one/two/three]
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> TestInterProcessSemaphoreCluster.testCluster:294 expected [true] but found [false]
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> org.apache.curator.framework.recipes.shared.TestSharedCount.testMultiClientVersioned(org.apache.curator.framework.recipes.shared.TestSharedCount)
>>>>>>>>>>>>>>>>>>>> Run 1: PASS
>>>>>>>>>>>>>>>>>>>> Run 2: TestSharedCount.testMultiClientVersioned:256 One or more data watchers are still registered: [/count]
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> org.apache.curator.framework.recipes.shared.TestSharedCount.testSimple(org.apache.curator.framework.recipes.shared.TestSharedCount)
>>>>>>>>>>>>>>>>>>>> Run 1: TestSharedCount.testSimple:174 One or more data watchers are still registered: [/count]
>>>>>>>>>>>>>>>>>>>> Run 2: TestSharedCount.testSimple:174 One or more data watchers are still registered: [/count]
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Tests run: 491, Failures: 6, Errors: 0, Skipped: 0
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Fri, May 27, 2016 at 3:30 AM, Jordan Zimmerman <
>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I see the problem. The fix is not simple though so I’ll
>>>>>>> spend
>>>>>>>>>>> some
>>>>>>>>>>>>>>>>>>> time on
>>>>>>>>>>>>>>>>>>>>> it. The TL;DR is that exists watchers are still
>> supposed
>>>>> to
>>>>>>>>>>> get
>>>>>>>>>>>>> set
>>>>>>>>>>>>>>>>>>> when
>>>>>>>>>>>>>>>>>>>>> there is a KeeperException.NoNode and the code isn’t
>>>>>>> handling
>>>>>>>>>>> it.
>>>>>>>>>>>>>>> But,
>>>>>>>>>>>>>>>>>>>>> while I was looking at the code I realized there are
>> some
>>>>>>>>>>>>>>> significant
>>>>>>>>>>>>>>>>>>>>> additional problems. Curator, here, is trying to mirror
>>>>> what
>>>>>>>>>>>>>>>>>>> ZooKeeper does
>>>>>>>>>>>>>>>>>>>>> internally which is insanely complicated. In hindsight,
>>>>> the
>>>>>>>>>>> whole
>>>>>>>>>>>>> ZK
>>>>>>>>>>>>>>>>>>>>> watcher mechanism should’ve been decoupled from the
>>>>> mutator
>>>>>>>>>>> APIs.
>>>>>>>>>>>>>>>>>>> But, of
>>>>>>>>>>>>>>>>>>>>> course, that’s easy for me to say now.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> -Jordan
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On May 26, 2016, at 1:10 AM, Cameron McKenzie <
>>>>>>>>>>>>>>>>>>> [email protected]>
>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Thanks Scott,
>>>>>>>>>>>>>>>>>>>>>> Those tests are now passing for me.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Jordan, testNodeCache:testBasics() is failing
>>>>> consistently
>>>>>>>>>>> on the
>>>>>>>>>>>>>>> 3.0
>>>>>>>>>>>>>>>>>>>>>> branch. It appears that this is actually potentially a
>>>>> bug
>>>>>>>>>>> in the
>>>>>>>>>>>>>>>>>>>>>> NodeCache. It ends up leaking a Watcher reference.
>> I've
>>>>>>> had a
>>>>>>>>>>>>> quick
>>>>>>>>>>>>>>>>>>> look
>>>>>>>>>>>>>>>>>>>>>> through, but I haven't dived in in any detail. It's
>> the
>>>>>>>>>>>>>>>>>>>>>> WatcherRemovalManager stuff I think. If you've got
>> time,
>>>>>>> can
>>>>>>>>>>> you
>>>>>>>>>>>>>>>>>>> have a
>>>>>>>>>>>>>>>>>>>>>> look? If not, let me know and I'll do some more
>> digging.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Thu, May 26, 2016 at 11:47 AM, Cameron McKenzie <
>>>>>>>>>>>>>>>>>>>>> [email protected]>
>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Thanks Scott.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Push the fix to master and merge it into 3.0.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Then I guess, I'll push new versions of 2.11 and 3.2
>>>>> onto
>>>>>>>>>>> Nexus.
>>>>>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On Thu, May 26, 2016 at 11:44 AM, Scott Blum <
>>>>>>>>>>>>>>> [email protected]
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Alright, I have a fix, but it wants to be applied to
>>>>> both
>>>>>>>>>>>>> master
>>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>> 3.0.
>>>>>>>>>>>>>>>>>>>>>>>> Where should I push the fix?
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 6:10 PM, Cameron McKenzie <
>>>>>>>>>>>>>>>>>>>>> [email protected]
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Thanks Scott,
>>>>>>>>>>>>>>>>>>>>>>>>> If you just checkout the CURATOR-3.0 branch, they
>> are
>>>>>>>>>>> failing
>>>>>>>>>>>>>>>>>>> there.
>>>>>>>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, May 26, 2016 at 2:06 AM, Scott Blum <
>>>>>>>>>>>>>>>>>>> [email protected]>
>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Sure, what SHA are they failing at Cam?
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 9:36 AM, Jordan Zimmerman
>> <
>>>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Scott can you take a look?
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> -Jordan
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> On May 25, 2016, at 4:35 AM, Cameron McKenzie <
>>>>>>>>>>>>>>>>>>>>>>>>> [email protected]>
>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Tree cache tests are still failing. I've tried a
>>>>> few
>>>>>>>>>>> times
>>>>>>>>>>>>>>> but
>>>>>>>>>>>>>>>>>>> no
>>>>>>>>>>>>>>>>>>>>>>>>> love:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> TestTreeCacheEventOrdering>TestEventOrdering.testEventOrdering:151 actual 6 expected -31:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> I will have a look into what's going on in the
>>>>>>> morning.
>>>>>>>>>>>>> Given
>>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>>>>>>>>> these
>>>>>>>>>>>>>>>>>>>>>>>>>>>> may take some messing about to fix up, do we
>> just
>>>>>>> want
>>>>>>>>>>> to
>>>>>>>>>>>>>>> vote
>>>>>>>>>>>>>>>>>>> on
>>>>>>>>>>>>>>>>>>>>>>>>>> 2.11.0
>>>>>>>>>>>>>>>>>>>>>>>>>>>> separately, as that is all ready to go?
>>>>>>>>>>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 5:34 PM, Jordan
>> Zimmerman
>>>>> <
>>>>>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Great news. Thanks.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ====================
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Jordan Zimmerman
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On May 25, 2016, at 12:37 AM, Cameron
>> McKenzie <
>>>>>>>>>>>>>>>>>>>>>>>>>> [email protected]
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I have fixed up the test case, all good now.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 1:45 PM, Cameron
>>>>> McKenzie <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> [email protected]>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Looks like it was introduced with the schema
>>>>>>>>>>> validation
>>>>>>>>>>>>>>>>>>> stuff.
>>>>>>>>>>>>>>>>>>>>>>>> It
>>>>>>>>>>>>>>>>>>>>>>>>>> now
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> does
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> an ACL check prior to the backgrounding call.
>>>>>>>>>>> Because
>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>> unit
>>>>>>>>>>>>>>>>>>>>>>>>> test
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> uses a
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> bogus ACL provider it just throws an
>> exception
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> final String adjustedPath = adjustPath(client.fixForNamespace(givenPath, createMode.isSequential()));
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> List<ACL> aclList = acling.getAclList(adjustedPath);
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> client.getSchemaSet().getSchema(givenPath).validateCreate(createMode, data, aclList);
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> String returnPath = null;
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> if ( backgrounding.inBackground() )
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     pathInBackground(adjustedPath, data, givenPath);
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> So, I guess the answer is to get the test to
>>>>>>> force a
>>>>>>>>>>>>>>> failure
>>>>>>>>>>>>>>>>>>>>>>>> in a
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> different way. With the
>> UnhandledErrorListener,
>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>> expectation is
>>>>>>>>>>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this only gets called on backgrounding
>>>>> operations?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 1:39 PM, Cameron
>>>>> McKenzie
>>>>>>> <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> [email protected]>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Same on the master branch, but it passes
>>>>> there,
>>>>>>> so
>>>>>>>>>>>>> maybe
>>>>>>>>>>>>>>>>>>>>>>>>> something
>>>>>>>>>>>>>>>>>>>>>>>>>>> has
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> legitimately broken the test. Will let you
>>>>> know
>>>>>>> if
>>>>>>>>>>> I
>>>>>>>>>>>>> get
>>>>>>>>>>>>>>>>>>>>>>>> stuck.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 1:36 PM, Jordan
>>>>>>> Zimmerman <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Let me know if you need help.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> It might be a bad merge. Have you compared
>>>>> it to
>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>> master
>>>>>>>>>>>>>>>>>>>>>>>>>> branch?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> -JZ
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On May 24, 2016, at 10:31 PM, Cameron
>>>>>>> McKenzie <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> [email protected]>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Guys,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> There's a test
>>>>>>>>>>>>>>> TestFrameworkBackground:testErrorListener
>>>>>>>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> failing
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> consistently on the CURATOR-3.0 branch.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I can't see how it has ever worked. It
>>>>> seems to
>>>>>>>>>>> try
>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>>>> provoke
>>>>>>>>>>>>>>>>>>>>>>>>>> an
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> error
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> via a bad ACL provider.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> But this ACL provider is called by the
>>>>>>>>>>>>>>> CreateBuilderImpl
>>>>>>>>>>>>>>>>>>>>>>>> prior
>>>>>>>>>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> backgrounding call, which means that the
>>>>>>>>>>> exception
>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>> it
>>>>>>>>>>>>>>>>>>>>>>>>> throws
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> happens
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> in the main Thread of the unit test. So,
>> it
>>>>>>> just
>>>>>>>>>>>>> throws
>>>>>>>>>>>>>>>>>>> an
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> UnsupportedOperationException which is
>>>>>>>>>>> propagated up
>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>> stack
>>>>>>>>>>>>>>>>>>>>>>>>> at
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> which
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> point the unit test fails.
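The reasoning in the message above (a check that runs before the backgrounding call fails on the calling thread rather than through an error listener) can be shown with a plain executor. This is a rough sketch under stated assumptions, not Curator code: runInBackground, failureSurfacesAt and the Consumer-based listener are hypothetical stand-ins for CreateBuilderImpl and its error-listener plumbing.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

class BackgroundErrorSketch {
    // Runs the precheck on the caller's thread, then hands the work to the executor.
    static void runInBackground(Runnable precheck, Runnable work,
                                ExecutorService executor, Consumer<Throwable> errorListener) {
        precheck.run(); // an exception here propagates straight to the caller
        executor.execute(() -> {
            try {
                work.run();
            } catch (RuntimeException e) {
                errorListener.accept(e); // only background failures reach the listener
            }
        });
    }

    // Reports where a failure shows up: "caller", "listener", or "nowhere".
    static String failureSurfacesAt(boolean failInPrecheck) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        List<Throwable> seenByListener = Collections.synchronizedList(new ArrayList<>());
        Runnable failing = () -> { throw new UnsupportedOperationException("bogus ACL provider"); };
        Runnable fine = () -> { };
        try {
            runInBackground(failInPrecheck ? failing : fine,
                            failInPrecheck ? fine : failing,
                            executor, seenByListener::add);
        } catch (UnsupportedOperationException e) {
            return "caller";
        } finally {
            executor.shutdown();
            try {
                executor.awaitTermination(5, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return seenByListener.isEmpty() ? "nowhere" : "listener";
    }

    public static void main(String[] args) {
        System.out.println("failure in precheck surfaces at: " + failureSurfacesAt(true));
        System.out.println("failure in background work surfaces at: " + failureSurfacesAt(false));
    }
}
```

With the failure in the precheck the listener is never invoked, which is the same shape as the test failing with UnsupportedOperationException in its main thread.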
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> So, I will look at fixing this, but I just
>>>>>>> don't
>>>>>>>>>>>>>>>>>>> understand
>>>>>>>>>>>>>>>>>>>>>>>> how
>>>>>>>>>>>>>>>>>>>>>>>>>> it
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ever
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> worked?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cheers