Re: CURATOR-3.0 tests

Jordan Zimmerman Wed, 01 Jun 2016 21:58:52 -0700

OK - I got a failure even with the line commented out. However, I found a 
similar line in LockInternals. I’m going to comment that out too and retest.


> On Jun 1, 2016, at 11:55 PM, Jordan Zimmerman <[email protected]> 
> wrote:
> 
> My current testing suggests that the problem is the call to:
> 
>       client.removeWatchers();
> 
> in InterProcessSemaphoreV2
> 
> If I comment out that line, your test has yet to fail for me. Maybe you can 
> verify. I’ll also look at why this is causing the failure.
> 
>> On Jun 1, 2016, at 10:59 PM, Cameron McKenzie <[email protected]> wrote:
>> 
>> The counter is just being used to check if semaphores are still being
>> acquired. Essentially it just runs in a loop acquiring semaphores (and
>> incrementing the counter when they are acquired).
>> 
>> Then it shuts down the server, waits until the session is lost, then
>> restarts the server and then checks that semaphores are being acquired
>> correctly again (by checking that the counter is being incremented).
>> 
>> This is just a simplified version of the test that is failing.
>> 
>> When the test fails, all of the threads are attempting to get a lease on
>> the semaphore, but none of them get it, then the test times out while
>> waiting.
>> 
>> 
>> 
>> On Thu, Jun 2, 2016 at 1:29 PM, Jordan Zimmerman <[email protected]
>>> wrote:
>> 
>>> I also had to add:
>>> 
>>> while(!lost.get() && (counter.get() > 0))
>>> {
>>>   Thread.sleep(1000);
>>> }
>>> Which seems more correct to me.
>>> 
>>>> On Jun 1, 2016, at 9:07 PM, Cameron McKenzie <[email protected]>
>>> wrote:
>>>> 
>>>> I have just pushed an interprocess_mutex_issue branch. The test case is
>>> in
>>>> TestInterprocessMutexNotReconnecting
>>>> 
>>>> For me it's failing around 20% of the time.
>>>> cheers
>>>> 
>>>> On Thu, Jun 2, 2016 at 11:17 AM, Cameron McKenzie <
>>> [email protected]>
>>>> wrote:
>>>> 
>>>>> Yep, just let me confirm that it's actually getting the same problem.
>>> I'm
>>>>> sure it was before, but I've just run it a bunch of times and
>>> everything's
>>>>> been fine.
>>>>> 
>>>>> On Thu, Jun 2, 2016 at 11:15 AM, Jordan Zimmerman <
>>>>> [email protected]> wrote:
>>>>> 
>>>>>> Can you push your unit test somewhere?
>>>>>> 
>>>>>>> On Jun 1, 2016, at 7:37 PM, Cameron McKenzie <[email protected]>
>>>>>> wrote:
>>>>>>> 
>>>>>>> Indeed. There seems to be a problem with InterProcessSemaphoreV2
>>> though.
>>>>>>> I've written a simplified unit test that just has a bunch of clients
>>>>>>> attempting to grab a lease on the semaphore. When I shut down and
>>>>>> restart ZK
>>>>>>> about 25% of the time, none of the clients can reacquire the
>>> semaphore.
>>>>>>> 
>>>>>>> Still trying to work out what's going on, but I'm probably not going
>>> to
>>>>>>> have a lot of time today to look at it.
>>>>>>> cheers
>>>>>>> 
>>>>>>> On Thu, Jun 2, 2016 at 10:30 AM, Jordan Zimmerman <
>>>>>>> [email protected]> wrote:
>>>>>>> 
>>>>>>>> Odd - SemaphoreClient does seem wrong.
>>>>>>>> 
>>>>>>>>> On Jun 1, 2016, at 1:43 AM, Cameron McKenzie <
>>> [email protected]>
>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>> It looks like under some circumstances (which I haven't worked out
>>>>>> yet)
>>>>>>>>> that the InterprocessMutex acquire() is not working correctly when
>>>>>>>>> reconnecting to ZK. Still digging into why this is.
>>>>>>>>> 
>>>>>>>>> There also seems to be a bug in the SemaphoreClient, unless I'm
>>>>>> missing
>>>>>>>>> something. At lines 126 and 140 it does compareAndSet() calls but
>>>>>> throws
>>>>>>>> an
>>>>>>>>> exception if they return true. As far as I can work out, this means
>>>>>> that
>>>>>>>>> whenever the lock is acquired, an exception gets thrown indicating
>>>>>> that
>>>>>>>>> there are Multiple acquirers.
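
For reference on the compareAndSet() point above, here is a minimal sketch of the guard semantics. It is not the actual SemaphoreClient code; the class name SingleHolderGuard and its enter()/exit() methods are hypothetical. AtomicBoolean.compareAndSet(expected, update) returns true when the swap happened, so a "multiple acquirers" check would normally throw when the call returns false, not true:

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical illustration only; this is not the Curator test code.
public class SingleHolderGuard {
    private final AtomicBoolean held = new AtomicBoolean(false);

    // Marks the guard as held; fails if another holder is already inside.
    public void enter() {
        // compareAndSet(false, true) returns true only if the value was false
        // and has now been set to true, i.e. we are the sole holder.
        // A false return means someone else already holds it.
        if (!held.compareAndSet(false, true)) {
            throw new IllegalStateException("Multiple acquirers");
        }
    }

    // Marks the guard as released; fails if it was not held.
    public void exit() {
        if (!held.compareAndSet(true, false)) {
            throw new IllegalStateException("Released without being held");
        }
    }

    public static void main(String[] args) {
        SingleHolderGuard guard = new SingleHolderGuard();
        guard.enter();
        boolean rejected = false;
        try {
            guard.enter(); // second holder while the first is still inside
        } catch (IllegalStateException e) {
            rejected = true;
        }
        guard.exit();
        System.out.println("second enter rejected: " + rejected);
    }
}
```

With the condition inverted (throwing when compareAndSet returns true), every successful first acquire would raise the exception, which matches the behaviour described above.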
>>>>>>>>> 
>>>>>>>>> This test is failing fairly consistently. It seems to be the
>>> remaining
>>>>>>>> test
>>>>>>>>> that keeps failing in the Jenkins build also
>>>>>>>>> cheers
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Wed, Jun 1, 2016 at 3:10 PM, Cameron McKenzie <
>>>>>> [email protected]
>>>>>>>>> 
>>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> Looks like I was incorrect. The NoWatcherException is being thrown
>>> on
>>>>>>>>>> success as well, and the problem is not in the cluster restart.
>>> Will
>>>>>>>> keep
>>>>>>>>>> digging.
>>>>>>>>>> 
>>>>>>>>>> On Wed, Jun 1, 2016 at 2:52 PM, Cameron McKenzie <
>>>>>>>> [email protected]>
>>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>>> TestInterProcessSemaphoreCluster.testCluster() is failing
>>>>>> (assertion
>>>>>>>> at
>>>>>>>>>>> line 294). Again, it seems like some sort of race condition with
>>> the
>>>>>>>>>>> watcher removal.
>>>>>>>>>>> 
>>>>>>>>>>> When I run it in Eclipse, it fails maybe 25% of the time. When it
>>>>>> fails
>>>>>>>>>>> it seems that it's got something to do with watcher removal. When
>>>>>> the
>>>>>>>> test
>>>>>>>>>>> passes, this error is not logged.
>>>>>>>>>>> 
>>>>>>>>>>> org.apache.zookeeper.KeeperException$NoWatcherException:
>>>>>>>> KeeperErrorCode
>>>>>>>>>>> = No such watcher for /foo/bar/lock/leases
>>>>>>>>>>> at
>>>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>> org.apache.zookeeper.ZooKeeper$ZKWatchManager.containsWatcher(ZooKeeper.java:377)
>>>>>>>>>>> at
>>>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>> org.apache.zookeeper.ZooKeeper$ZKWatchManager.removeWatcher(ZooKeeper.java:252)
>>>>>>>>>>> at
>>>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>> org.apache.zookeeper.WatchDeregistration.unregister(WatchDeregistration.java:58)
>>>>>>>>>>> at
>>> org.apache.zookeeper.ClientCnxn.finishPacket(ClientCnxn.java:712)
>>>>>>>>>>> at org.apache.zookeeper.ClientCnxn.access$1500(ClientCnxn.java:97)
>>>>>>>>>>> at
>>>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>> org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:948)
>>>>>>>>>>> at
>>>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>> org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:99)
>>>>>>>>>>> at
>>>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
>>>>>>>>>>> at
>>>>>> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1236)
>>>>>>>>>>> 
>>>>>>>>>>> Is it possible it's something to do with the way that the cluster
>>> is
>>>>>>>>>>> restarted at line 282? The old cluster is not shut down, a new one
>>> is
>>>>>>>> just
>>>>>>>>>>> created.
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> On Wed, Jun 1, 2016 at 10:44 AM, Jordan Zimmerman <
>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> I’ll try to address this as part of CURATOR-333
>>>>>>>>>>>> 
>>>>>>>>>>>>> On May 31, 2016, at 7:08 PM, Cameron McKenzie <
>>>>>>>> [email protected]>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Maybe we need to look at some way of providing a hook for tests
>>> to
>>>>>>>> wait
>>>>>>>>>>>>> reliably for asynch tasks to finish?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> The latest round of tests ran OK. One test failed on an
>>> unrelated
>>>>>>>> thing
>>>>>>>>>>>>> (ConnectionLoss), but this appears to be a transient thing as
>>> it's
>>>>>>>>>>>> worked
>>>>>>>>>>>>> ok the next time around.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I will start getting a release together. Thanks for your help
>>> with
>>>>>> the
>>>>>>>>>>>>> updated tests.
>>>>>>>>>>>>> cheers
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Wed, Jun 1, 2016 at 9:12 AM, Jordan Zimmerman <
>>>>>>>>>>>> [email protected]
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> The problem is in-flight watchers and async background calls.
>>>>>>>> There’s
>>>>>>>>>>>> no
>>>>>>>>>>>>>> way to cancel these and they can take time to occur - even
>>> after
>>>>>> a
>>>>>>>>>>>> recipe
>>>>>>>>>>>>>> instance is closed.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> -Jordan
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On May 31, 2016, at 5:11 PM, Cameron McKenzie <
>>>>>>>>>>>> [email protected]>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Ok, running it again now.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Is the problem that the watcher clean up for the recipes is
>>> done
>>>>>>>>>>>>>>> asynchronously after they are closed?
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Wed, Jun 1, 2016 at 1:35 AM, Jordan Zimmerman <
>>>>>>>>>>>>>> [email protected]
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> OK - please try now. I added a loop in the “no watchers”
>>>>>> checker.
>>>>>>>> If
>>>>>>>>>>>>>> there
>>>>>>>>>>>>>>>> are remaining watchers, it will sleep a bit and try again.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> -Jordan
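
The "sleep a bit and try again" loop described above could look roughly like the following. This is a sketch under assumptions, not the actual change to the "no watchers" checker: the PollUntil class, the awaitTrue helper, its parameters, and the timings are all hypothetical, and the condition is a stand-in for "no watchers remain registered".

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper for illustration; not part of Curator.
public class PollUntil {
    // Polls the condition until it holds or the timeout elapses.
    // Returns the last observed value of the condition.
    public static boolean awaitTrue(BooleanSupplier condition, long timeoutMs, long sleepMs)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                // Timed out: check one last time and report the result.
                return condition.getAsBoolean();
            }
            Thread.sleep(sleepMs);
        }
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.currentTimeMillis();
        // Stand-in condition: becomes true after roughly 200 ms.
        boolean clean = awaitTrue(() -> System.currentTimeMillis() - start > 200, 2000, 50);
        System.out.println("clean: " + clean);
    }
}
```

Bounding the wait with a timeout keeps a genuinely leaked watcher failing the test instead of hanging it, while still tolerating in-flight watcher removals that finish shortly after close.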
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On May 31, 2016, at 1:33 AM, Cameron McKenzie <
>>>>>>>>>>>> [email protected]>
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Looks like these failures are intermittent. Running them
>>>>>> directly
>>>>>>>>>>>> in
>>>>>>>>>>>>>>>>> Eclipse they seem to be passing. I will run the whole thing
>>>>>> again
>>>>>>>>>>>> in
>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>> morning and see how it goes.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On Tue, May 31, 2016 at 2:29 PM, Cameron McKenzie <
>>>>>>>>>>>>>>>> [email protected]>
>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> There are still 2 tests failing for me:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> FAILURE! - in
>>>>>>>>>>>>>>>>>> 
>>>>>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>> testKilledSession(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>>>>>>>>>>>>>>>>>> Time elapsed: 17.488 sec  <<< FAILURE!
>>>>>>>>>>>>>>>>>> java.lang.AssertionError: One or more child watchers are
>>>>>> still
>>>>>>>>>>>>>>>> registered:
>>>>>>>>>>>>>>>>>> [/test]
>>>>>>>>>>>>>>>>>> at
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>> org.apache.curator.framework.imps.TestCleanState.closeAndTestClean(TestCleanState.java:53)
>>>>>>>>>>>>>>>>>> at
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testKilledSession(TestPathChildrenCache.java:707)
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> FAILURE! - in
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>> org.apache.curator.framework.recipes.locks.TestInterProcessSemaphoreCluster
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>> testCluster(org.apache.curator.framework.recipes.locks.TestInterProcessSemaphoreCluster)
>>>>>>>>>>>>>>>>>> Time elapsed: 96.641 sec  <<< FAILURE!
>>>>>>>>>>>>>>>>>> java.lang.AssertionError: expected [true] but found [false]
>>>>>>>>>>>>>>>>>> at org.testng.Assert.fail(Assert.java:94)
>>>>>>>>>>>>>>>>>> at org.testng.Assert.failNotEquals(Assert.java:494)
>>>>>>>>>>>>>>>>>> at org.testng.Assert.assertTrue(Assert.java:42)
>>>>>>>>>>>>>>>>>> at org.testng.Assert.assertTrue(Assert.java:52)
>>>>>>>>>>>>>>>>>> at
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>> org.apache.curator.framework.recipes.locks.TestInterProcessSemaphoreCluster.testCluster(TestInterProcessSemaphoreCluster.java:294)
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Failed tests:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testKilledSession(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>>>>>>>>>>>>>>>>>> Run 1: TestPathChildrenCache.testKilledSession:707 One or
>>>>>> more
>>>>>>>>>>>> child
>>>>>>>>>>>>>>>>>> watchers are still registered: [/test]
>>>>>>>>>>>>>>>>>> Run 2: PASS
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> TestInterProcessSemaphoreCluster.testCluster:294 expected
>>>>>> [true]
>>>>>>>>>>>> but
>>>>>>>>>>>>>>>>>> found [false]
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Tests run: 495, Failures: 2, Errors: 0, Skipped: 0
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> On Tue, May 31, 2016 at 12:52 PM, Cameron McKenzie <
>>>>>>>>>>>>>>>> [email protected]
>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Thanks, CURATOR-332 wasn't pushed. I will run the tests
>>>>>> against
>>>>>>>>>>>> that,
>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>> if it's all good will merge into CURATOR-3.0
>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> On Tue, May 31, 2016 at 12:32 PM, Jordan Zimmerman <
>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Actually - I don’t remember if branch CURATOR-332 is
>>> merged
>>>>>>>>>>>> yet. I
>>>>>>>>>>>>>>>>>>>> made/pushed my changes in CURATOR-332
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> -jordan
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> On May 26, 2016, at 10:04 PM, Cameron McKenzie <
>>>>>>>>>>>>>>>> [email protected]>
>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> I'm still seeing 6 failed tests that seem related to the
>>>>>> same
>>>>>>>>>>>> stuff
>>>>>>>>>>>>>>>>>>>> after
>>>>>>>>>>>>>>>>>>>>> merging your fix:
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Failed tests:
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testBasics(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>>>>>>>>>>>>>>>>>>>>> Run 1: TestPathChildrenCache.testBasics:863 One or more
>>>>>> child
>>>>>>>>>>>>>>>> watchers
>>>>>>>>>>>>>>>>>>>>> are still registered: [/test]
>>>>>>>>>>>>>>>>>>>>> Run 2: TestPathChildrenCache.testBasics:863 One or more
>>>>>> child
>>>>>>>>>>>>>>>> watchers
>>>>>>>>>>>>>>>>>>>>> are still registered: [/test]
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testBasicsOnTwoCachesWithSameExecutor(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>>>>>>>>>>>>>>>>>>>>> Run 1:
>>>>>>>>>>>>>>>> 
>>> TestPathChildrenCache.testBasicsOnTwoCachesWithSameExecutor:934
>>>>>>>>>>>>>>>>>>>>> One or more child watchers are still registered: [/test]
>>>>>>>>>>>>>>>>>>>>> Run 2:
>>>>>>>>>>>>>>>> 
>>> TestPathChildrenCache.testBasicsOnTwoCachesWithSameExecutor:934
>>>>>>>>>>>>>>>>>>>>> One or more child watchers are still registered: [/test]
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testEnsurePath(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>>>>>>>>>>>>>>>>>>>>> Run 1: TestPathChildrenCache.testEnsurePath:363 One or
>>>>>> more
>>>>>>>>>>>> child
>>>>>>>>>>>>>>>>>>>>> watchers are still registered: [/one/two/three]
>>>>>>>>>>>>>>>>>>>>> Run 2: TestPathChildrenCache.testEnsurePath:363 One or
>>>>>> more
>>>>>>>>>>>> child
>>>>>>>>>>>>>>>>>>>>> watchers are still registered: [/one/two/three]
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> TestInterProcessSemaphoreCluster.testCluster:294
>>> expected
>>>>>>>>>>>> [true]
>>>>>>>>>>>>>> but
>>>>>>>>>>>>>>>>>>>>> found [false]
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>> org.apache.curator.framework.recipes.shared.TestSharedCount.testMultiClientVersioned(org.apache.curator.framework.recipes.shared.TestSharedCount)
>>>>>>>>>>>>>>>>>>>>> Run 1: PASS
>>>>>>>>>>>>>>>>>>>>> Run 2: TestSharedCount.testMultiClientVersioned:256 One
>>> or
>>>>>>>> more
>>>>>>>>>>>>>> data
>>>>>>>>>>>>>>>>>>>>> watchers are still registered: [/count]
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>> org.apache.curator.framework.recipes.shared.TestSharedCount.testSimple(org.apache.curator.framework.recipes.shared.TestSharedCount)
>>>>>>>>>>>>>>>>>>>>> Run 1: TestSharedCount.testSimple:174 One or more data
>>>>>>>>>>>> watchers are
>>>>>>>>>>>>>>>>>>>> still
>>>>>>>>>>>>>>>>>>>>> registered: [/count]
>>>>>>>>>>>>>>>>>>>>> Run 2: TestSharedCount.testSimple:174 One or more data
>>>>>>>>>>>> watchers are
>>>>>>>>>>>>>>>>>>>> still
>>>>>>>>>>>>>>>>>>>>> registered: [/count]
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Tests run: 491, Failures: 6, Errors: 0, Skipped: 0
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> On Fri, May 27, 2016 at 3:30 AM, Jordan Zimmerman <
>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> I see the problem. The fix is not simple though so I’ll
>>>>>>>> spend
>>>>>>>>>>>> some
>>>>>>>>>>>>>>>>>>>> time on
>>>>>>>>>>>>>>>>>>>>>> it. The TL;DR is that exists watchers are still
>>> supposed
>>>>>> to
>>>>>>>>>>>> get
>>>>>>>>>>>>>> set
>>>>>>>>>>>>>>>>>>>> when
>>>>>>>>>>>>>>>>>>>>>> there is a KeeperException.NoNode and the code isn’t
>>>>>>>> handling
>>>>>>>>>>>> it.
>>>>>>>>>>>>>>>> But,
>>>>>>>>>>>>>>>>>>>>>> while I was looking at the code I realized there are
>>> some
>>>>>>>>>>>>>>>> significant
>>>>>>>>>>>>>>>>>>>>>> additional problems. Curator, here, is trying to mirror
>>>>>> what
>>>>>>>>>>>>>>>>>>>> ZooKeeper does
>>>>>>>>>>>>>>>>>>>>>> internally which is insanely complicated. In hindsight,
>>>>>> the
>>>>>>>>>>>> whole
>>>>>>>>>>>>>> ZK
>>>>>>>>>>>>>>>>>>>>>> watcher mechanism should’ve been decoupled from the
>>>>>> mutator
>>>>>>>>>>>> APIs.
>>>>>>>>>>>>>>>>>>>> But, of
>>>>>>>>>>>>>>>>>>>>>> course, that’s easy for me to say now.
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> -Jordan
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> On May 26, 2016, at 1:10 AM, Cameron McKenzie <
>>>>>>>>>>>>>>>>>>>> [email protected]>
>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> Thanks Scott,
>>>>>>>>>>>>>>>>>>>>>>> Those tests are now passing for me.
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> Jordan, testNodeCache:testBasics() is failing
>>>>>> consistently
>>>>>>>>>>>> on the
>>>>>>>>>>>>>>>> 3.0
>>>>>>>>>>>>>>>>>>>>>>> branch. It appears that this is actually potentially a
>>>>>> bug
>>>>>>>>>>>> in the
>>>>>>>>>>>>>>>>>>>>>>> NodeCache. It ends up leaking a Watcher reference.
>>> I've
>>>>>>>> had a
>>>>>>>>>>>>>> quick
>>>>>>>>>>>>>>>>>>>> look
>>>>>>>>>>>>>>>>>>>>>>> through, but I haven't dived in in any detail. It's
>>> the
>>>>>>>>>>>>>>>>>>>>>>> WatcherRemovalManager stuff I think. If you've got
>>> time,
>>>>>>>> can
>>>>>>>>>>>> you
>>>>>>>>>>>>>>>>>>>> have a
>>>>>>>>>>>>>>>>>>>>>>> look? If not, let me know and I'll do some more
>>> digging.
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> On Thu, May 26, 2016 at 11:47 AM, Cameron McKenzie <
>>>>>>>>>>>>>>>>>>>>>> [email protected]>
>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> Thanks Scott.
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> Push the fix to master and merge it into 3.0.
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> Then I guess, I'll push new versions of 2.11 and 3.2
>>>>>> onto
>>>>>>>>>>>> Nexus.
>>>>>>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> On Thu, May 26, 2016 at 11:44 AM, Scott Blum <
>>>>>>>>>>>>>>>> [email protected]
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> Alright, I have a fix, but it wants to be applied to
>>>>>> both
>>>>>>>>>>>>>> master
>>>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>> 3.0.
>>>>>>>>>>>>>>>>>>>>>>>>> Where should I push the fix?
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 6:10 PM, Cameron McKenzie <
>>>>>>>>>>>>>>>>>>>>>> [email protected]
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks Scott,
>>>>>>>>>>>>>>>>>>>>>>>>>> If you just checkout the CURATOR-3.0 branch, they
>>> are
>>>>>>>>>>>> failing
>>>>>>>>>>>>>>>>>>>> there.
>>>>>>>>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, May 26, 2016 at 2:06 AM, Scott Blum <
>>>>>>>>>>>>>>>>>>>> [email protected]>
>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> Sure, what SHA are they failing at Cam?
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 9:36 AM, Jordan Zimmerman
>>> <
>>>>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Scott can you take a look?
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> -Jordan
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On May 25, 2016, at 4:35 AM, Cameron McKenzie <
>>>>>>>>>>>>>>>>>>>>>>>>>> [email protected]>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Tree cache tests are still failing. I've tried a
>>>>>> few
>>>>>>>>>>>> times
>>>>>>>>>>>>>>>> but
>>>>>>>>>>>>>>>>>>>> no
>>>>>>>>>>>>>>>>>>>>>>>>>> love:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>> TestTreeCacheEventOrdering>TestEventOrdering.testEventOrdering:151
>>>>>>>>>>>>>>>>>>>>>>>>>>>> actual 6
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> expected -31:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I will have a look into what's going on in the
>>>>>>>> morning.
>>>>>>>>>>>>>> Given
>>>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>>>>>>>>>> these
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> may take some messing about to fix up, do we
>>> just
>>>>>>>> want
>>>>>>>>>>>> to
>>>>>>>>>>>>>>>> vote
>>>>>>>>>>>>>>>>>>>> on
>>>>>>>>>>>>>>>>>>>>>>>>>>> 2.11.0
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> separately, as that is all ready to go?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 5:34 PM, Jordan
>>> Zimmerman
>>>>>> <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Great news. Thanks.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ====================
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Jordan Zimmerman
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On May 25, 2016, at 12:37 AM, Cameron
>>> McKenzie <
>>>>>>>>>>>>>>>>>>>>>>>>>>> [email protected]
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I have fixed up the test case, all good now.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 1:45 PM, Cameron
>>>>>> McKenzie <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> [email protected]>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Looks like it was introduced with the schema
>>>>>>>>>>>> validation
>>>>>>>>>>>>>>>>>>>> stuff.
>>>>>>>>>>>>>>>>>>>>>>>>> It
>>>>>>>>>>>>>>>>>>>>>>>>>>> now
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> does
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> an ACL check prior to the backgrounding call.
>>>>>>>>>>>> Because
>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>> unit
>>>>>>>>>>>>>>>>>>>>>>>>>> test
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> uses a
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> bogus ACL provider it just throws an
>>> exception
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> final String adjustedPath =
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> adjustPath(client.fixForNamespace(givenPath,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> createMode.isSequential()));
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> List<ACL> aclList =
>>>>>>>>>>>> acling.getAclList(adjustedPath);
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>> client.getSchemaSet().getSchema(givenPath).validateCreate(createMode,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> data,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> aclList);
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> String returnPath = null;
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> if ( backgrounding.inBackground() )
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> pathInBackground(adjustedPath, data,
>>>>>>>>>>>> givenPath);
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> So, I guess the answer is to get the test to
>>>>>>>> force a
>>>>>>>>>>>>>>>> failure
>>>>>>>>>>>>>>>>>>>>>>>>> in a
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> different way. With the
>>> UnhandledErrorListener,
>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>>> expectation is
>>>>>>>>>>>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this only gets called on backgrounding
>>>>>> operations?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 1:39 PM, Cameron
>>>>>> McKenzie
>>>>>>>> <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> [email protected]>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Same on the master branch, but it passes
>>>>>> there,
>>>>>>>> so
>>>>>>>>>>>>>> maybe
>>>>>>>>>>>>>>>>>>>>>>>>>> something
>>>>>>>>>>>>>>>>>>>>>>>>>>>> has
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> legitimately broken the test. Will let you
>>>>>> know
>>>>>>>> if
>>>>>>>>>>>> I
>>>>>>>>>>>>>> get
>>>>>>>>>>>>>>>>>>>>>>>>> stuck.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 1:36 PM, Jordan
>>>>>>>> Zimmerman <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Let me know if you need help.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> It might be a bad merge. Have you compared
>>>>>> it to
>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>> master
>>>>>>>>>>>>>>>>>>>>>>>>>>> branch?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> -JZ
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On May 24, 2016, at 10:31 PM, Cameron
>>>>>>>> McKenzie <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> [email protected]>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Guys,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> There's a test
>>>>>>>>>>>>>>>> TestFrameworkBackground:testErrorListener
>>>>>>>>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> failing
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> consistently on the CURATOR-3.0 branch.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I can't see how it has ever worked. It
>>>>>> seems to
>>>>>>>>>>>> try
>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>>>>> provoke
>>>>>>>>>>>>>>>>>>>>>>>>>>> an
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> error
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> via a bad ACL provider.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> But this ACL provider is called by the
>>>>>>>>>>>>>>>> CreateBuilderImpl
>>>>>>>>>>>>>>>>>>>>>>>>> prior
>>>>>>>>>>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> backgrounding call, which means that the
>>>>>>>>>>>> exception
>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>>> it
>>>>>>>>>>>>>>>>>>>>>>>>>> throws
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> happens
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> in the main Thread of the unit test. So,
>>> it
>>>>>>>> just
>>>>>>>>>>>>>> throws
>>>>>>>>>>>>>>>>>>>> an
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> UnsupportedOperationException which is
>>>>>>>>>>>> propagated up
>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>>> stack
>>>>>>>>>>>>>>>>>>>>>>>>>> at
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> which
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> point the unit test fails.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> So, I will look at fixing this, but I just
>>>>>>>> don't
>>>>>>>>>>>>>>>>>>>> understand
>>>>>>>>>>>>>>>>>>>>>>>>> how
>>>>>>>>>>>>>>>>>>>>>>>>>>> it
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ever
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> worked?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>> 
>>> 
>
