OK - I got a failure even with the line commented out. However, I found a similar line in LockInternals. I’m going to comment that out too and retest.
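
As an aside on the compareAndSet() concern raised earlier in this thread (quoted below): here is a minimal JDK-only sketch of why throwing when compareAndSet() returns true would fire on every successful, uncontended acquire. This is not the SemaphoreClient source. The class and method names are made up for illustration, and an AtomicBoolean stands in for whatever the test actually tracks.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical illustration only; not taken from SemaphoreClient.
class CompareAndSetDemo {
    // The pattern as described in the thread: throw when compareAndSet() returns true.
    // compareAndSet(false, true) returns true exactly when THIS caller flipped the flag,
    // so this variant throws on every successful, uncontended acquire.
    static void suspectAcquire(AtomicBoolean held) {
        if (held.compareAndSet(false, true)) {
            throw new IllegalStateException("Multiple acquirers");
        }
    }

    // The more usual form: throw only when the swap fails,
    // which means some other caller already holds the flag.
    static void expectedAcquire(AtomicBoolean held) {
        if (!held.compareAndSet(false, true)) {
            throw new IllegalStateException("Multiple acquirers");
        }
    }

    // Helper that reports whether an acquire attempt threw.
    static boolean throwsOn(Runnable attempt) {
        try {
            attempt.run();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        AtomicBoolean a = new AtomicBoolean(false);
        System.out.println("suspect form, first acquire throws: "
            + throwsOn(() -> suspectAcquire(a)));

        AtomicBoolean b = new AtomicBoolean(false);
        System.out.println("expected form, first acquire throws: "
            + throwsOn(() -> expectedAcquire(b)));
        System.out.println("expected form, second acquire throws: "
            + throwsOn(() -> expectedAcquire(b)));
    }
}
```

If the real code matches the first form, it would be consistent with the "Multiple acquirers" exception showing up whenever the lock is acquired, but that needs checking against the actual lines.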
> On Jun 1, 2016, at 11:55 PM, Jordan Zimmerman <[email protected]> wrote:
>
> My current testing suggests that the problem is the call to:
>
> client.removeWatchers();
>
> in InterProcessSemaphoreV2
>
> if I comment out that line your test has yet to fail for me. Maybe you can
> verify. I’ll also look at why this is causing the failure.
>
>> On Jun 1, 2016, at 10:59 PM, Cameron McKenzie <[email protected]> wrote:
>>
>> The counter is just being used to check if semaphores are still being
>> acquired. Essentially it just runs in a loop acquiring semaphores (and
>> incrementing the counter when they are acquired).
>>
>> Then it shuts down the server, waits until the session is lost, then
>> restarts the server and then checks that semaphores are being acquired
>> correctly again (by checking that the counter is being incremented).
>>
>> This is just a simplified version of the test that is failing.
>>
>> When the test fails, all of the threads are attempting to get a lease on
>> the semaphore, but none of them get it, then the test times out while
>> waiting.
>>
>> On Thu, Jun 2, 2016 at 1:29 PM, Jordan Zimmerman <[email protected]> wrote:
>>
>>> I also had to add:
>>>
>>> while(!lost.get() && (counter.get() > 0))
>>> {
>>>     Thread.sleep(1000);
>>> }
>>> Which seems more correct to me.
>>>
>>>> On Jun 1, 2016, at 9:07 PM, Cameron McKenzie <[email protected]> wrote:
>>>>
>>>> I have just pushed an interprocess_mutex_issue branch. The test case is in
>>>> TestInterprocessMutexNotReconnecting
>>>>
>>>> For me it's failing around 20% of the time.
>>>> cheers
>>>>
>>>> On Thu, Jun 2, 2016 at 11:17 AM, Cameron McKenzie <[email protected]> wrote:
>>>>
>>>>> Yep, just let me confirm that it's actually getting the same problem. I'm
>>>>> sure it was before, but I've just run it a bunch of times and everything's
>>>>> been fine.
>>>>>
>>>>> On Thu, Jun 2, 2016 at 11:15 AM, Jordan Zimmerman <[email protected]> wrote:
>>>>>
>>>>>> Can you push your unit test somewhere?
>>>>>>
>>>>>>> On Jun 1, 2016, at 7:37 PM, Cameron McKenzie <[email protected]> wrote:
>>>>>>>
>>>>>>> Indeed. There seems to be a problem with InterProcessSemaphoreV2 though.
>>>>>>> I've written a simplified unit test that just has a bunch of clients
>>>>>>> attempting to grab a lease on the semaphore. When I shut down and restart
>>>>>>> ZK, about 25% of the time none of the clients can reacquire the semaphore.
>>>>>>>
>>>>>>> Still trying to work out what's going on, but I'm probably not going to
>>>>>>> have a lot of time today to look at it.
>>>>>>> cheers
>>>>>>>
>>>>>>> On Thu, Jun 2, 2016 at 10:30 AM, Jordan Zimmerman <[email protected]> wrote:
>>>>>>>
>>>>>>>> Odd - SemaphoreClient does seem wrong.
>>>>>>>>
>>>>>>>>> On Jun 1, 2016, at 1:43 AM, Cameron McKenzie <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>> It looks like under some circumstances (which I haven't worked out yet)
>>>>>>>>> that the InterprocessMutex acquire() is not working correctly when
>>>>>>>>> reconnecting to ZK. Still digging into why this is.
>>>>>>>>>
>>>>>>>>> There also seems to be a bug in the SemaphoreClient, unless I'm missing
>>>>>>>>> something. At lines 126 and 140 it does compareAndSet() calls but throws
>>>>>>>>> an exception if they return true. As far as I can work out, this means
>>>>>>>>> that whenever the lock is acquired, an exception gets thrown indicating
>>>>>>>>> that there are Multiple acquirers.
>>>>>>>>>
>>>>>>>>> This test is failing fairly consistently. It seems to be the remaining
>>>>>>>>> test that keeps failing in the Jenkins build also
>>>>>>>>> cheers
>>>>>>>>>
>>>>>>>>> On Wed, Jun 1, 2016 at 3:10 PM, Cameron McKenzie <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Looks like I was incorrect. The NoWatcherException is being thrown on
>>>>>>>>>> success as well, and the problem is not in the cluster restart. Will
>>>>>>>>>> keep digging.
>>>>>>>>>>
>>>>>>>>>> On Wed, Jun 1, 2016 at 2:52 PM, Cameron McKenzie <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> TestInterProcessSemaphoreCluster.testCluster() is failing (assertion at
>>>>>>>>>>> line 294). Again, it seems like some sort of race condition with the
>>>>>>>>>>> watcher removal.
>>>>>>>>>>>
>>>>>>>>>>> When I run it in Eclipse, it fails maybe 25% of the time. When it fails
>>>>>>>>>>> it seems that it's got something to do with watcher removal. When the
>>>>>>>>>>> test passes, this error is not logged.
>>>>>>>>>>>
>>>>>>>>>>> org.apache.zookeeper.KeeperException$NoWatcherException: KeeperErrorCode
>>>>>>>>>>> = No such watcher for /foo/bar/lock/leases
>>>>>>>>>>> at org.apache.zookeeper.ZooKeeper$ZKWatchManager.containsWatcher(ZooKeeper.java:377)
>>>>>>>>>>> at org.apache.zookeeper.ZooKeeper$ZKWatchManager.removeWatcher(ZooKeeper.java:252)
>>>>>>>>>>> at org.apache.zookeeper.WatchDeregistration.unregister(WatchDeregistration.java:58)
>>>>>>>>>>> at org.apache.zookeeper.ClientCnxn.finishPacket(ClientCnxn.java:712)
>>>>>>>>>>> at org.apache.zookeeper.ClientCnxn.access$1500(ClientCnxn.java:97)
>>>>>>>>>>> at org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:948)
>>>>>>>>>>> at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:99)
>>>>>>>>>>> at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
>>>>>>>>>>> at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1236)
>>>>>>>>>>>
>>>>>>>>>>> Is it possible it's something to do with the way that the cluster is
>>>>>>>>>>> restarted at line 282? The old cluster is not shut down, a new one is
>>>>>>>>>>> just created.
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Jun 1, 2016 at 10:44 AM, Jordan Zimmerman <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I’ll try to address this as part of CURATOR-333
>>>>>>>>>>>>
>>>>>>>>>>>>> On May 31, 2016, at 7:08 PM, Cameron McKenzie <[email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Maybe we need to look at some way of providing a hook for tests to wait
>>>>>>>>>>>>> reliably for asynch tasks to finish?
>>>>>>>>>>>>>
>>>>>>>>>>>>> The latest round of tests ran OK. One test failed on an unrelated thing
>>>>>>>>>>>>> (ConnectionLoss), but this appears to be a transient thing as it's
>>>>>>>>>>>>> worked ok the next time around.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I will start getting a release together. Thanks for your help with the
>>>>>>>>>>>>> updated tests.
>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Jun 1, 2016 at 9:12 AM, Jordan Zimmerman <[email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> The problem is in-flight watchers and async background calls. There’s
>>>>>>>>>>>>>> no way to cancel these and they can take time to occur - even after a
>>>>>>>>>>>>>> recipe instance is closed.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -Jordan
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On May 31, 2016, at 5:11 PM, Cameron McKenzie <[email protected]> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Ok, running it again now.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Is the problem that the watcher clean up for the recipes is done
>>>>>>>>>>>>>>> asynchronously after they are closed?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Wed, Jun 1, 2016 at 1:35 AM, Jordan Zimmerman <[email protected]> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> OK - please try now. I added a loop in the “no watchers” checker. If
>>>>>>>>>>>>>>>> there are remaining watchers, it will sleep a bit and try again.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> -Jordan
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On May 31, 2016, at 1:33 AM, Cameron McKenzie <[email protected]> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Looks like these failures are intermittent. Running them directly in
>>>>>>>>>>>>>>>>> Eclipse they seem to be passing. I will run the whole thing again in
>>>>>>>>>>>>>>>>> the morning and see how it goes.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Tue, May 31, 2016 at 2:29 PM, Cameron McKenzie <[email protected]> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> There are still 2 tests failing for me:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> FAILURE! - in org.apache.curator.framework.recipes.cache.TestPathChildrenCache
>>>>>>>>>>>>>>>>>> testKilledSession(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>>>>>>>>>>>>>>>>>> Time elapsed: 17.488 sec <<< FAILURE!
>>>>>>>>>>>>>>>>>> java.lang.AssertionError: One or more child watchers are still registered:
>>>>>>>>>>>>>>>>>> [/test]
>>>>>>>>>>>>>>>>>> at org.apache.curator.framework.imps.TestCleanState.closeAndTestClean(TestCleanState.java:53)
>>>>>>>>>>>>>>>>>> at org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testKilledSession(TestPathChildrenCache.java:707)
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> FAILURE! - in org.apache.curator.framework.recipes.locks.TestInterProcessSemaphoreCluster
>>>>>>>>>>>>>>>>>> testCluster(org.apache.curator.framework.recipes.locks.TestInterProcessSemaphoreCluster)
>>>>>>>>>>>>>>>>>> Time elapsed: 96.641 sec <<< FAILURE!
>>>>>>>>>>>>>>>>>> java.lang.AssertionError: expected [true] but found [false]
>>>>>>>>>>>>>>>>>> at org.testng.Assert.fail(Assert.java:94)
>>>>>>>>>>>>>>>>>> at org.testng.Assert.failNotEquals(Assert.java:494)
>>>>>>>>>>>>>>>>>> at org.testng.Assert.assertTrue(Assert.java:42)
>>>>>>>>>>>>>>>>>> at org.testng.Assert.assertTrue(Assert.java:52)
>>>>>>>>>>>>>>>>>> at org.apache.curator.framework.recipes.locks.TestInterProcessSemaphoreCluster.testCluster(TestInterProcessSemaphoreCluster.java:294)
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Failed tests:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testKilledSession(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>>>>>>>>>>>>>>>>>> Run 1: TestPathChildrenCache.testKilledSession:707 One or more child
>>>>>>>>>>>>>>>>>> watchers are still registered: [/test]
>>>>>>>>>>>>>>>>>> Run 2: PASS
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> TestInterProcessSemaphoreCluster.testCluster:294 expected [true] but
>>>>>>>>>>>>>>>>>> found [false]
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Tests run: 495, Failures: 2, Errors: 0, Skipped: 0
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Tue, May 31, 2016 at 12:52 PM, Cameron McKenzie <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks, CURATOR-332 wasn't pushed. I will run the tests against that,
>>>>>>>>>>>>>>>>>>> and if it's all good will merge into CURATOR-3.0
>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Tue, May 31, 2016 at 12:32 PM, Jordan Zimmerman <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Actually - I don’t remember if branch CURATOR-332 is merged yet. I
>>>>>>>>>>>>>>>>>>>> made/pushed my changes in CURATOR-332
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> -jordan
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On May 26, 2016, at 10:04 PM, Cameron McKenzie <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I'm still seeing 6 failed tests that seem related to the same stuff
>>>>>>>>>>>>>>>>>>>>> after merging your fix:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Failed tests:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testBasics(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>>>>>>>>>>>>>>>>>>>>> Run 1: TestPathChildrenCache.testBasics:863 One or more child watchers
>>>>>>>>>>>>>>>>>>>>> are still registered: [/test]
>>>>>>>>>>>>>>>>>>>>> Run 2: TestPathChildrenCache.testBasics:863 One or more child watchers
>>>>>>>>>>>>>>>>>>>>> are still registered: [/test]
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testBasicsOnTwoCachesWithSameExecutor(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>>>>>>>>>>>>>>>>>>>>> Run 1: TestPathChildrenCache.testBasicsOnTwoCachesWithSameExecutor:934
>>>>>>>>>>>>>>>>>>>>> One or more child watchers are still registered: [/test]
>>>>>>>>>>>>>>>>>>>>> Run 2: TestPathChildrenCache.testBasicsOnTwoCachesWithSameExecutor:934
>>>>>>>>>>>>>>>>>>>>> One or more child watchers are still registered: [/test]
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testEnsurePath(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>>>>>>>>>>>>>>>>>>>>> Run 1: TestPathChildrenCache.testEnsurePath:363 One or more child
>>>>>>>>>>>>>>>>>>>>> watchers are still registered: [/one/two/three]
>>>>>>>>>>>>>>>>>>>>> Run 2: TestPathChildrenCache.testEnsurePath:363 One or more child
>>>>>>>>>>>>>>>>>>>>> watchers are still registered: [/one/two/three]
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> TestInterProcessSemaphoreCluster.testCluster:294 expected [true] but
>>>>>>>>>>>>>>>>>>>>> found [false]
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> org.apache.curator.framework.recipes.shared.TestSharedCount.testMultiClientVersioned(org.apache.curator.framework.recipes.shared.TestSharedCount)
>>>>>>>>>>>>>>>>>>>>> Run 1: PASS
>>>>>>>>>>>>>>>>>>>>> Run 2: TestSharedCount.testMultiClientVersioned:256 One or more data
>>>>>>>>>>>>>>>>>>>>> watchers are still registered: [/count]
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> org.apache.curator.framework.recipes.shared.TestSharedCount.testSimple(org.apache.curator.framework.recipes.shared.TestSharedCount)
>>>>>>>>>>>>>>>>>>>>> Run 1: TestSharedCount.testSimple:174 One or more data watchers are
>>>>>>>>>>>>>>>>>>>>> still registered: [/count]
>>>>>>>>>>>>>>>>>>>>> Run 2: TestSharedCount.testSimple:174 One or more data watchers are
>>>>>>>>>>>>>>>>>>>>> still registered: [/count]
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Tests run: 491, Failures: 6, Errors: 0, Skipped: 0
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Fri, May 27, 2016 at 3:30 AM, Jordan Zimmerman <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> I see the problem. The fix is not simple though so I’ll spend some
>>>>>>>>>>>>>>>>>>>>>> time on it. The TL;DR is that exists watchers are still supposed to
>>>>>>>>>>>>>>>>>>>>>> get set when there is a KeeperException.NoNode and the code isn’t
>>>>>>>>>>>>>>>>>>>>>> handling it. But, while I was looking at the code I realized there
>>>>>>>>>>>>>>>>>>>>>> are some significant additional problems. Curator, here, is trying
>>>>>>>>>>>>>>>>>>>>>> to mirror what ZooKeeper does internally which is insanely
>>>>>>>>>>>>>>>>>>>>>> complicated. In hindsight, the whole ZK watcher mechanism should’ve
>>>>>>>>>>>>>>>>>>>>>> been decoupled from the mutator APIs. But, of course, that’s easy
>>>>>>>>>>>>>>>>>>>>>> for me to say now.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> -Jordan
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On May 26, 2016, at 1:10 AM, Cameron McKenzie <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Thanks Scott,
>>>>>>>>>>>>>>>>>>>>>>> Those tests are now passing for me.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Jordan, testNodeCache:testBasics() is failing consistently on the 3.0
>>>>>>>>>>>>>>>>>>>>>>> branch. It appears that this is actually potentially a bug in the
>>>>>>>>>>>>>>>>>>>>>>> NodeCache. It ends up leaking a Watcher reference. I've had a quick
>>>>>>>>>>>>>>>>>>>>>>> look through, but I haven't dived in in any detail. It's the
>>>>>>>>>>>>>>>>>>>>>>> WatcherRemovalManager stuff I think. If you've got time, can you
>>>>>>>>>>>>>>>>>>>>>>> have a look? If not, let me know and I'll do some more digging.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On Thu, May 26, 2016 at 11:47 AM, Cameron McKenzie <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Thanks Scott.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Push the fix to master and merge it into 3.0.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Then I guess, I'll push new versions of 2.11 and 3.2 onto Nexus.
>>>>>>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> On Thu, May 26, 2016 at 11:44 AM, Scott Blum <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Alright, I have a fix, but it wants to be applied to both master
>>>>>>>>>>>>>>>>>>>>>>>>> and 3.0. Where should I push the fix?
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 6:10 PM, Cameron McKenzie <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks Scott,
>>>>>>>>>>>>>>>>>>>>>>>>>> If you just checkout the CURATOR-3.0 branch, they are failing
>>>>>>>>>>>>>>>>>>>>>>>>>> there.
>>>>>>>>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, May 26, 2016 at 2:06 AM, Scott Blum <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Sure, what SHA are they failing at Cam?
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 9:36 AM, Jordan Zimmerman <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Scott can you take a look?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> -Jordan
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On May 25, 2016, at 4:35 AM, Cameron McKenzie <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Tree cache tests are still failing. I've tried a few times but no love:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> TestTreeCacheEventOrdering>TestEventOrdering.testEventOrdering:151 actual 6
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> expected -31:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I will have a look into what's going on in the morning. Given that
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> these may take some messing about to fix up, do we just want to vote
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> on 2.11.0 separately, as that is all ready to go?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 5:34 PM, Jordan Zimmerman <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Great news. Thanks.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ====================
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Jordan Zimmerman
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On May 25, 2016, at 12:37 AM, Cameron McKenzie <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I have fixed up the test case, all good now.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 1:45 PM, Cameron McKenzie <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Looks like it was introduced with the schema validation stuff. It
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> now does an ACL check prior to the backgrounding call. Because the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> unit test uses a bogus ACL provider it just throws an exception
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> final String adjustedPath =
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     adjustPath(client.fixForNamespace(givenPath, createMode.isSequential()));
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> List<ACL> aclList = acling.getAclList(adjustedPath);
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> client.getSchemaSet().getSchema(givenPath).validateCreate(createMode, data, aclList);
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> String returnPath = null;
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> if ( backgrounding.inBackground() )
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     pathInBackground(adjustedPath, data, givenPath);
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> So, I guess the answer is to get the test to force a failure in a
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> different way. With the UnhandledErrorListener, the expectation is
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> that this only gets called on backgrounding operations?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 1:39 PM, Cameron McKenzie <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Same on the master branch, but it passes there, so maybe something
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> has legitimately broken the test. Will let you know if I get stuck.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 1:36 PM, Jordan Zimmerman <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Let me know if you need help.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> It might be a bad merge. Have you compared it to the master branch?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> -JZ
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On May 24, 2016, at 10:31 PM, Cameron McKenzie <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Guys,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> There's a test TestFrameworkBackground:testErrorListener that is
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> failing consistently on the CURATOR-3.0 branch.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I can't see how it has ever worked. It seems to try and provoke an
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> error via a bad ACL provider.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> But this ACL provider is called by the CreateBuilderImpl prior to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the backgrounding call, which means that the exception that it
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> throws happens in the main Thread of the unit test. So, it just
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> throws an UnsupportedOperationException which is propagated up the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> stack at which point the unit test fails.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> So, I will look at fixing this, but I just don't understand how it
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ever worked?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cheers
