[ https://issues.apache.org/jira/browse/CURATOR-595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330182#comment-17330182 ]
Jordan Zimmerman commented on CURATOR-595:
------------------------------------------

I think the {{acquire(0, ...)}} is the problem. If I get rid of the sleep before acquire and change the acquire to:

{{assertNotNull(semaphore.acquire(sessionTimout * 2, TimeUnit.SECONDS));}}

it works every time.

> InterProcessSemaphoreV2 LOST isn't releasing permits for other clients
> ----------------------------------------------------------------------
>
>                 Key: CURATOR-595
>                 URL: https://issues.apache.org/jira/browse/CURATOR-595
>             Project: Apache Curator
>          Issue Type: Bug
>          Components: Recipes
>    Affects Versions: 5.1.0
>            Reporter: Francesco Nigro
>            Assignee: Jordan Zimmerman
>            Priority: Major
>
> I'm not sure this is the right place to raise this, but I've added this test
> to TestInterProcessSemaphore:
> {code:java}
>     @Test
>     public void testAcquireAfterLostServerOnRestart() throws Exception {
>         final int sessionTimout = 4000;
>         final int connectionTimout = 2000;
>         try (CuratorFramework client = CuratorFrameworkFactory.newClient(server.getConnectString(), sessionTimout, connectionTimout, new RetryNTimes(0, 1))) {
>             client.start();
>             client.blockUntilConnected();
>             final InterProcessSemaphoreV2 semaphore = new InterProcessSemaphoreV2(client, "/1", 1);
>             assertNotNull(semaphore.acquire());
>             CountDownLatch lost = new CountDownLatch(1);
>             client.getConnectionStateListenable().addListener((client1, newState) -> {
>                 if (newState == ConnectionState.LOST) {
>                     lost.countDown();
>                 }
>             });
>             server.stop();
>             lost.await();
>         }
>         server.restart();
>         try (CuratorFramework client = CuratorFrameworkFactory.newClient(server.getConnectString(), sessionTimout, connectionTimout, new RetryNTimes(0, 1))) {
>             client.start();
>             client.blockUntilConnected();
>             final InterProcessSemaphoreV2 semaphore = new InterProcessSemaphoreV2(client, "/1", 1);
>             final int serverTick = ZooKeeperServer.DEFAULT_TICK_TIME;
>             Thread.sleep(sessionTimout + serverTick);
>             assertNotNull(semaphore.acquire(0, TimeUnit.SECONDS));
>         }
>     }
> {code}
> And this is not passing: the doc of InterProcessSemaphoreV2 states that
> bq. "However, if the client session drops (crash, etc.), any leases held by
> the client are automatically closed and made available to other clients."
> Maybe I'm missing something obvious in the ZK server config instead.
> I just checked this by running the same test in separate processes:
> # start server on process A
> # start lease acquire on process B, listening for LOST events before suicide
> # restart server on process A, causing process B to suicide (as expected)
> # start lease acquire on process C, which now succeeds
> It seems that there is something going on in the intra-process case that's
> not working as expected (to me, at least).
> NOTE: as written in newer comments, raising the timeout doesn't seem to work
> either, and different boxes are getting different outcomes (making this an
> intermittent failure).

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
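
To illustrate why a zero timeout might be the culprit, here is a minimal sketch that uses the JDK's {{java.util.concurrent.Semaphore}} as a stand-in. It is only an analogy, not Curator code and not how {{InterProcessSemaphoreV2}} is implemented: the class name {{ZeroTimeoutDemo}}, the helper method and the delays are hypothetical, and the delayed {{release()}} merely plays the role of a stale lease that goes away some time after the new client connects.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class ZeroTimeoutDemo {

    /**
     * Hypothetical helper: the single permit starts out held, and a background
     * thread frees it after releaseDelayMs (standing in for a stale lease that
     * is cleaned up a little later). The caller waits at most waitMs for it.
     */
    public static boolean acquireWhileHolderReleasesLater(long waitMs, long releaseDelayMs)
            throws InterruptedException {
        Semaphore permits = new Semaphore(0); // 0 available: the "old session" still holds it
        Thread holder = new Thread(() -> {
            try {
                Thread.sleep(releaseDelayMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            permits.release(); // the stale lease finally goes away
        });
        holder.start();
        try {
            // With waitMs == 0 this returns immediately instead of waiting for the release.
            return permits.tryAcquire(waitMs, TimeUnit.MILLISECONDS);
        } finally {
            holder.join();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("zero timeout: " + acquireWhileHolderReleasesLater(0, 200));
        System.out.println("bounded wait: " + acquireWhileHolderReleasesLater(2000, 200));
    }
}
```

In this sketch the zero-timeout attempt gives up before the permit is freed, while the bounded wait picks it up once it is released. Whether the same timing explains the intermittent failure reported in this issue is an assumption, not something the sketch demonstrates.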