[ https://issues.apache.org/jira/browse/IGNITE-9447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16602430#comment-16602430 ]
Pavel Kuznetsov edited comment on IGNITE-9447 at 9/3/18 8:06 PM:
-----------------------------------------------------------------

We decided not to use a distributed mutex and instead just look through the topology history.

was (Author: pkouznet):
We decided not to use distributed mutex, just lookup into the topology history

> Benchmarks hang intermittently due to distributed race condition.
> -----------------------------------------------------------------
>
>                 Key: IGNITE-9447
>                 URL: https://issues.apache.org/jira/browse/IGNITE-9447
>             Project: Ignite
>          Issue Type: Bug
>          Components: sql
>            Reporter: Pavel Kuznetsov
>            Assignee: Pavel Kuznetsov
>            Priority: Minor
>
> If we run more than one yardstick driver, the benchmark hangs intermittently.
> Yardstick's base driver class,
> org.apache.ignite.yardstick.IgniteAbstractBenchmark, has logic to wait for
> all the nodes in the cluster:
> {noformat}
>     final CountDownLatch nodesStartedLatch = new CountDownLatch(1);
>     ignite().events().localListen(new IgnitePredicate<Event>() {
>         @Override public boolean apply(Event gridEvt) {
>             if (nodesStarted())
>                 nodesStartedLatch.countDown();
>             return true;
>         }
>     }, EVT_NODE_JOINED);
>     if (!nodesStarted()) {
>         println(cfg, "Waiting for " + (args.nodes() - 1) + " nodes to start...");
>         nodesStartedLatch.await();
>     }
> {noformat}
> This code is executed on every driver node.
> If we want to close the local Ignite instance just after the cluster is ready
> (that is, it contains the expected number of nodes), we sometimes get a
> deadlock:
> 1) The cluster contains N-1 nodes; all of them are waiting for the Nth node.
> 2) The Nth node connects and the cluster receives the join message, but the
> waitForNodes code on the Nth node has not been executed yet.
> 3) The N-1 nodes receive this message and stop waiting.
> 4) The N-1 nodes think the cluster is ready and call ignite.close() on their
> local instances.
> 5) The Nth node starts waiting for the cluster to contain the expected number
> of nodes, but N-1 of them have already closed their instances.
> 6) The Nth node waits forever.
> We can avoid this problem if we use distributed CountDownLatch
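
The comment above settles on looking through the topology history rather than on a distributed latch. A minimal Java sketch of that idea follows. It is a toy model under stated assumptions, not Ignite code: the class TopologyHistory and its methods join, leave, currentSize and everReached are hypothetical names, and the actual change made for IGNITE-9447 is not shown in this message. It only illustrates why asking "did the cluster ever hold N nodes?" avoids the hang that "does the cluster hold N nodes now?" can cause.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model of a cluster topology history (hypothetical, not an Ignite class).
 * Each join or leave creates a new topology version and records its node count.
 */
final class TopologyHistory {
    /** Node count recorded for every topology version, oldest first. */
    private final List<Integer> sizes = new ArrayList<>();

    /** Node count of the latest topology version. */
    private int current;

    /** Records a node joining the cluster. */
    void join() {
        sizes.add(++current);
    }

    /** Records a node leaving the cluster, e.g. after closing its local instance. */
    void leave() {
        if (current == 0)
            throw new IllegalStateException("No nodes left to remove");
        sizes.add(--current);
    }

    /** Size of the current topology; this is what the racy check looks at. */
    int currentSize() {
        return current;
    }

    /** True if any recorded topology version held at least {@code expected} nodes. */
    boolean everReached(int expected) {
        for (int size : sizes) {
            if (size >= expected)
                return true;
        }
        return false;
    }
}
```

Replaying steps 1) to 6) with N = 3: after three joins and two leaves, currentSize() is 1, so a late driver that compares the current size with N would wait forever, while everReached(3) is true and lets it proceed.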