[ https://issues.apache.org/jira/browse/IGNITE-9447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16663640#comment-16663640 ]
ASF GitHub Bot commented on IGNITE-9447:
----------------------------------------

Github user asfgit closed the pull request at:

    https://github.com/apache/ignite/pull/4879

> Benchmarks hang intermittently due to distributed race condition.
> -----------------------------------------------------------------
>
>                 Key: IGNITE-9447
>                 URL: https://issues.apache.org/jira/browse/IGNITE-9447
>             Project: Ignite
>          Issue Type: Bug
>          Components: yardstick
>            Reporter: Pavel Kuznetsov
>            Assignee: Pavel Kuznetsov
>            Priority: Minor
>             Fix For: 2.8
>
>
> If we run more than one yardstick driver, the benchmark hangs intermittently.
> Yardstick's base driver class, org.apache.ignite.yardstick.IgniteAbstractBenchmark,
> has logic to wait for all the nodes in the cluster:
> {noformat}
> final CountDownLatch nodesStartedLatch = new CountDownLatch(1);
>
> ignite().events().localListen(new IgnitePredicate<Event>() {
>     @Override public boolean apply(Event gridEvt) {
>         if (nodesStarted())
>             nodesStartedLatch.countDown();
>
>         return true;
>     }
> }, EVT_NODE_JOINED);
>
> if (!nodesStarted()) {
>     println(cfg, "Waiting for " + (args.nodes() - 1) + " nodes to start...");
>
>     nodesStartedLatch.await();
> }
> {noformat}
> This code is executed on every driver node.
> If we close the local Ignite instance right after the cluster is ready
> (that is, contains the expected number of nodes), we sometimes get a deadlock:
> 1) The cluster contains N-1 nodes; all of them are waiting for the Nth node.
> 2) The Nth node connects and the cluster receives the join message, but the
> waitForNodes code on the Nth node has not executed yet.
> 3) The N-1 nodes receive this message and stop waiting.
> 4) The N-1 nodes consider the cluster ready and call ignite.close() on their
> local instances.
> 5) The Nth node starts waiting for the cluster to contain the expected number
> of nodes, but the other N-1 nodes have already closed their instances.
> 6) The Nth node waits forever.
> We can avoid this problem if we use a distributed CountDownLatch.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
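
For illustration only, the barrier idea from the issue description can be sketched with plain JDK classes. This is not the code from the pull request: the class name LatchBarrierSketch, the method runDrivers, and the use of threads as stand-ins for driver nodes are all assumptions made for this sketch, and a local java.util.concurrent.CountDownLatch stands in for the cluster-wide latch (in Ignite that role would presumably be played by IgniteCountDownLatch). Each driver counts down once when it reaches the wait code and closes its instance only after every driver has done the same, so a late driver is never left waiting on nodes that have already left.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class LatchBarrierSketch {
    /**
     * Simulates the given number of driver nodes (hypothetical helper).
     * Returns how many drivers passed the barrier before the timeout.
     */
    public static int runDrivers(int drivers) throws InterruptedException {
        // Stand-in for a latch shared by the whole cluster, initialised to
        // the number of drivers instead of 1.
        CountDownLatch allDriversReady = new CountDownLatch(drivers);
        AtomicInteger passed = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(drivers);

        for (int i = 0; i < drivers; i++) {
            // Later drivers reach the wait code later, as the Nth node does.
            final long startupDelayMs = 20L * i;

            pool.submit(() -> {
                try {
                    Thread.sleep(startupDelayMs);

                    // Announce "I have reached the wait code" ...
                    allDriversReady.countDown();

                    // ... and wait until every driver has announced it.
                    if (allDriversReady.await(5, TimeUnit.SECONDS))
                        passed.incrementAndGet();

                    // Only at this point would closing the local instance be safe.
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);

        return passed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runDrivers(4) + " drivers passed the barrier");
    }
}
```

The difference from the quoted snippet is that readiness is acknowledged by each driver explicitly rather than inferred from the topology size, so no driver proceeds to close while another has yet to enter the wait.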