[ https://issues.apache.org/jira/browse/IGNITE-15300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17411795#comment-17411795 ]
Pavel Pereslegin edited comment on IGNITE-15300 at 9/8/21, 8:50 AM:
--------------------------------------------------------------------

The test hangs when the restore process is initiated from node 1, whose communication is later blocked (and cannot be unblocked).

The test fails intermittently due to a state sync issue. We are canceling the process on two nodes, but waiting only for the initiator to complete (this has been fixed in IGNITE-14794).

It looks like the patch proposed in IGNITE-14794 fixes this completely.

Checked it on TeamCity (the problem is hard to reproduce locally), [suite started 80+ times|https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_ControlUtilityZookeeper&tab=buildTypeHistoryList&branch_IgniteTests24Java8=pull%2F9186%2Fhead].
Execution timeouts (not related to this issue) - 2 times.
testBaselineCollectCrd - 6 failures.
testBaselineCollect - 1 failure.
testSnapshotRestoreCancelAndStatus - *0* failures.
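
To illustrate the state sync issue described above, here is a minimal sketch of the difference between waiting only for the initiator and waiting for every node. It is not Ignite code: the class {{CancelWaitSketch}} and both helper methods are hypothetical, and plain {{CompletableFuture}} objects stand in for the per-node cancel futures.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;

/** Hypothetical model of a cancel request sent to several nodes; not Ignite code. */
public class CancelWaitSketch {
    /** Racy variant: done as soon as the initiator (index 0) has finished cancelling. */
    public static CompletableFuture<Void> awaitInitiatorOnly(List<CompletableFuture<Void>> perNode) {
        return perNode.get(0);
    }

    /** Safe variant: done only when every node has finished cancelling. */
    public static CompletableFuture<Void> awaitAllNodes(List<CompletableFuture<Void>> perNode) {
        return CompletableFuture.allOf(perNode.toArray(new CompletableFuture<?>[0]));
    }

    public static void main(String[] args) {
        // One future per node; each is completed when that node finishes cancelling.
        CompletableFuture<Void> initiator = new CompletableFuture<>();
        CompletableFuture<Void> other = new CompletableFuture<>();
        List<CompletableFuture<Void>> nodes = List.of(initiator, other);

        CompletableFuture<Void> racy = awaitInitiatorOnly(nodes);
        CompletableFuture<Void> safe = awaitAllNodes(nodes);

        // The initiator finishes first; the second node is still cancelling.
        initiator.complete(null);
        System.out.println("initiator-only wait done: " + racy.isDone());
        System.out.println("all-nodes wait done: " + safe.isDone());

        // Only now has the whole cluster reached the cancelled state.
        other.complete(null);
        System.out.println("all-nodes wait done after second node: " + safe.isDone());
    }
}
```

In this model, a status check issued right after the initiator-only wait returns can still observe the second node in its old state, which is the kind of race that would make an assertion on the status output fail intermittently.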

> Test testSnapshotRestoreCancelAndStatus flaky in Zookeeper SPI environment
> --------------------------------------------------------------------------
>
> Key: IGNITE-15300
> URL: https://issues.apache.org/jira/browse/IGNITE-15300
> Project: Ignite
> Issue Type: Test
> Reporter: Maxim Muzafarov
> Assignee: Pavel Pereslegin
> Priority: Major
> Labels: iep-43
> Time Spent: 10m
> Remaining Estimate: 0h
>
> https://ci.ignite.apache.org/viewLog.html?buildId=6123288&tab=buildResultsDiv&buildTypeId=IgniteTests24Java8_ControlUtilityZookeeper#testNameId-4389213602152674112
> {code}
> [2021-08-09 22:59:49,757][ERROR][main][root] Test failed [test=GridCommandHandlerTest#testSnapshotRestoreCancelAndStatus, duration=16514]
> java.lang.AssertionError
>     at org.junit.Assert.fail(Assert.java:86)
>     at org.junit.Assert.assertTrue(Assert.java:41)
>     at org.junit.Assert.assertTrue(Assert.java:52)
>     at org.apache.ignite.testframework.GridTestUtils.assertContains(GridTestUtils.java:391)
>     at org.apache.ignite.util.GridCommandHandlerTest.testSnapshotRestoreCancelAndStatus(GridCommandHandlerTest.java:3312)
>     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>     at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>     at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>     at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>     at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>     at org.apache.ignite.testframework.junits.GridAbstractTest$7.run(GridAbstractTest.java:2432)
> {code}
> Sometimes zk suite hangs ([execution timeout|https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_ControlUtilityZookeeper&tab=buildTypeHistoryList&branch_IgniteTests24Java8=%3Cdefault%3E&state=failed]) on this test with the following stacktrace.
> {noformat}
> "rest-#15365%gridCommandHandlerTest0%" #16591 prio=5 os_prio=0 tid=0x00007f7e7842b800 nid=0x1a79 waiting on condition [0x00007f7e30416000]
>    java.lang.Thread.State: WAITING (parking)
>     at sun.misc.Unsafe.park(Native Method)
>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
>     at org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:178)
>     at org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:141)
>     at org.apache.ignite.internal.util.future.IgniteFutureImpl.get(IgniteFutureImpl.java:152)
>     at org.apache.ignite.internal.processors.cache.persistence.snapshot.SnapshotRestoreCancelTask$1.execute(SnapshotRestoreCancelTask.java:43)
>     at org.apache.ignite.internal.processors.job.GridJobWorker$2.call(GridJobWorker.java:601)
>     at org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:7270)
>     at org.apache.ignite.internal.processors.job.GridJobWorker.execute0(GridJobWorker.java:595)
>     at org.apache.ignite.internal.processors.job.GridJobWorker.body(GridJobWorker.java:522)
>     at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:125)
>     at org.apache.ignite.internal.processors.job.GridJobProcessor.processJobExecuteRequest(GridJobProcessor.java:1305)
>     at org.apache.ignite.internal.processors.task.GridTaskWorker.sendRequest(GridTaskWorker.java:1435)
>     at org.apache.ignite.internal.processors.task.GridTaskWorker.processMappedJobs(GridTaskWorker.java:665)
>     at org.apache.ignite.internal.processors.task.GridTaskWorker.body(GridTaskWorker.java:535)
>     at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:125)
>     at org.apache.ignite.internal.processors.task.GridTaskProcessor.startTask(GridTaskProcessor.java:834)
>     at org.apache.ignite.internal.processors.task.GridTaskProcessor.execute(GridTaskProcessor.java:448)
>     at org.apache.ignite.internal.processors.task.GridTaskProcessor.execute(GridTaskProcessor.java:427)
>     at org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.executeRestoreManagementTask(IgniteSnapshotManager.java:1743)
>     at org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.cancelSnapshotRestore(IgniteSnapshotManager.java:1008)
>     at org.apache.ignite.internal.visor.snapshot.VisorSnapshotRestoreTask$VisorSnapshotRestoreCancelJob.run(VisorSnapshotRestoreTask.java:93)
>     at org.apache.ignite.internal.visor.snapshot.VisorSnapshotRestoreTask$VisorSnapshotRestoreCancelJob.run(VisorSnapshotRestoreTask.java:79)
>     at org.apache.ignite.internal.visor.VisorJob.execute(VisorJob.java:69)
>     at org.apache.ignite.internal.processors.job.GridJobWorker$2.call(GridJobWorker.java:601)
>     at org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:7270)
>     at org.apache.ignite.internal.processors.job.GridJobWorker.execute0(GridJobWorker.java:595)
>     at org.apache.ignite.internal.processors.job.GridJobWorker.body(GridJobWorker.java:522)
>     at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:125)
>     at org.apache.ignite.internal.processors.job.GridJobProcessor.processJobExecuteRequest(GridJobProcessor.java:1305)
>     at org.apache.ignite.internal.processors.task.GridTaskWorker.sendRequest(GridTaskWorker.java:1435)
>     at org.apache.ignite.internal.processors.task.GridTaskWorker.processMappedJobs(GridTaskWorker.java:665)
>     at org.apache.ignite.internal.processors.task.GridTaskWorker.body(GridTaskWorker.java:535)
>     at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:125)
>     at org.apache.ignite.internal.processors.task.GridTaskProcessor.startTask(GridTaskProcessor.java:834)
>     at org.apache.ignite.internal.processors.task.GridTaskProcessor.execute(GridTaskProcessor.java:568)
>     at org.apache.ignite.internal.processors.task.GridTaskProcessor.execute(GridTaskProcessor.java:548)
>     at org.apache.ignite.internal.processors.rest.handlers.task.GridTaskCommandHandler.handleAsyncUnsafe(GridTaskCommandHandler.java:223)
>     at org.apache.ignite.internal.processors.rest.handlers.task.GridTaskCommandHandler.handleAsync(GridTaskCommandHandler.java:162)
>     at org.apache.ignite.internal.processors.rest.GridRestProcessor.handleRequest0(GridRestProcessor.java:316)
>     at org.apache.ignite.internal.processors.rest.GridRestProcessor.handleRequest(GridRestProcessor.java:302)
>     at org.apache.ignite.internal.processors.rest.GridRestProcessor.access$000(GridRestProcessor.java:107)
>     at org.apache.ignite.internal.processors.rest.GridRestProcessor$2.body(GridRestProcessor.java:188)
>     at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:125)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> {noformat}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)