Hi dev1,

On Tue, 30 Nov 2021 at 16:21, dev1 <d...@etcoleman.com> wrote:

> Some of those tests are trying to stress conditions that require a lot of
> resources to replicate specific conditions. Have you tried to run those
> individual tests in isolation so that you are not competing for resources?
> Do they always fail, or are the failures transient?
>

Q: Have you tried to run those individual tests in isolation so that you are not competing for resources?
A: This is what I meant by the following:
---------------------
The tests fail even when executed separately, e.g.:
mvn verify -Dit.test=ConcurrentDeleteTableIT -o -rf :accumulo-test
---------------------

Q: Do they always fail, or are the failures transient?
A: I also tried to explain that with "These tests fail consistently at every build attempt!"

Mark

>
> -----Original Message-----
> From: Mark Jens <mark.r.j...@gmail.com>
> Sent: Tuesday, November 30, 2021 4:05 AM
> To: dev@accumulo.apache.org
> Subject: Consistent IT tests failures on Linux ARM64
>
> Hello Accumulo community,
>
> At my job we are considering using Linux ARM64 servers and I've been
> tasked with testing Accumulo.
>
> I face some timeout-related issues with several IT tests:
>
>
> [ERROR] org.apache.accumulo.test.functional.ConcurrentDeleteTableIT.testConcurrentFateOpsWithDelete Time elapsed: 420.122 s <<< ERROR!
> org.junit.runners.model.TestTimedOutException: test timed out after 420 seconds
>     at java.base@11.0.11/jdk.internal.misc.Unsafe.park(Native Method)
>     at java.base@11.0.11/java.util.concurrent.locks.LockSupport.park(LockSupport.java:194)
>     at java.base@11.0.11/java.util.concurrent.FutureTask.awaitDone(FutureTask.java:447)
>     at java.base@11.0.11/java.util.concurrent.FutureTask.get(FutureTask.java:190)
>     at app//org.apache.accumulo.test.functional.ConcurrentDeleteTableIT.testConcurrentFateOpsWithDelete(ConcurrentDeleteTableIT.java:213)
>     at java.base@11.0.11/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at java.base@11.0.11/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at java.base@11.0.11/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.base@11.0.11/java.lang.reflect.Method.invoke(Method.java:566)
>     at app//org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>     at app//org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>     at app//org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>     at app//org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>     at app//org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>     at app//org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>     at app//org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
>     at app//org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
>     at java.base@11.0.11/java.util.concurrent.FutureTask.run(FutureTask.java:264)
>     at java.base@11.0.11/java.lang.Thread.run(Thread.java:829)
>
> [ERROR]
> org.apache.accumulo.test.functional.ConcurrentDeleteTableIT.testConcurrentFateOpsWithDelete Time elapsed: 420.122 s <<< ERROR!
> java.lang.Exception: Appears to be stuck in thread Time-limited test-SendThread(localhost:44251)
>     at java.base@11.0.11/sun.nio.ch.EPoll.wait(Native Method)
>     at java.base@11.0.11/sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:120)
>     at java.base@11.0.11/sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:124)
>     at java.base@11.0.11/sun.nio.ch.SelectorImpl.select(SelectorImpl.java:136)
>     at app//org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:347)
>     at app//org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1223)
>
> [ERROR] org.apache.accumulo.test.functional.ConcurrentDeleteTableIT.testConcurrentDeleteTablesOps Time elapsed: 420.011 s <<< ERROR!
> org.junit.runners.model.TestTimedOutException: test timed out after 420 seconds
>     at java.base@11.0.11/java.lang.Thread.sleep(Native Method)
>     at app//org.apache.accumulo.fate.zookeeper.ZooCache$ZooRunnable.retry(ZooCache.java:299)
>     at app//org.apache.accumulo.fate.zookeeper.ZooCache.get(ZooCache.java:442)
>     at app//org.apache.accumulo.fate.zookeeper.ZooCache.get(ZooCache.java:372)
>     at app//org.apache.accumulo.core.clientImpl.ClientContext.verifyInstanceId(ClientContext.java:467)
>     at app//org.apache.accumulo.core.clientImpl.ClientContext.getInstanceID(ClientContext.java:446)
>     at app//org.apache.accumulo.core.clientImpl.ClientContext.getManagerLocations(ClientContext.java:405)
>     at app//org.apache.accumulo.core.clientImpl.ManagerClient.getConnection(ManagerClient.java:59)
>     at app//org.apache.accumulo.core.clientImpl.ManagerClient.getConnectionWithRetry(ManagerClient.java:49)
>     at app//org.apache.accumulo.core.clientImpl.TableOperationsImpl.beginFateOperation(TableOperationsImpl.java:260)
>     at app//org.apache.accumulo.core.clientImpl.TableOperationsImpl.doFateOperation(TableOperationsImpl.java:369)
>     at app//org.apache.accumulo.core.clientImpl.TableOperationsImpl.doFateOperation(TableOperationsImpl.java:359)
>     at app//org.apache.accumulo.core.clientImpl.TableOperationsImpl.doTableFateOperation(TableOperationsImpl.java:1670)
>     at app//org.apache.accumulo.core.clientImpl.TableOperationsImpl.create(TableOperationsImpl.java:248)
>     at app//org.apache.accumulo.test.functional.ConcurrentDeleteTableIT.testConcurrentDeleteTablesOps(ConcurrentDeleteTableIT.java:76)
>     at java.base@11.0.11/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at java.base@11.0.11/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at java.base@11.0.11/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.base@11.0.11/java.lang.reflect.Method.invoke(Method.java:566)
>     at app//org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>     at app//org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>     at app//org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>     at app//org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>     at app//org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>     at app//org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>     at app//org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
>     at app//org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
>     at java.base@11.0.11/java.util.concurrent.FutureTask.run(FutureTask.java:264)
>     at java.base@11.0.11/java.lang.Thread.run(Thread.java:829)
>
> [INFO] Running
> org.apache.accumulo.test.functional.ScannerContextIT
> [INFO] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 102.909 s - in org.apache.accumulo.test.functional.ScannerContextIT
> [INFO] Running org.apache.accumulo.test.functional.KerberosRenewalIT
> [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 504.472 s - in org.apache.accumulo.test.functional.KerberosRenewalIT
> [INFO] Running org.apache.accumulo.test.functional.BatchWriterFlushIT
> [INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 62.132 s - in org.apache.accumulo.test.functional.BatchWriterFlushIT
> [INFO] Running org.apache.accumulo.test.functional.BinaryIT
> [INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 65.034 s - in org.apache.accumulo.test.functional.BinaryIT
> [INFO] Running org.apache.accumulo.test.functional.PermissionsIT
> [INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 59.25 s - in org.apache.accumulo.test.functional.PermissionsIT
> [INFO] Running org.apache.accumulo.test.functional.ZookeeperRestartIT
> [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 37.37 s - in org.apache.accumulo.test.functional.ZookeeperRestartIT
> [INFO] Running org.apache.accumulo.test.functional.CreateManyScannersIT
> [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 23.046 s - in org.apache.accumulo.test.functional.CreateManyScannersIT
> [INFO] Running org.apache.accumulo.test.functional.CreateInitialSplitsIT
> [INFO] Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 255.108 s - in org.apache.accumulo.test.functional.CreateInitialSplitsIT
> [INFO] Running org.apache.accumulo.test.functional.MonitorSslIT
> [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 25.304 s - in org.apache.accumulo.test.functional.MonitorSslIT
> [INFO] Running org.apache.accumulo.test.functional.RestartStressIT
> [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 78.359 s - in org.apache.accumulo.test.functional.RestartStressIT
> [INFO] Running org.apache.accumulo.test.functional.BulkSplitOptimizationIT
> [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 59.289 s - in org.apache.accumulo.test.functional.BulkSplitOptimizationIT
> [INFO] Running org.apache.accumulo.test.functional.BulkNewIT
> [INFO] Tests run: 14, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 63.696 s - in org.apache.accumulo.test.functional.BulkNewIT
> [INFO] Running org.apache.accumulo.test.functional.BloomFilterIT
> [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 135.298 s - in org.apache.accumulo.test.functional.BloomFilterIT
> [INFO] Running org.apache.accumulo.test.functional.BulkIT
> [INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 122.959 s - in org.apache.accumulo.test.functional.BulkIT
> [INFO] Running org.apache.accumulo.test.functional.BinaryStressIT
> [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 38.626 s - in org.apache.accumulo.test.functional.BinaryStressIT
> [INFO] Running org.apache.accumulo.test.functional.ClassLoaderIT
> [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 45.61 s - in org.apache.accumulo.test.functional.ClassLoaderIT
> [INFO] Running org.apache.accumulo.test.functional.LogicalTimeIT
> [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 116.819 s - in org.apache.accumulo.test.functional.LogicalTimeIT
> [INFO] Running org.apache.accumulo.test.functional.SplitRecoveryIT
> [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 25.421 s - in org.apache.accumulo.test.functional.SplitRecoveryIT
> [INFO] Running org.apache.accumulo.test.functional.BigRootTabletIT
> [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 96.86 s - in org.apache.accumulo.test.functional.BigRootTabletIT
> [INFO] Running
> org.apache.accumulo.test.functional.GarbageCollectorIT
> [INFO] Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 238.409 s - in org.apache.accumulo.test.functional.GarbageCollectorIT
> [INFO] Running org.apache.accumulo.test.functional.BalanceInPresenceOfOfflineTableIT
> [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 219.253 s - in org.apache.accumulo.test.functional.BalanceInPresenceOfOfflineTableIT
> [INFO] Running org.apache.accumulo.test.functional.VisibilityIT
> [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 38.015 s - in org.apache.accumulo.test.functional.VisibilityIT
> [INFO] Running org.apache.accumulo.test.functional.SslWithClientAuthIT
> [INFO] Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 489.863 s - in org.apache.accumulo.test.functional.SslWithClientAuthIT
> [INFO] Running org.apache.accumulo.test.functional.SummaryIT
> [INFO] Tests run: 10, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 111.552 s - in org.apache.accumulo.test.functional.SummaryIT
> [INFO] Running org.apache.accumulo.test.functional.MaxOpenIT
> [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 30.061 s - in org.apache.accumulo.test.functional.MaxOpenIT
> [INFO] Running org.apache.accumulo.test.functional.ManagerFailoverIT
> [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 47.089 s - in org.apache.accumulo.test.functional.ManagerFailoverIT
> [INFO] Running org.apache.accumulo.test.functional.DeleteRowsIT
> [INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 229.586 s - in org.apache.accumulo.test.functional.DeleteRowsIT
> [INFO] Running org.apache.accumulo.test.functional.BackupManagerIT
> [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 22.943 s - in org.apache.accumulo.test.functional.BackupManagerIT
> [INFO] Running org.apache.accumulo.test.functional.TabletMetadataIT
> [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 46.728 s - in org.apache.accumulo.test.functional.TabletMetadataIT
> [INFO] Running org.apache.accumulo.test.functional.LateLastContactIT
> [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 46.648 s - in org.apache.accumulo.test.functional.LateLastContactIT
> [INFO] Running org.apache.accumulo.test.functional.SimpleBalancerFairnessIT
> [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 71.934 s - in org.apache.accumulo.test.functional.SimpleBalancerFairnessIT
> [INFO] Running org.apache.accumulo.test.functional.HalfDeadTServerIT
> [ERROR] Tests run: 3, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 307.904 s <<< FAILURE! - in org.apache.accumulo.test.functional.HalfDeadTServerIT
> [ERROR] org.apache.accumulo.test.functional.HalfDeadTServerIT.testRecover Time elapsed: 240.011 s <<< ERROR!
> org.junit.runners.model.TestTimedOutException: test timed out after 240 seconds
>     at java.base@11.0.11/java.lang.Object.wait(Native Method)
>     at java.base@11.0.11/java.lang.Object.wait(Object.java:328)
>     at java.base@11.0.11/java.lang.ProcessImpl.waitFor(ProcessImpl.java:495)
>     at app//org.apache.accumulo.test.functional.HalfDeadTServerIT.test(HalfDeadTServerIT.java:217)
>     at app//org.apache.accumulo.test.functional.HalfDeadTServerIT.testRecover(HalfDeadTServerIT.java:142)
>     at java.base@11.0.11/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at java.base@11.0.11/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at java.base@11.0.11/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.base@11.0.11/java.lang.reflect.Method.invoke(Method.java:566)
>     at app//org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>     at app//org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>     at app//org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>     at app//org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>     at app//org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>     at app//org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>     at app//org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
>     at app//org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
>     at java.base@11.0.11/java.util.concurrent.FutureTask.run(FutureTask.java:264)
>     at java.base@11.0.11/java.lang.Thread.run(Thread.java:829)
>
> [ERROR] org.apache.accumulo.test.functional.HalfDeadTServerIT.testRecover Time elapsed: 240.012 s <<< ERROR!
> java.lang.Exception: Appears to be stuck in thread Time-limited test-SendThread(localhost:39285)
>     at java.base@11.0.11/sun.nio.ch.EPoll.wait(Native Method)
>     at java.base@11.0.11/sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:120)
>     at java.base@11.0.11/sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:124)
>     at java.base@11.0.11/sun.nio.ch.SelectorImpl.select(SelectorImpl.java:136)
>     at app//org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:347)
>     at app//org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1223)
>
> [INFO] Running org.apache.accumulo.test.functional.MetadataIT
> [INFO] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 97.987 s - in org.apache.accumulo.test.functional.MetadataIT
> [INFO] Running org.apache.accumulo.test.functional.ScanSessionTimeOutIT
> [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 43.91 s - in org.apache.accumulo.test.functional.ScanSessionTimeOutIT
> [INFO] Running org.apache.accumulo.test.functional.ZooCacheIT
> [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 33.986 s - in org.apache.accumulo.test.functional.ZooCacheIT
> [INFO] Running org.apache.accumulo.test.functional.DeleteRowsSplitIT
> [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 113.928 s - in org.apache.accumulo.test.functional.DeleteRowsSplitIT
> [INFO] Running org.apache.accumulo.test.ScanFlushWithTimeIT
> [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 36.854 s - in org.apache.accumulo.test.ScanFlushWithTimeIT
> [INFO] Running org.apache.accumulo.test.AuditMessageIT
> [INFO] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 165.169 s - in org.apache.accumulo.test.AuditMessageIT
> [INFO] Running org.apache.accumulo.test.gc.replication.CloseWriteAheadLogReferencesIT
> [WARNING] Tests run: 1, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 0.039 s - in org.apache.accumulo.test.gc.replication.CloseWriteAheadLogReferencesIT
> [INFO]
> [INFO] Results:
> [INFO]
> [ERROR] Errors:
> [ERROR] org.apache.accumulo.test.compaction.ExternalCompaction_3_IT.testCoordinatorRestartsDuringCompaction
> [ERROR] Run 1: ExternalCompaction_3_IT.testCoordinatorRestartsDuringCompaction:178 » TestTimedOut
> [ERROR] Run 2: ExternalCompaction_3_IT.testCoordinatorRestartsDuringCompaction » Appears to ...
> [INFO]
> [ERROR] ConcurrentDeleteTableIT.testConcurrentDeleteTablesOps:76 » TestTimedOut test t...
> [ERROR] org.apache.accumulo.test.functional.ConcurrentDeleteTableIT.testConcurrentFateOpsWithDelete
> [ERROR] Run 1: ConcurrentDeleteTableIT.testConcurrentFateOpsWithDelete:213 » TestTimedOut tes...
> [ERROR] Run 2: ConcurrentDeleteTableIT.testConcurrentFateOpsWithDelete » Appears to be stuck...
> [INFO]
> [ERROR] org.apache.accumulo.test.functional.HalfDeadTServerIT.testRecover
> [ERROR] Run 1: HalfDeadTServerIT.testRecover:142->test:217->Object.wait:328->Object.wait:-2 » TestTimedOut
> [ERROR] Run 2: HalfDeadTServerIT.testRecover » Appears to be stuck in thread Time-limited te...
> [INFO]
> [ERROR] org.apache.accumulo.test.functional.SslIT.adminStop
> [ERROR] Run 1: SslIT.adminStop:68->Object.wait:328->Object.wait:-2 » TestTimedOut test timed ...
> [ERROR] Run 2: SslIT.adminStop » Appears to be stuck in thread Time-limited test-SendThread(...
>
> These tests fail consistently at every build attempt!
>
> The tests fail even when executed separately, e.g.:
> mvn verify -Dit.test=ConcurrentDeleteTableIT -o -rf :accumulo-test
>
>
> I am using the current 'main' branch of Accumulo.
> JDK 11.0.11
> Maven: 3.8.2
> OS: Ubuntu 20.04.3 ARM64
>
> Is there anything that could be done to fix these problems?
> For example, some config settings?
>
> P.S. At https://github.com/apache/accumulo/issues/1884 I read that Linux
> ARM64 is a supported platform since the JVM supports it.
>
> Thanks!
>
> Mark
>