I don't see any reason it would break anything else, and I'm not opposed to making a change there to avoid repeated calls to the security provider to create the credentials. However, I strongly doubt that this will fix the performance problem with that IT. I've seen that test pass very quickly before, without your change, so the speedup might be a coincidence. I think if you were to capture thread dumps at other times, you wouldn't always see the servers in that code; you'd find them busy doing other work instead. If it does fix the problem permanently, though, I'd be pleasantly surprised. Either way, I think we can move forward with your PR, because it does avoid unnecessary recomputation of immutable credentials in ServerInfo.
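To make the "avoid unnecessary recomputation" point concrete: in the thread dumps quoted below, every call to ServerInfo.getProperties() ends up in SystemCredentials.get(), which hashes the instance configuration again. A value that never changes can be computed once and reused. The following is only a minimal sketch of that idea, not the code from the PR. The class name CachedCredentialsSketch, the memoize helper, and expensiveHash (a stand-in that repeats SHA-512 digests) are hypothetical names made up for this example.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class CachedCredentialsSketch {

  /** Counts how often the expensive computation actually runs. */
  static final AtomicInteger COMPUTATIONS = new AtomicInteger();

  /** Wraps a supplier so the underlying computation runs at most once. */
  static <T> Supplier<T> memoize(Supplier<T> delegate) {
    return new Supplier<T>() {
      // volatile so a value computed by one thread is visible to the others
      private volatile T value;

      @Override
      public T get() {
        T result = value;
        if (result == null) {
          synchronized (this) {
            result = value;
            if (result == null) {
              result = delegate.get();
              value = result;
            }
          }
        }
        return result;
      }
    };
  }

  /** Hypothetical stand-in for an expensive hash; NOT Accumulo's token generation. */
  static byte[] expensiveHash(String input) {
    COMPUTATIONS.incrementAndGet();
    try {
      MessageDigest md = MessageDigest.getInstance("SHA-512");
      byte[] digest = input.getBytes(StandardCharsets.UTF_8);
      // Repeat the digest to imitate a deliberately slow password-style hash.
      for (int i = 0; i < 5000; i++) {
        digest = md.digest(digest);
      }
      return digest;
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    Supplier<byte[]> credentials = memoize(() -> expensiveHash("instance-config"));
    for (int i = 0; i < 1000; i++) {
      credentials.get(); // only the first call pays for the hash
    }
    System.out.println("computations: " + COMPUTATIONS.get());
  }
}
```

Running main performs 1000 lookups and prints "computations: 1". Whether this removes the test's slowness is a separate question; the sketch only shows that the repeated hashing itself is avoidable when the input is immutable.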
On Thu, Dec 2, 2021 at 7:23 AM Mark Jens <[email protected]> wrote:
>
> Please review https://github.com/apache/accumulo/pull/2374
> By caching the ServerInfo's Credentials ConcurrentDeleteTableIT passes
> almost 6 times faster now!
> I am running the whole test suite now to see whether it doesn't break
> something else.
>
> On Thu, 2 Dec 2021 at 13:49, Mark Jens <[email protected]> wrote:
>
> > Reducing the log output did not reduce the test run time:
> >
> > diff --git test/src/main/resources/log4j2-test.properties
> > test/src/main/resources/log4j2-test.properties
> > index 9124914f7a..810c7bf06f 100644
> > --- test/src/main/resources/log4j2-test.properties
> > +++ test/src/main/resources/log4j2-test.properties
> > @@ -28,7 +28,7 @@ appender.console.layout.type = PatternLayout
> > appender.console.layout.pattern = %d{ISO8601} [%c{2}] %-5p: %m%n
> >
> > logger.01.name = org.apache.accumulo.core
> > -logger.01.level = debug
> > +logger.01.level = info
> >
> > logger.02.name = org.apache.accumulo.core.clientImpl.ManagerClient
> > logger.02.level = info
> > @@ -106,7 +106,7 @@ logger.25.name = org.apache.hadoop.security
> > logger.25.level = info
> >
> > logger.26.name = org.apache.hadoop.minikdc
> > -logger.26.level = debug
> > +logger.26.level = info
> >
> >
> > @@ -169,6 +169,6 @@ logger.metrics.level = info
> > logger.metrics.additivity = false
> > logger.metrics.appenderRef.metrics.ref = LoggingMetricsOutput
> >
> > -rootLogger.level = debug
> > +rootLogger.level = info
> > rootLogger.appenderRef.console.ref = STDOUT
> >
> > [INFO] Running org.apache.accumulo.test.functional.ConcurrentDeleteTableIT
> > [INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed:
> > 785.503 s - in org.apache.accumulo.test.functional.ConcurrentDeleteTableIT
> >
> >
> > On Thu, 2 Dec 2021 at 12:10, Mark Jens <[email protected]> wrote:
> >
> >> Hi again,
> >>
> >> Here are the thread dumps as promised:
> >>
> >> 1) Both TabletServers are very busy at compressing at close
time. The > >> following stacks are dumped in ~5 secs interval: > >> > >> "tablet migration-Worker-1" #4380 daemon prio=5 os_prio=0 cpu=68425.44ms > >> elapsed=75.42s tid=0x0000fffeac074800 nid=0x33077e runnable > >> [0x0000fffe8f3fd000] > >> java.lang.Thread.State: RUNNABLE > >> at sun.security.provider.SHA5.implCompressCheck([email protected] > >> /SHA5.java:232) > >> at sun.security.provider.SHA5.implCompress([email protected] > >> /SHA5.java:221) > >> at sun.security.provider.DigestBase.engineUpdate([email protected] > >> /DigestBase.java:124) > >> at > >> java.security.MessageDigest$Delegate.engineUpdate([email protected] > >> /MessageDigest.java:623) > >> at java.security.MessageDigest.update([email protected] > >> /MessageDigest.java:345) > >> at > >> org.apache.commons.codec.digest.Sha2Crypt.sha2Crypt(Sha2Crypt.java:421) > >> at > >> org.apache.commons.codec.digest.Sha2Crypt.sha512Crypt(Sha2Crypt.java:585) > >> at org.apache.commons.codec.digest.Crypt.crypt(Crypt.java:78) > >> at org.apache.commons.codec.digest.Crypt.crypt(Crypt.java:167) > >> at > >> org.apache.accumulo.server.security.SystemCredentials$SystemToken.hashInstanceConfigs(SystemCredentials.java:120) > >> at > >> org.apache.accumulo.server.security.SystemCredentials$SystemToken.generate(SystemCredentials.java:125) > >> at > >> org.apache.accumulo.server.security.SystemCredentials.get(SystemCredentials.java:66) > >> at > >> org.apache.accumulo.server.ServerInfo.getCredentials(ServerInfo.java:179) > >> at > >> org.apache.accumulo.server.ServerInfo.getPrincipal(ServerInfo.java:148) > >> at > >> org.apache.accumulo.server.ServerInfo.getProperties(ServerInfo.java:169) > >> at > >> org.apache.accumulo.core.clientImpl.ClientContext.getProperties(ClientContext.java:236) > >> at > >> org.apache.accumulo.core.clientImpl.ClientContext.createScanner(ClientContext.java:635) > >> at > >> org.apache.accumulo.core.metadata.schema.TabletsMetadata$Builder.buildNonRoot(TabletsMetadata.java:177) > >> at > >> 
org.apache.accumulo.core.metadata.schema.TabletsMetadata$Builder.build(TabletsMetadata.java:125) > >> at > >> org.apache.accumulo.core.metadata.schema.AmpleImpl.readTablet(AmpleImpl.java:46) > >> at > >> org.apache.accumulo.core.metadata.schema.Ample.readTablet(Ample.java:141) > >> at > >> org.apache.accumulo.tserver.tablet.Tablet.closeConsistencyCheck(Tablet.java:1379) > >> at > >> org.apache.accumulo.tserver.tablet.Tablet.completeClose(Tablet.java:1331) > >> - locked <0x00000000f1585830> (a > >> org.apache.accumulo.tserver.tablet.Tablet) > >> at > >> org.apache.accumulo.tserver.tablet.Tablet.close(Tablet.java:1221) > >> at > >> org.apache.accumulo.tserver.UnloadTabletHandler.run(UnloadTabletHandler.java:92) > >> at > >> io.opentelemetry.context.Context.lambda$wrap$1(Context.java:207) > >> at > >> io.opentelemetry.context.Context$$Lambda$209/0x000000010035c840.run(Unknown > >> Source) > >> at > >> java.util.concurrent.ThreadPoolExecutor.runWorker([email protected] > >> /ThreadPoolExecutor.java:1128) > >> at > >> java.util.concurrent.ThreadPoolExecutor$Worker.run([email protected] > >> /ThreadPoolExecutor.java:628) > >> at > >> io.opentelemetry.context.Context.lambda$wrap$1(Context.java:207) > >> at > >> io.opentelemetry.context.Context$$Lambda$209/0x000000010035c840.run(Unknown > >> Source) > >> at java.lang.Thread.run([email protected]/Thread.java:829) > >> > >> "tablet migration-Worker-1" #4380 daemon prio=5 os_prio=0 cpu=72485.20ms > >> elapsed=79.71s tid=0x0000fffeac074800 nid=0x33077e runnable > >> [0x0000fffe8f3fd000] > >> java.lang.Thread.State: RUNNABLE > >> at > >> sun.security.provider.DigestBase.implCompressMultiBlock0([email protected] > >> /DigestBase.java:149) > >> at > >> sun.security.provider.DigestBase.implCompressMultiBlock([email protected] > >> /DigestBase.java:144) > >> at sun.security.provider.DigestBase.engineUpdate([email protected] > >> /DigestBase.java:131) > >> at > >> java.security.MessageDigest$Delegate.engineUpdate([email protected] > 
>> /MessageDigest.java:623) > >> at java.security.MessageDigest.update([email protected] > >> /MessageDigest.java:345) > >> at > >> org.apache.commons.codec.digest.Sha2Crypt.sha2Crypt(Sha2Crypt.java:403) > >> at > >> org.apache.commons.codec.digest.Sha2Crypt.sha512Crypt(Sha2Crypt.java:585) > >> at org.apache.commons.codec.digest.Crypt.crypt(Crypt.java:78) > >> at org.apache.commons.codec.digest.Crypt.crypt(Crypt.java:167) > >> at > >> org.apache.accumulo.server.security.SystemCredentials$SystemToken.hashInstanceConfigs(SystemCredentials.java:120) > >> at > >> org.apache.accumulo.server.security.SystemCredentials$SystemToken.generate(SystemCredentials.java:125) > >> at > >> org.apache.accumulo.server.security.SystemCredentials.get(SystemCredentials.java:66) > >> at > >> org.apache.accumulo.server.ServerInfo.getCredentials(ServerInfo.java:179) > >> at > >> org.apache.accumulo.server.ServerInfo.getPrincipal(ServerInfo.java:148) > >> at > >> org.apache.accumulo.server.ServerInfo.getProperties(ServerInfo.java:169) > >> ... 
> >> > >> "tablet migration-Worker-1" #4380 daemon prio=5 os_prio=0 cpu=81174.59ms > >> elapsed=89.01s tid=0x0000fffeac074800 nid=0x33077e runnable > >> [0x0000fffe8f3fd000] > >> java.lang.Thread.State: RUNNABLE > >> at sun.security.provider.ByteArrayAccess.l2bBig([email protected] > >> /ByteArrayAccess.java:449) > >> at sun.security.provider.SHA5.implDigest([email protected] > >> /SHA5.java:131) > >> at sun.security.provider.DigestBase.engineDigest([email protected] > >> /DigestBase.java:210) > >> at sun.security.provider.DigestBase.engineDigest([email protected] > >> /DigestBase.java:189) > >> at > >> java.security.MessageDigest$Delegate.engineDigest([email protected] > >> /MessageDigest.java:639) > >> at java.security.MessageDigest.digest([email protected] > >> /MessageDigest.java:385) > >> at > >> org.apache.commons.codec.digest.Sha2Crypt.sha2Crypt(Sha2Crypt.java:439) > >> at > >> org.apache.commons.codec.digest.Sha2Crypt.sha512Crypt(Sha2Crypt.java:585) > >> at org.apache.commons.codec.digest.Crypt.crypt(Crypt.java:78) > >> at org.apache.commons.codec.digest.Crypt.crypt(Crypt.java:167) > >> at > >> org.apache.accumulo.server.security.SystemCredentials$SystemToken.hashInstanceConfigs(SystemCredentials.java:120) > >> at > >> org.apache.accumulo.server.security.SystemCredentials$SystemToken.generate(SystemCredentials.java:125) > >> at > >> org.apache.accumulo.server.security.SystemCredentials.get(SystemCredentials.java:66) > >> at > >> org.apache.accumulo.server.ServerInfo.getCredentials(ServerInfo.java:179) > >> at > >> org.apache.accumulo.server.ServerInfo.getPrincipal(ServerInfo.java:148) > >> at > >> org.apache.accumulo.server.ServerInfo.getProperties(ServerInfo.java:169) > >> ... 
> >> > >> "tablet migration-Worker-1" #4380 daemon prio=5 os_prio=0 cpu=86499.01ms > >> elapsed=94.68s tid=0x0000fffeac074800 nid=0x33077e runnable > >> [0x0000fffe8f3fd000] > >> java.lang.Thread.State: RUNNABLE > >> at > >> sun.security.provider.DigestBase.implCompressMultiBlock0([email protected] > >> /DigestBase.java:149) > >> at > >> sun.security.provider.DigestBase.implCompressMultiBlock([email protected] > >> /DigestBase.java:144) > >> at sun.security.provider.DigestBase.engineUpdate([email protected] > >> /DigestBase.java:131) > >> at > >> java.security.MessageDigest$Delegate.engineUpdate([email protected] > >> /MessageDigest.java:623) > >> at java.security.MessageDigest.update([email protected] > >> /MessageDigest.java:345) > >> at > >> org.apache.commons.codec.digest.Sha2Crypt.sha2Crypt(Sha2Crypt.java:403) > >> at > >> org.apache.commons.codec.digest.Sha2Crypt.sha512Crypt(Sha2Crypt.java:585) > >> at org.apache.commons.codec.digest.Crypt.crypt(Crypt.java:78) > >> at org.apache.commons.codec.digest.Crypt.crypt(Crypt.java:167) > >> at > >> org.apache.accumulo.server.security.SystemCredentials$SystemToken.hashInstanceConfigs(SystemCredentials.java:120) > >> at > >> org.apache.accumulo.server.security.SystemCredentials$SystemToken.generate(SystemCredentials.java:125) > >> at > >> org.apache.accumulo.server.security.SystemCredentials.get(SystemCredentials.java:66) > >> at > >> org.apache.accumulo.server.ServerInfo.getCredentials(ServerInfo.java:179) > >> at > >> org.apache.accumulo.server.ServerInfo.getPrincipal(ServerInfo.java:148) > >> at > >> org.apache.accumulo.server.ServerInfo.getProperties(ServerInfo.java:169) > >> ... 
> >>
> >> "tablet migration-Worker-1" #6107 daemon prio=5 os_prio=0 cpu=109551.37ms
> >> elapsed=117.48s tid=0x0000fffeac01b000 nid=0x33174d runnable
> >> [0x0000fffe7bffd000]
> >>    java.lang.Thread.State: RUNNABLE
> >>     at sun.security.provider.DigestBase.implCompressMultiBlock0([email protected]/DigestBase.java:149)
> >>     at sun.security.provider.DigestBase.implCompressMultiBlock([email protected]/DigestBase.java:144)
> >>     at sun.security.provider.DigestBase.engineUpdate([email protected]/DigestBase.java:131)
> >>     at java.security.MessageDigest$Delegate.engineUpdate([email protected]/MessageDigest.java:623)
> >>     at java.security.MessageDigest.update([email protected]/MessageDigest.java:345)
> >>     at org.apache.commons.codec.digest.Sha2Crypt.sha2Crypt(Sha2Crypt.java:432)
> >>     at org.apache.commons.codec.digest.Sha2Crypt.sha512Crypt(Sha2Crypt.java:585)
> >>     at org.apache.commons.codec.digest.Crypt.crypt(Crypt.java:78)
> >>     at org.apache.commons.codec.digest.Crypt.crypt(Crypt.java:167)
> >>     at org.apache.accumulo.server.security.SystemCredentials$SystemToken.hashInstanceConfigs(SystemCredentials.java:120)
> >>     at org.apache.accumulo.server.security.SystemCredentials$SystemToken.generate(SystemCredentials.java:125)
> >>     at org.apache.accumulo.server.security.SystemCredentials.get(SystemCredentials.java:66)
> >>     at org.apache.accumulo.server.ServerInfo.getCredentials(ServerInfo.java:179)
> >>     at org.apache.accumulo.server.ServerInfo.getAuthenticationToken(ServerInfo.java:153)
> >>     at org.apache.accumulo.server.ServerInfo.getProperties(ServerInfo.java:168)
> >>     at org.apache.accumulo.core.clientImpl.ClientContext.getProperties(ClientContext.java:236)
> >>
> >> Notice that ClientContext.getProperties(ClientContext.java:236) most of
> >> the times calls
ServerInfo.getPrincipal(ServerInfo.java:148) but in the > >> last one it calls ServerInfo.getAuthenticationToken(ServerInfo.java:153). > >> And both lead to (a lot of ?!) compressing.. > >> > >> 2) The "Manager" process writes ~200Mb of logs. Maybe the default log > >> level should not be DEBUG ?! > >> > >> Most of its threads either wait for notifications from Zookeeper: > >> > >> 878647 "Manager-ClientPool-Worker-3" #61 daemon prio=5 os_prio=0 > >> cpu=375.95ms elapsed=182.38s tid=0x0000fffee0007800 nid=0x32d943 in > >> Object.wait() [0x0000fffebb7fc000] > >> 878648 java.lang.Thread.State: TIMED_WAITING (on object monitor) > >> 878649 at java.lang.Object.wait([email protected]/Native Method) > >> 878650 - waiting on <no object reference available> > >> 878651 at > >> org.apache.accumulo.fate.ZooStore.waitForStatusChange(ZooStore.java:386) > >> 878652 - waiting to re-lock in wait() <0x00000000f1427458> (a > >> org.apache.accumulo.fate.ZooStore) > >> 878653 at > >> org.apache.accumulo.fate.AgeOffStore.waitForStatusChange(AgeOffStore.java:209) > >> 878654 at > >> org.apache.accumulo.core.logging.FateLogger$1.waitForStatusChange(FateLogger.java:75) > >> 878655 at > >> org.apache.accumulo.fate.Fate.waitForCompletion(Fate.java:297) > >> 878656 at > >> org.apache.accumulo.manager.FateServiceHandler.waitForFateOperation(FateServiceHandler.java:659) > >> 878657 at > >> org.apache.accumulo.manager.ManagerClientServiceHandler.waitForFateOperation(ManagerClientServiceHandler.java:100) > >> ... 
> >> > >> or wait for data: > >> 878781 "Repo Runner-Worker-1" #90 daemon prio=5 os_prio=0 cpu=7440.91ms > >> elapsed=179.99s tid=0x0000fffeb0002000 nid=0x32d99a in Object.wait() > >> [0x0000fffebadfd000] > >> 878782 java.lang.Thread.State: WAITING (on object monitor) > >> 878783 at java.lang.Object.wait([email protected]/Native Method) > >> 878784 - waiting on <no object reference available> > >> 878785 at java.lang.Object.wait([email protected]/Object.java:328) > >> 878786 at > >> org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1529) > >> 878787 - waiting to re-lock in wait() <0x00000000f9bf42d8> (a > >> org.apache.zookeeper.ClientCnxn$Packet) > >> 878788 at > >> org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1512) > >> 878789 at > >> org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:2587) > >> 878790 at > >> org.apache.accumulo.fate.zookeeper.ZooReader.lambda$getChildren$5(ZooReader.java:87) > >> 878791 at > >> org.apache.accumulo.fate.zookeeper.ZooReader$$Lambda$182/0x0000000100324040.apply(Unknown > >> Source) > >> 878792 at > >> org.apache.accumulo.fate.zookeeper.ZooReader$$Lambda$184/0x0000000100323c40.apply(Unknown > >> Source) > >> 878793 at > >> org.apache.accumulo.fate.zookeeper.ZooReader.retryLoopMutator(ZooReader.java:165) > >> 878794 at > >> org.apache.accumulo.fate.zookeeper.ZooReader.retryLoop(ZooReader.java:144) > >> 878795 at > >> org.apache.accumulo.fate.zookeeper.ZooReader.retryLoop(ZooReader.java:131) > >> 878796 at > >> org.apache.accumulo.fate.zookeeper.ZooReader.getChildren(ZooReader.java:87) > >> 878797 at org.apache.accumulo.fate.ZooStore.reserve(ZooStore.java:141) > >> 878798 at > >> org.apache.accumulo.fate.AgeOffStore.reserve(AgeOffStore.java:155) > >> 878799 at > >> org.apache.accumulo.core.logging.FateLogger$1.reserve(FateLogger.java:50) > >> 878800 at > >> org.apache.accumulo.fate.Fate$TransactionRunner.run(Fate.java:72) > >> 878801 at > >> 
io.opentelemetry.context.Context.lambda$wrap$1(Context.java:207) > >> 878802 at > >> io.opentelemetry.context.Context$$Lambda$209/0x0000000100353840.run(Unknown > >> Source) > >> 878803 at > >> java.util.concurrent.ThreadPoolExecutor.runWorker([email protected] > >> /ThreadPoolExecutor.java:1128) > >> 878804 at > >> java.util.concurrent.ThreadPoolExecutor$Worker.run([email protected] > >> /ThreadPoolExecutor.java:628) > >> 878805 at > >> io.opentelemetry.context.Context.lambda$wrap$1(Context.java:207) > >> 878806 at > >> io.opentelemetry.context.Context$$Lambda$209/0x0000000100353840.run(Unknown > >> Source) > >> 878807 at java.lang.Thread.run([email protected]/Thread.java:829) > >> > >> 908220 "Status Thread" #41 daemon prio=5 os_prio=0 cpu=1700.28ms > >> elapsed=187.25s tid=0x0000fffee41f9800 nid=0x32d920 in Object.wait() > >> [0x0000ffff20f50000] > >> 908221 java.lang.Thread.State: WAITING (on object monitor) > >> 908222 at java.lang.Object.wait([email protected]/Native Method) > >> 908223 - waiting on <no object reference available> > >> 908224 at java.lang.Object.wait([email protected]/Object.java:328) > >> 908225 at > >> org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1529) > >> 908226 - waiting to re-lock in wait() <0x00000000fa781138> (a > >> org.apache.zookeeper.ClientCnxn$Packet) > >> 908227 at > >> org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1512) > >> 908228 at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:2129) > >> 908229 at > >> org.apache.accumulo.fate.zookeeper.ZooReader.lambda$getData$0(ZooReader.java:65) > >> 908230 at > >> org.apache.accumulo.fate.zookeeper.ZooReader$$Lambda$220/0x0000000100351440.apply(Unknown > >> Source) > >> 908231 at > >> org.apache.accumulo.fate.zookeeper.ZooReader$$Lambda$184/0x0000000100323c40.apply(Unknown > >> Source) > >> 908232 at > >> org.apache.accumulo.fate.zookeeper.ZooReader.retryLoopMutator(ZooReader.java:165) > >> 908233 at > >> 
org.apache.accumulo.fate.zookeeper.ZooReader.retryLoop(ZooReader.java:144) > >> 908234 at > >> org.apache.accumulo.fate.zookeeper.ZooReader.retryLoop(ZooReader.java:131) > >> 908235 at > >> org.apache.accumulo.fate.zookeeper.ZooReader.getData(ZooReader.java:65) > >> 908236 at > >> org.apache.accumulo.manager.Manager.getManagerGoalState(Manager.java:496) > >> 908237 at > >> org.apache.accumulo.manager.Manager$StatusThread.updateStatus(Manager.java:822) > >> 908238 at > >> org.apache.accumulo.manager.Manager$StatusThread.run(Manager.java:797) > >> 908239 at > >> io.opentelemetry.context.Context.lambda$wrap$1(Context.java:207) > >> 908240 at > >> io.opentelemetry.context.Context$$Lambda$209/0x0000000100353840.run(Unknown > >> Source) > >> 908241 at java.lang.Thread.run([email protected]/Thread.java:829) > >> > >> 3) SimpleGarbageCollector is also busy in getting credentials > >> > >> "gc" #31 prio=5 os_prio=0 cpu=15495.47ms elapsed=209.43s > >> tid=0x0000ffff28295800 nid=0x32dac5 runnable [0x0000ffff3a5fb000] > >> 2503 java.lang.Thread.State: RUNNABLE > >> 2504 at > >> sun.security.provider.DigestBase.implCompressMultiBlock0([email protected] > >> /DigestBase.java:149) > >> 2505 at > >> sun.security.provider.DigestBase.implCompressMultiBlock([email protected] > >> /DigestBase.java:144) > >> 2506 at sun.security.provider.DigestBase.engineUpdate([email protected] > >> /DigestBase.java:131) > >> 2507 at > >> java.security.MessageDigest$Delegate.engineUpdate([email protected] > >> /MessageDigest.java:623) > >> 2508 at java.security.MessageDigest.update([email protected] > >> /MessageDigest.java:345) > >> 2509 at > >> org.apache.commons.codec.digest.Sha2Crypt.sha2Crypt(Sha2Crypt.java:421) > >> 2510 at > >> org.apache.commons.codec.digest.Sha2Crypt.sha512Crypt(Sha2Crypt.java:585) > >> 2511 at org.apache.commons.codec.digest.Crypt.crypt(Crypt.java:78) > >> 2512 at org.apache.commons.codec.digest.Crypt.crypt(Crypt.java:167) > >> 2513 at > >> 
org.apache.accumulo.server.security.SystemCredentials$SystemToken.hashInstanceConfigs(SystemCredentials.java:120) > >> 2514 at > >> org.apache.accumulo.server.security.SystemCredentials$SystemToken.generate(SystemCredentials.java:125) > >> 2515 at > >> org.apache.accumulo.server.security.SystemCredentials.get(SystemCredentials.java:66) > >> 2516 at > >> org.apache.accumulo.server.ServerInfo.getCredentials(ServerInfo.java:179) > >> 2517 at > >> org.apache.accumulo.server.ServerInfo.getPrincipal(ServerInfo.java:148) > >> 2518 at > >> org.apache.accumulo.server.ServerInfo.getProperties(ServerInfo.java:169) > >> 2519 at > >> org.apache.accumulo.core.clientImpl.ClientContext.getProperties(ClientContext.java:236) > >> 2520 at > >> org.apache.accumulo.core.clientImpl.ClientContext.createScanner(ClientContext.java:635) > >> 2521 at > >> org.apache.accumulo.server.metadata.ServerAmpleImpl.getGcCandidates(ServerAmpleImpl.java:180) > >> 2522 at > >> org.apache.accumulo.gc.SimpleGarbageCollector$GCEnv.getCandidates(SimpleGarbageCollector.java:199) > >> 2523 at > >> org.apache.accumulo.gc.GarbageCollectionAlgorithm.collect(GarbageCollectionAlgorithm.java:302) > >> 2524 at > >> org.apache.accumulo.gc.SimpleGarbageCollector.run(SimpleGarbageCollector.java:502) > >> 2525 at io.opentelemetry.context.Context.lambda$wrap$1(Context.java:207) > >> 2526 at > >> io.opentelemetry.context.Context$$Lambda$209/0x0000000100357840.run(Unknown > >> Source) > >> 2527 at java.lang.Thread.run([email protected]/Thread.java:829) > >> > >> > >> 3151 "gc" #31 prio=5 os_prio=0 cpu=15982.95ms elapsed=218.59s > >> tid=0x0000ffff28295800 nid=0x32dac5 runnable [0x0000ffff3a5fb000] > >> 3152 java.lang.Thread.State: RUNNABLE > >> 3153 at java.util.Arrays.hashCode([email protected]/Arrays.java:4685) > >> 3154 at java.util.Objects.hash([email protected]/Objects.java:146) > >> 3155 at java.security.Provider$ServiceKey.hashCode([email protected] > >> /Provider.java:1107) > >> 3156 at 
java.util.concurrent.ConcurrentHashMap.get([email protected] > >> /ConcurrentHashMap.java:936) > >> 3157 at java.security.Provider.getService([email protected] > >> /Provider.java:1282) > >> 3158 at sun.security.jca.ProviderList.getService([email protected] > >> /ProviderList.java:380) > >> 3159 at sun.security.jca.GetInstance.getInstance([email protected] > >> /GetInstance.java:157) > >> 3160 at java.security.Security.getImpl([email protected] > >> /Security.java:700) > >> 3161 at java.security.MessageDigest.getInstance([email protected] > >> /MessageDigest.java:178) > >> 3162 at > >> org.apache.commons.codec.digest.DigestUtils.getDigest(DigestUtils.java:170) > >> 3163 at > >> org.apache.commons.codec.digest.Sha2Crypt.sha2Crypt(Sha2Crypt.java:395) > >> 3164 at > >> org.apache.commons.codec.digest.Sha2Crypt.sha512Crypt(Sha2Crypt.java:585) > >> 3165 at org.apache.commons.codec.digest.Crypt.crypt(Crypt.java:78) > >> 3166 at org.apache.commons.codec.digest.Crypt.crypt(Crypt.java:167) > >> 3167 at > >> org.apache.accumulo.server.security.SystemCredentials$SystemToken.hashInstanceConfigs(SystemCredentials.java:120) > >> 3168 at > >> org.apache.accumulo.server.security.SystemCredentials$SystemToken.generate(SystemCredentials.java:125) > >> 3169 at > >> org.apache.accumulo.server.security.SystemCredentials.get(SystemCredentials.java:66) > >> 3170 at > >> org.apache.accumulo.server.ServerInfo.getCredentials(ServerInfo.java:179) > >> 3171 at > >> org.apache.accumulo.server.ServerInfo.getAuthenticationToken(ServerInfo.java:153) > >> 3172 at > >> org.apache.accumulo.server.ServerInfo.getProperties(ServerInfo.java:168) > >> 3173 at > >> org.apache.accumulo.core.clientImpl.ClientContext.getProperties(ClientContext.java:236) > >> 3174 at > >> org.apache.accumulo.core.clientImpl.ClientContext.createScanner(ClientContext.java:635) > >> 3175 at > >> org.apache.accumulo.core.metadata.schema.TabletsMetadata$Builder.buildNonRoot(TabletsMetadata.java:177) > >> 3176 at > >> 
org.apache.accumulo.core.metadata.schema.TabletsMetadata$Builder.build(TabletsMetadata.java:125) > >> 3177 at > >> org.apache.accumulo.gc.SimpleGarbageCollector$GCEnv.getReferences(SimpleGarbageCollector.java:249) > >> 3178 at > >> org.apache.accumulo.gc.GarbageCollectionAlgorithm.confirmDeletes(GarbageCollectionAlgorithm.java:169) > >> 3179 at > >> org.apache.accumulo.gc.GarbageCollectionAlgorithm.confirmDeletesTrace(GarbageCollectionAlgorithm.java:276) > >> 3180 at > >> org.apache.accumulo.gc.GarbageCollectionAlgorithm.deleteBatch(GarbageCollectionAlgorithm.java:330) > >> 3181 at > >> org.apache.accumulo.gc.GarbageCollectionAlgorithm.collect(GarbageCollectionAlgorithm.java:315) > >> 3182 at > >> org.apache.accumulo.gc.SimpleGarbageCollector.run(SimpleGarbageCollector.java:501) > >> 3183 at io.opentelemetry.context.Context.lambda$wrap$1(Context.java:207) > >> 3184 at > >> io.opentelemetry.context.Context$$Lambda$209/0x0000000100357840.run(Unknown > >> Source) > >> 3185 at java.lang.Thread.run([email protected]/Thread.java:829) > >> > >> > >> 4) Nothing interesting for Initialize, Main and ZooKeeperServerMain > >> processes > >> > >> > >> I'm not saying that the above are problematic. You know how Accumulo > >> works. It is up to you to decide whether something should be improved. > >> > >> Regards, > >> Mark > >> > >> > >> On Wed, 1 Dec 2021 at 16:35, Mark Jens <[email protected]> wrote: > >> > >>> > >>> > >>> On Tue, 30 Nov 2021 at 18:32, Christopher <[email protected]> wrote: > >>> > >>>> It looks like the tests are timing out. This happens frequently when > >>>> running on resource-constrained systems. 
You can give the test more
> >>>> time by increasing the timeout factor: `mvn clean verify
> >>>> -Dcheckstyle.skip -Dspotbugs.skip -Dit.test=ConcurrentDeleteTableIT
> >>>> -Dtimeout.factor=3`
> >>>>
> >>>> There's nothing we know of that would change the way our tests work
> >>>> due to ARM64, but you may have issues because of limited RAM, slow CPU
> >>>> speeds, slow disk I/O, busy background processes, or other
> >>>> resource-related issues. I don't think most of the currently active
> >>>> developers use ARM64, or have access to a test machine to reproduce or
> >>>>
> >>>
> >>> In case anyone wants to test on Linux ARM64 you could easily use Oracle
> >>> Cloud for free.
> >>>
> >>> https://martin-grigorov.medium.com/github-actions-arm64-runner-on-oracle-cloud-a77cdf7a325a
> >>> explains how to create a VM and how to use this VM as a Github Actions
> >>> runner.
> >>> https://github.com/apache/accumulo/issues/1884#issuecomment-970267282
> >>> mentions this article.
> >>>
> >>>
> >>>> experiment with Accumulo there, so you may have to do some of your own
> >>>> troubleshooting. If you can rule out resource-constraint issues, and
> >>>> it isn't already a known flaky test (ConcurrentDeleteTableIT is known
> >>>> flaky and sometimes times out on x86_64 as well), you could create a
> >>>> bug ticket with more details at
> >>>> https://github.com/apache/accumulo/issues ; there is an issue template
> >>>> specifically for broken and/or flaky tests that you can select when
> >>>> creating a new ticket.
> >>>>
> >>>> On Tue, Nov 30, 2021 at 9:34 AM Mark Jens <[email protected]> wrote:
> >>>> >
> >>>> > Hi dev1,
> >>>> >
> >>>> > On Tue, 30 Nov 2021 at 16:21, dev1 <[email protected]> wrote:
> >>>> >
> >>>> > > Some of those tests are trying to stress conditions that require a lot of
> >>>> > > resources to replicate specific conditions. Have you tried to run those
> >>>> > > individual tests in isolation so that you are not competing for resources?
> >>>> > > Do they always fail, or are the failures transient?
> >>>> > >
> >>>> >
> >>>> > Q: Have you tried to run those individual tests in isolation so that you
> >>>> > are not competing for resources?
> >>>> > A: This is what I mean with the following:
> >>>> > ---------------------
> >>>> > The tests fail even when executed separately, e.g.:
> >>>> > mvn verify -Dit.test=ConcurrentDeleteTableIT -o -rf :accumulo-test
> >>>> > ---------------------
> >>>> >
> >>>> > Q: Do they always fail, or are the failures transient?
> >>>> > A: I also tried to explain that with "These tests fail consistently at
> >>>> > every build attempt!"
> >>>> >
> >>>> > Mark
> >>>> >
> >>>> > >
> >>>> > > -----Original Message-----
> >>>> > > From: Mark Jens <[email protected]>
> >>>> > > Sent: Tuesday, November 30, 2021 4:05 AM
> >>>> > > To: [email protected]
> >>>> > > Subject: Consistent IT tests failures on Linux ARM64
> >>>> > >
> >>>> > > Hello Accumulo community,
> >>>> > >
> >>>> > > At my job we consider using Linux ARM64 servers and I've been tasked to
> >>>> > > test Accumulo.
> >>>> > >
> >>>> > > I face some timeout related issues with several IT tests:
> >>>> > >
> >>>> > >
> >>>> > > [ERROR]
> >>>> > > org.apache.accumulo.test.functional.ConcurrentDeleteTableIT.testConcurrentFateOpsWithDelete
> >>>> > > Time elapsed: 420.122 s <<< ERROR!
> >>>> > > org.junit.runners.model.TestTimedOutException: test timed out after > >>>> 420 > >>>> > > seconds at [email protected]/jdk.internal.misc.Unsafe.park(Native > >>>> Method) > >>>> > > at [email protected] > >>>> > > /java.util.concurrent.locks.LockSupport.park(LockSupport.java:194) > >>>> > > at [email protected] > >>>> > > /java.util.concurrent.FutureTask.awaitDone(FutureTask.java:447) > >>>> > > at [email protected] > >>>> > > /java.util.concurrent.FutureTask.get(FutureTask.java:190) > >>>> > > at > >>>> > > > >>>> > > > >>>> app//org.apache.accumulo.test.functional.ConcurrentDeleteTableIT.testConcurrentFateOpsWithDelete(ConcurrentDeleteTableIT.java:213) > >>>> > > at [email protected] > >>>> > > /jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native > >>>> > > Method) > >>>> > > at [email protected] > >>>> > > > >>>> > > > >>>> /jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > >>>> > > at [email protected] > >>>> > > > >>>> > > > >>>> /jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > >>>> > > at [email protected] > >>>> /java.lang.reflect.Method.invoke(Method.java:566) > >>>> > > at > >>>> > > > >>>> > > > >>>> app//org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > >>>> > > at > >>>> > > > >>>> > > > >>>> app//org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > >>>> > > at > >>>> > > > >>>> > > > >>>> app//org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > >>>> > > at > >>>> > > > >>>> > > > >>>> app//org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > >>>> > > at > >>>> > > > >>>> > > > >>>> app//org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > >>>> > > at > >>>> > > > >>>> > > > >>>> app//org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > >>>> > > at > >>>> > > > 
>>>> > > > >>>> app//org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299) > >>>> > > at > >>>> > > > >>>> > > > >>>> app//org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293) > >>>> > > at [email protected] > >>>> > > /java.util.concurrent.FutureTask.run(FutureTask.java:264) > >>>> > > at [email protected]/java.lang.Thread.run(Thread.java:829) > >>>> > > > >>>> > > [ERROR] > >>>> > > > >>>> > > > >>>> org.apache.accumulo.test.functional.ConcurrentDeleteTableIT.testConcurrentFateOpsWithDelete > >>>> > > Time elapsed: 420.122 s <<< ERROR! > >>>> > > java.lang.Exception: Appears to be stuck in thread Time-limited > >>>> > > test-SendThread(localhost:44251) > >>>> > > at [email protected]/sun.nio.ch.EPoll.wait(Native Method) at > >>>> > > [email protected] > >>>> > > /sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:120) > >>>> > > at [email protected] > >>>> > > /sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:124) > >>>> > > at [email protected]/sun.nio.ch > >>>> .SelectorImpl.select(SelectorImpl.java:136) > >>>> > > at > >>>> > > > >>>> > > > >>>> app//org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:347) > >>>> > > at > >>>> > > > >>>> app//org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1223) > >>>> > > > >>>> > > [ERROR] > >>>> > > > >>>> > > > >>>> org.apache.accumulo.test.functional.ConcurrentDeleteTableIT.testConcurrentDeleteTablesOps > >>>> > > Time elapsed: 420.011 s <<< ERROR! 
> >>>> > > org.junit.runners.model.TestTimedOutException: test timed out after > >>>> 420 > >>>> > > seconds at [email protected]/java.lang.Thread.sleep(Native Method) > >>>> at > >>>> > > > >>>> > > > >>>> app//org.apache.accumulo.fate.zookeeper.ZooCache$ZooRunnable.retry(ZooCache.java:299) > >>>> > > at > >>>> app//org.apache.accumulo.fate.zookeeper.ZooCache.get(ZooCache.java:442) > >>>> > > at > >>>> app//org.apache.accumulo.fate.zookeeper.ZooCache.get(ZooCache.java:372) > >>>> > > at > >>>> > > > >>>> > > > >>>> app//org.apache.accumulo.core.clientImpl.ClientContext.verifyInstanceId(ClientContext.java:467) > >>>> > > at > >>>> > > > >>>> > > > >>>> app//org.apache.accumulo.core.clientImpl.ClientContext.getInstanceID(ClientContext.java:446) > >>>> > > at > >>>> > > > >>>> > > > >>>> app//org.apache.accumulo.core.clientImpl.ClientContext.getManagerLocations(ClientContext.java:405) > >>>> > > at > >>>> > > > >>>> > > > >>>> app//org.apache.accumulo.core.clientImpl.ManagerClient.getConnection(ManagerClient.java:59) > >>>> > > at > >>>> > > > >>>> > > > >>>> app//org.apache.accumulo.core.clientImpl.ManagerClient.getConnectionWithRetry(ManagerClient.java:49) > >>>> > > at > >>>> > > > >>>> > > > >>>> app//org.apache.accumulo.core.clientImpl.TableOperationsImpl.beginFateOperation(TableOperationsImpl.java:260) > >>>> > > at > >>>> > > > >>>> > > > >>>> app//org.apache.accumulo.core.clientImpl.TableOperationsImpl.doFateOperation(TableOperationsImpl.java:369) > >>>> > > at > >>>> > > > >>>> > > > >>>> app//org.apache.accumulo.core.clientImpl.TableOperationsImpl.doFateOperation(TableOperationsImpl.java:359) > >>>> > > at > >>>> > > > >>>> > > > >>>> app//org.apache.accumulo.core.clientImpl.TableOperationsImpl.doTableFateOperation(TableOperationsImpl.java:1670) > >>>> > > at > >>>> > > > >>>> > > > >>>> app//org.apache.accumulo.core.clientImpl.TableOperationsImpl.create(TableOperationsImpl.java:248) > >>>> > > at > >>>> > > > >>>> > > > >>>> 
app//org.apache.accumulo.test.functional.ConcurrentDeleteTableIT.testConcurrentDeleteTablesOps(ConcurrentDeleteTableIT.java:76) > >>>> > > at [email protected] > >>>> > > /jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native > >>>> > > Method) > >>>> > > at [email protected] > >>>> > > > >>>> > > > >>>> /jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > >>>> > > at [email protected] > >>>> > > > >>>> > > > >>>> /jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > >>>> > > at [email protected] > >>>> /java.lang.reflect.Method.invoke(Method.java:566) > >>>> > > at > >>>> > > > >>>> > > > >>>> app//org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > >>>> > > at > >>>> > > > >>>> > > > >>>> app//org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > >>>> > > at > >>>> > > > >>>> > > > >>>> app//org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > >>>> > > at > >>>> > > > >>>> > > > >>>> app//org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > >>>> > > at > >>>> > > > >>>> > > > >>>> app//org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > >>>> > > at > >>>> > > > >>>> > > > >>>> app//org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > >>>> > > at > >>>> > > > >>>> > > > >>>> app//org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299) > >>>> > > at > >>>> > > > >>>> > > > >>>> app//org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293) > >>>> > > at [email protected] > >>>> > > /java.util.concurrent.FutureTask.run(FutureTask.java:264) > >>>> > > at [email protected]/java.lang.Thread.run(Thread.java:829) > >>>> > > > >>>> > > [INFO] Running org.apache.accumulo.test.functional.ScannerContextIT > >>>> > > [INFO] 
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time > >>>> elapsed: > >>>> > > 102.909 s - in org.apache.accumulo.test.functional.ScannerContextIT > >>>> > > [INFO] Running org.apache.accumulo.test.functional.KerberosRenewalIT > >>>> > > [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time > >>>> elapsed: > >>>> > > 504.472 s - in org.apache.accumulo.test.functional.KerberosRenewalIT > >>>> > > [INFO] Running > >>>> org.apache.accumulo.test.functional.BatchWriterFlushIT > >>>> > > [INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time > >>>> elapsed: > >>>> > > 62.132 s - in org.apache.accumulo.test.functional.BatchWriterFlushIT > >>>> > > [INFO] Running org.apache.accumulo.test.functional.BinaryIT > >>>> > > [INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time > >>>> elapsed: > >>>> > > 65.034 s - in org.apache.accumulo.test.functional.BinaryIT > >>>> > > [INFO] Running org.apache.accumulo.test.functional.PermissionsIT > >>>> > > [INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time > >>>> elapsed: > >>>> > > 59.25 s - in org.apache.accumulo.test.functional.PermissionsIT > >>>> > > [INFO] Running > >>>> org.apache.accumulo.test.functional.ZookeeperRestartIT > >>>> > > [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time > >>>> elapsed: > >>>> > > 37.37 s - in org.apache.accumulo.test.functional.ZookeeperRestartIT > >>>> > > [INFO] Running > >>>> org.apache.accumulo.test.functional.CreateManyScannersIT > >>>> > > [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time > >>>> elapsed: > >>>> > > 23.046 s - in > >>>> org.apache.accumulo.test.functional.CreateManyScannersIT > >>>> > > [INFO] Running > >>>> org.apache.accumulo.test.functional.CreateInitialSplitsIT > >>>> > > [INFO] Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time > >>>> elapsed: > >>>> > > 255.108 s - in > >>>> org.apache.accumulo.test.functional.CreateInitialSplitsIT > >>>> > > [INFO] Running org.apache.accumulo.test.functional.MonitorSslIT > 
>>>> > > [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time > >>>> elapsed: > >>>> > > 25.304 s - in org.apache.accumulo.test.functional.MonitorSslIT > >>>> > > [INFO] Running org.apache.accumulo.test.functional.RestartStressIT > >>>> > > [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time > >>>> elapsed: > >>>> > > 78.359 s - in org.apache.accumulo.test.functional.RestartStressIT > >>>> > > [INFO] Running > >>>> org.apache.accumulo.test.functional.BulkSplitOptimizationIT > >>>> > > [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time > >>>> elapsed: > >>>> > > 59.289 s - in > >>>> org.apache.accumulo.test.functional.BulkSplitOptimizationIT > >>>> > > [INFO] Running org.apache.accumulo.test.functional.BulkNewIT > >>>> > > [INFO] Tests run: 14, Failures: 0, Errors: 0, Skipped: 0, Time > >>>> elapsed: > >>>> > > 63.696 s - in org.apache.accumulo.test.functional.BulkNewIT > >>>> > > [INFO] Running org.apache.accumulo.test.functional.BloomFilterIT > >>>> > > [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time > >>>> elapsed: > >>>> > > 135.298 s - in org.apache.accumulo.test.functional.BloomFilterIT > >>>> > > [INFO] Running org.apache.accumulo.test.functional.BulkIT > >>>> > > [INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time > >>>> elapsed: > >>>> > > 122.959 s - in org.apache.accumulo.test.functional.BulkIT > >>>> > > [INFO] Running org.apache.accumulo.test.functional.BinaryStressIT > >>>> > > [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time > >>>> elapsed: > >>>> > > 38.626 s - in org.apache.accumulo.test.functional.BinaryStressIT > >>>> > > [INFO] Running org.apache.accumulo.test.functional.ClassLoaderIT > >>>> > > [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time > >>>> elapsed: > >>>> > > 45.61 s - in org.apache.accumulo.test.functional.ClassLoaderIT > >>>> > > [INFO] Running org.apache.accumulo.test.functional.LogicalTimeIT > >>>> > > [INFO] Tests run: 1, Failures: 0, Errors: 0, 
Skipped: 0, Time > >>>> elapsed: > >>>> > > 116.819 s - in org.apache.accumulo.test.functional.LogicalTimeIT > >>>> > > [INFO] Running org.apache.accumulo.test.functional.SplitRecoveryIT > >>>> > > [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time > >>>> elapsed: > >>>> > > 25.421 s - in org.apache.accumulo.test.functional.SplitRecoveryIT > >>>> > > [INFO] Running org.apache.accumulo.test.functional.BigRootTabletIT > >>>> > > [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time > >>>> elapsed: > >>>> > > 96.86 s - in org.apache.accumulo.test.functional.BigRootTabletIT > >>>> > > [INFO] Running > >>>> org.apache.accumulo.test.functional.GarbageCollectorIT > >>>> > > [INFO] Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time > >>>> elapsed: > >>>> > > 238.409 s - in > >>>> org.apache.accumulo.test.functional.GarbageCollectorIT > >>>> > > [INFO] Running > >>>> > > > >>>> org.apache.accumulo.test.functional.BalanceInPresenceOfOfflineTableIT > >>>> > > [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time > >>>> elapsed: > >>>> > > 219.253 s - in > >>>> > > > >>>> org.apache.accumulo.test.functional.BalanceInPresenceOfOfflineTableIT > >>>> > > [INFO] Running org.apache.accumulo.test.functional.VisibilityIT > >>>> > > [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time > >>>> elapsed: > >>>> > > 38.015 s - in org.apache.accumulo.test.functional.VisibilityIT > >>>> > > [INFO] Running > >>>> org.apache.accumulo.test.functional.SslWithClientAuthIT > >>>> > > [INFO] Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time > >>>> elapsed: > >>>> > > 489.863 s - in > >>>> org.apache.accumulo.test.functional.SslWithClientAuthIT > >>>> > > [INFO] Running org.apache.accumulo.test.functional.SummaryIT > >>>> > > [INFO] Tests run: 10, Failures: 0, Errors: 0, Skipped: 0, Time > >>>> elapsed: > >>>> > > 111.552 s - in org.apache.accumulo.test.functional.SummaryIT > >>>> > > [INFO] Running org.apache.accumulo.test.functional.MaxOpenIT > >>>> > > 
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time > >>>> elapsed: > >>>> > > 30.061 s - in org.apache.accumulo.test.functional.MaxOpenIT > >>>> > > [INFO] Running org.apache.accumulo.test.functional.ManagerFailoverIT > >>>> > > [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time > >>>> elapsed: > >>>> > > 47.089 s - in org.apache.accumulo.test.functional.ManagerFailoverIT > >>>> > > [INFO] Running org.apache.accumulo.test.functional.DeleteRowsIT > >>>> > > [INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time > >>>> elapsed: > >>>> > > 229.586 s - in org.apache.accumulo.test.functional.DeleteRowsIT > >>>> > > [INFO] Running org.apache.accumulo.test.functional.BackupManagerIT > >>>> > > [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time > >>>> elapsed: > >>>> > > 22.943 s - in org.apache.accumulo.test.functional.BackupManagerIT > >>>> > > [INFO] Running org.apache.accumulo.test.functional.TabletMetadataIT > >>>> > > [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time > >>>> elapsed: > >>>> > > 46.728 s - in org.apache.accumulo.test.functional.TabletMetadataIT > >>>> > > [INFO] Running org.apache.accumulo.test.functional.LateLastContactIT > >>>> > > [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time > >>>> elapsed: > >>>> > > 46.648 s - in org.apache.accumulo.test.functional.LateLastContactIT > >>>> > > [INFO] Running > >>>> org.apache.accumulo.test.functional.SimpleBalancerFairnessIT > >>>> > > [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time > >>>> elapsed: > >>>> > > 71.934 s - in > >>>> org.apache.accumulo.test.functional.SimpleBalancerFairnessIT > >>>> > > [INFO] Running org.apache.accumulo.test.functional.HalfDeadTServerIT > >>>> > > [ERROR] Tests run: 3, Failures: 0, Errors: 2, Skipped: 0, Time > >>>> elapsed: > >>>> > > 307.904 s <<< FAILURE! 
- in > >>>> > > org.apache.accumulo.test.functional.HalfDeadTServerIT > >>>> > > [ERROR] > >>>> org.apache.accumulo.test.functional.HalfDeadTServerIT.testRecover > >>>> > > Time elapsed: 240.011 s <<< ERROR! > >>>> > > org.junit.runners.model.TestTimedOutException: test timed out after > >>>> 240 > >>>> > > seconds at [email protected]/java.lang.Object.wait(Native Method) > >>>> at > >>>> > > [email protected]/java.lang.Object.wait(Object.java:328) > >>>> > > at [email protected] > >>>> /java.lang.ProcessImpl.waitFor(ProcessImpl.java:495) > >>>> > > at > >>>> > > > >>>> > > > >>>> app//org.apache.accumulo.test.functional.HalfDeadTServerIT.test(HalfDeadTServerIT.java:217) > >>>> > > at > >>>> > > > >>>> > > > >>>> app//org.apache.accumulo.test.functional.HalfDeadTServerIT.testRecover(HalfDeadTServerIT.java:142) > >>>> > > at [email protected] > >>>> > > /jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native > >>>> > > Method) > >>>> > > at [email protected] > >>>> > > > >>>> > > > >>>> /jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > >>>> > > at [email protected] > >>>> > > > >>>> > > > >>>> /jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > >>>> > > at [email protected] > >>>> /java.lang.reflect.Method.invoke(Method.java:566) > >>>> > > at > >>>> > > > >>>> > > > >>>> app//org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > >>>> > > at > >>>> > > > >>>> > > > >>>> app//org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > >>>> > > at > >>>> > > > >>>> > > > >>>> app//org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > >>>> > > at > >>>> > > > >>>> > > > >>>> app//org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > >>>> > > at > >>>> > > > >>>> > > > >>>> app//org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > >>>> 
> > at > >>>> > > > >>>> > > > >>>> app//org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > >>>> > > at > >>>> > > > >>>> > > > >>>> app//org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299) > >>>> > > at > >>>> > > > >>>> > > > >>>> app//org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293) > >>>> > > at [email protected] > >>>> > > /java.util.concurrent.FutureTask.run(FutureTask.java:264) > >>>> > > at [email protected]/java.lang.Thread.run(Thread.java:829) > >>>> > > > >>>> > > [ERROR] > >>>> org.apache.accumulo.test.functional.HalfDeadTServerIT.testRecover > >>>> > > Time elapsed: 240.012 s <<< ERROR! > >>>> > > java.lang.Exception: Appears to be stuck in thread Time-limited > >>>> > > test-SendThread(localhost:39285) > >>>> > > at [email protected]/sun.nio.ch.EPoll.wait(Native Method) at > >>>> > > [email protected] > >>>> > > /sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:120) > >>>> > > at [email protected] > >>>> > > /sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:124) > >>>> > > at [email protected]/sun.nio.ch > >>>> .SelectorImpl.select(SelectorImpl.java:136) > >>>> > > at > >>>> > > > >>>> > > > >>>> app//org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:347) > >>>> > > at > >>>> > > > >>>> app//org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1223) > >>>> > > > >>>> > > [INFO] Running org.apache.accumulo.test.functional.MetadataIT > >>>> > > [INFO] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time > >>>> elapsed: > >>>> > > 97.987 s - in org.apache.accumulo.test.functional.MetadataIT > >>>> > > [INFO] Running > >>>> org.apache.accumulo.test.functional.ScanSessionTimeOutIT > >>>> > > [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time > >>>> elapsed: > >>>> > > 43.91 s - in > >>>> org.apache.accumulo.test.functional.ScanSessionTimeOutIT > >>>> > > [INFO] 
Running org.apache.accumulo.test.functional.ZooCacheIT > >>>> > > [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time > >>>> elapsed: > >>>> > > 33.986 s - in org.apache.accumulo.test.functional.ZooCacheIT > >>>> > > [INFO] Running org.apache.accumulo.test.functional.DeleteRowsSplitIT > >>>> > > [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time > >>>> elapsed: > >>>> > > 113.928 s - in org.apache.accumulo.test.functional.DeleteRowsSplitIT > >>>> > > [INFO] Running org.apache.accumulo.test.ScanFlushWithTimeIT > >>>> > > [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time > >>>> elapsed: > >>>> > > 36.854 s - in org.apache.accumulo.test.ScanFlushWithTimeIT > >>>> > > [INFO] Running org.apache.accumulo.test.AuditMessageIT > >>>> > > [INFO] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time > >>>> elapsed: > >>>> > > 165.169 s - in org.apache.accumulo.test.AuditMessageIT > >>>> > > [INFO] Running > >>>> > > > >>>> org.apache.accumulo.test.gc.replication.CloseWriteAheadLogReferencesIT > >>>> > > [WARNING] Tests run: 1, Failures: 0, Errors: 0, Skipped: 1, Time > >>>> elapsed: > >>>> > > 0.039 s - in > >>>> > > > >>>> org.apache.accumulo.test.gc.replication.CloseWriteAheadLogReferencesIT > >>>> > > [INFO] > >>>> > > [INFO] Results: > >>>> > > [INFO] > >>>> > > [ERROR] Errors: > >>>> > > [ERROR] > >>>> > > > >>>> > > > >>>> org.apache.accumulo.test.compaction.ExternalCompaction_3_IT.testCoordinatorRestartsDuringCompaction > >>>> > > [ERROR] Run 1: > >>>> > > ExternalCompaction_3_IT.testCoordinatorRestartsDuringCompaction:178 > >>>> » > >>>> > > TestTimedOut > >>>> > > [ERROR] Run 2: > >>>> > > ExternalCompaction_3_IT.testCoordinatorRestartsDuringCompaction » > >>>> Appears > >>>> > > to ... > >>>> > > [INFO] > >>>> > > [ERROR] ConcurrentDeleteTableIT.testConcurrentDeleteTablesOps:76 » > >>>> > > TestTimedOut test t... 
> >>>> > > [ERROR] org.apache.accumulo.test.functional.ConcurrentDeleteTableIT.testConcurrentFateOpsWithDelete
> >>>> > > [ERROR] Run 1: ConcurrentDeleteTableIT.testConcurrentFateOpsWithDelete:213 » TestTimedOut tes...
> >>>> > > [ERROR] Run 2: ConcurrentDeleteTableIT.testConcurrentFateOpsWithDelete » Appears to be stuck...
> >>>> > > [INFO]
> >>>> > > [ERROR] org.apache.accumulo.test.functional.HalfDeadTServerIT.testRecover
> >>>> > > [ERROR] Run 1: HalfDeadTServerIT.testRecover:142->test:217->Object.wait:328->Object.wait:-2 » TestTimedOut
> >>>> > > [ERROR] Run 2: HalfDeadTServerIT.testRecover » Appears to be stuck in thread Time-limited te...
> >>>> > > [INFO]
> >>>> > > [ERROR] org.apache.accumulo.test.functional.SslIT.adminStop
> >>>> > > [ERROR] Run 1: SslIT.adminStop:68->Object.wait:328->Object.wait:-2 » TestTimedOut test timed ...
> >>>> > > [ERROR] Run 2: SslIT.adminStop » Appears to be stuck in thread Time-limited test-SendThread(...
> >>>> > >
> >>>> > > These tests fail consistently at every build attempt!
> >>>> > >
> >>>> > > The tests fail even when executed separately, e.g.:
> >>>> > > mvn verify -Dit.test=ConcurrentDeleteTableIT -o -rf :accumulo-test
> >>>> > >
> >>>> > > I am using the current 'main' branch of Accumulo.
> >>>> > > JDK: 11.0.11
> >>>> > > Maven: 3.8.2
> >>>> > > OS: Ubuntu 20.04.3 ARM64
> >>>> > >
> >>>> > > Is there anything that could be done to fix these problems?
> >>>> > > For example, some config settings?
> >>>> > >
> >>>> > > P.S. At https://github.com/apache/accumulo/issues/1884 I read that Linux
> >>>> > > ARM64 is a supported platform since the JVM supports it.
> >>>> > >
> >>>> > > Thanks!
> >>>> > >
> >>>> > > Mark
> >>>> > >
> >>>> > >>>
