[jira] [Created] (ATLAS-4659) Atlas in HA mode fails to get healthy
Richard Pijnenburg created ATLAS-4659:
-----------------------------------------

Summary: Atlas in HA mode fails to get healthy
Key: ATLAS-4659
URL: https://issues.apache.org/jira/browse/ATLAS-4659
Project: Atlas
Issue Type: Bug
Affects Versions: 3.0.0
Environment: Zookeeper 3.8.0
Reporter: Richard Pijnenburg

We are trying to set up Atlas with the HA functionality using ZooKeeper 3.8.0.

Relevant logs:
{code:java}
2022-08-18 14:57:06,924 INFO - [main:] ~ Found matched server id id1 with host port: atlas-0.atlas-headless.atlas.svc.cluster.local:21000 (AtlasServerIdSelector:65)
2022-08-18 14:57:06,924 INFO - [main:] ~ Starting leader election for id1 (ActiveInstanceElectorService:112)
2022-08-18 14:57:06,933 INFO - [main:] ~ Leader latch started for id1. (ActiveInstanceElectorService:118)
2022-08-18 14:57:06,991 INFO - [main:] ~ AtlasJsonProvider() instantiated (AtlasJsonProvider:53)
2022-08-18 14:57:07,296 WARN - [main-EventThread:] ~ Server instance with server id id1 is elected as leader (ActiveInstanceElectorService:152)
2022-08-18 14:57:07,296 WARN - [main-EventThread:] ~ Instance becoming active from PASSIVE (ServiceState:88

---

2022-08-18 14:57:27,818 INFO - [main-EventThread:] ~ Reacting to active state: initializing Kafka consumers (NotificationHookConsumer:421)
2022-08-18 14:57:27,819 INFO - [main-EventThread:] ~ ==> KafkaNotification.createConsumers(notificationType=HOOK, numConsumers=1, autoCommitEnabled=false) (KafkaNotification:194)
2022-08-18 14:57:28,237 INFO - [main-EventThread:] ~ <== KafkaNotification.createConsumers(notificationType=HOOK, numConsumers=1, autoCommitEnabled=false) (KafkaNotification:234)
2022-08-18 14:57:28,402 INFO - [main-EventThread:] ~ ==> TaskManagement.instanceIsActive() (TaskManagement:94)
2022-08-18 14:57:28,402 INFO - [main-EventThread:] ~ TaskManagement: Started! (TaskManagement:196)
2022-08-18 14:57:28,479 INFO - [NotificationHookConsumer thread-0:] ~ [atlas-hook-consumer-thread]: Starting (Logging:66)
2022-08-18 14:57:28,481 INFO - [NotificationHookConsumer thread-0:] ~ ==> HookConsumer doWork() (NotificationHookConsumer$HookConsumer:540)
2022-08-18 14:57:28,483 INFO - [NotificationHookConsumer thread-0:] ~ Atlas Server is not ready. Waiting for 1000 milliseconds to retry... (NotificationHookConsumer$HookConsumer:940)
2022-08-18 14:57:28,485 INFO - [main-EventThread:] ~ TaskManagement: Found: 0: Tasks in pending state. (TaskManagement:212)
2022-08-18 14:57:28,485 INFO - [main-EventThread:] ~ <== TaskManagement.instanceIsActive() (TaskManagement:98)
2022-08-18 14:57:28,485 INFO - [main-EventThread:] ~ ==> IndexRecoveryService.instanceIsActive() (IndexRecoveryService:117)
2022-08-18 14:57:28,485 INFO - [main-EventThread:] ~ <== IndexRecoveryService.instanceIsActive() (IndexRecoveryService:121)
2022-08-18 14:57:28,486 INFO - [index-health-monitor:] ~ Index Health Monitor: Starting... (IndexRecoveryService$RecoveryThread:175)
2022-08-18 14:57:28,487 ERROR - [main-EventThread:] ~ Got exception while activating (ActiveInstanceElectorService:162)
org.apache.atlas.exception.AtlasBaseException: ActiveInstanceState.update resulted in exception.
    at org.apache.atlas.web.service.ActiveInstanceState.update(ActiveInstanceState.java:119)
    at org.apache.atlas.web.service.ActiveInstanceElectorService.isLeader(ActiveInstanceElectorService.java:158)
    at org.apache.curator.framework.recipes.leader.LeaderLatch$9.apply(LeaderLatch.java:702)
    at org.apache.curator.framework.recipes.leader.LeaderLatch$9.apply(LeaderLatch.java:698)
    at org.apache.curator.framework.listen.ListenerContainer$1.run(ListenerContainer.java:100)
    at org.apache.curator.shaded.com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30)
    at org.apache.curator.framework.listen.ListenerContainer.forEach(ListenerContainer.java:92)
    at org.apache.curator.framework.recipes.leader.LeaderLatch.setLeadership(LeaderLatch.java:697)
    at org.apache.curator.framework.recipes.leader.LeaderLatch.checkLeadership(LeaderLatch.java:575)
    at org.apache.curator.framework.recipes.leader.LeaderLatch.access$600(LeaderLatch.java:65)
    at org.apache.curator.framework.recipes.leader.LeaderLatch$7.processResult(LeaderLatch.java:626)
    at org.apache.curator.framework.imps.CuratorFrameworkImpl.sendToBackgroundCallback(CuratorFrameworkImpl.java:883)
    at org.apache.curator.framework.imps.CuratorFrameworkImpl.processBackgroundOperation(CuratorFrameworkImpl.java:653)
    at org.apache.curator.framework.imps.WatcherRemovalFacade.processBackgroundOperation(WatcherRemovalFacade.java:152)
    at org.apache.curator.framework.imps.GetChildrenBuilderImpl$2.processResult(GetChildrenBuilderImpl.java:187)
    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.
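
For triaging an excerpt like the one above, a small script can split each Atlas log line into its fields and surface the first ERROR record. This is a minimal sketch and not part of Atlas: the regular expression is inferred only from the lines quoted in this report, and `LogRecord`, `parse_line`, and `first_error` are hypothetical helper names chosen for illustration.

```python
import re
from typing import Iterable, NamedTuple, Optional

# Pattern inferred from the quoted excerpt (an assumption, not an Atlas spec):
#   <timestamp> <LEVEL> - [<thread>:] ~ <message> (<Class>:<line>)
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+-\s+"
    r"\[(?P<thread>[^\]:]*):?\]\s+~\s+"
    r"(?P<message>.*?)"
    r"(?:\s+\((?P<source>[\w$]+):(?P<line>\d+)\))?$"
)


class LogRecord(NamedTuple):
    timestamp: str
    level: str
    thread: str
    message: str
    source: Optional[str]  # logging class, if the trailing "(Class:line)" is present
    line: Optional[int]    # line number inside that class


def parse_line(text: str) -> Optional[LogRecord]:
    """Parse one log line; return None for stack frames and other non-matching lines."""
    m = LOG_LINE.match(text.strip())
    if m is None:
        return None
    line = m.group("line")
    return LogRecord(
        m.group("ts"),
        m.group("level"),
        m.group("thread"),
        m.group("message"),
        m.group("source"),
        int(line) if line else None,
    )


def first_error(lines: Iterable[str]) -> Optional[LogRecord]:
    """Return the first record logged at ERROR level, or None if there is none."""
    for text in lines:
        record = parse_line(text)
        if record is not None and record.level == "ERROR":
            return record
    return None


# Three lines copied from the excerpt in this report.
SAMPLE = [
    "2022-08-18 14:57:07,296 WARN - [main-EventThread:] ~ Server instance with "
    "server id id1 is elected as leader (ActiveInstanceElectorService:152)",
    "2022-08-18 14:57:28,487 ERROR - [main-EventThread:] ~ Got exception while "
    "activating (ActiveInstanceElectorService:162)",
    "org.apache.atlas.exception.AtlasBaseException: ActiveInstanceState.update "
    "resulted in exception.",
]

if __name__ == "__main__":
    print(first_error(SAMPLE))
```

Lines that do not start with a timestamp, such as the exception message and the `at ...` frames, are skipped rather than guessed at, so the helper only reports what the log states explicitly.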
[jira] [Updated] (ATLAS-4659) Atlas in HA mode fails to get healthy
[ https://issues.apache.org/jira/browse/ATLAS-4659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Richard Pijnenburg updated ATLAS-4659:
--------------------------------------
Environment: Zookeeper 3.8.0, Cassandra 4.0.5, Solr 8.11.1 (was: Zookeeper 3.8.0)

> Atlas in HA mode fails to get healthy
> -------------------------------------
>
> Key: ATLAS-4659
> URL: https://issues.apache.org/jira/browse/ATLAS-4659
> Project: Atlas
> Issue Type: Bug
> Affects Versions: 3.0.0
> Environment: Zookeeper 3.8.0, Cassandra 4.0.5, Solr 8.11.1
> Reporter: Richard Pijnenburg
> Priority: Major
>
> We are trying to set up Atlas with the HA functionality using ZooKeeper 3.8.0.
> Relevant logs:
> {code:java}
> 2022-08-18 14:57:06,924 INFO - [main:] ~ Found matched server id id1 with host port: atlas-0.atlas-headless.atlas.svc.cluster.local:21000 (AtlasServerIdSelector:65)
> 2022-08-18 14:57:06,924 INFO - [main:] ~ Starting leader election for id1 (ActiveInstanceElectorService:112)
> 2022-08-18 14:57:06,933 INFO - [main:] ~ Leader latch started for id1. (ActiveInstanceElectorService:118)
> 2022-08-18 14:57:06,991 INFO - [main:] ~ AtlasJsonProvider() instantiated (AtlasJsonProvider:53)
> 2022-08-18 14:57:07,296 WARN - [main-EventThread:] ~ Server instance with server id id1 is elected as leader (ActiveInstanceElectorService:152)
> 2022-08-18 14:57:07,296 WARN - [main-EventThread:] ~ Instance becoming active from PASSIVE (ServiceState:88
>
> ---
>
> 2022-08-18 14:57:27,818 INFO - [main-EventThread:] ~ Reacting to active state: initializing Kafka consumers (NotificationHookConsumer:421)
> 2022-08-18 14:57:27,819 INFO - [main-EventThread:] ~ ==> KafkaNotification.createConsumers(notificationType=HOOK, numConsumers=1, autoCommitEnabled=false) (KafkaNotification:194)
> 2022-08-18 14:57:28,237 INFO - [main-EventThread:] ~ <== KafkaNotification.createConsumers(notificationType=HOOK, numConsumers=1, autoCommitEnabled=false) (KafkaNotification:234)
> 2022-08-18 14:57:28,402 INFO - [main-EventThread:] ~ ==> TaskManagement.instanceIsActive() (TaskManagement:94)
> 2022-08-18 14:57:28,402 INFO - [main-EventThread:] ~ TaskManagement: Started! (TaskManagement:196)
> 2022-08-18 14:57:28,479 INFO - [NotificationHookConsumer thread-0:] ~ [atlas-hook-consumer-thread]: Starting (Logging:66)
> 2022-08-18 14:57:28,481 INFO - [NotificationHookConsumer thread-0:] ~ ==> HookConsumer doWork() (NotificationHookConsumer$HookConsumer:540)
> 2022-08-18 14:57:28,483 INFO - [NotificationHookConsumer thread-0:] ~ Atlas Server is not ready. Waiting for 1000 milliseconds to retry... (NotificationHookConsumer$HookConsumer:940)
> 2022-08-18 14:57:28,485 INFO - [main-EventThread:] ~ TaskManagement: Found: 0: Tasks in pending state. (TaskManagement:212)
> 2022-08-18 14:57:28,485 INFO - [main-EventThread:] ~ <== TaskManagement.instanceIsActive() (TaskManagement:98)
> 2022-08-18 14:57:28,485 INFO - [main-EventThread:] ~ ==> IndexRecoveryService.instanceIsActive() (IndexRecoveryService:117)
> 2022-08-18 14:57:28,485 INFO - [main-EventThread:] ~ <== IndexRecoveryService.instanceIsActive() (IndexRecoveryService:121)
> 2022-08-18 14:57:28,486 INFO - [index-health-monitor:] ~ Index Health Monitor: Starting... (IndexRecoveryService$RecoveryThread:175)
> 2022-08-18 14:57:28,487 ERROR - [main-EventThread:] ~ Got exception while activating (ActiveInstanceElectorService:162)
> org.apache.atlas.exception.AtlasBaseException: ActiveInstanceState.update resulted in exception.
>     at org.apache.atlas.web.service.ActiveInstanceState.update(ActiveInstanceState.java:119)
>     at org.apache.atlas.web.service.ActiveInstanceElectorService.isLeader(ActiveInstanceElectorService.java:158)
>     at org.apache.curator.framework.recipes.leader.LeaderLatch$9.apply(LeaderLatch.java:702)
>     at org.apache.curator.framework.recipes.leader.LeaderLatch$9.apply(LeaderLatch.java:698)
>     at org.apache.curator.framework.listen.ListenerContainer$1.run(ListenerContainer.java:100)
>     at org.apache.curator.shaded.com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30)
>     at org.apache.curator.framework.listen.ListenerContainer.forEach(ListenerContainer.java:92)
>     at org.apache.curator.framework.recipes.leader.LeaderLatch.setLeadership(LeaderLatch.java:697)
>     at org.apache.curator.framework.recipes.leader.LeaderLatch.checkLeadership(LeaderLatch.java:575)
>     at org.apache.curator.framework.recipes.leader.LeaderLatch.access$600(LeaderLatch.java:65)
>     at org.apache.curator.framework.recipes.leader.LeaderLatch$7.processResult(LeaderLatch.java:626)
>     at org.apache.curator.framework.imps.CuratorFrameworkImpl.sendToBackgroundCallback(CuratorFrameworkImpl.java: