[jira] [Created] (ATLAS-4659) Atlas in HA mode fails to get healthy

2022-08-18 Thread Richard Pijnenburg (Jira)
Richard Pijnenburg created ATLAS-4659:
-

             Summary: Atlas in HA mode fails to get healthy
                 Key: ATLAS-4659
                 URL: https://issues.apache.org/jira/browse/ATLAS-4659
             Project: Atlas
          Issue Type: Bug
    Affects Versions: 3.0.0
         Environment: Zookeeper 3.8.0
            Reporter: Richard Pijnenburg


We are trying to set up Atlas with the HA functionality using ZooKeeper 3.8.0.
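
For context, Atlas HA is driven by a few `atlas.server.*` properties, and each instance works out its own server id by matching its host:port against the configured addresses. The Java sketch below only illustrates that matching idea; it is not the code of Atlas's `AtlasServerIdSelector`. The class and method names are hypothetical, the `id1` address is the one that appears in the logs below, and the `id2` address and the ZooKeeper connect string are assumed values.

```java
import java.util.Optional;
import java.util.Properties;

/** Illustrative sketch only; names are hypothetical and this is not Atlas's own implementation. */
class HaConfigSketch {

    /** Returns the server id whose configured address equals hostPort, if any. */
    static Optional<String> selectServerId(Properties conf, String hostPort) {
        String ids = conf.getProperty("atlas.server.ids", "");
        for (String id : ids.split(",")) {
            String trimmed = id.trim();
            if (trimmed.isEmpty()) {
                continue; // skip empty entries such as a trailing comma
            }
            String address = conf.getProperty("atlas.server.address." + trimmed);
            if (hostPort.equals(address)) {
                return Optional.of(trimmed);
            }
        }
        return Optional.empty();
    }

    /** Builds a two-node example configuration. */
    static Properties exampleConfig() {
        Properties conf = new Properties();
        conf.setProperty("atlas.server.ha.enabled", "true");
        conf.setProperty("atlas.server.ids", "id1,id2");
        // Address taken from the log output in this report.
        conf.setProperty("atlas.server.address.id1",
                "atlas-0.atlas-headless.atlas.svc.cluster.local:21000");
        // Assumed value: the second instance's address is not given in this report.
        conf.setProperty("atlas.server.address.id2",
                "atlas-1.atlas-headless.atlas.svc.cluster.local:21000");
        // Assumed value: the real ZooKeeper connect string is not given in this report.
        conf.setProperty("atlas.server.ha.zookeeper.connect", "zookeeper:2181");
        return conf;
    }
}
```

With this example configuration, looking up `atlas-0.atlas-headless.atlas.svc.cluster.local:21000` yields `id1`, which corresponds to the "Found matched server id id1" line in the logs.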

Relevant logs:
{code:java}
2022-08-18 14:57:06,924 INFO  - [main:] ~ Found matched server id id1 with host port: atlas-0.atlas-headless.atlas.svc.cluster.local:21000 (AtlasServerIdSelector:65)
2022-08-18 14:57:06,924 INFO  - [main:] ~ Starting leader election for id1 (ActiveInstanceElectorService:112)
2022-08-18 14:57:06,933 INFO  - [main:] ~ Leader latch started for id1. (ActiveInstanceElectorService:118)
2022-08-18 14:57:06,991 INFO  - [main:] ~ AtlasJsonProvider() instantiated (AtlasJsonProvider:53)
2022-08-18 14:57:07,296 WARN  - [main-EventThread:] ~ Server instance with server id id1 is elected as leader (ActiveInstanceElectorService:152)
2022-08-18 14:57:07,296 WARN  - [main-EventThread:] ~ Instance becoming active from PASSIVE (ServiceState:88
 
———
 
2022-08-18 14:57:27,818 INFO  - [main-EventThread:] ~ Reacting to active state: initializing Kafka consumers (NotificationHookConsumer:421)
2022-08-18 14:57:27,819 INFO  - [main-EventThread:] ~ ==> KafkaNotification.createConsumers(notificationType=HOOK, numConsumers=1, autoCommitEnabled=false) (KafkaNotification:194)
2022-08-18 14:57:28,237 INFO  - [main-EventThread:] ~ <== KafkaNotification.createConsumers(notificationType=HOOK, numConsumers=1, autoCommitEnabled=false) (KafkaNotification:234)
2022-08-18 14:57:28,402 INFO  - [main-EventThread:] ~ ==> TaskManagement.instanceIsActive() (TaskManagement:94)
2022-08-18 14:57:28,402 INFO  - [main-EventThread:] ~ TaskManagement: Started! (TaskManagement:196)
2022-08-18 14:57:28,479 INFO  - [NotificationHookConsumer thread-0:] ~ [atlas-hook-consumer-thread]: Starting (Logging:66)
2022-08-18 14:57:28,481 INFO  - [NotificationHookConsumer thread-0:] ~ ==> HookConsumer doWork() (NotificationHookConsumer$HookConsumer:540)
2022-08-18 14:57:28,483 INFO  - [NotificationHookConsumer thread-0:] ~ Atlas Server is not ready. Waiting for 1000 milliseconds to retry... (NotificationHookConsumer$HookConsumer:940)
2022-08-18 14:57:28,485 INFO  - [main-EventThread:] ~ TaskManagement: Found: 0: Tasks in pending state. (TaskManagement:212)
2022-08-18 14:57:28,485 INFO  - [main-EventThread:] ~ <== TaskManagement.instanceIsActive() (TaskManagement:98)
2022-08-18 14:57:28,485 INFO  - [main-EventThread:] ~ ==> IndexRecoveryService.instanceIsActive() (IndexRecoveryService:117)
2022-08-18 14:57:28,485 INFO  - [main-EventThread:] ~ <== IndexRecoveryService.instanceIsActive() (IndexRecoveryService:121)
2022-08-18 14:57:28,486 INFO  - [index-health-monitor:] ~ Index Health Monitor: Starting... (IndexRecoveryService$RecoveryThread:175)
2022-08-18 14:57:28,487 ERROR - [main-EventThread:] ~ Got exception while activating (ActiveInstanceElectorService:162)
org.apache.atlas.exception.AtlasBaseException: ActiveInstanceState.update resulted in exception.
        at org.apache.atlas.web.service.ActiveInstanceState.update(ActiveInstanceState.java:119)
        at org.apache.atlas.web.service.ActiveInstanceElectorService.isLeader(ActiveInstanceElectorService.java:158)
        at org.apache.curator.framework.recipes.leader.LeaderLatch$9.apply(LeaderLatch.java:702)
        at org.apache.curator.framework.recipes.leader.LeaderLatch$9.apply(LeaderLatch.java:698)
        at org.apache.curator.framework.listen.ListenerContainer$1.run(ListenerContainer.java:100)
        at org.apache.curator.shaded.com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30)
        at org.apache.curator.framework.listen.ListenerContainer.forEach(ListenerContainer.java:92)
        at org.apache.curator.framework.recipes.leader.LeaderLatch.setLeadership(LeaderLatch.java:697)
        at org.apache.curator.framework.recipes.leader.LeaderLatch.checkLeadership(LeaderLatch.java:575)
        at org.apache.curator.framework.recipes.leader.LeaderLatch.access$600(LeaderLatch.java:65)
        at org.apache.curator.framework.recipes.leader.LeaderLatch$7.processResult(LeaderLatch.java:626)
        at org.apache.curator.framework.imps.CuratorFrameworkImpl.sendToBackgroundCallback(CuratorFrameworkImpl.java:883)
        at org.apache.curator.framework.imps.CuratorFrameworkImpl.processBackgroundOperation(CuratorFrameworkImpl.java:653)
        at org.apache.curator.framework.imps.WatcherRemovalFacade.processBackgroundOperation(WatcherRemovalFacade.java:152)
        at org.apache.curator.framework.imps.GetChildrenBuilderImpl$2.processResult(GetChildrenBuilderImpl.java:187)
        at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.

[jira] [Updated] (ATLAS-4659) Atlas in HA mode fails to get healthy

2022-08-18 Thread Richard Pijnenburg (Jira)


 [ 
https://issues.apache.org/jira/browse/ATLAS-4659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Pijnenburg updated ATLAS-4659:
--
Environment: Zookeeper 3.8.0, Cassandra 4.0.5, Solr 8.11.1  (was: Zookeeper 3.8.0)

> Atlas in HA mode fails to get healthy
> -
>
> Key: ATLAS-4659
> URL: https://issues.apache.org/jira/browse/ATLAS-4659
> Project: Atlas
>  Issue Type: Bug
>Affects Versions: 3.0.0
> Environment: Zookeeper 3.8.0, Cassandra 4.0.5, Solr 8.11.1
>Reporter: Richard Pijnenburg
>Priority: Major
>
> We are trying to set up Atlas with the HA functionality using ZooKeeper 3.8.0.