[ 
https://issues.apache.org/jira/browse/ATLAS-3168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16899822#comment-16899822
 ] 

ASF subversion and git services commented on ATLAS-3168:
--------------------------------------------------------

Commit fb54a29db5a3a454f237341e7be60fb2f12de5fc in atlas's branch 
refs/heads/master from Sarath Subramanian
[ https://gitbox.apache.org/repos/asf?p=atlas.git;h=fb54a29 ]

ATLAS-3168: Fix intermittent UT failure: 
NotificationHookConsumerKafkaTest.initNotificationService()


> PatchFx: Support for HA Mode
> ----------------------------
>
>                 Key: ATLAS-3168
>                 URL: https://issues.apache.org/jira/browse/ATLAS-3168
>             Project: Atlas
>          Issue Type: Bug
>          Components:  atlas-core
>    Affects Versions: 2.0.0, trunk
>            Reporter: Ashutosh Mestry
>            Assignee: Ashutosh Mestry
>            Priority: Major
>             Fix For: 2.0.0, trunk
>
>         Attachments: ATLAS-3168-PatchFx-Fix-for-Startup-in-HA-mode.patch, 
> ATLAS-3168-PatchFx-Unit-test-fixes-and-optimization.patch
>
>
> *Description*
> PatchFx in HA mode causes exceptions.
> *Steps to Duplicate*
> Deploy the latest version of Atlas on a cluster configured for HA.
> The following error appears during startup:
> {code:java}
> 2019-04-23 03:54:22,280 ERROR - [main-EventThread:] ~ Got exception while activating (ActiveInstanceElectorService:160)
> java.lang.NullPointerException
>         at org.apache.atlas.repository.audit.HBaseBasedAuditRepository.createTableIfNotExists(HBaseBasedAuditRepository.java:521)
>         at org.apache.atlas.repository.audit.HBaseBasedAuditRepository.instanceIsActive(HBaseBasedAuditRepository.java:627)
>         at org.apache.atlas.web.service.ActiveInstanceElectorService.isLeader(ActiveInstanceElectorService.java:154)
>         at org.apache.curator.framework.recipes.leader.LeaderLatch$9.apply(LeaderLatch.java:665)
>         at org.apache.curator.framework.recipes.leader.LeaderLatch$9.apply(LeaderLatch.java:661)
>         at org.apache.curator.framework.listen.ListenerContainer$1.run(ListenerContainer.java:93)
>         at org.apache.curator.shaded.com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:435)
>         at org.apache.curator.framework.listen.ListenerContainer.forEach(ListenerContainer.java:85)
>         at org.apache.curator.framework.recipes.leader.LeaderLatch.setLeadership(LeaderLatch.java:660)
>         at org.apache.curator.framework.recipes.leader.LeaderLatch.checkLeadership(LeaderLatch.java:539)
>         at org.apache.curator.framework.recipes.leader.LeaderLatch.access$700(LeaderLatch.java:65)
>         at org.apache.curator.framework.recipes.leader.LeaderLatch$7.processResult(LeaderLatch.java:590)
>         at org.apache.curator.framework.imps.CuratorFrameworkImpl.sendToBackgroundCallback(CuratorFrameworkImpl.java:865)
>         at org.apache.curator.framework.imps.CuratorFrameworkImpl.processBackgroundOperation(CuratorFrameworkImpl.java:635)
>         at org.apache.curator.framework.imps.WatcherRemovalFacade.processBackgroundOperation(WatcherRemovalFacade.java:152)
>         at org.apache.curator.framework.imps.GetChildrenBuilderImpl$2.processResult(GetChildrenBuilderImpl.java:187)
>         at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:602)
>         at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510)
> 2019-04-23 03:54:22,280 WARN  - [main-EventThread:] ~ Server instance with server id id2 is removed as leader (ActiveInstanceElectorService:197)
> {code}
> *Root Cause*
> Pattern followed within Atlas:
>  * _Service.start_ is called when _Services_ is initialized.
>  * For every service:
>  ** Atlas is not in HA mode: Start and perform startup specific actions.
>  ** Atlas is in HA mode: Start and wait for _instanceIsActive_ to be called.
>  * _AtlasPatchService_ did not implement _ActiveStateChangeHandler_.
>  * _AtlasPatchService_ was not registered with 
> _ActiveStateChangeHandler.HandlerOrder_.
> As a result, _AtlasPatchService.start_ patched the database immediately, 
> before _AtlasTypeDefStoreInitializer_ had been initialized, which caused 
> exceptions. _ActiveInstanceElectorService_ then received a callback from ZK 
> and called the _instanceIsActive_ method on _HBaseBasedAuditRepository_, 
> which had not been started. That call threw the NullPointerException shown 
> in the stack trace above.
> *Solution*
> Modify _AtlasPatchService_ to follow the pattern used for other services.
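
The start/activate pattern described under Root Cause can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from the attached patches: the two interfaces are simplified stand-ins for the Atlas interfaces named in the issue, and PatchService, run() and the log strings are hypothetical names invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

public class PatchServiceSketch {
    // Simplified stand-ins (assumptions), not the real Atlas interfaces.
    interface Service {
        void start();
        void stop();
    }

    interface ActiveStateChangeHandler {
        void instanceIsActive();
        void instanceIsPassive();
    }

    // Hypothetical service that follows the pattern from the issue:
    // outside HA mode it works in start(); in HA mode it waits for
    // instanceIsActive() before touching the database.
    static class PatchService implements Service, ActiveStateChangeHandler {
        private final boolean haEnabled;
        private final List<String> log;

        PatchService(boolean haEnabled, List<String> log) {
            this.haEnabled = haEnabled;
            this.log = log;
        }

        @Override
        public void start() {
            if (haEnabled) {
                // Defer startup actions until this instance is elected active.
                log.add("start deferred");
                return;
            }
            applyPatches();
        }

        @Override
        public void stop() {
            log.add("stopped");
        }

        @Override
        public void instanceIsActive() {
            applyPatches();
        }

        @Override
        public void instanceIsPassive() {
            log.add("passive");
        }

        private void applyPatches() {
            log.add("patches applied");
        }
    }

    // Simulates one startup and returns the recorded actions in order.
    static List<String> run(boolean haEnabled, boolean electedActive) {
        List<String> log = new ArrayList<>();
        PatchService service = new PatchService(haEnabled, log);
        service.start();
        if (haEnabled && electedActive) {
            service.instanceIsActive();
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run(false, false));
        System.out.println(run(true, false));
        System.out.println(run(true, true));
    }
}
```

In this sketch a passive HA instance never applies patches, and an active one applies them only after the election callback. In Atlas itself, the ordering relative to other handlers is what the issue refers to as _ActiveStateChangeHandler.HandlerOrder_; the sketch leaves that part out.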



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
