[
https://issues.apache.org/jira/browse/FALCON-1165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14541622#comment-14541622
]
Peeyush Bishnoi commented on FALCON-1165:
-----------------------------------------
Discussed the issue with [~venkatnrangan] and [~ajayyadava]. We agreed that
the onReload method in class SharedLibraryHostingService should log the error
message if an exception occurs while Falcon reloads the registered cluster
entities on the source cluster. With this change, Falcon will restart
successfully on the source cluster.
An updated patch is attached; please review.
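For readers without the patch at hand, the approach described in this comment
can be sketched roughly as follows. This is only an illustration, not the
attached patch: the class ReloadSketch, the Cluster type, and the methods
pushLibsToCluster and reloadAll are hypothetical stand-ins, and the actual
onReload signature in SharedLibraryHostingService may differ.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch; none of these names are Falcon's real classes or APIs.
class ReloadSketch {
    private static final Logger LOG = Logger.getLogger(ReloadSketch.class.getName());

    /** Hypothetical cluster entity: a name plus whether its services respond. */
    static final class Cluster {
        final String name;
        final boolean reachable;

        Cluster(String name, boolean reachable) {
            this.name = name;
            this.reachable = reachable;
        }
    }

    /** Stand-in for the per-cluster work done on reload; fails if unreachable. */
    static void pushLibsToCluster(Cluster cluster) throws IOException {
        if (!cluster.reachable) {
            throw new IOException("Cannot reach services of cluster " + cluster.name);
        }
    }

    /**
     * Reloads every registered cluster. A failure for one cluster is logged
     * and recorded instead of being rethrown, so startup can continue.
     * Returns the names of the clusters that could not be reloaded.
     */
    static List<String> reloadAll(List<Cluster> clusters) {
        List<String> failed = new ArrayList<>();
        for (Cluster cluster : clusters) {
            try {
                pushLibsToCluster(cluster);
            } catch (IOException e) {
                LOG.log(Level.SEVERE, "Failed to reload cluster " + cluster.name, e);
                failed.add(cluster.name);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        // Cluster Z is down for maintenance; Y is healthy.
        List<Cluster> clusters = Arrays.asList(
                new Cluster("Y", true), new Cluster("Z", false));
        System.out.println("Clusters that failed to reload: " + reloadAll(clusters));
    }
}
```

The point of the sketch is that the exception is caught per cluster, so one
unreachable cluster is reported in the log without stopping the remaining
reloads or the restart itself.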
> Falcon restart failed, if defined service in cluster entity is unreachable
> --------------------------------------------------------------------------
>
> Key: FALCON-1165
> URL: https://issues.apache.org/jira/browse/FALCON-1165
> Project: Falcon
> Issue Type: Bug
> Components: oozie
> Reporter: Peeyush Bishnoi
> Assignee: Peeyush Bishnoi
> Fix For: 0.7
>
> Attachments: FALCON-1165.patch, FALCON-1165.v1.patch,
> FALCON-1165.v2.patch
>
>
> Falcon fails to restart if any service in the cluster entity is unreachable
> or down.
> For example, suppose there are clusters X, Y, and Z. In cluster X, submit
> cluster entities that point to the services of clusters Y and Z, and run
> some replication jobs from cluster X to Y and to Z. If the HDFS service of
> cluster Z later goes down for maintenance and the Falcon service on cluster X
> has to be restarted at the same time, Falcon fails to restart on cluster X.
> This issue has been reported internally at Hortonworks.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)