[
https://issues.apache.org/jira/browse/FALCON-141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15029768#comment-15029768
]
Srikanth Sundarrajan commented on FALCON-141:
---------------------------------------------
[~sandeep.samudrala], [~pragya.mittal], and I discussed offline how cluster
updates could work. I wanted to put our thoughts down here for broader
discussion and consideration.
+Cluster updates are necessary to handle the following scenarios:+
* Update from a non-secure Hadoop cluster to a secure one, or vice versa
* Update from non-HA to HA, or vice versa
* Update of end points of any of the interfaces
* Update of properties in the cluster entity
In general, cluster updates should perform the following actions:
* Validate the new entity definition
* Perform touch of all feed & process entities to complete the operation (after
deduping entities)
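The two actions above can be sketched roughly as follows. This is only an illustration, not Falcon code: {{update_cluster}}, {{validate}} and {{touch}} are hypothetical names, and the dependents are assumed to arrive as a list of entity names that may contain duplicates.

```python
from typing import Callable, Dict, Iterable, List


def update_cluster(
    new_definition: Dict,
    dependents: Iterable[str],
    validate: Callable[[Dict], None],
    touch: Callable[[str], None],
) -> List[str]:
    """Validate the new cluster definition, then touch each dependent once.

    All names here are hypothetical; `validate` is expected to raise on an
    invalid definition, in which case no entity is touched.
    """
    validate(new_definition)

    # Dedupe while preserving order: the same feed or process may be
    # reported more than once as a dependent of the cluster.
    unique = list(dict.fromkeys(dependents))
    for name in unique:
        touch(name)
    return unique
```

Validating before any touch keeps an invalid definition from partially propagating to feeds and processes.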
We were considering using OOZIE-2187 to centralize the end points and simplify
the updates, but there are a few shortcomings with that approach, as pointed out
by [~venkatnrangan] during the bi-weekly sync up call.
* Cross cluster Hive replication may have multiple NN/JT end points referred to
in a workflow, and we can't piggyback on the global conf
* There may be other interfaces defined in the cluster entity, which may not be
supported in Oozie's global section
* This may not work directly without performing a touch on every entity in the
system after the feature is enabled
+The new proposal is as follows+
* Enable a feature through an admin option to put Falcon in a special mode:
"safe-mode" or "-initialize-update"
* Disallow all operations except for some read-only operations over and above
FALCON-1623
* Accept cluster update operation and add the updated cluster definition in
staging directory without actually performing the update
* Use admin option to leave "safe-mode" or "finalize-update" to perform the
cluster update (validation of entity followed by dependent entity updates).
System will successfully leave safe-mode if it is able to perform update, else
will remain in safe-mode.
* If the cluster update is successful but a dependent entity update fails, a
touch operation on that entity can be performed to move forward.
??On restart, the Falcon server will automatically put itself in safe-mode if it
finds any entity in the staging directory??
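A minimal in-memory sketch of the proposed flow may help make the state transitions concrete. Everything here is hypothetical ({{FalconModel}}, {{SafeModeError}} and the method names are not Falcon APIs), and it makes two assumptions the proposal leaves open: all staged definitions are validated before any is applied, and dependent-entity failures are reported back rather than blocking the exit from safe-mode.

```python
class SafeModeError(Exception):
    """Raised when an operation is not allowed in the current mode."""


class FalconModel:
    """Toy model of the proposal; a dict stands in for the staging directory."""

    def __init__(self, clusters, validate, update_dependents):
        self.clusters = dict(clusters)  # live cluster definitions
        self.staged = {}                # staged, not yet applied, updates
        self.safe_mode = False
        self._validate = validate                    # raises if invalid
        self._update_dependents = update_dependents  # returns failed entity names

    def enter_safe_mode(self):
        self.safe_mode = True

    def submit_cluster_update(self, name, definition):
        # Updates are only staged, never applied, and only in safe-mode.
        if not self.safe_mode:
            raise SafeModeError("cluster updates are only accepted in safe-mode")
        self.staged[name] = definition

    def leave_safe_mode(self):
        """Apply staged updates; return dependents that still need a touch."""
        # Assumption: validate everything first, so a failure applies nothing
        # and the server simply stays in safe-mode.
        for definition in self.staged.values():
            self._validate(definition)

        failed = []
        for name in list(self.staged):
            self.clusters[name] = self.staged.pop(name)
            failed.extend(self._update_dependents(name))

        self.safe_mode = False
        return failed
```

With nothing staged, {{leave_safe_mode}} skips both loops and simply returns to normal mode.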
*Some scenarios and how they play out with the new proposal*
+Move to Safe Mode, No updates+
* Issue admin option to move to safe mode, don't perform cluster entity update
operation
* Leave safe mode - Goes to normal (NOOP)
+Move to Safe Mode, No updates, Restart Falcon+
* Issue admin option to move to safe mode, don't perform cluster entity update
operation
* Restart Falcon
* On restart, Falcon checks for staged cluster entity updates, finds none, and
starts normally
+Move to Safe Mode, Update one or more entities, Leave safe mode+
* Issue admin option to move to safe mode
* Perform cluster entity update operation
* Issue leave safe mode admin op
* Checks for existence of staged cluster updates
* Validates cluster entity
* Performs cluster update
* Perform update on dependent entities
* Leave safe mode
+Move to Safe Mode, Update one or more entities, Restart, Leave safe mode+
* Issue admin option to move to safe mode
* Perform cluster entity update operation
* Restart falcon server
* Falcon finds staged updates, moves to Safe mode automatically
* Issue leave safe mode admin op
* Checks for existence of staged cluster updates
* Validates cluster entity
* Performs cluster update
* Perform update on dependent entities
* Leave safe mode
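The restart behaviour in the two restart scenarios above comes down to one check at startup: is there anything in the staging directory? A rough sketch, assuming staged updates are stored as one file per cluster entity (the file naming, the function names and the directory layout are all hypothetical, not Falcon's actual layout):

```python
import tempfile
from pathlib import Path


def stage_cluster_update(staging_dir: Path, name: str, definition_xml: str) -> Path:
    """Write an updated cluster definition to staging without applying it."""
    staging_dir.mkdir(parents=True, exist_ok=True)
    path = staging_dir / f"{name}.xml"  # hypothetical naming scheme
    path.write_text(definition_xml)
    return path


def should_start_in_safe_mode(staging_dir: Path) -> bool:
    """True if a staged update was left behind by a previous run."""
    return staging_dir.is_dir() and any(p.is_file() for p in staging_dir.iterdir())


def replay_restart_scenarios():
    """Return the startup decision without and with a staged update."""
    with tempfile.TemporaryDirectory() as tmp:
        staging = Path(tmp) / "staging"
        # Safe mode entered, no updates, restart: nothing staged.
        no_updates = should_start_in_safe_mode(staging)
        # Safe mode entered, one cluster updated, restart: update is staged.
        stage_cluster_update(staging, "primary-cluster", "<cluster/>")
        with_update = should_start_in_safe_mode(staging)
    return no_updates, with_update
```

Note that in this sketch the safe-mode flag itself is not persisted: a restart with an empty staging directory comes up in normal mode, which matches the second scenario.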
> Support cluster updates
> -----------------------
>
> Key: FALCON-141
> URL: https://issues.apache.org/jira/browse/FALCON-141
> Project: Falcon
> Issue Type: Bug
> Reporter: Shwetha G S
> Assignee: Ajay Yadava
>