[ 
https://issues.apache.org/jira/browse/FALCON-141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15029768#comment-15029768
 ] 

Srikanth Sundarrajan commented on FALCON-141:
---------------------------------------------

[~sandeep.samudrala], [~pragya.mittal] and I discussed offline how cluster
updates could work. Putting our thoughts down here for broader discussion and
consideration.

+Cluster updates are needed to handle the following scenarios:+
* Update from a non-secure Hadoop cluster to a secure one, or vice versa
* Update from non-HA to HA, or vice versa
* Update of the end points of any of the interfaces
* Update of properties in the cluster entity

In general, a cluster update should perform the following actions:
* Validate the new entity definition
* Touch all dependent feed & process entities (after deduping them) to
complete the operation
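
A minimal sketch of the dedupe-then-touch step, assuming dependents are
identified by an (entity type, name) pair. The names dedupe_dependents and
touch_all and the touch callback are hypothetical, used only for illustration;
they are not Falcon APIs.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

# Hypothetical identifier for an entity: (entity type, entity name).
Entity = Tuple[str, str]


def dedupe_dependents(dependents_by_cluster: Dict[str, Iterable[Entity]]) -> List[Entity]:
    """Return each dependent feed/process exactly once.

    An entity that refers to several updated clusters would otherwise be
    touched once per cluster. Clusters are visited in sorted order so the
    result is deterministic.
    """
    seen: Set[Entity] = set()
    ordered: List[Entity] = []
    for cluster in sorted(dependents_by_cluster):
        for entity in dependents_by_cluster[cluster]:
            if entity not in seen:
                seen.add(entity)
                ordered.append(entity)
    return ordered


def touch_all(entities: Iterable[Entity], touch: Callable[[Entity], None]) -> List[Entity]:
    """Touch every entity and return the ones whose touch failed."""
    failed: List[Entity] = []
    for entity in entities:
        try:
            touch(entity)
        except Exception:
            # Keep going; failed entities can be touched again later.
            failed.append(entity)
    return failed
```

Returning the failed entities instead of aborting on the first error is one
possible choice; it lets an operator retry only the entities that were missed.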

We were considering using OOZIE-2187 to centralize the end points and simplify
the updates, but there are a few shortcomings with that approach, as pointed
out by [~venkatnrangan] during the bi-weekly sync up call.
* Cross cluster Hive replication may have multiple NN/JT end points referred to
in a workflow, so we can't piggyback on the global conf
* There may be other interfaces defined in the cluster entity, which may not be
supported in Oozie's global section
* This may not work directly without performing a touch on every entity in the
system after the feature is enabled

+The new proposal is as follows+
* Enable a feature through an admin option to put Falcon in a special mode:
"safe-mode" or "-initialize-update"
* Disallow all operations except for some read-only operations over and above
FALCON-1623
* Accept the cluster update operation and add the updated cluster definition to
a staging directory without actually performing the update
* Use an admin option to leave "safe-mode" or "finalize-update" to perform the
cluster update (validation of the entity followed by dependent entity updates).
The system will leave safe-mode only if it is able to perform the update;
otherwise it will remain in safe-mode.
* If the cluster update is successful but a dependent entity update fails, a
touch operation on that entity can be performed to move forward.
??Falcon server on restart will put itself automatically in safe-mode if it
finds any entity in the staging directory??
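
The proposal above can be read as a small state machine. The following is only
a sketch under stated assumptions: the class SafeModeSketch, its method names,
and the validate / apply_update / touch_dependents callbacks are all
hypothetical and not part of Falcon, and a dict stands in for the staging
directory.

```python
class SafeModeError(Exception):
    """Raised when an operation is not allowed in the current mode."""


class SafeModeSketch:
    """Hypothetical model of the proposed safe-mode cluster update flow."""

    def __init__(self, validate, apply_update, touch_dependents, staged=None):
        # 'staged' stands in for the staging directory and survives restarts.
        self.staged = dict(staged or {})
        # Restart rule: staged updates force the server into safe mode.
        self.safe_mode = bool(self.staged)
        self._validate = validate
        self._apply_update = apply_update
        self._touch_dependents = touch_dependents

    def enter_safe_mode(self):
        self.safe_mode = True

    def submit_cluster_update(self, name, definition):
        # The update is only staged here; nothing is applied yet.
        if not self.safe_mode:
            raise SafeModeError("cluster updates are accepted only in safe mode")
        self.staged[name] = definition

    def leave_safe_mode(self):
        """Apply staged updates, then leave safe mode.

        If validation or the cluster update raises, the exception propagates
        and the server stays in safe mode. Dependent entities that could not
        be updated are returned so they can be touched manually.
        """
        needs_touch = []
        for name in sorted(self.staged):
            definition = self.staged[name]
            self._validate(definition)
            self._apply_update(name, definition)
            del self.staged[name]
            # Assumed contract: returns the dependents whose update failed.
            needs_touch.extend(self._touch_dependents(name))
        self.safe_mode = False
        return needs_touch
```

A restart can be modelled by constructing a new SafeModeSketch with the
previous instance's staged dict: with an empty dict it starts normally, and
with pending updates it starts in safe mode.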

*Some scenarios and how they play out with the new proposal*
+Move to Safe Mode, No updates+
* Issue admin option to move to safe mode, don't perform cluster entity update 
operation
* Leave safe mode - Goes to normal (NOOP)

+Move to Safe Mode, No updates, Restart Falcon+
* Issue admin option to move to safe mode, don't perform cluster entity update
operation
* Restart Falcon
* On restart, Falcon checks for staged cluster entity updates, finds none and
starts normally

+Move to Safe Mode, Update one or more entities, Leave safe mode+
* Issue admin option to move to safe mode
* Perform cluster entity update operation
* Issue leave safe mode admin op
* Checks for existence of staged cluster updates
* Validates cluster entity
* Performs cluster update
* Perform update on dependent entities
* Leave safe mode

+Move to Safe Mode, Update one or more entities, Restart, Leave safe mode+
* Issue admin option to move to safe mode
* Perform cluster entity update operation
* Restart Falcon server
* Falcon finds staged updates, moves to safe mode automatically
* Issue leave safe mode admin op
* Checks for existence of staged cluster updates
* Validates cluster entity
* Performs cluster update
* Perform update on dependent entities
* Leave safe mode

> Support cluster updates
> -----------------------
>
>                 Key: FALCON-141
>                 URL: https://issues.apache.org/jira/browse/FALCON-141
>             Project: Falcon
>          Issue Type: Bug
>            Reporter: Shwetha G S
>            Assignee: Ajay Yadava
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
