I would like to draw your attention to JIRA TRAFODION-2001 <https://issues.apache.org/jira/browse/TRAFODION-2001>, which specifies changes in configuration and operational components to support elasticity in Trafodion. My intent is to generate discussion, obtain feedback, correct mistakes, add missing items, and reach consensus on when to integrate these changes into the mainline code. Inherent in this capability is the likelihood that other aspects of managing a Trafodion instance will require changes and possibly enhancements. At a minimum, these enhancements change the way current key process components are configured and managed, and the old way goes away. If you are an active contributor to Trafodion, you will want to know the details of this JIRA.
I am adding the contents of this email as an initial comment in the TRAFODION-2001 JIRA and request that all feedback be given as comments in the JIRA. I thank you in advance.

A little background: most of the implementation was done in the spring of 2015 and donated to the Apache Software Foundation at the end of September 2015. I am in the process of merging these changes into the current Trafodion baseline in my private fork.

Here is where I need your active participation. To help with that, here is a brief summary.

First, review the document attached to the TRAFODION-2001 <https://issues.apache.org/jira/browse/TRAFODION-2001> JIRA, as you will need its context for what follows here.

Current state:

Trafodion Foundation components:

'monitor/shell':

* 'persist config/exec/info' commands are implemented
  o A 'persist kill' command is not currently specified, which I believe to be an unintended omission that needs to be corrected. The story is incomplete without it: persistent processes, whose number grows and contracts with node membership, cannot otherwise be stopped with one simple command.
  o Some important items to consider with a 'persist kill' command:
    * It will return an error when used with DTM persistent processes (the transaction manager process should not be stopped in a haphazard way)
    * Are there other persistent processes that should also be protected in this manner?
    * Should it return an error with TSID persistent processes?
  o The implementation of the 'persist kill' command corrects a problem with the code generated in 'sscpstop' and 'ssmpstop'.
    * The code currently generated does not take into account new processes created when nodes are added.
* 'node config' command is implemented
* 'node add/delete' commands - TODO - in process

'scripts' changes implemented:

* Compilation of the Trafodion configuration file, 'sqconfig', with the new 'persist' section is implemented ('sqgen' et al. scripts)
  o The generation of 'gomon.cold' is greatly simplified, as are the '<xxx>start' scripts
* Creation and display of the configuration database is implemented

Location of merged changes:

git remote add zcorrea_fork [email protected]:zcorrea/incubator-trafodion
Branch: zcorrea_fork/TRAFODION-2001

Impact to other components:

Hadoop/Trafodion Installation

* The ability to add and remove servers in an existing cluster implies the provisioning and removal of operational resources on those servers.
  o Trafodion depends on Hadoop, and there is an implied order of provisioning and operational readiness when adding servers to a cluster.
  o This order will be reversed when removing servers from a cluster.

Trafodion components

* Existing functionality in Trafodion assumes that once an instance is started, its static configuration does not change. Nodes may go down, i.e., fail, but the number of configured nodes remains static. This will no longer be true, as node membership will expand and contract over the lifetime of an instance after initial instance startup.

I look forward to your feedback,
Zalo

Gonzalo Correa
