I am +1 on all these concepts. This matches all of our anticipated requirements for deployment of service configs, near-instant changes, logging, reporting, rollback, and testing. I think the canary testing options need to be fleshed out a bit more. Many different options are suggested below, and each has pros and cons, but the core DSCV idea gives us a lot of advantages.
Ryan Durfey M | 303-524-5099

On 5/1/17, 11:12 AM, "Nir Sopher" <n...@qwilt.com> wrote:

Dear all,

Planning the efforts toward "self-service", we are considering "delivery-service configuration versioning" (DSCV) as one of our next steps. At a very high level, by DSCV we refer to the ability to hold multiple configuration versions/revisions per delivery-service, and actively choose which version should be deployed.

A significant portion of the value we would like to bring when working toward "self-service" can be achieved using the initial step of configuration versioning:

1. As the number of delivery-services handled by TC increases, preventing the "non dev-ops" user from changing a delivery-service configuration himself, and requiring a "dev-ops" user to actually make the changes in the DB, puts an increasing load on the operations team. Via DSCV the operator may allow the users to really push configurations into the DB, as it separates the provisioning phase from the deployment. Once the changes are committed, the CDN's "dev-ops" user is able to examine them and choose which version should be deployed, subject to the operator's acceptance policy.

2. DSCV brings improved auditing and troubleshooting capabilities, which are important for supporting TC deployment growth, and allows users to be more independent. It allows issues to be investigated using version-associated log records, as well as the data in the DB itself: examining the delivery-service versions and their metadata (e.g. "deployed dates"), and using tools for version comparison.

3. DSCV allows a simple delivery-service configuration rollback, which provides a quick remedy for configuration errors.

Moreover, we suggest allowing the deployment of multiple versions of the same delivery service simultaneously, on the same caches. Doing so, and allowing the operator to orchestrate the usage of the different versions (for example, via "steering"), makes the following available:

1. Manual testing of a new delivery-service configuration, via a dedicated URL or using request headers.

2. Staging / canary testing of new versions, applying them only for a specific content path, or filtering based on source IP.

3. Gradual transition between the different configuration versions.

4. Configuration version A/B testing (assuming the reporting/stats also become "version aware").

5. Immediate (no CRON wait, cr-config change only) delivery-service version "switch", and specifically immediate rollback capabilities.

Note that, engineering-wise, one may consider DSCV as a building block for other "self-service" steps. It allows the system to identify which configuration is deployed on which server, and allows the servers to identify configuration changes with DS granularity. Therefore, it can help to decouple the deployment of individual delivery services, as well as reduce the load derived from the caches update process.

We would greatly appreciate community input on the subject.

Many thanks,
Nir
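
For discussion purposes, the core DSCV idea described above (commit revisions without deploying them, pick which one is deployed, compare versions, roll back) could be sketched roughly as follows. This is only an illustrative in-memory model, not an existing Traffic Control API: the class `DeliveryServiceVersions`, its methods, and the config keys (`origin`, `ttl`) are all hypothetical names made up for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class DeliveryServiceVersions:
    """Hypothetical in-memory store of config revisions for one delivery service."""

    name: str
    versions: List[Dict[str, str]] = field(default_factory=list)
    deployed: Optional[int] = None                    # currently deployed version number
    history: List[int] = field(default_factory=list)  # previously deployed version numbers

    def commit(self, config: Dict[str, str]) -> int:
        """Provisioning step: store a new revision without deploying it."""
        self.versions.append(dict(config))
        return len(self.versions) - 1

    def deploy(self, version: int) -> None:
        """Deployment step: mark an already committed revision as the active one."""
        if not 0 <= version < len(self.versions):
            raise ValueError(f"unknown version {version}")
        if self.deployed is not None:
            self.history.append(self.deployed)
        self.deployed = version

    def rollback(self) -> int:
        """Switch back to the version that was deployed before the current one."""
        if not self.history:
            raise RuntimeError("no earlier deployment to roll back to")
        self.deployed = self.history.pop()
        return self.deployed

    def diff(self, a: int, b: int) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
        """Return the keys whose values differ between versions a and b."""
        old, new = self.versions[a], self.versions[b]
        return {
            key: (old.get(key), new.get(key))
            for key in sorted(set(old) | set(new))
            if old.get(key) != new.get(key)
        }


# Example: a user commits two revisions; the operator deploys, compares, rolls back.
ds = DeliveryServiceVersions("demo-ds")
v0 = ds.commit({"origin": "http://origin.example", "ttl": "3600"})
v1 = ds.commit({"origin": "http://origin.example", "ttl": "60"})
ds.deploy(v0)
ds.deploy(v1)
changed = ds.diff(v0, v1)
restored = ds.rollback()
```

In this sketch, committing never changes what is deployed, which is the separation of provisioning from deployment that the proposal relies on. A real design would also need persistence, per-version metadata such as deployed dates, and a way to have several versions active at once for the steering and canary cases.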