Hi,

We started on the automation journey some time ago. These are some
general topics that you might have to think about:

1. What to automate
There are many different types of 'automation' out there. We currently
concentrate on 'orchestration' of product instances and on 'automation'
of various operational tasks. That means that the 'base' configuration
of any device is assumed, so for example all the policers and shapers we
use for our customers are already pre-defined and can be used.

2. Orchestration - model
The key for us is the Product-Service-Resource model. In that approach
each product has a defined list of parameters (and their values) and
consists of a number of services (in the networking world, for example,
an L3VPN product would have services such as access interface, QoS,
routing instance etc). Both products and services are 'abstract' and
don't reflect any device/model. Services use 'resources' as a way of
implementing the configuration on individual devices. This allows for
reusability of code, but also allows for products that live across
multiple domains. For example, one product can contain a number of
routers, switches, firewalls and applications. Some might be provisioned
using SSH/CLI scraping, some using APIs.
Currently we only generate the instance configuration (for example an
L3VPN), and not the base configuration. Base configuration in our case
is 'automation' - like adding a new PE to a network - and is subject to
different rules.
We use ansible as the engine, with multiple modules on top (including
our own ones).

3. Handling errors
Things go wrong even when you automate them. When a product instance
uses resources across 10 devices (and takes 30 minutes to fully roll
out) there must be a reliable roll-back process available. We don't rely
on the devices to do it (as the config could have been changed by
something else already) but instead we pre-generate a 'reversal' config
that we deploy if we run into problems. In the case of upgrades that
config reverts to the previous known-good state; for a new installation
it simply removes the deployed config.

4. Making updates
When a customer wants to upgrade their product from 500Mb/s to 1Gb/s on
the access layer - how do you do it? In our case we hold 'instance
data', which is the set of input values of the product parameters. Any
change to that set causes all the configs to be regenerated and
reprovisioned (the details are down to individual devices: some actually
roll out the config even if it hasn't changed, some don't).

5. Logging/reporting
All automated operations must be logged, including the changes they make
to all systems. High-level reporting (on the number of
failures/successes) across devices/types etc helps to pinpoint problems
quickly.

6. Dealing with shared resources
Sometimes making changes means changing objects that might already be
configured. For example, creating a unit on an interface that requires a
particular encapsulation on the interface. The easiest way to deal with
this is to standardise all shared resources, but we found it's not
always possible.

7. Good inventory system
You need a way of storing all the information about your network and
systems, and the ability to automatically allocate things like VLANs,
IPs etc. All of that must be available over an API. We also store what
we call 'instance data' - all the parameters that are used to create the
instance of the product on all devices.

8. Change process that allows for 'automatic deployments'
If you currently have a process that relies on peer reviews, CAB
meetings etc - those things will have to change. Our goal is to be able
to provision an instance using a single API call (but we're not there
yet).

9. Offline generation and validation
We generate our configs offline and verify variables and syntax (where
possible) before deployment. This way a lot of errors and
inconsistencies can be detected even before touching the
network/systems. Failing here is 'cheap' - nothing is really changed
yet. If the failure happens during deployment it is more 'expensive' -
it has to be rolled back carefully on a number of devices.
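
To make the 'cheap failure' idea above a bit more concrete, here is a minimal Python sketch of validating instance data before any config is rendered. It is only an illustration, not our actual tooling: the parameter names (vlan_id, bandwidth_mbps, interface, pe_devices), the function names and the single rendered config line are all made up for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ValidationResult:
    """Collects every problem found, so one run reports them all."""
    errors: List[str] = field(default_factory=list)

    @property
    def ok(self) -> bool:
        return not self.errors


def validate_instance(params: dict) -> ValidationResult:
    """Check hypothetical instance parameters without touching any device."""
    result = ValidationResult()
    vlan = params.get("vlan_id")
    if not isinstance(vlan, int) or not 1 <= vlan <= 4094:
        result.errors.append(f"vlan_id must be an integer in 1..4094, got {vlan!r}")
    bandwidth = params.get("bandwidth_mbps")
    if not isinstance(bandwidth, int) or bandwidth <= 0:
        result.errors.append(f"bandwidth_mbps must be a positive integer, got {bandwidth!r}")
    if not params.get("interface"):
        result.errors.append("interface must be set")
    if not params.get("pe_devices"):
        result.errors.append("pe_devices must list at least one device")
    return result


def render_configs(params: dict) -> Dict[str, str]:
    """Render one illustrative config line per device (placeholder template)."""
    line = "set interfaces {interface} unit {vlan_id} vlan-id {vlan_id}".format(**params)
    return {device: line for device in params["pe_devices"]}


def generate_offline(params: dict) -> Dict[str, str]:
    """Validate first; render only if validation passed. Nothing is deployed here."""
    result = validate_instance(params)
    if not result.ok:
        # Cheap failure: no device has been contacted yet.
        raise ValueError("; ".join(result.errors))
    return render_configs(params)
```

With this sketch, calling generate_offline() with vlan_id=5000 raises ValueError before anything is rendered or pushed, which is the point where failing costs nothing.
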
Each service and resource is responsible for its own validation. Some of
them query external data sources, some query live devices (for example
to make sure that the VLAN ID is not in use), some only do syntactic and
semantic validation.

10. Post-deployment verification
Once all the bits and pieces are in - how do you confirm that the setup
is actually working? For example, for an L3VPN that might mean prefixes
visible in routing tables on devices, ICMP ping working between
different PEs etc. For things like BGP sessions with customers (and any
other customer-dependent services) it's worth marking them as 'soft'
failures at this stage.

11. RBAC
Who should have access to which products, and on which devices can they
deploy?

kind regards
Pshem

On Fri, 17 Aug 2018 at 22:55 Antti Ristimäki <antti.ristim...@csc.fi> wrote:

> Hi colleagues,
>
> This is something that I've been thinking quite a lot, so I would be
> delighted to hear some comments, experiences or recommendations.
>
> So, now that more and more of us are automating their network, there will
> be the question about how to manage the configurations, if they are
> partially automated and partially manually maintained. This will be the
> case especially while transitioning from a pure CLI jockey network towards
> a more automated one. There are probably multiple approaches to solve this,
> but below are a few of them:
>
> One option is to generate the whole config automatically e.g. from a
> template or a database and just _not_accepting_ any manual configurations
> at all. Then when there are needs to do something custom not yet supported
> by the automation tools, instead of manually configuring it one would take
> some additional time and build the support into the automation tools. The
> cost for this might be that deploying something new/custom/tailor-made
> might take a bit more time compared to just manually configuring it, but in
> a long run the benefits are obvious. I'm personally preferring this
> approach.
>
> Generating the _whole_ configuration automatically off-line from the
> scratch makes it also easy to remove elements from the configuration, as
> the auto-generated config can completely replace the existing
> running-config.
>
> If the above mentioned is not doable for the entire configuration, one can
> take one configuration hierarchy level at a time and automate it, after
> which no manual configurations will be accepted under that hierarchy. This
> is rather trivial especially for those configuration hierarchies that tend
> to be static most of the time.
>
> Another option is to apply the auto-generated configuration via
> apply-groups and apply all manual configurations explicitly so that the
> automatic and manual configurations merge with each other. The positive
> side of this approach is that it makes easy to develop the automation tools
> so that manual configs are not overridden by auto-generated config, but I
> personally see somewhat inconvenient that one really doesn't see the
> effective running-config when using apply-groups, unless one remembers to
> display inheritance.
>
> Any thoughts appreciated.
>
> Antti
> _______________________________________________
> juniper-nsp mailing list juniper-nsp@puck.nether.net
> https://puck.nether.net/mailman/listinfo/juniper-nsp