Adam,

An important observation: I prefer to keep my pseudowires working even when many segments of the network are affected by fiber cuts and so on.
Since I migrated my BGP VPLS services to l2circuits, my problems today are almost zero. The business order for everyone is to keep everything running 24/7/365 with zero downtime, no matter what. Planned maintenance doesn't count, since it is planned. The VPLS services, as I said before, caused two outages in one year due to L2 loops created by the operations team; after hours with no progress in finding the origin of the loop, I was called (escalated) to solve the problem. That is what I mean by my experience: uptime, availability, quality of service and so on.

I was a Cisco CCxx for many years, with eyes for one vendor only, even when this vendor caused downtime "with brand"! My Cisco environment went down? Oh yes, but it's a Cisco! Am I OK with that? Not anymore! I want peace and happy customers, and to sell more. With the time that I have today, I can study new tech, run some lab tests, and ask different vendors for this or that. Today I can sleep well without the fear that someone will loop something or that some equipment will crash due to CPU/memory problems.

And yes, I am a Network Warrior! But now... a tech warrior. Like Call of Duty: Infinite Warfare! :)

Regards,
Alexandre

On 8 Jul 2018, at 17:58, "adamv0...@netconsultings.com" <adamv0...@netconsultings.com> wrote:

>> From: James Bensley [mailto:jwbens...@gmail.com]
>> Sent: Friday, July 06, 2018 2:04 PM
>>
>> On 5 July 2018 09:56:40 BST, adamv0...@netconsultings.com wrote:
>>>> Of James Bensley
>>>> Sent: Thursday, July 05, 2018 9:15 AM
>>>>
>>>> - 100% rLFA coverage: TI-LFA covers the "black spots" we currently
>>>> have.
>>>>
>>> Yeah that's an interesting use case you mentioned, that I haven't
>>> considered, that is no TE need but FRR need.
>>> But I guess if it was business critical to get those blind spots
>>> FRR-protected then you would have done something about it already
>>> right?
>>
>> Hi Adam,
>>
>> Yeah correct, no mission critical services are affected by this for us,
>> so the business obviously hasn't allocated resource to do anything about
>> it. If it was a major issue, it should be as simple as adding an extra
>> backhaul link to a node or shifting existing ones around (to reshape the
>> P space and Q space to "please" the FRR algorithm).
>>
>>> So I guess it's more like it would be nice to have, now is it enough
>>> to expose the business to additional risk?
>>> Like for instance yes you'd test the feature to death to make sure it
>>> works under any circumstances (it's the very heart of the network after
>>> all, if that breaks everything breaks), but the problem I see is then
>>> going to the next release a couple of years later: since SR is a new
>>> thing it would have a ton of new stuff added to it by then, resulting
>>> in a higher potential for regression bugs in comparison to LDP or RSVP,
>>> which have been around forever and where every new release is basically
>>> just bug fixes.
>>
>> Good point, I think it's worth breaking that down into two separate
>> points/concerns:
>>
>> Initial deployment bugs:
>> We've done stuff like pay for a CPoC with Cisco, then deployed, then had
>> it all blow up, then paid Cisco AS to assess the situation only to be
>> told it's not a good design :D So we just assume a default/safe view now
>> that no amount of testing will protect us. We ensure we have backout
>> plans if something immediately blows up, and heightened reporting for
>> issues that take 72 hours to show up, and change freezes to cover issues
>> that take a week to show up etc. etc. So I think as far as an initial SR
>> deployment goes, all we can do is our best with regards to being
>> cautious, just as we would with any major core changes.
>> So I don't see the initial deployment as any more risky than other core
>> projects we've undertaken like changing vendors, entire chassis
>> replacements, code upgrades between major versions etc.
>>
>> Regression bugs:
>> My opinion is that in the case of something like SR, which is being
>> deployed based on early drafts, regression bugs are potentially a bigger
>> issue than an initial deployment. I hadn't considered this. Again though
>> I think it's something we can reasonably prepare for. Depending on the
>> potential impact to the business you could go as far as standing up a
>> new chassis next to an existing one, but on the newer code version, run
>> them in parallel, migrate services over slowly, and keep the old one up
>> for a while before you take it down. You could just do something as
>> simple as physically replacing the routing engine, keeping the old one
>> on site for a bit so you can quickly swap back. Or just drain the links
>> in the IGP, downgrade the code, and then un-drain the links, if you've
>> got some single homed services on there. If you have OOB access and plan
>> all the rollback config in advance, we can operationally support the
>> risks, no differently to any other major core change.
>>
>> Probably the hardest part is assessing what the risk actually is? How to
>> know what level of additional support, monitoring, people, you will
>> need. If you under resource a rollback of a major failure, and fuck the
>> rollback too, you might need some new pants :)
>>
> Well yes I suppose one could actually look at it as on any other major
> project like an upgrade to a new SW release, or a migration from LDP to
> RSVP-TE, or adding a second plane, or all 3 together.
> And apart from the tedious and rigorous testing (god there's got to be a
> better way of doing SW validation testing) you made me think about
> scoping the fallback and contingency options in case things don't work
> out.
> These huge projects are always carried out in a number of stages, each
> broken down into several individual steps; all this is to ease the
> deployment but also to scope the fallout in case things go south.
> Like in migrations from LDP to RSVP you go intra-POP first, then
> inter-POP between a pair of POPs and so on, using small incremental
> steps, and all this time the fallback option is the good old LDP, maybe
> even well after the project is done, until the operational confidence is
> high enough or till the next code upgrade. And I think a similar approach
> can be used to de-risk an SR rollout.
>
>
> adam
>
> netconsultings.com
> ::carrier-class solutions for the telecommunications industry::
>
>
> _______________________________________________
> juniper-nsp mailing list juniper-nsp@puck.nether.net
> https://puck.nether.net/mailman/listinfo/juniper-nsp