Re: [j-nsp] Automation - The Skinny (Was: Re: ACX5448 & ACX710)
On 28/Jan/20 09:45, Saku Ytti wrote: > We should learn to crawl before we take rocket to proxima centauri. Agreed! > > You don't need ML/AI to find problems in your network, using algorithm > 'this counter which increments at rate X stopped incrementing or > started to increment 100 times slower' and 'this counter which does > not increment, started to increment', and you'll find a lot of > problems in your network. But do you care about every problem in your > network, or only problems that customers care about? Agreed! > > Juniper once in EBC had some really smart academics explaining us > their ML/AI project which predicts resource needs on a given system. > They quoted how close they got to real numbers then I asked how does > it perform against naive system, after explaining by naive system I > mean system like 'my box has 1M FIB entries so FIB entry uses > RLDRAM/1M' to extrapolate FIB usage in arbitrary config. They hadn't > tried this and couldn't tell how well the ML/AI performs against this. > > Can you really train today ML/AI to determine what actually matters? I > don't think you can, because what actually matters is something that > impacted customer, and you simply cannot put enough learning data in, > you don't have nearly enough customer trouble tickets to be able to > correlate them to network data you're collecting and start predicting > which complex counter combinations are predicting customer ticket > later. Agreed! > > But are you at least monitoring how many networks are lost inside your > network? Delta of input/output? That is fairly trivial to cover _all > reasons for packet loss_, of course latency/jitter are not covered, > but still, it covers alot of ground fast. Do you have a single system > where you collect all data? Have you enrichened the data stuff like > npu, linecard, city, country, region? Almost no one is doing even very > basic stuff, so I think ML/AI isn't going to be the low hanging fruit > any time soon. 
> If you have a single system with lot of labels for
> every counter, you can do a lot with very naive analytics. If you
> don't have the data, you can't do anything with the smartest possible
> system. And I think almost no one is collecting data in such a manner
> that it's actually capitalisable, because we can keep running the
> network how we did in 90s, IF-MIB and netflow, in separate
> systems, with no enrichment at all.

Agreed!

For us, the combination of Iris (a South African-written NMS), Kentik and Blue Planet ROA (formerly Packet Design) gives us plenty of insight into what our network is doing, what it did, and what it may do.

We pay Iris for their NMS, and this gives us quite a bit of flexibility in what we can monitor and alert on, provided there is a way we can get the data off the box.

We don't believe spending too much time and effort on building ML/AI engines will solve a real problem in our specific network, today. Tomorrow, maybe. I'd rather spend time upgrading Iris to support telemetry streaming, as this has immediate and tangible benefits.

Mark.
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp
Re: [j-nsp] Automation - The Skinny (Was: Re: ACX5448 & ACX710)
On 27/Jan/20 22:30, adamv0...@netconsultings.com wrote: > Very good point Robert, > There are indeed two parts to the whole automation story > (it' obvious that this theme deserve a series of blog posts, but I keep on > finding excuses). > > The analogy I usually use in presentations is the left brain right brain > analogy, > Where left brain is responsible for logical thinking and right brain is > responsible for creative thinking and intuition. > So a complete automation solution is built similarly: > Left brain is responsible for routine automated service provisioning > - and contains models of resources, services, devices, workflows, policies > -and you can teach it by loading new/additional models. > Right brain on the other hand is responsible for "self-driving" the network > (yeah I know can't think of better term) > - and collects data from network and acts on distributed policies, and also > performs trending, analytics, correlation, arbitration etc... > Now left brain and right brain talk to each other obviously, > Policies are defined in left brain and distributed to right brain to act on > them. > Also right brain can trigger workflows in left brain. > > Major paradigm shift for our service designers here will be that they are now > going to be responsible not only for putting the individual service building > blocks together in term of config (and service lifecycle workflow -tbd), but > also in terms of policies - determining the health of the provisioned service > (including thresholds, post-checks, ongoing checks etc...) > But following the MDE (Model driven Engineering) theme it's not just service > designers contributing to the policy library, it's Ops teams, Security teams, > etc... > Main advantage is see is that some of the policies that will be created for > the soon to be automated service certification testing could then be reused > for the particular service provisioning post-test and service lifecycle > monitoring and vice versa. 
> Then obviously there are policies defining what to do in various DDoS > scenarios, and I consider the vendor solutions actually doing analytics, > correlation, arbitration all part of the left brain). Not to sound silly, but you're taking me back to where I was right around the time Cisco decided to pick up Tail-f :-). > Then nowadays there's also the possibility to enable tons upon tons of > streaming telemetry -where I could see it all landing in a common data lake > where some form of deep convolutional neural networks could be used for > unsupervised pattern/feature learning, -reason being I'd like the system to > tell me look if this counter is high and that one too and this is low then > this usually happens. But I'd rather wait to see what the industry offers in > this area than developing such solutions internally. For me, this makes a lot of sense. I'm happy to support standardization of telemetry streaming (box vendor) and decoding (NMS vendor) because that enhances the NMS capabilities, which takes away a lot of the corner-case issues Saku highlighted (well, that's the hope). > For now I'm glad I have automation projects going, when I asked whether we > should have AI in network strategy for 2020 I got awkward silence in > response. I'm not even going to touch the ML/AI pole :-). > I don't know, my experience is that working in tandem with a devops person > (as opposed to trying to figure it myself) gets me the desired results much > faster (and in line with whatever their sys-architecture guidelines or coding > principles are) while I can focus on WHAT (from the network perspective) not > HOW (coding/system perspective). Although yes for some of the POC stuff I > wish I had some coding skills. 
> But to give you a concrete example from my work, when I had a choice to read > some python books or some more microservice architecture books I chose the > latter as it was more important for me to know the difference between for > instance orchestration and choreography among other aspects of microservice > architectures to assess the pros and cons of each in order to make an > educated argument for the service workflow engine architecture choice - so it > lines up with what I had in mind for service layer workflows > flexibility/agility. Totally agree with you there. Network engineers should stop feeling the pressure about needing to become software heads. I can tell you now, where we are with getting software to solve of our operational problems, network engineers are going to be in huge demand. Let's just not ruin the pot by teaching them "GUI is the absolute answer" the moment they leave the university gates and enter the real world. It's been hard enough disabusing them of "Class A, Class B, Class C". > Well we are starting to get a glimpse of it already with VM of a > Route-Reflector running on a server - who owns the host (HW & SW) is it > sys-admins or ip-ops, which Mark
Re: [j-nsp] Automation - The Skinny (Was: Re: ACX5448 & ACX710)
On 27/Jan/20 00:18, Robert Raszuk wrote:

> Without proper automation in place going way above basic IGP, BGP,
> LDP, BFD etc ... you need a bit of clever automation to detect it and
> either alarm noc or if they are really smart take such router out of
> the SPF network wide. If not you sit and wait till pissed customers
> call - which is already a failure.

So we use Packet Design (now Ciena Blue Planet) for stuff like this. Works a treat :-).

We also have a dear NMS that can be told what to look for and notify the team.

If NOCs are sitting and waiting for customers to get pissed off, automation is going to make your problems worse, not better.

Mark.
Re: [j-nsp] Automation - The Skinny (Was: Re: ACX5448 & ACX710)
On 26/Jan/20 22:46, adamv0...@netconsultings.com wrote: > You nailed it Mark, > My opinion is that this new NetDevOps/NetOps initiative is the biggest > blunder of the networking industry. > If as a network engineer/architect you have some coding skills well good for > you, > But are programming skills a requirement to get into network > engineering/architecture nowadays - that absolutely should not be the case. > We need skilled network engineers and architects to know how to build and > operate complex networks > We need skilled developers and system architects to know how to build and > operate complex systems (including network automation systems) > We need these two groups to be able to talk to each other in a constructive > manner - check out Model-driven engineering to get you started. Totally agreed. > Following these 3 simple premises you can then afford to have an army of > web-ui clickers - provisioning network services not knowing the first thing > about what' going on in the background of the network automation system. Or > not, and you just handover the web-ui/API to your customers and have them > self-service. Which is fine for me, as long as there are still some real network engineers at the company that can make sure the network runs when the automation system bombs out. > > Imagine a case where network engineer builds an automation solution based on > number of hacks involving ansible, python, ydk, whatever... and this > solution gets traction and is used by the company. > Now that poor networking guy has a full-time job supporting the automation > solution, fixing bugs, developing new functionality and you just lost one > network engineer. This is a good example of jack of all trades but master of > none. > Even if you're a small operation or a start-up hiring a developer and make > him talk to the network engineer in a virtual team is a much better option. Agreed - but we, generally, have to start from somewhere. 
And if you are going to start small with Ansible, I've found that network engineers will do that better with limited sysadmin experience than trying to get the sysadmin to stand up Ansible and get it to talk to routers. Over time, if its successful, you can farm out bits about Ansible to the software team that best suits their skills. I'm not for the idea that network engineers are obsolete and the only way to run an IP/MPLS network, over the next decade, is by giving it to the software heads. > Don't judge the book by its cover, in other words just give NETCONF and YANG > a try, seriously. > > I'd say that NETCONF's biggest advantage over SNMP/CLI is it's transaction > mechanism particularly atomicity and consistency ("all or nothing" and "all > at once") from the full ACID, but all these are addressed by all NOS-es > supporting two stage commit via CLI (As Saku mentioned below), so not a > biggie. > Sorry Mark XE is not one of those NOS-es, but you could still get the > functionality on XE using NETCONF ;) > > YANG on the other hand gives one a common modelling language for > representing services layer configuration and network layer configuration, > which I find useful. > But I'm a minority, I guess there aren't many of you using RFC8299 & RFC8466 > as bases for decomposing your L2 and L3 services and building a service > abstraction layer, on top of network configuration layer, so YMMV. I believed a lot in NETCONF/YANG in the middle of this past decade, and actually insisted solutions support it. But as I've said before, over time, you get to learn how to sniff the smell in the air, and while a lot of those solutions paint a good picture, for our particular use-case, it just felt like a whole lot of complexity for the 2 simple things we wanted to achieve first - customer service provisioning/de-provisioning + network deployment. Which is not to say that NETCONF/YANG have no use-case. 
Down the line, if what we are doing with Ansible becomes overly complex to require models based on NETCONF/YANG, happy to consider. But to get off the ground, I feel they are too heavy. I don't want to design overly elaborate data models - we know what it takes to deploy or run a device; we've been doing it for years. That doesn't change regardless of if it's being done by humans in a semi- or fully hands-off approach. The last 10 years have been bogged down by trying to figure how much automation to do, when to do it and how. In the end, we are still in the same place, and even more confused. If starting off slow with Ansible is okay for me for the next decade, I'm good with that. > I'd say there are different levels of automation, > At the entry level the aim might be just faster CLI interaction/simpler CLI > scraping (Ansible and the like) and at the other extreme there's the full > potential of model based engineering realized with frameworks like ONAP. > And operators naturally find themselves somewhere between these two poles in > their automation efforts
Re: [j-nsp] Automation - The Skinny (Was: Re: ACX5448 & ACX710)
On Tue, 28 Jan 2020 at 11:19, Robert Raszuk wrote:

> So at t0+N I record how many packets entered my system. (We are already at
> loss here as RE can generate packets unless you add to this RE outbound
> packets). Then at t0+N+uS (uS) delta of switching via fabric you record
> number of packets which left the box.
>
> What is your N and uS ?

You're not going to get it 1:1; you monitor the delta rate you see and react when that delta rate increases. Of course you can keep tuning this by adding more and more drop counters to reduce the known delta rate, but you always have to accept that you can't explain it perfectly. Not every small issue is an important issue, though. Certainly your fabric losing 30% would have been blatantly obvious even in the most naive such system.

> > You don't need ML/AI to find problems in your network, using algorithm
> > 'this counter which increments at rate X stopped incrementing or
> > started to increment 100 times slower'
>
> Well the way I read Adam's note was that learning this rate X is what he
> (IMHO correctly) calls ML :)

What I mean is: current_rate = X; if now_rate > X*100 or now_rate < X/100, flag it. No ML, just a stupid static comparison for a dramatic rate change. And even this is advanced by today's standards. Even 'counter rate went to 0 from non-zero' or 'went to non-zero from 0' exposes a lot of real issues, but issues which happen so rarely that customers are not complaining about them.

A particular example: all of us have some IP checksum errors in the network. When it's on an edge router, edge interface, ingress direction, you can ignore it as 'someone else's problem'. But we also see it on other interfaces/directions, where it means we flipped bits somewhere and calculated a correct FCS over the broken data, i.e. we have broken memory somewhere. But it probably isn't broken enough to matter; it probably mangles packets rather rarely.

--
++ytti
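For anyone who wants to see how little code the static comparison above needs, here is a minimal Python sketch. The function names `classify_rate_change` and `rate`, the default factor of 100 and the reason strings are illustrative assumptions, not taken from any particular NMS; counter wrap and resets are deliberately ignored.

```python
from typing import Optional


def rate(prev_value: int, curr_value: int, interval_s: float) -> float:
    """Per-second rate between two polls of a monotonically increasing counter."""
    return (curr_value - prev_value) / interval_s


def classify_rate_change(baseline_rate: float, current_rate: float,
                         factor: float = 100.0) -> Optional[str]:
    """Return a short reason if the rate changed dramatically, else None.

    baseline_rate is the 'rate X' previously seen, current_rate the rate now.
    factor is the assumed 'X*100 / X/100' threshold from the message above.
    """
    # A counter that never incremented has started to increment.
    if baseline_rate == 0 and current_rate > 0:
        return "started incrementing"
    # A counter that used to increment has stopped.
    if baseline_rate > 0 and current_rate == 0:
        return "stopped incrementing"
    # Dramatic speed-up or slow-down relative to the baseline rate.
    if baseline_rate > 0 and current_rate > baseline_rate * factor:
        return "rate increased dramatically"
    if baseline_rate > 0 and current_rate < baseline_rate / factor:
        return "rate decreased dramatically"
    return None
```

Feeding it the rate from the previous polling window and the current one per counter is enough to get started; how the baseline is chosen (last poll, daily average, etc.) is left open here.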
Re: [j-nsp] Automation - The Skinny (Was: Re: ACX5448 & ACX710)
> And I think almost no one is collecting data in such a manner
> that it's actually capitalisable, because we can keep running the
> network how we did in 90s, IF-MIB and netflow, in separate
> systems, with no enrichment at all.

Spot on !

Btw Saku - you keep suggesting measuring the delta of input/output ... well, to do it well I am afraid it is not trivial.

So at t0+N I record how many packets entered my system. (We are already at a loss here, as the RE can generate packets, unless you add RE outbound packets to this.) Then at t0+N+uS (uS being the delta of switching via the fabric) you record the number of packets which left the box.

What is your N and uS ? Do you subtract BFD packets which enter and leave on the same line card, both ingress and egress ?

Monitoring drops is much easier if we are dealing with platforms which are honest in recording them.

> You don't need ML/AI to find problems in your network, using algorithm
> 'this counter which increments at rate X stopped incrementing or
> started to increment 100 times slower'

Well the way I read Adam's note was that learning this rate X is what he (IMHO correctly) calls ML :)

Cheers,
R.

On Tue, Jan 28, 2020 at 8:45 AM Saku Ytti wrote:

> On Mon, 27 Jan 2020 at 22:30, wrote:
>
> > Then nowadays there's also the possibility to enable tons upon tons of
> > streaming telemetry -where I could see it all landing in a common data lake
> > where some form of deep convolutional neural networks could be used for
> > unsupervised pattern/feature learning, -reason being I'd like the system to
> > tell me look if this counter is high and that one too and this is low then
> > this usually happens. But I'd rather wait to see what the industry offers
> > in this area than developing such solutions internally. For now I'm glad I
> > have automation projects going, when I asked whether we should have AI in
> > network strategy for 2020 I got awkward silence in response.
>
> We should learn to crawl before we take rocket to proxima centauri.
Re: [j-nsp] Automation - The Skinny (Was: Re: ACX5448 & ACX710)
On Mon, 27 Jan 2020 at 22:30, wrote:

> Then nowadays there's also the possibility to enable tons upon tons of
> streaming telemetry -where I could see it all landing in a common data lake
> where some form of deep convolutional neural networks could be used for
> unsupervised pattern/feature learning, -reason being I'd like the system to
> tell me look if this counter is high and that one too and this is low then
> this usually happens. But I'd rather wait to see what the industry offers in
> this area than developing such solutions internally. For now I'm glad I have
> automation projects going, when I asked whether we should have AI in network
> strategy for 2020 I got awkward silence in response.

We should learn to crawl before we take a rocket to Proxima Centauri.

You don't need ML/AI to find problems in your network. Use an algorithm like 'this counter, which increments at rate X, stopped incrementing or started to increment 100 times slower' and 'this counter, which does not increment, started to increment', and you'll find a lot of problems in your network. But do you care about every problem in your network, or only the problems that customers care about?

Juniper once, in an EBC, had some really smart academics explaining to us their ML/AI project, which predicts resource needs on a given system. They quoted how close they got to real numbers. Then I asked how it performs against a naive system, after explaining that by naive system I mean something like 'my box has 1M FIB entries, so a FIB entry uses RLDRAM/1M' to extrapolate FIB usage in an arbitrary config. They hadn't tried this and couldn't tell how well the ML/AI performs against it.

Can you really train today's ML/AI to determine what actually matters? I don't think you can, because what actually matters is something that impacted a customer, and you simply cannot put enough learning data in. You don't have nearly enough customer trouble tickets to be able to correlate them to the network data you're collecting and start predicting which complex counter combinations will lead to a customer ticket later.

But are you at least monitoring how many packets are lost inside your network? The delta of input/output? That is a fairly trivial way to cover _all reasons for packet loss_. Of course latency/jitter are not covered, but still, it covers a lot of ground fast. Do you have a single system where you collect all data? Have you enriched the data with stuff like npu, linecard, city, country, region? Almost no one is doing even the very basic stuff, so I think ML/AI isn't going to be the low-hanging fruit any time soon.

If you have a single system with a lot of labels for every counter, you can do a lot with very naive analytics. If you don't have the data, you can't do anything with the smartest possible system. And I think almost no one is collecting data in such a manner that it's actually capitalisable, because we can keep running the network how we did in the 90s, IF-MIB and netflow, in separate systems, with no enrichment at all.

--
++ytti
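As a rough illustration of the "single system with labels on every counter" idea, the Python sketch below enriches raw samples with inventory labels and then sums the input/output delta per label. Everything in it is a made-up assumption for the example: the `INVENTORY` mapping, the device names, the field names `packets_in`/`packets_out`, and the helper names `enrich` and `loss_by_label`.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable

# Hypothetical inventory: device name -> location labels. In practice this
# would come from whatever source of truth you already run.
INVENTORY: Dict[str, Dict[str, str]] = {
    "pe1.hel": {"city": "Helsinki", "country": "FI", "region": "EU"},
    "pe2.jnb": {"city": "Johannesburg", "country": "ZA", "region": "AF"},
}


def enrich(sample: Dict[str, Any],
           inventory: Dict[str, Dict[str, str]] = INVENTORY) -> Dict[str, Any]:
    """Return a copy of the sample with the device's labels merged in."""
    labels = inventory.get(sample["device"], {})
    return {**sample, **labels}


def loss_by_label(samples: Iterable[Dict[str, Any]], label: str) -> Dict[str, int]:
    """Sum (packets_in - packets_out) per value of the chosen label."""
    totals: Dict[str, int] = defaultdict(int)
    for s in samples:
        key = str(s.get(label, "unknown"))
        totals[key] += s["packets_in"] - s["packets_out"]
    return dict(totals)
```

Once every sample carries the same labels, the same few lines answer "where am I losing packets" per NPU, per linecard, per city or per region, which is the kind of very naive analytics the message is arguing for.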
Re: [j-nsp] Automation - The Skinny (Was: Re: ACX5448 & ACX710)
> From: Robert Raszuk
> Sent: Sunday, January 26, 2020 10:18 PM
>
> Hi Adam,
>
> I would almost agree entirely with you except that there are two completely
> different reasons for automation.
>
> One as you described is related to service provisioning - here we have full
> agreement.
>
> The other one is actually of keeping your network running. Imagine router
> maintaining entire control plane perfectly fine, imagine BFD working fine to
> the box from peers but dropping between line cards via fabric from 20% to
> 80% traffic. Unfortunately this is not a theory but real world :(

Very good point Robert,
There are indeed two parts to the whole automation story (it's obvious that this theme deserves a series of blog posts, but I keep on finding excuses).

The analogy I usually use in presentations is the left brain / right brain analogy, where the left brain is responsible for logical thinking and the right brain is responsible for creative thinking and intuition.
So a complete automation solution is built similarly:
The left brain is responsible for routine automated service provisioning
- it contains models of resources, services, devices, workflows, policies
- and you can teach it by loading new/additional models.
The right brain, on the other hand, is responsible for "self-driving" the network (yeah I know, can't think of a better term)
- it collects data from the network and acts on distributed policies, and also performs trending, analytics, correlation, arbitration etc...
Now the left brain and right brain talk to each other, obviously:
policies are defined in the left brain and distributed to the right brain to act on them.
Also the right brain can trigger workflows in the left brain.

The major paradigm shift for our service designers here will be that they are now going to be responsible not only for putting the individual service building blocks together in terms of config (and service lifecycle workflow - tbd), but also in terms of policies - determining the health of the provisioned service (including thresholds, post-checks, ongoing checks etc...).
But following the MDE (Model-driven Engineering) theme, it's not just service designers contributing to the policy library, it's Ops teams, Security teams, etc...
The main advantage I see is that some of the policies that will be created for the soon-to-be-automated service certification testing could then be reused for the particular service provisioning post-test and service lifecycle monitoring, and vice versa.
Then obviously there are policies defining what to do in various DDoS scenarios (and I consider the vendor solutions actually doing analytics, correlation, arbitration all part of the left brain).

> Without proper automation in place going way above basic IGP, BGP, LDP,
> BFD etc ... you need a bit of clever automation to detect it and either alarm
> noc or if they are really smart take such router out of the SPF network wide.
> If not you sit and wait till pissed customers call - which is already a
> failure.

Then nowadays there's also the possibility to enable tons upon tons of streaming telemetry - where I could see it all landing in a common data lake, where some form of deep convolutional neural networks could be used for unsupervised pattern/feature learning - the reason being I'd like the system to tell me: look, if this counter is high and that one too and this one is low, then this usually happens. But I'd rather wait to see what the industry offers in this area than develop such solutions internally. For now I'm glad I have automation projects going; when I asked whether we should have AI in the network strategy for 2020, I got awkward silence in response.

> Sure not everyone needs to be great coder ... but having network eng with > skills sufficient enough to understand code, ability to debug it or at min > design functional blocks of the automation routines are really must have > today. > I don't know, my experience is that working in tandem with a devops person (as opposed to trying to figure it myself) gets me the desired results much faster (and in line with whatever their sys-architecture guidelines or coding principles are) while I can focus on WHAT (from the network perspective) not HOW (coding/system perspective). Although yes for some of the POC stuff I wish I had some coding skills. But to give you a concrete example from my work, when I had a choice to read some python books or some more microservice architecture books I chose the latter as it was more important for me to know the difference between for instance orchestration and choreography among other aspects of microservice architectures to assess the pros and cons of each in order to make an educated argument for the service workflow engine architecture choice - so it lines up with what I had in mind for service layer workflows flexibility/agility. > And I am not even mentioning about all of the new OEM platforms with OS > coming from completely different part of the
Re: [j-nsp] Automation - The Skinny (Was: Re: ACX5448 & ACX710)
On Mon, 27 Jan 2020 at 00:18, Robert Raszuk wrote:

> The other one is actually of keeping your network running. Imagine router
> maintaining entire control plane perfectly fine, imagine BFD working fine to
> the box from peers but dropping between line cards via fabric from 20% to 80%
> traffic. Unfortunately this is not a theory but real world :(
>
> Without proper automation in place going way above basic IGP, BGP, LDP, BFD
> etc ... you need a bit of clever automation to detect it and either alarm noc
> or if they are really smart take such router out of the SPF network wide. If
> not you sit and wait till pissed customers call - which is already a failure.

Automation and monitoring are, to me, very different subjects.

Everyone has war stories of those long-tail problems, when something utterly weird was happening in the network and how problematic it was to find. But this particular example is fairly easy: either you are polling a drop counter which shows the drops, or your 'packets in - (packets out + drops)' delta is off.

But there will always be a massive amount of long-tail risks which your NMS won't know about; things break in very creative and complex ways. And you can monitor these very carefully: you can screenscrape all NPU counters and find that your network is behaving suboptimally _right now_, you see NPU exceptions/trapstats increasing which should not, and you can spend months figuring out 1 issue out of the hundred you have, all of which are real issues, but which might affect one packet in a billion. Is it worth knowing these?

We are screenscraping and graphing all NPU counters, as these typically are not available in a GUI; in the case of JunOS they are not even modelled, because they are PFE counters. We rarely tend to them proactively, because fixing them causes more outages than letting them be. But often, when strange issues that customers care about do happen at scale, these counters reduce MTTR.

So if you think you don't have active issues, you're not monitoring well enough. When you do monitor well enough, you have to decide which issues to fix and which to let be.

--
++ytti
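The 'packets in - (packets out + drops)' check mentioned above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three inputs are per-box counter deltas taken over the same polling interval, the names `unexplained_loss_ratio` and `delta_is_off` and the 0.001 tolerance are hypothetical, and locally generated (RE) traffic is not corrected for, so the ratio can legitimately come out slightly negative.

```python
def unexplained_loss_ratio(packets_in: int, packets_out: int,
                           known_drops: int) -> float:
    """Fraction of ingress packets not explained by egress or known drop counters."""
    if packets_in <= 0:
        # Nothing entered the box in this interval, so there is nothing to lose.
        return 0.0
    unexplained = packets_in - (packets_out + known_drops)
    return unexplained / packets_in


def delta_is_off(ratio: float, baseline_ratio: float,
                 tolerance: float = 0.001) -> bool:
    """True when the unexplained ratio exceeds the usual baseline by more than tolerance."""
    return ratio - baseline_ratio > tolerance
```

The point of comparing against a baseline rather than zero is that the delta is never going to be exactly explained; what matters is noticing when it moves well away from what the box normally shows.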
Re: [j-nsp] Automation - The Skinny (Was: Re: ACX5448 & ACX710)
Hi Adam,

I would almost agree entirely with you, except that there are two completely different reasons for automation.

One, as you described, is related to service provisioning - here we have full agreement.

The other one is actually about keeping your network running. Imagine a router maintaining its entire control plane perfectly fine, imagine BFD working fine to the box from peers, but the box dropping from 20% to 80% of traffic between line cards via the fabric. Unfortunately this is not theory but the real world :(

Basic IGP, BGP, LDP, BFD etc ... will not catch this; you need a bit of clever automation going way above them to detect it and either alarm the NOC or, if they are really smart, take such a router out of the SPF network-wide. If not, you sit and wait till pissed customers call - which is already a failure.

Sure, not everyone needs to be a great coder ... but having network engineers with skills sufficient to understand code, the ability to debug it, or at minimum to design functional blocks of the automation routines is really a must-have today.

And I am not even mentioning all of the new OEM platforms with an OS coming from a completely different part of the world :) That's when the real fun starts and the rubber hits the road, when a network eng can not run gdb on a daily basis.

Cheers,
Robert.
Re: [j-nsp] Automation - The Skinny (Was: Re: ACX5448 & ACX710)
> Mark Tinka
> Sent: Friday, January 24, 2020 2:32 PM
>
> On 24/Jan/20 12:10, Saku Ytti wrote:
>
> > In my opinion we do roughly the same thing, the same way in networks,
> > with the same protocols since my start of career in 90s, very little
> > has changed and you could drop competent neteng from 90s to today and
> > they'd be immediately productive. Compare this to what has happened to
> > compute the difference is striking.
>
> Agreed - but is it really enough to the extent that the common buzz sentence
> nowadays is "Network engineers are dead, they'll all be replaced by software
> [developers]"?
>
> I mean, I'd wager that more than half of the problems you find with
> automation and tooling development is a total lack of protocol between
> software developers and network engineers; in the same company. While
> there has been plenty of success with a software developer reading a
> networking-related RFC and writing code for that without needing to
> understand, really, how IP/MPLS networks work, it's a whole other issue
> trying to teach a network engineer how to write code, or a software
> developer what IS-IS actually does.
>
You nailed it, Mark.

My opinion is that this new NetDevOps/NetOps initiative is the biggest blunder of the networking industry. If, as a network engineer/architect, you have some coding skills, well, good for you. But should programming skills be a requirement to get into network engineering/architecture nowadays? That absolutely should not be the case.

We need skilled network engineers and architects who know how to build and operate complex networks.
We need skilled developers and system architects who know how to build and operate complex systems (including network automation systems).
We need these two groups to be able to talk to each other in a constructive manner - check out Model-driven engineering to get you started.
Following these 3 simple premises you can then afford to have an army of web-UI clickers provisioning network services without knowing the first thing about what's going on in the background of the network automation system. Or not, and you just hand over the web-UI/API to your customers and have them self-service.

Imagine a case where a network engineer builds an automation solution based on a number of hacks involving ansible, python, ydk, whatever... and this solution gets traction and is used by the company. Now that poor networking guy has a full-time job supporting the automation solution, fixing bugs and developing new functionality, and you just lost one network engineer. This is a good example of jack of all trades but master of none. Even if you're a small operation or a start-up, hiring a developer and making him talk to the network engineer in a virtual team is a much better option.

> > People who think that netconf and yang are solving big problems and
> > are key to solve automation probably haven't done much automation.
>
> Totally agreed. But to also be fair, NETCONF/YANG are normally being touted
> by vendors (much like Segment Routing, 5G and SD-WAN, but I digress). I've
> not really found actual operators with anything meaningful and useful to say
> about NETCONF/YANG.
>
> Raise your hands if I'm talking nonesense.
>
> For us, we find this whole NETCONF/YANG thing to be too heavy for simple
> instructions you need to send to devices, not to mention the fact that
> support within and between vendors is questionable (FlowSpec, anyone?).
>
> I mean, that's why Ansible was so pleasing to our fingertips - all you need is
> SSH and a large-enough, repetitive problem you want to go away quickly.
>
Don't judge the book by its cover; in other words, just give NETCONF and YANG a try, seriously.
I'd say that NETCONF's biggest advantage over SNMP/CLI is its transaction mechanism, particularly atomicity and consistency ("all or nothing" and "all at once") from the full ACID set. But these are addressed by any NOS that supports two-stage commit via CLI (as Saku mentioned below), so not a biggie. Sorry Mark, XE is not one of those NOSes, but you could still get the functionality on XE using NETCONF ;)

YANG, on the other hand, gives one a common modelling language for representing service-layer configuration and network-layer configuration, which I find useful. But I'm in a minority; I guess there aren't many of you using RFC8299 & RFC8466 as bases for decomposing your L2 and L3 services and building a service abstraction layer on top of the network configuration layer, so YMMV.

> > Roughly netconf is new snmp and yang is new mib, what ever they enable
> > could have been enabled by existing protocols decades ago, the
> > advantages are modest and will remain so.
>
> Completely agreed!
>
Regarding SNMP vs NETCONF similarities: for pulling operational data yes, for pushing configuration not really... see ACID above.

> > The key enabler for
> > automation is device accepting arbitrary new B config when it is
> > running arbitrary new A config and transition
Re: [j-nsp] Automation - The Skinny (Was: Re: ACX5448 & ACX710)
On 24/Jan/20 12:10, Saku Ytti wrote:

> In my opinion we do roughly the same thing, the same way in networks,
> with the same protocols since my start of career in 90s, very little
> has changed and you could drop competent neteng from 90s to today and
> they'd be immediately productive. Compare this to what has happened to
> compute the difference is striking.

Agreed - but is it really enough to the extent that the common buzz sentence nowadays is "Network engineers are dead, they'll all be replaced by software [developers]"?

I mean, I'd wager that more than half of the problems you find with automation and tooling development come down to a total lack of protocol between software developers and network engineers in the same company. While there has been plenty of success with a software developer reading a networking-related RFC and writing code for that without needing to understand, really, how IP/MPLS networks work, it's a whole other issue trying to teach a network engineer how to write code, or a software developer what IS-IS actually does.

I can't remember if I gave this example here before, but I know of a network operator in Vienna who had to scramble to get their engineer trained on CLI. They'd been setting up peering sessions fine for 3 years via a GUI, and when the GUI and automation front-end all went to hell, that network engineer didn't know how to fall back to simple CLI to set up even simple BGP peering sessions by hand.

While clicking on GUIs is great, I don't have confidence that a network of any decent scale can be run, today, without some form of CLI jockeying. And on the back of that, do we want to kill off the basics of a network engineer in favour of Day 1 university graduates eager to click a GUI button when provisioning your backbone, who don't actually understand what the "Wide Metric" checkbox means?
> People who think that netconf and yang are solving big problems and
> are key to solve automation probably haven't done much automation.

Totally agreed. But to also be fair, NETCONF/YANG are normally being touted by vendors (much like Segment Routing, 5G and SD-WAN, but I digress). I've not really found actual operators with anything meaningful and useful to say about NETCONF/YANG.

Raise your hands if I'm talking nonsense.

For us, we find this whole NETCONF/YANG thing to be too heavy for simple instructions you need to send to devices, not to mention the fact that support within and between vendors is questionable (FlowSpec, anyone?).

I mean, that's why Ansible was so pleasing to our fingertips - all you need is SSH and a large-enough, repetitive problem you want to go away quickly.

> Roughly netconf is new snmp and yang is new mib, what ever they enable
> could have been enabled by existing protocols decades ago, the
> advantages are modest and will remain so.

Completely agreed!

> The key enabler for
> automation is device accepting arbitrary new B config when it is
> running arbitrary new A config and transition there hitlessly.
> Generating full new config from DB+template is trivial problem, trying
> to be aware of network state and move from arbitrary state A to
> arbitrary state B with minimal amount of changes is hard and
> unnecessary problem.

I tend to agree with you, Saku.

What I've heard (from the vendors, again) is that Ansible is not great because you don't inherently get state confirmation feedback after posting the new configuration, and that adding that intelligence into Ansible requires time and energy to code. Okay, fair point, I'll bite. But also, we are network engineers - we know what commands do when they run, and we've spent decades building templates ranging from something as simple as a Windows Notepad text file to something as complex as a MySQL database.

Then again, Terraform is meant to fix that downside of Ansible, but for me, I don't really see that as a big issue.
We aren't trying to provision services across network domains (despite what MEF's LSO architecture will have you believe), and even if we were, do I really want you fiddling in my network? We each know our networks better than outsiders know them, so what gives?

> If/when network becomes more cloudified, more as-a-service, where you
> use API to turn up your own active devices and circuits where you
> want, when you want, instead of owning anything and once those
> proprietary APIs get some subset standard APIs we'll probably start to
> see openstack, kubernetes type of complexity explosion in networks
> too.

That would be MEF's LSO, which they've been pushing since about 2014. The concept is sexy, but honestly, I've not heard much in 6 years about real-world deployment.

Also, while I was wild enough to be one of the first maniacs to run a network-wide Route Reflector on a VM on a server in 2014, you won't find me deploying said RRs in AWS or Azure so I can access them over some API into an Openstack/Kubernetes/Docker enclosure. Life is interesting enough as it is :-).

> But as long
Re: [j-nsp] Automation - The Skinny (Was: Re: ACX5448 & ACX710)
On Fri, 24 Jan 2020 at 10:33, Mark Tinka wrote:

> Since about 2012, every time we've felt we've come close to finding an

In my opinion we have done roughly the same thing, the same way, with the same protocols in networks since the start of my career in the 90s. Very little has changed, and you could drop a competent neteng from the 90s into today and they'd be immediately productive. Compare this to what has happened in compute and the difference is striking.

> My 1+1 assessment of all of these issues is, I believe, down to the fact
> that the industry wants to automate in an open standards manner, where

People who think that netconf and yang are solving big problems and are key to solving automation probably haven't done much automation. Roughly, netconf is the new snmp and yang is the new mib; whatever they enable could have been enabled by existing protocols decades ago. The advantages are modest and will remain so.

The key enabler for automation is the device accepting an arbitrary new config B while it is running an arbitrary config A, and transitioning there hitlessly. Generating a full new config from DB+template is a trivial problem; trying to be aware of network state and move from arbitrary state A to arbitrary state B with a minimal amount of changes is a hard and unnecessary problem.

If/when the network becomes more cloudified, more as-a-service, where you use an API to turn up your own active devices and circuits where you want, when you want, instead of owning anything, and once those proprietary APIs get some standard subset, we'll probably start to see an openstack/kubernetes type of complexity explosion in networks too. But as long as we keep owning the network, most will keep running it as a CLI jockey network: touch it when you must, but in many cases no one touches it for weeks or months.

-- 
++ytti
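To illustrate why "full new config from DB+template" counts as the trivial half of the problem, here is a minimal sketch using only the Python standard library. The peer rows stand in for a database query, and the row fields, the `PEER_TEMPLATE` name and the Junos-style `set` lines are assumptions chosen for the example rather than anything given in the thread.

```python
from string import Template

# Illustrative template: one BGP neighbor per row, Junos-style set commands.
PEER_TEMPLATE = Template(
    "set protocols bgp group $group neighbor $address peer-as $peer_as\n"
    'set protocols bgp group $group neighbor $address description "$description"\n'
)


def render_full_config(peers):
    """Render the complete BGP peer config from rows (dicts), every time.

    No diffing against the running config is attempted; the output is always
    the full intended state, in a stable order so renders are comparable.
    """
    ordered = sorted(peers, key=lambda row: row["address"])
    return "".join(PEER_TEMPLATE.substitute(row) for row in ordered)


if __name__ == "__main__":
    rows = [  # stand-in for a database query result
        {"group": "PEERS", "address": "192.0.2.2", "peer_as": 64497,
         "description": "peer-b"},
        {"group": "PEERS", "address": "192.0.2.1", "peer_as": 64496,
         "description": "peer-a"},
    ]
    print(render_full_config(rows), end="")
```

The generator never looks at what the device is currently running. The hard part, moving from config A to config B hitlessly, is left to the device, which is the division of labour argued for above.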