
Re: [j-nsp] Automation - The Skinny (Was: Re: ACX5448 & ACX710)

2020-02-02 Thread Mark Tinka




On 28/Jan/20 09:45, Saku Ytti wrote:


> We should learn to crawl before we take rocket to proxima centauri.

Agreed!


>
> You don't need ML/AI to find problems in your network, using algorithm
> 'this counter which increments at rate X stopped incrementing or
> started to increment 100 times slower' and 'this counter which does
> not increment, started to increment', and you'll find a lot of
> problems in your network. But do you care about every problem in your
> network, or only problems that customers care about?

Agreed!


>
> Juniper once in EBC had some really smart academics explaining to us
> their ML/AI project which predicts resource needs on a given system.
> They quoted how close they got to real numbers, then I asked how it
> performs against a naive system, after explaining that by naive system
> I mean a system like 'my box has 1M FIB entries so a FIB entry uses
> RLDRAM/1M' to extrapolate FIB usage in an arbitrary config. They hadn't
> tried this and couldn't tell how well the ML/AI performs against this.
>
> Can you really train today ML/AI to determine what actually matters? I
> don't think you can, because what actually matters is something that
> impacted customer, and you simply cannot put enough learning data in,
> you don't have nearly enough customer trouble tickets to be able to
> correlate them to network data you're collecting and start predicting
> which complex counter combinations are predicting customer ticket
> later.

Agreed!

>
> But are you at least monitoring how many packets are lost inside your
> network? Delta of input/output? That is a fairly trivial way to cover
> _all reasons for packet loss_; of course latency/jitter are not covered,
> but still, it covers a lot of ground fast. Do you have a single system
> where you collect all data? Have you enriched the data with stuff like
> npu, linecard, city, country, region? Almost no one is doing even very
> basic stuff, so I think ML/AI isn't going to be the low-hanging fruit
> any time soon. If you have a single system with a lot of labels for
> every counter, you can do a lot with very naive analytics. If you
> don't have the data, you can't do anything with the smartest possible
> system. And I think almost no one is collecting data in such a manner
> that it's actually capitalisable, because we can keep running the
> network the way we did in the 90s, IF-MIB and netflow, in separate
> systems, with no enrichment at all.

Agreed!

For us, Iris (a South African-written NMS), Kentik and Blue Planet ROA
(formerly Packet Design) between them give us plenty of insight into what
our network is doing, what it did, and what it may do. We pay Iris for
their NMS, and this gives us quite a bit of flexibility in what we can
monitor and alert on, provided there is a way we can get the data off the
box.

We don't believe spending too much time and effort in building ML/AI
engines will solve a real problem in our specific network, today.
Tomorrow, maybe.

I'd rather spend time upgrading Iris to support telemetry streaming, as
this has immediate and tangible benefits.

Mark.

___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp

Re: [j-nsp] Automation - The Skinny (Was: Re: ACX5448 & ACX710)

2020-01-28 Thread Mark Tinka




On 27/Jan/20 22:30, adamv0...@netconsultings.com wrote:

> Very good point Robert,
> There are indeed two parts to the whole automation story 
> (it's obvious that this theme deserves a series of blog posts, but I keep on 
> finding excuses).
>
> The analogy I usually use in presentations is the left brain right brain 
> analogy,
> Where left brain is responsible for logical thinking  and right brain is 
> responsible for creative thinking and intuition.
> So a complete automation solution is built similarly:
> Left brain is responsible for routine automated service provisioning 
> - and contains models of resources, services, devices, workflows, policies 
> -and you can teach it by loading new/additional models.
> Right brain on the other hand is responsible for "self-driving" the network 
> (yeah I know can't think of better term)
> - and collects data from network and acts on distributed policies, and also 
> performs trending, analytics, correlation, arbitration etc...  
> Now left brain and right brain talk to each other obviously,
> Policies are defined in left brain and distributed to right brain to act on 
> them.
> Also right brain can trigger workflows in left brain. 
>
> Major paradigm shift for our service designers here will be that they are now 
> going to be responsible not only for putting the individual service building 
> blocks together in term of config (and service lifecycle workflow -tbd), but 
> also in terms of policies - determining the health of the provisioned service 
> (including thresholds, post-checks, ongoing checks etc...)
> But following the MDE (Model driven Engineering) theme it's not just service 
> designers contributing to the policy library, it's Ops teams, Security teams, 
> etc...
> The main advantage I see is that some of the policies that will be created for 
> the soon to be automated service certification testing could then be reused 
> for the particular service provisioning post-test and service lifecycle 
> monitoring and vice versa.
> Then obviously there are policies defining what to do in various DDoS 
> scenarios, and I consider the vendor solutions actually doing analytics, 
> correlation, arbitration all part of the left brain.

Not to sound silly, but you're taking me back to where I was right
around the time Cisco decided to pick up Tail-f :-).


> Then nowadays there's also the possibility to enable tons upon tons of 
> streaming telemetry -where I could see it all landing in a common data lake 
> where some form of deep convolutional neural networks could be used for 
> unsupervised pattern/feature learning, -reason being I'd like the system to 
> tell me look if this counter is high and that one too and this is low then 
> this usually happens. But I'd rather wait to see what the industry offers in 
> this area than developing such solutions internally.

For me, this makes a lot of sense. I'm happy to support standardization
of telemetry streaming (box vendor) and decoding (NMS vendor) because
that enhances the NMS capabilities, which takes away a lot of the
corner-case issues Saku highlighted (well, that's the hope).


>  For now I'm glad I have automation projects going, when I asked whether we 
> should have AI in network strategy for 2020 I got awkward silence in 
> response. 

I'm not even going to touch the ML/AI pole :-).


> I don't know, my experience is that working in tandem with a devops person 
> (as opposed to trying to figure it myself) gets me the desired results much 
> faster (and in line with whatever their sys-architecture guidelines or coding 
> principles are) while I can focus on WHAT (from the network perspective) not 
> HOW (coding/system perspective). Although yes for some of the POC stuff I 
> wish I had some coding skills. 
> But to give you a concrete example from my work, when I had a choice to read 
> some python books or some more microservice architecture books I chose the 
> latter as it was more important for me to know the difference between for 
> instance orchestration and choreography among other aspects of microservice 
> architectures to assess the pros and cons of each in order to make an 
> educated argument for the service workflow engine architecture choice - so it 
> lines up with what I had in mind for service layer workflows 
> flexibility/agility.  

Totally agree with you there.

Network engineers should stop feeling the pressure about needing to
become software heads. I can tell you now, given where we are with getting
software to solve our operational problems, network engineers are
going to be in huge demand. Let's just not ruin the pot by teaching them
"GUI is the absolute answer" the moment they leave the university gates
and enter the real world. It's been hard enough disabusing them of
"Class A, Class B, Class C".


> Well we are starting to get a glimpse of it already with VM of a 
> Route-Reflector running on a server - who owns the host (HW & SW) is it 
> sys-admins or ip-ops, which Mark

Re: [j-nsp] Automation - The Skinny (Was: Re: ACX5448 & ACX710)

2020-01-28 Thread Mark Tinka

On 27/Jan/20 00:18, Robert Raszuk wrote:

>
> Without proper automation in place going way above basic IGP, BGP,
> LDP, BFD etc ... you need a bit of clever automation to detect it and
> either alarm noc or if they are really smart take such router out of
> the SPF network wide. If not you sit and wait till pissed customers
> call - which is already a failure.

So we use Packet Design (now Ciena Blue Planet) for stuff like this.
Works like a treat :-).

We also have a dear NMS that can be told what to look for and notify the
team.

If NOCs are sitting and waiting for customers to get pissed off,
automation is going to make your problems worse, not better.

Mark.

Re: [j-nsp] Automation - The Skinny (Was: Re: ACX5448 & ACX710)

2020-01-28 Thread Mark Tinka

On 26/Jan/20 22:46, adamv0...@netconsultings.com wrote:

> You nailed it Mark,
> My opinion is that this new NetDevOps/NetOps initiative is the biggest
> blunder of the networking industry.
> If as a network engineer/architect you have some coding skills well good for
> you,
> But are programming skills a requirement to get into network
> engineering/architecture nowadays - that absolutely should not be the case. 
> We need skilled network engineers and architects to know how to build and
> operate complex networks 
> We need skilled developers and system architects to know how to build and
> operate complex systems (including network automation systems)
> We need these two groups to be able to talk to each other in a constructive
> manner - check out Model-driven engineering to get you started.

Totally agreed.

> Following these 3 simple premises you can then afford to have an army of
> web-ui clickers - provisioning network services not knowing the first thing
> about what's going on in the background of the network automation system. Or
> not, and you just hand over the web-ui/API to your customers and have them
> self-service.  

Which is fine for me, as long as there are still some real network
engineers at the company that can make sure the network runs when the
automation system bombs out.

>
> Imagine a case where network engineer builds an automation solution based on
> number of hacks involving ansible, python, ydk, whatever... and this
> solution gets traction and is used by the company.
> Now that poor networking guy has a full-time job supporting the automation
> solution, fixing bugs, developing new functionality and you just lost one
> network engineer. This is a good example of jack of all trades but master of
> none.
> Even if you're a small operation or a start-up hiring a developer and make
> him talk to the network engineer in a virtual team is a much better option.

Agreed - but we, generally, have to start from somewhere. And if you are
going to start small with Ansible, I've found that network engineers
with limited sysadmin experience will do that better than trying to get
the sysadmin to stand up Ansible and get it to talk to routers. Over
time, if it's successful, you can farm out to the software team the bits
of Ansible that best suit their skills.

I'm not for the idea that network engineers are obsolete and the only
way to run an IP/MPLS network, over the next decade, is by giving it to
the software heads.

> Don't judge the book by its cover, in other words just give NETCONF and YANG
> a try, seriously.
>
> I'd say that NETCONF's biggest advantage over SNMP/CLI is its transaction
> mechanism, particularly atomicity and consistency ("all or nothing" and "all
> at once") from the full ACID, but all these are addressed by all NOS-es
> supporting two-stage commit via CLI (as Saku mentioned below), so not a
> biggie. 
> Sorry Mark XE is not one of those NOS-es, but you could still get the
> functionality on XE using NETCONF ;)  
>
> YANG on the other hand gives one a common modelling language for
> representing services layer configuration and network layer configuration,
> which I find useful.
> But I'm a minority, I guess there aren't many of you using RFC8299 & RFC8466
> as bases for decomposing your L2 and L3 services and building a service
> abstraction layer, on top of network configuration layer, so YMMV.

I believed a lot in NETCONF/YANG in the middle of this past decade, and
actually insisted solutions support it. But as I've said before, over
time, you get to learn how to sniff the smell in the air, and while a
lot of those solutions paint a good picture, for our particular
use-case, it just felt like a whole lot of complexity for the 2 simple
things we wanted to achieve first - customer service
provisioning/de-provisioning + network deployment.

Which is not to say that NETCONF/YANG have no use-case. Down the line,
if what we are doing with Ansible becomes complex enough to require
models based on NETCONF/YANG, happy to consider. But to get off the
ground, I feel they are too heavy. I don't want to design overly
elaborate data models - we know what it takes to deploy or run a device;
we've been doing it for years. That doesn't change regardless of whether
it's being done by humans or in a semi- or fully hands-off approach.

The last 10 years have been bogged down by trying to figure out how much
automation to do, when to do it and how. In the end, we are still in the
same place, and even more confused. If starting off slow with Ansible is
okay for me for the next decade, I'm good with that.

> I'd say there are different levels of automation, 
> At the entry level the aim might be just faster CLI interaction/simpler CLI
> scraping (Ansible and the like) and at the other extreme there's the full
> potential of model based engineering realized with frameworks like ONAP. 
> And operators naturally find themselves somewhere between these two poles in
> their automation efforts

Re: [j-nsp] Automation - The Skinny (Was: Re: ACX5448 & ACX710)

2020-01-28 Thread Saku Ytti

On Tue, 28 Jan 2020 at 11:19, Robert Raszuk  wrote:

> So at t0+N I record how many packets entered my system. (We are already at 
> loss here as RE can generate packets unless you add to this RE outbound 
> packets). Then at t0+N+uS (uS) delta of switching via fabric you record 
> number of packets which left the box.
>
> What is your N and uS ?

You're not gonna get it 1:1; you monitor the delta rate you see
and react when the delta rate increases. Of course you can keep tuning
this by adding more and more drop counters to reduce the known delta
rate, but you always have to accept that you can't explain it perfectly.
But not every small issue is an important issue. Certainly your fabric
losing 30% would have been blatantly obvious even in the most naive such
system.
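
A minimal sketch of the delta-rate idea described above, assuming you can
sum input, output and known drop counters per box at each polling interval.
The names `Sample`, `unexplained_ratio` and `delta_alarm`, and the default
tolerance, are illustrative assumptions and not from any particular NMS.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One poll of a box; all fields are hypothetical aggregate counters."""
    packets_in: int    # sum of input packet counters across all interfaces
    packets_out: int   # sum of output packet counters across all interfaces
    known_drops: int   # sum of whatever drop counters the platform exposes


def unexplained_ratio(prev: Sample, curr: Sample) -> float:
    """Fraction of packets received in the interval that neither left the box
    nor showed up in a known drop counter. It will never be exactly zero
    (RE-originated packets, punted traffic, counter timing skew)."""
    d_in = curr.packets_in - prev.packets_in
    d_out = curr.packets_out - prev.packets_out
    d_drop = curr.known_drops - prev.known_drops
    if d_in <= 0:
        return 0.0
    return (d_in - d_out - d_drop) / d_in


def delta_alarm(ratio: float, baseline: float, tolerance: float = 0.01) -> bool:
    """React only when the unexplained ratio rises clearly above the baseline
    normally seen on this box; the tolerance is an assumed tuning knob."""
    return ratio > baseline + tolerance
```

With a baseline of 0.1% unexplained loss, a box that suddenly loses 30%
through the fabric trips the alarm immediately, while normal noise does not.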

> > You don't need ML/AI to find problems in your network, using algorithm
> > 'this counter which increments at rate X stopped incrementing or
> > started to increment 100 times slower'
>
> Well the way I read Adam's note was that learning this rate X is what he 
> (IMHO correctly) calls ML :)

What I mean is: the current rate is X; alert if now_rate > X*100 or
now_rate < X/100. No ML, just a stupid static comparison of dramatic
rate change. And even this is advanced by today's standards. Even
'counter rate went to 0 from non-zero' or 'went to non-zero from 0'
exposes a lot of real issues, but issues which happen so rarely that
customers are not complaining about them.
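
The static comparison above can be sketched in a few lines. This is only an
illustration; the function name `classify_rate_change` and the returned
strings are assumptions, and the factor of 100 is simply the figure used in
the text.

```python
from typing import Optional


def classify_rate_change(baseline: float, current: float,
                         factor: float = 100.0) -> Optional[str]:
    """Compare a counter's current rate against its previous (baseline) rate.

    Returns a short reason string when the change is dramatic, else None.
    No learning involved: just the zero/non-zero transitions and a fixed
    multiplicative threshold.
    """
    if baseline == 0 and current > 0:
        return "started incrementing"
    if baseline > 0 and current == 0:
        return "stopped incrementing"
    if baseline > 0 and current > baseline * factor:
        return "rate jumped"
    if baseline > 0 and current < baseline / factor:
        return "rate collapsed"
    return None
```

For example, a counter that usually runs at 1000/s and drops to 5/s is
flagged, while one that drifts to 900/s is not.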

A particular example: all of us have some IP checksum errors in the
network. When it's on an edge router, edge interface, ingress direction,
you can ignore it as 'someone else's problem', but we also see it on other
interfaces/directions, where it means we flipped bits somewhere and
calculated a correct FCS over the broken data, i.e. we have broken
memory somewhere. But it probably isn't broken enough to matter; it
probably mangles packets rather rarely.
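
As a rough illustration of that triage rule, assuming each interface is
labelled with a role and each counter with a direction (the labels "edge",
"core", "ingress" and "egress" and the function name are hypothetical):

```python
def checksum_errors_need_attention(role: str, direction: str,
                                   error_delta: int) -> bool:
    """Decide whether new IP checksum errors point at our own equipment.

    Errors arriving on an edge interface in the ingress direction were
    corrupted before they reached us, so they are ignored here. Anywhere
    else, a rising counter suggests bits were flipped inside the network.
    """
    if error_delta <= 0:
        return False  # counter did not increment in this interval
    return not (role == "edge" and direction == "ingress")
```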

-- 
  ++ytti

Re: [j-nsp] Automation - The Skinny (Was: Re: ACX5448 & ACX710)

2020-01-28 Thread Robert Raszuk

> And I think almost no one is collecting data in such a manner
> that it's actually capitalisable, because we can keep running the
> network the way we did in the 90s, IF-MIB and netflow, in separate
> systems, with no enrichment at all.

Spot on !

Btw Saku - you keep suggesting measuring delta of input/output ... well to
do it well I am afraid it is not trivial.

So at t0+N I record how many packets entered my system. (We are already at
loss here as RE can generate packets unless you add to this RE outbound
packets). Then at t0+N+uS (uS) delta of switching via fabric you record
number of packets which left the box.

What is your N and uS ?

Do you subtract BFD packets which enter and leave on the same line card
both ingress and egress ?

Monitoring drops is much easier if we are dealing with platforms which are
honest in recording them.

> You don't need ML/AI to find problems in your network, using algorithm
> 'this counter which increments at rate X stopped incrementing or
> started to increment 100 times slower'

Well the way I read Adam's note was that learning this rate X is what he
(IMHO correctly) calls ML :)

Cheers,
R.






Re: [j-nsp] Automation - The Skinny (Was: Re: ACX5448 & ACX710)

2020-01-27 Thread Saku Ytti

On Mon, 27 Jan 2020 at 22:30,  wrote:

> Then nowadays there's also the possibility to enable tons upon tons of 
> streaming telemetry -where I could see it all landing in a common data lake 
> where some form of deep convolutional neural networks could be used for 
> unsupervised pattern/feature learning, -reason being I'd like the system to 
> tell me look if this counter is high and that one too and this is low then 
> this usually happens. But I'd rather wait to see what the industry offers in 
> this area than developing such solutions internally. For now I'm glad I have 
> automation projects going, when I asked whether we should have AI in network 
> strategy for 2020 I got awkward silence in response.

We should learn to crawl before we take rocket to proxima centauri.

You don't need ML/AI to find problems in your network, using algorithm
'this counter which increments at rate X stopped incrementing or
started to increment 100 times slower' and 'this counter which does
not increment, started to increment', and you'll find a lot of
problems in your network. But do you care about every problem in your
network, or only problems that customers care about?

Juniper once in EBC had some really smart academics explaining to us
their ML/AI project which predicts resource needs on a given system.
They quoted how close they got to real numbers, then I asked how it
performs against a naive system, after explaining that by naive system
I mean a system like 'my box has 1M FIB entries so a FIB entry uses
RLDRAM/1M' to extrapolate FIB usage in an arbitrary config. They hadn't
tried this and couldn't tell how well the ML/AI performs against this.
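
The naive baseline mentioned above amounts to a linear extrapolation. A
minimal sketch, assuming memory use scales linearly with FIB entry count
(the function names and any byte figures you feed in are hypothetical):

```python
def naive_fib_memory(used_bytes: float, current_entries: int,
                     target_entries: int) -> float:
    """Estimate memory needed for target_entries by assuming every FIB entry
    costs used_bytes / current_entries. No model, no training."""
    if current_entries <= 0:
        raise ValueError("current_entries must be positive")
    per_entry = used_bytes / current_entries
    return per_entry * target_entries


def relative_error(predicted: float, actual: float) -> float:
    """Error metric for comparing any predictor (naive or ML) to reality."""
    return abs(predicted - actual) / actual
```

The point of the second function is the comparison the academics had not
made: score the ML/AI prediction and the naive estimate with the same metric
before concluding the complex system is better.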

Can you really train today ML/AI to determine what actually matters? I
don't think you can, because what actually matters is something that
impacted customer, and you simply cannot put enough learning data in,
you don't have nearly enough customer trouble tickets to be able to
correlate them to network data you're collecting and start predicting
which complex counter combinations are predicting customer ticket
later.

But are you at least monitoring how many packets are lost inside your
network? Delta of input/output? That is a fairly trivial way to cover
_all reasons for packet loss_; of course latency/jitter are not covered,
but still, it covers a lot of ground fast. Do you have a single system
where you collect all data? Have you enriched the data with stuff like
npu, linecard, city, country, region? Almost no one is doing even very
basic stuff, so I think ML/AI isn't going to be the low-hanging fruit
any time soon. If you have a single system with a lot of labels for
every counter, you can do a lot with very naive analytics. If you
don't have the data, you can't do anything with the smartest possible
system. And I think almost no one is collecting data in such a manner
that it's actually capitalisable, because we can keep running the
network the way we did in the 90s, IF-MIB and netflow, in separate
systems, with no enrichment at all.
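
To illustrate what enrichment plus very naive analytics could look like,
here is a small sketch that joins raw counter samples with inventory labels
and sums a metric per label. The record layout, label names and helper names
are assumptions for illustration only.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Key = Tuple[str, str]  # (device, interface)


def enrich(samples: Iterable[dict], inventory: Dict[Key, dict]) -> List[dict]:
    """Attach labels such as npu, linecard, city, country, region to each
    counter sample. Samples with no inventory entry are kept unlabelled."""
    enriched = []
    for sample in samples:
        labels = inventory.get((sample["device"], sample["interface"]), {})
        enriched.append({**sample, **labels})
    return enriched


def sum_by(rows: Iterable[dict], label: str, metric: str) -> Dict[str, int]:
    """Naive analytics: total a metric per label value."""
    totals: Dict[str, int] = defaultdict(int)
    for row in rows:
        totals[row.get(label, "unknown")] += row[metric]
    return dict(totals)
```

Once every counter carries labels, questions like "are drops concentrated on
one linecard or one city?" become a single group-by rather than a project.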

-- 
  ++ytti

Re: [j-nsp] Automation - The Skinny (Was: Re: ACX5448 & ACX710)

2020-01-27 Thread adamv0025

> From: Robert Raszuk 
> Sent: Sunday, January 26, 2020 10:18 PM
> 
> Hi Adam,
> 
> I would almost agree entirely with you except that there are two completely
> different reasons for automation.
> 
> One as you described is related to service provisioning - here we have full
> agreement.
> 
> The other one is actually of keeping your network running. Imagine router
> maintaining entire control plane perfectly fine, imagine BFD working fine to
> the box from peers but dropping between line cards via fabric from 20% to
> 80% traffic. Unfortunately this is not a theory but real world :(
> 
Very good point Robert,
There are indeed two parts to the whole automation story 
(it's obvious that this theme deserves a series of blog posts, but I keep on 
finding excuses).

The analogy I usually use in presentations is the left brain right brain 
analogy,
Where left brain is responsible for logical thinking  and right brain is 
responsible for creative thinking and intuition.
So a complete automation solution is built similarly:
Left brain is responsible for routine automated service provisioning 
- and contains models of resources, services, devices, workflows, policies -and 
you can teach it by loading new/additional models.
Right brain on the other hand is responsible for "self-driving" the network 
(yeah I know can't think of better term)
- and collects data from network and acts on distributed policies, and also 
performs trending, analytics, correlation, arbitration etc...  
Now left brain and right brain talk to each other obviously,
Policies are defined in left brain and distributed to right brain to act on 
them.
Also right brain can trigger workflows in left brain. 
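
A toy sketch of the split described above, purely to make the data flow
concrete: policies are defined in the left brain, distributed to the right
brain, and the right brain triggers left-brain workflows when telemetry
breaches a policy. All class, field, metric and workflow names here are
hypothetical, and a real system would be far richer than one threshold.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Policy:
    name: str
    metric: str       # telemetry metric the policy watches
    threshold: float  # breach when the metric exceeds this value
    workflow: str     # workflow to trigger in the left brain on breach


@dataclass
class LeftBrain:
    """Holds models and policies, and runs provisioning-style workflows."""
    policies: List[Policy] = field(default_factory=list)
    triggered: List[str] = field(default_factory=list)

    def run_workflow(self, workflow: str, context: str) -> None:
        # Stand-in for a real workflow engine: just record the request.
        self.triggered.append(f"{workflow}:{context}")


@dataclass
class RightBrain:
    """Collects network data and acts on the distributed policies."""
    left: LeftBrain
    policies: List[Policy] = field(default_factory=list)

    def load_policies(self) -> None:
        # Policies are defined in the left brain and copied here.
        self.policies = list(self.left.policies)

    def observe(self, device: str, metrics: Dict[str, float]) -> None:
        for policy in self.policies:
            if metrics.get(policy.metric, 0.0) > policy.threshold:
                self.left.run_workflow(policy.workflow, device)
```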

Major paradigm shift for our service designers here will be that they are now 
going to be responsible not only for putting the individual service building 
blocks together in term of config (and service lifecycle workflow -tbd), but 
also in terms of policies - determining the health of the provisioned service 
(including thresholds, post-checks, ongoing checks etc...)
But following the MDE (Model driven Engineering) theme it's not just service 
designers contributing to the policy library, it's Ops teams, Security teams, 
etc...
The main advantage I see is that some of the policies that will be created for the 
soon to be automated service certification testing could then be reused for the 
particular service provisioning post-test and service lifecycle monitoring and 
vice versa.
Then obviously there are policies defining what to do in various DDoS 
scenarios, and I consider the vendor solutions actually doing analytics, 
correlation, arbitration all part of the left brain.
 
> Without proper automation in place going way above basic IGP, BGP, LDP,
> BFD etc ... you need a bit of clever automation to detect it and either alarm
> noc or if they are really smart take such router out of the SPF network wide.
> If not you sit and wait till pissed customers call - which is already a 
> failure.
> 
Then nowadays there's also the possibility to enable tons upon tons of 
streaming telemetry -where I could see it all landing in a common data lake 
where some form of deep convolutional neural networks could be used for 
unsupervised pattern/feature learning, -reason being I'd like the system to 
tell me look if this counter is high and that one too and this is low then this 
usually happens. But I'd rather wait to see what the industry offers in this 
area than developing such solutions internally. For now I'm glad I have 
automation projects going, when I asked whether we should have AI in network 
strategy for 2020 I got awkward silence in response. 


> Sure, not everyone needs to be a great coder ... but having network engineers
> with skills sufficient to understand code, the ability to debug it or at
> minimum design functional blocks of the automation routines is really a
> must-have today.
> 
I don't know, my experience is that working in tandem with a devops person (as 
opposed to trying to figure it myself) gets me the desired results much faster 
(and in line with whatever their sys-architecture guidelines or coding 
principles are) while I can focus on WHAT (from the network perspective) not 
HOW (coding/system perspective). Although yes for some of the POC stuff I wish 
I had some coding skills. 
But to give you a concrete example from my work, when I had a choice to read 
some python books or some more microservice architecture books I chose the 
latter as it was more important for me to know the difference between for 
instance orchestration and choreography among other aspects of microservice 
architectures to assess the pros and cons of each in order to make an educated 
argument for the service workflow engine architecture choice - so it lines up 
with what I had in mind for service layer workflows flexibility/agility.  

> And I am not even mentioning about all of the new OEM platforms with OS
> coming from completely different part of the

Re: [j-nsp] Automation - The Skinny (Was: Re: ACX5448 & ACX710)

2020-01-27 Thread Saku Ytti

On Mon, 27 Jan 2020 at 00:18, Robert Raszuk  wrote:

> The other one is actually of keeping your network running. Imagine router 
> maintaining entire control plane perfectly fine, imagine BFD working fine to 
> the box from peers but dropping between line cards via fabric from 20% to 80% 
> traffic. Unfortunately this is not a theory but real world :(
>
> Without proper automation in place going way above basic IGP, BGP, LDP, BFD 
> etc ... you need a bit of clever automation to detect it and either alarm noc 
> or if they are really smart take such router out of the SPF network wide. If 
> not you sit and wait till pissed customers call - which is already a failure.

Automation and monitoring are, to me, very different subjects.
Everyone has war stories of those long tail problems when something
utterly weird is happening in the network and how problematic it was
to find. But this particular example is fairly easy: either you are
polling a drop counter which shows the drops, or your 'packets in -
packets out - drops' delta is off.
But there will always be a massive amount of long tail risks which your
NMS won't know about; things break in very creative and complex
ways. And you can monitor these very carefully: you can screenscrape
all NPU counters and find that your network is behaving suboptimally
_right now_, you see NPU exceptions/trapstats increasing which should
not, and you can spend months figuring out 1 issue out of the hundred you
have, all of which are real issues, but which might affect one packet
in a billion.
Is it worth knowing these? We are screenscraping and graphing all NPU
counters, as these typically are not available in the GUI; in the case of
JunOS they are not even modelled, because they are PFE counters. We rarely
proactively tend to them, because fixing them causes more outages than
letting them be. But often, when strange issues which customers care
about do happen at scale, these counters reduce MTTR.
So if you think you don't have active issues, you're not monitoring
well enough. When you do monitor well enough, you have to decide which
issues to fix and which to let be.

-- 
  ++ytti

Re: [j-nsp] Automation - The Skinny (Was: Re: ACX5448 & ACX710)

2020-01-26 Thread Robert Raszuk

Hi Adam,

I would almost agree entirely with you except that there are two completely
different reasons for automation.

One as you described is related to service provisioning - here we have full
agreement.

The other one is actually about keeping your network running. Imagine a
router maintaining its entire control plane perfectly fine, with BFD to
the box from its peers working fine, but dropping 20% to 80% of the
traffic that crosses the fabric between line cards. Unfortunately this is
not theory but the real world :(

Basic IGP, BGP, LDP, BFD etc. will not catch this. You need a bit of
clever automation, going well beyond those, to detect it and either alarm
the NOC or, if they are really smart, take such a router out of the SPF
network wide. If not, you sit and wait till pissed customers call - which
is already a failure.
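
As a rough sketch of the "alarm or drain" decision described above,
assuming some probe already measures loss between line-card pairs: the
function name, the path keys and both thresholds are hypothetical, and
how the drain is actually carried out (for example by setting the IS-IS
overload bit) is left out.

```python
from typing import Dict, Tuple


def choose_action(loss_by_path: Dict[Tuple[str, str], float],
                  alarm_threshold: float = 0.01,
                  drain_threshold: float = 0.20) -> str:
    """Return "ok", "alarm" or "drain" based on the worst measured loss.

    loss_by_path maps a (source line card, destination line card) pair
    to the fraction of probe packets lost on that path, 0.0 to 1.0.
    """
    worst = max(loss_by_path.values(), default=0.0)
    if worst >= drain_threshold:
        return "drain"   # take the router out of the forwarding path
    if worst >= alarm_threshold:
        return "alarm"   # let the NOC decide
    return "ok"
```

For example, choose_action({("fpc0", "fpc1"): 0.5}) returns "drain",
while a 5% loss on the same path only raises an alarm.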

Sure, not everyone needs to be a great coder ... but having network
engineers with skills sufficient to understand code, the ability to debug
it, or at a minimum to design the functional blocks of the automation
routines, is really a must-have today.

And I am not even mentioning all of the new OEM platforms with an OS
coming from a completely different part of the world :) That's when the
real fun starts and the rubber hits the road, when a network eng can not
run gdb on a daily basis.

Cheers,
Robert.

Re: [j-nsp] Automation - The Skinny (Was: Re: ACX5448 & ACX710)

2020-01-26 Thread adamv0025

> Mark Tinka
> Sent: Friday, January 24, 2020 2:32 PM
> 
> On 24/Jan/20 12:10, Saku Ytti wrote:
> 
> > In my opinion we do roughly the same thing, the same way in networks,
> > with the same protocols since my start of career in 90s, very little
> > has changed and you could drop competent neteng from 90s to today and
> > they'd be immediately productive. Compare this to what has happened to
> > compute the difference is striking.
> 
> Agreed - but is it really enough to the extent that the common buzz
> sentence nowadays is "Network engineers are dead, they'll all be
> replaced by software [developers]"?
> 
> I mean, I'd wager that more than half of the problems you find with
> automation and tooling development is a total lack of protocol between
> software developers and network engineers; in the same company. While
> there has been plenty of success with a software developer reading a
> networking-related RFC and writing code for that without needing to
> understand, really, how IP/MPLS networks work, it's a whole other issue
> trying to teach a network engineer how to write code, or a software
> developer what IS-IS actually does.
> 
You nailed it Mark,
My opinion is that this new NetDevOps/NetOps initiative is the biggest
blunder of the networking industry.
If as a network engineer/architect you have some coding skills, well,
good for you.
But should programming skills be a requirement to get into network
engineering/architecture nowadays? That absolutely should not be the case.
We need skilled network engineers and architects who know how to build and
operate complex networks.
We need skilled developers and system architects who know how to build and
operate complex systems (including network automation systems).
We need these two groups to be able to talk to each other in a constructive
manner - check out Model-driven engineering to get you started.
Following these 3 simple premises you can then afford to have an army of
web-ui clickers provisioning network services without knowing the first
thing about what's going on in the background of the network automation
system. Or not, and you just hand over the web-ui/API to your customers and
have them self-service.

Imagine a case where a network engineer builds an automation solution based
on a number of hacks involving ansible, python, ydk, whatever... and this
solution gets traction and is used by the company.
Now that poor networking guy has a full-time job supporting the automation
solution, fixing bugs and developing new functionality, and you just lost
one network engineer. This is a good example of a jack of all trades but
master of none.
Even if you're a small operation or a start-up, hiring a developer and
making him talk to the network engineer in a virtual team is a much better
option.



> > People who think that netconf and yang are solving big problems and
> > are key to solve automation probably haven't done much automation.
> 
> Totally agreed. But to also be fair, NETCONF/YANG are normally being
> touted by vendors (much like Segment Routing, 5G and SD-WAN, but I
> digress). I've not really found actual operators with anything
> meaningful and useful to say about NETCONF/YANG.
> 
> Raise your hands if I'm talking nonsense.
> 
> For us, we find this whole NETCONF/YANG thing to be too heavy for simple
> instructions you need to send to devices, not to mention the fact that
> support within and between vendors is questionable (FlowSpec, anyone?).
> 
> I mean, that's why Ansible was so pleasing to our fingertips - all you
> need is SSH and a large-enough, repetitive problem you want to go away
> quickly.
> 
Don't judge the book by its cover, in other words just give NETCONF and YANG
a try, seriously.

I'd say that NETCONF's biggest advantage over SNMP/CLI is its transaction
mechanism, particularly atomicity and consistency ("all or nothing" and
"all at once") out of the full ACID set, but these are also provided by all
NOS-es supporting two-stage commit via CLI (as Saku mentioned below), so
not a biggie.
Sorry Mark, XE is not one of those NOS-es, but you could still get the
functionality on XE using NETCONF ;)
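
To make the "all or nothing" point concrete, here is a toy Python model
of a two-stage commit. It is not NETCONF or any vendor's
implementation; the CandidateConfig class, its methods and the
validation callback are all made up for illustration.

```python
import copy
from typing import Any, Callable, Dict


class CandidateConfig:
    """Toy two-stage commit: edits are staged in a candidate copy and
    reach the running config together, or not at all."""

    def __init__(self, running: Dict[str, Any]):
        self.running = copy.deepcopy(running)
        self.candidate = copy.deepcopy(running)

    def set(self, key: str, value: Any) -> None:
        # Staged only; the running config is untouched until commit().
        self.candidate[key] = value

    def commit(self, validate: Callable[[Dict[str, Any]], bool]) -> bool:
        """Apply every staged change at once if validation passes,
        otherwise discard all of them."""
        if validate(self.candidate):
            self.running = copy.deepcopy(self.candidate)
            return True
        self.candidate = copy.deepcopy(self.running)
        return False
```

With SNMP sets or line-by-line CLI, a failure halfway through leaves
the device in a mixed state; here a failed validation leaves the
running config exactly as it was.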

YANG on the other hand gives one a common modelling language for
representing services layer configuration and network layer configuration,
which I find useful.
But I'm in a minority; I guess there aren't many of you using RFC8299 &
RFC8466 as the basis for decomposing your L2 and L3 services and building a
service abstraction layer on top of the network configuration layer, so
YMMV.

> > Roughly netconf is new snmp and yang is new mib, what ever they enable
> > could have been enabled by existing protocols decades ago, the
> > advantages are modest and will remain so.
> 
> Completely agreed!
> 
Regarding SNMP vs NETCONF similarities, 
For pulling operational data yes, for pushing configuration not really...
see ACID above.  


> >  The key enabler for
> > automation is device accepting arbitrary new B config when it is
> > running arbitrary new A config and transition

Re: [j-nsp] Automation - The Skinny (Was: Re: ACX5448 & ACX710)

2020-01-24 Thread Mark Tinka

On 24/Jan/20 12:10, Saku Ytti wrote:

> In my opinion we do roughly the same thing, the same way in networks,
> with the same protocols since my start of career in 90s, very little
> has changed and you could drop competent neteng from 90s to today and
> they'd be immediately productive. Compare this to what has happened to
> compute the difference is striking.

Agreed - but is it really enough to the extent that the common buzz
sentence nowadays is "Network engineers are dead, they'll all be
replaced by software [developers]"?

I mean, I'd wager that more than half of the problems you find with
automation and tooling development is a total lack of protocol between
software developers and network engineers; in the same company. While
there has been plenty of success with a software developer reading a
networking-related RFC and writing code for that without needing to
understand, really, how IP/MPLS networks work, it's a whole other issue
trying to teach a network engineer how to write code, or a software
developer what IS-IS actually does.

I can't remember if I gave this example here before, but I know of a
network operator in Vienna who had to scramble to get their engineer
trained on CLI. They'd been setting up peering sessions fine for 3
years via a GUI, and when the GUI and automation front-end all went to
hell, that network engineer didn't know how to fall back to simple CLI
to set up even simpler BGP sessions for peering, by hand.

While clicking on GUIs is great, I don't have confidence that a network
of any decent scale can be run, today, without some form of CLI
jockeying. And on the back of that, do we want to kill off the basics of
a network engineer in favour of Day 1 university graduates eager to
click a GUI button when provisioning your backbone, when they don't
understand what the "Wide Metric" checkbox actually means?

> People who think that netconf and yang are solving big problems and
> are key to solve automation probably haven't done much automation.

Totally agreed. But to also be fair, NETCONF/YANG are normally being
touted by vendors (much like Segment Routing, 5G and SD-WAN, but I
digress). I've not really found actual operators with anything
meaningful and useful to say about NETCONF/YANG.

Raise your hands if I'm talking nonsense.

For us, we find this whole NETCONF/YANG thing to be too heavy for simple
instructions you need to send to devices, not to mention the fact that
support within and between vendors is questionable (FlowSpec, anyone?).

I mean, that's why Ansible was so pleasing to our fingertips - all you
need is SSH and a large-enough, repetitive problem you want to go away
quickly.

> Roughly netconf is new snmp and yang is new mib, what ever they enable
> could have been enabled by existing protocols decades ago, the
> advantages are modest and will remain so.

Completely agreed!

>  The key enabler for
> automation is device accepting arbitrary new B config when it is
> running arbitrary new A config and transition there hitlessly.
> Generating full new config from DB+template is trivial problem, trying
> to be aware of network state and move from arbitrary state A to
> arbitrary state B with minimal amount of changes is hard and
> unnecessary problem.

I tend to agree with you, Saku. What I've heard (from the vendors,
again) is that Ansible is not great because you don't inherently get
state confirmation feedback after posting the new configuration, and
that adding that intelligence into Ansible requires time and energy to
code. Okay, fair point, I'll bite. But also, we are network engineers -
we know what commands do when they run, and we've spent decades building
templates from as simple as a Windows Notepad text to as complex as a
MySQL database.

Then again, Terraform is meant to fix that downside of Ansible, but for
me, I don't really see that as a big issue. We aren't trying to
provision services across network domains (despite what MEF's LSO
architecture will have you believe), and even if we were, do I really
want you fiddling in my network? We each know our networks better than
outsiders know them, so what gives?

> If/when network becomes more cloudified, more as-a-service, where you
> use API to turn up your own active devices and circuits where you
> want, when you want, instead of owning anything and once those
> proprietary APIs get some subset standard APIs we'll probably start to
> see openstack, kubernetes type of complexity explosion in networks
> too.

MEF's LSO, which they've been pushing since about 2014. The concept is
sexy, but honestly, I've not heard much ado in 6 years re: real-world
deployment.

Also, while I'm wild enough to be one of the first maniacs to run a
network-wide Route Reflector on a VM on a server in 2014, you won't find
me deploying said RR's in AWS or Azure, so I can access them over some
API into an Openstack/Kubernetes/Docker enclosure. Life is interesting
enough as it is :-).

>  But as long

Re: [j-nsp] Automation - The Skinny (Was: Re: ACX5448 & ACX710)

2020-01-24 Thread Saku Ytti

On Fri, 24 Jan 2020 at 10:33, Mark Tinka wrote:

> Since about 2012, every time we've felt we've come close to finding an

In my opinion we have been doing roughly the same thing, the same way in
networks, with the same protocols, since the start of my career in the
90s. Very little has changed, and you could drop a competent neteng from
the 90s into today and they'd be immediately productive. Compare this to
what has happened to compute; the difference is striking.

> My 1+1 assessment of all of these issues is, I believe, down to the fact
> that the industry wants to automate in an open standards manner, where

People who think that netconf and yang are solving big problems and
are key to solving automation probably haven't done much automation.
Roughly, netconf is the new snmp and yang is the new mib; whatever they
enable could have been enabled by existing protocols decades ago, and the
advantages are modest and will remain so. The key enabler for
automation is a device accepting arbitrary new config B when it is
running arbitrary config A, and transitioning there hitlessly.
Generating a full new config from DB+template is a trivial problem; trying
to be aware of network state and move from arbitrary state A to
arbitrary state B with a minimal amount of changes is a hard and
unnecessary problem.
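
A minimal sketch of the "full config from DB+template" idea, using only
Python's standard library. The interface rows stand in for a database
query and the two set-style lines per interface are purely
illustrative; the names render_config and INTERFACE_TEMPLATE are
assumptions.

```python
from string import Template
from typing import Dict, Iterable

# One template per config stanza; the "DB" rows fill in the blanks.
INTERFACE_TEMPLATE = Template(
    'set interfaces $name description "$description"\n'
    "set interfaces $name unit 0 family inet address $address\n"
)


def render_config(interfaces: Iterable[Dict[str, str]]) -> str:
    """Render the complete interface config from scratch.

    Rows are sorted by name so the same data always gives the same
    text, whatever order the database returned it in.
    """
    rows = sorted(interfaces, key=lambda row: row["name"])
    return "".join(INTERFACE_TEMPLATE.substitute(row) for row in rows)
```

Nothing here looks at what the device is currently running; the whole
config is regenerated every time, and working out the transition from
the old config to the new one is left to the device.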

If/when the network becomes more cloudified, more as-a-service, where you
use an API to turn up your own active devices and circuits where you
want, when you want, instead of owning anything, and once those
proprietary APIs get some subset of standard APIs, we'll probably start
to see an openstack/kubernetes type of complexity explosion in networks
too. But as long as we keep owning the network, most will keep running
it as a CLI-jockey network: touch it when you must, but in many cases no
one touches it for weeks or months.

-- 
  ++ytti
