Re: [j-nsp] Automation - The Skinny (Was: Re: ACX5448 & ACX710)

2020-01-27 Thread Saku Ytti
On Mon, 27 Jan 2020 at 22:30,  wrote:

> Then nowadays there's also the possibility to enable tons upon tons of 
> streaming telemetry -where I could see it all landing in a common data lake 
> where some form of deep convolutional neural networks could be used for 
> unsupervised pattern/feature learning, -reason being I'd like the system to 
> tell me look if this counter is high and that one too and this is low then 
> this usually happens. But I'd rather wait to see what the industry offers in 
> this area than developing such solutions internally. For now I'm glad I have 
> automation projects going, when I asked whether we should have AI in network 
> strategy for 2020 I got awkward silence in response.

We should learn to crawl before we take a rocket to Proxima Centauri.

You don't need ML/AI to find problems in your network. Use an algorithm
like 'this counter which increments at rate X stopped incrementing or
started to increment 100 times slower' and 'this counter which does
not increment, started to increment', and you'll find a lot of
problems in your network. But do you care about every problem in your
network, or only problems that customers care about?
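A minimal sketch of that counter check in Python. The names Sample, rate
and classify are made up for illustration, the default factor of 100 is
taken from the wording above, and counter wraps/resets are ignored:

```python
from dataclasses import dataclass


@dataclass
class Sample:
    timestamp: float  # seconds
    value: int        # monotonically increasing counter value


def rate(prev: Sample, curr: Sample) -> float:
    """Counter increments per second between two samples."""
    elapsed = curr.timestamp - prev.timestamp
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    return (curr.value - prev.value) / elapsed


def classify(baseline_rate: float, current_rate: float,
             slowdown_factor: float = 100.0) -> str:
    """Compare the current rate against a previously observed baseline rate."""
    if baseline_rate > 0 and current_rate == 0:
        return "stopped"   # used to increment, no longer does
    if baseline_rate > 0 and current_rate <= baseline_rate / slowdown_factor:
        return "slowed"    # increments at least slowdown_factor times slower
    if baseline_rate == 0 and current_rate > 0:
        return "started"   # never incremented before, now it does
    return "ok"
```

How the baseline is chosen (last hour, same hour last week, etc.) is left
open here; the classification itself stays this simple either way.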

Juniper once had some really smart academics at an EBC explaining to us
their ML/AI project, which predicts resource needs on a given system.
They quoted how close they got to the real numbers. I then asked how it
performs against a naive system, explaining that by naive system I
mean something like 'my box has 1M FIB entries, so one FIB entry uses
RLDRAM/1M' to extrapolate FIB usage in an arbitrary config. They hadn't
tried this and couldn't tell how well the ML/AI performs against it.
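To make the naive baseline concrete, a rough Python sketch. The function
names are made up, and the error metric (mean absolute percentage error)
is my assumption about how one might compare the two, not something from
the EBC session:

```python
from typing import Sequence


def naive_estimate(memory_used: float, entries: int,
                   planned_entries: int) -> float:
    """Linear extrapolation: memory per entry times planned entry count."""
    if entries <= 0:
        raise ValueError("need a non-empty table to extrapolate from")
    return memory_used * planned_entries / entries


def mean_abs_pct_error(predicted: Sequence[float],
                       actual: Sequence[float]) -> float:
    """Average of |predicted - actual| / actual over all observations."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two equally sized, non-empty sequences")
    return sum(abs(p - a) / a for p, a in zip(predicted, actual)) / len(actual)
```

Any predictor, ML or not, can be scored with the same error function on
the same observations, which is the comparison that was missing.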

Can you really train ML/AI today to determine what actually matters? I
don't think you can, because what actually matters is something that
impacted a customer, and you simply cannot put enough learning data in:
you don't have nearly enough customer trouble tickets to be able to
correlate them with the network data you're collecting and start
predicting which complex counter combinations precede a customer
ticket.

But are you at least monitoring how many packets are lost inside your
network? Delta of input/output? That fairly trivially covers _all
reasons for packet loss_; of course latency/jitter are not covered,
but still, it covers a lot of ground fast. Do you have a single system
where you collect all data? Have you enriched the data with stuff like
npu, linecard, city, country, region? Almost no one is doing even very
basic stuff, so I think ML/AI isn't going to be the low-hanging fruit
any time soon. If you have a single system with a lot of labels for
every counter, you can do a lot with very naive analytics. If you
don't have the data, you can't do anything with the smartest possible
system. And I think almost no one is collecting data in such a manner
that it's actually capitalisable, because we can keep running the
network the way we did in the 90s: IF-MIB and netflow, in separate
systems, with no enrichment at all.
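A rough Python sketch of what "labels plus naive analytics" could look
like. The field names (device, interface, pkts_in, pkts_out), the
inventory layout and the label names are all assumptions for
illustration. Locally originated or terminated traffic and multicast
replication are ignored:

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def enrich(sample: dict, inventory: Dict[Tuple[str, str], dict]) -> dict:
    """Attach inventory labels (npu, linecard, city, ...) to a raw sample."""
    labels = inventory.get((sample["device"], sample["interface"]), {})
    return {**sample, **labels}


def loss_by_label(samples: Iterable[dict], label: str) -> Dict[str, int]:
    """Sum (packets in - packets out) per value of the chosen label."""
    totals: Dict[str, int] = defaultdict(int)
    for s in samples:
        totals[s.get(label, "unknown")] += s["pkts_in"] - s["pkts_out"]
    return dict(totals)
```

With every sample labelled, the same function answers "where are packets
disappearing" per npu, per linecard or per city just by changing the
label argument.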

-- 
  ++ytti
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] Automation - The Skinny (Was: Re: ACX5448 & ACX710)

2020-01-27 Thread adamv0025
> From: Robert Raszuk 
> Sent: Sunday, January 26, 2020 10:18 PM
> 
> Hi Adam,
> 
> I would almost agree entirely with you except that there are two completely
> different reasons for automation.
> 
> One as you described is related to service provisioning - here we have full
> agreement.
> 
> The other one is actually of keeping your network running. Imagine router
> maintaining entire control plane perfectly fine, imagine BFD working fine to
> the box from peers but dropping between line cards via fabric from 20% to
> 80% traffic. Unfortunately this is not a theory but real world :(
> 
Very good point Robert,
There are indeed two parts to the whole automation story
(it's obvious that this theme deserves a series of blog posts, but I keep
finding excuses).

The analogy I usually use in presentations is the left brain / right brain
analogy, where the left brain is responsible for logical thinking and the
right brain is responsible for creative thinking and intuition.
A complete automation solution is built similarly:
The left brain is responsible for routine automated service provisioning
- it contains models of resources, services, devices, workflows and policies,
and you can teach it by loading new/additional models.
The right brain on the other hand is responsible for "self-driving" the network
(yeah I know, can't think of a better term)
- it collects data from the network and acts on distributed policies, and also
performs trending, analytics, correlation, arbitration etc.
The left brain and right brain obviously talk to each other:
policies are defined in the left brain and distributed to the right brain to
act on. The right brain can also trigger workflows in the left brain.
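A toy Python sketch of that split, only to show the direction of the two
interactions (policies flow left to right, workflow triggers flow right
to left). All class, field and workflow names here are hypothetical, and
a real system would have a message bus and persistence in between:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Policy:
    metric: str       # name of the telemetry metric the policy watches
    threshold: float  # act when the observed value exceeds this
    workflow: str     # name of the left-brain workflow to trigger


class LeftBrain:
    """Holds the models: here just policies and named workflows."""

    def __init__(self) -> None:
        self.policies: List[Policy] = []
        self.workflows: Dict[str, Callable[[str], str]] = {}

    def run_workflow(self, name: str, device: str) -> str:
        return self.workflows[name](device)


class RightBrain:
    """Consumes telemetry and acts on the policies distributed to it."""

    def __init__(self, left: LeftBrain) -> None:
        self.left = left
        self.policies: List[Policy] = []

    def load_policies(self, policies: List[Policy]) -> None:
        self.policies = list(policies)

    def observe(self, device: str, metric: str, value: float) -> List[str]:
        """Evaluate one telemetry sample; trigger workflows on violations."""
        return [
            self.left.run_workflow(p.workflow, device)
            for p in self.policies
            if p.metric == metric and value > p.threshold
        ]
```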

The major paradigm shift for our service designers here will be that they are
now going to be responsible not only for putting the individual service
building blocks together in terms of config (and service lifecycle workflow -
tbd), but also in terms of policies determining the health of the provisioned
service (including thresholds, post-checks, ongoing checks etc.).
But following the MDE (Model Driven Engineering) theme, it's not just service
designers contributing to the policy library; it's Ops teams, Security teams,
etc.
The main advantage I see is that some of the policies that will be created for
the soon-to-be-automated service certification testing could then be reused for
service provisioning post-tests and service lifecycle monitoring, and
vice versa.
Then obviously there are policies defining what to do in various DDoS
scenarios (and I consider the vendor solutions actually doing analytics,
correlation, arbitration all part of the left brain).
 
> Without proper automation in place going way above basic IGP, BGP, LDP,
> BFD etc ... you need a bit of clever automation to detect it and either alarm
> noc or if they are really smart take such router out of the SPF network wide.
> If not you sit and wait till pissed customers call - which is already a 
> failure.
> 
Then nowadays there's also the possibility to enable tons upon tons of 
streaming telemetry -where I could see it all landing in a common data lake 
where some form of deep convolutional neural networks could be used for 
unsupervised pattern/feature learning, -reason being I'd like the system to 
tell me look if this counter is high and that one too and this is low then this 
usually happens. But I'd rather wait to see what the industry offers in this 
area than developing such solutions internally. For now I'm glad I have 
automation projects going, when I asked whether we should have AI in network 
strategy for 2020 I got awkward silence in response. 


> Sure not everyone needs to be great coder ... but having network eng with
> skills sufficient enough to understand code, ability to debug it or at min
> design functional blocks of the automation routines are really must have
> today.
> 
I don't know, my experience is that working in tandem with a devops person (as
opposed to trying to figure it out myself) gets me the desired results much
faster (and in line with whatever their sys-architecture guidelines or coding
principles are) while I can focus on WHAT (from the network perspective) not
HOW (coding/system perspective). Although yes, for some of the POC stuff I wish
I had some coding skills.
But to give you a concrete example from my work: when I had a choice to read
some Python books or some more microservice architecture books, I chose the
latter, as it was more important for me to know the difference between, for
instance, orchestration and choreography (among other aspects of microservice
architectures) and to assess the pros and cons of each, in order to make an
educated argument for the service workflow engine architecture choice, so it
lines up with what I had in mind for service layer workflow flexibility/agility.

> And I am not even mentioning about all of the new OEM platforms with OS
> coming from completely different part of the wor

Re: [j-nsp] arp from correct IP address

2020-01-27 Thread Andrey Kostin
Interesting. I observed a while ago that "preferred" doesn't work
for IPv6. I opened a TAC case and was eventually told that "it doesn't work
for IPv6". It turns out that it's also broken for IPv4, but we do PPPoE, so
DHCP runs only for IPv6 and we never ran into the IPv4 issue. The
workaround in my case was to use the broadband loopback address as primary;
fortunately that is not as critical as the IPv4 primary loopback.
As we are looking into a possible IPoE implementation for some services,
thanks for the heads up.


Kind regards,
Andrey Kostin

Baldur Norddahl wrote on 2020-01-27 00:24:
Yes, subscriber management has a lot of small but important things that are
not quite "done". Juniper should put on a task force to get all the bugs
sorted out. Could be a great system if they allow it to be.

For me the trouble with this is that without functioning ARP the customer
becomes "MAC locked". If he wants to upgrade his equipment, he has to call
us so we can clear his session. We have two routers and sometimes a user
somehow manages to register with different MAC addresses on the two.
Needless to say that creates a lot of trouble that will not sort itself
out. With functioning ARP I believe the wrong MAC address would be
corrected soon enough without intervention.

I wish I could just have a user defined radius variable and use that
instead of $junos-preferred-source-address. My script that generates that
radius configuration could easily calculate the correct source address and
program that in with the other radius variables for each user.

I am not creating a JTAC case on this before I have a fix for my other JTAC
cases (IPv6 is broken, dynamic VLAN with IP demux on top is broken, DHCP
combined with non-DHCP is likely also broken). So far I got IPv4 fixed
(access-internal routes ignored, work around use access routes), so they do
work on the problems I report.

Regards,

Baldur


On Mon, 27 Jan 2020 at 04:53, Chris Kawchuk wrote:



Ran into the same bug.

$junos-preferred-source-address for an unnumbered interface for BNG functions
does NOT return the "closest/most suitable address" based on the IP+subnet
that was given to the subscriber... contrary to the BNG template
documentation. It just defaults to the actual loopback of the router. (The
dynamic template that gets created against a demux0. subscriber says
$preferred of "NONE".)

This means that things like subscriber "ARP liveliness detection" don't
work/can't work (since the subscriber won't respond to an ARP request
where the source isn't in the local subnet).

I've had a JTAC case open on this for 8 months. Sent full configs, built a
full lab for them (so they could trigger it remotely), sent full PCAPs.

MX204 + JunOS 18.3R + BNG (DHCP/IPoE naturally)

Also on MX80 w/same code - so it's the BNG code, not the platform doing it.

- Ck.




On 25 Jan 2020, at 10:27 pm, Baldur Norddahl wrote:


Hello

I have a problem where some customer routers refuse to reply to ARP from
our Juniper MX204. The ARP request will look like this:

11:57:46.934484 Out arp who-has 185.24.169.60 tell 185.24.168.248

The problem is that this should have been "tell 185.24.169.1" because the
client is in the 185.24.169.0/24 subnet. The interface is
"unnumbered-address lo0.1" with lo0.1 having both 185.24.168.248 and
185.24.169.1 among many others. A Linux box would select the nearest
address but apparently junos does not know how to do this.
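For illustration, the "nearest address" selection can be sketched in a
few lines of Python with the standard ipaddress module. This is not how
junos or Linux implement it; it is just the calculation an external
script could do. The function name pick_source is made up, and the /24
prefix lengths used below are assumptions, since the mail only gives the
bare addresses:

```python
import ipaddress
from typing import Iterable, Optional


def pick_source(target: str, local_interfaces: Iterable[str]) -> Optional[str]:
    """Return the local address whose subnet contains target.

    local_interfaces are strings like "185.24.169.1/24". If several subnets
    match, the one with the longest prefix wins. Returns None if none match.
    """
    target_ip = ipaddress.ip_address(target)
    best = None
    for text in local_interfaces:
        iface = ipaddress.ip_interface(text)
        if target_ip in iface.network:
            if best is None or iface.network.prefixlen > best.network.prefixlen:
                best = iface
    return str(best.ip) if best is not None else None
```

With the addresses from the example, the client 185.24.169.60 would get
185.24.169.1 as source rather than 185.24.168.248.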

Tried adding in "preferred-source-address $junos-preferred-source-address"
but this just results in "preferred-source-address NONE" and does nothing.
Also there is zero documentation on how junos will fill in that variable.

Is there a solution to this? Is there a radius variable I can set with the
preferred source address?

Regards,

Baldur


Re: [j-nsp] Juniper support offline?

2020-01-27 Thread Nathan Ward
Yep - Mid November.

> On 27/01/2020, at 10:02 PM,  
>  wrote:
> 
> Hi,
> 
>> Prsearch disappeared what, Jan 2019.. this year will it be case management 
>> and download access we lose for almost a year?
> 
> FYI, PR search came back a long time ago sometime late 2019.
> 
> -Aaron

Re: [j-nsp] Juniper support offline?

2020-01-27 Thread aaron
Hi,

>Prsearch disappeared what, Jan 2019.. this year will it be case management and 
>download access we lose for almost a year?

FYI, PR search came back a long time ago sometime late 2019.

-Aaron


Jan 26, 2020, 08:30 by c...@ip4.de:

> Everything works here again - without resetting.
> Looks like it’s recovering
>
>
>
> Sent from my iPhone
>
>> On 26.01.2020 at 17:08, Ross Halliday wrote:
>>
>> Seems to be back albeit a bit rocky - I could not get in with my previous 
>> password and had to reset. SRM and Downloads appear to be functioning
>>
>> Ross
>>
>>
>> -Original Message-
>> From: juniper-nsp On Behalf Of Thomas Scott
>> Sent: January 26, 2020 6:14 AM
>> To: Nathan Ward 
>> Cc: Juniper NSP 
>> Subject: Re: [j-nsp] Juniper support offline?
>>
>> Just got off the phone with JTAC and was informed the website was down for
>> "maintenance". We managed to open a case last night, but were unable to
>> this morning... I'm hoping that the "issue" doesn't last too long..
>>
>> Phone number I called was 1-888-314-5822.
>>
>> - Thomas Scott | mr.thomas.sc...@gmail.com  
>>
>>
>> On Sun, Jan 26, 2020 at 5:28 AM Nathan Ward wrote:
>>
>>> Hi,
>>> The published number on the Juniper website - 0080025864737 doesn’t work,
>>> and the +1888 US number went to Juniper, but to voicemail.
>>> I’ve called the number Liam has posted here (same as above but with +
>>> rather than 00), which worked - the agent had told me that there was
>>> maintenance yesterday and now there is no way to get copies of images
>>> apparently, even JTAC.
>>> I was told that there is no ETA for login being restored. All they can do
>>> is open a case so I get notified if and when it’s fixed. (Yeah, the agent
>>> really said “if and when”).
>>> Prsearch disappeared what, Jan 2019.. this year will it be case management
>>> and download access we lose for almost a year?
>>> On the off chance someone has them and is able to share, I need packages
>>> for 18.2R3-S1 for MX204 (so, VMHost), and 18.4R2-S2 for QFX5120.
>>> Those are the JTAC recommended versions, so I imagine they’ll be knocking
>>> about on plenty of hard drives..
>>> Luckily, checksums are still visible on the public site :-)
>>>
>>>> On 26/01/2020, at 8:22 PM, Liam Farr wrote:
>>>>
>>>> I just messaged some local at Juniper NZ and they advised that
>>>> +80025864737 is working for support.
>>>> Seems to work from my 2D mobile here too.
>>>> Cheers
>>>> Liam
>>>>
>>>>> On Sun, 26 Jan 2020 at 8:16 PM, Nathan Ward wrote:
>>>>>
>>>>> Hi,
>>>>> Looks to me and colleagues of mine like Juniper support is offline.
>>>>> Last night, I was able to log in but trying to download an image got to
>>>>> some stage of the redirect process and hung, then a please try again later
>>>>> message. It persisted for the next few hours of me trying every now and
>>>>> then.
>>>>> Today, I can’t log in at all - Invalid user/password.
>>>>> Password reset process works, but, still doesn’t let me in. Different
>>>>> browsers, cleared cache, all the usual “is it on at the wall sir” debugging.
>>>>> Hearing the same story for others.
>>>>> I’ve called both the NZ 00800 (international 800) and the US +1888
>>>>> number. The former says “call cannot be completed”. The US number says
>>>>> “high volume of calls please leave a message”.
>>>>> We’re in New Zealand - unsure if that’s relevant.
>>>>> Are others having these same issues?
>>>>> Any insight in to what’s going on?
>>>>> It’s a long weekend here, so the local sales/SE/etc. folks I usually
>>>>> deal with are likely not anywhere near their phones.
>>>>> --
>>>>> Nathan Ward
>>>>
>>>> --
>>>> Kind Regards
>>>> Liam Farr
>>>> Maxum Data
>>>> +64-9-950-5302



Re: [j-nsp] Automation - The Skinny (Was: Re: ACX5448 & ACX710)

2020-01-27 Thread Saku Ytti
On Mon, 27 Jan 2020 at 00:18, Robert Raszuk  wrote:

> The other one is actually of keeping your network running. Imagine router 
> maintaining entire control plane perfectly fine, imagine BFD working fine to 
> the box from peers but dropping between line cards via fabric from 20% to 80% 
> traffic. Unfortunately this is not a theory but real world :(
>
> Without proper automation in place going way above basic IGP, BGP, LDP, BFD 
> etc ... you need a bit of clever automation to detect it and either alarm noc 
> or if they are really smart take such router out of the SPF network wide. If 
> not you sit and wait till pissed customers call - which is already a failure.

Automation and monitoring are to me very different subjects.
Everyone has war stories of those long-tail problems where something
utterly weird is happening in the network and how hard it was
to find. But this particular example is fairly easy: either you are
polling a drop counter which shows the drops, or your packets in -
(packets out + drops) delta is off.
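As a sketch of that second check in Python: everything the box received
should either have been sent or show up in a drop counter, and whatever
is left over is loss nobody accounted for. The function names and the
0.1% tolerance are assumptions for illustration; control-plane traffic
and counter sampling skew are ignored:

```python
def unexplained_loss(pkts_in: int, pkts_out: int, drops: int) -> int:
    """Packets received that were neither forwarded nor counted as dropped."""
    return pkts_in - pkts_out - drops


def is_suspect(pkts_in: int, pkts_out: int, drops: int,
               tolerance: float = 0.001) -> bool:
    """True if unexplained loss exceeds the tolerated fraction of input."""
    if pkts_in == 0:
        return False
    return unexplained_loss(pkts_in, pkts_out, drops) / pkts_in > tolerance
```

A fabric silently eating 20% of the traffic with no drop counter moving
would trip this immediately, while a box that drops and counts stays
quiet here and is caught by the drop counter instead.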
But there will always be a massive amount of long-tail risks which your
NMS won't know about; things break in very creative and complex
ways. You can monitor these very carefully, you can screenscrape
all NPU counters and find that your network is behaving suboptimally
_right now_: you see NPU exceptions/trapstats increasing which should
not, and you can spend months figuring out 1 issue out of the hundred you
have, all of which are real issues, but which might affect one packet
in a billion.
Is it worth knowing these? We are screenscraping and graphing all NPU
counters, as these typically are not available in a GUI; in the case of
JunOS they are not even modelled because they are PFE counters. We rarely
proactively tend to them, because fixing them causes more outages than
letting them be. But often when strange issues which customers care about
do happen at scale, these counters reduce MTTR.
So if you think you don't have active issues, you're not monitoring
well enough. When you do monitor well enough you have to decide which
issues to fix and which to let be.


-- 
  ++ytti