Hi,

Just to correct:

What I was saying is that 62% of the polls time out, only 38% actually return a
response, and those responses take several times longer to complete when
polling an in-line interface.

This is just with a simple bash script that runs "time check_interfaces <args>"
(from the Nagios-Tools package) for hundreds of consecutive poll runs, with
varying pauses between polls.
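For anyone who wants to reproduce that kind of measurement, here is a rough
Python equivalent of the loop. It is a hypothetical sketch, not the actual
script: the 30-second harness timeout, the 0-5 second random pause, and the
assumption that a timed-out SNMP poll makes the plugin exit UNKNOWN (3, per the
usual Nagios plugin convention) are all placeholders to adjust to how
check_interfaces behaves in your setup. The filename poll_harness.py is also
made up.

```python
import random
import statistics
import subprocess
import sys
import time
from dataclasses import dataclass

# Assumption: a timed-out SNMP poll makes the plugin exit UNKNOWN (3), per the
# usual Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).
FAILED_EXIT_CODES = (3,)


@dataclass
class PollResult:
    ok: bool        # False if the run hit the harness timeout or exited UNKNOWN
    elapsed: float  # wall-clock seconds for the run, like `time` reports


def run_polls(cmd, runs=100, timeout=30.0, min_pause=0.0, max_pause=5.0):
    """Run `cmd` (an argv list) `runs` times with a random pause between runs."""
    results = []
    for i in range(runs):
        start = time.monotonic()
        try:
            proc = subprocess.run(cmd, capture_output=True, timeout=timeout)
            ok = proc.returncode not in FAILED_EXIT_CODES
        except subprocess.TimeoutExpired:
            ok = False  # the harness gave up waiting on this poll
        results.append(PollResult(ok, time.monotonic() - start))
        if i < runs - 1:
            time.sleep(random.uniform(min_pause, max_pause))
    return results


def summarize(results):
    """Failure rate plus median duration of the polls that did answer."""
    answered = [r.elapsed for r in results if r.ok]
    return {
        "runs": len(results),
        "failure_rate": 1 - len(answered) / len(results) if results else 0.0,
        "median_answered_seconds": statistics.median(answered) if answered else None,
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python poll_harness.py check_interfaces <args>
    print(summarize(run_polls(sys.argv[1:])))
```

Running the same loop against an in-line interface and against an older box
(or the management interface) gives two failure rates and two median response
times that can be compared directly.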

It would be a little less of a concern if this were any other product, but the
idea that they just sort of left it 62% broken and shipped it that way is
really making me wonder what else only functions at 38%.

We don't have a huge budget, and the ASR9902 costs almost twice as much as the
Arista devices we would've preferred to buy [the Arista device in question has
30x100GE ports, while the ASR9902 is basically an 8x100GE router with a very
poorly configured midplane/gearbox that ties into some sort of switch, and
nobody at Cisco seems to know how any of that works, either].

If we had an unlimited budget, we'd just mulligan this thing and buy the DCS
devices that we want, but we're stuck with it, and if we're stuck with it, I
don't think it's insane to expect it to operate at least as well as an ASR9001.

Thanks,
-Drew




-----Original Message-----
From: Saku Ytti via NANOG <[email protected]> 
Sent: Friday, August 1, 2025 2:28 PM
To: North American Network Operators Group <[email protected]>
Cc: Saku Ytti <[email protected]>
Subject: Re: Cisco ASR9902 SNMP polling ... is interesting

On Fri, 1 Aug 2025 at 16:44, Mel Beckman via NANOG <[email protected]> 
wrote:

> Also, non-management interfaces do packet processing in silicon at the ASIC 
> level and don’t have the capacity to do anything more than statistical 
> sampling of packets that require CPU-level processing to retrieve counters 
> and generate SNMP responses. 62 % is as good a sampling rate as any other.

Absolutely not. We expect to process 100% of legitimate control-plane traffic, 
e.g. BGP, ISIS, LDP, ARP, SNMP etc.

62% would be devastating.

In fair weather this is easy; in bad weather you need hardware-based
discrimination between expected good traffic and unexpected bad traffic.

Drew is in the right to expect functioning SNMP and is experiencing a
significant regression in behaviour compared to previous devices from the same
vendor.


It would take a very long time to explain how to troubleshoot this, as it is an
extremely complicated topic with a lot of nuance that even the best experts at
Cisco are unaware of. I've regularly had TAC handwave problems away with
'sometimes it be like that' because they didn't want to do the work. Once, our
NOC spent months on a case where TAC was blaming our QoS configuration for BGP
flaps. By the time I got on it, I escalated it to Xander, and initially even
Xander agreed with TAC that we needed to look into the QoS configuration, until
I reminded him that LPTS is not subject to QoS or ACL (which is a terrible
design choice, for reasons I'm happy to elaborate on). That immediately
reminded him how LPTS works, and the TAC case finally got some traction.
This is a completely untenable situation: IOS-XR regularly has complicated
problems that TAC is not equipped to solve, and the expectation is that the
user has deep enough knowledge to rebuff them.


--
  ++ytti
_______________________________________________
NANOG mailing list
https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.nanog.org_archives_list_nanog-40lists.nanog.org_message_KK73RTHMIZXLUMICYPEECO2AQXILKHIQ_&d=DwIGaQ&c=euGZstcaTDllvimEN8b7jXrwqOf-v5A_CdpgnVfiiMM&r=OPufM5oSy-PFpzfoijO_w76wskMALE1o4LtA3tMGmuw&m=d_XQ0w1ltWzu7JBKSWfGAfci8ywpv0Vz_Lg6Q-eS5pZAWpgoZ9PBnm_qnf2BAqbd&s=CmbeUcr_Ltz9nrzW2h4l3azL_KBEqloxrF9Rl9GuEpQ&e=
_______________________________________________
NANOG mailing list 
https://lists.nanog.org/archives/list/[email protected]/message/F2466J65DSWXATIP7DWSXU6FDHFW7L6H/
