RE: Cisco ASR9902 SNMP polling ... is interesting

Drew Weaver via NANOG Fri, 08 Aug 2025 06:58:53 -0700

I'm not sure I have the minerals tbh.

-Drew



-----Original Message-----
From: Saku Ytti <[email protected]> 
Sent: Friday, August 8, 2025 9:55 AM
To: North American Network Operators Group <[email protected]>
Cc: LJ Wobker (lwobker) <[email protected]>; Marc Binderberger 
<[email protected]>; Drew Weaver <[email protected]>
Subject: Re: Cisco ASR9902 SNMP polling ... is interesting

I would chase this further with Cisco, if you have the cycles.

Often it pays dividends in the future to have a proper understanding of the 
anatomy of the issue. So it's not purely for curiosity's sake.


On Fri, 8 Aug 2025 at 16:51, Drew Weaver via NANOG <[email protected]> 
wrote:
>
> One other note I'd like to make on this just for future reference:
>
> The default for SNMP in LPTS on this platform is 300 (I'm assuming 
> that is 300pps)
>
> We aren't sending 300pps of SNMP traffic at this device so nothing should 
> have been policed by it.
>
> There might be an issue with how it's counting or it's duplicating packets.
>
> Anyway setting it to 500 made everything work properly.
>
> (We aren't sending 500pps of SNMP at the machine either).
>
> Thanks,
> -Drew
>
>
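
One possible explanation for the 300 versus 500 observation above, not confirmed anywhere in this thread, is burstiness rather than miscounting. A packet-per-second policer is commonly built as a token bucket, and a poller that fires its requests in tight bursts can exceed the bucket depth even when its per-second average is well below the configured rate. The sketch below is a generic token bucket in Python with integer microsecond timestamps. The bucket depth, burst size and timings are made-up illustration values, not LPTS internals; whether LPTS on the ASR9902 behaves this way is something only a Cisco case could confirm.

```python
def police(arrivals_us, rate_pps, burst):
    """Generic token-bucket policer (illustrative; NOT the LPTS implementation).

    arrivals_us: sorted packet arrival times in integer microseconds
    rate_pps:    refill rate in packets per second
    burst:       bucket depth in packets (hypothetical value)
    Returns (accepted, dropped).
    """
    one = 1_000_000              # one token, counted in micro-tokens
    cap = burst * one
    tokens = cap                 # bucket starts full
    last = 0
    accepted = dropped = 0
    for t in arrivals_us:
        # rate_pps tokens per second == rate_pps micro-tokens per microsecond
        tokens = min(cap, tokens + (t - last) * rate_pps)
        last = t
        if tokens >= one:
            tokens -= one
            accepted += 1
        else:
            dropped += 1
    return accepted, dropped


# Hypothetical poller: 200 requests spaced 50 us apart, once per second.
# The average is 200 pps, below a 300 pps policer.
bursty = [sec * 1_000_000 + i * 50 for sec in range(5) for i in range(200)]

# Same 200 pps average, evenly spaced (one packet every 5 ms).
smooth = [i * 5_000 for i in range(1000)]

print("bursty:", police(bursty, rate_pps=300, burst=50))
print("smooth:", police(smooth, rate_pps=300, burst=50))
```

With these assumed numbers the evenly spaced stream passes untouched, while the bursty stream with the same average loses packets. Raising the rate also refills the bucket faster during a burst, which would be consistent with 500 helping even though nobody is sending 500pps.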
> -----Original Message-----
> From: Drew Weaver via NANOG <[email protected]>
> Sent: Friday, August 8, 2025 9:32 AM
> To: 'North American Network Operators Group' <[email protected]>
> Cc: 'LJ Wobker (lwobker)' <[email protected]>; 'Marc Binderberger' 
> <[email protected]>; Drew Weaver <[email protected]>
> Subject: RE: Cisco ASR9902 SNMP polling ... is interesting
>
> I'm just replying here to let you know that this was "solved".
>
> lpts pifib hardware police
>  flow snmp rate 2000
> !
>
> I want to point out that if you set it to its max configuration value 
> (4294967295) it ignores it entirely, even though IOS XR seems to know that 
> its maximum for this hardware is 50000.
>
> It couldn't be bothered to simply set it to 50000 if you set it to the 
> configured maximum of 4294967295. It couldn't be bothered to simply say: "Hey, 
> we know the max for this platform is 50000 so we set it to 50000, but you 
> probably shouldn't be using 50000 for this value anyway."
> It could be bothered to do absolutely nothing and silently reject the command, 
> which made me laugh for about 5 minutes this morning.
>
> So thanks for that Cisco and more sincerely thank you to everyone that took 
> any time to try and assist me with this.
>
> I still would have preferred to just tell it what IP addresses to expect SNMP 
> traffic to come from and use that instead of a PPS policer but hey it's 2025 
> and preferences are luxuries.
>
> -Drew
>
>
> -----Original Message-----
> From: Saku Ytti via NANOG <[email protected]>
> Sent: Friday, August 8, 2025 3:34 AM
> To: North American Network Operators Group <[email protected]>
> Cc: LJ Wobker (lwobker) <[email protected]>; Marc Binderberger 
> <[email protected]>; Saku Ytti <[email protected]>
> Subject: Re: Cisco ASR9902 SNMP polling ... is interesting
>
> On Thu, 7 Aug 2025 at 15:08, Marc Binderberger via NANOG 
> <[email protected]> wrote:
>
> > Then why make these assumptions? Especially with XR - not your mom 
> > & dad IT box but for ISPs or IT departments - you could provide the 
> > mechanism and either "do nothing as default" or "block everything as 
> > default". And then provide documentation and service$$$ to the 
> > customers
>
> Because while Cisco can't dimension the box well, operators do an even worse 
> job at it.
>
> On cXR we had issues where LPTS would occasionally admit too much BGP. After 
> LPTS admits BGP traffic, it is hashed to one of 8 XIPC worker processes before 
> it is handed over to BGP. Because we had a busy device, XIPC didn't get the CPU 
> cycles it needed to service the LPTS-admitted packets, causing XIPC to drop 
> packets. This meant that a couple of times a month some router lost 1/8th of 
> its BGP speakers, and Cisco explicitly refused to fix it. They literally said 
> maybe it works better in eXR (it does).
> The funny thing is, this CPU demand was created by BGP: because XIPC didn't 
> have priority for CPU over BGP, the drops caused flaps, and the flaps made BGP 
> demand even more CPU. If XIPC had had priority over BGP, the symptoms would 
> have been lessened. I pointed this out to Cisco and they agreed, but said they 
> had previously explored process priorities in cXR and just ended up with more 
> unstable devices (unmanageable complexity for people to understand what the 
> priorities should be).
> All this while pitching that an RTOS is mandatory for a carrier-grade NOS, 
> while behind the scenes nothing from said RTOS was used; it's just flat 
> priority all around.
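
To illustrate why a single starved worker shows up as roughly one eighth of the BGP speakers going away, here is a small Python sketch. Only the worker count of 8 comes from the message above. The hash (CRC32 of the peer address), the peer addresses and the choice of stalled worker are hypothetical stand-ins, since the thread does not say how the real hashing is done.

```python
import zlib
from collections import Counter

WORKERS = 8  # worker count taken from the thread; everything else is hypothetical


def worker_for(peer_ip: str) -> int:
    """Pin a peer to one worker using a stand-in hash (CRC32 modulo WORKERS)."""
    return zlib.crc32(peer_ip.encode()) % WORKERS


# 800 made-up peer addresses.
peers = [f"10.0.{i // 256}.{i % 256}" for i in range(800)]
load = Counter(worker_for(p) for p in peers)

stalled = 3  # pretend this worker is not getting CPU time
affected = [p for p in peers if worker_for(p) == stalled]

print(f"peers per worker: {dict(sorted(load.items()))}")
print(f"worker {stalled} stalled: {len(affected)} of {len(peers)} peers lose packets")
```

Because each peer is pinned to one worker, the damage is not spread thinly across all sessions. The same subset of peers loses all of its packets for as long as that worker is starved.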
>
>
> Additionally, LPTS is exclusively an NPU-level policer: if port1 congests some 
> policer, port2 suffers as well. There isn't a more-specific fall-back policer 
> at the IFD or IFL level. So what can you do if port1 has an L2 loop and is 
> spewing ARP at you, killing port2? You can't MQC it to 10pps and you can't ACL 
> it, as LPTS bypasses MQC and ACL, so your only option is to shut down port1. 
> You cannot a priori ensure that one port won't take out other ports.
> There was an excessive punt flow trap feature, which could be used with 
> success in this scenario, but that feature was retired, because I guess 
> someone at Cisco who knew why it was needed had left, and the remaining people 
> didn't understand its use case and didn't want to carry the complexity.
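
The shared-policer problem described above can be sketched in a few lines of Python. This is a toy model, not how the NPU is implemented: the token-bucket class, the rates, the bucket depths and the idea of a per-port bucket are all assumptions chosen for illustration. Port1 floods at 10000pps, port2 sends a well-behaved 10pps, and the two cases differ only in whether both ports draw from one bucket.

```python
class Bucket:
    """Toy token bucket with integer microsecond timestamps (hypothetical model)."""

    ONE = 1_000_000  # one token, counted in micro-tokens

    def __init__(self, rate_pps, burst):
        self.rate = rate_pps            # micro-tokens added per microsecond
        self.cap = burst * self.ONE
        self.tokens = self.cap          # bucket starts full
        self.last = 0

    def allow(self, t_us):
        self.tokens = min(self.cap, self.tokens + (t_us - self.last) * self.rate)
        self.last = t_us
        if self.tokens >= self.ONE:
            self.tokens -= self.ONE
            return True
        return False


def run(events, policer_for):
    """Feed (time_us, port) events through the policer chosen for each port."""
    accepted = {1: 0, 2: 0}
    for t_us, port in events:
        if policer_for(port).allow(t_us):
            accepted[port] += 1
    return accepted


# One second of traffic: port1 loops at 10000 pps, port2 sends 10 pps.
port1 = [(i * 100, 1) for i in range(10_000)]
port2 = [(50 + j * 100_000, 2) for j in range(10)]
events = sorted(port1 + port2)

# Case 1: a single bucket shared by every port on the NPU.
shared_bucket = Bucket(rate_pps=1000, burst=10)
shared = run(events, lambda port: shared_bucket)

# Case 2: hypothetical more-specific per-port buckets.
per_port_buckets = {1: Bucket(100, 10), 2: Bucket(100, 10)}
per_port = run(events, lambda port: per_port_buckets[port])

print("shared bucket, accepted per port:  ", shared)
print("per-port buckets, accepted per port:", per_port)
```

In the shared case the flooding port takes nearly every token as soon as it appears, so the well-behaved port is dropped almost entirely even though it sends far below the policer rate. With per-port buckets the same port gets all of its packets through.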
>
>
> All of these are actually solvable. You can deliver a NOS where port1 in the 
> same NPU won't take down port2, out of the box, without configuration. But it 
> requires a deep understanding of what the platform can do, how it can do it, 
> and how the actual customer network works.
> This person doesn't exist.
> Cisco or Nokia cannot even be configured like this by an operator. Juniper 
> can be, but it's way too complicated for operators to do.
>
> So if you have a casual understanding of how these devices work, you can bring 
> down any core device, no matter how it's protected, with a trivially sized DoS 
> from a single VPC. The only reason the Internet works is that there isn't 
> motivation to break it, not that it is well protected. Which is fine, because 
> the same is true for personal safety, and the focus should be on mitigating the 
> motivation rather than on absolute safety.
>
> Of course this thread isn't about protecting devices in bad weather, it is 
> about trying to make devices work in fair weather, which is a much more 
> reasonable ask.
>
> --
>   ++ytti
> _______________________________________________
> NANOG mailing list
> https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.nanog.org_a
> rchives_list_nanog-40lists.nanog.org_message_V56CX5TXE7MSA2NQR6WFFZQWS
> WEDQCB5_&d=DwICAg&c=euGZstcaTDllvimEN8b7jXrwqOf-v5A_CdpgnVfiiMM&r=OPuf
> M5oSy-PFpzfoijO_w76wskMALE1o4LtA3tMGmuw&m=JpBzXEAHGqhw7yYz2WYDniWSu1mY
> KW1Hpnju_sjqO-Z5HFqV2hrVPk9ge-SMaqrk&s=78hSyv-0ZbBYSmiMoeY-ttfxJ9O_K8D
> ab4hkaP-mlKk&e=



--
  ++ytti
_______________________________________________
NANOG mailing list 
https://lists.nanog.org/archives/list/[email protected]/message/IY25ESMOLAFDXMCU5AOW2KJA5Q22C4FH/
