I'm not sure I have the minerals tbh. -Drew
-----Original Message-----
From: Saku Ytti <[email protected]>
Sent: Friday, August 8, 2025 9:55 AM
To: North American Network Operators Group <[email protected]>
Cc: LJ Wobker (lwobker) <[email protected]>; Marc Binderberger <[email protected]>; Drew Weaver <[email protected]>
Subject: Re: Cisco ASR9902 SNMP polling ... is interesting

I would chase this further with Cisco, if you have the cycles. Often it
pays dividends in the future to have a proper understanding of the anatomy
of the issue. So it's not purely for curiosity's sake.

On Fri, 8 Aug 2025 at 16:51, Drew Weaver via NANOG <[email protected]> wrote:
>
> One other note I'd like to make on this just for future reference:
>
> The default for SNMP in LPTS on this platform is 300 (I'm assuming
> that is 300pps).
>
> We aren't sending 300pps of SNMP traffic at this device so nothing should
> have been policed by it.
>
> There might be an issue with how it's counting or it's duplicating packets.
>
> Anyway setting it to 500 made everything work properly.
>
> (We aren't sending 500pps of SNMP at the machine either).
>
> Thanks,
> -Drew
>
>
> -----Original Message-----
> From: Drew Weaver via NANOG <[email protected]>
> Sent: Friday, August 8, 2025 9:32 AM
> To: 'North American Network Operators Group' <[email protected]>
> Cc: 'LJ Wobker (lwobker)' <[email protected]>; 'Marc Binderberger'
> <[email protected]>; Drew Weaver <[email protected]>
> Subject: RE: Cisco ASR9902 SNMP polling ... is interesting
>
> I'm just replying here to let you know that this was "solved".
>
> lpts pifib hardware police
>  flow snmp rate 2000
> !
>
> I want to point out that if you set it to its max configuration value
> (4294967295) it ignores it entirely even though IOS XR seems to know that
> its maximum for this hardware is 50000.
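
For illustration only, here is a minimal Python sketch of one alternative to silently ignoring an out-of-range value: clamp the configured rate to the hardware maximum and warn. The two constants are the figures quoted in this thread; the function name `effective_rate` and its behaviour are hypothetical and say nothing about how IOS XR is actually implemented.

```python
import warnings

# Figures quoted in the thread; everything else here is a hypothetical sketch.
CLI_MAX_RATE = 4294967295  # largest value the CLI accepts for the policer rate
HW_MAX_RATE = 50000        # maximum the platform reportedly supports


def effective_rate(configured, hw_max=HW_MAX_RATE, cli_max=CLI_MAX_RATE):
    """Return the rate to program, clamping to the hardware maximum.

    Values outside what the CLI accepts are rejected loudly; values the CLI
    accepts but the hardware cannot honour are clamped with a warning
    instead of being dropped without feedback.
    """
    if not 0 <= configured <= cli_max:
        raise ValueError(f"rate {configured} is outside 0..{cli_max}")
    if configured > hw_max:
        warnings.warn(
            f"rate {configured} exceeds platform maximum {hw_max}; using {hw_max}"
        )
        return hw_max
    return configured


if __name__ == "__main__":
    print(effective_rate(2000))          # within range, used as configured
    print(effective_rate(CLI_MAX_RATE))  # clamped to HW_MAX_RATE, with a warning
```

The point of the sketch is only the design choice: an operator gets either the value they asked for, a clamped value plus a message, or an explicit error, never a silent no-op.
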
>
> It couldn't be bothered to simply set it to 50000 if you set it to the
> configured maximum of 4294967295.
> It couldn't be bothered to simply say: "Hey, we know the max for this
> platform is 50000 so we set it to 50000, but you probably shouldn't be
> using 50000 for this value anyway."
> It could be bothered to do absolutely nothing and silently reject the
> command, which made me laugh for about 5 minutes this morning.
>
> So thanks for that Cisco and more sincerely thank you to everyone that took
> any time to try and assist me with this.
>
> I still would have preferred to just tell it what IP addresses to expect SNMP
> traffic to come from and use that instead of a PPS policer but hey it's 2025
> and preferences are luxuries.
>
> -Drew
>
>
> -----Original Message-----
> From: Saku Ytti via NANOG <[email protected]>
> Sent: Friday, August 8, 2025 3:34 AM
> To: North American Network Operators Group <[email protected]>
> Cc: LJ Wobker (lwobker) <[email protected]>; Marc Binderberger
> <[email protected]>; Saku Ytti <[email protected]>
> Subject: Re: Cisco ASR9902 SNMP polling ... is interesting
>
> On Thu, 7 Aug 2025 at 15:08, Marc Binderberger via NANOG
> <[email protected]> wrote:
>
> > Then why make these assumptions? Especially with XR - not your mom
> > & dad IT box but for ISPs or IT departments - you could provide the
> > mechanism and either "do nothing as default" or "block everything as
> > default". And then provide documentation and service$$$ to the
> > customers
>
> Because while Cisco can't dimension the box well, operators do an even worse
> job at it.
>
> On cXR we had issues where occasionally LPTS would admit too much BGP. After
> LPTS admits BGP traffic it is hashed to one of 8 XIPC worker processes before
> it is handed over to BGP. Because we had a busy device, XIPC didn't get the
> CPU cycles it needed to service the LPTS-admitted packets, causing XIPC to
> drop packets.
> This meant a couple of times a month we lost 1/8th of the BGP speakers on
> some router, and Cisco explicitly refused to fix it. They literally said
> maybe it works better in eXR (it does).
> The funny thing is, this CPU demand was created by BGP: because XIPC
> didn't have priority for CPU over BGP, it caused BGP to demand more CPU, due
> to flaps. If XIPC had had priority over BGP, the symptoms would have been
> lessened. I pointed this out to Cisco, they agreed, but said they've
> previously explored process priorities in cXR, but ended up having just more
> unstable devices (unmanageable complexity for people to understand what the
> priorities should be).
> All this while pitching that RTOS is mandatory for carrier grade NOS, while
> behind the scenes nothing for said RTOS was used, it's just flat priority all
> around.
>
> Additionally LPTS is exclusively an NPU-level policer: if port1 congests some
> policer, port2 also suffers, and there isn't a more-specific fall-back
> policer at IFD or IFL level. So what can you do if port1 has an L2 loop and
> is spewing ARP to you, killing port2? You can't MQC to 10pps, you can't ACL
> it, as LPTS bypasses MQC and ACL, so your only option is to shut down port1.
> You cannot a priori ensure one port won't take out other ports.
> There was an excessive flow tap, which could be used with success in this
> scenario, but that feature was retired, because I guess someone in Cisco who
> knew why it was needed had left, and the remaining people didn't understand
> its use case and didn't want to carry the complexity.
>
> All of these are actually solvable: you can deliver a NOS where port1 in the
> same NPU won't take down port2, out-of-the-box, without configuration. But it
> requires deep understanding of what the platform can do, how it can do it,
> and how the actual customer network works.
> This person doesn't exist.
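
To make the shared-policer point concrete, here is a small Python sketch of one policing interval. It is a simplified fixed-window counter, not the NPU's actual policer, and the limit, packet counts, and names (`police`, `arrivals`) are made up for illustration. It compares a single NPU-wide bucket with a hypothetical per-port fall-back bucket of the kind the message says is missing.

```python
from collections import Counter


def police(packets, limit, key):
    """Admit at most `limit` packets per bucket within one interval.

    `packets` is a list of ingress port names in arrival order and `key`
    maps a port name to the bucket it is counted against. Returns a Counter
    of admitted packets per port.
    """
    used = Counter()
    admitted = Counter()
    for port in packets:
        bucket = key(port)
        if used[bucket] < limit:
            used[bucket] += 1
            admitted[port] += 1
    return admitted


# Hypothetical interval: port1 has an L2 loop and floods 1000 ARP packets,
# while port2 sends 10 legitimate ones interleaved with the flood.
arrivals = []
for _ in range(10):
    arrivals += ["port1"] * 100 + ["port2"]

# One bucket for the whole NPU: port1 uses it up before port2 gets a turn.
shared = police(arrivals, limit=100, key=lambda port: "npu")

# One bucket per port: port1 is capped, port2 is unaffected.
per_port = police(arrivals, limit=100, key=lambda port: port)

if __name__ == "__main__":
    print("shared NPU policer:", dict(shared))
    print("per-port policer:  ", dict(per_port))
```

With the shared bucket, every port2 packet in this interval is dropped even though port2 is well under the limit; with per-port buckets all ten get through.
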
> Cisco or Nokia cannot even be configured like this by an operator; Juniper
> can be, but it's way too complicated for operators to do.
>
> So if you have a casual understanding of how these devices work, you can
> bring down any core device, no matter how it's protected, with a trivially
> sized DoS from a single VPC. The only reason the Internet works is that there
> isn't motivation to break it, not because it is well protected. Which is
> fine, because the same is true for personal safety, and the focus should be
> on mitigating the motivation, rather than on absolute safety.
>
> Of course this thread isn't about protecting devices in bad weather, it is
> about trying to make devices work in fair weather, which is a much more
> reasonable ask.
>
> --
>   ++ytti
> _______________________________________________
> NANOG mailing list
> https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.nanog.org_archives_list_nanog-40lists.nanog.org_message_V56CX5TXE7MSA2NQR6WFFZQWSWEDQCB5_&d=DwICAg&c=euGZstcaTDllvimEN8b7jXrwqOf-v5A_CdpgnVfiiMM&r=OPufM5oSy-PFpzfoijO_w76wskMALE1o4LtA3tMGmuw&m=JpBzXEAHGqhw7yYz2WYDniWSu1mYKW1Hpnju_sjqO-Z5HFqV2hrVPk9ge-SMaqrk&s=78hSyv-0ZbBYSmiMoeY-ttfxJ9O_K8Dab4hkaP-mlKk&e=

--
  ++ytti
_______________________________________________
NANOG mailing list
https://lists.nanog.org/archives/list/[email protected]/message/IY25ESMOLAFDXMCU5AOW2KJA5Q22C4FH/
