-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

On Friday, 1 August 2025 at 15:10, Drew Weaver via NANOG 
<[email protected]> wrote:
> 
> 
> Hello,

Hi Drew.

I haven't worked with IOS-XR for a few years but I have had problems with SNMP 
in the past.

A few years ago I was deploying 9904 chassis with a modest amount of services 
on them (not thousands of services per chassis, but hundreds, so they weren't 
idle, but certainly not under any mentionable load control-plane wise).

We noticed that SNMP polling was returning nothing for some of the services, 
and it ended up being a couple of problems compounding. At that time we had 
virtually every 9xxx and 99xx chassis in the network. This problem only existed 
on these boxes, but they were also the only routers in the network with this 
exact combination of services on them. So nothing chassis-specific, I believe. 
This was on IOS-XR 6.something, for reference.

When the SNMP process received a poll request, it in turn fired off requests 
internally to other processes to get the stats being asked for. There is/was 
(I'm out of touch now) a maximum amount of time SNMP would wait for the other 
processes to respond. If they didn't respond in time, either the SNMP response 
was sent without those details, or the query which was pending an answer was 
just dropped and no response was sent. So problem number one was those other 
processes taking too long to respond.
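
As a rough illustration of the timeout behaviour described above (this is not 
IOS-XR code, just a minimal sketch), the Python below models an agent that asks 
several internal backends in parallel and gives each a fixed time budget. The 
names backend_stats and answer_poll, the service names, the counter value and 
the 50 ms budget are all made up for the example; the real internal timeout 
value is not stated here.

```python
import asyncio

# Hypothetical per-request budget; the real IOS-XR value is not known here.
BACKEND_TIMEOUT_S = 0.05


async def backend_stats(name, delay_s):
    """Stand-in for an internal process that owns the stats for one object."""
    await asyncio.sleep(delay_s)  # simulated processing / IPC delay
    return {"object": name, "counter": 42}


async def answer_poll(requests):
    """Ask every backend in parallel and keep only replies that arrive in time."""

    async def ask(name, delay_s):
        try:
            return await asyncio.wait_for(
                backend_stats(name, delay_s), BACKEND_TIMEOUT_S
            )
        except asyncio.TimeoutError:
            return None  # too slow: this object is left out of the response

    replies = await asyncio.gather(*(ask(n, d) for n, d in requests))
    answered = [r for r in replies if r is not None]
    # If no backend answered in time, model the "no response sent" case.
    return answered or None


# One fast and one slow backend: the response goes out without the slow one.
fast_and_slow = asyncio.run(answer_poll([("svc-a", 0.0), ("svc-b", 1.0)]))
# Only a slow backend: the poller sees nothing at all.
all_slow = asyncio.run(answer_poll([("svc-c", 1.0)]))
print(fast_and_slow)
print(all_slow)
```

From the poller's side both outcomes look like missing data, which is why the 
slow internal process has to be found on the router rather than on the NMS.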

Problem number two was that those other processes had a bug: after provisioning 
services, those processes hadn't picked up on the changes. The request came 
from the SNMP process to the other processes for stats relating to X, but the 
other processes had no knowledge of X.

TAC provided us with a short-term workaround, which was to restart some 
processes after provisioning new services, to ensure the processes were aware 
of the new services and would respond to the SNMP process with the requested 
stats. Long term, they created a DDTS and SMU to fix the inter-process timeout 
and missing-stats issues.

I don't know exactly what you're polling, and like I said, I'm a bit out of 
touch here, but I can say that it took quite a lot of digging and working with 
TAC to bottom out the problem. We could replicate the issue in the lab, which 
always helps. So if you can replicate the issue in the lab, and turn all 
debugging settings up to 11, you might be able to find something like we did 
(TAC sent some debug commands and we could trace the issue in the lab; IPC 
debugging is hard on these boxes!). Even if TAC are trying to fob you off by 
saying "oh yeah this is dropped by LPTS as expected", get them to prove it to 
you: replicate the issue in the lab and gather the debug info which shows 
how/where the request is being dropped. If they can't find the drop in LPTS, 
then LPTS isn't the problem and you need to look elsewhere, like IPC/EOBC.


Cheers,
James.

-----BEGIN PGP SIGNATURE-----
Version: ProtonMail

wsG5BAEBCgBtBYJojwuDCZCoEx+igX+A+0UUAAAAAAAcACBzYWx0QG5vdGF0
aW9ucy5vcGVucGdwanMub3Jne6/4gXRiD1B/oyx0cm03xe+bPfK4lh4ErWip
GQvWH9oWIQQ+k2NZBObfK8Tl7sKoEx+igX+A+wAAlZAP/3DFVyR1e2DiJ7bv
4udRjmX0xLtEpkZM7UJGwhihiIiqW/JV+TyqEq75Ko4Hu9xOiOURkz+VkBx6
XfgbrFuXxPT/i4NhcMZ8qygSBwoAQK4Z6CIeXf9msWnly259hA5F88SB/oCc
LKOjcH6hNHVI2+5jSIMJFqNVkD/3b2eSIF3ZHbdWsZ+uq6QRMMvM7gOHuJAm
0mCiOBTUbN4oIziQdN0u3tbWVgIWulC2TyM8wy2FGyN+r5ks/jqmZQhlTASo
u+9kPtBZ4SQc0p9GwvYZN4XHXQtcftx7xrPymmXhwU+3UaE70YoSZuJVULE+
eGipYUDUiQ9OA9pj39BWZe6fpRLqgoeEl6GDiavHYLcfw3CVkMwThPUGDRFX
RDNxKpebdPEZHzsJyvqORgM+/RHYIAgqOOQIQdiZGbaiIxa8ooT06WJRkNWO
iKL2jOkXndbbxWenyw4RNZwVX50H1Y79eqUxhU24yiA0Wfs6qVCRZWP3M//g
a+BJwOBqb8gFmuJErvezWUPUNIt94UhEv8aFpVtPZ7R4IIpPzFBFlLUV4HEK
F5IU9JgqvyBagubAPeIOoUk0+DboE4gGBPTz9RGWSfdxM+D5pX/HWBh8qIwB
prO6hDk3PkkGAk4/fhd5jNmGk0hE0yKyTubE711vIJ9vXD1dJbqKgoOjSA18
t315dumB
=LkYJ
-----END PGP SIGNATURE-----


_______________________________________________
NANOG mailing list 
https://lists.nanog.org/archives/list/[email protected]/message/LFEK3EROE2TNHT7KOSM5WMW5HXGR4LQL/
