We have a few MX80s (MX80-48T) that we're looking to deploy in certain applications where they'll be taking full Internet tables (v4 and v6). We also have a need to gather flow data on our routers, and have noticed an interesting trend in the lab.
We are not using an MS-MIC currently. This test box is running 12.3R7.7 at the moment, but we've seen the same thing in 11.4 too.

With full Internet routes loaded and sampling enabled, every commit, for any change at all, sends RPD and sampled into taking turns pegging the CPU at close to 100% for 5-10 minutes or more afterwards. Changes to BGP policy sometimes stall and take several minutes or more to actually take effect. First RPD climbs to almost 100% CPU utilization and chews on it for a few minutes, then it drops and sampled climbs to almost 100% for its couple of minutes. Then sampled goes back down and RPD takes over again at 100% for a few more minutes. Eventually it all calms down and returns to expected levels.

Turn off sampling, and any post-commit CPU spikes last only seconds, not minutes, and policy changes take effect pretty much immediately.

We've seen this regardless of how flow is configured; we've tried a "simple" config as well as inline jflow, with pretty much the same results.

We're curious if anyone's had these same problems with jflow killing the CPU on MX80s (yeah, I know these PPC boxes are pretty weak sisters), and if there's any fix beyond the usual "Doctor, it hurts when I do this, what should I do?" "Don't do that!" It's a nice feature; shame that using it seems to come with this heavy a price.

As an aside, we also see a bit of a slowdown in RIB/FIB learning/purging on BGP session turnup/reset, which we're well aware is a known issue with sampling enabled, so I won't be shocked if this is just "how it is". I'd love to be wrong.

Here's our sampling config, quick and dirty, regular and inline jflow, in case we're missing something.
"Normal" Sampling: router> show configuration forwarding-options sampling { input { rate 8192; run-length 0; max-packets-per-second 20000; } family inet { output { flow-server x.x.x.x { port xxxxx; version 5; } } } } router> show configuration interfaces xe-0/0/0 unit xxx { vlan-id xxx; family inet { sampling { input; output; } } Inline Jflow Sampling: router> show configuration forwarding-options sampling { instance { BLAH-INSTANCE { input { rate 5000; } family inet { output { flow-server x.x.x.x { port xxxx; autonomous-system-type origin; no-local-dump; version-ipfix { template { BLAH-TEMPLATE; } } } inline-jflow { source-address x.x.x.x; } } } } } } router> show configuration chassis tfeb { slot 0 { sampling-instance BLAH-INSTANCE; } } router> show configuration services flow-monitoring { version-ipfix { template BLAH-TEMPLATE { flow-active-timeout 10; flow-inactive-timeout 10; template-refresh-rate { packets 10000; seconds 10; } option-refresh-rate { packets 10000; seconds 10; } ipv4-template; } } } router> show configuration interfaces xe-0/0/0 unit xxx { vlan-id xxx; family inet { sampling { input; output; } } _______________________________________________ juniper-nsp mailing list juniper-nsp@puck.nether.net https://puck.nether.net/mailman/listinfo/juniper-nsp