Re: [j-nsp] A low number of Firewall filters reducing the bandwidth capacity.
On Tue, 10 Sept 2024 at 22:57, Timur Maryin via juniper-nsp wrote:
> EA utilization monitoring might not be straightforward on a first look.
> But we have internal tools (a script) which print the data in a nice manner.
> JTAC may be able to share that.

It does have a single command to report global NPU load as a percentage, which is a gross simplification but useful. For example, you can see that as you keep piling on IPv6 extension headers, global load approaches 100%, until eventually the stack is too deep, the packet gets dropped outright, and global load plummets.

Usually people operating these think it's binary, either it works or it doesn't. But as it is run-to-completion, everything adds up, which is why we always get unsatisfying answers from vendors to our scaling questions: no one really knows, it's too complicated.

Going more in depth requires checking what ucode instructions the PPEs are running, how long they are spending on them, and so on, but that comes after you already have unexpectedly high global NPU load and want to figure out why and how to address it.

--
++ytti
___
juniper-nsp mailing list
juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp
Re: [j-nsp] A low number of Firewall filters reducing the bandwidth capacity.
On Mon, 26 Aug 2024 at 21:03, Gustavo Santos wrote:
> When conducting a more comprehensive assessment, one of our NMS tools managed
> to record CPU usage of both FPCs averaging 95% over a 5-minute period, with a
> certainty that it reached 100% at some point.

But the FPC CPU (an Intel Atom or AMD part) isn't related to packet pushing or filter evaluation; that happens on the Trio LU/XL PPEs. Certainly high CPU is not ideal and needs to be investigated, but I struggle to understand how it would explain this.

To me it is fairly common that once you start investigating a tricky issue, you first run into several other actual problems before you find the right one.

--
++ytti
Re: [j-nsp] A low number of Firewall filters reducing the bandwidth capacity.
You don't have a counter term for drops? So we can't be entirely sure your terms aren't responsible for it? Please add counters for the discard terms.

You can view the global NPU load % from the PFE CLI, as well as, to a degree, individual PPEs.

I'd say:
- let's ensure it's not the filter dropping the packets, as requested
- find out where the drops are reported (interface/extensive counters, PFE stream counters, QoS counters, NPU exception counters, MQ FI/FO counters ...)

On Mon, 26 Aug 2024 at 16:43, Gustavo Santos wrote:
>
> Awesome, thanks for the info!
> Rules are like the one below.
> After adjusting the detection engine to handle /24 networks instead of /32
> hosts, the issue is gone.
> As you said, the issue was not caused by pps, as the attack traffic was just
> about 30 Mpps, and with rules adjusted to /24 networks
> there were no more dropped packets from the PFE.
>
> Do you know how to check the PPE information that should show what
> may have happened?
>
> Below is a sample rule that was generated (about 300 of them, pushed via
> NETCONF, caused the slowdown).
> set term e558d83516833f77dea28e0bd5e65871-match from destination-address 131.0.245.143/32
> set term e558d83516833f77dea28e0bd5e65871-match from protocol 6
> set term e558d83516833f77dea28e0bd5e65871-match from source-port 443
> set term e558d83516833f77dea28e0bd5e65871-match from packet-length 32-63
> set term e558d83516833f77dea28e0bd5e65871-match from tcp-flags "syn & ack & !fin & !rst & !psh"
> set term e558d83516833f77dea28e0bd5e65871-match then count Corero-auto-block-e558d83516833f77dea28e0bd5e65871-match port-mirror next term
> set term e558d83516833f77dea28e0bd5e65871-action from destination-address 131.0.245.143/32
> set term e558d83516833f77dea28e0bd5e65871-action from protocol 6
> set term e558d83516833f77dea28e0bd5e65871-action from source-port 443
> set term e558d83516833f77dea28e0bd5e65871-action from packet-length 32-63
> set term e558d83516833f77dea28e0bd5e65871-action from tcp-flags "syn & ack & !fin & !rst & !psh"
> set term e558d83516833f77dea28e0bd5e65871-action then count Corero-auto-block-e558d83516833f77dea28e0bd5e65871-discard discard
>
> On Sun, 25 Aug 2024 at 02:36, Saku Ytti wrote:
>>
>> The RE and LC CPU have nothing to do with this, you'd need to check
>> the Trio PPE congestion levels to figure out if you're running out of
>> cycles for ucode execution.
>>
>> This might improve your performance:
>> https://www.juniper.net/documentation/us/en/software/junos/cli-reference/topics/ref/statement/firewall-fast-lookup-filter.html
>>
>> There is also old and new Trio ucode, the new being 'hyper mode', but this
>> may already be on by default, depending on your release. Hyper mode
>> should give a bit more PPS.
>>
>> There is precious little information available, like what exactly
>> your filters are doing, what kind of PPS you are pushing in the Trio
>> experiencing this, where you are seeing the drops; if you are
>> dropping, they are absolutely accounted for somewhere.
>>
>> Unless you are really pushing very heavy PPS, I have difficulty
>> seeing 100 sensible FW rules impacting performance. I'm not saying it is
>> impossible, but I suspect there is a lot more here. We'd need to deep
>> dive into the rules, PPE configuration and load.
>>
>> On Sat, 24 Aug 2024 at 23:35, Gustavo Santos via juniper-nsp wrote:
>> >
>> > Hi,
>> >
>> > We have noticed that when a not-so-large number of firewall filter terms
>> > are generated and pushed to edge routers via NETCONF into a triplet of
>> > MX10003,
>> > we start receiving customer complaints. These issues seem to be related to
>> > the router's FPC limiting overall network traffic. To resolve the problem,
>> > we simply deactivate the ephemeral configuration database that contains the
>> > rules, which removes all the rules,
>> > and the traffic flow returns to normal. Is there any known limitation or
>> > bug that could cause this type of issue?
>> > We typically observe this problem with more than 100 rules; with a smaller
>> > number of rules, we don't experience the same issue, even with much larger
>> > attacks. Is there any known bug or limitation?
>> >
>> > A
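The counter-on-discard advice above can be sketched as follows. This is a hedged illustration in standard Junos firewall syntax, not a drop-in for the Corero-generated rules; the filter, term, and counter names are invented for the example:

```
set firewall family inet filter DDOS-BLOCK term block-1 from destination-address 131.0.245.143/32
set firewall family inet filter DDOS-BLOCK term block-1 from protocol tcp
set firewall family inet filter DDOS-BLOCK term block-1 then count block-1-drops
set firewall family inet filter DDOS-BLOCK term block-1 then discard
set firewall family inet filter DDOS-BLOCK term default then accept
```

With a counter attached to each discard term, "show firewall filter DDOS-BLOCK" reports per-term drop counts, so you can tell whether the filter itself is eating the traffic before chasing PFE-level drops.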
Re: [j-nsp] A low number of Firewall filters reducing the bandwidth capacity.
The RE and LC CPU have nothing to do with this; you'd need to check the Trio PPE congestion levels to figure out if you're running out of cycles for ucode execution.

This might improve your performance:
https://www.juniper.net/documentation/us/en/software/junos/cli-reference/topics/ref/statement/firewall-fast-lookup-filter.html

There is also old and new Trio ucode, the new being 'hyper mode', but this may already be on by default, depending on your release. Hyper mode should give a bit more PPS.

There is precious little information available, like what exactly your filters are doing, what kind of PPS you are pushing in the Trio experiencing this, where you are seeing the drops; if you are dropping, they are absolutely accounted for somewhere.

Unless you are really pushing very heavy PPS, I have difficulty seeing 100 sensible FW rules impacting performance. I'm not saying it is impossible, but I suspect there is a lot more here. We'd need to deep dive into the rules, PPE configuration and load.

On Sat, 24 Aug 2024 at 23:35, Gustavo Santos via juniper-nsp wrote:
>
> Hi,
>
> We have noticed that when a not-so-large number of firewall filter terms
> are generated and pushed to edge routers via NETCONF into a triplet of
> MX10003,
> we start receiving customer complaints. These issues seem to be related to
> the router's FPC limiting overall network traffic. To resolve the problem,
> we simply deactivate the ephemeral configuration database that contains the
> rules, which removes all the rules, and the traffic flow returns to normal.
> Is there any known limitation or bug that could cause this type of issue?
> We typically observe this problem with more than 100 rules; with a smaller
> number of rules, we don't experience the same issue, even with much larger
> attacks. Is there any known bug or limitation?
>
> As it is a customer traffic issue I didn't have the time to check FPC
> memory or the FPC shell.
> I just checked the routing engine and FPC CPU and
> they are all fine (under 50% FPC and under 10% RE).
>
> Any thoughts?
>
> Regards.

--
++ytti
Re: [j-nsp] Logging for shell sessions
This depends greatly on how you've set up your support. If you have Cisco HTTS or Juniper ACP or the like, where you get named engineers, then you can develop mutual trust and give those engineers access to your network. But if you're going through a normal process, perhaps additional care is warranted.

On Sun, 7 Jul 2024 at 19:34, Jared Mauch wrote:
>
> I don't trust my vendors to run commands on my devices; it's not
> personal. If there is a diagnostic that they want run, they need to be
> able to articulate the operational risk, or we may want to validate it in a
> virtual or real physical router.
>
> - Jared
>
> On Sun, Jul 07, 2024 at 11:07:48AM +0300, Saku Ytti via juniper-nsp wrote:
> > For things like TAC use, what I've previously done is made a vendor
> > shell, where the shell program is screen instead of a shell, and screen
> > is set up to log.
> >
> > On Sat, 6 Jul 2024 at 16:50, Job Snijders wrote:
> > >
> > > Perhaps it's just about wanting to keep track of "what happened?!?"
> > >
> > > For such a scenario, consider conserver
> > > https://www.conserver.com/docs/console.man.html and script
> > > http://man.openbsd.org/script to store the terminal interactions
> > >
> > > Assume untrusted users probably can escape such environments
> > >
> > > Kind regards,
> > >
> > > Job
>
> --
> Jared Mauch | pgp key available via finger from ja...@puck.nether.net
> clue++; | http://puck.nether.net/~jared/ My statements are only mine.

--
++ytti
Re: [j-nsp] Logging for shell sessions
For things like TAC use, what I've previously done is made a vendor shell, where the shell program is screen instead of a shell, and screen is set up to log.

On Sat, 6 Jul 2024 at 16:50, Job Snijders wrote:
>
> Perhaps it's just about wanting to keep track of "what happened?!?"
>
> For such a scenario, consider conserver
> https://www.conserver.com/docs/console.man.html and script
> http://man.openbsd.org/script to store the terminal interactions
>
> Assume untrusted users probably can escape such environments
>
> Kind regards,
>
> Job

--
++ytti
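The screen-as-login-shell setup described above could look roughly like this. A sketch, assuming a Unix jump host in front of the device; the account name and log path are invented for the example, and the exact escape format of `logfile` should be checked against your screen version:

```
# /home/vendor/.screenrc -- log everything the vendor session does
deflog on
logfile /var/log/vendor-sessions/%n.%t.log
logtstamp on
```

Then set the vendor account's login shell to screen (e.g. via chsh), so every login lands in a logged session. As Job notes above, assume a determined user can escape this; it's an audit trail, not a security boundary.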
Re: [j-nsp] Logging for shell sessions
I don't believe there is a supported way to do this. An unsupported way, probably, but an educated operator could probably circumvent it anyhow. You probably shouldn't allow untrusted users to access the shell.

On Sat, 6 Jul 2024 at 09:26, Phil Mawson via juniper-nsp wrote:
>
> Hi,
>
> Once a user enters the Unix shell on a Juniper router/switch (i.e. start
> shell), it appears all standard logging of the commands typed is not captured
> by syslog and obviously not sent to AAA for authorisation.
>
> Is there a way to capture all commands users type and send them to an external
> logging source? Looking through the Juniper docs doesn't turn up much info on this.
> I'd expect we'd need something running at the kernel level on BSD.
>
> I understand the commands are logged in the bash history file, but ideally
> this needs to go off the router for audit purposes in real time.
>
> Cheers,
> Phil.

--
++ytti
Re: [j-nsp] Junos EVO RE Filters
On Wed, 19 Jun 2024 at 20:35, heasley wrote:
> And enemy of security is lack of effort? Current BMCs would be
> a step backward, imho. I wish they were better; a lot of
> potential.

What is the benchmark? Is the benchmark a NOS fate-sharing control-plane ethernet? Or RS232? How do they outperform an arbitrarily insecure BMC?

I wouldn't trust the BMC for security at any rate; the MGMT LAN would be as closed as SSH to RS232, with zero expectation that on the RS232 side there is any security.

I don't think security is important at all here. I want to be able to bring the box up quickly, when it dies, without sending people into the field. I care much more about availability of service than security (and I don't think a trade-off is actually being made here), because availability you can measure; security mostly you can't, it's astrology for men.

--
++ytti
Re: [j-nsp] Junos EVO RE Filters
On Tue, 18 Jun 2024 at 21:23, heasley wrote:
> Yes, do that, please, but that does not really address the security
> problems. BMCs typically are not updated by their owners, s/w updates
> for them are rarely offered by the vendor, usually have limited filtering
> & security capabilities, and are difficult to manage.
>
> You have to ask for much more.

To me none of the above matters. I don't care how insecure the BMC is.

I just want a true OOB port that works when my router does not work. I want an OOB port that won't break my router when my OOB LAN has a broadcast storm or some other unexpected behaviour. I want an OOB port over which I can bootstrap a factory-new router.

Perfect is the enemy of done; ask for the MVP, so you might actually get something. We've been in this RS232 world far too long, and the ethernet option we have is even worse.

--
++ytti
Re: [j-nsp] Junos EVO RE Filters
On Tue, 18 Jun 2024 at 18:56, Jason Iannone via juniper-nsp wrote:
> I suppose the root question is do I have to apply a management filter on my
> transit interfaces for in-band management traffic? Does ACX have a new (not
> fxp1) relationship between the RE and the external re0:mgmt-0/em0/fxp0
> management interface in the ACX?

No. The lo0 filter applies to traffic ingressing from revenue/NPU ports, but unlike Junos classic, the lo0 filter does not apply to traffic ingressing from the MGMT ETH.

I wouldn't worry much about this. The MGMT filters have always been software, for obvious reasons, and are not very useful.

Don't use the MGMT ETH. If you must, just make it clean on the other side, by not accepting trash in from any client side. If you must use MGMT ETH, keep asking your vendors for true lights-out ethernet, with its own CPU, DRAM and storage.

--
++ytti
Re: [j-nsp] rib-sharding and NSR update
On Mon, 3 Jun 2024 at 05:26, Gustavo Santos via juniper-nsp wrote:
> We will try it again later this year. If update threading / rib-sharding
> works as expected it will be better than having non-stop routing running.

I think you need to contact support and work with them; NOS SW quality is terrible, and whatever problem you're seeing might be some corner case that happens just to you, and it will never get fixed if you're not proactive about it.

> Last time we had an issue caused by a bgp routing update, it took about 50
> minutes to advertise all needed routes to one of the transit providers,
> because of the time it takes to send full routing table feeds to remote peers.

Could be a plethora of things, but by default the TCP window won't grow past 16kB, so if you have any latency at all, performance is destroyed. You can raise this to 64kB, but window scaling is currently not supported. In my own testing we were able to fill the entire 64kB window, so convergence was gated by the window rather than by BGP processing.

But it's not necessarily super trivial to make BGP perform well: you may have a 10Mbps static policer for the queue towards the control plane, shared by BGP, all IGPs, etc. So if you were able to make BGP perform, you could potentially kill your IGP, and the problem may recurse into a complex one. The DE who made the design choices may no longer be employed, so there may not be anyone left able to make an informed decision on how to change the behaviour.

--
++ytti
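The throughput ceiling implied by a fixed, unscaled TCP window is simply window divided by round-trip time. A quick sketch of the arithmetic behind the 16kB/64kB point above (the RTT value is just an example):

```python
def max_throughput_bps(window_bytes: int, rtt_seconds: float) -> float:
    """Throughput ceiling for a fixed (unscaled) TCP window: window / RTT."""
    return window_bytes * 8 / rtt_seconds

# 16 kB default window at 50 ms RTT: ~2.6 Mbit/s
print(round(max_throughput_bps(16 * 1024, 0.050) / 1e6, 1))
# 64 kB maximum unscaled window at the same RTT: ~10.5 Mbit/s
print(round(max_throughput_bps(64 * 1024, 0.050) / 1e6, 1))
```

So even the 64kB ceiling caps a full-table transfer at around 10 Mbit/s over a 50 ms path, which is consistent with multi-minute convergence for large RIBs.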
Re: [j-nsp] JunOS forwarding IPv6 packets with link-local source
On Fri, 17 May 2024 at 10:36, Antti Ristimäki wrote:
> iACL design becomes a bit more challenging if you want to keep the
> link-local things link-local (e.g. there are legit ND packets with
> link-local srcaddr and GUA dstaddr). It is doable, though.

Not disagreeing, but what are these packets? And can you drop link-local in two forwarding-filter terms?

I know ND can be any permutation, but those can be handled in earlier terms in the iACL without matching addresses, by matching icmp6 types and hop-limit 255.

--
++ytti
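The "earlier terms" approach above might look roughly like this in Junos set syntax. A sketch only: the filter and term names are invented, the icmp-type list is not an exhaustive set of ND permutations, and hop-limit matching support should be verified on your platform:

```
set firewall family inet6 filter EDGE-IN term nd-ok from next-header icmp6
set firewall family inet6 filter EDGE-IN term nd-ok from icmp-type neighbor-solicit
set firewall family inet6 filter EDGE-IN term nd-ok from icmp-type neighbor-advertisement
set firewall family inet6 filter EDGE-IN term nd-ok from icmp-type router-solicit
set firewall family inet6 filter EDGE-IN term nd-ok from icmp-type router-advertisement
set firewall family inet6 filter EDGE-IN term nd-ok from hop-limit 255
set firewall family inet6 filter EDGE-IN term nd-ok then accept
set firewall family inet6 filter EDGE-IN term no-ll-src from source-address fe80::/10
set firewall family inet6 filter EDGE-IN term no-ll-src then discard
set firewall family inet6 filter EDGE-IN term no-ll-dst from destination-address fe80::/10
set firewall family inet6 filter EDGE-IN term no-ll-dst then discard
```

ND is accepted by type and hop-limit before any address-based terms, and the two following terms then drop everything else sourced from or destined to link-local space.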
Re: [j-nsp] JunOS forwarding IPv6 packets with link-local source
On Thu, 16 May 2024 at 21:23, Antti Ristimäki via juniper-nsp wrote:
> Does anyone have any insight into this? This issue was discussed on
> this list already over 10 years ago, for example:
> https://puck.nether.net/pipermail/juniper-nsp/2012-April/023134.html

Personally I'm not convinced I'd even want this fixed, as it likely comes with significant per-packet cost. Reality is always some pragmatic version of the standard. But I'm pretty sure if you press it, Juniper will accept it as a PR.

If I read the IPv6 standard correctly, nodes /have to/ join the ND multicast group, which they don't, which is good, because the whole thing is dumb, fragile and expensive.

ICMPv6 ND forwarding is weird: most forward it happily in all cases; some, like SR OS, punt all ICMPv6 ND with TTL 255, transit or punt, and transit anything with TTL 254 or less.

--
++ytti
Re: [j-nsp] ACL for lo0 template/example comprehensive list of 'things to think about'?
How IP options work is platform-specific.

It used to be that _transited_ IP options were not subject to the lo0 filter, while still being a risk for the RE, so you'd implement a forwarding-filter, where you'd police IP options or drop them outright.

In more recent Junipers, this behaviour has been changed so that transited IP options are subject to the lo0 filter, which makes it in practice impossible to determine if an IP-options packet is transit or punt, especially if you run L3 MPLS VPN. I tried to argue that the new behaviour is a PR, but didn't have enough patience to ram it through.

So basically no one knows what their policy regarding transit IP-options packets is, and most accidentally change the policy from 'transit all, unlimited' to 'transit none' by upgrading devices. Of course, generally this is the case for most things.

On Mon, 13 May 2024 at 13:36, Martin Tonusoo via juniper-nsp wrote:
>
> Michael,
>
> got it, thanks.
>
>
> Lee,
>
> the README of your repository provides an excellent introduction to RE
> filtering. Based on your filters, I moved the processing of the IP
> Options from edge filters to RE filters:
>
> https://gist.github.com/tonusoo/efd9ab4fcf2bb5a45d34d5af5e3f3e0c#file-junos-re-filters-L574:L585
>
>
> Martin

--
++ytti
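The forwarding-filter approach described above could be sketched like this; the filter, term, and policer names and the policer rate are invented for the example:

```
set firewall policer IP-OPTIONS-POLICER if-exceeding bandwidth-limit 1m
set firewall policer IP-OPTIONS-POLICER if-exceeding burst-size-limit 15k
set firewall policer IP-OPTIONS-POLICER then discard
set firewall family inet filter TRANSIT-V4 term ip-options from ip-options any
set firewall family inet filter TRANSIT-V4 term ip-options then policer IP-OPTIONS-POLICER
set firewall family inet filter TRANSIT-V4 term rest then accept
set forwarding-options family inet filter input TRANSIT-V4
```

Because this is attached via forwarding-options rather than lo0, it catches options packets in transit, which is the gap the lo0 filter historically did not cover.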
Re: [j-nsp] BGP timer
On Mon, 29 Apr 2024 at 10:13, Mark Tinka via juniper-nsp wrote:
> It comes down to how you classify stable (well-behaved) vs. unstable
> (misbehaving) interfaces.

You are making this unnecessarily complicated. You could simply configure it so that the first down event doesn't add enough points to damp, but the second does. And you are wildly better off.

Perfect is the enemy of done and kills all movement towards better.

--
++ytti
Re: [j-nsp] BGP timer
On Mon, 29 Apr 2024 at 10:07, Gert Doering via juniper-nsp wrote:
> The interesting question is "how to react when underlay seems to be stable
> again"? "bring up upper layers right away, with exponential decay flap
> dampening" or "always wait 15 minutes to be SURE it's stable!!!"...

100%. What Mark implied was not what I was trying to communicate. Sure, go ahead and damp flapping interfaces, but to penalise on the first down event, when most of them are just that, one event, is, to me, just bad policy made by people who don't feel the cost.

--
++ytti
Re: [j-nsp] BGP timer
On Sun, 28 Apr 2024 at 21:20, Jeff Haas via juniper-nsp wrote:
> BFD holddown is the right feature for this.
> WARNING: BFD holddown is known to be problematic between Juniper and Cisco
> implementations due to where each starts their state machines for BFD vs. BGP.
>
> It was a partial motivation for BGP BFD strict:
> https://datatracker.ietf.org/doc/html/draft-ietf-idr-bgp-bfd-strict-mode
>
> BGP BFD strict was added in 23.2R1.

But why is this desirable? Why do I want to prioritise stability always, instead of prioritising convergence on well-behaved interfaces and stability on poorly behaved interfaces?

If I can pick just one, I'll prioritise convergence every time, for both. That is, if I cannot have exponential back-off, I won't kill convergence 'just in case', because it's not me who will feel the pain of my decisions, it's my customers.

Netengs, and particularly infosec people, are quite often unnecessarily conservative in their policies, because they don't have skin in the game: they feel the upside, but not the downside.

--
++ytti
Re: [j-nsp] ACL for lo0 template/example comprehensive list of 'things to think about'?
Some comments from a quick read of just the IPv4 part.

- I don't like the level of abstraction; it seems to just ensure no one will bother reading it, and reuse of the filters and terms won't happen anyhow. It feels like first learning an OO language and making everything modular, adding overhead and abstraction for no value. Instead of having a flat list, you have multiple filters in a list (which is internally concatenated in SW anyhow, into a single fat list, so no HW benefit); not just that, but the filters themselves refer to other filters.

1) You should have two rules for TCP services, like BGP, inbound and outbound, instead of just allowing the far end to connect, with self-connect handled by flags. As written, this allows the far end to hit any port they want; while the packet will not have the SYN bit, it's still not safe. You could improve it by defining the DPORT in the connected check as the ephemeral range the NOS uses.

2) OSPF can be TTL==1; not very important for security, though.

3) Traceroute and ping won't work if the router is the target DADDR and TTL > 1.

4) Useless use of 'router-v4': if it hit lo0, it was for us. You'd need something like this in the edge filter, not the lo0 filter. And in the edge filter it's still broken, because this is all LANs, not host/32s.

5) Use of 'port' in NTP and others: this allows the far end to hit any port by setting its SPORT to ntp.

6) No DPORT in DNS. Every term should have a DPORT; if we are connecting, it'll be the ephemeral range. Otherwise the far end can hit any DPORT by setting its SPORT.

Some of these mistakes are straight from the book, like the useless level of abstraction without actual reuse and the insecure use of 'port'. But unlike the book, at least you have an ultimate permit and then an ultimate deny, which is important.

On Sun, 28 Apr 2024 at 12:21, Martin Tonusoo wrote:
>
> Hi.
>
> > In practical life IOS-XR control-plane is better protected than JunOS,
> > as configuring JunOS securely is very involved, considering that the MX
> > book gets it wrong, offering a horrible lo0 filter, as does Cymru; what
> > chance do the rest of us have?
>
> I recently worked on an RE protection filter based on the examples
> given in the "Juniper MX Series" book:
> https://gist.github.com/tonusoo/efd9ab4fcf2bb5a45d34d5af5e3f3e0c
>
> It's a tight filter for a simple network, e.g. MPLS is not in use and
> thus there are no filters for signaling protocols or MPLS LSP
> ping/traceroute, routing instances are not in use, authentication for
> VRRPv3 or OSPF is not in use, etc.
>
> A few differences compared to the filters in the MX book:
>
> * "ttl-except 1" in the "accept-icmp" filter was avoided by simply moving
> the traceroute-related filters in front of the "accept-icmp" filter
>
> * the "discard-extension-headers" filter in the book discards certain Next
> Header values and allows the rest. I changed it so that only
> specified Next Header values are accepted and the rest are discarded. The idea
> is to discard unneeded extension headers as early as possible.
>
> * in term "neighbor-discovery-accept" in filter "accept-icmp6-misc"
> only packets with a Hop Limit value of 255 should be accepted
>
> * the "accept-bgp-v6" filter (or any other IPv6-related RE filter in
> the book) does not allow the router to initiate BGP sessions with other
> routers. I added a term named "accept-established-bgp-v6" in filter
> "accept-established-v6" which addresses this issue.
>
> * for the sake of readability and simplicity, I used names instead of
> numbers where possible. For example "icmp-type router-solicit" instead of
> "icmp-type 133".
>
> * in all occurrences where it was not possible to match on the source IP
> address, I strictly policed the traffic
>
> * traffic from management networks does not share policers with
> traffic from untrusted networks
>
>
> The overall structure of the RE filters in the "Juniper MX Series" book is
> in my opinion very good: a list of small filters which accept specific
> traffic and finally discard all the rest.
>
> The reason for having separate v4 and v6 prefix-lists is a Junos property
> of ignoring the prefix-list altogether if it's used in a family inet
> filter while the prefix-list contains only inet6 networks. The same is
> true if the prefix-list is used in a family inet6 filter and the
> prefix-list contains only inet networks. For example, if only IPv4
> name server addresses are defined under [edit system name-server] and a
> prefix-list with apply-path "system name-server <*>" is used as a
> source prefix-list in some family inet6 filter, then actually no
> source-address-related restrictions apply. This can be checked with
> "show filter index program" on a PFE CLI.
>
>
> Martin

--
++ytti
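Points 1) and 6) above, matching the destination port in both directions, could be sketched like this in Junos set syntax. All names are invented for the example, and the ephemeral port range is an assumption that must be checked against what your NOS actually uses:

```
set policy-options prefix-list BGP-PEERS-V4 192.0.2.1/32
set firewall family inet filter PROTECT-RE term bgp-in from source-prefix-list BGP-PEERS-V4
set firewall family inet filter PROTECT-RE term bgp-in from protocol tcp
set firewall family inet filter PROTECT-RE term bgp-in from destination-port 179
set firewall family inet filter PROTECT-RE term bgp-in then accept
set firewall family inet filter PROTECT-RE term bgp-out from source-prefix-list BGP-PEERS-V4
set firewall family inet filter PROTECT-RE term bgp-out from protocol tcp
set firewall family inet filter PROTECT-RE term bgp-out from source-port 179
set firewall family inet filter PROTECT-RE term bgp-out from destination-port 49152-65535
set firewall family inet filter PROTECT-RE term bgp-out then accept
```

The point is that every term constrains the DPORT: the far end can set its own source port to anything, so a term matching only `port` or only SPORT lets it reach arbitrary local services.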
Re: [j-nsp] BGP timer
On Sat, 27 Apr 2024 at 14:29, Rolf Hanßen via juniper-nsp wrote:
> at least for link flapping issues (but not other session flapping reasons)
> you could set the hold-time:
> set interfaces xy hold-time up 30

Since Junos 14.1 Junos has caught up with Cisco and implemented exponential back-off for interface damping. So you don't have to impose a static penalty as above, but can penalise actually flapping interfaces, instead of killing convergence on the first transition.

But indeed this doesn't really address what the OP is asking, and I don't think, outside scripting, there is a direct solution to what the OP wants. Clearly any vendor could implement exponential back-off damping for any protocol which has up and down states, and they could write the code once and reuse it for everything, so it's not a tall order at all.

--
++ytti
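The interface damping mentioned above is configured per interface; a sketch with illustrative values (half-life and max-suppress in minutes, reuse/suppress as penalty thresholds — check your release's units and defaults before copying):

```
set interfaces ge-0/0/0 damping enable
set interfaces ge-0/0/0 damping half-life 5
set interfaces ge-0/0/0 damping max-suppress 20
set interfaces ge-0/0/0 damping reuse 1000
set interfaces ge-0/0/0 damping suppress 2000
```

Unlike a static `hold-time up`, this leaves a single down/up transition essentially unpenalised and only suppresses an interface whose accumulated flap penalty exceeds the threshold.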
Re: [j-nsp] L3VPNs and on-prem DDoS scrubbing architecture
This might be grounds for a feature request to Juniper, if there isn't already some magic toggle to MakeItGo.

But yeah, the forwarding table looks suspect, as if it'll do a table lookup, then fail to discover the more-specific host route, and discard, as the ARP entries are not copied. And yeah, Alexandre's workaround seems like a cute way to force the host route into the VRF, if provisioning-intensive.

I think two features would be nice to have:
a) copying the ARP/ND entries from inet to the VRF (if not already possible)
b) assigning labels to each ARP/ND host route, to avoid doing the egress-PE lookup (this labeled route would only be imported to the interface facing the scrubber clean side; the rest of the network sees the unlabeled direct aggregate)

On Wed, 3 Apr 2024 at 17:04, Michael Hare wrote:
>
> Saku, Mark-
>
> Thanks for the responses. Unless I'm mistaken, short of specifying a
> selective import policy, I think I'm already doing what Saku suggests; see
> the relevant config snippet below. Our clean VRF is L3VPN-4205. But after I saw
> the lack of MAC-based next hops I started searching to see if there was a
> protocol other than direct that I wasn't aware of. I intend to take a look
> at Alexandre's workaround to understand/test it, just haven't gotten there yet.
>
> I was able to get FBF via a dirty VRF working quickly in the meantime while I
> figure out how to salvage the longest-prefix approach.
>
> -Michael
>
> ==/==
>
> @ # show routing-options | display inheritance no-comments
> ...
> interface-routes {
>     rib-group {
>         inet rib-interface-routes-v4;
>         inet6 rib-interface-routes-v6;
>     }
> }
> rib-groups {
>     rib-interface-routes-v4 {
>         import-rib [ inet.0 L3VPN-4205.inet.0 ];
>     }
>     ...
>     rib-interface-routes-v6 {
>         import-rib [ inet6.0 L3VPN-4205.inet6.0 ];
>     }
>     ...
> }
>
> > On Wed, 3 Apr 2024 at 09:45, Saku Ytti wrote:
> >
> > > Actually I think I'm confused. I think it will just work. Because even
> > > as the EgressPE does IP lookup due to table-label, the IP lookup still
> > > points to egressMAC, instead of looping back, because it's doing it in
> > > the CleanVRF. So I think it just works.
> >
> > > routing-options {
> > >     interface-routes {
> > >         rib-groups {
> > >             cleanVRF {
> > >                 import-rib [ inet.0 cleanVRF.inet.0 ];
> > >                 import-policy cleanVRF:EXPORT;
> >
> > This isn't exactly correct. You need to put the cleanVRF in
> > interface-routes and close it.
> >
> > Anyhow I'm 90% sure this will just work and pretty sure I've done it.
> > The confusion I had was about the scrubbing route that on the
> > clean side is already a host/32. For this, I can't figure out a cleanVRF
> > solution, but a BGP-LU solution exists even for this problem.

--
++ytti
Re: [j-nsp] L3VPNs and on-prem DDoS scrubbing architecture
On Wed, 3 Apr 2024 at 09:45, Saku Ytti wrote: > Actually I think I'm confused. I think it will just work. Because even > as the EgressPE does IP lookup due to table-label, the IP lookup still > points to egressMAC, instead looping back, because it's doing it in > the CleanVRF. > So I think it just works. > routing-options { > interface-routes { > rib-groups { > cleanVRF { > import-rib [ inet.0 cleanVRF.inet.0 ]; > import-policy cleanVRF:EXPORT; > This isn't exactly correct. You need to put the cleanVRF in interface-routes and close the braces. Anyhow, I'm 90% sure this will just work, and pretty sure I've done it. The confusion I had was about the scrubbing route that on the clean side is already a host/32. For this, I can't figure out a cleanVRF solution, but a BGP-LU solution exists even for this problem. -- ++ytti
Re: [j-nsp] L3VPNs and on-prem DDoS scrubbing architecture
On Wed, 3 Apr 2024 at 09:37, Mark Tinka via juniper-nsp wrote: > At old job, we managed to do this with a virtual-router VRF that carried > traffic between the scrubbing PE and the egress PE via MPLS, to avoid > the IP loop. Actually I think I'm confused. I think it will just work. Because even as the EgressPE does IP lookup due to table-label, the IP lookup still points to egressMAC, instead looping back, because it's doing it in the CleanVRF. So I think it just works. So OP just needs to copy the direct route as-is, not as host/32 into cleanVRF, with something like this: routing-options { interface-routes { rib-groups { cleanVRF { import-rib [ inet.0 cleanVRF.inet.0 ]; import-policy cleanVRF:EXPORT; Now cleanVRF.inet.0 has the connected TableLabel, and as lookup is done in the cleanVRF, without the Scrubber/32 route, it'll be sent to the correct egress CE, despite doing egress IP lookup. -- ++ytti ___ juniper-nsp mailing list juniper-nsp@puck.nether.net https://puck.nether.net/mailman/listinfo/juniper-nsp
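[Editor's sketch: the fragment above is unclosed and nests rib-groups in the wrong place; a complete stanza would look roughly like this. The names cleanVRF and cleanVRF:EXPORT come from the thread; the corrected nesting (rib-groups as a sibling of interface-routes, with the rib-group referenced by name) is my assumption of intent, so verify against your Junos release.]

```
routing-options {
    interface-routes {
        rib-group {
            inet cleanVRF;             /* copy connected routes via this rib-group */
        }
    }
    rib-groups {
        cleanVRF {
            import-rib [ inet.0 cleanVRF.inet.0 ];
            import-policy cleanVRF:EXPORT;   /* optionally filter which directs are copied */
        }
    }
}
```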
Re: [j-nsp] L3VPNs and on-prem DDoS scrubbing architecture
On Tue, 2 Apr 2024 at 18:25, Michael Hare via juniper-nsp wrote: > We're a US research and education ISP and we've been tasked for coming up > with an architecture to allow on premise DDoS scrubbing with an appliance. > As a first pass I've created an cleanL3VPN routing-instance to function as a > clean VRF that uses rib-groups to mirror the relevant parts of inet.0. It > is in production and is working great for customer learned BGP routes. It > falls apart when I try to protect a directly attached destination that has a > mac address in inet.0. I think I understand why and the purpose of this > message is to see if anyone has been in a similar situation and has > thoughts/advice/warnings about alternative designs. > > To explain what I see, I noticed that mac address based nexthops don't seem > to be copied from inet.0 into cleanL3VPN.inet.0. I assume this means that > mac-address based forwarding must be referencing inet.0 [see far below]. > This obviously creates a loop once the best path in inet.0 becomes a BGP /32. > For example when I'm announcing a /32 for 1.2.3.4 out of a locally attached > 1.2.3.0/26, traceroute implies the packet enters inet.0, is sent to 5.6.7.8 > as the nexthop correctly, arrives in cleanL3VPN which decides to forward to > 5.6.7.8 in a loop, even though the BGP /32 isn't part of cleanL3VPN [see > below], cleanL3VPN Is dependent on inet.0 for resolution. Even if I could > copy inet.0 mac addresses into cleanL3VPN, eventually the mac address would > age out of inet.0 because the /32 would no longer be directly connected. 
If > I want to be able to protect locally attached destinations so I think my > design is unworkable, I think my solutions are If I understand you correctly, the problem is not that you can't copy direct into CleanVRF; the problem is that the ScrubberPE that does the clean lookup in CleanVRF has a label stack of [EgressPE TableLabel] instead of [EgressPE EgressCE]. This causes the EgressPE to do an IP lookup, which will then see the Direct/32 advertised by the scrubber, causing the loop. While what you want is an end-to-end MPLS lookup, so that the egressPE MPLS lookup yields the egressMAC. I believe with BGP-LU you could fix this, without actually paying for duplicate RIB/FIB and without opportunistically copying routes to CleanVRF; every prefix would be scrubbable by default. You'd have per-CE labels for the rest, but per-prefix for connected routes. I believe you would then have an [EgressPE EgressMAC_CE] label for connected routes, so each host route would have its own label, allowing the mac rewrite without an additional local IP lookup. I'm not sure if this is the only way; I'm not sure if there would be a way in CleanVRF to force each direct/32 to have a label as well, avoiding the egress IP lookup loops. One doesn't immediately spring to mind, but technically an implementation could certainly allow such a mode. -- ++ytti
Re: [j-nsp] BGP route announcements and Blackholes
On Tue, 19 Mar 2024 at 19:44, Lee Starnes via juniper-nsp wrote: > The blackhole peer does receive the /32 announcement, but the aggregate > route also becomes discarded and thus routes to the other peers stop > working. I couldn't follow this, and the output you shared didn't support it. So it is not clear to me what the actual problem is. Of course if you want a blackhole, you want an internal blackhole too, so internally you are going to add some route to discard, and this is the route you'd leak to the upstream. How this would impact the next-hop type or re-advertisability of the aggregate is unclear to me, unless you're blackholing the next-hop of some route. -- ++ytti
Re: [j-nsp] MX204 OSPF default route injection
On Thu, 7 Mar 2024 at 03:08, Lee Starnes via juniper-nsp wrote: > Any tips or help on the best practice implementation would be greatly > appreciated. While what you want is obviously possible to accomplish, is it something you actually need? I don't personally see any need to ever carry a default route in dynamic routing protocols; with static routing there are obvious use cases. Why not have a floating static default pointing to a dynamic recursive next-hop at the CE? This solves quite a few problems in a rather elegant way. One example: if you generate a static route at the edge, you have no idea about the quality of the route; the edge device may be entirely isolated and just blackholing all traffic. Whereas, perhaps your candidate route is originated only from the backbone devices' anycast loopback, and the edge device is simply passing that host route towards the CEs; this way the CE will recurse its default route to whichever edge device happens to be connected and passing the default route along. There are other examples of problems this addresses, discussed in this and other lists in previous years. -- ++ytti
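[Editor's sketch: the floating static default described above could look something like this on the CE. The anycast loopback 192.0.2.1 and preference value are hypothetical; `resolve` allows the next-hop to be recursively resolved via the host route passed down from the backbone.]

```
routing-options {
    static {
        route 0.0.0.0/0 {
            next-hop 192.0.2.1;   /* backbone anycast loopback, learned as a host route */
            resolve;              /* allow recursive resolution of the next-hop */
            preference 250;       /* floating: loses to any better-preference default */
        }
    }
}
```

If the host route disappears (edge isolated from the backbone), the default becomes unresolvable and is withdrawn, which is exactly the failure behavior the post argues for.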
Re: [j-nsp] [c-nsp] Stange issue on 100 Gbs interconnection Juniper - Cisco
On Mon, 12 Feb 2024 at 09:44, james list wrote: > I'd like to test with LACP slow, then can see if physical interface still > flaps... I don't think that's a good idea; what would we learn? Would we have to wait 30 times longer, so a month to three months, to hit whatever it is, before we have confidence? I would suggest: - turn on debugging, to see the Cisco emitting LACP PDUs and the Juniper receiving them - do a packet capture, if at all reasonable; ideally a tap, but in the absence of a tap, a mirror - turn off distributed LACP handling on Junos - ping on the link, ideally at a 0.2-0.5s interval, to record how ping stops in relation to the first syslog emitted about LACP going down - wait for 4 days -- ++ytti
Re: [j-nsp] [c-nsp] Stange issue on 100 Gbs interconnection Juniper - Cisco
On Sun, 11 Feb 2024 at 17:52, james list wrote: > - why physical interface flaps in DC1 if it is related to lacp ? 16:39:35.813 Juniper reports LACP timeout (so the problem started at 16:39:32; was traffic passing at the 32, 33, 34 second marks?) 16:39:36.xxx Cisco reports interface down, after the problem had already started. Why the Cisco reports the physical interface down, I'm not sure. But clearly the problem was already happening before the interface went down, and the first log entry is the LACP timeout, which occurs 3s after the problem starts. Perhaps the Juniper asserts RFI for some reason? Perhaps the Cisco resets the physical interface once it is removed from the LACP bundle? > - why the same setup in DC2 do not report issues ? If this is an LACP-related software issue, there could be a difference that hasn't been identified. You need to gather more information, like how ping behaves throughout this event, particularly before the syslog entries. And if ping still works up until the syslog, you almost certainly have a software issue with LACP inject at the Cisco, or more likely LACP punt at the Juniper. -- ++ytti
Re: [j-nsp] [c-nsp] Stange issue on 100 Gbs interconnection Juniper - Cisco
On Sun, 11 Feb 2024 at 15:24, james list wrote: > While on Juniper when the issue happens I always see: > > show log messages | last 440 | match LACPD_TIMEOUT > Jan 25 21:32:27.948 2024 MX1 lacpd[31632]: LACPD_TIMEOUT: et-0/1/5: lacp > current while timer expired current Receive State: CURRENT > Feb 9 16:39:35.813 2024 MX1 lacpd[31632]: LACPD_TIMEOUT: et-0/1/5: lacp > current while timer expired current Receive State: CURRENT Ok, so the problem always starts with the Juniper seeing 3 seconds without an LACP PDU, i.e. missing 3 consecutive LACP PDUs. It would be good to ping while the problem is happening, to see if ping stops 3s before the syslog lines, or at the same time as the syslog lines. If ping stops 3s before, it's a link problem from Cisco to Juniper. If ping stops at syslog time (my guess), it's a software problem. There is unfortunately a lot of bug surface here, both on the inject and the punt path. You could be hitting PR1541056 on the Juniper end. You could test for this by removing distributed LACP handling with 'set routing-options ppm no-delegate-processing'. You could also do a packet capture for LACP on both ends, to try to see if LACP was sent by the Cisco and received by the capture, but not by the system. -- ++ytti
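[Editor's sketch: the knob mentioned above plus an on-box capture of LACP PDUs (slow-protocols destination MAC 01:80:c2:00:00:02) might look like this on the Juniper end. The interface name is from the thread; exact `monitor traffic` match syntax may vary by platform and release.]

```
set routing-options ppm no-delegate-processing

monitor traffic interface et-0/1/5 no-resolve matching "ether dst 01:80:c2:00:00:02"
```

With delegation disabled, LACP is processed by the RE instead of the line card, which both works around the class of PR being discussed and makes the PDUs visible to RE-side capture.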
Re: [j-nsp] [c-nsp] Stange issue on 100 Gbs interconnection Juniper - Cisco
Hey James, You shared this off-list, I think it's sufficiently material to share. 2024 Feb 9 16:39:36 NEXUS1 %ETHPORT-5-IF_DOWN_PORT_CHANNEL_MEMBERS_DOWN: Interface port-channel101 is down (No operational members) 2024 Feb 9 16:39:36 NEXUS1 %ETH_PORT_CHANNEL-5-PORT_DOWN: port-channel101: Ethernet1/44 is down Feb 9 16:39:35.813 2024 MX1 lacpd[31632]: LACPD_TIMEOUT: et-0/1/5: lacp current while timer expired current Receive State: CURRENT Feb 9 16:39:35.813 2024 MX1 lacpd[31632]: LACP_INTF_DOWN: ae49: Interface marked down due to lacp timeout on member et-0/1/5 We can't know the order of events here, due to no subsecond precision enabled on Cisco end. But if failure would start from interface down, it would take 3seconds for Juniper to realise LACP failure. However we can see that it happens in less than 1s, so we can determine the interface was not down first, the first problem was Juniper not receiving 3 consecutive LACP PDUs, 1s apart, prior to noticing any type of interface state related problems. Is this always the order of events? Does it always happen with Juniper noticing problems receiving LACP PDU first? 
On Sun, 11 Feb 2024 at 14:55, james list via juniper-nsp wrote: > > Hi > > 1) cable has been replaced with a brand new one, they said that to check an > MPO 100 Gbs cable is not that easy > > 3) no errors reported on both side > > 2) here the output of cisco and juniper > > NEXUS1# sh interface eth1/44 transceiver details > Ethernet1/44 > transceiver is present > type is QSFP-100G-SR4 > name is CISCO-INNOLIGHT > part number is TR-FC85S-NC3 > revision is 2C > serial number is INL27050TVT > nominal bitrate is 25500 MBit/sec > Link length supported for 50/125um OM3 fiber is 70 m > cisco id is 17 > cisco extended id number is 220 > cisco part number is 10-3142-03 > cisco product id is QSFP-100G-SR4-S > cisco version id is V03 > > SFP Detail Diagnostics Information (internal calibration), thresholds identical on all lanes: > Temperature 30.51 C (alarm 75.00 / -5.00 C, warning 70.00 / 0.00 C) > Voltage 3.28 V (alarm 3.63 / 2.97 V, warning 3.46 / 3.13 V) > Current 6.40 mA (alarm 12.45 / 3.25 mA, warning 12.45 / 3.25 mA) > Tx Power thresholds: alarm 5.39 / -12.44 dBm, warning 2.39 / -8.41 dBm > Rx Power thresholds: alarm 5.39 / -14.31 dBm, warning 2.39 / -10.31 dBm > Lane 1: Tx Power 0.98 dBm, Rx Power -1.60 dBm, Transmit Fault Count = 0 > Lane 2: Tx Power 0.62 dBm, Rx Power -1.18 dBm, Transmit Fault Count = 0 > Lane 3: Tx Power 0.87 dBm, Rx Power 0.01 dBm, Transmit Fault Count = 0 > Lane Number:4 Network Lane > SFP Detail Diagnostics I
Re: [j-nsp] [c-nsp] Stange issue on 100 Gbs interconnection Juniper - Cisco
I want to clarify, I meant this in the context of the original question. That is, if you have a BGP specific problem, and no FCS errors, then you can't have link problems. But in this case, the problem is not BGP specific, in fact it has nothing to do with BGP, since the problem begins on observing link flap. On Sun, 11 Feb 2024 at 14:14, Saku Ytti wrote: > > I don't think any of these matter. You'd see FCS failure on any > link-related issue causing the BGP packet to drop. > > If you're not seeing FCS failures, you can ignore all link related > problems in this case. > > > On Sun, 11 Feb 2024 at 14:13, Havard Eidnes via juniper-nsp > wrote: > > > > > DC technicians states cable are the same in both DCs and > > > direct, no patch panel > > > > Things I would look at: > > > > * Has all the connectors been verified clean via microscope? > > > > * Optical levels relative to threshold values (may relate to the > >first). > > > > * Any end seeing any input errors? (May relate to the above > >two.) On the Juniper you can see some of this via PCS > >("Physical Coding Sublayer") unexpected events independently > >of whether you have payload traffic, not sure you can do the > >same on the Nexus boxes. > > > > Regards, > > > > - Håvard > > ___ > > juniper-nsp mailing list juniper-nsp@puck.nether.net > > https://puck.nether.net/mailman/listinfo/juniper-nsp > > > > -- > ++ytti -- ++ytti ___ juniper-nsp mailing list juniper-nsp@puck.nether.net https://puck.nether.net/mailman/listinfo/juniper-nsp
Re: [j-nsp] [c-nsp] Stange issue on 100 Gbs interconnection Juniper - Cisco
I don't think any of these matter. You'd see FCS failure on any link-related issue causing the BGP packet to drop. If you're not seeing FCS failures, you can ignore all link related problems in this case. On Sun, 11 Feb 2024 at 14:13, Havard Eidnes via juniper-nsp wrote: > > > DC technicians states cable are the same in both DCs and > > direct, no patch panel > > Things I would look at: > > * Has all the connectors been verified clean via microscope? > > * Optical levels relative to threshold values (may relate to the >first). > > * Any end seeing any input errors? (May relate to the above >two.) On the Juniper you can see some of this via PCS >("Physical Coding Sublayer") unexpected events independently >of whether you have payload traffic, not sure you can do the >same on the Nexus boxes. > > Regards, > > - Håvard > ___ > juniper-nsp mailing list juniper-nsp@puck.nether.net > https://puck.nether.net/mailman/listinfo/juniper-nsp -- ++ytti ___ juniper-nsp mailing list juniper-nsp@puck.nether.net https://puck.nether.net/mailman/listinfo/juniper-nsp
Re: [j-nsp] [c-nsp] Stange issue on 100 Gbs interconnection Juniper - Cisco
On Sun, 11 Feb 2024 at 13:51, james list via juniper-nsp wrote: > One think I've omit to say is that BGP is over a LACP with currently just > one interface 100 Gbs. > > I see that the issue is triggered on Cisco when eth interface seems to go > in Initializing state: Ok, so we can forget BGP entirely. And focus on why the LACP is going down. Is the LACP single port, eth1/44? When the LACP fails, does Juniper end emit any syslog? Does Juniper see the interface facing eth1/44 flapping? -- ++ytti ___ juniper-nsp mailing list juniper-nsp@puck.nether.net https://puck.nether.net/mailman/listinfo/juniper-nsp
Re: [j-nsp] Stange issue on 100 Gbs interconnection Juniper - Cisco
Open JTAC and CTAC cases. The amount of information provided is wildly insufficient. 'BGP flaps' — what does that mean? Is it always the same direction? If so, which direction thinks it's not seeing keepalives? Do you also observe loss in 'ping' between the links during the period? Purely stabbing in the dark, I'd say you always observe it in a single direction, because in that direction you are reliably losing every nth keepalive, and statistically it takes 1-3 days to lose 3 in a row, with the probability you're seeing. Now why exactly this is — is one end not sending to the wire, or is one end not receiving from the wire? Again stabbing in the dark, it is more likely that the problem is in the punt path rather than the inject path, so I would focus my investigation on the party tearing down the session due to lack of keepalives, on the thesis that this device has a problem in its punt path and is, at some reliable probability, dropping BGP packets from the wire. On Sun, 11 Feb 2024 at 12:09, james list via juniper-nsp wrote: > > Dear experts > we have a couple of BGP peers over a 100 Gbs interconnection between > Juniper (MX10003) and Cisco (Nexus N9K-C9364C) in two different datacenters > like this: > > DC1 > MX1 -- bgp -- NEXUS1 > MX2 -- bgp -- NEXUS2 > > DC2 > MX3 -- bgp -- NEXUS3 > MX4 -- bgp -- NEXUS4 > > The issue we see is that sporadically (ie every 1 to 3 days) we notice BGP > flaps only in DC1 on both interconnections (not at the same time), there is > still no traffic since once noticed the flaps we have blocked deploy on > production. > > We've already changed SPF (we moved the ones from DC2 to DC1 and viceversa) > and cables on both the interconnetion at DC1 without any solution. > > SFP we use in both DCs: > > Juniper - QSFP-100G-SR4-T2 > Cisco - QSFP-100G-SR4 > > over MPO cable OM4. > > Distance is DC1 70 mt and DC2 80 mt, hence is less where we see the issue. > > Any idea or suggestion what to check or to do ? 
> > Thanks in advance > Cheers > James > ___ > juniper-nsp mailing list juniper-nsp@puck.nether.net > https://puck.nether.net/mailman/listinfo/juniper-nsp -- ++ytti ___ juniper-nsp mailing list juniper-nsp@puck.nether.net https://puck.nether.net/mailman/listinfo/juniper-nsp
Re: [j-nsp] Hardware configuration for cRPD as RR
On Fri, 9 Feb 2024 at 17:50, Tom Beecher wrote: > Completely fair, yes. My comments were mostly aimed at a vMX/cRPD comparison. > I probably wasn't clear about that. Completely agree that it doesn't make > much sense to move from an existing vRR to cRPD just because. For a > greenfield thing I'd certainly lean cRPD over VRR at least in planning. Newer > cRPD has definitely come a long way relative to older. ( Although I haven't > had reason or cycles to really ride it hard and see where I can break it > yet. :) ) Agreed on green field straight to cRPD today, and fallback to vRR if needed. Just because it is clear that vendor focus is there and wants to see you there. -- ++ytti ___ juniper-nsp mailing list juniper-nsp@puck.nether.net https://puck.nether.net/mailman/listinfo/juniper-nsp
Re: [j-nsp] Hardware configuration for cRPD as RR
On Thu, 8 Feb 2024 at 17:11, Tom Beecher via juniper-nsp wrote: > For any use cases that you want protocol interaction, but not substantive > traffic forwarding capabilities , cRPD is by far the better option. No one is saying that cRPD isn't the future, just that there are a lot of existing deployments with vRR, which are run with some success, and on which the entire stability of the network depends. Whereas cRPD is a newer entrant, and early on, back when I tested it, it was very feature-incomplete in comparison. So for those who are already running vRR and are happy with it, changing to cRPD just for the sake of it is simply a bad risk. Many of us don't care about DRAM or vCPU, because you only need a small number of RRs, and DRAM/vCPU grows on trees. But we live in constant fear of the entire RR setup blowing up, so the motivation for change needs to be solid and ideally backed by examples of success in a similar role in your circle of people. -- ++ytti
Re: [j-nsp] MX204 and IPv6 BGP announcements
On Thu, 8 Feb 2024 at 16:07, Mark Tinka via juniper-nsp wrote: > So internally, if it attracts any traffic for non-specific destinations, > does Junos send it /dev/null in hardware? I'd guess so... In absence of more specifics, junos by default doesn't discard but reject. There is essentially implied 0/0 static route to reject adjacency. This can be changed to be discard, or you can just nail down default discard. -- ++ytti ___ juniper-nsp mailing list juniper-nsp@puck.nether.net https://puck.nether.net/mailman/listinfo/juniper-nsp
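[Editor's sketch: nailing down the default to discard instead of the implicit reject, as described above, is a one-liner per family. A minimal example; whether you want reject (sends ICMP unreachables) or discard (silent) is a local policy choice.]

```
routing-options {
    static {
        route 0.0.0.0/0 discard;
    }
    rib inet6.0 {
        static {
            route ::/0 discard;
        }
    }
}
```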
Re: [j-nsp] Hardware configuration for cRPD as RR
On Thu, 8 Feb 2024 at 10:16, Mark Tinka wrote: > Is the MX150 still a current product? My understanding is it's an x86 > platform running vMX. No longer orderable. -- ++ytti ___ juniper-nsp mailing list juniper-nsp@puck.nether.net https://puck.nether.net/mailman/listinfo/juniper-nsp
Re: [j-nsp] Hardware configuration for cRPD as RR
On Thu, 8 Feb 2024 at 09:51, Roger Wiklund via juniper-nsp wrote: > I'm curious, when moving from vRR to cRPD, how do you plan to manage/setup > the infrastructure that cRPD runs on? Same concerns, I would just push it back and be a late adopter. Rock existing vRR while supported, not pre-empt into cRPD because vendor says that's the future. Let someone else work with the vendor to ensure feature parity and indeed perhaps get some appliance from the vendor. With HPE, I feel like there is a lot more incentive to sell integrated appliances to you than before. -- ++ytti ___ juniper-nsp mailing list juniper-nsp@puck.nether.net https://puck.nether.net/mailman/listinfo/juniper-nsp
Re: [j-nsp] Hardware configuration for cRPD as RR
On Tue, 6 Feb 2024 at 18:35, Mark Tinka wrote: > IME, when we got all available paths, ORR was irrelevant. > > But yes, at the cost of some control plane resources. Not just opinion, fact. If you see everything, ORR does nothing but add cost. You only need AddPath and ORR when seeing everything is too expensive, but you still need good choices. But even if you have the resources to see it all, you may not actually want a lot of useless signalling and overhead, as it adds convergence time and risks coaxing rare bugs to the surface. In the case where I deployed it, having it all was not realistically possible, in that having it all would mean the network upgrade cycle is determined by when enough peers are added, RIB scale then triggering a full upgrade cycle despite the already-paid-for ports not having been sold. You shouldn't need to upgrade your boxes because your RIB/FIB doesn't scale; you should only need to upgrade your boxes if you don't have holes to stick paying fiber into. -- ++ytti
[j-nsp] Thanks for all the fish
What do we think of HPE acquiring JNPR? I guess it was a given that something's gotta give; JNPR has lost to the dollar as an investment for more than 2 decades, which is not sustainable in the way we model our economy. Out of all possible outcomes: - JNPR suddenly starts to grow (how?) - JNPR defaults - JNPR gets acquired It's not the worst outcome, and of the possible acquirers, HPE isn't the worst option, nor the best. I guess the best option would have been several large telcos buying it through a co-owned sister company, who then are less interested in profits and more interested in having a device that works for them. Worst would probably have been Cisco, Nokia, or Huawei. I think the main concern is that the SP business is a kinda shitty business: long sales times, low sales volumes, high requirements. But that's also the side of JNPR that has the USP. What is the future of the NPU (Trio) and the pipeline (Paradise/Triton), and why would I, as an HPE exec, keep them alive? I need JNPR to put QFX in my DC RFPs, I don't really care about SP markets, and I can realise some savings by axing chip design and support. I think Trio is the best NPU on the market, and I think we are at real risk of losing it, with no mechanism that would guarantee new players surfacing to replace it. I do wish that JNPR had been more serious about how unsustainable it is to lose to the dollar, and had tried harder to capture markets. I always suggested: why not try Trio-PCI on Newegg? The long tail is long; maybe if you could buy it for 2-3k, there would be a new market of Linux PCI users who want wire-rate programmable features for multiple ports? Maybe ESXi server integration for various pre-VPC protection features at wire rate? I think there might be a lot of potential in NPU-PCI, perhaps even FAB-PCI, to have more ports than a single NPU-PCI. -- ++ytti
Re: [j-nsp] Hardware configuration for cRPD as RR
I tried to advocate for both, sorry if I was unclear. ORR for good options, add-path for redundancy and/or ECMPability. On Fri, 8 Dec 2023 at 19:13, Thomas Scott wrote: > > Why not both add-path + ORR? > -- > > Thomas Scott > Sr. Network Engineer > +1-480-241-7422 > tsc...@digitalocean.com > > > On Fri, Dec 8, 2023 at 11:57 AM Saku Ytti via juniper-nsp > wrote: >> >> On Fri, 8 Dec 2023 at 18:42, Vincent Bernat via juniper-nsp >> wrote: >> >> > On 2023-12-07 15:21, Michael Hare via juniper-nsp wrote: >> > > I recognize Saku's recommendation of rib sharding is a practical one at >> > > 20M routes, I'm curious if anyone is willing to admit to using it in >> > > production and on what version of JunOS. I admit to have not played >> > > with this in the lab yet, we are much smaller [3.5M RIB] worst case at >> > > this point. >> > >> > About the scale, I said routes, but they are paths. We plan to use add >> > path to ensure optimal routing (ORR could be another option, but it is >> > less common). >> >> Given a sufficient count of path options, they're not really >> alternatives, but you need both. Like you can't do add-path , as >> the clients won't scale. And you probably don't want only ORR, because >> of the convergence cost of clients not having a backup option or the >> lack of ECMP opportunity. >> >> -- >> ++ytti >> ___ >> juniper-nsp mailing list juniper-nsp@puck.nether.net >> https://puck.nether.net/mailman/listinfo/juniper-nsp -- ++ytti ___ juniper-nsp mailing list juniper-nsp@puck.nether.net https://puck.nether.net/mailman/listinfo/juniper-nsp
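[Editor's sketch: "both" — ORR for good per-region path selection, add-path for a backup/ECMP option — might be configured on the RR roughly as below. The group name, path-count, and the IGP vantage-point address are all hypothetical; verify the statements against your Junos release.]

```
protocols {
    bgp {
        group RR-CLIENTS {
            type internal;
            cluster 10.0.0.1;
            optimal-route-reflection {
                igp-primary 10.1.1.1;   /* compute best path from this vantage point */
            }
            family inet {
                unicast {
                    add-path {
                        send {
                            path-count 2;   /* backup/ECMP without flooding clients */
                        }
                    }
                }
            }
        }
    }
}
```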
Re: [j-nsp] Hardware configuration for cRPD as RR
On Fri, 8 Dec 2023 at 18:42, Vincent Bernat via juniper-nsp wrote: > On 2023-12-07 15:21, Michael Hare via juniper-nsp wrote: > > I recognize Saku's recommendation of rib sharding is a practical one at 20M > > routes, I'm curious if anyone is willing to admit to using it in production > > and on what version of JunOS. I admit to have not played with this in the > > lab yet, we are much smaller [3.5M RIB] worst case at this point. > > About the scale, I said routes, but they are paths. We plan to use add > path to ensure optimal routing (ORR could be another option, but it is > less common). Given a sufficient count of path options, they're not really alternatives, but you need both. Like you can't do add-path , as the clients won't scale. And you probably don't want only ORR, because of the convergence cost of clients not having a backup option or the lack of ECMP opportunity. -- ++ytti ___ juniper-nsp mailing list juniper-nsp@puck.nether.net https://puck.nether.net/mailman/listinfo/juniper-nsp
Re: [j-nsp] Hardware configuration for cRPD as RR
On Thu, 7 Dec 2023 at 16:22, Michael Hare via juniper-nsp wrote: > I recognize Saku's recommendation of rib sharding is a practical one at 20M > routes, I'm curious if anyone is willing to admit to using it in production > and on what version of JunOS. I admit to have not played with this in the > lab yet, we are much smaller [3.5M RIB] worst case at this point. 2914 uses it, not out of desire (too new, too rare), but out of necessity at scale 2914 needs. Surprisingly mature/robust for what it is, and how rare routing suites are to support any type of multithreading. Of course the design is a relatively conservative and clever compromise between building a truly multithreaded routing suite and delivering something practical on a legacy codebase. It wouldn't help in every RIB, but probably helps in every practical RIB. If you have a low amount of duplicate RIB entries it might not be very useful, as final collation of unique entries will be more or less single threaded anyhow. But I believe anyone having a truly large RIB, like 20M, will have massive duplication and will see significant benefit. -- ++ytti ___ juniper-nsp mailing list juniper-nsp@puck.nether.net https://puck.nether.net/mailman/listinfo/juniper-nsp
Re: [j-nsp] Hardware configuration for cRPD as RR
From an RPD, not cRPD, perspective: - 64GB is certainly fine; you might be able to do with 32GB - Unless the RRs are physically next to the clients, you want to bump the default 16kB TCP window to the maximum 64kB window, and probably ask your account team for window scaling support (unsure if this is true for cRPD, or if cRPD lets the underlying kernel do this right, but you need to do the same on the client end anyhow) - You absolutely need sharding to put work on more than 1 core. Sharding goes up to 31, but very likely 31 is too much, and the overhead of sharding will make it slower than running lower counts like 4-8. Your core count likely shouldn't be higher than shards+1. The sharding count and DRAM count are not specifically answerable, as they depend on the contents of the RIB. Do a binary search for both and measure convergence time to find a good-enough number; I think 64/32GB and 4-8 cores are likely good picks. On Wed, 6 Dec 2023 at 22:30, Thomas Scott via juniper-nsp wrote: > > Also very curious in this regard. > > Best Regards, > -Thomas Scott > > > On Wed, Dec 6, 2023 at 12:58 PM Vincent Bernat via juniper-nsp < > juniper-nsp@puck.nether.net> wrote: > > Hey! > > > > cRPD documentation is quite terse about resource requirements: > > > > https://www.juniper.net/documentation/us/en/software/crpd/crpd-deployment/topics/concept/crpd-hardware-requirements.html > > > > When used as a route reflector with about 20 million routes, what kind > > of hardware should we use? Documentation says about 64 GB of memory, but > > for everything else? Notably, should we have many cores but lower boost > > frequency, or not too many cores but higher boost frequency? > > > > There is a Day One book about cRPD, but they show a very outdated > > processor (Sandy Lake, 10 years old). > > > > Is anyone using cRPD as RR with a similar scale and can share the > > hardware configuration they use? Did you also optimize the underlying OS > > in some way or just use a stock configuration? > > > > Thanks. 
> > ___ > > juniper-nsp mailing list juniper-nsp@puck.nether.net > > https://puck.nether.net/mailman/listinfo/juniper-nsp > > > ___ > juniper-nsp mailing list juniper-nsp@puck.nether.net > https://puck.nether.net/mailman/listinfo/juniper-nsp -- ++ytti ___ juniper-nsp mailing list juniper-nsp@puck.nether.net https://puck.nether.net/mailman/listinfo/juniper-nsp
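For reference, the sharding and threading knobs discussed above look roughly like this in recent Junos/cRPD. This is a sketch only: exact stanza names and supported ranges vary by release, and the counts shown are illustrative, not recommendations.

```
set system processes routing bgp rib-sharding number-of-shards 4
set system processes routing bgp update-threading number-of-threads 4
```

Per the advice above, do a binary search on the shard count while measuring convergence, rather than assuming more shards is better.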
Re: [j-nsp] backup routing engine authente from in-band interface
On Thu, 9 Nov 2023 at 10:38, Chen Jiang via juniper-nsp wrote:
> Just want to confirm if Juniper backup routing engine could authenticate
> users from in-band interface like ge-0/0/0 to the AAA server?
>
> If not, do we have a solution? The scenario is MX960 with dual RE and no
> OOB network. But need to authenticate users login backup RE from AAA.

No solution. Well, sort of a hacky solution, if you route the AAA server statically over fxp0/em0. But generally speaking, hard no: only local authentication on the backup RE.

And "luckily" they've fixed this awkward mismatch on EVO, by having no remote authentication on either console at all. Another thing that might surprise people is that the lo0 filter no longer applies to the em0/fxp0 ports on EVO.

Ideally we'd all be asking vendors to implement true lights-out ethernet ports with dedicated control planes, like Cisco CMP. Then we could get rid of the problematic RS232 and the useless in-band MGMT ports (em0/fxp0 are actively dangerous).

-- ++ytti
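The hacky workaround mentioned (static reachability to the AAA server via the management port, which the backup RE can use) could look roughly like this. The addresses here are purely illustrative, and whether backup-router in an re1 group is the right vehicle depends on your setup; treat this as a sketch, not a recipe.

```
set groups re1 system backup-router 192.0.2.1 destination 198.51.100.10/32
set apply-groups re1
```

Here 192.0.2.1 stands in for a gateway reachable off fxp0/em0 on the backup RE, and 198.51.100.10 for the AAA server.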
Re: [j-nsp] MX304 - Edge Router
On Thu, 26 Oct 2023 at 16:40, Mark Tinka via juniper-nsp wrote:
> I'd suggest staying very close to our SE's for the desired outcome we
> want for this development. As we have seen before, Juniper appear
> reasonably open to operator feedback, but we would need to give it to
> them to begin with.

I urge everyone to give them the same message as I've given: any type of license, even a timed license, must not cause an outage when it expires. Enforcement would be 'call home' via an http(s) proxy, which reports the license-use data to Juniper sales, making it a commercial problem between Juniper and you. A proxy, so that you don't need Internet access on the device. Potentially you could ask for an encryption-less mode, if you want to log on the proxy what is actually being sent to the vendor; I don't give a flying (or any other method of locomotion) fuck about leaking that information.

I believe this is a very reasonable give/take compromise which is marketable, but if we try to start punching holes through it over esoteric concerns, we'll get boxes which die periodically because someone forgot to re-up. That is a real future that may happen, unless we demand it must not.

-- ++ytti
Re: [j-nsp] MX304 - Edge Router
On Thu, 26 Oct 2023 at 07:45, Mark Tinka via juniper-nsp wrote:
> While there are some women who enjoy engineering, and some men who enjoy
> nursing, most women don't enjoy engineering, and most men don't enjoy
> nursing. I think we would move much farther ahead if we accepted this.
>
> If you look at the data, on average, 70% of new enrollments at
> university are women, and 60% of all graduands are women. And yet, 90%
> of all STEM students are men, while 80% of all psychology students are
> women. Perhaps there is a clue in there :-)...

Even if you believe/think this, it is not in your best interest to communicate anything like this; there is nothing you can win, and significant downside potential.

I believe the question is not what the data says; the question is why the data says that. And the thesis/belief is that the data should not say that, that there is no fundamental reason why it would. The question is whether the culture reinforces this from day0, causing people to believe it is somehow inherent/natural.

From a scientific POV, we currently don't have any real reason to believe there are unplastic differences in the brain from birth which cause this. There might be, but science doesn't know that. Scientifically we should today expect a very even distribution, unless culturally biased.

But of course inequality and inequitability are everywhere, and that is not hyperbole: you can't compare anything about how we choose who does what and come up with anything that resembles a fair distribution. Zip code has a lot of predictive power over where you'll end up in life, and that is hardly your fault or merit. Top-level managers are not just disproportionately men, they are disproportionately men with +1.5SD height, and there is no scientific reason to believe zip code or height suggests stronger ability. It is just a really unfair world to live in, but luckily I am on the beneficiary side of the unfairness, which I am strong enough to accept.
I have a curious anecdote about discriminatory outcomes without any active discrimination. I think it's easier to discuss, as it doesn't really include any differences between the groups of people.

In Finland a minority natively speaks Swedish, the majority Finnish. After 1000 years, the minority continues to statistically have better education, live longer, have more savings and earn higher salaries. The only rationale I've come up with which could explain it is that the Swedish-speaking minority choose other Swedish speakers as their peers, so they feel a lower sense of accomplishment when performing at the Finnish-speaker mean level, which pushes them a little bit further to reach the same satisfaction the Finnish-speaking majority would feel at a lower level of accomplishment. This perpetuates indefinitely, despite having 'fixed' all active discriminatory biases since forever.

That is, if you ever create, through any mechanism at all, some bias between groups, this bias will never completely go away.

-- ++ytti
Re: [j-nsp] MX304 - Edge Router
On Wed, 25 Oct 2023 at 15:26, Aaron1 via juniper-nsp wrote:
> Years ago I had to get a license to make my 10g interfaces work on my MX104

I think we need to be careful in what we are saying. We can't reject licenses outright; that's not a fair ask and it won't happen. But we can reject licenses that expire in operation and cause an outage. That, I think, is a very reasonable ask.

I know that IOS XE, for example, will do this: you run out of license and your box breaks. I swapped out CSR1k for ASR1k because I knew the organisation would eventually fail to fix the license ahead of expiry.

I'm happy if the device calls home via an https proxy and reports my license use, and the sales droid tells me I'm not compliant with the terms. Making it a commercial problem is fine; making it an acute technical problem is not.

In your specific case, the ports never worked, you had to procure a license, and the license never dies. So from my POV, this is fine. And being absolutist here will not help, as then you can't even achieve a reasonable compromise.

-- ++ytti
Re: [j-nsp] MX304 - Edge Router
On Tue, 24 Oct 2023 at 22:21, Aaron Gould via juniper-nsp wrote:
> My MX304 trial license expired last night, after rebooting the MX304,
> various protocols no longer work. This seems more than just
> honor-based... ospf, ldp, etc, no longer function. This is new to me;
> that Juniper is making protocols and technologies tied to license. I
> need to understand more about this, as I'm considering buying MX304's.

Juniper had assured me multiple times that they have strategically decided to NEVER do this. That it's an actual decision considered at the highest level: they will not downgrade devices in operation. I guess 'reboot' is not in-operation?

The notion that operators are able to keep licenses up to date and valid is naive; we can't keep SSL certificates valid, and we've had decades of time to learn, so it won't happen. You will learn about the problem when shit breaks.

The right solution would be a phone-home, and a vendor sales rep calling you: 'hey, you have expired licenses, let's solve this'. Not breaking the boxes. Or: 'your phone-home hasn't worked, you need to fix it before we can re-up your support contract'.

-- ++ytti
Re: [j-nsp] Q. Is anyone deploying TCP Authentication Option (TCP-AO) on their BGP peering Sessions?
On Wed, 27 Sept 2023 at 03:50, Barry Greene via juniper-nsp wrote:
> Q. Is anyone deploying TCP Authentication Option (TCP-AO) on their BGP
> peering Sessions?
>
> I’m not touching routers right now. I’m wondering if anyone has deployed,
> your experiences, and thoughts?

For the longest time (close to a decade) no one supported it at all, not even Juniper, because the Juniper implementation was pre-RFC and incompatible with the RFC. To my understanding there is support today in Junos, IOS-XE, IOS-XR, SROS, EOS and VRP. I have no operational experience to share.

-- ++ytti
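For the record, the rough shape of post-RFC TCP-AO configuration on Junos is a key chain referenced from BGP. Treat this as a hedged sketch: the group name and secret are placeholders, and the exact key attributes and supported algorithm keywords vary by release, so verify against the documentation for yours.

```
set security authentication-key-chains key-chain BGP-AO key 0 secret "<shared-secret>"
set security authentication-key-chains key-chain BGP-AO key 0 start-time 2024-01-01.00:00:00
set protocols bgp group EBGP-PEERS authentication-algorithm ao
set protocols bgp group EBGP-PEERS authentication-key-chain BGP-AO
```

Both ends need matching keys and rollover times, which is most of the operational pain versus plain MD5.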
Re: [j-nsp] Junos 21+ Killing Finger Muscle Memory...
On Sun, 16 Jul 2023 at 19:47, Tim Franklin via juniper-nsp wrote:
> You missed the fun part where you have to explain *again* every few
> months to the CISO and their minions why you can't adhere to the
> written-by/for-Windows-admins "Patching Policy" that says everything in
> the company is upgraded to "the latest release" within 14 days, no
> software version is ever "more than three months old", and similar
> messages of joy ;)

What is the explanation? Is it that NOSes are closed-source software with proprietary or difficult-to-integrate hardware? And that revenue comes from support contracts, which creates a moral hazard; this does not suggest intent in the outcome, but it does suggest the organic outcome is biased towards bad software.

-- ++ytti
Re: [j-nsp] MX304 Port Layout
On Wed, 5 Jul 2023 at 04:45, Mark Tinka wrote:
> This is one of the reasons I prefer to use Ethernet switches to
> interconnect devices in large data centre deployments.
>
> Connecting stuff directly into the core routers or directly together
> eats up a bunch of ports, without necessarily using all the available
> capacity.
>
> But to be fair, at the scale AWS run, I'm not exactly sure how I'd do
> things.

I'm sure it's perfectly reasonable, with some upsides and some downsides compared to hiding the overhead ports inside the chassis fabric instead of exposing them on the front plate.

-- ++ytti
Re: [j-nsp] MX304 Port Layout
On Tue, 4 Jul 2023 at 08:34, Mark Tinka wrote:
> Yes, I watched this NANOG session and was also quite surprised when they
> mentioned that they only plan for 25% usage of the deployed capacity.
> Are they giving themselves room to peak before they move to another chip
> (considering that they are likely in a never-ending installation/upgrade
> cycle), or trying to maintain line-rate across a vast number of packet
> sizes? Or both?

You must have misunderstood. When they fully scale the current design, it offers 100T of capacity, but they've bought 400T of ports. 3/4 of the ports are overhead needed to build the design, to connect the pizzaboxes together. All ports are used, but only 1/4 are revenue.

-- ++ytti
Re: [j-nsp] MX304 Port Layout
On Sun, 2 Jul 2023 at 17:15, Mark Tinka wrote:
> Technically, do we not think that an oversubscribed Juniper box with a
> single Trio 6 chip with no fabric is feasible? And is it not being built
> because Juniper don't want to cannibalize their other distributed
> compact boxes?
>
> The MX204, for example, is a single Trio 3 chip that is oversubscribed
> by an extra 240Gbps. So we know they can do it. The issue with the MX204
> is that most customers will run out of ports before they run out of
> bandwidth.

Not disagreeing here, but how do we define oversubscribed? Are all boxes oversubscribed which can't do a) 100% at max packet size, b) 100% at min packet size, and c) 100% of packets to the delay buffer? I think this would be quite a reasonable definition, but as far as I know no current device of non-modest scale satisfies all three; almost all of them satisfy only a).

Let's consider first-gen Trio serdes:
1) 2/4 go to the fabric (btree replication)
2) 1/4 go to the delay buffer
3) 1/4 go to the WAN ports (and actually about another 0.2 go to the lookup engine)

So you're selling less than 1/4th of the serdes you ship; more than 3/4 are 'overhead'. Compared to, say, Silicon1, which is partially buffered: they're selling almost 1/2 of the serdes they ship.

You could in theory put ports on all of these serdes in BPS terms, but not in PPS terms, at least not with off-chip memory. And in a pizza box, you could sell the fabric serdes as ports, as there is no fabric. So a given NPU always has ~2x the bps in pizza box format (but usually no more pps). In the MX80/MX104 Juniper did just this: they sell 80G of WAN ports, while in linecard mode the same NPU is a 40G WAN-port device. I don't consider it oversubscribed, even though the minimum packet size for line rate went up, because the lookup capacity didn't increase.

Curiously, AMZN told NANOG their ratio: when the design is fully scaled to 100T, it is 1/4, 400T of bought ports for 100T of useful ports.
Unclear how long 100T was going to scale, but obviously they wouldn't launch an architecture which needs to be redone next year, so when they decided on the 100T cap for the scale, they didn't have a 100T need yet. This design was with 112Gx128 chips, and the boxes were single chip, so all serdes connect ports, no fabrics, i.e. a true pizzabox.

I found this very interesting, because the 100T design was, I think, 3 racks? And last year 50T asics shipped; next year we'd likely get 100T asics (224Gx512? or 112Gx1024?). So even hyperscalers are growing slower than silicon, and can soon basically put their dc-in-a-chip, greatly reducing cost (both CAPEX and OPEX), as there is no need to waste 3/4ths of the investment on overhead.

The scale also surprised me, even though perhaps it should not have: they quoted +1M network devices, and considering they quote +20M Nitro systems shipped, that's like <20 revenue-generating compute nodes per network device. Depending on the refresh cycle, this means Amazon is buying 15-30k network devices per month, which I expect is significantly more than Cisco+Juniper+Nokia combined ship to SP infra, so no wonder SPs get little love.

-- ++ytti
Re: [j-nsp] MX304 Port Layout
On Sun, 2 Jul 2023 at 15:53, Mark Tinka via juniper-nsp wrote:
> Well, by your definition, the ASR9903, for example, is a distributed
> platform, which has a fabric ASIC via the RP, with 4x NPU's on the fixed
> line card, 2x NPU's on the 800Gbps PEC and 4x NPU's on the 2Tbps PEC.

Right, as is the MX304. I don't think this is 'my definition'; everything was centralised originally, until the Cisco 7500 came out, which had distributed forwarding capabilities.

Now does centralisation truly mean a BOM benefit to vendors? Probably not, but it may allow them to address a lower-margin market which has lower per-port performance needs, without cannibalising the larger-margin market.

-- ++ytti
Re: [j-nsp] MX304 Port Layout
On Sun, 2 Jul 2023 at 12:11, Mark Tinka wrote: > Well, for data centre aggregation, especially for 100Gbps transit ports > to customers, centralized routers make sense (MX304, MX10003, ASR9903, > e.t.c.). But those boxes don't make sense as Metro-E routers... they can > aggregate Metro-E routers, but can't be Metro-E routers due to their cost. In this context, these are all distributed platforms, they have multiple NPUs and fabric. Centralised has a single forwarding chip, and significantly more ports than bandwidth. -- ++ytti ___ juniper-nsp mailing list juniper-nsp@puck.nether.net https://puck.nether.net/mailman/listinfo/juniper-nsp
Re: [j-nsp] MX304 Port Layout
On Sun, 2 Jul 2023 at 11:38, Mark Tinka wrote:
> So all the above sounds to me like scenarios where Metro-E rings are
> built on 802.1Q/Q-in-Q/REP/STP/e.t.c., rather than IP/MPLS.

Yes. Satellite is basically VLAN aggregation, but a little bit less broken. Both are much inferior to MPLS. But usually that's not the comparison, due to real or perceived cost reasons.

So in the absence of a vendor selling you the front plate you need, the option space often considered is satellite or VLAN aggregation, instead of connecting some smaller MPLS edge boxes to bigger MPLS aggregation boxes, which would be, in my opinion, obviously better. But as discussed, centralised chassis boxes are appearing as a new entry in the option space.

-- ++ytti
Re: [j-nsp] MX304 Port Layout
On Tue, 27 Jun 2023 at 19:47, Tarko Tikan via juniper-nsp wrote:
> Single NPU doesn't mean non-redundant - those devices run two (or 4 in
> ACX case) BCM NPUs and switch "linecards" over to backup NPU when
> required. All without true fabric and distributed NPUs to keep the cost
> down.

This of course makes it more redundant than a distributed box, because distributed boxes don't have NPU redundancy. Somewhat analogous to how an RR makes your network more redundant than a full mesh: in a full mesh every iBGP flap is an out-of-order event, whereas with an RR a single iBGP flap has no impact. Of course the parallel continues to the scope of outage: in a full mesh, losing a single iBGP session isn't a big outage; with an RR it's binary, either nothing is broken or everything is.

-- ++ytti
Re: [j-nsp] MX304 Port Layout
On Tue, 27 Jun 2023 at 19:32, Mark Tinka wrote:
> > Yes.
>
> How?

Apart from the obvious stuff like QoS getting difficult, no full feature parity between VLANs and main interfaces, or counters becoming less useful (many are port-level, so identifying the true source port may not be easy). There are things you'll just discover over time that don't even come to mind now, and I don't know what those will be in your deployment. I can give anecdotes.

2*VXR termination of a metro L2 ring:
- everything is 'ok'
- an ethernet pseudowire service is introduced to customers
- occasionally there are loops now
- turns out the VXR goes into promiscuous mode when you add an ethernet pseudowire, because while it has VLAN local significance, it doesn't have a per-VLAN MAC filter
- now an unrelated L3 VLAN, redundantly terminated on both VXRs, has a customer CE down in the L2 metro
- because the ARP timeout is 4h and the MAC timeout is 300s, the metro forgets the MAC fast, L3 slowly
- so the primary PE gets a packet off the internet and sends it to the metro, and the metro floods it to all ports, including the secondary PE
- the secondary PE sends the packet back to the primary PE, over the WAN
- now you've learned 'oh yeah, I should have ensured there is a per-VLAN MAC filter' and 'oh yeah, my MAC/ARP timeouts are misconfigured'
- but these are probably not the examples you'll learn; they'll be something different
- when you do satellite, you can solve a lot of this problem scope in software, as a single party controls L2 and L3 and can do proprietary code

L2 transparency:
- You do QinQ in the L2 aggregation, to pass customer frames to the aggregation termination
- You do MAC rewrite in/out of the L2 aggregation (customer MAC addresses get rewritten coming in from the customer, and mangled back to the legitimate MAC going out to termination).
You need this to pass STP and the like inside pseudowires from customer to termination.
- The termination hardware physically doesn't consider VLAN+ISIS a legitimate packet and will kill it, so you have no way of supporting ISIS inside a pseudowire when you have L2 aggregation towards the customer. Technically it's not valid: ISIS isn't EthernetII, and 802.3 doesn't have VLANs. But being technically correct rarely reduces the red hue in customers' faces when they report the issues they are experiencing.
- Even if this works, there are plenty of other ways pseudowire transparency suffers with L2 aggregation, as you are experiencing the limitations of two boxes instead of one when it comes to transparency, and those sets of limitations won't be identical.
- You will introduce a MAC limit to your point-to-point martini product, which didn't previously exist, because your L2 ring is redundant and needs MAC learning. If it's just a single switch, you can turn off MAC learning per VLAN and be closer to the satellite solution.

Convergence:
- Your termination no longer observes hardware liveness, so you need some solution to transfer L2 port state to the VLAN. This will occasionally break, as it's new complexity.

> > Like cat6500/7600 linecards without DFC, so SP gear with centralised
> > logic, and dumb 'low performance' linecards. Given low performance
> > these days is multi Tbps chips.
>
> While I'm not sure operators want that, they will take a look if the
> lower price does not impact performance.
>
> There is more to just raw speed.

I mean of course it affects performance, as you now have all ports on a single chip instead of many chips. But when it comes to PPS, people are confused about performance; no one* (well, maybe 1 in 100k running some esoteric application) cares about wire rate.
If you are running a card like the 4x100 ASR9k card, you absolutely want wire speed, because there is one chip per port, and you want the pool the port draws from to have one port's worth of wire rate free, to ingest a DoS on a mostly idle interface. But if you have 128x100GE on a chip, you're happy with 1/3 PPS easily, probably much much less, because you're not going to exhaust that massive pool in any practical scenario, and several interfaces can simultaneously ingest a DoS.

-- ++ytti
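As an aside, the ARP/MAC timeout mismatch from the VXR anecdote earlier generalises: keep ARP aging at or below the L2 MAC aging, so L3 forgets a host no later than L2 does and you avoid sustained unknown-unicast flooding. In Junos terms this is roughly the following; the values are illustrative (4 minutes of ARP aging against 300 seconds of MAC aging), and stanza support varies by platform and release.

```
set system arp aging-timer 4
set protocols l2-learning global-mac-table-aging-time 300
```

On a mixed-vendor metro you need the same invariant to hold on every L3 device facing the L2 domain, whatever the local knob is called.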
Re: [j-nsp] MX304 Port Layout
On Tue, 27 Jun 2023 at 17:40, Mark Tinka wrote:
> Would that be high-density face-plate solutions for access aggregation
> in the data centre, that they are [...]
>
> Are you suggesting standard 802.1Q/Q-in-Q trunking from a switch into a
> "pricey" router line card that supports locally-significant VLAN's per
> port is problematic?

Yes.

> I'm still a bit unclear on what you mean by "centralized"... in the
> context of satellite, or standalone?

Like cat6500/7600 linecards without DFC: SP gear with centralised logic, and dumb 'low performance' linecards. Given 'low performance' these days is multi-Tbps chips.

-- ++ytti
Re: [j-nsp] MX304 Port Layout
On Tue, 27 Jun 2023 at 06:02, Mark Tinka via juniper-nsp wrote:
> > Similar use case here but we use a QFX as a fusion satellite if port
> > expansion is required.
> > Works well as an small site start up option.
>
> Are vendors still pushing their satellite switches :-)?
>
> That technology looked dodgy to me when Cisco first proposed it with
> 9000v, and then Juniper and Nokia followed with their own implementations.

Juniper messaging seems to be geo-specific; in the EU their sales seem to sell them more willingly than in the US. My understanding is that Fusion is basically dead, but they don't actually have a solution for the access/SP front-plate market, so some sales channels are still pitching it as the solution. Nokia seems very committed to it.

I think the solution space is:
a) centralised lookup engines, so you have cheap(er) line cards for high density, low pps/bps
b) satellite
c) vlan aggregation

Satellite is basically a specific scenario of c), but it does bring significant derisking compared to vlan aggregation, as a single party designs it and can solve some problems better than vendor-agnostic vlan aggregation can. Vlan aggregation looks very simple on the surface but is fraught with problems, many of which are slightly better solved by satellites, and these problems will not be identified ahead of time but during the next two decades of operation.

Centralised boxes haven't been available for quite a few years, but hopefully Cisco is changing that; I think it's the right compromise for SPs. But in reality I'm not sure if centralised actually makes sense, since I don't think we can axiomatically assume it costs the vendor less: even though there is less BOM, the centralised design adds more engineering cost. It might basically be a way to sell boxes to some market at lower margins, while ensuring that hyperscalers don't buy them, instead of directly passing on the cost reduction.
-- ++ytti ___ juniper-nsp mailing list juniper-nsp@puck.nether.net https://puck.nether.net/mailman/listinfo/juniper-nsp
Re: [j-nsp] MX80 watchdog
Do you monitor RPD task memory use and FreeBSD process memory use? Is it possible you are leaking memory over time, and getting DRAM pressure at the 1500d mark? It might be this: https://prsearch.juniper.net/problemreport/PR108

Initially, as you said it happens during strenuous SSD access, I was thinking of the fact that Junos has RE failover limits on disk-io read/write latency, which cause false-positive RE switchovers now and again (more people have hit them than are aware of having hit them). But in your case this can't possibly be true, because the MX80 doesn't have two REs. For completeness: https://www.juniper.net/documentation/us/en/software/junos/high-availability/topics/ref/statement/not-on-disk-underperform-edit-chassis.html

On Mon, 12 Jun 2023 at 18:35, Tom Bird via juniper-nsp wrote:
>
> Afternoon,
>
> I've been upgrading some MX80 routers to from 15.1, consistently they
> seem to fall over during periods of strenuous SSD access, or indeed once
> during a "commit check".
>
> We thought this might be due to the uptime (~1500 days) so have been
> rebooting them prior to the upgrade which has mostly stopped the problem
> from happening. Not completely, however - they get stuck for about an
> hour doing this, after which they reboot and continue to work.
>
>
> watchdog: scheduling fairness gone for 3540 seconds now.
> (da1:umass-sim1:1:0:0): Synchronize cache failed, status == 0x34, scsi
> status == 0x0
> Automatic reboot in 15 seconds - press a key on the console to abort
> Rebooting...
>
>
> I'd like it if they waited a bit less than an hour and see the watchdog
> can be configured but I can't find any useful documentation about
> exactly what conditions it would fire and what the defaults are.
>
> Currently there is no configuration under "system processes watchdog",
> and it looks like it can be enabled, disabled and the timeout set up to
> 3600 seconds.
> > So my question is, is it this watchdog that is resetting the thing after > an hour and would it be reasonable to set the timeout to say 300 seconds > so there was less down time if it went wrong. > > Thanks, > -- > Tom > > :: www.portfast.co.uk / @portfast > :: hosted services, domains, virtual machines, consultancy > ___ > juniper-nsp mailing list juniper-nsp@puck.nether.net > https://puck.nether.net/mailman/listinfo/juniper-nsp -- ++ytti ___ juniper-nsp mailing list juniper-nsp@puck.nether.net https://puck.nether.net/mailman/listinfo/juniper-nsp
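To shrink the failure window being asked about, the watchdog timeout would be set along these lines. Whether 300 seconds is safe on this platform during heavy disk I/O is exactly the open question in the thread, so treat the value as illustrative rather than recommended.

```
set system processes watchdog timeout 300
```

The stanza also supports disable, but disabling the watchdog trades a bounded reboot for a potentially indefinite hang.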
Re: [j-nsp] Unknown Attribute 28 in BGP
Either will help; configure either or both and you're good. An actual fixed release will behave the same as if drop-path-attributes 28 had been configured: that is, read the Type, read the Length, and seek past the Value without parsing it.

On Sun, 11 Jun 2023 at 19:36, Einar Bjarni Halldórsson wrote:
>
> On 6/11/23 15:24, Saku Ytti wrote:
> > set protocols bgp drop-path-attributes 28 works if your release is too
> > old for set protocols bgp bgp-error-tolerance, and is preferable in
> > some ways, as it will protect your downstream as well.
>
> 18.2R3-S3.11 supports protocols bgp bgp-error-tolerance, but reading
> through the docs, I see:
>
> > The bgp-error-tolerance statement overrides this behavior so that the
> > following BGP error handling is in effect:
> >
> > For fatal errors, Junos OS sends a notification message titled Error
> > Code Update Message and resets the BGP session. An error in the
> > MP_{UN}REACH attribute is considered to be fatal. The presence of multiple
> > MP_{UN}REACH attributes in one BGP update is also considered to be a fatal
> > error. Junos OS resets the BGP session if it cannot parse the NLRI field or
> > the BGP update correctly. Failure to parse the BGP update packet can happen
> > when the attribute length does not match the length of the attribute value.
>
> I read this section so that even if I configure bgp-error-tolerance, it
> won't make a difference since junos considers this a fatal error and
> resets the BGP session.
>
> .einar

-- ++ytti
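Spelled out, the two knobs discussed in this thread are:

```
set protocols bgp drop-path-attributes 28
set protocols bgp bgp-error-tolerance
```

The first strips the offending attribute before it can do harm (and so also protects your downstream), the second softens error handling generally; both can be configured together.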
Re: [j-nsp] Unknown Attribute 28 in BGP
set protocols bgp drop-path-attributes 28 works if your release is too old for set protocols bgp bgp-error-tolerance, and is preferable in some ways, as it will protect your downstream as well. On Sun, 11 Jun 2023 at 17:25, Einar Bjarni Halldórsson via juniper-nsp wrote: > > Hi, > > We have two MX204 edge routers, each with a connection to a different > upstream provider (and some IXP peerings on both). > > Last week the IPv6 transit session on one of them starting flapping. It > turns out that we got hit with > https://labs.ripe.net/author/emileaben/unknown-attribute-28-a-source-of-entropy-in-interdomain-routing/ > > It only happened on one of our edge routers, so I assume for now that > either our other transit provider filtered the affected route updates, > or stripped the attribute. > > The post from RIPE links to > https://www.juniper.net/documentation/us/en/software/junos/bgp/topics/topic-map/bgp-error-messages.html > but I can't see that bgp-error-tolerance helps us, since this type of > malformed update is always fatal. > > Our edge routers are both running Junos 18.2R3-S3.11. I was planning on > upgrading to 22.2R3 regardless of this error, but it would be nice to > know that this problem has been fixed in later version, or mitigations > introduced that can be used. > > Anybody know about this problem in particular, or have ideas on > mitigating malformed BGP updates? > > .einar > ISNIC > ___ > juniper-nsp mailing list juniper-nsp@puck.nether.net > https://puck.nether.net/mailman/listinfo/juniper-nsp -- ++ytti ___ juniper-nsp mailing list juniper-nsp@puck.nether.net https://puck.nether.net/mailman/listinfo/juniper-nsp
Re: [j-nsp] MX304 Port Layout
On Fri, 9 Jun 2023 at 20:37, Andrey Kostin wrote:
> Sounds more like a datacenter setup, and for DC operator it could be
> attractive to do at scale. For a traditional ISP with relatively small
> PoPs spread across the country it may be not the case.

Sure, not suggesting everyone is in the target market, but the target market includes people who are not developers and have no interest in becoming one. For a typical access network with multiple PoPs, it may be the wrong optimisation point.

-- ++ytti
Re: [j-nsp] MX304 Port Layout
On Fri, 9 Jun 2023 at 19:15, Andrey Kostin wrote:
> Can anything else be inserted in this socket? If not, then what's the
> point? For server CPUs there are many models with different clocking and
> number of cores, so socket provides a flexibility. If there is only one
> chip that fits the socket, then the socket is a redundant part.

Not that I know of. I think the point may be decoupling. BRCM doesn't want to do business with just everyone. This allows someone to build the switch without providing the chips; customers can then buy the switch from this vendor and the chip directly from BRCM.

I could imagine some big players like FB and AMZN designing their own switch and having some random shop actually build it, but Broadcom saying 'no, we don't do business with you'. This way they could get the switch built anywhere, while having a direct chip relationship with BRCM.

-- ++ytti
Re: [j-nsp] MX304 Port Layout
On Fri, 9 Jun 2023 at 18:46, Andrey Kostin wrote: > I'm not in this market, have no qualification and resources for > development. The demand in such devices should be really massive to > justify a process like this. Are you not? You use a lot of open source software, because someone else did the hard work, and you have something practical. The same would be the thesis here: you order the PCI NPU from Newegg, and you have an ecosystem of practical software to pull from various sources. Maybe you'll contribute something back, maybe not. A very typical network is a border router or two, which needs features and performance, then switches to connect to compute. People who have no resources or competence to write software could still be users in this market. -- ++ytti ___ juniper-nsp mailing list juniper-nsp@puck.nether.net https://puck.nether.net/mailman/listinfo/juniper-nsp
Re: [j-nsp] MX304 Port Layout
On Fri, 9 Jun 2023 at 17:26, Mark Tinka wrote: > Well, the story is that Cisco are doing this with Meta and Microsoft on > their C8000 platform, and apparently, doing billions of US$ in business > on the back of that. I'm not convinced at all that Leaba is being sold. I think it's sold conditionally, when customers would otherwise be lost. I am reminded of this: https://www.servethehome.com/this-is-a-broadcom-tomahawk-4-64-port-400gbe-switch-chip-lga8371-intel-amd-ampere/ LGA8371 socketed BRCM TH4. Ostensibly this allows a lot more switches to appear in the market, as the switch maker doesn't need to be friendly with BRCM. They make the switch, the customer buys the chip and sockets it. Wouldn't surprise me if FB, AMZN and the likes had pressed for something like this, so they could use cheaper sources to make the rest of the switch, sources which BRCM didn't want to play ball with. But an NPU from Newegg, with the community writing the code, doesn't exist yet; I think it should, and there would be volume in it, just no large volume to any single customer. -- ++ytti ___ juniper-nsp mailing list juniper-nsp@puck.nether.net https://puck.nether.net/mailman/listinfo/juniper-nsp
Re: [j-nsp] MX304 Port Layout
On Fri, 9 Jun 2023 at 16:58, Andrey Kostin via juniper-nsp wrote: > Not sure why it's eye-watering. The price of fully populated MX304 is > basically the same as it's predecessor MX10003 but it provides 3.2T BW > capacity vs 2.4T. If you compare with MX204, then MX304 is about 20% > expensive for the same total BW, but MX204 doesn't have redundant RE and > if you use it in redundant chassis configuration you will have to spend > some BW on "fabric" links, effectively leveling the price if calculated > for the same BW. I'm just comparing numbers, not considering any real That's not it, the RE doesn't attach to fabric serdes. You are right that the MX304 is the successor of the MX10003, not the MX204. The MX80, MX104 and MX204 are unique in that they are true pizzabox Trios. They have exactly one Trio, and both the WAN and FAB sides connect to WAN ports (not sure if the MX204 just leaves the FAB side unconnected). Therefore, say, a 40G Trio in linecard mode is an 80G Trio in pizza mode (albeit PPS stays the same), as you're not wasting capacity on non-revenue fabric ports. This single-Trio design makes the box very cost effective, as not only do you have just one Trio and double the capacity per Trio, but you also don't have any fabric chip or fabric serdes. The MX304 however has Trios in the linecards, so it really is very much a normal chassis box. And having multiple Trios, it needs a fabric. I do think Juniper and the rest of the vendors keep struggling to identify 'few to many' markets, and are only good at identifying 'many to few' markets. MX304 and ever denser 512x112G serdes chips represent this. I expect many people on this list have no need for more performance than a single Trio YT in any pop at all, yet they need ports. And they are not adequately addressed by vendors. But they do need the deep features of an NPU. I keep hoping that someone is so disruptive that they take the nvidia/gpu approach to npu. That is, you can buy a Trio PCI card from Newegg for 2 grand, and program it as you wish. 
I think this market remains unidentified, and even after adjusting for cannibalization it would increase the market size. I can't understand why JNPR is not trying this; their valuation has lost to inflation for 20 years, what do they have to lose? -- ++ytti ___ juniper-nsp mailing list juniper-nsp@puck.nether.net https://puck.nether.net/mailman/listinfo/juniper-nsp
Re: [j-nsp] JunOS RPKI/ROA database in non-default routing instance, but require an eBGP import policy in inet.0 (default:default LI:RI) to reference it.
On Tue, 6 Jun 2023 at 06:54, Mark Tinka via juniper-nsp wrote: > > While I have a lot of sympathy for Saku's pragmatism, I prefer to file off > > the ugly edges of old justifications when I can... but it's done one commit > > at a time. >> > Going back to re-do the implementation from scratch would be a > non-starter. There is simply too much water under this bridge. I am not implying it is pragmatic or possible, just that it is correct from a design point of view. Commercial software deals with competing requirements, and these requirements are not constructive towards producing maintainable, clean code. Over time commercial software becomes illiquid with its technical debt. There is no real personal reward for paying down technical debt, because almost invariably it takes a lot of time, brings no new revenue, and the non-coder observing your work only sees the outages the debt repayment caused. Meanwhile, another person who creates this debt, shipping new invoiceable features and bug fixes in a ra[pb]id manner, is a star to the same non-coder observers. Not to say our open source networking is always great either; Linux developers are notorious for not asking SMEs 'how has this problem been solved in other software'. There are plenty of anecdotes to choose from, but I'll give one. - In the 3.6 kernel, FIB lookup was introduced to replace the flow cache; of course anyone dealing with networking could have told the kernel developers on day one why a flow cache was a poor idea, what a FIB is, how it is done, and why it is a better idea. - In the 3.6 FIB implementation, ECMP was solved by essentially randomly choosing one option of many, per packet. Again, they could have asked even junior network engineers 'how does ECMP work, how should it be done, I'm thinking of doing it like this, why do you think they've not done this in other software?' But they didn't. 
- In 4.4, random ECMP was changed to hashed ECMP. I still catch discussions about poor TCP performance in Linux ECMP environments; I first ask which kernel they run, then I explain why per-packet + CUBIC will never ever perform. So for 4 years ECMP was completely broken, and reading the ECMP release notes in 4.4, not even the developers had completely understood just how bad the problem was, so we can safely assume people were not running ECMP. Another example was when I tried to explain to the OpenSSH mailing list that 'TOS' isn't a thing, and got a confident reply that TOS absolutely is a thing, and prec/DSCP are not. Luckily a few years later Job fixed OpenSSH packet classification. But these examples are everywhere, so it seems you either choose software written by people who understand the problem but are forced to write unmaintainable code, or you choose software by people who are just now learning about the problem and then solve it without discovering prior art, usually wrongly. -- ++ytti ___ juniper-nsp mailing list juniper-nsp@puck.nether.net https://puck.nether.net/mailman/listinfo/juniper-nsp
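To illustrate the ECMP point above with a toy sketch (plain Python of my own, not kernel code): hashing the 5-tuple pins every packet of a flow to one path, while per-packet random selection sprays a single flow across all paths, reordering it and destroying TCP throughput.

```python
import random

paths = ["path-A", "path-B", "path-C"]
# src, dst, proto, sport, dport of one TCP flow (illustrative values)
flow = ("10.0.0.1", "10.0.0.2", 6, 34567, 443)

def hashed_ecmp(five_tuple):
    # Post-4.4 behaviour: same flow always hashes to the same path
    return paths[hash(five_tuple) % len(paths)]

def random_ecmp(_five_tuple):
    # 3.6..4.4 behaviour: each packet picks a path independently
    return paths[random.randrange(len(paths))]

hashed_choices = {hashed_ecmp(flow) for _ in range(1000)}
random_choices = {random_ecmp(flow) for _ in range(1000)}

print(len(hashed_choices))  # 1: the flow stays on one path, no reordering
print(len(random_choices))  # 3: the flow is sprayed across all paths
```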
Re: [j-nsp] JunOS RPKI/ROA database in non-default routing instance, but require an eBGP import policy in inet.0 (default:default LI:RI) to reference it.
On Mon, 5 Jun 2023 at 11:13, Lukas Tribus via juniper-nsp wrote: > in Cisco land I worked around VRF or source interface selection > limitations for RTR by using SSH as a transport method, which then > used SSH client source-vrf/source-interface configurations. > > I don't know if JunOS supports SSH transported RTR though. It is immaterial, it wouldn't work. If someone would actually need to make it work, they'd leak between VRF/Internet, so that RTR configured on the Internet actually goes via the NMS VRF. This could be accomplished in a multitude of poor ways. Egress could be next-table static route, ingress could be firewall filter with from source-address rtr then routing-instance default. Or it could be LT between VRF and default instance. -- ++ytti ___ juniper-nsp mailing list juniper-nsp@puck.nether.net https://puck.nether.net/mailman/listinfo/juniper-nsp
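A rough sketch of the leak described above, assuming the RTR server is 192.0.2.1 inside a VRF called NMS (both names hypothetical; exact syntax varies by release, and as noted these are all poor options):

```
# Egress: the default instance reaches the RTR server via the NMS VRF
set routing-options static route 192.0.2.1/32 next-table NMS.inet.0

# Ingress: return traffic from the RTR server is steered back to the
# default instance with a filter on the VRF-facing interface
set firewall family inet filter rtr-leak term rtr from source-address 192.0.2.1/32
set firewall family inet filter rtr-leak term rtr then routing-instance default
set firewall family inet filter rtr-leak term rest then accept
set interfaces ge-0/0/0 unit 0 family inet filter input rtr-leak
```

The LT-interface alternative mentioned in the message avoids the filter but burns Trio capacity on the logical tunnel.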
Re: [j-nsp] JunOS RPKI/ROA database in non-default routing instance, but require an eBGP import policy in inet.0 (default:default LI:RI) to reference it.
I totally agree this should work, and it is unfortunate that you are struggling to make it work. Having said that, managing your devices in a VRF is asking for trouble; you will continue to find issues and spend time/money working with vendors to solve them. It is safer to put the service (Internet) in a VRF and leave the main table for signalling and NMS, if you want to create this distinction. It will also make it a lot more convenient to leak between instances and create subInternets, like peeringInternet, to prevent peers from default-routing to you. https://www.juniper.net/documentation/us/en/software/junos/bgp/topics/topic-map/bgp_origin_validation.html NOTE: RPKI validation is available only in the primary instance. If you configure RPKI validation for a routing instance, then the RPKI validation fails with the following error message: 'RV instance is not running'. I consider it a design error that a NOS conceptually has two types of instances, the main instance and VRF instances. This distinction makes the NOS more expensive to write and maintain, and it makes it more fragile. The general principle applies: do boring things, get boring results. Same reason why IPv6 continues to be a 2nd class citizen and is a bad candidate for your NMS/signalling AFI; people don't use it, so if you do, you will be responsible for driving vendors to fix it. Time which you likely should be spending doing something else. On Mon, 5 Jun 2023 at 06:52, Chris Kawchuk via juniper-nsp wrote: > > Great idea! but no dice. :( didn't work. > > Seems the whole "VRF -> back to base table" operations that we'd all love to > do easily in JunOS rears its ugly head yet again ;) > > FWIW - Some friends in the industry *do* use that knob, but they're "going > the other way". i.e. The RPKI RV Database is in inet.0 && Internet is in a > VRF. Apparently that does work AOK for them; however it's "fiddly" as you say. > > FWIW - here's the VRF's config... pretty darned basic. 
> > routing-instances { > admin { > routing-options { > validation { > notification-rib [ inet.0 inet6.0 ]; ## << No impact on the > default:default LI:RI RV database > group routinator { > session 10.x.x.x { > refresh-time 120; > port 3323; > } > } > } > } > description "Dummy admin vrf - to test RPKI inside a > routing-instance"; > instance-type vrf; > interface xe-0/0/3.0; ## << the RPKI server is setting on the > other end of this /30 > vrf-target target::xxx; > } > } > > FWIW using a vMX for testing - running JunOS 20.4R3-S4.8. > > Basically i'm asking "is there a way to do this without having to stick the > validator DB config into the basec onfig [routing-options validation {}] > stanza? > > .If it must, then yeah, it's easy enough to do a rib group, a static /32 > next-table/no-readvertise/no-install, lt-x/x/x stitch, route leak, etc...to > get the default:default instance to "use the VRF" to reach the RPKI server. > I just don't want to go down that road (yet) if I don't have to; as the > 'technical elegance' (read: OCD) portion of my brain wants to avoid that if > it can. > > - CK. > > > > > > On Jun 5, 2023, at 13:12, David Sinn wrote: > > > > I'd try the 'notification-rib' chunk in the 'validation' stanza of the > > routing-instance and see if setting inet.0 there pushes the DB the way you > > need. Certain versions of JunOS are quite broken going the other way, so > > I've had to enumerate all of the routing-instances that I want to be sure > > have a copy of the validation DB to get them to work correctly. Maybe the > > other way will work in your case. > > > > David > > > >> On Jun 4, 2023, at 7:52 PM, Chris Kawchuk via juniper-nsp > >> wrote: > >> > >> Hi All > >> > >> Been scratching my head today. As per Juniper's documentation, you can > >> indeed setup RPKI/ROA validation session inside a routing-instance. 
You > >> can also have it query against that instance on an import policy for that > >> VRF specifically, and if there's no session, it will revert to the default > >> RPKI RV database (if configured) under the main routing-options {} stanza > >> to check for valid/invalid, etc... > >> > >> https://www.juniper.net/documentation/us/en/software/junos/bgp/topics/topic-map/bgp_origin_validation.html > >> > >> Thats all fine and dandy, but it seems that JNPR's implementation of > >> RPKI/ROA *assumes* that your RV database is always configured in the main > >> routing instance (i.e. the main routing-options validation {} stanza, thus > >> your RPKI server MUST be available via inet.0 ). > >> > >> Unfortunately, the situation I am faced with is the opposite. > >> > >> My RPKI/ROA server is only available via our "admin" VRF (which is how we > >> manage the device) - Our inet.0 contains the globa
Re: [j-nsp] QFX DDOS Violations
ver time: 300 seconds >> > Enabled: Yes >> > Flow detection configuration: >> > Flow detection system is off >> > Detection mode: Automatic Detect time: 0 seconds >> > Log flows: YesRecover time: 0 seconds >> > Timeout flows: No Timeout time: 0 seconds >> > Flow aggregation level configuration: >> > Aggregation level Detection mode Control mode Flow rate >> > Subscriber Automatic Drop 0 pps >> > Logical interface Automatic Drop 0 pps >> > Physical interface Automatic Drop 500 pps >> > System-wide information: >> > Aggregate bandwidth is no longer being violated >> > No. of FPCs that have received excess traffic: 1 >> > Last violation started at: 2022-11-30 09:08:02 CET >> > Last violation ended at: 2022-11-30 09:09:32 CET >> > Duration of last violation: 00:01:40 Number of violations: 1508 >> > Received: 3548252144 Arrival rate: 201 pps >> > Dropped: 49294329Max arrival rate: 160189 pps >> > Routing Engine information: >> > Bandwidth: 500 pps, Burst: 200 packets, enabled >> > Aggregate policer is never violated >> > Received: 0 Arrival rate: 0 pps >> > Dropped: 0 Max arrival rate: 0 pps >> > Dropped by individual policers: 0 >> > FPC slot 0 information: >> > Bandwidth: 100% (500 pps), Burst: 100% (200 packets), enabled >> > Hostbound queue 255 >> > Aggregate policer is no longer being violated >> > Last violation started at: 2022-11-30 09:08:02 CET >> > Last violation ended at: 2022-11-30 09:09:32 CET >> > Duration of last violation: 00:01:40 Number of violations: 1508 >> > Received: 3548252144 Arrival rate: 201 pps >> > Dropped: 49294329Max arrival rate: 160189 pps >> > Dropped by individual policers: 0 >> > Dropped by aggregate policer: 50294227 >> > Dropped by flow suppression:0 >> > Flow counts: >> > Aggregation level Current Total detected State >> > Subscriber 0 0Active >> > >> > vty)# show ddos scfd proto-states vxlan >> > (sub|ifl|ifd)-cfg: op-mode:fc-mode:bwidth(pps) >> > op-mode: a=automatic, o=always-on, x=disabled >> > fc-mode: d=drop-all, k=keep-all, 
p=police >> > d-t: detect time, r-t: recover time, t-t: timeout time >> > aggr-t: last aggregated/deaggreagated time >> > idx prot groupproto mode detect agg flags state sub-cfg >> > ifl-cfg ifd-cfg d-t r-t t-t aggr-t >> > --- -- --- - - - >> > - - --- --- --- -- >> > 23 6400 vxlanaggregate auto no 1 2 0 a:d:0 >> > a:d:0 a:d: 5000000 >> > >> > >> > Johan >> > >> > On Wed, Nov 30, 2022 at 8:53 AM Saku Ytti wrote: >> > >> > > Hey, >> > > >> > > Before any potential trashing, I'd like to say that as far as I am >> > > aware Juniper (MX) is the only platform on the market which isn't >> > > trivial to DoS off the network, despite any protection users may have >> > > tried to configure. >> > > >> > > > How do you identify the source problem of DDOS violations that junos >> > logs >> > > > for QFX? For example what interface that is causing the problem? >> > > >> > > I assume you are talking about QFX10k with Paradise (PE) chipset. I'm >> > > not very familiar with it, but I know something about it when sold in >> > > PTX10k quise, but there are significant differences. Answers are from >> > > the PTX10k perspective. If you are talking about QFX5k many of the >> > > answers won't apply, but the ukern side answers should help >> > > troubleshoot it further, certainly with QFX5k the situation is worse >> > > than it would be on QFX10k. >> > > >> > > > DDOS_PROTOCOL_VIOLATION_SET: Warning: Host-bound traffic for >> > > > protocol/exception VXLAN:aggregate exceeded its allowed bandwidth at >
Re: [j-nsp] QFX DDOS Violations
The 'max arrival rate' is pre-policer, not the admitted rate. I don't use VXLAN, and I can't begin to guess what VXLAN traffic needs to punt. But this is not your transit VXLAN traffic. This is some VXLAN traffic that the platform thought it needed to process in the software. I would personally tcpdump the punted traffic classified as VXLAN and investigate what exactly it is. On Wed, 30 Nov 2022 at 12:15, john doe wrote: > > Hi! > > The leaf switches are QFX5k and it seems to be lacking some of the command > you mentioned. We don't have any problem with bgp sessions going down, the > impact is only the payload inside vxlan. > > Protocol Group: VXLAN > > Packet type: aggregate (Aggregate for vxlan control packets) > Aggregate policer configuration: > Bandwidth:500 pps > Burst:200 packets > Recover time: 300 seconds > Enabled: Yes > Flow detection configuration: > Flow detection system is off > Detection mode: Automatic Detect time: 0 seconds > Log flows: YesRecover time: 0 seconds > Timeout flows: No Timeout time: 0 seconds > Flow aggregation level configuration: > Aggregation level Detection mode Control mode Flow rate > Subscriber Automatic Drop 0 pps > Logical interface Automatic Drop 0 pps > Physical interface Automatic Drop 500 pps > System-wide information: > Aggregate bandwidth is no longer being violated > No. 
of FPCs that have received excess traffic: 1 > Last violation started at: 2022-11-30 09:08:02 CET > Last violation ended at: 2022-11-30 09:09:32 CET > Duration of last violation: 00:01:40 Number of violations: 1508 > Received: 3548252144 Arrival rate: 201 pps > Dropped: 49294329Max arrival rate: 160189 pps > Routing Engine information: > Bandwidth: 500 pps, Burst: 200 packets, enabled > Aggregate policer is never violated > Received: 0 Arrival rate: 0 pps > Dropped: 0 Max arrival rate: 0 pps > Dropped by individual policers: 0 > FPC slot 0 information: > Bandwidth: 100% (500 pps), Burst: 100% (200 packets), enabled > Hostbound queue 255 > Aggregate policer is no longer being violated > Last violation started at: 2022-11-30 09:08:02 CET > Last violation ended at: 2022-11-30 09:09:32 CET > Duration of last violation: 00:01:40 Number of violations: 1508 > Received: 3548252144 Arrival rate: 201 pps > Dropped: 49294329Max arrival rate: 160189 pps > Dropped by individual policers: 0 > Dropped by aggregate policer: 50294227 > Dropped by flow suppression:0 > Flow counts: > Aggregation level Current Total detected State > Subscriber0 0Active > > vty)# show ddos scfd proto-states vxlan > (sub|ifl|ifd)-cfg: op-mode:fc-mode:bwidth(pps) > op-mode: a=automatic, o=always-on, x=disabled > fc-mode: d=drop-all, k=keep-all, p=police > d-t: detect time, r-t: recover time, t-t: timeout time > aggr-t: last aggregated/deaggreagated time > idx prot groupproto mode detect agg flags state sub-cfg > ifl-cfg ifd-cfg d-t r-t t-t aggr-t > --- -- --- - - - > ----- - --- --- --- -- > 23 6400 vxlanaggregate auto no 1 2 0 a:d:0 a:d: > 0 a:d: 5000000 > > > Johan > > On Wed, Nov 30, 2022 at 8:53 AM Saku Ytti wrote: >> >> Hey, >> >> Before any potential trashing, I'd like to say that as far as I am >> aware Juniper (MX) is the only platform on the market which isn't >> trivial to DoS off the network, despite any protection users may have >> tried to configure. 
>> >> > How do you identify the source problem of DDOS violations that junos logs >> > for QFX? For example what interface that is causing the problem? >> >> I assume you are talking about QFX10k with Paradise (PE) chipset. I'm >> not very familiar with it, but I know something about it when sold in >> PTX10k quise, but there are significant differences. Answers are from >> the PTX10k perspective. If you are talking about QFX5k many of the >> answers won't apply, but the ukern side answers should help >> troubleshoot it further, certainly with QFX5k the situation is worse >> than it
Re: [j-nsp] QFX DDOS Violations
Hey, Before any potential trashing, I'd like to say that as far as I am aware Juniper (MX) is the only platform on the market which isn't trivial to DoS off the network, despite any protection users may have tried to configure. > How do you identify the source problem of DDOS violations that junos logs > for QFX? For example what interface that is causing the problem? I assume you are talking about QFX10k with the Paradise (PE) chipset. I'm not very familiar with it, but I know something about it when sold in PTX10k guise, though there are significant differences. Answers are from the PTX10k perspective. If you are talking about QFX5k many of the answers won't apply, but the ukern side answers should help troubleshoot it further; certainly with QFX5k the situation is worse than it would be on QFX10k. > DDOS_PROTOCOL_VIOLATION_SET: Warning: Host-bound traffic for > protocol/exception VXLAN:aggregate exceeded its allowed bandwidth at fpc 0 > for 30 times, started at... > > The configured rate for VXLAN is 500pps, ddos protection is seeing rates > over 150 000pps Do you mean you've configured 'set system ddos-protection protocols vxlan aggregate bandwidth 500'? What exactly are you seeing? What does 'show ddos-protection protocols vxlan' say? Also try 'start shell pfe network fpcX' + 'show ddos scfd proto-states vxlan'. Paradise (unlike Triton and Trio) does not support PPS policing at all. So when you configure a PPS policer, what actually gets programmed is 500pps*1500B in bps. I've tried to argue this is a poor default, 64B being the superior choice. In Paradise, 500pps would admit 500*(1500/64) or about 12kpps per Paradise chip if those VXLAN packets were small. These would then be policed by the LC CPU ukern into 500 pps for all the Paradise chips living under that LC CPU, before sending to the RE over bme0. 
After DDoS but before Paradise admits a packet to the LC CPU, it goes through VoQ, where most packets are classified into VoQ#2, which is 10Mbps wide with no burstability (classification, width and burstability are being changed in later images). So extremely trivial rates will cause congestion on VoQ#2, and a lot of protocols will be competing for the 10Mbps access to the LC CPU, like BGP, ISIS, OSPF, LDP, ND, ARP. > This is an spine/leaf setup, one theory is that the vxlan traffic that most > of our QFX boxes are activation ddos protection for is actually vxlan > services running inside the vxlans, for example we have kubernetes clusters > using vxlan. Is that a sane theory? Not enough information to speculate. In many cases the ddos classification is wrong. You can review it in the PFE: 'show filter' => HOSTBOND_IPv4_FILTER, then 'show filter index X program'. You can also capture punted packets on the interface where the RE meets the FPC (I think bme0 here); on the bme0 interface TNP headers are on top of the punted packets, and in the TNP headers you will see which ddos classification was used. You can turn the number into a name by looking at 'show ddos scfd proto-states'. I naively wish I could set my ddos-protection classification and VoQ classification manually in the 'lo0 filter', because the infrastructure allows for great protection, but particularly when choosing which VoQ packets share there is no obvious single best solution, it depends on the environment. Like I could put RSVP, ISIS, LDP on a single VoQ, as they never compete with customers, BGP in another as it will compete with customers and operators for me, and so forth. But of course this wish is naive, as the solution the vendor offers is already too complex for customers to use, and giving more rope would just make the mean config worse. -- ++ytti ___ juniper-nsp mailing list juniper-nsp@puck.nether.net https://puck.nether.net/mailman/listinfo/juniper-nsp
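The arithmetic above, spelled out (my own worked example of the numbers quoted in the message):

```python
configured_pps = 500
# Paradise programs a pps policer as pps * 1500B, expressed in bits/s
programmed_bps = configured_pps * 1500 * 8
print(programmed_bps)  # 6000000, i.e. 6 Mb/s

# With minimum-size 64-byte packets, that bps budget admits far more pps:
admitted_pps = programmed_bps // (64 * 8)
print(admitted_pps)  # 11718, i.e. the ~12 kpps per Paradise chip cited above
```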
Re: [j-nsp] Collapse spine EVPN type 5 routes issue
I would still consider as-override, or at least I would figure out the reason why it is not a good solution. On Tue, 15 Nov 2022 at 15:40, niklas rehnberg via juniper-nsp wrote: > > Hi, > Thanks for the quick reply, I hope following very simple picture may help > > ClientsClients > > | | > | EVPN/VXLAN| > | Overlay AS 6555 | > spine1 --- type 5--- spine2 > vrf WAN AS X | | vrf WAN AS X >eBGP | | eBGP > | | > PE AS Y PE AS Y > | | > > Core Network--- > > route example when loop occur > show route hidden table bgp.evpn extensive > > bgp.evpn.0: 156 destinations, 156 routes (153 active, 0 holddown, 3 hidden) > 5:10.254.0.2:100::0::5.0.0.0::16/248 (1 entry, 0 announced) > BGP /-101 > Route Distinguisher: 10.254.0.2:100 > Next hop type: Indirect, Next hop index: 0 > Address: 0x55a1fd2d2cdc > Next-hop reference count: 108, key opaque handle: (nil), > non-key opaque handle: (nil) > Source: 10.254.0.2 > Protocol next hop: 10.254.0.2 > Indirect next hop: 0x2 no-forward INH Session ID: 0 > State: > Peer AS: 6 > Age: 1:14 Metric2: 0 > Validation State: unverified > Task: BGP_6_6.10.254.0.2 > AS path: 65263 xxx I (Looped: 65263) > Communities: target:10:100 encapsulation:vxlan(0x8) > router-mac:34:11:8e:16:52:b2 > Import > Route Label: 99100 > Overlay gateway address: 0.0.0.0 > ESI 00:00:00:00:00:00:00:00:00:00 > Localpref: 100 > Router ID: 10.254.0.2 > Hidden reason: AS path loop > Secondary Tables: WAN.evpn.0 > Thread: junos-main > Indirect next hops: 1 > Protocol next hop: 10.254.0.2 > Indirect next hop: 0x2 no-forward INH Session ID: 0 > Indirect path forwarding next hops: 2 > Next hop type: Router > Next hop: 10.0.0.1 via et-0/0/46.1000 > Session Id: 0 > Next hop: 10.0.0.11 via et-0/0/45.1000 > Session Id: 0 > 10.254.0.2/32 Originating RIB: inet.0 > Node path count: 1 > Forwarding nexthops: 2 > Next hop type: Router > Next hop: 10.0.0.1 via > et-0/0/46.1000 > Session Id: 0 > Next hop: 10.0.0.11 via > et-0/0/45.1000 > Session Id: 0 > > > // Niklas > > > > > Den tis 15 nov. 
2022 kl 13:58 skrev Saku Ytti : > > > Hey Niklas, > > > > My apologies, I do not understand your topology or what you are trying > > to do, and would need a lot more context. > > > > In my ignorance I would still ask, have you considered 'as-override' - > > > > https://www.juniper.net/documentation/us/en/software/junos/bgp/topics/ref/statement/as-override-edit-protocols-bgp.html > > this is somewhat common in another use-case, which may or may not be > > near to yours. Say you want to connect arbitrarily many CE routers to > > MPLS VPN cloud with BGP, but you don't want to get unique ASNs to > > them, you'd use a single ASN on every CE and use 'as-override' on the > > core side. > > > > Another point I'd like to make, not all implementations even verify AS > > loops in iBGP, for example Cisco does not, while Juniper does. This > > implementation detail creates bias on what people consider 'clean' and > > 'dirty' solution, as in Cisco network it's enough to allow loop at the > > edge interfaces it feels more 'clean' while in Juniper network you'd > > have to allow them in all iBGP sessions too, which
Re: [j-nsp] Collapse spine EVPN type 5 routes issue
Hey Niklas, My apologies, I do not understand your topology or what you are trying to do, and would need a lot more context. In my ignorance I would still ask, have you considered 'as-override' - https://www.juniper.net/documentation/us/en/software/junos/bgp/topics/ref/statement/as-override-edit-protocols-bgp.html this is somewhat common in another use-case, which may or may not be near to yours. Say you want to connect arbitrarily many CE routers to an MPLS VPN cloud with BGP, but you don't want to assign unique ASNs to them; you'd use a single ASN on every CE and use 'as-override' on the core side. Another point I'd like to make: not all implementations even verify AS loops in iBGP, for example Cisco does not, while Juniper does. This implementation detail creates bias in what people consider a 'clean' or 'dirty' solution: in a Cisco network it's enough to allow loops at the edge interfaces, so it feels more 'clean', while in a Juniper network you'd have to allow them in all iBGP sessions too, which suddenly makes the solution appear somehow more 'dirty'. On Tue, 15 Nov 2022 at 12:48, niklas rehnberg via juniper-nsp wrote: > > Hi all, > I have the following setup and need to know the best practices to solve > EVPN type 5 issues. > > Setup: > Two ACX7100 as collapse spine with EVPN/VXLAN > Using type 5 routes between the spines so iBGP can be avoided in > routing-instance. > Both spines has same bgp as number in the routing-instance WAN > See below for a part of configuration > > Problem: > Incoming routes from WAN router into spine1 will be advertised to spine2 as > type 5 routes > spine2 will not accept them due to AS number exit in the as-path already. > > Solution: > I can easily fix it with "loop 2" config in the routing-options part, but > is this the right way? > Does there exist any command to change the EVPN Type 5 behavior from eBGP > to iBGP? > Different AS number in routing-instance? > What are the best practices? 
> > Config part: > show routing-instances WAN protocols evpn > ip-prefix-routes { > advertise direct-nexthop; > encapsulation vxlan; > reject-asymmetric-vni; > vni 99100; > export EXPORT-T5-WAN; > } > policy-statement EXPORT-T5-WAN { > term 1 { > from protocol direct; > then accept; > } > term 2 { > from protocol bgp; > then accept; > } > } > ___ > juniper-nsp mailing list juniper-nsp@puck.nether.net > https://puck.nether.net/mailman/listinfo/juniper-nsp -- ++ytti ___ juniper-nsp mailing list juniper-nsp@puck.nether.net https://puck.nether.net/mailman/listinfo/juniper-nsp
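For reference, hedged sketches of the two approaches discussed in this thread (group names, neighbor addresses and ASN values are hypothetical; verify syntax against your release):

```
# as-override, suggested above: the neighbor's side rewrites the shared
# ASN with its own in paths it advertises back
set protocols bgp group CE-PEERS neighbor 192.0.2.1 as-override

# The 'loops' knob the original poster used: tolerate the local ASN
# appearing once in received AS paths
set routing-instances WAN routing-options autonomous-system 65555 loops 2
```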
Re: [j-nsp] Cannot program filter pfe-cos-cl-631-5-1 (type VFP_IL2L3_COS) -TCAM has 0 free entries
On Fri, 21 Oct 2022 at 16:39, Chuck Anderson wrote: > Also, it appears that when Junos was changed to support DHCP Snooping, > Dynamic ARP Inspection, and IP Source Guard on trunk ports, even > though trunk ports are in "trusted" mode by default, the switch is > learning bindings on the trusted trunk ports (i.e. the uplink) and > then *programming them into TCAM* at least for IPSG. If this is true, > then Junos has created a situation where one cannot deploy IPSG > effectively unless the switch can scale to the number of entries > needed for an entire *VLAN* which may have thousands of hosts, rather > than just the access ports on a single switch stack which would > normally have only hundreds of hosts or less. Thank you for the update, and it sounds plausible to me. Features that cause ingress TCAM consumption can quickly kill EX/QFX scale. It will be very challenging to run most of the EX/QFX devices in L3 role, due to the very modest TCAM. At least if there is any care at all in lo0 and edge filters. -- ++ytti ___ juniper-nsp mailing list juniper-nsp@puck.nether.net https://puck.nether.net/mailman/listinfo/juniper-nsp
Re: [j-nsp] Cannot program filter pfe-cos-cl-631-5-1 (type VFP_IL2L3_COS) -TCAM has 0 free entries
I think you're gonna need JTAC. My first guess would be that this is not a supported config on the platform, but it also may be actual TCAM starvation. I'd be curious to learn what the problem was. On Thu, 13 Oct 2022 at 14:41, Chuck Anderson wrote: > > It's an internal filter created by class-of-service. The one I chose does > have a complaint, I just didn't paste the entire log originally. Here are a > few lines earlier: > > Oct 11 21:41:02 ex4300-48mp fpc0 DFWE ERROR DFW: Filter : > "pfe-cos-cl-623-5-1" is NOT programmed in HW > Oct 11 21:41:02 ex4300-48mp fpc0 DFWE ERROR DFW: Cannot program filter > pfe-cos-cl-624-5-1 (type VFP_IL2L3_COS) -TCAM has 0 free entries > Oct 11 21:41:02 ex4300-48mp fpc0 DFWE ERROR DFW: Filter : > "pfe-cos-cl-624-5-1" is NOT programmed in HW > Oct 11 21:41:02 ex4300-48mp fpc0 DFWE ERROR DFW: Filter : > "pfe-cos-cl-626-5-1" is NOT programmed in HW > Oct 11 21:41:02 ex4300-48mp fpc0 DFWE ERROR DFW: Cannot program filter > pfe-cos-cl-631-5-1 (type VFP_IL2L3_COS) -TCAM has 0 free entries > > VFP apparently means: > > VFP groups: VLAN Filter Processor - pre-ingress Content Aware processor (the > first thing in the Broadcom Ingress pipeline). It has maximum 1024 entries. > FIP snooping filters for example, belong to this group. 
> > Apparently my QoS config is too complex, which is amusing since it is > basically the ezqos-voip template config provided by Juniper: > > groups { > ezqos-voip { > class-of-service { > classifiers { > dscp ezqos-dscp-classifier { > import default; > forwarding-class ezqos-voice-fc { > loss-priority low code-points 101110; > } > forwarding-class ezqos-control-fc { > loss-priority low code-points [ 11 011000 011010 > 111000 ]; > } > forwarding-class ezqos-video-fc { > loss-priority low code-points 100010; > } > } > } > forwarding-classes { > class ezqos-best-effort queue-num 0; > class ezqos-video-fc queue-num 4; > class ezqos-voice-fc queue-num 5; > class ezqos-control-fc queue-num 7; > } > scheduler-maps { > ezqos-voip-sched-maps { > forwarding-class ezqos-voice-fc scheduler > ezqos-voice-scheduler; > forwarding-class ezqos-control-fc scheduler > ezqos-control-scheduler; > forwarding-class ezqos-video-fc scheduler > ezqos-video-scheduler; > forwarding-class ezqos-best-effort scheduler > ezqos-data-scheduler; > } > } > schedulers { > ezqos-voice-scheduler { > buffer-size percent 20; > priority strict-high; > } > ezqos-control-scheduler { > buffer-size percent 10; > priority strict-high; > } > ezqos-video-scheduler { > transmit-rate percent 70; > buffer-size percent 20; > priority low; > } > ezqos-data-scheduler { > transmit-rate { > remainder; > } > buffer-size { > remainder; > } > priority low; > } > } > } > } > } > apply-groups ezqos-voip; > class-of-service { > interfaces { > ge-* { > scheduler-map ezqos-voip-sched-maps; > unit 0 { > classifiers { > dscp ezqos-dscp-classifier; > } > } > } > mge-* { > scheduler-map ezqos-voip-sched-maps; > unit 0 { > classifiers { > dscp ezqos-dscp-classifier; > } > } > } > ae* { > unit 0 { > rewrite-rules { > dscp ezqos-dscp-rewrite; > } > } > } > } > rewrite-rules { > dscp ezqos-dscp-rewrite { > forwarding-class ezqos-voice-fc { > loss-priority low code-point 101110; > } > forwa
Re: [j-nsp] Cannot program filter pfe-cos-cl-631-5-1 (type VFP_IL2L3_COS) -TCAM has 0 free entries
You chose a filter which doesn't seem to complain about TCAM in the initial post. Two filters just state 'not programmed'; others complain about TCAM? Could you choose another filter which complains about TCAM? But certainly that output confirms 'Programmed: NO', just not entirely clear why. Maybe a TCAM issue, maybe an invalid bind-point, maybe an invalid match, maybe an invalid action. I am not familiar with the VFP_IL2L3_COS filter type. Is this a filter you created? What are the terms you expect it to have? A single term to accept ether-type 0x8100? What actions? What is the bind point? On Wed, 12 Oct 2022 at 21:36, Chuck Anderson wrote: > > On Wed, Oct 12, 2022 at 08:40:46AM +0300, Saku Ytti wrote: > > - show filter dram > > - show filter hw X > > - show filter hw X show_term_info > > > > I lost a fight with JTAC about whether the TCAM exhausting filter > > should be a commit failure or not. Argument was along the line 'well > > you can keep adding routes even if you exhaust TCAM, so this should be > > the same'. > > I'm absolutely certain there are many QFX and EX networks out there > > with wildly different filters programmed than what they believe they > > have. 
> > Switching platform (2199 Mhz Pentium processor, 511MB memory, 0KB flash) > > FPC0(ex4300-48mp vty)# show filter dram > Name BytesAllocs Frees Failures > --- > filter 62940 1198 395 0 > filter-halp 0 0 0 0 > --- > > Total DFW Dram Usage obtained from global handle: > Total DFW Dram Usage: 78680 bytes > Total DFW allocs: 740 > Total DFW frees:0 > Outstanding DFW allocs: 740 > > Total DFW Dram Usage obtained from all filters: > Total DFW Dram Usage: 78704 bytes > Total DFW allocs: 740 > Total DFW frees:0 > Outstanding DFW allocs: 740 > > FPC0(ex4300-48mp vty)# > FPC0(ex4300-48mp vty)# show filter > Program Filters: > --- >Index Dir CntText Bss Name > -- -- -- -- > > Term Filters: > >IndexSemanticName > >1 Classic ROUTING-ENGINE >2 Classic ROUTING-ENGINE6 >3 Classic ACCESS-FILTER >17000 Classic __default_arp_policer__ >57006 Classic __jdhcpd__ >57007 Classic __dhcpv6__ >65008 Classic __jdhcpd_l2_snoop_filter__ > 16777216 Classic fnp-filter-level-all > 46137360 Classic pfe-cos-cl-610-5-1 > 46137361 Classic pfe-cos-cl-611-5-1 > 46137362 Classic pfe-cos-cl-612-5-1 > 46137363 Classic pfe-cos-cl-613-5-1 > 46137364 Classic pfe-cos-cl-614-5-1 > 46137365 Classic pfe-cos-cl-615-5-1 > 46137366 Classic pfe-cos-cl-616-5-1 > 46137367 Classic pfe-cos-cl-617-5-1 > 46137368 Classic pfe-cos-cl-618-5-1 > 46137369 Classic pfe-cos-cl-619-5-1 > 46137370 Classic pfe-cos-cl-620-5-1 > 46137371 Classic pfe-cos-cl-621-5-1 > 46137372 Classic pfe-cos-cl-622-5-1 > 46137373 Classic pfe-cos-cl-623-5-1 > 46137374 Classic pfe-cos-cl-624-5-1 > 46137375 Classic pfe-cos-cl-625-5-1 > 46137376 Classic pfe-cos-cl-626-5-1 > 46137377 Classic pfe-cos-cl-627-5-1 > 46137378 Classic pfe-cos-cl-628-5-1 > 46137379 Classic pfe-cos-cl-629-5-1 > 46137380 Classic pfe-cos-cl-630-5-1 > 46137381 Classic pfe-cos-cl-631-5-1 > 46137382 Classic pfe-cos-cl-632-5-1 > 46137383 Classic pfe-cos-cl-633-5-1 > 46137384 Classic pfe-cos-cl-634-5-1 > 46137385 Classic pfe-cos-cl-635-5-1 > 46137386 Classic pfe-cos-cl-636-5-1 > 
46137387 Classic pfe-cos-cl-637-5-1 > 46137388 Classic pfe-cos-cl-638-5-1 > 46137389 Classic pfe-cos-cl-639-5-1 > 46137390 Classic pfe-cos-cl-640-5-1 > 46137391 Classic pfe-cos-cl-641-5-1 > 46137392 Classic pfe-cos-cl-642-5-1 > 46137393 Classic pfe-cos-cl-643-5-1 > 46137394 Classic pfe-cos-cl-644-5-1 > 46137395 Classic pfe-cos-cl-645-5-1 > 46137396 Classic pfe-cos-cl-646-5-1 > 46137397 Classic pfe-cos-cl-647-5-1 > 46137398 Classic pfe-cos-cl-648-5-1 > 46137399 Classic pfe-cos-cl-649-5-1 > 46137400 Classic pfe-cos-cl-656-5-1 > 46137401 Classic pfe-cos-cl-657-5-1 > 46137402 Classic pfe-cos-cl-658-5-1 > 46137403 Classic pfe-cos-cl-655-5-1 > 46137404 Classic pfe-cos-cl-650-5-1 > 46137405 Classic pfe-cos-cl-651-5-1 > 46137406 Classic pfe-cos-cl-652-5-1 > 46137407 Classic pfe-cos-cl-653-5-1 > 461
Re: [j-nsp] Cannot program filter pfe-cos-cl-631-5-1 (type VFP_IL2L3_COS) -TCAM has 0 free entries
Hey, Can you please provide - show filter dram - show filter hw X - show filter hw X show_term_info I lost a fight with JTAC about whether the TCAM exhausting filter should be a commit failure or not. Argument was along the line 'well you can keep adding routes even if you exhaust TCAM, so this should be the same'. I'm absolutely certain there are many QFX and EX networks out there with wildly different filters programmed than what they believe they have. On Wed, 12 Oct 2022 at 05:33, Chuck Anderson via juniper-nsp wrote: > > Has anyone seen these errors and know what the cause is? > > Oct 11 21:41:02 ex4300-48mp fpc0 DFWE ERROR DFW: Filter : > "pfe-cos-cl-624-5-1" is NOT programmed in HW > Oct 11 21:41:02 ex4300-48mp fpc0 DFWE ERROR DFW: Filter : > "pfe-cos-cl-626-5-1" is NOT programmed in HW > Oct 11 21:41:02 ex4300-48mp fpc0 DFWE ERROR DFW: Cannot program filter > pfe-cos-cl-631-5-1 (type VFP_IL2L3_COS) -TCAM has 0 free entries > Oct 11 21:41:02 ex4300-48mp fpc0 DFWE ERROR DFW: Filter : > "pfe-cos-cl-631-5-1" is NOT programmed in HW > Oct 11 21:41:02 ex4300-48mp fpc0 DFWE ERROR DFW: Cannot program filter > pfe-cos-cl-632-5-1 (type VFP_IL2L3_COS) -TCAM has 0 free entries > Oct 11 21:41:02 ex4300-48mp fpc0 DFWE ERROR DFW: Filter : > "pfe-cos-cl-632-5-1" is NOT programmed in HW > Oct 11 21:41:02 ex4300-48mp fpc0 DFWE ERROR DFW: Cannot program filter > pfe-cos-cl-633-5-1 (type VFP_IL2L3_COS) -TCAM has 0 free entries > Oct 11 21:41:02 ex4300-48mp fpc0 DFWE ERROR DFW: Filter : > "pfe-cos-cl-633-5-1" is NOT programmed in HW > Oct 11 21:41:02 ex4300-48mp fpc0 DFWE ERROR DFW: Cannot program filter > pfe-cos-cl-634-5-1 (type VFP_IL2L3_COS) -TCAM has 0 free entries > Oct 11 21:41:02 ex4300-48mp fpc0 DFWE ERROR DFW: Filter : > "pfe-cos-cl-634-5-1" is NOT programmed in HW > Oct 11 21:41:02 ex4300-48mp fpc0 DFWE ERROR DFW: Cannot program filter > pfe-cos-cl-638-5-1 (type VFP_IL2L3_COS) -TCAM has 0 free entries > Oct 11 21:41:02 ex4300-48mp fpc0 DFWE ERROR DFW: Filter : > 
"pfe-cos-cl-638-5-1" is NOT programmed in HW > Oct 11 21:41:02 ex4300-48mp fpc0 DFWE ERROR DFW: Cannot program filter > pfe-cos-cl-647-5-1 (type VFP_IL2L3_COS) -TCAM has 0 free entries > Oct 11 21:41:02 ex4300-48mp fpc0 DFWE ERROR DFW: Filter : > "pfe-cos-cl-647-5-1" is NOT programmed in HW > Oct 11 21:41:02 ex4300-48mp fpc0 DFWE ERROR DFW: Cannot program filter > pfe-cos-cl-656-5-1 (type VFP_IL2L3_COS) -TCAM has 0 free entries > Oct 11 21:41:02 ex4300-48mp fpc0 DFWE ERROR DFW: Filter : > "pfe-cos-cl-656-5-1" is NOT programmed in HW > Oct 11 21:41:02 ex4300-48mp fpc0 DFWE ERROR DFW: Cannot program filter > pfe-cos-cl-657-5-1 (type VFP_IL2L3_COS) -TCAM has 0 free entries > Oct 11 21:41:02 ex4300-48mp fpc0 DFWE ERROR DFW: Filter : > "pfe-cos-cl-657-5-1" is NOT programmed in HW > Oct 11 21:41:02 ex4300-48mp fpc0 DFWE ERROR DFW: Cannot program filter > pfe-cos-cl-655-5-1 (type VFP_IL2L3_COS) -TCAM has 0 free entries > Oct 11 21:41:02 ex4300-48mp fpc0 DFWE ERROR DFW: Cannot program filter > pfe-cos-cl-652-5-1 (type VFP_IL2L3_COS) -TCAM has 0 free entries > Oct 11 21:41:02 ex4300-48mp fpc0 DFWE ERROR DFW: Filter : > "pfe-cos-cl-652-5-1" is NOT programmed in HW > Oct 11 21:41:02 ex4300-48mp fpc0 DFWE ERROR DFW: Cannot program filter > pfe-cos-cl-653-5-1 (type VFP_IL2L3_COS) -TCAM has 0 free entries > Oct 11 21:41:02 ex4300-48mp fpc0 DFWE ERROR DFW: Filter : > "pfe-cos-cl-653-5-1" is NOT programmed in HW > Oct 11 21:41:02 ex4300-48mp fpc0 DFWE ERROR DFW: Cannot program filter > pfe-cos-cl-654-5-1 (type VFP_IL2L3_COS) -TCAM has 0 free entries > Oct 11 21:41:02 ex4300-48mp fpc0 DFWE ERROR DFW: Filter : > "pfe-cos-cl-654-5-1" is NOT programmed in HW > > There is plenty of TCAM space for IRACL/IPACL entries, so this seems to be > some issue with a different TCAM partition? 
> > ex4300-48mp> show pfe filter hw summary > > Slot 0 > > Unit:0: > GroupGroup-ID Allocated Used Free > --- > > Ingress filter groups: > iRACL group33 2048 1148 900 > iPACL group25 51212 500 > > Egress filter groups: > > Slot 1 > > Unit:0: > GroupGroup-ID Allocated Used Free > --- > > Ingress filter groups: > iRACL group33 2048 1148 900 > iPACL group25 51212 500 > > Egress filter groups: > > ___ > juniper-nsp mailing list juniper-nsp@puck.nether.net > https://puck.nether.net/mailman/listinfo/juniper-nsp -- ++ytti ___ juniper-nsp mailing list juniper-nsp@puck.nether.net https://puck.nether.net/mailman/listinfo/juniper-nsp
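[Editor's note: since TCAM exhaustion is not a commit failure on these platforms, it has to be caught after the fact. Below is a minimal, hedged sketch of parsing a `show pfe filter hw summary`-style table to flag groups nearing exhaustion. The column layout is an assumption reconstructed from the whitespace-mangled output above, and note that the VFP partition actually exhausting in this thread does not appear in that summary at all, so this only covers the groups the command reports.]

```python
import re

def parse_filter_groups(output):
    """Parse 'show pfe filter hw summary'-style rows into dicts.
    Assumes rows shaped like: '<name> group  <id>  <alloc>  <used>  <free>'
    (an assumption based on the output quoted above)."""
    groups = []
    for line in output.splitlines():
        m = re.match(r"\s*(\S+) group\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)", line)
        if m:
            name, gid, alloc, used, free = m.groups()
            groups.append({
                "name": name,
                "id": int(gid),
                "allocated": int(alloc),
                "used": int(used),
                "free": int(free),
            })
    return groups

def nearly_full(groups, threshold=0.9):
    """Return names of groups whose utilization meets/exceeds threshold."""
    return [g["name"] for g in groups
            if g["allocated"] and g["used"] / g["allocated"] >= threshold]

# Cleaned-up sample of the two groups shown in the thread.
sample = """
iRACL group    33      2048      1148       900
iPACL group    25       512        12       500
"""
groups = parse_filter_groups(sample)
print(groups)
print(nearly_full(groups, threshold=0.5))  # iRACL is at 1148/2048 = ~56%
```

Something like this, polled periodically (or driven off the DFWE syslog errors), at least turns silent filter-programming failures into an alert.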
Re: [j-nsp] Flowspec not filtering traffic.
> >> > show firewall >> > >> > Filter: __flowspec_default_inet__ >> > Counters: >> > NameBytes >> > Packets >> > 1x8.2x8.84.34,*,proto=17,port=0 19897391083 >> > 510189535 >> > >> > >> > BGP Group >> > >> > {master}[edit protocols bgp group KENTIK_FS] >> > type internal; >> > hold-time 720; >> > mtu-discovery; >> > family inet { >> > unicast; >> > flow { >> > no-validate flowspec-import; >> > } >> > } >> > } >> > >> > >> > >> > Import policy >> > {master}[edit] >> > gustavo@MX10K3# edit policy-options policy-statement flowspec-import >> > >> > {master}[edit policy-options policy-statement flowspec-import] >> > gustavo@MX10K3# show >> > term 1 { >> > then accept; >> > } >> > >> > IP transit interface >> > >> > {master}[edit interfaces ae0 unit 10] >> > gustavo@MX10K3# show >> > vlan-id 10; >> > family inet { >> > mtu 1500; >> > filter { >> > inactive: input ddos; >> > } >> > sampling { >> > input; >> > } >> > address x.x.x.x.x/31; >> > } >> > >> > >> > Em sáb., 17 de set. de 2022 às 03:00, Saku Ytti escreveu: >> > >> > > Can you provide some output. >> > > >> > > Like 'show route table inetflow.0 extensive' and config. >> > > >> > > On Sat, 17 Sept 2022 at 05:05, Gustavo Santos via juniper-nsp >> > > wrote: >> > > > >> > > > Hi, >> > > > >> > > > We have noticed that flowspec is not working or filtering as expected. >> > > > Trying a DDoS detection and rule generator tool, and we noticed that >> > > > the >> > > > flowspec rule is installed, >> > > > the filter counter is increasing , but no filtering at all. >> > > > >> > > > For example DDoS traffic from source port UDP port 123 is coming from >> > > > an >> > > > Internet Transit >> > > > facing interface AE0. >> > > > The destination of this traffic is to a customer Interface ET-0/0/10. 
>> > > > >> > > > Even with all information and "show" commands confirming that the >> > > > traffic >> > > > has been filtered, customer and snmp and netflow from the customer >> > > > facing >> > > > interface is showing that the "filtered" traffic is hitting the >> > > destination. >> > > > >> > > > Is there any caveat or limitation or anyone hit this issue? I tried >> > > > this >> > > > with two MX10003 routers one with 19.R3-xxx and the other one with >> > > > 20.4R3 >> > > > junos branch. >> > > > >> > > > Regards. >> > > > ___ >> > > > juniper-nsp mailing list juniper-nsp@puck.nether.net >> > > > https://puck.nether.net/mailman/listinfo/juniper-nsp >> > > >> > > >> > > >> > > -- >> > > ++ytti >> > > >> > ___ >> > juniper-nsp mailing list juniper-nsp@puck.nether.net >> > https://puck.nether.net/mailman/listinfo/juniper-nsp -- ++ytti ___ juniper-nsp mailing list juniper-nsp@puck.nether.net https://puck.nether.net/mailman/listinfo/juniper-nsp
Re: [j-nsp] Flowspec not filtering traffic.
Actually I think I'm confused; I'm just not accustomed to seeing anything other than 0:0 as the rate, but it may be that the first 0 doesn't matter. I would verify 'show route flow validation detail' as well as verify presence of policers if any (in PFE 'show filter counters'). I'd also look at the filter more closely at PFE: - show filter (get the index) - show filter index X program On Sun, 18 Sept 2022 at 09:39, Saku Ytti wrote: > > Are you exceeding the configured rate for the policer? Did you expect > to drop at any rate? The rule sets a non-0 policing rate. > > On Sat, 17 Sept 2022 at 17:42, Gustavo Santos wrote: > > > > Hi Saku, > > > > PS: Real ASN was changed to 65000 on the configuration snippet. > > > > > > > > show route table inetflow.0 extensive > > > > 1x8.2x8.84.34,*,proto=17,port=0/term:7 (1 entry, 1 announced) > > TSI: > > KRT in dfwd; > > Action(s): discard,count > > Page 0 idx 0, (group KENTIK_FS type Internal) Type 1 val 0x63b7c098 > > (adv_entry) > >Advertised metrics: > > Flags: NoNexthop > > Localpref: 100 > > AS path: [65000 I > > Communities: traffic-rate:52873:0 > > Advertise: 0001 > > Path 1x8.2x8.84.34,*,proto=17,port=0 > > Vector len 4. 
Val: 0 > > *Flow Preference: 5 > > Next hop type: Fictitious, Next hop index: 0 > > Address: 0x5214bfc > > Next-hop reference count: 22 > > Next hop: > > State: > > Local AS: 52873 > > Age: 8w0d 20:30:33 > > Validation State: unverified > > Task: RT Flow > > Announcement bits (2): 0-Flow 1-BGP_RT_Background > > AS path: I > > Communities: traffic-rate:65000:0 > > > > show firewall > > > > Filter: __flowspec_default_inet__ > > Counters: > > NameBytes > > Packets > > 1x8.2x8.84.34,*,proto=17,port=0 19897391083 > > 510189535 > > > > > > BGP Group > > > > {master}[edit protocols bgp group KENTIK_FS] > > type internal; > > hold-time 720; > > mtu-discovery; > > family inet { > > unicast; > > flow { > > no-validate flowspec-import; > > } > > } > > } > > > > > > > > Import policy > > {master}[edit] > > gustavo@MX10K3# edit policy-options policy-statement flowspec-import > > > > {master}[edit policy-options policy-statement flowspec-import] > > gustavo@MX10K3# show > > term 1 { > > then accept; > > } > > > > IP transit interface > > > > {master}[edit interfaces ae0 unit 10] > > gustavo@MX10K3# show > > vlan-id 10; > > family inet { > > mtu 1500; > > filter { > > inactive: input ddos; > > } > > sampling { > > input; > > } > > address x.x.x.x.x/31; > > } > > > > > > Em sáb., 17 de set. de 2022 às 03:00, Saku Ytti escreveu: > >> > >> Can you provide some output. > >> > >> Like 'show route table inetflow.0 extensive' and config. > >> > >> On Sat, 17 Sept 2022 at 05:05, Gustavo Santos via juniper-nsp > >> wrote: > >> > > >> > Hi, > >> > > >> > We have noticed that flowspec is not working or filtering as expected. > >> > Trying a DDoS detection and rule generator tool, and we noticed that the > >> > flowspec rule is installed, > >> > the filter counter is increasing , but no filtering at all. > >> > > >> > For example DDoS traffic from source port UDP port 123 is coming from an > >> > Internet Transit > >> > facing interface AE0. 
> >> > The destination of this traffic is to a customer Interface ET-0/0/10. > >> > > >> > Even with all information and "show" commands confirming that the traffic > >> > has been filtered, customer and snmp and netflow from the customer facing > >> > interface is showing that the "filtered" traffic is hitting the > >> > destination. > >> > > >> > Is there any caveat or limitation or anyone hit this issue? I tried this > >> > with two MX10003 routers one with 19.R3-xxx and the other one with 20.4R3 > >> > junos branch. > >> > > >> > Regards. > >> > ___ > >> > juniper-nsp mailing list juniper-nsp@puck.nether.net > >> > https://puck.nether.net/mailman/listinfo/juniper-nsp > >> > >> > >> > >> -- > >> ++ytti > > > > -- > ++ytti -- ++ytti ___ juniper-nsp mailing list juniper-nsp@puck.nether.net https://puck.nether.net/mailman/listinfo/juniper-nsp
Re: [j-nsp] Flowspec not filtering traffic.
Are you exceeding the configured rate for the policer? Did you expect to drop at any rate? The rule sets a non-0 policing rate. On Sat, 17 Sept 2022 at 17:42, Gustavo Santos wrote: > > Hi Saku, > > PS: Real ASN was changed to 65000 on the configuration snippet. > > > > show route table inetflow.0 extensive > > 1x8.2x8.84.34,*,proto=17,port=0/term:7 (1 entry, 1 announced) > TSI: > KRT in dfwd; > Action(s): discard,count > Page 0 idx 0, (group KENTIK_FS type Internal) Type 1 val 0x63b7c098 > (adv_entry) >Advertised metrics: > Flags: NoNexthop > Localpref: 100 > AS path: [65000 I > Communities: traffic-rate:52873:0 > Advertise: 0001 > Path 1x8.2x8.84.34,*,proto=17,port=0 > Vector len 4. Val: 0 > *Flow Preference: 5 > Next hop type: Fictitious, Next hop index: 0 > Address: 0x5214bfc > Next-hop reference count: 22 > Next hop: > State: > Local AS: 52873 > Age: 8w0d 20:30:33 > Validation State: unverified > Task: RT Flow > Announcement bits (2): 0-Flow 1-BGP_RT_Background > AS path: I > Communities: traffic-rate:65000:0 > > show firewall > > Filter: __flowspec_default_inet__ > Counters: > NameBytes Packets > 1x8.2x8.84.34,*,proto=17,port=0 19897391083510189535 > > > BGP Group > > {master}[edit protocols bgp group KENTIK_FS] > type internal; > hold-time 720; > mtu-discovery; > family inet { > unicast; > flow { > no-validate flowspec-import; > } > } > } > > > > Import policy > {master}[edit] > gustavo@MX10K3# edit policy-options policy-statement flowspec-import > > {master}[edit policy-options policy-statement flowspec-import] > gustavo@MX10K3# show > term 1 { > then accept; > } > > IP transit interface > > {master}[edit interfaces ae0 unit 10] > gustavo@MX10K3# show > vlan-id 10; > family inet { > mtu 1500; > filter { > inactive: input ddos; > } > sampling { > input; > } > address x.x.x.x.x/31; > } > > > Em sáb., 17 de set. de 2022 às 03:00, Saku Ytti escreveu: >> >> Can you provide some output. >> >> Like 'show route table inetflow.0 extensive' and config. 
>> >> On Sat, 17 Sept 2022 at 05:05, Gustavo Santos via juniper-nsp >> wrote: >> > >> > Hi, >> > >> > We have noticed that flowspec is not working or filtering as expected. >> > Trying a DDoS detection and rule generator tool, and we noticed that the >> > flowspec rule is installed, >> > the filter counter is increasing , but no filtering at all. >> > >> > For example DDoS traffic from source port UDP port 123 is coming from an >> > Internet Transit >> > facing interface AE0. >> > The destination of this traffic is to a customer Interface ET-0/0/10. >> > >> > Even with all information and "show" commands confirming that the traffic >> > has been filtered, customer and snmp and netflow from the customer facing >> > interface is showing that the "filtered" traffic is hitting the >> > destination. >> > >> > Is there any caveat or limitation or anyone hit this issue? I tried this >> > with two MX10003 routers one with 19.R3-xxx and the other one with 20.4R3 >> > junos branch. >> > >> > Regards. >> > ___ >> > juniper-nsp mailing list juniper-nsp@puck.nether.net >> > https://puck.nether.net/mailman/listinfo/juniper-nsp >> >> >> >> -- >> ++ytti -- ++ytti ___ juniper-nsp mailing list juniper-nsp@puck.nether.net https://puck.nether.net/mailman/listinfo/juniper-nsp
Re: [j-nsp] Flowspec not filtering traffic.
Can you provide some output. Like 'show route table inetflow.0 extensive' and config. On Sat, 17 Sept 2022 at 05:05, Gustavo Santos via juniper-nsp wrote: > > Hi, > > We have noticed that flowspec is not working or filtering as expected. > Trying a DDoS detection and rule generator tool, and we noticed that the > flowspec rule is installed, > the filter counter is increasing , but no filtering at all. > > For example DDoS traffic from source port UDP port 123 is coming from an > Internet Transit > facing interface AE0. > The destination of this traffic is to a customer Interface ET-0/0/10. > > Even with all information and "show" commands confirming that the traffic > has been filtered, customer and snmp and netflow from the customer facing > interface is showing that the "filtered" traffic is hitting the destination. > > Is there any caveat or limitation or anyone hit this issue? I tried this > with two MX10003 routers one with 19.R3-xxx and the other one with 20.4R3 > junos branch. > > Regards. > ___ > juniper-nsp mailing list juniper-nsp@puck.nether.net > https://puck.nether.net/mailman/listinfo/juniper-nsp -- ++ytti ___ juniper-nsp mailing list juniper-nsp@puck.nether.net https://puck.nether.net/mailman/listinfo/juniper-nsp
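[Editor's note: the rules discussed in this thread carry communities like traffic-rate:65000:0 and traffic-rate:52873:0. In the flowspec specification (RFC 8955, previously RFC 5575), traffic-rate is a BGP extended community of type 0x80, subtype 0x06, carrying a 2-byte AS number and a 4-byte IEEE float rate in bytes per second, where a rate of 0 means "discard all matching traffic". A minimal decoder sketch, purely illustrative and unrelated to Junos internals:]

```python
import struct

def decode_traffic_rate(ext_community):
    """Decode an 8-byte BGP extended community as a flowspec
    traffic-rate action (RFC 8955: type 0x80, subtype 0x06).
    Returns (asn, rate_bytes_per_sec), or None if it is some
    other extended community."""
    if len(ext_community) != 8:
        raise ValueError("extended community must be 8 bytes")
    etype, esub = ext_community[0], ext_community[1]
    if (etype, esub) != (0x80, 0x06):
        return None
    asn = struct.unpack("!H", ext_community[2:4])[0]
    rate = struct.unpack("!f", ext_community[4:8])[0]  # IEEE float, bytes/sec
    return asn, rate

# traffic-rate:65000:0 -- rate 0 means "discard all matching traffic",
# which is why the route above shows Action(s): discard,count.
com = bytes([0x80, 0x06]) + struct.pack("!H", 65000) + struct.pack("!f", 0.0)
print(decode_traffic_rate(com))  # (65000, 0.0)
```

This is why a "rate 0" community is a full drop rather than a policer: the float rate field is simply zero.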
Re: [j-nsp] Outgrowing a QFX5100
On Fri, 16 Sept 2022 at 22:12, Jason Healy via juniper-nsp wrote: Hey Jason, > My question is, what would be the logical "step up" from the qfx on a small > network? I'm thinking the MX240 as it's the smallest router that has > redundant REs. However, I have no experience with the router family (we're > all EX/QFX). I'd consider a newer member of the QFX family, but I'd need to > know I'm not going to bump into a bunch of weird "unsupported on this > platform" issues. Yes. I can't immediately think of any feature that is supported on EX/QFX but isn't supported on MX. Broadly speaking, if you are not cost-sensitive and you don't need the density, always buy an NPU box such as MX, because it's inherently more feature-complete. Pipeline boxes like EX/QFX make sense if you are cost-sensitive or need high density, can answer what your requirements are ahead of time, and can run a field trial against those specific requirements. In my experience, for access providers the requirements are not a knowable variable, because you will introduce a new product during the life cycle of a device; therefore you will be carrying additional risk with pipeline compared to NPU. If you're a cloudy shop or incumbent telco, you likely can have a frozen set of requirements that are knowable a priori, which supports the pipeline use-case. > I'm fine with EOL/aftermarket equipment; we've got a pretty traditional > layer-2 spoke-and-hub setup with layer-3 for IRB and a default route to our > ISP (no VXLAN, tunneling, etc). Our campus isn't growing so capacity isn't a > huge issue (we're 1g/10g uplinks everywhere, and the 10g aren't close to > saturation). I *might* want 40g as a handoff to an aggregation layer, but > that's about it. Thus, I'm OK with a relative lack of new features. Your problem is the slow-rate interfaces and getting reasonable support for them. 
With MX, if you are buying chassis boxes from a channel, you should only be buying the LC9600, which is 24x400GE; another alternative is the fixed-config MX304. Both may be highly unsatisfactory to you in the front plate. The ACX portfolio may have some middle ground for you. -- ++ytti
Re: [j-nsp] Tacacs command authorization not working as intended
I believe this is the best you can do: y...@a03.labxtx03.us.bb-re0# show|display set |match deny set system login class tacacs-user deny-commands "clear pppoe sessions($| no-confirm$)" y...@a03.labxtx03.us.bb-re0> clear pppoe sessions ? Possible completions: Name of PPPoE logical interface y...@a03.labxtx03.us.bb-re0> clear pppoe sessions You can't clear all, but you can clear any. On Mon, 4 Jul 2022 at 17:43, Saku Ytti wrote: > > I don't believe what you're doing is tacacs command authorization, that is > junos is not asking the tacacs server if or not it can execute the command, > something IOS and SROS can do, but which makes things like loading config > very brutal (except SROS has way to skip authorization for config loads). > > You are shipping config to the router for its allow-commands/deny-commands. > And I further believe behaviour you see is because there is distinction > between key and values, and you cannot include values in it. Similar problem > with 'apply-groups', because the parser doesn't know about values and you're > just telling what exists in the parser tree and what does not. > > > > On Mon, 4 Jul 2022 at 17:25, Pierre Emeriaud wrote: >> >> Le lun. 4 juil. 2022 à 16:18, Saku Ytti a écrit : >> > >> > I don't believe Junos has tacacs command authorization. >> >> it has. This sorta works, I've been able to allow some commands like >> 'clear network-access aaa subscriber username.*' and 'monitor >> traffic'. The issue I have is with 'clear pppoe sessions pp0'. >> >> When providing 'clear' to the user I can make it work, but I also have >> to forbid all other clear commands I don't want. >> >> foo@bar> show cli authorization >> Current user: 'GEN-USR-N' login: 'foo' class 'GEN-PROF-N' >> Permissions: >> clear -- Can clear learned network info >> (...) 
>> Individual command authorization: >> Allow regular expression: (clear pppoe sessions pp0.*|clear >> network-access aaa subscriber username.*|monitor traffic.*) >> Deny regular expression: (request .*|file .*|save .*|clear >> [a-o].*|clear [q-z].*|clear p[^p].*) >> >> >> foo@bar> clear ? >> Possible completions: >> network-access Clear network-access related information >> ppp Clear PPP information >> pppoeClear PPP over Ethernet information >> >> And one can reset all pppoe sessions while I only allowed 'pppoe >> session pp0.*' : >> foo@bar> clear pppoe sessions ? >> Possible completions: >> <[Enter]>Execute this command >> Name of PPPoE logical interface >> >> login configuration for your information: >> foo@bar> show configuration system login >> class GEN-PROF-N { >> idle-timeout 15; >> } >> user GEN-USR-N { >> full-name "TACACS centralized command authorization"; >> uid 2006; >> class GEN-PROF-N; >> } > > > > -- > ++ytti -- ++ytti ___ juniper-nsp mailing list juniper-nsp@puck.nether.net https://puck.nether.net/mailman/listinfo/juniper-nsp
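[Editor's note: the workaround above hinges on regex anchoring: the allow rule `clear pppoe sessions pp0.*` also matches when the user types the bare `clear pppoe sessions` (which clears *all* sessions), so a deny of `clear pppoe sessions($| no-confirm$)` is added to catch exactly the bare form. A quick sketch of that behaviour, using Python's `re` as an approximation of the Junos matcher (an assumption; the Junos regex engine may differ in details):]

```python
import re

# Deny pattern from the workaround above: block the bare command
# (which would clear ALL sessions) and its no-confirm variant, while
# still permitting a specific interface argument.
DENY = r"clear pppoe sessions($| no-confirm$)"

def denied(command):
    """True if the command matches the deny regex. Junos anchors
    allow/deny regexes at the start of the command; re.match
    approximates that."""
    return re.match(DENY, command) is not None

print(denied("clear pppoe sessions"))                 # True  (blocked)
print(denied("clear pppoe sessions no-confirm"))      # True  (blocked)
print(denied("clear pppoe sessions pp0.3221225472"))  # False (allowed)
```

The trailing `$` alternation is what distinguishes "no argument given" from "specific interface given"; without it, the deny would also swallow the per-interface form.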
Re: [j-nsp] Tacacs command authorization not working as intended
I don't believe what you're doing is tacacs command authorization; that is, Junos is not asking the tacacs server whether or not it can execute the command, something IOS and SROS can do, but which makes things like loading config very brutal (except SROS has a way to skip authorization for config loads). You are shipping config to the router for its allow-commands/deny-commands. And I further believe the behaviour you see is because there is a distinction between keys and values, and you cannot include values in the expression. It's a similar problem with 'apply-groups': the parser doesn't know about values, and you're just telling it what exists in the parser tree and what does not. On Mon, 4 Jul 2022 at 17:25, Pierre Emeriaud wrote: > On Mon, 4 Jul 2022 at 16:18, Saku Ytti wrote: > > > > I don't believe Junos has tacacs command authorization. > > it has. This sorta works, I've been able to allow some commands like > 'clear network-access aaa subscriber username.*' and 'monitor > traffic'. The issue I have is with 'clear pppoe sessions pp0'. > > When providing 'clear' to the user I can make it work, but I also have > to forbid all other clear commands I don't want. > > foo@bar> show cli authorization > Current user: 'GEN-USR-N' login: 'foo' class 'GEN-PROF-N' > Permissions: > clear -- Can clear learned network info > (...) > Individual command authorization: > Allow regular expression: (clear pppoe sessions pp0.*|clear > network-access aaa subscriber username.*|monitor traffic.*) > Deny regular expression: (request .*|file .*|save .*|clear > [a-o].*|clear [q-z].*|clear p[^p].*) > > > foo@bar> clear ? > Possible completions: > network-access Clear network-access related information > ppp Clear PPP information > pppoeClear PPP over Ethernet information > > And one can reset all pppoe sessions while I only allowed 'pppoe > session pp0.*' : > foo@bar> clear pppoe sessions ? 
> Possible completions: > <[Enter]>Execute this command > Name of PPPoE logical interface > > login configuration for your information: > foo@bar> show configuration system login > class GEN-PROF-N { > idle-timeout 15; > } > user GEN-USR-N { > full-name "TACACS centralized command authorization"; > uid 2006; > class GEN-PROF-N; > } > -- ++ytti ___ juniper-nsp mailing list juniper-nsp@puck.nether.net https://puck.nether.net/mailman/listinfo/juniper-nsp
Re: [j-nsp] Tacacs command authorization not working as intended
I don't believe Junos has tacacs command authorization. You can add allow-commands/deny-commands regexps in the user class to achieve the same without introducing the RTT lag. On Mon, 4 Jul 2022 at 15:52, Pierre Emeriaud via juniper-nsp < juniper-nsp@puck.nether.net> wrote: > Hi > > i've been trying to authorize 'clear pppoe session pp0.*' for some of > our users. They already have some allowed commands such as 'monitor > traffic' and 'clear network-access aaa subscriber username' that > works, but 'clear pppoe' is refused. > > foo@bar> clear ppp? > No valid completions > > foo@bar> clear pppoe >^ > syntax error, expecting . > > > Here are their rights on the box. They don't have 'clear' permissions > as I'd rather allow one command than refuse all the others. > > foo@bar> show cli authorization > Current user: 'GEN-USR-N' login: 'foo' class 'GEN-PROF-N' > Permissions: > configure -- Can enter configuration mode > interface -- Can view interface configuration > network -- Can access the network > routing -- Can view routing configuration > trace -- Can view trace file settings > trace-control-- Can modify trace file settings > view-- Can view current values and statistics > view-configuration-- Can view all configuration (not including secrets) > Individual command authorization: > Allow regular expression: (clear pppoe sessions pp0.*|clear > network-access aaa subscriber username.*|monitor traffic.*) > Deny regular expression: (request .*|file .*|save .*|clear log .*) > Allow configuration regular expression: (protocols pppoe > traceoptions|system processes smg-service traceoptions|system > processes general-authentication-service traceoptions|protocols > ppp-service traceoptions|services l2tp traceoptions) > Deny configuration regular expression: none > > And the tacacs configuration: > > match = @RouterBNG { > # ReadOnlyDebug > service = junos-exec { > local-user-name = GEN-USR-N > user-permissions = "configure interface network routing trace > trace-control view 
view-configuration" > deny-commands = "request .*|file .*|save .*|clear log .*" > allow-commands = "clear pppoe sessions pp0.*|clear network-access > aaa subscriber username.*|monitor traffic.*" > allow-configuration = "(protocols pppoe traceoptions|system > processes smg-service traceoptions|system processes > general-authentication-service traceoptions|protocols ppp-service > traceoptions|services l2tp traceoptions)" > } > } > > options I've tried: > allow-commands = "(monitor traffic.*)|(clear pppoe sessions > pp0\..*)|(clear network-access aaa subscriber username.*)" > allow-commands = "monitor traffic.*|clear pppoe sessions pp0.*|clear > network-access aaa subscriber username.*" > allow-commands = "monitor traffic|clear pppoe sessions pp0\..*|clear > network-access aaa subscriber username" > allow-commands = "clear pppoe sessions pp0.*|clear network-access aaa > subscriber username.*|monitor traffic.*" > > > Is there a way without providing 'clear' permission? 'clear > network-access' works even without it... > > thanks, > pierre > ___ > juniper-nsp mailing list juniper-nsp@puck.nether.net > https://puck.nether.net/mailman/listinfo/juniper-nsp > -- ++ytti ___ juniper-nsp mailing list juniper-nsp@puck.nether.net https://puck.nether.net/mailman/listinfo/juniper-nsp
Re: [j-nsp] GRE tunnels on a QFX10002-60C
On Fri, 24 Jun 2022 at 10:54, Mark Tinka via juniper-nsp wrote:
> After failing to get Netscout to natively support IS-IS, we came up with
> a rather convoluted - but elegant - way to transport on-ramp/off-ramp
> traffic into and out of our scrubbers.
>
> Basically, we use lt-* (logical tunnel) interfaces that sit both in the
> global table and a VRF. We loop them to each other, and use IS-IS + BGP
> + LDP to tunnel traffic natively using MPLS-based LSPs signaled by LDP
> (as opposed to GRE), so that traffic can always follow the best IS-IS +
> iBGP path, without the hassle of needing to run GRE between routers and
> scrubbers.

Many ways to skin the cat. If you can dedicate a small router to the scrubber (or a routing-instance if you can't) and you run BGP-LU, you avoid the useless egress IP lookup; you just ensure that the scrubber PE or scrubber instance doesn't have the more-specific routes, so traffic follows the BGP-LU path to the egress CE. You can scrub any and all prefixes without any scale implications, as you never need to touch the network to handle clean traffic.

--
++ytti
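A hedged sketch of the scrubber-PE side of this scheme (all names and addresses are invented, and details such as inet.3 resolution are omitted): the scrubber PE carries only labelled BGP-LU paths from the core and none of the customer more-specifics, so clean traffic returned by the scrubber follows the label toward the egress CE.

```
# Hypothetical sketch: iBGP session carrying labeled-unicast only.
set protocols bgp group core type internal
set protocols bgp group core local-address 192.0.2.1
set protocols bgp group core family inet labeled-unicast
set protocols bgp group core neighbor 192.0.2.2
```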
Re: [j-nsp] GRE tunnels on a QFX10002-60C
Tunnel interfaces are not supported on PE/Paradise, and I don't think this changed in BT/Triton either. However, you can decapsulate/encapsulate in an ingress firewall filter, e.g.:

term cleanPipe:xe-0-4-1-1 {
    from {
        source-address {
            a.b.c.d/32;
        }
        destination-address {
            e.f.g.h/30;
        }
        protocol gre;
    }
    then {
        count cleanPipe:xe-0-4-1-1;
        decapsulate gre routing-instance xe-0-4-1-1;
    }
}

Here traffic coming from a specific source address, going to a specific destination link using IP protocol 'GRE', is counted, accepted and decapsulated into a routing-instance. In many ways filter-based decapsulation is actually preferable to interface-based, so I have no large qualms here.

What I'd actually want is egress filter decap instead of ingress. Then I could point my GRE tunnels to random addresses at the customer network, and have a static decap statement in my edge filters which is never updated, like 'from scrubber/32 to anywhere, protocol gre, decap'. This way my scrubber would launch GRE tunnels to any address at the customer site, routing would follow the best BGP path to egress, and just at the last moment the packet would get decapped.

On Fri, 24 Jun 2022 at 00:24, Jon Lewis via juniper-nsp wrote:
>
> I've got an open support case with Juniper, but as it's gotten nowhere
> since opening it last night, I figured I'd try some crowdsourcing :)
>
> Does anyone have working GRE tunnels terminated to a QFX10002-60C? We've
> got a GRE tunnel mesh of several dozen sites, using a mix of Arista 7280s
> and Juniper QFX5120s to terminate the tunnels. We're trying to add a
> couple of new sites to the mesh where the tunnels will live on
> QFX10002-60C. What we're seeing with the QFX10002-60C is, locally
> generated traffic (i.e. ping from the QFX10002-60C to an IP reachable via
> a gr-0/0/0.XX interface) works, but traffic from another device in the POP
> that needs to transit a QFX10002-60C which should then route the traffic
> via a gr-0/0/0.XX interface is dropped.
>
> I'm trying to figure out if there's something special about the
> QFX10002-60C that requires some config knob not needed on QFX5120 or if
> GRE is just broken on the QFX10002-60C. The QFX10002-60C are running
> 20.4R3.8.
>
> --
> Jon Lewis, MCP :) | I route
> StackPath, Sr. Neteng | therefore you are
> _ http://www.lewis.org/~jlewis/pgp for PGP public key_

--
++ytti
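For completeness, a decap filter like the one shown earlier in the thread still has to be applied on the ingress interface. A hedged sketch (interface, filter and instance names invented; the exact decapsulate syntax varies by platform and release):

```
# Hypothetical sketch of wrapping the decap term into a filter and
# applying it to the scrubber-facing ingress interface.
set firewall family inet filter GRE-DECAP term cleanPipe from protocol gre
set firewall family inet filter GRE-DECAP term cleanPipe then decapsulate gre routing-instance CLEAN
set firewall family inet filter GRE-DECAP term rest then accept
set interfaces xe-0/4/1 unit 0 family inet filter input GRE-DECAP
```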
[j-nsp] Request for data from PTX PE/Paradise (e.g. PTX1000, LC1101) operators
Hi,

I'd like to return to this topic. I was confused earlier, misattributing the issue I'm seeing to MX; it is now clear it must be on PTX. I'd again solicit input from anyone seeing the following in their syslogs:

a) junos: 'received pdu - length mismatch for lacp'
b) iosxr: 'ROUTING-ISIS-4-ERR_BAD_PDU_LENGTH' or 'ROUTING-ISIS-4-ERR_BAD_PDU_FORMAT'
c) any other NOS which might log this, when the other end is a PTX

The issue is very rare, so ideally you'd look at syslog for all periods you've had Paradise in the network.

Thanks again,

--
++ytti
[j-nsp] Request for data from Trio EA (e.g. LC2101, MX204, etc) operators
Hey,

I'd like Trio EA operators to verify two things for me:

a) Bad LACP PDUs on both sides of an EA link - in syslog something like this:
   - kernel: et-0/0/0: received pdu - length mismatch for lacp : len 143, pdu 124
b) L3 incompletes increasing on backbone-facing interfaces on both sides of an EA link (local and egress PE, if MPLS-encapped by EA when sending out)
   - visible also in the standard IF-MIB errors counter, if you poll that

Do you see either/both coinciding with the introduction or upgrade of EA devices? If so, I'd like to look a bit deeper into it, but I'd appreciate it even if you just say 'we see it, unfortunately we do not at this time have time to investigate'.

Thanks!

--
++ytti
Re: [j-nsp] Junos 20 - slow RPD
Hey,

> On MX204 with ~4M routes, after upgrading from 18.2 to 20.2 the RPD is
> way slower in processing BGP policies and sending the routes to neighbors.
> For example, on a BGP group with one neighbor and an export policy
> containing 5 terms each matching a community it takes ~1min ( 100% RPD
> utilisation ) to send 1k routes to the neighbor in 20.2 compared to 15s
> in 18.2.
> Disabling terms will reduce the time.
>
> Anyone experienced something similar?

I don't recognise this problem specifically. It seems like a rather terrible regression, so you probably should either open a JTAC case or do the Junos dance.

If you have a large RIB/FIB ratio, allowing more than one core to work on BGP will produce an improvement:

set system processes routing bgp rib-sharding number-of-shards 4
set system processes routing bgp update-threading

This is a disruptive change. JNPR wanted us on 20.3 (we are on 20.3R3-S2) for rib-sharding, but we did run it previously on 20.2R3-S3 with success. We are currently targeting 21.4R1-S1.

If you have memory pressure, you can expand the default 16GB DRAM to 24GB via a configuration toggle (post 21.2R1). If you are comfortable hacking the QEMU/KVM config manually, you can do it on any release and can entertain other sizes.

--
++ytti
Re: [j-nsp] Juniper CoS - Classifiers specifically
Hey Aaron,

> I'm wondering if the BA classifier stops working once an MFC is applied. It
> sure seems to in testing. I feel like I've seen a diagram at some point or
> document stating that MFC comes before BA in the CoS process chain. but I'm
> not sure. If anyone has that link/doc please send it. I'd like to know for
> sure.

The implied default classifier is there until something else is configured. As you say, you can review what is currently applied with 'show class-of-service interface'.

And yes, firewall-based classification is done after the CoS classifier, so firewall-based classification overrides whatever the CoS configuration classified the packet to. You can use this to accomplish QPPB: instead of BGP-based blackholing, you'd have a BGP-based class downgrade for some specifically selected SADDR or DADDR, signalled by BGP.

> Oh, btw, where in the world is all this default CoS stuff derived from? I'd
> like to think it's in a file somewhere that I can see in shell perhaps. But
> maybe not. Maybe it's actually compiled into the Junos operating system
> itself. Or is there a way to see "show configuration" with a special option
> that shows automatic/default stuff like all this CoS info?

I believe they are compiled in. Juniper does also have a more appropriate way to inject defaults via 'show configuration groups junos-defaults', but that is not being used here. Of course this is the common case; for any NOS vendor, defaults are typically compiled in, not injected via some common configuration scheme. In many cases this is mandatory, because having no default is impossible - you cannot not have an MTU.

The standard QoS config in Junos allows any internet user to have their own protected 5% via class selectors 6 and 7, potentially disrupting your signalling protocols. I consider all Junos devices misconfigured if the QoS policy for edge interfaces is not explicitly defined by the operator.
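A hedged sketch of the QPPB-style class downgrade mentioned above (all names and the community value are invented, and destination-class/DCU matching details vary by platform and filter direction):

```
# Hypothetical sketch: BGP-signalled prefixes get a destination-class,
# and a firewall filter matching that class overrides the BA
# classifier's decision for matching traffic.
set policy-options community ABUSE members 65000:666
set policy-options policy-statement MARK-ABUSE term 1 from community ABUSE
set policy-options policy-statement MARK-ABUSE term 1 then destination-class abuse-dst
set routing-options forwarding-table export MARK-ABUSE
set firewall family inet filter EDGE-OUT term downgrade from destination-class abuse-dst
set firewall family inet filter EDGE-OUT term downgrade then forwarding-class scavenger
set firewall family inet filter EDGE-OUT term downgrade then accept
set firewall family inet filter EDGE-OUT term rest then accept
```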
--
++ytti
Re: [j-nsp] Marking/shaping UDP reflection traffic
On Wed, 9 Mar 2022 at 19:48, Gert Doering via juniper-nsp wrote:
> We use different classes for UDP/123, UDP/53 (exclude well-known
> recursives), fragments, ... and are currently using between 20 and 100
> mbit/s for these classes. What is the right number for you depends
> on "how much can your customers stomach?" and "how much do you see
> under normal conditions?".

We do the same, but we classify protocols into two classes, 'important' and 'unimportant'. Unimportant being protocols we deem not to be used in reality for anything but abuse, and important being dual-use. 'unimportant' gets policed at port level outright, and 'important' gets two-colour policed at port level, so that exceeding traffic gets downgraded below BE.

Answering 'what rate is right' is difficult without understanding better how you are policing, where, and what your access ports usually look like. Do remember that JNPR policers are shared at NPU level by default, unlike CSCO, where they are per interface and NPU-level sharing is not even a configurable option.

--
++ytti
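A hedged sketch of the 'unimportant' port-level policing described above (the protocol list, names and rates are all invented; 'interface-specific' is the knob that gives each interface its own policer instance instead of one shared per NPU):

```
# Hypothetical sketch: police likely reflection sources per port.
set firewall policer UNIMPORTANT-POL if-exceeding bandwidth-limit 1m
set firewall policer UNIMPORTANT-POL if-exceeding burst-size-limit 15000
set firewall policer UNIMPORTANT-POL then discard
set firewall family inet filter EDGE-IN interface-specific
set firewall family inet filter EDGE-IN term reflection from protocol udp
set firewall family inet filter EDGE-IN term reflection from source-port [ 19 123 1900 ]
set firewall family inet filter EDGE-IN term reflection then policer UNIMPORTANT-POL
set firewall family inet filter EDGE-IN term rest then accept
```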
Re: [j-nsp] Cut through and buffer questions
On Fri, 19 Nov 2021 at 17:12, Thomas Bellman via juniper-nsp wrote:
> Cut-through actually *can* help a little bit. The buffer space in
> the Trident and Tomahawk chips is mostly shared between all ports;
> only a small portion of it is dedicated per port[1]. If you have
> lots of traffic on some ports, with little or no congestion,
> enabling cut-through will leave more buffer space available for
> the congested ports, as the packets will leave the switch/router
> quicker.

Correct, you can save packetSize * egressInts of buffer with cut-through. So if you have 48 ports and we assume 1500B frames, you can save 72kB of buffer space.

> One should note though that these chips will fall back to store-
> and-forward if the ingress port and egress port run at different

I had hoped this was obvious when I mentioned the percentage of frames getting cut-through. And strictly speaking, it is not just 'these chips': you cannot implement cut-through without store-and-forward as a fallback. You'd end up dropping most of the traffic in all but very esoteric topologies/scenarios.

--
++ytti
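The saving is bounded by one in-flight frame per egress interface; a quick back-of-the-envelope check of the numbers above:

```python
# Cut-through can save at most one packet's worth of buffer per
# egress interface: packet_size * egress_interfaces.
packet_size_bytes = 1500   # full-size Ethernet frame
egress_ports = 48

saving_bytes = packet_size_bytes * egress_ports
print(saving_bytes)        # 72000 bytes, i.e. 72 kB
```

Against the tens of megabytes of shared buffer a congested port can need, 72 kB is noise, which is the point being made.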
Re: [j-nsp] Cut through and buffer questions
On Fri, 19 Nov 2021 at 11:50, james list wrote:
> I also understood cut through cannot help but obviously I cannot change QFX
> switches because we loss few udp packets for a single application, the idea
> could be to change shared buffers for unused queues and add to used one,
> correct ?

Yes. Anything you can do to:

a) increase buffer (traditionally in Catalyst and EX you can win quite a bit more buffer by removing queues)
b) increase egress rate (LACP to the host may help)

will help a little bit.

> Based on the output provided what you suggest to change ?
> I also understand this kind of change is traffic affecting.

I'm not familiar with QFX tuning, but it should be fairly easy to find and test how you can increase buffers. I think your goal #1 should be to move to a single BE queue and try to assign everything there, and the secondary goal is to add another high-priority class and give it a little bit of buffer.

> I also need to understand how shared buffer queues on QFX are attached to COS
> queues.

I also don't know this, and I'm not sure how much room for tinkering there is. I know that in Catalyst and EX some gains over the default config can be made, which gives a significant improvement when boxes have been deployed in the wrong application.

--
++ytti
Re: [j-nsp] Cut through and buffer questions
On Fri, 19 Nov 2021 at 10:49, james list wrote:

Hey,

> I try to rephrase the question you do not understand: if I enable cut through
> or change buffer is it traffic affecting ?

There is no cut-through, and I was hoping that after reading the previous email you'd understand why it won't help you at all, nor is it desirable. Changing QoS config may be traffic affecting, but you likely do not have the monitoring capability to observe it.

> Regarding the drops here the outputs (15h after clear statistics):

You talked about MX, so I answered from the MX perspective. But your output is not from MX. The device you actually show has exceedingly tiny buffers and is not meant for Internet WAN use; that is, it does not expect a significantly higher sender rate than receiver rate with high RTT. It is meant for datacenter use, where RTT is low and the speed delta is small.

In real-life Internet you need larger buffers because of this: senderPC => internets => receiverPC. Let's imagine an RTT of 200ms, a 10GE receiver and a 100GE sender:

- 10Gbps * 200ms = 250MB TCP window needed to fill the pipe
- as TCP windows grow exponentially in the absence of loss, you could have 128MB => 250MB growth
- this means senderPC might serialise 128MB of data at 100Gbps
- this 128MB you can only send out at 10Gbps rate; the rest you have to take into the buffers
- intentionally pathological example
- the 'easy' fix is that the sender doesn't burst the data at its own rate, but does rate estimation and sends window growth at the estimated receiver rate; this practically removes buffering needs entirely
- the 'easy' fix is not standard behaviour, but some cloudy shops thankfully configure their Linux like this (Linux already does bandwidth estimation, and you can ask 'tc' to shape the session to the estimated bandwidth)

What you need to do is change the device to one that is intended for the application you have.
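The arithmetic in the pathological example above, as a quick sketch:

```python
# Bandwidth-delay product: window needed to keep a 10GE receiver
# busy over a 200ms RTT path.
rtt_s = 0.2
receiver_bps = 10e9
bdp_bytes = receiver_bps * rtt_s / 8
print(bdp_bytes / 1e6)              # 250.0 (MB)

# If the sender serialises a 128MB window at 100GE while the egress
# drains at 10GE, roughly 90% of the burst must sit in buffers.
window_bytes = 128e6
sender_bps = 100e9
buffered_bytes = window_bytes * (1 - receiver_bps / sender_bps)
print(round(buffered_bytes / 1e6))  # ~115 (MB)
```

Over 100 MB of transient buffering from a single flow is exactly what a shallow-buffer datacenter chip cannot absorb.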
If you can do anything at all, what you can do is ensure that you have the minimum number of QoS classes and that those classes have the maximum amount of buffer, so that unused queues aren't holding empty memory while the used queue is starving. But even this will have only a marginal benefit.

Cut-through does nothing, because your egress is congested; you can only use cut-through if egress is not congested.

--
++ytti
Re: [j-nsp] Cut through and buffer questions
On Thu, 18 Nov 2021 at 23:20, james list via juniper-nsp wrote:
> 1) is MX family switching by default in cut through or store and forward
> mode? I was not able to find a clear information

Store and forward.

> 2) is in general (on MX or QFX) jeopardizing the traffic the action to
> enable cut through or change buffer allocation?

I don't understand the question.

> I have some output discard on an interface (class best effort) and some UDP
> packets are lost hence I am tuning to find a solution.

I don't see how this relates to cut-through at all. Cut-through works when ingress can start writing the frame to egress while still reading it; this is ~never the case in multistage ingress+egress buffered devices. And even in devices where it is the case, it only works if the egress interface happens to not be serialising a packet at that time, so the percentage of frames actually getting cut-through behaviour in cut-through devices is low in typical applications; applications where it is high likely could have been replaced by a direct connection. Modern multistage devices have low single-digit microseconds of internal latency and nanoseconds of jitter. One microsecond is about 200m in fiber, so that gives you the scale of how much distance you can offset by reducing the delay incurred by a multistage device.

Now, having said that, what actually is the problem? What are 'output discards', which counter are you looking at? Have you modified the QoS configuration, can you share it? By default JNPR is 95% BE, 5% NC (unlike Cisco, which is 100% BE, which I think is a better default), and the buffer allocation is the same. So if you are actually QoS tail-dropping in the default JNPR configuration, you're creating massive delays, because the buffer allocation is huge, and your problem is rather simply that you're offering too much to the egress; the best you can do is reduce the buffer allocation to have lower collateral damage.
--
++ytti