Hi Johan,

I experienced a similar issue in my EVPN-VXLAN environment on QFX5120-48y switches. The DDoS alert fired whenever a large number of VMs migrated at the same time; sometimes there were 20 VMs migrating simultaneously, and the DDoS alarm was raised.
To solve this, I set the following values in the configuration:

qfx5120> show configuration system ddos-protection protocols
vxlan {
    aggregate {
        bandwidth 10000;
        burst 12000;
    }
}

On Wed, Nov 30, 2022 at 07:16, john doe via juniper-nsp <juniper-nsp@puck.nether.net> wrote:

> Hi!
>
> The leaf switches are QFX5k and they seem to be lacking some of the commands
> you mentioned. We don't have any problem with BGP sessions going down; the
> impact is only on the payload inside VXLAN.
>
> Protocol Group: VXLAN
>
>   Packet type: aggregate (Aggregate for vxlan control packets)
>     Aggregate policer configuration:
>       Bandwidth:        500 pps
>       Burst:            200 packets
>       Recover time:     300 seconds
>       Enabled:          Yes
>     Flow detection configuration:
>       Flow detection system is off
>       Detection mode: Automatic  Detect time:  0 seconds
>       Log flows:      Yes        Recover time: 0 seconds
>       Timeout flows:  No         Timeout time: 0 seconds
>       Flow aggregation level configuration:
>         Aggregation level   Detection mode  Control mode  Flow rate
>         Subscriber          Automatic       Drop          0 pps
>         Logical interface   Automatic       Drop          0 pps
>         Physical interface  Automatic       Drop          500 pps
>     System-wide information:
>       Aggregate bandwidth is no longer being violated
>         No. of FPCs that have received excess traffic: 1
>         Last violation started at: 2022-11-30 09:08:02 CET
>         Last violation ended at:   2022-11-30 09:09:32 CET
>         Duration of last violation: 00:01:40 Number of violations: 1508
>       Received:  3548252144          Arrival rate:     201 pps
>       Dropped:   49294329            Max arrival rate: 160189 pps
>     Routing Engine information:
>       Bandwidth: 500 pps, Burst: 200 packets, enabled
>       Aggregate policer is never violated
>       Received:  0                   Arrival rate:     0 pps
>       Dropped:   0                   Max arrival rate: 0 pps
>         Dropped by individual policers: 0
>     FPC slot 0 information:
>       Bandwidth: 100% (500 pps), Burst: 100% (200 packets), enabled
>       Hostbound queue 255
>       Aggregate policer is no longer being violated
>         Last violation started at: 2022-11-30 09:08:02 CET
>         Last violation ended at:   2022-11-30 09:09:32 CET
>         Duration of last violation: 00:01:40 Number of violations: 1508
>       Received:  3548252144          Arrival rate:     201 pps
>       Dropped:   49294329            Max arrival rate: 160189 pps
>         Dropped by individual policers: 0
>         Dropped by aggregate policer:   50294227
>         Dropped by flow suppression:    0
>       Flow counts:
>         Aggregation level     Current       Total detected   State
>         Subscriber            0             0                Active
>
> vty)# show ddos scfd proto-states vxlan
> (sub|ifl|ifd)-cfg: op-mode:fc-mode:bwidth(pps)
> op-mode: a=automatic, o=always-on, x=disabled
> fc-mode: d=drop-all, k=keep-all, p=police
> d-t: detect time, r-t: recover time, t-t: timeout time
> aggr-t: last aggregated/deaggreagated time
> idx prot group    proto     mode detect agg flags state sub-cfg   ifl-cfg   ifd-cfg   d-t r-t t-t aggr-t
> --- ---- -------- --------- ---- ------ --- ----- ----- --------- --------- --------- --- --- --- ------
>  23 6400 vxlan    aggregate auto no     1   2     0     a:d:    0 a:d:    0 a:d:  500   0   0   0      0
>
>
> Johan
>
> On Wed, Nov 30, 2022 at 8:53 AM Saku Ytti <s...@ytti.fi> wrote:
>
> > Hey,
> >
> > Before any potential trashing, I'd like to say that as far as I am
> > aware Juniper (MX) is the only platform on the market which isn't
> > trivial to DoS off the network, despite
> > any protection users may have tried to configure.
> >
> > > How do you identify the source problem of DDOS violations that Junos
> > > logs for QFX? For example, which interface is causing the problem?
> >
> > I assume you are talking about QFX10k with the Paradise (PE) chipset. I'm
> > not very familiar with it, but I know something about it when sold in
> > PTX10k guise, although there are significant differences. Answers are from
> > the PTX10k perspective. If you are talking about QFX5k, many of the
> > answers won't apply, but the ukern side answers should help
> > troubleshoot it further; certainly with QFX5k the situation is worse
> > than it would be on QFX10k.
> >
> > > DDOS_PROTOCOL_VIOLATION_SET: Warning: Host-bound traffic for
> > > protocol/exception VXLAN:aggregate exceeded its allowed bandwidth at fpc 0
> > > for 30 times, started at...
> > >
> > > The configured rate for VXLAN is 500pps; DDoS protection is seeing rates
> > > of over 150 000pps.
> >
> > Do you mean you've configured
> > 'set system ddos-protection protocols vxlan aggregate bandwidth 500'?
> > What exactly are you seeing? What does 'show ddos-protection protocols
> > vxlan' say? Also 'start shell pfe network fpcX' + 'show ddos scfd
> > proto-states vxlan'.
> >
> > Paradise (unlike Triton and Trio) does not support PPS policing at
> > all. So when you configure a PPS policer, what actually gets
> > programmed is 500pps*1500B bps. I've tried to argue this is a poor
> > default, 64B being a superior choice.
> > In Paradise, 500pps would admit 500*(1500/64), or about 12kpps per
> > Paradise, if those VXLAN packets were small. These would then be
> > policed by the LC CPU ukern into 500 pps for all the Paradise chips
> > living inside that LC CPU, before sending to RE over bme0.
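To make the Paradise policing arithmetic concrete, here is a small Python sketch. It assumes, as described in the message, that a configured pps rate is programmed as a bps rate sized for 1500-byte packets; the function names are hypothetical and nothing here queries a real device.

```python
# Illustrative arithmetic only. ASSUMPTION (from the message): a configured
# pps policer is programmed as pps * 1500 bytes, expressed in bits per second.
ASSUMED_PROGRAMMING_SIZE_B = 1500  # packet size the pps value is multiplied by


def programmed_bps(policer_pps: int,
                   assumed_size_b: int = ASSUMED_PROGRAMMING_SIZE_B) -> int:
    """Bits per second actually programmed for a configured pps rate."""
    return policer_pps * assumed_size_b * 8


def admitted_pps(policer_pps: int, actual_size_b: int,
                 assumed_size_b: int = ASSUMED_PROGRAMMING_SIZE_B) -> float:
    """Packets per second that fit through that bps policer at a given packet size."""
    return policer_pps * assumed_size_b / actual_size_b


if __name__ == "__main__":
    print(programmed_bps(500))            # 6000000, i.e. 6 Mbps
    print(round(admitted_pps(500, 64)))   # 11719, the "about 12kpps" above
    print(admitted_pps(500, 1500))        # 500.0, only large packets match the config
```

The gap between 500 and roughly 11,700 pps for 64-byte packets is why a 64B basis would track the configured pps value more closely for small control packets.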
> >
> > After DDoS, but before Paradise admits a packet to the LC CPU, it goes
> > through VoQ, where most packets are classified as VoQ#2, which is
> > 10Mbps wide with no burstability (classification, width and
> > burstability are being changed in later images). So extremely trivial
> > rates will cause congestion on VoQ#2, and a lot of protocols will
> > be competing for 10Mbps of access to the LC CPU, like BGP, ISIS, OSPF,
> > LDP, ND, ARP.
> >
> > > This is a spine/leaf setup. One theory is that the vxlan traffic that
> > > most of our QFX boxes are activating ddos protection for is actually
> > > vxlan services running inside the vxlans; for example, we have
> > > kubernetes clusters using vxlan. Is that a sane theory?
> >
> > Not enough information to speculate.
> > In many cases ddos classification is wrong. You can review it in the PFE:
> > 'show filter' => HOSTBOND_IPv4_FILTER, then 'show filter index X
> > program'. You can also capture punted packets on the interface where RE
> > meets FPC (I think bme0 here). On the bme0 interface, TNP headers are
> > on top of the punted packets, and in the TNP headers you will see which
> > ddos classification was used; you can turn the number into a name by
> > looking at 'show ddos scfd proto-states'.
> >
> >
> > I naively wish I could set my ddos-protocol classification and VoQ
> > classification manually in 'lo0 filter', because the infrastructure
> > allows for great protection. But particularly when choosing which
> > packets share a VoQ, there is no obvious single best solution; it
> > depends on the environment. For example, I could put RSVP, ISIS and LDP
> > on a single VoQ, as they never compete with customers, BGP on another,
> > as it will compete with customers and operators for me, and so forth.
> > But of course this wish is naive, as the solution the vendor offers is
> > already too complex for customers to use, and giving more rope would
> > just make the mean config worse.
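The VoQ#2 point in Saku's reply is also easy to quantify. The Python sketch below takes only the 10 Mbps width from the message. It ignores any per-packet overhead, and it assumes (hypothetically, for illustration) that punted VXLAN packets admitted by the DDoS policer land in the same VoQ#2 as BGP, ARP, ND and the other control protocols.

```python
# Rough capacity arithmetic for a 10 Mbps, non-bursting queue ("VoQ#2" as
# described in the message). ASSUMPTIONS: no per-packet overhead is counted,
# and the flood traffic shares this queue with the control protocols.
VOQ2_BPS = 10_000_000  # "10Mbps wide with no burstability"


def queue_capacity_pps(queue_bps: int, packet_size_b: int) -> int:
    """Whole packets per second the queue can carry at a fixed packet size."""
    return queue_bps // (packet_size_b * 8)


def leftover_bps(queue_bps: int, flood_pps: int, flood_size_b: int) -> int:
    """Bandwidth left for everything else while a flood occupies the queue."""
    return max(0, queue_bps - flood_pps * flood_size_b * 8)


if __name__ == "__main__":
    for size in (64, 512, 1500):
        print(size, queue_capacity_pps(VOQ2_BPS, size))
    # ~11.7 kpps of 64-byte punts (what a 500 pps policer admits, per the
    # message) already takes about 6 Mbps of the queue:
    print(leftover_bps(VOQ2_BPS, 11_719, 64))
    # The 160189 pps peak from Johan's output would saturate it entirely:
    print(leftover_bps(VOQ2_BPS, 160_189, 64))
```

Under these assumptions, even the policed VXLAN punts alone consume well over half of VoQ#2, which is consistent with the warning that trivial rates can crowd out protocols like BGP, ARP and ND.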
> >
> > --
> >   ++ytti
> >
> _______________________________________________
> juniper-nsp mailing list juniper-nsp@puck.nether.net
> https://puck.nether.net/mailman/listinfo/juniper-nsp