Re: [j-nsp] Cut through and buffer questions
On 11/19/21 17:05, Thomas Bellman via juniper-nsp wrote:

> [1] Of the 12 Mbyte buffer space in Trident 2, which is used in
>     QFX5100 and EX4600, 3 Mbyte is used for per-port dedicated
>     buffers, and 9 Mbyte is shared between all ports. I believe on
>     later chips an even larger percentage is shared.

This is what got us off the EX4600 (and the EX4550 before that). Been
on the Arista 7280R ever since, and we are happy. The only problem is
that Arista seem to stop producing current code for older boxes at
some point, as we've seen with our 7508E's. Not a major drama, since
we use these purely for Layer 2 switching, but if we were running them
as IP/MPLS routers, we'd be royally pissed.

Mark.

___
juniper-nsp mailing list
juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp
Re: [j-nsp] Cut through and buffer questions
On Fri, 19 Nov 2021 at 17:12, Thomas Bellman via juniper-nsp wrote:

> Cut-through actually *can* help a little bit. The buffer space in
> the Trident and Tomahawk chips is mostly shared between all ports;
> only a small portion of it is dedicated per port[1]. If you have
> lots of traffic on some ports, with little or no congestion,
> enabling cut-through will leave more buffer space available for
> the congested ports, as the packets will leave the switch/router
> quicker.

Correct, you can save packetSize * egressInts of buffer with
cut-through. So if you have 48 ports and we assume 1500B frames, you
can save 72kB of buffer space.

> One should note though that these chips will fall back to store-
> and-forward if the ingress port and egress port run at different

I had hoped this was obvious, when I mentioned the percentage of
frames getting cut-through. And strictly speaking, it is not 'these
chips'; you cannot implement cut-through without store-and-forward.
You'd end up dropping most of the traffic in all but very esoteric
topologies/scenarios.

--
++ytti
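The packetSize * egressInts estimate above can be sketched numerically. A minimal Python illustration, using the assumed figures from the thread (48 ports, 1500-byte frames, 12 MB Trident 2 buffer):

```python
def cut_through_buffer_savings(frame_bytes: int, egress_ports: int) -> int:
    """Upper bound on shared-buffer bytes freed by cut-through:
    at most one in-flight frame per egress port avoids buffering."""
    return frame_bytes * egress_ports

# The figures from the thread: 48 ports, 1500-byte frames.
savings = cut_through_buffer_savings(1500, 48)
print(savings)  # 72000 bytes, i.e. 72 kB

# Against a 12 MB Trident 2 buffer this is well under one percent.
print(f"{savings / (12 * 1024 * 1024):.2%}")
```

which makes concrete why the saving, while real, is marginal relative to the total buffer pool.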
Re: [j-nsp] Cut through and buffer questions
Can you please share the output of:

  show class-of-service shared-buffer

on your QFX5100?

Cheers
James

On Fri, 19 Nov 2021 at 11:58, Thomas Bellman wrote:

> On 2021-11-19 09:49, james list via juniper-nsp wrote:
>
> > I try to rephrase the question you do not understand: if I enable
> > cut through or change buffer is it traffic affecting ?
>
> On the QFX 5xxx series and (at least) EX 46xx series, the forwarding
> ASIC needs to reset in order to change between store-and-forward and
> cut-through, and traffic will be lost until the reprogramming has
> been completed. Likewise, changing buffer config will need to reset
> the ASIC. When I have tested it, this has taken at most one second,
> though, so for many people it will be a non-event.
>
> One thing to remember when using cut-through forwarding is that
> packets that have suffered bit errors or truncation, so that the CRC
> checksum is incorrect, will still be forwarded, and not be discarded
> by the switch. This is usually not a problem in itself, but if you
> are not aware of it, it is easy to get confused when troubleshooting
> bit errors (you see ingress errors on one switch, and think it is
> the link to that switch that has problems, but in reality it might
> just be that the switch on the other end is forwarding broken
> packets *it* received).
>
> > Regarding the drops here the outputs (15h after clear statistics):
> [...abbreviated...]
> > Queue: 0, Forwarding classes: best-effort
> >   Transmitted:
> >     Packets              : 6929684309        190446 pps
> >     Bytes                : 4259968408584  761960360 bps
> >     Total-dropped packets: 1592                   0 pps
> >     Total-dropped bytes  : 2244862                0 bps
> [...]
> > Queue: 7, Forwarding classes: network-control
> >   Transmitted:
> >     Packets              : 59234                  0 pps
> >     Bytes                : 4532824              504 bps
> >     Total-dropped packets: 0                      0 pps
> >     Total-dropped bytes  : 0                      0 bps
> > Queue: 8, Forwarding classes: mcast
> >   Transmitted:
> >     Packets              : 6553704               88 pps
> >     Bytes                : 5102847425        663112 bps
> >     Total-dropped packets: 279                    0 pps
> >     Total-dropped bytes  : 423522                 0 bps
>
> These drop figures don't immediately strike me as excessive. We
> certainly have much higher drop percentages, and don't see many
> practical performance problems. But it will very much depend on
> your application. The one thing I note is that you have much more
> multicast than we do, and you see drops in that forwarding class.
>
> I didn't quite understand if you see actual application or
> performance problems.
>
> > show class-of-service shared-buffer
> > Ingress:
> >   Total Buffer      : 12480.00 KB
> >   Dedicated Buffer  :  2912.81 KB
> >   Shared Buffer     :  9567.19 KB
> >     Lossless          :  861.05 KB
> >     Lossless Headroom : 4305.23 KB
> >     Lossy             : 4400.91 KB
>
> This looks like a QFX5100 or EX4600, with the 12 Mbyte buffer in the
> Broadcom Trident 2 chip.
> You probably want to read this page to understand how to configure
> buffer allocation for your needs:
>
> https://www.juniper.net/documentation/us/en/software/junos/traffic-mgmt-qfx/topics/concept/cos-qfx-series-buffer-configuration-understanding.html
>
> In my network, we only have best-effort traffic, and very little
> multi- or broadcast traffic (basically just ARP/Neighbour discovery,
> DHCP, and OSPF), so we use these settings on our QFX5100 and EX4600
> switches:
>
>     forwarding-options {
>         cut-through;
>     }
>     class-of-service {
>         /* Max buffers to best-effort traffic, minimum for lossless ethernet */
>         shared-buffer {
>             ingress {
>                 percent 100;
>                 buffer-partition lossless { percent 5; }
>                 buffer-partition lossless-headroom { percent 0; }
>                 buffer-partition lossy { percent 95; }
>             }
>             egress {
>                 percent 100;
>                 buffer-partition lossless { percent 5; }
>                 buffer-partition lossy { percent 75; }
>                 buffer-partition multicast { percent 20; }
>             }
>         }
>     }
>
> (On our QFX5120 switches, I have moved even more buffer space to
> the "lossy" classes.) But you need to tune to *your* needs; the
> above is for our needs.
>
>
>         /Bellman
Re: [j-nsp] Cut through and buffer questions
On 2021-11-19 10:07, Saku Ytti via juniper-nsp wrote:

> Cut-through does nothing, because your egress is congested, you can
> only use cut-through if egress is not congested.

Cut-through actually *can* help a little bit. The buffer space in
the Trident and Tomahawk chips is mostly shared between all ports;
only a small portion of it is dedicated per port[1]. If you have
lots of traffic on some ports, with little or no congestion,
enabling cut-through will leave more buffer space available for
the congested ports, as the packets will leave the switch/router
quicker.

One should note though that these chips will fall back to store-
and-forward if the ingress port and egress port run at different
speeds. (In theory, it should be possible to do cut-through as long
as the egress port is not faster than the ingress port, but as far as
I know, any speed mismatch causes store-and-forward to be used.)
Also, if you have rate limiting or shaping enabled on the ingress or
egress port, the chips will fall back to store-and-forward.

Whether this helps *enough* is another question. :-) I believe that
in general it will only make a pretty small difference in buffer
usage. I enabled cut-through forwarding on our QFX5xxx:es and
EX4600:s a few years ago, and any change in packet drop rates or TCP
performance (both local and long-distance) was lost way down in the
noise. But I have seen reports from others that saw a meaningful, if
not exactly huge, difference; that was several years ago, though, and
I didn't save any reference to the report, so you might want to
classify that as hearsay... (I have kept cut-through enabled on our
devices, since I don't know of any practical disadvantages, and it
*might* help a tiny little bit in some cases.)

[1] Of the 12 Mbyte buffer space in Trident 2, which is used in
    QFX5100 and EX4600, 3 Mbyte is used for per-port dedicated
    buffers, and 9 Mbyte is shared between all ports. I believe on
    later chips an even larger percentage is shared.
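The dedicated/shared split in footnote [1] can be illustrated with a quick back-of-the-envelope Python sketch; the 48-port count is an assumption for a QFX5100/EX4600-class box, not stated in the thread:

```python
# Trident 2 split per the footnote: 12 MB total, 3 MB dedicated, 9 MB shared.
TOTAL_MB, DEDICATED_MB = 12, 3
SHARED_MB = TOTAL_MB - DEDICATED_MB
PORTS = 48  # assumed port count

dedicated_per_port_kb = DEDICATED_MB * 1024 / PORTS
print(f"{dedicated_per_port_kb:.0f} KB dedicated per port")   # 64 KB
print(f"{SHARED_MB / TOTAL_MB:.0%} of the buffer is shared")  # 75%
```

So each port is guaranteed only tens of kilobytes; everything beyond that competes for the shared pool, which is why draining the pool faster (cut-through) can matter at all.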
--
Thomas Bellman, National Supercomputer Centre, Linköping Univ., Sweden

"We don't understand the software, and sometimes we don't understand
 the hardware, but we can *see* the blinking lights!"
Re: [j-nsp] Cut through and buffer questions
On 2021-11-19 09:49, james list via juniper-nsp wrote:

> I try to rephrase the question you do not understand: if I enable
> cut through or change buffer is it traffic affecting ?

On the QFX 5xxx series and (at least) EX 46xx series, the forwarding
ASIC needs to reset in order to change between store-and-forward and
cut-through, and traffic will be lost until the reprogramming has been
completed. Likewise, changing buffer config will need to reset the
ASIC. When I have tested it, this has taken at most one second, though,
so for many people it will be a non-event.

One thing to remember when using cut-through forwarding is that packets
that have suffered bit errors or truncation, so that the CRC checksum
is incorrect, will still be forwarded, and not be discarded by the
switch. This is usually not a problem in itself, but if you are not
aware of it, it is easy to get confused when troubleshooting bit errors
(you see ingress errors on one switch, and think it is the link to that
switch that has problems, but in reality it might just be that the
switch on the other end is forwarding broken packets *it* received).

> Regarding the drops here the outputs (15h after clear statistics):
[...abbreviated...]

> Queue: 0, Forwarding classes: best-effort
>   Transmitted:
>     Packets              : 6929684309        190446 pps
>     Bytes                : 4259968408584  761960360 bps
>     Total-dropped packets: 1592                   0 pps
>     Total-dropped bytes  : 2244862                0 bps
[...]
> Queue: 7, Forwarding classes: network-control
>   Transmitted:
>     Packets              : 59234                  0 pps
>     Bytes                : 4532824              504 bps
>     Total-dropped packets: 0                      0 pps
>     Total-dropped bytes  : 0                      0 bps
> Queue: 8, Forwarding classes: mcast
>   Transmitted:
>     Packets              : 6553704               88 pps
>     Bytes                : 5102847425        663112 bps
>     Total-dropped packets: 279                    0 pps
>     Total-dropped bytes  : 423522                 0 bps

These drop figures don't immediately strike me as excessive. We
certainly have much higher drop percentages, and don't see many
practical performance problems. But it will very much depend on
your application.
The one thing I note is that you have much more multicast than we do,
and you see drops in that forwarding class.

I didn't quite understand if you see actual application or performance
problems.

> show class-of-service shared-buffer
> Ingress:
>   Total Buffer      : 12480.00 KB
>   Dedicated Buffer  :  2912.81 KB
>   Shared Buffer     :  9567.19 KB
>     Lossless          :  861.05 KB
>     Lossless Headroom : 4305.23 KB
>     Lossy             : 4400.91 KB

This looks like a QFX5100 or EX4600, with the 12 Mbyte buffer in the
Broadcom Trident 2 chip. You probably want to read this page to
understand how to configure buffer allocation for your needs:

https://www.juniper.net/documentation/us/en/software/junos/traffic-mgmt-qfx/topics/concept/cos-qfx-series-buffer-configuration-understanding.html

In my network, we only have best-effort traffic, and very little
multi- or broadcast traffic (basically just ARP/Neighbour discovery,
DHCP, and OSPF), so we use these settings on our QFX5100 and EX4600
switches:

    forwarding-options {
        cut-through;
    }
    class-of-service {
        /* Max buffers to best-effort traffic, minimum for lossless ethernet */
        shared-buffer {
            ingress {
                percent 100;
                buffer-partition lossless { percent 5; }
                buffer-partition lossless-headroom { percent 0; }
                buffer-partition lossy { percent 95; }
            }
            egress {
                percent 100;
                buffer-partition lossless { percent 5; }
                buffer-partition lossy { percent 75; }
                buffer-partition multicast { percent 20; }
            }
        }
    }

(On our QFX5120 switches, I have moved even more buffer space to the
"lossy" classes.) But you need to tune to *your* needs; the above is
for our needs.


        /Bellman
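To see what percentages like those mean in absolute terms, here is a small Python sketch applying them to the 9567.19 KB shared-buffer pool shown in the `show class-of-service shared-buffer` output above. This is a rough illustration only; the ASIC's actual per-partition accounting (cell granularity, headroom handling) may differ:

```python
shared_kb = 9567.19  # Shared Buffer from 'show class-of-service shared-buffer'

# Partition percentages as in the config above.
ingress = {"lossless": 5, "lossless-headroom": 0, "lossy": 95}
egress = {"lossless": 5, "lossy": 75, "multicast": 20}

for direction, parts in (("ingress", ingress), ("egress", egress)):
    for name, pct in parts.items():
        print(f"{direction:7s} {name:18s} {shared_kb * pct / 100:8.2f} KB")
```

With the 95% lossy ingress setting, best-effort traffic gets roughly 9 MB of shared buffer instead of the ~4.4 MB in the default split shown in the output.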
Re: [j-nsp] Cut through and buffer questions
On Fri, 19 Nov 2021 at 11:50, james list wrote:

> I also understood cut through cannot help but obviously I cannot
> change QFX switches because we lose a few UDP packets for a single
> application, the idea could be to change shared buffers for unused
> queues and add to used ones, correct ?

Yes. Anything you can do to

  a) increase buffer (traditionally in Catalyst, EX, you can win quite
     a bit more buffer by removing queues)
  b) increase egress rate (LACP to the host may help)

will help a little bit.

> Based on the output provided what you suggest to change ?
> I also understand this kind of change is traffic affecting.

I'm not familiar with QFX tuning, but it should be fairly easy to find
and test how you can increase buffers. I think your goal #1 should be
to move to a single BE queue and try to assign everything there; the
secondary goal is to add another high-priority class and give it a
little bit of buffer.

> I also need to understand how shared buffer queues on QFX are
> attached to COS queues.

Yes. I also don't know this, and I'm not sure how much room for
tinkering there is. I know in Catalyst and EX some gains over the
default config can be made, which give a significant improvement when
boxes have been deployed in the wrong application.

--
++ytti
Re: [j-nsp] Cut through and buffer questions
Hi

I mentioned MX and QFX (the output is from a QFX5100) in the first
email because the traffic pattern spans both. I never mentioned the
internet.

I also understood cut-through cannot help, but obviously I cannot
change the QFX switches because we lose a few UDP packets for a single
application; the idea could be to change shared buffers for unused
queues and add to used ones, correct ?

Based on the output provided, what do you suggest to change ?
I also understand this kind of change is traffic affecting.

I also need to understand how shared buffer queues on QFX are attached
to COS queues.

Thanks, cheers
James

On Fri, 19 Nov 2021 at 10:07, Saku Ytti wrote:

> On Fri, 19 Nov 2021 at 10:49, james list wrote:
>
> Hey,
>
> > I try to rephrase the question you do not understand: if I enable
> > cut through or change buffer is it traffic affecting ?
>
> There is no cut-through, and I was hoping that after reading the
> previous email you'd understand why it won't help you at all, nor is
> it desirable. Changing QoS config may be traffic affecting, but you
> likely do not have the monitoring capability to observe it.
>
> > Regarding the drops here the outputs (15h after clear statistics):
>
> You talked about MX, so I answered from the MX perspective. But your
> output is not from an MX.
>
> The device you actually show has exceedingly tiny buffers and is not
> meant for Internet WAN use; that is, it does not expect a
> significantly higher sender rate than receiver rate with high RTT.
> It is meant for datacenter use, where RTT is low and the speed delta
> is small.
>
> In the real-life Internet you need larger buffers because of this:
>
>   senderPC => internets => receiverPC
>
> Let's imagine an RTT of 200ms and receiver 10GE and sender 100GE.
> - 10Gbps * 200ms = 250MB TCP window needed to fill it
> - as TCP windows grow exponentially in absence of loss, you could
>   have 128MB => 250MB growth
> - this means senderPC might serialise 128MB of data at 100Gbps
> - this 128MB you can only send out at a 10Gbps rate; the rest you
>   have to take into the buffers
> - intentionally pathological example
> - 'easy' fix is that the sender doesn't burst the data at its own
>   rate, but does rate estimation and sends the window growth at the
>   estimated receiver rate; this practically removes buffering needs
>   entirely
> - 'easy' fix is not standard behaviour, but some cloudyshops
>   configure their Linux like this, thankfully (Linux already does
>   bandwidth estimation, and you can ask 'tc' to shape the session to
>   estimated bandwidth)
>
> What you need to do is change the device to one that is intended for
> the application you have.
>
> If you can do anything at all, what you can do is ensure that you
> have a minimum number of QoS classes and that those QoS classes have
> the maximum amount of buffer, so that unused queues aren't holding
> empty memory while a used queue is starving. But even this will have
> only marginal benefit.
>
> Cut-through does nothing, because your egress is congested; you can
> only use cut-through if egress is not congested.
>
>
> --
> ++ytti
Re: [j-nsp] Cut through and buffer questions
On Fri, 19 Nov 2021 at 10:49, james list wrote:

Hey,

> I try to rephrase the question you do not understand: if I enable
> cut through or change buffer is it traffic affecting ?

There is no cut-through, and I was hoping that after reading the
previous email you'd understand why it won't help you at all, nor is
it desirable. Changing QoS config may be traffic affecting, but you
likely do not have the monitoring capability to observe it.

> Regarding the drops here the outputs (15h after clear statistics):

You talked about MX, so I answered from the MX perspective. But your
output is not from an MX.

The device you actually show has exceedingly tiny buffers and is not
meant for Internet WAN use; that is, it does not expect a significantly
higher sender rate than receiver rate with high RTT. It is meant for
datacenter use, where RTT is low and the speed delta is small.

In the real-life Internet you need larger buffers because of this:

  senderPC => internets => receiverPC

Let's imagine an RTT of 200ms and receiver 10GE and sender 100GE.

- 10Gbps * 200ms = 250MB TCP window needed to fill it
- as TCP windows grow exponentially in absence of loss, you could have
  128MB => 250MB growth
- this means senderPC might serialise 128MB of data at 100Gbps
- this 128MB you can only send out at a 10Gbps rate; the rest you have
  to take into the buffers
- intentionally pathological example
- 'easy' fix is that the sender doesn't burst the data at its own
  rate, but does rate estimation and sends the window growth at the
  estimated receiver rate; this practically removes buffering needs
  entirely
- 'easy' fix is not standard behaviour, but some cloudyshops configure
  their Linux like this, thankfully (Linux already does bandwidth
  estimation, and you can ask 'tc' to shape the session to estimated
  bandwidth)

What you need to do is change the device to one that is intended for
the application you have.
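The pathological example boils down to bandwidth-delay product arithmetic; a minimal Python sketch of it, using only the hypothetical figures from the thread (10 Gb/s receiver, 100 Gb/s sender, 200 ms RTT, 128 MB burst):

```python
def bdp_bytes(rate_bps: float, rtt_s: float) -> float:
    """TCP window needed to keep a path full: bandwidth * RTT, in bytes."""
    return rate_bps * rtt_s / 8

# 10 Gb/s receiver, 200 ms RTT -> 250 MB window to fill the path.
print(bdp_bytes(10e9, 0.200) / 1e6)  # 250.0 (MB)

# A 128 MB window's worth of data serialised at 100 Gb/s arrives in
# ~10 ms but takes ~102 ms to drain at 10 Gb/s; the difference must
# sit in buffers (or be dropped) somewhere along the path.
burst_bytes = 128e6
arrive_s = burst_bytes * 8 / 100e9
drain_s = burst_bytes * 8 / 10e9
print(arrive_s, drain_s)
```

This is why a shallow-buffered datacenter switch in the middle of such a path sheds packets even though its average utilisation looks fine.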
If you can do anything at all, what you can do is ensure that you have
a minimum number of QoS classes and that those QoS classes have the
maximum amount of buffer, so that unused queues aren't holding empty
memory while a used queue is starving. But even this will have only
marginal benefit.

Cut-through does nothing, because your egress is congested; you can
only use cut-through if egress is not congested.

--
++ytti
Re: [j-nsp] Cut through and buffer questions
Hi ytti

I try to rephrase the question you do not understand: if I enable cut
through or change buffer is it traffic affecting ?

Regarding the drops here the outputs (15h after clear statistics):

Physical interface: xe-0/0/19, Enabled, Physical link is Up
  Interface index: 939, SNMP ifIndex: 626, Generation: 441
  Description: xxx
  Link-level type: Ethernet, MTU: 1514, MRU: 0, Speed: 10Gbps,
  BPDU Error: None, MAC-REWRITE Error: None, Loopback: Disabled,
  Source filtering: Disabled, Flow control: Disabled, Media type: Fiber
  Device flags   : Present Running
  Interface flags: SNMP-Traps Internal: 0x4000
  Link flags     : None
  CoS queues     : 12 supported, 12 maximum usable queues
  Hold-times     : Up 0 ms, Down 0 ms
  Current address: 5c:45:27:a5:c6:36, Hardware address: 5c:45:27:a5:c6:36
  Last flapped   : 2021-03-21 02:40:21 CET (34w5d 06:49 ago)
  Statistics last cleared: 2021-11-18 18:26:13 CET (15:03:31 ago)
  Traffic statistics:
   Input  bytes  : 3114439584439   746871624 bps
   Output bytes  : 4196208682119   871170072 bps
   Input  packets:    6583209468      204576 pps
   Output packets:    6821793016      203445 pps
   IPv6 transit statistics:
    Input  bytes  : 0
    Output bytes  : 0
    Input  packets: 0
    Output packets: 0
  Input errors:
    Errors: 0, Drops: 0, Framing errors: 0, Runts: 0, Bucket drops: 0,
    Policed discards: 0, L3 incompletes: 0, L2 channel errors: 0,
    L2 mismatch timeouts: 0, FIFO errors: 0, Resource errors: 0
  Output errors:
    Carrier transitions: 0, Errors: 0, Drops: 1871, Collisions: 0,
    Aged packets: 0, FIFO errors: 0, HS link CRC errors: 0,
    MTU errors: 0, Resource errors: 0, Bucket drops: 0
  Egress queues: 12 supported, 5 in use
  Queue counters:   Queued packets   Transmitted packets   Dropped packets
    0                            0            6810956602              1592
    3                            0                     0                 0
    4                            0                     0                 0
    7                            0                 58647                 0
    8                            0               6505305               279
  Queue number:     Mapped forwarding classes
    0               best-effort
    3               fcoe
    4               no-loss
    7               network-control
    8               mcast

show interfaces queue xe-0/0/19
Physical interface: xe-0/0/19, Enabled, Physical link is Up
  Interface index: 939, SNMP ifIndex: 626
  Description:
Forwarding classes: 16 supported, 5 in use
Egress queues: 12 supported, 5 in use
Queue: 0, Forwarding classes: best-effort
  Queued:
    Packets              : 0                      0 pps
    Bytes                : 0                      0 bps
  Transmitted:
    Packets              : 6929684309        190446 pps
    Bytes                : 4259968408584  761960360 bps
    Tail-dropped packets : Not Available
    RL-dropped packets   : 0                      0 pps
    RL-dropped bytes     : 0                      0 bps
    Total-dropped packets: 1592                   0 pps
    Total-dropped bytes  : 2244862                0 bps
Queue: 3, Forwarding classes: fcoe
  Queued:
    Packets              : 0                      0 pps
    Bytes                : 0                      0 bps
  Transmitted:
    Packets              : 0                      0 pps
    Bytes                : 0                      0 bps
    Tail-dropped packets : Not Available
    RL-dropped packets   : 0                      0 pps
    RL-dropped bytes     : 0                      0 bps
    Total-dropped packets: 0                      0 pps
    Total-dropped bytes  : 0                      0 bps
Queue: 4, Forwarding classes: no-loss
  Queued:
    Packets              : 0                      0 pps
    Bytes                : 0                      0 bps
  Transmitted:
    Packets              : 0                      0 pps
    Bytes                : 0                      0 bps
    Tail-dropped packets : Not Available
    RL-dropped packets   : 0                      0 pps
    RL-dropped bytes     : 0                      0 bps
    Total-dropped packets: 0                      0 pps
    Total-dropped bytes  : 0                      0 bps
Queue: 7, Forwarding classes: network-control
  Queued:
    Packets              : 0                      0 pps
    Bytes                :
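For context, the drop rates implied by those counters can be computed directly. A small Python sketch; the packet counts are read from the fused columns in the output (best-effort: 1592 drops against roughly 6.93e9 transmitted; mcast: 279 drops against roughly 6.5e6 transmitted), so treat them as approximate:

```python
def drop_pct(dropped: int, transmitted: int) -> float:
    """Dropped packets as a percentage of all packets offered to the queue."""
    return 100.0 * dropped / (transmitted + dropped)

# best-effort queue: on the order of 0.00002 %
print(f"{drop_pct(1592, 6929684309):.6f} %")

# mcast queue: on the order of 0.004 %, a couple of orders of
# magnitude higher than best-effort
print(f"{drop_pct(279, 6505305):.6f} %")
```

This makes Thomas's later observation concrete: the absolute drop counts are tiny, but the multicast class is dropping proportionally far more than best-effort.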
Re: [j-nsp] Cut through and buffer questions
On Thu, 18 Nov 2021 at 23:20, james list via juniper-nsp wrote:

> 1) is MX family switching by default in cut through or store and
> forward mode? I was not able to find a clear information

Store and forward.

> 2) is in general (on MX or QFX) jeopardizing the traffic the action
> to enable cut through or change buffer allocation?

I don't understand the question.

> I have some output discard on an interface (class best effort) and
> some UDP packets are lost hence I am tuning to find a solution.

I don't see how this relates to cut-through at all. Cut-through works
when ingress can start writing a frame to egress while still reading
it; this is ~never the case in multistage ingress+egress buffered
devices. And even in devices where it is the case, it only works if
the egress interface happens not to be serialising a packet at that
time, so the percentage of frames actually getting cut-through
behaviour in cut-through devices is low in typical applications;
applications where it is high could likely have been replaced by a
direct connection.

Modern multistage devices have low single-digit microseconds of
internal latency and nanoseconds of jitter. One microsecond is about
200m in fiber, so that gives you the scale of how much distance you
can reduce by reducing the delay incurred by a multistage device.

Now having said that, what actually is the problem? What are 'output
discards'; which counter are you looking at? Have you modified the QoS
configuration; can you share it? By default JNPR is 95% BE, 5% NC
(unlike Cisco, which is 100% BE, which I think is a better default),
and the buffer allocation follows the same split. So if you are
actually QoS tail-dropping in the default JNPR configuration, you're
creating massive delays, because the buffer allocation is huge, and
your problem is rather simply that you're offering too much to the
egress; the best you can do is reduce the buffer allocation to have
lower collateral damage.
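The "one microsecond is about 200m in fiber" figure falls straight out of the speed of light in glass; a quick Python check (the group index of 1.468 is an assumed typical value for single-mode fiber, not from the thread):

```python
C = 299_792_458      # m/s, speed of light in vacuum
GROUP_INDEX = 1.468  # assumed typical group index for single-mode fiber

metres_per_microsecond = C / GROUP_INDEX / 1e6
print(f"{metres_per_microsecond:.0f} m of fiber per microsecond")  # ~204 m
```

So shaving a few microseconds of device latency is equivalent to moving the endpoints a few hundred metres closer; rarely worth trading for cut-through's caveats.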
--
++ytti
[j-nsp] Cut through and buffer questions
Hi all

Questions:

1) is MX family switching by default in cut through or store and
forward mode? I was not able to find clear information

2) is in general (on MX or QFX) jeopardizing the traffic the action to
enable cut through or change buffer allocation?

I have some output discards on an interface (class best-effort) and
some UDP packets are lost, hence I am tuning to find a solution.

Thanks in advance for any hint
Cheers
James