Re: Lossy cogent p2p experiences?

2023-09-11 Thread David Hubbard
Some interesting new developments on this, independent of the divergent network 
equipment discussion. 

Cogent had a field engineer at the east coast location where my local loop 
(10gig wave) meets their equipment, i.e. (me – patch cable to loop provider’s 
wave equipment – wave – patch cable to Cogent equipment).  On the other end, 
the geographically distant west coast direction, it’s Cogent equipment to my 
equipment in the same facility with just patch cable.  They connected some 
model of EXFO’s NetBlazer FTBx 8880-series testing device to a port on their 
east coast network device, not disconnecting my circuit.  Originally, they were 
planning to have someone physically loop at their equipment at the other end, 
but I volunteered that my Arista gear supports a provider-facing loop at the 
transceiver level if they wanted to try that, so my loop, cabling, and 
transceiver could be part of the testing.

One direction at a time, they interrupted the point to point config to create a 
point to point between one direction of my gear, set to loopback mode, and the 
NetBlazer device.  The device was set to use five parallel streams.  In the 
close direction, where the third-party wave is involved, they ran at full 5 x 
2gbps for thirty minutes, had zero packets lost, no issues.  My monitoring 
confirmed this rate of port input was occurring, although oddly not output, but 
perhaps Arista doesn’t “see”/count the retransmitted packets in phy loopback 
mode.

In the distant direction across their backbone, their equipment at the remote 
end, and the fiber patch cable to me, they tested at 9.5 Gbit for thirty 
minutes through my device in loopback mode.  The result was, of 2.6B packets 
sent, only 334 were lost.  They configured the test for a 9.5 Gbps rate, so 
five 1.9 Gbps streams.  Across the five streams, the report has a “frame loss” 
and out of sequence section.  Zero out of sequence, but among the five streams, 
loss seconds / count were 3 / 26, 3 / 48, 1 / 5, 13 / 221, 1 / 34.  I’m not 
familiar with this testing device, but to me that suggests it’s reporting how 
many of the total seconds experienced loss, and the packet-loss count.  So 
really the only stream that stands out is the one with thirteen seconds where 
loss occurred, but the packet counts we’re talking about are minuscule.  Again, my 
monitoring at the interface level showed this 9.5gbps of testing occurring for 
the thirty minutes the report says.

So, now I’m just completely confused.  How is this device, traversing the same 
equipment, ports, cables, able to achieve far greater average throughput, and 
almost no loss, across a very long duration?  There are times I’ll be able to 
achieve nearly the same, but never for a test longer than ten seconds as it 
just falls off from there.  For example, I did a five parallel stream TCP test 
with iperf just now and did achieve a net throughput of 8.16 Gbps with about 
1200 retransmits.  Running the same five-stream test for a half hour like theirs, 
I got no better than 2.64 Gbps and 183,000 retransmits.
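
Roughly the kind of invocation involved, for reference (iperf3 syntax shown 
only as an illustration; the far-end address and exact flags are placeholders, 
not the exact commands used):

    # five parallel TCP streams for 30 minutes, reporting every 10 seconds
    iperf3 -c <far-end> -P 5 -t 1800 -i 10
    # same test as a short 10-second run
    iperf3 -c <far-end> -P 5 -t 10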

With iperf and UDP, I can see loss at any transmit rate exceeding ~140 Mbps 
within seconds, not a half hour.  To rule out my gear, I’m also able to 
perform the same tests from the same systems (both VM and physical) using 
public addresses and traversing the internet, as these are publicly connected 
systems.  I get far lower loss and much greater throughput on the internet 
path.  For example, a simple ten-second test of a single stream at 400 Mbit UDP: 
5 packets lost across the internet, 491 across the P2P.  Single-stream TCP across 
the internet for ten seconds: 3.47 Gbps, 162 retransmits.  Across the P2P, this 
time at least: 637 Mbps, 3633 retransmits.
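
The comparison tests are along the lines of the following, run once against a 
host over the internet path and once across the P2P (again iperf3 syntax as an 
illustration; addresses are placeholders):

    # single stream, UDP at 400 Mbit for ten seconds
    iperf3 -c <far-end> -u -b 400M -t 10
    # single stream, TCP for ten seconds
    iperf3 -c <far-end> -t 10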

David



From: David Hubbard 
Date: Friday, September 1, 2023 at 10:19 AM
To: Nanog@nanog.org 
Subject: Re: Lossy cogent p2p experiences?
The initial and recurring packet loss occurs on any flow of more than ~140 
Mbit.  The fact that it’s loss-free under that rate is what furthers my opinion 
it’s config-based somewhere, even though they say it isn’t.
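
One way to bracket that ~140 Mbit threshold, assuming an iperf3 server on the 
far side (the address and the step sizes are only illustrative), is a UDP rate 
sweep, watching where loss first appears:

    # step the offered UDP rate across the suspected ~140 Mbit boundary
    for rate in 100M 120M 140M 160M 200M 400M; do
        iperf3 -c <far-end> -u -b $rate -t 10
    done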

From: NANOG  on behalf 
of Mark Tinka 
Date: Friday, September 1, 2023 at 10:13 AM
To: Mike Hammett , Saku Ytti 
Cc: nanog@nanog.org 
Subject: Re: Lossy cogent p2p experiences?

On 9/1/23 15:44, Mike Hammett wrote:
and I would say the OP wasn't even about elephant flows, just about a network 
that can't deliver anything acceptable.

Unless Cogent are not trying to accept (and by extension, may not be able to 
guarantee) large Ethernet flows because they can't balance them across their 
various core links, end-to-end...

Pure conjecture...

Mark.


Re: Lossy cogent p2p experiences?

2023-09-10 Thread Saku Ytti
On Sat, 9 Sept 2023 at 21:36, Benny Lyne Amorsen
 wrote:

> The Linux TCP stack does not immediately start backing off when it
> encounters packet reordering. In the server world, packet-based
> round-robin is a fairly common interface bonding strategy, with the
> accompanying reordering, and generally it performs great.

If you have
Linux - 1RU cat-or-such - Router - Internet

Mostly, round-robin between the Linux host and the 1RU is going to work,
because it satisfies the requirements of a) not congested, b) equal RTT,
and c) non-distributed (a single-pipeline ASIC switch, honoring ingress
order on egress). But it is quite a special case, and of course there is
only round-robin on one link in one direction.
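
For reference, the packet-based round-robin bonding referred to here is
Linux's balance-rr mode; a minimal sketch with iproute2 (interface names
are placeholders):

    # round-robin bond across two NICs; slaves must be down before enslaving
    ip link add bond0 type bond mode balance-rr
    ip link set eth0 down
    ip link set eth1 down
    ip link set eth0 master bond0
    ip link set eth1 master bond0
    ip link set bond0 up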

Between 3.6 and 4.4 all multipath in Linux was broken, and to this day I
still help people with multipath problems who complain it doesn't perform
(in a LAN!).

3.6 introduced FIB to replace flow-cache, and made multipath essentially random
4.4 replaced random with hash

When I ask them 'do you see reordering?', people mostly reply 'no',
because they look at a PCAP and it doesn't look important to the human
observer; it is such an insignificant amount. Invariably the problem goes
away with hashing. (netstat -s is better than intuition on a PCAP.)
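
Concretely, the host-side counters that give reordering away on Linux
(exact counter names vary a little between kernel versions):

    # host-wide TCP counters: retransmissions and SACK-detected reordering
    netstat -s | egrep -i 'reorder|retrans'
    # per-connection view; look at the retrans/reordering fields
    ss -ti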


-- 
  ++ytti


Re: Lossy cogent p2p experiences?

2023-09-09 Thread Mark Tinka




On 9/9/23 22:29, Dave Cohen wrote:

At a previous $dayjob at a Tier 1, we would only support LAG for a 
customer L2/3 service if the ports were on the same card. The response 
we gave if customers pushed back was "we don't consider LAG a form of 
circuit protection, so we're not going to consider physical resiliency 
in the design", which was true, because we didn't, but it was beside 
the point. The real reason was that getting our switching/routing 
platform to actually run traffic symmetrically across a LAG, which 
most end users considered expected behavior in a LAG, required a 
reconfiguration of the default hash, which effectively meant that 
[switching/routing vendor]'s TAC wouldn't help when something 
invariably went wrong. So it wasn't that it wouldn't work (my 
recollection at least is that everything ran fine in lab environments) 
but we didn't trust the hardware vendor support.


We've had the odd bug here and there with LAG's for things like VRRP, 
BFD, etc. But we have not run into that specific issue before on 
ASR1000's, ASR9000's, CRS-X's and MX. 98% of our network is Juniper 
nowadays, but even when we ran Cisco and had LAG's across multiple line 
cards, we didn't see this problem.


The only hashing issue we had with LAG's is when we tried to carry Layer 
2 traffic across them in the core. But this was just a limitation of the 
CRS-X, and happened also on member links of a LAG that shared the same 
line card.


Mark.


Re: Lossy cogent p2p experiences?

2023-09-09 Thread Dave Cohen
At a previous $dayjob at a Tier 1, we would only support LAG for a customer
L2/3 service if the ports were on the same card. The response we gave if
customers pushed back was "we don't consider LAG a form of circuit
protection, so we're not going to consider physical resiliency in the
design", which was true, because we didn't, but it was beside the point.
The real reason was that getting our switching/routing platform to actually
run traffic symmetrically across a LAG, which most end users considered
expected behavior in a LAG, required a reconfiguration of the default hash,
which effectively meant that [switching/routing vendor]'s TAC wouldn't help
when something invariably went wrong. So it wasn't that it wouldn't work
(my recollection at least is that everything ran fine in lab environments)
but we didn't trust the hardware vendor support.

On Sat, Sep 9, 2023 at 3:36 PM Mark Tinka  wrote:

>
>
> On 9/9/23 20:44, Randy Bush wrote:
>
> > i am going to be foolish and comment, as i have not seen this raised
> >
> > if i am running a lag, i can not resist adding a bit of resilience by
> > having it spread across line cards.
> >
> > surprise!  line cards from vendor  do not have uniform hashing
> > or rotating algorithms.
>
> We spread all our LAG's across multiple line cards wherever possible
> (wherever possible = chassis-based hardware).
>
> I am not intimately aware of any hashing concerns for LAG's that
> traverse multiple line cards in the same chassis.
>
> Mark.
>


-- 
- Dave Cohen
craetd...@gmail.com
@dCoSays
www.venicesunlight.com


Re: Lossy cogent p2p experiences?

2023-09-09 Thread Mark Tinka




On 9/9/23 20:44, Randy Bush wrote:


i am going to be foolish and comment, as i have not seen this raised

if i am running a lag, i can not resist adding a bit of resilience by
having it spread across line cards.

surprise!  line cards from vendor  do not have uniform hashing
or rotating algorithms.


We spread all our LAG's across multiple line cards wherever possible 
(wherever possible = chassis-based hardware).


I am not intimately aware of any hashing concerns for LAG's that 
traverse multiple line cards in the same chassis.


Mark.


Re: Lossy cogent p2p experiences?

2023-09-09 Thread Randy Bush
i am going to be foolish and comment, as i have not seen this raised

if i am running a lag, i can not resist adding a bit of resilience by
having it spread across line cards.

surprise!  line cards from vendor  do not have uniform hashing
or rotating algorithms.

randy


Re: Lossy cogent p2p experiences?

2023-09-09 Thread Benny Lyne Amorsen
Mark Tinka  writes:

> Oh? What is it then, if it's not spraying successive packets across
> member links?

It sprays the packets more or less randomly across links, and each link
then does individual buffering. It introduces an unnecessary random
delay to each packet, when it could just place them successively on the
next link.

> Ummh, no, it won't.
>
> If it did, it would have been widespread. But it's not.

It seems optimistic to argue that we have reached perfection in
networking.

The Linux TCP stack does not immediately start backing off when it
encounters packet reordering. In the server world, packet-based
round-robin is a fairly common interface bonding strategy, with the
accompanying reordering, and generally it performs great.



Re: Lossy cogent p2p experiences?

2023-09-08 Thread Fred Baker
It was intended to detect congestion. The obvious response was in some way to 
pace the sender(s) so that it was alleviated.

Sent using a machine that autocorrects in interesting ways...

> On Sep 7, 2023, at 11:19 PM, Mark Tinka  wrote:
> 
> 
> 
>> On 9/7/23 09:51, Saku Ytti wrote:
>> 
>> Perhaps if congestion control used latency or FEC instead of loss, we
>> could tolerate reordering while not underperforming under loss, but
>> I'm sure in decades following that decision we'd learn new ways how we
>> don't understand any of this.
> 
> Isn't this partly what ECN was meant for? It's so old I barely remember what 
> it was meant to solve :-).
> 
> Mark.


Re: Lossy cogent p2p experiences?

2023-09-08 Thread Saku Ytti
On Fri, 8 Sept 2023 at 09:17, Mark Tinka  wrote:

> > Unfortunately that is not strict round-robin load balancing.
>
> Oh? What is it then, if it's not spraying successive packets across
> member links?

I believe the suggestion is that round-robin out-performs random
spray. Random spray is what the HPC world is asking for, not round-robin.
Now, I've not operated a network where per-packet is useful, so I'm
not sure why you'd want round-robin over random spray, but I can
easily see why you'd want either a) random traffic or b) random spray.
If neither is true, i.e. you have strict round-robin and non-random
traffic, say every other packet is a big data delivery and every other
packet is a small ACK, you can easily synchronise one link to 100%
util and another to near 0% if you do true round-robin, but not if
you do random spray.
I don't see a downside random spray would have over round-robin, but I
wouldn't be shocked if there is one.
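
To put rough numbers on that synchronisation case (purely illustrative):
with two member links and a strict alternation of 1500-byte data packets
and 64-byte ACKs, true round-robin puts every data packet on one link and
every ACK on the other, so the data link carries 1500 / (1500 + 64) ≈ 96%
of the bytes while the ACK link sits near 4%; random spray breaks that
phase lock and each link averages roughly half the bytes.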


I see this thread is mostly starting to loop around two debates

1) Reordering is not a problem
   - if you control the application, you can make it 0 problem
   - if you use the TCP shipping in Android, iOS, macOS, Windows, Linux,
or BSD, reordering is in practice as bad as packet loss.
   - the people on this list who know this don't know it because they read
it; they know it because they got caught with their pants down and learned
it when they had reordering and TCP performance was destroyed, even at
very low reorder rates
   - we could design a TCP congestion control that is very tolerant to
reordering, but I cannot say if it would be an overall win or loss

2) Reordering won't happen in per-packet, if there is no congestion
and latencies are equal
   - the receiving distributed router (which is ~all of them) does not have
global synchronisation; it makes no guarantee that ingress order is
honored on egress when ingress is >1 interface, and the amount of
reordering this alone causes will destroy customer expectations of
TCP performance
   - we could quite easily guarantee order as long as the interfaces are
in the same hardware complex, but it would be very difficult to guarantee
it between hardware complexes


-- 
  ++ytti


Re: Lossy cogent p2p experiences?

2023-09-08 Thread Mark Tinka




On 9/7/23 09:51, Saku Ytti wrote:


Perhaps if congestion control used latency or FEC instead of loss, we
could tolerate reordering while not underperforming under loss, but
I'm sure in decades following that decision we'd learn new ways how we
don't understand any of this.


Isn't this partly what ECN was meant for? It's so old I barely remember 
what it was meant to solve :-).


Mark.


Re: Lossy cogent p2p experiences?

2023-09-08 Thread Mark Tinka




On 9/7/23 09:31, Benny Lyne Amorsen wrote:


Unfortunately that is not strict round-robin load balancing.


Oh? What is it then, if it's not spraying successive packets across 
member links?




  I do not
know about any equipment that offers actual round-robin
load-balancing.


Cisco had both per-destination and per-packet. Is that not it in the 
networking world?




Juniper's solution will cause way too much packet reordering for TCP to
handle. I am arguing that strict round-robin load balancing will
function better than hash-based in a lot of real-world
scenarios.


Ummh, no, it won't.

If it did, it would have been widespread. But it's not.

Mark.


Re: Lossy cogent p2p experiences?

2023-09-07 Thread Masataka Ohta

Saku Ytti wrote:


And you will be wrong. Packet arriving out of order, will be
considered previous packet lost by host, and host will signal need for
resend.


As I already quote the very old and fundamental paper on
the E2E argument:

End-To-End Arguments in System Design

https://groups.csail.mit.edu/ana/Publications/PubPDFs/End-to-End%20Arguments%20in%20System%20Design.pdf
: 3.4 Guaranteeing FIFO Message Delivery

and as is described in rfc2001,

   Since TCP does not know whether a duplicate ACK is caused by a lost
   segment or just a reordering of segments, it waits for a small number
   of duplicate ACKs to be received.  It is assumed that if there is
   just a reordering of the segments, there will be only one or two
   duplicate ACKs before the reordered segment is processed, which will
   then generate a new ACK.  If three or more duplicate ACKs are
   received in a row, it is a strong indication that a segment has been
   lost.

in networking, it is well known that "Guaranteeing FIFO Message
Delivery" by the network is impossible, because packets arriving
out of order without packet loss are inevitable and not
uncommon.

As such, slight reordering is *NOT* interpreted as previous
packet loss.

The allowed amount of reordering depends on TCP implementations
and can be controlled by upgrading TCP.

Masataka Ohta



Re: Lossy cogent p2p experiences?

2023-09-07 Thread Saku Ytti
On Thu, 7 Sept 2023 at 15:45, Benny Lyne Amorsen
 wrote:

> Juniper's solution will cause way too much packet reordering for TCP to
> handle. I am arguing that strict round-robin load balancing will
> function better than hash-based in a lot of real-world
> scenarios.

And you will be wrong. Packet arriving out of order, will be
considered previous packet lost by host, and host will signal need for
resend.

-- 
  ++ytti


Re: Lossy cogent p2p experiences?

2023-09-07 Thread Masataka Ohta

Tom Beecher wrote:


Well, not exactly the same thing. (But it's my mistake, I was referring to
L3 balancing, not L2 interface stuff.)


That should be the correct thing to refer to.


load-balance per-packet will cause massive reordering,


If the buffering delay of ECMP paths cannot be controlled, yes.


because it's random
spray , caring about nothing except equal loading of the members.


Equal loading on point to point links between two routers by
(weighted) round robin means mostly same buffering delay, which
won't cause massive reordering.

Masataka Ohta



Re: Lossy cogent p2p experiences?

2023-09-07 Thread Benny Lyne Amorsen
Mark Tinka  writes:

>     set interfaces ae2 aggregated-ether-options load-balance per-packet
>
> I ran per-packet on a Juniper LAG 10 years ago. It produced 100%
> perfect traffic distribution. But the reordering was insane, and the
> applications could not tolerate it.

Unfortunately that is not strict round-robin load balancing. I do not
know about any equipment that offers actual round-robin
load-balancing.

Juniper's solution will cause way too much packet reordering for TCP to
handle. I am arguing that strict round-robin load balancing will
function better than hash-based in a lot of real-world
scenarios.



Re: Lossy cogent p2p experiences?

2023-09-07 Thread Saku Ytti
On Thu, 7 Sept 2023 at 00:00, David Bass  wrote:

> Per-packet LB is one of those ideas that are great at a conceptual level, but 
> in practice are obviously out of touch with reality.  Kind of like 
> the EIGRP protocol from Cisco and using the load, reliability, and MTU 
> metrics.

Those multi-metrics are in IS-IS as well (if you don't use wide
metrics). And I agree those are not for common cases, but I wouldn't be
shocked if someone has a legitimate MTR use-case where different
metric-type topologies are very useful. But as long as we keep the
context to the Internet, true.

100% reordering does not work for the Internet, not without changing
all end hosts. And by changing those, it's not immediately obvious how
we end up in a better place; if we wait a bit longer to signal
packet loss, we likely end up in a worse place, as reordering is just so
dang rare today, because congestion control choices have made sure no
one reorders (or customers will yell at you), yet packet loss remains
common.
Perhaps if congestion control used latency or FEC instead of loss, we
could tolerate reordering while not underperforming under loss, but
I'm sure in decades following that decision we'd learn new ways how we
don't understand any of this.

But for non-Internet applications, where you control the hosts,
per-packet is used and needed; I think HPC applications, GPU farms,
etc. are the users who asked JNPR to implement this.



-- 
  ++ytti


Re: Lossy cogent p2p experiences?

2023-09-06 Thread Masataka Ohta

Benny Lyne Amorsen wrote:


TCP looks quite different in 2023 than it did in 1998. It should handle
packet reordering quite gracefully;


Maybe and, even if it isn't, TCP may be modified. But that
is not my primary point.

ECMP, in general, means paths consist of multiple routers
and links. The links have various bandwidths, and other
traffic may be merged at multi-access links or on routers.

Then, it is hopeless for the load balancing points to
control the buffers of the routers in the paths and the delays
caused by those buffers, which makes per-packet load balancing
hopeless.

However, as I wrote to Mark Tinka;

: If you have multiple parallel links over which many slow
: TCP connections are running, which should be your assumption,

with "multiple parallel links", which are single hop
pathes, it is possible for the load balancing point
to control amount of buffer occupancy of the links
and delays caused by the buffers almost same, which
should eliminate packet reordering within a flow,
especially when " many slow TCP connections are
running".

And, simple round robin should be good enough
for most of the cases (no lab testing at all, yet).

A little more aggressive approach is to fully
share a single buffer among all the parallel links.
But as it is not compatible with router architecture
today, I did not propose that approach.

Masataka Ohta




Re: Lossy cogent p2p experiences?

2023-09-06 Thread David Bass
Per-packet LB is one of those ideas that are great at a conceptual level,
but in practice are obviously out of touch with reality.  Kind
of like the EIGRP protocol from Cisco and using the load, reliability, and
MTU metrics.

On Wed, Sep 6, 2023 at 1:13 PM Mark Tinka  wrote:

>
>
> On 9/6/23 18:52, Tom Beecher wrote:
>
> > Well, not exactly the same thing. (But it's my mistake, I was
> > referring to L3 balancing, not L2 interface stuff.)
>
> Fair enough.
>
>
> > load-balance per-packet will cause massive reordering, because it's
> > random spray , caring about nothing except equal loading of the
> > members. It's a last resort option that will cause tons of reordering.
> > (And they call that out quite clearly in docs.) If you don't care
> > about reordering it's great.
> >
> > load-balance adaptive generally did a decent enough job last time I
> > used it much.
>
> Yep, pretty much my experience too.
>
>
> > stateful was hit or miss ; sometimes it tested amazing, other times
> > not so much. But it wasn't a primary requirement so I never dove into why
>
> Never tried stateful.
>
> Moving 802.1Q trunk from N x 10Gbps LAG's to native 100Gbps links
> resolved this load balancing conundrum for us. Of course, it works well
> because we spread these router<=>switch links across several 100Gbps
> ports, so no single trunk is ever that busy, even for customers buying N
> x 10Gbps services.
>
> Mark.
>


Re: Lossy cogent p2p experiences?

2023-09-06 Thread Saku Ytti
On Wed, 6 Sept 2023 at 19:28, Mark Tinka  wrote:

> Yes, this has been my understanding of, specifically, Juniper's
> forwarding complex.

Correct, the packet is sprayed to some PPE, and PPEs do not run in
deterministic time; after the PPEs there is a reorder block that restores
flow order, if it has to.
EZchip is the same with its TOPs.

> Packets are chopped into near-same-size cells, sprayed across all
> available fabric links by the PFE logic, given a sequence number, and
> protocol engines ensure oversubscription is managed by a request-grant
> mechanism between PFE's.

This isn't the mechanism that causes reordering; it can occur in the
ingress and egress lookup, where the packet or packet head is sprayed
to some PPE.

Can find some patents on it:
https://www.freepatentsonline.com/8799909.html
When a PPE 315 has finished processing a header, it notifies a Reorder
Block 321. The Reorder Block 321 is responsible for maintaining order
for headers belonging to the same flow, and pulls a header from a PPE
315 when that header is at the front of the queue for its reorder
flow.

Note this reordering happens even when you have exactly 1 ingress
interface and exactly 1 egress interface; as long as you have enough
PPS, you will see reordering outside of flows, even without the fabric
being involved.

-- 
  ++ytti


Re: Lossy cogent p2p experiences?

2023-09-06 Thread Mark Tinka




On 9/6/23 18:52, Tom Beecher wrote:

Well, not exactly the same thing. (But it's my mistake, I was 
referring to L3 balancing, not L2 interface stuff.)


Fair enough.


load-balance per-packet will cause massive reordering, because it's 
random spray , caring about nothing except equal loading of the 
members. It's a last resort option that will cause tons of reordering. 
(And they call that out quite clearly in docs.) If you don't care 
about reordering it's great.


load-balance adaptive generally did a decent enough job last time I 
used it much.


Yep, pretty much my experience too.


stateful was hit or miss ; sometimes it tested amazing, other times 
not so much. But it wasn't a primary requirement so I never dove into why


Never tried stateful.

Moving 802.1Q trunk from N x 10Gbps LAG's to native 100Gbps links 
resolved this load balancing conundrum for us. Of course, it works well 
because we spread these router<=>switch links across several 100Gbps 
ports, so no single trunk is ever that busy, even for customers buying N 
x 10Gbps services.


Mark.


Re: Lossy cogent p2p experiences?

2023-09-06 Thread Tom Beecher
>
> Unless you specifically configure true "per-packet" on your LAG:
>

Well, not exactly the same thing. (But it's my mistake, I was referring to
L3 balancing, not L2 interface stuff.)

load-balance per-packet will cause massive reordering, because it's random
spray , caring about nothing except equal loading of the members. It's a
last resort option that will cause tons of reordering. (And they call that
out quite clearly in docs.) If you don't care about reordering it's great.

load-balance adaptive generally did a decent enough job last time I used it
much. stateful was hit or miss ; sometimes it tested amazing, other times
not so much. But it wasn't a primary requirement so I never dove into why


On Wed, Sep 6, 2023 at 12:04 PM Mark Tinka  wrote:

>
>
> On 9/6/23 17:27, Tom Beecher wrote:
>
> >
> > At least on MX, what Juniper calls 'per-packet' is really 'per-flow'.
>
> Unless you specifically configure true "per-packet" on your LAG:
>
>  set interfaces ae2 aggregated-ether-options load-balance per-packet
>
> I ran per-packet on a Juniper LAG 10 years ago. It produced 100% perfect
> traffic distribution. But the reordering was insane, and the
> applications could not tolerate it.
>
> If your applications can tolerate reordering, per-packet is fine. In the
> public Internet space, it seems we aren't there yet.
>
> Mark.
>


Re: Lossy cogent p2p experiences?

2023-09-06 Thread Mark Tinka




On 9/6/23 12:01, Saku Ytti wrote:


Fun fact about the real world, devices do not internally guarantee
order. That is, even if you have identical latency links, 0
congestion, order is not guaranteed between packet1 coming from
interfaceI1 and packet2 coming from interfaceI2, which packet first
goes to interfaceE1 is unspecified.
This is because packets inside lookup engine can be sprayed to
multiple lookup engines, and order is lost even for packets coming
from interface1 exclusively, however after the lookup the order is
restored for _flow_, it is not restored between flows, so packets
coming from interface1 with random ports won't be same order going out
from interface2.

So order is only restored inside a single lookup complex (interfaces
are not guaranteed to be in the same complex) and only for actual
flows.


Yes, this has been my understanding of, specifically, Juniper's 
forwarding complex.


Packets are chopped into near-same-size cells, sprayed across all 
available fabric links by the PFE logic, given a sequence number, and 
protocol engines ensure oversubscription is managed by a request-grant 
mechanism between PFE's.


I'm not sure what mechanisms other vendors implement, but certainly OoO 
cells in the Juniper forwarding complex is not a concern within the same 
internal system itself.


Mark.


RE: Lossy cogent p2p experiences?

2023-09-06 Thread Brian Turnbow via NANOG
> If your applications can tolerate reordering, per-packet is fine. In the public
> Internet space, it seems we aren't there yet.

Yeah, this.
During lockdown here in Italy, one day we started getting calls about 
performance issues: performance degradation, VPNs dropping or becoming unusable, 
and general randomness of "this isn't working like it used to".
All the lines checked out, no bandwidth contention, etc. The only strange thing we 
found was that all affected sessions had a lot of OOR (out-of-order) packets with a 
particular network in Italy.
With them we traced it down to traffic flowing through one IXP and found they 
had added capacity between two switches and it had been configured with per-packet 
balancing.
It was changed to flow-based balancing and everything went back to normal.

Brian


Re: Lossy cogent p2p experiences?

2023-09-06 Thread Mark Tinka




On 9/6/23 11:20, Benny Lyne Amorsen wrote:


TCP looks quite different in 2023 than it did in 1998. It should handle
packet reordering quite gracefully; in the best case the NIC will
reassemble the out-of-order TCP packets into a 64k packet and the OS
will never even know they were reordered. Unfortunately current
equipment does not seem to offer per-packet load balancing, so we cannot
test how well it works.


I ran per-packet load balancing on a Juniper LAG between 2015 - 2016. 
Let's just say I won't be doing that again.


It balanced beautifully, but OoO packets made customers' lives 
impossible. So we went back to adaptive load balancing.




It is possible that per-packet load balancing will work a lot better
today than it did in 1998, especially if the equipment does buffering
before load balancing and the links happen to be fairly short and not
very diverse.

Switching back to per-packet would solve quite a lot of problems,
including elephant flows and bad hashing.

I would love to hear about recent studies.


2016 is not 1998, and certainly not 2023... but I've not heard about any 
improvements in Internet-based applications being better at handling OoO 
packets.


Open to new info.

100Gbps ports have given us some breathing room, as have larger buffers 
on Arista switches, which let us move bandwidth management down to the 
user-facing port and not the upstream router. Clever Trio + Express chips 
have also enabled reasonably even traffic distribution with per-flow load 
balancing.


We shall revisit the per-flow vs. per-packet problem when 100Gbps starts 
to become as rampant as 10Gbps did.


Mark.


Re: Lossy cogent p2p experiences?

2023-09-06 Thread Mark Tinka




On 9/6/23 17:27, Tom Beecher wrote:



At least on MX, what Juniper calls 'per-packet' is really 'per-flow'.


Unless you specifically configure true "per-packet" on your LAG:

    set interfaces ae2 aggregated-ether-options load-balance per-packet

I ran per-packet on a Juniper LAG 10 years ago. It produced 100% perfect 
traffic distribution. But the reordering was insane, and the 
applications could not tolerate it.


If your applications can tolerate reordering, per-packet is fine. In the 
public Internet space, it seems we aren't there yet.


Mark.


Re: Lossy cogent p2p experiences?

2023-09-06 Thread Mark Tinka




On 9/6/23 16:14, Saku Ytti wrote:


For example Juniper offers true per-packet, I think mostly used in
high performance computing.


Cisco did it too with CEF supporting "ip load-sharing per-packet" at the 
interface level.


I am not sure this is still supported on modern code/boxes.

Mark.


Re: Lossy cogent p2p experiences?

2023-09-06 Thread Tom Beecher
>
> For example Juniper offers true per-packet, I think mostly used in
> high performance computing.
>

At least on MX, what Juniper calls 'per-packet' is really 'per-flow'.

On Wed, Sep 6, 2023 at 10:17 AM Saku Ytti  wrote:

> On Wed, 6 Sept 2023 at 17:10, Benny Lyne Amorsen
>  wrote:
>
> > TCP looks quite different in 2023 than it did in 1998. It should handle
> > packet reordering quite gracefully; in the best case the NIC will
>
> I think the opposite is true; TCP was designed to be order-agnostic.
> But everyone uses cubic, and for cubic reordering is the same as packet
> loss. This is a good trade-off: you need to decide if you want to
> recover fast from occasional packet loss, or if you want to be
> tolerant of reordering.
> The moment cubic receives the segment after the one it expects, it ACKs
> the last in-order segment again, signalling packet loss and causing an
> unnecessary resend and window-size reduction.
>
> > will never even know they were reordered. Unfortunately current
> > equipment does not seem to offer per-packet load balancing, so we cannot
> > test how well it works.
>
> For example Juniper offers true per-packet, I think mostly used in
> high performance computing.
>
> --
>   ++ytti
>


Re: Lossy cogent p2p experiences?

2023-09-06 Thread Saku Ytti
On Wed, 6 Sept 2023 at 17:10, Benny Lyne Amorsen
 wrote:

> TCP looks quite different in 2023 than it did in 1998. It should handle
> packet reordering quite gracefully; in the best case the NIC will

I think the opposite is true; TCP was designed to be order-agnostic.
But everyone uses cubic, and for cubic reordering is the same as packet
loss. This is a good trade-off: you need to decide if you want to
recover fast from occasional packet loss, or if you want to be
tolerant of reordering.
The moment cubic receives the segment after the one it expects, it ACKs
the last in-order segment again, signalling packet loss and causing an
unnecessary resend and window-size reduction.

> will never even know they were reordered. Unfortunately current
> equipment does not seem to offer per-packet load balancing, so we cannot
> test how well it works.

For example Juniper offers true per-packet, I think mostly used in
high performance computing.

-- 
  ++ytti


Re: Lossy cogent p2p experiences?

2023-09-06 Thread Benny Lyne Amorsen
Mark Tinka  writes:

> And just because I said per-flow load balancing has been the gold
> standard for the last 25 years, does not mean it is the best
> solution. It just means it is the gold standard.

TCP looks quite different in 2023 than it did in 1998. It should handle
packet reordering quite gracefully; in the best case the NIC will
reassemble the out-of-order TCP packets into a 64k packet and the OS
will never even know they were reordered. Unfortunately current
equipment does not seem to offer per-packet load balancing, so we cannot
test how well it works.
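
The NIC coalescing described above is typically GRO/LRO; a quick way to
check whether it is enabled on a Linux host (interface name is a
placeholder):

    # check receive-offload features on the NIC
    ethtool -k eth0 | egrep 'generic-receive-offload|large-receive-offload'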

It is possible that per-packet load balancing will work a lot better
today than it did in 1998, especially if the equipment does buffering
before load balancing and the links happen to be fairly short and not
very diverse.

Switching back to per-packet would solve quite a lot of problems,
including elephant flows and bad hashing.

I would love to hear about recent studies.


/Benny



Re: Lossy cogent p2p experiences?

2023-09-06 Thread Masataka Ohta

William Herrin wrote:


I recognize what happens in the real world, not in the lab or text books.


What's the difference between theory and practice?


W.r.t. the fact that there are so many wrong theories
and wrong practices, there is no difference.


In theory, there is no difference.


Especially because the real world includes labs and text
books and, as such, all the theories including all the wrong
ones exist in the real world.

Masataka Ohta



Re: Lossy cogent p2p experiences?

2023-09-06 Thread William Herrin
On Wed, Sep 6, 2023 at 12:23 AM Mark Tinka  wrote:
> I recognize what happens in the real world, not in the lab or text books.

What's the difference between theory and practice? In theory, there is
no difference.


-- 
William Herrin
b...@herrin.us
https://bill.herrin.us/


Re: Lossy cogent p2p experiences?

2023-09-06 Thread Masataka Ohta

Saku Ytti wrote:


Fun fact about the real world, devices do not internally guarantee
order. That is, even if you have identical latency links, 0
congestion, order is not guaranteed between packet1 coming from
interfaceI1 and packet2 coming from interfaceI2, which packet first
goes to interfaceE1 is unspecified.


So, you lack fundamental knowledge on the E2E argument fully
applicable to situations in the real world Internet.

In the very basic paper on the E2E argument published in 1984:

End-To-End Arguments in System Design

https://groups.csail.mit.edu/ana/Publications/PubPDFs/End-to-End%20Arguments%20in%20System%20Design.pdf

reordering is recognized both as the real and the theoretical
world as:

3.4 Guaranteeing FIFO Message Delivery
Ensuring that messages arrive at the receiver in the same
order in which they are sent is another function usually
assigned to the communication subsystem.

which means, according to the paper, the "function" of
reordering by the network cannot be complete or correct, and,
unlike you, I'm fully aware of it.

> This is because packets inside lookup engine can be sprayed to
> multiple lookup engines, and order is lost even for packets coming
> from interface1 exclusively, however after the lookup the order is
> restored for _flow_, it is not restored between flows, so packets
> coming from interface1 with random ports won't be same order going out
> from interface2.

That is a broken argument for how identification of flows by
intelligent intermediate entities could work, against the E2E
argument and the reality that initiated this thread.

In the real world, according to the E2E argument, attempts to identify
flows by intelligent intermediate entities are just harmful from the
beginning, which is why flow-driven architecture, including that of
MPLS, is broken and hopeless.

I really hope you understand the meaning of "intelligent intermediate
entities" in the context of the E2E argument.

Masataka Ohta



Re: Lossy cogent p2p experiences?

2023-09-06 Thread Saku Ytti
On Wed, 6 Sept 2023 at 10:27, Mark Tinka  wrote:

> I recognize what happens in the real world, not in the lab or text books.

Fun fact about the real world, devices do not internally guarantee
order. That is, even if you have identical latency links, 0
congestion, order is not guaranteed between packet1 coming from
interfaceI1 and packet2 coming from interfaceI2, which packet first
goes to interfaceE1 is unspecified.
This is because packets inside lookup engine can be sprayed to
multiple lookup engines, and order is lost even for packets coming
from interface1 exclusively, however after the lookup the order is
restored for _flow_, it is not restored between flows, so packets
coming from interface1 with random ports won't be same order going out
from interface2.

So order is only restored inside a single lookup complex (interfaces
are not guaranteed to be in the same complex) and only for actual
flows.

It is designed this way, because no one runs networks which rely on
order outside these parameters, and no one even knows their kit works
like this, because they don't have to.



-- 
  ++ytti


Re: Lossy cogent p2p experiences?

2023-09-06 Thread Mark Tinka




On 9/6/23 09:12, Masataka Ohta wrote:



you now recognize that per-flow load balancing is not a very
good idea.


You keep moving the goal posts. Stay on-topic.

I was asking you to clarify your post as to whether you were speaking of 
per-flow or per-packet load balancing. You did not do that, but I did 
not return to that question because your subsequent posts implied that 
you were talking about per-packet load balancing.


And just because I said per-flow load balancing has been the gold 
standard for the last 25 years, does not mean it is the best solution. 
It just means it is the gold standard.


I recognize what happens in the real world, not in the lab or text books.

Mark.


Re: Lossy cogent p2p experiences?

2023-09-06 Thread Masataka Ohta

Mark Tinka wrote:


Are you saying you thought a 100G Ethernet link actually consisting
of 4 parallel 25G links, which is an example of "equal speed multi
parallel point to point links", were relying on hashing?


No...


So, though you wrote:

>> If you have multiple parallel links over which many slow
>> TCP connections are running, which should be your assumption,
>> the proper thing to do is to use the links with round robin
>> fashion without hashing. Without buffer bloat, packet
>> reordering probability within each TCP connection is
>> negligible.
>
> So you mean, what... per-packet load balancing, in lieu of per-flow
> load balancing?

you now recognize that per-flow load balancing is not a very
good idea.

Good.


you are saying that.


See above to find my statement of "without hashing".

Masataka Ohta



Re: Lossy cogent p2p experiences?

2023-09-06 Thread Mark Tinka




On 9/4/23 13:27, Nick Hilliard wrote:



this is an excellent example of what we're not talking about in this 
thread.


It is amusing how he tried to pivot the discussion. Nobody was talking 
about how lane transport in optical modules works.


Mark.


Re: Lossy cogent p2p experiences?

2023-09-06 Thread Mark Tinka




On 9/4/23 13:04, Masataka Ohta wrote:


Are you saying you thought a 100G Ethernet link actually consisting
of 4 parallel 25G links, which is an example of "equal speed multi
parallel point to point links", were relying on hashing?


No... you are saying that.

Mark.


Re: Lossy cogent p2p experiences?

2023-09-05 Thread Tom Beecher
>
> Cogent support has been about as bad as you can get.  Everything is great,
> clean your fiber, iperf isn’t a good test, install a physical loop oh wait
> we don’t want that so go pull it back off, new updates come at three to
> seven day intervals, etc.  If the performance had never been good to begin
> with I’d have just attributed this to their circuits, but since it worked
> until late June, I know something has changed.  I’m hoping someone else has
> run into this and maybe knows of some hints I could give them to
> investigate.  To me it sounds like there’s a rate limiter / policer defined
> somewhere in the circuit, or an overloaded interface/device we’re forced to
> traverse, but they assure me this is not the case and claim to have
> destroyed and rebuilt the logical circuit.
>

Sure smells like port buffer issues somewhere in the middle. ( mismatched
deep / shallow, or something configured to support jumbo frames, but
buffers not optimized for them)

On Thu, Aug 31, 2023 at 11:57 AM David Hubbard <
dhubb...@dino.hostasaurus.com> wrote:

> Hi all, curious if anyone who has used Cogent as a point to point provider
> has gone through packet loss issues with them and were able to successfully
> resolve?  I’ve got a non-rate-limited 10gig circuit between two geographic
> locations that have about 52ms of latency.  Mine is set up to support both
> jumbo frames and vlan tagging.  I do know Cogent packetizes these circuits,
> so they’re not like waves, and that the expected single session TCP
> performance may be limited to a few gbit/sec, but I should otherwise be
> able to fully utilize the circuit given enough flows.
>
>
>
> Circuit went live earlier this year, had zero issues with it.  Testing
> with common tools like iperf would allow several gbit/sec of TCP traffic
> using single flows, even without an optimized TCP stack.  Using parallel
> flows or UDP we could easily get close to wire speed.  Starting about ten
> weeks ago we had a significant slowdown, to even complete failure, of
> bursty data replication tasks between equipment that was using this
> circuit.  Rounds of testing demonstrate that new flows often experience
> significant initial packet loss of several thousand packets, and will then
> have ongoing lesser packet loss every five to ten seconds after that.
> There are times we can’t do better than 50 Mbit/sec, but it’s rare to
> achieve gigabit most of the time unless we do a bunch of streams with a lot
> of tuning.  UDP we also see the loss, but can still push many gigabits
> through with one sender, or wire speed with several nodes.
>
>
>
> For equipment which doesn’t use a tunable TCP stack, such as storage
> arrays or vmware, the retransmits completely ruin performance or may result
> in ongoing failure we can’t overcome.
>
>
>
> Cogent support has been about as bad as you can get.  Everything is great,
> clean your fiber, iperf isn’t a good test, install a physical loop oh wait
> we don’t want that so go pull it back off, new updates come at three to
> seven day intervals, etc.  If the performance had never been good to begin
> with I’d have just attributed this to their circuits, but since it worked
> until late June, I know something has changed.  I’m hoping someone else has
> run into this and maybe knows of some hints I could give them to
> investigate.  To me it sounds like there’s a rate limiter / policer defined
> somewhere in the circuit, or an overloaded interface/device we’re forced to
> traverse, but they assure me this is not the case and claim to have
> destroyed and rebuilt the logical circuit.
>
>
>
> Thanks!
>


Re: Lossy cogent p2p experiences?

2023-09-05 Thread Masataka Ohta

Nick Hilliard wrote:


Are you saying you thought a 100G Ethernet link actually consisting
of 4 parallel 25G links, which is an example of "equal speed multi
parallel point to point links", were relying on hashing?


this is an excellent example of what we're not talking about in this 
thread.


Not "we", but "you".

A 100G serdes is an unbuffered mechanism which includes a PLL, and this 
allows the style of clock/signal synchronisation required for the 
deserialised 4x25G lanes to be reserialised at the far end.  This is one 
of the mechanisms used for packet / cell / bit spray, and it works 
really well.


That's why I mentioned round robin, instead of a fully shared buffer,
as the proper solution for this case.

This thread is talking about buffered transmission links on routers / 
switches on systems which provide no clocking synchronisation and not 
even a guarantee that the bearer circuits have comparable latencies. 
ECMP / hash based load balancing is a crock, no doubt about it;


See the first three lines of this mail to find that I explicitly
mentioned "equal speed multi parallel point to point links" as the
context for round robin.

As I already told you:

: In theory, you can always fabricate unrealistic counter examples
: against theories by ignoring essential assumptions of the theories.

you keep ignoring essential assumptions for no good purpose.

Masataka Ohta



Re: Lossy cogent p2p experiences?

2023-09-04 Thread Masataka Ohta

William Herrin wrote:


Well it doesn't show up in long slow pipes because the low
transmission speed spaces out the packets,


Wrong. That is a phenomenon with slow access and fast backbone,
which has nothing to do with this thread.

If the backbone is as slow as the access, no "spacing out" is
possible.


and it doesn't show up in
short fat pipes because there's not enough delay to cause the
burstiness.


A short pipe means the burst speed shows up continuously,
without interruption.

> So I don't know how you figure it has nothing to do with
> long fat pipes,

That's your problem.

Masataka Ohta


Re: Lossy cogent p2p experiences?

2023-09-04 Thread William Herrin
On Mon, Sep 4, 2023 at 7:07 AM Masataka Ohta
 wrote:
> William Herrin wrote:
> > So, I've actually studied this in real-world conditions and TCP
> > behaves exactly as I described in my previous email for exactly the
> > reasons I explained.
>
> Yes of course, which is my point. Your problem is that your
> point of slow start has nothing to do with long fat pipe.

Well it doesn't show up in long slow pipes because the low
transmission speed spaces out the packets, and it doesn't show up in
short fat pipes because there's not enough delay to cause the
burstiness. So I don't know how you figure it has nothing to do with
long fat pipes, but you're plain wrong.

Regards,
Bill Herrin


-- 
William Herrin
b...@herrin.us
https://bill.herrin.us/


Re: Lossy cogent p2p experiences?

2023-09-04 Thread Masataka Ohta

William Herrin wrote:


No, not at all. First, though you explain slow start,
it has nothing to do with long fat pipe. Long fat
pipe problem is addressed by window scaling (and SACK).


So, I've actually studied this in real-world conditions and TCP
behaves exactly as I described in my previous email for exactly the
reasons I explained.


Yes of course, which is my point. Your problem is that your
point of slow start has nothing to do with long fat pipe.

> Window scaling and SACK makes it possible for TCP to grow to consume
> the entire whole end-to-end pipe when the pipe is at least as large as
> the originating interface and -empty- of other traffic.

Totally wrong.

Unless the pipe is long and fat, a plain TCP without window scaling
or SACK will grow to consume the entire end-to-end pipe when
the pipe is at least as large as the originating interface and
-empty- of other traffic.

> Those
> conditions are rarely found in the real world.

It is usual that TCP consumes all the available bandwidth.

Exceptions, not so rare in the real world, are plain TCPs over
long fat pipes.

Masataka Ohta




Re: Lossy cogent p2p experiences?

2023-09-04 Thread William Herrin
On Mon, Sep 4, 2023 at 12:13 AM Masataka Ohta
 wrote:
> William Herrin wrote:
> > That sounds like normal TCP behavior over a long fat pipe.
>
> No, not at all. First, though you explain slow start,
> it has nothing to do with long fat pipe. Long fat
> pipe problem is addressed by window scaling (and SACK).

So, I've actually studied this in real-world conditions and TCP
behaves exactly as I described in my previous email for exactly the
reasons I explained. If you think it doesn't, you don't know what
you're talking about.

Window scaling and SACK make it possible for TCP to grow to consume
the entire end-to-end pipe when the pipe is at least as large as
the originating interface and -empty- of other traffic. Those
conditions are rarely found in the real world.
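
As a worked example with the numbers from this thread (52 ms RTT, 10 Gb/s),
the bandwidth-delay product a single flow must keep in flight is about
0.052 s × 10 Gb/s ≈ 520 Mbit, roughly 65 MB, so window scaling and large
socket buffers are a prerequisite. On Linux the relevant knobs can be
checked with something like:

    # window scaling and SACK should both be 1 (on)
    sysctl net.ipv4.tcp_window_scaling net.ipv4.tcp_sack
    # auto-tuned socket buffer maximums (third value) must cover the ~65 MB BDP
    sysctl net.ipv4.tcp_rmem net.ipv4.tcp_wmem net.core.rmem_max net.core.wmem_max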

Regards,
Bill Herrin


-- 
William Herrin
b...@herrin.us
https://bill.herrin.us/


Re: Lossy cogent p2p experiences?

2023-09-04 Thread Nick Hilliard

Masataka Ohta wrote on 04/09/2023 12:04:

Are you saying you thought a 100G Ethernet link actually consisting
of 4 parallel 25G links, which is an example of "equal speed multi
parallel point to point links", were relying on hashing?


this is an excellent example of what we're not talking about in this thread.

A 100G serdes is an unbuffered mechanism which includes a PLL, and this 
allows the style of clock/signal synchronisation required for the 
deserialised 4x25G lanes to be reserialised at the far end.  This is one 
of the mechanisms used for packet / cell / bit spray, and it works 
really well.


This thread is talking about buffered transmission links on routers / 
switches on systems which provide no clocking synchronisation and not 
even a guarantee that the bearer circuits have comparable latencies. 
ECMP / hash based load balancing is a crock, no doubt about it; it's 
just less crocked than other approaches where there are no guarantees 
about device and bearer circuit behaviour.


Nick


Re: Lossy cogent p2p experiences?

2023-09-04 Thread Masataka Ohta

Mark Tinka wrote:


ECMP, surely, is too abstract a concept to properly manage/operate
simple situations with equal-speed, multiple parallel point-to-point links.


I must have been doing something wrong for the last 25 years.


Are you saying you thought a 100G Ethernet link actually consisting
of 4 parallel 25G links, which is an example of "equal speed multi
parallel point to point links", were relying on hashing?

Masataka Ohta



Re: Lossy cogent p2p experiences?

2023-09-04 Thread Masataka Ohta

William Herrin wrote:


Hi David,

That sounds like normal TCP behavior over a long fat pipe.


No, not at all. First, though you explain slow start,
it has nothing to do with long fat pipe. Long fat
pipe problem is addressed by window scaling (and SACK).

As David Hubbard wrote:

: I've got a non-rate-limited 10gig circuit

and

: The initial and recurring packet loss occurs on any flow of
: more than ~140 Mbit.

the problem is caused not by wire speed limitation of a "fat"
pipe but by artificial policing at 140M.

Masataka Ohta



Re: Lossy cogent p2p experiences?

2023-09-04 Thread Masataka Ohta

Nick Hilliard wrote:


In this case, "Without buffer bloat" is an essential assumption.


I can see how this conclusion could potentially be reached in
specific styles of lab configs,


I'm not interested in how poorly you configure your
lab.


but the real world is more complicated and


And, this thread was initiated because of unreasonable
behavior apparently caused by stupid attempts at
automatic flow detection followed by policing.

That is the real world.

Moreover, it has been well known both in theory and
practice that flow driven architecture relying on
automatic detection of flows does not scale and is
no good, though MPLS relies on the broken flow
driven architecture.

> Generally in real world situations on the internet, packet reordering
> will happen if you use round robin, and this will impact performance
> for higher speed flows.

That is my point already stated by me. You don't have to repeat
it again.

> It's true that per-hash load
> balancing is a nuisance, but it works better in practice on larger
> heterogeneous networks than RR.

Here, you implicitly assume a large number of slower-speed flows,
against your statement of "higher speed flows".

Masataka Ohta



Re: Lossy cogent p2p experiences?

2023-09-03 Thread Nick Hilliard

Masataka Ohta wrote on 03/09/2023 14:32:

See, for example, the famous paper of "Sizing Router Buffers".

With thousands of TCP connections at the backbone recognized
by the paper, buffers with thousands of packets won't cause
packet reordering.

What you said reminds me of the old saying: in theory, there's no 
difference between theory and practice, but in practice there is.


In theory, you can always fabricate unrealistic counter examples
against theories by ignoring essential assumptions of the theories.

In this case, "Without buffer bloat" is an essential assumption.


I can see how this conclusion could potentially be reached in specific 
styles of lab configs, but the real world is more complicated and the 
assumptions you've made don't hold there, especially the implicit ones. 
Buffer bloat will make this problem worse, but small buffers won't 
eliminate the problem.


That isn't to say that packet / cell spray arrangements can't work. 
There are some situations where they can work reasonably well, given 
specific constraints, e.g. limited distance transmission path and path 
congruence with far-side reassembly (!), but these are the exception. 
Usually this only happens inside network devices rather than between 
devices, but occasionally you see products on the market which support 
this between devices with varying degrees of success.


Generally in real world situations on the internet, packet reordering 
will happen if you use round robin, and this will impact performance for 
higher speed flows. There are several reasons for this, but mostly they 
boil down to a lack of control over the exact profile of the packets 
that the devices are expected to transmit, and no guarantee that the 
individual bearer channels have identical transmission characteristics. 
Then multiply that across the N load-balanced hops that each flow will 
take between source and destination.  It's true that per-hash load 
balancing is a nuisance, but it works better in practice on larger 
heterogeneous networks than RR.


Nick



Re: Lossy cogent p2p experiences?

2023-09-03 Thread William Herrin
On Thu, Aug 31, 2023 at 2:42 PM David Hubbard
 wrote:
> any new TCP flow is subject to numerous dropped packets at establishment and 
> then ongoing loss every five to ten seconds.

Hi David,

That sounds like normal TCP behavior over a long fat pipe. After
establishment, TCP sends a burst of 10 packets at wire speed. There's
a long delay and then they basically get acked all at once so it sends
another burst of 20 packets this time. This doubling burst repeats
itself until one of the bursts  overwhelms the buffers of a mid-path
device, causing one or a bunch of them to be lost. That kicks it out
of "slow start" so that it stops trying to double the window size
every time. Depending on how aggressive your congestion control
algorithm is, it then slightly increases the window size until it
loses packets, and then falls back to a smaller size.

It actually takes quite a while for the packets to spread out over the
whole round trip time. They like to stay bunched up in bursts. If
those bursts align with other users' traffic and overwhelm a midpoint
buffer again, well, there you go.

I have a hypothesis that TCP performance could be improved by
intentionally spreading out the early packets. Essentially, upon
receiving an ack to the first packet that contained data, start a rate
limiter that allows only one packet per 1/20th of the round trip time
to be sent for the next 20 packets. I left the job where I was looking
at that and haven't been back to it.
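
A minimal sketch of that idea (purely hypothetical, not something from
any shipping stack; send_segment and the RTT estimate are stand-ins):

import time

def paced_slow_start(send_segment, segments, rtt_estimate_s, n_paced=20):
    """Hypothetical pacer: once the first data segment has been acked,
    spread the next n_paced segments over one RTT (one per RTT/20)
    instead of letting them leave as back-to-back bursts."""
    interval = rtt_estimate_s / n_paced
    for i, seg in enumerate(segments):
        send_segment(seg)
        if i < n_paced:
            time.sleep(interval)   # crude; a real stack would use timer-
                                   # based pacing, not a blocking sleep
    # after the paced phase, sending falls back to normal ACK clocking

Pacing of this general kind is what mechanisms like the Linux fq qdisc
and BBR's pacing rate aim at, though not limited to the first 20
packets.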

Regards,
Bill Herrin


-- 
William Herrin
b...@herrin.us
https://bill.herrin.us/


Re: Lossy cogent p2p experiences?

2023-09-03 Thread Mark Tinka




On 9/3/23 15:01, Masataka Ohta wrote:


Why, do you think, you can rely on existence of flows?


You have not quite answered my question - but I will assume you are in 
favour of per-packet load balancing.


I have deployed per-packet load balancing before, ironically, trying to 
deal with large EoMPLS flows in a LAG more than a decade ago. I won't be 
doing that again... OoO packets are nasty at scale.




And nothing beyond, of course.


No serious operator polices in the core.



ECMP, surely, is too abstract a concept to properly manage/operate
simple situations with multiple parallel, equal-speed point-to-point links.


I must have been doing something wrong for the last 25 years.

Mark.



Re: Lossy cogent p2p experiences?

2023-09-03 Thread Masataka Ohta

Nick Hilliard wrote:


the proper thing to do is to use the links in round-robin fashion,
without hashing. Without buffer bloat, the probability of packet
reordering within each TCP connection is negligible.


Can you provide some real world data to back this position up?


See, for example, the famous paper of "Sizing Router Buffers".

With thousands of TCP connections at the backbone recognized
by the paper, buffers with thousands of packets won't cause
packet reordering.

What you said reminds me of the old saying: in theory, there's no 
difference between theory and practice, but in practice there is.


In theory, you can always fabricate unrealistic counter examples
against theories by ignoring essential assumptions of the theories.

In this case, "Without buffer bloat" is an essential assumption.

Masataka Ohta



Re: Lossy cogent p2p experiences?

2023-09-03 Thread Masataka Ohta

Mark Tinka wrote:

So you mean, what... per-packet load balancing, in lieu of per-flow load 
balancing?


Why, do you think, you can rely on existence of flows?


So, if you internally have 10 parallel 1G circuits expecting
perfect hashing over them, it is not "non-rate-limited 10gig".


It is understood in the operator space that "rate limiting" generally 
refers to policing at the edge/access.


And nothing beyond, of course.

The core is always abstracted, and that is just capacity planning and 
management by the operator.


ECMP, surely, is too abstract a concept to properly manage/operate
simple situations with multiple parallel, equal-speed point-to-point links.

Masataka Ohta



Re: Lossy cogent p2p experiences?

2023-09-03 Thread Nick Hilliard

Masataka Ohta wrote on 03/09/2023 08:59:

the proper thing to do is to use the links in round-robin fashion,
without hashing. Without buffer bloat, the probability of packet
reordering within each TCP connection is negligible.


Can you provide some real world data to back this position up?

What you said reminds me of the old saying: in theory, there's no 
difference between theory and practice, but in practice there is.


Nick


Re: Lossy cogent p2p experiences?

2023-09-03 Thread Mark Tinka




On 9/3/23 09:59, Masataka Ohta wrote:



If you have multiple parallel links over which many slow
TCP connections are running, which should be your assumption,
the proper thing to do is to use the links in round-robin fashion,
without hashing. Without buffer bloat, the probability of packet
reordering within each TCP connection is negligible.


So you mean, what... per-packet load balancing, in lieu of per-flow load 
balancing?





So, if you internally have 10 parallel 1G circuits expecting
perfect hashing over them, it is not "non-rate-limited 10gig".


It is understood in the operator space that "rate limiting" generally 
refers to policing at the edge/access.


The core is always abstracted, and that is just capacity planning and 
management by the operator.


Mark.


Re: Lossy cogent p2p experiences?

2023-09-03 Thread Masataka Ohta

Mark Tinka wrote:


Wrong. It can be performed only at the edges by policing total
incoming traffic without detecting flows.


I am not talking about policing in the core, I am talking about 
detection in the core.


I'm not talking about detection at all.

Policing at the edge is pretty standard. You can police a 50Gbps EoMPLS 
flow coming in from a customer port in the edge. If you've got N x 
10Gbps links in the core and the core is unable to detect that flow in 
depth to hash it across all those 10Gbps links, you can end up putting 
all or a good chunk of that 50Gbps of EoMPLS traffic into a single 
10Gbps link in the core, despite all other 10Gbps links having ample 
capacity available.


Relying on hash is a poor way to offer wide bandwidth.

If you have multiple parallel links over which many slow
TCP connections are running, which should be your assumption,
the proper thing to do is to use the links in round-robin fashion,
without hashing. Without buffer bloat, the probability of packet
reordering within each TCP connection is negligible.

Faster TCP may suffer from packet reordering during slight
congestion, but the effect is like that of RED.
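
A toy simulation of the two positions in this sub-thread (entirely
made-up numbers; the 1.2 us spacing assumes roughly 1500-byte packets
at 10 Gbps, and the jitter values stand in for shallow versus bloated
queues):

import random

def simulate_rr(num_links=4, packets=20000, spacing_us=1.2,
                queue_jitter_us=1.0, seed=1):
    """Toy model: one flow sprayed round-robin over 'equal' links whose
    only difference is per-packet queueing jitter. Count how often the
    receiver sees a packet arrive behind a later-numbered packet."""
    random.seed(seed)
    arrivals = []
    for seq in range(packets):
        link = seq % num_links                  # round robin, no hashing
        send_t = seq * spacing_us               # ~1500-byte packets at 10 Gbps
        # Same base delay on every link; queueing jitter differs per packet.
        delay = 100.0 + random.uniform(0, queue_jitter_us)
        arrivals.append((send_t + delay, seq, link))
    arrivals.sort()                             # order seen by the receiver
    return sum(1 for a, b in zip(arrivals, arrivals[1:]) if b[1] < a[1])

# Shallow, un-bloated queues: jitter smaller than the packet spacing,
# so round robin barely reorders anything (the claim above).
print(simulate_rr(queue_jitter_us=1.0))
# Deep or congested queues on some members: reordering everywhere,
# closer to what is described for real heterogeneous paths.
print(simulate_rr(queue_jitter_us=500.0))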

Anyway, in this case, the situation is:

:Moreover, as David Hubbard wrote:
:> I've got a non-rate-limited 10gig circuit

So, if you internally have 10 parallel 1G circuits expecting
perfect hashing over them, it is not "non-rate-limited 10gig".

Masataka Ohta



Re: Lossy cogent p2p experiences?

2023-09-02 Thread Nick Hilliard

Masataka Ohta wrote on 02/09/2023 16:04:

100 50Mbps flows are as harmful as 1 5Gbps flow.


This is quite an unusual opinion. Maybe you could explain?

Nick


Re: Lossy cogent p2p experiences?

2023-09-02 Thread Mark Tinka




On 9/2/23 17:38, Masataka Ohta wrote:


Wrong. It can be performed only at the edges by policing total
incoming traffic without detecting flows.


I am not talking about policing in the core, I am talking about 
detection in the core.


Policing at the edge is pretty standard. You can police a 50Gbps EoMPLS 
flow coming in from a customer port in the edge. If you've got N x 
10Gbps links in the core and the core is unable to detect that flow in 
depth to hash it across all those 10Gbps links, you can end up putting 
all or a good chunk of that 50Gbps of EoMPLS traffic into a single 
10Gbps link in the core, despite all other 10Gbps links having ample 
capacity available.




There are no such algorithms because, as I wrote:

: 100 50Mbps flows are as harmful as 1 5Gbps flow.


Do you operate a large scale IP/MPLS network? Because I do, and I know 
what I see with the equipment we deploy.


You are welcome to deny it all you want, however. Not much I can do 
about that.


Mark.


Re: Lossy cogent p2p experiences?

2023-09-02 Thread Masataka Ohta

Mark Tinka wrote:

it is the 
core's ability to balance the Layer 2 payload across multiple links 
effectively.


Wrong. It can be performed only at the edges by policing total
incoming traffic without detecting flows.


While some vendors have implemented adaptive load balancing algorithms


There are no such algorithms because, as I wrote:

: 100 50Mbps flows are as harmful as 1 5Gbps flow.

Masataka Ohta


Re: Lossy cogent p2p experiences?

2023-09-02 Thread Mark Tinka




On 9/2/23 17:04, Masataka Ohta wrote:


Both of you are totally wrong, because the proper thing to do
here is to police, if *ANY*, based on total traffic without
detecting any flow.


I don't think it's as much an issue of flow detection as it is the 
core's ability to balance the Layer 2 payload across multiple links 
effectively.


At our shop, we understand the limitations of trying to carry large 
EoMPLS flows across an IP/MPLS network that is, primarily, built to 
carry IP traffic.


While some vendors have implemented adaptive load balancing algorithms 
on decent (if not custom) silicon that can balance EoMPLS flows as well 
as they can IP flows, it is hit & miss depending on the code, hardware, 
vendor, etc.


In our case, our ability to load balance EoMPLS flows as well as we do 
IP flows has improved since we moved to the PTX1000/10001 for our core 
routers. But even then, we will not sell anything above 40Gbps as an 
EoMPLS service. Once it gets there, time for EoDWDM. At least, until 
800Gbps or 1Tbps Ethernet ports become both technically viable and 
commercially feasible.


For as long as core links are based on 100Gbps and 400Gbps ports, 
optical carriage for 40Gbps and above is more sensible than EoMPLS.


Mark.


Re: Lossy cogent p2p experiences?

2023-09-02 Thread Masataka Ohta

Mark Tinka wrote:


On 9/1/23 15:59, Mike Hammett wrote:


I wouldn't call 50 megabit/s an elephant flow


Fair point.


Both of you are totally wrong, because the proper thing to do
here is to police, if *ANY*, based on total traffic without
detecting any flow.

100 50Mbps flows are as harmful as 1 5Gbps flow.

Moreover, as David Hubbard wrote:

> I’ve got a non-rate-limited 10gig circuit

there is no point in policing.

Detection of elephant flows was wrongly considered useful in
flow-driven architectures, to automatically bypass L3 processing for
those flows, back when L3 processing capability was wrongly considered
limited.

Then, the topology-driven architecture of MPLS appeared, even though
topology-driven is really flow-driven (you can't impose the inner MPLS
labels without detailed routing information at the destinations, which
is hidden from the source by route aggregation, so it has to be
obtained on demand after detecting flows).

Masataka Ohta



Re: Lossy cogent p2p experiences?

2023-09-02 Thread Mark Tinka




On 9/2/23 08:43, Saku Ytti wrote:


What in particular are you missing?
As I explained, PTX/MX both allow for example speculating on transit
pseudowires having CW on them. Which is non-default and requires
'zero-control-word'. You should be looking at 'hash-key' on PTX and
'enhanced-hash-key' on MX.  You don't appear to have a single stanza
configured, but I do wonder what you wanted to configure when you
noticed the missing ability to do so.


Sorry for the confusion - let me provide some background context since 
we deployed the PTX ages ago (and core nodes are typically boring).


The issue we ran into was to do with our deployment tooling, which was
based on the 'enhanced-hash-key' configuration that is required for MPCs
on the MX.


The tooling used to deploy the PTX was largely built on what we use to
deploy the MX, with tweaks for the critically different items. At the
time, we did not know that the PTX required 'hash-key' as opposed to
'enhanced-hash-key', so nothing got deployed on the PTX specifically for
load balancing (we might have assumed it to be a non-existent or
incomplete feature at the time).


So the "surprise" I speak of is how well it all worked with load 
balancing across LAGs and EoMPLS traffic compared to the CRS-X, despite 
not having any load balancing features explicitly configured, which is 
still the case today.


It works, so we aren't keen to break it.

Mark.


Re: Lossy cogent p2p experiences?

2023-09-02 Thread Saku Ytti
On Fri, 1 Sept 2023 at 22:56, Mark Tinka  wrote:

> PTX1000/10001 (Express) offers no real configurable options for load
> balancing the same way MX (Trio) does. This is what took us by surprise.

What in particular are you missing?

As I explained, PTX/MX both allow for example speculating on transit
pseudowires having CW on them. Which is non-default and requires
'zero-control-word'. You should be looking at 'hash-key' on PTX and
'enhanced-hash-key' on MX.  You don't appear to have a single stanza
configured, but I do wonder what you wanted to configure when you
noticed the missing ability to do so.

-- 
  ++ytti


Re: Lossy cogent p2p experiences?

2023-09-01 Thread Mark Tinka



On 9/1/23 21:52, Mike Hammett wrote:

It doesn't help the OP at all, but this is why (thus far, anyway), I 
overwhelmingly prefer wavelength transport to anything switched. Can't 
have over-subscription or congestion issues on a wavelength.


Large IP/MPLS operators insist on optical transport for their own 
backbone, but are more than willing to sell packet for transport. I find 
this amusing :-).


I submit that customers who can't afford large links (1Gbps or below) 
are forced into EoMPLS transport due to cost.


Other customers are also forced into EoMPLS transport because there is 
no other option for long haul transport in their city other than a 
provider who can only offer EoMPLS.


There is a struggling trend from some medium sized operators looking to 
turn an optical network into a packet network, i.e., they will ask for a 
100Gbps EoDWDM port, but only seek to pay for a 25Gbps service. The 
large port is to allow them to scale in the future without too much 
hassle, but they want to pay for the bandwidth they use, which is hard 
to limit anyway if it's a proper EoDWDM channel. I am swatting such 
requests away because you tie up a full 100Gbps channel on the line side 
for the majority of hardware that does pure EoDWDM, which is a 
contradiction to the reason a packet network makes sense for sub-rate 
services.


Mark.

Re: Lossy cogent p2p experiences?

2023-09-01 Thread Mark Tinka




On 9/1/23 15:55, Saku Ytti wrote:


Personally I would recommend turning off LSR payload heuristics,
because there is no accurate way for an LSR to tell what the label is
carrying, and a wrong guess, while rare, will be extremely hard to
root-cause, because you will never hear about it: the person suffering
from it is too many hops away for the problem to be on your horizon.
I strongly believe the edge imposing entropy or FAT labels is the right
way to give the LSR hashing hints.


PTX1000/10001 (Express) offers no real configurable options for load 
balancing the same way MX (Trio) does. This is what took us by surprise.


This is all we have on our PTX:

tinka@router# show forwarding-options
family inet6 {
    route-accounting;
}
load-balance-label-capability;

[edit]
tinka@router#

Mark.


Re: Lossy cogent p2p experiences?

2023-09-01 Thread Mike Hammett
It doesn't help the OP at all, but this is why (thus far, anyway), I 
overwhelmingly prefer wavelength transport to anything switched. Can't have 
over-subscription or congestion issues on a wavelength. 




- 
Mike Hammett 
Intelligent Computing Solutions 
http://www.ics-il.com 

Midwest-IX 
http://www.midwest-ix.com 

- Original Message -

From: "David Hubbard"  
To: "Nanog@nanog.org"  
Sent: Thursday, August 31, 2023 10:55:19 AM 
Subject: Lossy cogent p2p experiences? 



Hi all, curious if anyone who has used Cogent as a point to point provider has 
gone through packet loss issues with them and were able to successfully 
resolve? I’ve got a non-rate-limited 10gig circuit between two geographic 
locations that have about 52ms of latency. Mine is set up to support both jumbo 
frames and vlan tagging. I do know Cogent packetizes these circuits, so they’re 
not like waves, and that the expected single session TCP performance may be 
limited to a few gbit/sec, but I should otherwise be able to fully utilize the 
circuit given enough flows. 

Circuit went live earlier this year, had zero issues with it. Testing with 
common tools like iperf would allow several gbit/sec of TCP traffic using 
single flows, even without an optimized TCP stack. Using parallel flows or UDP 
we could easily get close to wire speed. Starting about ten weeks ago we had a 
significant slowdown, to even complete failure, of bursty data replication 
tasks between equipment that was using this circuit. Rounds of testing 
demonstrate that new flows often experience significant initial packet loss of 
several thousand packets, and will then have ongoing lesser packet loss every 
five to ten seconds after that. There are times we can’t do better than 50 
Mbit/sec, but it’s rare to achieve gigabit most of the time unless we do a 
bunch of streams with a lot of tuning. UDP we also see the loss, but can still 
push many gigabits through with one sender, or wire speed with several nodes. 

For equipment which doesn’t use a tunable TCP stack, such as storage arrays or 
vmware, the retransmits completely ruin performance or may result in ongoing 
failure we can’t overcome. 

Cogent support has been about as bad as you can get. Everything is great, clean 
your fiber, iperf isn’t a good test, install a physical loop oh wait we don’t 
want that so go pull it back off, new updates come at three to seven day 
intervals, etc. If the performance had never been good to begin with I’d have 
just attributed this to their circuits, but since it worked until late June, I 
know something has changed. I’m hoping someone else has run into this and maybe 
knows of some hints I could give them to investigate. To me it sounds like 
there’s a rate limiter / policer defined somewhere in the circuit, or an 
overloaded interface/device we’re forced to traverse, but they assure me this 
is not the case and claim to have destroyed and rebuilt the logical circuit. 

Thanks! 


Re: Lossy cogent p2p experiences?

2023-09-01 Thread Mark Tinka



On 9/1/23 15:59, Mike Hammett wrote:


I wouldn't call 50 megabit/s an elephant flow


Fair point.

Mark.

RE: Lossy cogent p2p experiences?

2023-09-01 Thread Tony Wicks
Yes, adaptive load balancing very much helps, but the weakness is that it is 
normally only fully supported on vendor silicon, not merchant silicon. Much of 
the transport edge is merchant silicon, due to the per-packet cost being far 
lower and the general requirement to just pass, not manipulate, packets. Using 
the Nokia kit as an example, the 7750 does a great job of 
"adaptive-load-balancing" but the 7250 is lacklustre at best.

-Original Message-
From: NANOG  On Behalf Of Saku Ytti
Sent: Friday, September 1, 2023 8:51 PM
To: Eric Kuhnke 
Cc: nanog@nanog.org
Subject: Re: Lossy cogent p2p experiences?

Luckily there is quite a reasonable solution to the problem, called 'adaptive 
load balancing', where software monitors balancing, and biases the hash_result 
=> egress_interface tables to improve balancing when dealing with elephant 
flows.




Re: Lossy cogent p2p experiences?

2023-09-01 Thread Saku Ytti
On Fri, 1 Sept 2023 at 18:37, Lukas Tribus  wrote:

> On the other hand, a workaround at the edge, at least for EoMPLS, would
> be to enable control-word.

Juniper LSR can actually do heuristics on pseudowires with CW.

-- 
  ++ytti


Re: Lossy cogent p2p experiences?

2023-09-01 Thread Lukas Tribus
On Fri, 1 Sept 2023 at 15:55, Saku Ytti  wrote:
>
> On Fri, 1 Sept 2023 at 16:46, Mark Tinka  wrote:
>
> > Yes, this was our conclusion as well after moving our core to PTX1000/10001.
>
> Personally I would recommend turning off LSR payload heuristics,
> because there is no accurate way for LSR to tell what the label is
> carrying, and wrong guess while rare will be extremely hard to root
> cause, because you will never hear it, because the person suffering
> from it is too many hops away from problem being in your horizon.
> I strongly believe edge imposing entropy or fat is the right way to
> give LSR hashing hints.

If you need to load-balance labelled IP traffic though, all your edge
devices would have to impose entropy/fat.

On the other hand, a workaround at the edge, at least for EoMPLS, would
be to enable control-word.


Lukas


Re: Lossy cogent p2p experiences?

2023-09-01 Thread David Hubbard
The initial and recurring packet loss occurs on any flow of more than ~140 
Mbit.  The fact that it’s loss-free under that rate is what furthers my opinion 
it’s config-based somewhere, even though they say it isn’t.

From: NANOG  on behalf 
of Mark Tinka 
Date: Friday, September 1, 2023 at 10:13 AM
To: Mike Hammett , Saku Ytti 
Cc: nanog@nanog.org 
Subject: Re: Lossy cogent p2p experiences?

On 9/1/23 15:44, Mike Hammett wrote:
and I would say the OP wasn't even about elephant flows, just about a network 
that can't deliver anything acceptable.

Unless Cogent are not trying to accept (and by extension, may not be able to 
guarantee) large Ethernet flows because they can't balance them across their 
various core links, end-to-end...

Pure conjecture...

Mark.


Re: Lossy cogent p2p experiences?

2023-09-01 Thread Mark Tinka



On 9/1/23 15:44, Mike Hammett wrote:
and I would say the OP wasn't even about elephant flows, just about a 
network that can't deliver anything acceptable.


Unless Cogent are not trying to accept (and by extension, may not be 
able to guarantee) large Ethernet flows because they can't balance them 
across their various core links, end-to-end...


Pure conjecture...

Mark.

Re: Lossy cogent p2p experiences?

2023-09-01 Thread Mike Hammett
I wouldn't call 50 megabit/s an elephant flow 




- 
Mike Hammett 
Intelligent Computing Solutions 
http://www.ics-il.com 

Midwest-IX 
http://www.midwest-ix.com 

- Original Message -

From: "Mark Tinka"  
To: "Mike Hammett" , "Saku Ytti"  
Cc: nanog@nanog.org 
Sent: Friday, September 1, 2023 8:56:03 AM 
Subject: Re: Lossy cogent p2p experiences? 




On 9/1/23 15:44, Mike Hammett wrote: 



and I would say the OP wasn't even about elephant flows, just about a network 
that can't deliver anything acceptable. 



Unless Cogent are not trying to accept (and by extension, may not be able to 
guarantee) large Ethernet flows because they can't balance them across their 
various core links, end-to-end... 

Pure conjecture... 

Mark. 



Re: Lossy cogent p2p experiences?

2023-09-01 Thread Saku Ytti
On Fri, 1 Sept 2023 at 16:46, Mark Tinka  wrote:

> Yes, this was our conclusion as well after moving our core to PTX1000/10001.

Personally I would recommend turning off LSR payload heuristics,
because there is no accurate way for an LSR to tell what the label is
carrying, and a wrong guess, while rare, will be extremely hard to
root-cause, because you will never hear about it: the person suffering
from it is too many hops away for the problem to be on your horizon.
I strongly believe the edge imposing entropy or FAT labels is the right
way to give the LSR hashing hints.
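
For reference, a minimal sketch of what edge-imposed entropy/FAT looks
like (label values, the hash, and the stack layout are illustrative
only, loosely in the spirit of RFC 6391 flow labels, and are not any
particular vendor's implementation):

import hashlib

def fat_flow_label(src, dst, sport, dport, proto):
    """Ingress PE sketch: derive a 20-bit flow label from the customer
    flow so that core LSRs can balance on the label stack alone instead
    of guessing at what the pseudowire payload is."""
    key = f"{src}|{dst}|{sport}|{dport}|{proto}".encode()
    return int(hashlib.sha256(key).hexdigest(), 16) % (1 << 20)

def impose_stack(transport_label, pw_label, flow_label):
    # Illustrative FAT-style stack, top to bottom: transport LSP label,
    # pseudowire label, flow label at the bottom of the stack.
    return [transport_label, pw_label, flow_label]

print(impose_stack(299776, 16001,
                   fat_flow_label("192.0.2.1", "198.51.100.9", 40000, 443, 6)))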


-- 
  ++ytti


Re: Lossy cogent p2p experiences?

2023-09-01 Thread Mark Tinka




On 9/1/23 15:29, Saku Ytti wrote:


PTX and MX as LSR look inside pseudowire to see if it's IP (dangerous
guess to make for LSR), CSR/ASR9k does not. So PTX and MX LSR will
balance your pseudowire even without FAT.


Yes, this was our conclusion as well after moving our core to PTX1000/10001.

Mark.


Re: Lossy cogent p2p experiences?

2023-09-01 Thread Mike Hammett
and I would say the OP wasn't even about elephant flows, just about a network 
that can't deliver anything acceptable. 




- 
Mike Hammett 
Intelligent Computing Solutions 
http://www.ics-il.com 

Midwest-IX 
http://www.midwest-ix.com 

- Original Message -

From: "Saku Ytti"  
To: "Mark Tinka"  
Cc: nanog@nanog.org 
Sent: Friday, September 1, 2023 8:29:12 AM 
Subject: Re: Lossy cogent p2p experiences? 

On Fri, 1 Sept 2023 at 14:54, Mark Tinka  wrote: 

> When we switched our P devices to PTX1000 and PTX10001, we've had 
> surprisingly good performance of all manner of traffic across native 
> IP/MPLS and 802.1AX links, even without explicitly configuring FAT for 
> EoMPLS traffic. 

PTX and MX as LSR look inside pseudowire to see if it's IP (dangerous 
guess to make for LSR), CSR/ASR9k does not. So PTX and MX LSR will 
balance your pseudowire even without FAT. I've had no problem having 
ASR9k LSR balancing FAT PWs. 

However this is a bit of a sidebar, because the original problem is 
about elephant flows, which FAT does not help with. But adaptive 
balancing does. 


-- 
++ytti 



Re: Lossy cogent p2p experiences?

2023-09-01 Thread Saku Ytti
On Fri, 1 Sept 2023 at 14:54, Mark Tinka  wrote:

> When we switched our P devices to PTX1000 and PTX10001, we've had
> surprisingly good performance of all manner of traffic across native
> IP/MPLS and 802.1AX links, even without explicitly configuring FAT for
> EoMPLS traffic.

PTX and MX as LSR look inside pseudowire to see if it's IP (dangerous
guess to make for LSR), CSR/ASR9k does not. So PTX and MX LSR will
balance your pseudowire even without FAT. I've had no problem having
ASR9k LSR balancing FAT PWs.

However this is a bit of a sidebar, because the original problem is
about elephant flows, which FAT does not help with. But adaptive
balancing does.


-- 
  ++ytti


Re: Lossy cogent p2p experiences?

2023-09-01 Thread Mark Tinka




On 9/1/23 10:50, Saku Ytti wrote:


It is a very plausible theory, and everyone has this problem to a
lesser or greater degree. There was a time when edge interfaces were
much lower capacity than backbone interfaces, but I don't think that
time will ever come back. So this problem is systemic.
Luckily there is quite a reasonable solution to the problem, called
'adaptive load balancing', where software monitors balancing, and
biases the hash_result => egress_interface tables to improve balancing
when dealing with elephant flows.


We didn't have much success with FAT when the PE was an MX480 and the P 
a CRS-X (FP40 + FP140 line cards). This was regardless of whether the 
core links were native IP/MPLS or 802.1AX.


When we switched our P devices to PTX1000 and PTX10001, we've had 
surprisingly good performance of all manner of traffic across native 
IP/MPLS and 802.1AX links, even without explicitly configuring FAT for 
EoMPLS traffic.


Of course, our policy is to never transport EoMPLS services in excess of 
40Gbps. Once a customer requires 41Gbps of EoMPLS service or more, we 
move them to EoDWDM. Cheaper and more scalable that way. It does help 
that we operate both a Transport and IP/MPLS network, but I understand 
this may not be the case for most networks.


Mark.


Re: Lossy cogent p2p experiences?

2023-09-01 Thread Saku Ytti
On Thu, 31 Aug 2023 at 23:56, Eric Kuhnke  wrote:

> The best working theory that several people I know in the neteng community 
> have come up with is because Cogent does not want to adversely impact all 
> other customers on their router in some sites, where the site's upstreams and 
> links to neighboring POPs are implemented as something like 4 x 10 Gbps. In 
> places where they have not upgraded that specific router to a full 100 Gbps 
> upstream. Moving large flows >2Gbps could result in flat topping a traffic 
> chart on just 1 of those 10Gbps circuits.

It is a very plausible theory, and everyone has this problem to a
lesser or greater degree. There was a time when edge interfaces were
much lower capacity than backbone interfaces, but I don't think that
time will ever come back. So this problem is systemic.
Luckily there is quite a reasonable solution to the problem, called
'adaptive load balancing', where software monitors balancing, and
biases the hash_result => egress_interface tables to improve balancing
when dealing with elephant flows.
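
A toy sketch of that control loop (bucket count, member names, and the
rebalance policy are invented; real implementations live in vendor
software/microcode and are considerably more involved):

from collections import defaultdict

class AdaptiveBalancer:
    """Toy adaptive load balancer: a hash result indexes a bucket table
    that maps to egress members, and a monitor periodically remaps the
    hottest bucket away from the busiest member."""

    def __init__(self, members, buckets=256):
        self.members = members
        # Buckets start out striped evenly across the members.
        self.bucket_to_member = [members[i % len(members)]
                                 for i in range(buckets)]
        self.bucket_bytes = defaultdict(int)

    def egress_for(self, hash_result, packet_len):
        bucket = hash_result % len(self.bucket_to_member)
        self.bucket_bytes[bucket] += packet_len   # accounting for the monitor
        return self.bucket_to_member[bucket]

    def rebalance(self):
        """One step of the feedback loop. A real implementation damps
        this to limit churn, since every remap can briefly reorder the
        flows that happen to share the moved bucket."""
        load = defaultdict(int)
        for bucket, nbytes in self.bucket_bytes.items():
            load[self.bucket_to_member[bucket]] += nbytes
        if not load:
            return
        busiest = max(load, key=load.get)
        quietest = min(self.members, key=lambda m: load.get(m, 0))
        if busiest == quietest:
            return
        hot = max((b for b in self.bucket_bytes
                   if self.bucket_to_member[b] == busiest),
                  key=lambda b: self.bucket_bytes[b])
        self.bucket_to_member[hot] = quietest     # bias the table
        self.bucket_bytes.clear()                 # start a new interval

lag = AdaptiveBalancer(["xe-0/0/0", "xe-0/0/1", "xe-0/0/2", "xe-0/0/3"])
for _ in range(1000):                             # one elephant flow
    lag.egress_for(hash_result=0xdeadbeef, packet_len=1500)
lag.rebalance()   # its bucket is remapped to the least-loaded member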

-- 
  ++ytti


Re: Lossy cogent p2p experiences?

2023-08-31 Thread David Hubbard
That’s not what I’m trying to do, that’s just what I’m using during testing to 
demonstrate the loss to them.  It’s intended to bridge a number of networks 
with hundreds of flows, including inbound internet sources, but any new TCP 
flow is subject to numerous dropped packets at establishment and then ongoing 
loss every five to ten seconds.  The initial loss and ongoing bursts of loss 
cause the TCP window to shrink so much that any single flow, between systems 
that can’t be optimized, ends up varying from 50 Mbit/sec to something far 
short of a gigabit.  It was also fine for six months before this miserable 
behavior began in late June.


From: Eric Kuhnke 
Date: Thursday, August 31, 2023 at 4:51 PM
To: David Hubbard 
Cc: Nanog@nanog.org 
Subject: Re: Lossy cogent p2p experiences?
Cogent has asked many people NOT to purchase their ethernet private circuit 
point to point service unless they can guarantee that you won't move any single 
flow of greater than 2 Gbps. This works fine as long as the service is used 
mostly for mixed IP traffic like a bunch of randomly mixed customers together.

What you are trying to do is probably against the guidelines their engineering 
group has given them for what they can sell now.

This is a known weird limitation with Cogent's private circuit service.

The best working theory that several people I know in the neteng community have 
come up with is because Cogent does not want to adversely impact all other 
customers on their router in some sites, where the site's upstreams and links 
to neighboring POPs are implemented as something like 4 x 10 Gbps. In places 
where they have not upgraded that specific router to a full 100 Gbps upstream. 
Moving large flows >2Gbps could result in flat topping a traffic chart on just 
1 of those 10Gbps circuits.



On Thu, Aug 31, 2023 at 10:04 AM David Hubbard
dhubb...@dino.hostasaurus.com wrote:
Hi all, curious if anyone who has used Cogent as a point to point provider has 
gone through packet loss issues with them and were able to successfully 
resolve?  I’ve got a non-rate-limited 10gig circuit between two geographic 
locations that have about 52ms of latency.  Mine is set up to support both 
jumbo frames and vlan tagging.  I do know Cogent packetizes these circuits, so 
they’re not like waves, and that the expected single session TCP performance 
may be limited to a few gbit/sec, but I should otherwise be able to fully 
utilize the circuit given enough flows.

Circuit went live earlier this year, had zero issues with it.  Testing with 
common tools like iperf would allow several gbit/sec of TCP traffic using 
single flows, even without an optimized TCP stack.  Using parallel flows or UDP 
we could easily get close to wire speed.  Starting about ten weeks ago we had a 
significant slowdown, to even complete failure, of bursty data replication 
tasks between equipment that was using this circuit.  Rounds of testing 
demonstrate that new flows often experience significant initial packet loss of 
several thousand packets, and will then have ongoing lesser packet loss every 
five to ten seconds after that.  There are times we can’t do better than 50 
Mbit/sec, but it’s rare to achieve gigabit most of the time unless we do a 
bunch of streams with a lot of tuning.  UDP we also see the loss, but can still 
push many gigabits through with one sender, or wire speed with several nodes.

For equipment which doesn’t use a tunable TCP stack, such as storage arrays or 
vmware, the retransmits completely ruin performance or may result in ongoing 
failure we can’t overcome.

Cogent support has been about as bad as you can get.  Everything is great, 
clean your fiber, iperf isn’t a good test, install a physical loop oh wait we 
don’t want that so go pull it back off, new updates come at three to seven day 
intervals, etc.  If the performance had never been good to begin with I’d have 
just attributed this to their circuits, but since it worked until late June, I 
know something has changed.  I’m hoping someone else has run into this and 
maybe knows of some hints I could give them to investigate.  To me it sounds 
like there’s a rate limiter / policer defined somewhere in the circuit, or an 
overloaded interface/device we’re forced to traverse, but they assure me this 
is not the case and claim to have destroyed and rebuilt the logical circuit.

Thanks!


Re: Lossy cogent p2p experiences?

2023-08-31 Thread Eric Kuhnke
Cogent has asked many people NOT to purchase their ethernet private circuit
point to point service unless they can guarantee that you won't move any
single flow of greater than 2 Gbps. This works fine as long as the service
is used mostly for mixed IP traffic like a bunch of randomly mixed
customers together.

What you are trying to do is probably against the guidelines their
engineering group has given them for what they can sell now.

This is a known weird limitation with Cogent's private circuit service.

The best working theory that several people I know in the neteng community
have come up with is because Cogent does not want to adversely impact all
other customers on their router in some sites, where the site's upstreams
and links to neighboring POPs are implemented as something like 4 x 10
Gbps. In places where they have not upgraded that specific router to a full
100 Gbps upstream. Moving large flows >2Gbps could result in flat topping a
traffic chart on just 1 of those 10Gbps circuits.



On Thu, Aug 31, 2023 at 10:04 AM David Hubbard <
dhubb...@dino.hostasaurus.com> wrote:

> Hi all, curious if anyone who has used Cogent as a point to point provider
> has gone through packet loss issues with them and were able to successfully
> resolve?  I’ve got a non-rate-limited 10gig circuit between two geographic
> locations that have about 52ms of latency.  Mine is set up to support both
> jumbo frames and vlan tagging.  I do know Cogent packetizes these circuits,
> so they’re not like waves, and that the expected single session TCP
> performance may be limited to a few gbit/sec, but I should otherwise be
> able to fully utilize the circuit given enough flows.
>
>
>
> Circuit went live earlier this year, had zero issues with it.  Testing
> with common tools like iperf would allow several gbit/sec of TCP traffic
> using single flows, even without an optimized TCP stack.  Using parallel
> flows or UDP we could easily get close to wire speed.  Starting about ten
> weeks ago we had a significant slowdown, to even complete failure, of
> bursty data replication tasks between equipment that was using this
> circuit.  Rounds of testing demonstrate that new flows often experience
> significant initial packet loss of several thousand packets, and will then
> have ongoing lesser packet loss every five to ten seconds after that.
> There are times we can’t do better than 50 Mbit/sec, but it’s rare to
> achieve gigabit most of the time unless we do a bunch of streams with a lot
> of tuning.  UDP we also see the loss, but can still push many gigabits
> through with one sender, or wire speed with several nodes.
>
>
>
> For equipment which doesn’t use a tunable TCP stack, such as storage
> arrays or vmware, the retransmits completely ruin performance or may result
> in ongoing failure we can’t overcome.
>
>
>
> Cogent support has been about as bad as you can get.  Everything is great,
> clean your fiber, iperf isn’t a good test, install a physical loop oh wait
> we don’t want that so go pull it back off, new updates come at three to
> seven day intervals, etc.  If the performance had never been good to begin
> with I’d have just attributed this to their circuits, but since it worked
> until late June, I know something has changed.  I’m hoping someone else has
> run into this and maybe knows of some hints I could give them to
> investigate.  To me it sounds like there’s a rate limiter / policer defined
> somewhere in the circuit, or an overloaded interface/device we’re forced to
> traverse, but they assure me this is not the case and claim to have
> destroyed and rebuilt the logical circuit.
>
>
>
> Thanks!
>


Lossy cogent p2p experiences?

2023-08-31 Thread David Hubbard
Hi all, curious if anyone who has used Cogent as a point to point provider has 
gone through packet loss issues with them and were able to successfully 
resolve?  I’ve got a non-rate-limited 10gig circuit between two geographic 
locations that have about 52ms of latency.  Mine is set up to support both 
jumbo frames and vlan tagging.  I do know Cogent packetizes these circuits, so 
they’re not like waves, and that the expected single session TCP performance 
may be limited to a few gbit/sec, but I should otherwise be able to fully 
utilize the circuit given enough flows.

Circuit went live earlier this year, had zero issues with it.  Testing with 
common tools like iperf would allow several gbit/sec of TCP traffic using 
single flows, even without an optimized TCP stack.  Using parallel flows or UDP 
we could easily get close to wire speed.  Starting about ten weeks ago we had a 
significant slowdown, to even complete failure, of bursty data replication 
tasks between equipment that was using this circuit.  Rounds of testing 
demonstrate that new flows often experience significant initial packet loss of 
several thousand packets, and will then have ongoing lesser packet loss every 
five to ten seconds after that.  There are times we can’t do better than 50 
Mbit/sec, but it’s rare to achieve gigabit most of the time unless we do a 
bunch of streams with a lot of tuning.  UDP we also see the loss, but can still 
push many gigabits through with one sender, or wire speed with several nodes.

For equipment which doesn’t use a tunable TCP stack, such as storage arrays or 
vmware, the retransmits completely ruin performance or may result in ongoing 
failure we can’t overcome.

Cogent support has been about as bad as you can get.  Everything is great, 
clean your fiber, iperf isn’t a good test, install a physical loop oh wait we 
don’t want that so go pull it back off, new updates come at three to seven day 
intervals, etc.  If the performance had never been good to begin with I’d have 
just attributed this to their circuits, but since it worked until late June, I 
know something has changed.  I’m hoping someone else has run into this and 
maybe knows of some hints I could give them to investigate.  To me it sounds 
like there’s a rate limiter / policer defined somewhere in the circuit, or an 
overloaded interface/device we’re forced to traverse, but they assure me this 
is not the case and claim to have destroyed and rebuilt the logical circuit.

Thanks!