I understood that.  I just don't see the benefit.

We have a host. It is assembling data to send. It is doing so progressively. It can either send in nice-sized pieces (9K? 64K?) as it has the data, and everything flows so that the receiver can process the data in pieces.

Or it can wait until it has a VERY large amount of data to send. Get a small I/O benefit in shipping all that data out (having spent latency collecting the information). And then have some router upstream spend cycles, power, etc., breaking that back down into reasonable pieces? Why?

Note that if you really want just the host I/O benefit, work with the TCP (or presumably QUIC) offload folks so that the host sends data in whatever size it likes to the outboard engine, and gets a continuous stream of bytes / blocks from the outboard engine when receiving. No changes to the transport protocol. No changes to IP. No adaptation layer. No router trying to break a "parcel" apart.

Yours,
Joel

On 3/24/2022 4:25 PM, Templin (US), Fred L wrote:
Joel, what you may be missing is that we are introducing a new layer in
the Internet architecture known as the Adaptation Layer - the layer that
logically resides between L3 and L2. Remember AAL5? It is kind of like that,
except over heterogeneous Internetworks instead of over a switched fabric
with 53B cells.

So yes, the data is sent in pieces but the pieces are broken down progressively
to smaller pieces through the layers. But, the core routers will see no changes
while the end systems will see the benefits of more efficient packaging through
the use of parcels.

Fred

-----Original Message-----
From: Joel M. Halpern [mailto:j...@joelhalpern.com]
Sent: Thursday, March 24, 2022 12:41 PM
To: Templin (US), Fred L <fred.l.temp...@boeing.com>
Cc: int-area <int-area@ietf.org>
Subject: [EXTERNAL] Re: [Int-area] IP Parcels improves performance for end systems

EXT email: be mindful of links/attachments.



I will observe that if one is sending a very large set of data, one
needs to assemble that very large set of data.  I have trouble
constructing a situation in which it is better to spend all the time
assembling it, and then start sending data once it is all assembled.
Send it in pieces.  I suppose that there are a few corner cases where
all the data is in memory to send for other reasons.  And the receiver
wants to get it all in memory (rather than needing to store or process
it a piece at a time.)  Although most of those divide the data and send
it to different places, so as to parallelize the computation.

You claimed that if we had Terabit IP we would have terabit link MTUs.
Since we have 64K IP, and do not have 64K link MTUs, I think we need
real evidence.

Yours,
Joel

On 3/24/2022 3:25 PM, Templin (US), Fred L wrote:
Joel, I can demonstrate today (and have documented) that some ULPs see dramatic
increases in performance proportional to the ULP segment sizes they use. This is
true when the ULP segments are encapsulated and fragmented, and so must also be
true when they can be sent over the wire in one piece over large-MTU links and
paths. This was known even back in the day when NFS was run over UDP and saw
dramatic performance gains from boosting the NFS ULP block size.

Your argument seems to be one of "let's just accept the status quo and never
mind how we got here". What I am saying is that we can fix things to work the
way they should have all along, and without having to do a forklift upgrade on
the entire Internet. The OMNI service can be deployed on existing networking
gear to make the virtual link extend from the core out as far toward the edges
as possible, making that entire expanse parcel-capable. And then, large-MTU
parcel-capable links can begin to proliferate in the edges at a pace that
suits them.

BTW, coincidentally, my professional career got started in 1983 also. 
Admittedly,
I did not get into network driver and NIC architecture support until 1986.

Fred

-----Original Message-----
From: Joel M. Halpern [mailto:j...@joelhalpern.com]
Sent: Thursday, March 24, 2022 12:11 PM
To: Templin (US), Fred L <fred.l.temp...@boeing.com>
Cc: int-area <int-area@ietf.org>
Subject: [EXTERNAL] Re: [Int-area] IP Parcels improves performance for end systems

EXT email: be mindful of links/attachments.



I do remember token ring.  (I was working from 1983 for folks who
delivered 50 megabits starting in 1976, and built some of the best FDDI
around at the time.)

I am not claiming that increasing the MTU from 1500 to 9K did nothing.
I am claiming that diminishing returns has distinctly set in.
If the Data Center folks (who tend these days to have the highest
demand) really want a 64K link, they would have one.  They don't.  They
prefer to use Ethernet.
The improvement from increasing the MTU further runs into many obstacles,
including such issues as error detection code coverage, application-desired
communication size, retransmission costs, and on and on.
Yes, they can all be overcome.   But the returns get smaller and smaller.

So absent real evidence that there is a problem needing the network
stack and protocol to change, I just don't see this (IP Parcels) as
providing enough benefit to justify the work.

Yours,
Joel

On 3/24/2022 3:05 PM, Templin (US), Fred L wrote:
Hi Joel,

-----Original Message-----
From: Joel M. Halpern [mailto:j...@joelhalpern.com]
Sent: Thursday, March 24, 2022 11:41 AM
To: Templin (US), Fred L <fred.l.temp...@boeing.com>
Cc: int-area <int-area@ietf.org>
Subject: Re: [Int-area] IP Parcels improves performance for end systems

This exchange seems to assume facts not in evidence.

It is a fact that back in the 1980's the architects took simple token ring,
changed the over-the-wire coding to 4B/5B, replaced the copper with
fiber and then boosted the MTU by a factor of 3 and called it FDDI. They
were able to claim what at the time was an astounding 100Mbps (i.e., in
comparison to the 10Mbps Ethernet of the day), but the performance
gain was largely due to the increase in the MTU. They told me: "Fred,
go figure out the path MTU problem", and they said: "go talk to Jeff
Mogul out in Palo Alto who knows something about it". But, then, the
Path MTU discovery group took a left turn at Albuquerque and left the
Internet as a tiny MTU wasteland. We have the opportunity to fix all
of that now - so, let's get it right for once.

Fred



And the whole premise is spending resources in other parts of the
network for a marginal diminishing return in the hosts.

It simply does not add up.

Yours,
Joel

On 3/24/2022 2:19 PM, Templin (US), Fred L wrote:
The category 1) links are not yet in existence, but once parcels start to
enter the mainstream, innovation will drive the creation of new kinds of
data links (1Tb Ethernet?) that will be rolled out as new hardware.

I want to put a gold star next to the above. AFAICT, pushing the MTU and
implementing IP parcels can get us to 1Tb Ethernet practically overnight.
Back in the 1980's, FDDI proved that pushing to larger MTUs could boost
throughput without changing the speed of light, so why wouldn't the same
concept work for Ethernet in the modern era?

Fred

-----Original Message-----
From: Int-area [mailto:int-area-boun...@ietf.org] On Behalf Of Templin (US), Fred L
Sent: Thursday, March 24, 2022 9:45 AM
To: Tom Herbert <t...@herbertland.com>
Cc: int-area <int-area@ietf.org>; Eggert, Lars <l...@netapp.com>; l...@eggert.org
Subject: Re: [Int-area] IP Parcels improves performance for end systems

Hi Tom - responses below:

-----Original Message-----
From: Tom Herbert [mailto:t...@herbertland.com]
Sent: Thursday, March 24, 2022 9:09 AM
To: Templin (US), Fred L <fred.l.temp...@boeing.com>
Cc: Eggert, Lars <l...@netapp.com>; int-area <int-area@ietf.org>; l...@eggert.org
Subject: Re: [Int-area] IP Parcels improves performance for end systems

On Thu, Mar 24, 2022 at 7:27 AM Templin (US), Fred L
<fred.l.temp...@boeing.com> wrote:

Tom - see below:

-----Original Message-----
From: Tom Herbert [mailto:t...@herbertland.com]
Sent: Thursday, March 24, 2022 6:22 AM
To: Templin (US), Fred L <fred.l.temp...@boeing.com>
Cc: Eggert, Lars <l...@netapp.com>; int-area <int-area@ietf.org>; l...@eggert.org
Subject: Re: [Int-area] IP Parcels improves performance for end systems

On Wed, Mar 23, 2022 at 10:47 AM Templin (US), Fred L
<fred.l.temp...@boeing.com> wrote:

Tom, looks like you have switched over to HTML, which can be a real
conversation-killer.

But, to some points you raised that require a response:

You can't turn off UDP checksums for IPv6 (except for the narrow case of
encapsulation).

That sounds like a good reason to continue to use IPv4 – at least as far as
end system addressing is concerned – right?


Not at all. All NICs today provide checksum offload and so it's
basically zero cost to perform the UDP checksum. The fact that we
don't have to do extra checks on the UDPv6 checksum field to see if
it's zero actually is a performance improvement over UDPv4. (BTW, I
will present an implementation of the Internet checksum at TSVWG on
Friday; this will include discussion of checksum offloads.)
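For concreteness, the per-packet checksum these offload engines compute is the RFC 1071 ones'-complement Internet checksum. A minimal software sketch (illustration only, not how a NIC pipelines it):

```python
def internet_checksum(data: bytes) -> int:
    """RFC 1071 ones'-complement Internet checksum of a byte string."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry fold
    return ~total & 0xFFFF

# RFC 1071's worked example: words 0001 f203 f4f5 f6f7 -> checksum 220d
assert internet_checksum(bytes.fromhex("0001f203f4f5f6f7")) == 0x220D
```

A handy property: checksumming the data with its own checksum appended yields zero, which is how receivers (or offload engines) verify in one pass.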

Actually, my assertion wasn't good to begin with because for IPv6 even if UDP
checksums are turned off the OMNI encapsulation layer includes a checksum
that ensures the integrity of the IPv6 header. UDP checksums off for IPv6 when
OMNI encapsulation is used is perfectly fine.

I assume you are referring to RFC6935 and RFC6936, which allow the UDPv6
checksum to be zero for tunneling under a very constrained set of conditions.

If it's a standard per-packet Internet checksum then a lot of HW could do it.
If it's something like CRC32 then probably not.

The integrity check is covered in RFC5327, and I honestly haven't had a chance
to look at that myself yet.

LTP is a nice experiment, but I'm more interested in the interaction between
IP parcels and TCP or QUIC.

Please be aware that while LTP may seem obscure at the moment, that may be
changing now that the core DTN standards have been published. As DTN use
becomes more widespread I think we can see LTP also come into wider adoption.


My assumption is that IP parcels is intended to be a general solution
for all protocols. Maybe in the next draft you could discuss the
details of TCP in IP parcels, including how to offload the TCP
checksum.

I could certainly add that. For TCP, each of the concatenated segments would
include its own TCP header with checksum field included. Any hardware that
knows the structure of an IP Parcel can then simply do the TCP checksum
offload function for each segment.
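As a rough model of that per-segment approach (hypothetical layout -- the draft defines the actual wire format), an engine that knows the segment boundaries simply runs one independent RFC 1071 checksum per concatenated segment rather than a single checksum spanning the whole parcel payload:

```python
def csum16(data: bytes) -> int:
    """RFC 1071 ones'-complement sum folded to 16 bits, then inverted."""
    if len(data) % 2:
        data += b"\x00"
    s = 0
    for i in range(0, len(data), 2):
        s += (data[i] << 8) | data[i + 1]
        s = (s & 0xFFFF) + (s >> 16)
    return ~s & 0xFFFF

def per_segment_checksums(segments):
    """Offload model for a parcel: each enclosed segment carries its own
    checksum field, so compute one checksum per segment independently."""
    return [csum16(seg) for seg in segments]
```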

To be honest, the odds of ever getting support in NIC hardware for IP
parcels are extremely slim. Hardware vendors are driven by economics,
so the only way they would do that would be to demonstrate widespread
deployment of the protocol. But even then, with all the legacy
hardware in deployment it will take many years before there's any
appreciable traction. IMO, the better approach is to figure out how to
leverage the existing hardware features for use with IP parcels.

There will be two kinds of links that will need to be "Parcel-capable":
1) Edge network (physical) links that natively forward large parcels, and
2) OMNI (virtual) links that forward parcels using encapsulation and
fragmentation.

The category 1) links are not yet in existence, but once parcels start to
enter the mainstream, innovation will drive the creation of new kinds of
data links (1Tb Ethernet?) that will be rolled out as new hardware. And
that new hardware can be made to understand the structure of parcels
from the beginning. The category 2) links might take a large parcel from
the upper layers on the local node (or one that has been forwarded by
a parcel-capable link) and break it down into smaller sub-parcels then
apply IP fragmentation to each sub-parcel and send the fragments to an
OMNI link egress node. You know better than me how checksum offload
could be applied in an environment like that.

There was quite a bit of work and discussion on this in Linux. I believe the
deviation from the standard was motivated by some deployed devices that
required the IPID be set on receive, and setting the IPID with DF=1 is
thought to be innocuous. You may want to look at Alex Duyck's papers on
UDP GSO; he wrote a lot of code in this area.



RFC6864 has quite a bit to say about coding the IP ID with DF=1 – mostly in
the negative.

But, what I have seen in the Linux code seems to indicate that there is not
even any coordination between the GSO source and the GRO destination –
instead, GRO simply starts gluing together packets that appear to have
consecutive IP IDs without ever first checking that they were sent by a peer
that was earnestly doing GSO. These aspects would make it very difficult to
work GSO/GRO into an IETF standard, plus it doesn't work for IPv6 at all,
where there is no IP ID included by default. IP Parcels addresses all of
these points, and can be made into a standard.


Huh? GRO/GSO works perfectly fine with IPv6.

Where is the spec for that? My understanding is that GSO/GRO leverages the
IP ID for IPv4. But, for IPv6, there is no IP ID unless you include a
Fragment Header. Does IPv6 somehow do GSO/GRO differently?


GRO and GSO don't use the IPID to match a flow. The primary match is
the TCP 4-tuple.

Correct, the 5-tuple (src-ip, src-port, dst-ip, dst-port, proto) is what is used
to match the flow. But, you need more than that in order to correctly paste
back together with GRO the segments of an original ULP buffer that was
broken down by GSO - you need Identifications and/or other markings in
the IP headers to give a reassembly context. Otherwise, GRO might end
up gluing together old and new pieces of ULP data and/or impart a lot of
reordering. IP Parcels have well-behaved Identifications and Parcel IDs so
that the original ULP buffer context is honored during reassembly.
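A toy model of the distinction being argued here (field names are illustrative, not Linux's actual skb layout): coalescing keys on the flow tuple, but some continuity check beyond the tuple is still needed to avoid gluing unrelated or reordered data together:

```python
def flow_key(pkt):
    # 5-tuple match, as in GRO-style coalescing
    return (pkt["src_ip"], pkt["src_port"],
            pkt["dst_ip"], pkt["dst_port"], pkt["proto"])

def try_coalesce(held, pkt):
    """Append pkt's payload to held only if it belongs to the same flow
    AND exactly continues the byte stream; otherwise refuse, so that old
    and new ULP data are never glued together out of order."""
    if flow_key(held) != flow_key(pkt):
        return False
    if pkt["seq"] != held["seq"] + len(held["payload"]):
        return False  # gap or reordering: no reassembly context to trust
    held["payload"] += pkt["payload"]
    return True
```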

There's also another possibility with IPv6-- use jumbograms. For
instance, instead of GRO reassembling segments up to a 64K packet, it
could be modified to reassemble up to a 4G packet using IPv6
jumbograms where one really big packet is given to the stack.

But we probably don't even need jumbograms for that. In Linux, GRO
might be taught to reassemble up to a 4G super-packet and set a flag bit
in the skbuf to ignore the IP payload length field and get the length from
the skbuf len field (as though a jumbogram was received). This trick
would work for IPv4 and IPv6 and GSO as well. It should also work for TSO
if the device takes the IP payload length to be that for each segment.

Yes, I was planning to give that a try to see what kind of performance
can be gotten with GSO/GRO when you exceed 64KB. But, my concern
with GSO/GRO is that the reassembly is (relatively) unguided and
haphazard and can result in mis-ordered concatenations. And, there is
no protocol by which the GRO receiver can infer that the things it is
gluing together actually originated from a sender that is earnestly doing
GSO. So, I do not see how GSO/GRO as I see it in the implementation
could be made into a standard, whereas there is a clear path for
standardizing IP parcels.

Another thing I forgot to mention is that in my experiments with GSO/GRO
I found that it won't let me set a GSO segment size that would cause the
resulting IP packets to exceed the path MTU (i.e., it won't allow
fragmentation). I fixed that by configuring IPv4-in-IPv6 encapsulation per
RFC2473 and then allowed the IPv6 layer to apply fragmentation to the
encapsulated packet. That way, I can use IPv4 GSO segment sizes up to ~64KB.

Fred


Tom

Thanks - Fred

Tom



Fred



From: Tom Herbert [mailto:t...@herbertland.com]
Sent: Wednesday, March 23, 2022 9:37 AM
To: Templin (US), Fred L <fred.l.temp...@boeing.com>
Cc: Eggert, Lars <l...@netapp.com>; int-area <int-area@ietf.org>; l...@eggert.org
Subject: Re: [EXTERNAL] Re: [Int-area] IP Parcels improves performance for end systems



EXT email: be mindful of links/attachments.






On Wed, Mar 23, 2022, 9:54 AM Templin (US), Fred L <fred.l.temp...@boeing.com> 
wrote:

Hi Tom,

-----Original Message-----
From: Tom Herbert [mailto:t...@herbertland.com]
Sent: Wednesday, March 23, 2022 6:19 AM
To: Templin (US), Fred L <fred.l.temp...@boeing.com>
Cc: Eggert, Lars <l...@netapp.com>; int-area@ietf.org; l...@eggert.org
Subject: Re: [Int-area] IP Parcels improves performance for end systems

On Tue, Mar 22, 2022 at 10:38 AM Templin (US), Fred L
<fred.l.temp...@boeing.com> wrote:

Tom, see below:

-----Original Message-----
From: Tom Herbert [mailto:t...@herbertland.com]
Sent: Tuesday, March 22, 2022 10:00 AM
To: Templin (US), Fred L <fred.l.temp...@boeing.com>
Cc: Eggert, Lars <l...@netapp.com>; int-area@ietf.org
Subject: Re: [Int-area] IP Parcels improves performance for end systems

On Tue, Mar 22, 2022 at 7:42 AM Templin (US), Fred L
<fred.l.temp...@boeing.com> wrote:

Lars, I did a poor job of answering your question. One of the most important
aspects of IP Parcels in relation to TSO and GSO/GRO is that transports get to
use a full 4MB buffer instead of the 64KB limit in current practices. This is
possible due to the IP Parcel jumbo payload option encapsulation, which
provides a 32-bit length field instead of just a 16-bit one. By allowing the
transport to present the IP layer with a buffer of up to 4MB, it reduces the
overhead, minimizes system calls and interrupts, etc.

So, yes, IP Parcels is very much about improving the performance for end
systems in comparison with current practice (GSO/GRO and TSO).
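Back-of-the-envelope arithmetic for the per-buffer overhead claim (illustrative only; "units" stands in for whatever per-buffer cost applies -- system calls, interrupts, per-packet header processing):

```python
def units_needed(total_bytes: int, unit_size: int) -> int:
    """How many send units it takes to move total_bytes (ceiling division)."""
    return -(-total_bytes // unit_size)

GiB = 1 << 30
for label, size in [("1500B packets", 1500),
                    ("64KB GSO buffers", 64 * 1024),
                    ("4MB parcels", 4 * 1024 * 1024)]:
    print(f"{label}: {units_needed(GiB, size)} units per GiB")
```

Moving a GiB takes 16384 64KB buffers but only 256 4MB parcels, a 64x reduction in per-buffer transitions.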

Hi Fred,

The nice thing about TSO/GSO/GRO is that they don't require any
changes to the protocol, as they are just implementation techniques; also,
they're one-sided optimizations, meaning for instance that TSO can be
used at the sender without requiring GRO to be used at the receiver.
My understanding is that IP parcels requires new protocol that would
need to be implemented on both endpoints and possibly in some routers.

It is not entirely true that the protocol needs to be implemented on both
endpoints. Sources that send IP Parcels send them into a Parcel-capable path
which ends at either the final destination or a router for which the next hop is
not Parcel-capable. If the Parcel-capable path extends all the way to the final
destination, then the Parcel is delivered to the destination which knows how
to deal with it. If the Parcel-capable path ends at a router somewhere in the
middle, the router opens the Parcel and sends each enclosed segment as an
independent IP packet. The final destination is then free to apply GRO to the
incoming IP packets even if it does not understand Parcels.
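A sketch of that last-hop behavior (hypothetical data model; the draft defines the real wire format): the last parcel-capable router re-uses the shared header fields and emits one ordinary IP packet per enclosed segment:

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Parcel:
    header: Dict[str, object]   # shared IP header fields (src, dst, proto, ...)
    segments: List[bytes]       # concatenated ULP segments, each self-contained

def open_parcel(parcel: Parcel) -> List[dict]:
    """Split a parcel into independent IP packets for a next hop that is
    not parcel-capable: copy the shared header per segment and recompute
    the per-packet payload length."""
    packets = []
    for seg in parcel.segments:
        hdr = dict(parcel.header)
        hdr["payload_len"] = len(seg)
        packets.append({"header": hdr, "payload": seg})
    return packets
```

The final destination then sees ordinary packets and is free to apply GRO, exactly as described above.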

IP Parcels is about efficient shipping and handling just like the major online
retailer service model I described during the talk. The goal is to deliver the
fewest and largest possible parcels to the final destination rather than
delivering lots of small IP packets. It is good for the network and good for
the end systems both. If this were not true, then Amazon would send the
consumer 50 small boxes with 1 item each instead of 1 larger box with all
50 items inside. And, we all know what they would choose to do.

Do you have data that shows the benefits of IP Parcels in light of
these requirements?

I have data that shows that GSO/GRO is good for packaging sizes up to 64KB
even if the enclosed segments will require IP fragmentation upon transmission.
The data implies that even larger packaging sizes (up to a maximum of 4MB)
would be better still.


Fred,

You seem to be only looking at the problem from a per packet cost
point of view. There is also per byte cost, particularly in the
computation of the TCP/UDP checksum. The cost is hidden in modern
implementations by checksum offload, and for segmentation offload we
have methods to preserve the utility of checksum offload. IP parcels
will have to also leverage checksum offload, because if the checksum
is not offloaded then the cost of computing the payload checksum in
CPU would dwarf any benefits we'd get by using segments larger than
64K.

There is plenty of opportunity to apply hardware checksum offload since
the structure of a Parcel will be very standard. My experiments have been
with a protocol called LTP which is layered over UDP/IP as some other
upper layer protocols are. LTP includes a segment-by-segment checksum
that is used at its level in the absence of lower layer integrity checks, so
for larger Parcels LTP would use that and turn off UDP checksums
altogether.



You can't turn off UDP checksums for IPv6 (except for the narrow case of
encapsulation).



As far as I am aware, there are currently no hardware
checksum offload implementations available for calculating the
LTP checksums.



If it's a standard per-packet Internet checksum then a lot of HW could do it.
If it's something like CRC32 then probably not.



LTP is a nice experiment, but I'm more interested in the interaction between
IP parcels and TCP or QUIC.




Speaking of standard, AFAICT GSO/GRO are doing something very
non-standard. GSO seems to be coding the IP ID field in the IPv4
headers of packets with DF=1 which goes against RFC 6864. When
DF=1, GSO cannot simply claim the IP ID and code it as if there were
some sort of protocol. Or, if it does, there would be no way to
standardize it.



There was quite a bit of work and discussion on this in Linux. I believe the
deviation from the standard was motivated by some deployed devices that
required the IPID be set on receive, and setting the IPID with DF=1 is
thought to be innocuous. You may want to look at Alex Duyck's papers on
UDP GSO; he wrote a lot of code in this area.



Tom




Fred


Tom

Fred

Thanks,
Tom




Thanks - Fred

_______________________________________________
Int-area mailing list
Int-area@ietf.org
https://www.ietf.org/mailman/listinfo/int-area

