Fundamentally Fred, by not having the host send things in timely pieces you have created work. Having some other platform do that work does not mean it does not need to get done. It still does. And since getting such big pieces costs latency, I cannot see how the savings in I/O operations at the host (paid for with I/Os and processing to break things up) make sense as an abstract network model. As a model for an interface between a host and a smart NIC card? Maybe. I will leave that to the NIC card vendors, who have been playing all sorts of clever tricks for years without IP needing to get involved, and with minimal and simple modifications to the host applications.

I fear we are getting into repeating ourselves.

Yours,
Joel

On 3/24/2022 4:54 PM, Templin (US), Fred L wrote:
Again, expect the breaking/reassembling to happen mostly near the edges of the network. And, not necessarily on dedicated router platforms (in fact, probably not on dedicated router platforms). Implications of loss at the IP fragment level are discussed in my recent APNIC article:

https://blog.apnic.net/2022/02/18/omni-an-adaptation-layer-for-the-internet/

But, in terms of Parcel reassembly, reordering is not a problem and strict reassembly is not required. It is OK if a Parcel that is broken up in transit gets delivered as multiple smaller parcels, and even if some of the segments within a parcel are delivered in a slightly reordered position from the way they were originally transmitted. ULPs have sequence numbers and the like to put the segments back together in the proper order. It all works, and again, it does not impact the vast majority of the deployed base.

Fred

-----Original Message-----
From: Haoyu Song [mailto:haoyu.s...@futurewei.com]
Sent: Thursday, March 24, 2022 1:42 PM
To: Templin (US), Fred L <fred.l.temp...@boeing.com>; Joel M. Halpern 
<j...@joelhalpern.com>
Cc: int-area <int-area@ietf.org>
Subject: RE: [Int-area] IP Parcels improves performance for end systems

Understood. But some router or whatever will need to do the parcel breaking and assembly anyway. In a high-speed network, this is much more challenging than at the host, due to the buffer, scheduling, packet loss, and out-of-order issues mentioned earlier.

Haoyu

-----Original Message-----
From: Templin (US), Fred L <fred.l.temp...@boeing.com>
Sent: Thursday, March 24, 2022 1:12 PM
To: Haoyu Song <haoyu.s...@futurewei.com>; Joel M. Halpern 
<j...@joelhalpern.com>
Cc: int-area <int-area@ietf.org>
Subject: RE: [Int-area] IP Parcels improves performance for end systems

Hi, no it is not the case that routers deep within the network will be asked to forward jumbos - that is not what we are after. Routers in the core will continue to forward common-sized IP packets the way they have always done - nothing within that realm needs to change.

Where parcels will have a visible footprint is at or near the edges, where the end systems live. Everything else in between will continue to see plain old IP packets the way they have always done.

Thanks - Fred

-----Original Message-----
From: Haoyu Song [mailto:haoyu.s...@futurewei.com]
Sent: Thursday, March 24, 2022 12:27 PM
To: Joel M. Halpern <j...@joelhalpern.com>; Templin (US), Fred L
<fred.l.temp...@boeing.com>
Cc: int-area <int-area@ietf.org>
Subject: [EXTERNAL] RE: [Int-area] IP Parcels improves performance for
end systems

I have a similar concern. IP parcels make me worried about buffering and scheduling for those huge parcels in network routers (the buffer-to-bandwidth ratio is becoming smaller and smaller, packet loss/reordering could happen after a parcel is broken up in the network, and parcel assembly in the network could be even harder than in the host). Is there any analysis or evaluation of its impact on the network? How will the routers be upgraded to support IP parcels?
On the other hand, NICs are becoming smarter and more powerful, and can efficiently offload a lot of data manipulation functions.
Do we really want to optimize the host and complicate the network?

Best regards,
Haoyu

-----Original Message-----
From: Int-area <int-area-boun...@ietf.org> On Behalf Of Joel M.
Halpern
Sent: Thursday, March 24, 2022 11:41 AM
To: Templin (US), Fred L <fred.l.temp...@boeing.com>
Cc: int-area <int-area@ietf.org>
Subject: Re: [Int-area] IP Parcels improves performance for end
systems

This exchange seems to assume facts not in evidence.

And the whole premise is spending resources in other parts of the network for a 
marginal diminishing return in the hosts.

It simply does not add up.

Yours,
Joel

On 3/24/2022 2:19 PM, Templin (US), Fred L wrote:
The category 1) links are not yet in existence, but once parcels start to enter the mainstream, innovation will drive the creation of new kinds of data links (1TB Ethernet?) that will be rolled out as new hardware.

I want to put a gold star next to the above. AFAICT, pushing the MTU and implementing IP parcels can get us to 1TB Ethernet practically overnight. Back in the 1980s, FDDI proved that pushing to larger MTUs could boost throughput without changing the speed of light, so why wouldn't the same concept work for Ethernet in the modern era?

Fred

-----Original Message-----
From: Int-area [mailto:int-area-boun...@ietf.org] On Behalf Of
Templin (US), Fred L
Sent: Thursday, March 24, 2022 9:45 AM
To: Tom Herbert <t...@herbertland.com>
Cc: int-area <int-area@ietf.org>; Eggert, Lars <l...@netapp.com>;
l...@eggert.org
Subject: Re: [Int-area] IP Parcels improves performance for end
systems

Hi Tom - responses below:

-----Original Message-----
From: Tom Herbert [mailto:t...@herbertland.com]
Sent: Thursday, March 24, 2022 9:09 AM
To: Templin (US), Fred L <fred.l.temp...@boeing.com>
Cc: Eggert, Lars <l...@netapp.com>; int-area <int-area@ietf.org>;
l...@eggert.org
Subject: Re: [Int-area] IP Parcels improves performance for end
systems

On Thu, Mar 24, 2022 at 7:27 AM Templin (US), Fred L
<fred.l.temp...@boeing.com> wrote:

Tom - see below:

-----Original Message-----
From: Tom Herbert [mailto:t...@herbertland.com]
Sent: Thursday, March 24, 2022 6:22 AM
To: Templin (US), Fred L <fred.l.temp...@boeing.com>
Cc: Eggert, Lars <l...@netapp.com>; int-area
<int-area@ietf.org>; l...@eggert.org
Subject: Re: [Int-area] IP Parcels improves performance for end
systems

On Wed, Mar 23, 2022 at 10:47 AM Templin (US), Fred L
<fred.l.temp...@boeing.com> wrote:

Tom, looks like you have switched over to HTML which can be a real 
conversation-killer.

But, to some points you raised that require a response:

You can't turn off UDP checksums for IPv6 (except for the narrow case of encapsulation).



That sounds like a good reason to continue to use IPv4 - at least as far as end system addressing is concerned - right?


Not at all. All NICs today provide checksum offload, so it's basically zero cost to perform the UDP checksum. The fact that we don't have to do extra checks on the UDPv6 checksum field to see if it's zero actually is a performance improvement over UDPv4. (btw, I will present an implementation of the Internet checksum at TSVWG Friday; this will include discussion of checksum offloads).

Actually, my assertion wasn't good to begin with because, for IPv6, even if UDP checksums are turned off, the OMNI encapsulation layer includes a checksum that ensures the integrity of the IPv6 header. Turning UDP checksums off for IPv6 when OMNI encapsulation is used is perfectly fine.

I assume you are referring to RFC6935 and RFC6936, which allow the UDPv6 checksum to be zero for tunneling under a very constrained set of conditions.

If it's a standard per packet Internet checksum then a lot of HW could do it. 
If it's something like CRC32 then probably not.



The integrity check is covered in RFC5327, and I honestly haven't had a chance to look at that myself yet.



LTP is a nice experiment, but I'm more interested in the interaction between IP parcels and TCP or QUIC.



Please be aware that while LTP may seem obscure at the moment, that may be changing now that the core DTN standards have been published. As DTN use becomes more widespread I think we can see LTP also come into wider adoption.


My assumption is that IP parcels are intended to be a general solution for all protocols. Maybe in the next draft you could discuss the details of TCP in IP parcels, including how to offload the TCP checksum.

I could certainly add that. For TCP, each of the concatenated
segments would include its own TCP header with checksum field
included. Any hardware that knows the structure of an IP Parcel
can then simply do the TCP checksum offload function for each segment.
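
As a rough illustration of what that per-segment offload computes, here is a minimal software sketch, assuming the standard RFC 1071 Internet checksum over an IPv4 pseudo-header plus one TCP header and its payload (function and variable names are illustrative only, not taken from the parcels draft):

#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Accumulate 16-bit big-endian words per RFC 1071. */
static uint32_t csum_add(uint32_t sum, const uint8_t *data, size_t len)
{
    while (len > 1) {
        sum += ((uint32_t)data[0] << 8) | data[1];
        data += 2;
        len  -= 2;
    }
    if (len)
        sum += (uint32_t)data[0] << 8;   /* pad odd trailing byte */
    return sum;
}

/* Checksum for one TCP segment ('segment' = TCP header + payload with its
 * checksum field zeroed by the caller); returns the value whose high byte
 * is written first into the header's checksum field. */
static uint16_t tcp_checksum_v4(uint32_t src_be, uint32_t dst_be,
                                const uint8_t *segment, uint16_t seg_len)
{
    uint8_t pseudo[12];
    memcpy(&pseudo[0], &src_be, 4);          /* source address (network order) */
    memcpy(&pseudo[4], &dst_be, 4);          /* destination address            */
    pseudo[8]  = 0;                          /* zero                           */
    pseudo[9]  = 6;                          /* protocol = TCP                 */
    pseudo[10] = (uint8_t)(seg_len >> 8);    /* TCP length (header + data)     */
    pseudo[11] = (uint8_t)(seg_len & 0xff);

    uint32_t sum = csum_add(0, pseudo, sizeof(pseudo));
    sum = csum_add(sum, segment, seg_len);
    while (sum >> 16)                        /* fold carries per RFC 1071 */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

Offload hardware performs the same per-segment fold; an IPv6 pseudo-header differs in layout, and, as noted above, the UDPv6 checksum cannot simply be disabled.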

To be honest, the odds of ever getting support in NIC hardware for
IP parcels are extremely slim. Hardware vendors are driven by
economics, so the only way they would do that would be to
demonstrate widespread deployment of the protocol. But even then,
with all the legacy hardware in deployment it will take many years
before there's any appreciable traction. IMO, the better approach
is to figure out how to leverage the existing hardware features for use with IP 
parcels.

There will be two kinds of links that will need to be "Parcel-capable":
1) Edge network (physical) links that natively forward large
parcels, and
2) OMNI (virtual) links that forward parcels using encapsulation
and fragmentation.

The category 1) links are not yet in existence, but once parcels start to enter the mainstream, innovation will drive the creation of new kinds of data links (1TB Ethernet?) that will be rolled out as new hardware. And that new hardware can be made to understand the structure of parcels from the beginning. The category 2) links might take a large parcel from the upper layers on the local node (or one that has been forwarded by a parcel-capable link), break it down into smaller sub-parcels, then apply IP fragmentation to each sub-parcel and send the fragments to an OMNI link egress node. You know better than me how checksum offload could be applied in an environment like that.

There was quite a bit of work and discussion on this in Linux. I believe the deviation from the standard was motivated by some deployed devices that required the IPID be set on receive, and setting the IPID with DF equal to 1 is thought to be innocuous. You may want to look at Alex Duyck's papers on UDP GSO; he wrote a lot of code in this area.



RFC6864 has quite a bit to say about coding the IP ID with DF=1 - mostly in the negative.

But, what I have seen in the Linux code seems to indicate that there is not even any coordination between the GSO source and the GRO destination - instead, GRO simply starts gluing together packets that appear to have consecutive IP IDs without ever first checking that they were sent by a peer that was earnestly doing GSO. These aspects would make it very difficult to work GSO/GRO into an IETF standard, plus it doesn't work for IPv6 at all where there is no IP ID included by default. IP Parcels addresses all of these points, and can be made into a standard.


Huh? GRO/GSO works perfectly fine with IPV6.

Where is the spec for that? My understanding is that GSO/GRO
leverages the IP ID for IPv4. But, for IPv6, there is no IP ID unless you 
include a Fragment Header.
Does IPv6 somehow do GSO/GRO differently?


GRO and GSO don't use the IPID to match a flow. The primary match
is the TCP 4-tuple.

Correct, the 5-tuple (src-ip, src-port, dst-ip, dst-port, proto) is what is used to match the flow. But, you need more than that in order to correctly paste back together with GRO the segments of an original ULP buffer that was broken down by GSO - you need Identifications and/or other markings in the IP headers to give a reassembly context. Otherwise, GRO might end up gluing together old and new pieces of ULP data and/or impart a lot of reordering. IP Parcels have well-behaved Identifications and Parcel IDs so that the original ULP buffer context is honored during reassembly.
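
For concreteness, the flow match being described is essentially the following (a minimal sketch; the struct and function names are illustrative):

#include <stdint.h>
#include <stdbool.h>

struct flow_key {
    uint32_t src_ip;    /* IPv4 source address        */
    uint32_t dst_ip;    /* IPv4 destination address   */
    uint16_t src_port;  /* transport source port      */
    uint16_t dst_port;  /* transport destination port */
    uint8_t  proto;     /* IP protocol (6 = TCP)       */
};

static bool same_flow(const struct flow_key *a, const struct flow_key *b)
{
    return a->src_ip   == b->src_ip   && a->dst_ip   == b->dst_ip &&
           a->src_port == b->src_port && a->dst_port == b->dst_port &&
           a->proto    == b->proto;
}

Nothing in this key distinguishes old from new pieces of the same flow, which is the point being made about needing Identification / Parcel ID markings for reassembly context.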

There's also another possibility with IPv6 -- use jumbograms. For instance, instead of GRO reassembling segments up to a 64K packet, it could be modified to reassemble up to a 4G packet using IPv6 jumbograms, where one really big packet is given to the stack.
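
For reference, the RFC 2675 encoding such a super-packet would carry looks roughly like the sketch below (the helper name is made up): the IPv6 Payload Length field is set to 0 and a Hop-by-Hop Jumbo Payload option carries the real 32-bit length.

#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

/* Fill an 8-byte Hop-by-Hop Options header carrying the Jumbo Payload
 * option (RFC 2675). The enclosing IPv6 header must set Payload Length
 * to 0 and Next Header to 0 (Hop-by-Hop Options). */
static void build_jumbo_hbh(uint8_t hbh[8], uint8_t next_header,
                            uint32_t jumbo_len)
{
    hbh[0] = next_header;   /* header that follows (e.g., 6 = TCP)   */
    hbh[1] = 0;             /* Hdr Ext Len: (8 bytes / 8) - 1 = 0    */
    hbh[2] = 0xC2;          /* Option Type: Jumbo Payload            */
    hbh[3] = 4;             /* Opt Data Len: 4 bytes                 */
    uint32_t be = htonl(jumbo_len);   /* must be greater than 65535  */
    memcpy(&hbh[4], &be, sizeof(be));
}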

But we probably don't even need jumbograms for that. In Linux, GRO might be taught to reassemble up to a 4G super-packet and set a flag bit in the skbuff to ignore the IP payload length field and get the length from the skbuff len field (as though a jumbogram was received). This trick would work for IPv4 and IPv6 and GSO as well. It should also work for TSO if the device takes the IP payload length to be that for each segment.

Yes, I was planning to give that a try to see what kind of performance can be gotten with GSO/GRO when you exceed 64KB. But, my concern with GSO/GRO is that the reassembly is (relatively) unguided and haphazard and can result in mis-ordered concatenations. And, there is no protocol by which the GRO receiver can verify that the things it is gluing together actually originated from a sender that was earnestly doing GSO. So, I do not see how GSO/GRO as I see it in the implementation could be made into a standard, whereas there is a clear path for standardizing IP parcels.

Another thing I forgot to mention is that in my experiments with GSO/GRO I found that it won't let me set a GSO segment size that would cause the resulting IP packets to exceed the path MTU (i.e., it won't allow fragmentation). I fixed that by configuring IPv4-in-IPv6 encapsulation per RFC2473 and then allowing the IPv6 layer to apply fragmentation to the encapsulated packet. That way, I can use IPv4 GSO segment sizes up to ~64KB.
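
For anyone wanting to reproduce that kind of experiment, enabling UDP GSO on Linux looks roughly like the sketch below (assuming a kernel with the UDP_SEGMENT socket option; the address, port, and sizes are placeholders). The kernel splits one large send into gso_size-byte datagrams and, as described above, rejects a segment size that would not fit the path MTU.

#include <arpa/inet.h>
#include <netinet/in.h>
#include <netinet/udp.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef SOL_UDP
#define SOL_UDP IPPROTO_UDP
#endif
#ifndef UDP_SEGMENT
#define UDP_SEGMENT 103          /* value from linux/udp.h on older toolchains */
#endif

int main(void)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0) { perror("socket"); return 1; }

    int gso_size = 1400;         /* payload bytes per wire datagram */
    if (setsockopt(fd, SOL_UDP, UDP_SEGMENT, &gso_size, sizeof(gso_size)) < 0) {
        perror("UDP_SEGMENT");   /* kernel without UDP GSO support */
        return 1;
    }

    struct sockaddr_in dst;
    memset(&dst, 0, sizeof(dst));
    dst.sin_family = AF_INET;
    dst.sin_port = htons(9000);                      /* placeholder port    */
    inet_pton(AF_INET, "192.0.2.1", &dst.sin_addr);  /* placeholder address */

    static char buf[60 * 1024];                      /* one ~60KB application buffer */
    memset(buf, 'x', sizeof(buf));
    ssize_t n = sendto(fd, buf, sizeof(buf), 0,
                       (struct sockaddr *)&dst, sizeof(dst));
    if (n < 0)
        perror("sendto");
    close(fd);
    return 0;
}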

Fred


Tom

Thanks - Fred

Tom



Fred



From: Tom Herbert [mailto:t...@herbertland.com]
Sent: Wednesday, March 23, 2022 9:37 AM
To: Templin (US), Fred L <fred.l.temp...@boeing.com>
Cc: Eggert, Lars <l...@netapp.com>; int-area
<int-area@ietf.org>; l...@eggert.org
Subject: Re: [EXTERNAL] Re: [Int-area] IP Parcels improves
performance for end systems



On Wed, Mar 23, 2022, 9:54 AM Templin (US), Fred L <fred.l.temp...@boeing.com> 
wrote:

Hi Tom,

-----Original Message-----
From: Tom Herbert [mailto:t...@herbertland.com]
Sent: Wednesday, March 23, 2022 6:19 AM
To: Templin (US), Fred L <fred.l.temp...@boeing.com>
Cc: Eggert, Lars <l...@netapp.com>; int-area@ietf.org;
l...@eggert.org
Subject: Re: [Int-area] IP Parcels improves performance for
end systems

On Tue, Mar 22, 2022 at 10:38 AM Templin (US), Fred L
<fred.l.temp...@boeing.com> wrote:

Tom, see below:

-----Original Message-----
From: Tom Herbert [mailto:t...@herbertland.com]
Sent: Tuesday, March 22, 2022 10:00 AM
To: Templin (US), Fred L <fred.l.temp...@boeing.com>
Cc: Eggert, Lars <l...@netapp.com>; int-area@ietf.org
Subject: Re: [Int-area] IP Parcels improves performance for
end systems

On Tue, Mar 22, 2022 at 7:42 AM Templin (US), Fred L
<fred.l.temp...@boeing.com> wrote:

Lars, I did a poor job of answering your question. One of the most important aspects of IP Parcels in relation to TSO and GSO/GRO is that transports get to use a full 4MB buffer instead of the 64KB limit in current practices. This is possible due to the IP Parcel jumbo payload option encapsulation, which provides a 32-bit length field instead of just a 16-bit one. By allowing the transport to present the IP layer with a buffer of up to 4MB, it reduces the overhead, minimizes system calls and interrupts, etc.
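
For a rough sense of the scale involved: 4MB / 64KB = 64, so each system call, interrupt, and per-buffer trip through the stack gets amortized over 64 times as many bytes as under the current 64KB ceiling.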



So, yes, IP Parcels is very much about improving the
performance for end systems in

comparison with current practice (GSO/GRO and TSO).

Hi Fred,

The nice thing about TSO/GSO/GRO is that they don't require any changes to the protocol, as they are just implementation techniques; also, they're one-sided optimizations, meaning for instance that TSO can be used at the sender without requiring GRO to be used at the receiver. My understanding is that IP parcels requires a new protocol that would need to be implemented on both endpoints and possibly in some routers.

It is not entirely true that the protocol needs to be implemented on both endpoints. Sources that send IP Parcels send them into a Parcel-capable path which ends at either the final destination or a router for which the next hop is not Parcel-capable. If the Parcel-capable path extends all the way to the final destination, then the Parcel is delivered to the destination, which knows how to deal with it. If the Parcel-capable path ends at a router somewhere in the middle, the router opens the Parcel and sends each enclosed segment as an independent IP packet. The final destination is then free to apply GRO to the incoming IP packets even if it does not understand Parcels.

IP Parcels is about efficient shipping and handling, just like the major online retailer service model I described during the talk. The goal is to deliver the fewest and largest possible parcels to the final destination rather than delivering lots of small IP packets. It is good for both the network and the end systems. If this were not true, then Amazon would send the consumer 50 small boxes with 1 item each instead of 1 larger box with all 50 items inside. And, we all know what they would choose to do.

Do you have data that shows the benefits of IP Parcels in
light of these requirements?

I have data that shows that GSO/GRO is good for packaging
sizes up to 64KB even if the enclosed segments will require IP fragmentation 
upon transmission.
The data implies that even larger packaging sizes (up to a
maximum of 4MB) would be better still.


Fred,

You seem to be looking at the problem only from a per-packet cost point of view. There is also per-byte cost, particularly in the computation of the TCP/UDP checksum. The cost is hidden in modern implementations by checksum offload, and for segmentation offload we have methods to preserve the utility of checksum offload. IP parcels will also have to leverage checksum offload, because if the checksum is not offloaded then the cost of computing the payload checksum on the CPU would dwarf any benefits we'd get by using segments larger than 64K.

There is plenty of opportunity to apply hardware checksum
offload since the structure of a Parcel will be very standard.
My experiments have been with a protocol called LTP which is
layered over UDP/IP as some other upper layer protocols are.
LTP includes a segment-by-segment checksum that is used at its
level in the absence of lower layer integrity checks, so for
larger Parcels LTP would use that and turn off UDP checksums altogether.



You can't turn off UDP checksums for IPv6 (except for the narrow case of encapsulation).



As far as I am aware, there are currently no hardware checksum
offload implementations available for calculating the LTP
checksums.



If it's a standard per packet Internet checksum then a lot of HW could do it. 
If it's something like CRC32 then probably not.



LTP is a nice experiment, but I'm more interested in the interaction between IP parcels and TCP or QUIC.




Speaking of standards, AFAICT GSO/GRO are doing something very non-standard. GSO seems to be coding the IP ID field in the IPv4 headers of packets with DF=1, which goes against RFC 6864. When DF=1, GSO cannot simply claim the IP ID and code it as if there were some sort of protocol. Or, if it does, there would be no way to standardize it.



There was quite a bit of work and discussion on this in Linux. I believe the deviation from the standard was motivated by some deployed devices that required the IPID be set on receive, and setting the IPID with DF equal to 1 is thought to be innocuous. You may want to look at Alex Duyck's papers on UDP GSO; he wrote a lot of code in this area.



Tom




Fred


Tom

Fred

Thanks,
Tom




Thanks - Fred


_______________________________________________
Int-area mailing list
Int-area@ietf.org
https://www.ietf.org/mailman/listinfo/int-area
