On Fri, Jan 9, 2026 at 2:03 PM dave seddon <[email protected]> wrote:
>
> G'day Tom and Haoyu,
>
> I'm trying to join the discussion about "draft: Scale-Up Network
> Header (SUNH)", but I just joined the mail list, so I don't know if
> posting to the subject line will do it. ( Apologies if this breaks
> threading )

Hi Dave,

Thanks for the comments!

>
> Drafts:
> https://datatracker.ietf.org/doc/draft-herbert-sunh/
> https://datatracker.ietf.org/doc/html/draft-song-ship-edge-05
>
> It seems like the discussion centers on the address length.
>
> The SUNH "1.1. Problem statement" is very clear
> "
> 8% overhead in a 256 byte packet, and the forty bytes of IPv6 header
> would be about 16% overhead
> "
>
> Absolutely minimizing overhead makes sense currently, but for how long
> do we expect this to be true? Tom, since you've been talking to
> people who run the largest AI clusters in the world, you expect this
> to hold true for the foreseeable future.

It's actually working in the other direction. Header overhead in the
data center is an emerging problem. For a long time, we didn't care
much about header overhead in the data center, basically because link
utilization wasn't very high and there were a lot of large packets to
amortize the header overhead. In fact, many hyperscalers moved to IPv6,
thereby doubling network header overhead without even thinking twice.
The problem we have in AI is that people are trying to drive
utilization to 100%, there's much less heterogeneity in workloads, and
packet sizes can be well under an MTU for some workloads. The 10%
overhead that we didn't care about in the past is now popping up as a
problem. (Of course, some of those that transitioned to IPv6 might be
regretting that now ;-) )

There are also secondary issues concerning the length of addresses.
It's much more efficient to switch on 16-bit addresses than on 32-bit
or 128-bit addresses.

>
>
> Tom - I wonder if draft-herbert-sunh would benefit from a small
> summary, maybe with a table, that compares the proposed addressing to
> other protocols that are common within data centers?
>
> For example, comparing protocols by their header, address lengths, and
> "overhead"
> - PCIe ( IEEE have paywalls, so it's hard to find a good source.
> Maybe this:
> https://www.pearsonhighered.com/assets/samplechapter/0/3/2/1/0321156307.pdf
> )

Okay.

> - Infiniband ( addressing scheme found here on page 625
> https://hjemmesider.diku.dk/~vinter/CC/Infinibandchap42.pdf )
> - Ethernet
> - Ethernet with 802.1q ( and qnq )
> - IPv4
> - IPv6
> - SUNH
> ...
>
> Now that the context is established, explain why 16 bits were chosen
> for the source/destination address. I guess, but it's not in the
> document; You were considering the number of hosts in the domain.

The numbers being thrown around for scale-up networks seem to be a
couple of thousand nodes at most. 16 bits rounds nicely to a power of
two and leaves plenty of room to scale to reasonably large GPU
clusters. Also, for scale-up we anticipate pretty flat networks with
maybe two or three hops at most (which justifies smaller Hop Limits in
the protocol).

>
> Nit pick (sorry). "care must be taken to ensure the minimum packet
> size is maintained". Might help to explain why.

It's the minimum Ethernet packet size of 64 bytes. Without a length
field like IPv4 has, we need some way to send packets whose logical
length is less than 64 bytes in 64 bytes on the wire, without ambiguity
as to what the real size is. I can add some text.

>
> Re section "TCP and UDP in SUNH". I remember recently Stuart from
> Apple saying something pretty interesting about UDP: "If IP had port
> numbers, you wouldn't really need a UDP header at all."

Hee, I remember at my first IETF in the 90's Brian Carpenter was saying
the same thing :-)

>
> Multicast? It might be worth mentioning multicast and explaining why
> it isn't discussed. e.g. No requirement for this, or it might be
> considered in the future if a need arises.

Well, we haven't needed it for the past forty years so why start using
it now :-) Seriously though, AI applications aren't typically using
multicast.
I suppose someone might envision using multicast for collective
offloads, but that sounds like something that might never get past the
experimental stage. Also, since the SUNH address space is private,
nothing precludes anyone from defining their own multicast addresses in
the existing space. What I don't think we want is to define a multicast
prefix or any mandated structure on SUNH addresses. Likewise, if
someone wants to do hierarchical addressing in 16 bits, that's their
local decision.

Tom

>
>
>
> Haoyu - I really like your draft-song-ship-edge-05 Hierarchical
> addressing stuff:
> a)
> This reminds me of good old Fibre Channel addressing, and I suppose
> the more modern Infiniband/RDMA.
> b)
> The words "variable length" are scary because variability clearly
> isn't ideal for hardware. I guess when you say "variable length" you
> don't actually mean the addresses would vary dynamically, but that
> there could be a range of set fixed length addressing that could be
> selected for different deployment scenarios?

+1 "Variable length" is anathema to hardware engineers!

> c)
> One core concept of draft-song-ship-edge-05, is that traffic destined
> for IoT devices needs a long, unique address, while the traffic
> _sourced_ from these devices towards the data center can have a much
> smaller destination address.
> I recall Geoff Huston discussing IPv6 at a recent NANOG, where he
> commented that because of the pervasive use of anycast by a relatively
> small number of CDNs, that the Internet might only need a /24 worth of
> addresses for 99% of all traffic.
> Other network protocols with asymmetric addresses include:
> - PCIe (Requester vs Completer addressing)
> - In InfiniBand / RDMA, requests carry full destination addressing
> (QPN + LID/GID + path), while responses omit it and are routed
> implicitly using the established queue-pair and path state, making the
> addressing directionally asymmetric.
> - QUIC has explicit directional asymmetry in connection IDs
>
>
> --
> Regards,
> Dave Seddon
>
> _______________________________________________
> Int-area mailing list -- [email protected]
> To unsubscribe send an email to [email protected]
