Hi Xiaohu,

Thank you for sharing the draft. However, I'm a little confused by its context, because it is not fully aligned with my understanding of scale-up networks. Below are some observations.
1. Existing scale-up networks (SUNs), including NVLink and UALink, are all L2-based. Even ESUN may not be IP-based, due to the overhead.
2. A scale-up network uses a completely different protocol stack from RDMA, because it is mainly used for communication with memory semantics (load/store).
3. Packet spraying is usually preferred over flow-based ECMP because of bandwidth-utilization and latency requirements. This applies to both scale-up and scale-out networks.
4. Gaudi does not differentiate between scale-up and scale-out. Since it does not support memory semantics, I consider it a fake scale-up.

Best regards,
Haoyu

From: Tiger Xu <[email protected]>
Sent: Thursday, February 26, 2026 11:58 PM
To: rtgwg <[email protected]>
Subject: [rtgwg] FWD: New Version Notification for draft-xu-rtgwg-fare-in-sun-02.txt

Hi all,

Any comments or suggestions are welcome.

Notably, an Ethernet/IP-based scale-up network architecture has been successfully implemented by multiple GPU/XPU vendors (e.g., Intel's Gaudi3 and Microsoft's Maia 200 AI accelerator), and efficient multipath load balancing is a highly desirable capability for such architectures. For more details, please refer to:
https://cdrdv2-public.intel.com/817486/gaudi-3-ai-accelerator-white-paper.pdf
https://techcommunity.microsoft.com/blog/azureinfrastructureblog/deep-dive-into-the-maia-200-architecture/4489312
Best regards,
Xiaohu

From: [email protected] <[email protected]>
Date: Friday, February 27, 2026 14:56
To: Chao Li <[email protected]>, Fajie Yang <[email protected]>, Hua Wang <[email protected]>, Jian Guo <[email protected]>, Nan Wang <[email protected]>, Nan Wang <[email protected]>, Peilong Wang <[email protected]>, Tianyou Zhou <[email protected]>, Wang Xiaojun <[email protected]>, Weifeng Zhang <[email protected]>, Xiang Li <[email protected]>, Xiaohu Xu <[email protected]>, Xiaojun Wang <[email protected]>, Yan Zhuang <[email protected]>, Yinben Xia <[email protected]>, Yongtao Yang <[email protected]>, Zongying He <[email protected]>
Subject: New Version Notification for draft-xu-rtgwg-fare-in-sun-02.txt

A new version of Internet-Draft draft-xu-rtgwg-fare-in-sun-02.txt has been successfully submitted by Xiaohu Xu and posted to the IETF repository.

Name: draft-xu-rtgwg-fare-in-sun
Revision: 02
Title: Fully Adaptive Routing Ethernet in Scale-Up Networks
Date: 2026-02-26
Group: Individual Submission
Pages: 9
URL: https://www.ietf.org/archive/id/draft-xu-rtgwg-fare-in-sun-02.txt
Status: https://datatracker.ietf.org/doc/draft-xu-rtgwg-fare-in-sun/
HTMLized: https://datatracker.ietf.org/doc/html/draft-xu-rtgwg-fare-in-sun
Diff: https://author-tools.ietf.org/iddiff?url2=draft-xu-rtgwg-fare-in-sun-02

Abstract:
The Mixture of Experts (MoE) has become a dominant paradigm in transformer-based artificial intelligence (AI) large language models (LLMs).
It is widely adopted in both distributed training and distributed inference. To enable efficient expert parallelization and even tensor parallelization across dozens or even hundreds of Graphics Processing Units (GPUs) in MoE architectures, an ultra-high-throughput, ultra-low-latency AI scale-up network (SUN) is critical. This document describes how to extend the Weighted Equal-Cost Multi-Path (WECMP) load-balancing mechanism, referred to as Fully Adaptive Routing Ethernet (FARE), which was originally designed for scale-out networks, to scale-up networks.

The IETF Secretariat
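To illustrate the trade-off raised in point 3 of Haoyu's reply, and the weighted multipath selection that the abstract calls WECMP, here is a minimal Python sketch. It is a hypothetical illustration, not the FARE mechanism described in the draft: the path names, the weights, the CRC32 flow hash, and the smooth weighted round-robin used for spraying are all assumptions chosen for clarity.

```python
import zlib
from collections import Counter

# Hypothetical equal-weight uplinks. In a WECMP scheme the weights would
# reflect something like available bandwidth; these values are assumptions.
PATH_WEIGHTS = {"p0": 1, "p1": 1, "p2": 1, "p3": 1}


def weighted_pick(key: int, weights: dict[str, int]) -> str:
    """Map an integer key onto a path in proportion to the path weights."""
    total = sum(weights.values())
    slot = key % total
    for path, w in weights.items():
        if slot < w:
            return path
        slot -= w
    raise AssertionError("unreachable: slot is always < total weight")


def flow_ecmp(flows: dict[str, int], weights: dict[str, int]) -> Counter:
    """Per-flow (W)ECMP: hash the flow ID once; every packet follows that path."""
    load = Counter()
    for flow_id, packets in flows.items():
        path = weighted_pick(zlib.crc32(flow_id.encode()), weights)
        load[path] += packets
    return load


def packet_spray(flows: dict[str, int], weights: dict[str, int]) -> Counter:
    """Per-packet spraying using smooth weighted round-robin over all paths."""
    load = Counter()
    current = {p: 0 for p in weights}
    total = sum(weights.values())
    for _ in range(sum(flows.values())):
        # Every path earns its weight; the path with the most credit sends
        # the next packet and pays back the total weight.
        for p, w in weights.items():
            current[p] += w
        best = max(current, key=current.get)
        current[best] -= total
        load[best] += 1
    return load


if __name__ == "__main__":
    traffic = {"elephant": 900, "mouse1": 50, "mouse2": 50}  # hypothetical
    print("flow-based:", dict(flow_ecmp(traffic, PATH_WEIGHTS)))
    print("sprayed:   ", dict(packet_spray(traffic, PATH_WEIGHTS)))
```

With four equal-weight paths, one 900-packet elephant flow and two 50-packet mice, flow_ecmp puts at least 900 packets on a single path because all packets of a flow share one hash result, whereas packet_spray gives each path exactly 250 packets. The cost of spraying is packet reordering, which the receiver must tolerate or repair.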
