Dear All,


We will organize a side meeting on multicast for AI networks on Tuesday, March 
17, 14:30-16:00 (UTC+8), in room Jiangsu. If you are interested in AIDC or 
network-layer multicast, please come to the side meeting to discuss. If you are 
a remote participant, please access the meeting via the link: 
https://ietf.webex.com/meet/ietfsidemeeting2

 

The exponential growth in parameter size and training data for Large Language 
Models (LLMs) is driving increasing complexity and scale in AI computing. 
Emerging scenarios require efficient data synchronization across geographically 
distributed data centers and across GPU modules within data centers, involving 
massive point-to-multipoint transmissions. These are not only typical multicast 
use cases but also impose higher requirements on network multicast capabilities. 
Applications such as MoE token distribution and model replication are creating 
new requirements for multicast in AI computing networks.

 

Please find the initial agenda of the side meeting below: 

0. Welcome & Review (Chairs)

1. Multicast Requirements and Gap Analysis in AIDC (Kefei Liu, China Mobile)

Abstract: Multicast is a promising technique for enhancing the efficiency of 
point-to-multipoint data transmission during large language model training and 
inference in Artificial Intelligence Data Centers (AIDCs). This talk will 
analyze key requirements of multicast in AIDCs, as well as gaps between 
capabilities of existing multicast techniques and these requirements.


2. Multicast Requirements and Challenges in Super-node Applications (Zezhong 
Yang, UNIVISTA)

Abstract: In recent years, large language models have advanced rapidly. Their 
unique traffic characteristics impose significant challenges on conventional 
network components. Offloading computation to network devices has become a 
critical requirement for the evolution of supernodes, yet many technical 
challenges remain in implementation.


3. Group RDMA – Reliable Multicast Transmission (Haibo Wang, Huawei)

Abstract: MPI_Bcast, a key collective in HPC, faces performance bottlenecks in 
RoCE due to the lack of native reliable multicast. Current solutions force a 
trade-off between software reliability over UD multicast and non-scalable 
unicast emulation, sacrificing RDMA's low-latency advantage. We propose Group 
RDMA over RC, where hosts use unmodified RC queue pairs while an Intelligent 
Switch Fabric performs hardware-level replication of intercepted unicast 
packets and aggregates ACKs from all receivers. This approach combines RC's 
strong reliability with hardware multicast efficiency, reducing source CPU load 
and latency for scalable AI training workloads.


4. Optimized Use of BIER in AIML Data Centers (Jeffery Zhang, HPE)

Abstract: Multicast is gaining traction in AI Data Centers (AIDC) for efficient 
large-scale data distribution, particularly in collective operations like 
All2All and AllReduce. Technologies such as In-Network Compute (INC) can also 
benefit from offloading flow distribution to the network. BIER is particularly 
suitable for AIDC due to its ability to handle bursty, short-lived All2All 
flows without requiring per-flow multicast tree state establishment. This 
document proposes further optimizations for BIER in AIDC and similar 
deployments, and updates RFC 4604 by introducing an IGMP/MLD extension that 
enables sources to report receiver information to First Hop Routers. This 
extension is useful in scenarios where the source is unable to impose BIER 
encapsulation.


5. Open Discussion 

 

To further advance this work, we currently have several relevant drafts, which 
you can access via the following links:

1) Requirements and Gap Analysis of Multicast in AI Data Centers

(https://datatracker.ietf.org/doc/draft-zhang-rtgwg-multicast-requirements-gaps-aidc/)

2) Multicast Use Cases for Large Language Model Synchronization

(https://datatracker.ietf.org/doc/draft-liu-rtgwg-llmsync-multicast/)

3) Multicast usage in LLM MoE

(https://datatracker.ietf.org/doc/draft-zhang-rtgwg-llmmoe-multicast/)

4) Optimized Use of BIER in AIML Data Centers

(https://datatracker.ietf.org/doc/draft-zzhang-bier-optimized-use-in-aidc/)

 

All are welcome to join us! We hope to see you there!

 

If you have any questions, please feel free to contact us: 
[email protected], [email protected]





Best Regards

Yisong



_______________________________________________
rtgwg mailing list -- [email protected]
To unsubscribe send an email to [email protected]
