Hi Tony,

Thank you for your comment. I fully agree that it is crucial to reduce the 
risk of oscillation when propagating real-time topology and link congestion 
information across the network. Since link congestion information is highly 
dynamic, it is essential to apply a threshold to the variation in available 
link capacity, so that an update is triggered only when the change exceeds the 
threshold and is suppressed otherwise. At the very least, link capacity 
information that is relatively stable can be used to achieve global 
load-balancing, particularly when multiple physical links are deployed between 
peers.
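
As a rough illustration of the threshold idea above, the following Python 
sketch re-advertises the available capacity of a link only when it has moved by 
at least a configured fraction of the nominal capacity since the last 
advertisement. The class name, the method names, and the 10% default are 
hypothetical and are not taken from the draft; the sketch only shows the 
trigger/suppress decision, not any IGP encoding.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LinkAdvertiser:
    """Hypothetical helper: decides whether a capacity change is advertised."""

    capacity_bps: float                 # nominal link capacity
    threshold: float = 0.10             # assumed: 10% of nominal capacity
    last_advertised_bps: Optional[float] = None

    def should_advertise(self, available_bps: float) -> bool:
        # The first measurement is always advertised.
        if self.last_advertised_bps is None:
            return True
        # Compare against the last ADVERTISED value, not the last sample,
        # so slow drift still triggers an update eventually.
        delta = abs(available_bps - self.last_advertised_bps)
        return delta >= self.threshold * self.capacity_bps

    def observe(self, available_bps: float) -> bool:
        """Record a measurement; return True if an update would be sent."""
        if self.should_advertise(available_bps):
            self.last_advertised_bps = available_bps
            return True
        return False


adv = LinkAdvertiser(capacity_bps=100e9)
decisions = [adv.observe(v) for v in (80e9, 75e9, 69e9, 62e9)]
print(decisions)
```

Measuring the change against the last advertised value rather than the previous 
sample is a design choice of this sketch: small fluctuations are suppressed, 
while a gradual change still produces an update once it accumulates past the 
threshold.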

Some network chip vendors have adopted proprietary signaling to propagate 
congestion information for global load-balancing. The draft simply proposes an 
alternative approach, based on open standards, that could be used across 
different network chips.

Best regards,
Xiaohu



From: Tony Li <tony1ath...@gmail.com> on behalf of Tony Li <tony...@tony.li>
Date: Friday, November 24, 2023 00:36
To: xuxiaohu_i...@hotmail.com <xuxiaohu_i...@hotmail.com>
Cc: lsr@ietf.org <lsr@ietf.org>
Subject: Re: [Lsr] New Version Notification for draft-xu-lsr-fare-00.txt


Hi Xiaohu,

One way of achieving this would be to use the Unreserved Bandwidth TLV 
(https://datatracker.ietf.org/doc/html/rfc5305#autoid-10) to report the unused 
bandwidth on a link.
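
For reference, RFC 5305 defines Unreserved Bandwidth as sub-TLV type 11: eight 
32-bit IEEE floating point values, one per priority level 0 through 7, 
expressed in bytes (not bits) per second. A minimal Python sketch of that 
encoding follows; the function names are hypothetical, and the sketch covers 
only the sub-TLV itself, not the enclosing Extended IS Reachability TLV.

```python
import struct

UNRESERVED_BW_SUBTLV_TYPE = 11  # RFC 5305, Unreserved Bandwidth sub-TLV


def encode_unreserved_bandwidth(bytes_per_sec):
    """Encode eight per-priority values (priority 0 first) as a sub-TLV."""
    if len(bytes_per_sec) != 8:
        raise ValueError("expected one value per priority level 0..7")
    # Eight big-endian 32-bit IEEE floats, 32 octets in total.
    value = struct.pack("!8f", *bytes_per_sec)
    # One-octet type followed by one-octet length, then the value.
    return struct.pack("!BB", UNRESERVED_BW_SUBTLV_TYPE, len(value)) + value


def decode_unreserved_bandwidth(data):
    """Decode a sub-TLV produced by encode_unreserved_bandwidth."""
    subtlv_type, length = struct.unpack("!BB", data[:2])
    if subtlv_type != UNRESERVED_BW_SUBTLV_TYPE or length != 32:
        raise ValueError("not an Unreserved Bandwidth sub-TLV")
    return list(struct.unpack("!8f", data[2:2 + length]))


# Assumed example: 10 Gb/s unused at every priority = 1.25e9 bytes/s.
encoded = encode_unreserved_bandwidth([1.25e9] * 8)
decoded = decode_unreserved_bandwidth(encoded)
```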

Then, you would have to explain how this does not become an oscillator. I’m not 
optimistic.

Regards,
Tony


On Nov 23, 2023, at 8:27 AM, xuxiaohu_i...@hotmail.com wrote:

Hi all,

Any comments or suggestions are welcome.

Best regards,
Xiaohu



From: internet-dra...@ietf.org <internet-dra...@ietf.org>
Date: Friday, November 24, 2023 00:13
To: Xiaohu Xu <xuxiaohu_i...@hotmail.com>
Subject: New Version Notification for draft-xu-lsr-fare-00.txt

A new version of Internet-Draft draft-xu-lsr-fare-00.txt has been successfully
submitted by Xiaohu Xu and posted to the
IETF repository.

Name:     draft-xu-lsr-fare
Revision: 00
Title:    Fully Adaptive Routing Ethernet
Date:     2023-11-22
Group:    Individual Submission
Pages:    7
URL:      https://www.ietf.org/archive/id/draft-xu-lsr-fare-00.txt
Status:   https://datatracker.ietf.org/doc/draft-xu-lsr-fare/
HTMLized: https://datatracker.ietf.org/doc/html/draft-xu-lsr-fare


Abstract:

   Large language models (LLMs) like ChatGPT have become increasingly
   popular in recent years due to their impressive performance in
   various natural language processing tasks.  These models are built by
   training deep neural networks on massive amounts of text data, often
   consisting of billions or even trillions of parameters.  However, the
   training process for these models can be extremely resource-
   intensive, requiring the deployment of thousands or even tens of
   thousands of GPUs in a single AI training cluster.  Therefore, three-
   stage or even five-stage CLOS networks are commonly adopted for AI
   networks.  The non-blocking nature of the network becomes increasingly
   critical for large-scale AI models.  Therefore, adaptive routing is
   necessary to dynamically load balance traffic to the same destination
   over multiple ECMP paths, based on network capacity and even
   congestion information along those paths.



The IETF Secretariat


_______________________________________________
Lsr mailing list
Lsr@ietf.org
https://www.ietf.org/mailman/listinfo/lsr
