Hi Tony,

We are not focused on QoS. Instead, we aim to use the large buffers of routers and to notify upstream devices to slow down or pause packet transmission before the congested queue fills, thereby achieving lossless transmission.

The cache-based retransmission method has a significant negative impact on performance, so it is not a good choice.

Of course, directly applying traditional PFC to wide area networks still poses challenges. Therefore, we are proposing an enhanced PFC mechanism.

BR,
Zhengxin


On Jul 24, 2025, at 18:01, Tony Li <[email protected]> wrote:


[This is an external email. Please verify the sender's identity and handle links and attachments in the message with care.]
Apparently, I still don't understand your requirements. You say lossless, yet you aren't willing to deal with retransmissions. This would seem to be problematic when there are link errors.

If what you seek is simply QoS, well, we've solved that problem before. Flow control is not necessary.

T


On Thu, Jul 24, 2025 at 11:49 AM 韩政鑫(联通集团本部) <[email protected]> wrote:
Hi Tony,

Thanks for your valuable historical perspective. We’ve reviewed materials on LAPB and X.25 networks, and it’s true that early approaches like LAPB had limitations—leading Internet designers to adopt a different architectural path.

LAPB in X.25 relies on hop-by-hop retransmission for error correction, which introduces significant latency and throughput bottlenecks. Modern flow control mechanisms such as PFC, however, detect queue thresholds and rapidly throttle traffic upstream of congestion points. This backpressure actively prevents congestion without retransmission and with extremely low latency.
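
A minimal sketch of the threshold-based backpressure described above. It is a toy model and not the enhanced PFC mechanism from the drafts. The class `PfcQueue`, the function `simulate`, and all thresholds, rates, and delays are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class PfcQueue:
    """Toy queue for one priority class with pause/resume thresholds (assumed values)."""
    xoff: int          # occupancy (packets) at which a pause is signalled upstream
    xon: int           # occupancy at which transmission may resume
    capacity: int      # hard buffer limit; arrivals beyond it are lost
    depth: int = 0
    paused: bool = False
    dropped: int = 0

    def enqueue(self, n: int) -> None:
        accepted = min(n, self.capacity - self.depth)
        self.dropped += n - accepted
        self.depth += accepted
        if not self.paused and self.depth >= self.xoff:
            self.paused = True       # queue crossed XOFF: ask upstream to pause

    def dequeue(self, n: int) -> None:
        self.depth = max(0, self.depth - n)
        if self.paused and self.depth <= self.xon:
            self.paused = False      # queue fell to XON: let upstream resume


def simulate(q: PfcQueue, arrival: int, drain: int, delay: int, steps: int) -> PfcQueue:
    """Upstream sends `arrival` packets per tick unless it has seen a pause.

    The pause state reaches the upstream device only after `delay` ticks,
    which stands in for the signalling delay between the two devices.
    """
    in_transit = [False] * delay     # pause states still travelling upstream
    for _ in range(steps):
        in_transit.append(q.paused)
        upstream_paused = in_transit.pop(0)
        if not upstream_paused:
            q.enqueue(arrival)
        q.dequeue(drain)
    return q


if __name__ == "__main__":
    short = simulate(PfcQueue(xoff=60, xon=20, capacity=100), 10, 5, delay=2, steps=200)
    long_ = simulate(PfcQueue(xoff=60, xon=20, capacity=100), 10, 5, delay=20, steps=200)
    print("short delay dropped:", short.dropped)
    print("long delay dropped:", long_.dropped)
```

In this model the short-delay run loses nothing, while the long-delay run overruns the buffer, because the headroom above the pause threshold has to absorb everything sent while the pause signal is still in transit.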

We fully agree that preventing congestion for all traffic across the entire network is impractical and would incur severe costs. Instead of targeting all traffic, we prioritize high-priority services to ensure their performance. Is there value in precisely preventing congestion for high-priority flows, to reduce packet loss and guarantee high throughput for RDMA transmission over long distances? This is why we believe refined flow control at the tenant/flow level is necessary.

Additionally, we believe that upgrading all network devices is not feasible. There should be a lightweight, cross-hop technical solution in which, for example, only the routers at both ends are upgraded. In special cases, such as very long distances, a few intermediate nodes may also be upgraded to alleviate congestion quickly.

BR,
ZhengXin


Zhengxin Han


 
From: Tony Li
Sent: 2025-07-23 21:29
To: 
Subject: [rtgwg] Re: Continue discussion on “Use Cases, Requirements, and Framework for Implementing Lossless Techniques in Wide Area Networks” presentation in the RTGWG
Hi,

If your goal is to prevent congestion loss in the network, then you will find that you effectively need to prevent congestion in the network. 
That is possible and has been done before.  The approach for doing this is to ensure that each router has flow-control and retransmission at the link layer.  You also need to extend this back to the originating hosts.

This has been done before.  See the LAPB link layer protocol that underlies X.25 networks.  The performance implications are rather severe.

You might consider that these approaches are an entirely different architecture that the Internet designers decided to avoid back around 1969.

Regards,
Tony


On Wed, Jul 23, 2025 at 3:14 PM <"韩政鑫(联通集团本部)"@mf1-de.cloudmails.net> wrote:

Hi all


       We gave a presentation in the RTGWG session on the topic “Use Cases, Requirements, and Framework for Implementing Lossless Techniques in Wide Area Networks”. During the meeting we received two comments. Since time was limited there, we would like to continue the discussion on this mailing list.


1. Shouldn’t this be handled at layer four (the transport layer) or at the application layer using forward error correction (FEC)? That way, it can be solved end-to-end instead of requiring further communication between routing devices. (Comment from Yuval SHAVITT)

Response:

  • FEC detects and corrects bit errors in data transmission, which ensures data integrity and reduces packet loss caused by bit errors. Our primary focus, however, is on packet loss caused by network congestion due to traffic aggregation and bursts. Such packet loss significantly affects RDMA throughput and transmission efficiency.
  • To address this, we propose using fine-grained flow control mechanisms (e.g., enhanced PFC) between the routing devices in the WAN to mitigate congestion promptly, achieve an extremely low packet loss rate, and guarantee efficient RDMA transmission over long distances. Meanwhile, to avoid large-scale upgrades of network devices, we have also submitted a draft to the SPRING working group that supports cross-hop flow control notification and processing (https://datatracker.ietf.org/doc/draft-ruan-spring-priority-flow-control-sid/).
  • Admittedly, end-to-end solutions at layer four or the application layer, such as fast source rate control notifications (e.g., ECN, Fast CNP), are also integrated into our framework to tackle the problem at the source. Nevertheless, because the WAN has long RTTs, these mechanisms may respond late, which limits their effectiveness in rapidly alleviating congestion.
  • We think network device optimizations and end-side improvements are complementary rather than conflicting. As in data center networks, combining network-layer technologies with transport/application-layer mechanisms can achieve lossless transmission. Besides, as network operators, we focus more on the network side and hope to further reduce the packet loss rate in WANs to provide robust network services for upper-layer applications.
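
As a rough worked example of the long-RTT point above, the data a sender puts on the wire before any feedback can take effect is roughly rate × RTT. The sketch below assumes about 5 µs of one-way propagation delay per km of fiber and ignores queuing and processing delay. The function names, distance, and link rate are hypothetical.

```python
FIBER_DELAY_US_PER_KM = 5  # assumed one-way propagation delay in optical fiber


def rtt_ms(distance_km: float) -> float:
    """Approximate round-trip propagation time in milliseconds."""
    return 2 * distance_km * FIBER_DELAY_US_PER_KM / 1000


def in_flight_bytes(rate_gbps: float, rtt_millis: float) -> float:
    """Bytes sent during one RTT at the given line rate (bandwidth-delay product)."""
    return rate_gbps * 1e9 * rtt_millis / 1e3 / 8


if __name__ == "__main__":
    rtt = rtt_ms(1000)                # 1000 km path
    data = in_flight_bytes(100, rtt)  # 100 Gbit/s flow
    print(f"RTT ~ {rtt:.1f} ms, in flight ~ {data / 1e6:.0f} MB")
```

Under these assumptions a 1000 km path has an RTT of about 10 ms, so a 100 Gbit/s flow has about 125 MB in flight before source-side feedback can slow it down.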

2. Regarding the relationship with DetNet, here are some of our thoughts. We welcome further discussion and insights from DetNet.

  • Deterministic networking typically emphasizes bounded low latency and jitter, catering to latency-critical scenarios such as industrial control. Our current focus, however, is on the efficient transmission of massive, TB/PB-level data over long distances, for example in distributed AI training and inference across geographically dispersed data centers.
  • From our view, deterministic networking can achieve lossless transmission (with zero packet loss) through advance resource reservation and time-slot-based scheduling. Does the deterministic network eliminate network congestion entirely? Additionally, lossless transmission (with extremely low packet loss, close to zero) could also be achieved by congestion control, path optimization, QoS, etc. So is each approach suited to different scenarios, with varying trade-offs between effectiveness and implementation cost?

Draft links:

https://datatracker.ietf.org/doc/draft-hs-rtgwg-wan-lossless-uc 

https://datatracker.ietf.org/doc/draft-hs-rtgwg-wan-lossless-framework/


  Any feedback and comments are welcome!


 Best Regards,

Zhengxin Han


Zhengxin Han

Next Generation Internet Research Department

Research Institute
CHINA UNITED NETWORK COMMUNICATIONS CORPORATION LIMITED

Mobile: +86-18601275531
E-mail: 
[email protected]


_______________________________________________
rtgwg mailing list -- [email protected]
To unsubscribe send an email to [email protected]
