Dear Nitsan:

      The proposal relies on the controller to configure buffer resources, 
together with each routing device's buffer threshold settings and the exchange 
of backpressure messages between devices. Preset trigger conditions and the 
cross-device buffer coordination process are intended to ensure that router 
buffers absorb traffic bursts in time.
       For details, please refer to clause 3.2 of 
draft-hs-rtgwg-wan-lossless-framework. The content is as follows:
  
3.2.  Use and Management of Multi-Level Network Buffers
 
   Since temporary bandwidth is shared and not dedicated, it exhibits
   weaker SLA guarantees.  If traffic experiences jitter during
   transmission, network device buffers can absorb packets to reduce
   packet loss.
 
   3.2.1.  Specific Requirements:
 
   *  *Single Device Buffer Sharing and Management*: Single devices
      should implement fine-grained buffer divisions based on traffic
      priority and slice.  These buffers should be isolated to avoid
      mutual interference.  Initial buffer resource allocation is
      determined by the controller and configured across all devices in
      the domain via control plane protocols.
 
   *  *Cross-Device Buffer Coordination*: Given the nature of large data
      transmissions, a single device's buffer might be insufficient for
      absorbing bursty traffic.  Therefore, multiple devices' buffers of
      the same fine-grained type (e.g., same priority and slice) should
      be used collectively.  For example, if device C in the path
      A->B->C is congested and its buffer is insufficient, it should
      notify upstream devices B or A to utilize their similar buffers to
      absorb some traffic.  This involves:
 
      -  Control Signaling: Using control signaling packets to notify
         upstream devices to buffer packets, reducing the burden on the
         congested device.  If upstream device buffers also reach a
         threshold, further notifications should be triggered upstream.
         Control signaling should include buffer index (e.g., slice ID),
         control instructions, and parameters.  Controller configuration
         or segment routing can help determine upstream device
         addresses.  Upon congestion relief, upstream devices should be
         notified to release buffered traffic.  This notification
         mechanism can be inspired by IEEE PFC mechanisms but requires
         more granular backpressure.
 
      -  Trigger Conditions for Buffer Coordination: The local device-
         triggering cross-device buffer coordination requires pre-set
         conditions.  Controllers can configure device-specific
         thresholds to customize trigger conditions for each device,
         slice, and priority.
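
The coordination described in clause 3.2.1 can be sketched as a toy model. This is only an illustration under stated assumptions, not the mechanism defined in the draft: the names `Router`, `Path`, `HOLD` and `RELEASE` are hypothetical, thresholds are counted in packets, signaling is treated as instantaneous, and buffer overflow is not modeled. Buffers are keyed by (slice ID, priority), so a hold on one key leaves other slices untouched.

```python
from dataclasses import dataclass, field


@dataclass
class Router:
    name: str
    thresholds: dict                            # (slice_id, priority) -> trigger level in packets
    depth: dict = field(default_factory=dict)   # (slice_id, priority) -> packets buffered here
    holding: set = field(default_factory=set)   # keys a downstream device asked us to hold


class Path:
    """Routers ordered upstream -> downstream, e.g. A -> B -> C (toy model)."""

    def __init__(self, routers):
        self.routers = routers
        self.congested = set()   # keys whose egress at the last router is blocked
        self.delivered = {}      # key -> packets that left the last router
        self.signals = []        # (sender, receiver, instruction, key)

    def _signal(self, idx, instruction, key):
        """Control message from routers[idx] to its upstream neighbour."""
        up = self.routers[idx - 1]
        self.signals.append((self.routers[idx].name, up.name, instruction, key))
        if instruction == "HOLD":
            up.holding.add(key)
        else:  # "RELEASE"
            up.holding.discard(key)

    def offer(self, key, packets, start=0):
        """Packets enter at routers[start] and travel downstream until they
        reach a router holding this key, or the last router."""
        last = len(self.routers) - 1
        idx = start
        while idx < last and key not in self.routers[idx].holding:
            idx += 1
        if idx == last and key not in self.congested:
            self.delivered[key] = self.delivered.get(key, 0) + packets
            return
        router = self.routers[idx]
        before = router.depth.get(key, 0)
        router.depth[key] = before + packets
        # Notify upstream only when the configured threshold is first crossed.
        if idx > 0 and before < router.thresholds[key] <= router.depth[key]:
            self._signal(idx, "HOLD", key)

    def relieve(self, key):
        """Congestion clears: flush buffers downstream-first and release
        each upstream device in turn."""
        self.congested.discard(key)
        last = len(self.routers) - 1
        for idx in range(last, -1, -1):
            packets = self.routers[idx].depth.pop(key, 0)
            if packets:
                self.offer(key, packets, start=min(idx + 1, last))
            if idx > 0 and key in self.routers[idx - 1].holding:
                self._signal(idx, "RELEASE", key)


key = ("slice-1", 7)      # hypothetical buffer index: (slice ID, priority)
other = ("slice-2", 0)
path = Path([Router(n, {key: 3, other: 3}) for n in "ABC"])
path.congested.add(key)   # C cannot forward slice-1 traffic for now
path.offer(key, 3)        # fills C to its threshold, so C asks B to hold
path.offer(key, 3)        # stops at B and fills it, so B asks A to hold
path.offer(key, 2)        # stops at A
path.offer(other, 5)      # a different slice passes through unaffected
path.relieve(key)         # C recovers; B and then A are released
print(path.signals)
print(path.delivered)
```

In this run the hold propagates C to B to A as each threshold is crossed, and the release follows the same order once C recovers. How quickly this happens on real links depends on the signaling delay between neighbours, which the model deliberately leaves out.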

BR,
Zhengxin


Zhengxin Han
 
From: Nitsan Dolev
Sent: 2025-07-24 20:12
To: 韩政鑫(联通集团本部); Tony Li
Cc: rtgwg
Subject: [rtgwg] Re: [EXTERNAL] Re: Continue discussion on “Use Cases, Requirements, 
and Framework for Implementing Lossless Techniques in Wide Area Networks” 
presentation in the RTGWG

Dear Zhengxin, 
 
 
Could you please explain how one can ensure that your proposal below can be 
carried out within the time that neighboring router buffers can absorb bursts? 
 
Looking forward, 
 
Nitsan Dolev  
 
 
From: 韩政鑫(联通集团本部) <[email protected]> 
Sent: Thursday, July 24, 2025 1:56 PM
To: Tony Li <[email protected]>
Cc: rtgwg <[email protected]>
Subject: [EXTERNAL] [rtgwg] Re: Continue discussion on “Use Cases, 
Requirements, and Framework for Implementing Lossless Techniques in Wide Area 
Networks” presentation in the RTGWG
 
Hi Tony,
 
We are not focused on QoS. Instead, we aim to use the large buffers of routers 
and to notify upstream devices to slow down or pause packet transmission 
before the congested queue is full, thereby achieving losslessness.
 
The cache-based retransmission method has a significant negative impact on 
performance, so it is not a good choice. 
 
Of course, directly applying traditional PFC to wide area networks still poses 
challenges. Therefore, we are proposing an enhanced PFC mechanism.
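
One commonly cited difficulty with pause-based flow control over long links is headroom: after a pause message is sent, the upstream device keeps transmitting until the message arrives, and data already in flight still has to be absorbed. The following is a rough estimate only, assuming about 5 microseconds of propagation delay per km of fiber and ignoring processing time and frame-size terms; the link rate and distances are illustrative values, not taken from the drafts.

```python
FIBER_DELAY_S_PER_KM = 5e-6  # assumption: roughly 5 microseconds per km in optical fiber


def pause_headroom_bytes(link_gbps: float, distance_km: float) -> float:
    """Bytes that can still arrive after a pause is sent: one round trip at line rate."""
    round_trip_s = 2 * distance_km * FIBER_DELAY_S_PER_KM
    bytes_per_second = link_gbps * 1e9 / 8
    return bytes_per_second * round_trip_s


# Illustrative distances: inside a data center, metro, and long haul.
for km in (0.1, 100, 1000):
    megabytes = pause_headroom_bytes(100, km) / 1e6
    print(f"{km:>6} km at 100 Gb/s: about {megabytes:.4f} MB per paused queue")
```

Under these assumptions a 1000 km link at 100 Gb/s needs on the order of 125 MB of headroom per paused queue, against roughly 12.5 kB for a 100 m data-center link.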
 
BR,
Zhengxin
 


On July 24, 2025, at 18:01, Tony Li <[email protected]> wrote:

Apparently, I still don't understand your requirements. You say lossless, yet 
you aren't willing to deal with retransmissions. This would seem to be 
problematic when there are link errors.
 
If what you seek is simply QoS, well, we've solved that problem before. Flow 
control is not necessary.
 
T
 
 
On Thu, Jul 24, 2025 at 11:49 AM 韩政鑫(联通集团本部) <[email protected]> wrote:
Hi Tony,
 
Thanks for your valuable historical perspective. We’ve reviewed materials on 
LAPB and X.25 networks, and it’s true that early approaches like LAPB had 
limitations—leading Internet designers to adopt a different architectural path.
 
LAPB in X.25 relies on hop-by-hop retransmission for error correction, which 
introduces significant latency and throughput bottlenecks. Modern flow control 
mechanisms such as PFC, by contrast, detect queue thresholds and rapidly 
throttle traffic upstream of congestion points. This actively prevents 
congestion without retransmission, using backpressure with extremely low 
latency.
 
We fully agree that preventing congestion for all traffic across the entire 
network is impractical and would incur severe costs. Instead of targeting all 
traffic, we prioritize high-priority services to ensure their performance. Is 
there value in precisely preventing congestion for high-priority flows, to 
reduce packet loss and guarantee high throughput for RDMA transmission over 
long distances? This is why we believe that fine-grained flow control at the 
tenant or flow level is necessary.
 
Additionally, we believe upgrading all network devices is not feasible. There 
should be a lightweight, cross-hop technical solution. For example, only the 
routers at both ends would be upgraded. In special cases, such as when the 
distance is very long, a few intermediate nodes could also be upgraded to 
alleviate congestion quickly.
 
BR,
ZhengXin
 


Zhengxin Han
 
 
From: Tony Li
Sent: 2025-07-23 21:29
To: 
Cc: rtgwg; shavitt; 庞冉(联通集团本部); 阮征(联通集团本部)
Subject: [rtgwg] Re: Continue discussion on “Use Cases, Requirements, and Framework 
for Implementing Lossless Techniques in Wide Area Networks” presentation in the 
RTGWG
Hi,
 
If your goal is to prevent congestion loss in the network, then you will find 
that you effectively need to prevent congestion in the network. 
That is possible and has been done before.  The approach for doing this is to 
ensure that each router has flow-control and retransmission at the link layer.  
You also need to extend this back to the originating hosts.
 
This has been done before.  See the LAPB link layer protocol that underlies 
X.25 networks.  The performance implications are rather severe.
 
You might consider that these approaches are an entirely different architecture 
that the Internet designers decided to avoid back around 1969.
 
Regards,
Tony
 
 
On Wed, Jul 23, 2025 at 3:14 PM <"韩政鑫(联通集团本部)"@mf1-de.cloudmails.net> wrote:
Hi all,
 
       We gave a presentation in the RTGWG session on the topic “Use 
Cases, Requirements, and Framework for Implementing Lossless Techniques in Wide 
Area Networks”. During the meeting, we received two comments. Since time was 
limited there, we can continue the discussion on this mailing list.
 
1. Shouldn’t this be handled at layer four (the transport layer) or the 
application layer using forward error correction (FEC)? That way, it can be 
solved end-to-end, instead of requiring further communication between 
routing devices. (Comment from Yuval SHAVITT.)
Response:
FEC is designed to detect and correct bit errors in data transmission, which 
ensures data integrity and reduces packet loss caused by bit errors. However, 
our primary focus is on packet loss resulting from network congestion due to 
traffic aggregation and bursts. Such packet loss significantly affects RDMA 
throughput and transmission efficiency.
To address this, we propose using fine-grained flow control mechanisms (e.g., 
enhanced PFC) between routing devices in the WAN to mitigate congestion 
promptly, achieve an extremely low packet loss rate, and guarantee efficient 
RDMA transmission over long distances. Meanwhile, to avoid large-scale upgrades 
of network devices, we have also submitted a draft to the SPRING working group 
that supports cross-hop flow control notification and processing 
(https://datatracker.ietf.org/doc/draft-ruan-spring-priority-flow-control-sid/).
Admittedly, end-to-end solutions at layer four or the application layer, such 
as fast source rate control notifications (e.g., ECN, Fast CNP), are also 
integrated into our framework to tackle issues from the source end. 
Nevertheless, because WANs have long RTTs, these mechanisms may suffer from 
delayed responses, which limits their effectiveness in rapidly alleviating 
congestion.
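
To make the delay argument concrete, the sketch below compares how long it takes before a congested router sees relief when the notification goes back to the source versus only to the adjacent upstream router. It is a back-of-the-envelope model: the path and link lengths are hypothetical, propagation delay is assumed to be about 5 microseconds per km, processing time is ignored, and the source-based case assumes the best case of a notification sent directly from the congested node to the source (a mark echoed via the receiver would take longer).

```python
FIBER_DELAY_S_PER_KM = 5e-6  # assumption: roughly 5 microseconds per km in optical fiber


def feedback_loop_s(link_km, congested_node, hop_by_hop):
    """Time from sending a notification until reduced traffic reaches the sender of it.

    link_km[i] is the length of the link from node i to node i + 1, where
    node 0 is the traffic source; congested_node is the index (>= 1) of the
    congested router.
    """
    upstream_links = link_km[:congested_node]
    distance_km = upstream_links[-1] if hop_by_hop else sum(upstream_links)
    # The notification travels upstream, then its effect travels back downstream.
    return 2 * distance_km * FIBER_DELAY_S_PER_KM


# Hypothetical path: source -10 km- R1 -800 km- R2 -400 km- R3 (congested)
links = [10, 800, 400]
for label, hop_by_hop in (("source-based", False), ("hop-by-hop", True)):
    ms = feedback_loop_s(links, congested_node=3, hop_by_hop=hop_by_hop) * 1e3
    print(f"{label:>12}: {ms:.1f} ms before the congested queue sees relief")
```

The gap depends on where the long links sit: if the link directly upstream of the congested router is itself the long-haul one, the hop-by-hop loop is not much shorter than the source-based loop.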
We think network device optimizations and end-side improvements are 
complementary rather than conflicting. As in data center networks, 
combining network-layer technologies with transport- or application-layer 
mechanisms can achieve lossless transmission. Besides, as network 
operators, we focus more on the network side and hope to further reduce the 
packet loss rate in WANs to provide robust network services for upper-layer 
applications.
2. Regarding the relationship with DetNet, here are some of our thoughts, and we 
welcome further discussion and insights from the DetNet working group.
Deterministic networking typically emphasizes bounded low latency and jitter, 
catering to latency-critical scenarios such as industrial control. Our current 
focus, however, is on the efficient transmission of massive (TB/PB-level) data 
volumes over long distances, for example distributed AI training and inference 
across geographically dispersed data centers.
In our view, deterministic networking can achieve lossless transmission (with 
zero packet loss) through advance resource reservation and time-slot-based 
scheduling. Does a deterministic network eliminate network congestion 
entirely? Additionally, lossless transmission (with an extremely low packet 
loss rate, close to zero) could also be achieved by congestion control, path 
optimization, QoS, etc. So is each approach suited to different scenarios, with 
varying trade-offs between effectiveness and implementation cost?
Draft links:
https://datatracker.ietf.org/doc/draft-hs-rtgwg-wan-lossless-uc 
https://datatracker.ietf.org/doc/draft-hs-rtgwg-wan-lossless-framework/
 
  Any feedback and comments are welcome!
 
 Best Regards,
Zhengxin Han


Zhengxin Han
Next Generation Internet Research Department
Research Institute
CHINA UNITED NETWORK COMMUNICATIONS CORPORATION LIMITED
Mobile: +86-18601275531
E-mail: [email protected]
 
_______________________________________________
rtgwg mailing list -- [email protected]
To unsubscribe send an email to [email protected]

