Hi all,
We gave a presentation in the RTGWG session on the topic “Use Cases,
Requirements, and Framework for Implementing Lossless Techniques in Wide Area
Networks”. During the meeting, we received two comments. Since time was limited
there, we would like to continue the discussion on this mailing list.
1. Shouldn’t this be handled at layer four (the transport layer) or the
application layer using forward error correction (FEC)? That way, it can be
solved end-to-end, instead of requiring further communication between
routing devices. (Comment from Yuval SHAVITT).
Response:
FEC detects and corrects bit errors in data transmission, which ensures
data integrity and reduces packet loss caused by bit errors. However, our
primary focus is on packet loss resulting from network congestion due to
traffic aggregation and bursts. Such packet loss significantly degrades RDMA
throughput and transmission efficiency.
To address this, we propose using fine-grained flow control mechanisms (e.g.,
enhanced PFC) between the routing devices in the WAN to mitigate congestion
promptly, achieve an extremely low packet loss rate, and guarantee efficient
RDMA transmission over long distances. Meanwhile, to avoid large-scale upgrades
of network devices, we have also submitted a draft to the SPRING working group
that supports cross-hop flow control notification and processing
(https://datatracker.ietf.org/doc/draft-ruan-spring-priority-flow-control-sid/).
Admittedly, end-to-end solutions at layer four or the application layer, such
as fast source rate control notifications (e.g., ECN, Fast CNP), are also
integrated into our framework to tackle issues from the source end.
Nevertheless, because WANs have long RTTs, these mechanisms may respond too
late, which limits their effectiveness in rapidly alleviating congestion.
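As a rough illustration of the RTT argument above, the sketch below estimates
how much data a sender keeps injecting before a source-side notification can
take effect, compared with a notification that only travels one hop. All
numbers are assumptions chosen for illustration and do not come from the
drafts: a 100 Gbit/s flow, a 1000 km end-to-end path, a 100 km hop, and about
5 microseconds of fiber propagation delay per km. Queuing and processing
delays are ignored.

```python
# Back-of-the-envelope estimate; all inputs are illustrative assumptions.
FIBER_DELAY_US_PER_KM = 5.0  # approximate propagation delay in optical fiber


def round_trip_ms(distance_km: float) -> float:
    """Propagation-only round-trip time over the given distance."""
    return 2 * distance_km * FIBER_DELAY_US_PER_KM / 1000.0


def bytes_sent_during(rate_gbps: float, interval_ms: float) -> float:
    """Bytes a sender at rate_gbps transmits during interval_ms."""
    return rate_gbps * 1e9 / 8 * interval_ms / 1000.0


rate_gbps = 100.0        # assumed flow rate
end_to_end_km = 1000.0   # assumed source-to-congestion-point distance
per_hop_km = 100.0       # assumed distance between adjacent routers

e2e_rtt_ms = round_trip_ms(end_to_end_km)
hop_rtt_ms = round_trip_ms(per_hop_km)
e2e_bytes = bytes_sent_during(rate_gbps, e2e_rtt_ms)
hop_bytes = bytes_sent_during(rate_gbps, hop_rtt_ms)

print(f"end-to-end: RTT {e2e_rtt_ms:.1f} ms, {e2e_bytes / 1e6:.1f} MB in flight")
print(f"per hop:    RTT {hop_rtt_ms:.1f} ms, {hop_bytes / 1e6:.1f} MB in flight")
```

Under these assumptions, about 125 MB is sent during one end-to-end round
trip, versus about 12.5 MB during one per-hop round trip. This is why a
source-only reaction may come too late to absorb a burst in a WAN.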
We think network device optimizations and end-side improvements are
complementary rather than conflicting. As in data center networks,
combining network-layer technologies with transport/application-layer
mechanisms can achieve lossless transmission. Besides, as a network
operator, we focus more on the network side and hope to further reduce the
packet loss rate in WANs to provide robust network services for upper-layer
applications.
2. Regarding the relationship with DetNet, here are some of our thoughts, and we
welcome further discussion and insights from the DetNet community.
Deterministic networking typically emphasizes bounded low latency and jitter,
catering to latency-critical scenarios such as industrial control. Our current
focus, however, is on the efficient transmission of massive (TB/PB-scale) data
over long distances, for example, distributed AI training and inference across
geographically dispersed data centers.
From our point of view, deterministic networking can achieve lossless
transmission (with zero packet loss) through advance resource reservation and
time-slot-based scheduling. Does a deterministic network eliminate network
congestion entirely? Additionally, lossless transmission (with an extremely low
packet loss rate, close to zero) could also be achieved through congestion
control, path optimization, QoS, etc. Is each approach therefore suited to
different scenarios, with varying trade-offs between effectiveness and
implementation cost?
Draft links:
https://datatracker.ietf.org/doc/draft-hs-rtgwg-wan-lossless-uc
https://datatracker.ietf.org/doc/draft-hs-rtgwg-wan-lossless-framework/
Any feedback and comments are welcome!
Best Regards,
Zhengxin Han
Next Generation Internet Research Department
Research Institute
CHINA UNITED NETWORK COMMUNICATIONS CORPORATION LIMITED
Mobile: +86-18601275531
E-mail: [email protected]