[jira] [Comment Edited] (HDDS-15463) Streaming Write Pipeline without Raft

Ivan Andika (Jira) Tue, 02 Jun 2026 18:47:24 -0700


    [ https://issues.apache.org/jira/browse/HDDS-15463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18085647#comment-18085647 ]


Ivan Andika edited comment on HDDS-15463 at 6/3/26 1:46 AM:
------------------------------------------------------------

+1 on this direction. Although the Streaming Write Pipeline removes the Raft
overhead of WriteChunk by streaming the actual data, there is still Raft
overhead on the data commit (PutBlock and the client's Watch Commit). The mix
of Raft and streaming is also sometimes challenging to debug and reason about.
So I think we should use either the full Raft pipeline (i.e. V1) or full
streaming (i.e. V3). The hope is that we can saturate the datanodes' I/O with
fewer pipelines compared to V1 and V2.

Just wondering whether this would be similar to CRAQ
(https://issues.apache.org/jira/browse/HDDS-12578; requires tracking) or to the
HDFS DataStreamer? We can take a look at
https://transactional.blog/blog/2024-data-replication-design-spectrum for the
tradeoffs. With CRAQ, the write guarantee would be stronger (all 3 nodes must
replicate the data) than with the Ratis pipeline (MAJORITY_COMMITTED only
requires the leader to apply the data and 1 follower to commit the transaction
and promise to apply the data), but the write latency can increase if one of
the datanodes is slow or stuck.
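To make the latency side of this tradeoff concrete, here is a toy model. It is a sketch under loudly stated assumptions, not Ozone or Ratis code, and every function name in it is hypothetical: each datanode needs a fixed time to persist and acknowledge a write, and network cost and Raft's leader-apply vs follower-commit distinction are ignored. Waiting for all replicas in parallel is bounded by the slowest node, a chain (as in CRAQ) forwards hop by hop so per-node costs roughly add up, and a majority quorum only waits for the quorum-th fastest acknowledgement.

```python
"""Toy commit-latency model: all replicas vs. chain vs. majority quorum.

Hypothetical sketch, not Ozone/Ratis code. Assumes each datanode needs a fixed
time to persist and acknowledge a write; network and apply costs are ignored.
"""
from typing import Sequence


def all_replicas_parallel_ms(latencies_ms: Sequence[float]) -> float:
    """Fan out to every replica and wait for all acks: the slowest node dominates."""
    return max(latencies_ms)


def chain_ms(latencies_ms: Sequence[float]) -> float:
    """Chain replication (e.g. CRAQ) forwards head -> tail, so per-node costs add up."""
    return sum(latencies_ms)


def majority_ms(latencies_ms: Sequence[float]) -> float:
    """Wait for floor(n/2) + 1 acks, i.e. the quorum-th fastest replica."""
    quorum = len(latencies_ms) // 2 + 1
    return sorted(latencies_ms)[quorum - 1]


if __name__ == "__main__":
    # Three datanodes; the third one is slow (e.g. a stuck disk).
    acks = [2.0, 3.0, 250.0]
    print("all replicas (parallel):", all_replicas_parallel_ms(acks))  # 250.0
    print("chain:", chain_ms(acks))                                     # 255.0
    print("majority:", majority_ms(acks))                               # 3.0
```

In this model the majority quorum simply ignores the stuck node, while both all-replica variants are held back by it; that gap is the latency price of the stronger guarantee.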

Also, I recently came across a Ceph paper arguing that storage backends should
not be implemented on top of a local OS file system
(https://dl.acm.org/doi/10.1145/3341301.3359656), which led Ceph to implement
its own backend (BlueStore). This might be out of scope since it requires
reworking the Ozone DN backend, but I think it's worth thinking about.

Looking forward to the design. 


was (Author: JIRAUSER298977):
+1 on this direction. Although the Streaming Write Pipeline removes the Raft
overhead of WriteChunk by streaming the actual data, there is still Raft
overhead on the data commit (PutBlock and the client's Watch Commit). The mix
of Raft and streaming is also sometimes challenging to debug and reason about.
So I think we should support either the full Raft pipeline (i.e. V1) or full
streaming (i.e. V3). The hope is that we can saturate the datanodes' I/O with
fewer pipelines compared to V1 and V2.

Just wondering whether this would be similar to CRAQ
(https://issues.apache.org/jira/browse/HDDS-12578) and the HDFS DataStreamer?
We can take a look at
https://transactional.blog/blog/2024-data-replication-design-spectrum for the
tradeoffs.

Also, I recently came across a Ceph paper arguing that storage backends should
not be implemented on top of a local OS file system
(https://dl.acm.org/doi/10.1145/3341301.3359656), which led Ceph to implement
its own backend (BlueStore). This might be out of scope since it requires
reworking the Ozone DN backend, but I think it's worth thinking about.

Looking forward to the design. 

> Streaming Write Pipeline without Raft
> -------------------------------------
>
>                 Key: HDDS-15463
>                 URL: https://issues.apache.org/jira/browse/HDDS-15463
>             Project: Apache Ozone
>          Issue Type: New Feature
>          Components: Ozone Client, Ozone Datanode
>            Reporter: Tsz-wo Sze
>            Assignee: Tsz-wo Sze
>            Priority: Major
>
> - V1) Raft Pipeline: Use Raft for both WriteChunk and PutBlock
> - V2) Streaming Write Pipeline (HDDS-4454): Use Ratis streaming (RATIS-979) 
> for WriteChunk and Raft for PutBlock
> - V3) Streaming Write Pipeline without Raft: Use Ratis streaming for both 
> WriteChunk and PutBlock.
> We will implement V3 in this JIRA. Will post a design and create subtasks.
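The three variants differ only in which transport carries each datanode request. The following is a minimal sketch, transcribed from the V1/V2/V3 list; the enum, dictionary and helper names are hypothetical illustrations, not Ozone configuration or APIs. It also flags V2 as the only mixed Raft-plus-streaming pipeline, the combination the comment describes as hard to debug. The client's Watch Commit is left out because the list only names WriteChunk and PutBlock.

```python
"""Which transport each pipeline version uses, per the V1/V2/V3 list.

Illustrative only: the enum and dictionary are hypothetical, not Ozone APIs.
"""
from enum import Enum


class Transport(Enum):
    RAFT = "Raft (Ratis log replication)"
    STREAM = "Ratis streaming (RATIS-979)"


# Transport per request type for each pipeline version.
PIPELINES = {
    "V1": {"WriteChunk": Transport.RAFT, "PutBlock": Transport.RAFT},
    "V2": {"WriteChunk": Transport.STREAM, "PutBlock": Transport.RAFT},
    "V3": {"WriteChunk": Transport.STREAM, "PutBlock": Transport.STREAM},
}


def uses_raft(version: str) -> bool:
    """True if any request in this pipeline version still goes through Raft."""
    return any(t is Transport.RAFT for t in PIPELINES[version].values())


def is_mixed(version: str) -> bool:
    """True if the version combines Raft and streaming in one write path."""
    return len(set(PIPELINES[version].values())) > 1


if __name__ == "__main__":
    for v in PIPELINES:
        print(v, "uses Raft:", uses_raft(v), "mixed:", is_mixed(v))
```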



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
