[jira] [Comment Edited] (HDDS-15463) Streaming Write Pipeline without Raft

Ivan Andika (Jira) Tue, 02 Jun 2026 18:37:11 -0700


    [ 
https://issues.apache.org/jira/browse/HDDS-15463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18085647#comment-18085647
 ]


Ivan Andika edited comment on HDDS-15463 at 6/3/26 1:36 AM:
------------------------------------------------------------

+1 on this direction. Although the Streaming Write Pipeline removes the Raft 
overhead of WriteChunk by streaming the actual data, there is still Raft 
overhead on the data commit (PutBlock and Client Watch Commit). The mix of 
Raft and streaming is also sometimes challenging to debug and reason about. So 
I think we should support either a full Raft pipeline (i.e. V1) or full 
streaming (i.e. V3). The hope is that we can saturate the datanodes' IO with 
fewer pipelines than V1 and V2 need.
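
To make the comparison concrete, here is a minimal sketch of the three 
variants from the issue description, recording which transport each one uses 
for WriteChunk and PutBlock. All type and method names (Transport, 
WritePipelineMode, usesRaft, isMixed) are hypothetical and for illustration 
only; they are not Ozone or Ratis classes.

```java
// Hypothetical names for illustration; not actual Ozone/Ratis classes.
enum Transport { RAFT, STREAMING }

enum WritePipelineMode {
    // V1: Raft for both WriteChunk and PutBlock.
    V1_RAFT(Transport.RAFT, Transport.RAFT),
    // V2: streaming for WriteChunk, Raft for PutBlock.
    V2_STREAMING_WITH_RAFT_COMMIT(Transport.STREAMING, Transport.RAFT),
    // V3: streaming for both WriteChunk and PutBlock.
    V3_STREAMING_ONLY(Transport.STREAMING, Transport.STREAMING);

    final Transport writeChunk;
    final Transport putBlock;

    WritePipelineMode(Transport writeChunk, Transport putBlock) {
        this.writeChunk = writeChunk;
        this.putBlock = putBlock;
    }

    // True if any step of the write path still goes through Raft.
    boolean usesRaft() {
        return writeChunk == Transport.RAFT || putBlock == Transport.RAFT;
    }

    // True if the write path mixes two transports (the case that is
    // harder to debug and reason about).
    boolean isMixed() {
        return writeChunk != putBlock;
    }
}
```

Under this sketch only V2 is mixed, and only V3 avoids Raft entirely.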

Just wondering whether this will be similar to CRAQ 
(https://issues.apache.org/jira/browse/HDDS-12578) and the HDFS DataStreamer? 
We can take a look at 
https://transactional.blog/blog/2024-data-replication-design-spectrum for the 
tradeoffs.

Also, I recently came across a Ceph paper arguing that storage backends 
should not be implemented on top of the OS file system 
(https://dl.acm.org/doi/10.1145/3341301.3359656), which is what led Ceph to 
implement its own backend (BlueStore). This might not be in scope since it 
requires reworking the Ozone DN backend, but I think it's worth thinking about.

Looking forward to the design. 


> Streaming Write Pipeline without Raft
> -------------------------------------
>
>                 Key: HDDS-15463
>                 URL: https://issues.apache.org/jira/browse/HDDS-15463
>             Project: Apache Ozone
>          Issue Type: New Feature
>          Components: Ozone Client, Ozone Datanode
>            Reporter: Tsz-wo Sze
>            Assignee: Tsz-wo Sze
>            Priority: Major
>
> - V1) Raft Pipeline: Use Raft for both WriteChunk and PutBlock
> - V2) Streaming Write Pipeline (HDDS-4454): Use Ratis streaming (RATIS-979) 
> for WriteChunk and Raft for PutBlock
> - V3) Streaming Write Pipeline without Raft: Use Ratis streaming for both 
> WriteChunk and PutBlock.
> We implement V3 in this JIRA.  Will post a design and create subtasks.



