[
https://issues.apache.org/jira/browse/HDDS-15463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18085647#comment-18085647
]
Ivan Andika edited comment on HDDS-15463 at 6/3/26 1:36 AM:
------------------------------------------------------------
+1 on this direction. Although the Streaming Write Pipeline removes the Raft
overhead of WriteChunk by streaming the actual data, there is still Raft
overhead on the data commit (PutBlock and Client Watch Commit). The mix of
Raft and streaming is also sometimes pretty challenging to debug and reason
about. So I think we should support either a full Raft pipeline (i.e. V1) or
full streaming (i.e. V3). The hope is that we can saturate the datanodes' IO
with fewer pipelines than V1 and V2 need.
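To make the comparison concrete, here is a minimal sketch of which operations
go through Raft in each variant, following the V1/V2/V3 definitions in the
issue description. The class, enum, and method names are hypothetical and are
not Ozone or Ratis APIs; it only models the WriteChunk/PutBlock split and
ignores everything else.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical model of the three write pipeline variants; not Ozone code.
public class PipelineVariants {

    // The two datanode operations named in the issue description.
    public enum Op { WRITE_CHUNK, PUT_BLOCK }

    public enum Variant {
        // V1: Raft for both WriteChunk and PutBlock.
        V1_RAFT(EnumSet.of(Op.WRITE_CHUNK, Op.PUT_BLOCK)),
        // V2: streaming for WriteChunk, Raft for PutBlock.
        V2_STREAMING_WITH_RAFT_COMMIT(EnumSet.of(Op.PUT_BLOCK)),
        // V3: streaming for both, so no operation goes through Raft.
        V3_STREAMING_ONLY(EnumSet.noneOf(Op.class));

        private final Set<Op> raftOps;

        Variant(Set<Op> raftOps) {
            this.raftOps = raftOps;
        }

        // True if this variant sends the given operation through Raft.
        public boolean usesRaft(Op op) {
            return raftOps.contains(op);
        }

        // True if some, but not all, operations go through Raft.
        public boolean mixesRaftAndStreaming() {
            return !raftOps.isEmpty() && raftOps.size() < Op.values().length;
        }
    }

    public static void main(String[] args) {
        for (Variant v : Variant.values()) {
            System.out.printf("%s: WriteChunk via Raft=%b, PutBlock via Raft=%b, mixed=%b%n",
                    v,
                    v.usesRaft(Op.WRITE_CHUNK),
                    v.usesRaft(Op.PUT_BLOCK),
                    v.mixesRaftAndStreaming());
        }
    }
}
```

In this model only V2 is a mixed pipeline, which is the case that is hard to
debug and reason about.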
Just wondering whether this would be similar to CRAQ
(https://issues.apache.org/jira/browse/HDDS-12578) and the HDFS DataStreamer?
We can take a look at
https://transactional.blog/blog/2024-data-replication-design-spectrum for the
tradeoffs.
Also, I recently came across a Ceph paper that argues that storage backends
should not be implemented on top of the OS file system
(https://dl.acm.org/doi/10.1145/3341301.3359656), which is why Ceph
implemented its own backend (BlueStore). This might not be in scope since it
requires reworking the Ozone DN backend, but I think it's worth thinking about.
Looking forward to the design.
> Streaming Write Pipeline without Raft
> -------------------------------------
>
> Key: HDDS-15463
> URL: https://issues.apache.org/jira/browse/HDDS-15463
> Project: Apache Ozone
> Issue Type: New Feature
> Components: Ozone Client, Ozone Datanode
> Reporter: Tsz-wo Sze
> Assignee: Tsz-wo Sze
> Priority: Major
>
> - V1) Raft Pipeline: Use Raft for both WriteChunk and PutBlock
> - V2) Streaming Write Pipeline (HDDS-4454): Use Ratis streaming (RATIS-979)
> for WriteChunk and Raft for PutBlock
> - V3) Streaming Write Pipeline without Raft: Use Ratis streaming for both
> WriteChunk and PutBlock.
> We implement V3 in this JIRA. Will post a design and create subtasks.