[
https://issues.apache.org/jira/browse/RATIS-2509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18078471#comment-18078471
]
Tsz-wo Sze commented on RATIS-2509:
-----------------------------------
[~ivanandika], thanks for running the benchmark! It is very insightful.
Could you also post the flamegraph for the "OM leader-read that does not go
through Ratis" case?
As you mentioned, there are ByteString conversions. In Ozone, since OM uses
Hadoop RPC to receive a request from the network, the Ratis request is just a
local RaftServer call (i.e. no RaftClient). So, we may add an OmMessage class
in Ozone to avoid the ByteString conversion:
{code}
class OmMessage implements Message {
  private final OMRequest omRequest;
  private final RaftClientRequest raftClientRequest;

  public OmMessage(OMRequest omRequest, boolean isWrite) {
    this.omRequest = omRequest;
    this.raftClientRequest = RaftClientRequest.newBuilder()
        .setClientId(getClientId())
        .setServerId(server.getId())
        .setGroupId(raftGroupId)
        .setCallId(getCallId())
        .setMessage(this) // <----------- no ByteString conversion
        .setType(isWrite ? RaftClientRequest.writeRequestType() :
            getRaftReadRequestType(omRequest))
        .build();
  }

  public OMRequest getOmRequest() {
    return omRequest;
  }

  public RaftClientRequest getRaftClientRequest() {
    return raftClientRequest;
  }

  @Override
  public ByteString getContent() {
    throw new UnsupportedOperationException("Not supported yet.");
  }
}
{code}
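One possible variation on the idea above, as a rough sketch only: instead of throwing from getContent(), the message could serialize lazily, so the local read path never pays for the conversion while any path that does need bytes still works. Everything below is a JDK-only stand-in written for illustration. Message, Request, LocalMessage and read() are hypothetical names, not the Ratis or Ozone classes, and a byte[] stands in for ByteString.

```java
import java.nio.charset.StandardCharsets;

public class LocalMessageSketch {
  /** Hypothetical stand-in for a Ratis-style message; not the real interface. */
  interface Message {
    byte[] getContent();
  }

  /** Hypothetical stand-in for an in-memory OM request. */
  static final class Request {
    private final String key;

    Request(String key) {
      this.key = key;
    }

    String getKey() {
      return key;
    }
  }

  /** Carries the in-memory request and serializes it at most once, on demand. */
  static final class LocalMessage implements Message {
    private final Request request;
    private byte[] content;          // cached serialized form, null until needed
    private int serializationCount;  // only here to make the cost visible

    LocalMessage(Request request) {
      this.request = request;
    }

    Request getRequest() {
      return request;
    }

    @Override
    public synchronized byte[] getContent() {
      if (content == null) {
        content = request.getKey().getBytes(StandardCharsets.UTF_8);
        serializationCount++;
      }
      return content;
    }

    synchronized int getSerializationCount() {
      return serializationCount;
    }
  }

  /** Stand-in for a local query: uses the object directly when it is available. */
  static String read(Message message) {
    final String key;
    if (message instanceof LocalMessage) {
      // Local path: no serialization and no parsing.
      key = ((LocalMessage) message).getRequest().getKey();
    } else {
      // Fallback path: decode the request from its serialized form.
      key = new String(message.getContent(), StandardCharsets.UTF_8);
    }
    return "value-of-" + key;
  }

  public static void main(String[] args) {
    LocalMessage message = new LocalMessage(new Request("k1"));
    System.out.println(read(message));
    System.out.println("serializations after local read: "
        + message.getSerializationCount());
    message.getContent();
    message.getContent();
    System.out.println("serializations after two getContent() calls: "
        + message.getSerializationCount());
  }
}
```

Whether lazy serialization is preferable to failing fast with UnsupportedOperationException depends on whether any code on the local path (logging, metrics, retry handling) may call getContent(); that would need to be checked against the actual Ratis code.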
> Introduce local read API to reduce serde cost
> ---------------------------------------------
>
> Key: RATIS-2509
> URL: https://issues.apache.org/jira/browse/RATIS-2509
> Project: Ratis
> Issue Type: Improvement
> Reporter: Ivan Andika
> Assignee: Ivan Andika
> Priority: Major
> Attachments: om-benchmark-leader-read-all-ratis.html
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Recently, we ran a benchmark comparing OM leader reads that do not go
> through Ratis with OM leader reads that go through Ratis (through
> submitClientRequestAsync). We saw up to a 25% decrease in read
> throughput even though we set raft.server.read.option to DEFAULT,
> which should return immediately (235020 QPS -> 180433 QPS, or about a 23%
> reduction in throughput for pure reads with 100 threads).
> The overhead seems to come from request/response proto conversion,
> RaftClientRequest construction, future chaining, .get() blocking, Ratis
> metrics/reply building, and parsing the Ratis response back into OMResponse.
> See [^om-benchmark-leader-read-all-ratis.html] for the flamegraph.
> This means that if we submit a linearizable read to a follower, we incur this
> overhead on top of the linearizable read overhead (e.g. ReadIndex).
> We can try to find a way to reduce this overhead. We might need to implement
> another read flow without it (unlike writes, which require
> AppendEntries requests to the followers and therefore serde, reads can be
> served locally).
> One idea is that if we are submitting to RaftServer, we can use
> submitServerRequestAsync which does not require RaftClientRequest and
> RaftClientReply serde.