[ 
https://issues.apache.org/jira/browse/RATIS-2524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Andika updated RATIS-2524:
-------------------------------
    Description: 
Currently each read triggers its own ReadIndex call. When network overhead is 
high, this per-read round trip can become the bottleneck.

One improvement is to batch reads together into a single ReadIndex call.

Rule: A ReadIndex result may only serve reads whose invocation happened before 
the ReadIndex request is logically issued.


{code:java}
t1: read A arrives at follower
t2: read B arrives at follower
t3: follower sends one ReadIndex request for batch [A, B]
t4: leader processes ReadIndex and returns index I
t5: follower applies log entries up to index >= I
t6: A and B query local state and complete 
{code}

The following is not allowed, because read B arrives after the ReadIndex 
request has already been issued:

{code:java}
t1: read A arrives
t2: follower sends ReadIndex request
t3: leader processes it
t4: read B arrives
t5: follower attaches B to A's ReadIndex result 
{code}
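
As a rough illustration of the rule above, a pending ReadIndex request can refuse any read that shows up after it has been issued. This is only a sketch: `ReadIndexTicket`, `tryAttach` and `issue` are hypothetical names made up for this example and are not part of the Ratis API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: one pending ReadIndex request and the reads it may serve. */
final class ReadIndexTicket {
  private final List<String> reads = new ArrayList<>();
  private boolean issued = false;

  /** Returns true if the read joined this request, false if it arrived too late. */
  synchronized boolean tryAttach(String readId) {
    if (issued) {
      // The request was already logically issued, so its result must not
      // serve this read; the caller has to use a later ReadIndex instead.
      return false;
    }
    reads.add(readId);
    return true;
  }

  /** Marks the request as logically issued and returns the reads it covers. */
  synchronized List<String> issue() {
    issued = true;
    return new ArrayList<>(reads);
  }
}
```

With this shape, reads A and B attached before `issue()` are covered by the result, while a read C arriving afterwards gets `false` and has to wait for the next request.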

This can be implemented using a batching window with a small batching interval 
(e.g. 500 microseconds or less, depending on the average latency). Reads that 
arrive during the batching interval are collected into one window. When the 
interval elapses, we seal the window (i.e. no more reads can be added to it) 
and send a single ReadIndex that covers all the reads in the sealed window. 
The cost of that ReadIndex is amortized across the batch: e.g. if the window 
holds 5 read requests, 1 ReadIndex call serves all 5 instead of 5 separate 
calls.
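
The window described above could look roughly like the following. This is a minimal sketch under stated assumptions, not the proposed Ratis implementation: `ReadIndexCoalescer`, `submit` and `sealAndSend` are hypothetical names, and the `Supplier` stands in for whatever actually sends the ReadIndex request to the leader.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

/** Hypothetical sketch: coalesces reads of one batching window into one ReadIndex. */
final class ReadIndexCoalescer {
  // Assumption: sends one ReadIndex request and completes with the returned index.
  private final Supplier<CompletableFuture<Long>> readIndexRpc;
  private List<CompletableFuture<Long>> open = new ArrayList<>();

  ReadIndexCoalescer(Supplier<CompletableFuture<Long>> readIndexRpc) {
    this.readIndexRpc = readIndexRpc;
  }

  /** Called when a read arrives; completes with the index the read must wait for. */
  synchronized CompletableFuture<Long> submit() {
    CompletableFuture<Long> future = new CompletableFuture<>();
    open.add(future);
    return future;
  }

  /** Called when the batching interval elapses; returns the number of reads covered. */
  int sealAndSend() {
    final List<CompletableFuture<Long>> window;
    synchronized (this) {
      if (open.isEmpty()) {
        return 0; // nothing to read, so no ReadIndex is sent
      }
      window = open;            // seal: this list accepts no more reads
      open = new ArrayList<>(); // later reads go to the next window
    }
    // The request is issued only after sealing, so every read in the
    // window was invoked before the ReadIndex request.
    readIndexRpc.get().whenComplete((index, error) -> {
      for (CompletableFuture<Long> future : window) {
        if (error != null) {
          future.completeExceptionally(error);
        } else {
          future.complete(index);
        }
      }
    });
    return window.size();
  }
}
```

In such a design, a timer (for instance a `ScheduledExecutorService` running at the batching interval) would call `sealAndSend()`, and each read would query local state once the follower has applied up to the index delivered by its future.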

This idea is similar to the one in the paper 
https://www.vldb.org/pvldb/vol18/p2831-giortamis.pdf (https://law-theorem.com/), 
with the paper's lightweight "sync" write operation replaced by ReadIndex 
(which is also a form of "sync").


> Implement ReadIndex coalescing
> ------------------------------
>
>                 Key: RATIS-2524
>                 URL: https://issues.apache.org/jira/browse/RATIS-2524
>             Project: Ratis
>          Issue Type: Improvement
>            Reporter: Ivan Andika
>            Assignee: Ivan Andika
>            Priority: Major


