[GitHub] [kafka] jsancio commented on a change in pull request #10278: KAFKA-10526: leader fsync deferral on write

2021-03-17 Thread GitBox


jsancio commented on a change in pull request #10278:
URL: https://github.com/apache/kafka/pull/10278#discussion_r596219371



##
File path: raft/src/main/java/org/apache/kafka/raft/KafkaRaftClient.java
##
@@ -1876,7 +1876,7 @@ private long maybeAppendBatches(
 BatchAccumulator.CompletedBatch batch = iterator.next();
 appendBatch(state, batch, currentTimeMs);
 }
-flushLeaderLog(state, currentTimeMs);
+//flushLeaderLog(state, currentTimeMs);

Review comment:
   We are tracking the LEO in two places:
   
   1. In `ReplicatedLog::endOffset`. This gets increase every time the log gets 
appended: 
https://github.com/apache/kafka/blob/28ee656081d5b7984c324c3ea3fc9c34614d17db/core/src/main/scala/kafka/log/Log.scala#L1302
   2. The `LeaderState` also stores what is now the LEO. One suggestion is for 
`LeaderState` to instead store the "flush offsets". In `LeaderState` the 
follower's flush offset is the LEO but for the local replica the "flush offset" 
may not be the LEO.
   
   An example of the high-watermark increasing but the LEO not changing:
   1. follower: LEO = 10
   2. leader: LEO = 100, FlushOffset = 100, HW = 0
   
   Follower successfully fetches for offset 10 => Leader: LEO = 100, 
FlushOffset = 100, HW = 10.
   Follower successfully fetches for offset 20 => Leader: LEO = 100, 
FlushOffset = 100, HW = 20.
   
   In this example if the leader already flushed to the LEO then there is no 
need to flush again when increasing the HW.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [kafka] jsancio commented on a change in pull request #10278: KAFKA-10526: leader fsync deferral on write

2021-03-14 Thread GitBox


jsancio commented on a change in pull request #10278:
URL: https://github.com/apache/kafka/pull/10278#discussion_r593988386



##
File path: raft/src/main/java/org/apache/kafka/raft/KafkaRaftClient.java
##
@@ -1876,7 +1876,7 @@ private long maybeAppendBatches(
 BatchAccumulator.CompletedBatch batch = iterator.next();
 appendBatch(state, batch, currentTimeMs);
 }
-flushLeaderLog(state, currentTimeMs);
+//flushLeaderLog(state, currentTimeMs);

Review comment:
   I think it makes sense to move this flush to 
`KafkaRaftClient::onUpdateLeaderHighWatermark`.
   
   It is possible for the high-watermark to increase without the LEO 
increasing. To avoid unnecessary flush calls, the leader should remember in 
`LeaderState` the LEO when it flushed the log.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [kafka] jsancio commented on a change in pull request #10278: KAFKA-10526: leader fsync deferral on write

2021-03-12 Thread GitBox


jsancio commented on a change in pull request #10278:
URL: https://github.com/apache/kafka/pull/10278#discussion_r593419637



##
File path: raft/src/main/java/org/apache/kafka/raft/KafkaRaftClient.java
##
@@ -1876,7 +1876,7 @@ private long maybeAppendBatches(
 BatchAccumulator.CompletedBatch batch = iterator.next();
 appendBatch(state, batch, currentTimeMs);
 }
-flushLeaderLog(state, currentTimeMs);
+//flushLeaderLog(state, currentTimeMs);

Review comment:
   The invariant that the leader most satisfy is that the `highWatermark <= 
flushOffset`. The current implementation satisfies this by flushing after every 
append and implicitly defining `flushOffset == logEndOffset`. At a high-level, 
I think the goals is to allow `highWatermark <= flushOffset <= logEndOffset`.
   
   On the follower, things are a little different. On the follower the 
`flushOffset == logEndOffset` before a `Fetch` request can be sent. This is 
because the leader assumes that the fetch offset in the `Fetch` request is the 
offset that the follower has successfully replicated.
   
   The advantage of appending without flushing as soon as possible replication 
latency. The leader cannot replicate record batches to the followers and 
observers until they have been appended to the log.
   
   I am not exactly sure how exactly we want to implement this since I haven't 
looked at the details but I think you are correct that on the leader side of 
things we want to increase the `flushOffset` in the `Fetch` request handling 
code as the leader attempts to increase the high-watermark. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org