Hello Mike Percy, I'd like you to do a code review. Please visit
http://gerrit.cloudera.org:8080/8563

to review the following change.

Change subject: KUDU-1078. Fix 'ops in future' error under load
......................................................................

KUDU-1078. Fix 'ops in future' error under load

This fixes an error which often ends up in the logs when the WAL is
under a lot of load:

E0823 01:22:37.049101 24094 consensus_queue.cc:415] T 6ffafa79dd91459fb6a55da4b52f477d
P 1a6f2219a4cc41f5803b03f8c4498f56 [LEADER]: Error trying to read ahead of the log
while preparing peer request: Incomplete: Op with index 294511 is ahead of the
local log (next sequential op: 294511). Destination peer: Peer:
b039d94e5c3f486e907e7567d148cf21, Is new: false, Last received: 3.294511,
Next index: 294512, Last known committed idx: 294508, Last exchange result:
SUCCESS, Needs tablet copy: false

As described in the JIRA, the issue was the following:

- In LogCache::AppendOperations(), we do the following, in order:
  1. Under the lock: add the new operations to the cache map.
  2. Outside the lock: call Log::AsyncAppendReplicates(), which may
     block when the log is under load.
  3. Under the lock: increase next_sequential_op_index_.
- In LogCache::ReadOps(...), we do the following:
  1. Look for the operation in the cache.
  2. If it's a hit, iterate forward in the cache until we stop
     "hitting".

So the following interleaving caused the error to be logged:

Thread 1
--------
- calls LogCache::AppendOperations() with batch 10..20
- inserts 10..20 into the cache
- blocks on AsyncAppendReplicates()

Thread 2
--------
- while preparing a request for a follower, calls ReadOps(10), which
  results in ops 10..20 being copied into the outgoing request
- sends the request

Later, while Thread 1 is still blocked:
- a response is received from the peer for 10..20; 'last_replicated' is
  20, so next_index becomes 21
- we call ReadOps(21), but next_sequential_op_index_ has still not
  advanced past 10, which triggers the error message shown above

The fix, as hinted in the JIRA, is to update next_sequential_op_index_
before appending to the log, rather than after. This does not loosen
any durability guarantees, since the WAL append itself is completely
asynchronous: appending to the WAL just enqueues the entries onto an
in-memory queue and then asynchronously wakes up the background WAL
appender thread.

Without the fix, the new unit test crashed very rapidly with a CHECK
failure showing the same Status::Incomplete() message as expected; with
the fix, it runs reliably.

Change-Id: Ie062adcc7fcaa48f09bbe382267d8f755353a443
---
M src/kudu/consensus/log_cache-test.cc
M src/kudu/consensus/log_cache.cc
2 files changed, 48 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/63/8563/1

--
To view, visit http://gerrit.cloudera.org:8080/8563
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ie062adcc7fcaa48f09bbe382267d8f755353a443
Gerrit-Change-Number: 8563
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon <t...@apache.org>
Gerrit-Reviewer: Mike Percy <mpe...@apache.org>
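
P.S. For reviewers skimming the description, here is a minimal,
self-contained C++ sketch of the reordering described above. It is not
the actual patch: MiniLogCache, Op, ReadOp(), and AsyncAppendToWal()
are hypothetical stand-ins for LogCache, the replicate messages,
LogCache::ReadOps(), and Log::AsyncAppendReplicates(), and the locking
is simplified to a single mutex.

// Hypothetical, simplified model (not the actual Kudu classes) of the
// ordering change in LogCache::AppendOperations().
#include <cstdint>
#include <map>
#include <mutex>
#include <vector>

struct Op { int64_t index; };

class MiniLogCache {
 public:
  void AppendOperations(const std::vector<Op>& ops) {
    if (ops.empty()) return;
    {
      std::lock_guard<std::mutex> l(lock_);
      for (const Op& op : ops) cache_[op.index] = op;
      // The fix: publish the new upper bound BEFORE the potentially
      // blocking WAL handoff, rather than after it returns.
      next_sequential_op_index_ = ops.back().index + 1;
    }
    // Stand-in for Log::AsyncAppendReplicates(): it only enqueues onto
    // an in-memory queue and wakes the background appender thread, so
    // publishing the index first does not weaken durability.
    AsyncAppendToWal(ops);
  }

  // Simplified stand-in for LogCache::ReadOps(). Reading at or beyond
  // next_sequential_op_index_ is the case that produced the
  // Status::Incomplete "ahead of the local log" error in the old code.
  bool ReadOp(int64_t index, Op* op) {
    std::lock_guard<std::mutex> l(lock_);
    if (index >= next_sequential_op_index_) return false;  // nothing there yet
    auto it = cache_.find(index);
    if (it == cache_.end()) return false;  // evicted; the real code falls back to disk
    *op = it->second;
    return true;
  }

 private:
  void AsyncAppendToWal(const std::vector<Op>& /*ops*/) {
    // May block under load in the real system, which is what exposed
    // the race when the index was published only after this call.
  }

  std::mutex lock_;
  std::map<int64_t, Op> cache_;
  int64_t next_sequential_op_index_ = 0;
};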
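And a small driver for the sketch, mirroring the scenario in the commit
message (also hypothetical; it is single-threaded, so it checks the
published bound deterministically rather than reproducing the actual
thread interleaving):

#include <iostream>

int main() {
  MiniLogCache cache;

  // Append the batch 10..20; the bound is published before the WAL
  // handoff, as in the fixed ordering.
  std::vector<Op> batch;
  for (int64_t i = 10; i <= 20; ++i) batch.push_back(Op{i});
  cache.AppendOperations(batch);

  Op op;
  // Index 20 is readable; index 21 is a clean miss ("not written yet")
  // rather than a spurious "ahead of the local log" error, because
  // next_sequential_op_index_ already advanced to 21.
  std::cout << std::boolalpha
            << "ReadOp(20): " << cache.ReadOp(20, &op) << "\n"   // true
            << "ReadOp(21): " << cache.ReadOp(21, &op) << "\n";  // false
  return 0;
}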