Hello Mike Percy,

I'd like you to do a code review. Please visit

    http://gerrit.cloudera.org:8080/8563

to review the following change.


Change subject: KUDU-1078. Fix 'ops in future' error under load
......................................................................

KUDU-1078. Fix 'ops in future' error under load

This fixes an error which often ends up in the logs when the WAL is
under a lot of load:

E0823 01:22:37.049101 24094 consensus_queue.cc:415] T 
6ffafa79dd91459fb6a55da4b52f477d P 1a6f2219a4cc41f5803b03f8c4498f56 [LEADER]: 
Error trying to read ahead of the log while preparing peer request: 
Incomplete: Op with index 294511 is ahead of the local log (next sequential op: 
294511). Destination peer: Peer: b039d94e5c3f486e907e7567d148cf21, Is new: 
false, Last received: 3.294511, Next index: 294512, Last known committed idx: 
294508, Last exchange result: SUCCESS, Needs tablet copy: false

As described in the JIRA, the issue was the following:

- in LogCache::AppendOperations, we perform the following steps, in order:
  1. under lock: add new operations to the cache map
  2. outside lock: Log::AsyncAppendReplicates (which may block when under load)
  3. under lock: increase next_sequential_op_index_

- in LogCache::ReadOps(...), we do:
   1. Look for the operation in the cache.
   2. If it's a hit, iterate forward in the cache until we stop "hitting"

So the following interleaving caused the error to be logged:

Thread 1
---------
- call LogCache::AppendOperations() with batch 10..20
- inserts 10..20 into the cache
- blocks on AsyncAppendReplicates

Thread 2
---------
- preparing a request for a follower, calls ReadOps(10)
--- results in ops 10..20 being copied into the outgoing request
- sends the request

Later, while Thread 1 is still blocked:

- response received from the peer for 10..20. 'last_replicated' = 20, so 
next_index = 21
- we call ReadOps(21)
--- next_sequential_op_index_ has still not advanced from 10
--- this triggers the error message indicated above

The fix, as hinted in the JIRA, is to update 'next_sequential_op_index_' before
appending to the log, rather than after. This doesn't weaken any
durability guarantees, since the WAL append itself is
completely asynchronous -- appending to the WAL just enqueues the
entries onto an in-memory queue and then asynchronously wakes up the
background WAL appender thread.

Without the fix, the new unit test quickly crashed with a CHECK
failure showing the expected Status::Incomplete() message. With the
fix it runs reliably.

Change-Id: Ie062adcc7fcaa48f09bbe382267d8f755353a443
---
M src/kudu/consensus/log_cache-test.cc
M src/kudu/consensus/log_cache.cc
2 files changed, 48 insertions(+), 7 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/63/8563/1
--
To view, visit http://gerrit.cloudera.org:8080/8563
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ie062adcc7fcaa48f09bbe382267d8f755353a443
Gerrit-Change-Number: 8563
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon <t...@apache.org>
Gerrit-Reviewer: Mike Percy <mpe...@apache.org>
