Onur Karaman created KAFKA-4453:
-----------------------------------
Summary: add request prioritization
Key: KAFKA-4453
URL: https://issues.apache.org/jira/browse/KAFKA-4453
Project: Kafka
Issue Type: Bug
Reporter: Onur Karaman
Assignee: Onur Karaman
Today all requests to a broker (client requests, broker requests, controller
requests) are put into the same queue, and they all have the same priority. So
a backlog of requests ahead of a controller request will delay the processing
of that controller request. As a result, the requests in front of the
controller request get processed based on stale state.
Side effects may include giving clients stale metadata[1], rejecting
ProduceRequests and FetchRequests, and data loss (for some unofficial[2]
definition of data loss in terms of messages beyond the high watermark)[3].
We'd like to minimize the number of requests processed based on stale state.
With request prioritization, controller requests get processed before regular
queued-up requests, so the regular requests get processed with up-to-date
state.
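
As a rough illustration of the idea (not the proposed broker change itself), the
sketch below models the request queue as a priority queue in Python. The class
name PrioritizedRequestQueue, the two priority levels, and the set of request
types treated as controller requests are assumptions made for this example.

```python
import heapq
import itertools

# Hypothetical priority levels: a lower value is dequeued first.
CONTROLLER = 0
REGULAR = 1

# Assumption for this sketch: these request types come from the controller.
CONTROLLER_REQUESTS = {
    "LeaderAndIsrRequest",
    "UpdateMetadataRequest",
    "StopReplicaRequest",
}


class PrioritizedRequestQueue:
    """Toy request queue: controller requests first, FIFO within each class."""

    def __init__(self):
        self._heap = []
        # A monotonically increasing sequence number keeps arrival order
        # among requests that share the same priority.
        self._seq = itertools.count()

    def put(self, request_type):
        priority = CONTROLLER if request_type in CONTROLLER_REQUESTS else REGULAR
        heapq.heappush(self._heap, (priority, next(self._seq), request_type))

    def get(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


queue = PrioritizedRequestQueue()
for request in ["ProduceRequest", "FetchRequest", "MetadataRequest",
                "UpdateMetadataRequest"]:
    queue.put(request)

# The controller request arrived last but is dequeued first.
order = [queue.get() for _ in range(len(queue))]
print(order)
```

Regular requests keep their arrival order relative to each other; only the
controller request jumps ahead of the backlog.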
[1] Say a client's MetadataRequest is sitting in front of a controller's
UpdateMetadataRequest on a given broker's request queue. Suppose the
MetadataRequest is for a topic whose partitions have recently undergone
leadership changes and that these leadership changes are being broadcast from
the controller in the later UpdateMetadataRequest. Today the broker processes
the MetadataRequest before processing the UpdateMetadataRequest, meaning the
metadata returned to the client will be stale. The client will waste a
roundtrip sending requests to the stale partition leader, get a
NOT_LEADER_FOR_PARTITION error, and will have to start all over and query the
topic metadata again.
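
A toy Python model of this scenario is sketched below. The process function,
the tuple layout of a request, and the broker names are all made up for
illustration; the only point is the ordering effect.

```python
def process(requests, leader):
    """Handle requests in order; return the leader reported to each MetadataRequest."""
    answers = []
    for kind, payload in requests:
        if kind == "UpdateMetadataRequest":
            # The controller tells this broker who the new leader is.
            leader = payload
        elif kind == "MetadataRequest":
            # The broker answers from whatever state it currently has.
            answers.append(leader)
    return answers


# The client's request is queued ahead of the controller's update.
backlog = [("MetadataRequest", None), ("UpdateMetadataRequest", "broker-2")]

# Single FIFO queue: the client is told about the old leader.
fifo_answer = process(backlog, leader="broker-1")

# Controller requests first (sorted is stable, so other requests keep order).
prioritized = sorted(backlog, key=lambda r: r[0] != "UpdateMetadataRequest")
prioritized_answer = process(prioritized, leader="broker-1")

print(fifo_answer, prioritized_answer)
```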
[2] The official definition of data loss in Kafka is when we lose a
"committed" message. A message is considered "committed" when all in-sync
replicas for that partition have applied it to their log.
[3] Say a number of ProduceRequests are sitting in front of a controller's
LeaderAndIsrRequest on a given broker's request queue. Suppose the
ProduceRequests are for partitions whose leadership has recently shifted out
from the current broker to another broker in the replica set. Today the broker
processes the ProduceRequests before the LeaderAndIsrRequest, meaning the
ProduceRequests are getting processed on the former partition leader. As part
of becoming a follower for a partition, the broker truncates the log to the
high-watermark. With weaker ack settings such as acks=1, the leader may
successfully write to its own log, respond to the user with a success, process
the LeaderAndIsrRequest making the broker a follower of the partition, and
truncate the log to a point before the user's produced messages. So users have
a false sense that their produce attempt succeeded while in reality their
messages got erased. While technically part of what they signed up for with
acks=1, it can still come as a surprise.
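
The sequence above can be sketched with a small Python model. The Replica
class, its method names, and the "success" return value are hypothetical and
only mimic the behavior described here (acks=1 append on the leader, then
truncation to the high watermark on becoming a follower).

```python
class Replica:
    """Toy model of one partition replica on a broker."""

    def __init__(self, log, high_watermark):
        self.log = list(log)
        self.high_watermark = high_watermark
        self.is_leader = True

    def handle_produce(self, message):
        # With acks=1 the leader acknowledges after writing to its own log.
        if not self.is_leader:
            return "NOT_LEADER_FOR_PARTITION"
        self.log.append(message)
        return "success"

    def handle_leader_and_isr_become_follower(self):
        # Becoming a follower truncates the log to the high watermark.
        self.is_leader = False
        del self.log[self.high_watermark:]


# Single FIFO queue: the ProduceRequest is processed before the
# LeaderAndIsrRequest, so the write is acknowledged and then erased.
stale = Replica(log=["m0", "m1"], high_watermark=2)
stale_ack = stale.handle_produce("m2")
stale.handle_leader_and_isr_become_follower()

# Controller request first: the produce is rejected instead of silently lost.
fresh = Replica(log=["m0", "m1"], high_watermark=2)
fresh.handle_leader_and_isr_become_follower()
fresh_ack = fresh.handle_produce("m2")

print(stale_ack, stale.log)
print(fresh_ack, fresh.log)
```

In the first ordering the producer sees a success for a message that no longer
exists in the log; in the second it gets an error it can react to by
refreshing metadata and retrying against the new leader.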
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)