[
https://issues.apache.org/jira/browse/KAFKA-4453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Lucas Wang reassigned KAFKA-4453:
---------------------------------
Assignee: Lucas Wang (was: Onur Karaman)
> add request prioritization
> --------------------------
>
> Key: KAFKA-4453
> URL: https://issues.apache.org/jira/browse/KAFKA-4453
> Project: Kafka
> Issue Type: Bug
> Reporter: Onur Karaman
> Assignee: Lucas Wang
> Priority: Major
>
> Today all requests (client requests, broker requests, controller requests) to
> a broker are put into the same queue. They all have the same priority. So a
> backlog of requests ahead of the controller request will delay the processing
> of controller requests. This causes requests in front of the controller
> request to get processed based on stale state.
> Side effects may include giving clients stale metadata [1], rejecting
> ProduceRequests and FetchRequests, and data loss (for some unofficial [2]
> definition of data loss in terms of messages beyond the high watermark) [3].
> We'd like to minimize the number of requests processed based on stale state.
> With request prioritization, controller requests get processed before regular
> queued-up requests, so those requests are processed with up-to-date state.
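> A minimal sketch of the idea, assuming a hypothetical PrioritizedRequestChannel
> class (not the broker's actual RequestChannel): controller requests get their
> own queue that request handler threads drain first, so a backlog of regular
> requests can no longer delay them.
> {code:java}
> import java.util.concurrent.BlockingQueue;
> import java.util.concurrent.LinkedBlockingQueue;
> import java.util.concurrent.TimeUnit;
>
> // Hypothetical sketch, not Kafka's RequestChannel: controller requests
> // always jump ahead of regular requests.
> public class PrioritizedRequestChannel {
>     private final BlockingQueue<Object> controllerQueue = new LinkedBlockingQueue<>();
>     private final BlockingQueue<Object> dataQueue = new LinkedBlockingQueue<>();
>
>     public void sendControllerRequest(Object request) throws InterruptedException {
>         controllerQueue.put(request);
>     }
>
>     public void sendDataRequest(Object request) throws InterruptedException {
>         dataQueue.put(request);
>     }
>
>     // Request handler threads call this; a queued controller request is
>     // always returned before anything in the regular queue.
>     public Object receive(long timeoutMs) throws InterruptedException {
>         Object controllerRequest = controllerQueue.poll();
>         if (controllerRequest != null) {
>             return controllerRequest;
>         }
>         // Use a short timeout so a controller request that arrives while we
>         // wait is picked up on the next call rather than sitting behind us.
>         return dataQueue.poll(timeoutMs, TimeUnit.MILLISECONDS);
>     }
> }
> {code}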
> [1] Say a client's MetadataRequest is sitting in front of a controller's
> UpdateMetadataRequest on a given broker's request queue. Suppose the
> MetadataRequest is for a topic whose partitions have recently undergone
> leadership changes and that these leadership changes are being broadcast
> from the controller in the later UpdateMetadataRequest. Today the broker
> processes the MetadataRequest before processing the UpdateMetadataRequest,
> meaning the metadata returned to the client will be stale. The client will
> waste a round trip sending requests to the stale partition leader, get a
> NOT_LEADER_FOR_PARTITION error, and will have to start all over and query the
> topic metadata again.
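> From the client's point of view, the wasted round trip in [1] surfaces as a
> NOT_LEADER_FOR_PARTITION error followed by a metadata refresh and retry. A
> rough sketch with the Java producer (broker address, topic, and class name
> are made up):
> {code:java}
> import java.util.Properties;
> import org.apache.kafka.clients.producer.KafkaProducer;
> import org.apache.kafka.clients.producer.ProducerRecord;
> import org.apache.kafka.common.errors.NotLeaderForPartitionException;
>
> public class StaleMetadataExample {
>     public static void main(String[] args) {
>         Properties props = new Properties();
>         props.put("bootstrap.servers", "broker1:9092"); // hypothetical broker
>         props.put("key.serializer",
>                 "org.apache.kafka.common.serialization.StringSerializer");
>         props.put("value.serializer",
>                 "org.apache.kafka.common.serialization.StringSerializer");
>         props.put("retries", 5); // producer refreshes metadata and retries internally
>
>         try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
>             producer.send(new ProducerRecord<>("my-topic", "key", "value"),
>                     (metadata, exception) -> {
>                         // If the broker answered the MetadataRequest from stale state,
>                         // the send goes to the old leader and fails with
>                         // NOT_LEADER_FOR_PARTITION. The producer refreshes metadata
>                         // and retries; the exception only reaches this callback once
>                         // the configured retries are exhausted.
>                         if (exception instanceof NotLeaderForPartitionException) {
>                             System.err.println("leader moved, retries exhausted: " + exception);
>                         }
>                     });
>             producer.flush();
>         }
>     }
> }
> {code}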
> [2] The official definition of data loss in Kafka is when we lose a
> "committed" message. A message is considered "committed" when all in-sync
> replicas for that partition have applied it to their log.
> [3] Say a number of ProduceRequests are sitting in front of a controller's
> LeaderAndIsrRequest on a given broker's request queue. Suppose the
> ProduceRequests are for partitions whose leadership has recently shifted out
> from the current broker to another broker in the replica set. Today the
> broker processes the ProduceRequests before the LeaderAndIsrRequest, meaning
> the ProduceRequests are getting processed on the former partition leader. As
> part of becoming a follower for a partition, the broker truncates the log to
> the high watermark. With weaker ack settings such as acks=1, the leader may
> successfully write to its own log, respond to the user with a success,
> process the LeaderAndIsrRequest making the broker a follower of the
> partition, and truncate the log to a point before the user's produced
> messages. So users have a false sense that their produce attempt succeeded
> while in reality their messages got erased. While technically part of what
> they signed up for with acks=1, it can still come as a surprise.
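> To make the acks=1 surprise in [3] concrete, here is a rough sketch with the
> Java producer (broker address, topic, and class name are made up). The success
> callback only means the leader appended the record to its own log; the record
> can still be truncated away if that broker becomes a follower right afterwards.
> {code:java}
> import java.util.Properties;
> import org.apache.kafka.clients.producer.KafkaProducer;
> import org.apache.kafka.clients.producer.ProducerRecord;
>
> public class AcksOneSurprise {
>     public static void main(String[] args) {
>         Properties props = new Properties();
>         props.put("bootstrap.servers", "broker1:9092"); // hypothetical broker
>         props.put("key.serializer",
>                 "org.apache.kafka.common.serialization.StringSerializer");
>         props.put("value.serializer",
>                 "org.apache.kafka.common.serialization.StringSerializer");
>         props.put("acks", "1"); // leader-only acknowledgement
>
>         try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
>             producer.send(new ProducerRecord<>("my-topic", "value"),
>                     (metadata, exception) -> {
>                         if (exception == null) {
>                             // "Success" here only means the leader wrote to its own log.
>                             // If a queued LeaderAndIsrRequest then turns this broker
>                             // into a follower, it truncates to the high watermark and
>                             // the acknowledged record may be erased.
>                             System.out.println("acked at offset " + metadata.offset());
>                         }
>                     });
>             producer.flush();
>         }
>         // acks=all together with min.insync.replicas > 1 trades latency for
>         // the stronger "committed" guarantee described in [2].
>     }
> }
> {code}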