Neha Narkhede created KAFKA-1155:
------------------------------------
Summary: Kafka server can miss zookeeper watches during long
zkclient callbacks
Key: KAFKA-1155
URL: https://issues.apache.org/jira/browse/KAFKA-1155
Project: Kafka
Issue Type: Bug
Components: controller
Affects Versions: 0.8, 0.8.1
Reporter: Neha Narkhede
Assignee: Neha Narkhede
Priority: Critical
When a ZooKeeper watch fires, zkclient invokes the blocking user callback and
re-registers the watch only after the callback returns. This leaves a possibly
large window of time during which Kafka has no watch registered on the desired
ZooKeeper paths and can therefore miss important state changes (on the controller).
In any case, it is worth noting that even though ZooKeeper has a
read-and-set-watch API, there is always a window of time between the watch
firing, the callback running, and the read-and-set-watch API call. Because of
the zkclient wrapper, it is difficult to handle this properly in the Kafka code
unless we use the ZooKeeper client directly. One way of getting around this
issue is to use timestamps on the paths: when a watch fires, check whether the
timestamp in zk differs from the one held by the callback handler.
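
The timestamp check suggested above could look roughly like the sketch below.
It is only an illustration and does not use ZooKeeper or zkclient: ToyZnode,
Handler, stamp, and demo are hypothetical names, the in-memory node merely
imitates one-shot watches, and an integer counter stands in for the timestamp
so that the example is deterministic. The callback ordering (run the blocking
callback first, re-register afterwards) follows the behavior described in this
issue.

```python
class ToyZnode:
    """In-memory stand-in for one ZooKeeper path with one-shot watches."""

    def __init__(self, data):
        self.data = data
        self.stamp = 0        # bumped on every write; stands in for a timestamp
        self._watchers = []   # one-shot: cleared each time the watch fires

    def read_and_watch(self, watcher):
        """Return (data, stamp) and register a one-shot watch."""
        self._watchers.append(watcher)
        return self.data, self.stamp

    def write(self, data):
        self.data = data
        self.stamp += 1
        # Watches are consumed before they are delivered, so a write made
        # while a callback is still running notifies nobody.
        fired, self._watchers = self._watchers, []
        for watcher in fired:
            watcher(self)


class Handler:
    """Callback handler that remembers the stamp of the state it processed."""

    def __init__(self, node, slow_work=None):
        self.slow_work = slow_work or (lambda: None)
        self.processed = []   # every data value the handler acted on
        self.missed = 0       # times the stamp check caught a skipped write
        _, self.seen_stamp = node.read_and_watch(self.on_change)

    def on_change(self, node):
        # Blocking callback runs first; no watch is registered meanwhile.
        self.seen_stamp = node.stamp
        self.processed.append(node.data)
        self.slow_work()
        # Re-register, then compare stamps to detect writes made in the gap.
        data, current = node.read_and_watch(self.on_change)
        if current != self.seen_stamp:
            self.missed += 1
            self.processed.append(data)
            self.seen_stamp = current


def demo():
    node = ToyZnode("a")
    pending = ["c"]

    def slow_work():
        # Simulate another client writing while the callback is blocked.
        if pending:
            node.write(pending.pop())

    handler = Handler(node, slow_work)
    node.write("b")           # fires the watch; "c" is written mid-callback
    return handler
```

In demo(), the write of "c" lands while the callback for "b" is still running,
so no watch fires for it; the stamp comparison after re-registration is what
lets the handler notice and process "c". A real handler would probably need to
repeat the check, since further writes can arrive while the missed state is
being processed.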