[
https://issues.apache.org/jira/browse/KAFKA-6432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Manikumar resolved KAFKA-6432.
------------------------------
Resolution: Fixed
Fix Version/s: 2.1.0
> Lookup indices may cause unnecessary page fault
> -----------------------------------------------
>
> Key: KAFKA-6432
> URL: https://issues.apache.org/jira/browse/KAFKA-6432
> Project: Kafka
> Issue Type: Improvement
> Components: core, log
> Reporter: Ying Zheng
> Assignee: Ying Zheng
> Priority: Major
> Fix For: 2.1.0
>
> Attachments: Binary Search - Diagram 1.png, Binary Search - Diagram
> 2.png
>
>
> For each topic-partition, Kafka broker maintains two indices: one for message
> offset, one for message timestamp. By default, a new index entry is appended
> to each index for every 4KB messages. The lookup of the indices is a simple
> binary search. The indices are mmaped files, and cached by Linux page cache.
> Both consumer fetch and follower fetch have to do an offset lookup, before
> accessing the actual message data. The simple binary search algorithm used
> for looking up the index is not cache friendly, and may cause page faults
> even on high QPS topic-partitions.
> For example (diagram 1), when looking up an index entry in page 12, the
> binary search algorithm has to read page 0, 6, 9 and 11. After new messages
> are appended to the topic-partition, the index grows to 13 pages. Now, if the
> follower fetch request looking up the 1st index entry of page 13, the binary
> search algorithm will go to page 0, 7, 10 and 12. Among those pages, page 7
> and 10 have not been used for a long time, and may already be swapped to hard
> disk.
> Actually, in a normal Kafka broker, all the follower fetch requests and most
> consumer fetch requests should only look up the last few entries of the
> index. We can make the index lookup more cache friendly, by searching in the
> last one or two pages of the index first. (Diagram 2)
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)