[ 
https://issues.apache.org/jira/browse/KAFKA-9137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lucas Bradstreet resolved KAFKA-9137.
-------------------------------------
    Resolution: Fixed

Closed by [https://github.com/apache/kafka/pull/7640]

> Maintenance of FetchSession cache causing FETCH_SESSION_ID_NOT_FOUND in live 
> sessions
> -------------------------------------------------------------------------------------
>
>                 Key: KAFKA-9137
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9137
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>            Reporter: Lucas Bradstreet
>            Priority: Major
>
> We have recently seen cases where brokers end up in a bad state where fetch 
> session evictions occur at a high rate (> 16 per second) after a roll. This 
> increase in eviction rate included the following pattern in our logs:
>  
> {noformat}
> broker 6: October 31st 2019, 17:52:45.496 Created a new incremental 
> FetchContext for session id 2046264334, epoch 9790: added (), updated (), 
> removed ()
> broker 6: October 31st 2019, 17:52:45.496 Created a new incremental 
> FetchContext for session id 2046264334, epoch 9791: added (), updated (), 
> removed () broker 6: October 31st 2019, 17:52:45.500 Created a new 
> incremental FetchContext for session id 2046264334, epoch 9792: added (), 
> updated (lkc-7nv6o_tenant_soak_topic_144p-67), removed () 
> broker 6: October 31st 2019, 17:52:45.501 Created a new incremental 
> FetchContext for session id 2046264334, epoch 9793: added (), updated 
> (lkc-7nv6o_tenant_soak_topic_144p-59, lkc-7nv6o_tenant_soak_topic_144p-123, 
> lkc-7nv6o_tenant_soak_topic_144p-11, lkc-7nv6o_tenant_soak_topic_144p-3, 
> lkc-7nv6o_tenant_soak_topic_144p-67, lkc-7nv6o_tenant_soak_topic_144p-115), 
> removed () 
> broker 6: October 31st 2019, 17:52:45.501 Evicting stale FetchSession 
> 2046264334. 
> broker 6: October 31st 2019, 17:52:45.502 Session error for 2046264334: no 
> such session ID found. 
> broker 4: October 31st 2019, 17:52:45.813 [ReplicaFetcher replicaId=4, 
> leaderId=6, fetcherId=0] Node 6 was unable to process the fetch request with 
> (sessionId=2046264334, epoch=9793): FETCH_SESSION_ID_NOT_FOUND.  
> {noformat}
> This pattern appears to be problematic for two reasons. Firstly, the replica 
> fetcher for broker 4 was clearly able to send multiple incremental fetch 
> requests to broker 6, and receive replies, and did so right up to the point 
> where broker 6 evicted its fetch session within milliseconds of multiple 
> fetch requests. The second problem is that replica fetchers are considered 
> privileged for the fetch session cache, and should not be evicted by consumer 
> fetch sessions. This cluster only has 12 brokers and 1000 fetch session cache 
> slots (the default for max.incremental.fetch.session.cache.slots), and it 
> thus very unlikely that this session should have been evicted by another 
> replica fetcher session.
> This cluster also appears to be causing cycles of fetch session evictions 
> where the cluster never stabilizes into a state where fetch sessions are not 
> evicted. The above logs are the best example I could find of a case where a 
> session clearly should not have been evicted.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to