[ 
https://issues.apache.org/jira/browse/KAFKA-14914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17720372#comment-17720372
 ] 

li xiangyuan commented on KAFKA-14914:
--------------------------------------

After creating this Jira ticket we encountered the problem again. We tried
to fetch the index file but failed. For now we have downgraded our AWS EC2
instances to c6 and have not hit the problem since. We are continuing to
track this.

On Fri, May 5, 2023 at 09:46, Luke Chen (Jira) <j...@apache.org> wrote:



> binarySearch in AbstractIndex may execute with infinite loop
> ------------------------------------------------------------
>
>                 Key: KAFKA-14914
>                 URL: https://issues.apache.org/jira/browse/KAFKA-14914
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 2.4.0
>            Reporter: li xiangyuan
>            Priority: Major
>         Attachments: stack.1.txt, stack.2.txt, stack.3.txt
>
>
> Recently our production servers have repeatedly stopped handling requests
> (3 times in less than 10 days so far). Please check the uploaded stack
> files. They show that one ioThread (data-plane-kafka-request-handler-11)
> holds the ReadLock of the Partition's leaderIsrUpdateLock and keeps running
> the binarySearch function. Once any thread (kafka-scheduler-2) needs the
> WriteLock of this lock, every request that reads this partition and needs
> the ReadLock blocks behind it, until all ioThreads are used up and the
> broker cannot handle any request.
> The 3 stack files were captured about 6 minutes apart. From my standpoint,
> the only explanation I can think of is that the binarySearch function never
> returns and so causes the deadlock. I suspect that the index entries in the
> offsetIndex (at least as seen through mmap) are not sorted.
>  
> Detailed information:
> This problem appeared on 2 brokers.
> Broker version: 2.4.0
> JVM: OpenJDK 11
> Hardware: AWS c7g 4xlarge, an arm64 server. We recently upgraded our
> servers from c6g 4xlarge to this type. We never hit this problem on c6g,
> and we don't know whether ARM or the AWS c7g servers have a problem.
> Other: once we restart the broker it recovers, so we suspect that the
> offset index file is not corrupted and that something may be wrong with mmap.
> Please give any suggestions for solving this problem. Thanks.
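
One way to test the "index entries are not sorted" suspicion, if a copy of
the .index file can be obtained, is to scan it for the first out-of-order
entry. The following is only a minimal sketch, not code from Kafka. It
assumes the offset index layout of 8-byte entries (a 4-byte relative offset
followed by a 4-byte file position, big-endian); the class and method names
are hypothetical, and the caller must supply the number of valid entries,
because a preallocated index on an active segment ends in zero-filled slots
that would otherwise be reported as unsorted.

```java
import java.nio.ByteBuffer;

// Hypothetical helper, not part of Kafka.
final class IndexOrderCheck {
    // Assumed entry layout: 4-byte relative offset + 4-byte file position.
    static final int ENTRY_SIZE = 8;

    /**
     * Returns the slot number of the first entry whose relative offset is
     * not strictly greater than the previous entry's, or -1 if the first
     * `entries` entries are strictly increasing.
     */
    static int firstUnsortedEntry(ByteBuffer buf, int entries) {
        for (int i = 1; i < entries; i++) {
            // Absolute reads: the buffer's position is left untouched.
            int prev = buf.getInt((i - 1) * ENTRY_SIZE);
            int curr = buf.getInt(i * ENTRY_SIZE);
            if (curr <= prev) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // In-memory stand-in for an index file: (relativeOffset, position).
        ByteBuffer buf = ByteBuffer.allocate(3 * ENTRY_SIZE);
        buf.putInt(0).putInt(0);
        buf.putInt(5).putInt(100);
        buf.putInt(3).putInt(200); // deliberately out of order
        System.out.println("first unsorted slot: " + firstUnsortedEntry(buf, 3));
    }
}
```

For a real file, the same method could be run on a read-only mapping
obtained from FileChannel.map, which would also show whether the mapped
view and a plain read of the file agree.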



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
