[ https://issues.apache.org/jira/browse/KAFKA-7697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rajini Sivaram resolved KAFKA-7697.
-----------------------------------
Resolution: Fixed
Reviewer: Jason Gustafson
> Possible deadlock in kafka.cluster.Partition
> --------------------------------------------
>
> Key: KAFKA-7697
> URL: https://issues.apache.org/jira/browse/KAFKA-7697
> Project: Kafka
> Issue Type: Bug
> Affects Versions: 2.1.0
> Reporter: Gian Merlino
> Assignee: Rajini Sivaram
> Priority: Blocker
> Fix For: 2.2.0, 2.1.1
>
> Attachments: threaddump.txt
>
>
> After upgrading a fairly busy broker from 0.10.2.0 to 2.1.0, it locked up
> within a few minutes (by "locked up" I mean that all request handler threads
> were busy, and other brokers reported that they couldn't communicate with
> it). I restarted it a few times and it did the same thing each time. After
> downgrading to 0.10.2.0, the broker was stable.
>
> I attached a thread dump from the last attempt on 2.1.0 that shows lots of
> kafka-request-handler- threads trying to acquire the leaderIsrUpdateLock lock
> in kafka.cluster.Partition.
>
> It jumps out that there are two threads that already have some read lock
> (can't tell which one) and are trying to acquire a second one (on two
> different read locks: 0x0000000708184b88 and 0x000000070821f188):
> kafka-request-handler-1 and kafka-request-handler-4. Both are handling a
> produce request, and in the process of doing so, are calling
> Partition.fetchOffsetSnapshot while trying to complete a DelayedFetch. At the
> same time, both of those locks have writers from other threads waiting on
> them (kafka-request-handler-2 and kafka-scheduler-6). Neither of those locks
> appears to have a writer holding it (if only because no threads in the dump
> are deep enough in inWriteLock to indicate that).
>
> ReentrantReadWriteLock in nonfair mode prioritizes waiting writers over
> readers. Is it possible that kafka-request-handler-1 and
> kafka-request-handler-4 are each trying to read-lock the partition that is
> currently locked by the other one, and they're both parked waiting for
> kafka-request-handler-2 and kafka-scheduler-6 to get write locks, which they
> never will, because the former two threads own read locks and aren't giving
> them up?
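
The blocking behaviour the description relies on can be reproduced in isolation. The following is a minimal sketch, not Kafka code: the class name QueuedWriterDemo and its method are hypothetical, and it uses a single default (nonfair) java.util.concurrent.locks.ReentrantReadWriteLock rather than the two partition locks from the thread dump. One thread holds the read lock, a writer queues behind it, and a second reader then attempts a timed read-lock acquisition.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical demo class; it does not reproduce kafka.cluster.Partition.
class QueuedWriterDemo {

    // Returns true if a second reader thread obtained the read lock while
    // the calling thread held a read lock and a writer was queued behind it.
    static boolean secondReaderAcquiresWhileWriterWaits() throws InterruptedException {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock(); // nonfair by default

        lock.readLock().lock(); // the calling thread plays the first reader
        Thread writer = new Thread(() -> {
            lock.writeLock().lock();   // parks until every read hold is released
            lock.writeLock().unlock();
        }, "writer");
        writer.start();
        try {
            // Wait until the writer is queued on the lock and parked.
            while (!lock.hasQueuedThread(writer)
                    || writer.getState() != Thread.State.WAITING) {
                Thread.sleep(1);
            }

            AtomicBoolean acquired = new AtomicBoolean(false);
            Thread secondReader = new Thread(() -> {
                try {
                    // Timed attempt, so the demo terminates instead of hanging.
                    if (lock.readLock().tryLock(200, TimeUnit.MILLISECONDS)) {
                        acquired.set(true);
                        lock.readLock().unlock();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, "second-reader");
            secondReader.start();
            secondReader.join();
            return acquired.get();
        } finally {
            lock.readLock().unlock(); // lets the queued writer proceed
            writer.join();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("second reader acquired: "
                + secondReaderAcquiresWhileWriterWaits());
    }
}
```

On a standard JDK the second reader is expected to time out and the method to return false, even though no thread holds the write lock at any point during the attempt. With two such locks and two threads that each hold one read lock while requesting the other, an untimed lock() call in place of the timed attempt would wait indefinitely, which is consistent with the cycle suspected in the description above.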