Rajini Sivaram created KAFKA-9632:
-------------------------------------

             Summary: Transient test failure: PartitionLockTest.testAppendReplicaFetchWithUpdateIsr
                 Key: KAFKA-9632
                 URL: https://issues.apache.org/jira/browse/KAFKA-9632
             Project: Kafka
          Issue Type: Bug
          Components: core
    Affects Versions: 2.5.0
            Reporter: Rajini Sivaram
            Assignee: Rajini Sivaram


When running this test with numRecordsPerProducer = 500, the test fails
intermittently. The test uses MockTime and runs concurrent log operations.
This can cause issues when attempting to roll a segment since Log and
MockScheduler don't work well together. MockScheduler currently runs tasks
while holding the MockScheduler lock. This can cause a deadlock if a thread
attempts to schedule a task while holding a lock which is also acquired within
a scheduled task.

The issue in this test occurs when these two operations happen concurrently:

1) LogManager.cleanupLogs is a scheduled task that acquires the Log lock. When
run with MockScheduler, the thread holds the MockScheduler lock and then
attempts to acquire the Log lock.

2) Partition.appendLogsToLeader holds the Log lock and attempts to acquire the
MockScheduler lock in order to schedule a roll().

Since the locking order is reversed in 1) and 2), this causes a deadlock.
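
The lock-order inversion described in 1) and 2) can be reproduced in
isolation. The following is a minimal sketch, not Kafka code: the class name
and both locks are hypothetical stand-ins for the MockScheduler and Log locks,
and tryLock with a timeout is used instead of a blocking lock so that the
sketch reports the conflict rather than hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration only; these locks merely model the two Kafka locks.
class LockOrderInversionSketch {

    // Takes `first`, waits until the other thread also holds its first lock,
    // then tries to take `second`. Returns true if `second` was acquired.
    static boolean attempt(ReentrantLock first, ReentrantLock second,
                           CountDownLatch bothHoldFirst,
                           CountDownLatch bothAttempted, long timeoutMs) {
        first.lock();
        try {
            bothHoldFirst.countDown();
            bothHoldFirst.await();
            boolean got = second.tryLock(timeoutMs, TimeUnit.MILLISECONDS);
            if (got) {
                second.unlock();
            }
            // Keep holding `first` until both threads have finished trying.
            bothAttempted.countDown();
            bothAttempted.await();
            return got;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            first.unlock();
        }
    }

    // Index 0: "cleanup" thread (scheduler lock, then log lock).
    // Index 1: "append" thread (log lock, then scheduler lock).
    static boolean[] run(long timeoutMs) {
        ReentrantLock schedulerLock = new ReentrantLock();
        ReentrantLock logLock = new ReentrantLock();
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        CountDownLatch bothAttempted = new CountDownLatch(2);
        boolean[] acquiredSecond = new boolean[2];

        Thread cleanup = new Thread(() -> acquiredSecond[0] =
            attempt(schedulerLock, logLock, bothHoldFirst, bothAttempted, timeoutMs));
        Thread append = new Thread(() -> acquiredSecond[1] =
            attempt(logLock, schedulerLock, bothHoldFirst, bothAttempted, timeoutMs));
        cleanup.start();
        append.start();
        try {
            cleanup.join();
            append.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return acquiredSecond;
    }

    public static void main(String[] args) {
        boolean[] got = run(100);
        System.out.println("cleanup acquired log lock: " + got[0]);
        System.out.println("append acquired scheduler lock: " + got[1]);
    }
}
```

Neither thread obtains its second lock in this sketch. With blocking lock()
calls in place of tryLock, both threads would wait on each other indefinitely.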

The test itself can be easily fixed by avoiding roll() in the test, but it
would be good to fix MockScheduler so that it can be used in this case.
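
One possible direction for such a fix is to hold the scheduler lock only while
the task queue is inspected and to run each task after the lock has been
released. The following is a rough sketch under that assumption. It is not the
actual MockScheduler, and the class name, the schedule and advance methods, and
the internal clock are all hypothetical.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Hypothetical sketch; Kafka's real MockScheduler has a different API.
class UnlockedRunSchedulerSketch {

    private static final class Task {
        final long dueMs;
        final long seq;
        final Runnable body;

        Task(long dueMs, long seq, Runnable body) {
            this.dueMs = dueMs;
            this.seq = seq;
            this.body = body;
        }
    }

    // Ordered by due time, then by insertion order.
    private final PriorityQueue<Task> tasks = new PriorityQueue<>(
        Comparator.comparingLong((Task t) -> t.dueMs).thenComparingLong(t -> t.seq));
    private long nowMs = 0;
    private long nextSeq = 0;

    // Only touches the queue, so it is safe to call while holding other locks.
    synchronized void schedule(Runnable body, long delayMs) {
        tasks.add(new Task(nowMs + delayMs, nextSeq++, body));
    }

    // Advances the mock clock and runs every task that has become due.
    void advance(long ms) {
        synchronized (this) {
            nowMs += ms;
        }
        Task task;
        while ((task = pollDue()) != null) {
            // The scheduler lock is NOT held here, so the task may take
            // other locks or call schedule() without a lock-order conflict.
            task.body.run();
        }
    }

    // Removes and returns the next due task, or null if none is due.
    private synchronized Task pollDue() {
        Task head = tasks.peek();
        if (head != null && head.dueMs <= nowMs) {
            return tasks.poll();
        }
        return null;
    }
}
```

With this structure, a task such as the cleanup in 1) would acquire the Log
lock without the scheduler lock being held, so the reversed ordering in 2) no
longer forms a cycle.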

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
