[ 
https://issues.apache.org/jira/browse/HDDS-5122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-5122:
--------------------------------------
    Description: 
During SCM reinitialization, a Ratis server is spun up to check whether a 
Ratis group already exists, and is then closed without ever being started. 
In Ratis, the segmented Raft log worker threads are started during init() 
itself, but raftServer.close() stops them only if the server has transitioned 
to the RUNNING state. Because that transition never happens here, the worker 
threads are leaked. In the thread dump below, Thread 266783 is one such worker, 
still parked in SegmentedRaftLogWorker.run().

 
{code:java}
Attaching to process ID 266710, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 25.232-b09
Deadlock Detection:

No deadlocks found.

Thread 266745: (state = BLOCKED)

Locked ownable synchronizers:
    - None

Thread 266783: (state = BLOCKED)
 - sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information may be imprecise)
 - java.util.concurrent.locks.LockSupport.parkNanos(java.lang.Object, long) @bci=20, line=215 (Compiled frame)
 - java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(long) @bci=78, line=2078 (Compiled frame)
 - org.apache.ratis.util.DataBlockingQueue.poll(org.apache.ratis.util.TimeDuration) @bci=134, line=137 (Compiled frame)
 - org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker.run() @bci=16, line=292 (Interpreted frame)
 - org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker$$Lambda$161.run() @bci=4 (Interpreted frame)
 - java.lang.Thread.run() @bci=11, line=748 (Interpreted frame)

Locked ownable synchronizers:
    - None

Thread 266761: (state = BLOCKED)

Locked ownable synchronizers:
    - None

Thread 266760: (state = BLOCKED)

Locked ownable synchronizers:
    - None

Thread 266759: (state = BLOCKED)
 - java.lang.Object.wait(long) @bci=0 (Interpreted frame)
 - java.lang.ref.ReferenceQueue.remove(long) @bci=59, line=144 (Compiled frame)
 - java.lang.ref.ReferenceQueue.remove() @bci=2, line=165 (Compiled frame)
 - java.lang.ref.Finalizer$FinalizerThread.run() @bci=36, line=216 (Interpreted frame)

Locked ownable synchronizers:
    - None

Thread 266758: (state = BLOCKED)
 - java.lang.Object.wait(long) @bci=0 (Interpreted frame)
 - java.lang.Object.wait() @bci=2, line=502 (Compiled frame)
 - java.lang.ref.Reference.tryHandlePending(boolean) @bci=54, line=191 (Compiled frame)
 - java.lang.ref.Reference$ReferenceHandler.run() @bci=1, line=153 (Interpreted frame)

Locked ownable synchronizers:
    - None
{code}
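
The lifecycle problem described above can be illustrated with a small, 
self-contained sketch. This is not Ratis code: SketchServer, its State enum, 
and the alwaysStopWorker flag are hypothetical names invented for this 
example, and stopping the worker unconditionally is only one possible remedy, 
not necessarily the fix applied in Ratis or Ozone. The sketch starts a worker 
thread in init() and shows that a close() which stops the worker only after a 
RUNNING transition leaves the thread alive when start() is never called.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the lifecycle described above; no Ratis classes are used.
final class LifecycleLeakSketch {
  enum State { NEW, INITIALIZED, RUNNING, CLOSED }

  static final class SketchServer {
    private final boolean alwaysStopWorker; // assumption: toggles the illustrative remedy
    private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
    private volatile boolean workerRunning;
    private volatile State state = State.NEW;
    private Thread worker;

    SketchServer(boolean alwaysStopWorker) {
      this.alwaysStopWorker = alwaysStopWorker;
    }

    // The worker thread is started here, before the server is ever RUNNING.
    void init() {
      workerRunning = true;
      worker = new Thread(this::runWorker, "sketch-log-worker");
      worker.setDaemon(true); // demo only: lets the JVM exit even when the thread leaks
      worker.start();
      state = State.INITIALIZED;
    }

    void start() {
      state = State.RUNNING;
    }

    void close() {
      boolean wasRunning = state == State.RUNNING;
      state = State.CLOSED;
      // Leaky variant: the worker is stopped only if the server reached RUNNING.
      boolean stopWorker = worker != null && (wasRunning || alwaysStopWorker);
      if (!stopWorker) {
        return;
      }
      workerRunning = false;
      worker.interrupt(); // wake the worker if it is blocked in poll()
      try {
        worker.join(2000);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }

    boolean isWorkerAlive() {
      return worker != null && worker.isAlive();
    }

    // Polls a queue with a timeout, loosely like the parked frames in the dump.
    private void runWorker() {
      while (workerRunning) {
        try {
          Runnable task = queue.poll(100, TimeUnit.MILLISECONDS);
          if (task != null) {
            task.run();
          }
        } catch (InterruptedException e) {
          // Interrupted by close(); the loop condition decides whether to exit.
        }
      }
    }
  }

  // Mirrors the reinitialization check: init(), then close(), never start().
  static boolean workerAliveAfterClose(boolean alwaysStopWorker) {
    SketchServer server = new SketchServer(alwaysStopWorker);
    server.init();
    server.close();
    return server.isWorkerAlive();
  }

  public static void main(String[] args) {
    System.out.println("conditional close, worker alive: " + workerAliveAfterClose(false));
    System.out.println("unconditional close, worker alive: " + workerAliveAfterClose(true));
  }
}
```

With the conditional close the worker should still be reported alive after 
close(), which matches the parked SegmentedRaftLogWorker thread in the dump; 
with the unconditional variant it should have exited.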

  was: During SCM reinitialization, a Ratis server is spun up to check whether 
a Ratis group already exists, and is then closed without ever being started. 
In Ratis, the segmented Raft log worker threads are started during init() 
itself, but raftServer.close() stops them only if the server has transitioned 
to the RUNNING state, which causes the issue.


> SCM Reinitialization can end up leaking Ratis Segmented RaftLogWorker threads
> -----------------------------------------------------------------------------
>
>                 Key: HDDS-5122
>                 URL: https://issues.apache.org/jira/browse/HDDS-5122
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: SCM HA
>            Reporter: István Fajth
>            Assignee: Shashikant Banerjee
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
