[jira] [Commented] (NIFI-10052) Avoid obtaining any locks when creating/sending heartbeats

ASF subversion and git services (Jira) Wed, 17 Aug 2022 11:07:08 -0700


    [ https://issues.apache.org/jira/browse/NIFI-10052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17580923#comment-17580923 ]


ASF subversion and git services commented on NIFI-10052:
--------------------------------------------------------

Commit 2685856c629aa1bda20019981ed932ccecf9415a in nifi's branch 
refs/heads/main from Hsin-Ying Lee
[ https://gitbox.apache.org/repos/asf?p=nifi.git;h=2685856c62 ]

NIFI-10052 Avoid obtaining any locks when creating/sending heartbeats (#6298)



> Avoid obtaining any locks when creating/sending heartbeats
> ----------------------------------------------------------
>
>                 Key: NIFI-10052
>                 URL: https://issues.apache.org/jira/browse/NIFI-10052
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Core Framework
>            Reporter: Mark Payne
>            Assignee: Hsin-Ying Lee
>            Priority: Major
>              Labels: cluster, heartbeat, stability
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When NiFi creates a heartbeat to send to the coordinator, it must obtain a 
> few locks in order to generate that heartbeat. We should avoid obtaining any 
> read locks, write locks, or synchronized monitors, especially those that may 
> be held for a while. Otherwise, if another thread holds a write lock for a 
> long time, no heartbeat can be created until that lock is released, and the 
> node can be disconnected from the cluster.
> Specifically, the following locks are obtained, at minimum:
>  * FlowController readLock in the createHeartbeatMessage() method. As a result 
> of refactoring, this read lock is no longer necessary.
>  * revisionManager.getRevisionUpdateCount() is synchronized. However, the 
> synchronization here is not needed, as the method simply returns 
> AtomicLong.get(). This is perhaps the most important lock to avoid, because 
> every update to a component or group of components happens within 
> revisionManager.updateRevision, which is also synchronized. As a result, a 
> large request such as deleting thousands of components blocks heartbeats 
> from being created until it completes.
>  * FlowController.getTotalFlowFileCount: this may be the most challenging to 
> eliminate. It calls ProcessGroup.getConnections() and 
> ProcessGroup.getProcessGroups(), which means that it must obtain the read 
> lock of every Process Group in the flow twice. We may be able to change 
> StandardProcessGroup's connections and processGroups maps to 
> ConcurrentHashMaps and introduce a getQueueSize() method on ProcessGroup 
> that avoids so much locking.
>  * The createHeartbeatMessage() method also appears to reference 
> FlowController's {{connectionStatus}} member variable without any lock, 
> even though the field is not volatile and the documentation indicates that 
> it is guarded by the read/write lock. This needs to be addressed to ensure 
> that the connectionStatus is always reported accurately.
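To illustrate the revisionManager point from the issue, here is a minimal Java sketch. RevisionCounter and RevisionCounterDemo are hypothetical classes, not NiFi's RevisionManager; the only assumption carried over is that updates serialize on a monitor while the update count lives in an AtomicLong. Because the getter is not synchronized, a reader (standing in for the heartbeat thread) obtains the count while another thread is still inside updateRevision().

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for the revision bookkeeping described in NIFI-10052;
// this is not NiFi's actual RevisionManager API.
class RevisionCounter {
    private final AtomicLong revisionUpdateCount = new AtomicLong();

    // Updates still serialize on this object's monitor, so a large request
    // (e.g. deleting thousands of components) may hold it for a long time.
    public synchronized void updateRevision(Runnable update) {
        update.run();
        revisionUpdateCount.incrementAndGet();
    }

    // Not synchronized: AtomicLong.get() is already a thread-safe read, so the
    // caller never waits for an in-progress updateRevision() call.
    public long getRevisionUpdateCount() {
        return revisionUpdateCount.get();
    }
}

class RevisionCounterDemo {
    // Reads the count while another thread is parked inside updateRevision(),
    // then returns {countDuringUpdate, countAfterUpdate}.
    static long[] readDuringLongUpdate() throws InterruptedException {
        RevisionCounter counter = new RevisionCounter();
        CountDownLatch insideUpdate = new CountDownLatch(1);
        CountDownLatch finishUpdate = new CountDownLatch(1);

        Thread writer = new Thread(() -> counter.updateRevision(() -> {
            insideUpdate.countDown();      // writer now holds the monitor
            try {
                finishUpdate.await();      // simulate a long-running update
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }));
        writer.start();
        insideUpdate.await();

        // This call would block here if getRevisionUpdateCount() were synchronized.
        long during = counter.getRevisionUpdateCount();

        finishUpdate.countDown();
        writer.join();
        return new long[] {during, counter.getRevisionUpdateCount()};
    }
}
```

If `synchronized` were added to getRevisionUpdateCount(), the read in readDuringLongUpdate() would wait until the writer released the monitor, which is exactly the heartbeat stall the issue describes.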
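The ConcurrentHashMap idea proposed for getTotalFlowFileCount could look roughly like the following sketch. SimpleProcessGroup, SimpleConnection, and their methods are hypothetical, simplified stand-ins rather than NiFi's ProcessGroup and Connection interfaces, and each connection is assumed to track its queued FlowFile count in an AtomicLong.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical, simplified connection: only a lock-free FlowFile counter.
class SimpleConnection {
    private final AtomicLong flowFileCount = new AtomicLong();

    void enqueue(long count) { flowFileCount.addAndGet(count); }
    long getFlowFileCount() { return flowFileCount.get(); }
}

// Hypothetical, simplified process group (not NiFi's StandardProcessGroup).
class SimpleProcessGroup {
    // ConcurrentHashMap can be iterated while other threads add or remove
    // entries, so no group-level read lock is needed to walk its values.
    private final Map<String, SimpleConnection> connections = new ConcurrentHashMap<>();
    private final Map<String, SimpleProcessGroup> processGroups = new ConcurrentHashMap<>();

    void addConnection(String id, SimpleConnection connection) { connections.put(id, connection); }
    void addProcessGroup(String id, SimpleProcessGroup group) { processGroups.put(id, group); }

    // Sums queued FlowFiles in this group and all descendant groups
    // without acquiring any lock.
    long getQueueSize() {
        long total = 0;
        for (SimpleConnection connection : connections.values()) {
            total += connection.getFlowFileCount();
        }
        for (SimpleProcessGroup child : processGroups.values()) {
            total += child.getQueueSize();
        }
        return total;
    }
}
```

For example, a root group holding a connection with 3 queued FlowFiles and a child group holding a connection with 4 reports a total of 7. The trade-off is that ConcurrentHashMap iterators are weakly consistent: a total computed while components are being added or removed may not correspond to a single instant in time.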
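For the connectionStatus point, one common approach is to declare the field volatile so that a lock-free read still sees the most recent write. The sketch below assumes hypothetical names (ControllerState, ConnectionState, getConnectionStatusForHeartbeat) that are not NiFi's; writers keep taking the write lock, as they might if other state must change together with the status, while the heartbeat-style read takes no lock.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical connection states, for illustration only.
enum ConnectionState { CONNECTING, CONNECTED, DISCONNECTED }

// Hypothetical stand-in for the controller field discussed in NIFI-10052.
class ControllerState {
    private final ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock();

    // volatile: a write by one thread happens-before every subsequent read of
    // this field by another thread, even when the reader holds no lock.
    private volatile ConnectionState connectionStatus = ConnectionState.DISCONNECTED;

    // Writers may still use the write lock to keep related state consistent.
    void setConnectionStatus(ConnectionState status) {
        rwLock.writeLock().lock();
        try {
            connectionStatus = status;
        } finally {
            rwLock.writeLock().unlock();
        }
    }

    // Heartbeat-style read: no lock acquired, so it never waits on a writer.
    ConnectionState getConnectionStatusForHeartbeat() {
        return connectionStatus;
    }
}
```

Without volatile (and without the lock), the Java memory model gives no guarantee that the heartbeat thread observes the latest status, which is the accuracy concern raised in the issue.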



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
