Mark Payne created NIFI-5480:
--------------------------------

Summary: Improve efficiency of how components are looked up by Identifier
Key: NIFI-5480
URL: https://issues.apache.org/jira/browse/NIFI-5480
Project: Apache NiFi
Issue Type: Improvement
Components: Core Framework
Reporter: Mark Payne
Assignee: Mark Payne
When we look up a component by ID, we do so by obtaining the Root Process Group and then calling {{findLocalConnectable(String id)}}. This method obtains a read lock, then checks its map of Processors, its map of Input Ports, its map of Output Ports, and its map of Funnels. If no match is found, it calls {{getRemoteProcessGroups()}} and iterates over each of those, looking for a Remote Input/Output Port with that ID. This call to {{getRemoteProcessGroups()}} creates and returns a new {{HashSet}}. If there is still no match, we call {{getProcessGroups()}}, which also creates a new {{HashSet}} of ProcessGroup objects, and we iterate over those recursively. As a result, every lookup of a component by ID creates two {{HashSet}}s for each Process Group on the canvas that is searched before the component is found.

Consider a flow that has a dozen Process Groups and several thousand Processors/ports/funnels. If we then click "Start" on the root group, finding each component can require creating up to 24 {{HashSet}} objects and obtaining 12 Read Locks. For 1,000 Processors, that is 24,000 {{HashSet}}s and 12,000 Read Locks. Because this is a mutable request, the lookups must be done in both the first and the second phase of the request. That brings the total to 48,000 {{HashSet}}s and 24,000 Read Locks.

In testing with 10,000 Processors, I am seeing requests take well over 30 seconds to complete, all just to find components by identifier. We can make this much more efficient.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
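The cost pattern described above can be reproduced with a small Java sketch. It also shows one alternative: a flat, flow-wide index keyed by ID. `Group`, `ComponentIndex`, `LookupSketch` and `buildNestedFlow` are hypothetical, simplified stand-ins. They are not NiFi's actual `ProcessGroup` API. Only the per-group copy of child groups is modelled; Remote Process Groups and the per-type component maps are collapsed into a single map. The flat index is one possible approach and not necessarily the change made for this issue.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical, simplified stand-in for a Process Group (not NiFi's ProcessGroup API).
class Group {
    // Counts defensive HashSet copies made during lookups, for illustration only.
    static final AtomicInteger COPIES = new AtomicInteger();

    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final Map<String, String> localComponents = new HashMap<>(); // id -> component
    private final Set<Group> children = new HashSet<>();

    void addComponent(String id, String component) {
        lock.writeLock().lock();
        try { localComponents.put(id, component); } finally { lock.writeLock().unlock(); }
    }

    void addChild(Group child) {
        lock.writeLock().lock();
        try { children.add(child); } finally { lock.writeLock().unlock(); }
    }

    // Mirrors the pattern in the issue: take a read lock, check the local map,
    // copy the child set into a new HashSet, then recurse into each child.
    String findRecursive(String id) {
        Set<Group> snapshot;
        lock.readLock().lock();
        try {
            String local = localComponents.get(id);
            if (local != null) {
                return local;
            }
            snapshot = new HashSet<>(children);
            COPIES.incrementAndGet();
        } finally {
            lock.readLock().unlock();
        }
        for (Group child : snapshot) {
            String found = child.findRecursive(id);
            if (found != null) {
                return found;
            }
        }
        return null;
    }
}

// Hypothetical flow-wide index: components are registered when added and removed
// when deleted, so a lookup is a single map get with no copying and no explicit locks.
class ComponentIndex<T> {
    private final ConcurrentHashMap<String, T> byId = new ConcurrentHashMap<>();

    void register(String id, T component) { byId.put(id, component); }
    void unregister(String id) { byId.remove(id); }
    T find(String id) { return byId.get(id); }
}

public class LookupSketch {
    // Builds root -> g1 -> ... -> g<depth>, with the component in the deepest group.
    static Group buildNestedFlow(int depth, String id, String component) {
        Group root = new Group();
        Group current = root;
        for (int i = 0; i < depth; i++) {
            Group child = new Group();
            current.addChild(child);
            current = child;
        }
        current.addComponent(id, component);
        return root;
    }

    public static void main(String[] args) {
        Group root = buildNestedFlow(12, "proc-1", "MyProcessor");
        Group.COPIES.set(0);
        System.out.println(root.findRecursive("proc-1") + " after "
                + Group.COPIES.get() + " HashSet copies");

        ComponentIndex<String> index = new ComponentIndex<>();
        index.register("proc-1", "MyProcessor");
        System.out.println(index.find("proc-1") + " via one map lookup");
    }
}
```

The trade-off is bookkeeping. The index must be updated whenever a component is added, removed, or moved at any nesting depth, including remote ports. In return, a lookup no longer walks the group tree, copies sets, or takes per-group read locks.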