[ 
https://issues.apache.org/jira/browse/NIFI-15889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinyu Wang updated NIFI-15889:
------------------------------
    Description: 
On NiFi startup, VersionedFlowSynchronizer synchronizes all child Process 
Groups, and for each PG it iterates the versioned Parameter Contexts and calls 
StandardParameterContextManager.getParameterContextNameMapping(). That method 
rebuilds the entire name→PC HashMap from a stream collect on every call, giving 
the synchronization an effective complexity of:

  O(N_PG × N_PC^2)

  For flows with thousands of Parameter Contexts the rebuild dominates startup 
time and turns a minute-scale boot into an hour-scale one.

*Affected code*

  
nifi-framework-bundle/nifi-framework/nifi-framework-components/src/main/java/org/apache/nifi/parameter/StandardParameterContextManager.java

  @Override
  public Map<String, ParameterContext> getParameterContextNameMapping()

{       return parameterContexts.values().stream()               
.collect(Collectors.toMap(ParameterContext::getName, Function.identity()));   }

  Hot caller: 
StandardVersionedComponentSynchronizer#createParameterContextWithoutReferences, 
which is invoked from addProcessGroup for every child PG. The same name mapping 
is rebuilt N_PG times even though the underlying parameterContexts map barely 
changes between calls.

*Proposed fix*

  Maintain a name index incrementally and cache an unmodifiable view; reads 
become O(1), and the unique-name check at insert becomes O(1) too.

  private final Map<String, ParameterContext> parameterContexts = new 
HashMap<>();
  private final Map<String, ParameterContext> nameIndex = new HashMap<>();
  private volatile Map<String, ParameterContext> cachedNameMapping = 
Collections.emptyMap();

  @Override
  public synchronized void addParameterContext(final ParameterContext pc) {
      Objects.requireNonNull(pc);

      final ParameterContext existingById = 
parameterContexts.get(pc.getIdentifier());
      if (existingById != null && !(existingById instanceof 
ReferenceOnlyParameterContext))

{           throw new IllegalStateException("Cannot add Parameter Context 
because another Parameter Context already exists with the same ID");       }

      final ParameterContext existingByName = nameIndex.get(pc.getName());
      if (existingByName != null && existingByName != existingById)

{           throw new IllegalStateException("Cannot add Parameter Context 
because another Parameter Context already exists with the name '" + pc + "'");  
     }

      // When replacing a ReferenceOnlyParameterContext placeholder, its name
      // differs from the real PC name, so its stale entry must be removed
      // from nameIndex first.
      if (existingById instanceof ReferenceOnlyParameterContext)

{           nameIndex.remove(existingById.getName(), existingById);       }

      parameterContexts.put(pc.getIdentifier(), pc);
      nameIndex.put(pc.getName(), pc);
      cachedNameMapping = Collections.unmodifiableMap(new HashMap<>(nameIndex));
  }

  @Override
  public synchronized ParameterContext removeParameterContext(final String id) {
      Objects.requireNonNull(id);
      final ParameterContext removed = parameterContexts.remove(id);
      if (removed != null)

{           nameIndex.remove(removed.getName(), removed);           
cachedNameMapping = Collections.unmodifiableMap(new HashMap<>(nameIndex));      
 }

      return removed;
  }

  @Override
  public Map<String, ParameterContext> getParameterContextNameMapping()

{       return cachedNameMapping;   }

  Single file, ~25 lines. Complexity changes from O(N_PG × N_PC^2) to O(N_PG × 
N_PC).

  Subtle case: ReferenceOnlyParameterContext placeholder

  ReferenceOnlyParameterContext is used to break the chicken-and-egg cycle when 
Parameter Contexts inherit from each other. Its name is "Reference-Only 
Parameter Context [<id>]", which is different from the real PC's name. When 
addParameterContext is called to replace a placeholder with the real PC, the 
placeholder's old name must be explicitly removed from nameIndex; otherwise 
VersionedFlowSynchronizer.removeMissingParameterContexts walks the cached name 
map, sees "Reference-Only Parameter Context [...]" not in the proposed name 
list, looks up its ID, and calls removeParameterContext — deleting the real PC.

*Backwards compatibility*

  - Public method signature is unchanged.
  - Returned map is now an unmodifiable view — callers that mutated it would 
already have been buggy.
  - All mutators remain synchronized.
  - Existing unique-name and unique-ID invariants are preserved.

  was:
On NiFi startup, VersionedFlowSynchronizer synchronizes all child Process 
Groups, and for each PG it iterates the versioned Parameter Contexts and calls 
StandardParameterContextManager.getParameterContextNameMapping(). That method 
rebuilds the entire name→PC HashMap from a stream collect on every call, giving 
the synchronization an effective complexity of:

  O(N_PG × N_PC^2)

  For flows with thousands of Parameter Contexts the rebuild dominates startup 
time and turns a minute-scale boot into an hour-scale one.

  Affected code

  
nifi-framework-bundle/nifi-framework/nifi-framework-components/src/main/java/org/apache/nifi/parameter/StandardParameterContextManager.java

  @Override
  public Map<String, ParameterContext> getParameterContextNameMapping() {
      return parameterContexts.values().stream()
              .collect(Collectors.toMap(ParameterContext::getName, 
Function.identity()));
  }

  Hot caller: 
StandardVersionedComponentSynchronizer#createParameterContextWithoutReferences, 
which is invoked from addProcessGroup for every child PG. The same name mapping 
is rebuilt N_PG times even though the underlying parameterContexts map barely 
changes between calls.

  Proposed fix

  Maintain a name index incrementally and cache an unmodifiable view; reads 
become O(1), and the unique-name check at insert becomes O(1) too.

  private final Map<String, ParameterContext> parameterContexts = new 
HashMap<>();
  private final Map<String, ParameterContext> nameIndex = new HashMap<>();
  private volatile Map<String, ParameterContext> cachedNameMapping = 
Collections.emptyMap();

  @Override
  public synchronized void addParameterContext(final ParameterContext pc) {
      Objects.requireNonNull(pc);

      final ParameterContext existingById = 
parameterContexts.get(pc.getIdentifier());
      if (existingById != null && !(existingById instanceof 
ReferenceOnlyParameterContext)) {
          throw new IllegalStateException("Cannot add Parameter Context because 
another Parameter Context already exists with the same ID");
      }

      final ParameterContext existingByName = nameIndex.get(pc.getName());
      if (existingByName != null && existingByName != existingById) {
          throw new IllegalStateException("Cannot add Parameter Context because 
another Parameter Context already exists with the name '" + pc + "'");
      }

      // When replacing a ReferenceOnlyParameterContext placeholder, its name
      // differs from the real PC name, so its stale entry must be removed
      // from nameIndex first.
      if (existingById instanceof ReferenceOnlyParameterContext) {
          nameIndex.remove(existingById.getName(), existingById);
      }

      parameterContexts.put(pc.getIdentifier(), pc);
      nameIndex.put(pc.getName(), pc);
      cachedNameMapping = Collections.unmodifiableMap(new HashMap<>(nameIndex));
  }

  @Override
  public synchronized ParameterContext removeParameterContext(final String id) {
      Objects.requireNonNull(id);
      final ParameterContext removed = parameterContexts.remove(id);
      if (removed != null) {
          nameIndex.remove(removed.getName(), removed);
          cachedNameMapping = Collections.unmodifiableMap(new 
HashMap<>(nameIndex));
      }
      return removed;
  }

  @Override
  public Map<String, ParameterContext> getParameterContextNameMapping() {
      return cachedNameMapping;
  }

  Single file, ~25 lines. Complexity changes from O(N_PG × N_PC^2) to O(N_PG × 
N_PC).

  Subtle case: ReferenceOnlyParameterContext placeholder

  ReferenceOnlyParameterContext is used to break the chicken-and-egg cycle when 
Parameter Contexts inherit from each other. Its name is "Reference-Only 
Parameter Context [<id>]", which is different from the real PC's name. When 
addParameterContext is called to replace a placeholder with the real PC, the 
placeholder's old name must be explicitly removed from nameIndex; otherwise 
VersionedFlowSynchronizer.removeMissingParameterContexts walks the cached name 
map, sees "Reference-Only Parameter Context [...]" not in the proposed name 
list, looks up its ID, and calls removeParameterContext — deleting the real PC.

  Backwards compatibility

  - Public method signature is unchanged.
  - Returned map is now an unmodifiable view — callers that mutated it would 
already have been buggy.
  - All mutators remain synchronized.
  - Existing unique-name and unique-ID invariants are preserved.


> Cache Parameter Context name mapping to fix O(N_PG × N_PC^2) startup 
> synchronization
> ------------------------------------------------------------------------------------
>
>                 Key: NIFI-15889
>                 URL: https://issues.apache.org/jira/browse/NIFI-15889
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Core Framework
>    Affects Versions: 2.5.0
>            Reporter: Xinyu Wang
>            Priority: Major
>
> On NiFi startup, VersionedFlowSynchronizer synchronizes all child Process 
> Groups, and for each PG it iterates the versioned Parameter Contexts and 
> calls StandardParameterContextManager.getParameterContextNameMapping(). That 
> method rebuilds the entire name→PC HashMap from a stream collect on every 
> call, giving the synchronization an effective complexity of:
>   O(N_PG × N_PC^2)
>   For flows with thousands of Parameter Contexts the rebuild dominates 
> startup time and turns a minute-scale boot into an hour-scale one.
> *Affected code*
>   
> nifi-framework-bundle/nifi-framework/nifi-framework-components/src/main/java/org/apache/nifi/parameter/StandardParameterContextManager.java
>   @Override
>   public Map<String, ParameterContext> getParameterContextNameMapping()
> {       return parameterContexts.values().stream()               
> .collect(Collectors.toMap(ParameterContext::getName, Function.identity()));   
> }
>   Hot caller: 
> StandardVersionedComponentSynchronizer#createParameterContextWithoutReferences,
>  which is invoked from addProcessGroup for every child PG. The same name 
> mapping is rebuilt N_PG times even though the underlying parameterContexts 
> map barely changes between calls.
> *Proposed fix*
>   Maintain a name index incrementally and cache an unmodifiable view; reads 
> become O(1), and the unique-name check at insert becomes O(1) too.
>   private final Map<String, ParameterContext> parameterContexts = new 
> HashMap<>();
>   private final Map<String, ParameterContext> nameIndex = new HashMap<>();
>   private volatile Map<String, ParameterContext> cachedNameMapping = 
> Collections.emptyMap();
>   @Override
>   public synchronized void addParameterContext(final ParameterContext pc) {
>       Objects.requireNonNull(pc);
>       final ParameterContext existingById = 
> parameterContexts.get(pc.getIdentifier());
>       if (existingById != null && !(existingById instanceof 
> ReferenceOnlyParameterContext))
> {           throw new IllegalStateException("Cannot add Parameter Context 
> because another Parameter Context already exists with the same ID");       }
>       final ParameterContext existingByName = nameIndex.get(pc.getName());
>       if (existingByName != null && existingByName != existingById)
> {           throw new IllegalStateException("Cannot add Parameter Context 
> because another Parameter Context already exists with the name '" + pc + 
> "'");       }
>       // When replacing a ReferenceOnlyParameterContext placeholder, its name
>       // differs from the real PC name, so its stale entry must be removed
>       // from nameIndex first.
>       if (existingById instanceof ReferenceOnlyParameterContext)
> {           nameIndex.remove(existingById.getName(), existingById);       }
>       parameterContexts.put(pc.getIdentifier(), pc);
>       nameIndex.put(pc.getName(), pc);
>       cachedNameMapping = Collections.unmodifiableMap(new 
> HashMap<>(nameIndex));
>   }
>   @Override
>   public synchronized ParameterContext removeParameterContext(final String 
> id) {
>       Objects.requireNonNull(id);
>       final ParameterContext removed = parameterContexts.remove(id);
>       if (removed != null)
> {           nameIndex.remove(removed.getName(), removed);           
> cachedNameMapping = Collections.unmodifiableMap(new HashMap<>(nameIndex));    
>    }
>       return removed;
>   }
>   @Override
>   public Map<String, ParameterContext> getParameterContextNameMapping()
> {       return cachedNameMapping;   }
>   Single file, ~25 lines. Complexity changes from O(N_PG × N_PC^2) to O(N_PG 
> × N_PC).
>   Subtle case: ReferenceOnlyParameterContext placeholder
>   ReferenceOnlyParameterContext is used to break the chicken-and-egg cycle 
> when Parameter Contexts inherit from each other. Its name is "Reference-Only 
> Parameter Context [<id>]", which is different from the real PC's name. When 
> addParameterContext is called to replace a placeholder with the real PC, the 
> placeholder's old name must be explicitly removed from nameIndex; otherwise 
> VersionedFlowSynchronizer.removeMissingParameterContexts walks the cached 
> name map, sees "Reference-Only Parameter Context [...]" not in the proposed 
> name list, looks up its ID, and calls removeParameterContext — deleting the 
> real PC.
> *Backwards compatibility*
>   - Public method signature is unchanged.
>   - Returned map is now an unmodifiable view — callers that mutated it would 
> already have been buggy.
>   - All mutators remain synchronized.
>   - Existing unique-name and unique-ID invariants are preserved.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to