[
https://issues.apache.org/jira/browse/NIFI-15889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Xinyu Wang updated NIFI-15889:
------------------------------
Description:
On NiFi startup, VersionedFlowSynchronizer synchronizes all child Process
Groups, and for each PG it iterates the versioned Parameter Contexts and calls
StandardParameterContextManager.getParameterContextNameMapping(). That method
rebuilds the entire name→PC HashMap from a stream collect on every call, giving
the synchronization an effective complexity of:
O(N_PG × N_PC^2)
For flows with thousands of Parameter Contexts the rebuild dominates startup
time and turns a minute-scale boot into an hour-scale one.
*Affected code*
nifi-framework-bundle/nifi-framework/nifi-framework-components/src/main/java/org/apache/nifi/parameter/StandardParameterContextManager.java
@Override
public Map<String, ParameterContext> getParameterContextNameMapping()
{ return parameterContexts.values().stream()
.collect(Collectors.toMap(ParameterContext::getName, Function.identity())); }
Hot caller:
StandardVersionedComponentSynchronizer#createParameterContextWithoutReferences,
which is invoked from addProcessGroup for every child PG. The same name mapping
is rebuilt N_PG times even though the underlying parameterContexts map barely
changes between calls.
*Proposed fix*
Maintain a name index incrementally and cache an unmodifiable view; reads
become O(1), and the unique-name check at insert becomes O(1) too.
private final Map<String, ParameterContext> parameterContexts = new
HashMap<>();
private final Map<String, ParameterContext> nameIndex = new HashMap<>();
private volatile Map<String, ParameterContext> cachedNameMapping =
Collections.emptyMap();
@Override
public synchronized void addParameterContext(final ParameterContext pc) {
Objects.requireNonNull(pc);
final ParameterContext existingById =
parameterContexts.get(pc.getIdentifier());
if (existingById != null && !(existingById instanceof
ReferenceOnlyParameterContext))
{ throw new IllegalStateException("Cannot add Parameter Context
because another Parameter Context already exists with the same ID"); }
final ParameterContext existingByName = nameIndex.get(pc.getName());
if (existingByName != null && existingByName != existingById)
{ throw new IllegalStateException("Cannot add Parameter Context
because another Parameter Context already exists with the name '" + pc + "'");
}
// When replacing a ReferenceOnlyParameterContext placeholder, its name
// differs from the real PC name, so its stale entry must be removed
// from nameIndex first.
if (existingById instanceof ReferenceOnlyParameterContext)
{ nameIndex.remove(existingById.getName(), existingById); }
parameterContexts.put(pc.getIdentifier(), pc);
nameIndex.put(pc.getName(), pc);
cachedNameMapping = Collections.unmodifiableMap(new HashMap<>(nameIndex));
}
@Override
public synchronized ParameterContext removeParameterContext(final String id) {
Objects.requireNonNull(id);
final ParameterContext removed = parameterContexts.remove(id);
if (removed != null)
{ nameIndex.remove(removed.getName(), removed);
cachedNameMapping = Collections.unmodifiableMap(new HashMap<>(nameIndex));
}
return removed;
}
@Override
public Map<String, ParameterContext> getParameterContextNameMapping()
{ return cachedNameMapping; }
Single file, ~25 lines. Complexity changes from O(N_PG × N_PC^2) to O(N_PG ×
N_PC).
Subtle case: ReferenceOnlyParameterContext placeholder
ReferenceOnlyParameterContext is used to break the chicken-and-egg cycle when
Parameter Contexts inherit from each other. Its name is "Reference-Only
Parameter Context [<id>]", which is different from the real PC's name. When
addParameterContext is called to replace a placeholder with the real PC, the
placeholder's old name must be explicitly removed from nameIndex; otherwise
VersionedFlowSynchronizer.removeMissingParameterContexts walks the cached name
map, sees "Reference-Only Parameter Context [...]" not in the proposed name
list, looks up its ID, and calls removeParameterContext — deleting the real PC.
*Backwards compatibility*
- Public method signature is unchanged.
- Returned map is now an unmodifiable view — callers that mutated it would
already have been buggy.
- All mutators remain synchronized.
- Existing unique-name and unique-ID invariants are preserved.
was:
On NiFi startup, VersionedFlowSynchronizer synchronizes all child Process
Groups, and for each PG it iterates the versioned Parameter Contexts and calls
StandardParameterContextManager.getParameterContextNameMapping(). That method
rebuilds the entire name→PC HashMap from a stream collect on every call, giving
the synchronization an effective complexity of:
O(N_PG × N_PC^2)
For flows with thousands of Parameter Contexts the rebuild dominates startup
time and turns a minute-scale boot into an hour-scale one.
Affected code
nifi-framework-bundle/nifi-framework/nifi-framework-components/src/main/java/org/apache/nifi/parameter/StandardParameterContextManager.java
@Override
public Map<String, ParameterContext> getParameterContextNameMapping() {
return parameterContexts.values().stream()
.collect(Collectors.toMap(ParameterContext::getName,
Function.identity()));
}
Hot caller:
StandardVersionedComponentSynchronizer#createParameterContextWithoutReferences,
which is invoked from addProcessGroup for every child PG. The same name mapping
is rebuilt N_PG times even though the underlying parameterContexts map barely
changes between calls.
Proposed fix
Maintain a name index incrementally and cache an unmodifiable view; reads
become O(1), and the unique-name check at insert becomes O(1) too.
private final Map<String, ParameterContext> parameterContexts = new
HashMap<>();
private final Map<String, ParameterContext> nameIndex = new HashMap<>();
private volatile Map<String, ParameterContext> cachedNameMapping =
Collections.emptyMap();
@Override
public synchronized void addParameterContext(final ParameterContext pc) {
Objects.requireNonNull(pc);
final ParameterContext existingById =
parameterContexts.get(pc.getIdentifier());
if (existingById != null && !(existingById instanceof
ReferenceOnlyParameterContext)) {
throw new IllegalStateException("Cannot add Parameter Context because
another Parameter Context already exists with the same ID");
}
final ParameterContext existingByName = nameIndex.get(pc.getName());
if (existingByName != null && existingByName != existingById) {
throw new IllegalStateException("Cannot add Parameter Context because
another Parameter Context already exists with the name '" + pc + "'");
}
// When replacing a ReferenceOnlyParameterContext placeholder, its name
// differs from the real PC name, so its stale entry must be removed
// from nameIndex first.
if (existingById instanceof ReferenceOnlyParameterContext) {
nameIndex.remove(existingById.getName(), existingById);
}
parameterContexts.put(pc.getIdentifier(), pc);
nameIndex.put(pc.getName(), pc);
cachedNameMapping = Collections.unmodifiableMap(new HashMap<>(nameIndex));
}
@Override
public synchronized ParameterContext removeParameterContext(final String id) {
Objects.requireNonNull(id);
final ParameterContext removed = parameterContexts.remove(id);
if (removed != null) {
nameIndex.remove(removed.getName(), removed);
cachedNameMapping = Collections.unmodifiableMap(new
HashMap<>(nameIndex));
}
return removed;
}
@Override
public Map<String, ParameterContext> getParameterContextNameMapping() {
return cachedNameMapping;
}
Single file, ~25 lines. Complexity changes from O(N_PG × N_PC^2) to O(N_PG ×
N_PC).
Subtle case: ReferenceOnlyParameterContext placeholder
ReferenceOnlyParameterContext is used to break the chicken-and-egg cycle when
Parameter Contexts inherit from each other. Its name is "Reference-Only
Parameter Context [<id>]", which is different from the real PC's name. When
addParameterContext is called to replace a placeholder with the real PC, the
placeholder's old name must be explicitly removed from nameIndex; otherwise
VersionedFlowSynchronizer.removeMissingParameterContexts walks the cached name
map, sees "Reference-Only Parameter Context [...]" not in the proposed name
list, looks up its ID, and calls removeParameterContext — deleting the real PC.
Backwards compatibility
- Public method signature is unchanged.
- Returned map is now an unmodifiable view — callers that mutated it would
already have been buggy.
- All mutators remain synchronized.
- Existing unique-name and unique-ID invariants are preserved.
> Cache Parameter Context name mapping to fix O(N_PG × N_PC^2) startup
> synchronization
> ------------------------------------------------------------------------------------
>
> Key: NIFI-15889
> URL: https://issues.apache.org/jira/browse/NIFI-15889
> Project: Apache NiFi
> Issue Type: Improvement
> Components: Core Framework
> Affects Versions: 2.5.0
> Reporter: Xinyu Wang
> Priority: Major
>
> On NiFi startup, VersionedFlowSynchronizer synchronizes all child Process
> Groups, and for each PG it iterates the versioned Parameter Contexts and
> calls StandardParameterContextManager.getParameterContextNameMapping(). That
> method rebuilds the entire name→PC HashMap from a stream collect on every
> call, giving the synchronization an effective complexity of:
> O(N_PG × N_PC^2)
> For flows with thousands of Parameter Contexts the rebuild dominates
> startup time and turns a minute-scale boot into an hour-scale one.
> *Affected code*
>
> nifi-framework-bundle/nifi-framework/nifi-framework-components/src/main/java/org/apache/nifi/parameter/StandardParameterContextManager.java
> @Override
> public Map<String, ParameterContext> getParameterContextNameMapping()
> { return parameterContexts.values().stream()
> .collect(Collectors.toMap(ParameterContext::getName, Function.identity()));
> }
> Hot caller:
> StandardVersionedComponentSynchronizer#createParameterContextWithoutReferences,
> which is invoked from addProcessGroup for every child PG. The same name
> mapping is rebuilt N_PG times even though the underlying parameterContexts
> map barely changes between calls.
> *Proposed fix*
> Maintain a name index incrementally and cache an unmodifiable view; reads
> become O(1), and the unique-name check at insert becomes O(1) too.
> private final Map<String, ParameterContext> parameterContexts = new
> HashMap<>();
> private final Map<String, ParameterContext> nameIndex = new HashMap<>();
> private volatile Map<String, ParameterContext> cachedNameMapping =
> Collections.emptyMap();
> @Override
> public synchronized void addParameterContext(final ParameterContext pc) {
> Objects.requireNonNull(pc);
> final ParameterContext existingById =
> parameterContexts.get(pc.getIdentifier());
> if (existingById != null && !(existingById instanceof
> ReferenceOnlyParameterContext))
> { throw new IllegalStateException("Cannot add Parameter Context
> because another Parameter Context already exists with the same ID"); }
> final ParameterContext existingByName = nameIndex.get(pc.getName());
> if (existingByName != null && existingByName != existingById)
> { throw new IllegalStateException("Cannot add Parameter Context
> because another Parameter Context already exists with the name '" + pc +
> "'"); }
> // When replacing a ReferenceOnlyParameterContext placeholder, its name
> // differs from the real PC name, so its stale entry must be removed
> // from nameIndex first.
> if (existingById instanceof ReferenceOnlyParameterContext)
> { nameIndex.remove(existingById.getName(), existingById); }
> parameterContexts.put(pc.getIdentifier(), pc);
> nameIndex.put(pc.getName(), pc);
> cachedNameMapping = Collections.unmodifiableMap(new
> HashMap<>(nameIndex));
> }
> @Override
> public synchronized ParameterContext removeParameterContext(final String
> id) {
> Objects.requireNonNull(id);
> final ParameterContext removed = parameterContexts.remove(id);
> if (removed != null)
> { nameIndex.remove(removed.getName(), removed);
> cachedNameMapping = Collections.unmodifiableMap(new HashMap<>(nameIndex));
> }
> return removed;
> }
> @Override
> public Map<String, ParameterContext> getParameterContextNameMapping()
> { return cachedNameMapping; }
> Single file, ~25 lines. Complexity changes from O(N_PG × N_PC^2) to O(N_PG
> × N_PC).
> Subtle case: ReferenceOnlyParameterContext placeholder
> ReferenceOnlyParameterContext is used to break the chicken-and-egg cycle
> when Parameter Contexts inherit from each other. Its name is "Reference-Only
> Parameter Context [<id>]", which is different from the real PC's name. When
> addParameterContext is called to replace a placeholder with the real PC, the
> placeholder's old name must be explicitly removed from nameIndex; otherwise
> VersionedFlowSynchronizer.removeMissingParameterContexts walks the cached
> name map, sees "Reference-Only Parameter Context [...]" not in the proposed
> name list, looks up its ID, and calls removeParameterContext — deleting the
> real PC.
> *Backwards compatibility*
> - Public method signature is unchanged.
> - Returned map is now an unmodifiable view — callers that mutated it would
> already have been buggy.
> - All mutators remain synchronized.
> - Existing unique-name and unique-ID invariants are preserved.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)