Re: [PR] AMBARI-26614: Config and config-group changes are slow to propagate to agents on large clusters [ambari]

via GitHub Tue, 09 Jun 2026 07:59:50 -0700


eubnara commented on code in PR #4135:
URL: https://github.com/apache/ambari/pull/4135#discussion_r3381666463



##########
ambari-server/src/main/java/org/apache/ambari/server/agent/stomp/AgentConfigsHolder.java:
##########
@@ -84,9 +99,50 @@ public void updateData(Long clusterId, List<Long> hostIds) 
throws AmbariExceptio
       }
     }
 
-    for (Long hostId : hostIds) {
-      AgentConfigsUpdateEvent agentConfigsUpdateEvent = 
configHelper.getHostActualConfigs(hostId);
-      updateData(agentConfigsUpdateEvent);
+    final List<Long> targetHostIds = hostIds;
+    // Resolve each host's full (all-cluster) config so a host belonging to 
more than one cluster does
+    // not lose the other clusters' configs when its cached event is replaced. 
The shared cache
+    // (concurrent for the parallel path) resolves each cluster's desired 
configs at most once.
+    Map<Long, Map<String, DesiredConfig>> cachedClustersDesiredConfigs = new 
ConcurrentHashMap<>();
+    // IMPORTANT - DO NOT MOVE THIS READ INTO THE PARALLEL BLOCK BELOW.
+    // Pre-resolve the changed cluster's desired configs here, on the calling 
(request) thread, while
+    // that thread's write transaction is still open. The per-host recompute 
below runs on ForkJoinPool
+    // workers, and ConfigHelper.getHostActualConfigsExcludeCluster resolves a 
cluster's desired configs
+    // lazily via:  cachedClustersDesiredConfigs.computeIfAbsent(clusterId, id 
-> cl.getDesiredConfigs(false))
+    // A pool worker is a DIFFERENT thread with no inherited persistence 
context/transaction (the
+    // EntityManager/transaction is bound to the writing thread), so that lazy 
read sees only the last
+    // COMMITTED snapshot - never this request's still-uncommitted change - 
and pushes the PREVIOUS
+    // config value to agents. Because .get() keeps this transaction waiting 
for the workers, the worker
+    // read is always pre-commit, producing a deterministic off-by-one: each 
change lands on agents one
+    // change late, and the host stays stale until an ambari-server/agent 
restart re-resolves it.
+    // Seeding the cache here (read-your-writes on the request thread => fresh 
value) makes computeIfAbsent
+    // a no-op for this cluster, so the workers never call getDesiredConfigs 
themselves.
+    cachedClustersDesiredConfigs.put(clusterId, 
clusters.get().getCluster(clusterId).getDesiredConfigs(false));

Review Comment:
   I fixed the previous mentioned bug here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] AMBARI-26614: Config and config-group changes are slow to propagate to agents on large clusters [ambari]

Reply via email to