DaisyModi opened a new pull request, #4182: URL: https://github.com/apache/gobblin/pull/4182
## Summary GOBBLIN-2257 removed `synchronized` from `onAddSpec` and introduced a multi-threaded executor for parallel flow compilation, improving `flowConfigsV2` GET API P99 latency. However, `FlowCatalog.specSyncObjects` is a plain `HashMap` that is now accessed concurrently from multiple `updateOrAddSpecHelper` calls without synchronization. ### The bug In `updateOrAddSpecHelper`, concurrent threads call: - `specSyncObjects.put(...)` (adding sync objects) - `specSyncObjects.remove(...)` (cleaning up after persist) `HashMap` is not thread-safe for concurrent modifications. This causes: - **Lost entries** — a flow's syncObject disappears from the map - **`LaunchDagProc - error`** — `getSyncObject()` returns null, breaking synchronization with `NonScheduledJobRunner`, causing DAG initialization failures - **`DagNode or its job status not found for Reevaluate`** — orphaned DAGs from failed initialization ### The fix - **`HashMap` → `ConcurrentHashMap`** for `specSyncObjects` — drop-in replacement, no API change, no impact on P99 latency improvement - **Downgrade "SpecStore is missing in FlowCatalog" log from ERROR to WARN** — with concurrent writes to SpecStore, `getSpecURIs()` can list a URI before `addSpec()` fully commits the data. This is a transient condition already handled by exponential backoff retries; ERROR level creates false alarms in monitoring ## Test plan - [ ] Existing `FlowCatalogTest` passes - [ ] Verify in production: `LaunchDagProc - error` and `DagNode not found` errors should stop occurring - [ ] Verify in production: "SpecStore is missing in FlowCatalog" now logs at WARN instead of ERROR -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
