uds5501 commented on PR #18466: URL: https://github.com/apache/druid/pull/18466#issuecomment-3267606887
There's a flaw in the original approach. When the autoscaler event is triggered, the offsets are those at time `t0`, and the supervisor attempts to set those `t0` offsets. However, as part of `updateConfig`, the runners trigger a checkpoint, which moves the offsets to `t1` (offset `t1` >= offset `t0`), so the offsets sent at `updateConfig` time are no longer relevant.

Instead, the chronology has to be:

- [Supervisor] Sets an internal `isUpdatingConfig` flag and hits the `updateConfig` API.
- [TaskRunner] Sets `isConfigChangeOngoing=true`; the task runner is paused and a checkpoint is triggered.
- [Supervisor] Hits the `setEndOffset` API and checks whether an `isUpdatingConfig` was ongoing. If not, it continues normally; if so, it attempts to create the new sequence as part of this `setEndOffset` call and toggles off `isConfigChangeOngoing` in the runner.

Concerns:

- This breaks the current design: the `updateConfig` API no longer really updates config; it just performs a pause and forces a checkpoint. So do we rename it to `pauseAndCheckpoint`?
- This API will be very specific to the autoscaler repartitioning of a perpetually running task and won't be extensible (at least not in any way I can see) to general config updates.
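To make the proposed ordering concrete, here is a minimal, self-contained sketch of the three steps. It is not Druid code: the `TaskRunner` and `Supervisor` classes, their method signatures, and the way a "sequence" is modelled as a start offset are all assumptions made for illustration. Only the flag names `isUpdatingConfig` and `isConfigChangeOngoing` and the call names `updateConfig` and `setEndOffset` come from the comment above.

```java
import java.util.ArrayList;
import java.util.List;

public class ConfigUpdateSketch {

    /** Hypothetical stand-in for the runner of a perpetually running task. */
    static class TaskRunner {
        private long currentOffset;
        private boolean paused = false;
        private boolean isConfigChangeOngoing = false;
        // Each entry is the start offset of one sequence (a simplification).
        private final List<Long> sequenceStarts = new ArrayList<>();

        TaskRunner(long startOffset) {
            this.currentOffset = startOffset;
            this.sequenceStarts.add(startOffset);
        }

        /** Simulates reading n more records; does nothing while paused. */
        void ingest(long n) {
            if (!paused) {
                currentOffset += n;
            }
        }

        /** Step 2: pause, mark the config change, and return the checkpointed offset (t1). */
        long updateConfig() {
            isConfigChangeOngoing = true;
            paused = true;
            return currentOffset;
        }

        /** Step 3: on a config update, start a new sequence at endOffset and clear the flag. */
        void setEndOffset(long endOffset, boolean supervisorIsUpdatingConfig) {
            if (supervisorIsUpdatingConfig && isConfigChangeOngoing) {
                sequenceStarts.add(endOffset);
                isConfigChangeOngoing = false;
            }
            // Otherwise: continue normally (nothing extra in this sketch).
            paused = false;
        }

        long currentOffset() { return currentOffset; }
        int sequenceCount() { return sequenceStarts.size(); }
        boolean isConfigChangeOngoing() { return isConfigChangeOngoing; }
    }

    /** Hypothetical stand-in for the supervisor driving the autoscaler change. */
    static class Supervisor {
        private boolean isUpdatingConfig = false;

        long rescale(TaskRunner runner) {
            isUpdatingConfig = true;                    // step 1: set the internal flag
            long t1 = runner.updateConfig();            // steps 1-2: pause + checkpoint
            runner.setEndOffset(t1, isUpdatingConfig);  // step 3: use t1, not t0
            isUpdatingConfig = false;
            return t1;
        }
    }

    public static void main(String[] args) {
        TaskRunner runner = new TaskRunner(100L);
        long t0 = runner.currentOffset();  // offsets when the autoscaler event fires
        runner.ingest(25L);                // the task keeps reading before it is paused
        long t1 = new Supervisor().rescale(runner);
        // prints "t0=100, t1=125, sequences=2"
        System.out.println("t0=" + t0 + ", t1=" + t1 + ", sequences=" + runner.sequenceCount());
    }
}
```

The point of the sketch is that the end offset is taken from the checkpoint returned by the pause (`t1`), so the stale `t0` value captured at trigger time is never sent to the runner.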
