uds5501 commented on PR #18466:
URL: https://github.com/apache/druid/pull/18466#issuecomment-3267606887

   There's a flaw in the original approach. When the autoscaler event is 
triggered, the supervisor captures the offsets as of time `t0` and attempts 
to set them through `updateConfig`. However, as part of `updateConfig`, the 
runners trigger a checkpoint, which advances the offsets to their values at 
`t1` (offset at t1 >= offset at t0), so the offsets sent at `updateConfig` 
time are already stale.
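
   To make the race concrete, here is a minimal sketch, not Druid code. The 
class `StaleOffsetsDemo`, the helper `isStale`, and the partition/offset 
values are all hypothetical; the only thing taken from the comment is that 
the checkpoint moves offsets from their `t0` values to `t1` values with 
t1 >= t0.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of the t0/t1 race; none of these names exist in Druid.
class StaleOffsetsDemo {
    // Returns true when at least one partition in `current` has moved strictly
    // past the offset recorded in `snapshot`, i.e. the snapshot is stale.
    static boolean isStale(Map<Integer, Long> snapshot, Map<Integer, Long> current) {
        boolean advanced = false;
        for (Map.Entry<Integer, Long> e : snapshot.entrySet()) {
            long now = current.getOrDefault(e.getKey(), e.getValue());
            if (now < e.getValue()) {
                // Offsets are assumed to be monotonic (offset at t1 >= offset at t0).
                throw new IllegalStateException("offset moved backwards for partition " + e.getKey());
            }
            advanced |= now > e.getValue();
        }
        return advanced;
    }

    public static void main(String[] args) {
        // Offsets captured when the autoscaler event fires (t0).
        Map<Integer, Long> t0 = new HashMap<>();
        t0.put(0, 100L);
        t0.put(1, 250L);

        // Offsets after the checkpoint forced during updateConfig (t1).
        Map<Integer, Long> t1 = new HashMap<>(t0);
        t1.put(0, 130L);

        System.out.println("t0 snapshot stale: " + isStale(t0, t1));
    }
}
```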
   
   Instead, the chronology has to be:
   - [Supervisor] Sets an internal `isUpdatingConfig` flag and hits the 
`updateConfig` API.
   - [TaskRunner] Sets `isConfigChangeOngoing=true`; the task runner is 
paused and a checkpoint is triggered.
   - [Supervisor] Hits the `setEndOffset` API and checks whether an 
`isUpdatingConfig` change is in progress. If so, the new sequence is created 
as part of this `setEndOffset` call and `isConfigChangeOngoing` is toggled 
off in the runner; if not, the call continues normally.
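
   The proposed ordering could be sketched roughly as follows. This is an 
illustration only, not the PR's implementation: the `Runner` and 
`Supervisor` classes, the `repartition` method, the method signatures, and 
the synchronous call flow are assumptions made for brevity. Only the flag 
names `isUpdatingConfig` and `isConfigChangeOngoing` and the 
`updateConfig`/`setEndOffset` steps come from the comment, and the sketch 
assumes the new sequence is created when a config change is in progress.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the three-step chronology; not actual Druid classes.
class ConfigChangeHandshakeDemo {
    // Stand-in for a task runner.
    static class Runner {
        boolean isConfigChangeOngoing = false;
        boolean paused = false;
        final Map<Integer, Long> currentOffsets = new HashMap<>();
        // Start offsets of each sequence created during a config change.
        final List<Map<Integer, Long>> sequences = new ArrayList<>();

        // Step 2: mark the change as ongoing, pause, and checkpoint.
        // Returns the checkpointed (t1) offsets.
        Map<Integer, Long> updateConfig() {
            isConfigChangeOngoing = true;
            paused = true;
            return new HashMap<>(currentOffsets);
        }

        // Step 3: with a config change in progress, create the new sequence
        // from the checkpointed offsets and clear the runner-side flag.
        // The normal (non-config-change) path is elided in this sketch.
        void setEndOffset(Map<Integer, Long> endOffsets, boolean supervisorIsUpdatingConfig) {
            if (supervisorIsUpdatingConfig && isConfigChangeOngoing) {
                sequences.add(new HashMap<>(endOffsets));
                isConfigChangeOngoing = false;
            }
            paused = false;
        }
    }

    // Stand-in for the supervisor.
    static class Supervisor {
        boolean isUpdatingConfig = false;

        void repartition(Runner runner) {
            isUpdatingConfig = true;                                 // step 1
            Map<Integer, Long> checkpointed = runner.updateConfig(); // step 2
            runner.setEndOffset(checkpointed, isUpdatingConfig);     // step 3
            isUpdatingConfig = false;
        }
    }

    public static void main(String[] args) {
        Runner runner = new Runner();
        runner.currentOffsets.put(0, 130L);
        new Supervisor().repartition(runner);
        System.out.println("sequences created: " + runner.sequences.size());
        System.out.println("config change ongoing: " + runner.isConfigChangeOngoing);
    }
}
```

   Note that the offsets passed to `setEndOffset` here are the ones returned 
by the checkpoint, not the ones captured when the autoscaler fired, which is 
the point of reordering the steps.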
   
   Concerns:
   - This breaks the current design: the `updateConfig` API no longer 
really updates config; it just pauses the runner and forces a checkpoint. 
Should we rename it to `pauseAndCheckpoint`?
   - This API will be specific to autoscaler-driven repartitioning of 
perpetually running tasks and won't be extensible (at least not in any way 
I could see) to general config updates.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

