[https://issues.apache.org/jira/browse/FLINK-29344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18036675#comment-18036675]
RocMarshal commented on FLINK-29344:
------------------------------------
> *FYI:* A summary and the outcomes of the latest discussion follow.
*> Song:* Fine-grained and reactive represent two different approaches to
determining job resources, and we haven’t seen a clear demand for supporting
both at the same time.
*> Pan:* In fact, even when the desired resource request equals the maximum
available resources, fine-grained resource management still makes sense — in
such cases, it’s roughly equivalent to the capabilities of the Default
Scheduler.
*> Song:* From what I’ve seen, the main use case for the reactive mode is in
*autoscaling* scenarios, typically used together with the *Kubernetes
Operator*. Kubernetes monitors the metrics and scales the TaskManagers
horizontally; Flink then configures the job parallelism based on the number of
available TaskManagers. In such cases, fine-grained resource management isn’t
applicable.
Fine-grained management makes sense when the job itself has *explicit
resource requirements*. In most business cases, it’s unacceptable to “just
run with lower parallelism” when resources are insufficient, since that would
cause unacceptable latency. If multiple jobs all run under resource shortages,
it would be better to *stop lower-priority jobs* and let high-priority ones run
without delay.
If we only consider the situation where the maximum resources can always be
satisfied, then reactive mode is unnecessary.
*> Pan:* I see. So, can we say that at the moment there isn’t a strong enough
*demand or motivation* to push this feature forward — not just a question of
solving the technical design issues that have already been discussed?
*> Song:* Yes, that’s right. It’s not something that can be easily addressed
without significant effort. Given the current situation and the lack of strong
demand, we don’t plan to actively invest in this feature for now.
*> Pan:* Got it. Let me make a brief summary:
> We’ll wait and see whether there’s a more appropriate time to push this
> feature forward. At present, it seems we lack three key conditions or
> triggers:
> - *A clear or strong demand/motivation* (e.g., few user requests for this
> feature). *Priority: High*
> - *Solutions to the issues mentioned in the JIRA.* *Priority: Medium*
> - *Available reviewers* to help move the feature forward. *Priority: Medium*
*Thanks CC [~xtsong]*
> Make Adaptive Scheduler supports Fine-Grained Resource Management
> -----------------------------------------------------------------
>
> Key: FLINK-29344
> URL: https://issues.apache.org/jira/browse/FLINK-29344
> Project: Flink
> Issue Type: Improvement
> Components: Runtime / Coordination
> Reporter: Xintong Song
> Assignee: Chesnay Schepler
> Priority: Major
>
> This ticket is a reflection of the following Slack discussion:
> {quote}
> Donatien Schmitz
> Adaptive Scheduler thread:
> Hey all, it seems like the Adaptive Scheduler does not support fine grain
> resource management. I have fixed it and would like to know if you would be
> interested in a PR or if it was purposely designed to not support Fine grain
> resource management.
> rmetzger
> @Donatien Schmitz: I’m concerned that we don’t have a lot of review capacity
> right now, and I’m not aware of any users asking for it.
> rmetzger
> I couldn’t find a ticket for adding this feature, did you find one?
> If not, can you add one? This will at least make the feature show up on
> Google, and people might comment on it if they need it.
> rmetzger
> If the change is fairly self-contained, is unlikely to cause instabilities,
> then we can also consider merging it
> rmetzger
> @Xintong Song what do you think?
> Xintong Song
> @rmetzger, thanks for involving me.
> @Donatien Schmitz, thanks for bringing this up, and for volunteering to fix
> this. Could you explain a bit more about how you plan to fix this?
> Fine-grained resource management is not yet supported by the adaptive
> scheduler, because there’s an issue we haven’t found a good solution for:
> if only part of the resource requirements can be fulfilled, how do we decide
> which requirements should be fulfilled? E.g., say the job declares it needs
> 10 slots with resource 1 for map tasks, and another 10 slots with resource 2
> for reduce tasks. If there’s not enough resources (say only 10 slots can be
> allocated for simplicity), how many slots for map / reduce tasks should be
> allocated? Obviously, <10 map, 0 reduce> & <0 map, 10 reduce> would not work.
> For this example, a proportional scale-down (<5 map, 5 reduce>) seems
> reasonable. However, a proportional scale-down is not always easy (e.g., when
> the requirement is <100 map, 1 reduce>), and the issue grows more complicated
> if you take many stages and differences in slot sizes into consideration.
> I’d like to see the adaptive scheduler also support fine-grained resource
> management. If there’s a good solution to the above issue, I’d love to help
> review the effort.
> Donatien Schmitz
> Dear Robert and Xintong, thanks for reading and reacting to my message! I'll
> reply tomorrow (GMT+1 time) if that’s quite alright with you. Best, Donatien
> Schmitz
> Donatien Schmitz
> @Xintong Song
> * We are working on fine-grained scheduling for resource optimisation of
> long-running or periodic jobs. One of the features we are experimenting with
> is a "rescheduling plan": a mapping of operators to Resource Profiles that
> can be dynamically applied to a running job. This rescheduling would be
> triggered by policies based on certain metrics (focusing on RocksDB in our
> case).
> * While developing this new feature, we decided to implement it on the
> Adaptive Scheduler instead of the Base Scheduler because the state machine
> already present made the logic more natural: transitions from states
> Executing -> Cancelling -> Rescheduling -> Waiting for Resources ->
> Creating -> Executing
> * In our case we are working on a POC and thus focusing on a really simple
> job with a parallelism of 1. The issue you brought up is indeed something we
> have faced while raising the parallelism of the job.
> * If you create a Jira Ticket we can discuss it over there if you'd like!
> Donatien Schmitz
> @rmetzger The changes do not break the default resource management but do
> not fix the issue brought up by Xintong.
> {quote}
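The proportional scale-down problem Xintong describes in the quoted thread can be made concrete with a small sketch. The snippet below is a hypothetical illustration, not Flink's actual slot-allocation code: it applies a naive largest-remainder proportional split. It gives the reasonable <5 map, 5 reduce> answer for the symmetric case, but starves the reduce stage entirely for <100 map, 1 reduce>, which is exactly why a plain proportional rule is not a complete solution.

```python
def proportional_allocation(demands, available):
    """Split `available` slots across per-stage slot demands proportionally,
    distributing leftover slots by largest fractional remainder."""
    total = sum(demands)
    if total <= available:
        return list(demands)  # everything fits; no scale-down needed
    shares = [d * available / total for d in demands]
    alloc = [int(s) for s in shares]  # floor of each ideal share
    leftover = available - sum(alloc)
    # hand the remaining slots to the largest fractional remainders
    by_remainder = sorted(range(len(demands)),
                          key=lambda i: shares[i] - alloc[i], reverse=True)
    for i in by_remainder[:leftover]:
        alloc[i] += 1
    return alloc

# <10 map, 10 reduce> with 10 slots: a sensible proportional split
print(proportional_allocation([10, 10], 10))   # [5, 5]
# <100 map, 1 reduce> with 10 slots: reduce gets no slot at all,
# so the pipeline cannot run despite the "proportional" split
print(proportional_allocation([100, 1], 10))   # [10, 0]
```

A real solution would need additional constraints, e.g. at least one slot per stage whenever any slots are granted at all, and that quickly reintroduces the multi-stage and heterogeneous-slot-size complications mentioned above.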
--
This message was sent by Atlassian Jira
(v8.20.10#820010)