RocMarshal commented on code in PR #25647:
URL: https://github.com/apache/flink/pull/25647#discussion_r1852202449
##########
flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/SlotSharingExecutionSlotAllocatorFactory.java:
##########
@@ -40,13 +41,16 @@ public SlotSharingExecutionSlotAllocatorFactory(
PhysicalSlotProvider slotProvider,
boolean slotWillBeOccupiedIndefinitely,
PhysicalSlotRequestBulkChecker bulkChecker,
- Duration allocationTimeout) {
+ Duration allocationTimeout,
+ TaskManagerOptions.TaskManagerLoadBalanceMode
taskManagerLoadBalanceMode) {
this(
slotProvider,
slotWillBeOccupiedIndefinitely,
bulkChecker,
allocationTimeout,
- new LocalInputPreferredSlotSharingStrategy.Factory());
+ taskManagerLoadBalanceMode ==
TaskManagerOptions.TaskManagerLoadBalanceMode.TASKS
+ ? new
TaskBalancedPreferredSlotSharingStrategy.Factory()
Review Comment:
In fact, the phenomenon of uneven task scheduling results is mainly caused
by two stages:
1. tasks->slot
2. slots->taskmanager
Because prior to the initial feature freeze, we anticipated that the entire
FLIP would be at risk. Our original intention in initiating this PR was to
release this change as early as possible to address the imbalance issue in the
task ->slot dimension. But now there is no time risk with this feature, so we
plan to release both dimensions of functionality together.
BTW, the optimizations and changes mentioned above are highly likely not to
be fixed in the current PR as it may be abandoned.
But we will place the corresponding optimization in the appropriate PR
position in the future. If you are willing to help with the review, I would
greatly appreciate it! :)
##########
docs/layouts/shortcodes/generated/all_taskmanager_section.html:
##########
@@ -90,7 +90,7 @@
<td><h5>taskmanager.load-balance.mode</h5></td>
<td style="word-wrap: break-word;">NONE</td>
<td><p>Enum</p></td>
- <td>Mode for the load-balance allocation strategy across all
available <code class="highlighter-rouge">TaskManagers</code>.<ul><li>The <code
class="highlighter-rouge">SLOTS</code> mode tries to spread out the slots
evenly across all available <code
class="highlighter-rouge">TaskManagers</code>.</li><li>The <code
class="highlighter-rouge">NONE</code> mode is the default mode without any
specified strategy.</li></ul><br /><br />Possible
values:<ul><li>"NONE"</li><li>"SLOTS"</li></ul></td>
+ <td>Mode for the load-balance allocation strategy across all
available <code class="highlighter-rouge">TaskManagers</code>.<ul><li>The <code
class="highlighter-rouge">SLOTS</code> mode tries to spread out the slots
evenly across all available <code
class="highlighter-rouge">TaskManagers</code>.</li><li>The <code
class="highlighter-rouge">TASKS</code> mode tries to schedule evenly all tasks
based on its' number across all available <code
class="highlighter-rouge">TaskManagers</code>. Note: Currently, enabling this
parameter can only achieve the balancing effect of the slot level dimension of
<code class="highlighter-rouge">DefaultScheduler</code>.</li><li>The <code
class="highlighter-rouge">NONE</code> mode is the default mode without any
specified strategy.</li></ul><br /><br />Possible
values:<ul><li>"NONE"</li><li>"SLOTS"</li><li>"TASKS"</li></ul></td>
Review Comment:
Good idea~, we do need a more detailed and precise description, as well as
traceable design documents.
How about making the following changes based on your suggestion?
1. Optimize the configuration description section in the code, including
adding FLIP addresses
2. The explanation of how to use and the differences between it and other
configurations, as well as the creation of charts, can be placed in
https://issues.apache.org/jira/browse/FLINK-33392 Completed in the middle
##########
flink-core/src/main/java/org/apache/flink/configuration/TaskManagerOptions.java:
##########
@@ -707,6 +707,12 @@ public class TaskManagerOptions {
"The %s mode tries to
spread out the slots evenly across all available %s.",
code(TaskManagerLoadBalanceMode.SLOTS.name()),
code("TaskManagers")),
+ text(
+ "The %s mode tries to
schedule evenly all tasks based on its' number across all available %s. "
Review Comment:
The meaning expressed here is that the scheduler will try its best to ensure
that the scheduling results are concentrated and the number of tasks on each
Taskmanager is similar.
I will add a more detailed description
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]