Re: [DISCUSS] FLIP-329: Add operator attribute to specify support for object-reuse

2023-10-07 Thread Xuannan Su
Hi Martijn,

Sorry for the late reply. I don't think it is feasible to always
enable object reuse. If I understand correctly, object reuse is
disabled by default to guarantee correctness, because we cannot assume
that a custom operator/function is safe to run with object reuse enabled.

The method proposed in the FLIP is to let the operator inform the
Flink runtime whether it is safe to reuse the emitted records. It
provides a fine-grained way of controlling the object reuse behavior
at the operator level. In the long term, instead of always enabling
object reuse, it is better to remove the object-reuse configuration
and let the runtime determine whether to enable object reuse for each
operator.
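
To make the operator-level idea concrete, here is a minimal Java sketch. It
is not the FLIP's actual API: the class ObjectReusePolicy and the method
mustDeepCopy are hypothetical, and objectReuseCompliant borrows the variable
name used elsewhere in this thread. The sketch assumes the runtime skips the
deep copy when either the global switch is on or the operator declares
itself safe.

```java
/** Hypothetical sketch: decides whether the runtime must deep-copy a record for an operator. */
final class ObjectReusePolicy {
    private ObjectReusePolicy() {}

    /**
     * @param globalObjectReuse value of the job-wide switch (pipeline.object-reuse)
     * @param objectReuseCompliant hypothetical per-operator attribute: true if the operator
     *     declares that records passed through it may be reused safely
     */
    static boolean mustDeepCopy(boolean globalObjectReuse, boolean objectReuseCompliant) {
        // Assumption: the copy is skipped if the user opted in globally
        // or the operator declared itself safe.
        return !(globalObjectReuse || objectReuseCompliant);
    }

    public static void main(String[] args) {
        System.out.println(mustDeepCopy(false, false)); // copy needed
        System.out.println(mustDeepCopy(false, true));  // operator opted in, no copy
        System.out.println(mustDeepCopy(true, false));  // global switch on, no copy
    }
}
```

Under this assumption, removing the global switch later would leave only the
per-operator attribute as input to the decision.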

I hope that addresses your question. Please let me know if you have
further comments.

Best regards,
Xuannan


On Fri, Sep 29, 2023 at 8:47 AM Martijn Visser  wrote:
>
> Hi Xuannan,
>
> I have one more question, from a strategic point of view: given that
> we're working on Flink 2.0, wouldn't we actually want to be in a
> situation where object reuse is always used and is no longer
> configurable? IIRC, the only reason why it's a configuration
> option is for backward compatibility.
>
> Best regards,
>
> Martijn
>
> On Tue, Sep 26, 2023 at 1:32 AM Xuannan Su  wrote:
> >
> > Hi all,
> >
> > We would like to revive the discussion and provide a quick update on
> > the recent work on the FLIP. We have implemented a POC[1], run cases
> > in the flink-benchmarks[2] against the POC, and verified that many of
> > the operators in the benchmark enable object reuse without code
> > changes, even while the global object-reuse switch is disabled.
> >
> > Please let me know if you have any further comments on the FLIP. If
> > there are no more comments, we will open the voting in 3 days.
> >
> > Best regards,
> > Xuannan
> >
> > [1] https://github.com/apache/flink/pull/22897
> > [2] https://github.com/apache/flink-benchmarks
> >
> >
> > On Fri, Jul 7, 2023 at 9:18 AM Dong Lin  wrote:
> > >
> > > Hi Jing,
> > >
> > > Thank you for the suggestion. Yes, we can extend it to support null if in
> > > the future we find any use-case for this flexibility.
> > >
> > > Best,
> > > Dong
> > >
> > > On Thu, Jul 6, 2023 at 7:55 PM Jing Ge  wrote:
> > >
> > > > Hi Dong,
> > > >
> > > > > one scenario I could imagine is that users enable the global
> > > > > object-reuse feature but force a deep copy for some specific
> > > > > user-defined functions because of some limitation. But that is only
> > > > > my gut feeling. And I agree, we could keep the solution simple for
> > > > > now, as the FLIP describes, and upgrade to 3VL once such real
> > > > > requirements arise.
> > > >
> > > > Best regards,
> > > > Jing
> > > >
> > > > On Thu, Jul 6, 2023 at 12:30 PM Dong Lin  wrote:
> > > >
> > > >> Hi Jing,
> > > >>
> > > >> Thank you for the detailed explanation. Please see my reply inline.
> > > >>
> > > >> On Thu, Jul 6, 2023 at 3:17 AM Jing Ge  wrote:
> > > >>
> > > >>> Hi Xuannan, Hi Dong,
> > > >>>
> > > >>> Thanks for your clarification.
> > > >>>
> > > >>> @Xuannan
> > > >>>
> > > >>> A Jira ticket has been created for the doc update:
> > > >>> https://issues.apache.org/jira/browse/FLINK-32546
> > > >>>
> > > >>> @Dong
> > > >>>
> > > > >>> I don't have a concrete example. I just thought about it from a
> > > > >>> conceptual or pattern perspective. Since we have 1. a coarse-grained
> > > > >>> global switch (CGS as abbreviation), i.e. pipeline.object-reuse, and
> > > > >>> 2. a fine-grained local switch (FGS as abbreviation), i.e. the
> > > > >>> objectReuseCompliant variable for specific operators/functions, there
> > > > >>> will be the following patterns with appropriate combinations:
> > > >>>
> > > > >>> pattern 1: coarse-grained switch only. Local object reuse will be
> > > > >>> controlled by the coarse-grained switch:
> > > > >>> 1.1 cgs == true -> local object reuse enabled
> > > > >>> 1.2 cgs == true -> local object reuse enabled
> > > > >>> 1.3 cgs == false -> local object reuse disabled, i.e. deep copy enabled
> > > > >>> 1.4 cgs == false -> local object reuse disabled, i.e. deep copy enabled
> > > >>>
> > > >>> afaiu, this is the starting point. I wrote 4 on purpose to make the
> > > >>> regression check easier. We can consider it as the combinations with
> > > >>> cgs(true/false) and fgs(true/false) while fgs is ignored.
> > > >>>
> > > >>> Now we introduce fine-grained switch. There will be two patterns:
> > > >>>
> > > > >>> pattern 2: fine-grained switch over coarse-grained switch.
> > > > >>> The coarse-grained switch will be ignored when the local fine-grained
> > > > >>> switch has a different value:
> > > > >>> 2.1 cgs == true and fgs == true -> local object reuse enabled
> > > > >>> 2.2 cgs == true and fgs == false -> local object reuse disabled, i.e.
> > > > >>> deep copy enabled
> > > > >>> 2.3 cgs == false and fgs == true -> local object reuse enabled
> > > >>> 2.4 cgs == false and fgs == fa

[DISCUSS] FLIP-374: Adding a separate configuration for specifying Java Options of the SQL Gateway

2023-10-07 Thread Yangze Guo
Hi, there,

We'd like to start a discussion thread on "FLIP-374: Adding a separate
configuration for specifying Java Options of the SQL Gateway"[1],
where we propose adding a separate configuration option to specify the
Java options for the SQL Gateway. This would allow users to fine-tune
the memory settings, garbage collection behavior, and other relevant
Java parameters specific to the SQL Gateway, ensuring optimal
performance and stability in production environments.

Looking forward to your feedback.

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-374%3A+Adding+a+separate+configuration+for+specifying+Java+Options+of+the+SQL+Gateway

Best,
Yangze Guo


[jira] [Created] (FLINK-33203) FLIP-374: Adding a separate configuration for specifying Java Options of the SQL Gateway

2023-10-07 Thread Yangze Guo (Jira)
Yangze Guo created FLINK-33203:

    Summary: FLIP-374: Adding a separate configuration for specifying Java Options of the SQL Gateway
    Key: FLINK-33203
    URL: https://issues.apache.org/jira/browse/FLINK-33203
    Project: Flink
    Issue Type: Improvement
    Components: Table SQL / Gateway
    Reporter: Yangze Guo
    Assignee: Yangze Guo
    Fix For: 1.19.0


The SQL Gateway is an essential component of Flink in OLAP
scenarios, and its performance and stability determine the SLA of Flink as an
OLAP service. Just like other components in Flink, we propose adding a separate
configuration option to specify the Java options for the SQL Gateway. This
would allow users to fine-tune the memory settings, garbage collection
behavior, and other relevant Java parameters specific to the SQL Gateway,
ensuring optimal performance and stability in production environments.





Re: [DISCUSS] FLIP-370 : Support Balanced Tasks Scheduling

2023-10-07 Thread Rui Fan
Hi Shammon,

IIUC, you want more flexibility in controlling the two-phase strategy,
right?

> I want this because we would like to add a new slot to TM strategy such
> as SLOTS_NUM in the future for OLAP to improve the performance for olap
> jobs, which will use TASKS strategy for task to slot. cc Guoyangze

Actually, a single option can meet your requirement, since one option can
control both phases. We can add a new enum value for this option that uses
the new strategy for slot to TM and the task-balanced strategy for task to slot.

Of course, I think two options are more flexible. If there are too many
strategies, two options are easier for users.

Also, I have a question: what is the SLOTS_NUM strategy? Isn't it slot
balancing at the TM level?
I want to check whether it's similar to `cluster.evenly-spread-out-slots`.
If they are similar or the same, there aren't too many strategies, and one
option may be enough.
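
A rough Java sketch of the "one option, new enum value" idea, for
illustration only. All type names here are hypothetical. The per-phase
values reuse the ANY/SLOTS and ANY/TASKS vocabulary from Shammon's mail, and
the mode names follow the "None"/"Slots"/"Tasks" values proposed earlier in
the thread. The mapping of each mode to a pair of strategies is my
assumption, not something the FLIP has settled.

```java
/** Slot-to-TM strategies (ResourceManager side); SLOTS_NUM is the future OLAP strategy. */
enum SlotToTmStrategy { ANY, SLOTS, SLOTS_NUM }

/** Task-to-slot strategies (slot pool side). */
enum TaskToSlotStrategy { ANY, TASKS }

/** Hypothetical single option whose value fixes the strategy of both phases. */
enum LoadBalanceMode {
    NONE(SlotToTmStrategy.ANY, TaskToSlotStrategy.ANY),
    SLOTS(SlotToTmStrategy.SLOTS, TaskToSlotStrategy.ANY),
    // Assumption: the slot-to-TM behavior of the TASKS mode is not settled in this thread.
    TASKS(SlotToTmStrategy.SLOTS, TaskToSlotStrategy.TASKS),
    // The "new enum value": new slot-to-TM strategy plus task-balanced slot assignment.
    SLOTS_NUM_AND_TASKS(SlotToTmStrategy.SLOTS_NUM, TaskToSlotStrategy.TASKS);

    private final SlotToTmStrategy slotToTm;
    private final TaskToSlotStrategy taskToSlot;

    LoadBalanceMode(SlotToTmStrategy slotToTm, TaskToSlotStrategy taskToSlot) {
        this.slotToTm = slotToTm;
        this.taskToSlot = taskToSlot;
    }

    SlotToTmStrategy slotToTm() {
        return slotToTm;
    }

    TaskToSlotStrategy taskToSlot() {
        return taskToSlot;
    }
}

final class LoadBalanceModeDemo {
    public static void main(String[] args) {
        // Print the pair of per-phase strategies each mode would select.
        for (LoadBalanceMode mode : LoadBalanceMode.values()) {
            System.out.println(mode + ": slot->TM=" + mode.slotToTm()
                    + ", task->slot=" + mode.taskToSlot());
        }
    }
}
```

With two separate options, users would pick one value from each of the two
small enums directly. With one option, every supported combination needs
its own enum value, which is why the number of strategies matters.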

Best,
Rui

On Sat, Oct 7, 2023 at 11:29 AM Shammon FY  wrote:

> Thanks Rui, I checked the code and you're right.
>
> As you described above, the entire process is actually two independent
> steps: slot to TM and task to slot. Currently we use the option
> `cluster.evenly-spread-out-slots` for both of them. Can we provide
> different options for the two steps, such as ANY/SLOTS for RM and ANY/TASKS
> for slot pool?
>
> I want this because we would like to add a new slot to TM strategy such as
> SLOTS_NUM in the future for OLAP to improve the performance for olap jobs,
> which will use TASKS strategy for task to slot. cc Guoyangze
>
> Best,
> Shammon FY
>
> On Fri, Oct 6, 2023 at 6:19 PM xiangyu feng  wrote:
>
>> Thanks Yuepeng and Rui for driving this Discussion.
>>
>> Internally when we try to use Flink 1.17.1 in production, we are also
>> suffering from the unbalanced task distribution problem for jobs with high
>> qps and complex dag. So +1 for the overall proposal.
>>
>> Some questions about the details:
>>
>> 1, About the waiting mechanism: Will the waiting mechanism happen only in
>> the second level 'assigning slots to TM'?  IIUC, the first level
>> 'assigning Tasks to Slots' needs only the asynchronous slot result from
>> slotpool.
>>
>> 2, About the slot LoadingWeight: it is reasonable to use the number of
>> tasks by default in the beginning, but it would be better if this could be
>> easily extended in future to distinguish between CPU-intensive and
>> IO-intensive workloads. In some cases, TMs may have IO bottlenecks but
>> others have CPU bottlenecks.
>>
>> Regards,
>> Xiangyu
>>
>>
>> On Thu, Oct 5, 2023 at 6:34 PM Yuepeng Pan wrote:
>>
>> > Hi, Zhu Zhu,
>> >
>> > Thanks for your feedback!
>> >
>> > > I think we can introduce a new config option
>> > > `taskmanager.load-balance.mode`,
>> > > which accepts "None"/"Slots"/"Tasks". `cluster.evenly-spread-out-slots`
>> > > can be superseded by the "Slots" mode and get deprecated. In the future
>> > > it can support more modes, e.g. "CpuCores", to work better for jobs with
>> > > fine-grained resources. The proposed config option
>> > > `slot.request.max-interval`
>> > > then can be renamed to
>> > > `taskmanager.load-balance.request-stablizing-timeout`
>> > > to show its relation with the feature. The proposed `slot.sharing-strategy`
>> > > is not needed, because the configured "Tasks" mode will do the work.
>> >
>> > The new proposed configuration option sounds good to me.
>> >
>> > I have a small question. If we set the configuration value to 'Tasks',
>> > it will initiate two processes: balancing the number of tasks at
>> > the slot level and balancing the number of tasks across TaskManagers
>> > (TMs).
>> > Alternatively, if we configure it as 'Slots', the system will employ the
>> > LocalPreferred allocation policy (which is the default) when assigning
>> > tasks to slots, and it will ensure that the number of slots used across
>> > TMs is balanced.
>> > So does this configuration essentially combine a balanced selection
>> > strategy across two dimensions into fixed configuration items?
>> >
>> > I would appreciate it if you could correct me if I've made any errors.
>> >
>> > Best,
>> > Yuepeng.
>> >
>>
>


Re: [DISCUSS] FLIP-368 Reorganize the exceptions thrown in state interfaces

2023-10-07 Thread Zakelly Lan
Hi Jing,

Sorry for the late reply! I agree with you that we do not expect users
to do anything with Flink and we won't "bother" them with those
exceptions. However, users can still catch the `Throwable` and perform
any necessary logging activities, similar to how they use Java
Collection interfaces.
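
As an illustration of that point, here is a small Java sketch. It assumes
an unchecked StateAccessException (the name comes from the PR discussion
quoted in this thread; its shape here is hypothetical) and a hypothetical
ValueStateSketch interface whose accessor has no throws clause. It catches
RuntimeException rather than Throwable to keep the example narrow.

```java
/** Hypothetical unchecked exception standing in for whatever the state API would throw. */
class StateAccessException extends RuntimeException {
    StateAccessException(String message) {
        super(message);
    }
}

/** Hypothetical state accessor with no checked exceptions in its signature. */
interface ValueStateSketch<T> {
    T value();
}

final class CatchAndLogExample {
    private CatchAndLogExample() {}

    /** Reads the state; on failure, records a log line and rethrows so the job still fails. */
    static String readOrRethrow(ValueStateSketch<String> state, StringBuilder log) {
        try {
            return state.value();
        } catch (RuntimeException e) {
            // User territory: logging, metrics, notifying other services.
            log.append("state access failed: ").append(e.getMessage());
            throw e;
        }
    }

    public static void main(String[] args) {
        StringBuilder log = new StringBuilder();
        try {
            readOrRethrow(() -> { throw new StateAccessException("backend closed"); }, log);
        } catch (StateAccessException e) {
            System.out.println(log);
        }
    }
}
```

The caller is not forced to handle anything, but nothing prevents it from
doing so, which is the same situation as with the Java Collection
interfaces.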


Thanks for your insights!

Best,
Zakelly

On Thu, Sep 21, 2023 at 8:43 PM Jing Ge  wrote:
>
> Fair enough! Thanks Zakelly for the information. Afaic, even if users can do
> nothing with Flink, they can still do something in their own territory, at
> least some logging and metrics, or triggering some other
> services in their ecosystem. After all, the Flink jobs they build are part
> of their service component. It doesn't change the fact that we are going to
> use the anti-pattern. Just because we don't expect users to do
> anything with Flink does not mean users don't expect to do something with
> the expected exception. Anyway, I am open to hearing different opinions.
>
> Best regards,
> Jing
>
> On Thu, Sep 21, 2023 at 7:02 AM Zakelly Lan  wrote:
>
> > Hi Martijn,
> >
> > Thanks for the reminder!
> >
> > This FLIP proposes a change to the state API that is annotated as
> > @PublicEvolving and targets version 1.19.  I have clarified this in
> > the "Proposed Change" section of the FLIP.
> >
> >
> > Hi Jing,
> >
> > Thanks for sharing your thoughts! Here are my opinions:
> >
> > 1. The exceptions of the state API are usually treated as critical
> > ones. In other words, if anything goes wrong with state accessing, the
> > element processing cannot proceed and the job should fail. Flink users
> > may not know what to do when they encounter these exceptions. I
> > believe this is the main reason why we want to replace them with
> > unchecked exceptions.
> > 2. There have also been some further discussions[1][2] from Stephan
> > and Shixiaogang below the one you pointed out [3], and it seems they
> > come to an agreement to use unchecked exceptions. After reviewing the
> > entire discussion on that PR, I think their arguments are reasonable
> > given the use case.
> >
> > Looking forward to your feedback.
> >
> >
> > Best,
> > Zakelly
> >
> > [1] https://github.com/apache/flink/pull/3380#issuecomment-286807853
> > [2] https://github.com/apache/flink/pull/3380#issuecomment-286932133
> > [3] https://github.com/apache/flink/pull/3380#issuecomment-281631160
> >
> > On Thu, Sep 21, 2023 at 1:27 AM Jing Ge 
> > wrote:
> > >
> > > sorry, typo: It is a known "anti-pattern" instead of "ant-pattern"
> > >
> > > Best regards,
> > > Jing
> > >
> > > On Wed, Sep 20, 2023 at 7:23 PM Jing Ge  wrote:
> > >
> > > > Hi Zakelly,
> > > >
> > > > Thanks for driving this topic. From good software engineering's
> > > > perspective, I have different thoughts:
> > > >
> > > > > 1. The idea to get rid of all checked Exceptions and replace them with
> > > > > unchecked Exceptions is a known ant-pattern: "Generally speaking, do not
> > > > > throw a RuntimeException or create a subclass of RuntimeException simply
> > > > > because you don't want to be bothered with specifying the exceptions your
> > > > > methods can throw." [1] Checked Exceptions mean expected exceptions that
> > > > > can help developers find a way to catch them and decide what to do. It is
> > > > > part of the public API signature that can help developers build robust
> > > > > systems. We should not mix concepts and build expected exceptions with
> > > > > unchecked Java Exception classes.
> > > > > 2. The comment Stephan left [2] clearly pointed out that we should avoid
> > > > > using generic Java Exceptions, and "find some more 'specific' exceptions
> > > > > for the signature, like throws IOException or throws StateAccessException."
> > > > > So, the idea is to define/use specific checked Exception classes instead of
> > > > > using unchecked Exceptions.
> > > >
> > > > Looking forward to your thoughts.
> > > >
> > > > Best regards,
> > > > Jing
> > > >
> > > >
> > > > [1]
> > > >
> > https://docs.oracle.com/javase/tutorial/essential/exceptions/runtime.html
> > > > [2] https://github.com/apache/flink/pull/3380#issuecomment-281631160
> > > >
> > > > On Wed, Sep 20, 2023 at 4:52 PM Zakelly Lan 
> > wrote:
> > > >
> > > >> Hi Yanfei,
> > > >>
> > > >> Thanks for your reply!
> > > >>
> > > >> Yes, this FLIP aims to change all state-related exceptions to
> > > >> unchecked exceptions and remove all exceptions from the signature. So
> > > >> I believe we have come to an agreement to keep the interfaces simple.
> > > >>
> > > >>
> > > >> Best regards,
> > > >> Zakelly
> > > >>
> > > >> On Wed, Sep 20, 2023 at 2:26 PM Zakelly Lan 
> > > >> wrote:
> > > >> >
> > > >> > Hi Hangxiang,
> > > >> >
> > > >> > Thank you for your response! Here are my thoughts:
> > > >> >
> > > >> > 1. Regarding the exceptions thrown by internal interfaces, I suggest
> > > >> > keeping them as checked exceptions. Since these exceptions will be
> > > >> > handled by the internal cal

Re: [DISCUSS] FLIP-370 : Support Balanced Tasks Scheduling

2023-10-07 Thread Rui Fan
Hi Yangze,

> 2. From my understanding, if user enable the
> cluster.evenly-spread-out-slots,
> LeastUtilizationResourceMatchingStrategy will be used to determine the
> slot distribution and the slot allocation in the three TM will be
> (taskmanager.numberOfTaskSlots=3):
> TM1: 3 slot
> TM2: 2 slot
> TM3: 2 slot

When all TMs are ready in advance, the slot allocation across the three TMs
will be:
TM1: 3 slots
TM2: 2 slots
TM3: 2 slots

In application mode, the resource manager doesn't request TMs
in advance, so there aren't enough slots before the third TM is ready.
As a result, all slots of the second TM will be used up. The three TMs will be:
TM1: 3 slots
TM2: 3 slots
TM3: 1 slot
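
The two allocations above can be reproduced with a small simulation. This
is a simplified sketch, not Flink code: the class and method names are
hypothetical, and it assumes a least-utilization rule that sends each
request to the registered TM with the fewest used slots, with ties going to
the lowest index.

```java
import java.util.Arrays;

/** Simplified, hypothetical simulation of spreading 7 slot requests over 3 TMs with 3 slots each. */
final class SlotSpreadSimulation {
    private SlotSpreadSimulation() {}

    /**
     * Assigns pending requests to the least-used registered TM until requests
     * or free slots run out. Returns the number of requests still pending.
     */
    static int assign(int[] usedSlots, int registeredTms, int slotsPerTm, int pending) {
        while (pending > 0) {
            int best = -1;
            for (int tm = 0; tm < registeredTms; tm++) {
                boolean hasFreeSlot = usedSlots[tm] < slotsPerTm;
                if (hasFreeSlot && (best == -1 || usedSlots[tm] < usedSlots[best])) {
                    best = tm;
                }
            }
            if (best == -1) {
                break; // every registered TM is full
            }
            usedSlots[best]++;
            pending--;
        }
        return pending;
    }

    /** All TMs are registered before any request is served. */
    static int[] allTmsReadyInAdvance(int tms, int slotsPerTm, int requests) {
        int[] used = new int[tms];
        assign(used, tms, slotsPerTm, requests);
        return used;
    }

    /** TMs register one after another; pending requests are served as soon as slots exist. */
    static int[] tmsRegisterOneByOne(int tms, int slotsPerTm, int requests) {
        int[] used = new int[tms];
        int pending = requests;
        for (int registered = 1; registered <= tms; registered++) {
            pending = assign(used, registered, slotsPerTm, pending);
        }
        return used;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(allTmsReadyInAdvance(3, 3, 7))); // [3, 2, 2]
        System.out.println(Arrays.toString(tmsRegisterOneByOne(3, 3, 7)));  // [3, 3, 1]
    }
}
```

The strategy is identical in both runs; only the moment at which TMs become
visible differs, which is what produces the 3/3/1 result.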

That's why the FLIP adds some notes:

   - All *free* slots are in the last TM, because ResourceManager doesn’t
   have the waiting mechanism, and it just requests 7 slots for this JobMaster.
   - Why is it acceptable?
      - If we just add the waiting mechanism to JobMaster but not to
      ResourceManager, all *free* slots will be in the last TM. All slots
      of the other TMs are offered to the JM.
      - That is, only one TM may have fewer tasks than the other TMs. The
      difference between the number of tasks of the other TMs is at most 1.
      So when *p* >> *slotsPerTM*, the problem can be ignored.
      - We can also suggest that users, when p is small, configure
      *slotsPerTM* to 1, or make sure *p % slotsPerTM* == 0.
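
A short worked calculation for these notes, as a Java sketch with
hypothetical names. It assumes the job requests exactly p slots and that
every TM except the last is completely filled.

```java
/** Hypothetical helper: how many slots stay free in the last TM for a given p and slotsPerTM. */
final class LastTmFreeSlots {
    private LastTmFreeSlots() {}

    /** Number of TMs needed to host p slots, i.e. ceil(p / slotsPerTm). */
    static int taskManagersNeeded(int p, int slotsPerTm) {
        return (p + slotsPerTm - 1) / slotsPerTm;
    }

    /** Free slots left in the last TM when all other TMs are completely used. */
    static int freeSlotsInLastTm(int p, int slotsPerTm) {
        return taskManagersNeeded(p, slotsPerTm) * slotsPerTm - p;
    }

    public static void main(String[] args) {
        System.out.println(freeSlotsInLastTm(7, 3)); // 2: the example above, last TM uses 1 of 3
        System.out.println(freeSlotsInLastTm(6, 3)); // 0: p % slotsPerTM == 0
        System.out.println(freeSlotsInLastTm(7, 1)); // 0: slotsPerTM == 1
    }
}
```

The number of free slots is always below slotsPerTM, so its share of the
total shrinks as p grows relative to slotsPerTM.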

Please correct me if my understanding is wrong, thanks~

Best,
Rui

On Sun, Oct 1, 2023 at 7:38 PM Yangze Guo  wrote:

> Hi, Rui,
>
> 1. With the current mechanism, when physical slots are offered from
> TM, the JobMaster will start deploying tasks and synchronizing their
> states. With the addition of the waiting mechanism, IIUC, the
> JobMaster will deploy and synchronize the states of all tasks only
> after all resources are available. The task deployment and state
> synchronization both occupy the JobMaster's RPC main thread. In
> complex jobs with a lot of tasks, this waiting mechanism may increase
> the pressure on the JobMaster and increase the end-to-end job
> deployment time.
>
> 2. From my understanding, if user enable the
> cluster.evenly-spread-out-slots,
> LeastUtilizationResourceMatchingStrategy will be used to determine the
> slot distribution and the slot allocation in the three TM will be
> (taskmanager.numberOfTaskSlots=3):
> TM1: 3 slot
> TM2: 2 slot
> TM3: 2 slot
>
> Best,
> Yangze Guo
>
> On Sun, Oct 1, 2023 at 6:14 PM Rui Fan <1996fan...@gmail.com> wrote:
> >
> > Hi Shammon,
> >
> > Thanks for your feedback as well!
> >
> > > IIUC, the overall balance is divided into two parts: slot to TM and
> task
> > to slot.
> > > 1. Slot to TM is guaranteed by SlotManager in ResourceManager
> > > 2. Task to slot is guaranteed by the slot pool in JM
> > >
> > > These two are completely independent, what are the benefits of unifying
> > > these two into one option? Also, do we want to share the same
> > > option between SlotPool in JM and SlotManager in RM? This sounds a bit
> > > strange.
> >
> > Your understanding is totally right, the balance needs 2 parts: slot to
> TM
> > and task to slot.
> >
> > As I understand, the following are benefits of unifying them into one
> > option:
> >
> > - Flink users don't care about these principles inside of flink, they
> don't
> > know these 2 parts.
> > - If flink provides 2 options, flink users need to set 2 options for
> their
> > job.
> > - If one option is missed, the final result may not be good. (Users may
> > have questions when using)
> > - If flink just provides 1 option, enabling one option is enough. (Reduce
> > the probability of misconfiguration)
> >
> > Also, Flink’s options are user-oriented. Each option represents a switch
> or
> > parameter of a feature.
> > A feature may be composed of multiple components inside Flink.
> > It might be better to keep only one switch per feature.
> >
> > Actually, the cluster.evenly-spread-out-slots option is used between
> > SlotPool in JM and SlotManager in RM. 2 components to ensure
> > this feature works well.
> >
> > Please correct me if my understanding is wrong,
> > and looking forward to your feedback, thanks!
> >
> > Best,
> > Rui
> >
> > On Sun, Oct 1, 2023 at 5:52 PM Rui Fan <1996fan...@gmail.com> wrote:
> >
> > > Hi Yangze,
> > >
> > > Thanks for your feedback!
> > >
> > > > 1. Is it possible for the SlotPool to get the slot allocation results
> > > > from the SlotManager in advance instead of waiting for the actual
> > > > physical slots to be registered, and perform pre-allocation? The
> > > > benefit of doing this is to make the task deployment process
> smoother,
> > > > especially when there are a large number of tasks in the job.
> > >
> > > Could you elaborate on that? I didn't understand what's the benefit and
> > > smoother.
> > >
> > > > 2. If user enable the cluster.evenly-spread-out-slots, the issue in
> > > > example 2 of section 2.2.3 can be resolved. Do I understand it
> > > > correctly?
> > >
> > > The exam

Re: [DISCUSS] FLIP-370 : Support Balanced Tasks Scheduling

2023-10-07 Thread Yangze Guo
Thanks for the clarification, Rui.

I believe the root cause of this issue is that in the current
DefaultResourceAllocationStrategy, slot allocation begins before the
decision on requesting PendingTaskManagers is made. That can be fixed
within the strategy without introducing another waiting mechanism. I
think it would be better to address this issue within the scope of
this FLIP. However, I don't have a strong opinion on it; it depends on
your bandwidth.


Best,
Yangze Guo

On Sat, Oct 7, 2023 at 4:16 PM Rui Fan <1996fan...@gmail.com> wrote:
>
> Hi Yangze,
>
> > 2. From my understanding, if user enable the
> > cluster.evenly-spread-out-slots,
> > LeastUtilizationResourceMatchingStrategy will be used to determine the
> > slot distribution and the slot allocation in the three TM will be
> > (taskmanager.numberOfTaskSlots=3):
> > TM1: 3 slot
> > TM2: 2 slot
> > TM3: 2 slot
>
> When all tms are ready in advance, the three TM will be:
> TM1: 3 slot
> TM2: 2 slot
> TM3: 2 slot
>
> For application mode, the resource manager doesn't apply for
> TM in advance, and slots aren't enough before the third TM is ready.
> So all slots of the second TM will be used up. The three TM will be:
> TM1: 3 slot
> TM2: 3 slot
> TM3: 1 slot
>
> That's why the FLIP add some notes:
>
> All free slots are in the last TM, because ResourceManager doesn’t have the
> waiting mechanism, and it just requests 7 slots for this JobMaster.
> Why is it acceptable?
>
> If we just add the waiting mechanism to JobMaster but not in ResourceManager,
> all free slots will be in the last TM. All slots of other TMs are offered to
> JM.
> That is, only one TM may have fewer tasks than the other TMs. The difference
> between the number of tasks of other TMs is at most 1. So when p >>
> slotsPerTM, the problem can be ignored.
> We can also suggest users, in cases that p is small, it's better to configure
> slotsPerTM to 1, or let p % slotsPerTM == 0.
>
> Please correct me if my understanding is wrong, thanks~
>
> Best,
> Rui
>
> On Sun, Oct 1, 2023 at 7:38 PM Yangze Guo  wrote:
>>
>> Hi, Rui,
>>
>> 1. With the current mechanism, when physical slots are offered from
>> TM, the JobMaster will start deploying tasks and synchronizing their
>> states. With the addition of the waiting mechanism, IIUC, the
>> JobMaster will deploy and synchronize the states of all tasks only
>> after all resources are available. The task deployment and state
>> synchronization both occupy the JobMaster's RPC main thread. In
>> complex jobs with a lot of tasks, this waiting mechanism may increase
>> the pressure on the JobMaster and increase the end-to-end job
>> deployment time.
>>
>> 2. From my understanding, if user enable the
>> cluster.evenly-spread-out-slots,
>> LeastUtilizationResourceMatchingStrategy will be used to determine the
>> slot distribution and the slot allocation in the three TM will be
>> (taskmanager.numberOfTaskSlots=3):
>> TM1: 3 slot
>> TM2: 2 slot
>> TM3: 2 slot
>>
>> Best,
>> Yangze Guo
>>
>> On Sun, Oct 1, 2023 at 6:14 PM Rui Fan <1996fan...@gmail.com> wrote:
>> >
>> > Hi Shammon,
>> >
>> > Thanks for your feedback as well!
>> >
>> > > IIUC, the overall balance is divided into two parts: slot to TM and task
>> > to slot.
>> > > 1. Slot to TM is guaranteed by SlotManager in ResourceManager
>> > > 2. Task to slot is guaranteed by the slot pool in JM
>> > >
>> > > These two are completely independent, what are the benefits of unifying
>> > > these two into one option? Also, do we want to share the same
>> > > option between SlotPool in JM and SlotManager in RM? This sounds a bit
>> > > strange.
>> >
>> > Your understanding is totally right, the balance needs 2 parts: slot to TM
>> > and task to slot.
>> >
>> > As I understand, the following are benefits of unifying them into one
>> > option:
>> >
>> > - Flink users don't care about these principles inside of flink, they don't
>> > know these 2 parts.
>> > - If flink provides 2 options, flink users need to set 2 options for their
>> > job.
>> > - If one option is missed, the final result may not be good. (Users may
>> > have questions when using)
>> > - If flink just provides 1 option, enabling one option is enough. (Reduce
>> > the probability of misconfiguration)
>> >
>> > Also, Flink’s options are user-oriented. Each option represents a switch or
>> > parameter of a feature.
>> > A feature may be composed of multiple components inside Flink.
>> > It might be better to keep only one switch per feature.
>> >
>> > Actually, the cluster.evenly-spread-out-slots option is used between
>> > SlotPool in JM and SlotManager in RM. 2 components to ensure
>> > this feature works well.
>> >
>> > Please correct me if my understanding is wrong,
>> > and looking forward to your feedback, thanks!
>> >
>> > Best,
>> > Rui
>> >
>> > On Sun, Oct 1, 2023 at 5:52 PM Rui Fan <1996fan...@gmail.com> wrote:
>> >
>> > > Hi Yangze,
>> > >
>> > > Thanks for your feedback!
>> > >
>> > > > 1. Is it possible for the SlotPool to ge

Re: [DISCUSS] FLIP-374: Adding a separate configuration for specifying Java Options of the SQL Gateway

2023-10-07 Thread xiangyu feng
Thanks for initiating this discussion. Within the development towards
Streaming Warehousing, SQL Gateway will become more and more important.
Big +1 to specify Java Options separately for SQL Gateway.

Regards,
Xiangyu

On Sat, Oct 7, 2023 at 3:24 PM Yangze Guo wrote:

> Hi, there,
>
> We'd like to start a discussion thread on "FLIP-374: Adding a separate
> configuration for specifying Java Options of the SQL Gateway"[1],
> where we propose adding a separate configuration option to specify the
> Java options for the SQL Gateway. This would allow users to fine-tune
> the memory settings, garbage collection behavior, and other relevant
> Java parameters specific to the SQL Gateway, ensuring optimal
> performance and stability in production environments.
>
> Looking forward to your feedback.
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-374%3A+Adding+a+separate+configuration+for+specifying+Java+Options+of+the+SQL+Gateway
>
> Best,
> Yangze Guo
>


Re: [DISCUSS] FLIP-370 : Support Balanced Tasks Scheduling

2023-10-07 Thread Rui Fan
Hi Yangze,

Thanks for your quick response!

Sorry, I re-read the 2.2.2 part[1] about the waiting mechanism and found
it isn't clear. The root cause for introducing the waiting mechanism is
that slot requests are sent from the JobMaster to the SlotPool
one by one instead of as one whole batch. I have rewritten the 2.2.2 part;
please read it again in your free time.
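
To illustrate the one-by-one problem, here is a minimal Java sketch of one
possible waiting rule. It is my own simplification, not the FLIP's design:
SlotRequestBatcher and its methods are hypothetical, and maxIntervalMillis
is loosely modeled on the `slot.request.max-interval` option mentioned in
this thread. Requests are buffered, and the batch counts as complete once no
new request has arrived for maxIntervalMillis.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: turns one-by-one slot requests into a single batch. */
final class SlotRequestBatcher {
    private final long maxIntervalMillis;
    private final List<String> pending = new ArrayList<>();
    private long lastRequestAtMillis;

    SlotRequestBatcher(long maxIntervalMillis) {
        this.maxIntervalMillis = maxIntervalMillis;
    }

    /** Buffers a request and remembers when it arrived. */
    void onRequest(String requestId, long nowMillis) {
        pending.add(requestId);
        lastRequestAtMillis = nowMillis;
    }

    /**
     * Returns the whole batch once requests have been quiet for at least
     * maxIntervalMillis; otherwise returns an empty list and keeps waiting.
     */
    List<String> pollStableBatch(long nowMillis) {
        if (pending.isEmpty() || nowMillis - lastRequestAtMillis < maxIntervalMillis) {
            return new ArrayList<>();
        }
        List<String> batch = new ArrayList<>(pending);
        pending.clear();
        return batch;
    }

    public static void main(String[] args) {
        SlotRequestBatcher batcher = new SlotRequestBatcher(20);
        batcher.onRequest("request-1", 0);
        batcher.onRequest("request-2", 5);
        System.out.println(batcher.pollStableBatch(10)); // still waiting: []
        System.out.println(batcher.pollStableBatch(25)); // [request-1, request-2]
    }
}
```

Timestamps are passed in explicitly so the sketch stays deterministic. A
real implementation would use a timer on the main thread instead.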

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-370%3A+Support+Balanced+Tasks+Scheduling#FLIP370:SupportBalancedTasksScheduling-2.2.2Waitingmechanism

Best,
Rui

On Sat, Oct 7, 2023 at 4:34 PM Yangze Guo  wrote:

> Thanks for the clarification, Rui.
>
> I believe the root cause of this issue is that in the current
> DefaultResourceAllocationStrategy, slot allocation begins before the
> decision to PendingTaskManagers requesting is made. That can be fixed
> within the strategy without introducing another waiting mechanism. I
> think it would be better to address this issue within the scope of
> this FLIP. However, I don't have a strong opinion on it, it depends on
> your bandwidth.
>
>
> Best,
> Yangze Guo
>
> On Sat, Oct 7, 2023 at 4:16 PM Rui Fan <1996fan...@gmail.com> wrote:
> >
> > Hi Yangze,
> >
> > > 2. From my understanding, if user enable the
> > > cluster.evenly-spread-out-slots,
> > > LeastUtilizationResourceMatchingStrategy will be used to determine the
> > > slot distribution and the slot allocation in the three TM will be
> > > (taskmanager.numberOfTaskSlots=3):
> > > TM1: 3 slot
> > > TM2: 2 slot
> > > TM3: 2 slot
> >
> > When all tms are ready in advance, the three TM will be:
> > TM1: 3 slot
> > TM2: 2 slot
> > TM3: 2 slot
> >
> > For application mode, the resource manager doesn't apply for
> > TM in advance, and slots aren't enough before the third TM is ready.
> > So all slots of the second TM will be used up. The three TM will be:
> > TM1: 3 slot
> > TM2: 3 slot
> > TM3: 1 slot
> >
> > That's why the FLIP add some notes:
> >
> > All free slots are in the last TM, because ResourceManager doesn’t have
> > the waiting mechanism, and it just requests 7 slots for this JobMaster.
> > Why is it acceptable?
> >
> > If we just add the waiting mechanism to JobMaster but not in
> > ResourceManager, all free slots will be in the last TM. All slots of other
> > TMs are offered to JM.
> > That is, only one TM may have fewer tasks than the other TMs. The
> > difference between the number of tasks of other TMs is at most 1. So when
> > p >> slotsPerTM, the problem can be ignored.
> > We can also suggest users, in cases that p is small, it's better to
> > configure slotsPerTM to 1, or let p % slotsPerTM == 0.
> >
> > Please correct me if my understanding is wrong, thanks~
> >
> > Best,
> > Rui
> >
> > On Sun, Oct 1, 2023 at 7:38 PM Yangze Guo  wrote:
> >>
> >> Hi, Rui,
> >>
> >> 1. With the current mechanism, when physical slots are offered from
> >> TM, the JobMaster will start deploying tasks and synchronizing their
> >> states. With the addition of the waiting mechanism, IIUC, the
> >> JobMaster will deploy and synchronize the states of all tasks only
> >> after all resources are available. The task deployment and state
> >> synchronization both occupy the JobMaster's RPC main thread. In
> >> complex jobs with a lot of tasks, this waiting mechanism may increase
> >> the pressure on the JobMaster and increase the end-to-end job
> >> deployment time.
> >>
> >> 2. From my understanding, if user enable the
> >> cluster.evenly-spread-out-slots,
> >> LeastUtilizationResourceMatchingStrategy will be used to determine the
> >> slot distribution and the slot allocation in the three TM will be
> >> (taskmanager.numberOfTaskSlots=3):
> >> TM1: 3 slot
> >> TM2: 2 slot
> >> TM3: 2 slot
> >>
> >> Best,
> >> Yangze Guo
> >>
> >> On Sun, Oct 1, 2023 at 6:14 PM Rui Fan <1996fan...@gmail.com> wrote:
> >> >
> >> > Hi Shammon,
> >> >
> >> > Thanks for your feedback as well!
> >> >
> >> > > IIUC, the overall balance is divided into two parts: slot to TM and
> >> > > task to slot.
> >> > > 1. Slot to TM is guaranteed by SlotManager in ResourceManager
> >> > > 2. Task to slot is guaranteed by the slot pool in JM
> >> > >
> >> > > These two are completely independent, what are the benefits of
> >> > > unifying these two into one option? Also, do we want to share the same
> >> > > option between SlotPool in JM and SlotManager in RM? This sounds a
> >> > > bit strange.
> >> >
> >> > Your understanding is totally right, the balance needs 2 parts: slot
> >> > to TM and task to slot.
> >> >
> >> > As I understand, the following are benefits of unifying them into one
> >> > option:
> >> >
> >> > - Flink users don't care about these principles inside of flink, they
> >> > don't know these 2 parts.
> >> > - If flink provides 2 options, flink users need to set 2 options for
> >> > their job.
> >> > - If one option is missed, the final result may not be good. (Users
> >> > may have questions when using)
> >> > - If flink just provides 1 option, enabling one option is en

Re: [DISCUSS] FLIP-374: Adding a separate configuration for specifying Java Options of the SQL Gateway

2023-10-07 Thread Zakelly Lan
It's quite intuitive to provide such a configuration for the SQL Gateway.
Thanks, Yangze, for bringing this up. I'm looking forward to it.


Best,
Zakelly




On Sat, Oct 7, 2023 at 4:35 PM xiangyu feng wrote:
>
> Thanks for initiating this discussion. Within the development towards
> Streaming Warehousing, SQL Gateway will become more and more important.
> Big +1 to specify Java Options separately for SQL Gateway.
>
> Regards,
> Xiangyu
>
> On Sat, Oct 7, 2023 at 3:24 PM Yangze Guo wrote:
>
> > Hi, there,
> >
> > We'd like to start a discussion thread on "FLIP-374: Adding a separate
> > configuration for specifying Java Options of the SQL Gateway"[1],
> > where we propose adding a separate configuration option to specify the
> > Java options for the SQL Gateway. This would allow users to fine-tune
> > the memory settings, garbage collection behavior, and other relevant
> > Java parameters specific to the SQL Gateway, ensuring optimal
> > performance and stability in production environments.
> >
> > Looking forward to your feedback.
> >
> > [1]
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-374%3A+Adding+a+separate+configuration+for+specifying+Java+Options+of+the+SQL+Gateway
> >
> > Best,
> > Yangze Guo
> >


Re: [DISCUSS] FLIP-374: Adding a separate configuration for specifying Java Options of the SQL Gateway

2023-10-07 Thread Rui Fan
Thanks to Yangze for driving this proposal.

`env.java.opts.xxx` is already supported for client, historyserver,
jobmanager and taskmanager, and it's very useful for troubleshooting.
So +1 for `env.java.opts.sql-gateway`.

I have a minor question: does `env.java.opts.all` already apply to the
sql-gateway?
If yes, that's fine. If not, it would be better to handle it as a subtask of
this FLIP.

Best,
Rui


On Sat, Oct 7, 2023 at 4:35 PM xiangyu feng wrote:

> Thanks for initiating this discussion. Within the development towards
> Streaming Warehousing, SQL Gateway will become more and more important.
> Big +1 to specify Java Options separately for SQL Gateway.
>
> Regards,
> Xiangyu
>
> On Sat, Oct 7, 2023 at 3:24 PM Yangze Guo wrote:
>
> > Hi, there,
> >
> > We'd like to start a discussion thread on "FLIP-374: Adding a separate
> > configuration for specifying Java Options of the SQL Gateway"[1],
> > where we propose adding a separate configuration option to specify the
> > Java options for the SQL Gateway. This would allow users to fine-tune
> > the memory settings, garbage collection behavior, and other relevant
> > Java parameters specific to the SQL Gateway, ensuring optimal
> > performance and stability in production environments.
> >
> > Looking forward to your feedback.
> >
> > [1]
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-374%3A+Adding+a+separate+configuration+for+specifying+Java+Options+of+the+SQL+Gateway
> >
> > Best,
> > Yangze Guo
> >
>


Re: [DISCUSS] FLIP-374: Adding a separate configuration for specifying Java Options of the SQL Gateway

2023-10-07 Thread Benchao Li
Thanks, Yangze, for preparing this FLIP. It's good to have this ability
for the gateway, since we already have it for other JVM processes
(client/JM/TM), as Rui mentioned.

On Sat, Oct 7, 2023 at 6:02 PM Rui Fan <1996fan...@gmail.com> wrote:
>
> Thanks to Yangze for driving this proposal.
>
> `env.java.opts.xxx` is already supported for client, historyserver,
> jobmanager and taskmanager, and it's very useful for troubleshooting.
> So +1 for `env.java.opts.sql-gateway`.
>
> I have a minor question: does `env.java.opts.all` already apply to the
> sql-gateway?
> If yes, that's fine. If not, it would be better to handle it as a subtask of
> this FLIP.
>
> Best,
> Rui
>
>
> On Sat, Oct 7, 2023 at 4:35 PM xiangyu feng wrote:
>
> > Thanks for initiating this discussion. Within the development towards
> > Streaming Warehousing, SQL Gateway will become more and more important.
> > Big +1 to specify Java Options separately for SQL Gateway.
> >
> > Regards,
> > Xiangyu
> >
> > On Sat, Oct 7, 2023 at 3:24 PM Yangze Guo wrote:
> >
> > > Hi, there,
> > >
> > > We'd like to start a discussion thread on "FLIP-374: Adding a separate
> > > configuration for specifying Java Options of the SQL Gateway"[1],
> > > where we propose adding a separate configuration option to specify the
> > > Java options for the SQL Gateway. This would allow users to fine-tune
> > > the memory settings, garbage collection behavior, and other relevant
> > > Java parameters specific to the SQL Gateway, ensuring optimal
> > > performance and stability in production environments.
> > >
> > > Looking forward to your feedback.
> > >
> > > [1]
> > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-374%3A+Adding+a+separate+configuration+for+specifying+Java+Options+of+the+SQL+Gateway
> > >
> > > Best,
> > > Yangze Guo
> > >
> >



-- 

Best,
Benchao Li


Re: [DISCUSS] FLIP-374: Adding a separate configuration for specifying Java Options of the SQL Gateway

2023-10-07 Thread Yangze Guo
Thanks for all your comments.

@Rui
Thanks for the reminder. "env.java.opts.all" already takes effect for the SQL Gateway.
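
As a rough illustration of how the two options could combine, here is a minimal Python sketch. It is not Flink code: the helper name effective_java_opts and the JVM flag values are made up for this example, and it assumes that the component-specific options are simply appended after the common ones.

```python
def effective_java_opts(config, component):
    """Combine the common JVM options with the component-specific ones.

    Assumption for this sketch: component options are appended after the
    options from "env.java.opts.all".
    """
    common = config.get("env.java.opts.all", "")
    specific = config.get(f"env.java.opts.{component}", "")
    # Skip empty parts so that a missing option does not leave stray spaces.
    return " ".join(part for part in (common, specific) if part)


# Hypothetical configuration values, chosen only for illustration.
config = {
    "env.java.opts.all": "-XX:+UseG1GC",
    "env.java.opts.sql-gateway": "-Xmx2g",
}

print(effective_java_opts(config, "sql-gateway"))  # -XX:+UseG1GC -Xmx2g
print(effective_java_opts(config, "taskmanager"))  # -XX:+UseG1GC
```

With a separate option, the gateway can get its own heap or GC settings without changing what the other processes start with.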


Best,
Yangze Guo

On Sat, Oct 7, 2023 at 6:45 PM Benchao Li wrote:
>
> Thanks, Yangze, for preparing this FLIP. It's good to have this ability
> for the gateway, since we already have it for other JVM processes
> (client/JM/TM), as Rui mentioned.
>
> On Sat, Oct 7, 2023 at 6:02 PM Rui Fan <1996fan...@gmail.com> wrote:
> >
> > Thanks to Yangze for driving this proposal.
> >
> > `env.java.opts.xxx` is already supported for client, historyserver,
> > jobmanager and taskmanager, and it's very useful for troubleshooting.
> > So +1 for `env.java.opts.sql-gateway`.
> >
> > I have a minor question: does `env.java.opts.all` already apply to the
> > sql-gateway?
> > If yes, that's fine. If not, it would be better to handle it as a subtask of
> > this FLIP.
> >
> > Best,
> > Rui
> >
> >
> > On Sat, Oct 7, 2023 at 4:35 PM xiangyu feng wrote:
> >
> > > Thanks for initiating this discussion. Within the development towards
> > > Streaming Warehousing, SQL Gateway will become more and more important.
> > > Big +1 to specify Java Options separately for SQL Gateway.
> > >
> > > Regards,
> > > Xiangyu
> > >
> > > On Sat, Oct 7, 2023 at 3:24 PM Yangze Guo wrote:
> > >
> > > > Hi, there,
> > > >
> > > > We'd like to start a discussion thread on "FLIP-374: Adding a separate
> > > > configuration for specifying Java Options of the SQL Gateway"[1],
> > > > where we propose adding a separate configuration option to specify the
> > > > Java options for the SQL Gateway. This would allow users to fine-tune
> > > > the memory settings, garbage collection behavior, and other relevant
> > > > Java parameters specific to the SQL Gateway, ensuring optimal
> > > > performance and stability in production environments.
> > > >
> > > > Looking forward to your feedback.
> > > >
> > > > [1]
> > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-374%3A+Adding+a+separate+configuration+for+specifying+Java+Options+of+the+SQL+Gateway
> > > >
> > > > Best,
> > > > Yangze Guo
> > > >
> > >
>
>
>
> --
>
> Best,
> Benchao Li


RE: [ANNOUNCE] Apache Flink Stateful Functions Release 3.3.0 released

2023-10-07 Thread frans.king
Hello Martijn,

I do not see the images on Dockerhub yet.  Is there an alternative source we 
can use in the meantime?

Thanks,

Frans

-----Original Message-----
From: Martijn Visser  
Sent: Tuesday, September 26, 2023 5:34 AM
To: dev@flink.apache.org
Subject: Re: [ANNOUNCE] Apache Flink Stateful Functions Release 3.3.0 released

Hi Frans,

Good remark. I still need to provide the images to those who have access to
Docker Hub, but I haven't been able to do that yet.
Hopefully I can do that at the end of the week.

Best regards,

Martijn

On Mon, Sep 25, 2023 at 2:04 PM  wrote:
>
> Hi Martijn.
>
> Thanks for this.   Should there also be docker images available?  
> https://hub.docker.com/r/apache/flink-statefun/tags goes up to 3.2.0.
>
> Frans
>
> -----Original Message-----
> From: Martijn Visser 
> Sent: Tuesday, September 19, 2023 11:37 AM
> To: dev@flink.apache.org; user ; user-zh 
> ; n...@flink.apache.org; annou...@apache.org
> Subject: [ANNOUNCE] Apache Flink Stateful Functions Release 3.3.0 
> released
>
> Stateful Functions is a cross-platform stack for building Stateful Serverless 
> applications, making it radically simpler to develop scalable, consistent, 
> and elastic distributed applications. This new release upgrades the Flink 
> runtime to 1.16.2.
>
> Release highlight:
> - Upgrade underlying Flink dependency to 1.16.2
>
> Release blogpost:
> https://flink.apache.org/2023/09/19/stateful-functions-3.3.0-release-announcement/
>
> The release is available for download at:
> https://flink.apache.org/downloads/
>
> Java SDK can be found at:
> https://search.maven.org/artifact/org.apache.flink/statefun-sdk-java/3.3.0/jar
>
> Python SDK can be found at:
> https://pypi.org/project/apache-flink-statefun/
>
> GoLang SDK can be found at:
> https://github.com/apache/flink-statefun/tree/statefun-sdk-go/v3.3.0
>
> JavaScript SDK can be found at:
> https://www.npmjs.com/package/apache-flink-statefun
>
> Official Docker image for Flink Stateful Functions can be found at: 
> https://hub.docker.com/r/apache/flink-statefun
>
> The full release notes are available in Jira:
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12351276
>
> We would like to thank all contributors of the Apache Flink community who 
> made this release possible!
>
> Regards,
> Martijn Visser
>
>



[jira] [Created] (FLINK-33204) Add description for missing options in the all jobmanager/taskmanager options section in document

2023-10-07 Thread Zhanghao Chen (Jira)
Zhanghao Chen created FLINK-33204:
-

 Summary: Add description for missing options in the all 
jobmanager/taskmanager options section in document
 Key: FLINK-33204
 URL: https://issues.apache.org/jira/browse/FLINK-33204
 Project: Flink
  Issue Type: Technical Debt
  Components: Runtime / Configuration
Affects Versions: 1.17.0, 1.18.0
Reporter: Zhanghao Chen
 Fix For: 1.19.0


There are 4 options which are excluded from the all jobmanager/taskmanager 
options section in the configuration document:
 # taskmanager.bind-host
 # taskmanager.rpc.bind-port
 # jobmanager.bind-host
 # jobmanager.rpc.bind-port

We should add them to the document under the all jobmanager/taskmanager
options section for completeness.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33205) Replace Akka with Pekko in the description of "pekko.ssl.enabled"

2023-10-07 Thread Zhanghao Chen (Jira)
Zhanghao Chen created FLINK-33205:
-

 Summary: Replace Akka with Pekko in the description of 
"pekko.ssl.enabled"
 Key: FLINK-33205
 URL: https://issues.apache.org/jira/browse/FLINK-33205
 Project: Flink
  Issue Type: Technical Debt
  Components: Runtime / Configuration
Affects Versions: 1.18.0
Reporter: Zhanghao Chen
 Fix For: 1.19.0








Re: [DISCUSS] FLIP-370 : Support Balanced Tasks Scheduling

2023-10-07 Thread Yangze Guo
Thanks for the updates, Rui.

It does seem challenging to ensure evenness in slot deployment unless
we introduce batch slot requests in SlotPool. However, one possibility
is to add a delay of around 50ms to the SlotPool's resource
requirement declaration to the ResourceManager, similar to
checkResourceRequirementsWithDelay in the SlotManager. In most cases,
this delay would allow the SlotManager to see all resource
requirements, so that it can allocate slots more evenly. As a side
effect, it could also significantly reduce the number of RPC messages
to the ResourceManager, which could become a single-point bottleneck
in OLAP scenarios. WDYT?
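
To make both effects concrete, here is a minimal Python sketch. It is not Flink code and all function names are made up for illustration. It uses the example from this thread (7 slots requested, taskmanager.numberOfTaskSlots=3) and assumes a simplified model: without a full view of the requirements, each TM is filled up before the next one is used; with a full view, slots are spread over the same number of TMs. The last function assumes that every declaration arriving within the delay window is coalesced into one RPC.

```python
def allocate_eagerly(num_slots_needed, slots_per_tm):
    """Fill each TM completely before moving on to the next one."""
    allocation = []
    remaining = num_slots_needed
    while remaining > 0:
        used = min(slots_per_tm, remaining)
        allocation.append(used)
        remaining -= used
    return allocation


def allocate_evenly(num_slots_needed, slots_per_tm):
    """Spread the slots over the minimum number of TMs as evenly as possible."""
    num_tms = -(-num_slots_needed // slots_per_tm)  # ceiling division
    base, extra = divmod(num_slots_needed, num_tms)
    return [base + 1 if i < extra else base for i in range(num_tms)]


def count_declarations(arrival_times_ms, delay_ms):
    """Count RPCs if requirements arriving within delay_ms are sent together."""
    declarations = 0
    window_end = None
    for t in sorted(arrival_times_ms):
        if window_end is None or t > window_end:
            # This arrival opens a new window; later arrivals inside it
            # are folded into the same declaration.
            declarations += 1
            window_end = t + delay_ms
    return declarations


print(allocate_eagerly(7, 3))  # [3, 3, 1]
print(allocate_evenly(7, 3))   # [3, 2, 2]

# Seven slot requests arriving 1 ms apart (hypothetical timing).
arrivals = list(range(7))
print(count_declarations(arrivals, 0))   # 7
print(count_declarations(arrivals, 50))  # 1
```

In this simplified model, the 50ms delay turns seven declarations into one, and that single declaration is what lets the allocation come out as 3/2/2 instead of 3/3/1.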

Best,
Yangze Guo

On Sat, Oct 7, 2023 at 5:52 PM Rui Fan <1996fan...@gmail.com> wrote:
>
> Hi Yangze,
>
> Thanks for your quick response!
>
> Sorry, I re-read section 2.2.2[1] about the waiting mechanism and found
> it isn't clear. The root cause for introducing the waiting mechanism is
> that slot requests are sent from the JobMaster to the SlotPool
> one by one instead of as one whole batch. I have rewritten section 2.2.2;
> please read it again when you have time.
>
> [1] 
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-370%3A+Support+Balanced+Tasks+Scheduling#FLIP370:SupportBalancedTasksScheduling-2.2.2Waitingmechanism
>
> Best,
> Rui
>
> On Sat, Oct 7, 2023 at 4:34 PM Yangze Guo wrote:
>>
>> Thanks for the clarification, Rui.
>>
>> I believe the root cause of this issue is that in the current
>> DefaultResourceAllocationStrategy, slot allocation begins before the
>> decision to request PendingTaskManagers is made. That can be fixed
>> within the strategy without introducing another waiting mechanism. I
>> think it would be better to address this issue within the scope of
>> this FLIP. However, I don't have a strong opinion on it, it depends on
>> your bandwidth.
>>
>>
>> Best,
>> Yangze Guo
>>
>> On Sat, Oct 7, 2023 at 4:16 PM Rui Fan <1996fan...@gmail.com> wrote:
>> >
>> > Hi Yangze,
>> >
>> > > 2. From my understanding, if the user enables
>> > > cluster.evenly-spread-out-slots,
>> > > LeastUtilizationResourceMatchingStrategy will be used to determine the
>> > > slot distribution and the slot allocation in the three TMs will be
>> > > (taskmanager.numberOfTaskSlots=3):
>> > > TM1: 3 slots
>> > > TM2: 2 slots
>> > > TM3: 2 slots
>> >
>> > When all TMs are ready in advance, the three TMs will have:
>> > TM1: 3 slots
>> > TM2: 2 slots
>> > TM3: 2 slots
>> >
>> > For application mode, the resource manager doesn't apply for
>> > TMs in advance, and there aren't enough slots before the third TM is ready.
>> > So all slots of the second TM will be used up. The three TMs will have:
>> > TM1: 3 slots
>> > TM2: 3 slots
>> > TM3: 1 slot
>> >
>> > That's why the FLIP adds some notes:
>> >
>> > All free slots are in the last TM, because ResourceManager doesn’t have 
>> > the waiting mechanism, and it just requests 7 slots for this JobMaster.
>> > Why is it acceptable?
>> >
>> > If we just add the waiting mechanism to JobMaster but not in 
>> > ResourceManager, all free slots will be in the last TM. All slots of other 
>> > TMs are offered to JM.
>> > That is, only one TM may have fewer tasks than the other TMs. The 
>> > difference between the number of tasks of other TMs is at most 1. So when
>> > p >> slotsPerTM, the problem can be ignored.
>> > We can also suggest to users that, when p is small, it's better to
>> > configure slotsPerTM to 1, or let p % slotsPerTM == 0.
>> >
>> > Please correct me if my understanding is wrong, thanks~
>> >
>> > Best,
>> > Rui
>> >
>> > On Sun, Oct 1, 2023 at 7:38 PM Yangze Guo wrote:
>> >>
>> >> Hi, Rui,
>> >>
>> >> 1. With the current mechanism, when physical slots are offered from
>> >> TM, the JobMaster will start deploying tasks and synchronizing their
>> >> states. With the addition of the waiting mechanism, IIUC, the
>> >> JobMaster will deploy and synchronize the states of all tasks only
>> >> after all resources are available. The task deployment and state
>> >> synchronization both occupy the JobMaster's RPC main thread. In
>> >> complex jobs with a lot of tasks, this waiting mechanism may increase
>> >> the pressure on the JobMaster and increase the end-to-end job
>> >> deployment time.
>> >>
>> >> 2. From my understanding, if the user enables
>> >> cluster.evenly-spread-out-slots,
>> >> LeastUtilizationResourceMatchingStrategy will be used to determine the
>> >> slot distribution and the slot allocation in the three TMs will be
>> >> (taskmanager.numberOfTaskSlots=3):
>> >> TM1: 3 slots
>> >> TM2: 2 slots
>> >> TM3: 2 slots
>> >>
>> >> Best,
>> >> Yangze Guo
>> >>
>> >> On Sun, Oct 1, 2023 at 6:14 PM Rui Fan <1996fan...@gmail.com> wrote:
>> >> >
>> >> > Hi Shammon,
>> >> >
>> >> > Thanks for your feedback as well!
>> >> >
>> >> > > IIUC, the overall balance is divided into two parts: slot to TM and
>> >> > > task to slot.
>> >> > > 1. Slot to TM is guaranteed by SlotManager in ResourceManager
>> >> > > 2. Task to slot is guarant