Re: [VOTE] FLIP-314: Support Customized Job Lineage Listener

2023-09-27 Thread Chen Zhanghao
+1 (non-binding), thanks for driving this.

Best,
Zhanghao Chen

From: Shammon FY 
Sent: September 25, 2023 13:28
To: dev 
Subject: [VOTE] FLIP-314: Support Customized Job Lineage Listener

Hi devs,

Thanks for all the feedback on FLIP-314: Support Customized Job Lineage
Listener [1] in thread [2].

I would like to start a vote for it. The vote will be opened for at least
72 hours unless there is an objection or insufficient votes.

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-314%3A+Support+Customized+Job+Lineage+Listener
[2] https://lists.apache.org/thread/wopprvp3ww243mtw23nj59p57cghh7mc

Best,
Shammon FY


Re: [VOTE] FLIP-314: Support Customized Job Lineage Listener

2023-09-27 Thread 曹帝胄
+1 (binding)

Best,
Dizhou Cao


[jira] [Created] (FLINK-33166) Support setting root logger level by config

2023-09-27 Thread Zhanghao Chen (Jira)
Zhanghao Chen created FLINK-33166:
-

 Summary: Support setting root logger level by config
 Key: FLINK-33166
 URL: https://issues.apache.org/jira/browse/FLINK-33166
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Configuration
Reporter: Zhanghao Chen


Users currently cannot change the logging level by config and have to modify the 
cumbersome logger configuration file manually. We'd better provide a shortcut 
and support setting the root logger level by config.

There are already a number of configs for logger settings, like 
{{env.log.dir}} for the logging dir and {{env.log.max}} for the max number of old logging 
files to keep. We can name the new config {{env.log.level}}.
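
A minimal sketch of how such a shortcut might be resolved, assuming the proposed key {{env.log.level}}. The INFO fallback, the accepted level names, and the class name RootLogLevelResolver are assumptions for illustration and are not specified in this ticket:

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;

class RootLogLevelResolver {
    // Key proposed in this ticket.
    static final String KEY = "env.log.level";
    // Assumption: fall back to INFO when the key is not set.
    static final String DEFAULT_LEVEL = "INFO";
    // Assumption: the usual logger level names are accepted.
    static final Set<String> VALID =
            Set.of("TRACE", "DEBUG", "INFO", "WARN", "ERROR", "OFF");

    /** Returns the normalized root logger level for the given config entries. */
    static String resolve(Map<String, String> conf) {
        String raw = conf.get(KEY);
        if (raw == null || raw.trim().isEmpty()) {
            return DEFAULT_LEVEL;
        }
        String level = raw.trim().toUpperCase(Locale.ROOT);
        if (!VALID.contains(level)) {
            throw new IllegalArgumentException(
                    "Unsupported value for " + KEY + ": " + raw);
        }
        return level;
    }

    public static void main(String[] args) {
        System.out.println(resolve(Map.of(KEY, "debug")));
        System.out.println(resolve(Map.of()));
    }
}
```

The resolved value could then be handed to the logging setup in place of a hand-edited logger configuration file.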



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: access to the design doc of FLINK-12477

2023-09-27 Thread Yun Tang
Hi Shuyi,

Thank you for your interest in the previous design. I have CC'ed Jing and 
Stefan, who might have access to the doc.


Best
Yun Tang

From: Shuyi Chen 
Sent: Thursday, September 28, 2023 13:37
To: dev 
Subject: access to the design doc of FLINK-12477

Hi Devs, can anyone grant access to the design doc of FLINK-12477? Thanks a
lot.

https://docs.google.com/document/d/1eDpsUKv2FqwZiS1Pm6gYO5eFHScBHfULKmH1-ZEWB4g

Shuyi


access to the design doc of FLINK-12477

2023-09-27 Thread Shuyi Chen
Hi Devs, can anyone grant access to the design doc of FLINK-12477? Thanks a
lot.

https://docs.google.com/document/d/1eDpsUKv2FqwZiS1Pm6gYO5eFHScBHfULKmH1-ZEWB4g

Shuyi


Re: [DISCUSS] FLIP-370 : Support Balanced Tasks Scheduling

2023-09-27 Thread Shammon FY
Thanks Yuepeng for initiating this discussion.

+1 in general too. In fact, we have implemented a similar mechanism
internally to ensure a balanced allocation of tasks to slots, and it works well.

Some comments about the mechanism:

1. Will this mechanism be supported only in `SlotPool`, or in both `SlotPool`
and `DeclarativeSlotPool`? Currently the two slot pools are used in
different schedulers. I think this will also bring value to
`DeclarativeSlotPool`, but currently the FLIP content seems to be based on
`SlotPool`, right?

2. In fine-grained resource management, we can set different resource
requirements for different nodes, which means that the resources of each
slot are different. What should be done when the slot selected by the
round-robin strategy cannot meet the resource requirements? Will this lead
to the failure of the balance strategy?

3. Is the assignment of tasks to slots balanced based on region or job
level? When multiple TMs fail over, will it cause the balancing strategy to
fail or even worse? What is the current processing strategy?

For Zhuzhu and Rui:

IIUC, the overall balance is divided into two parts: slot to TM and task to
slot.
1. Slot to TM is guaranteed by SlotManager in ResourceManager
2. Task to slot is guaranteed by the slot pool in JM

These two are completely independent, so what are the benefits of unifying
them into one option? Also, do we want to share the same
option between SlotPool in JM and SlotManager in RM? This sounds a bit
strange.

Best,
Shammon FY



On Thu, Sep 28, 2023 at 12:08 PM Rui Fan <1996fan...@gmail.com> wrote:

> Hi Zhu Zhu,
>
> Thanks for your feedback here!
>
> You are right, users need to set 2 options:
> - cluster.evenly-spread-out-slots=true
> - slot.sharing-strategy=TASK_BALANCED_PREFERRED
>
> Merging them into one option is useful on the user side, so
> `taskmanager.load-balance.mode` sounds good to me.
> I want to check some points and behaviors about this option:
>
> 1. The default value is None, right?
> 2. When it's set to Tasks, how should slots be assigned to TMs?
> - Option1: Just check the task number.
> - Option2: Check the slot number first, then check the
> task number when the slot numbers are the same.
>
> Here is an example to explain the difference between them:
>
> - A session cluster has 2 flink jobs, they are jobA and jobB
> - Each TM has 4 slots.
> - The task number of one slot of jobA is 3
> - The task number of one slot of jobB is 1
> - We have 2 TaskManagers:
>   - tm1 runs 3 slots of jobB, so tm1 runs 3 tasks
>   - tm2 runs 1 slot of jobA, and 1 slot of jobB, so tm2 runs 4 tasks.
>
> Now we need to run a new slot. Which TM should offer it?
> - Option1: If we just check the task number, tm1 is better.
> - Option2: If we check the slot number first and then the task number, tm2
> is better.
>
> The original FLIP selected option2, which is why we didn't add the
> third option. Option2 doesn't break the semantics when
> `cluster.evenly-spread-out-slots` is true; it just improves the
> behavior without changing the semantics.
>
> On the other hand, if we choose option2 and the user sets
> `taskmanager.load-balance.mode` to Tasks, it also achieves
> the goal of Slots.
>
> So I think the `Slots` enum isn't needed if we choose option2.
> Of course, if we choose option1, the enum is needed.
>
> Looking forward to your feedback, thanks~
>
> Best,
> Rui
>
> On Wed, Sep 27, 2023 at 9:11 PM Zhu Zhu  wrote:
>
> > Thanks Yuepeng and Rui for creating this FLIP.
> >
> > +1 in general
> > The idea is straightforward: best-effort gather all the slot requests
> > and offered slots to form an overview before assigning slots, trying to
> > balance the loads of task managers when assigning slots.
> >
> > I have one comment regarding the configuration for ease of use:
> >
> > IIUC, this FLIP uses an existing config 'cluster.evenly-spread-out-slots'
> > as the main switch of the new feature. That is, from user perspective,
> > with this improvement, the 'cluster.evenly-spread-out-slots' feature not
> > only balances the number of slots on task managers, but also balances the
> > number of tasks. This is a behavior change anyway. Besides that, it also
> > requires users to set 'slot.sharing-strategy' to 'TASK_BALANCED_PREFERRED'
> > to balance the tasks in each slot.
> >
> > I think we can introduce a new config option
> > `taskmanager.load-balance.mode`,
> > which accepts "None"/"Slots"/"Tasks". `cluster.evenly-spread-out-slots`
> > can be superseded by the "Slots" mode and get deprecated. In the future
> > it can support more modes, e.g. "CpuCores", to work better for jobs with
> > fine-grained resources. The proposed config option
> > `slot.request.max-interval`
> > then can be renamed to
> > `taskmanager.load-balance.request-stablizing-timeout`
> > to show its relation with the feature. The proposed `slot.sharing-strategy`
> > is not needed, because the configured "Tasks" mode will do the work.
> >
> > WDYT?
> >
> > Thanks,
> > Zhu Zh

Re: [DISCUSS] FLIP-370 : Support Balanced Tasks Scheduling

2023-09-27 Thread Rui Fan
Hi Zhu Zhu,

Thanks for your feedback here!

You are right, users need to set 2 options:
- cluster.evenly-spread-out-slots=true
- slot.sharing-strategy=TASK_BALANCED_PREFERRED

Merging them into one option is useful on the user side, so
`taskmanager.load-balance.mode` sounds good to me.
I want to check some points and behaviors about this option:

1. The default value is None, right?
2. When it's set to Tasks, how should slots be assigned to TMs?
- Option1: Just check the task number.
- Option2: Check the slot number first, then check the
task number when the slot numbers are the same.

Here is an example to explain the difference between them:

- A session cluster has 2 flink jobs, they are jobA and jobB
- Each TM has 4 slots.
- The task number of one slot of jobA is 3
- The task number of one slot of jobB is 1
- We have 2 TaskManagers:
  - tm1 runs 3 slots of jobB, so tm1 runs 3 tasks
  - tm2 runs 1 slot of jobA, and 1 slot of jobB, so tm2 runs 4 tasks.

Now we need to run a new slot. Which TM should offer it?
- Option1: If we just check the task number, tm1 is better.
- Option2: If we check the slot number first and then the task number, tm2
is better.
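
The comparison above can be sketched as two comparators over a per-TM load snapshot. This is only an illustration of the two options: `TmLoad` and `TmSelectionExample` are hypothetical names, not Flink classes, and the numbers are the ones from the example (tm1: 3 slots, 3 tasks; tm2: 2 slots, 4 tasks).

```java
import java.util.Comparator;
import java.util.List;

class TmSelectionExample {
    /** Hypothetical snapshot of one TaskManager's load; not a Flink class. */
    static final class TmLoad {
        final String name;
        final int usedSlots;
        final int runningTasks;

        TmLoad(String name, int usedSlots, int runningTasks) {
            this.name = name;
            this.usedSlots = usedSlots;
            this.runningTasks = runningTasks;
        }

        int usedSlots() { return usedSlots; }
        int runningTasks() { return runningTasks; }
    }

    // Option1: only compare the number of running tasks.
    static final Comparator<TmLoad> BY_TASKS =
            Comparator.comparingInt(TmLoad::runningTasks);

    // Option2: compare used slots first, task count only breaks ties.
    static final Comparator<TmLoad> BY_SLOTS_THEN_TASKS =
            Comparator.comparingInt(TmLoad::usedSlots)
                    .thenComparingInt(TmLoad::runningTasks);

    /** Returns the name of the least-loaded TM under the given strategy. */
    static String pick(List<TmLoad> tms, Comparator<TmLoad> strategy) {
        return tms.stream().min(strategy).orElseThrow().name;
    }

    /** The session cluster from the example above. */
    static List<TmLoad> exampleCluster() {
        return List.of(
                new TmLoad("tm1", 3, 3),   // 3 slots of jobB, 1 task each
                new TmLoad("tm2", 2, 4));  // 1 slot of jobA (3 tasks) + 1 slot of jobB (1 task)
    }

    public static void main(String[] args) {
        System.out.println("Option1 picks " + pick(exampleCluster(), BY_TASKS));
        System.out.println("Option2 picks " + pick(exampleCluster(), BY_SLOTS_THEN_TASKS));
    }
}
```

With these inputs Option1 selects tm1 (3 tasks versus 4) and Option2 selects tm2 (2 used slots versus 3).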

The original FLIP selected option2, which is why we didn't add the
third option. Option2 doesn't break the semantics when
`cluster.evenly-spread-out-slots` is true; it just improves the
behavior without changing the semantics.

On the other hand, if we choose option2 and the user sets
`taskmanager.load-balance.mode` to Tasks, it also achieves
the goal of Slots.

So I think the `Slots` enum isn't needed if we choose option2.
Of course, if we choose option1, the enum is needed.
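
To make the single-option idea concrete, here is a minimal sketch of how `taskmanager.load-balance.mode` might be resolved next to the existing `cluster.evenly-spread-out-slots` switch. The class name, the precedence rule (the new option wins when set), and the mapping of the legacy switch to `SLOTS` are assumptions for illustration; none of them is settled in this thread.

```java
import java.util.Locale;
import java.util.Map;

class LoadBalanceModeExample {
    // Values proposed in this thread: "None" / "Slots" / "Tasks".
    enum Mode { NONE, SLOTS, TASKS }

    static final String MODE_KEY = "taskmanager.load-balance.mode";
    static final String LEGACY_KEY = "cluster.evenly-spread-out-slots";

    /** Resolves the effective mode from plain key/value config entries. */
    static Mode resolve(Map<String, String> conf) {
        String mode = conf.get(MODE_KEY);
        if (mode != null) {
            // Assumption: the new option takes precedence when present.
            return Mode.valueOf(mode.trim().toUpperCase(Locale.ROOT));
        }
        // Assumption: the legacy switch maps to SLOTS when the new option is unset.
        // Boolean.parseBoolean(null) is false, so a missing key yields NONE.
        return Boolean.parseBoolean(conf.get(LEGACY_KEY)) ? Mode.SLOTS : Mode.NONE;
    }

    public static void main(String[] args) {
        System.out.println(resolve(Map.of(MODE_KEY, "Tasks")));
        System.out.println(resolve(Map.of(LEGACY_KEY, "true")));
        System.out.println(resolve(Map.of()));
    }
}
```

If option2 is chosen, the `SLOTS` constant and the legacy mapping could be dropped, since `TASKS` would already cover slot balancing.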

Looking forward to your feedback, thanks~

Best,
Rui

On Wed, Sep 27, 2023 at 9:11 PM Zhu Zhu  wrote:

> Thanks Yuepeng and Rui for creating this FLIP.
>
> +1 in general
> The idea is straightforward: best-effort gather all the slot requests
> and offered slots to form an overview before assigning slots, trying to
> balance the loads of task managers when assigning slots.
>
> I have one comment regarding the configuration for ease of use:
>
> IIUC, this FLIP uses an existing config 'cluster.evenly-spread-out-slots'
> as the main switch of the new feature. That is, from user perspective,
> with this improvement, the 'cluster.evenly-spread-out-slots' feature not
> only balances the number of slots on task managers, but also balances the
> number of tasks. This is a behavior change anyway. Besides that, it also
> requires users to set 'slot.sharing-strategy' to 'TASK_BALANCED_PREFERRED'
> to balance the tasks in each slot.
>
> I think we can introduce a new config option
> `taskmanager.load-balance.mode`,
> which accepts "None"/"Slots"/"Tasks". `cluster.evenly-spread-out-slots`
> can be superseded by the "Slots" mode and get deprecated. In the future
> it can support more modes, e.g. "CpuCores", to work better for jobs with
> fine-grained resources. The proposed config option
> `slot.request.max-interval`
> then can be renamed to
> `taskmanager.load-balance.request-stablizing-timeout`
> to show its relation with the feature. The proposed `slot.sharing-strategy`
> is not needed, because the configured "Tasks" mode will do the work.
>
> WDYT?
>
> Thanks,
> Zhu Zhu
>
> On Mon, Sep 25, 2023 at 4:26 PM Yuepeng Pan  wrote:
>
>> Hi all,
>>
>>
>> Fan Rui (CC'ed) and I created FLIP-370 [1] to support balanced tasks
>> scheduling.
>>
>>
>> Flink's current task deployment strategy sometimes leaves some
>> TMs (TaskManagers) with more tasks and others with fewer. The TMs
>> that hold more tasks end up with excessive resource utilization and
>> become a bottleneck for the entire job. Developing
>> strategies to balance the task load across TMs and reduce job
>> bottlenecks is therefore very meaningful.
>>
>>
>> The raw design and discussions can be found in the Flink JIRA [2] and
>> Google doc [3]. We are grateful to Zhu Zhu (CC'ed) for providing some
>> valuable help and suggestions in advance.
>>
>>
>> Please refer to the FLIP[1] document for more details about the proposed
>> design and implementation. We welcome any feedback and opinions on this
>> proposal.
>>
>>
>> [1]
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-370%3A+Support+Balanced+Tasks+Scheduling
>>
>> [2] https://issues.apache.org/jira/browse/FLINK-31757
>>
>> [3]
>> https://docs.google.com/document/d/14WhrSNGBdcsRl3IK7CZO-RaZ5KXU2X1dWqxPEFr3iS8
>>
>>
>> Best,
>>
>> Yuepeng Pan
>>
>


Re: [DISCUSS] FLIP-370 : Support Balanced Tasks Scheduling

2023-09-27 Thread Yangze Guo
Thanks for driving this FLIP, Yuepeng Pan. +1 for the overall proposal
to support balanced scheduling.

Some questions on the Waiting mechanism and Allocation strategy for slot to TM:

1. Is it possible for the SlotPool to get the slot allocation results
from the SlotManager in advance instead of waiting for the actual
physical slots to be registered, and perform pre-allocation? The
benefit of doing this is to make the task deployment process smoother,
especially when there are a large number of tasks in the job.

2. If users enable `cluster.evenly-spread-out-slots`, the issue in
example 2 of section 2.2.3 can be resolved. Do I understand it
correctly?




Best,
Yangze Guo

On Wed, Sep 27, 2023 at 9:12 PM Zhu Zhu  wrote:
>
> Thanks Yuepeng and Rui for creating this FLIP.
>
> +1 in general
> The idea is straightforward: best-effort gather all the slot requests
> and offered slots to form an overview before assigning slots, trying to
> balance the loads of task managers when assigning slots.
>
> I have one comment regarding the configuration for ease of use:
>
> IIUC, this FLIP uses an existing config 'cluster.evenly-spread-out-slots'
> as the main switch of the new feature. That is, from user perspective,
> with this improvement, the 'cluster.evenly-spread-out-slots' feature not
> only balances the number of slots on task managers, but also balances the
> number of tasks. This is a behavior change anyway. Besides that, it also
> requires users to set 'slot.sharing-strategy' to 'TASK_BALANCED_PREFERRED'
> to balance the tasks in each slot.
>
> I think we can introduce a new config option
> `taskmanager.load-balance.mode`,
> which accepts "None"/"Slots"/"Tasks". `cluster.evenly-spread-out-slots`
> can be superseded by the "Slots" mode and get deprecated. In the future
> it can support more modes, e.g. "CpuCores", to work better for jobs with
> fine-grained resources. The proposed config option
> `slot.request.max-interval`
> then can be renamed to `taskmanager.load-balance.request-stablizing-timeout`
> to show its relation with the feature. The proposed `slot.sharing-strategy`
> is not needed, because the configured "Tasks" mode will do the work.
>
> WDYT?
>
> Thanks,
> Zhu Zhu
>
> On Mon, Sep 25, 2023 at 4:26 PM Yuepeng Pan  wrote:
>
> > Hi all,
> >
> >
> > Fan Rui (CC'ed) and I created FLIP-370 [1] to support balanced tasks
> > scheduling.
> >
> >
> > Flink's current task deployment strategy sometimes leaves some
> > TMs (TaskManagers) with more tasks and others with fewer. The TMs
> > that hold more tasks end up with excessive resource utilization and
> > become a bottleneck for the entire job. Developing
> > strategies to balance the task load across TMs and reduce job
> > bottlenecks is therefore very meaningful.
> >
> >
> > The raw design and discussions can be found in the Flink JIRA [2] and
> > Google doc [3]. We are grateful to Zhu Zhu (CC'ed) for providing some
> > valuable help and suggestions in advance.
> >
> >
> > Please refer to the FLIP[1] document for more details about the proposed
> > design and implementation. We welcome any feedback and opinions on this
> > proposal.
> >
> >
> > [1]
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-370%3A+Support+Balanced+Tasks+Scheduling
> >
> > [2] https://issues.apache.org/jira/browse/FLINK-31757
> >
> > [3]
> > https://docs.google.com/document/d/14WhrSNGBdcsRl3IK7CZO-RaZ5KXU2X1dWqxPEFr3iS8
> >
> >
> > Best,
> >
> > Yuepeng Pan
> >


Re: [VOTE] FLIP-314: Support Customized Job Lineage Listener

2023-09-27 Thread Yun Tang
+1 (binding)

Best,
Yun Tang

From: Yangze Guo 
Sent: Thursday, September 28, 2023 9:20
To: dev@flink.apache.org 
Subject: Re: [VOTE] FLIP-314: Support Customized Job Lineage Listener

+1 (binding)

Best,
Yangze Guo

On Tue, Sep 26, 2023 at 11:05 AM Leonard Xu  wrote:
>
> +1(binding)
>
>
> Best,
> Leonard
>
> > On Sep 26, 2023, at 9:59 AM, Feng Jin  wrote:
> >
> > +1 (non-binding)
> >
> >
> > thanks for driving this proposal
> >
> >
> > Best,
> > Feng
> >
> > On Mon, Sep 25, 2023 at 11:19 PM Jing Ge  wrote:
> >
> >> +1(binding) Thanks!
> >>
> >> Best Regards,
> >> Jing
> >>
> >> On Sun, Sep 24, 2023 at 10:30 PM Shammon FY  wrote:
> >>
> >>> Hi devs,
> >>>
> >>> Thanks for all the feedback on FLIP-314: Support Customized Job Lineage
> >>> Listener [1] in thread [2].
> >>>
> >>> I would like to start a vote for it. The vote will be opened for at least
> >>> 72 hours unless there is an objection or insufficient votes.
> >>>
> >>> [1]
> >>>
> >>>
> >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-314%3A+Support+Customized+Job+Lineage+Listener
> >>> [2] https://lists.apache.org/thread/wopprvp3ww243mtw23nj59p57cghh7mc
> >>>
> >>> Best,
> >>> Shammon FY
> >>>
> >>
>


Re: [VOTE] FLIP-314: Support Customized Job Lineage Listener

2023-09-27 Thread Yangze Guo
+1 (binding)

Best,
Yangze Guo

On Tue, Sep 26, 2023 at 11:05 AM Leonard Xu  wrote:
>
> +1(binding)
>
>
> Best,
> Leonard
>
> > On Sep 26, 2023, at 9:59 AM, Feng Jin  wrote:
> >
> > +1 (non-binding)
> >
> >
> > thanks for driving this proposal
> >
> >
> > Best,
> > Feng
> >
> > On Mon, Sep 25, 2023 at 11:19 PM Jing Ge  wrote:
> >
> >> +1(binding) Thanks!
> >>
> >> Best Regards,
> >> Jing
> >>
> >> On Sun, Sep 24, 2023 at 10:30 PM Shammon FY  wrote:
> >>
> >>> Hi devs,
> >>>
> >>> Thanks for all the feedback on FLIP-314: Support Customized Job Lineage
> >>> Listener [1] in thread [2].
> >>>
> >>> I would like to start a vote for it. The vote will be opened for at least
> >>> 72 hours unless there is an objection or insufficient votes.
> >>>
> >>> [1]
> >>>
> >>>
> >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-314%3A+Support+Customized+Job+Lineage+Listener
> >>> [2] https://lists.apache.org/thread/wopprvp3ww243mtw23nj59p57cghh7mc
> >>>
> >>> Best,
> >>> Shammon FY
> >>>
> >>
>


Re: [DISCUSS] Implementing SQL remote functions

2023-09-27 Thread Krzysztof Zarzycki
Hello Alan,
At my company we implemented an open source Flink HTTP connector that you
might find interesting. It can be represented as a source table as well
and be used in lookups. Here is the link:
https://github.com/getindata/flink-http-connector

Best
Krzysztof


On Thu, Sep 21, 2023 at 7:34 AM Feng Jin  wrote:

> Hi Alan
>
> I believe that supporting asynchronous UDF is a valuable
> feature. Currently, there is a similar FLIP[1] available:
> Can this meet your needs?
>
> [1].
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-313%3A+Add+support+of+User+Defined+AsyncTableFunction
>
>
> Best,
> Feng
>
> On Thu, Sep 21, 2023 at 1:12 PM Alan Sheinberg
>  wrote:
>
> > Hi Ron,
> >
> > Thanks for your response.  I've answered some of your questions below.
> >
> > > I think one solution is to support Mini-Batch Lookup Join by the framework
> > > layer, do a RPC call by a batch input row, which can improve throughput.
> >
> >
> > Would the idea be to collect a batch and then do a single RPC (or at least
> > handle a number of rpcs in a single AsyncLookupFunction call)?  That is an
> > interesting idea and could simplify things.  For our use cases,
> > technically, I can write a AsyncLookupFunction and utilize
> > AsyncWaitOperator using unbatched RPCs and do a Lookup Join without any
> > issue. My hesitation is that I'm afraid that callers will find it
> > unintuitive to join with a table where the underlying RPC is not being
> > modeled in that manner.  For example, it could be a text classifier
> > IS_POSITIVE_SENTIMENT(...) where there's no backing table, just
> > computation.
> >
> > > how does the remote function help to solve your problem?
> >
> >
> > The problem is pretty open-ended.  There are jobs where you want to join
> > data with some external data source and inject it into your pipeline, but
> > others might also be offloading some computation to an external system.
> > The external system might be owned by a different party, have different
> > permissions, have different hardware to do a computation (e.g. train a
> > model), or just block for a while.  The most intuitive invocation for this
> > is just a scalar function in SQL.  You just want it to be able to run at a
> > high throughput.
> >
> > > Regarding implementing the Remote Function, can you go into more detail
> > > about your idea, how we should support it, and how users should use it,
> > > from API design to semantic explanation?and
> >
> >
> > The unimplemented option I gave the most thought to is 3).  You can imagine
> > an AsyncScalarFunction definition and example class like:
> >
> > public class AsyncScalarFunction extends UserDefinedFunction {
> >   @Override
> >   public final FunctionKind getKind() {
> >     return FunctionKind.ASYNC_SCALAR;
> >   }
> >
> >   @Override
> >   public TypeInference getTypeInference(DataTypeFactory typeFactory) {
> >     return TypeInferenceExtractor.forAsyncScalarFunction(typeFactory, getClass());
> >   }
> > }
> >
> > class MyScalarFunction extends AsyncScalarFunction {
> >   // Eval method with a future to use as a callback, with arbitrary
> >   // additional arguments
> >   public void eval(CompletableFuture result, String input) {
> >     // Example which uses an async http client
> >     AsyncHttpClient httpClient = new AsyncHttpClient();
> >     // Do the request and then invoke the callback depending on the
> >     // outcome.
> >     Future responseFuture = httpClient.doPOST(getRequestBody(input));
> >     responseFuture.handle((response, throwable) -> {
> >       if (throwable != null) {
> >         result.completeExceptionally(throwable);
> >       } else {
> >         result.complete(response.getBody());
> >       }
> >     });
> >   }
> >   ...
> > }
> >
> > Then you can register it in your Flink program as with other UDFs and call
> > it:
> > tEnv.createTemporarySystemFunction("MY_FUNCTION", MyScalarFunction.class);
> > TableResult result = tEnv.executeSql(
> >     "SELECT MY_FUNCTION(input) FROM "
> >         + "(SELECT i.input from Inputs i ORDER BY i.timestamp)");
> >
> > I know there are questions about SQL semantics to consider.  For example,
> > does invocation of MY_FUNCTION preserve the order of the subquery above.
> > To be SQL compliant, I believe it must, so any async request we send out
> > must be output in order, regardless of when they complete.  There are
> > probably other considerations as well.   This for example is implemented as
> > an option already in AsyncWaitOperator.
> >
> > I could do a similar dive into option 2) if that would also be helpful,
> > though maybe this is a good starting point for conversation.
> >
> > Hope that addressed your questions,
> > Alan
> >
> > On Mon, Sep 18, 2023 at 6:51 PM liu ron  wrote:
> >
> > > Hi, Alan
> > >
> > > Thanks for driving this proposal. It sounds interesting.
> > > Regarding implementing the Remote Function, can you go into more detail
> > > about your idea, how we should support it, and how users should use it,
> > > from API design to sem

Re: [DISCUSS] [FLINK-32873] Add a config to allow disabling Query hints

2023-09-27 Thread Bonnie Arogyam Varghese
Hi Martijn,
Yes, the suggestion for the configuration name made by Jark sounds good.

Also, thanks to everyone who participated in this discussion.

On Tue, Sep 19, 2023 at 2:40 PM Martijn Visser 
wrote:

> Hey Jark,
>
> Sounds fine to me.
> @Bonnie does that also work for you?
>
> Best regards,
>
> Martijn
>
> On Fri, Sep 15, 2023 at 1:28 PM Jark Wu  wrote:
> >
> > Hi Martijn,
> >
> > Thanks for the investigation. I found the blog[1] shows a use case
> > that they use "OPTIMIZER_IGNORE_HINTS" to check if embedded
> > hints can be removed in legacy code. I think this is a useful tool to
> > improve queries without complex hints strewn throughout the code.
> > Therefore, I'm fine to support this now.
> >
> > Maybe we can follow Oracle to name the config
> > "table.optimizer.ignore-query-hints=false"?
> >
> > Best,
> > Jark
> >
> > [1]: https://dbaora.com/optimizer_ignore_hints-oracle-database-18c/
> >
> > On Fri, 15 Sept 2023 at 17:57, Martijn Visser 
> > wrote:
> >
> > > Hi Jark,
> > >
> > > Oracle has/had a generic "OPTIMIZER_IGNORE_HINTS" [1]. It looks like MSSQL
> > > has configuration options to disable specific hints [2] instead of a
> > > generic solution.
> > >
> > > [1]
> > > https://docs.oracle.com/en/database/oracle/oracle-database/23/refrn/OPTIMIZER_IGNORE_HINTS.html#GUID-D62CA6D8-D0D8-4A20-93EA-EEB4B3144347
> > > [2]
> > > https://www.mssqltips.com/sqlservertip/4175/disabling-sql-server-optimizer-rules-with-queryruleoff/
> > >
> > > Best regards,
> > >
> > > Martijn
> > >
> > > On Fri, Sep 15, 2023 at 10:53 AM Jark Wu  wrote:
> > >
> > > > Hi Martijn,
> > > >
> > > > I can understand that.
> > > > Is there any database/system that supports disabling/enabling query hints
> > > > with a configuration? This might help us to understand the use case,
> > > > and follow the approach.
> > > >
> > > > Best,
> > > > Jark
> > > >
> > > > On Fri, 15 Sept 2023 at 15:34, Martijn Visser <martijnvis...@apache.org>
> > > > wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > I think Jark has a valid point with:
> > > > >
> > > > > > Does this mean that in the future we might add an option to disable
> > > > > > each feature?
> > > > >
> > > > > I don't think that's a reasonable outcome indeed, but we are currently
> > > > > in a situation where we don't have clear guidelines on when to add a
> > > > > configuration option, and when not to add one. From a platform
> > > > > perspective, there might not be an imminent or obvious security
> > > > > implication, but you want to minimize a potential attack surface.
> > > > >
> > > > > > We should try to remove the unnecessary configuration from the list
> > > > > > in Flink 2.0.
> > > > >
> > > > > I agree with that too.
> > > > >
> > > > > With these things in mind, my proposal would be the following:
> > > > >
> > > > > * Add a configuration option for this situation, given that we don't
> > > > > have clear guidelines on when to add/not add a new config option.
> > > > > * Since we want to overhaul the configuration layer anyway in Flink
> > > > > 2.0, we clean-up the configuration list by not having an option for
> > > > > each item, but either a generic option that allows you to disable one
> > > > > or more features (by providing a list as the configuration option), or
> > > > > we already bundle multiple configuration options into a specific
> > > > > category, e.g. so that you can have a default Flink without any
> > > > > hardening, a read-only Flink, a fully-hardened Flink etc)
> > > > >
> > > > > Best regards,
> > > > >
> > > > > Martijn
> > > > >
> > > > >
> > > > > On Mon, Sep 11, 2023 at 4:51 PM Jim Hughes
> > > > > wrote:
> > > > >
> > > > > > Hi Jing and Jark!
> > > > > >
> > > > > > I can definitely appreciate the desire to have fewer configurations.
> > > > > >
> > > > > > Do you have a suggested alternative for platform providers to limit
> > > > > > or restrict the hints that Bonnie is talking about?
> > > > > >
> > > > > > As one possibility, maybe one configuration could be set to control
> > > > > > all hints.
> > > > > >
> > > > > > Cheers,
> > > > > >
> > > > > > Jim
> > > > > >
> > > > > > On Sat, Sep 9, 2023 at 6:16 AM Jark Wu  wrote:
> > > > > >
> > > > > > > I agree with Jing,
> > > > > > >
> > > > > > > My biggest concern is this makes the boundary of adding an option
> > > > > > > very unclear.
> > > > > > > It's not a strong reason to add a config just because of it doesn't
> > > > > > > affect existing users. Does this mean that in the future we might
> > > > > > > add an option to disable each feature?
> > > > > > >
> > > > > > > Flink already has a very long list of configurations [1][2] and
> > > > > > > this is very scary and not easy to use. We should try to remove the
> > > > > > > unnecessary configuration
> 

Re: [DISCUSS] FLIP-367: Support Setting Parallelism for Table/SQL Sources

2023-09-27 Thread Jing Ge
Hi Zhanghao,

Thanks for the update!

Best regards,
Jing

On Mon, Sep 25, 2023 at 9:54 PM Chen Zhanghao 
wrote:

> Hi Jing,
>
> I've updated the Compatibility, Deprecation, and Migration Plan section to
> list all the potential compatibility issues for users who want to upgrade
> an existing job to use this feature:
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=263429150
> .
>
> Best,
> Zhanghao Chen
> 
> From: Jing Ge 
> Sent: September 25, 2023 23:02
> To: dev@flink.apache.org 
> Subject: Re: [DISCUSS] FLIP-367: Support Setting Parallelism for Table/SQL
> Sources
>
> Hi Zhanghao,
>
> Thanks for driving the FLIP. This is a nice feature users are looking for.
> From the users' perspective, could you add an explicit description of
> any potential (or no) compatibility issues if users want to use this new
> feature and start existing jobs with savepoints or checkpoints?
>
> Best regards,
> Jing
>
> On Sun, Sep 24, 2023 at 9:05 PM Chen Zhanghao 
> wrote:
>
> > Hi Lincoln,
> >
> > Thanks for the comments.
> >
> > - For concern #1, I agree that we do not always produce the optimal plan
> > for both cases. However, it is difficult to do so and non-trivial
> > complexity is expected. On the other hand, although our proposal generates
> > a sub-optimal plan when the bottleneck is on the aggregate operator, it
> > still provides more flexibility for performance tuning. Therefore, I think
> > we can implement it in the straightforward way first, WDYT?
> >
> > - For concern #2, I'd like to clarify a bit: an exception will be thrown
> > if and only if the source may produce delete/update messages AND no primary
> > key is specified AND the source parallelism is different from the default
> > parallelism. So for CDC without pk, we're still good if the source
> > parallelism is not specified.
> >
> > - For concern #3, at the current point, I think making the name more
> > specific is better as no other future use cases can be foreseen now.
> > Since this is an internal API, we are free to refactor it later if needed.
> >
> > - For the concerns about the adaptive scheduler, the problems you mentioned
> > do exist, but I don't think they are relevant here. The planner may encode
> > some hints to help the scheduler, but after all, those efforts should be
> > initiated on the scheduler side.
> >
> > Best,
> > Zhanghao Chen
> > 
> > From: Lincoln Lee 
> > Sent: September 22, 2023 23:19
> > To: dev@flink.apache.org 
> > Subject: Re: [DISCUSS] FLIP-367: Support Setting Parallelism for Table/SQL
> > Sources
> >
> > Hi Zhanghao,
> >
> > Thanks for the FLIP and discussion!  Hope this reply isn't too late : )
> > Firstly I fully agree with the motivation of this FLIP and the value for
> > the users, but there are a few things we should consider (please correct me
> > if I'm misunderstanding):
> >
> > 1. It seems that the current solution only takes care of part of the
> > requirement. The need to set the source's parallelism may be different in
> > different jobs. For example, consider the following two job topologies (one
> > {} simply represents a vertex):
> > a. {source -> calc -> sink}
> >
> > b. {source -> calc} -> {aggregate} -> {sink}
> >
> > For job a, if there is a bottleneck in the calc operator but the source
> > parallelism cannot be scaled up (e.g., limited by kafka's partition
> > number), the calc operator cannot be scaled up to achieve higher
> > throughput because the operators in the source vertex are chained together,
> > so the current solution is reasonable (break the chain, add a shuffle).
> >
> > But for job b, if the bottleneck is the aggregate operator (not calc), it's
> > more likely to be better to scale up the aggregate operator/vertex
> > without breaking the {source -> calc} chain, as breaking it will incur
> > additional shuffle cost.
> > So if we decide to add this new feature, I would recommend that both cases
> > be taken care of.
> >
> >
> > 2. The assumption that a CDC source must have a pk (primary key) may not be
> > reasonable. For example, MySQL CDC supports tables without a pk
> > (https://ververica.github.io/flink-cdc-connectors/master/content/connectors/mysql-cdc.html#tables-without-primary-keys),
> > so we cannot just raise an error here.
> >
> >
> > 3. For the new SourceTransformationWrapper, I have some concerns about its
> > future evolution: if we need to add support for other operators, do we
> > continue to add new xxWrappers?
> >
> > I've also revisited the previous discussion on FLIP-146[1]. There were no
> > clear conclusions or good ideas about similar support issues for the source
> > before. I also noticed the new capability to change per-vertex parallelism
> > via the REST API in 1.18 (part of FLIP-291[2][3]), but there is actually an
> > issue with changing a SQL job's parallelism, which may require a hash
> > shuffle to ensure the order of the update stream. This needs to be
> > followed up in FLIP-291, a jira will be creat

Re: [ANNOUNCE] Release 1.18.0, release candidate #0

2023-09-27 Thread Jing Ge
Hi Folks,

@Ryan FYI: CI passed and the PR has been merged. Thanks!

If there are no other concerns, I will start publishing 1.18-rc1.

Best regards,
Jing

On Mon, Sep 25, 2023 at 1:40 PM Jing Ge  wrote:

> Hi Ryan,
>
> Thanks for reaching out. It is fine to include it, but we need to wait
> until the CI passes. I am not sure how long it will take, since there seem
> to be some infra issues.
>
> Best regards,
> Jing
>
> On Mon, Sep 25, 2023 at 11:34 AM Ryan Skraba 
> wrote:
>
>> Hello!  There's a security fix that probably should be applied to 1.18
>> in the next RC1 : https://github.com/apache/flink/pull/23461 (bump to
>> snappy-java).  Do you think this would be possible to include?
>>
>> All my best, Ryan
>>
>> [1]: https://issues.apache.org/jira/browse/FLINK-33149 "Bump
>> snappy-java to 1.1.10.4"
>>
>>
>>
>> On Mon, Sep 25, 2023 at 3:54 PM Jing Ge 
>> wrote:
>> >
>> > Thanks Zakelly for the update! Appreciate it!
>> >
>> > @Piotr Nowojski  If you do not have any other
>> > concerns, I will move forward to create 1.18 rc1 and start voting. WDYT?
>> >
>> > Best regards,
>> > Jing
>> >
>> > On Mon, Sep 25, 2023 at 2:20 AM Zakelly Lan 
>> wrote:
>> >
>> > > Hi Jing and everyone,
>> > >
>> > > I have conducted three rounds of benchmarking with Java 11, comparing
>> > > release 1.18 (commit: deb07e99560[1]) with commit 6d62f9918ea[2]. The
>> > > results are attached[3]. Most of the tests show no obvious regression.
>> > > However, I did observe significant changes in several tests. Upon
>> > > reviewing the historical results from the previous pipeline, I also
>> > > discovered a substantial variance in those tests, as shown in the
>> > > timeline pictures included in the sheet[3]. I believe this variance
>> > > has existed for a long time and requires further investigation, and
>> > > fully measuring the variance requires more rounds (15 or more). I
>> > > think for now it is not a blocker for release 1.18. WDYT?
>> > >
>> > >
>> > > Best,
>> > > Zakelly
>> > >
>> > > [1]
>> > >
>> https://github.com/apache/flink/commit/deb07e99560b45033a629afc3f90666ad0a32feb
>> > > [2]
>> > >
>> https://github.com/apache/flink/commit/6d62f9918ea2cbb8a10c705a25a4ff6deab60711
>> > > [3]
>> > >
>> https://docs.google.com/spreadsheets/d/1V0-duzNTgu7H6R7kioF-TAPhlqWl7Co6Q9ikTBuaULo/edit?usp=sharing
>> > >
>> > > On Sun, Sep 24, 2023 at 11:29 AM ConradJam 
>> wrote:
>> > > >
>> > > > +1 for testing with Java 17
>> > > >
>> > > > On Sun, Sep 24, 2023 at 09:40, Jing Ge wrote:
>> > > >
>> > > > > +1 for testing with Java 17 too. Thanks Zakelly for your effort!
>> > > > >
>> > > > > Best regards,
>> > > > > Jing
>> > > > >
>> > > > > On Fri, Sep 22, 2023 at 1:01 PM Zakelly Lan <
>> zakelly@gmail.com>
>> > > wrote:
>> > > > >
>> > > > > > Hi Jing,
>> > > > > >
>> > > > > > I agree we could wait for the result with Java 11. It should be
>> > > > > > available next Monday.
>> > > > > > Additionally, I could also build a pipeline with Java 17 later, since
>> > > > > > it is supported in 1.18[1].
>> > > > > >
>> > > > > >
>> > > > > > Best regards,
>> > > > > > Zakelly
>> > > > > >
>> > > > > > [1]
>> > > > > >
>> > > > >
>> > >
>> https://github.com/apache/flink/commit/9c1318ca7fa5b2e7b11827068ad1288483aaa464#diff-8310c97396d60e96766a936ca8680f1e2971ef486cfc2bc55ec9ca5a5333c47fR53
>> > > > > >
>> > > > > > On Fri, Sep 22, 2023 at 5:57 PM Jing Ge wrote:
>> > > > > > >
>> > > > > > > Hi Zakelly,
>> > > > > > >
>> > > > > > > Thanks for your effort and the update! Since Java 8 has been
>> > > > > > > deprecated[1], let's wait for the result with Java 11. It should be
>> > > > > > > available after the weekend and there should be no big surprise. WDYT?
>> > > > > > >
>> > > > > > > Best regards,
>> > > > > > > Jing
>> > > > > > >
>> > > > > > > [1]
>> > > > > > >
>> > > > > >
>> > > > >
>> > >
>> https://nightlies.apache.org/flink/flink-docs-master/release-notes/flink-1.15/#jdk-upgrade
>> > > > > > >
>> > > > > > > On Fri, Sep 22, 2023 at 11:26 AM Zakelly Lan <
>> > > zakelly@gmail.com>
>> > > > > > wrote:
>> > > > > > >
>> > > > > > > > Hi everyone,
>> > > > > > > >
>> > > > > > > > I want to provide an update on the benchmark results that I have been
>> > > > > > > > working on. After spending some time preparing the environment and
>> > > > > > > > adjusting the benchmark script, I finally got a comparison between
>> > > > > > > > release 1.18 (commit: 2aeb99804ba[1]) and the commit before the old
>> > > > > > > > codespeed server went down (commit: 6d62f9918ea[2]) on openjdk8. The
>> > > > > > > > report is attached[3]. Note that the test has only run once on jdk8,
>> > > > > > > > so the impact of single-test fluctuations is not ruled out.
>> > > > > > > > Additionally, I have noticed some significant fluctuations in specific
>> > > > > > > > tests when reviewing previous benchmark scores, which I have also

[RESULT][VOTE] FLIP-327: Support switching from batch to stream mode to improve throughput when processing backlog data

2023-09-27 Thread Dong Lin
Hi all,

Thanks, everyone, for your reviews and votes!

I am happy to announce that FLIP-327: Support switching from batch to
stream mode to improve throughput when processing backlog data [1] has been
accepted.

There are 4 binding votes and 3 non-binding votes [2]:

- Jing Ge (binding)
- Rui Fan (binding)
- Xintong Song (binding)
- Dong Lin (binding)
- Yuepeng Pan (non-binding)
- Venkatakrishnan Sowrirajan (non-binding)
- Ahmed Hamdy (non-binding)

There are no disapproving votes.

Cheers,
Dong

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-327%3A+Support+switching+from+batch+to+stream+mode+to+improve+throughput+when+processing+backlog+data
[2] https://lists.apache.org/thread/7cj6pzx7w0ynqyogxgk668wjr322mcqw


Re: [VOTE] FLIP-327: Support switching from batch to stream mode to improve throughput when processing backlog data

2023-09-27 Thread Dong Lin
Thank you all for the votes!

I will close the voting thread and summarize the result in a separate email.

On Tue, Sep 26, 2023 at 1:11 AM Ahmed Hamdy  wrote:

> +1(non binding)
> Best regards
> Ahmed Hamdy
>
> On Mon, 25 Sep 2023, 19:57 Venkatakrishnan Sowrirajan, 
> wrote:
>
> > +1 (non-binding)
> >
> > On Sun, Sep 24, 2023, 6:49 PM Xintong Song 
> wrote:
> >
> > > +1 (binding)
> > >
> > > Best,
> > >
> > > Xintong
> > >
> > >
> > >
> > > On Sat, Sep 23, 2023 at 10:16 PM Yuepeng Pan 
> > > wrote:
> > >
> > > > +1(non-binding), thank you for driving this proposal.
> > > >
> > > > Best,
> > > > Yuepeng Pan.
> > > > At 2023-09-22 14:07:45, "Dong Lin"  wrote:
> > > > >Hi all,
> > > > >
> > > > > >We would like to start the vote for FLIP-327: Support switching from
> > > > > >batch to stream mode to improve throughput when processing backlog data
> > > > > >[1]. This FLIP was discussed in this thread [2].
> > > > >
> > > > >The vote will be open until at least Sep 27th (at least 72
> > > > >hours), following the consensus voting process.
> > > > >
> > > > >Cheers,
> > > > >Xuannan and Dong
> > > > >
> > > > >[1]
> > > > >
> > > >
> > >
> >
> https://urldefense.com/v3/__https://cwiki.apache.org/confluence/display/FLINK/FLIP-327*3A*Support*switching*from*batch*to*stream*mode*to*improve*throughput*when*processing*backlog*data__;JSsrKysrKysrKysrKysr!!IKRxdwAv5BmarQ!ZP19r7-3IBaBI-kV0olZvaz5a2TFN3uxge2TJM7WQvovjfRbOl71NaC3SEh_UEJmH7Lssqu0bx4FKResPPc7$
> > > > >[2]
> > >
> >
> https://urldefense.com/v3/__https://lists.apache.org/thread/29nvjt9sgnzvs90browb8r6ng31dcs3n__;!!IKRxdwAv5BmarQ!ZP19r7-3IBaBI-kV0olZvaz5a2TFN3uxge2TJM7WQvovjfRbOl71NaC3SEh_UEJmH7Lssqu0bx4FKZEAz9yp$
> > > >
> > >
> >
>


[jira] [Created] (FLINK-33165) Flink UI stack trace popup continually displayed when a job is deleted

2023-09-27 Thread david radley (Jira)
david radley created FLINK-33165:


 Summary: Flink UI stack trace popup continually displayed when a job is deleted
 Key: FLINK-33165
 URL: https://issues.apache.org/jira/browse/FLINK-33165
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Web Frontend
 Environment: MacOS M1 . 
Reporter: david radley
 Fix For: 1.7.3


If you run a job, view that running job in the Flink dashboard, and then delete 
the job, the UI displays a popup with an information icon, a message, and a 
stack trace. The popup is redisplayed every second or so. If we must display a 
popup, it should be shown once and without the stack trace, which implies there 
is an error and is not useful to the user. We should look into whether we need 
a popup in this case at all. It would be better not to display one, as this 
should be just a state change.

An example of the popup content is:

*Server Response Message:*

org.apache.flink.runtime.rest.NotFoundException: Job 4e332318d164d86e76a239eabd49bf93 not found
at org.apache.flink.runtime.rest.handler.job.AbstractExecutionGraphHandler.lambda$handleRequest$1(AbstractExecutionGraphHandler.java:99)
at java.base/java.util.concurrent.CompletableFuture.uniExceptionally(Unknown Source)
at java.base/java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(Unknown Source)
at java.base/java.util.concurrent.CompletableFuture.postComplete(Unknown Source)
at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(Unknown Source)
at org.apache.flink.runtime.rest.handler.legacy.DefaultExecutionGraphCache.lambda$getExecutionGraphInternal$0(DefaultExecutionGraphCache.java:109)
at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(Unknown Source)
at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(Unknown Source)
at java.base/java.util.concurrent.CompletableFuture.postComplete(Unknown Source)
at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(Unknown Source)
at org.apache.flink.runtime.rpc.akka.AkkaInvocationHandler.lambda$invokeRpc$1(AkkaInvocationHandler.java:260)
at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(Unknown Source)
at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(Unknown Source)
at java.base/java.util.concurrent.CompletableFuture.postComplete(Unknown Source)
at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(Unknown Source)
at org.apache.flink.util.concurrent.FutureUtils.doForward(FutureUtils.java:1298)
at org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.lambda$guardCompletionWithContextClassLoader$1(ClassLoadingUtils.java:93)
at org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:68)
at org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.lambda$guardCompletionWithContextClassLoader$2(ClassLoadingUtils.java:92)
at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(Unknown Source)
at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(Unknown Source)
at java.base/java.util.concurrent.CompletableFuture.postComplete(Unknown Source)
at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(Unknown Source)
at org.apache.flink.runtime.concurrent.akka.AkkaFutureUtils$1.onComplete(AkkaFutureUtils.java:45)
at akka.dispatch.OnComplete.internal(Future.scala:299)
at akka.dispatch.OnComplete.internal(Future.scala:297)
at akka.dispatch.japi$CallbackBridge.apply(Future.scala:224)
at akka.dispatch.japi$CallbackBridge.apply(Future.scala:221)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
at org.apache.flink.runtime.concurrent.akka.AkkaFutureUtils$DirectExecutionContext.execute(AkkaFutureUtils.java:65)
at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:72)
at scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1(Promise.scala:288)
at scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1$adapted(Promise.scala:288)
at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:288)
at akka.pattern.PromiseActorRef.$bang(AskSupport.scala:622)
at akka.pattern.PipeToSupport$PipeableFuture$$anonfun$pipeTo$1.applyOrElse(PipeToSupport.scala:25)
at akka.pattern.PipeToSupport$PipeableFuture$$anonfun$pipeTo$1.applyOrElse(PipeToSupport.scala:23)
at scala.concurrent.Future.$anonfun$andThen$1(Future.scala:536)
at scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:33)
at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:33)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:63)
at akka.dispatch.BatchingExecutor$BlockableBatch.$anonfun$run$1(Bat

Re: [DISCUSS] FLIP-370 : Support Balanced Tasks Scheduling

2023-09-27 Thread Zhu Zhu
Thanks Yuepeng and Rui for creating this FLIP.

+1 in general
The idea is straightforward: on a best-effort basis, gather all the slot
requests and offered slots to form an overview before assigning slots, then
try to balance the load of the task managers when assigning slots.
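
As a rough illustration of the "gather first, then assign" idea above, here is a minimal Python sketch. It is not the algorithm from FLIP-370: the function name, the input shapes (tasks per slot request, free slots per task manager) and the greedy heaviest-first strategy are all assumptions made only for this example.

```python
import heapq


def assign_balanced(slot_task_counts, tm_free_slots):
    """Greedy sketch: place the heaviest slot requests first, each on the
    task manager that currently carries the fewest tasks.

    slot_task_counts: hypothetical mapping of request id -> number of tasks
    tm_free_slots:    hypothetical mapping of task manager -> free slots
    """
    # Heap entries are (tasks assigned so far, task manager, free slots left).
    heap = [(0, tm, free) for tm, free in tm_free_slots.items() if free > 0]
    heapq.heapify(heap)
    assignment = {}
    loads = {tm: 0 for tm in tm_free_slots}
    # Only possible because all requests are known before assigning any slot.
    ordered = sorted(slot_task_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    for request_id, task_count in ordered:
        if not heap:
            raise ValueError("not enough free slots for all requests")
        load, tm, free = heapq.heappop(heap)
        assignment[request_id] = tm
        loads[tm] = load + task_count
        if free > 1:
            heapq.heappush(heap, (loads[tm], tm, free - 1))
    return assignment, loads


assignment, loads = assign_balanced(
    {"r1": 4, "r2": 3, "r3": 2, "r4": 1}, {"tm1": 2, "tm2": 2}
)
```

Assigning requests one by one as they arrive could not sort them by weight, which is why the overview has to be formed first.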

I have one comment regarding the configuration for ease of use:

IIUC, this FLIP uses an existing config 'cluster.evenly-spread-out-slots'
as the main switch of the new feature. That is, from the user's perspective,
with this improvement, the 'cluster.evenly-spread-out-slots' feature not
only balances the number of slots on task managers, but also balances the
number of tasks. This is a behavior change anyway. Besides that, it also
requires users to set 'slot.sharing-strategy' to 'TASK_BALANCED_PREFERRED'
to balance the tasks in each slot.

I think we can introduce a new config option `taskmanager.load-balance.mode`,
which accepts "None"/"Slots"/"Tasks". `cluster.evenly-spread-out-slots`
can be superseded by the "Slots" mode and get deprecated. In the future
it can support more modes, e.g. "CpuCores", to work better for jobs with
fine-grained resources. The proposed config option `slot.request.max-interval`
can then be renamed to `taskmanager.load-balance.request-stablizing-timeout`
to show its relation to the feature. The proposed `slot.sharing-strategy`
is not needed, because the configured "Tasks" mode will do the work.
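
To make the suggested precedence concrete, here is a small Python sketch of how the effective mode might be resolved. It is only an assumption for discussion: `taskmanager.load-balance.mode` is the option proposed in this mail and does not exist yet, and the case-insensitive parsing and the fallback rule are illustrative choices, not decided behavior.

```python
MODES = ("None", "Slots", "Tasks")


def resolve_load_balance_mode(conf):
    """Return the effective mode from a plain dict of config strings.

    The new (proposed, hypothetical) option wins when it is set; otherwise
    the existing 'cluster.evenly-spread-out-slots' flag maps to "Slots".
    """
    mode = conf.get("taskmanager.load-balance.mode")
    if mode is not None:
        for candidate in MODES:
            if candidate.lower() == str(mode).lower():
                return candidate
        raise ValueError("unknown load-balance mode: %r" % (mode,))
    legacy = str(conf.get("cluster.evenly-spread-out-slots", "false")).lower()
    return "Slots" if legacy == "true" else "None"


mode = resolve_load_balance_mode({"cluster.evenly-spread-out-slots": "true"})
```

With this rule, existing jobs that only set the old flag keep their current behavior, while "Tasks" has to be chosen explicitly.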

WDYT?

Thanks,
Zhu Zhu

On Mon, Sep 25, 2023 at 16:26, Yuepeng Pan wrote:

> Hi all,
>
>
> Fan Rui (CC'ed) and I created FLIP-370[1] to support balanced tasks
> scheduling.
>
>
> Flink's current task deployment strategy sometimes leaves some TMs
> (TaskManagers) with more tasks than others. The TMs that hold more tasks
> then see excessive resource utilization and become a bottleneck for the
> whole job. Developing strategies that balance the task load across TMs and
> reduce such bottlenecks is therefore very meaningful.
>
>
> The raw design and discussions can be found in the Flink JIRA[2] and the
> Google doc[3]. We really appreciate Zhu Zhu (CC'ed) for providing
> valuable help and suggestions in advance.
>
>
> Please refer to the FLIP[1] document for more details about the proposed
> design and implementation. We welcome any feedback and opinions on this
> proposal.
>
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-370%3A+Support+Balanced+Tasks+Scheduling
>
> [2] https://issues.apache.org/jira/browse/FLINK-31757
>
> [3]
> https://docs.google.com/document/d/14WhrSNGBdcsRl3IK7CZO-RaZ5KXU2X1dWqxPEFr3iS8
>
>
> Best,
>
> Yuepeng Pan
>


[jira] [Created] (FLINK-33164) HBase connector support ignore null value for partial update

2023-09-27 Thread tanjialiang (Jira)
tanjialiang created FLINK-33164:
---

 Summary: HBase connector support ignore null value for partial update
 Key: FLINK-33164
 URL: https://issues.apache.org/jira/browse/FLINK-33164
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / HBase
Affects Versions: hbase-3.0.0
Reporter: tanjialiang


Sometimes users want to write data while ignoring null values, in order to 
achieve a partial update. So I suggest adding an option: sink.ignore-null-value.

 
{code:sql}
CREATE TABLE hTable (
 rowkey STRING,
 cf1 ROW<q1 STRING, q2 STRING>,
 PRIMARY KEY (rowkey) NOT ENFORCED
) WITH (
 'connector' = 'hbase-2.2',
 'table-name' = 'default:test',
 'zookeeper.quorum' = 'localhost:2181',
 'sink.ignore-null-value' = 'true' -- proposed option; default is false
);

INSERT INTO hTable VALUES('1', ROW('10', 'hello, world'));
-- writes a null value to cf1.q2
INSERT INTO hTable VALUES('1', ROW('30', CAST(NULL AS STRING)));

-- when sink.ignore-null-value is false
-- after first insert
-- {rowkey: "1", "cf1": {q1: "10", q2: "hello, world"}}
-- after second insert, cf1.q2 is updated to null
-- {rowkey: "1", "cf1": {q1: "30", q2: "null"}}

-- when sink.ignore-null-value is true
-- after first insert
-- {rowkey: "1", "cf1": {q1: "10", q2: "hello, world"}}
-- after second insert, cf1.q2 still holds the old value
-- {rowkey: "1", "cf1": {q1: "30", q2: "hello, world"}}
{code}
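
The intended semantics can also be sketched outside of SQL. The following Python snippet is only a model of the proposed behavior, not connector code: `apply_put` is a hypothetical helper, a row is modeled as a plain dict of qualifier to value, and null is modeled as `None`.

```python
def apply_put(stored_row, new_row, ignore_null_value=False):
    """Merge new_row into stored_row the way the proposed option would.

    With ignore_null_value=True, null cells in new_row are skipped, so the
    previously stored value survives (partial update). With the default of
    False, the null overwrites the stored value.
    """
    merged = dict(stored_row)
    for qualifier, value in new_row.items():
        if value is None and ignore_null_value:
            continue  # keep whatever is already stored for this qualifier
        merged[qualifier] = value
    return merged


first = apply_put({}, {"q1": "10", "q2": "hello, world"})
overwritten = apply_put(first, {"q1": "30", "q2": None})
partial = apply_put(first, {"q1": "30", "q2": None}, ignore_null_value=True)
```

Here `partial` keeps the old `q2` while `overwritten` loses it, which mirrors the two cases listed in the example above.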
 

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)