Re: [VOTE] FLIP-314: Support Customized Job Lineage Listener

2023-09-27 Thread Chen Zhanghao
+1 (non-binding), thanks for driving this.

Best,
Zhanghao Chen

From: Shammon FY
Sent: September 25, 2023 13:28
To: dev
Subject: [VOTE] FLIP-314: Support Customized Job Lineage Listener

Hi devs,

Thanks for all the feedback on FLIP-314: Support Customized Job Lineage
Listener [1] in thread [2].

I would like to start a vote for it. The vote will be open for at least
72 hours unless there is an objection or insufficient votes.

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-314%3A+Support+Customized+Job+Lineage+Listener
[2] https://lists.apache.org/thread/wopprvp3ww243mtw23nj59p57cghh7mc

Best,
Shammon FY


Re: [DISCUSS] FLIP-367: Support Setting Parallelism for Table/SQL Sources

2023-09-25 Thread Chen Zhanghao
Hi Jing,

I've updated the Compatibility, Deprecation, and Migration Plan section to list all 
the potential compatibility issues for users who want to upgrade an existing 
job to use this feature: 
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=263429150.

Best,
Zhanghao Chen

From: Jing Ge
Sent: September 25, 2023 23:02
To: dev@flink.apache.org
Subject: Re: [DISCUSS] FLIP-367: Support Setting Parallelism for Table/SQL Sources

Hi Zhanghao,

Thanks for driving the FLIP. This is a nice feature users are looking for.
From the users' perspective, would you like to add an explicit description of
any potential compatibility issues (or the absence of them) if users want to use
this new feature and start existing jobs from savepoints or checkpoints?

Best regards,
Jing

Re: [VOTE] FLIP-362: Support minimum resource limitation

2023-09-25 Thread Chen Zhanghao
Thanks for driving this. +1 (non-binding)

Best,
Zhanghao Chen

From: xiangyu feng
Sent: September 25, 2023 17:38
To: dev@flink.apache.org
Subject: [VOTE] FLIP-362: Support minimum resource limitation

Hi all,

I would like to start the vote for FLIP-362: Support minimum resource
limitation [1].
This FLIP was discussed in this thread [2].

The vote will be open for at least 72 hours unless there is an objection or
insufficient votes.

Regards,
Xiangyu

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-362%3A+Support+minimum+resource+limitation
[2] https://lists.apache.org/thread/m2v9n4yynm97v8swhqj2o5k0sqlb5ym4


Re: [DISCUSS] FLIP-367: Support Setting Parallelism for Table/SQL Sources

2023-09-24 Thread Chen Zhanghao
Hi Lincoln,

Thanks for the comments.

- For concern #1, I agree that we do not always produce an optimal plan for both 
cases. However, doing so is difficult and would add non-trivial complexity. On 
the other hand, although our proposal generates a sub-optimal plan when the 
bottleneck is on the aggregate operator, it still provides more flexibility for 
performance tuning. Therefore, I think we can implement it in the 
straightforward way first, WDYT?

- For concern #2, I'd like to clarify a bit: an exception will be thrown if and 
only if the source may produce delete/update messages AND no primary key is 
specified AND the source parallelism is different from the default parallelism. 
So for CDC sources without a PK, we're still good if the source parallelism is 
not specified.

- For concern #3, at this point I think making the name more specific is better, 
as no other future use cases can be foreseen now. Since this is an internal API, 
we are free to refactor it later if needed.

- Regarding your concerns about the adaptive scheduler, the problems you 
mentioned do exist, but I don't think they are relevant here. The planner may 
encode some hints to help the scheduler, but after all, those efforts should be 
initiated on the scheduler side.
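The condition described in the answer to concern #2 can be read as a single predicate. Below is a minimal Python sketch of that reading only, not planner code: `SourceInfo` and `must_reject` are hypothetical names, and the changelog and primary-key facts are assumed to be known already.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class SourceInfo:
    """Hypothetical summary of what the planner knows about a scan source."""

    may_emit_updates_or_deletes: bool  # derived from the source's changelog mode
    primary_key: Sequence[str]         # empty when no PRIMARY KEY is declared
    parallelism: Optional[int]         # None when no source parallelism is configured


def must_reject(source: SourceInfo, default_parallelism: int) -> bool:
    """Return True iff all three conditions hold: the source may emit
    UPDATE/DELETE rows, no primary key is declared, and a source
    parallelism different from the default is configured."""
    custom_parallelism = (
        source.parallelism is not None
        and source.parallelism != default_parallelism
    )
    return (
        source.may_emit_updates_or_deletes
        and len(source.primary_key) == 0
        and custom_parallelism
    )


# A CDC table without a primary key is fine as long as its parallelism is left alone.
print(must_reject(SourceInfo(True, (), None), default_parallelism=4))  # False
print(must_reject(SourceInfo(True, (), 8), default_parallelism=4))     # True
```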

Best,
Zhanghao Chen

From: Lincoln Lee
Sent: September 22, 2023 23:19
To: dev@flink.apache.org
Subject: Re: [DISCUSS] FLIP-367: Support Setting Parallelism for Table/SQL Sources

Hi Zhanghao,

Thanks for the FLIP and discussion! Hope this reply isn't too late : )
Firstly, I fully agree with the motivation of this FLIP and its value for
the users, but there are a few things we should consider (please correct me
if I'm misunderstanding):

1. It seems that the current solution only covers part of the
requirement, since the reason for setting the source parallelism may differ
between jobs. For example, consider the following two job topologies (each
{} represents a vertex):
a. {source -> calc -> sink}

b. {source -> calc} -> {aggregate} -> {sink}

For job a, if the calc operator is the bottleneck but the source
parallelism cannot be scaled up (e.g., it is limited by the number of Kafka
partitions), the calc operator cannot be scaled up to achieve higher
throughput either, because the operators in the source vertex are chained
together. In this case the current solution is reasonable (break the chain, add a shuffle).

But for job b, if the bottleneck is the aggregate operator (not calc), it would
likely be better to scale up the aggregate operator/vertex without
breaking the {source -> calc} chain, as breaking it would incur an additional
shuffle cost.
So if we decide to add this new feature, I would recommend that both cases
be taken care of.


2. The assumption that a CDC source must have a PK (primary key) may not be
reasonable; for example, MySQL CDC supports tables without a PK (
https://ververica.github.io/flink-cdc-connectors/master/content/connectors/mysql-cdc.html#tables-without-primary-keys),
so we cannot simply raise an error here.


3. For the new SourceTransformationWrapper, I have some concerns about its
future evolution: if we need to add support for other operators, do we
keep adding new xxWrappers?
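The chaining trade-off in concern #1 can be illustrated with a deliberately simplified model. The sketch assumes two adjacent operators chain only when their parallelism matches and no keyed shuffle is needed between them; real Flink chaining also depends on chaining strategies, slot sharing groups and partitioner types, which are ignored here. `Op` and `build_vertices` are hypothetical.

```python
from typing import List, NamedTuple, Optional


class Op(NamedTuple):
    name: str
    parallelism: int
    keyed_input: bool = False  # True if the operator needs a hash shuffle on its input


def build_vertices(ops: List[Op]) -> List[List[str]]:
    """Group a linear pipeline into job vertices under a simplified rule:
    an operator joins its predecessor's vertex only if both have the same
    parallelism and no keyed shuffle is needed between them."""
    vertices: List[List[str]] = []
    prev: Optional[Op] = None
    for op in ops:
        if prev is not None and prev.parallelism == op.parallelism and not op.keyed_input:
            vertices[-1].append(op.name)  # chained into the current vertex
        else:
            vertices.append([op.name])    # chain broken: start a new vertex
        prev = op
    return vertices


# Job a: the source is capped at 4 (e.g. by Kafka partitions) while calc needs 8,
# so the chain has to break right after the source.
job_a = [Op("source", 4), Op("calc", 8), Op("sink", 8)]
print(build_vertices(job_a))  # [['source'], ['calc', 'sink']]

# Job b: only the aggregate is scaled up; {source -> calc} stays chained.
job_b = [Op("source", 4), Op("calc", 4), Op("aggregate", 8, keyed_input=True), Op("sink", 4)]
print(build_vertices(job_b))  # [['source', 'calc'], ['aggregate'], ['sink']]
```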

I've also revisited the previous discussion on FLIP-146[1]; there were no
clear conclusions or good ideas about similar support for the source back
then. I also noticed the new capability in 1.18 to change per-vertex
parallelism via the REST API (part of FLIP-291[2][3]), but there is
actually an issue with changing the parallelism of SQL jobs, which may require
a hash shuffle to ensure the order of the update stream. This needs to be
followed up in FLIP-291, and a JIRA will be created later. So perhaps we
need to think about it more (the next version has not been released yet, so we
still have time).

[1] https://lists.apache.org/thread/gtpswl42jzv0c9o3clwqskpllnw8rh87
[2]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-291%3A+Externalized+Declarative+Resource+Management
[3] https://issues.apache.org/jira/browse/FLINK-31471


Best,
Lincoln Lee


[RESULT][VOTE] FLIP-363: Unify the Representation of TaskManager Location in REST API and Web UI

2023-09-22 Thread Chen Zhanghao
Hi everyone,

The proposal, FLIP-363: Unify the Representation of TaskManager Location in 
REST API and Web UI, has been unanimously accepted with 5 votes (4 binding):

+1 votes:

 - Yangze Guo (binding)
 - Rui Fan (binding)
 - Jing Ge (binding)
 - Weihua Hu (binding)
 - Matt Wang

Thanks again to everyone who participated in the discussion and voting.

Best,
Zhanghao Chen


Re: [VOTE] FLIP-363: Unify the Representation of TaskManager Location in REST API and Web UI

2023-09-22 Thread Chen Zhanghao
Thank you all! Closing the vote. The result will be sent in a separate email.

Best,
Zhanghao Chen

From: Matt Wang
Sent: September 20, 2023 20:54
To: dev@flink.apache.org
Subject: Re: [VOTE] FLIP-363: Unify the Representation of TaskManager Location in REST API and Web UI

+1 (non-binding)


Thanks for driving this FLIP


--

Best,
Matt Wang


 Replied Message 
| From | Weihua Hu |
| Date | 09/19/2023 19:17 |
| To |  |
| Subject | Re: [VOTE] FLIP-363: Unify the Representation of TaskManager Location in REST API and Web UI |
+1 (binding)

Best,
Weihua


On Tue, Sep 19, 2023 at 6:16 PM Jing Ge  wrote:

+1 (binding). Thanks!

Best regards,
Jing


Re: [Discuss] FLIP-366: Support standard YAML for FLINK configuration

2023-09-22 Thread Chen Zhanghao
Hi Junrui,

Thanks for driving this, +1 for it

Best,
Zhanghao Chen

From: Junrui Lee
Sent: September 20, 2023 11:06
To: dev@flink.apache.org
Subject: [Discuss] FLIP-366: Support standard YAML for FLINK configuration

Hi devs,

I would like to start a discussion about FLIP-366:
Support standard YAML for FLINK configuration[1]

The current flink-conf.yaml parser in Flink is not a standard YAML parser,
which leads to some shortcomings.
Firstly, it does not support nested configuration items and only supports
flat key-value pairs, resulting in poor readability. Secondly, if the
value is a collection type, such as a List or Map, users are required to
write the value in a Flink-specific pattern, which is inconvenient to use.
Additionally, Flink's parser differs in syntax from the standard YAML
parser, for example in how it parses comments and null values. These
inconsistencies can cause confusion for users, as seen in FLINK-15358 and
FLINK-32740.

By supporting standard YAML, these issues can be resolved, and users can
create a Flink configuration file using third-party tools and leverage
more advanced YAML features. Therefore, we propose to support standard
YAML for Flink configuration.
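To make the nested-structure point concrete, here is a minimal sketch of how a nested mapping (as a standard YAML parser would return it) could be collapsed into the flat dotted keys Flink options use today. It avoids a YAML library by starting from an already-parsed `dict`; `flatten` and `to_legacy_string` are hypothetical helpers, and the `;` list separator and `k:v,k:v` map pattern are assumed here to be the Flink-specific patterns the paragraph refers to.

```python
from typing import Any, Dict


def flatten(nested: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Collapse a nested mapping into dotted keys, e.g.
    {"taskmanager": {"numberOfTaskSlots": 2}} -> {"taskmanager.numberOfTaskSlots": 2}."""
    flat: Dict[str, Any] = {}
    for key, value in nested.items():
        full_key = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten(value, full_key))
        else:
            flat[full_key] = value
    return flat


def to_legacy_string(value: Any) -> str:
    """Render a value in the flat, Flink-specific style assumed here:
    list items joined by ';' and map entries written as 'k:v' joined by ','."""
    if isinstance(value, list):
        return ";".join(str(v) for v in value)
    if isinstance(value, dict):
        return ",".join(f"{k}:{v}" for k, v in value.items())
    return str(value)


# What a standard YAML parser might return for a nested config file.
parsed = {
    "taskmanager": {"numberOfTaskSlots": 2},
    "pipeline": {"jars": ["file:///a.jar", "file:///b.jar"]},
}
flat = flatten(parsed)
print(flat["taskmanager.numberOfTaskSlots"])   # 2
print(to_legacy_string(flat["pipeline.jars"]))  # file:///a.jar;file:///b.jar
```

Design note: `flatten` treats every nested mapping as key nesting, so it cannot tell a Map-typed option value apart from nested keys; a real parser would need each option's declared type to resolve that.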

You can find more details in the FLIP-366[1]. Looking forward to your
feedback.

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-366%3A+Support+standard+YAML+for+FLINK+configuration

Best,
Junrui


Re: [DISCUSS] FLIP-367: Support Setting Parallelism for Table/SQL Sources

2023-09-22 Thread Chen Zhanghao
Thanks to everyone who participated in the discussion here. If no further 
questions/concerns are raised, we'll start voting next Monday afternoon (GMT+8).

Best,
Zhanghao Chen

From: Jane Chan
Sent: September 22, 2023 15:35
To: dev@flink.apache.org
Subject: Re: [DISCUSS] FLIP-367: Support Setting Parallelism for Table/SQL Sources

Hi Zhanghao,

Thanks for the update; +1 for the proposal!

Best,
Jane


Re: [DISCUSS] FLIP-367: Support Setting Parallelism for Table/SQL Sources

2023-09-21 Thread Chen Zhanghao
Hi Jane,

Thanks for the suggestions; I totally agree with them. I've updated the FLIP 
with the following two changes:

1. Rename WrapperTransformation to SourceTransformationWrapper, which wraps a 
SourceTransformation only. Note that we do not plan to support the legacy 
LegacySourceTransformation.
2. The partitioner after the source will be chosen based on the changelog 
mode of the source plus the existence of a primary key in the source schema. If 
the source will produce update/delete messages but no primary key exists, an 
exception will be thrown.

Best,
Zhanghao Chen

From: Jane Chan
Sent: September 20, 2023 15:13
To: dev@flink.apache.org
Subject: Re: [DISCUSS] FLIP-367: Support Setting Parallelism for Table/SQL Sources

Hi Zhanghao,

Thanks for the update. The FLIP now looks good to me in general, and I have
two minor comments.

1. Compared with other subclasses like `CacheTransformation` or
`PartitionTransformation`, the name `WrapperTransformation` seems too
general. What about `SourceTransformationWrapper`, which is more specific
and descriptive, WDYT?

2.

> When the source generates update and delete data (determined by checking
> the existence of a primary key in the source schema), the source will use
> hash partitioner to send data.


It might not be sufficient to determine whether the source is a CDC source
solely based on checking the existence of the primary key. It's better to
check the changelog mode of the source. On the other hand, adding the hash
partitioner requires the CDC source table to declare the primary key in the
DDL. Therefore, it is preferable to explain this restriction in the FLIP
and doc and throw a meaningful exception when users want to configure a
different parallelism for a CDC source but forget to declare the primary
key constraint.
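Jane's point that a primary key is not a reliable signal for a CDC source can be shown with a small comparison. This is a sketch under stated assumptions: `RowKind` below mirrors the four row kinds Flink uses to describe changelog modes, while `cdc_by_primary_key` and `cdc_by_changelog_mode` are hypothetical helpers, not Flink APIs.

```python
from enum import Enum
from typing import FrozenSet, Sequence


class RowKind(Enum):
    # Same four kinds as Flink's RowKind, with its short-string notation.
    INSERT = "+I"
    UPDATE_BEFORE = "-U"
    UPDATE_AFTER = "+U"
    DELETE = "-D"


def cdc_by_primary_key(primary_key: Sequence[str]) -> bool:
    """The heuristic being questioned: 'declares a PK' taken to mean 'emits changes'."""
    return len(primary_key) > 0


def cdc_by_changelog_mode(mode: FrozenSet[RowKind]) -> bool:
    """The suggested check: does the source emit anything besides INSERT?"""
    return mode != frozenset({RowKind.INSERT})


INSERT_ONLY = frozenset({RowKind.INSERT})
ALL_KINDS = frozenset(RowKind)

# Insert-only table that still declares a PK: the heuristic misfires.
print(cdc_by_primary_key(["id"]), cdc_by_changelog_mode(INSERT_ONLY))  # True False
# CDC table without a PK (MySQL CDC allows this): the heuristic misses it.
print(cdc_by_primary_key([]), cdc_by_changelog_mode(ALL_KINDS))  # False True
```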

Best,
Jane

On Wed, Sep 20, 2023 at 9:20 AM Benchao Li  wrote:

> Thank you for the update, the FLIP now looks good to me.

Re: [Discuss] FLIP-362: Support minimum resource limitation

2023-09-19 Thread Chen Zhanghao
Thanks for driving this, Xiangyu. We use session clusters for quick SQL 
debugging internally and have found cold-start job submission slow due to the 
lack of exactly the minimum resource reservation feature proposed here. This 
should greatly improve the experience of running short-lived jobs in session clusters.

Best,
Zhanghao Chen

From: Yangze Guo
Sent: September 19, 2023 13:10
To: xiangyu feng
Cc: dev@flink.apache.org
Subject: Re: [Discuss] FLIP-362: Support minimum resource limitation

Thanks for driving this @Xiangyu. This is a feature that many users
have requested for a long time. +1 for the overall proposal.

Best,
Yangze Guo

On Tue, Sep 19, 2023 at 11:48 AM xiangyu feng  wrote:
>
> Hi Devs,
>
> I'm opening this thread to discuss FLIP-362: Support minimum resource 
> limitation. The design doc can be found at:
> FLIP-362: Support minimum resource limitation
>
> Currently, the Flink cluster only requests TaskManagers (TMs) when there is 
> a resource requirement, and idle TMs are released after a certain period of 
> time. However, in certain scenarios, such as running short-lived jobs in 
> session clusters and scheduling batch jobs stage by stage, we need to improve 
> the efficiency of job execution by keeping a certain number of workers 
> available in the cluster at all times.
>
> After discussing this with Yangze, we came up with this new feature. The newly 
> added public options and proposed changes are described in this FLIP.
>
> Looking forward to your feedback, thanks.
>
> Best regards,
> Xiangyu
>
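The behaviour described in the proposal (release idle TaskManagers after a timeout, but keep a floor of available workers) can be sketched roughly as below. Counting whole TaskManagers is a simplification assumed here; the FLIP's actual options and resource units are not reproduced, and `TaskManager`, `tms_to_release` and the timeout values are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TaskManager:
    tm_id: str
    idle_ms: int  # 0 while the TaskManager is running tasks


def tms_to_release(tms: List[TaskManager], idle_timeout_ms: int, min_workers: int) -> List[str]:
    """Pick idle TaskManagers to release without dropping the cluster below
    min_workers. Longest-idle TaskManagers are released first."""
    candidates = sorted(
        (tm for tm in tms if tm.idle_ms >= idle_timeout_ms),
        key=lambda tm: tm.idle_ms,
        reverse=True,
    )
    releasable = max(0, len(tms) - min_workers)
    return [tm.tm_id for tm in candidates[:releasable]]


cluster = [
    TaskManager("tm-1", 0),
    TaskManager("tm-2", 90_000),
    TaskManager("tm-3", 70_000),
    TaskManager("tm-4", 65_000),
]
print(tms_to_release(cluster, idle_timeout_ms=60_000, min_workers=0))  # ['tm-2', 'tm-3', 'tm-4']
print(tms_to_release(cluster, idle_timeout_ms=60_000, min_workers=2))  # ['tm-2', 'tm-3']
```

Releasing the longest-idle workers first is only one reasonable policy; the point of the sketch is the floor, which keeps some workers warm for the next short-lived job.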


Re: [DISCUSS] FLIP-367: Support Setting Parallelism for Table/SQL Sources

2023-09-19 Thread Chen Zhanghao
Thanks to everyone for the valuable inputs, we learnt a lot during the 
discussion. We've updated the FLIP in three main aspects based on the 
discussion here:

- Add a new subsection on keeping downstream operators' parallelism unchanged 
by wrapping the source transformation in a phantom transformation.
- Add a new subsection on how to deal with changelog messages, simply put, 
build a hash partitioner based on the primary key when a source generates 
update/delete data.
- Update the non-goals section to remove the possibly misleading statement that 
setting parallelism for individual operators lacks public interest and state 
that we leave it for future work due to its extra complexity.
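The phantom-transformation idea in the first bullet can be modelled in a few lines. The sketch assumes, for illustration, that a downstream node simply inherits its input's parallelism; the toy `Transformation` class (not Flink's), `downstream` and `wrap_source` are hypothetical stand-ins, not the planner's code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Transformation:
    name: str
    parallelism: int
    input_node: Optional["Transformation"] = None


def downstream(name: str, upstream: Transformation) -> Transformation:
    # Assumption of this model: a downstream node inherits its input's parallelism.
    return Transformation(name, upstream.parallelism, upstream)


def wrap_source(source: Transformation, default_parallelism: int) -> Transformation:
    # Hypothetical phantom node: it reports the default parallelism to its
    # consumers, while the wrapped source keeps its own configured value.
    return Transformation(f"wrapper({source.name})", default_parallelism, source)


default_parallelism = 8
source = Transformation("source", parallelism=2)

unwrapped_calc = downstream("calc", source)
wrapped_calc = downstream("calc", wrap_source(source, default_parallelism))
print(unwrapped_calc.parallelism)  # 2: the source setting leaks downstream
print(wrapped_calc.parallelism)    # 8: downstream keeps the default
```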

Looking forward to your suggestions.

Best,
Zhanghao Chen

From: Feng Jin
Sent: September 17, 2023 00:56
To: dev@flink.apache.org
Subject: Re: [DISCUSS] FLIP-367: Support Setting Parallelism for Table/SQL Sources

Hi, Zhanghao

Thank you for proposing this FLIP, it is a very meaningful feature.

I agree that currently we may only consider the parallelism setting of the
source itself. If we consider the parallelism setting of other operators,
it may make the entire design more complex.

Regarding the situation where the parallelism of the source is different
from that of downstream tasks, I did not find a more detailed description
in the FLIP.

By default, if the parallelism between two operators is different, the
rebalance partitioner will be used.
But in the SQL scenario, I believe that we should keep the behavior of
parallelism setting consistent with that of the sink.

1. When the source only generates insert-only data, if there is a mismatch
in parallelism between the source and downstream operators, rebalance is
used by default.

2. When the source generates update and delete data, we should require the
source to configure a primary key and then build a hash partitioner based
on that primary key.
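Feng Jin's two rules can be expressed as a small decision function. This is a sketch under stated assumptions: `Partitioner` and `choose_partitioner` are hypothetical names, the "forward when parallelism matches" branch is an assumption rather than something spelled out above, and the hash branch follows the reasoning that all changes to one key must reach the same subtask to keep their order.

```python
from enum import Enum
from typing import Sequence


class Partitioner(Enum):
    FORWARD = "forward"      # same parallelism: records stay on the same subtask
    REBALANCE = "rebalance"  # round-robin across downstream subtasks
    HASH = "hash"            # route by primary key so changes to one key stay ordered


def choose_partitioner(
    source_parallelism: int,
    downstream_parallelism: int,
    insert_only: bool,
    primary_key: Sequence[str],
) -> Partitioner:
    if source_parallelism == downstream_parallelism:
        return Partitioner.FORWARD
    if insert_only:
        return Partitioner.REBALANCE  # rule 1: insert-only data can be spread freely
    if not primary_key:
        # rule 2 cannot be satisfied without a key to hash on
        raise ValueError("source emits updates/deletes but declares no primary key")
    return Partitioner.HASH


print(choose_partitioner(4, 4, insert_only=False, primary_key=()).name)       # FORWARD
print(choose_partitioner(2, 8, insert_only=True, primary_key=()).name)        # REBALANCE
print(choose_partitioner(2, 8, insert_only=False, primary_key=("id",)).name)  # HASH
```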

WDYT?


Best,
Feng


On Sat, Sep 16, 2023 at 5:58 PM Jane Chan  wrote:

> Hi Zhanghao,
>
> Thanks for the explanation.
>
> For Q1, I think the key lies in determining the boundary where the chain
> should be broken. However, this boundary is ultimately determined by the
> specific requirements of each user query.
>
> The most straightforward approach is breaking the chain after the source
> operator, even though it involves a tradeoff. This is because there may be
> instances of `StreamExecWatermarkAssigner`, `StreamExecMiniBatchAssigner`,
> or `StreamExecChangelogNormalize` occurring before the `StreamExecCalc`
> node, and it would be complex and challenging to enumerate all possible
> match patterns.
>
> A more complex workaround would be to provide an entry point for users to
> configure the specific operator that should serve as the breakpoint.
> However, this would further increase the complexity of this FLIP.
>
> However, if the parallelism of each operator can be configured (in the
> future), then this problem would not exist (it might be beyond the scope of
> discussion for this FLIP).
>
> I personally lean towards keeping the FLIP concise and focused by choosing
> the most straightforward approach. I would also like to hear others'
> opinions.
>
> Best,
> Jane
>
> On Sat, Sep 16, 2023 at 10:21 AM Yun Tang  wrote:
>
> > Hi Zhanghao,
> >
> > Certainly, I think we should keep this FLIP focused on setting the source
> > parallelism via DDL properties. I just want to clarify that, in my
> > experience, setting parallelism for individual operators is also
> > beneficial, which your FLIP somewhat downplays.
> >
> > @ConradJam BTW, compared with a SQL hint, I think using `scan.parallelism`
> > is better, as it aligns with the current `sink.parallelism`. And once we
> > introduce such an option, we can also use the dynamic table options SQL
> > hint[1] to configure the source parallelism.
> >
> > [1]
> >
> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/hints/#dynamic-table-options
> >
> >
> > Best
> > Yun Tang
> > 
> > From: ConradJam 
> > Sent: Friday, September 15, 2023 22:52
> > To: dev@flink.apache.org 
> > Subject: Re: [DISCUSS] FLIP-367: Support Setting Parallelism for Table/SQL Sources
> >
> > +1. Thanks for the FLIP and the discussion. I would like to ask whether we
> > could use SQL hint syntax to set this parallelism.
> >
> > On Fri, Sep 15, 2023 at 20:52 Martijn Visser wrote:
> >
> > > Hi everyone,
> > >
> > > Thanks for the FLIP and the discussion. I find it exciting. Thanks for
> > > pushing for this.
> > >
> > > Best regards,
> > >
> > > Martijn
> > >
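Yun Tang's suggestion (a `scan.parallelism` table option that can also be overridden through the dynamic table options hint, e.g. `SELECT * FROM t /*+ OPTIONS('scan.parallelism'='8') */`) implies a simple precedence: hint options override DDL options. The sketch below only models that precedence; `scan.parallelism` was still a proposed name at this point in the thread, and `parse_options_hint` and `effective_options` are hypothetical helpers, not Flink's parser.

```python
import re
from typing import Dict

# Matches 'key'='value' pairs inside an OPTIONS(...) hint body.
_PAIR = re.compile(r"'([^']*)'\s*=\s*'([^']*)'")


def parse_options_hint(hint: str) -> Dict[str, str]:
    """Extract key/value pairs from e.g. "OPTIONS('scan.parallelism'='8')".
    Escaped quotes inside values are not handled in this sketch."""
    return dict(_PAIR.findall(hint))


def effective_options(ddl_options: Dict[str, str], hint: str) -> Dict[str, str]:
    """Options declared in the DDL, overridden by any options given in the hint."""
    merged = dict(ddl_options)
    merged.update(parse_options_hint(hint))
    return merged


ddl = {"connector": "kafka", "scan.parallelism": "4"}
opts = effective_options(ddl, "OPTIONS('scan.parallelism'='8')")
print(opts["scan.parallelism"], opts["connector"])  # 8 kafka
```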

Re: [VOTE] FLIP-363: Unify the Representation of TaskManager Location in REST API and Web UI

2023-09-19 Thread Chen Zhanghao
Hi Devs,

Thanks for all the feedback on FLIP-363: Unify the Representation of 
TaskManager Location in REST API and Web UI [1][2]. Given that consensus on 
the naming issue has been reached (using "endpoint" instead of "location"), 
I'd like to restart the vote for it. The vote will be open for at least 72 
hours (until Sep 22, 12:00 PM GMT) unless there is an objection or 
insufficient votes.

[1] 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-363%3A+Unify+the+Representation+of+TaskManager+Location+in+REST+API+and+Web+UI
[2] https://lists.apache.org/thread/sls1196mmk25w8nm2qf585254nbjr9hd

Best,
Zhanghao Chen
____
From: Chen Zhanghao 
Sent: September 18, 2023 19:19
To: dev@flink.apache.org ; Jing Ge 
Subject: Re: [VOTE] FLIP-363: Unify the Representation of TaskManager Location in 
REST API and Web UI

Thanks for pointing that out. Let's give it a bit more time to reach 
consensus on the naming issue and postpone the vote for now. Sorry for the 
inconvenience. I will send another email once the vote restarts.

Best,
Zhanghao Chen

From: Rui Fan <1996fan...@gmail.com>
Sent: September 18, 2023 11:55
To: dev@flink.apache.org ; Jing Ge 
Subject: Re: [VOTE] FLIP-363: Unify the Representation of TaskManager Location in 
REST API and Web UI

A gentle reminder about the location naming.
The name "location" is a little unclear, but
I can't think of a better one.

So I'll give my +1 (binding) first.

Ping @Jing Ge  to help double-check the name again.

Sorry for bringing up naming in the VOTE thread;
I didn't expect the VOTE to start so early.

Best,
Rui

On Mon, Sep 18, 2023 at 11:44 AM Yangze Guo  wrote:

> +1 (binding)
>
> Best,
> Yangze Guo
>
> On Mon, Sep 18, 2023 at 11:37 AM Chen Zhanghao
>  wrote:
> >
> > Hi All,
> >
> > Thanks for all the feedback on FLIP-363: Unify the Representation of
> TaskManager Location in REST API and Web UI [1][2]
> >
> > I'd like to start a vote for FLIP-363. The vote will be open for at
> least 72 hours unless there is an objection or insufficient votes.
> >
> > [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-363%3A+Unify+the+Representation+of+TaskManager+Location+in+REST+API+and+Web+UI
> > [2] https://lists.apache.org/thread/sls1196mmk25w8nm2qf585254nbjr9hd
> >
> > Best,
> > Zhanghao Chen
>


Re: [DISCUSS] FLIP-363: Unify the Representation of TaskManager Location in REST API and Web UI

2023-09-18 Thread Chen Zhanghao
Thanks to everyone who discussed here, I really appreciate it. I've updated the 
FLIP to change all occurrences of "location" to "endpoint" instead. Given that 
we've reached consensus on the topic, I'll restart the voting.

Best,
Zhanghao Chen

From: Rui Fan 
Sent: September 19, 2023 12:16
To: Chen Zhanghao ; dev 
Subject: Re: [DISCUSS] FLIP-363: Unify the Representation of TaskManager Location in 
REST API and Web UI

+1, thanks everyone who discussed here.

Best,
Rui


On Tue, 19 Sep 2023 at 11:41, Chen Zhanghao 
<zhanghao.c...@outlook.com> wrote:
Hi Jing,

Thanks for the clarification, I now see the point. +1 for using endpoint now. 
@fan...@apache.org WDYT?

Best,
Zhanghao Chen

From: Yangze Guo <karma...@gmail.com>
Sent: September 19, 2023 11:18
To: dev@flink.apache.org
Subject: Re: [DISCUSS] FLIP-363: Unify the Representation of TaskManager Location in 
REST API and Web UI

Thanks for the clarification, Jing. I agree that using another term
like endpoint can help us to distinguish it from the existing concept
of "location". +1 for using the term endpoint and introducing
TaskManagerLocation.getEndpoint().

Best,
Yangze Guo

On Mon, Sep 18, 2023 at 11:52 PM Jing Ge  wrote:
>
> Hi Zhanghao,
>
> That is exactly the reason why location should not be used, because there
> is a clear definition of location in Flink, e.g. TaskManagerLocation which
> contains more information than hostname+port. If you think endpoint is too
> generic, how about locationEndpoint? But if we build that format logic into
> Location classes, it will look like
> TaskManagerLocation.getLocationEndpoint() with redundant "location".
> TaskManagerLocation.getEndpoint() is better.
> TaskManagerLocation.getLocation(),
> TaskManagerLocation.getLocationAsString(), or similar names in that
> direction are even worse.
>
> Best regards,
> Jing
>
> On Wed, Sep 13, 2023 at 2:52 PM Chen Zhanghao 
> <zhanghao.c...@outlook.com>
> wrote:
>
> > Hi Jing,
> >
> > Thanks for the suggestion. Endpoint is indeed a more professional word in
> > the networking world but I think location is more suited here for two
> > reasons:
> >
> >   1.  The term here is used to uniquely identify the TaskManager where the
> > task is deployed, while also providing host machine info to help
> > identify problems concentrated on a specific TaskManager or host. So,
> > strictly speaking, it is not used in a pure networking context.
> >   2.  The term "location" is already used widely in the codebase, e.g.
> > TaskManagerLocation and JobExceptions-related classes.
> >
> > WDYT?
> >
> > Best,
> > Zhanghao Chen
> > 
> > From: Jing Ge 
> > Sent: September 13, 2023 4:52
> > To: dev@flink.apache.org
> > Subject: Re: [DISCUSS] FLIP-363: Unify the Representation of TaskManager
> > Location in REST API and Web UI
> >
> > Hi Zhanghao,
> >
> > Thanks for bringing this to our attention. It is a good proposal to improve
> > data consistency.
> >
> > Speaking of naming conventions of choosing location over host, how about
> > "endpoint" with the following thoughts:
> >
> > 1. endpoint is a more professional word than location in the network
> > context.
> > 2. I know endpoints commonly mean the URLs of services. Using hostname:port
> > as the endpoint follows exactly the same rule, because TaskManager is the
> > top level service that aligns with the top level endpoint.
> >
> > WDYT?
> >
> > Best regards,
> > Jing
> >
> >
> > On Mon, Sep 11, 2023 at 6:01 AM Weihua Hu 
> > <huweihua@gmail.com> wrote:
> >
> > > Hi, Zhanghao
> > >
> > > Since the meaning of "host" is not aligned, it seems good to me to
> > > remove it in a future release.
> > >
> > > Best,
> > > Weihua
> > >
> > >
> > > On Mon, Sep 11, 2023 at 11:48 AM Chen Zhanghao <
> > zhanghao.c...@outlook.com>
> > > wrote:
> > >
> > > > Hi Fan Rui,
> > > >
> > > > Thanks for clarifying the definition of "public interfaces", that
> > helps a
> > > > lot!
> > > >
> > > > Best,
> > > > Zhanghao Chen
> > > > _

Re: [DISCUSS] FLIP-363: Unify the Representation of TaskManager Location in REST API and Web UI

2023-09-18 Thread Chen Zhanghao
Hi Jing,

Thanks for the clarification, I now see the point. +1 for using endpoint now. 
@fan...@apache.org WDYT?

Best,
Zhanghao Chen

From: Yangze Guo 
Sent: September 19, 2023 11:18
To: dev@flink.apache.org 
Subject: Re: [DISCUSS] FLIP-363: Unify the Representation of TaskManager Location in 
REST API and Web UI

Thanks for the clarification, Jing. I agree that using another term
like endpoint can help us to distinguish it from the existing concept
of "location". +1 for using the term endpoint and introducing
TaskManagerLocation.getEndpoint().

Best,
Yangze Guo

On Mon, Sep 18, 2023 at 11:52 PM Jing Ge  wrote:
>
> Hi Zhanghao,
>
> That is exactly the reason why location should not be used, because there
> is a clear definition of location in Flink, e.g. TaskManagerLocation which
> contains more information than hostname+port. If you think endpoint is too
> generic, how about locationEndpoint? But if we build that format logic into
> Location classes, it will look like
> TaskManagerLocation.getLocationEndpoint() with redundant "location".
> TaskManagerLocation.getEndpoint() is better.
> TaskManagerLocation.getLocation(),
> TaskManagerLocation.getLocationAsString(), or similar names in that
> direction are even worse.
>
> Best regards,
> Jing
>
> On Wed, Sep 13, 2023 at 2:52 PM Chen Zhanghao 
> wrote:
>
> > Hi Jing,
> >
> > Thanks for the suggestion. Endpoint is indeed a more professional word in
> > the networking world but I think location is more suited here for two
> > reasons:
> >
> >   1.  The term here is used to uniquely identify the TaskManager where the
> > task is deployed, while also providing host machine info to help
> > identify problems concentrated on a specific TaskManager or host. So,
> > strictly speaking, it is not used in a pure networking context.
> >   2.  The term "location" is already used widely in the codebase, e.g.
> > TaskManagerLocation and JobExceptions-related classes.
> >
> > WDYT?
> >
> > Best,
> > Zhanghao Chen
> > 
> > From: Jing Ge 
> > Sent: September 13, 2023 4:52
> > To: dev@flink.apache.org 
> > Subject: Re: [DISCUSS] FLIP-363: Unify the Representation of TaskManager
> > Location in REST API and Web UI
> >
> > Hi Zhanghao,
> >
> > Thanks for bringing this to our attention. It is a good proposal to improve
> > data consistency.
> >
> > Speaking of naming conventions of choosing location over host, how about
> > "endpoint" with the following thoughts:
> >
> > 1. endpoint is a more professional word than location in the network
> > context.
> > 2. I know endpoints commonly mean the URLs of services. Using hostname:port
> > as the endpoint follows exactly the same rule, because TaskManager is the
> > top level service that aligns with the top level endpoint.
> >
> > WDYT?
> >
> > Best regards,
> > Jing
> >
> >
> > On Mon, Sep 11, 2023 at 6:01 AM Weihua Hu  wrote:
> >
> > > Hi, Zhanghao
> > >
> > > Since the meaning of "host" is not aligned, it seems good to me to
> > > remove it in a future release.
> > >
> > > Best,
> > > Weihua
> > >
> > >
> > > On Mon, Sep 11, 2023 at 11:48 AM Chen Zhanghao <
> > zhanghao.c...@outlook.com>
> > > wrote:
> > >
> > > > Hi Fan Rui,
> > > >
> > > > Thanks for clarifying the definition of "public interfaces", that
> > helps a
> > > > lot!
> > > >
> > > > Best,
> > > > Zhanghao Chen
> > > > 
> > > > From: Rui Fan <1996fan...@gmail.com>
> > > > Sent: September 11, 2023 11:18
> > > > To: dev@flink.apache.org 
> > > > Subject: Re: [DISCUSS] FLIP-363: Unify the Representation of TaskManager
> > > > Location in REST API and Web UI
> > > >
> > > > Thanks Zhanghao for driving this FLIP. Adding the port in the Web UI
> > > > seems good to me.
> > > >
> > > > Hi Shammon and Zhanghao,
> > > >
> > > > I would like to clarify the difference between Public Interfaces
> > > > in FLIP and @Public in code.
> > > >
> > > > As I understand it, "Public Interfaces" in a FLIP means
> > > > changes that will be used on the user side, such as @Public classes,
> > > > Configuration settings, User-facing scripts/command-line tools,
> > > > and rest api, e

Re: [VOTE] FLIP-363: Unify the Representation of TaskManager Location in REST API and Web UI

2023-09-18 Thread Chen Zhanghao
Thanks for pointing that out. Let's give it a bit more time to reach 
consensus on the naming issue and postpone the vote for now. Sorry for the 
inconvenience. I will send another email once the vote restarts.

Best,
Zhanghao Chen

From: Rui Fan <1996fan...@gmail.com>
Sent: September 18, 2023 11:55
To: dev@flink.apache.org ; Jing Ge 
Subject: Re: [VOTE] FLIP-363: Unify the Representation of TaskManager Location in 
REST API and Web UI

A gentle reminder about the location naming.
The name "location" is a little unclear, but
I can't think of a better one.

So I'll give my +1 (binding) first.

Ping @Jing Ge  to help double-check the name again.

Sorry for bringing up naming in the VOTE thread;
I didn't expect the VOTE to start so early.

Best,
Rui

On Mon, Sep 18, 2023 at 11:44 AM Yangze Guo  wrote:

> +1 (binding)
>
> Best,
> Yangze Guo
>
> On Mon, Sep 18, 2023 at 11:37 AM Chen Zhanghao
>  wrote:
> >
> > Hi All,
> >
> > Thanks for all the feedback on FLIP-363: Unify the Representation of
> TaskManager Location in REST API and Web UI [1][2]
> >
> > I'd like to start a vote for FLIP-363. The vote will be open for at
> least 72 hours unless there is an objection or insufficient votes.
> >
> > [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-363%3A+Unify+the+Representation+of+TaskManager+Location+in+REST+API+and+Web+UI
> > [2] https://lists.apache.org/thread/sls1196mmk25w8nm2qf585254nbjr9hd
> >
> > Best,
> > Zhanghao Chen
>


[VOTE] FLIP-363: Unify the Representation of TaskManager Location in REST API and Web UI

2023-09-17 Thread Chen Zhanghao
Hi All,

Thanks for all the feedback on FLIP-363: Unify the Representation of 
TaskManager Location in REST API and Web UI [1][2]

I'd like to start a vote for FLIP-363. The vote will be open for at least 72 
hours unless there is an objection or insufficient votes.

[1] 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-363%3A+Unify+the+Representation+of+TaskManager+Location+in+REST+API+and+Web+UI
[2] https://lists.apache.org/thread/sls1196mmk25w8nm2qf585254nbjr9hd

Best,
Zhanghao Chen


Re: [DISCUSS] FLIP-363: Unify the Representation of TaskManager Location in REST API and Web UI

2023-09-17 Thread Chen Zhanghao
Hi all,

I've updated the FLIP to incorporate Yangze's advice:

1. Add a new string formatter method to TaskManagerLocation and 
ArchivedTaskManagerLocation that prints in the form of "${hostname}:${port}", to 
align with the string format used by the REST API.
2. Highlight that the old host field will be kept for at least 2 minor versions 
before removal.
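The two changes above can be sketched with a simplified, hypothetical stand-in class; it is NOT Flink's TaskManagerLocation or ArchivedTaskManagerLocation. It shows one formatter that owns the "${hostname}:${port}" representation (named `getEndpoint()`, the name the thread eventually agreed on) and a legacy accessor kept as `@Deprecated`. Which port is rendered, and the assumption that the legacy value is the bare hostname, are illustrative only.

```java
import java.util.Objects;

// Hypothetical stand-in, not Flink's TaskManagerLocation.
final class TaskManagerEndpointSketch {
    private final String hostname;
    private final int port;

    TaskManagerEndpointSketch(String hostname, int port) {
        this.hostname = Objects.requireNonNull(hostname, "hostname");
        if (port < 0 || port > 65535) {
            throw new IllegalArgumentException("invalid port: " + port);
        }
        this.port = port;
    }

    /** Single place that defines the "${hostname}:${port}" representation. */
    String getEndpoint() {
        return hostname + ":" + port;
    }

    /**
     * Legacy value, assumed here to be the bare hostname.
     *
     * @deprecated use {@link #getEndpoint()}; kept for at least two minor
     *     releases before removal.
     */
    @Deprecated
    String getHost() {
        return hostname;
    }
}
```

Keeping the format in one method means the REST API and Web UI cannot drift apart again, and the deprecated accessor gives existing clients the migration window discussed in the thread.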

Best,
Zhanghao Chen

From: Yangze Guo 
Sent: September 15, 2023 17:26
To: dev@flink.apache.org 
Subject: Re: [DISCUSS] FLIP-363: Unify the Representation of TaskManager Location in 
REST API and Web UI

Thanks for driving this, Zhanghao. +1 for the overall proposal.

My two cents:

1. Since most of the REST APIs get the location from
TaskManagerLocation, we can align the string format in this class,
e.g. by adding something like toHumanReadableString to it.

2. From my understanding of FLIP-321, if we want to deprecate the host
field, we should annotate the related field / getter / setter with
@Deprecated in this version and keep them for at least 2 minor releases.

Best,
Yangze Guo

On Wed, Sep 13, 2023 at 8:52 PM Chen Zhanghao  wrote:
>
> Hi Jing,
>
> Thanks for the suggestion. Endpoint is indeed a more professional word in the 
> networking world but I think location is more suited here for two reasons:
>
>   1.  The term here is used to uniquely identify the TaskManager where the 
> task is deployed, while also providing host machine info to help identify 
> problems concentrated on a specific TaskManager or host. So, strictly 
> speaking, it is not used in a pure networking context.
>   2.  The term "location" is already used widely in the codebase, e.g. 
> TaskManagerLocation and JobExceptions-related classes.
>
> WDYT?
>
> Best,
> Zhanghao Chen
> 
> From: Jing Ge 
> Sent: September 13, 2023 4:52
> To: dev@flink.apache.org 
> Subject: Re: [DISCUSS] FLIP-363: Unify the Representation of TaskManager Location 
> in REST API and Web UI
>
> Hi Zhanghao,
>
> Thanks for bringing this to our attention. It is a good proposal to improve
> data consistency.
>
> Speaking of naming conventions of choosing location over host, how about
> "endpoint" with the following thoughts:
>
> 1. endpoint is a more professional word than location in the network
> context.
> 2. I know endpoints commonly mean the URLs of services. Using hostname:port
> as the endpoint follows exactly the same rule, because TaskManager is the
> top level service that aligns with the top level endpoint.
>
> WDYT?
>
> Best regards,
> Jing
>
>
> On Mon, Sep 11, 2023 at 6:01 AM Weihua Hu  wrote:
>
> > Hi, Zhanghao
> >
> > Since the meaning of "host" is not aligned, it seems good to me to remove
> > it in a future release.
> >
> > Best,
> > Weihua
> >
> >
> > On Mon, Sep 11, 2023 at 11:48 AM Chen Zhanghao 
> > wrote:
> >
> > > Hi Fan Rui,
> > >
> > > Thanks for clarifying the definition of "public interfaces", that helps a
> > > lot!
> > >
> > > Best,
> > > Zhanghao Chen
> > > 
> > > From: Rui Fan <1996fan...@gmail.com>
> > > Sent: September 11, 2023 11:18
> > > To: dev@flink.apache.org 
> > > Subject: Re: [DISCUSS] FLIP-363: Unify the Representation of TaskManager
> > > Location in REST API and Web UI
> > >
> > > Thanks Zhanghao for driving this FLIP. Adding the port in the Web UI
> > > seems good to me.
> > >
> > > Hi Shammon and Zhanghao,
> > >
> > > I would like to clarify the difference between Public Interfaces
> > > in FLIP and @Public in code.
> > >
> > > As I understand it, "Public Interfaces" in a FLIP means
> > > changes that will be used on the user side, such as @Public classes,
> > > Configuration settings, User-facing scripts/command-line tools,
> > > and rest api, etc.
> > >
> > > You can refer to  "What are the "public interfaces" of the project?"
> > > part in Flink Improvement Proposals doc[1].
> > >
> > > @Public class means the user will use this class directly, and
> > > these rest classes won't be depended on directly. So I think
> > > these classes related to rest don't need to be marked @Public.
> > >
> > > Please correct me if anything is wrong, thanks~
> > >
> > > [1]
> > >
> > >
> > https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals
> > >
> > > Best,
> > > Rui
> > >
> > > On Mon, Sep 11, 

Re: [DISCUSS] FLIP-367: Support Setting Parallelism for Table/SQL Sources

2023-09-15 Thread Chen Zhanghao
Hi Jane,

Thanks for the valuable suggestions.

For Q1, it's indeed an issue. Some possible ideas include introducing a fake 
transformation after the source that takes the global default parallelism, or 
simply making exec nodes take the global default parallelism, but both approaches 
prevent potential chaining opportunities, and I'm not sure that's acceptable. 
We'll need to give it deeper thought and polish our proposal. We'd also be 
more than glad to hear your input on it.

For Q2, scan.parallelism will take higher precedence, as the more specific config 
should take precedence over the more general one.
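A minimal sketch of that precedence rule, assuming the proposed `scan.parallelism` table option and the existing `table.exec.resource.default-parallelism` job option, whose value -1 means "not set" and falls back to the StreamExecutionEnvironment parallelism. The class and method are hypothetical; this is not how the planner is actually structured.

```java
import java.util.Map;

// Hypothetical sketch: most specific setting wins.
final class SourceParallelismPrecedenceSketch {
    static final String SCAN_PARALLELISM = "scan.parallelism";
    static final String DEFAULT_PARALLELISM = "table.exec.resource.default-parallelism";

    static int resolve(Map<String, String> tableOptions,
                       Map<String, String> jobConfig,
                       int environmentParallelism) {
        // 1. Per-source option from the DDL (or a hint): most specific.
        String scan = tableOptions.get(SCAN_PARALLELISM);
        if (scan != null) {
            return Integer.parseInt(scan.trim());
        }
        // 2. Job-wide Table/SQL default; -1 (or any non-positive value) means "not set".
        String fallback = jobConfig.get(DEFAULT_PARALLELISM);
        if (fallback != null) {
            int value = Integer.parseInt(fallback.trim());
            if (value > 0) {
                return value;
            }
        }
        // 3. Otherwise use the execution environment's parallelism.
        return environmentParallelism;
    }
}
```

With `scan.parallelism = 4` and a job default of 16, the source runs with 4; without the table option it runs with 16; with the default left at -1 it falls back to the environment value.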

Best,
Zhanghao Chen

From: Jane Chan 
Sent: September 15, 2023 11:56
To: dev@flink.apache.org 
Cc: dewe...@outlook.com 
Subject: Re: [DISCUSS] FLIP-367: Support Setting Parallelism for Table/SQL Sources

Hi, Zhanghao, Dewei,

Thanks for initiating this discussion. This feature is valuable in
providing more flexibility for performance tuning for SQL pipelines.

Here are my two cents,

1. In the FLIP, you mentioned concerns about the parallelism of the calc
node and concluded to "leave the behavior unchanged for now."  This means
that the calc node will use the parallelism of the source operator,
regardless of whether the source parallelism is configured or not. If I
understand correctly, currently, except for the sink exec node (which has
the ability to configure its own parallelism), the rest of the exec nodes
accept their input parallelism. From the design, I didn't see the details
about handling input and default parallelism for the rest of the exec
nodes. Can you elaborate on the details?
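Jane's point can be made concrete with a small, hypothetical model of a linear plan (source, N intermediate exec nodes such as calc, sink). It encodes her understanding that intermediate nodes accept their input parallelism and only the sink can override it, plus, as a flag, the "reset to the default after the source" idea from Zhanghao's reply. None of this is Flink planner code; names and the linear-plan shape are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.OptionalInt;

// Hypothetical model of parallelism propagation in a linear plan.
final class ParallelismPropagationSketch {

    /**
     * Returns the parallelism of each node in order: source, intermediates..., sink.
     *
     * @param resetAfterSource if true, intermediate nodes use the default parallelism
     *     instead of inheriting the source's; this prevents chaining with the source
     *     whenever the two values differ.
     */
    static List<Integer> plan(OptionalInt sourceParallelism, int defaultParallelism,
                              int intermediateNodes, OptionalInt sinkParallelism,
                              boolean resetAfterSource) {
        List<Integer> result = new ArrayList<>();
        int current = sourceParallelism.orElse(defaultParallelism);
        result.add(current);
        if (resetAfterSource) {
            current = defaultParallelism;
        }
        for (int i = 0; i < intermediateNodes; i++) {
            // Intermediate exec nodes accept their input's parallelism.
            result.add(current);
        }
        // Only the sink (e.g. via sink.parallelism) can override the inherited value.
        result.add(sinkParallelism.orElse(current));
        return result;
    }
}
```

For example, a source parallelism of 4 with a default of 16 and two calc nodes yields [4, 4, 4, 4] by inheritance, so the calc nodes silently follow the source; with the reset flag it yields [4, 16, 16, 16], at the cost of a chain break after the source.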

2. Does the configuration `table.exec.resource.default-parallelism` take
precedence over `scan.parallelism`?

Best,
Jane

On Fri, Sep 15, 2023 at 10:43 AM Yun Tang  wrote:

> Thanks for creating this FLIP,
>
> Many users want to configure the source parallelism via DDL, just as they
> can configure the sink parallelism. Looking forward to this feature.
>
> BTW, I think setting parallelism for each operator would also be
> valuable, and this should work with the compiled plan [1] instead of SQL DDL.
>
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-292%3A+Enhance+COMPILED+PLAN+to+support+operator-level+state+TTL+configuration
>
> Best
> Yun Tang
> 
> From: Benchao Li 
> Sent: Thursday, September 14, 2023 19:53
> To: dev@flink.apache.org 
> Cc: dewe...@outlook.com 
> Subject: Re: [DISCUSS] FLIP-367: Support Setting Parallelism for Table/SQL
> Sources
>
> Thanks Zhanghao, Dewei for preparing the FLIP,
>
> I think this is a long awaited feature, and I appreciate your effort,
> especially the "Other concerns" part you listed.
>
> Regarding the parallelism of transformations following the source
> transformation, it's indeed a problem that we initially wanted to solve
> when we introduced this feature internally. I'd like to hear more
> opinions on this. Personally I'm ok to leave it out of this FLIP for
> the time being.
>
Chen Zhanghao wrote on Thu, Sep 14, 2023 at 14:46:
> >
> > Hi Devs,
> >
> > Dewei (cced) and I would like to start a discussion on FLIP-367: Support
> Setting Parallelism for Table/SQL Sources [1].
> >
> > Currently, Flink Table/SQL jobs do not expose fine-grained control of
> operator parallelism to users. FLIP-146 [2] brings us support for setting
> parallelism for sinks, but except for that, one can only set a default
> global parallelism and all other operators share the same parallelism.
> However, in many cases, setting parallelism for sources individually is
> preferable:
> >
> > - Many connectors have an upper bound on the parallelism at which they can
> > efficiently ingest data. For example, the parallelism of a Kafka source is
> > bounded by the number of partitions; any extra tasks would be idle.
> > - Other operators may involve intensive computation and need a larger
> parallelism.
> >
> > We propose to improve the current situation by extending the current
> table source API to support setting parallelism for Table/SQL sources via
> connector options.
> >
> > Looking forward to your feedback.
> >
> > [1] FLIP-367: Support Setting Parallelism for Table/SQL Sources - Apache
> Flink - Apache Software Foundation<
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=263429150
> >
> > [2] FLIP-146: Improve new TableSource and TableSink interfaces - Apache
> Flink - Apache Software Foundation<
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-146%3A+Improve+new+TableSource+and+TableSink+interfaces
> >
> >
> > Best,
> > Zhanghao Chen
>
>
>
> --
>
> Best,
> Benchao Li
>


Re: [DISCUSS] FLIP-367: Support Setting Parallelism for Table/SQL Sources

2023-09-15 Thread Chen Zhanghao
Hi Yun,

Thanks for the input. I agree that setting parallelism for each operator is 
also valuable, and the compiled plan should work. However, this would 
significantly increase the complexity. I'd rather keep the FLIP focused and 
leave it for future work. WDYT?

Best,
Zhanghao Chen

From: Yun Tang 
Sent: September 15, 2023 10:32
To: dev@flink.apache.org 
Cc: dewe...@outlook.com 
Subject: Re: [DISCUSS] FLIP-367: Support Setting Parallelism for Table/SQL Sources

Thanks for creating this FLIP,

Many users want to configure the source parallelism via DDL, just as they can 
configure the sink parallelism. Looking forward to this feature.

BTW, I think setting parallelism for each operator would also be valuable, and 
this should work with the compiled plan [1] instead of SQL DDL.


[1] 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-292%3A+Enhance+COMPILED+PLAN+to+support+operator-level+state+TTL+configuration

Best
Yun Tang

From: Benchao Li 
Sent: Thursday, September 14, 2023 19:53
To: dev@flink.apache.org 
Cc: dewe...@outlook.com 
Subject: Re: [DISCUSS] FLIP-367: Support Setting Parallelism for Table/SQL 
Sources

Thanks Zhanghao, Dewei for preparing the FLIP,

I think this is a long awaited feature, and I appreciate your effort,
especially the "Other concerns" part you listed.

Regarding the parallelism of transformations following the source
transformation, it's indeed a problem that we initially wanted to solve
when we introduced this feature internally. I'd like to hear more
opinions on this. Personally I'm ok to leave it out of this FLIP for
the time being.

Chen Zhanghao wrote on Thu, Sep 14, 2023 at 14:46:
>
> Hi Devs,
>
> Dewei (cced) and I would like to start a discussion on FLIP-367: Support 
> Setting Parallelism for Table/SQL Sources [1].
>
> Currently, Flink Table/SQL jobs do not expose fine-grained control of 
> operator parallelism to users. FLIP-146 [2] brings us support for setting 
> parallelism for sinks, but except for that, one can only set a default global 
> parallelism and all other operators share the same parallelism. However, in 
> many cases, setting parallelism for sources individually is preferable:
>
> - Many connectors have an upper bound on the parallelism at which they can 
> efficiently ingest data. For example, the parallelism of a Kafka source is 
> bounded by the number of partitions; any extra tasks would be idle.
> - Other operators may involve intensive computation and need a larger 
> parallelism.
>
> We propose to improve the current situation by extending the current table 
> source API to support setting parallelism for Table/SQL sources via connector 
> options.
>
> Looking forward to your feedback.
>
> [1] FLIP-367: Support Setting Parallelism for Table/SQL Sources - Apache 
> Flink - Apache Software 
> Foundation<https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=263429150>
> [2] FLIP-146: Improve new TableSource and TableSink interfaces - Apache Flink 
> - Apache Software 
> Foundation<https://cwiki.apache.org/confluence/display/FLINK/FLIP-146%3A+Improve+new+TableSource+and+TableSink+interfaces>
>
> Best,
> Zhanghao Chen



--

Best,
Benchao Li


Re: [DISCUSS] FLIP-363: Unify the Representation of TaskManager Location in REST API and Web UI

2023-09-15 Thread Chen Zhanghao
Hi Yangze,

Thanks for the input. I totally agree with you and will take the advice.

Best,
Zhanghao Chen

From: Yangze Guo 
Sent: September 15, 2023 17:26
To: dev@flink.apache.org 
Subject: Re: [DISCUSS] FLIP-363: Unify the Representation of TaskManager Location in 
REST API and Web UI

Thanks for driving this, Zhanghao. +1 for the overall proposal.

My two cents:

1. Since most of the REST APIs get the location from
TaskManagerLocation, we can align the string format in this class,
e.g. by adding something like toHumanReadableString to it.

2. From my understanding of FLIP-321, if we want to deprecate the host
field, we should annotate the related field / getter / setter with
@Deprecated in this version and keep them for at least 2 minor releases.

Best,
Yangze Guo

On Wed, Sep 13, 2023 at 8:52 PM Chen Zhanghao  wrote:
>
> Hi Jing,
>
> Thanks for the suggestion. Endpoint is indeed a more professional word in the 
> networking world but I think location is more suited here for two reasons:
>
>   1.  The term here is used to uniquely identify the TaskManager where the 
> task is deployed, while also providing host machine info to help identify 
> problems concentrated on a specific TaskManager or host. So, strictly 
> speaking, it is not used in a pure networking context.
>   2.  The term "location" is already used widely in the codebase, e.g. 
> TaskManagerLocation and JobExceptions-related classes.
>
> WDYT?
>
> Best,
> Zhanghao Chen
> 
> From: Jing Ge 
> Sent: September 13, 2023 4:52
> To: dev@flink.apache.org 
> Subject: Re: [DISCUSS] FLIP-363: Unify the Representation of TaskManager Location 
> in REST API and Web UI
>
> Hi Zhanghao,
>
> Thanks for bringing this to our attention. It is a good proposal to improve
> data consistency.
>
> Speaking of naming conventions of choosing location over host, how about
> "endpoint" with the following thoughts:
>
> 1. endpoint is a more professional word than location in the network
> context.
> 2. I know endpoints commonly mean the URLs of services. Using hostname:port
> as the endpoint follows exactly the same rule, because TaskManager is the
> top level service that aligns with the top level endpoint.
>
> WDYT?
>
> Best regards,
> Jing
>
>
> On Mon, Sep 11, 2023 at 6:01 AM Weihua Hu  wrote:
>
> > Hi, Zhanghao
> >
> > Since the meaning of "host" is not aligned, it seems good to me to remove
> > it in a future release.
> >
> > Best,
> > Weihua
> >
> >
> > On Mon, Sep 11, 2023 at 11:48 AM Chen Zhanghao 
> > wrote:
> >
> > > Hi Fan Rui,
> > >
> > > Thanks for clarifying the definition of "public interfaces", that helps a
> > > lot!
> > >
> > > Best,
> > > Zhanghao Chen
> > > 
> > > From: Rui Fan <1996fan...@gmail.com>
> > > Sent: September 11, 2023 11:18
> > > To: dev@flink.apache.org 
> > > Subject: Re: [DISCUSS] FLIP-363: Unify the Representation of TaskManager
> > > Location in REST API and Web UI
> > >
> > > Thanks Zhanghao for driving this FLIP. Adding the port in the Web UI
> > > seems good to me.
> > >
> > > Hi Shammon and Zhanghao,
> > >
> > > I would like to clarify the difference between Public Interfaces
> > > in FLIP and @Public in code.
> > >
> > > As I understand it, "Public Interfaces" in a FLIP means
> > > changes that will be used on the user side, such as @Public classes,
> > > Configuration settings, User-facing scripts/command-line tools,
> > > and rest api, etc.
> > >
> > > You can refer to  "What are the "public interfaces" of the project?"
> > > part in Flink Improvement Proposals doc[1].
> > >
> > > @Public class means the user will use this class directly, and
> > > these rest classes won't be depended on directly. So I think
> > > these classes related to rest don't need to be marked @Public.
> > >
> > > Please correct me if anything is wrong, thanks~
> > >
> > > [1]
> > >
> > >
> > https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals
> > >
> > > Best,
> > > Rui
> > >
> > > On Mon, Sep 11, 2023 at 11:09 AM Weihua Hu 
> > wrote:
> > >
> > > > Hi, Zhanghao
> > > >
> > > > Thanks for bringing this proposal.
> > > >
> > > > I have a concern:
> > > >
> > > > I prefer to keep the "host" fie

[DISCUSS] FLIP-367: Support Setting Parallelism for Table/SQL Sources

2023-09-13 Thread Chen Zhanghao
Hi Devs,

Dewei (cced) and I would like to start a discussion on FLIP-367: Support 
Setting Parallelism for Table/SQL Sources [1].

Currently, Flink Table/SQL jobs do not expose fine-grained control of operator 
parallelism to users. FLIP-146 [2] brings us support for setting parallelism 
for sinks, but except for that, one can only set a default global parallelism 
and all other operators share the same parallelism. However, in many cases, 
setting parallelism for sources individually is preferable:

- Many connectors have an upper bound on the parallelism at which they can 
efficiently ingest data. For example, the parallelism of a Kafka source is 
bounded by the number of partitions; any extra tasks would be idle.
- Other operators may involve intensive computation and need a larger 
parallelism.
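The Kafka example in the first bullet is simple arithmetic, sketched below under a loudly simplified assumption: partition i goes to subtask i % parallelism (round-robin). The real KafkaSource enumerator uses its own assignment logic, but any scheme that gives each partition to exactly one reader leaves at least parallelism minus partitions subtasks without data. The class is hypothetical.

```java
// Hypothetical sketch; assumes simplified round-robin partition assignment.
final class KafkaIdleSubtasksSketch {

    // Number of partitions each source subtask would read.
    static int[] partitionsPerSubtask(int parallelism, int partitions) {
        if (parallelism <= 0 || partitions < 0) {
            throw new IllegalArgumentException("parallelism must be > 0 and partitions >= 0");
        }
        int[] counts = new int[parallelism];
        for (int partition = 0; partition < partitions; partition++) {
            counts[partition % parallelism]++;
        }
        return counts;
    }

    // Subtasks that receive no partition and therefore stay idle.
    static int idleSubtasks(int parallelism, int partitions) {
        int idle = 0;
        for (int assigned : partitionsPerSubtask(parallelism, partitions)) {
            if (assigned == 0) {
                idle++;
            }
        }
        return idle;
    }
}
```

With 8 partitions, a source parallelism of 12 leaves 4 subtasks idle, while downstream operators may still benefit from the larger parallelism; this is the mismatch a per-source setting is meant to resolve.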

We propose to improve the current situation by extending the current table 
source API to support setting parallelism for Table/SQL sources via connector 
options.

Looking forward to your feedback.

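[1] FLIP-367: Support Setting Parallelism for Table/SQL Sources - Apache Flink 
- Apache Software Foundation <https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=263429150>
[2] FLIP-146: Improve new TableSource and TableSink interfaces - Apache Flink - 
Apache Software Foundation <https://cwiki.apache.org/confluence/display/FLINK/FLIP-146%3A+Improve+new+TableSource+and+TableSink+interfaces>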

Best,
Zhanghao Chen


Re: [VOTE] FLIP-361: Improve GC Metrics

2023-09-13 Thread Chen Zhanghao
+1 (non-binding). Looking forward to it.

Best,
Zhanghao Chen

From: Gyula Fóra 
Sent: September 13, 2023 21:16
To: dev 
Subject: [VOTE] FLIP-361: Improve GC Metrics

Hi All!

Thanks for all the feedback on FLIP-361: Improve GC Metrics [1][2]

I'd like to start a vote for it. The vote will be open for at least 72
hours unless there is an objection or insufficient votes.

Cheers,
Gyula

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-361%3A+Improve+GC+Metrics
[2] https://lists.apache.org/thread/qqqv54vyr4gbp63wm2d12q78m8h95xb2


Re: [VOTE] FLIP-334: Decoupling autoscaler and kubernetes and support the Standalone Autoscaler

2023-09-13 Thread Chen Zhanghao
Thanks for driving this. +1 (non-binding)

Best,
Zhanghao Chen

From: Rui Fan <1996fan...@gmail.com>
Sent: September 13, 2023 10:52
To: dev 
Subject: [VOTE] FLIP-334: Decoupling autoscaler and kubernetes and support the 
Standalone Autoscaler

Hi all,

Thanks for all the feedback about the FLIP-334:
Decoupling autoscaler and kubernetes and
support the Standalone Autoscaler[1].
This FLIP was discussed in [2].

I'd like to start a vote for it. The vote will be open for at least 72
hours (until Sep 16th 11:00 UTC+8) unless there is an objection or
insufficient votes.

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-334+%3A+Decoupling+autoscaler+and+kubernetes+and+support+the+Standalone+Autoscaler
[2] https://lists.apache.org/thread/kmm03gls1vw4x6vk1ypr9ny9q9522495

Best,
Rui


Re: [DISCUSS] FLIP-363: Unify the Representation of TaskManager Location in REST API and Web UI

2023-09-13 Thread Chen Zhanghao
Hi Jing,

Thanks for the suggestion. Endpoint is indeed a more professional word in the 
networking world but I think location is more suited here for two reasons:

  1.  The term here is used to uniquely identify the TaskManager where the task 
is deployed, while also providing host machine info to help identify problems 
concentrated on a specific TaskManager or host. So, strictly speaking, it is 
not used in a pure networking context.
  2.  The term "location" is already used widely in the codebase, e.g. 
TaskManagerLocation and JobExceptions-related classes.

WDYT?

Best,
Zhanghao Chen

From: Jing Ge
Sent: September 13, 2023 4:52
To: dev@flink.apache.org
Subject: Re: [DISCUSS] FLIP-363: Unify the Representation of TaskManager Location in REST API and Web UI

Hi Zhanghao,

Thanks for bringing this to our attention. It is a good proposal to improve
data consistency.

Speaking of the naming choice of "location" over "host", how about
"endpoint", given the following thoughts:

1. "Endpoint" is a more established term than "location" in a networking
context.
2. I know that endpoints commonly refer to the URLs of services. Using
hostname:port as the endpoint follows exactly the same rule, because the
TaskManager is the top-level service, which aligns with a top-level endpoint.

WDYT?

Best regards,
Jing


On Mon, Sep 11, 2023 at 6:01 AM Weihua Hu  wrote:

> Hi, Zhanghao
>
> Since the meaning of "host" is not consistent, it seems good to me to remove
> it in a future release.
>
> Best,
> Weihua
>
>
> On Mon, Sep 11, 2023 at 11:48 AM Chen Zhanghao 
> wrote:
>
> > Hi Fan Rui,
> >
> > Thanks for clarifying the definition of "public interfaces", that helps a
> > lot!
> >
> > Best,
> > Zhanghao Chen
> > 
> > From: Rui Fan <1996fan...@gmail.com>
> > Sent: September 11, 2023 11:18
> > To: dev@flink.apache.org
> > Subject: Re: [DISCUSS] FLIP-363: Unify the Representation of TaskManager
> > Location in REST API and Web UI
> >
> > Thanks Zhanghao for driving this FLIP; adding the port in the Web UI
> > seems good to me.
> >
> > Hi Shammon and Zhanghao,
> >
> > I would like to clarify the difference between Public Interfaces
> > in FLIP and @Public in code.
> >
> > As I understand it, `Public Interfaces` in a FLIP means changes
> > that are visible on the user side, such as @Public classes,
> > configuration settings, user-facing scripts/command-line tools,
> > the REST API, etc.
> >
> > You can refer to the "What are the "public interfaces" of the project?"
> > part of the Flink Improvement Proposals doc [1].
> >
> > @Public on a class means users depend on that class directly,
> > whereas these REST classes are not depended on directly. So I think
> > these REST-related classes don't need to be marked @Public.
> >
> > Please correct me if anything is wrong, thanks~
> >
> > [1]
> > https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals
> >
> > Best,
> > Rui
> >
> > On Mon, Sep 11, 2023 at 11:09 AM Weihua Hu
> > wrote:
> >
> > > Hi, Zhanghao
> > >
> > > Thanks for bringing this proposal.
> > >
> > > I have a concern:
> > >
> > > I prefer to keep the "host" field and add a "location" field in future
> > > versions.
> > > Consider a scenario where a machine (host) running multiple TaskManagers
> > > has poor processing performance due to some problem.
> > > By aggregating on the host field, I can identify the problems with this
> > > machine and take it offline.
> > >
> > > Best,
> > > Weihua
> > >
> > >
> > > On Mon, Sep 11, 2023 at 10:34 AM Chen Zhanghao <zhanghao.c...@outlook.com>
> > > wrote:
> > >
> > > > Hi Shammon,
> > > >
> > > > I think all REST API response messages (e.g.
> > > > SubtaskExecutionAttemptDetailsInfo) should be considered part of the
> > > > public API and therefore be marked as @Public. It is true, though, that
> > > > none of them are marked as @Public yet. Maybe we should do that. Cc'ing
> > > > @chesnay<mailto:ches...@apache.org> for confirmation.
> > > >
> > > > Best,
> > > > Zhanghao Chen
> > > > 
> > > > From: Shammon FY
> > > > Sent: September 11, 2023 10:22
> > > > To: dev@flink.apache.org
> > > > Subject: Re: [DISCUSS] FLIP-363: Unify the R

Re: [DISCUSS] FLIP-363: Unify the Representation of TaskManager Location in REST API and Web UI

2023-09-10 Thread Chen Zhanghao
Hi Fan Rui,

Thanks for clarifying the definition of "public interfaces", that helps a lot!

Best,
Zhanghao Chen

From: Rui Fan <1996fan...@gmail.com>
Sent: September 11, 2023 11:18
To: dev@flink.apache.org
Subject: Re: [DISCUSS] FLIP-363: Unify the Representation of TaskManager Location in REST API and Web UI

Thanks Zhanghao for driving this FLIP; adding the port in the Web UI
seems good to me.

Hi Shammon and Zhanghao,

I would like to clarify the difference between Public Interfaces
in FLIP and @Public in code.

As I understand it, `Public Interfaces` in a FLIP means changes
that are visible on the user side, such as @Public classes,
configuration settings, user-facing scripts/command-line tools,
the REST API, etc.

You can refer to the "What are the "public interfaces" of the project?"
part of the Flink Improvement Proposals doc [1].

@Public on a class means users depend on that class directly,
whereas these REST classes are not depended on directly. So I think
these REST-related classes don't need to be marked @Public.

Please correct me if anything is wrong, thanks~

[1]
https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals

Best,
Rui

On Mon, Sep 11, 2023 at 11:09 AM Weihua Hu  wrote:

> Hi, Zhanghao
>
> Thanks for bringing this proposal.
>
> I have a concern:
>
> I prefer to keep the "host" field and add a "location" field in future
> versions.
> Consider a scenario where a machine (host) running multiple TaskManagers has
> poor processing performance due to some problem.
> By aggregating on the host field, I can identify the problems with this
> machine and take it offline.
>
> Best,
> Weihua
>
>
> On Mon, Sep 11, 2023 at 10:34 AM Chen Zhanghao 
> wrote:
>
> > Hi Shammon,
> >
> > I think all REST API response messages (e.g.
> > SubtaskExecutionAttemptDetailsInfo) should be considered part of the
> > public API and therefore be marked as @Public. It is true, though, that
> > none of them are marked as @Public yet. Maybe we should do that. Cc'ing
> > @chesnay<mailto:ches...@apache.org> for confirmation.
> >
> > Best,
> > Zhanghao Chen
> > 
> > From: Shammon FY
> > Sent: September 11, 2023 10:22
> > To: dev@flink.apache.org
> > Subject: Re: [DISCUSS] FLIP-363: Unify the Representation of TaskManager
> > Location in REST API and Web UI
> >
> > Thanks Zhanghao for initiating this discussion. I have just one comment:
> >
> > I checked the classes `SubtasksAllAccumulatorsHandler`,
> > `SubtasksTimesHandler`, `SubtaskCurrentAttemptDetailsHandler`,
> > `JobVertexTaskManagersHandler` and `JobExceptionsHandler` you mentioned in
> > `Public Interfaces`, and they are not annotated as `Public`. So do you
> > want to annotate them as `Public`? If not, I think you may need to move
> > them from `Public Interfaces` to `Proposed Changes`.
> >
> > Best,
> > Shammon FY
> >
> > On Sat, Sep 9, 2023 at 12:11 PM Chen Zhanghao
> > wrote:
> >
> > > Hi Devs,
> > >
> > > I would like to start a discussion on FLIP-363: Unify the Representation
> > > of TaskManager Location in REST API and Web UI [1].
> > >
> > > The TaskManager location of subtasks is important for identifying
> > > TM-related problems. There are a number of places in the REST API and
> > > Web UI where the TaskManager location is returned or displayed.
> > >
> > > Problems:
> > >
> > >   *   Only the hostname is provided to represent the TaskManager location
> > > in some places (e.g. SubtaskCurrentAttemptDetailsHandler). However, in a
> > > containerized era, it is common to have multiple TMs on the same host,
> > > and port info is crucial to distinguish different TMs.
> > >   *   Inconsistent naming of the field that represents the TaskManager
> > > location: "host" is used in most places, but "location" is also used in
> > > JobExceptions-related places.
> > >   *   Inconsistent semantics of the "host" field: sometimes it denotes
> > > the hostname only, while at other times it denotes hostname + port
> > > (which is also inconsistent with the name "host").
> > >
> > > We propose to improve the current situation by:
> > >
> > >   *   Using a field named "location" that represents the TaskManager
> > > location in the form of "${hostname}:${port}" consistently across the
> > > REST APIs and the front-end.
> > >   *   Renaming the column from "Host" to "Location" on the Web UI to
> > > reflect that both hostname and port are displayed.
> > >   *   Keeping the old "host" fields untouched for compatibility. They can
> > > be removed in the next major version.
> > >
> > > Looking forward to your feedback.
> > >
> > > [1] FLIP-363: Unify the Representation of TaskManager Location in REST
> > > API and Web UI - Apache Flink - Apache Software Foundation<
> > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-363%3A+Unify+the+Representation+of+TaskManager+Location+in+REST+API+and+Web+UI
> > > >
> > >
> > > Best,
> > > Zhanghao Chen
> > >
> >
>


Re: [DISCUSS] FLIP-363: Unify the Representation of TaskManager Location in REST API and Web UI

2023-09-10 Thread Chen Zhanghao
Hi Weihua,

Thanks for the suggestion on keeping the host field. However, I think the 
location field can also serve the need to find tasks concentrated on the same 
host, and I'm not sure a separate host field is still needed. The fields in 
the REST API are mainly used in two scenarios: direct display on the UI, and 
code that processes REST API responses:

  1.  For direct use on the UI, one can find tasks on the same host by sorting 
on the location field as well: tasks on the same host will be placed next to 
each other after sorting.
  2.  For code processing the REST API, the host info can easily be extracted 
by simply parsing the location field.
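Point 2 above can be illustrated with a short Python sketch of a client-side script. It assumes each subtask entry in a REST response is a dict carrying a `subtask` index and the proposed `location` field in `hostname:port` form; `split_location` and `group_by_host` are hypothetical helpers, not part of any Flink API.

```python
from __future__ import annotations

from collections import defaultdict


def split_location(location: str) -> tuple[str, int]:
    """Split a "hostname:port" location into (hostname, port).

    rpartition splits on the last ':', so this assumes the hostname itself
    contains no colon (e.g. it is not a raw IPv6 address).
    """
    host, sep, port = location.rpartition(":")
    if not sep or not host:
        raise ValueError(f"expected 'hostname:port', got {location!r}")
    return host, int(port)


def group_by_host(subtasks: list[dict]) -> dict[str, list[int]]:
    """Map each hostname to the subtask indices running on that host."""
    groups: dict[str, list[int]] = defaultdict(list)
    # Sorting on the full "hostname:port" string places subtasks on the same
    # host next to each other, like sorting the Location column on the Web UI.
    for task in sorted(subtasks, key=lambda t: t["location"]):
        host, _port = split_location(task["location"])
        groups[host].append(task["subtask"])
    return dict(groups)


# Hypothetical response fragment: subtasks 1 and 2 share host-a.
subtasks = [
    {"subtask": 0, "location": "host-b:6122"},
    {"subtask": 1, "location": "host-a:6121"},
    {"subtask": 2, "location": "host-a:6123"},
]
print(group_by_host(subtasks))  # {'host-a': [1, 2], 'host-b': [0]}
```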

Looking forward to hearing others' thoughts on this as well.

Finally, even if the consensus is that the host field is still needed, since 
the semantics of the host field are inconsistent at this point, we'll still 
leave the field as it is for now and change it to contain only host info in the 
next major version.

Best,
Zhanghao Chen

From: Weihua Hu
Sent: September 11, 2023 11:08
To: dev@flink.apache.org
Cc: ches...@apache.org
Subject: Re: [DISCUSS] FLIP-363: Unify the Representation of TaskManager Location in REST API and Web UI

Hi, Zhanghao

Thanks for bringing this proposal.

I have a concern:

I prefer to keep the "host" field and add a "location" field in future
versions.
Consider a scenario where a machine (host) running multiple TaskManagers has
poor processing performance due to some problem.
By aggregating on the host field, I can identify the problems with this
machine and take it offline.
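As a sketch of how this aggregation could still be done if only the proposed `location` field were available, a client could derive the host from `hostname:port` and compare per-host averages. The `(location, records_per_second)` sample format, the `ratio` threshold and the function names are illustrative assumptions, not Flink metric names or APIs.

```python
from __future__ import annotations

from collections import defaultdict
from statistics import mean


def host_of(location: str) -> str:
    # Strip the ":port" suffix; assumes the hostname itself contains no ':'.
    return location.rpartition(":")[0]


def slow_hosts(samples: list[tuple[str, float]], ratio: float = 0.5) -> list[str]:
    """Return hosts whose mean per-subtask rate is below `ratio` x the overall mean.

    `samples` holds one (location, records_per_second) pair per subtask.
    """
    per_host: dict[str, list[float]] = defaultdict(list)
    for location, rate in samples:
        per_host[host_of(location)].append(rate)
    overall = mean(rate for _location, rate in samples)
    return sorted(host for host, rates in per_host.items() if mean(rates) < ratio * overall)


SAMPLES = [
    ("host-a:6121", 1000.0), ("host-a:6122", 1100.0),
    ("host-b:6121", 200.0), ("host-b:6122", 300.0),  # two TMs on a degraded machine
    ("host-c:6121", 1000.0),
]
print(slow_hosts(SAMPLES))  # ['host-b']
```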

Best,
Weihua


On Mon, Sep 11, 2023 at 10:34 AM Chen Zhanghao 
wrote:

> Hi Shammon,
>
> I think all REST API response messages (e.g.
> SubtaskExecutionAttemptDetailsInfo) should be considered part of the
> public API and therefore be marked as @Public. It is true, though, that none
> of them are marked as @Public yet. Maybe we should do that. Cc'ing
> @chesnay<mailto:ches...@apache.org> for confirmation.
>
> Best,
> Zhanghao Chen
> 
> From: Shammon FY
> Sent: September 11, 2023 10:22
> To: dev@flink.apache.org
> Subject: Re: [DISCUSS] FLIP-363: Unify the Representation of TaskManager
> Location in REST API and Web UI
>
> Thanks Zhanghao for initiating this discussion. I have just one comment:
>
> I checked the classes `SubtasksAllAccumulatorsHandler`,
> `SubtasksTimesHandler`, `SubtaskCurrentAttemptDetailsHandler`,
> `JobVertexTaskManagersHandler` and `JobExceptionsHandler` you mentioned in
> `Public Interfaces`, and they are not annotated as `Public`. So do you want
> to annotate them as `Public`? If not, I think you may need to move them
> from `Public Interfaces` to `Proposed Changes`.
>
> Best,
> Shammon FY
>
> On Sat, Sep 9, 2023 at 12:11 PM Chen Zhanghao 
> wrote:
>
> > Hi Devs,
> >
> > I would like to start a discussion on FLIP-363: Unify the Representation
> > of TaskManager Location in REST API and Web UI [1].
> >
> > The TaskManager location of subtasks is important for identifying
> > TM-related problems. There are a number of places in the REST API and
> > Web UI where the TaskManager location is returned or displayed.
> >
> > Problems:
> >
> >   *   Only the hostname is provided to represent the TaskManager location
> > in some places (e.g. SubtaskCurrentAttemptDetailsHandler). However, in a
> > containerized era, it is common to have multiple TMs on the same host,
> > and port info is crucial to distinguish different TMs.
> >   *   Inconsistent naming of the field that represents the TaskManager
> > location: "host" is used in most places, but "location" is also used in
> > JobExceptions-related places.
> >   *   Inconsistent semantics of the "host" field: sometimes it denotes
> > the hostname only, while at other times it denotes hostname + port
> > (which is also inconsistent with the name "host").
> >
> > We propose to improve the current situation by:
> >
> >   *   Using a field named "location" that represents the TaskManager
> > location in the form of "${hostname}:${port}" consistently across the
> > REST APIs and the front-end.
> >   *   Renaming the column from "Host" to "Location" on the Web UI to
> > reflect that both hostname and port are displayed.
> >   *   Keeping the old "host" fields untouched for compatibility. They can
> > be removed in the next major version.
> >
> > Looking forward to your feedback.
> >
> > [1] FLIP-363: Unify the Representation of TaskManager Location in REST
> > API and Web UI - Apache Flink - Apache Software Foundation<
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-363%3A+Unify+the+Representation+of+TaskManager+Location+in+REST+API+and+Web+UI
> > >
> >
> > Best,
> > Zhanghao Chen
> >
>


Re: [DISCUSS] FLIP-363: Unify the Representation of TaskManager Location in REST API and Web UI

2023-09-10 Thread Chen Zhanghao
Hi Shammon,

I think all REST API response messages (e.g. 
SubtaskExecutionAttemptDetailsInfo) should be considered part of the public 
API and therefore be marked as @Public. It is true, though, that none of them 
are marked as @Public yet. Maybe we should do that. Cc'ing 
@chesnay<mailto:ches...@apache.org> for confirmation.

Best,
Zhanghao Chen

From: Shammon FY
Sent: September 11, 2023 10:22
To: dev@flink.apache.org
Subject: Re: [DISCUSS] FLIP-363: Unify the Representation of TaskManager Location in REST API and Web UI

Thanks Zhanghao for initiating this discussion. I have just one comment:

I checked the classes `SubtasksAllAccumulatorsHandler`,
`SubtasksTimesHandler`, `SubtaskCurrentAttemptDetailsHandler`,
`JobVertexTaskManagersHandler` and `JobExceptionsHandler` you mentioned in
`Public Interfaces`, and they are not annotated as `Public`. So do you want
to annotate them as `Public`? If not, I think you may need to move them
from `Public Interfaces` to `Proposed Changes`.

Best,
Shammon FY

On Sat, Sep 9, 2023 at 12:11 PM Chen Zhanghao 
wrote:

> Hi Devs,
>
> I would like to start a discussion on FLIP-363: Unify the Representation
> of TaskManager Location in REST API and Web UI [1].
>
> The TaskManager location of subtasks is important for identifying
> TM-related problems. There are a number of places in the REST API and Web UI
> where the TaskManager location is returned or displayed.
>
> Problems:
>
>   *   Only the hostname is provided to represent the TaskManager location in
> some places (e.g. SubtaskCurrentAttemptDetailsHandler). However, in a
> containerized era, it is common to have multiple TMs on the same host, and
> port info is crucial to distinguish different TMs.
>   *   Inconsistent naming of the field that represents the TaskManager
> location: "host" is used in most places, but "location" is also used in
> JobExceptions-related places.
>   *   Inconsistent semantics of the "host" field: sometimes it denotes the
> hostname only, while at other times it denotes hostname + port (which is
> also inconsistent with the name "host").
>
> We propose to improve the current situation by:
>
>   *   Using a field named "location" that represents the TaskManager location
> in the form of "${hostname}:${port}" consistently across the REST APIs
> and the front-end.
>   *   Renaming the column from "Host" to "Location" on the Web UI to
> reflect that both hostname and port are displayed.
>   *   Keeping the old "host" fields untouched for compatibility. They can be
> removed in the next major version.
>
> Looking forward to your feedback.
>
> [1] FLIP-363: Unify the Representation of TaskManager Location in REST API
> and Web UI - Apache Flink - Apache Software Foundation<
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-363%3A+Unify+the+Representation+of+TaskManager+Location+in+REST+API+and+Web+UI
> >
>
> Best,
> Zhanghao Chen
>


[DISCUSS] FLIP-363: Unify the Representation of TaskManager Location in REST API and Web UI

2023-09-08 Thread Chen Zhanghao
Hi Devs,

I would like to start a discussion on FLIP-363: Unify the Representation of 
TaskManager Location in REST API and Web UI [1].

The TaskManager location of subtasks is important for identifying TM-related 
problems. There are a number of places in the REST API and Web UI where the 
TaskManager location is returned or displayed.

Problems:

  *   Only the hostname is provided to represent the TaskManager location in 
some places (e.g. SubtaskCurrentAttemptDetailsHandler). However, in a 
containerized era, it is common to have multiple TMs on the same host, and port 
info is crucial to distinguish different TMs.
  *   Inconsistent naming of the field that represents the TaskManager location: 
"host" is used in most places, but "location" is also used in 
JobExceptions-related places.
  *   Inconsistent semantics of the "host" field: sometimes it denotes the 
hostname only, while at other times it denotes hostname + port (which is also 
inconsistent with the name "host").
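The first problem can be shown in a few lines: if two TaskManager containers share a host, a hostname-only key merges them, while a `hostname:port` key keeps them apart. The endpoints and the `distinct_taskmanagers` helper below are made-up illustrations.

```python
from __future__ import annotations


def distinct_taskmanagers(endpoints: list[tuple[str, int]], with_port: bool) -> int:
    """Count TaskManagers as they appear under a given representation."""
    if with_port:
        keys = {f"{host}:{port}" for host, port in endpoints}
    else:
        keys = {host for host, _port in endpoints}
    return len(keys)


# Three TaskManagers; the first two run in containers on the same node.
TASK_MANAGERS = [("node-1", 6121), ("node-1", 6122), ("node-2", 6121)]

print(distinct_taskmanagers(TASK_MANAGERS, with_port=False))  # 2: node-1's TMs collapse
print(distinct_taskmanagers(TASK_MANAGERS, with_port=True))   # 3: every TM is distinct
```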

We propose to improve the current situation by:

  *   Using a field named "location" that represents the TaskManager location in 
the form of "${hostname}:${port}" consistently across the REST APIs and the 
front-end.
  *   Renaming the column from "Host" to "Location" on the Web UI to reflect 
that both hostname and port are displayed.
  *   Keeping the old "host" fields untouched for compatibility. They can be 
removed in the next major version.
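During the transition, the proposal keeps both fields side by side. A hypothetical serializer for a single subtask entry might look like the following; the field layout and `subtask_json` are illustrative only and are not Flink's actual REST response classes (such as SubtaskExecutionAttemptDetailsInfo).

```python
from __future__ import annotations

import json


def subtask_json(subtask: int, hostname: str, port: int, legacy_host: str) -> str:
    """Serialize one subtask entry carrying both the old and the new field.

    `legacy_host` is passed through unchanged: today "host" holds the bare
    hostname in some handlers and "hostname:port" in others.
    """
    entry = {
        "subtask": subtask,
        "host": legacy_host,               # kept untouched for compatibility
        "location": f"{hostname}:{port}",  # uniform "${hostname}:${port}"
    }
    return json.dumps(entry, sort_keys=True)


print(subtask_json(0, "node-1", 6121, legacy_host="node-1"))
# {"host": "node-1", "location": "node-1:6121", "subtask": 0}
```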

Looking forward to your feedback.

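[1] FLIP-363: Unify the Representation of TaskManager Location in REST API and 
Web UI - Apache Flink - Apache Software Foundation
https://cwiki.apache.org/confluence/display/FLINK/FLIP-363%3A+Unify+the+Representation+of+TaskManager+Location+in+REST+API+and+Web+UI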

Best,
Zhanghao Chen


Re: [DISSCUSS] Kubernetes Operator Flink Version Support Policy

2023-09-05 Thread Chen Zhanghao
+1 for the proposal. A side question: how will we handle a new major Flink 
version, given that Flink 2.0 is around the corner?

Best,
Zhanghao Chen

From: Gyula Fóra
Sent: September 5, 2023 20:12
To: dev
Cc: Maximilian Michels; Thomas Weise; Márton Balassi; morh...@apache.org
Subject: [DISSCUSS] Kubernetes Operator Flink Version Support Policy

Hi All!

@Maximilian Michels  has raised the question of Flink
version support in the operator before the last release. I would like to
open this discussion publicly so we can finalize this before the next
release.

Background:
Currently the Flink Operator supports all Flink versions since Flink 1.13.
While this is great for the users, it introduces a lot of backward
compatibility related code in the operator logic and also adds considerable
time to the CI. We should strike a reasonable balance here that allows us
to move forward and eliminate some of this tech debt.

In the current model it is also impossible to support all features for all
Flink versions, which leads to some confusion over time.

Proposal:
Since it's a key feature of the Kubernetes operator to support several
versions at the same time, I propose to support the last 4 stable Flink
minor versions. Currently this would mean supporting Flink 1.14-1.17 (and
dropping 1.13 support). When Flink 1.18 is released, we would drop 1.14 support,
and so on. Given the Flink release cadence, this means about 2 years of support
for each Flink version.
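The rolling window in this proposal can be sketched as a small helper. It assumes every supported release is a Flink 1.x minor version and that the window is exactly the last four stable minors, as proposed; how a 2.x release would be counted is left open in the thread. `supported_flink_versions` is a hypothetical name.

```python
from __future__ import annotations


def supported_flink_versions(latest_minor: int, window: int = 4) -> list[str]:
    """Return the last `window` stable Flink 1.x minor versions, oldest first."""
    if latest_minor < 0 or window < 1:
        raise ValueError("latest_minor must be >= 0 and window >= 1")
    oldest = max(latest_minor - window + 1, 0)
    return [f"1.{minor}" for minor in range(oldest, latest_minor + 1)]


print(supported_flink_versions(17))  # ['1.14', '1.15', '1.16', '1.17']
print(supported_flink_versions(18))  # ['1.15', '1.16', '1.17', '1.18']
```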

What do you think?

Cheers,
Gyula


[REQUEST] Edit Permissions for FLIP

2023-09-05 Thread Chen Zhanghao
Hi folks,

I'm writing to request edit permission for FLIPs. My Confluence Wiki ID is: 
zhanghao.chen. I've recently reported two JIRA issues and was reminded that 
each of them needs a FLIP, as they would change the public API:

  1.  [FLINK-25371] Include data port as part of the host info for subtask 
detail panel on Web UI. During code review with Fan Rui and Weihua, we 
thought it would be better to align the inconsistent usage of the host field 
across various REST APIs (some contain only the hostname, some contain 
hostname + port). This requires adding a new field that consistently holds 
both the hostname and port info while keeping the old host field untouched.

  2.  [FLINK-32872] Add option to control the default partitioner when the 
parallelism of upstream and downstream operator does not match, which intends 
to add a new configuration.
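For context on item 2: as far as I understand current Flink behavior, an edge without an explicitly set partitioner uses forward partitioning when upstream and downstream parallelism match, and rebalance otherwise. A sketch of where a configurable fallback could slot in, with `mismatch_default` as a hypothetical stand-in for the option FLINK-32872 would add (its real name and values are not given here):

```python
from __future__ import annotations

from enum import Enum


class Partitioner(Enum):
    FORWARD = "forward"      # one-to-one, only valid for equal parallelism
    REBALANCE = "rebalance"  # round-robin to all downstream subtasks
    RESCALE = "rescale"      # round-robin to a subset of downstream subtasks


def default_partitioner(
    upstream_parallelism: int,
    downstream_parallelism: int,
    mismatch_default: Partitioner = Partitioner.REBALANCE,
) -> Partitioner:
    """Choose the partitioner for an edge that has none set explicitly.

    `mismatch_default` is hypothetical; REBALANCE mirrors the current fallback.
    """
    if upstream_parallelism == downstream_parallelism:
        return Partitioner.FORWARD
    return mismatch_default


print(default_partitioner(4, 4))                       # Partitioner.FORWARD
print(default_partitioner(4, 8))                       # Partitioner.REBALANCE
print(default_partitioner(4, 8, Partitioner.RESCALE))  # Partitioner.RESCALE
```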

Thanks for your attention, much appreciated in advance.

Best,
Zhanghao Chen


Re: [ANNOUNCE] New Apache Flink Committer - Weihua Hu

2023-08-03 Thread Chen Zhanghao
Congratulations, Weihua!

Best,
Zhanghao Chen

From: Xintong Song
Sent: August 4, 2023 11:18
To: dev
Cc: Weihua Hu
Subject: [ANNOUNCE] New Apache Flink Committer - Weihua Hu

Hi everyone,

On behalf of the PMC, I'm very happy to announce Weihua Hu as a new Flink
Committer!

Weihua has been consistently contributing to the project since May 2022. He
mainly works in Flink's distributed coordination areas. He is the main
contributor of FLIP-298 and many other improvements in large-scale job
scheduling. He is also quite active on the mailing lists, participating in
discussions and answering user questions.

Please join me in congratulating Weihua!

Best,

Xintong (on behalf of the Apache Flink PMC)


Re: Scaling Flink Jobs without Restarting Job

2023-07-23 Thread Chen Zhanghao
Hi Talat,

In reactive mode, rescaling is performed by a whole-graph failover, which is 
already less costly than a full job restart, where all containers need to be 
requested again. For simple stateless jobs, this usually won't take long (a 
few seconds); you can measure how long it takes for all tasks to return to 
RUNNING status during a rescaling. The fluctuating traffic might be caused by 
re-consuming some data when recovering from the previous checkpoint. In this 
case, reducing the checkpoint interval will help.
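A rough back-of-the-envelope model of why a shorter checkpoint interval helps, under simplifying assumptions (constant input rate, the failover happening just before the next checkpoint completes, checkpoint duration ignored): after restoring, sources rewind to the offsets in the last completed checkpoint and re-read everything since, on top of the backlog that built up while the job was restarting. The function and the numbers are illustrative.

```python
def catch_up_records(rate_per_s: float, checkpoint_interval_s: float, downtime_s: float) -> float:
    """Worst-case records to (re)process right after a rescale-triggered failover.

    replayed: input consumed since the last completed checkpoint, read again
              because sources restart from the checkpointed offsets.
    backlog:  input that arrived while the job was restarting.
    """
    replayed = rate_per_s * checkpoint_interval_s
    backlog = rate_per_s * downtime_s
    return replayed + backlog


# 10k records/s and a 5 s restart: a 60 s interval vs a 10 s interval.
print(catch_up_records(10_000, 60, 5))  # 650000
print(catch_up_records(10_000, 10, 5))  # 150000
```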

As for partial failover for rescaling, it might be challenging. Rescaling a 
stateless job still involves redistributing Kafka partitions (for Kafka 
sources, for example) and requires some coordination work.
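To illustrate why even stateless rescaling needs coordination, here is a deliberately simplified model where partition p is read by subtask p % parallelism. This is not the Kafka connector's exact assignment logic; it only shows that changing the parallelism moves most partitions to a different reader, so their consumption positions have to be handed over.

```python
from __future__ import annotations


def owners(num_partitions: int, parallelism: int) -> list[int]:
    """Simplified model: partition p is read by source subtask p % parallelism."""
    return [p % parallelism for p in range(num_partitions)]


def moved_partitions(num_partitions: int, old_parallelism: int, new_parallelism: int) -> int:
    """Count partitions whose reading subtask changes when rescaling."""
    before = owners(num_partitions, old_parallelism)
    after = owners(num_partitions, new_parallelism)
    return sum(1 for old, new in zip(before, after) if old != new)


# 12 Kafka partitions, scaling the source from 4 to 5 subtasks.
print(moved_partitions(12, 4, 5))  # 8
```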

Best,
Zhanghao Chen

From: Talat Uyarer via dev
Sent: July 23, 2023 15:28
To: dev
Subject: Scaling Flink Jobs without Restarting Job

Hi,

We are using Flink with the Adaptive Scheduler (Reactive Mode) on Kubernetes
with a Standalone deployment in Application mode for our streaming
infrastructure. Our autoscaler scales our jobs up or down. However,
each scale action causes a job restart.

Our customers complain about the fluctuating traffic that we are sending. Is
there any way to reschedule tasks and recalculate the graph without restarting
the whole job? Or to reduce the restart time?

The job's max parallelism is set to 2x maxWorker, and we use GCS for
checkpoint storage. I know rescaling stateful jobs requires key groups to be
redistributed. But we also have stateless jobs, such as reading from Kafka,
extracting data and writing to a sink. If you can provide some entry points,
we can start implementing support for those jobs.
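On the key-group point: to my understanding, Flink splits the max-parallelism key groups into contiguous ranges, one per subtask, using the integer arithmetic of `KeyGroupRangeAssignment.computeKeyGroupRangeForOperatorIndex`. The Python port below is a sketch of that formula, not Flink code; with a max parallelism of 8 (e.g. 2x a maxWorker of 4), the guard also reflects why parallelism can never exceed the max parallelism.

```python
from __future__ import annotations


def key_group_range(max_parallelism: int, parallelism: int, operator_index: int) -> tuple[int, int]:
    """Inclusive (start, end) key-group range owned by one subtask.

    Contiguous split of [0, max_parallelism) across `parallelism` subtasks;
    integer division spreads the remainder.
    """
    if not 0 < parallelism <= max_parallelism:
        # There are only max_parallelism key groups to hand out.
        raise ValueError("require 0 < parallelism <= max_parallelism")
    if not 0 <= operator_index < parallelism:
        raise ValueError("operator_index out of range")
    start = (operator_index * max_parallelism + parallelism - 1) // parallelism
    end = ((operator_index + 1) * max_parallelism - 1) // parallelism
    return start, end


# Max parallelism 8: scaling from 4 down to 3 subtasks reshuffles the ranges.
print([key_group_range(8, 4, i) for i in range(4)])  # [(0, 1), (2, 3), (4, 5), (6, 7)]
print([key_group_range(8, 3, i) for i in range(3)])  # [(0, 2), (3, 5), (6, 7)]
```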

Thanks


Re: [ANNOUNCE] New Apache Flink Committer - Yong Fang

2023-07-23 Thread Chen Zhanghao
Congrats, Shammon!

Best,
Zhanghao Chen

From: Weihua Hu
Sent: July 24, 2023 11:11
To: dev@flink.apache.org
Cc: Shammon FY
Subject: Re: [ANNOUNCE] New Apache Flink Committer - Yong Fang

Congratulations!

Best,
Weihua


On Mon, Jul 24, 2023 at 11:04 AM Paul Lam  wrote:

> Congrats, Shammon!
>
> Best,
> Paul Lam
>
> > On Jul 24, 2023, at 10:56, Jingsong Li wrote:
> >
> > Shammon
>
>