Re: [VOTE] FLIP-367: Support Setting Parallelism for Table/SQL Sources

2023-10-11 Thread Sergey Nuyanzin
+1 (binding)

that's a nice improvement
thanks for driving that

On Wed, Oct 11, 2023 at 8:42 AM Lincoln Lee  wrote:

> +1 (binding)
>
> Best,
> Lincoln Lee
>
>
> Leonard Xu  于2023年10月10日周二 09:57写道:
>
> > +1(binding)
> >
> > Best,
> > Leonard
> >
> > > On Oct 9, 2023, at 9:45 PM, Jing Ge 
> wrote:
> > >
> > > +1(binding)
> > >
> > > Best Regards,
> > > Jing
> > >
> > > On Mon, Oct 9, 2023 at 10:40 AM Ahmed Hamdy 
> > wrote:
> > >
> > >> +1 (non-binding)
> > >> Best Regards
> > >> Ahmed Hamdy
> > >>
> > >>
> > >> On Mon, 9 Oct 2023 at 09:38, xiangyu feng 
> wrote:
> > >>
> > >>> +1 (non-binding)
> > >>>
> > >>> Feng Jin  于2023年10月9日周一 16:00写道:
> > >>>
> >  +1 (non-binding)
> > 
> >  Best,
> >  Feng
> > 
> >  On Mon, Oct 9, 2023 at 3:12 PM Yangze Guo 
> wrote:
> > 
> > > +1 (binding)
> > >
> > > Best,
> > > Yangze Guo
> > >
> > > On Mon, Oct 9, 2023 at 2:46 PM Yun Tang  wrote:
> > >>
> > >> +1 (binding)
> > >>
> > >> Best
> > >> Yun Tang
> > >> 
> > >> From: Weihua Hu 
> > >> Sent: Monday, October 9, 2023 12:03
> > >> To: dev@flink.apache.org 
> > >> Subject: Re: [VOTE] FLIP-367: Support Setting Parallelism for
> > >>> Table/SQL
> > > Sources
> > >>
> > >> +1 (binding)
> > >>
> > >> Best,
> > >> Weihua
> > >>
> > >>
> > >> On Mon, Oct 9, 2023 at 11:47 AM Shammon FY 
> > >>> wrote:
> > >>
> > >>> +1 (binding)
> > >>>
> > >>>
> > >>> On Mon, Oct 9, 2023 at 11:12 AM Benchao Li  > >>>
> > > wrote:
> > >>>
> >  +1 (binding)
> > 
> >  Zhanghao Chen  于2023年10月9日周一
> > >> 10:20写道:
> > >
> > > Hi All,
> > >
> > > Thanks for all the feedback on FLIP-367: Support Setting
> > > Parallelism
> > >>> for
> >  Table/SQL Sources [1][2].
> > >
> > > I'd like to start a vote for FLIP-367. The vote will be open
> >  until
> > > Oct
> >  12th 12:00 PM GMT) unless there is an objection or insufficient
> > > votes.
> > >
> > > [1]
> > 
> > >>>
> > >
> > 
> > >>>
> > >>
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=263429150
> > > [2]
> > > https://lists.apache.org/thread/gtpswl42jzv0c9o3clwqskpllnw8rh87
> > >
> > > Best,
> > > Zhanghao Chen
> > 
> > 
> > 
> >  --
> > 
> >  Best,
> >  Benchao Li
> > 
> > >>>
> > >
> > 
> > >>>
> > >>
> >
> >
>


-- 
Best regards,
Sergey


dev@flink.apache.org

2023-10-11 Thread Yuepeng Pan
+1(non-binding)
Thanks for your driving the voting thread.

Best Regards.
Yuepeng Pan

On 2023/10/06 16:33:40 Joao Boto wrote:
> Hi all, Thank you to everyone for the feedback on FLIP-239[1]. Based on the
> discussion thread [2] and some offline discussions, we have come to a
> consensus on the design and are ready to take a vote to contribute this to
> Flink. I'd like to start a vote for it. The vote will be open for at least
> 72 hours(excluding weekends, unless there is an objection or an
> insufficient number of votes. [1]
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=217386271
> [2]https://lists.apache.org/thread/yx833h5h3fjlyor0bfm32chy3sjw8hwt  Best,
> Joao Boto
> 


Re: [VOTE] FLIP-367: Support Setting Parallelism for Table/SQL Sources

2023-10-11 Thread Martijn Visser
+1 (binding)

On Wed, Oct 11, 2023 at 9:15 AM Sergey Nuyanzin  wrote:
>
> +1 (binding)
>
> that's a nice improvement
> thanks for driving that
>
> On Wed, Oct 11, 2023 at 8:42 AM Lincoln Lee  wrote:
>
> > +1 (binding)
> >
> > Best,
> > Lincoln Lee
> >
> >
> > Leonard Xu  于2023年10月10日周二 09:57写道:
> >
> > > +1(binding)
> > >
> > > Best,
> > > Leonard
> > >
> > > > On Oct 9, 2023, at 9:45 PM, Jing Ge 
> > wrote:
> > > >
> > > > +1(binding)
> > > >
> > > > Best Regards,
> > > > Jing
> > > >
> > > > On Mon, Oct 9, 2023 at 10:40 AM Ahmed Hamdy 
> > > wrote:
> > > >
> > > >> +1 (non-binding)
> > > >> Best Regards
> > > >> Ahmed Hamdy
> > > >>
> > > >>
> > > >> On Mon, 9 Oct 2023 at 09:38, xiangyu feng 
> > wrote:
> > > >>
> > > >>> +1 (non-binding)
> > > >>>
> > > >>> Feng Jin  于2023年10月9日周一 16:00写道:
> > > >>>
> > >  +1 (non-binding)
> > > 
> > >  Best,
> > >  Feng
> > > 
> > >  On Mon, Oct 9, 2023 at 3:12 PM Yangze Guo 
> > wrote:
> > > 
> > > > +1 (binding)
> > > >
> > > > Best,
> > > > Yangze Guo
> > > >
> > > > On Mon, Oct 9, 2023 at 2:46 PM Yun Tang  wrote:
> > > >>
> > > >> +1 (binding)
> > > >>
> > > >> Best
> > > >> Yun Tang
> > > >> 
> > > >> From: Weihua Hu 
> > > >> Sent: Monday, October 9, 2023 12:03
> > > >> To: dev@flink.apache.org 
> > > >> Subject: Re: [VOTE] FLIP-367: Support Setting Parallelism for
> > > >>> Table/SQL
> > > > Sources
> > > >>
> > > >> +1 (binding)
> > > >>
> > > >> Best,
> > > >> Weihua
> > > >>
> > > >>
> > > >> On Mon, Oct 9, 2023 at 11:47 AM Shammon FY 
> > > >>> wrote:
> > > >>
> > > >>> +1 (binding)
> > > >>>
> > > >>>
> > > >>> On Mon, Oct 9, 2023 at 11:12 AM Benchao Li  > > >>>
> > > > wrote:
> > > >>>
> > >  +1 (binding)
> > > 
> > >  Zhanghao Chen  于2023年10月9日周一
> > > >> 10:20写道:
> > > >
> > > > Hi All,
> > > >
> > > > Thanks for all the feedback on FLIP-367: Support Setting
> > > > Parallelism
> > > >>> for
> > >  Table/SQL Sources [1][2].
> > > >
> > > > I'd like to start a vote for FLIP-367. The vote will be open
> > >  until
> > > > Oct
> > >  12th 12:00 PM GMT) unless there is an objection or insufficient
> > > > votes.
> > > >
> > > > [1]
> > > 
> > > >>>
> > > >
> > > 
> > > >>>
> > > >>
> > >
> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=263429150
> > > > [2]
> > > > https://lists.apache.org/thread/gtpswl42jzv0c9o3clwqskpllnw8rh87
> > > >
> > > > Best,
> > > > Zhanghao Chen
> > > 
> > > 
> > > 
> > >  --
> > > 
> > >  Best,
> > >  Benchao Li
> > > 
> > > >>>
> > > >
> > > 
> > > >>>
> > > >>
> > >
> > >
> >
>
>
> --
> Best regards,
> Sergey


[jira] [Created] (FLINK-33237) Optimize the data type of JobStatus#state from String to Enum

2023-10-11 Thread Rui Fan (Jira)
Rui Fan created FLINK-33237:
---

 Summary: Optimize the data type of JobStatus#state from String to 
Enum
 Key: FLINK-33237
 URL: https://issues.apache.org/jira/browse/FLINK-33237
 Project: Flink
  Issue Type: Improvement
  Components: Kubernetes Operator
Reporter: Rui Fan
Assignee: Rui Fan


As discuss at comment[1], the type of 
{{org.apache.flink.kubernetes.operator.api.status.JobStatus#state}} be changed 
from {{String}} to {{org.apache.flink.api.common.JobStatus.}}

 

[1] 
https://github.com/apache/flink-kubernetes-operator/pull/677#discussion_r1354340358

 

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


RE: FW: RE: [DISCUSS] FLIP-368 Reorganize the exceptions thrown in state interfaces

2023-10-11 Thread David Radley
Hi Zakelly,
Thanks for making this clear for me.  We should document the impact on the user 
in the release notes, which will be a minimal rewrite and recompile of any java 
using the old APIs.
I think it is a good point you make about if there are future implementations 
that are
worth retrying (such as network access) – then there could be retries. I agree 
we should not be trying to create code now for an implementation consideration 
that is not there yet,

+1 from me ,
 Kind regards, David.

From: Zakelly Lan 
Date: Wednesday, 11 October 2023 at 04:25
To: dev@flink.apache.org 
Subject: [EXTERNAL] Re: FW: RE: [DISCUSS] FLIP-368 Reorganize the exceptions 
thrown in state interfaces
Hi David,

Thanks for your response.

The exceptions thrown by state interfaces are NOT retriable. For
example, there may be some elements sent to the wrong subtask due to a
non-deterministic hashCode() algorithm and the key group is not
matching. Or the rocksdb may fail to read a file if it has been
deleted by the user. If there are future implementations that are
worth retrying (such as network access), it would be better to let the
implementation itself handle the retries and provide a configuration
for this, rather than requiring users to catch these exceptions.

Regarding the release and documentation, I have mentioned that this
change is targeted for version 1.19 with proper documentation. You may
have noticed that state interfaces are annotated with @PublicEvolving,
which means these interfaces may change across versions. The changes
are suitable for a minor release (1.18.0 currently to 1.19.0 in the
future) as defined by the API compatibility guarantees of Flink[1].



Best,
Zakelly


[1] 
https://nightlies.apache.org/flink/flink-docs-master/docs/ops/upgrading/#api-compatibility-guarantees

On Tue, Oct 10, 2023 at 6:19 PM David Radley  wrote:
>
> Hi,
> I notice 
> https://nightlies.apache.org/flink/flink-docs-master/api/java/org/apache/flink/api/common/state/ValueState.html
>   is an external API. I am concerned that this change will break existing 
> applications using the old interface, they are likely to have catches / 
> throws around the existing checked Exceptions.
>
> If we go with RunTimeException, I would suggest that this sort of breaking 
> change should be done on a Flink version change, where it is appropriate to 
> make breaking changes to the API with associated documentation.
>
> If we want this change on a minor release,  we could create a new class 
> ValueState2– that is used internally with the cleaned up Exceptions, but 
> still expose the old class and Exceptions for existing external applications. 
> I guess new applications could use the new ValueState2 .
>
> What do you think?
> Kind regards, David.
>
>
> From: David Radley 
> Date: Tuesday, 10 October 2023 at 09:49
> To: dev@flink.apache.org 
> Subject: [EXTERNAL] RE: [DISCUSS] FLIP-368 Reorganize the exceptions thrown 
> in state interfaces
> Hi ,
> The argument seems to be that the errors cannot be acted on so should be 
> runtime exceptions. I want to confirm that none of these errors could / 
> should be retriable. If there is a possibility that the state is available at 
> some time later then I assume a checked retriable Exception would be 
> appropriate for those cases; and be part of the contract with the caller. Can 
> we be sure that there is no possibility that the state will become available; 
> if so then I agree that a runtime Exception is appropriate. What do you think?
>
>
>
> Kind regards, David.
>
>
> From: Zakelly Lan 
> Date: Monday, 9 October 2023 at 18:12
> To: dev@flink.apache.org 
> Subject: [EXTERNAL] Re: [DISCUSS] FLIP-368 Reorganize the exceptions thrown 
> in state interfaces
> Hi everyone,
>
> It seems we're gradually reaching a consensus. So I would like to
> start a vote after 72 hours if there are no further discussions.
>
> Please let me know if you have any concerns, thanks!
>
>
> Best,
> Zakelly
>
>
> On Sat, Oct 7, 2023 at 4:07 PM Zakelly Lan  wrote:
> >
> > Hi Jing,
> >
> > Sorry for the late reply! I agree with you that we do not expect users
> > to do anything with Flink and we won't "bother" them with those
> > exceptions. However, users can still catch the `Throwable` and perform
> > any necessary logging activities, similar to how they use Java
> > Collection interfaces.
> >
> >
> > Thanks for your insights!
> >
> > Best,
> > Zakelly
> >
> > On Thu, Sep 21, 2023 at 8:43 PM Jing Ge  wrote:
> > >
> > > Fair enough! Thanks Zakelly for the information. Afaic, even users can do
> > > nothing with Flink, they still can do something in their territory, at
> > > least doing some logging and metrics stuff, or triggering some other
> > > services in their ecosystem. After all, the Flink jobs they build are part
> > > of their service component. It didn't change the fact that we are going to
> > > use the anti-pattern. Just because we didn't expect users to do
> > > anything with Flink, does not mean use

Re: [DISCUSS] FLIP-370 : Support Balanced Tasks Scheduling

2023-10-11 Thread xiangyu feng
Hi Yuepeng,

Thx for ur reply.

> Nice feedback. In fact, as mentioned in the Google Doc, the LoadingWeight
interface currently only includes a description of the number of tasks. So,
IIUC, If there is a need to further expand
> descriptions of other resource loads, we just extend it based on the
current interface and its implementations, right?

I checked the interface design of LoadingWeight and WeightLoadable, AFAIK
currently it only supports comparing the load for one factor. If we want to
add more loading factors, LoadingWeight might need to add a 'LoadType'
field for distinction, WeightLoadable might need to return
Set.

I'm not sure I understand this correctly, WDYT?

Regards,
Xiangyu

Yuepeng Pan  于2023年10月11日周三 13:53写道:

> Hi, xiangyu,
> Thanks for your attention as well.
>
> >1, About the waiting mechanism: Will the waiting mechanism happen only in
> >the second level 'assigning slots to TM'? IIUC, the first level 'assigning
> >Tasks to Slots' needs only the asynchronous slot result from slotpool.
>
> As described in the latest FLIP, the introduction of the waiting mechanism
> at the second level is to ensure that, in all deployment modes such as
> application, session, etc., we do not fall into a local greedy state when
> selecting the optimal slot position. This requires obtaining a global
> resource view to get the best result.
> IIUC, The allocation process from Task to Slot is the generation of a
> mapping relationship between two abstract descriptions, and at this point,
> there is no coupling of information between tasks/slots and Task Managers
> (TMs).
>
>
> >2, About the slot LoadingWeight: it is reasonable to use the number of
> >tasks by default in the beginning, but it would be better if this could be
> >easily extended in future to distinguish between CPU-intensive and
> >IO-intensive workloads. In some cases, TMs may have IO bottlenecks but
> >others have CPU bottlenecks.
>
> Nice feedback. In fact, as mentioned in the Google Doc, the LoadingWeight
> interface currently only includes a description of the number of tasks. So,
> IIUC, If there is a need to further expand descriptions of other resource
> loads, we just extend it based on the current interface and its
> implementations, right?
> Please correct me if I have misunderstood. Thanks a lot~
>
> Best,
> Yuepeng.
>
> On 2023/10/06 10:19:21 xiangyu feng wrote:
> > Thanks Yuepeng and Rui for driving this Discussion.
> >
> > Internally when we try to use Flink 1.17.1 in production, we are also
> > suffering from the unbalanced task distribution problem for jobs with
> high
> > qps and complex dag. So +1 for the overall proposal.
> >
> > Some questions about the details:
> >
> > 1, About the waiting mechanism: Will the waiting mechanism happen only in
> > the second level 'assigning slots to TM'?  IIUC, the first level
> 'assigning
> > Tasks to Slots' needs only the asynchronous slot result from slotpool.
> >
> > 2, About the slot LoadingWeight: it is reasonable to use the number of
> > tasks by default in the beginning, but it would be better if this could
> be
> > easily extended in future to distinguish between CPU-intensive and
> > IO-intensive workloads. In some cases, TMs may have IO bottlenecks but
> > others have CPU bottlenecks.
> >
> > Regards,
> > Xiangyu
> >
> >
> > Yuepeng Pan  于2023年10月5日周四 18:34写道:
> >
> > > Hi, Zhu Zhu,
> > >
> > > Thanks for your feedback!
> > >
> > > > I think we can introduce a new config option
> > > > `taskmanager.load-balance.mode`,
> > > > which accepts "None"/"Slots"/"Tasks".
> `cluster.evenly-spread-out-slots`
> > > > can be superseded by the "Slots" mode and get deprecated. In the
> future
> > > > it can support more mode, e.g. "CpuCores", to work better for jobs
> with
> > > > fine-grained resources. The proposed config option
> > > > `slot.request.max-interval`
> > > > then can be renamed to
> > > `taskmanager.load-balance.request-stablizing-timeout`
> > > > to show its relation with the feature. The proposed
> > > `slot.sharing-strategy`
> > > > is not needed, because the configured "Tasks" mode will do the work.
> > >
> > > The new proposed configuration option sounds good to me.
> > >
> > > I have a small question, If we set our configuration value to 'Tasks,'
> it
> > > will initiate two processes: balancing the allocation of task
> quantities at
> > > the slot level and balancing the number of tasks across TaskManagers
> (TMs).
> > > Alternatively, if we configure it as 'Slots,' the system will employ
> the
> > > LocalPreferred allocation policy (which is the default) when assigning
> > > tasks to slots, and it will ensure that the number of slots used
> across TMs
> > > is balanced.
> > > Does  this configuration essentially combine a balanced selection
> strategy
> > > across two dimensions into fixed configuration items, right?
> > >
> > > I would appreciate it if you could correct me if I've made any errors.
> > >
> > > Best,
> > > Yuepeng.
> > >
> >
>


Re: [VOTE] FLIP-367: Support Setting Parallelism for Table/SQL Sources

2023-10-11 Thread Samrat Deb
+1 (non binding)

On Wed, 11 Oct 2023 at 1:29 PM, Martijn Visser 
wrote:

> +1 (binding)
>
> On Wed, Oct 11, 2023 at 9:15 AM Sergey Nuyanzin 
> wrote:
> >
> > +1 (binding)
> >
> > that's a nice improvement
> > thanks for driving that
> >
> > On Wed, Oct 11, 2023 at 8:42 AM Lincoln Lee 
> wrote:
> >
> > > +1 (binding)
> > >
> > > Best,
> > > Lincoln Lee
> > >
> > >
> > > Leonard Xu  于2023年10月10日周二 09:57写道:
> > >
> > > > +1(binding)
> > > >
> > > > Best,
> > > > Leonard
> > > >
> > > > > On Oct 9, 2023, at 9:45 PM, Jing Ge 
> > > wrote:
> > > > >
> > > > > +1(binding)
> > > > >
> > > > > Best Regards,
> > > > > Jing
> > > > >
> > > > > On Mon, Oct 9, 2023 at 10:40 AM Ahmed Hamdy 
> > > > wrote:
> > > > >
> > > > >> +1 (non-binding)
> > > > >> Best Regards
> > > > >> Ahmed Hamdy
> > > > >>
> > > > >>
> > > > >> On Mon, 9 Oct 2023 at 09:38, xiangyu feng 
> > > wrote:
> > > > >>
> > > > >>> +1 (non-binding)
> > > > >>>
> > > > >>> Feng Jin  于2023年10月9日周一 16:00写道:
> > > > >>>
> > > >  +1 (non-binding)
> > > > 
> > > >  Best,
> > > >  Feng
> > > > 
> > > >  On Mon, Oct 9, 2023 at 3:12 PM Yangze Guo 
> > > wrote:
> > > > 
> > > > > +1 (binding)
> > > > >
> > > > > Best,
> > > > > Yangze Guo
> > > > >
> > > > > On Mon, Oct 9, 2023 at 2:46 PM Yun Tang 
> wrote:
> > > > >>
> > > > >> +1 (binding)
> > > > >>
> > > > >> Best
> > > > >> Yun Tang
> > > > >> 
> > > > >> From: Weihua Hu 
> > > > >> Sent: Monday, October 9, 2023 12:03
> > > > >> To: dev@flink.apache.org 
> > > > >> Subject: Re: [VOTE] FLIP-367: Support Setting Parallelism for
> > > > >>> Table/SQL
> > > > > Sources
> > > > >>
> > > > >> +1 (binding)
> > > > >>
> > > > >> Best,
> > > > >> Weihua
> > > > >>
> > > > >>
> > > > >> On Mon, Oct 9, 2023 at 11:47 AM Shammon FY  >
> > > > >>> wrote:
> > > > >>
> > > > >>> +1 (binding)
> > > > >>>
> > > > >>>
> > > > >>> On Mon, Oct 9, 2023 at 11:12 AM Benchao Li <
> libenc...@apache.org
> > > > >>>
> > > > > wrote:
> > > > >>>
> > > >  +1 (binding)
> > > > 
> > > >  Zhanghao Chen  于2023年10月9日周一
> > > > >> 10:20写道:
> > > > >
> > > > > Hi All,
> > > > >
> > > > > Thanks for all the feedback on FLIP-367: Support Setting
> > > > > Parallelism
> > > > >>> for
> > > >  Table/SQL Sources [1][2].
> > > > >
> > > > > I'd like to start a vote for FLIP-367. The vote will be
> open
> > > >  until
> > > > > Oct
> > > >  12th 12:00 PM GMT) unless there is an objection or
> insufficient
> > > > > votes.
> > > > >
> > > > > [1]
> > > > 
> > > > >>>
> > > > >
> > > > 
> > > > >>>
> > > > >>
> > > >
> > >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=263429150
> > > > > [2]
> > > > >
> https://lists.apache.org/thread/gtpswl42jzv0c9o3clwqskpllnw8rh87
> > > > >
> > > > > Best,
> > > > > Zhanghao Chen
> > > > 
> > > > 
> > > > 
> > > >  --
> > > > 
> > > >  Best,
> > > >  Benchao Li
> > > > 
> > > > >>>
> > > > >
> > > > 
> > > > >>>
> > > > >>
> > > >
> > > >
> > >
> >
> >
> > --
> > Best regards,
> > Sergey
>


Re: [DISCUSS] FLIP-370 : Support Balanced Tasks Scheduling

2023-10-11 Thread Yuepeng Pan
Hi, xiangyu.
Thanks for your quick reply.

>interface currently only includes a description of the number of tasks. So,
>IIUC, If there is a need to further expand
>current interface and its implementations, right?

Yes, that's indeed the case.

>I checked the interface design of LoadingWeight and WeightLoadable, AFAIK
>currently it only supports comparing the load for one factor. If we want to
>add more loading factors, LoadingWeight might need to add a 'LoadType'
>field for distinction, WeightLoadable might need to return
>Set.

Thank you for the clarification, I think I roughly understand your description:
In fact, regarding the specific implementation and extension of this 
LoadingWeight, we can extend it based on this interface and its implementation 
as mentioned above.
If making frequent changes to the interface and its implementation is really 
tiresome, we can also consider introducing a built-in collapsible Map or other 
type of attribute, like the SlotSharingGroup class in the 
org.apache.flink.api.common.operators package, to describe the specific 
collection of load values and types. This way, these loads are collapsed within 
the LoadingWeight's implementation and can be expanded when needed for use. Of 
course, we can also consider an implementation like the one you mentioned, 
introducing a method in WeightLoadable that returns a collection as the return 
type, so the load values are expanded at the calling site and then used. As I 
understand it, both approaches can achieve the goal.

Of course, I also look forward to hearing others' suggestions. If there are any 
mistakes in my statement, please correct me. 
Looking forward to your reply.

Best regards.
Yuepeng Pan

On 2023/10/11 08:44:51 xiangyu feng wrote:
> Hi Yuepeng,
> 
> Thx for ur reply.
> 
> > Nice feedback. In fact, as mentioned in the Google Doc, the LoadingWeight
> interface currently only includes a description of the number of tasks. So,
> IIUC, If there is a need to further expand
> > descriptions of other resource loads, we just extend it based on the
> current interface and its implementations, right?
> 
> I checked the interface design of LoadingWeight and WeightLoadable, AFAIK
> currently it only supports comparing the load for one factor. If we want to
> add more loading factors, LoadingWeight might need to add a 'LoadType'
> field for distinction, WeightLoadable might need to return
> Set.
> 
> I'm not sure I understand this correctly, WDYT?
> 
> Regards,
> Xiangyu
> 
> Yuepeng Pan  于2023年10月11日周三 13:53写道:
> 
> > Hi, xiangyu,
> > Thanks for your attention as well.
> >
> > >1, About the waiting mechanism: Will the waiting mechanism happen only in
> > >the second level 'assigning slots to TM'? IIUC, the first level 'assigning
> > >Tasks to Slots' needs only the asynchronous slot result from slotpool.
> >
> > As described in the latest FLIP, the introduction of the waiting mechanism
> > at the second level is to ensure that, in all deployment modes such as
> > application, session, etc., we do not fall into a local greedy state when
> > selecting the optimal slot position. This requires obtaining a global
> > resource view to get the best result.
> > IIUC, The allocation process from Task to Slot is the generation of a
> > mapping relationship between two abstract descriptions, and at this point,
> > there is no coupling of information between tasks/slots and Task Managers
> > (TMs).
> >
> >
> > >2, About the slot LoadingWeight: it is reasonable to use the number of
> > >tasks by default in the beginning, but it would be better if this could be
> > >easily extended in future to distinguish between CPU-intensive and
> > >IO-intensive workloads. In some cases, TMs may have IO bottlenecks but
> > >others have CPU bottlenecks.
> >
> > Nice feedback. In fact, as mentioned in the Google Doc, the LoadingWeight
> > interface currently only includes a description of the number of tasks. So,
> > IIUC, If there is a need to further expand descriptions of other resource
> > loads, we just extend it based on the current interface and its
> > implementations, right?
> > Please correct me if I have misunderstood. Thanks a lot~
> >
> > Best,
> > Yuepeng.
> >
> > On 2023/10/06 10:19:21 xiangyu feng wrote:
> > > Thanks Yuepeng and Rui for driving this Discussion.
> > >
> > > Internally when we try to use Flink 1.17.1 in production, we are also
> > > suffering from the unbalanced task distribution problem for jobs with
> > high
> > > qps and complex dag. So +1 for the overall proposal.
> > >
> > > Some questions about the details:
> > >
> > > 1, About the waiting mechanism: Will the waiting mechanism happen only in
> > > the second level 'assigning slots to TM'?  IIUC, the first level
> > 'assigning
> > > Tasks to Slots' needs only the asynchronous slot result from slotpool.
> > >
> > > 2, About the slot LoadingWeight: it is reasonable to use the number of
> > > tasks by default in the beginning, but it would be better if this could
> > be

[jira] [Created] (FLINK-33238) Upgrade org.apache.avro:avro to 1.11.3 to mitigate CVE-2023-39410

2023-10-11 Thread Martijn Visser (Jira)
Martijn Visser created FLINK-33238:
--

 Summary: Upgrade org.apache.avro:avro to 1.11.3 to mitigate 
CVE-2023-39410
 Key: FLINK-33238
 URL: https://issues.apache.org/jira/browse/FLINK-33238
 Project: Flink
  Issue Type: Technical Debt
  Components: Connectors / Kafka, Formats (JSON, Avro, Parquet, ORC, 
SequenceFile)
Reporter: Martijn Visser
Assignee: Martijn Visser


We should update AVRO to 1.11.3 to avoid false-positives on CVE-2023-39410




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] FLIP-370 : Support Balanced Tasks Scheduling

2023-10-11 Thread xiangyu feng
Hi Yuepeng,

Thanks for your feedback. I agree with u, both approaches can achieve the
goal.
As long as we can easily extend the balancing strategy to consider more
than one factors without changing the interface, the solution is OK for me.

Regards,
Xiangyu

Yuepeng Pan  于2023年10月11日周三 17:38写道:

> Hi, xiangyu.
> Thanks for your quick reply.
>
> >interface currently only includes a description of the number of tasks.
> So,
> >IIUC, If there is a need to further expand
> >current interface and its implementations, right?
>
> Yes, that's indeed the case.
>
> >I checked the interface design of LoadingWeight and WeightLoadable, AFAIK
> >currently it only supports comparing the load for one factor. If we want
> to
> >add more loading factors, LoadingWeight might need to add a 'LoadType'
> >field for distinction, WeightLoadable might need to return
> >Set.
>
> Thank you for the clarification, I think I roughly understand your
> description:
> In fact, regarding the specific implementation and extension of this
> LoadingWeight, we can extend it based on this interface and its
> implementation as mentioned above.
> If making frequent changes to the interface and its implementation is
> really tiresome, we can also consider introducing a built-in collapsible
> Map or other type of attribute, like the SlotSharingGroup class in the
> org.apache.flink.api.common.operators package, to describe the specific
> collection of load values and types. This way, these loads are collapsed
> within the LoadingWeight's implementation and can be expanded when needed
> for use. Of course, we can also consider an implementation like the one you
> mentioned, introducing a method in WeightLoadable that returns a collection
> as the return type, so the load values are expanded at the calling site and
> then used. As I understand it, both approaches can achieve the goal.
>
> Of course, I also look forward to hearing others' suggestions. If there
> are any mistakes in my statement, please correct me.
> Looking forward to your reply.
>
> Best regards.
> Yuepeng Pan
>
> On 2023/10/11 08:44:51 xiangyu feng wrote:
> > Hi Yuepeng,
> >
> > Thx for ur reply.
> >
> > > Nice feedback. In fact, as mentioned in the Google Doc, the
> LoadingWeight
> > interface currently only includes a description of the number of tasks.
> So,
> > IIUC, If there is a need to further expand
> > > descriptions of other resource loads, we just extend it based on the
> > current interface and its implementations, right?
> >
> > I checked the interface design of LoadingWeight and WeightLoadable, AFAIK
> > currently it only supports comparing the load for one factor. If we want
> to
> > add more loading factors, LoadingWeight might need to add a 'LoadType'
> > field for distinction, WeightLoadable might need to return
> > Set.
> >
> > I'm not sure I understand this correctly, WDYT?
> >
> > Regards,
> > Xiangyu
> >
> > Yuepeng Pan  于2023年10月11日周三 13:53写道:
> >
> > > Hi, xiangyu,
> > > Thanks for your attention as well.
> > >
> > > >1, About the waiting mechanism: Will the waiting mechanism happen
> only in
> > > >the second level 'assigning slots to TM'? IIUC, the first level
> 'assigning
> > > >Tasks to Slots' needs only the asynchronous slot result from slotpool.
> > >
> > > As described in the latest FLIP, the introduction of the waiting
> mechanism
> > > at the second level is to ensure that, in all deployment modes such as
> > > application, session, etc., we do not fall into a local greedy state
> when
> > > selecting the optimal slot position. This requires obtaining a global
> > > resource view to get the best result.
> > > IIUC, The allocation process from Task to Slot is the generation of a
> > > mapping relationship between two abstract descriptions, and at this
> point,
> > > there is no coupling of information between tasks/slots and Task
> Managers
> > > (TMs).
> > >
> > >
> > > >2, About the slot LoadingWeight: it is reasonable to use the number of
> > > >tasks by default in the beginning, but it would be better if this
> could be
> > > >easily extended in future to distinguish between CPU-intensive and
> > > >IO-intensive workloads. In some cases, TMs may have IO bottlenecks but
> > > >others have CPU bottlenecks.
> > >
> > > Nice feedback. In fact, as mentioned in the Google Doc, the
> LoadingWeight
> > > interface currently only includes a description of the number of
> tasks. So,
> > > IIUC, If there is a need to further expand descriptions of other
> resource
> > > loads, we just extend it based on the current interface and its
> > > implementations, right?
> > > Please correct me if I have misunderstood. Thanks a lot~
> > >
> > > Best,
> > > Yuepeng.
> > >
> > > On 2023/10/06 10:19:21 xiangyu feng wrote:
> > > > Thanks Yuepeng and Rui for driving this Discussion.
> > > >
> > > > Internally when we try to use Flink 1.17.1 in production, we are also
> > > > suffering from the unbalanced task distribution problem for jobs with
> > > high
> > > > qps and c

[jira] [Created] (FLINK-33239) After enabling exactly-once in the Flink Kafka sink, the Kafka broker's memory keeps increasing, eventually causing the Kafka broker to crash.

2023-10-11 Thread Darcy Lin (Jira)
Darcy Lin created FLINK-33239:
-

 Summary: After enabling exactly-once in the Flink Kafka sink, the 
Kafka broker's memory keeps increasing, eventually causing the Kafka broker to 
crash.
 Key: FLINK-33239
 URL: https://issues.apache.org/jira/browse/FLINK-33239
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Kafka
Affects Versions: 1.17.1
 Environment: flink 1.17.1

kafka server 2.8.2
Reporter: Darcy Lin
 Attachments: image-2023-10-11-18-47-32-712.png

We are using Flink version 1.17.1 and Kafka server version 2.8.2. After 
enabling exactly-once, in order to allow downstream consumers to read data from 
Kafka as soon as possible, we set the checkpoint interval to 5 seconds. 
Approximately three days after writing to the Kafka cluster, the Kafka JVM's 
memory is exhausted. We printed the memory consumption and found that the main 
consumption is on the {{kafka.log.ProducerStateEntry}} object.

Currently, in the exactly-once Kafka sink, a new producer is created every time 
a checkpoint is executed. The {{kafka.log.ProducerStateEntry}} object seems to 
store the producer's state, so it keeps increasing. We'd like to ask: Is this 
normal? If it's normal, do we need to allocate a large amount of memory for our 
Kafka cluster? If it's not normal, how should we solve this problem?

!image-2023-10-11-18-47-32-712.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: Re: [DISCUSS] FLIP-373: Support Configuring Different State TTLs using SQL Hint

2023-10-11 Thread Zakelly Lan
Hi Jane,

The fine-grained TTL management is extremely useful for performance
tuning, so +1 for the idea. I have a minor suggestion: would it be
possible to provide a simple hint that allows the omission of the key?
For example, something like "SELECT /+ STATE_TTL('1h')/", which would
specify the TTL for all states in the 'SELECT' clause.

And I also share the same concern as Feng. I am wondering if we could
show the state ttl for operators in Flink UI.


Best,
Zakelly

On Wed, Oct 11, 2023 at 1:27 PM Feng Jin  wrote:
>
> Hi Jane,
>
> Thank you for providing this explanation.
>
> Another small question, since there is no exception thrown when the STATE
> hint is set incorrectly,
> should we somehow show that the TTL setting has taken effect?
> For instance, exhibiting the TTL option within the operator's description?
>
> Best,
> Feng
>
> On Tue, Oct 10, 2023 at 7:51 PM Xuyang  wrote:
>
> > Hi, Jane.
> >
> >
> > I think this syntax will be easier for users to set operator ttl. So big
> > +1. I left some minor comments here.
> >
> >
> > I notice that using STATE_TTL hints wrongly will not throw any exceptions.
> > But it seems that in the current join hint scenario,
> > if user uses an unknown table name as the chosen side, a validation
> > exception will be thrown.
> > Maybe we should distinguish which exceptions need to be thrown explicitly.
> >
> >
> >
> >
> > --
> >
> > Best!
> > Xuyang
> >
> >
> >
> >
> >
> > At 2023-10-10 18:23:55, "Jane Chan"  wrote:
> > >Hi Feng,
> > >
> > >Thank you for your valuable comments. The reason for not including the
> > >scenarios above is as follows:
> > >
> > >For <1>, the automatically inferred stateful operators are not easily
> > >expressible in SQL. This issue was discussed in FLIP-292, and besides
> > >ChangelogNormalize, SinkUpsertMateralizer also faces the same problem.
> > >
> > >For <2> and <3>, the challenge lies in internal implementation. During the
> > >default_rewrite phase, the row_number expression in LogicalProject is
> > >transformed into LogicalWindow by Calcite's
> > >CoreRules#PROJECT_TO_LOGICAL_PROJECT_AND_WINDOW. However, CalcRelSplitter
> > >does not pass the hints as an input argument when creating LogicalWindow,
> > >resulting in the loss of the hint at this step. To support this, we may
> > >need to rewrite some optimization rules in Calcite, which could be a
> > >follow-up work if required.
> > >
> > >Best,
> > >Jane
> > >
> > >On Tue, Oct 10, 2023 at 1:40 AM Feng Jin  wrote:
> > >
> > >> Hi Jane,
> > >>
> > >> Thank you for proposing this FLIP.
> > >>
> > >> I believe that this FLIP will greatly enhance the flexibility of setting
> > >> state, and by setting different operators' TTL, it can also increase job
> > >> stability, especially in regular join scenarios.
> > >> The parameter design is very concise, big +1 for this, and it is also
> > >> relatively easy to use for users.
> > >>
> > >>
> > >> I have a small question: in the FLIP, it only mentions join and group.
> > >> Should we also consider other scenarios?
> > >>
> > >> 1. the auto generated deduplicate operator[1].
> > >> 2. the deduplicate query[2].
> > >> 3. the topN query[3].
> > >>
> > >> [1]
> > >>
> > >>
> > https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/config/#table-exec-source-cdc-events-duplicate
> > >> [2]
> > >>
> > >>
> > https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/deduplication/
> > >> [3]
> > >>
> > >>
> > https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/topn/
> > >>
> > >>
> > >> Best,
> > >> Feng
> > >>
> > >> On Sun, Oct 8, 2023 at 5:53 PM Jane Chan  wrote:
> > >>
> > >> > Hi devs,
> > >> >
> > >> > I'd like to initiate a discussion on FLIP-373: Support Configuring
> > >> > Different State TTLs using SQL Hint [1]. This proposal is on top of
> > the
> > >> > FLIP-292 [2] to address typical scenarios with unambiguous semantics
> > and
> > >> > hint propagation.
> > >> >
> > >> > I'm looking forward to your opinions!
> > >> >
> > >> >
> > >> > [1]
> > >> >
> > >> >
> > >>
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-373%3A+Support+Configuring+Different+State+TTLs+using+SQL+Hint
> > >> > [2]
> > >> >
> > >> >
> > >>
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-292%3A+Enhance+COMPILED+PLAN+to+support+operator-level+state+TTL+configuration
> > >> >
> > >> > Best,
> > >> > Jane
> > >> >
> > >>
> >


[jira] [Created] (FLINK-33240) Generate docs for deprecated options as well

2023-10-11 Thread Zhanghao Chen (Jira)
Zhanghao Chen created FLINK-33240:
-

 Summary: Generate docs for deprecated options as well
 Key: FLINK-33240
 URL: https://issues.apache.org/jira/browse/FLINK-33240
 Project: Flink
  Issue Type: Improvement
  Components: Documentation
Reporter: Zhanghao Chen
 Fix For: 1.19.0


Currently, Flink will skip doc generation for deprecated options (See 
{{{}ConfigOptionsDocGenerator#{}}}{{{}shouldBeDocumented{}}}). As a result, the 
deprecated options can no longer be found in the new version of Flink document. 
This might confuse users upgrading from an older version of Flink and they have 
to either carefully read the release notes or check the source code for 
upgrading guidance on deprecated options. I suggest generating doc for 
deprecated options as well, and we should scan through the code to make sure 
that proper upgrading guidance is provided for the deprecated options.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33241) Align config option generation documentation for Flink's config documentation

2023-10-11 Thread Matthias Pohl (Jira)
Matthias Pohl created FLINK-33241:
-

 Summary: Align config option generation documentation for Flink's 
config documentation
 Key: FLINK-33241
 URL: https://issues.apache.org/jira/browse/FLINK-33241
 Project: Flink
  Issue Type: Bug
  Components: Documentation
Affects Versions: 1.19.0
Reporter: Matthias Pohl


The configuration parameter docs generation is documented in two places in 
different ways:
[docs/README.md:62|https://github.com/apache/flink/blob/5c1e9f3b1449cb77276d578b344d9a69c7cf9a3c/docs/README.md#L62]
 and 
[flink-docs/README.md:44|https://github.com/apache/flink/blob/7bebd2d9fac517c28afc24c0c034d77cfe2b43a6/flink-docs/README.md#L44].
 Only the latter one works.

We should remove the corresponding command from {{docs/README.md}} and refer to 
{{flink-docs/README.md}} for the documentation. That way, we only have to 
maintain a single file.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[DISCUSS][FLINK-33240] Document deprecated options as well

2023-10-11 Thread Zhanghao Chen
Hi Flink users and developers,

Currently, Flink won't generate doc for the deprecated options. This might 
confuse users when upgrading from an older version of Flink: they have to 
either carefully read the release notes or check the source code for upgrade 
guidance on deprecated options.

I propose to document deprecated options as well, with a "(deprecated)" tag 
placed at the beginning of the option description to highlight the deprecation 
status [1].

Looking forward to your feedbacks on it.

[1] https://issues.apache.org/jira/browse/FLINK-33240

Best,
Zhanghao Chen


Re: Re: [DISCUSS] FLIP-373: Support Configuring Different State TTLs using SQL Hint

2023-10-11 Thread ConradJam
+1 TTL shows the state ttl for operators in Flink web ui can be know what
operator state

Zakelly Lan  于2023年10月11日周三 19:14写道:

> Hi Jane,
>
> The fine-grained TTL management is extremely useful for performance
> tuning, so +1 for the idea. I have a minor suggestion: would it be
> possible to provide a simple hint that allows the omission of the key?
> For example, something like "SELECT /+ STATE_TTL('1h')/", which would
> specify the TTL for all states in the 'SELECT' clause.
>
> And I also share the same concern as Feng. I am wondering if we could
> show the state ttl for operators in Flink UI.
>
>
> Best,
> Zakelly
>
> On Wed, Oct 11, 2023 at 1:27 PM Feng Jin  wrote:
> >
> > Hi Jane,
> >
> > Thank you for providing this explanation.
> >
> > Another small question, since there is no exception thrown when the STATE
> > hint is set incorrectly,
> > should we somehow show that the TTL setting has taken effect?
> > For instance, exhibiting the TTL option within the operator's
> description?
> >
> > Best,
> > Feng
> >
> > On Tue, Oct 10, 2023 at 7:51 PM Xuyang  wrote:
> >
> > > Hi, Jane.
> > >
> > >
> > > I think this syntax will be easier for users to set operator ttl. So
> big
> > > +1. I left some minor comments here.
> > >
> > >
> > > I notice that using STATE_TTL hints wrongly will not throw any
> exceptions.
> > > But it seems that in the current join hint scenario,
> > > if user uses an unknown table name as the chosen side, a validation
> > > exception will be thrown.
> > > Maybe we should distinguish which exceptions need to be thrown
> explicitly.
> > >
> > >
> > >
> > >
> > > --
> > >
> > > Best!
> > > Xuyang
> > >
> > >
> > >
> > >
> > >
> > > At 2023-10-10 18:23:55, "Jane Chan"  wrote:
> > > >Hi Feng,
> > > >
> > > >Thank you for your valuable comments. The reason for not including the
> > > >scenarios above is as follows:
> > > >
> > > >For <1>, the automatically inferred stateful operators are not easily
> > > >expressible in SQL. This issue was discussed in FLIP-292, and besides
> > > >ChangelogNormalize, SinkUpsertMateralizer also faces the same problem.
> > > >
> > > >For <2> and <3>, the challenge lies in internal implementation.
> During the
> > > >default_rewrite phase, the row_number expression in LogicalProject is
> > > >transformed into LogicalWindow by Calcite's
> > > >CoreRules#PROJECT_TO_LOGICAL_PROJECT_AND_WINDOW. However,
> CalcRelSplitter
> > > >does not pass the hints as an input argument when creating
> LogicalWindow,
> > > >resulting in the loss of the hint at this step. To support this, we
> may
> > > >need to rewrite some optimization rules in Calcite, which could be a
> > > >follow-up work if required.
> > > >
> > > >Best,
> > > >Jane
> > > >
> > > >On Tue, Oct 10, 2023 at 1:40 AM Feng Jin 
> wrote:
> > > >
> > > >> Hi Jane,
> > > >>
> > > >> Thank you for proposing this FLIP.
> > > >>
> > > >> I believe that this FLIP will greatly enhance the flexibility of
> setting
> > > >> state, and by setting different operators' TTL, it can also
> increase job
> > > >> stability, especially in regular join scenarios.
> > > >> The parameter design is very concise, big +1 for this, and it is
> also
> > > >> relatively easy to use for users.
> > > >>
> > > >>
> > > >> I have a small question: in the FLIP, it only mentions join and
> group.
> > > >> Should we also consider other scenarios?
> > > >>
> > > >> 1. the auto generated deduplicate operator[1].
> > > >> 2. the deduplicate query[2].
> > > >> 3. the topN query[3].
> > > >>
> > > >> [1]
> > > >>
> > > >>
> > >
> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/config/#table-exec-source-cdc-events-duplicate
> > > >> [2]
> > > >>
> > > >>
> > >
> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/deduplication/
> > > >> [3]
> > > >>
> > > >>
> > >
> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/topn/
> > > >>
> > > >>
> > > >> Best,
> > > >> Feng
> > > >>
> > > >> On Sun, Oct 8, 2023 at 5:53 PM Jane Chan 
> wrote:
> > > >>
> > > >> > Hi devs,
> > > >> >
> > > >> > I'd like to initiate a discussion on FLIP-373: Support Configuring
> > > >> > Different State TTLs using SQL Hint [1]. This proposal is on top
> of
> > > the
> > > >> > FLIP-292 [2] to address typical scenarios with unambiguous
> semantics
> > > and
> > > >> > hint propagation.
> > > >> >
> > > >> > I'm looking forward to your opinions!
> > > >> >
> > > >> >
> > > >> > [1]
> > > >> >
> > > >> >
> > > >>
> > >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-373%3A+Support+Configuring+Different+State+TTLs+using+SQL+Hint
> > > >> > [2]
> > > >> >
> > > >> >
> > > >>
> > >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-292%3A+Enhance+COMPILED+PLAN+to+support+operator-level+state+TTL+configuration
> > > >> >
> > > >> > Best,
> > > >> > Jane
> > > >> >
> > > >>
> > >
>


-- 
Best

ConradJam


[VOTE] FLIP-371: Provide initialization context for Committer creation in TwoPhaseCommittingSink

2023-10-11 Thread Péter Váry
Hi all,

Thank you to everyone for the feedback on FLIP-371[1].
Based on the discussion thread [2], I think we are ready to take a vote to
contribute this to Flink.
I'd like to start a vote for it.
The vote will be open for at least 72 hours (excluding weekends, unless
there is an objection or an insufficient number of votes).

Thanks,
Peter


[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-371%3A+Provide+initialization+context+for+Committer+creation+in+TwoPhaseCommittingSink
[2] https://lists.apache.org/thread/v3mrspdlrqrzvbwm0lcgr0j4v03dx97c


Re: Re: [DISCUSS] FLIP-373: Support Configuring Different State TTLs using SQL Hint

2023-10-11 Thread Yun Tang
I think showing the TTL for operators is a nice-to-have feature, not a must one 
in this FLIP. We can still get the information from the operator descriptions.

And I think we can continue the TTL showing work based on FLINK-33230 [1].

Last but not least, I prefer to throw exceptions if the TTL configuration is 
mistakenly used as it will affect the data correctness.

[1] https://issues.apache.org/jira/browse/FLINK-33230

Best
Yun Tang

From: ConradJam 
Sent: Wednesday, October 11, 2023 20:30
To: dev@flink.apache.org 
Subject: Re: Re: [DISCUSS] FLIP-373: Support Configuring Different State TTLs 
using SQL Hint

+1 TTL shows the state ttl for operators in Flink web ui can be know what
operator state

Zakelly Lan  于2023年10月11日周三 19:14写道:

> Hi Jane,
>
> The fine-grained TTL management is extremely useful for performance
> tuning, so +1 for the idea. I have a minor suggestion: would it be
> possible to provide a simple hint that allows the omission of the key?
> For example, something like "SELECT /+ STATE_TTL('1h')/", which would
> specify the TTL for all states in the 'SELECT' clause.
>
> And I also share the same concern as Feng. I am wondering if we could
> show the state ttl for operators in Flink UI.
>
>
> Best,
> Zakelly
>
> On Wed, Oct 11, 2023 at 1:27 PM Feng Jin  wrote:
> >
> > Hi Jane,
> >
> > Thank you for providing this explanation.
> >
> > Another small question, since there is no exception thrown when the STATE
> > hint is set incorrectly,
> > should we somehow show that the TTL setting has taken effect?
> > For instance, exhibiting the TTL option within the operator's
> description?
> >
> > Best,
> > Feng
> >
> > On Tue, Oct 10, 2023 at 7:51 PM Xuyang  wrote:
> >
> > > Hi, Jane.
> > >
> > >
> > > I think this syntax will be easier for users to set operator ttl. So
> big
> > > +1. I left some minor comments here.
> > >
> > >
> > > I notice that using STATE_TTL hints wrongly will not throw any
> exceptions.
> > > But it seems that in the current join hint scenario,
> > > if user uses an unknown table name as the chosen side, a validation
> > > exception will be thrown.
> > > Maybe we should distinguish which exceptions need to be thrown
> explicitly.
> > >
> > >
> > >
> > >
> > > --
> > >
> > > Best!
> > > Xuyang
> > >
> > >
> > >
> > >
> > >
> > > At 2023-10-10 18:23:55, "Jane Chan"  wrote:
> > > >Hi Feng,
> > > >
> > > >Thank you for your valuable comments. The reason for not including the
> > > >scenarios above is as follows:
> > > >
> > > >For <1>, the automatically inferred stateful operators are not easily
> > > >expressible in SQL. This issue was discussed in FLIP-292, and besides
> > > >ChangelogNormalize, SinkUpsertMateralizer also faces the same problem.
> > > >
> > > >For <2> and <3>, the challenge lies in internal implementation.
> During the
> > > >default_rewrite phase, the row_number expression in LogicalProject is
> > > >transformed into LogicalWindow by Calcite's
> > > >CoreRules#PROJECT_TO_LOGICAL_PROJECT_AND_WINDOW. However,
> CalcRelSplitter
> > > >does not pass the hints as an input argument when creating
> LogicalWindow,
> > > >resulting in the loss of the hint at this step. To support this, we
> may
> > > >need to rewrite some optimization rules in Calcite, which could be a
> > > >follow-up work if required.
> > > >
> > > >Best,
> > > >Jane
> > > >
> > > >On Tue, Oct 10, 2023 at 1:40 AM Feng Jin 
> wrote:
> > > >
> > > >> Hi Jane,
> > > >>
> > > >> Thank you for proposing this FLIP.
> > > >>
> > > >> I believe that this FLIP will greatly enhance the flexibility of
> setting
> > > >> state, and by setting different operators' TTL, it can also
> increase job
> > > >> stability, especially in regular join scenarios.
> > > >> The parameter design is very concise, big +1 for this, and it is
> also
> > > >> relatively easy to use for users.
> > > >>
> > > >>
> > > >> I have a small question: in the FLIP, it only mentions join and
> group.
> > > >> Should we also consider other scenarios?
> > > >>
> > > >> 1. the auto generated deduplicate operator[1].
> > > >> 2. the deduplicate query[2].
> > > >> 3. the topN query[3].
> > > >>
> > > >> [1]
> > > >>
> > > >>
> > >
> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/config/#table-exec-source-cdc-events-duplicate
> > > >> [2]
> > > >>
> > > >>
> > >
> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/deduplication/
> > > >> [3]
> > > >>
> > > >>
> > >
> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/topn/
> > > >>
> > > >>
> > > >> Best,
> > > >> Feng
> > > >>
> > > >> On Sun, Oct 8, 2023 at 5:53 PM Jane Chan 
> wrote:
> > > >>
> > > >> > Hi devs,
> > > >> >
> > > >> > I'd like to initiate a discussion on FLIP-373: Support Configuring
> > > >> > Different State TTLs using SQL Hint [1]. This proposal is on top
> of
> > > the
> > > >> > FLIP-292 [2] to address typical scenarios with unambiguous
> semantics
> > > 

[jira] [Created] (FLINK-33242) flink-misc doesn't succeed due to the YARN tests

2023-10-11 Thread Matthias Pohl (Jira)
Matthias Pohl created FLINK-33242:
-

 Summary: flink-misc doesn't succeed due to the YARN tests
 Key: FLINK-33242
 URL: https://issues.apache.org/jira/browse/FLINK-33242
 Project: Flink
  Issue Type: Sub-task
Reporter: Matthias Pohl


https://github.com/XComp/flink/actions/runs/6473584177/job/17581942919

{code}
2023-10-10T23:16:09.3548634Z Oct 10 23:16:09 23:16:09.354 [ERROR] Tests run: 1, 
Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 5.664 s <<< FAILURE! - in 
org.apache.flink.yarn.YarnPrioritySchedulingITCase
2023-10-10T23:16:09.3564980Z Oct 10 23:16:09 23:16:09.354 [ERROR] 
org.apache.flink.yarn.YarnPrioritySchedulingITCase.yarnApplication_submissionWithPriority_shouldRespectPriority
  Time elapsed: 1.226 s  <<< ERROR!
2023-10-10T23:16:09.3565608Z Oct 10 23:16:09 java.lang.RuntimeException: Runner 
failed with exception.
2023-10-10T23:16:09.3566290Z Oct 10 23:16:09at 
org.apache.flink.yarn.YarnTestBase.startWithArgs(YarnTestBase.java:949)
2023-10-10T23:16:09.3566954Z Oct 10 23:16:09at 
org.apache.flink.yarn.YarnPrioritySchedulingITCase.lambda$yarnApplication_submissionWithPriority_shouldRespectPriority$0(YarnPrioritySchedulingITCase.java:45)
2023-10-10T23:16:09.3567646Z Oct 10 23:16:09at 
org.apache.flink.yarn.YarnTestBase.runTest(YarnTestBase.java:303)
2023-10-10T23:16:09.3568447Z Oct 10 23:16:09at 
org.apache.flink.yarn.YarnPrioritySchedulingITCase.yarnApplication_submissionWithPriority_shouldRespectPriority(YarnPrioritySchedulingITCase.java:41)
2023-10-10T23:16:09.3569187Z Oct 10 23:16:09at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
2023-10-10T23:16:09.3569805Z Oct 10 23:16:09at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
2023-10-10T23:16:09.3570485Z Oct 10 23:16:09at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
2023-10-10T23:16:09.3571052Z Oct 10 23:16:09at 
java.base/java.lang.reflect.Method.invoke(Method.java:566)
2023-10-10T23:16:09.3571527Z Oct 10 23:16:09at 
org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:727)
2023-10-10T23:16:09.3572075Z Oct 10 23:16:09at 
org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
2023-10-10T23:16:09.3572716Z Oct 10 23:16:09at 
org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
2023-10-10T23:16:09.3573350Z Oct 10 23:16:09at 
org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:156)
2023-10-10T23:16:09.3573954Z Oct 10 23:16:09at 
org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:147)
2023-10-10T23:16:09.3574665Z Oct 10 23:16:09at 
org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestMethod(TimeoutExtension.java:86)
2023-10-10T23:16:09.3575378Z Oct 10 23:16:09at 
org.junit.jupiter.engine.execution.InterceptingExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(InterceptingExecutableInvoker.java:103)
2023-10-10T23:16:09.3576139Z Oct 10 23:16:09at 
org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.lambda$invoke$0(InterceptingExecutableInvoker.java:93)
2023-10-10T23:16:09.3576852Z Oct 10 23:16:09at 
org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106)
2023-10-10T23:16:09.3577539Z Oct 10 23:16:09at 
org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64)
2023-10-10T23:16:09.3578225Z Oct 10 23:16:09at 
org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45)
2023-10-10T23:16:09.3578898Z Oct 10 23:16:09at 
org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37)
2023-10-10T23:16:09.3579568Z Oct 10 23:16:09at 
org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.invoke(InterceptingExecutableInvoker.java:92)
2023-10-10T23:16:09.3580243Z Oct 10 23:16:09at 
org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.invoke(InterceptingExecutableInvoker.java:86)
2023-10-10T23:16:09.3580917Z Oct 10 23:16:09at 
org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$7(TestMethodTestDescriptor.java:217)
2023-10-10T23:16:09.3581584Z Oct 10 23:16:09at 
org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
2023-10-10T23:16:09.3582276Z Oct 10 23:16:09at 
org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeTestMethod(TestMethodTestDescriptor.java:213)
2023-10-10T23:16:09.3582952Z Oct 10 23:16:09at 
org.junit.jupiter.engine.desc

[jira] [Created] (FLINK-33243) .scalafmt.conf cannot be found in Test packaging/licensing job

2023-10-11 Thread Matthias Pohl (Jira)
Matthias Pohl created FLINK-33243:
-

 Summary: .scalafmt.conf cannot be found in Test 
packaging/licensing job
 Key: FLINK-33243
 URL: https://issues.apache.org/jira/browse/FLINK-33243
 Project: Flink
  Issue Type: Sub-task
Reporter: Matthias Pohl


https://github.com/XComp/flink/actions/runs/6473584177/job/17581941684#step:8:4327



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33244) Not Able To Pass the Configuration On Flink Session

2023-10-11 Thread Amarjeet Singh (Jira)
Amarjeet Singh created FLINK-33244:
--

 Summary: Not Able To Pass the Configuration On Flink Session
 Key: FLINK-33244
 URL: https://issues.apache.org/jira/browse/FLINK-33244
 Project: Flink
  Issue Type: Bug
  Components: flink-contrib, Kubernetes Operator
Affects Versions: kubernetes-operator-1.5.0
Reporter: Amarjeet Singh
 Fix For: 1.17.1


Hi 
I have tried configuring the flink run -D like 


-Dmetrics.reporter=promgateway\
-Dmetrics.reporter.promgateway.jobName: flink_test_outside

these configuration .

And Same is for FLink Kubernetive Operator



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [ANNOUNCE] Release 1.18.0, release candidate #1

2023-10-11 Thread Etienne Chauchot

Hi all,

FYI, build issues for connectors related to architecture tests adding or 
removing violations is under fixing and tracked here (1)


Of course, it is indeed not a blocker as connectors development is 
decoupled from Flink development.


[1] https://issues.apache.org/jira/browse/FLINK-32563

Best

Etienne

Le 09/10/2023 à 09:07, Jing Ge a écrit :

Hi Sergey and devs,

Thanks for bringing this to our attention. I am open to discuss that. I
have the following thoughts:

1. Like I already mentioned in many other threads, build issues in
downstream repos should not block upstream release. I understand the
concern that developers want to have stable connectors. But it violates the
intention of connector externalization.
2. It is expensive to download Flink jar(roughly 500M) from S3 for each PR
and nightly build of each connector. Does it make sense to leverage [1].
Many Flink docs have been using it.
3. I will check internally at Ververica to see if we could make the file
publicly accessible to temporarily solve this issue.

Looking forward to your feedback.

Best regards,
Jing

[1]https://nightlies.apache.org/flink/

On Fri, Oct 6, 2023 at 2:03 PM Konstantin Knauf  wrote:


Hi everyone,

I've just opened a PR for the release announcement [1] and I am looking
forward to reviews and feedback.

Cheers,

Konstantin

[1]https://github.com/apache/flink-web/pull/680

Am Fr., 6. Okt. 2023 um 11:03 Uhr schrieb Sergey Nuyanzin <
snuyan...@gmail.com>:


sorry for not mentioning it in previous mail

based on the reason above I'm
-1 (non-binding)

also there is one more issue [1]
which blocks all the externalised connectors testing against the most
recent commits in
to corresponding branches
[1]https://issues.apache.org/jira/browse/FLINK-33175


On Thu, Oct 5, 2023 at 11:19 PM Sergey Nuyanzin
wrote:


Thanks for creating RC1

* Downloaded artifacts
* Built from sources
* Verified checksums and gpg signatures
* Verified versions in pom files
* Checked NOTICE, LICENSE files

The strange thing I faced is
CheckpointAfterAllTasksFinishedITCase.testRestoreAfterSomeTasksFinished
fails on AZP [1]

which looks like it is related to [2], [3] fixed  in 1.18.0 (not 100%
sure).


[1]https://issues.apache.org/jira/browse/FLINK-33186
[2]https://issues.apache.org/jira/browse/FLINK-32996
[3]https://issues.apache.org/jira/browse/FLINK-32907

On Tue, Oct 3, 2023 at 2:53 PM Ferenc Csaky 
Thanks everyone for the efforts!

Checked the following:

- Downloaded artifacts
- Built Flink from source
- Verified checksums/signatures
- Verified NOTICE, LICENSE files
- Deployed dummy SELECT job via SQL gateway on standalone cluster,

things

seemed fine according to the log files

+1 (non-binding)

Best,
Ferenc


--- Original Message ---
On Friday, September 29th, 2023 at 22:12, Gabor Somogyi <
gabor.g.somo...@gmail.com> wrote:




Thanks for the efforts!

+1 (non-binding)

* Verified versions in the poms
* Built from source
* Verified checksums and signatures
* Started basic workloads with kubernetes operator
* Verified NOTICE and LICENSE files

G

On Fri, Sep 29, 2023, 18:16 Matthias Pohl

matthias.p...@aiven.io.invalid

wrote:


Thanks for creating RC1. I did the following checks:

* Downloaded artifacts
* Built Flink from sources
* Verified SHA512 checksums GPG signatures
* Compared checkout with provided sources
* Verified pom file versions
* Went over NOTICE file/pom files changes without finding anything
suspicious
* Deployed standalone session cluster and ran WordCount example in

batch

and streaming: Nothing suspicious in log files found

+1 (binding)

On Fri, Sep 29, 2023 at 10:34 AM Etienne Chauchot

echauc...@apache.org

wrote:


Hi all,

Thanks to the team for this RC.

I did a quick check of this RC against user pipelines (1) coded

with

DataSet (even if deprecated and soon removed), DataStream and

SQL

APIs

based on the small scope of this test, LGTM

+1 (non-binding)

[1]https://github.com/echauchot/tpcds-benchmark-flink

Best
Etienne

Le 28/09/2023 à 19:35, Jing Ge a écrit :


Hi everyone,

The RC1 for Apache Flink 1.18.0 has been created. The related

voting

process will be triggered once the announcement is ready. The

RC1

has

all
the artifacts that we would typically have for a release,

except

for

the
release note and the website pull request for the release

announcement.

The following contents are available for your review:

- Confirmation of no benchmarks regression at the thread[1].
- The preview source release and binary convenience releases

[2],

which

are signed with the key with fingerprint 96AE0E32CBE6E0753CE6

[3].

- all artifacts that would normally be deployed to the Maven
Central Repository [4].
- source code tag "release-1.18.0-rc1" [5]

Your help testing the release will be greatly appreciated! And

we'll

create the rc1 release and the voting thread as soon as all

the

efforts

are
finished.

[1]

https://lists.apache.org/thread/yxyphglwwvq57wcqlfrnk3qo9t3sr2ro

[2]

https://dist.apache

[jira] [Created] (FLINK-33245) finegrained_resource_management module failed

2023-10-11 Thread Matthias Pohl (Jira)
Matthias Pohl created FLINK-33245:
-

 Summary: finegrained_resource_management module failed
 Key: FLINK-33245
 URL: https://issues.apache.org/jira/browse/FLINK-33245
 Project: Flink
  Issue Type: Sub-task
Reporter: Matthias Pohl



||Run||Runtime||Outcome||
|[run|https://github.com/XComp/flink/actions/runs/6473584177/job/17581943111]|3h
 58m 17s|* timed out with thread dump
|
|[run|https://github.com/XComp/flink/actions/runs/6472816505/job/17575963787]|3h
 36m 56s|* {{NoSuchMethodError}} in {{HAJobRunOnHadoopS3FileSystemITCase}}
|
|[run|https://github.com/XComp/flink/actions/runs/6472726326/job/17575765131]|58m
 8s|* fatal error in {{ZooKeeperLeaderElectionConnectionHandlingTest}}
|
|[run|https://github.com/XComp/flink/actions/runs/6471693368/job/17575340696]|3h
 58m 20s|* timed out with thread dump
|
|[run|https://github.com/XComp/flink/actions/runs/6471147857/job/17571310183]|3h
 48m 54s|* {{NoSuchMethodError}} in {{HAJobRunOnHadoopS3FileSystemITCase}}
|
|[run|https://github.com/XComp/flink/actions/runs/6470473080/job/17569261725]|3h
 58m 20s|* timed out with thread dump
|
|[run|https://github.com/XComp/flink/actions/runs/6468655160/job/17563927249]|1h
 52m 35s|* 100ms timeout occurred in 
{{MiniClusterITCase.testHandleStreamingJobsWhenNotEnoughSlot}}
|



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSSION] test connectors against Flink master in PRs

2023-10-11 Thread Etienne Chauchot

Hi,

Cross-posting in that thread too:

FYI, build issues for connectors related to architecture tests adding or 
removing violations is under fixing and tracked here (1)


[1] https://issues.apache.org/jira/browse/FLINK-32563

Best

Etienne


Le 03/07/2023 à 10:05, Etienne Chauchot a écrit :


Hi all,

I wanted to post here the result of a discussion I had in private with 
Chesnay related to this subject. The question was regarding archunit 
with connectors:


"How to deal with different archunit violations between 2 versions of 
Flink ?  If a violation is normal and should be added to the violation 
store but the related rule has changed in a recent Flink version, how 
to have different set of violations between 2 flink versions for one 
single violation store?"


We concluded by saying that even if a connector should support (and 
therefore be tested against) the last 2 versions of Flink, the 
archunit tests should run only on the main supported Flink version 
(usually the most recent one).


As a consequence, I'll configure that in Cassandra connector and 
update the connectors migration wiki doc to serve as an example for 
such cases.


Best

Etienne


Le 29/06/2023 à 15:57, Etienne Chauchot a écrit :


Hi Martijn,

Thanks for your feedback. I makes total sense to me.

I'll enable it for Cassandra.

Best

Etienne

Le 29/06/2023 à 10:54, Martijn Visser a écrit :


Hi Etienne,

I think it all depends on the actual maintainers of the connector to
make a decision on that: if their unreleased version of the connector
should be compatible with a new Flink version, then they should test
against it. For example, that's already done at Elasticsearch [1] and
JDBC [2].

Choosing which versions to support is a decision by the maintainers in
the community, and it always requires an action by a maintainer to
update the CI config to set the correct versions whenever a new Flink
version is released.

Best regards,

Martijn

[1]https://github.com/apache/flink-connector-elasticsearch/blob/main/.github/workflows/push_pr.yml
[2]https://github.com/apache/flink-connector-jdbc/blob/main/.github/workflows/push_pr.yml

On Wed, Jun 28, 2023 at 6:09 PM Etienne Chauchot  wrote:

Hi all,

Connectors are external to flink. As such, they need to be tested
against stable (released) versions of Flink.

But I was wondering if it would make sense to test connectors in PRs
also against latest flink master snapshot to allow to discover failures
before merging the PRs, ** while the author is still available **,
rather than discovering them in nightly tests (that test against
snapshot) after the merge. That would allow the author to anticipate
potential failures and provide more future proof code (even if master is
subject to change before the connector release).

Of course, if a breaking change was introduced in master, such tests
will fail. But they should be considered as a preview of how the code
will behave against the current snapshot of the next flink version.

WDYT ?


Best

Etienne

[jira] [Created] (FLINK-33246) Add RescalingIT case that uses checkpoints and resource requests

2023-10-11 Thread Stefan Richter (Jira)
Stefan Richter created FLINK-33246:
--

 Summary: Add RescalingIT case that uses checkpoints and resource 
requests
 Key: FLINK-33246
 URL: https://issues.apache.org/jira/browse/FLINK-33246
 Project: Flink
  Issue Type: Improvement
  Components: Tests
Reporter: Stefan Richter
Assignee: Stefan Richter


RescalingITCase currently uses savepoints and cancel/restart for rescaling. We 
should add a test that also tests rescaling from checkpoints under changing 
resource requirements, i.e. without cancelation of the job.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: Re: [DISCUSS] FLIP-373: Support Configuring Different State TTLs using SQL Hint

2023-10-11 Thread Jane Chan
Thank you all for the valuable feedback.

There is a difference when users incorrectly use hints, and there are two
possible scenarios:
<1> STATE_TTL hint can be applied to the current query block but with an
invalid key or value. In this case, the validation exception will be
thrown. The HintOptionChecker will throw exceptions if the options are
empty or the value is an invalid duration. And JoinHintResolver will throw
exceptions if the hint key(table name/alias, etc.) does not exist. So does
for the group aggregate.

<2> STATE_TTL hint cannot be applied to the current query block, e.g.,
SELECT /*+ STATE_TTL('MyTable' = '2h') */ * a, b, c FROM MyTable. In this
case, the hint is ignored. This is a by-design behavior according to
FLIP-229 [1].

I've made the modifications to the FLIP regarding exception handling. I
would appreciate it if you could review it again.

@Xuyang

>  I notice that using STATE_TTL hints wrongly will not throw any
> exceptions. But it seems that in the current join hint scenario, if user
> uses an unknown table name as the chosen side, a validation exception will
> be thrown. Maybe we should distinguish which exceptions need to be thrown
> explicitly.


You're right; the STATE_TTL hint semantic check should throw exceptions
like join hints.

@Feng @Yun

> since there is no exception thrown when the STATE hint is set incorrectly,
> should we somehow show that the TTL setting has taken effect?

For instance, exhibiting the TTL option within the operator's description?


We can throw explicit exceptions for scenario #1. For scenario #2, I prefer
to align the behavior for current query block hints for now (and we may
open a separate discussion in the future). On the other hand, from the
implementation aspect, it is not easy to do so. Taking the example of
deduplication mentioned earlier, the hint is lost before it propagates to
the FlinkLogicalRank node, making it challenging to capture the exception.

@Zakelly

> would it be possible to provide a simple hint that allows the omission of
> the key? For example, something like "SELECT /+ STATE_TTL('1h')/", which
> would specify the TTL for all states in the 'SELECT' clause.


I'm afraid the hint key cannot be omitted. E.g., the join operator is a
TwoInputStreamOperator; if we want to specify different state TTLs for the
left and right input, we need to inform the planner of the ordinal info
(LEFT v.s. RIGHT). On the other hand, the SELECT clause may contain several
query blocks, while the hint scope is designed to apply to the current
query block [2] to prevent the hint on the outer query block from
propagating to the inner query block. For the use case where users need to
specify the TTL for all states in the 'SELECT' clause, it is preferable to
modify the compiled plan instead of using hints.

@ConradJam
I share the same opinion with @Yun that this is a nice-to-have feature. Big
+1 for the follow-up on FLINK-33230. Users can now use the COMPILE PLAN
statement to serialize the query to a JSON string or file and then check
the state metadata to ensure the hint is applied.

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-229%3A+Introduces+Join+Hint+for+Flink+SQL+Batch+Job

[2]
https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/hints/#query-hints


Best,
Jane


On Wed, Oct 11, 2023 at 8:49 PM Yun Tang  wrote:

> I think showing the TTL for operators is a nice-to-have feature, not a
> must one in this FLIP. We can still get the information from the operator
> descriptions.
>
> And I think we can continue the TTL showing work based on FLINK-33230 [1].
>
> Last but not least, I prefer to throw exceptions if the TTL configuration
> is mistakenly used as it will affect the data correctness.
>
> [1] https://issues.apache.org/jira/browse/FLINK-33230
>
> Best
> Yun Tang
> 
> From: ConradJam 
> Sent: Wednesday, October 11, 2023 20:30
> To: dev@flink.apache.org 
> Subject: Re: Re: [DISCUSS] FLIP-373: Support Configuring Different State
> TTLs using SQL Hint
>
> +1 TTL shows the state ttl for operators in Flink web ui can be know what
> operator state
>
> Zakelly Lan  于2023年10月11日周三 19:14写道:
>
> > Hi Jane,
> >
> > The fine-grained TTL management is extremely useful for performance
> > tuning, so +1 for the idea. I have a minor suggestion: would it be
> > possible to provide a simple hint that allows the omission of the key?
> > For example, something like "SELECT /+ STATE_TTL('1h')/", which would
> > specify the TTL for all states in the 'SELECT' clause.
> >
> > And I also share the same concern as Feng. I am wondering if we could
> > show the state ttl for operators in Flink UI.
> >
> >
> > Best,
> > Zakelly
> >
> > On Wed, Oct 11, 2023 at 1:27 PM Feng Jin  wrote:
> > >
> > >

Re: [VOTE] FLIP-367: Support Setting Parallelism for Table/SQL Sources

2023-10-11 Thread Jane Chan
+1 (non-binding)

Best,
Jane

On Wed, Oct 11, 2023 at 5:13 PM Samrat Deb  wrote:

> +1 (non binding)
>
> On Wed, 11 Oct 2023 at 1:29 PM, Martijn Visser 
> wrote:
>
> > +1 (binding)
> >
> > On Wed, Oct 11, 2023 at 9:15 AM Sergey Nuyanzin 
> > wrote:
> > >
> > > +1 (binding)
> > >
> > > that's a nice improvement
> > > thanks for driving that
> > >
> > > On Wed, Oct 11, 2023 at 8:42 AM Lincoln Lee 
> > wrote:
> > >
> > > > +1 (binding)
> > > >
> > > > Best,
> > > > Lincoln Lee
> > > >
> > > >
> > > > Leonard Xu  于2023年10月10日周二 09:57写道:
> > > >
> > > > > +1(binding)
> > > > >
> > > > > Best,
> > > > > Leonard
> > > > >
> > > > > > On Oct 9, 2023, at 9:45 PM, Jing Ge 
> > > > wrote:
> > > > > >
> > > > > > +1(binding)
> > > > > >
> > > > > > Best Regards,
> > > > > > Jing
> > > > > >
> > > > > > On Mon, Oct 9, 2023 at 10:40 AM Ahmed Hamdy <
> hamdy10...@gmail.com>
> > > > > wrote:
> > > > > >
> > > > > >> +1 (non-binding)
> > > > > >> Best Regards
> > > > > >> Ahmed Hamdy
> > > > > >>
> > > > > >>
> > > > > >> On Mon, 9 Oct 2023 at 09:38, xiangyu feng  >
> > > > wrote:
> > > > > >>
> > > > > >>> +1 (non-binding)
> > > > > >>>
> > > > > >>> Feng Jin  于2023年10月9日周一 16:00写道:
> > > > > >>>
> > > > >  +1 (non-binding)
> > > > > 
> > > > >  Best,
> > > > >  Feng
> > > > > 
> > > > >  On Mon, Oct 9, 2023 at 3:12 PM Yangze Guo  >
> > > > wrote:
> > > > > 
> > > > > > +1 (binding)
> > > > > >
> > > > > > Best,
> > > > > > Yangze Guo
> > > > > >
> > > > > > On Mon, Oct 9, 2023 at 2:46 PM Yun Tang 
> > wrote:
> > > > > >>
> > > > > >> +1 (binding)
> > > > > >>
> > > > > >> Best
> > > > > >> Yun Tang
> > > > > >> 
> > > > > >> From: Weihua Hu 
> > > > > >> Sent: Monday, October 9, 2023 12:03
> > > > > >> To: dev@flink.apache.org 
> > > > > >> Subject: Re: [VOTE] FLIP-367: Support Setting Parallelism
> for
> > > > > >>> Table/SQL
> > > > > > Sources
> > > > > >>
> > > > > >> +1 (binding)
> > > > > >>
> > > > > >> Best,
> > > > > >> Weihua
> > > > > >>
> > > > > >>
> > > > > >> On Mon, Oct 9, 2023 at 11:47 AM Shammon FY <
> zjur...@gmail.com
> > >
> > > > > >>> wrote:
> > > > > >>
> > > > > >>> +1 (binding)
> > > > > >>>
> > > > > >>>
> > > > > >>> On Mon, Oct 9, 2023 at 11:12 AM Benchao Li <
> > libenc...@apache.org
> > > > > >>>
> > > > > > wrote:
> > > > > >>>
> > > > >  +1 (binding)
> > > > > 
> > > > >  Zhanghao Chen  于2023年10月9日周一
> > > > > >> 10:20写道:
> > > > > >
> > > > > > Hi All,
> > > > > >
> > > > > > Thanks for all the feedback on FLIP-367: Support Setting
> > > > > > Parallelism
> > > > > >>> for
> > > > >  Table/SQL Sources [1][2].
> > > > > >
> > > > > > I'd like to start a vote for FLIP-367. The vote will be
> > open
> > > > >  until
> > > > > > Oct
> > > > >  12th 12:00 PM GMT) unless there is an objection or
> > insufficient
> > > > > > votes.
> > > > > >
> > > > > > [1]
> > > > > 
> > > > > >>>
> > > > > >
> > > > > 
> > > > > >>>
> > > > > >>
> > > > >
> > > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=263429150
> > > > > > [2]
> > > > > >
> > https://lists.apache.org/thread/gtpswl42jzv0c9o3clwqskpllnw8rh87
> > > > > >
> > > > > > Best,
> > > > > > Zhanghao Chen
> > > > > 
> > > > > 
> > > > > 
> > > > >  --
> > > > > 
> > > > >  Best,
> > > > >  Benchao Li
> > > > > 
> > > > > >>>
> > > > > >
> > > > > 
> > > > > >>>
> > > > > >>
> > > > >
> > > > >
> > > >
> > >
> > >
> > > --
> > > Best regards,
> > > Sergey
> >
>


[jira] [Created] (FLINK-33247) IllegalArgumentException in NoticeFileChecker

2023-10-11 Thread Matthias Pohl (Jira)
Matthias Pohl created FLINK-33247:
-

 Summary: IllegalArgumentException in NoticeFileChecker
 Key: FLINK-33247
 URL: https://issues.apache.org/jira/browse/FLINK-33247
 Project: Flink
  Issue Type: Sub-task
Reporter: Matthias Pohl


{code}
2023-10-11T15:55:54.7616189Z 15:55:54.760 [INFO] --- 
exec-maven-plugin:3.1.0:java (default-cli) @ flink-ci-tools ---
2023-10-11T15:55:55.2153082Z 15:55:55,212 WARN  
org.apache.flink.tools.ci.licensecheck.LicenseChecker[] - THIS UTILITY 
IS ONLY CHECKING FOR COMMON LICENSING MISTAKES. A MANUAL CHECK OF THE NOTICE 
FILES, DEPLOYED ARTIFACTS, ETC. IS STILL NEEDED!
2023-10-11T15:55:55.2217611Z 15:55:55,221 DEBUG 
org.apache.flink.tools.ci.licensecheck.NoticeFileChecker [] - Loaded 3 
items from resource modules-defining-excess-dependencies.modulelist
2023-10-11T15:55:58.1540870Z 15:55:58,153 INFO  
org.apache.flink.tools.ci.licensecheck.NoticeFileChecker [] - Extracted 128 
modules that were deployed and 174 modules which bundle dependencies with a 
total of 174 dependencies
2023-10-11T15:55:58.1700608Z 15:55:58.162 [WARNING] 
2023-10-11T15:55:58.1701401Z java.lang.IllegalArgumentException
2023-10-11T15:55:58.1702263Z at sun.nio.fs.UnixPath.getName 
(UnixPath.java:334)
2023-10-11T15:55:58.1702685Z at sun.nio.fs.UnixPath.getName 
(UnixPath.java:43)
2023-10-11T15:55:58.1703278Z at 
org.apache.flink.tools.ci.licensecheck.NoticeFileChecker.lambda$findNoticeFiles$10
 (NoticeFileChecker.java:362)
2023-10-11T15:55:58.1703907Z at 
java.util.stream.ReferencePipeline$2$1.accept (ReferencePipeline.java:176)
2023-10-11T15:55:58.1704412Z at 
java.util.stream.ReferencePipeline$3$1.accept (ReferencePipeline.java:195)
2023-10-11T15:55:58.1704877Z at java.util.Iterator.forEachRemaining 
(Iterator.java:133)
2023-10-11T15:55:58.1705356Z at 
java.util.Spliterators$IteratorSpliterator.forEachRemaining 
(Spliterators.java:1801)
2023-10-11T15:55:58.1705902Z at java.util.stream.AbstractPipeline.copyInto 
(AbstractPipeline.java:484)
2023-10-11T15:55:58.1706457Z at 
java.util.stream.AbstractPipeline.wrapAndCopyInto (AbstractPipeline.java:474)
2023-10-11T15:55:58.1706975Z at 
java.util.stream.ReduceOps$ReduceOp.evaluateSequential (ReduceOps.java:913)
2023-10-11T15:55:58.1707485Z at java.util.stream.AbstractPipeline.evaluate 
(AbstractPipeline.java:234)
2023-10-11T15:55:58.1708005Z at java.util.stream.ReferencePipeline.collect 
(ReferencePipeline.java:578)
2023-10-11T15:55:58.1708667Z at 
org.apache.flink.tools.ci.licensecheck.NoticeFileChecker.findNoticeFiles 
(NoticeFileChecker.java:366)
2023-10-11T15:55:58.1709374Z at 
org.apache.flink.tools.ci.licensecheck.NoticeFileChecker.run 
(NoticeFileChecker.java:91)
2023-10-11T15:55:58.1710011Z at 
org.apache.flink.tools.ci.licensecheck.LicenseChecker.main 
(LicenseChecker.java:42)
2023-10-11T15:55:58.1710548Z at org.codehaus.mojo.exec.ExecJavaMojo$1.run 
(ExecJavaMojo.java:279)
2023-10-11T15:55:58.1710929Z at java.lang.Thread.run (Thread.java:829)
{code}
https://github.com/XComp/flink/actions/runs/6483465871/job/17607225139#step:8:41046



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] FLIP-371: Provide initialization context for Committer creation in TwoPhaseCommittingSink

2023-10-11 Thread Gyula Fóra
Thanks , Peter.

+1

Gyula

On Wed, 11 Oct 2023 at 14:47, Péter Váry 
wrote:

> Hi all,
>
> Thank you to everyone for the feedback on FLIP-371[1].
> Based on the discussion thread [2], I think we are ready to take a vote to
> contribute this to Flink.
> I'd like to start a vote for it.
> The vote will be open for at least 72 hours (excluding weekends, unless
> there is an objection or an insufficient number of votes).
>
> Thanks,
> Peter
>
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-371%3A+Provide+initialization+context+for+Committer+creation+in+TwoPhaseCommittingSink
> [2] https://lists.apache.org/thread/v3mrspdlrqrzvbwm0lcgr0j4v03dx97c
>


[jira] [Created] (FLINK-33248) Calling CURRENT_WATERMARK without parameters gives IndexOutOfBoundsException

2023-10-11 Thread Bonnie Varghese (Jira)
Bonnie Varghese created FLINK-33248:
---

 Summary: Calling CURRENT_WATERMARK without parameters gives 
IndexOutOfBoundsException
 Key: FLINK-33248
 URL: https://issues.apache.org/jira/browse/FLINK-33248
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / API
Reporter: Bonnie Varghese


Create a table
{code:java}
Flink SQL> CREATE TABLE T (ts TIMESTAMP(3)) WITH ('connector'='values', 
'bounded'='true');
[INFO] Execute statement succeed. {code}
Select CURRENT_WATERMARK without parameters
{code:java}
Flink SQL> SELECT ts, CURRENT_WATERMARK() FROM T;
[ERROR] Could not execute SQL statement. Reason:
java.lang.IndexOutOfBoundsException: Index 0 out of bounds for length 0 {code}
 

 

{*}Expected Behavior{*}:
It should return a SqlValidatorException: No match found for function signature 
CURRENT_WATERMARK()

This is inline with other functions which expects a parameter
Example:
{code:java}
Flink SQL> SELECT ARRAY_JOIN();
[ERROR] Could not execute SQL statement. Reason:
org.apache.calcite.sql.validate.SqlValidatorException: No match found for 
function signature ARRAY_JOIN {code}
 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] FLIP-370 : Support Balanced Tasks Scheduling

2023-10-11 Thread Yuepeng Pan
Hi, Shammon.
Thanks for your feedback.

>1. This mechanism will be only supported in `SlotPool` or both `SlotPool` and 
>`DeclarativeSlotPool`? 

As described on the FLIP page, the current design plans to introduce the 
waiting mechanism only in the `SlotPool`, because the existing 
`WaitingForResources` can already achieve this effect.

>Currently the two slot pools are used in different schedulers. 

Yes, that's indeed the case.

>I think this will also bring value to `DeclarativeSlotPool`, but currently 
>FLIP content seems to be based on `SlotPool`, right?

Yes. your expectations are indeed reasonable. In theory, the 
`DeclarativeSlotPool` could also benefit from a waiting mechanism, as 
discussed. The purpose of introducing the waiting mechanism is to enable the 
`SlotPool` to have a global view to calculate the globally optimal solution. 
I've rechecked the relevant logic in the `AdaptiveScheduler`, and as I 
understand, the existing mechanisms already fulfill the current feature 
requirements. You could find more conclusions on this in FLIP `3.2.5`. Of 
course, I'd be appreciated with your confirmation. If there's any 
misunderstanding on my part, please correct me.

>2. ... What should be done when the slot selected by the round-robin strategy 
>cannot meet the resource requirements?

Is this referring to the phase of task-to-slot allocation? I'm not quite sure, 
would you mind explaining it? Thanks~.

>3. Is the assignment of tasks to slots balanced based on region or job level? 

Currently, there is no specific handling based on regions, and there is no 
job-level balancing. The target effect of the current feature is to achieve 
load balancing based on the number of tasks at the Task Manager (TM) level.
Looking forward to any suggestions regarding the item you mentioned.

>When multiple TMs fail over, will it cause the balancing strategy to fail or 
>even worse? 

IIUC, when multiple Task Managers undergo failover, the results after 
successful recovery will still be maintained in a relatively balanced state.

>What is the current processing strategy?

The Slot-to-TM strategy does not change after a Task Manager undergoes failover.

Best, Regards.
Yuepeng Pan

On 2023/09/28 05:10:13 Shammon FY wrote:
> Thanks Yuepeng for initiating this discussion.
> 
> +1 in general too, in fact we have implemented a similar mechanism
> internally to ensure a balanced allocation of tasks to slots, it works well.
> 
> Some comments about the mechanism
> 
> 1. This mechanism will be only supported in `SlotPool` or both `SlotPool`
> and `DeclarativeSlotPool`? Currently the two slot pools are used in
> different schedulers. I think this will also bring value to
> `DeclarativeSlotPool`, but currently FLIP content seems to be based on
> `SlotPool`, right?
> 
> 2. In fine-grained resource management, we can set different resource
> requirements for different nodes, which means that the resources of each
> slot are different. What should be done when the slot selected by the
> round-robin strategy cannot meet the resource requirements? Will this lead
> to the failure of the balance strategy?
> 
> 3. Is the assignment of tasks to slots balanced based on region or job
> level? When multiple TMs fail over, will it cause the balancing strategy to
> fail or even worse? What is the current processing strategy?
> 
> For Zhuzhu and Rui:
> 
> IIUC, the overall balance is divided into two parts: slot to TM and task to
> slot.
> 1. Slot to TM is guaranteed by SlotManager in ResourceManager
> 2. Task to slot is guaranteed by the slot pool in JM
> 
> These two are completely independent, what are the benefits of unifying
> these two into one option? Also, do we want to share the same
> option between SlotPool in JM and SlotManager in RM? This sounds a bit
> strange.
> 
> Best,
> Shammon FY
> 
> 
> 
> On Thu, Sep 28, 2023 at 12:08 PM Rui Fan <1996fan...@gmail.com> wrote:
> 
> > Hi Zhu Zhu,
> >
> > Thanks for your feedback here!
> >
> > You are right, user needs to set 2 options:
> > - cluster.evenly-spread-out-slots=true
> > - slot.sharing-strategy=TASK_BALANCED_PREFERRED
> >
> > Update it to one option is useful at user side, so
> > `taskmanager.load-balance.mode` sounds good to me.
> > I want to check some points and behaviors about this option:
> >
> > 1. The default value is None, right?
> > 2. When it's set to Tasks, how to assign slots to TM?
> > - Option1: It's just check task number
> > - Option2: It''s check the slot number first, then check the
> > task number when the slot number is the same.
> >
> > Giving an example to explain what's the difference between them:
> >
> > - A session cluster has 2 flink jobs, they are jobA and jobB
> > - Each TM has 4 slots.
> > - The task number of one slot of jobA is 3
> > - The task number of one slot of jobB is 1
> > - We have 2 TaskManagers:
> >   - tm1 runs 3 slots of jobB, so tm1 runs 3 tasks
> >   - tm2 runs 1 slot of jobA, and 1 slot of jobB, so tm2 runs 4 tasks.
> >
> > Now, we 

[jira] [Created] (FLINK-33249) comment should be parsed by StringLiteral() instead of SqlCharStringLiteral to avoid parsing failure

2023-10-11 Thread xiaogang zhou (Jira)
xiaogang zhou created FLINK-33249:
-

 Summary: comment should be parsed by StringLiteral() instead of 
SqlCharStringLiteral to avoid parsing failure
 Key: FLINK-33249
 URL: https://issues.apache.org/jira/browse/FLINK-33249
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / Planner
Affects Versions: 1.17.1
Reporter: xiaogang zhou






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


dev@flink.apache.org

2023-10-11 Thread Jing Ge
+1(binding) Thanks!

Best regards,
Jing

On Wed, Oct 11, 2023 at 9:34 AM Yuepeng Pan  wrote:

> +1(non-binding)
> Thanks for your driving the voting thread.
>
> Best Regards.
> Yuepeng Pan
>
> On 2023/10/06 16:33:40 Joao Boto wrote:
> > Hi all, Thank you to everyone for the feedback on FLIP-239[1]. Based on
> the
> > discussion thread [2] and some offline discussions, we have come to a
> > consensus on the design and are ready to take a vote to contribute this
> to
> > Flink. I'd like to start a vote for it. The vote will be open for at
> least
> > 72 hours(excluding weekends, unless there is an objection or an
> > insufficient number of votes. [1]
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=217386271
> > [2]https://lists.apache.org/thread/yx833h5h3fjlyor0bfm32chy3sjw8hwt
> Best,
> > Joao Boto
> >
>


dev@flink.apache.org

2023-10-11 Thread Samrat Deb
+1 (non binding )

Bests,
Samrat

On Thu, Oct 12, 2023 at 11:14 AM Jing Ge  wrote:

> +1(binding) Thanks!
>
> Best regards,
> Jing
>
> On Wed, Oct 11, 2023 at 9:34 AM Yuepeng Pan  wrote:
>
> > +1(non-binding)
> > Thanks for your driving the voting thread.
> >
> > Best Regards.
> > Yuepeng Pan
> >
> > On 2023/10/06 16:33:40 Joao Boto wrote:
> > > Hi all, Thank you to everyone for the feedback on FLIP-239[1]. Based on
> > the
> > > discussion thread [2] and some offline discussions, we have come to a
> > > consensus on the design and are ready to take a vote to contribute this
> > to
> > > Flink. I'd like to start a vote for it. The vote will be open for at
> > least
> > > 72 hours(excluding weekends, unless there is an objection or an
> > > insufficient number of votes. [1]
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=217386271
> > > [2]https://lists.apache.org/thread/yx833h5h3fjlyor0bfm32chy3sjw8hwt
> > Best,
> > > Joao Boto
> > >
> >
>