from:"Biao Geng"

Re: [VOTE] FLIP-464: Merge "flink run" and "flink run-application"

2024-06-12 Thread Biao Geng

Thanks for driving this.
+1 (non-binding)

Best,
Biao Geng


weijie guo  于2024年6月13日周四 09:48写道：

> Thanks for driving this!
>
> +1(binding)
>
> Best regards,
>
> Weijie
>
>
> Xintong Song  于2024年6月13日周四 09:04写道：
>
> > +1(binding)
> >
> > Best,
> >
> > Xintong
> >
> >
> >
> > On Thu, Jun 13, 2024 at 5:15 AM Márton Balassi  >
> > wrote:
> >
> > > +1 (binding)
> > >
> > > On Wed, Jun 12, 2024 at 7:25 PM Őrhidi Mátyás  >
> > > wrote:
> > >
> > > > Sounds reasonable,
> > > > +1
> > > >
> > > > Cheers,
> > > > Matyas
> > > >
> > > >
> > > > On Wed, Jun 12, 2024 at 8:54 AM Mate Czagany 
> > wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > Thank you for driving this Ferenc,
> > > > > +1 (non-binding)
> > > > >
> > > > > Regards,
> > > > > Mate
> > > > >
> > > > > Ferenc Csaky  ezt írta (időpont: 2024.
> > > jún.
> > > > > 12., Sze, 17:23):
> > > > >
> > > > > > Hello devs,
> > > > > >
> > > > > > I would like to start a vote about FLIP-464 [1]. The FLIP is
> about
> > to
> > > > > > merge back the
> > > > > > "flink run-application" functionality to "flink run", so the
> latter
> > > > will
> > > > > > be capable to deploy jobs in
> > > > > > all deployment modes. More details in the FLIP. Discussion thread
> > > [2].
> > > > > >
> > > > > > The vote will be open for at least 72 hours (until 2024 March 23
> > > 14:03
> > > > > > UTC) unless there
> > > > > > are any objections or insufficient votes.
> > > > > >
> > > > > > Thanks,Ferenc
> > > > > >
> > > > > > [1]
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=311626179
> > > > > > [2]
> > https://lists.apache.org/thread/xh58xs0y58kqjmfvd4yor79rv6dlcg5g
> > > > >
> > > >
> > >
> >
>

Re: [ANNOUNCE] New Apache Flink PMC Member - Weijie Guo

2024-06-04 Thread Biao Geng

Congratulations, Weijie!
Best,
Biao Geng


Yun Tang  于2024年6月5日周三 10:42写道：

> Congratulations, Weijie!
>
> Best
> Yun Tang
> 
> From: Hangxiang Yu 
> Sent: Wednesday, June 5, 2024 10:00
> To: dev@flink.apache.org 
> Subject: Re: [ANNOUNCE] New Apache Flink PMC Member - Weijie Guo
>
> Congratulations, Weijie!
>
> On Tue, Jun 4, 2024 at 11:40 PM Zhanghao Chen 
> wrote:
>
> > Congrats, Weijie!
> >
> > Best,
> > Zhanghao Chen
> > 
> > From: Hang Ruan 
> > Sent: Tuesday, June 4, 2024 16:37
> > To: dev@flink.apache.org 
> > Subject: Re: [ANNOUNCE] New Apache Flink PMC Member - Weijie Guo
> >
> > Congratulations Weijie!
> >
> > Best,
> > Hang
> >
> > Yanfei Lei  于2024年6月4日周二 16:24写道：
> >
> > > Congratulations!
> > >
> > > Best,
> > > Yanfei
> > >
> > > Leonard Xu  于2024年6月4日周二 16:20写道：
> > > >
> > > > Congratulations！
> > > >
> > > > Best,
> > > > Leonard
> > > >
> > > > > 2024年6月4日 下午4:02，Yangze Guo  写道：
> > > > >
> > > > > Congratulations!
> > > > >
> > > > > Best,
> > > > > Yangze Guo
> > > > >
> > > > > On Tue, Jun 4, 2024 at 4:00 PM Weihua Hu 
> > > wrote:
> > > > >>
> > > > >> Congratulations, Weijie!
> > > > >>
> > > > >> Best,
> > > > >> Weihua
> > > > >>
> > > > >>
> > > > >> On Tue, Jun 4, 2024 at 3:03 PM Yuxin Tan 
> > > wrote:
> > > > >>
> > > > >>> Congratulations, Weijie!
> > > > >>>
> > > > >>> Best,
> > > > >>> Yuxin
> > > > >>>
> > > > >>>
> > > > >>> Yuepeng Pan  于2024年6月4日周二 14:57写道：
> > > > >>>
> > > > >>>> Congratulations !
> > > > >>>>
> > > > >>>>
> > > > >>>> Best,
> > > > >>>> Yuepeng Pan
> > > > >>>>
> > > > >>>> At 2024-06-04 14:45:45, "Xintong Song" 
> > > wrote:
> > > > >>>>> Hi everyone,
> > > > >>>>>
> > > > >>>>> On behalf of the PMC, I'm very happy to announce that Weijie
> Guo
> > > has
> > > > >>>> joined
> > > > >>>>> the Flink PMC!
> > > > >>>>>
> > > > >>>>> Weijie has been an active member of the Apache Flink community
> > for
> > > many
> > > > >>>>> years. He has made significant contributions in many
> components,
> > > > >>> including
> > > > >>>>> runtime, shuffle, sdk, connectors, etc. He has driven /
> > > participated in
> > > > >>>>> many FLIPs, authored and reviewed hundreds of PRs, been
> > > consistently
> > > > >>>> active
> > > > >>>>> on mailing lists, and also helped with release management of
> 1.20
> > > and
> > > > >>>>> several other bugfix releases.
> > > > >>>>>
> > > > >>>>> Congratulations and welcome Weijie!
> > > > >>>>>
> > > > >>>>> Best,
> > > > >>>>>
> > > > >>>>> Xintong (on behalf of the Flink PMC)
> > > > >>>>
> > > > >>>
> > > >
> > >
> >
>
>
> --
> Best,
> Hangxiang.
>

Re: Best Practices? Fault Isolation for Processing Large Number of Same-Shaped Input Kafka Topics in a Big Flink Job

2024-05-12 Thread Biao Geng

Hi Kevin,

I think the question is valuable and It looks like this question can be
posted at the user email list to receive more feedback.

As for the question, I just want to share some observations:
1. When there are hundreds of data pipelines, it is nearly impossible to
make all of them work properly for a long time. So we may need to consider
the failover strategy more carefully. The Exponential Delay Restart
Strategy may be a good choice or you can check the doc for more details:
https://nightlies.apache.org/flink/flink-docs-master/docs/ops/state/task_failure_recovery/
. Another point is if in your sql job, all of these topic are processed
parallely and there is no operator would try to connect them, make sure
that the config 'jobmanager.execution.failover-strategy' is set to 'region'
so that the failure in one pipeline would not lead other pipelines to be
restarted.
2. Sometimes logic isolation may not be enough, and I would say that for
some mission-critical pipelines, it is better to keep them in a dedicated
flink job and sometimes a backup job is also required.
3. When we use 'hundreds of kafka topics', there is also one point worth
mentioning: make sure the topics are distributed or replicated correctly.
When upstream kafka brokers fail due to some common reasons, the downstream
flink job can do nothing.

Hope these points help!

Best,
Biao Geng

Kevin Lam  于2024年5月9日周四 03:52写道：

> Hi everyone,
>
> I'm currently prototyping on a project where we need to process a large
> number of Kafka input topics (say, a couple of hundred), all of which share
> the same DataType/Schema.
>
> Our objective is to run the same Flink SQL on all of the input topics, but
> I am concerned about doing this in a single large Flink SQL application for
> fault isolation purposes. We'd like to limit the "blast radius" in cases of
> data issues or "poison pills" in any particular Kafka topic — meaning, if
> one topic runs into a problem, it shouldn’t compromise or halt the
> processing of the others.
>
> At the same time, we are concerned about the operational toil associated
> with managing hundreds of Flink jobs that are really one logical
> application.
>
> Has anyone here tackled a similar challenge? If so:
>
>1. How did you design your solution to handle a vast number of topics
>without creating a heavy management burden?
>2. What strategies or patterns have you found effective in isolating
>issues within a specific topic so that they do not affect the
> processing of
>others?
>3. Are there specific configurations or tools within the Flink ecosystem
>that you'd recommend to efficiently manage this scenario?
>
> Any examples, suggestions, or references to relevant documentation would be
> helpful. Thank you in advance for your time and help!
>

[jira] [Created] (FLINK-35273) PyFlink's LocalZonedTimestampType should respect timezone set by set_local_timezone

2024-04-30 Thread Biao Geng (Jira)

Biao Geng created FLINK-35273:
-

 Summary: PyFlink's LocalZonedTimestampType should respect timezone 
set by set_local_timezone
 Key: FLINK-35273
 URL: https://issues.apache.org/jira/browse/FLINK-35273
 Project: Flink
  Issue Type: Bug
Reporter: Biao Geng






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Re: [VOTE] FLIP-446: Kubernetes Operator State Snapshot CRD

2024-04-26 Thread Biao Geng

+1 (non-binding)

Best,
Biao Geng


gongzhongqiang  于2024年4月26日周五 00:41写道：

> +1( (non-binding)
>
>
> Best,
> Zhongqiang Gong
>
> Mate Czagany  于2024年4月24日周三 16:06写道：
>
> > Hi everyone,
> >
> > I'd like to start a vote on the FLIP-446: Kubernetes Operator State
> > Snapshot CRD [1]. The discussion thread is here [2].
> >
> > The vote will be open for at least 72 hours unless there is an objection
> or
> > insufficient votes.
> >
> > [1]
> >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-446%3A+Kubernetes+Operator+State+Snapshot+CRD
> > [2] https://lists.apache.org/thread/q5dzjwj0qk34rbg2sczyypfhokxoc3q7
> >
> > Regards,
> > Mate
> >
>

Re: [ANNOUNCE] New Apache Flink PMC Member - Jing Ge

2024-04-14 Thread Biao Geng

Congrats, Jing!

Best,
Biao Geng

Zakelly Lan  于2024年4月15日周一 10:17写道：

> Congratulations, Jing!
>
>
> Best,
> Zakelly
>
> On Sat, Apr 13, 2024 at 12:47 AM Ferenc Csaky 
> wrote:
>
> > Congratulations, Jing!
> >
> > Best,
> > Ferenc
> >
> >
> >
> > On Friday, April 12th, 2024 at 13:54, Ron liu 
> wrote:
> >
> > >
> > >
> > > Congratulations, Jing!
> > >
> > > Best,
> > > Ron
> > >
> > > Junrui Lee jrlee@gmail.com 于2024年4月12日周五 18:54写道：
> > >
> > > > Congratulations, Jing!
> > > >
> > > > Best,
> > > > Junrui
> > > >
> > > > Aleksandr Pilipenko z3d...@gmail.com 于2024年4月12日周五 18:28写道：
> > > >
> > > > > Congratulations, Jing!
> > > > >
> > > > > Best Regards,
> > > > > Aleksandr
> >
>

Re: [VOTE] FLIP-355: Add parent dir of files to classpath using yarn.provided.lib.dirs

2023-09-14 Thread Biao Geng

+1 (non-binding)

Best,
Biao Geng

Yang Wang  于2023年9月14日周四 12:07写道：

> +1 (binding)
>
> Best,
> Yang
>
> Becket Qin  于2023年9月14日周四 11:01写道：
>
> > +1 (binding)
> >
> > Thanks for the FLIP, Archit.
> >
> > Cheers,
> >
> > Jiangjie (Becket) Qin
> >
> >
> > On Thu, Sep 14, 2023 at 10:31 AM Dong Lin  wrote:
> >
> > > Thanks Archit for the FLIP.
> > >
> > > +1 (binding)
> > >
> > > Regards,
> > > Dong
> > >
> > > On Thu, Sep 14, 2023 at 1:47 AM Archit Goyal
> >  > > >
> > > wrote:
> > >
> > > > Hi everyone,
> > > >
> > > > Thanks for reviewing the FLIP-355 Add parent dir of files to
> classpath
> > > > using yarn.provided.lib.dirs :
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP+355%3A+Add+parent+dir+of+files+to+classpath+using+yarn.provided.lib.dirs
> > > >
> > > > Following is the discussion thread :
> > > > https://lists.apache.org/thread/gv0ro4jsq4o206wg5gz9z5cww15qkvb9
> > > >
> > > > I'd like to start a vote for it. The vote will be open for at least
> 72
> > > > hours (until September 15, 12:00AM GMT) unless there is an objection
> or
> > > an
> > > > insufficient number of votes.
> > > >
> > > > Thanks,
> > > > Archit Goyal
> > > >
> > >
> >
>

Re: [Discuss] FLIP-355: Add parent dir of files to classpath using yarn.provided.lib.dirs

2023-09-12 Thread Biao Geng

+1 for the FLIP.
Another side thought is that in my experience, when users want to make
Flink TM use a specific hadoop/hive configuration, an easier way is to ship
the corresponding conf dir and set the env
variable via containerized.taskmanager.env.HADOOP_CONF_DIR.

Best,
Biao Geng

Archit Goyal  于2023年9月13日周三 08:00写道：

> Hi All,
>
> If there are no further concerns about this FLIP, I will start a vote
> thread tomorrow.
>
> Thanks,
> Archit Goyal
>
>
> From: Venkatakrishnan Sowrirajan 
> Date: Thursday, August 24, 2023 at 10:21 PM
> To: dev@flink.apache.org 
> Subject: Re: [Discuss] FLIP-355: Add parent dir of files to classpath
> using yarn.provided.lib.dirs
> Thanks Yang Wang.
>
> In that case, whenever you get a chance could you please review the PR?
>
>
> On Wed, Aug 23, 2023, 8:15 PM Yang Wang  wrote:
>
> > +1 for this FLIP
> >
> > Maybe a FLIP is an overkill for this enhancement.
> >
> >
> > Best,
> > Yang
> >
> > Venkatakrishnan Sowrirajan  于2023年8月23日周三 01:44写道：
> >
> > > Thanks for the FLIP, Archit.
> > >
> > > This is definitely quite a useful addition to the
> > *yarn.provided.lib.dirs*
> > > . +1.
> > >
> > > IMO, except for the fact that *yarn.provided.lib.dirs* (platform
> specific
> > > jars can be cached) takes only directories whereas *yarn.ship-files*
> > (user
> > > files) takes both files and dirs, the overall logic in terms of
> > > constructing the classpath in both the cases should be roughly the
> same.
> > >
> > > Referencing the PR (
> >
> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Furldefense.com%2Fv3%2F__https%3A%2F%2Fgithub.com%2Fapache%2Fflink%2Fpull%2F23164__%3B!!IKRxdwAv5BmarQ!cgTpodngoQAdd-qu3CvbQeAwENiu1nf0eahTH-v1NhUsSn4Y7M7sVGQYSnBjB2XgaOlyzGe7XEiU3-cAOoy84Kw%24=05%7C01%7Cargoyal%40linkedin.com%7C1e626c31ecaf408575b008dba52b1c6a%7C72f988bf86f141af91ab2d7cd011db47%7C0%7C0%7C638285376817262437%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C=%2F8QiBnfVJlLZ9atF1WamCsMbAaK0TzEgcCvmd85uSnk%3D=0
> <
> https://urldefense.com/v3/__https://github.com/apache/flink/pull/23164__;!!IKRxdwAv5BmarQ!cgTpodngoQAdd-qu3CvbQeAwENiu1nf0eahTH-v1NhUsSn4Y7M7sVGQYSnBjB2XgaOlyzGe7XEiU3-cAOoy84Kw$
> >
> > ) with the
> > > initial implementation you created as well here.
> > >
> > > Regards
> > > Venkata krishnan
> > >
> > >
> > > On Tue, Aug 22, 2023 at 10:09 AM Archit Goyal
> >  > > >
> > > wrote:
> > >
> > > > Hi all,
> > > >
> > > > Gentle ping if I can get a review on the FLIP.
> > > >
> > > > Thanks,
> > > > Archit Goyal
> > > >
> > > > From: Archit Goyal 
> > > > Date: Thursday, August 17, 2023 at 5:51 PM
> > > > To: dev@flink.apache.org 
> > > > Subject: [Discuss] FLIP-355: Add parent dir of files to classpath
> using
> > > > yarn.provided.lib.dirs
> > > > Hi All,
> > > >
> > > > I am opening this thread to discuss the proposal to add parent
> > > directories
> > > > of files to classpath when using yarn.provided.lib.dirs. This is
> > > documented
> > > > in FLIP-355 <
> > > >
> > >
> >
> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Furldefense.com%2Fv3%2F__https%3A%2F%2Fcwiki.apache.org%2Fconfluence%2Fdisplay%2FFLINK%2FFLIP*355*3A*Add*parent*dir*of*files*to*classpath*using*yarn.provided.lib.dirs__%3BKyUrKysrKysrKys!!IKRxdwAv5BmarQ!fFlyBKWuWwYcWfOcpLflTTi36tyHPiENIUry9J0ygaZY0VURnIs0glu5yafV0w0tfSsnOb9ZxDD9Cjw2TApTekVU%24=05%7C01%7Cargoyal%40linkedin.com%7C1e626c31ecaf408575b008dba52b1c6a%7C72f988bf86f141af91ab2d7cd011db47%7C0%7C0%7C638285376817262437%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C=FfyKaJDglL1caECVWX0nX8SwhHMnejfFMLjtpo5pizo%3D=0
> <
> https://urldefense.com/v3/__https://cwiki.apache.org/confluence/display/FLINK/FLIP*355*3A*Add*parent*dir*of*files*to*classpath*using*yarn.provided.lib.dirs__;KyUrKysrKysrKys!!IKRxdwAv5BmarQ!fFlyBKWuWwYcWfOcpLflTTi36tyHPiENIUry9J0ygaZY0VURnIs0glu5yafV0w0tfSsnOb9ZxDD9Cjw2TApTekVU$
> >
> > > > >.
> > > >
> > > > This FLIP mentions about enhancing YARN's classpath configuration to
> > > > include parent directories of files in yarn.provided.lib.dirs.
> > > >
> > > > Please feel free to reply to this email thread and share your
> opinions.
> > > >
> > > > Thanks,
> > > > Archit Goyal
> > > >
> > >
> >
>

Re: [DISSCUSS] Kubernetes Operator Flink Version Support Policy

2023-09-06 Thread Biao Geng

+1 for the proposal.

Best,
Biao Geng

Gyula Fóra  于2023年9月6日周三 16:10写道：

> @Zhanghao Chen:
>
> I am not completely sure at this point what this will mean for 2.0 simply
> because I am also not sure what that will mean for the operator as well :)
> I think this will depend on the compatibility guarantees we can provide
> across Flink major versions in general. We have to look into that and
> tackle the question there independently.
>
> Gyula
>
> On Tue, Sep 5, 2023 at 6:12 PM Maximilian Michels  wrote:
>
> > +1 Sounds good! Four releases give a decent amount of time to migrate
> > to the next Flink version.
> >
> > On Tue, Sep 5, 2023 at 5:33 PM Őrhidi Mátyás 
> > wrote:
> > >
> > > +1
> > >
> > > On Tue, Sep 5, 2023 at 8:03 AM Thomas Weise  wrote:
> > >
> > > > +1, thanks for the proposal
> > > >
> > > > On Tue, Sep 5, 2023 at 8:13 AM Gyula Fóra 
> > wrote:
> > > >
> > > > > Hi All!
> > > > >
> > > > > @Maximilian Michels  has raised the question of
> > Flink
> > > > > version support in the operator before the last release. I would
> > like to
> > > > > open this discussion publicly so we can finalize this before the
> next
> > > > > release.
> > > > >
> > > > > Background:
> > > > > Currently the Flink Operator supports all Flink versions since
> Flink
> > > > 1.13.
> > > > > While this is great for the users, it introduces a lot of backward
> > > > > compatibility related code in the operator logic and also adds
> > > > considerable
> > > > > time to the CI. We should strike a reasonable balance here that
> > allows us
> > > > > to move forward and eliminate some of this tech debt.
> > > > >
> > > > > In the current model it is also impossible to support all features
> > for
> > > > all
> > > > > Flink versions which leads to some confusion over time.
> > > > >
> > > > > Proposal:
> > > > > Since it's a key feature of the kubernetes operator to support
> > several
> > > > > versions at the same time, I propose to support the last 4 stable
> > Flink
> > > > > minor versions. Currently this would mean to support Flink
> 1.14-1.17
> > (and
> > > > > drop 1.13 support). When Flink 1.18 is released we would drop 1.14
> > > > support
> > > > > and so on. Given the Flink release cadence this means about 2 year
> > > > support
> > > > > for each Flink version.
> > > > >
> > > > > What do you think?
> > > > >
> > > > > Cheers,
> > > > > Gyula
> > > > >
> > > >
> >
>

Re: [ANNOUNCE] New Apache Flink Committer - Hangxiang Yu

2023-08-07 Thread Biao Geng

Congrats, Hangxiang!
Best,
Biao Geng


发送自 Outlook for iOS<https://aka.ms/o0ukef>

发件人: Qingsheng Ren 
发送时间: Monday, August 7, 2023 4:23:11 PM
收件人: dev@flink.apache.org 
主题: Re: [ANNOUNCE] New Apache Flink Committer - Hangxiang Yu

Congratulations and welcome aboard, Hangxiang!

Best,
Qingsheng

On Mon, Aug 7, 2023 at 4:19 PM Matthias Pohl 
wrote:

> Congratulations, Hangxiang! :)
>
> On Mon, Aug 7, 2023 at 10:01 AM Junrui Lee  wrote:
>
> > Congratulations, Hangxiang!
> >
> > Best,
> > Junrui
> >
> > Yun Tang  于2023年8月7日周一 15:19写道：
> >
> > > Congratulations, Hangxiang!
> > >
> > > Best
> > > Yun Tang
> > > 
> > > From: Danny Cranmer 
> > > Sent: Monday, August 7, 2023 15:11
> > > To: dev 
> > > Subject: Re: [ANNOUNCE] New Apache Flink Committer - Hangxiang Yu
> > >
> > > Congrats Hangxiang! Welcome to the team.
> > >
> > > Danny.
> > >
> > > On Mon, 7 Aug 2023, 08:04 Rui Fan, <1996fan...@gmail.com> wrote:
> > >
> > > > Congratulations Hangxiang!
> > > >
> > > > Best,
> > > > Rui
> > > >
> > > > On Mon, Aug 7, 2023 at 2:58 PM Yuan Mei 
> > wrote:
> > > >
> > > > > On behalf of the PMC, I'm happy to announce Hangxiang Yu as a new
> > Flink
> > > > > Committer.
> > > > >
> > > > > Hangxiang has been active in the Flink community for more than 1.5
> > > years
> > > > > and has played an important role in developing and maintaining
> State
> > > and
> > > > > Checkpoint related features/components, including Generic
> Incremental
> > > > > Checkpoints (take great efforts to make the feature prod-ready).
> > > > Hangxiang
> > > > > is also the main driver of the FLIP-263: Resolving schema
> > > compatibility.
> > > > >
> > > > > Hangxiang is passionate about the Flink community. Besides the
> > > technical
> > > > > contribution above, he is also actively promoting Flink: talks
> about
> > > > > Generic
> > > > > Incremental Checkpoints in Flink Forward and Meet-up. Hangxiang
> also
> > > > spent
> > > > > a good amount of time supporting users, participating in
> Jira/mailing
> > > > list
> > > > > discussions, and reviewing code.
> > > > >
> > > > > Please join me in congratulating Hangxiang for becoming a Flink
> > > > Committer!
> > > > >
> > > > > Thanks,
> > > > > Yuan Mei (on behalf of the Flink PMC)
> > > > >
> > > >
> > >
> >
>

Re: [ANNOUNCE] New Apache Flink Committer - Yanfei Lei

2023-08-07 Thread Biao Geng

Congrats, Yanfei!
Best,
Biao Geng

发送自 Outlook for iOS<https://aka.ms/o0ukef>

发件人: Qingsheng Ren 
发送时间: Monday, August 7, 2023 4:23:52 PM
收件人: dev@flink.apache.org 
主题: Re: [ANNOUNCE] New Apache Flink Committer - Yanfei Lei

Congratulations and welcome, Yanfei!

Best,
Qingsheng

On Mon, Aug 7, 2023 at 4:19 PM Matthias Pohl 
wrote:

> Congratulations, Yanfei! :)
>
> On Mon, Aug 7, 2023 at 10:00 AM Junrui Lee  wrote:
>
> > Congratulations Yanfei!
> >
> > Best,
> > Junrui
> >
> > Yun Tang  于2023年8月7日周一 15:19写道：
> >
> > > Congratulations, Yanfei!
> > >
> > > Best
> > > Yun Tang
> > > 
> > > From: Danny Cranmer 
> > > Sent: Monday, August 7, 2023 15:10
> > > To: dev 
> > > Subject: Re: [ANNOUNCE] New Apache Flink Committer - Yanfei Lei
> > >
> > > Congrats Yanfei! Welcome to the team.
> > >
> > > Danny
> > >
> > > On Mon, 7 Aug 2023, 08:03 Rui Fan, <1996fan...@gmail.com> wrote:
> > >
> > > > Congratulations Yanfei!
> > > >
> > > > Best,
> > > > Rui
> > > >
> > > > On Mon, Aug 7, 2023 at 2:56 PM Yuan Mei 
> > wrote:
> > > >
> > > > > On behalf of the PMC, I'm happy to announce Yanfei Lei as a new
> Flink
> > > > > Committer.
> > > > >
> > > > > Yanfei has been active in the Flink community for almost two years
> > and
> > > > has
> > > > > played an important role in developing and maintaining State and
> > > > Checkpoint
> > > > > related features/components, including RocksDB Rescaling
> Performance
> > > > > Improvement and Generic Incremental Checkpoints.
> > > > >
> > > > > Yanfei also helps improve community infrastructure in many ways,
> > > > including
> > > > > migrating the Flink Daily performance benchmark to the Apache Flink
> > > slack
> > > > > channel. She is the maintainer of the benchmark and has improved
> its
> > > > > detection stability significantly. She is also one of the major
> > > > maintainers
> > > > > of the FrocksDB Repo and released FRocksDB 6.20.3 (part of Flink
> 1.17
> > > > > release). Yanfei is a very active community member, supporting
> users
> > > and
> > > > > participating
> > > > > in tons of discussions on the mailing lists.
> > > > >
> > > > > Please join me in congratulating Yanfei for becoming a Flink
> > Committer!
> > > > >
> > > > > Thanks,
> > > > > Yuan Mei (on behalf of the Flink PMC)
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] FLIP-316: Introduce SQL Driver

2023-05-30 Thread Biao Geng

Thanks Paul for the proposal!I believe it would be very useful for flink
users.
After reading the FLIP, I have some questions:
1. Scope: is this FLIP only targeted for non-interactive Flink SQL jobs in
Application mode? More specifically, if we use SQL client/gateway to
execute some interactive SQLs like a SELECT query, can we ask flink to use
Application mode to execute those queries after this FLIP?
2. Deployment: I believe in YARN mode, the implementation is trivial as we
can ship files via YARN's tool easily but for K8s, things can be more
complicated as Shengkai said. I have implemented a simple POC
<https://github.com/bgeng777/flink/commit/5b4338fe52ec343326927f0fc12f015dd22b1133>
based on SQL client before(i.e. consider the SQL client which supports
executing a SQL file as the SQL driver in this FLIP). One problem I have
met is how do we ship SQL files ( or Job Graph) to the k8s side. Without
such support, users have to modify the initContainer or rebuild a new K8s
image every time to fetch the SQL file. Like the flink k8s operator, one
workaround is to utilize the flink config(transforming the SQL file to a
escaped string like Weihua mentioned) which will be converted to a
ConfigMap but K8s has size limit of ConfigMaps(no larger than 1MB
<https://kubernetes.io/docs/concepts/configuration/configmap/>). Not sure
if we have better solutions.
3. Serialization of SessionState: in SessionState, there are some
unserializable fields
like org.apache.flink.table.resource.ResourceManager#userClassLoader. It
may be worthwhile to add more details about the serialization part.

Best,
Biao Geng

Paul Lam  于2023年5月31日周三 11:49写道：

> Hi Weihua,
>
> Thanks a lot for your input! Please see my comments inline.
>
> > - Is SQLRunner the better name? We use this to run a SQL Job. (Not
> strong,
> > the SQLDriver is fine for me)
>
> I’ve thought about SQL Runner but picked SQL Driver for the following
> reasons FYI:
>
> 1. I have a PythonDriver doing the same job for PyFlink [1]
> 2. Flink program's main class is sort of like Driver in JDBC which
> translates SQLs into
> databases specific languages.
>
> In general, I’m +1 for SQL Driver and +0 for SQL Runner.
>
> > - Could we run SQL jobs using SQL in strings? Otherwise, we need to
> prepare
> > a SQL file in an image for Kubernetes application mode, which may be a
> bit
> > cumbersome.
>
> Do you mean a pass the SQL string a configuration or a program argument?
>
> I thought it might be convenient for testing propose, but not recommended
> for production,
> cause Flink SQLs could be complicated and involves lots of characters that
> need to escape.
>
> WDYT?
>
> > - I noticed that we don't specify the SQLDriver jar in the
> "run-application"
> > command. Does that mean we need to perform automatic detection in Flink?
>
> Yes! It’s like running a PyFlink job with the following command:
>
> ```
> ./bin/flink run \
>   --pyModule table.word_count \
>   --pyFiles examples/python/table
> ```
>
> The CLI determines if it’s a SQL job, if yes apply the SQL Driver
> automatically.
>
>
> [1]
> https://github.com/apache/flink/blob/master/flink-python/src/main/java/org/apache/flink/client/python/PythonDriver.java
>
> Best,
> Paul Lam
>
> > 2023年5月30日 21:56，Weihua Hu  写道：
> >
> > Thanks Paul for the proposal.
> >
> > +1 for this. It is valuable in improving ease of use.
> >
> > I have a few questions.
> > - Is SQLRunner the better name? We use this to run a SQL Job. (Not
> strong,
> > the SQLDriver is fine for me)
> > - Could we run SQL jobs using SQL in strings? Otherwise, we need to
> prepare
> > a SQL file in an image for Kubernetes application mode, which may be a
> bit
> > cumbersome.
> > - I noticed that we don't specify the SQLDriver jar in the
> "run-application"
> > command. Does that mean we need to perform automatic detection in Flink?
> >
> >
> > Best,
> > Weihua
> >
> >
> > On Mon, May 29, 2023 at 7:24 PM Paul Lam  wrote:
> >
> >> Hi team,
> >>
> >> I’d like to start a discussion about FLIP-316 [1], which introduces a
> SQL
> >> driver as the
> >> default main class for Flink SQL jobs.
> >>
> >> Currently, Flink SQL could be executed out of the box either via SQL
> >> Client/Gateway
> >> or embedded in a Flink Java/Python program.
> >>
> >> However, each one has its drawback:
> >>
> >> - SQL Client/Gateway doesn’t support the application deployment mode [2]
> >> - Flink Java/Python program requires extra work to write a non-SQL
> program
> >>
> >> Theref

Re: 您好，退订

2023-05-05 Thread Biao Geng

你好，可以通过发送包含任意内容的邮件到  dev-unsubscr...@flink.apache.org 来取消dev订阅。
Best,
Biao Geng

Fang jun  于2023年5月5日周五 14:07写道：

>
>
> Sent from my iPhone
>

Re: 退订

2023-05-05 Thread Biao Geng

你好，可以通过发送包含任意内容的邮件到  dev-unsubscr...@flink.apache.org 来取消dev订阅。
Best,
Biao Geng

houyui <2008-ho...@163.com> 于2023年5月5日周五 13:33写道：

> 您好：
>退订！

[jira] [Created] (FLINK-31964) Improve the document of Autoscaler as 1.17.0 is released

2023-04-27 Thread Biao Geng (Jira)

Biao Geng created FLINK-31964:
-

 Summary: Improve the document of Autoscaler as 1.17.0 is released
 Key: FLINK-31964
 URL: https://issues.apache.org/jira/browse/FLINK-31964
 Project: Flink
  Issue Type: Improvement
  Components: Kubernetes Operator
Reporter: Biao Geng
 Attachments: image-2023-04-28-10-21-09-935.png

Since 1.17.0 is released and the official image is 
[available|https://hub.docker.com/layers/library/flink/1.17.0-scala_2.12-java8/images/sha256-a8bbef97ec3f7ce4fa6541d48dfe16261ee7f93f93b164c0e84644605f9ea0a3?context=explore],
 we can update the image link in the Autoscaler section.
 !image-2023-04-28-10-21-09-935.png! 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Re: [VOTE] FLIP-288: Enable Dynamic Partition Discovery by Default in Kafka Source

2023-04-24 Thread Biao Geng

+1 (non-binding)
Best,
Biao Geng

Martijn Visser  于2023年4月24日周一 20:20写道：

> +1 (binding)
>
> On Mon, Apr 24, 2023 at 4:10 AM Feng Jin  wrote:
>
> > +1 (non-binding)
> >
> >
> > Best,
> > Feng
> >
> > On Mon, Apr 24, 2023 at 9:55 AM Hang Ruan 
> wrote:
> >
> > > +1 (non-binding)
> > >
> > > Best,
> > > Hang
> > >
> > > Paul Lam  于2023年4月23日周日 11:58写道：
> > >
> > > > +1 (non-binding)
> > > >
> > > > Best,
> > > > Paul Lam
> > > >
> > > > > 2023年4月23日 10:57，Shammon FY  写道：
> > > > >
> > > > > +1 (non-binding)
> > > > >
> > > > > Best,
> > > > > Shammon FY
> > > > >
> > > > > On Sun, Apr 23, 2023 at 10:35 AM Qingsheng Ren  > > > <mailto:renqs...@gmail.com>> wrote:
> > > > >
> > > > >> Thanks for pushing this FLIP forward, Hongshun!
> > > > >>
> > > > >> +1 (binding)
> > > > >>
> > > > >> Best,
> > > > >> Qingsheng
> > > > >>
> > > > >> On Fri, Apr 21, 2023 at 2:52 PM Hongshun Wang <
> > > loserwang1...@gmail.com>
> > > > >> wrote:
> > > > >>
> > > > >>> Dear Flink Developers,
> > > > >>>
> > > > >>>
> > > > >>> Thank you for providing feedback on FLIP-288: Enable Dynamic
> > > Partition
> > > > >>> Discovery by Default in Kafka Source[1] on the discussion
> > thread[2].
> > > > >>>
> > > > >>> The goal of the FLIP is to enable partition discovery by default
> > and
> > > > set
> > > > >>> EARLIEST offset strategy for later discovered partitions.
> > > > >>>
> > > > >>>
> > > > >>> I am initiating a vote for this FLIP. The vote will be open for
> at
> > > > least
> > > > >> 72
> > > > >>> hours, unless there is an objection or insufficient votes.
> > > > >>>
> > > > >>>
> > > > >>> [1]: [
> > > > >>>
> > > > >>>
> > > > >>
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-288%3A+Enable+Dynamic+Partition+Discovery+by+Default+in+Kafka+Source](https://cwiki.apache.org/confluence/display/FLINK/FLIP-288%3A+Enable+Dynamic+Partition+Discovery+by+Default+in+Kafka+Source)
> <https://cwiki.apache.org/confluence/display/FLINK/FLIP-288%3A+Enable+Dynamic+Partition+Discovery+by+Default+in+Kafka+Source%5D(https://cwiki.apache.org/confluence/display/FLINK/FLIP-288%3A+Enable+Dynamic+Partition+Discovery+by+Default+in+Kafka+Source)>
> > <
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-288%3A+Enable+Dynamic+Partition+Discovery+by+Default+in+Kafka+Source%5D(https://cwiki.apache.org/confluence/display/FLINK/FLIP-288%3A+Enable+Dynamic+Partition+Discovery+by+Default+in+Kafka+Source)
> >
> > > <
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-288%3A+Enable+Dynamic+Partition+Discovery+by+Default+in+Kafka+Source%5D(https://cwiki.apache.org/confluence/display/FLINK/FLIP-288%3A+Enable+Dynamic+Partition+Discovery+by+Default+in+Kafka+Source)
> > >
> > > > <
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-288%3A+Enable+Dynamic+Partition+Discovery+by+Default+in+Kafka+Source%5D(https://cwiki.apache.org/confluence/display/FLINK/FLIP-288%3A+Enable+Dynamic+Partition+Discovery+by+Default+in+Kafka+Source)
> > > >
> > > > >> <
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-288%3A+Enable+Dynamic+Partition+Discovery+by+Default+in+Kafka+Source%5D(https://cwiki.apache.org/confluence/display/FLINK/FLIP-288%3A+Enable+Dynamic+Partition+Discovery+by+Default+in+Kafka+Source)
> > > > >
> > > > >>> <
> > > > >>
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-288%3A+Enable+Dynamic+Partition+Discovery+by+Default+in+Kafka+Source%5D(https://cwiki.apache.org/confluence/display/FLINK/FLIP-288%3A+Enable+Dynamic+Partition+Discovery+by+Default+in+Kafka+Source)
> > > > <
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-288%3A+Enable+Dynamic+Partition+Discovery+by+Default+in+Kafka+Source%5D(https://cwiki.apache.org/confluence/display/FLINK/FLIP-288%3A+Enable+Dynamic+Partition+Discovery+by+Default+in+Kafka+Source)
> > > > >
> > > > >>>
> > > > >>> [2]: [
> > > > >>>
> > > > >>>
> > > > >> https://lists.apache.org/thread/581f2xq5d1tlwc8gcr27gwkp3zp0wrg6]
> <
> > > > https://lists.apache.org/thread/581f2xq5d1tlwc8gcr27gwkp3zp0wrg6]>(
> > > > https://lists.apache.org/thread/581f2xq5d1tlwc8gcr27gwkp3zp0wrg6 <
> > > > https://lists.apache.org/thread/581f2xq5d1tlwc8gcr27gwkp3zp0wrg6>)
> > > > >>>
> > > > >>>
> > > > >>> Best regards,
> > > > >>> Hongshun
> > > >
> > > >
> > >
> >
>

Re: [ANNOUNCE] New Apache Flink Committer - Rui Fan

2023-02-20 Thread Biao Geng

Congrats, Rui!
Best,
Biao Geng

weijie guo  于2023年2月21日周二 11:21写道：

> Congrats, Rui!
>
> Best regards,
>
> Weijie
>
>
> Leonard Xu  于2023年2月21日周二 11:03写道：
>
> > Congratulations, Rui!
> >
> > Best,
> > Leonard
> >
> > > On Feb 21, 2023, at 9:50 AM, Matt Wang  wrote:
> > >
> > > Congrats Rui
> > >
> > >
> > > --
> > >
> > > Best,
> > > Matt Wang
> > >
> > >
> > >  Replied Message 
> > > | From | yuxia |
> > > | Date | 02/21/2023 09:22 |
> > > | To | dev |
> > > | Subject | Re: [ANNOUNCE] New Apache Flink Committer - Rui Fan |
> > > Congrats Rui
> > >
> > > Best regards,
> > > Yuxia
> > >
> > > - 原始邮件 -
> > > 发件人: "Samrat Deb" 
> > > 收件人: "dev" 
> > > 发送时间: 星期二, 2023年 2 月 21日 上午 1:09:25
> > > 主题: Re: [ANNOUNCE] New Apache Flink Committer - Rui Fan
> > >
> > > Congrats Rui
> > >
> > > On Mon, 20 Feb 2023 at 10:28 PM, Anton Kalashnikov  >
> > > wrote:
> > >
> > > Congrats Rui!
> > >
> > > --
> > > Best regards,
> > > Anton Kalashnikov
> > >
> > > On 20.02.23 17:53, Matthias Pohl wrote:
> > > Congratulations, Rui :)
> > >
> > > On Mon, Feb 20, 2023 at 5:10 PM Jing Ge 
> > > wrote:
> > >
> > > Congrats Rui!
> > >
> > > On Mon, Feb 20, 2023 at 3:19 PM Piotr Nowojski 
> > > wrote:
> > >
> > > Hi, everyone
> > >
> > > On behalf of the PMC, I'm very happy to announce Rui Fan as a new Flink
> > > Committer.
> > >
> > > Rui Fan has been active on a small scale since August 2019, and ramped
> > > up
> > > his contributions in the 2nd half of 2021. He was mostly involved in
> > > quite
> > > demanding performance related work around the network stack and
> > > checkpointing, like re-using TCP connections [1], and many crucial
> > > improvements to the unaligned checkpoints. Among others: FLIP-227:
> > > Support
> > > overdraft buffer [2], Merge small ChannelState file for Unaligned
> > > Checkpoint [3], Timeout aligned to unaligned checkpoint barrier in the
> > > output buffers [4].
> > >
> > > Please join me in congratulating Rui Fan for becoming a Flink
> > > Committer!
> > >
> > > Best,
> > > Piotr Nowojski (on behalf of the Flink PMC)
> > >
> > > [1] https://issues.apache.org/jira/browse/FLINK-22643
> > > [2]
> > >
> > >
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-227%3A+Support+overdraft+buffer
> > > [3] https://issues.apache.org/jira/browse/FLINK-26803
> > > [4] https://issues.apache.org/jira/browse/FLINK-27251
> > >
> > >
> >
> >
>

Re: [ANNOUNCE] New Apache Flink Committer - Lincoln Lee

2023-01-09 Thread Biao Geng

Congrats, Lincoln!
Best,
Biao Geng

获取 Outlook for iOS<https://aka.ms/o0ukef>

发件人: Wencong Liu 
发送时间: Tuesday, January 10, 2023 2:39:47 PM
收件人: dev@flink.apache.org 
主题: Re:Re: [ANNOUNCE] New Apache Flink Committer - Lincoln Lee

Congratulations, Lincoln!

Best regards,
Wencong














在 2023-01-10 13:25:09，"Yanfei Lei"  写道：
>Congratulations, well deserved!
>
>Best,
>Yanfei
>
>Yuan Mei  于2023年1月10日周二 13:16写道：
>
>> Congratulations, Lincoln!
>>
>> Best,
>> Yuan
>>
>> On Tue, Jan 10, 2023 at 12:23 PM Lijie Wang 
>> wrote:
>>
>> > Congratulations, Lincoln!
>> >
>> > Best,
>> > Lijie
>> >
>> > Jingsong Li  于2023年1月10日周二 12:07写道：
>> >
>> > > Congratulations, Lincoln!
>> > >
>> > > Best,
>> > > Jingsong
>> > >
>> > > On Tue, Jan 10, 2023 at 11:56 AM Leonard Xu  wrote:
>> > > >
>> > > > Congratulations, Lincoln!
>> > > >
>> > > > Impressive work in streaming semantics, well deserved!
>> > > >
>> > > >
>> > > > Best,
>> > > > Leonard
>> > > >
>> > > >
>> > > > > On Jan 10, 2023, at 11:52 AM, Jark Wu  wrote:
>> > > > >
>> > > > > Hi everyone,
>> > > > >
>> > > > > On behalf of the PMC, I'm very happy to announce Lincoln Lee as a
>> new
>> > > Flink
>> > > > > committer.
>> > > > >
>> > > > > Lincoln Lee has been a long-term Flink contributor since 2017. He
>> > > mainly
>> > > > > works on Flink
>> > > > > SQL parts and drives several important FLIPs, e.g., FLIP-232 (Retry
>> > > Async
>> > > > > I/O), FLIP-234 (
>> > > > > Retryable Lookup Join), FLIP-260 (TableFunction Finish). Besides,
>> He
>> > > also
>> > > > > contributed
>> > > > > much to Streaming Semantics, including the non-determinism problem
>> > and
>> > > the
>> > > > > message
>> > > > > ordering problem.
>> > > > >
>> > > > > Please join me in congratulating Lincoln for becoming a Flink
>> > > committer!
>> > > > >
>> > > > > Cheers,
>> > > > > Jark Wu
>> > > >
>> > >
>> >
>>

Re: [ANNOUNCE] Apache Flink Kubernetes Operator 1.3.0 released

2022-12-16 Thread Biao Geng

Congratulations!
Thanks a lot for the awesome work!

Best regards,
Biao Geng


Őrhidi Mátyás  于2022年12月15日周四 01:30写道：

> The Apache Flink community is very happy to announce the release of Apache
> Flink Kubernetes Operator 1.3.0.
>
> Release highlights:
>
>- Upgrade to Fabric8 6.x.x and JOSDK 4.x.x
>- Restart unhealthy Flink clusters
>- Contribute the Flink Kubernetes Operator to OperatorHub
>- Publish flink-kubernetes-operator-api module separately
>
> Please check out the release blog post for an overview of the release:
> https://flink.apache.org/news/2022/12/14/release-kubernetes-operator-1.3.0.html
>
> The release is available for download at:
> https://flink.apache.org/downloads.html
>
> Maven artifacts for Flink Kubernetes Operator can be found at:
> https://search.maven.org/artifact/org.apache.flink/flink-kubernetes-operator
>
> Official Docker image for Flink Kubernetes Operator applications can be
> found at: https://hub.docker.com/r/apache/flink-kubernetes-operator
>
> The full release notes are available in Jira:
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522=12352322
>
> We would like to thank all contributors of the Apache Flink community who
> made this release possible!
>
> Regards,
>
> Matyas Orhidi
>

Re: [DISCUSS] FLIP-271: Autoscaling

2022-11-05 Thread Biao Geng

Hi Max,

Thanks a lot for the FLIP. It is an extremely attractive feature!

Just some follow up questions/thoughts after reading the FLIP:
In the doc, the discussion of  the strategy of “scaling out” is thorough and 
convincing to me but it seems that “scaling down” is less discussed. I have 2 
cents for this aspect:

  1.  For source parallelisms, if the user configure a much larger value than 
normal, there should be very little pending records though it is possible to 
get optimized. But IIUC, in current algorithm, we will not take actions for 
this case as the backlog growth rate is almost zero. Is the understanding right?
  2.  Compared with “scaling out”, “scaling in” is usually more dangerous as it 
is more likely to lead to negative influence to the downstream jobs. The 
min/max load bounds should be useful. I am wondering if it is possible to have 
different strategy for “scaling in” to make it more conservative. Or more 
eagerly, allow custom autoscaling strategy(e.g. time-based strategy).

Another side thought is that to recover a job from checkpoint/savepoint, the 
new parallelism cannot be larger than max parallelism defined in the 
checkpoint(see 
this<https://github.com/apache/flink/blob/17a782c202c93343b8884cb52f4562f9c4ba593f/flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/Checkpoints.java#L128>).
 Not sure if this limit should be mentioned in the FLIP.

Again, thanks for the great work and looking forward to using flink k8s 
operator with it!

Best,
Biao Geng

From: Maximilian Michels 
Date: Saturday, November 5, 2022 at 2:37 AM
To: dev 
Cc: Gyula Fóra , Thomas Weise , Marton 
Balassi , Őrhidi Mátyás 
Subject: [DISCUSS] FLIP-271: Autoscaling
Hi,

I would like to kick off the discussion on implementing autoscaling for
Flink as part of the Flink Kubernetes operator. I've outlined an approach
here which I find promising:
https://cwiki.apache.org/confluence/display/FLINK/FLIP-271%3A+Autoscaling

I've been discussing this approach with some of the operator contributors:
Gyula, Marton, Matyas, and Thomas (all in CC). We started prototyping an
implementation based on the current FLIP design. If that goes well, we
would like to contribute this to Flink based on the results of the
discussion here.

I'm curious to hear your thoughts.

-Max

Re: [VOTE] Apache Flink Kubernetes Operator Release 1.2.0, release candidate #2

2022-10-06 Thread Biao Geng

+1(non-binding)
Thanks a lot for the great work.

Successfully verified the following:
- Checksums and gpg signatures of the tar files.
- No binaries in source release
- Build from source, build image from source without errors
- Helm Repo works, Helm install works
- Run HA/python example in application mode
- Check licenses in source code

Best,
Biao Geng


Maximilian Michels  于2022年10月6日周四 20:16写道：

> Turns out the issue with the Helm installation was that I was using
> cert-manager 1.9.1 instead of the recommended version 1.8.2. The operator
> now deploys cleanly in my local environment.
>
> On Thu, Oct 6, 2022 at 12:34 PM Maximilian Michels  wrote:
>
> > +1 (binding) because the source release looks good.
> >
> > I've verified the following:
> >
> > 1. Downloaded, compiled, and verified the signature of the source release
> > staged at
> >
> https://dist.apache.org/repos/dist/dev/flink/flink-kubernetes-operator-1.2.0-rc2/
> > 2. Verified licenses (Not a blocker: the LICENSE file does not contain a
> > link to the bundled licenses directory, this should be fixed in future
> > releases)
> > 3. Verified the Helm Chart
> >
> > The Helm chart installation resulted in the following pod events:
> >
> > Events:
> >   Type Reason   Age   From   Message
> >    --        ---
> >   Normal   Scheduled5m42s default-scheduler
> >  Successfully assigned default/flink-kubernetes-operator-54fcd9df98-645rf
> > to docker-desktop
> >   Warning  FailedMount  3m39s kubeletUnable
> to
> > attach or mount volumes: unmounted volumes=[keystore], unattached
> > volumes=[kube-api-access-pdnzw keystore flink-operator-config-volume]:
> > timed out waiting for the condition
> >   Warning  FailedMount  92s (x10 over 5m42s)  kubelet
> >  MountVolume.SetUp failed for volume "keystore" : secret
> > "webhook-server-cert" not found
> >   Warning  FailedMount  84s   kubeletUnable
> to
> > attach or mount volumes: unmounted volumes=[keystore], unattached
> > volumes=[flink-operator-config-volume kube-api-access-pdnzw keystore]:
> > timed out waiting for the condition
> >
> > Do we need to list any additional steps in the docs?
> >
> https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main/docs/try-flink-kubernetes-operator/quick-start/
> >
> > -Max
> >
> > On Wed, Oct 5, 2022 at 2:16 PM Őrhidi Mátyás 
> > wrote:
> >
> >> +1 (non-binding)
> >>
> >> - Verified source distributions (except the licenses and maven
> artifacts)
> >> - Verified Helm chart and Docker image
> >> - Verified basic examples
> >>
> >> Everything seems okay to me.
> >>
> >> Cheers,
> >> Matyas
> >>
> >> On Tue, Oct 4, 2022 at 10:27 PM Gyula Fóra 
> wrote:
> >>
> >> > +1 (binding)
> >> >
> >> > - Verified Helm repo works as expected, points to correct image tag,
> >> build,
> >> > version
> >> > - Verified examples + checked operator logs everything looks as
> expected
> >> > - Verified hashes, signatures and source release contains no binaries
> >> > - Ran built-in tests, built jars + docker image from source
> successfully
> >> >
> >> > Cheers,
> >> > Gyula
> >> >
> >> > On Sat, Oct 1, 2022 at 2:27 AM Jim Busche  wrote:
> >> >
> >> > > +1 (not-binding)
> >> > >
> >> > > Thank you Gyula,
> >> > >
> >> > >
> >> > > Helm install from flink-kubernetes-operator-1.2.0-helm.tgz looks
> good,
> >> > > logs look normal
> >> > >
> >> > > podman Dockerfile build from source looks good.
> >> > >
> >> > > twistlock security scans of the proposed image look good:
> >> > > ghcr.io/apache/flink-kubernetes-operator:95128bf
> >> > >
> >> > > UI and basic sample look good.
> >> > >
> >> > > Checksums looked good.
> >> > >
> >> > > Tested on OpenShift 4.10.25.  Will try additional versions (4.8 and
> >> 4.11)
> >> > > if I get an opportunity, but I don't expect issues.
> >> > >
> >> > >
> >> > >
> >> > > Thank you,
> >> > >
> >> > > James Busche
> >> > >
> >> >
> >>
> >
>

[jira] [Created] (FLINK-29529) Update flink version in flink-python-example of flink k8s operator

2022-10-06 Thread Biao Geng (Jira)

Biao Geng created FLINK-29529:
-

 Summary: Update flink version in flink-python-example of flink k8s 
operator
 Key: FLINK-29529
 URL: https://issues.apache.org/jira/browse/FLINK-29529
 Project: Flink
  Issue Type: Improvement
  Components: Kubernetes Operator
Reporter: Biao Geng


Currently, we hardcoded the version of both flink image and pyflink pip package 
as 1.15.0 in the example's Dockerfile. It is not the best practice as the flink 
has new 1.15.x releases.
We had better do following improvements:
{{FROM flink:1.15.0 -> FROM flink:1.15}}
{{RUN pip3 install apache-flink==1.15.0 -> RUN pip install 
"apache-flink>=1.15.0,<1.16.0"}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Created] (FLINK-29362) Allow loading dynamic config for kerberos authentication in CliFrontend

2022-09-20 Thread Biao Geng (Jira)

Biao Geng created FLINK-29362:
-

 Summary: Allow loading dynamic config for kerberos authentication 
in CliFrontend
 Key: FLINK-29362
 URL: https://issues.apache.org/jira/browse/FLINK-29362
 Project: Flink
  Issue Type: Improvement
  Components: Command Line Client
Reporter: Biao Geng


In the 
[code|https://github.com/apache/flink/blob/97f5a45cd035fbae37a7468c6f771451ddb4a0a4/flink-clients/src/main/java/org/apache/flink/client/cli/CliFrontend.java#L1167],
 Flink's client will try to {{SecurityUtils.install(new 
SecurityConfiguration(cli.configuration));}} with configs(e.g. 
{{security.kerberos.login.principal}} and {{security.kerberos.login.keytab}}) 
from only flink-conf.yaml.
If users specify the above 2 config via -D option, it will not work as 
{{cli.parseAndRun(args)}} will be executed after installing security configs 
from flink-conf.yaml.
However, if a user specify principal A in client's flink-conf.yaml and use -D 
option to specify principal B, the launched YARN container will use principal B 
though the job is submitted in client end with principal A.

Such behavior can be misleading as Flink provides 2 ways to set a config but 
does not keep consistency between client and cluster. It also influence users 
who want use flink with kerberos as they must modify flink-conf.yaml if they 
want to use another kerberos user.




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Created] (FLINK-28769) Flink History Server show wrong name of batch jobs

2022-08-01 Thread Biao Geng (Jira)

Biao Geng created FLINK-28769:
-

 Summary: Flink History Server show wrong name of batch jobs
 Key: FLINK-28769
 URL: https://issues.apache.org/jira/browse/FLINK-28769
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Web Frontend
Reporter: Biao Geng
 Attachments: image-2022-08-02-00-41-51-815.png

When running {{examples/batch/WordCount.jar}} using flink1.15 and 1.16 together 
with history server started, the history server shows default name(e.g. Flink 
Java Job at Tue Aug 02.. ) of the batch job instead of the name( "WordCount 
Example" ) specified in the java code.
But for {{examples/streaming/WordCount.jar}}, the job name in history server is 
correct.

It looks like that 
{{org.apache.flink.api.java.ExecutionEnvironment#executeAsync(java.lang.String)}}
 does not set job name as what 
{{org.apache.flink.streaming.api.environment.StreamExecutionEnvironment#execute(java.lang.String)}}
 does(e.g. streamGraph.setJobName(jobName); ).


!image-2022-08-02-00-41-51-815.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Created] (FLINK-28503) Fix invalid link in Python FAQ Document

2022-07-12 Thread Biao Geng (Jira)

Biao Geng created FLINK-28503:
-

 Summary: Fix invalid link in Python FAQ Document
 Key: FLINK-28503
 URL: https://issues.apache.org/jira/browse/FLINK-28503
 Project: Flink
  Issue Type: Improvement
  Components: Documentation, Project Website
Reporter: Biao Geng
 Attachments: image-2022-07-12-14-51-01-434.png

[https://nightlies.apache.org/flink/flink-docs-master/docs/dev/python/faq/#preparing-python-virtual-environment]

The script for setting pyflink virtual environment is invalid now. The 
candidate is 
[https://nightlies.apache.org/flink/flink-docs-release-1.12/downloads/setup-pyflink-virtual-env.sh]
 or we can add this short script in the doc website directly.

!image-2022-07-12-14-51-01-434.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Created] (FLINK-28495) Fix typos or mistakes of Flink CEP Document in the official website

2022-07-11 Thread Biao Geng (Jira)

Biao Geng created FLINK-28495:
-

 Summary: Fix typos or mistakes of Flink CEP Document in the 
official website
 Key: FLINK-28495
 URL: https://issues.apache.org/jira/browse/FLINK-28495
 Project: Flink
  Issue Type: Improvement
Reporter: Biao Geng


"Will generate the following matches for an input sequence: C D A1 A2 A3 D A4 
B. with combinations enabled: {C A1 B}, {C A1 A2 B}, {C A1 A3 B}, {C A1 A4 B}, 
{C A1 A2 A3 B}, {C A1 A2 A4 B}, {C A1 A3 A4 B}, {C A1 A2 A3 A4 B}" -> "Will 
generate the following matches for an input sequence: C D A1 A2 A3 D A4 B. with 
combinations enabled: {C A1 B}, {C A1 A2 B}, {C A1 A3 B}, {C A1 A4 B}, {C A1 A2 
A3 B}, {C A1 A2 A4 B}, {C A1 A3 A4 B}, {C A1 A2 A3 A4 B}, {C A2 B}, {C A2 A3 
B}, {C A2 A4 B}, {C A2 A3 A4 B}, {C A3 B}, {C A3 A4 B}, {C A4 B}"
"For SKIP_TO_FIRST/LAST there are two options how to handle cases when there 
are no elements mapped to the specified variable." -> "For SKIP_TO_FIRST/LAST 
there are two options how to handle cases when there are no events mapped to 
the *PatternName*."



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Created] (FLINK-28025) Document the change of serviceAccount in upgrading doc of k8s opeator

2022-06-13 Thread Biao Geng (Jira)

Biao Geng created FLINK-28025:
-

 Summary: Document the change of serviceAccount in upgrading doc of 
k8s opeator
 Key: FLINK-28025
 URL: https://issues.apache.org/jira/browse/FLINK-28025
 Project: Flink
  Issue Type: Improvement
  Components: Kubernetes Operator
Reporter: Biao Geng


Since 1.0.0, we require users to specify serviceAccount and add corresponding 
validation.

According to the experience of [~czchen] , we had better document such change 
in the upgrading doc.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

[jira] [Created] (FLINK-27894) Build flink-connector-hive failed using Maven@3.8.5

2022-06-03 Thread Biao Geng (Jira)

Biao Geng created FLINK-27894:
-

 Summary: Build flink-connector-hive failed using Maven@3.8.5
 Key: FLINK-27894
 URL: https://issues.apache.org/jira/browse/FLINK-27894
 Project: Flink
  Issue Type: Improvement
Reporter: Biao Geng


When I tried to build flink project locally with Java8 and Maven3.8.5, I met 
such error:


{code:java}
[ERROR] Failed to execute goal on project flink-connector-hive_2.12: Could not 
resolve dependencies for project 
org.apache.flink:flink-connector-hive_2.12:jar:1.16-SNAPSHOT: Failed to collect 
dependencies at org.apache.hive:hive-exec:jar:2.3.9 -> 
org.pentaho:pentaho-aggdesigner-algorithm:jar:5.1.5-jhyde: Failed to read 
artifact descriptor for 
org.pentaho:pentaho-aggdesigner-algorithm:jar:5.1.5-jhyde: Could not transfer 
artifact org.pentaho:pentaho-aggdesigner-algorithm:pom:5.1.5-jhyde from/to 
maven-default-http-blocker (http://0.0.0.0/): Blocked mirror for repositories: 
[repository.jboss.org 
(http://repository.jboss.org/nexus/content/groups/public/, default, disabled), 
conjars (http://conjars.org/repo, default, releases+snapshots), 
apache.snapshots (http://repository.apache.org/snapshots, default, snapshots)] 
-> [Help 1]

{code}

After some investigation, the reason may be that Maven 3.8.1 disables support 
for repositories using "http" protocol. Due to [NIFI-8398], one possible 
solution is adding 
{code:xml}



conjars
https://conjars.org/repo


{code}
 in the pom.xml of flink-connector-hive module.




--
This message was sent by Atlassian Jira
(v8.20.7#820007)

Re: [VOTE] Apache Flink Kubernetes Operator Release 1.0.0, release candidate #4

2022-06-01 Thread Biao Geng

Hi there,
+1 (non-binding)


Successfully verified the following:
- Checksums and gpg signatures of the tar files.
- No binaries in source release
- Build from source, build image from source
- Helm Repo works, Helm install works, docker image matches release commit
tag
- Submit example applications without errors
- Check licenses in the docs dir in source code
- Submit ~50 streaming jobs using the same k8s operator deployment and the
whole cluster works fine

Best,
Biao Geng

Márton Balassi  于2022年6月1日周三 22:12写道：

> Hi team,
>
> +1 (binding)
>
> Verified the following:
>
> - NOTICE file looks good :-)
> - Signatures, Hashes
> - No binaries in source release
> - Helm Repo works, Helm install works, docker image matches release commit
> tag
> - Build from source, build container from source
> - Submit example application, session and session job deployments without
> errors
>
> Best,
> Marton
>
> On Wed, Jun 1, 2022 at 1:31 PM Gyula Fóra  wrote:
>
> > Hi Yang!
> >
> > +1 (binding)
> >
> > Thank you for fixing the NOTICE file issue
> >
> > Successfully verified the following:
> > - Signatures, Hashes
> > - No binaries in source release
> > - Helm Repo works, Helm install works, docker image matches release
> commit
> > tag
> > - Build from source, build container from source
> > - Submit example application, session and session job deployments without
> > errors
> > - Reviewed website PR
> >
> > Cheers,
> > Gyula
> >
> > Thank you!
> > Gyula
> >
> > On Wed, Jun 1, 2022 at 12:16 PM Yang Wang  wrote:
> >
> > > Hi everyone,
> > >
> > > Please review and vote on the release candidate #4 for the version
> 1.0.0
> > of
> > > Apache Flink Kubernetes Operator,
> > > as follows:
> > > [ ] +1, Approve the release
> > > [ ] -1, Do not approve the release (please provide specific comments)
> > >
> > > **Release Overview**
> > >
> > > As an overview, the release consists of the following:
> > > a) Kubernetes Operator canonical source distribution (including the
> > > Dockerfile), to be deployed to the release repository at
> dist.apache.org
> > > b) Kubernetes Operator Helm Chart to be deployed to the release
> > repository
> > > at dist.apache.org
> > > c) Maven artifacts to be deployed to the Maven Central Repository
> > > d) Docker image to be pushed to dockerhub
> > >
> > > **Staging Areas to Review**
> > >
> > > The staging areas containing the above mentioned artifacts are as
> > follows,
> > > for your review:
> > > * All artifacts for a,b) can be found in the corresponding dev
> repository
> > > at dist.apache.org [1]
> > > * All artifacts for c) can be found at the Apache Nexus Repository [2]
> > > * The docker image for d) is staged on github [7]
> > >
> > > All artifacts are signed with the key
> > > 2FF2977BBBFFDF283C6FE7C6A301006F3591EE2C [3]
> > >
> > > Other links for your review:
> > > * JIRA release notes [4]
> > > * source code tag "release-1.0.0-rc4" [5]
> > > * PR to update the website Downloads page to include Kubernetes
> Operator
> > > links [6]
> > >
> > > **Vote Duration**
> > >
> > > Since there's no functional changes from release candidate #3, the
> voting
> > > time will run for 48 hours.
> > > It is adopted by majority approval, with at least 3 PMC affirmative
> > votes.
> > >
> > > **Note on Verification**
> > >
> > > You can follow the basic verification guide here[8].
> > > Note that you don't need to verify everything yourself, but please make
> > > note of what you have tested together with your +- vote.
> > >
> > > Thanks,
> > > Yang
> > >
> > > [1]
> > >
> > >
> >
> https://dist.apache.org/repos/dist/dev/flink/flink-kubernetes-operator-1.0.0-rc4/
> > > [2]
> > >
> https://repository.apache.org/content/repositories/orgapacheflink-1506/
> > > [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> > > [4]
> > >
> > >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522=12351500
> > > [5]
> > >
> >
> https://github.com/apache/flink-kubernetes-operator/tree/release-1.0.0-rc4
> > > [6] https://github.com/apache/flink-web/pull/542
> > > [7] ghcr.io/apache/flink-kubernetes-operator:fa2cd14
> > > [8]
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/Verifying+a+Flink+Kubernetes+Operator+Release
> > >
> >
>

Re: [VOTE] Apache Flink Kubernetes Operator Release 1.0.0, release candidate #1

2022-05-24 Thread Biao Geng

Hi Yang,
Thanks for the work!
I successfully verified these items:
1. Verify that the checksums and GPG files are intact
2. Verify that the source distributions do not contain any binaries
3. Build the source distribution to ensure all source files
4. Validate the Maven artifacts do not contain any external dependency with jar
tf
5. Verify Helm chart can be installed without overriding any parameters
using helm repo add operator-1.0.0-rc1
https://dist.apache.org/repos/dist/dev/flink/flink-kubernetes-operator-1.0.0-rc1/
&& helm install flink-kubernetes-operator
operator-1.0.0-rc1/flink-kubernetes-operator
6. Verify the job using  examples/basic-checkpoint-ha.yaml and check that
if we manually updated the configmap `flink-operator-config`, the following
submitted job will take the new default configuration
7. Verified operator logs does not contain unexpected things

I have not found new issues besides what Gyula has mentioned. I will follow
our jira list to validate other new features or improvements.

Best,
Biao Geng


Yang Wang  于2022年5月24日周二 16:08写道：

> Thanks Gyula for sharing the feedback.
>
> I have created two tickets[1][2] to fix the problems and will integrate
> them in the next release candidate.
>
> I am not closing this vote and waiting for other verification comments.
>
> [1]. https://issues.apache.org/jira/browse/FLINK-27746
> [2]. https://issues.apache.org/jira/browse/FLINK-27747
>
> Best,
> Yang
>
> Gyula Fóra  于2022年5月24日周二 01:03写道：
>
> > Hi Yang!
> >
> > Thank you for preparing the RC.
> >
> > I have successfully verified the following:
> > - Signatures, Hashes
> > - No binaries in source release
> > - Helm Repo works, Helm install works, docker image matches release
> commit
> > tag
> > - Build from source
> > - Submit example job without errors
> >
> > Some problems that I have found:
> >  - In the Helm chart release the Chart.yaml file doesn't have an apache
> > license header (the same file in the source release has it)
> >  - I could not build the Docker image from the source release, getting
> the
> > following error:
> >
> >
> > > [build 11/14] COPY .git ./.git:
> >
> > --
> >
> > failed to compute cache key: "/.git" not found: not found
> >
> >
> > I will continue with further functional / manual verification.
> >
> >
> > Cheers,
> >
> > Gyula
> >
> > On Mon, May 23, 2022 at 5:58 AM Yang Wang  wrote:
> >
> > > Hi everyone,
> > >
> > > Please review and vote on the release candidate #1 for the version
> 1.0.0
> > of
> > > Apache Flink Kubernetes Operator,
> > > as follows:
> > > [ ] +1, Approve the release
> > > [ ] -1, Do not approve the release (please provide specific comments)
> > >
> > > **Release Overview**
> > >
> > > As an overview, the release consists of the following:
> > > a) Kubernetes Operator canonical source distribution (including the
> > > Dockerfile), to be deployed to the release repository at
> dist.apache.org
> > > b) Kubernetes Operator Helm Chart to be deployed to the release
> > repository
> > > at dist.apache.org
> > > c) Maven artifacts to be deployed to the Maven Central Repository
> > > d) Docker image to be pushed to dockerhub
> > >
> > > **Staging Areas to Review**
> > >
> > > The staging areas containing the above mentioned artifacts are as
> > follows,
> > > for your review:
> > > * All artifacts for a,b) can be found in the corresponding dev
> repository
> > > at dist.apache.org [1]
> > > * All artifacts for c) can be found at the Apache Nexus Repository [2]
> > > * The docker image for d) is staged on github [7]
> > >
> > > All artifacts are signed with the key
> > > 2FF2977BBBFFDF283C6FE7C6A301006F3591EE2C [3]
> > >
> > > Other links for your review:
> > > * JIRA release notes [4]
> > > * source code tag "release-1.0.0-rc1" [5]
> > > * PR to update the website Downloads page to include Kubernetes
> Operator
> > > links [6]
> > >
> > > **Vote Duration**
> > >
> > > The voting time will run for at least 72 hours.
> > > It is adopted by majority approval, with at least 3 PMC affirmative
> > votes.
> > >
> > > **Note on Verification**
> > >
> > > You can follow the basic verification guide here[8].
> > > Note that you don't need to verify everything yourself, but please make
> > > note of what you have tested toge

Re: Job Logs - Yarn Application Mode

2022-05-19 Thread Biao Geng

Hi there,
@Zain, Weihua's suggestion should be able to fulfill the request to check
JM logs. If you do want to use YARN cli for running Flink applications, it
is possible to check JM's log with the YARN command like:
*yarn logs -applicationId application_xxx_yyy -am -1 -logFiles
jobmanager.log*
For TM log, command would be like:
* yarn logs -applicationId  -containerId   -logFiles
taskmanager.log*
Note, it is not super easy to find the container id of TM. Some workaround
would be to check JM's log first and get the container id for TM from that.
You can also learn more about the details of above commands from *yarn logs
-help*

@Shengkai, yes, you are right the actual JM address is managed by YARN. To
access the JM launched by YARN, users need to access YARN web ui to find
the YARN application by applicationId and then click 'application master
url' of that application to be redirected to Flink web ui.

Best,
Biao Geng

Shengkai Fang  于2022年5月20日周五 10:59写道：

> Hi.
>
> I am not familiar with the YARN application mode. Because the job manager
> is started when submit the jobs. So how can users know the address of the
> JM? Do we need to look up the Yarn UI to search the submitted job with the
> JobID?
>
> Best,
> Shengkai
>
> Weihua Hu  于2022年5月20日周五 10:23写道：
>
>> Hi,
>> You can get the logs from Flink Web UI if job is running.
>> Best,
>> Weihua
>>
>> 2022年5月19日 下午10:56，Zain Haider Nemati  写道：
>>
>> Hey All,
>> How can I check logs for my job when it is running in application mode
>> via yarn
>>
>>
>>

[jira] [Created] (FLINK-27615) Document the minimum supported version of k8s for flink k8s operator

2022-05-14 Thread Biao Geng (Jira)

Biao Geng created FLINK-27615:
-

 Summary: Document the minimum supported version of k8s for flink 
k8s operator
 Key: FLINK-27615
 URL: https://issues.apache.org/jira/browse/FLINK-27615
 Project: Flink
  Issue Type: Improvement
Reporter: Biao Geng


In our webhook, to support {{{}watchNamespaces{}}}, we rely on the 
{{kubernetes.io/metadata.name}} to filter the validation requests. However, 
this label will be automatically added to a namespace only since k8s 1.21.1 due 
to k8s' [release 
notes|[https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.21.md#v1211]
 .  If users run the flink k8s operator on older k8s versions, they have add 
such label by themselevs to support the feature of namespaceSelector in our 
webhook.

As a result, if we want to support the feature defaultly, we may need to 
emphasize that users should use k8s >= v1.21.1  to run the flink k8s operator.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

[jira] [Created] (FLINK-27329) Add default value of replica of JM pod and remove declaring it in example yamls

2022-04-20 Thread Biao Geng (Jira)

Biao Geng created FLINK-27329:
-

 Summary: Add default value of replica of JM pod and remove 
declaring it in example yamls
 Key: FLINK-27329
 URL: https://issues.apache.org/jira/browse/FLINK-27329
 Project: Flink
  Issue Type: Improvement
  Components: Kubernetes Operator
Reporter: Biao Geng


Currently, we do not explicitly set the default value of `replica` in 
`JobManagerSpec`. As a result, Java sets the default value to be zero. 
Besides, in our examples, we explicitly declare `replica` in `JobManagerSpec` 
to be 1. 
After a deeper look when debugging the exception thrown in FLINK-27310, we find 
it would be better to set the default value to 1 for `replica` fields and 
remove the declaration in examples due to following reasons:
1. A normal Session or Application cluster should have at least one JM. The 
current default value, zero, does not follow the common case.
2. One JM can work for k8s HA mode as well and if users really want to launch a 
standby JM for faster recorvery, they can declare the `replica` field in the 
yaml file. In examples, we just use the new default valu(i.e. 1) should be fine.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

[jira] [Created] (FLINK-27309) Allow to load default flink configs in the k8s operator dynamically

2022-04-19 Thread Biao Geng (Jira)

Biao Geng created FLINK-27309:
-

 Summary: Allow to load default flink configs in the k8s operator 
dynamically
 Key: FLINK-27309
 URL: https://issues.apache.org/jira/browse/FLINK-27309
 Project: Flink
  Issue Type: Improvement
  Components: Kubernetes Operator
Reporter: Biao Geng


Current default configs used by the k8s operator will be saved in the 
/opt/flink/conf dir in the k8s operator pod and will be loaded only once when 
the operator is created.
Since the flink k8s operator could be a long running service and users may want 
to modify the default configs(e.g the metric reporter sampling interval) for 
newly created deployments, it may better to load the default configs 
dynamically(i.e. parsing the latest /opt/flink/conf/flink-conf.yaml) in the 
{{ReconcilerFactory}} and {{ObserverFactory}}, instead of redeploying the 
operator.




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Created] (FLINK-27109) The naming pattern of ClusterRole in Flink K8s operator should consider namespace

2022-04-07 Thread Biao Geng (Jira)

Biao Geng created FLINK-27109:
-

 Summary: The naming pattern of ClusterRole in Flink K8s operator 
should consider namespace
 Key: FLINK-27109
 URL: https://issues.apache.org/jira/browse/FLINK-27109
 Project: Flink
  Issue Type: Improvement
  Components: Kubernetes Operator
Reporter: Biao Geng


As the 
[doc|https://kubernetes.io/docs/reference/access-authn-authz/rbac/#clusterrole-example]
 of k8s said, ClusterRole is one kind of non-namespaced resource. 
In our helm chart, we now define the ClusterRole with name 'flink-operator' and 
the namespace field in metadata will be omitted. As a result, if a user wants 
to install multiple flink-kubernetes-operator in different namespace, the 
ClusterRole 'flink-operator' will be created multiple times. 
Errors like
{quote}Error: INSTALLATION FAILED: rendered manifests contain a resource that 
already exists. Unable to continue with install: ClusterRole "flink-operator" 
in namespace "" exists and cannot be imported into the current release: invalid 
ownership metadata; annotation validation error: key 
"meta.helm.sh/release-namespace" must equal "c-8725bcef1dc84d6f": current value 
is "default"
{quote}
will be thrown.

One solution could be adding the namespace as a postfix in the name of 
ClusterRole.
Another possible solution is to add if else check to avoid creating existed 
resource.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Created] (FLINK-27009) Support SQL job submission in flink kubernetes opeartor

2022-04-02 Thread Biao Geng (Jira)

Biao Geng created FLINK-27009:
-

 Summary: Support SQL job submission in flink kubernetes opeartor
 Key: FLINK-27009
 URL: https://issues.apache.org/jira/browse/FLINK-27009
 Project: Flink
  Issue Type: New Feature
  Components: Kubernetes Operator
Reporter: Biao Geng


Currently, the flink kubernetes opeartor is for jar job using application or 
session cluster. For SQL job, there is no out of box solution in the operator.  
One simple and short-term solution is to wrap the SQL script into a jar job 
using table API with limitation.
The long-term solution may work with 
[FLINK-26541|https://issues.apache.org/jira/browse/FLINK-26541] to achieve the 
full support.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

Re: [VOTE] Apache Flink Kubernetes Operator Release 0.1.0, release candidate #3

2022-03-31 Thread Biao Geng

+1 (non-binding)

1. Verify that the checksums and GPG files
2. Verify that the source distributions do not contain any binaries
3. Build the source distribution to ensure all source files
4. Verify that POM files and the README.md file do not have anything
unexpected.
6. Verify that chart and appVersion matches the target release
7. Verify Helm chart can be installed without overriding any parameters
using helm install flink-kubernetes-operator
flink-kubernetes-operator-0.1.0-rc3/flink-kubernetes-operator
8. Verify docker image building
9. Verify the job using  examples/basic-checkpoint-ha.yaml and check
updated job config(e.g. parallelism) can take effect when using last-state
mode
10. Verified operator logs does not contain unexpected things

For some of the included examples:

Submit yaml
Submit job / spec upgrade
Validate operator logs
Validate deployment status
For stateful examples

Verify stateful upgrade uses the correct checkpoint/savepoint
Trigger manual savepoint
Gyula Fóra  于2022年3月31日周四 18:18写道：

> +1 (binding)
>
> Verified the following:
>
>- Verified source distribution do not contain binaries
>- Verified LICENSES and NOTICES
>- Built from source, verified maven and helm chart versions
>- Verified helm chart points to correct docker image and deploys it by
>default
>- Verified maven artifacts correctly deployed to staging repository and
>contents
>- Verified staging repo as helm repo, helm installation and
>basic/checkpointing examples with upgrades and manual savepoints
>- Verified operator logs and cr status does not contain unexpected
>things
>
> Cheers,
> Gyula
>
> On Thu, Mar 31, 2022 at 10:54 AM Yang Wang  wrote:
>
> > +1 (non-binding)
> >
> > Verified via the following steps:
> >
> > * Verify checksums and GPG signatures
> > * Verify that the source distributions do not contain any binaries
> > * Build source distribution successfully
> > * Verify all the POM version is 0.1.0
> >
> > * License check, the jars bundled in docker image and maven artifacts
> have
> > correct NOTICE and licenses
> >
> > # Functionality verification
> > * Install flink-kubernetes-operator via helm
> > - helm repo add flink-kubernetes-operator-0.1.0-rc3
> >
> >
> https://dist.apache.org/repos/dist/dev/flink/flink-kubernetes-operator-0.1.0-rc3
> > - helm install flink-kubernetes-operator
> > flink-kubernetes-operator-0.1.0-rc3/flink-kubernetes-operator
> >
> > * Apply a new FlinkDeployment CR with HA and ingress enabled, Flink webUI
> > normal
> > - kubectl apply -f
> >
> >
> https://raw.githubusercontent.com/apache/flink-kubernetes-operator/release-0.1/e2e-tests/data/cr.yaml
> >
> > * Upgrade FlinkDeployment, new job parallelism takes effect and recover
> > from latest checkpoint
> > - kubectl patch flinkdep flink-example-statemachine --type merge
> > --patch '{"spec":{"job": {"parallelism": 1 } } }'
> >
> > * Verify manual savepoint trigger
> > - kubectl patch flinkdep flink-example-statemachine --type merge
> > --patch '{"spec":{"job": {"savepointTriggerNonce": 1 } } }'
> >
> > * Suspend a FlinkDeployment
> > - kubectl patch flinkdep flink-example-statemachine --type merge
> > --patch '{"spec":{"job": {"state": "suspended" } } }'
> >
> >
> > Best,
> > Yang
> >
> > Márton Balassi  于2022年3月31日周四 01:01写道：
> >
> > > +1 (binding)
> > >
> > > Verified the following:
> > >
> > >- shasums
> > >- gpg signatures
> > >- source does not contain any binaries
> > >- built from source
> > >- deployed via helm after adding the distribution webserver endpoint
> > as
> > >a helm registry
> > >- all relevant files have license headers
> > >
> > >
> > > On Wed, Mar 30, 2022 at 4:39 PM Gyula Fóra  wrote:
> > >
> > > > Hi everyone,
> > > >
> > > > Please review and vote on the release candidate #3 for the version
> > 0.1.0
> > > of
> > > > Apache Flink Kubernetes Operator,
> > > > as follows:
> > > > [ ] +1, Approve the release
> > > > [ ] -1, Do not approve the release (please provide specific comments)
> > > >
> > > > **Release Overview**
> > > >
> > > > As an overview, the release consists of the following:
> > > > a) Kubernetes Operator canonical source distribution (including the
> > > > Dockerfile), to be deployed to the release repository at
> > dist.apache.org
> > > > b) Kubernetes Operator Helm Chart to be deployed to the release
> > > repository
> > > > at dist.apache.org
> > > > c) Maven artifacts to be deployed to the Maven Central Repository
> > > > d) Docker image to be pushed to dockerhub
> > > >
> > > > **Staging Areas to Review**
> > > >
> > > > The staging areas containing the above mentioned artifacts are as
> > > follows,
> > > > for your review:
> > > > * All artifacts for a,b) can be found in the corresponding dev
> > repository
> > > > at dist.apache.org [1]
> > > > * All artifacts for c) can be found at the Apache Nexus Repository
> [2]
> > > > * The docker image is staged on github [7]
> > > >
> > > > All

Re: [VOTE] Apache Flink Kubernetes Operator Release 0.1.0, release candidate #2

2022-03-30 Thread Biao Geng

Thanks Marton,
I vote for consistency and `flink-kubernetes-operator` as well.
`flink-operator` have been used widely in docker hub and other k8s operator
implementations, which may confuse users.

Best,
Biao

Gyula Fóra  于2022年3月30日周三 16:26写道：

> Thanks Marton!
> I think we should aim for consistency and easy discoverability. Since we
> use the name `flink-kubernetes-operator` for the project and everywhere on
> the website to refer to it, I suggest adopting it as the name also.
>
> This should be a very simple change that we can easily do before the next
> rc if there are no objections.
>
> Gyula
>
> On Wed, Mar 30, 2022 at 10:10 AM Márton Balassi 
> wrote:
>
> > Hi team,
> >
> > I would like to ask for your input in naming the operator docker image
> and
> > helm chart. [1]
> >
> > For the sake of brevity when we started the Kubernetes Operator work we
> > named the docker image and the helm chart simply flink-operator, while
> the
> > git repository is named flink-kubernetes-operator. [2]
> > Now closing in on our preview release it makes sense to reconsider this,
> it
> > might be preferred to name all artifacts flink-kubernetes-operator for
> the
> > sake of consistency.
> > Currently docker images of our builds are available in the GitHub
> Registry
> > tagged with the short git commit hash and the last build of select
> branches
> > is tagged with the branch name:
> >
> > ghcr.io/apache/flink-operator:439bd41ghcr.io/apache/flink-operator:main
> >
> > During the release process we plan to move the docker image to dockerhub
> > following the process established for Flink.
> > Currently the helm operator for the release candidate can be installed as
> > follows:
> >
> > helm repo add operator-rc2
> >
> >
> https://dist.apache.org/repos/dist/dev/flink/flink-kubernetes-operator-0.1.0-rc2
> > helm install flink-operator operator-rc2/flink-operator
> >
> > So the helm chart itself is called flink-operator, but to follow the name
> > of the project it is packaged into
> flink-kubernetes-operator-.tgz.
> > Do you prefer flink-operator for brevity or flink-kubernetes-operator for
> > consistency? If we vote for changing the current setup we could get the
> > changes in for the next RC.
> >
> > [1] https://issues.apache.org/jira/browse/FLINK-26924
> > [2] https://github.com/apache/flink-kubernetes-operator
> > [3]
> >
> >
> https://cwiki.apache.org/confluence/display/FLINK/Creating+a+Flink+Release#CreatingaFlinkRelease-PublishtheDockerfilesforthenewrelease
> >
> > On Wed, Mar 30, 2022 at 9:17 AM Gyula Fóra  wrote:
> >
> > > Hi Devs,
> > >
> > > I am cancelling the current vote as we have promoted
> > > https://issues.apache.org/jira/browse/FLINK-26916 as a blocker issue.
> > >
> > > Yang has found a fix that we can reasonably do in a short time frame,
> so
> > we
> > > will work on that today before creating the next RC (hopefully later
> > today)
> > >
> > > Thank you!
> > > Gyula
> > >
> > > On Tue, Mar 29, 2022 at 7:51 PM Gyula Fóra 
> wrote:
> > >
> > > > Hi Devs,
> > > >
> > > > I just want to give you heads up regarding a significant issue Matyas
> > > > found around the last-state upgrade mode:
> > > > https://issues.apache.org/jira/browse/FLINK-26916
> > > >
> > > > Long story short, the last-state mode does not actually upgrade the
> > job,
> > > > only the job/taskmanagers. The fix is non-trivial to say the least
> and
> > > > requires changes to the Flink Kubernetes HA implementations to be
> > > feasible.
> > > > Please see the jira ticket for details.
> > > >
> > > > I discussed this offline with Thomas Weise and we agreed to clearly
> > > > highlight this limitation with warnings in the documentation and the
> > > > release blogpost but not block the release on it as the fix will most
> > > > likely require Flink 1.15 if we can include the required changes
> there.
> > > >
> > > > Please continue testing the current RC :)
> > > >
> > > > Thank you all!
> > > > Gyula
> > > >
> > > >
> > > > On Tue, Mar 29, 2022 at 5:17 PM Chenya Zhang 
> > wrote:
> > > >
> > > >> +1 non-binding, thanks for the great efforts and look forward to the
> > > >> release!
> > > >>
> > > >> Chenya
> > > >>
> > > >> On Tue, Mar 29, 2022 at 2:10 AM Gyula Fóra 
> wrote:
> > > >>
> > > >> > Hi everyone,
> > > >> >
> > > >> > Please review and vote on the release candidate #2 for the version
> > > >> 0.1.0 of
> > > >> > Apache Flink Kubernetes Operator,
> > > >> > as follows:
> > > >> > [ ] +1, Approve the release
> > > >> > [ ] -1, Do not approve the release (please provide specific
> > comments)
> > > >> >
> > > >> > **Release Overview**
> > > >> >
> > > >> > As an overview, the release consists of the following:
> > > >> > a) Kubernetes Operator canonical source distribution (including
> the
> > > >> > Dockerfile), to be deployed to the release repository at
> > > >> dist.apache.org
> > > >> > b) Kubernetes Operator Helm Chart to be deployed to the release
> > > >> repository
> > > >> > at dist.apache.org
> > > >> > c)

[jira] [Created] (FLINK-26832) Output more status info for JobObserver

2022-03-23 Thread Biao Geng (Jira)

Biao Geng created FLINK-26832:
-

 Summary: Output more status info for JobObserver
 Key: FLINK-26832
 URL: https://issues.apache.org/jira/browse/FLINK-26832
 Project: Flink
  Issue Type: Sub-task
Reporter: Biao Geng


For {{JobObserver#observeFlinkJobStatus()}}, we currently only 
{{logger.info("Job status successfully updated");}}.
This is could be more informative if we output actual job status here to help 
users check the status of the Job due to flink operator's log, not only 
depending on the flink web ui.

The proposed change looks like:
{{logger.info("Job status successfully updated from {} to {}", currentState, 
targetState);}}.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Created] (FLINK-26825) Document State Machine Graph for JM Deployment

2022-03-23 Thread Biao Geng (Jira)

Biao Geng created FLINK-26825:
-

 Summary: Document State Machine Graph for JM Deployment
 Key: FLINK-26825
 URL: https://issues.apache.org/jira/browse/FLINK-26825
 Project: Flink
  Issue Type: Sub-task
Reporter: Biao Geng


 We may need a state machine graph for JM deployment or the whole cluster like 
[lyft’s|https://github.com/lyft/flinkk8soperator/blob/master/docs/state_machine.md]
 in our document to help our developers refine our code later and our users 
learn about how the oeprator reconciles.

The state machine is highly probably to be changed when we introduce more 
features(e.g. rollback strategy) and we should update the doc when such change 
happens.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

Re: [VOTE]FLIP-215: Introduce FlinkSessionJob CRD in the kubernetes operator

2022-03-18 Thread Biao Geng

+1(non-binding)
Thanks for the work!

Best,
Biao Geng

Yang Wang  于2022年3月18日周五 19:01写道：

> +1(binding)
>
> Thanks for your contribution.
>
> Best,
> Yang
>
> Gyula Fóra  于2022年3月18日周五 18:44写道：
>
> > I think this is a simple and valuable addition that will also be a
> building
> > block for other important future features.
> >
> > +1
> >
> > Gyula
> >
> > On Fri, Mar 18, 2022 at 10:30 AM Aitozi  wrote:
> >
> > > Hi community:
> > > I'd like to start a vote on FLIP-215: Introduce FlinkSessionJob CRD
> > in
> > > the kubernetes operator [1] which has been discussed in the thread [2].
> > >
> > > The vote will be open for at least 72 hours unless there is an
> objection
> > or
> > > not enough votes.
> > >
> > > [1]:
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-215%3A+Introduce+FlinkSessionJob+CRD+in+the+kubernetes+operator
> > > [2]: https://lists.apache.org/thread/fpp5m9jkr0wnjryd07xtpj13t80z99yt
> > >
> > > Best,
> > > Aitozi.
> > >
> >
>

[jira] [Created] (FLINK-26671) Update 'Developer Guide' in README

2022-03-16 Thread Biao Geng (Jira)

Biao Geng created FLINK-26671:
-

 Summary: Update 'Developer Guide' in README
 Key: FLINK-26671
 URL: https://issues.apache.org/jira/browse/FLINK-26671
 Project: Flink
  Issue Type: Sub-task
  Components: Kubernetes Operator
Reporter: Biao Geng
 Attachments: image-2022-03-16-14-05-56-936.png

The shell command in README is now based on the root dir of the repo. In 
Developer Guide,   we should update the {{helm install ...}}  to {{helm install 
flink-operator helm/flink-operator --set 
image.repository=/flink-java-operator --set image.tag=latest}}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Created] (FLINK-26655) Merge isJobManagerPortReady and isJobManagerServing to check if JM pod can work correctly

2022-03-15 Thread Biao Geng (Jira)

Biao Geng created FLINK-26655:
-

 Summary: Merge isJobManagerPortReady and isJobManagerServing to 
check if JM pod can work correctly
 Key: FLINK-26655
 URL: https://issues.apache.org/jira/browse/FLINK-26655
 Project: Flink
  Issue Type: Sub-task
Reporter: Biao Geng


We now consider that JM pod can have 2 possible states after launching:
 # JM is launched but port is not ready.
 # JM is launched, port is ready but rest service is not ready.

It looks that they can be merged as what we really care is if the JM can serve 
REST calls correctly, not if the JM port is ready.
With above observation, we can merge {{isJobManagerPortReady}} and 
{{isJobManagerServing}} to check if JM pod can serve correctly.




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

Re: [DISCUSS] Preview release for Flink Kubernetes Operator

2022-03-13 Thread Biao Geng

Hi there,

It is exciting to see the discussion of the release timeline! I agree that
the end of March is a proper date.
To make others easier get involved in this discussion, I think we may need
to provide a more straightforward feature list for the preview release. The
"Initial Feature Set" in FLIP-212
<https://cwiki.apache.org/confluence/display/FLINK/FLIP-212:+Introduce+Flink+Kubernetes+Operator>
is
almost complete. But some new features like webhook based validate and
flink operator metric are not added and they are only tracked in the long
JIRA list. If we can update the FLIP, it may be more convenient and can
also help us write release notes later. I also created a draft
<https://github.com/bgeng777/flink-kubernetes-operator/blob/features/doc/features.md>
for myself to track completed or in-plan features. Hope it can help.

Best,
Biao Geng

Gyula Fóra  于2022年3月14日周一 04:11写道：

> @Konstantin: Yes I completely agree that for this release the API (CRD)
> should be marked experimental!
> I have opened a ticket to track this:
> https://issues.apache.org/jira/browse/FLINK-26620
>
> @Yang Wang  : I think we still have plenty of time
> to work on features like the session job before the release, would be nice
> to provide a complete story to the users.
>
> Gyula
>
> On Sun, Mar 13, 2022 at 5:17 PM Konstantin Knauf 
> wrote:
>
> > Hi everyone,
> >
> > can we mark all the APIs as experimental/alpha so that it is clear that
> > these can be broken in future releases for now? I think this would be
> very
> > important given the early stage of the project. We want to be able to
> > address shortcomings without worrying too much about backwards
> > compatibility at this stage, I believe.
> >
> > Cheers,
> >
> > Konstantin
> >
> > On Sun, Mar 13, 2022 at 7:48 AM Yang Wang  wrote:
> >
> > > Thanks Gyula for starting this discussion.
> > >
> > > Given that the core functionality is closing to stable, I am in favor
> of
> > > having the MVP release at the end of March.
> > > The first release will help us to collect more feedbacks from the
> users.
> > > Also it is a good chance to let the users know that the community is
> > trying
> > > to maintain an official Kubernetes operator :)
> > > I hope that the companies could build their own production streaming
> > > platform on top of the flink-kubernetes-operator in the future.
> > >
> > > FYI: @Wenjun Min is still working hard on supporting the Session Job in
> > > Flink Kubernetes operator, It will be great if we could include it in
> the
> > > first release.
> > > And I believe we have enough time.
> > >
> > > Moreover, I agree with you that we need to invest more time in the
> > > documentation, e2e tests, helm install optimization, logging,
> > > etc. before the release.
> > >
> > >
> > > Best,
> > > Yang
> > >
> > >
> > > Gyula Fóra  于2022年3月12日周六 01:10写道：
> > >
> > > > Hi Team!
> > > >
> > > > I would like to discuss the timeline for the initial
> preview/milestone
> > > > release of the flink-kubernetes-operator
> > > > <https://github.com/apache/flink-kubernetes-operator> project.
> > > >
> > > > The last few weeks we have been working very hard with the community
> to
> > > > stabilize the initial feature set and I think we have made great
> > > progress.
> > > > While we are still far from a production ready-state, a preview
> release
> > > > will give us the opportunity to reach more people and gather much
> > needed
> > > > input to take this project to the next level.
> > > >
> > > > There are still a couple missing features that we need to iron out
> and
> > we
> > > > need to make sure we have proper documentation but after that I think
> > it
> > > > would be a good time for the preview release.
> > > >
> > > > I propose to aim for the first release candidate around the 25-27th
> of
> > > > March after which we should dedicate a few days for some extensive
> > > testing
> > > > and bugfixing.
> > > >
> > > > What do you think?
> > > >
> > > > Gyula
> > > >
> > >
> >
> >
> > --
> >
> > Konstantin Knauf
> >
> > https://twitter.com/snntrable
> >
> > https://github.com/knaufk
> >
>

[jira] [Created] (FLINK-26405) Add validation check of num of JM replica

2022-02-28 Thread Biao Geng (Jira)

Biao Geng created FLINK-26405:
-

 Summary: Add validation check of num of JM replica
 Key: FLINK-26405
 URL: https://issues.apache.org/jira/browse/FLINK-26405
 Project: Flink
  Issue Type: Improvement
  Components: Kubernetes Operator
Reporter: Biao Geng


When HA is enabled, the replicas of JM can be 1 or 2, 

When HA is not set, the replicas of JM should always be 1.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

Re: [DISCUSS] Flink Kubernetes Operator controller flow

2022-02-28 Thread Biao Geng

Hi Gyula,
Thanks for the discussion. It also makes senses to me on the separation of
3 components and Yang's proposal.
Just 1 follow-up thought after checking your PR: in the reconcile()
method of controller, IIUC, the real flow could be
`validate->observe->reconcile->validate->observe->reconcile...". The
validation phase seems to be required only when the creation of the job
cluster and the upgrade of config. For phases like waiting the JM from
deploying to ready, it is not mandatory and thus the flow can look like
`validate->observe->reconcile->optional validate due to current
state->observe->reconcile...`

Őrhidi Mátyás  于2022年2月28日周一 21:26写道：

> It is worth looking at the controller code in the spotify operator too:
>
> https://github.com/spotify/flink-on-k8s-operator/blob/master/controllers/flinkcluster/flinkcluster_controller.go
>
> It is looping in the 'observer phase' until it reaches a stable state, then
> it performs the necessary changes.
>
> Based on this I also suggest keeping the logic in separate
> modules(Validate->Observe->Reconcile). The three components might not
> even be enough as we add more and more complexity to the code.
>
> Cheers,
> Matyas
>
>
> On Mon, Feb 28, 2022 at 2:03 PM Aitozi  wrote:
>
> > Hi, Gyula
> >   Thanks for driving this discussion. I second Yang Wang's idea that
> > it's better to make the `validator`, `observer` and `reconciler`
> > self-contained. I also prefer to define the `Observer` as an interface
> and
> > we could define the statuses that `Observer` will expose. It acts like
> the
> > observer protocol between the `Observer` and `Reconciler`.
> >
> > Best,
> > Aitozi.
> >
> > Yang Wang  于2022年2月28日周一 16:28写道：
> >
> > > Thanks for posting the discussion here.
> > >
> > >
> > > Having the components `validator` `observer` `reconciler` makes lots of
> > > sense. And the "Validate -> Observe -> Reconcile"
> > > flow seems natural to me.
> > >
> > > Regarding the implementation in the PR, instead of directly using the
> > > observer in the reconciler, I lean to let the observer
> > > exports the results to the status(e.g. jobmanager deployment status,
> rest
> > > port readiness, flink jobs status, etc.) and
> > > the reconciler reads it from the status. Then each component is more
> > > self-contained and the boundary will be clearer.
> > >
> > >
> > > Best,
> > > Yang
> > >
> > > Gyula Fóra  于2022年2月28日周一 16:01写道：
> > >
> > > > Hi All!
> > > >
> > > > I would like to start a discussion thread regarding the structure of
> > > > the Kubernetes
> > > > Operator
> > > > <
> > > >
> > >
> >
> https://github.com/apache/flink-kubernetes-operator/blob/main/flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/controller/FlinkDeploymentController.java
> > > > >
> > > > controller
> > > > flow. Based on some recent PR discussions we have no clear consensus
> on
> > > the
> > > > structure and the expectations which can potentially lead to back and
> > > forth
> > > > changes and unnecessary complexity.
> > > >
> > > > *Background*
> > > > In the initial prototype we had a very basic flow:
> > > >  1. Observe flink job status
> > > >  2. (if observation successful) reconcile changes
> > > >  3. Reschedule reconcile with success/error
> > > >
> > > > This basic prototype flow could not cover all requirements and did
> not
> > > > allow for things like waiting until Jobmanager deployment is ready
> etc.
> > > >
> > > > To solve these shortcomings, some changes were introduced recently
> here
> > > > . While
> > > this
> > > > change introduced many improvements and safeguards it also completely
> > > > changed the original controller flow. Now the reconciler is
> responsible
> > > for
> > > > ensuring that it can actually reconcile by checking the deployment
> and
> > > > ports. The job status observation logic has also been moved into the
> > > actual
> > > > reconcile logic.
> > > >
> > > >
> > > > *Discussion Question*What controller flow would we like to have? Do
> we
> > > want
> > > > to separate the observer from the reconciler or keep them together?
> > > >
> > > > In my personal view, we should try to adopt a very simple flow to
> make
> > > the
> > > > operator clean and modular. If possible I would like to restore the
> > > > original flow with some modifications:
> > > >
> > > >  1. Validate deployment object
> > > >  2. Observe deployment and flink job status -> Return comprehensive
> > > status
> > > > info
> > > >  3. Reconcile deployment based on observed status and resource
> changes
> > > >  (Both 2/3 should be able to reschedule immediately if necessary)
> > > >
> > > > I think the Observer component should be able to describe the current
> > > > status of the deployment objects and the flink job to the extent that
> > the
> > > > reconciler can work with that information alone. If we do it this
> way,
> > we
> > > > can also use the status information that the

[jira] [Created] (FLINK-26216) Make 'replicas' work in JobManager Spec

2022-02-17 Thread Biao Geng (Jira)

Biao Geng created FLINK-26216:
-

 Summary: Make 'replicas' work in JobManager Spec
 Key: FLINK-26216
 URL: https://issues.apache.org/jira/browse/FLINK-26216
 Project: Flink
  Issue Type: Bug
  Components: Kubernetes Operator
Reporter: Biao Geng


In our flink kubernetes operator's cr, we allow users to set the replica of 
JobManager. 
But in our {{FlinkUtils#getEffectiveConfig}} method, we currently not set this 
value from the yaml file and as a result, the {{replicas}} will not work and 
the default value(i.e. 1) will be applied. 
Though we believe one JM with KubernetesHaService should be enough for most HA 
cases, the {{replicas}} field of JM also makes sense since more than one JM can 
reduce down time and make recovery of JM failure faster. 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

Re: Azure Pipelines are dealing with an incident, causing pipeline runs to fail

2022-02-10 Thread Biao Geng

Just notice that there are some recent pipelines that have finished
successfully[1]. It seems that this issue is mitigated.

[1]
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=31194=results

Martijn Visser  于2022年2月10日周四 15:48写道：

> It looks like the issues haven't been resolved yet unfortunately.
>
> On Wed, 9 Feb 2022 at 13:15, Robert Metzger  wrote:
>
> > I filed a support request with Microsoft:
> >
> >
> https://developercommunity.visualstudio.com/t/Number-of-Microsoft-hosted-agents-droppe/1658827?from=email=21=newest
> >
> > On Wed, Feb 9, 2022 at 1:04 PM Martijn Visser 
> > wrote:
> >
> > > Unfortunately it looks like there are still failures. Will keep you
> > posted
> > >
> > > On Wed, 9 Feb 2022 at 11:51, Martijn Visser 
> > wrote:
> > >
> > > > Hi everyone,
> > > >
> > > > The issue should now be resolved.
> > > >
> > > > Best regards,
> > > >
> > > > Martijn
> > > >
> > > > On Wed, 9 Feb 2022 at 10:55, Martijn Visser 
> > > wrote:
> > > >
> > > >> Hi everyone,
> > > >>
> > > >> Please keep in mind that Azure Pipelines currently is dealing with
> an
> > > >> incident [1] which causes all CI pipeline runs on Azure to fail.
> When
> > > the
> > > >> incident has been resolved, it will be required to retrigger your
> > > pipeline
> > > >> to see if the pipeline then passes.
> > > >>
> > > >> Best regards,
> > > >>
> > > >> Martijn Visser
> > > >> https://twitter.com/MartijnVisser82
> > > >>
> > > >> [1] https://status.dev.azure.com/_event/287959626
> > > >>
> > > >
> > >
> >
>

[jira] [Created] (FLINK-26047) Support usrlib in HDFS for YARN application mode

2022-02-09 Thread Biao Geng (Jira)

Biao Geng created FLINK-26047:
-

 Summary: Support usrlib in HDFS for YARN application mode
 Key: FLINK-26047
 URL: https://issues.apache.org/jira/browse/FLINK-26047
 Project: Flink
  Issue Type: Improvement
  Components: Deployment / YARN
Reporter: Biao Geng


In YARN Application mode, we currently support using user jar and lib jar from 
HDFS. For example, we can run commands like:
{quote}./bin/flink run-application -t yarn-application \
-Dyarn.provided.lib.dirs="hdfs://myhdfs/my-remote-flink-dist-dir" \
hdfs://myhdfs/jars/my-application.jar{quote}
For {{usrlib}}, we currently only support local directory. I propose to add 
HDFS support for {{usrlib}} to work with CLASSPATH_INCLUDE_USER_JAR better. It 
can also benefit cases like using notebook to submit flink job.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Created] (FLINK-26030) Set FLINK_LIB_DIR to lib under working dir in YARN containers

2022-02-08 Thread Biao Geng (Jira)

Biao Geng created FLINK-26030:
-

 Summary: Set FLINK_LIB_DIR to lib under working dir in YARN 
containers
 Key: FLINK-26030
 URL: https://issues.apache.org/jira/browse/FLINK-26030
 Project: Flink
  Issue Type: Improvement
Reporter: Biao Geng






--
This message was sent by Atlassian Jira
(v8.20.1#820001)

Re: [VOTE] FLIP-212: Introduce Flink Kubernetes Operator

2022-02-06 Thread Biao Geng

+1 (non-binding)

Best,
Biao Geng

Peter Huang  于2022年2月7日周一 14:31写道：

> +1 (non-binding)
>
>
> Best Regards
> Peter Huang
>
> On Sun, Feb 6, 2022 at 7:35 PM Yang Wang  wrote:
>
> > +1 (binding)
> >
> > Best,
> > Yang
> >
> > Xintong Song  于2022年2月7日周一 10:25写道：
> >
> > > +1 (binding)
> > >
> > > Thank you~
> > >
> > > Xintong Song
> > >
> > >
> > >
> > > On Mon, Feb 7, 2022 at 12:52 AM Márton Balassi <
> balassi.mar...@gmail.com
> > >
> > > wrote:
> > >
> > > > +1 (binding)
> > > >
> > > > On Sat, Feb 5, 2022 at 5:35 PM Israel Ekpo 
> > wrote:
> > > >
> > > > > I am very excited to see this.
> > > > >
> > > > > Thanks for driving the effort
> > > > >
> > > > > +1 (non-binding)
> > > > >
> > > > >
> > > > > On Sat, Feb 5, 2022 at 10:53 AM Shqiprim Bunjaku <
> > > > > shqiprimbunj...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > +1 (non-binding)
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Sat, Feb 5, 2022 at 4:39 PM Chenya Zhang <
> > > > chenyazhangche...@gmail.com
> > > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > +1 (non-binding)
> > > > > > >
> > > > > > > Thanks folks for leading this effort and making it happen so
> > fast!
> > > > > > >
> > > > > > > Best,
> > > > > > > Chenya
> > > > > > >
> > > > > > > On Sat, Feb 5, 2022 at 12:02 AM Gyula Fóra 
> > > > wrote:
> > > > > > >
> > > > > > > > Hi Thomas!
> > > > > > > >
> > > > > > > > +1 (binding) from my side
> > > > > > > >
> > > > > > > > Happy to see this effort getting some traction!
> > > > > > > >
> > > > > > > > Cheers,
> > > > > > > > Gyula
> > > > > > > >
> > > > > > > > On Sat, Feb 5, 2022 at 3:00 AM Thomas Weise 
> > > > wrote:
> > > > > > > >
> > > > > > > > > Hi everyone,
> > > > > > > > >
> > > > > > > > > I'd like to start a vote on FLIP-212: Introduce Flink
> > > Kubernetes
> > > > > > > > > Operator [1] which has been discussed in [2].
> > > > > > > > >
> > > > > > > > > The vote will be open for at least 72 hours unless there is
> > an
> > > > > > > > > objection or not enough votes.
> > > > > > > > >
> > > > > > > > > [1]
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-212%3A+Introduce+Flink+Kubernetes+Operator
> > > > > > > > > [2]
> > > > > https://lists.apache.org/thread/1z78t6rf70h45v7fbd2m93rm2y1bvh0z
> > > > > > > > >
> > > > > > > > > Thanks!
> > > > > > > > > Thomas
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > --
> > > > > Israel Ekpo
> > > > > Lead Instructor, IzzyAcademy.com
> > > > > https://www.youtube.com/c/izzyacademy
> > > > > https://izzyacademy.com/
> > > > >
> > > >
> > >
> >
>

Re: Looking for Maintainers for Flink on YARN

2022-01-26 Thread Biao Geng

Hi Konstantin,
I am willing to be one maintainer of Flink on YARN. I have some relevant
experience in maintaining Flink on YARN clusters in Alibaba and I hope I
can  make contributions to some of the cases in your list.

Best,
Biao Geng

Konstantin Knauf 于2022年1月26日 周三17:17写道：

> Hi everyone,
>
> We are seeing an increasing number of test instabilities related to YARN
> [1]. Does someone in this group have the time to pick these up? The Flink
> Confluence contains a guide on how to triage test instability tickets.
>
> Thanks,
>
> Konstantin
>
> [1]
>
> https://issues.apache.org/jira/browse/FLINK-25514?jql=project%20%3D%20FLINK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20component%20%3D%20%22Deployment%20%2F%20YARN%22%20AND%20labels%20%3D%20test-stability
> [2]
>
> https://cwiki.apache.org/confluence/display/FLINK/Triage+Test+Instability+Tickets
>
> On Mon, Sep 13, 2021 at 2:22 PM 柳尘  wrote:
>
> > Thanks to Konstantin for raising this question, and to Marton and Gabor
> > To strengthen!
> >
> >  If i can help
> > In order to better participate in the work, please let me know.
> >
> > the best,
> > cheng xingyuan
> >
> >
> > > 2021年7月29日 下午4:15，Konstantin Knauf  写道：
> > >
> > > Dear community,
> > >
> > > We are looking for community members, who would like to maintain
> Flink's
> > > YARN support going forward. So far, this has been handled by teams at
> > > Ververica & Alibaba. The focus of these teams has shifted over the past
> > > months so that we only have little time left for this topic. Still, we
> > > think, it is important to maintain high quality support for Flink on
> > YARN.
> > >
> > > What does "Maintaining Flink on YARN" mean? There are no known bigger
> > > efforts outstanding. We are mainly talking about addressing
> > > "test-stability" issues, bugs, version upgrades, community
> contributions
> > &
> > > smaller feature requests. The prioritization of these would be up to
> the
> > > future maintainers, except "test-stability" issues which are important
> to
> > > address for overall productivity.
> > >
> > > If a group of community members forms itself, we are happy to give an
> > > introduction to relevant pieces of the code base, principles,
> > assumptions,
> > > ... and hand over open threads.
> > >
> > > If you would like to take on this responsibility or can join this
> effort
> > in
> > > a supporting role, please reach out!
> > >
> > > Cheers,
> > >
> > > Konstantin
> > > for the Deployment & Coordination Team at Ververica
> > >
> > > --
> > >
> > > Konstantin Knauf
> > >
> > > https://twitter.com/snntrable
> > >
> > > https://github.com/knaufk
> >
> >
>
> --
>
> Konstantin Knauf | Head of Product
>
> +49 160 91394525
>
>
> Follow us @VervericaData Ververica <https://www.ververica.com/>
>
>
> --
>
> Join Flink Forward
> <https://www.google.com/maps/search/Forward++-+The?entry=gmail=g><
> https://flink-forward.org/> - The
> <https://www.google.com/maps/search/Forward++-+The?entry=gmail=g>
> Apache Flink
> Conference
>
> Stream Processing | Event Driven | Real Time
>
> --
>
> Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany
>
> --
> Ververica GmbH
> Registered at Amtsgericht Charlottenburg: HRB 158244 B
> Managing Directors: Karl Anton Wehner, Holger Temme, Yip Park Tung Jason,
> Jinwei (Kevin) Zhang
>

Re: [DISCUSS] FLIP-212: Introduce Flink Kubernetes Operator

2022-01-26 Thread Biao Geng

Hi Thomas,
Thanks a lot for the great efforts in this well-organized FLIP! After
reading the FLIP carefully, I think Yang has given some great feedback and
I just want to share some of my concerns:
# 1 Flink Native vs Standalone integration
I believe it is reasonable to support both modes in the long run but in the
FLIP and previous thread[1], it seems that we have not made a decision on
which one to implement initially. The FLIP mentioned "Maybe start with
support for Flink Native" for reusing codes in [2]. Is it the selected one
finally?
# 2 K8S StatefulSet v.s. K8S Deployment
In the CR Example, I notice that the kind we use is FlinkDeployment. I
would like to check if we have made the decision to use K8S Deployment
workload resource. As the name implies, StatefulSet is for stateful apps
while Deployment is usually for stateless apps. I think it is worthwhile to
consider the choice more carefully due to some user case in gcp
operator[3], which may influence our other design choices(like the Flink
application deletion strategy).

Again, thanks for the work and I believe this FLIP is pretty useful for
many customers and I hope I can make some contributions to this FLIP impl!

Best regard,
Biao Geng

[1] https://lists.apache.org/thread/l1dkp8v4bhlcyb4tdts99g7w4wdglfy4
[2] https://github.com/wangyang0918/flink-native-k8s-operator
[3] https://github.com/GoogleCloudPlatform/flink-on-k8s-operator/pull/354

Yang Wang  于2022年1月26日周三 15:25写道：

> Thanks Thomas for creating FLIP-212 to introduce the Flink Kubernetes
> Operator.
>
> The proposal looks already very good to me and has integrated all the input
> in the previous discussion(e.g. native K8s VS standalone, Go VS java).
>
> I read the FLIP carefully and have some questions that need to be
> clarified.
>
> # How do we run a Flink job from a CR?
> 1. Start a session cluster and then followed by submitting the Flink job
> via rest API
> 2. Start a Flink application cluster which bundles one or more Flink jobs
> It is not clear enough to me which way we will choose. It seems that the
> existing google/lyft K8s operator is using #1. But I lean to #2 in the new
> introduced K8s operator.
> If #2 is the case, how could we get the job status when it finished or
> failed? Maybe FLINK-24113[1] and FLINK-25715[2] could help. Or we may need
> to enable the Flink history server[3].
>
>
> # ApplicationDeployer Interface or "flink run-application" /
> "kubernetes-session.sh"
> How do we start the Flink application or session cluster?
> It will be great if we have the public and stable interfaces for deployment
> in Flink. But currently we only have an internal interface
> *ApplicationDeployer* to deploy the application cluster and
> no interfaces for deploying session cluster.
> Of cause, we could also use the CLI command for submission. However, it
> will have poor performance when launching multiple applications.
>
>
> # Pod Template
> Is the pod template in CR same with what Flink has already supported[4]?
> Then I am afraid not the arbitrary field(e.g. cpu/memory resources) could
> take effect.
>
>
> [1]. https://issues.apache.org/jira/browse/FLINK-24113
> [2]. https://issues.apache.org/jira/browse/FLINK-25715
> [3].
>
> https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/deployment/advanced/historyserver/
> [4].
>
> https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/deployment/resource-providers/native_kubernetes/#pod-template
>
>
>
> Best,
> Yang
>
>
> Thomas Weise  于2022年1月25日周二 13:08写道：
>
> > Hi,
> >
> > As promised in [1] we would like to start the discussion on the
> > addition of a Kubernetes operator to the Flink project as FLIP-212:
> >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-212%3A+Introduce+Flink+Kubernetes+Operator
> >
> > Please note that the FLIP is currently focussed on the overall
> > direction; the intention is to fill in more details once we converge
> > on the high level plan.
> >
> > Thanks and looking forward to a lively discussion!
> >
> > Thomas
> >
> > [1] https://lists.apache.org/thread/l1dkp8v4bhlcyb4tdts99g7w4wdglfy4
> >
>

Re: [DISCUSS] Future of Per-Job Mode

2022-01-13 Thread Biao Geng

Hi Konstantin,

Thanks a lot for starting this discussion! I hope my thoughts and
experiences why users use Per-Job Mode, especially in YARN can help:
#1. Per-job mode makes managing dependencies easier: I have met some
customers who used Per-Job Mode to submit jobs with a lot of local
user-defined jars by using '-C' option directly. They do not need to upload
these jars to some remote file system(e.g. HDFS) first, which makes their
life easier.
#2. In YARN mode, currently, there are some limitations of Application Mode:
in this jira(https://issues.apache.org/jira/browse/FLINK-24897) that I am
working on, we find that YARN Application Mode do not support `usrlib` very
well, which makes it hard to use FlinkUserCodeClassLoader to load classes
in user-defined jars.

I believe above 2 points, especially #2, can be reassured as we enhance the
YARN Application Mode later but I think it is worthwhile to consider
dependency management more carefully before we make decisions.

Best,
Biao Geng


Konstantin Knauf  于2022年1月13日周四 16:32写道：

> Hi everyone,
>
> I would like to discuss and understand if the benefits of having Per-Job
> Mode in Apache Flink outweigh its drawbacks.
>
>
> *# Background: Flink's Deployment Modes*
> Flink currently has three deployment modes. They differ in the following
> dimensions:
> * main() method executed on Jobmanager or Client
> * dependencies shipped by client or bundled with all nodes
> * number of jobs per cluster & relationship between job and cluster
> lifecycle* (supported resource providers)
>
> ## Application Mode
> * main() method executed on Jobmanager
> * dependencies already need to be available on all nodes
> * dedicated cluster for all jobs executed from the same main()-method
> (Note: applications with more than one job, currently still significant
> limitations like missing high-availability). Technically, a session cluster
> dedicated to all jobs submitted from the same main() method.
> * supported by standalone, native kubernetes, YARN
>
> ## Session Mode
> * main() method executed in client
> * dependencies are distributed from and by the client to all nodes
> * cluster is shared by multiple jobs submitted from different clients,
> independent lifecycle
> * supported by standalone, Native Kubernetes, YARN
>
> ## Per-Job Mode
> * main() method executed in client
> * dependencies are distributed from and by the client to all nodes
> * dedicated cluster for a single job
> * supported by YARN only
>
>
> *# Reasons to Keep** There are use cases where you might need the
> combination of a single job per cluster, but main() method execution in the
> client. This combination is only supported by per-job mode.
> * It currently exists. Existing users will need to migrate to either
> session or application mode.
>
>
> *# Reasons to Drop** With Per-Job Mode and Application Mode we have two
> modes that for most users probably do the same thing. Specifically, for
> those users that don't care where the main() method is executed and want to
> submit a single job per cluster. Having two ways to do the same thing is
> confusing.
> * Per-Job Mode is only supported by YARN anyway. If we keep it, we should
> work towards support in Kubernetes and Standalone, too, to reduce special
> casing.
> * Dropping per-job mode would reduce complexity in the code and allow us
> to dedicate more resources to the other two deployment modes.
> * I believe with session mode and application mode we have to easily
> distinguishable and understandable deployment modes that cover Flink's use
> cases:
>* session mode: olap-style, interactive jobs/queries, short lived batch
> jobs, very small jobs, traditional cluster-centric deployment mode (fits
> the "Hadoop world")
>* application mode: long-running streaming jobs, large scale &
> heterogenous jobs (resource isolation!), application-centric deployment
> mode (fits the "Kubernetes world")
>
>
> *# Call to Action*
> * Do you use per-job mode? If so, why & would you be able to migrate to
> one of the other methods?
> * Am I missing any pros/cons?
> * Are you in favor of dropping per-job mode midterm?
>
> Cheers and thank you,
>
> Konstantin
>
> --
>
> Konstantin Knauf
>
> https://twitter.com/snntrable
>
> https://github.com/knaufk
>

[jira] [Created] (FLINK-24897) Enable application mode on YARN to use usrlib

2021-11-15 Thread Biao Geng (Jira)

Biao Geng created FLINK-24897:
-

 Summary: Enable application mode on YARN to use usrlib
 Key: FLINK-24897
 URL: https://issues.apache.org/jira/browse/FLINK-24897
 Project: Flink
  Issue Type: Improvement
  Components: Deployment / YARN
Reporter: Biao Geng


Hi there, 
I am working to utilize application mode to submit flink jobs to YARN cluster 
but I find that currently there is no easy way to ship my user-defined 
jars(e.g. some custom connectors or udf jars that would be shared by all jobs) 
and ask the FlinkUserCodeClassLoader to load classes in these jars. 

I checked some relevant jiras, like  [#FLINK-21289]. In k8s mode, there is a 
solution that users can use `usrlib` directory to store there user-defined jars 
and these jars would be loaded by FlinkUserCodeClassLoader when the job is 
executed on JM/TM.

But on YARN mode, `usrlib` does not work as that:

In this method(org.apache.flink.yarn.YarnClusterDescriptor#addShipFiles), if I 
want to use `yarn.ship-files` to ship `usrlib` from my flink client(in my local 
machine) to remote cluster, I must not set  UserJarInclusion to DISABLED due to 
the checkArgument(). However, if I do not set that option to DISABLED, the user 
jars to be shipped will be added into systemClassPaths. As a result, classes in 
those user jars will be loaded by AppClassLoader. 

But if I do not ship these jars, there is no convenient way to utilize these 
jars in my flink run command. Currently, all I can do is to use `-C` option, 
which means I have to upload my jars to some shared store first and then use 
these remote paths. It is not so perfect as we have already make it possible to 
ship jars or files directly and we also introduce `usrlib` in application mode 
on YARN. It would be more user-friendly, if we can allow shipping `usrlib` from 
local to remote cluster while using FlinkUserCodeClassLoader to load classes in 
the jars in `usrlib`.

 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Created] (FLINK-24682) Unify the -C option behavior in both yarn application and per-job mode

2021-10-28 Thread Biao Geng (Jira)

Biao Geng created FLINK-24682:
-

 Summary: Unify the -C option behavior in both yarn application and 
per-job mode
 Key: FLINK-24682
 URL: https://issues.apache.org/jira/browse/FLINK-24682
 Project: Flink
  Issue Type: Improvement
  Components: Deployment / YARN
Affects Versions: 1.12.3
 Environment: flink 1.12.3

yarn 2.8.5
Reporter: Biao Geng


Recently, when switching the job submission mode from per-job mode to 
application mode on yarn, we found the behavior of '-C' ('–-classpath') is 
somehow misleading:
In per-job mode, the `main()` method of the program is executed in the local 
machine and '-C' option works well when we use it to specify some local user 
jars like -C file://xx.jar.
But in application mode, this option works differently: as the `main()` method 
will be executed on the job manager in the cluster, it is unclear where the url 
like `file://xx.jar` points. It seems that `file://xx.jar` is located 
on the job manager machine in the cluster due to the code. If that is true, it 
may mislead users as in per-job mode, it refers to the the jars in the client 
machine. 
In summary, if we can unify the -C option behavior in both yarn application and 
per-job mode, it would help users to switch to application mode more smoothly 
and more importantly, it makes it much easier to specify some local jars, that 
should be loaded by UserClassLoader, on the client machine.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

56 matches

Mail list logo