Re: [DISCUSS] Connector releases for Flink 1.19

2024-05-30 Thread Sergey Nuyanzin
Hi Jing,
>Thanks for the hint wrt JDBC connector. Where could users know that it
> already supports 1.19?

There is no released version yet that supports or has been tested against 1.19.
However, support was added in [1], and there is currently
an active RC in the voting stage that contains this fix [2].

[1]
https://github.com/apache/flink-connector-jdbc/commit/7025642d88ff661e486745b23569595e1813a1d0
[2] https://lists.apache.org/thread/b7xbjo4crt1527ldksw4nkwo8vs56csy


Re: [VOTE] Release flink-connector-aws v4.3.0, release candidate #2

2024-05-30 Thread gongzhongqiang
+1 (non-binding)

- Validated the checksum hash and signature.
- No binaries exist in the source archive.
- Built the source with JDK 8 successfully.
- Verified the flink-web PR.
- Ensured the JAR is built by JDK 8.
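
For anyone repeating the checksum part of the checks above, here is a minimal sketch. It assumes the .sha512 file next to the source archive uses the usual `sha512sum` layout (hex digest first, then the file name); the helper names are made up for illustration. Signature verification still needs gpg and the KEYS file and is not covered here.

```python
import hashlib
from pathlib import Path


def sha512_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-512 digest of a file, reading it in chunks."""
    digest = hashlib.sha512()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(archive: Path, checksum_file: Path) -> bool:
    """Compare the archive digest with the first token of the checksum file.

    Assumption: the checksum file starts with the hex digest, as written
    by `sha512sum`.
    """
    expected = checksum_file.read_text().split()[0].lower()
    return sha512_of(archive) == expected
```

Calling `checksum_matches(Path("src.tgz"), Path("src.tgz.sha512"))` returns True only when the downloaded archive is byte-identical to the one the digest was computed from.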

Best,
Zhongqiang Gong

On Fri, Apr 19, 2024 at 18:08, Danny Cranmer wrote:

> Hi everyone,
>
> Please review and vote on release candidate #2 for flink-connector-aws
> v4.3.0, as follows:
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
>
> This version supports Flink 1.18 and 1.19.
>
> The complete staging area is available for your review, which includes:
> * JIRA release notes [1],
> * the official Apache source release to be deployed to dist.apache.org
> [2],
> which are signed with the key with fingerprint 125FD8DB [3],
> * all artifacts to be deployed to the Maven Central Repository [4],
> * source code tag v4.3.0-rc2 [5],
> * website pull request listing the new release [6].
> * CI build of the tag [7].
>
> The vote will be open for at least 72 hours. It is adopted by majority
> approval, with at least 3 PMC affirmative votes.
>
> Thanks,
> Release Manager
>
> [1] https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522=12353793
> [2] https://dist.apache.org/repos/dist/dev/flink/flink-connector-aws-4.3.0-rc2
> [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> [4] https://repository.apache.org/content/repositories/orgapacheflink-1721/
> [5] https://github.com/apache/flink-connector-aws/releases/tag/v4.3.0-rc2
> [6] https://github.com/apache/flink-web/pull/733
> [7] https://github.com/apache/flink-connector-aws/actions/runs/8751694197
>


[jira] [Created] (FLINK-35495) The native metrics for column family are not reported

2024-05-30 Thread Yanfei Lei (Jira)
Yanfei Lei created FLINK-35495:
--

 Summary: The native metrics for column family are not reported
 Key: FLINK-35495
 URL: https://issues.apache.org/jira/browse/FLINK-35495
 Project: Flink
  Issue Type: Sub-task
Reporter: Yanfei Lei






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] Flink CDC 3.1.1 Release

2024-05-30 Thread Jiabao Sun
Thanks Xiqian,

+1 for the release.

Best,
Jiabao

On Fri, May 31, 2024 at 10:49, Hang Ruan wrote:

> Hi, Xiqian.
>
> +1 for releasing 3.1.1. Thanks for the discussion.
>
> Best,
> Hang
>
> On Thu, May 30, 2024 at 09:07, gongzhongqiang wrote:
>
> > +1
> > Thanks Xiqian.
> >
> > Best,
> > Zhongqiang Gong
> >
> > On Tue, May 28, 2024 at 19:44, Xiqian YU wrote:
> >
> > > Hi devs,
> > >
> > > > I would like to make a proposal about creating a new Flink CDC 3.1 patch
> > > > release (3.1.1). It’s been a week since the last CDC version 3.1.0 got
> > > > released [1], and since then, 7 tickets have been closed, 4 of them of
> > > > high priority.
> > > >
> > > > Currently, there are 5 items open: 1 of them is a blocker, which stops
> > > > users from restoring from existing checkpoints after upgrading [2].
> > > > There’s a PR ready and it will be merged soon. The other 4 have approved
> > > > PRs and will be merged soon [3][4][5][6]. I propose that a patch version
> > > > be released after all pending tickets are closed.
> > >
> > > Please reply if there are any unresolved blocking issues you’d like to
> > > include in this release.
> > >
> > > Regards,
> > > Xiqian
> > >
> > > [1]
> > >
> >
> https://flink.apache.org/2024/05/17/apache-flink-cdc-3.1.0-release-announcement/
> > > [2] https://issues.apache.org/jira/browse/FLINK-35464
> > > [3] https://issues.apache.org/jira/browse/FLINK-35149
> > > [4] https://issues.apache.org/jira/browse/FLINK-35323
> > > [5] https://issues.apache.org/jira/browse/FLINK-35430
> > > [6] https://issues.apache.org/jira/browse/FLINK-35447
> > >
> > >
> >
>


Re: [DISCUSS] Flink CDC 3.1.1 Release

2024-05-30 Thread Hang Ruan
Hi, Xiqian.

+1 for releasing 3.1.1. Thanks for the discussion.

Best,
Hang

On Thu, May 30, 2024 at 09:07, gongzhongqiang wrote:

> +1
> Thanks Xiqian.
>
> Best,
> Zhongqiang Gong
>
> On Tue, May 28, 2024 at 19:44, Xiqian YU wrote:
>
> > Hi devs,
> >
> > I would like to make a proposal about creating a new Flink CDC 3.1 patch
> > release (3.1.1). It’s been a week since the last CDC version 3.1.0 got
> > released [1], and since then, 7 tickets have been closed, 4 of them of
> > high priority.
> >
> > Currently, there are 5 items open: 1 of them is a blocker, which stops
> > users from restoring from existing checkpoints after upgrading [2].
> > There’s a PR ready and it will be merged soon. The other 4 have approved
> > PRs and will be merged soon [3][4][5][6]. I propose that a patch version
> > be released after all pending tickets are closed.
> >
> > Please reply if there are any unresolved blocking issues you’d like to
> > include in this release.
> >
> > Regards,
> > Xiqian
> >
> > [1] https://flink.apache.org/2024/05/17/apache-flink-cdc-3.1.0-release-announcement/
> > [2] https://issues.apache.org/jira/browse/FLINK-35464
> > [3] https://issues.apache.org/jira/browse/FLINK-35149
> > [4] https://issues.apache.org/jira/browse/FLINK-35323
> > [5] https://issues.apache.org/jira/browse/FLINK-35430
> > [6] https://issues.apache.org/jira/browse/FLINK-35447
> >
> >
>


Re: [DISCUSS] Merge "flink run" and "flink run-application" in Flink 2.0

2024-05-30 Thread Hang Ruan
Hi, Ferenc.

+1 for this proposal. This FLIP will help to make the CLI clearer for users.

I think we had better add an example in the FLIP showing how to use
application mode with the new CLI.
Besides that, we need to add some new tests for this change instead of only
relying on the existing tests.

Best,
Hang

On Wed, May 29, 2024 at 19:57, Mate Czagany wrote:

> Hi Ferenc,
>
> Thanks for the FLIP, +1 from me for the proposal. I think these changes
> would be a great solution to all the confusion that comes from these two
> action parameters.
>
> Best regards,
> Mate
>
> On Tue, May 28, 2024 at 16:13, Ferenc Csaky wrote:
>
> > Thank you Xintong for your input.
> >
> > I prepared a FLIP for this change [1], looking forward to any
> > other opinions.
> >
> > Thanks,
> > Ferenc
> >
> > [1] https://docs.google.com/document/d/1EX74rFp9bMKdfoGkz1ASOM6Ibw32rRxIadX72zs2zoY/edit?usp=sharing
> >
> >
> >
> > On Friday, 17 May 2024 at 07:04, Xintong Song 
> > wrote:
> >
> > >
> > >
> > > > AFAIK, the main purpose of having `run-application` was to make sure
> > > > the user is aware that application mode is used, which executes the
> > > > main method of the user program in JM rather than in client. This was
> > > > important at the time application mode was first introduced, but maybe
> > > > not that important anymore, given that per-job mode is deprecated and
> > > > likely removed in 2.0. Therefore, +1 for the proposal.
> > >
> > > Best,
> > >
> > > Xintong
> > >
> > >
> > >
> > > On Thu, May 16, 2024 at 11:35 PM Ferenc Csaky
> ferenc.cs...@pm.me.invalid
> > >
> > > wrote:
> > >
> > > > Hello devs,
> > > >
> > > > > I have seen quite a few cases where customers were confused about run
> > > > > and run-application in the Flink CLI, and I was wondering about the
> > > > > necessity of deploying Application Mode (AM) jobs with a different
> > > > > command than Session and Per-Job mode jobs.
> > > > >
> > > > > I can see a point that YarnDeploymentTarget [1] and
> > > > > KubernetesDeploymentTarget [2] are part of their own maven modules and
> > > > > not known in flink-clients, so the deployment mode validations are
> > > > > happening during cluster deployment in their specific ClusterDescriptor
> > > > > implementation [3]. However, these are implementation details that IMO
> > > > > should not define user-facing APIs.
> > > > >
> > > > > The command line setup is the same for both run and run-application, so
> > > > > I think there is a quite simple way to achieve a unified flink run
> > > > > experience, but I might have missed something, so I would appreciate
> > > > > any input on this topic.
> > > > >
> > > > > Based on my assumptions I think it would be possible to deprecate
> > > > > run-application in Flink 1.20 and remove it completely in Flink 2.0. I
> > > > > already put together a PoC [4], and I was able to deploy AM jobs like
> > > > > this:
> > > > >
> > > > > flink run --target kubernetes-application ...
> > > > >
> > > > > If others also agree with this, I would be happy to open a FLIP. WDYT?
> > > >
> > > > Thanks,
> > > > Ferenc
> > > >
> > > > [1]
> > > >
> >
> https://github.com/apache/flink/blob/master/flink-yarn/src/main/java/org/apache/flink/yarn/configuration/YarnDeploymentTarget.java
> > > > [2]
> > > >
> >
> https://github.com/apache/flink/blob/master/flink-kubernetes/src/main/java/org/apache/flink/kubernetes/configuration/KubernetesDeploymentTarget.java
> > > > [3]
> > > >
> >
> https://github.com/apache/flink/blob/48e5a39c9558083afa7589d2d8b054b625f61ee9/flink-kubernetes/src/main/java/org/apache/flink/kubernetes/KubernetesClusterDescriptor.java#L206
> > > > [4]
> > > >
> >
> https://github.com/ferenc-csaky/flink/commit/40b3e1b998c7a4273eaaff71d9162c9f1ee039c0
> >
>


Re: Flink 1.18.2 release date

2024-05-30 Thread weijie guo
Hi Yang

IIRC, 1.18.2 has not been kicked off yet.

Best regards,

Weijie


On Thu, May 30, 2024 at 22:33, Yang LI wrote:

> Dear Flink Community,
>
> Does anyone know the release date for 1.18.2?
>
> Thanks very much,
> Yang
>


[jira] [Created] (FLINK-35494) Reorganize sources

2024-05-30 Thread Jira
João Boto created FLINK-35494:
-

 Summary: Reorganize sources
 Key: FLINK-35494
 URL: https://issues.apache.org/jira/browse/FLINK-35494
 Project: Flink
  Issue Type: Sub-task
Reporter: João Boto






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Flink 1.18.2 release date

2024-05-30 Thread Yang LI
Dear Flink Community,

Does anyone know the release date for 1.18.2?

Thanks very much,
Yang


[jira] [Created] (FLINK-35493) Make max history age and count configurable for FlinkStateSnapshot resources

2024-05-30 Thread Mate Czagany (Jira)
Mate Czagany created FLINK-35493:


 Summary: Make max history age and count configurable for 
FlinkStateSnapshot resources
 Key: FLINK-35493
 URL: https://issues.apache.org/jira/browse/FLINK-35493
 Project: Flink
  Issue Type: Sub-task
  Components: Kubernetes Operator
Reporter: Mate Czagany






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] Connector releases for Flink 1.19

2024-05-30 Thread weijie guo
Hi Jing

> Do we have an umbrella ticket for Flink 1.19 connectors release?

FYI: https://issues.apache.org/jira/browse/FLINK-35131 :)

Best regards,

Weijie


On Tue, May 28, 2024 at 20:29, Jing Ge wrote:

> Hi,
>
> Thanks Danny for driving it! Do we have an umbrella ticket for Flink 1.19
> connectors release?
>
> @Sergei
> Thanks for the hint wrt JDBC connector. Where could users know that it
> already supports 1.19?
>
> Best regards,
> Jing
>
> On Fri, May 17, 2024 at 4:07 AM Sergey Nuyanzin 
> wrote:
>
> > >, it looks like opensearch-2.0.0 has been created now, all good.
> > yep, thanks to Martijn
> >
> > I've created RCs for Opensearch connector
> >
> > On Tue, May 14, 2024 at 12:38 PM Danny Cranmer 
> > wrote:
> >
> > > Hello,
> > >
> > > @Sergey Nuyanzin , it looks like opensearch-2.0.0
> > > has been created now, all good.
> > >
> > > > @Hongshun Wang, thanks, since the CDC connectors are not yet released I
> > > > had omitted them from this task. But happy to include them, thanks for
> > > > the support.
> > >
> > > Thanks,
> > > Danny
> > >
> > > > On Mon, May 13, 2024 at 3:40 AM Hongshun Wang
> > > > wrote:
> > >
> > >> Hello Danny,
> > > >> Thanks for pushing this forward. I am available to assist with the CDC
> > > >> connector [1].
> > >>
> > >> [1] https://github.com/apache/flink-cdc
> > >>
> > >> Best
> > >> Hongshun
> > >>
> > >> On Sun, May 12, 2024 at 8:48 PM Sergey Nuyanzin 
> > >> wrote:
> > >>
> > > >> > I'm in the process of preparing an RC for the OpenSearch connector.
> > > >> >
> > > >> > However, it seems I need PMC help: opensearch-2.0.0 needs to be created
> > > >> > in Jira, since it was proposed in another ML thread [1] to have 1.x for
> > > >> > OpenSearch v1 and 2.x for OpenSearch v2.
> > > >> >
> > > >> > It would be great if someone from the PMC could help here.
> > > >> >
> > > >> > [1] https://lists.apache.org/thread/3w1rnjp5y612xy5k9yv44hy37zm9ph15
> > >> >
> > >> > On Wed, Apr 17, 2024 at 12:42 PM Ferenc Csaky
> > >> >  wrote:
> > >> > >
> > >> > > Thank you Danny and Sergey for pushing this!
> > >> > >
> > > >> > > I can help with the HBase connector if necessary, and will comment
> > > >> > > with the details on the relevant Jira ticket.
> > >> > >
> > >> > > Best,
> > >> > > Ferenc
> > >> > >
> > >> > >
> > >> > >
> > >> > >
> > >> > > On Wednesday, April 17th, 2024 at 11:17, Danny Cranmer <
> > >> > dannycran...@apache.org> wrote:
> > >> > >
> > >> > > >
> > >> > > >
> > >> > > > Hello all,
> > >> > > >
> > > >> > > > I have created a parent Jira to cover the releases [1]. I have
> > > >> > > > assigned AWS and MongoDB to myself and OpenSearch to Sergey. Please
> > > >> > > > assign the relevant issue to yourself as you pick up the tasks.
> > >> > > >
> > >> > > > Thanks!
> > >> > > >
> > >> > > > [1] https://issues.apache.org/jira/browse/FLINK-35131
> > >> > > >
> > >> > > > On Tue, Apr 16, 2024 at 2:41 PM Muhammet Orazov
> > >> > > > mor+fl...@morazow.com.invalid wrote:
> > >> > > >
> > > >> > > > > Thanks Sergey and Danny for clarifying, indeed it
> > > >> > > > > requires a committer to go through the process.
> > > >> > > > >
> > > >> > > > > Anyway, please let me know if I can be of any help.
> > >> > > > >
> > >> > > > > Best,
> > >> > > > > Muhammet
> > >> > > > >
> > >> > > > > On 2024-04-16 11:19, Danny Cranmer wrote:
> > >> > > > >
> > >> > > > > > Hello,
> > >> > > > > >
> > > >> > > > > > I have opened the VOTE thread for the AWS connectors release [1].
> > > >> > > > > >
> > > >> > > > > > > If I'm not mistaken (please correct me if I'm wrong) this
> > > >> > > > > > > request is not about a version update, it is about new
> > > >> > > > > > > releases for connectors
> > > >> > > > > >
> > > >> > > > > > Yes, correct. If there are any other code changes required then
> > > >> > > > > > help would be appreciated.
> > > >> > > > > >
> > > >> > > > > > > Are you going to create an umbrella issue for it?
> > > >> > > > > >
> > > >> > > > > > We do not usually create JIRA issues for releases. That being
> > > >> > > > > > said, it sounds like a good idea to have one place to track the
> > > >> > > > > > status of the connector releases and pre-requisite code changes.
> > > >> > > > > >
> > > >> > > > > > > I would like to work on this task, thanks for initiating it!
> > > >> > > > > >
> > > >> > > > > > The actual release needs to be performed by a committer.
> > > >> > > > > > However, help getting the connectors building against Flink 1.19
> > > >> > > > > > and testing the RC is appreciated.
> > > >> > > > > >
> > > >> > > > > > Thanks,
> > > >> > > > > > Danny
> > > >> > > > > >
> > > >> > > > > > [1] https://lists.apache.org/thread/0nw9smt23crx4gwkf6p1dd4jwvp1g5s0
> > >> > > > > >
> > >> > > > > > On Tue, Apr 16, 2024 at 6:34 AM Sergey Nuyanzin
> > >> > snuyan...@gmail.com
> > >> > > > > > wrote:
> > >> > > > > >
> > >> > > > > > > Thanks for volunteering Muhammet!
> > >> > > > > > > And thanks Danny for starting the activity.
> > >> > > > > > >
> > 

[jira] [Created] (FLINK-35492) Add metrics for FlinkStateSnapshot resources

2024-05-30 Thread Mate Czagany (Jira)
Mate Czagany created FLINK-35492:


 Summary: Add metrics for FlinkStateSnapshot resources
 Key: FLINK-35492
 URL: https://issues.apache.org/jira/browse/FLINK-35492
 Project: Flink
  Issue Type: Sub-task
  Components: Kubernetes Operator
Reporter: Mate Czagany






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] Release flink-connector-opensearch v1.2.0, release candidate #1

2024-05-30 Thread weijie guo
Thanks Sergey for driving this release!

+1(non-binding)

1. Verified signatures and hash sums
2. Build from source with 1.8.0_291 succeeded
3. Checked RN.

Best regards,

Weijie


On Thu, May 30, 2024 at 10:08, Yuepeng Pan wrote:

> +1 (non-binding)
>
> - Built from source code with JDK 1.8 on macOS
> - Ran examples locally
> - Checked release notes
>
> Best,
> Yuepeng Pan
>
>
> At 2024-05-28 22:53:10, "gongzhongqiang" 
> wrote:
> >+1(non-binding)
> >
> >- Verified signatures and hash sums
> >- Reviewed the web PR
> >- Built from source code with JDK 1.8 on Ubuntu 22.04
> >- Checked release notes
> >
> >Best,
> >Zhongqiang Gong
> >
> >
> >On Thu, May 16, 2024 at 06:03, Sergey Nuyanzin wrote:
> >
> >> Hi everyone,
> >> Please review and vote on release candidate #1 for
> >> flink-connector-opensearch v1.2.0, as follows:
> >> [ ] +1, Approve the release
> >> [ ] -1, Do not approve the release (please provide specific comments)
> >>
> >>
> >> The complete staging area is available for your review, which includes:
> >> * JIRA release notes [1],
> >> * the official Apache source release to be deployed to dist.apache.org
> >> [2],
> >> which are signed with the key with fingerprint
> >> F7529FAE24811A5C0DF3CA741596BBF0726835D8 [3],
> >> * all artifacts to be deployed to the Maven Central Repository [4],
> >> * source code tag v1.2.0-rc1 [5],
> >> * website pull request listing the new release [6].
> >> * CI build of the tag [7].
> >>
> >> The vote will be open for at least 72 hours. It is adopted by majority
> >> approval, with at least 3 PMC affirmative votes.
> >>
> >> Note that this release is for Opensearch v1.x
> >>
> >> Thanks,
> >> Release Manager
> >>
> >> [1] https://issues.apache.org/jira/projects/FLINK/versions/12353812
> >> [2] https://dist.apache.org/repos/dist/dev/flink/flink-connector-opensearch-1.2.0-rc1
> >> [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> >> [4] https://repository.apache.org/content/repositories/orgapacheflink-1734
> >> [5] https://github.com/apache/flink-connector-opensearch/releases/tag/v1.2.0-rc1
> >> [6] https://github.com/apache/flink-web/pull/740
> >> [7] https://github.com/apache/flink-connector-opensearch/actions/runs/9102334125
> >>
>


[jira] [Created] (FLINK-35491) [JUnit5 Migration] Module: Flink CDC modules

2024-05-30 Thread Muhammet Orazov (Jira)
Muhammet Orazov created FLINK-35491:
---

 Summary: [JUnit5 Migration] Module: Flink CDC modules
 Key: FLINK-35491
 URL: https://issues.apache.org/jira/browse/FLINK-35491
 Project: Flink
  Issue Type: Improvement
  Components: Flink CDC
Reporter: Muhammet Orazov


Migrate JUnit 4 tests to JUnit 5 for the following modules:
 * flink-cdc-common
 * flink-cdc-composer
 * flink-cdc-runtime
 * flink-cdc-connect/flink-cdc-pipeline-connectors
 * flink-cdc-e2e-tests



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-35490) [JUnit5 Migration] Module: Flink CDC flink-cdc-connect/flink-cdc-source-connectors

2024-05-30 Thread Muhammet Orazov (Jira)
Muhammet Orazov created FLINK-35490:
---

 Summary: [JUnit5 Migration] Module: Flink CDC 
flink-cdc-connect/flink-cdc-source-connectors
 Key: FLINK-35490
 URL: https://issues.apache.org/jira/browse/FLINK-35490
 Project: Flink
  Issue Type: Improvement
  Components: Flink CDC
Reporter: Muhammet Orazov


Migrate JUnit 4 tests to JUnit 5 in the following Flink CDC module:

- flink-cdc-connect/flink-cdc-source-connectors



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] FLIP-XXX Add K8S conditions to Flink CRD

2024-05-30 Thread Mate Czagany
Hi,

I would definitely keep this as a FLIP. Not all FLIPs have to be big
changes, and this format makes it easier for others to chime in and follow.

I am not a Kubernetes expert, but my understanding is that we don't have to
follow any strict convention for the type names in the conditions, e.g.
"Ready" or "Error". And as Gyula said it doesn't add too much value in the
currently proposed way, it might even be confusing for users who have not
read this email thread or FLIP because "Ready" might suggest that the job
is running and is healthy. So my suggestion is the same as Gyula's, to have
more explicit type names instead of just "Ready" and "Error". However,
"ClusterReady" sounds weird in the case of FlinkSessionJobs.

Regarding appending to the conditions field: if I understand the FLIP
correctly, we would allow multiple elements of the same type to exist in
the conditions list if the message and reason fields are different. From
the Kubernetes documentation it seems like the correct way would be to use
the "type" field as the map key and merge the fields [1].


[1]
https://github.com/kubernetes/kubernetes/blob/bce55b94cdc3a4592749aa919c591fa7df7453eb/staging/src/k8s.io/apimachinery/pkg/apis/meta/v1/types.go#L1528
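
To make the "type as map key" point concrete, here is a rough sketch of an upsert that keeps at most one entry per condition type. The function name and the plain-dict representation are only for illustration and are not taken from the FLIP or the operator code. Keeping lastTransitionTime when the status is unchanged mirrors the usual Kubernetes convention and is an assumption about what the FLIP would want.

```python
from typing import Dict, List

Condition = Dict[str, str]


def upsert_condition(conditions: List[Condition], new: Condition) -> List[Condition]:
    """Return a new list holding at most one condition per "type".

    An existing entry with the same type is replaced instead of appended
    to, even when its reason or message differs.
    """
    merged: List[Condition] = []
    replaced = False
    for existing in conditions:
        if existing["type"] != new["type"]:
            merged.append(existing)
            continue
        if replaced:
            continue  # drop stale duplicates of the same type
        updated = dict(new)
        # Assumption: the transition time only moves when the status flips.
        if (
            existing.get("status") == new.get("status")
            and "lastTransitionTime" in existing
        ):
            updated["lastTransitionTime"] = existing["lastTransitionTime"]
        merged.append(updated)
        replaced = True
    if not replaced:
        merged.append(dict(new))
    return merged
```

With this shape, a change in reason or message updates the existing entry in place, so the list never grows beyond the number of distinct condition types.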

Best regards,
Mate

On Thu, May 30, 2024 at 10:53, Gyula Fóra wrote:

> David,
>
> The problem is exactly that ResourceLifecycleStates do not correspond to
> specific Job statuses (JobReady condition) in most cases. Let me give you a
> concrete example:
>
> ResourceLifecycleState.STABLE means that app/job defined in the spec has
> been successfully deployed and was observed running, and this spec is now
> considered to be stable (won't be rolled back). Once a resource
> (FlinkDeployment) reached STABLE state, it won't change unless the user
> changes the spec. At the same time, this doesn't really say anything about
> job health/readiness at any given future time. 10 minutes later the job can
> go in an unrecoverable failure loop and never reach a running status, the
> ResourceLifecycleState will remain STABLE.
>
> This is actually not a problem with the ResourceLifecycleState but more
> with the understanding of it. It's called ResourceLifecycleState and not
> JobState exactly because it refers to the upgrade/rollback/suspend etc
> lifecycle of the FlinkDeployment/FlinkSessionJob resource and not the
> underlying flink job itself.
>
> But this is a crucial detail here that we need to consider otherwise the
> "Ready" condition that we may create will be practically useless.
>
> This is the reason why @morh...@apache.org  and
> I suggest separating this to at least 2 independent conditions. One could
> be the UpgradeCompleted/ReconciliationCompleted or something along these
> lines computed based on LifecycleState (as described in your proposal but
> with a different name). The other should be JobReady, which could initially
> work based on the JobStatus.state field but ideally would be a
> user-configurable ready condition (for example, job running for at least
> 10 minutes, or running and having taken checkpoints, etc.).
>
> These 2 conditions should be enough to start with and would actually
> provide tangible value to users. On second thought, we can probably leave out
> ClusterReady.
>
> Cheers,
> Gyula
>
>
> On Wed, May 29, 2024 at 5:16 PM David Radley 
> wrote:
>
> > Hi Gyula,
> > Thank you for the quick response and confirmation we need a FLIP. I am not
> > an expert at K8s, Lajith will answer in more detail. Some questions I had
> > anyway:
> >
> > I assume each ResourceLifecycleState has a corresponding jobReady status.
> > You point out some mistakes in the table, for example that STABLE should
> > be NotReady; thank you. If we put a reason mentioning the stable state,
> > this would help us understand the jobStatus.
> >
> > I guess the jobReady is one perspective that we know is useful (with
> > corrected mappings from ResourceLifecycleState and with reasons). Can I
> > check that the 2 proposed conditions would also be useful additions? I
> > assume that in your proposal when jobReady is true, then UpgradeCompleted
> > condition would not be present and ClusterReady would always be true? I
> > know conditions do not need to be orthogonal, but I wanted to check what
> > your thoughts are.
> >
> > Kind regards, David.
> >
> >
> >
> >
> > From: Gyula Fóra 
> > Date: Wednesday, 29 May 2024 at 15:28
> > To: dev@flink.apache.org 
> > Cc: morh...@apache.org 
> > Subject: [EXTERNAL] Re: [DISCUSS] FLIP-XXX Add K8S conditions to Flink CRD
> > Hi David!
> >
> > This change definitely warrants a FLIP even if the code change is not huge,
> > there are quite some implications going forward.
> >
> > Looping in @morh...@apache.org  for this discussion.
> >
> > I have some questions / suggestions regarding the condition's meaning and
> > naming.
> >
> > In your proposal you have:
> >  - Ready (True/False) -> This condition is intended for resources which
> are
> > fully ready 

Re: [DISCUSS] FLIP-XXX Add K8S conditions to Flink CRD

2024-05-30 Thread Gyula Fóra
David,

The problem is exactly that ResourceLifecycleStates do not correspond to
specific Job statuses (JobReady condition) in most cases. Let me give you a
concrete example:

ResourceLifecycleState.STABLE means that app/job defined in the spec has
been successfully deployed and was observed running, and this spec is now
considered to be stable (won't be rolled back). Once a resource
(FlinkDeployment) reached STABLE state, it won't change unless the user
changes the spec. At the same time, this doesn't really say anything about
job health/readiness at any given future time. 10 minutes later the job can
go in an unrecoverable failure loop and never reach a running status, the
ResourceLifecycleState will remain STABLE.

This is actually not a problem with the ResourceLifecycleState but more
with the understanding of it. It's called ResourceLifecycleState and not
JobState exactly because it refers to the upgrade/rollback/suspend etc
lifecycle of the FlinkDeployment/FlinkSessionJob resource and not the
underlying flink job itself.

But this is a crucial detail here that we need to consider otherwise the
"Ready" condition that we may create will be practically useless.

This is the reason why @morh...@apache.org  and
I suggest separating this to at least 2 independent conditions. One could
be the UpgradeCompleted/ReconciliationCompleted or something along these
lines computed based on LifecycleState (as described in your proposal but
with a different name). The other should be JobReady, which could initially
work based on the JobStatus.state field but ideally would be a
user-configurable ready condition (for example, job running for at least
10 minutes, or running and having taken checkpoints, etc.).

These 2 conditions should be enough to start with and would actually
provide tangible value to users. On second thought, we can probably leave out
ClusterReady.
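
A minimal sketch of how the two conditions could be computed independently, to illustrate the STABLE example above. The Condition class and function name are hypothetical, UpgradeCompleted is just one of the candidate names from this thread, and treating only the RUNNING job state as ready is the simplest possible rule rather than the configurable one described above.

```python
from dataclasses import dataclass
from typing import List

# Lifecycle states after which the pending upgrade counts as completed.
COMPLETED_LIFECYCLE_STATES = {"STABLE", "ROLLED_BACK"}


@dataclass(frozen=True)
class Condition:
    type: str
    status: str
    reason: str


def compute_conditions(lifecycle_state: str, job_state: str) -> List[Condition]:
    """Derive two independent conditions from two independent inputs.

    UpgradeCompleted only looks at the resource lifecycle state, while
    JobReady only looks at the Flink job state.
    """
    upgrade_done = lifecycle_state in COMPLETED_LIFECYCLE_STATES
    job_ready = job_state == "RUNNING"  # assumption: simplest readiness rule
    return [
        Condition("UpgradeCompleted", "True" if upgrade_done else "False", lifecycle_state),
        Condition("JobReady", "True" if job_ready else "False", job_state),
    ]
```

For a STABLE resource whose job is stuck in RESTARTING, this reports UpgradeCompleted=True and JobReady=False, which is exactly the situation a single "Ready" condition would hide.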

Cheers,
Gyula


On Wed, May 29, 2024 at 5:16 PM David Radley 
wrote:

> Hi Gyula,
> Thank you for the quick response and confirmation we need a Flip. I am not
> an expert at K8s, Lajith will answer in more detail. Some questions I had
> anyway:
>
> I assume each ResourceLifecycleState has a corresponding jobReady status.
> You point out some mistakes in the table, for example that STABLE should
> be NotReady; thank you. If we put a reason mentioning the stable state,
> this would help us understand the jobStatus.
>
> I guess the jobReady is one perspective that we know is useful (with
> corrected  mappings from ResourceLifecycleState and with reasons). Can I
> check that the  2 proposed conditions would also be useful additions? I
> assume that in your proposal  when jobReady is true, then UpgradeCompleted
> condition would not be present and ClusterReady would always be true? I
> know conditions do not need to be orthogonal, but I wanted to check what
> your thoughts are.
>
> Kind regards, David.
>
>
>
>
> From: Gyula Fóra 
> Date: Wednesday, 29 May 2024 at 15:28
> To: dev@flink.apache.org 
> Cc: morh...@apache.org 
> Subject: [EXTERNAL] Re: [DISCUSS] FLIP-XXX Add K8S conditions to Flink CRD
> Hi David!
>
> This change definitely warrants a FLIP even if the code change is not huge,
> there are quite some implications going forward.
>
> Looping in @morh...@apache.org  for this discussion.
>
> I have some questions / suggestions regarding the condition's meaning and
> naming.
>
> In your proposal you have:
>  - Ready (True/False) -> This condition is intended for resources which are
> fully ready and operational
>  - Error (True) -> This condition can be used in scenarios where any
> exception/error occurs during the resource reconcile process
>
> The problem with the above is that the implementation does not reflect this
> well. ResourceLifecycleState STABLE/ROLLED_BACK does not actually mean the
> job is running, it just means that the resource is fully reconciled and it
> will not be rolled back (so the current pending upgrade is completed). This
> is mainly a fault of the ResourceLifecycleState as it doesn't capture the
> job status but one could argue that it was "designed" this way.
>
> I think we should probably have more condition types to capture the
> difference:
>  - JobReady (True/False) -> Flink job is running (Basically job status but
> with transition time)
>  - ClusterReady (True/False) -> Session / Application cluster is deployed
> (Basically JM deployment status but with transition time)
> -  UpgradeCompleted (True/False) -> Similar to what you call Ready now
> which should correspond to the STABLE/ROLLED_BACK states and mostly tracks
> in-progress CR updates
>
> This is my best idea at the moment, not great as it feels a little
> redundant with the current status fields. But maybe that's not a problem or
> a way to eliminate the old fields later?
>
> I am not so sure of the Error status and what this means in practice. Why
> do we want to track the last error in 2 places? It's already in the status.
>
> What do you think?
> Gyula
>
> On Wed, May 

Re: Slack Invite

2024-05-30 Thread gongzhongqiang
Hi,
The invite link:
https://join.slack.com/t/apache-flink/shared_invite/zt-2jtsd06wy-31q_aELVkdc4dHsx0GMhOQ

Best,
Zhongqiang Gong

On Thu, May 30, 2024 at 15:01, Nelson de Menezes Neto wrote:

> Hey guys!
>
> I want to join the Slack community but the invite has expired.
> Can you send me a new one?
>


[jira] [Created] (FLINK-35489) Add capability to set min taskmanager.memory.managed.size when enabling autotuning

2024-05-30 Thread Nicolas Fraison (Jira)
Nicolas Fraison created FLINK-35489:
---

 Summary: Add capability to set min taskmanager.memory.managed.size 
when enabling autotuning
 Key: FLINK-35489
 URL: https://issues.apache.org/jira/browse/FLINK-35489
 Project: Flink
  Issue Type: Improvement
  Components: Kubernetes Operator
Affects Versions: 1.8.0
Reporter: Nicolas Fraison


We have enabled the autotuning feature on one of our Flink jobs with the config below:
{code:java}
# Autoscaler configuration
job.autoscaler.enabled: "true"
job.autoscaler.stabilization.interval: 1m
job.autoscaler.metrics.window: 10m
job.autoscaler.target.utilization: "0.8"
job.autoscaler.target.utilization.boundary: "0.1"
job.autoscaler.restart.time: 2m
job.autoscaler.catch-up.duration: 10m
job.autoscaler.memory.tuning.enabled: true
job.autoscaler.memory.tuning.overhead: 0.5
job.autoscaler.memory.tuning.maximize-managed-memory: true{code}
During a scale down the autotuning decided to give all the memory to the JVM 
(scaling the heap by 2), setting taskmanager.memory.managed.size to 0b.
Here is the config that was computed by the autotuning for a TM running on a 4GB 
pod:
{code:java}
taskmanager.memory.network.max: 4063232b
taskmanager.memory.network.min: 4063232b
taskmanager.memory.jvm-overhead.max: 433791712b
taskmanager.memory.task.heap.size: 3699934605b
taskmanager.memory.framework.off-heap.size: 134217728b
taskmanager.memory.jvm-metaspace.size: 22960020b
taskmanager.memory.framework.heap.size: "0 bytes"
taskmanager.memory.flink.size: 3838215565b
taskmanager.memory.managed.size: 0b {code}
This has led to some issues starting the TM because we rely on a
javaagent that performs memory allocation outside of the JVM (it relies on C
bindings).

Tuning the overhead or disabling scale-down-compensation.enabled could have
helped for that particular event, but this can lead to other issues, as it could
result in too small a heap size being computed.

It would be useful to be able to set a minimum memory.managed.size that the
autotuning takes into account.
What do you think about this? Do you think that some other specific config
should have been applied to avoid this issue?
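
As a rough illustration of the request, the sketch below first checks that the reported values add up to taskmanager.memory.flink.size (assuming task off-heap memory is 0, since it is not listed above), then shows one possible way a minimum managed size could be honoured. The function apply_min_managed and the choice to take the difference from the task heap are purely hypothetical and do not describe how the autoscaler works today.

```python
# Values reported above, in bytes.
NETWORK = 4_063_232
TASK_HEAP = 3_699_934_605
FRAMEWORK_OFF_HEAP = 134_217_728
FRAMEWORK_HEAP = 0
MANAGED = 0


def flink_size(task_heap: int, managed: int) -> int:
    """Total Flink memory, assuming task off-heap memory is 0."""
    return NETWORK + FRAMEWORK_HEAP + FRAMEWORK_OFF_HEAP + task_heap + managed


def apply_min_managed(task_heap: int, managed: int, min_managed: int):
    """Hypothetical rule: raise managed memory to min_managed by taking
    the shortfall from the task heap, keeping the total unchanged."""
    shortfall = max(0, min_managed - managed)
    if shortfall > task_heap:
        raise ValueError("minimum managed size exceeds available task heap")
    return task_heap - shortfall, managed + shortfall


# The tuned values sum to the reported taskmanager.memory.flink.size.
assert flink_size(TASK_HEAP, MANAGED) == 3_838_215_565

heap, managed = apply_min_managed(TASK_HEAP, MANAGED, 256 * 1024 * 1024)
assert managed == 268_435_456
assert flink_size(heap, managed) == 3_838_215_565
```

Whether the shortfall should come from the heap or from another memory pool is exactly the kind of design question the ticket would need to settle.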



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: Slack invite

2024-05-30 Thread gongzhongqiang
Hi,
The invite link:
https://join.slack.com/t/apache-flink/shared_invite/zt-2jtsd06wy-31q_aELVkdc4dHsx0GMhOQ

Best,
Zhongqiang Gong

On Thu, May 30, 2024 at 15:40, Sai Sankeerth Rao wrote:

> Hi,
>
> I would like to join the Slack community but the link has expired. Can
> someone send me a new one?
>
> Thanks
>
> --
> Regards
> P Sai Sankeerth Rao
> +91-9600129402
>


Slack invite

2024-05-30 Thread Sai Sankeerth Rao
Hi,

I would like to join the Slack community but the link has expired. Can
someone send me a new one?

Thanks

-- 
Regards
P Sai Sankeerth Rao
+91-9600129402


Slack Invite

2024-05-30 Thread Nelson de Menezes Neto
Hey guys!

I want to join the Slack community but the invite has expired.
Can you send me a new one?