Re: [VOTE] Apache Flink Kubernetes Operator Release 1.9.0, release candidate #1

2024-06-28 Thread Thomas Weise
+1 (binding)

- verified source signatures/hashes
- built from source; tests pass
- installed from packaged Helm chart

Thanks,
Thomas


On Fri, Jun 28, 2024 at 5:29 AM Robert Metzger  wrote:

> +1 (binding)
>
> - checked the docker file contents
> - installed the operator from the helm chart
> - checked if it can still talk to an existing Flink cluster, deployed from
> v1.8
>
> On Tue, Jun 25, 2024 at 9:05 AM Gyula Fóra  wrote:
>
> > +1 (binding)
> >
> > Verified:
> >  - Sources/signatures
> >  - Install 1.9.0 from helm chart
> >  - Stateful example job basic interactions
> >  - Operator upgrade from 1.8.0 -> 1.9.0 with running flinkdeployments
> >  - Flink-web PR looks good
> >
> > Cheers,
> > Gyula
> >
> >
> > On Wed, Jun 19, 2024 at 12:09 PM Gyula Fóra 
> wrote:
> >
> > > Hi,
> > >
> > > I have updated the KEYS file and extended the expiration date, so that
> > > should not be an issue. Thanks for pointing that out.
> > >
> > > Gyula
> > >
> > > On Wed, 19 Jun 2024 at 12:07, Rui Fan <1996fan...@gmail.com> wrote:
> > >
> > >> Thanks Gyula and Mate for driving this release!
> > >>
> > >> +1 (binding)
> > >>
> > >> Apart from the expired key and a couple of comments I left on the
> > >> flink-web PR, the rest looks fine.
> > >>
> > >> - Downloaded artifacts from dist ( svn co
> > >> https://dist.apache.org/repos/dist/dev/flink/flink-kubernetes-operator-1.9.0-rc1/ )
> > >> - Verified SHA512 checksums : ( for i in *.tgz; do echo $i; sha512sum
> > >> --check $i.sha512; done )
> > >> - Verified GPG signatures : ( for i in *.tgz; do echo $i; gpg --verify
> > >> $i.asc $i; done)
> > >> - Built the source with Java 11 and Java 17 ( mvn -T 20 clean install
> > >> -DskipTests )
> > >> - Verified the license headers while building the source
> > >> - Verified that the chart version and appVersion match the target release
> > >> (in index.yaml and Chart.yaml)
> > >> - Downloaded Autoscaler standalone: wget
> > >> https://repository.apache.org/content/repositories/orgapacheflink-1740/org/apache/flink/flink-autoscaler-standalone/1.9.0/flink-autoscaler-standalone-1.9.0.jar
> > >> - Ran Autoscaler standalone locally; it works well with the rescale API.
> > >>
> > >> Best,
> > >> Rui
> > >>
> > >> On Wed, Jun 19, 2024 at 1:50 AM Mate Czagany 
> > wrote:
> > >>
> > >> > Hi,
> > >> >
> > >> > +1 (non-binding)
> > >> >
> > >> > Note: when using the Apache Flink KEYS file [1] to verify the
> > >> > signatures, your key seems to have expired, so that file should be
> > >> > updated as well.
> > >> >
> > >> > - Verified checksums and signatures
> > >> > - Built source distribution
> > >> > - Verified all pom.xml versions are the same
> > >> > - Verified install from RC repo
> > >> > - Verified Chart.yaml and values.yaml contents
> > >> > - Submitted basic example with 1.17 and 1.19 Flink versions in native
> > >> > and standalone mode
> > >> > - Tested operator HA, added new watched namespace dynamically
> > >> > - Checked operator logs
> > >> >
> > >> > Regards,
> > >> > Mate
> > >> >
> > >> > [1] https://dist.apache.org/repos/dist/release/flink/KEYS
> > >> >
> > >> > Gyula Fóra wrote (on Tue, Jun 18, 2024, at 8:14):
> > >> >
> > >> > > Hi Everyone,
> > >> > >
> > >> > > Please review and vote on the release candidate #1 for the version 1.9.0
> > >> > > of Apache Flink Kubernetes Operator,
> > >> > > as follows:
> > >> > > [ ] +1, Approve the release
> > >> > > [ ] -1, Do not approve the release (please provide specific comments)
> > >> > >
> > >> > > **Release Overview**
> > >> > >
> > >> > > As an overview, the release consists of the following:
> > >> > > a) Kubernetes Operator canonical source distribution (including the
> > >> > > Dockerfile), to be deployed to the release repository at dist.apache.org
> > >> > > b) Kubernetes Operator Helm Chart to be deployed to the release
> > >> > > repository at dist.apache.org
> > >> > > c) Maven artifacts to be deployed to the Maven Central Repository
> > >> > > d) Docker image to be pushed to dockerhub
> > >> > >
> > >> > > **Staging Areas to Review**
> > >> > >
> > >> > > The staging areas containing the above-mentioned artifacts are as
> > >> > > follows, for your review:
> > >> > > * All artifacts for a,b) can be found in the corresponding dev
> > >> > > repository at dist.apache.org [1]
> > >> > > * All artifacts for c) can be found at the Apache Nexus Repository [2]
> > >> > > * The docker image for d) is staged on github [3]
> > >> > >
> > >> > > All artifacts are signed with the key 21F06303B87DAFF1 [4]
> > >> > >
> > >> > > Other links for your review:
> > >> > > * JIRA release notes [5]
> > >> > > * source code tag "release-1.9.0-rc1" [6]
> > >> > > * PR to update the website Downloads page to
> > >> > > include Kubernetes Operator links [7]
> > >> > >
> > >> > > **Vote Duration**
> > >> > >
> > >> > > The voting time will run for at least 72 hours.
> > >> > > It is adopted by majority approval, with at least 3 PMC affirmative votes.

Re: [VOTE] FLIP-461: Synchronize rescaling with checkpoint creation to minimize reprocessing for the AdaptiveScheduler

2024-06-18 Thread Thomas Weise
+1 (binding)


On Tue, Jun 18, 2024 at 11:38 AM Gabor Somogyi 
wrote:

> +1 (binding)
>
> G
>
>
> On Mon, Jun 17, 2024 at 10:24 AM Matthias Pohl  wrote:
>
> > Hi everyone,
> > the discussion in [1] about FLIP-461 [2] is kind of concluded. I am
> > starting a vote on this one here.
> >
> > The vote will be open for at least 72 hours (i.e. until June 20, 2024;
> > 8:30am UTC) unless there are any objections. The FLIP will be considered
> > accepted if 3 binding votes (from active committers according to the Flink
> > bylaws [3]) are gathered by the community.
> >
> > Best,
> > Matthias
> >
> > [1] https://lists.apache.org/thread/nnkonmsv8xlk0go2sgtwnphkhrr5oc3y
> > [2]
> >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-461%3A+Synchronize+rescaling+with+checkpoint+creation+to+minimize+reprocessing+for+the+AdaptiveScheduler
> > [3]
> >
> >
> https://cwiki.apache.org/confluence/display/FLINK/Flink+Bylaws#FlinkBylaws-Approvals
> >
>


Re: [DISCUSS] Connector Externalization Retrospective

2024-06-11 Thread Thomas Weise
Thanks for bringing this discussion back.

When we decided to decouple the connectors, we already discussed that we
will only realize the full benefit when the connectors actually become
independent from the Flink minor releases. Until that happens we have a ton
of extra work but limited gain. Based on the assumption that getting to the
binary compatibility guarantee is our goal - not just for the connectors
managed within the Flink project but for the ecosystem as a whole - I don't
see the benefit of a mono repo or similar approach that targets the symptom
rather than the cause.

In the final picture we would only need connector releases if/when a
specific connector changes, and the repository-per-connector layout would
work well.

I also agree with Danny that we may not have to wait for Flink 2.0 for
that. How close are we to being able to assume compatibility of the API
surface that affects connectors? It appears that in practice there have been
few to no known issues in the last couple of releases. Would it be possible
to verify that by running e2e tests of connector binaries built against an
earlier Flink minor version against the latest Flink minor release
candidate, as part of the release process?

Thanks,
Thomas


On Tue, Jun 11, 2024 at 11:05 AM Chesnay Schepler 
wrote:

> On 10/06/2024 18:25, Danny Cranmer wrote:
> > This would
> > mean we would usually not need to release a new connector version per
> Flink
> > version, assuming there are no breaking changes.
> We technically can't do this because we don't provide binary
> compatibility across minor versions.
> That's the entire reason we did this coupling in the first place, and
> imo /we/ shouldn't take a shortcut while still having our users face that
> very problem.
> We knew this was gonna be annoying for us; that was intentional and
> meant to finally push us towards binary compatibility /guarantees/.


Re: [DISCUSS] FLIP-XXX Add K8S conditions to Flink CRD

2024-06-02 Thread Thomas Weise
Thanks for the proposal. As I understand it the idea is to make the status
of a Flink deployment more accessible to standard k8s tooling, which would
be a nice improvement and further strengthen the k8s native experience!

Regarding the FLIP document's overall structure: Before diving into the
implementation details, can we please expand the intro with the
motivation/rationale for this change? A few examples of the audience that
would benefit from this change. Examples of tools that would pick up the
condition and how that would look (a link or screenshot if you have it).

Regarding multiple conditions: +1 for not commingling reconciliation status
and job status. It would make the resulting condition confusing. I believe
what the user would expect under "Ready" is the representation of the job
status. We can then add another separate condition as suggested, however
can the FLIP document also outline if/how conditions other than "Ready"
would appear in the generic k8s tooling?
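
For illustration, assuming the proposed "Ready" condition type and a
FlinkDeployment named "basic-example" (both names hypothetical), generic
tooling would pick the condition up roughly like this:

  # block until the operator reports the condition
  kubectl wait --for=condition=Ready flinkdeployment/basic-example --timeout=300s

  # list all conditions the operator has reported on the resource
  kubectl get flinkdeployment basic-example \
    -o jsonpath='{range .status.conditions[*]}{.type}={.status} ({.reason}){"\n"}{end}'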

Thanks,
Thomas



On Fri, May 31, 2024 at 10:37 AM David Radley 
wrote:

> Hi Mate and Gyula,
> Thank you very much for your clarifications; it is clearer for me now. I
> agree that a reconciliation condition would be useful – maybe reconciled
> instead of ready for the boolean, so it is very explicit.
>
> Your suggestion of a job-related readiness condition based on its
> health would be useful; you suggest it be user configurable – this seems
> closer to a liveness / readiness probe.
>
> Kind regards, David.
>
> From: Mate Czagany 
> Date: Thursday, 30 May 2024 at 10:39
> To: dev@flink.apache.org 
> Cc: morh...@apache.org 
> Subject: [EXTERNAL] Re: [DISCUSS] FLIP-XXX Add K8S conditions to Flink CRD
> Hi,
>
> I would definitely keep this as a FLIP. Not all FLIPs have to be big
> changes, and this format makes it easier for others to chime in and follow.
>
> I am not a Kubernetes expert, but my understanding is that we don't have to
> follow any strict convention for the type names in the conditions, e.g.
> "Ready" or "Error". And as Gyula said it doesn't add too much value in the
> currently proposed way, it might even be confusing for users who have not
> read this email thread or FLIP because "Ready" might suggest that the job
> is running and is healthy. So my suggestion is the same as Gyulas, to have
> more explicit type names instead of just "Ready" and "Error". However
> "ClusterReady" sounds weird in case of FlinkSessionJobs.
>
> Regarding appending to the conditions field: if I understand the FLIP
> correctly, we would allow multiple elements of the same type to exist in
> the conditions list if the message and reason fields are different. From
> the Kubernetes documentation it seems like the correct way would be to use
> the "type" field as the map key and merge the fields [1].
>
>
> [1]
>
> https://github.com/kubernetes/kubernetes/blob/bce55b94cdc3a4592749aa919c591fa7df7453eb/staging/src/k8s.io/apimachinery/pkg/apis/meta/v1/types.go#L1528
>
> Best regards,
> Mate
>
> Gyula Fóra  ezt írta (időpont: 2024. máj. 30., Cs,
> 10:53):
>
> > David,
> >
> > The problem is exactly that ResourceLifecycleStates do not correspond to
> > specific Job statuses (JobReady condition) in most cases. Let me give you
> > a concrete example:
> >
> > ResourceLifecycleState.STABLE means that the app/job defined in the spec
> > has been successfully deployed and was observed running, and this spec is
> > now considered to be stable (won't be rolled back). Once a resource
> > (FlinkDeployment) has reached the STABLE state, it won't change unless the
> > user changes the spec. At the same time, this doesn't really say anything
> > about job health/readiness at any given future time. 10 minutes later the
> > job can go into an unrecoverable failure loop and never reach a running
> > status, yet the ResourceLifecycleState will remain STABLE.
> >
> > This is actually not a problem with the ResourceLifecycleState but more
> > with the understanding of it. It's called ResourceLifecycleState and not
> > JobState exactly because it refers to the upgrade/rollback/suspend etc
> > lifecycle of the FlinkDeployment/FlinkSessionJob resource and not the
> > underlying flink job itself.
> >
> > But this is a crucial detail here that we need to consider otherwise the
> > "Ready" condition that we may create will be practically useless.
> >
> > This is the reason why @morh...@apache.org  and
> > I suggest separating this to at least 2 independent conditions. One could
> > be the UpgradeCompleted/ReconciliationCompleted or something along these
> > lines, computed based on LifecycleState (as described in your proposal but
> > with a different name). The other should be JobReady, which could initially
> > work based on the JobStatus.state field but ideally would be a
> > user-configurable readiness condition (e.g. job running for at least 10
> > minutes, running and having taken checkpoints, etc.).
> >
> > These 2 conditions should be enough to start with and would actually
> > 

Re: [VOTE] FLIP-446: Kubernetes Operator State Snapshot CRD

2024-04-25 Thread Thomas Weise
+1 (binding)


On Wed, Apr 24, 2024 at 5:14 AM Yuepeng Pan  wrote:

> +1(non-binding)
>
>
> Best,
> Yuepeng Pan
>
> At 2024-04-24 16:05:07, "Rui Fan" <1996fan...@gmail.com> wrote:
> >+1(binding)
> >
> >Best,
> >Rui
> >
> >On Wed, Apr 24, 2024 at 4:03 PM Mate Czagany  wrote:
> >
> >> Hi everyone,
> >>
> >> I'd like to start a vote on the FLIP-446: Kubernetes Operator State
> >> Snapshot CRD [1]. The discussion thread is here [2].
> >>
> >> The vote will be open for at least 72 hours unless there is an
> objection or
> >> insufficient votes.
> >>
> >> [1]
> >>
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-446%3A+Kubernetes+Operator+State+Snapshot+CRD
> >> [2] https://lists.apache.org/thread/q5dzjwj0qk34rbg2sczyypfhokxoc3q7
> >>
> >> Regards,
> >> Mate
> >>
>


Re: [DISCUSS] FLIP-446: Kubernetes Operator State Snapshot CRD

2024-04-19 Thread Thomas Weise
Thanks for the proposal.

How do you see potential effects on API server performance wrt. number of
objects vs mutations? Is the proposal more or less neutral in that regard?

Thanks for the thorough feedback Robert.

Couple more questions below.

-->

On Fri, Apr 19, 2024 at 5:07 AM Robert Metzger  wrote:

> Hi Mate,
> thanks for proposing this, I'm really excited about your FLIP. I hope my
> questions make sense to you:
>
> 1. I would like to discuss the "FlinkStateSnapshot" name and the fact that
> users have to use either the savepoint or checkpoint spec inside the
> FlinkStateSnapshot.
> Wouldn't it be more intuitive to introduce two CRs:
> FlinkSavepoint and FlinkCheckpoint
> Ideally they can internally share a lot of code paths, but from a user's
> perspective, the abstraction is much clearer.
>

There are probably pros and cons either way. For example it is desirable to
have a single list of state snapshots when looking for the initial
savepoint for a new deployment etc.


>
> 2. I also would like to discuss SavepointSpec.completed, as this name is
> not intuitive to me. How about "ignoreExisting"?
>
> 3. The FLIP proposal seems to leave error handling to the user, e.g. when
> you create a FlinkStateSnapshot, it will just move to status FAILED.
> Typically in K8s, with the control-loop approach, resource creation is
> retried until it succeeds. I think it would be really nice if the
> FlinkStateSnapshot or FlinkSavepoint resource would retry based on a
> property in the resource. A "FlinkStateSnapshot.retries" number would
> indicate how often the user wants the operator to retry creating a
> savepoint, "retries = -1" means retry forever. In addition, we could
> consider a timeout as well, however, I haven't seen such a concept in K8s
> CRs yet.
> The benefit of this is that other tools relying on the K8s operator
> wouldn't have to implement this retry loop (which is quite natural for
> K8s), they would just have to wait for the CR they've created to transition
> into COMPLETED:
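>
> A sketch of what that could look like (the FlinkStateSnapshot kind is from
> the FLIP; the retries field is the proposal here; everything else is
> illustrative and abbreviated):
>
>   # snapshot.yaml (abbreviated; not a complete spec)
>   apiVersion: flink.apache.org/v1beta1
>   kind: FlinkStateSnapshot
>   metadata:
>     name: example-savepoint
>   spec:
>     savepoint: {}   # savepoint-type snapshot, details omitted
>     retries: 3      # proposed: retry creation up to 3 times; -1 = retry forever
>
>   kubectl apply -f snapshot.yaml
>   # other tools then simply wait for the CR to reach COMPLETED
>   kubectl get flinkstatesnapshot example-savepoint -o jsonpath='{.status.state}'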
>
> 4. FlinkStateSnapshotStatus.error will only show the last error. What
> about using Events, so that we can show multiple errors and use the
> FlinkStateSnapshotState to report errors?
>
> 5. I wonder if it makes sense to use something like Pod Conditions (e.g.
> Savepoint Conditions):
> https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-conditions
> to track the completion status. We could have the following conditions:
> - Triggered
> - Completed
> - Failed
> The only benefit of this proposal that I see is that it would tell the
> user how long it took to create the savepoint.
>
> 6. You mention that "JobSpec.initialSavepointPath" will be deprecated. I
> assume we will introduce a new field for referencing a FlinkStateSnapshot
> CR? I think it would be good to cover this in the FLIP.
>
Does that mean one would have to create a FlinkStateSnapshot CR when
starting a new deployment from a savepoint? If so, that's rather complicated.
I would prefer something simpler/more concise and would rather keep
initialSavepointPath.


>
> One minor comment:
>
> "/** Dispose the savepoints upon CRD deletion. */"
>
> I think this should be "upon CR deletion", not "CRD deletion".
>
> Thanks again for this great FLIP!
>
> Best,
> Robert
>
>
> On Fri, Apr 19, 2024 at 9:01 AM Gyula Fóra  wrote:
>
>> Cc'ing some folks who gave positive feedback on this idea in the past.
>>
>> I would love to hear your thoughts on the proposed design
>>
>> Gyula
>>
>> On Tue, Apr 16, 2024 at 6:31 PM Őrhidi Mátyás 
>> wrote:
>>
>>> +1 Looking forward to it
>>>
>>> On Tue, Apr 16, 2024 at 8:56 AM Mate Czagany  wrote:
>>>
>>> > Thank you Gyula!
>>> >
> >> > I think that is a great idea. I have updated the Google doc to only have
> >> > one new configuration option of boolean type, which can be used to signal
> >> > the Operator to use the old mode.
>>> >
> >> > As also described in the configuration description, the Operator will
> >> > fall back to the old mode if the FlinkStateSnapshot CRD cannot be found
> >> > on the Kubernetes cluster.
>>> >
>>> > Regards,
>>> > Mate
>>> >
> >> > Gyula Fóra wrote (on Tue, Apr 16, 2024, at 17:01):
>>> >
>>> > > Thanks Mate, this is great stuff.
>>> > >
> >> > > Mate, I think the new configs should probably default to the new mode,
> >> > > and they should only be useful for users to fall back to the old
> >> > > behaviour. We could by default use the new Snapshot CRD if the CRD is
> >> > > installed, otherwise use the old mode and log a warning on startup.
>>> > >
> >> > > So I am suggesting a "dynamic" default behaviour based on whether the
> >> > > new CRD was installed or not, because we don't want to break operator
> >> > > startup.
>>> > >
>>> > > Gyula
>>> > >
>>> > > On Tue, Apr 16, 2024 at 4:48 PM Mate Czagany 
>>> wrote:
>>> > >
>>> > > > Hi Ferenc,
>>> > > >
>>> > > > Thank you for your comments, I have updated the Google docs with a
>>> new
>>> > > > section for the new 

Re: Re: [VOTE] FLIP-417: Expose JobManagerOperatorMetrics via REST API

2024-01-29 Thread Thomas Weise
+1 (binding)


On Mon, Jan 29, 2024 at 5:45 AM Maximilian Michels  wrote:

> +1 (binding)
>
> On Fri, Jan 26, 2024 at 6:03 AM Rui Fan <1996fan...@gmail.com> wrote:
> >
> > +1(binding)
> >
> > Best,
> > Rui
> >
> > On Fri, Jan 26, 2024 at 11:55 AM Xuyang  wrote:
> >
> > > +1 (non-binding)
> > >
> > >
> > > --
> > >
> > > Best!
> > > Xuyang
> > >
> > >
> > >
> > >
> > >
> > > On 2024-01-26 10:12:34, "Hang Ruan" wrote:
> > > >Thanks for the FLIP.
> > > >
> > > >+1 (non-binding)
> > > >
> > > >Best,
> > > >Hang
> > > >
> > > >Mason Chen wrote on Fri, Jan 26, 2024 at 04:51:
> > > >
> > > >> Hi Devs,
> > > >>
> > > >> I would like to start a vote on FLIP-417: Expose
> > > >> JobManagerOperatorMetrics via REST API [1], which has been discussed in
> > > >> this thread [2].
> > > >>
> > > >> The vote will be open for at least 72 hours unless there is an
> > > >> objection or not enough votes.
> > > >>
> > > >> [1]
> > > >>
> > > >>
> > >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-417%3A+Expose+JobManagerOperatorMetrics+via+REST+API
> > > >> [2]
> https://lists.apache.org/thread/tt0hf6kf5lcxd7g62v9dhpn3z978pxw0
> > > >>
> > > >> Best,
> > > >> Mason
> > > >>
> > >
>


Re: [VOTE] Apache Flink Kubernetes Operator Release 1.6.1, release candidate #1

2023-10-24 Thread Thomas Weise
+1 (binding)

- Verified checksums, signatures, source release content
- Ran unit tests

Side note: mvn clean verify fails with the Java 17 compiler. While the
build target version may be 11, it should preferably be possible to use a
higher JDK version to build the source.

 Caused by: java.lang.IllegalAccessError: class
com.google.googlejavaformat.java.RemoveUnusedImports (in unnamed module
@0x44f433db) cannot access class com.sun.tools.javac.util.Context (in
module jdk.compiler) because module jdk.compiler does not export
com.sun.tools.javac.util to unnamed module @0x44f433db

at
com.google.googlejavaformat.java.RemoveUnusedImports.removeUnusedImports(RemoveUnusedImports.java:187)
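
A common workaround for this google-java-format/JDK incompatibility (a
sketch, not verified against this particular build) is to open the required
jdk.compiler packages to the unnamed module, e.g. via MAVEN_OPTS:

  # let google-java-format (used by the build) access javac internals on Java 17
  export MAVEN_OPTS="--add-exports jdk.compiler/com.sun.tools.javac.util=ALL-UNNAMED \
    --add-exports jdk.compiler/com.sun.tools.javac.api=ALL-UNNAMED \
    --add-exports jdk.compiler/com.sun.tools.javac.tree=ALL-UNNAMED"
  mvn clean verify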

Thanks,
Thomas


On Sat, Oct 21, 2023 at 7:35 AM Rui Fan <1996fan...@gmail.com> wrote:

> Hi Everyone,
>
> Please review and vote on the release candidate #1 for the version 1.6.1 of
> Apache Flink Kubernetes Operator,
> as follows:
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
>
> **Release Overview**
>
> As an overview, the release consists of the following:
> a) Kubernetes Operator canonical source distribution (including the
> Dockerfile), to be deployed to the release repository at dist.apache.org
> b) Kubernetes Operator Helm Chart to be deployed to the release repository
> at dist.apache.org
> c) Maven artifacts to be deployed to the Maven Central Repository
> d) Docker image to be pushed to dockerhub
>
> **Staging Areas to Review**
>
> The staging areas containing the above mentioned artifacts are as follows,
> for your review:
> * All artifacts for a,b) can be found in the corresponding dev repository
> at dist.apache.org [1]
> * All artifacts for c) can be found at the Apache Nexus Repository [2]
> * The docker image for d) is staged on github [3]
>
> All artifacts are signed with the
> key B2D64016B940A7E0B9B72E0D7D0528B28037D8BC [4]
>
> Other links for your review:
> * source code tag "release-1.6.1-rc1" [5]
> * PR to update the website Downloads page to
> include Kubernetes Operator links [6]
> * PR to update the doc version of flink-kubernetes-operator[7]
>
> **Vote Duration**
>
> The voting time will run for at least 72 hours.
> It is adopted by majority approval, with at least 3 PMC affirmative votes.
>
> **Note on Verification**
>
> You can follow the basic verification guide here[8].
> Note that you don't need to verify everything yourself, but please make
> note of what you have tested together with your +- vote.
>
> [1]
>
> https://dist.apache.org/repos/dist/dev/flink/flink-kubernetes-operator-1.6.1-rc1/
> [2]
> https://repository.apache.org/content/repositories/orgapacheflink-1663/
> [3]
>
> https://github.com/apache/flink-kubernetes-operator/pkgs/container/flink-kubernetes-operator/139454270?tag=51eeae1
> [4] https://dist.apache.org/repos/dist/release/flink/KEYS
> [5]
> https://github.com/apache/flink-kubernetes-operator/tree/release-1.6.1-rc1
> [6] https://github.com/apache/flink-web/pull/690
> [7] https://github.com/apache/flink-kubernetes-operator/pull/687
> [8]
>
> https://cwiki.apache.org/confluence/display/FLINK/Verifying+a+Flink+Kubernetes+Operator+Release
>
> Best,
> Rui
>


Re: [Discuss] CRD for flink sql gateway in the flink k8s operator

2023-09-19 Thread Thomas Weise
It is already possible to bring up a SQL Gateway as a sidecar utilizing the
pod templates - I tend to see this more as a documentation/example
issue rather than something that calls for a separate CRD or other
dedicated operator support.
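
For illustration, a minimal pod-template fragment that adds the gateway as a
sidecar could look roughly like this (the image, port and start command are
assumptions, and the rest of the FlinkDeployment spec is omitted):

  # excerpt of a FlinkDeployment spec; other required fields omitted
  spec:
    podTemplate:
      spec:
        containers:
          - name: flink-main-container   # reserved name of the operator's main container
          - name: sql-gateway            # sidecar running the SQL Gateway
            image: flink:1.17            # illustrative image
            command: ["bin/sql-gateway.sh", "start-foreground"]
            ports:
              - containerPort: 8083      # assumed gateway REST port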

Thanks,
Thomas



On Tue, Sep 19, 2023 at 3:41 PM Gyula Fóra  wrote:

> Based on this I think we should start with simple Helm charts / templates
> for creating the `FlinkDeployment` together with a separate Deployment for
> the SQL Gateway.
> If the gateway itself doesn't integrate well with the operator managed CRs
> (sessionjobs) then I think it's better and simpler to have it separately.
>
> These Helm charts should be part of the operator repo / examples with nice
> docs. If we see that it's useful and popular we can start thinking of
> integrating it into the CRD.
>
> What do you think?
> Gyula
>
> On Tue, Sep 19, 2023 at 6:09 AM Yangze Guo  wrote:
>
> > Thanks for the reply, @Gyula.
> >
> > I would like to first provide more context on OLAP scenarios. In OLAP
> > scenarios, users typically submit multiple short batch jobs whose
> > execution times are measured in seconds or even sub-seconds.
> > Additionally, due to the lightweight nature of these jobs, they often
> > do not require lifecycle management features and can disable high
> > availability functionalities such as failover and checkpointing.
> >
> > Regarding the integration issue, I believe that supporting the
> > generation of FlinkSessionJob through a gateway is a "nice to have"
> > feature rather than a "must-have." Firstly, it may be overkill to
> > create a CRD for such lightweight jobs, and it could potentially
> > impact the end-to-end execution time of the OLAP job. Secondly, as
> > mentioned earlier, these jobs do not have strong lifecycle management
> > requirements, so having an operator manage them would be a bit wasteful.
> > Therefore, atm, we can allow users to directly submit jobs using JDBC
> > or REST API. WDYT?
> >
> > Best,
> > Yangze Guo
> >
> > On Mon, Sep 18, 2023 at 4:08 PM Gyula Fóra  wrote:
> > >
> > > As I wrote in my previous answer, this could be done as a helm chart or
> > as
> > > part of the operator easily. Both would work.
> > > My main concern for adding this into the operator is that the SQL
> Gateway
> > > itself is not properly integrated with the Operator Custom resources.
> > >
> > > Gyula
> > >
> > > On Mon, Sep 18, 2023 at 4:24 AM Shammon FY  wrote:
> > >
> > > > Thanks @Gyula, I would like to share our use of sql-gateway with the
> > > > Flink session cluster and I hope it helps give you a clearer
> > > > understanding of our needs :)
> > > >
> > > > As @Yangze mentioned, currently we use Flink as an OLAP platform via
> > > > the following steps:
> > > > 1. Set up a Flink session cluster via Flink k8s session mode with k8s
> > > > or ZK high availability.
> > > > 2. Write a Helm chart for a Sql-Gateway image and launch multiple
> > > > gateway instances to submit jobs to the same Flink session cluster.
> > > >
> > > > As we mentioned in the docs [1], we hope that users can easily launch
> > > > sql-gateway instances in k8s. Is it enough to add a Helm chart for
> > > > sql-gateway, or do we need to add this feature to the flink operator?
> > > > Can you help draw a conclusion? Thank you very much, @Gyula
> > > >
> > > > [1]
> > > >
> > > >
> >
> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/olap_quickstart/
> > > >
> > > > Best,
> > > > Shammon FY
> > > >
> > > >
> > > >
> > > > On Sun, Sep 17, 2023 at 2:02 PM Gyula Fóra 
> > wrote:
> > > >
> > > > > Hi!
> > > > > It sounds pretty easy to deploy the gateway automatically with
> > > > > session cluster deployments from the operator, but there is a major
> > > > > limitation currently. The SQL gateway itself doesn't really support
> > > > > any operator integration, so jobs submitted through the SQL gateway
> > > > > would not be manageable by the operator (they won't show up as
> > > > > session jobs).
> > > > >
> > > > > Without that, this is a very strange feature. We would make something
> > > > > much easier for users that is not well supported by the operator in
> > > > > the first place. The operator is designed to manage clusters and jobs
> > > > > (FlinkDeployment / FlinkSessionJob). It would be good to understand
> > > > > whether we could make the SQL Gateway create a FlinkSessionJob /
> > > > > Deployment (that would require application cluster support) and
> > > > > basically submit the job through the operator.
> > > > >
> > > > > Cheers,
> > > > > Gyula
> > > > >
> > > > > On Sun, Sep 17, 2023 at 1:26 AM Yangze Guo 
> > wrote:
> > > > >
> > > > > > > There would be many different ways of doing this. One gateway per
> > > > > > > session cluster, one gateway shared across different clusters...
> > > > > >
> > > > > > Currently, sql gateway cannot be shared across multiple clusters.
> > > > > >
> > > > > > > understand the 

Re: [DISSCUSS] Kubernetes Operator Flink Version Support Policy

2023-09-05 Thread Thomas Weise
+1, thanks for the proposal

On Tue, Sep 5, 2023 at 8:13 AM Gyula Fóra  wrote:

> Hi All!
>
> @Maximilian Michels  has raised the question of Flink
> version support in the operator before the last release. I would like to
> open this discussion publicly so we can finalize this before the next
> release.
>
> Background:
> Currently the Flink Operator supports all Flink versions since Flink 1.13.
> While this is great for the users, it introduces a lot of backward
> compatibility related code in the operator logic and also adds considerable
> time to the CI. We should strike a reasonable balance here that allows us
> to move forward and eliminate some of this tech debt.
>
> In the current model it is also impossible to support all features for all
> Flink versions which leads to some confusion over time.
>
> Proposal:
> Since it's a key feature of the kubernetes operator to support several
> versions at the same time, I propose to support the last 4 stable Flink
> minor versions. Currently this would mean to support Flink 1.14-1.17 (and
> drop 1.13 support). When Flink 1.18 is released we would drop 1.14 support
> and so on. Given the Flink release cadence this means about 2 year support
> for each Flink version.
>
> What do you think?
>
> Cheers,
> Gyula
>


Re: [VOTE] FLIP-246: Dynamic Kafka Source (originally Multi Cluster Kafka Source)

2023-06-21 Thread Thomas Weise
+1 (binding)


On Mon, Jun 19, 2023 at 8:09 AM Ryan van Huuksloot
 wrote:

> +1 (non-binding)
>
> +1 for DynamicKafkaSource
>
> Ryan van Huuksloot
> Sr. Production Engineer | Streaming Platform
>
>
> On Mon, Jun 19, 2023 at 8:15 AM Martijn Visser 
> wrote:
>
> > +1 (binding)
> >
> > +1 for DynamicKafkaSource
> >
> >
> > On Sat, Jun 17, 2023 at 5:31 AM Tzu-Li (Gordon) Tai  >
> > wrote:
> >
> > > +1 (binding)
> > >
> > > +1 for either DynamicKafkaSource or DiscoveringKafkaSource
> > >
> > > Cheers,
> > > Gordon
> > >
> > > On Thu, Jun 15, 2023, 10:56 Mason Chen  wrote:
> > >
> > > > Hi all,
> > > >
> > > > Thank you to everyone for the feedback on FLIP-246 [1]. Based on the
> > > > discussion thread [2], we have come to a consensus on the design and
> > are
> > > > ready to take a vote to contribute this to Flink.
> > > >
> > > > This voting thread will be open for at least 72 hours (excluding
> > > > weekends, until June 20th 10:00AM PST) unless there is an objection or
> > > > an insufficient number of votes.
> > > >
> > > > (Optional) If you have an opinion on the naming of the connector,
> > please
> > > > include it in your vote:
> > > > 1. DynamicKafkaSource
> > > > 2. MultiClusterKafkaSource
> > > > 3. DiscoveringKafkaSource
> > > >
> > > > [1]
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=217389320
> > > > [2] https://lists.apache.org/thread/vz7nw5qzvmxwnpktnofc9p13s1dzqm6z
> > > >
> > > > Best,
> > > > Mason
> > > >
> > >
> >
>


Re: [DISCUSS] FLIP-246: Multi Cluster Kafka Source

2023-06-11 Thread Thomas Weise
Hi Mason,

Thanks for the iterations on the FLIP, I think this is in a very good shape
now.

Small correction for the MultiClusterKafkaSourceEnumerator section: "This
reader is responsible for discovering and assigning splits from 1+ cluster"

Regarding the user facing name of the connector: I agree with Gordon that
the defining characteristic is the dynamic discovery vs. the fact that
multiple clusters may be consumed in parallel. (Although, as described in
the FLIP, lossless consumer migration only works with a strategy that
involves intermittent parallel consumption of old and new clusters to drain
and switch.)

I think the "Table" in the name of those SQL connectors should avoid
confusion. Perhaps we can also solicit other ideas? I would throw
"DiscoveringKafkaSource" into the mix.

Cheers,
Thomas




On Fri, Jun 9, 2023 at 3:40 PM Tzu-Li (Gordon) Tai 
wrote:

> > Regarding (2), definitely. This is something we planned to add later on but
> > so far keeping things common has been working well.
>
> My main worry for doing this as a later iteration is that this would
> probably be a breaking change for the public interface. If that can be
> avoided and planned ahead, I'm fine with moving forward with how it is
> right now.
>
> > DynamicKafkaSource may be confusing because it is really similar to the
> KafkaDynamicSource/Sink (table connectors).
>
> The table / sql Kafka connectors (KafkaDynamicTableFactory,
> KafkaDynamicTableSource / KafkaDynamicTableSink) are all internal classes
> not really meant to be exposed to the user though.
> It can cause some confusion internally for the code maintainers, but on the
> actual public surface I don't see this being an issue.
>
> Thanks,
> Gordon
>
> On Wed, Jun 7, 2023 at 8:55 PM Mason Chen  wrote:
>
> > Hi Gordon,
> >
> > Thanks for taking a look!
> >
> > Regarding (1), there is a need from the readers to send this event at
> > startup because the reader state may reflect outdated metadata. Thus, the
> > reader should not start without fresh metadata. With fresh metadata, the
> > reader can filter splits from state--this filtering capability is
> > ultimately how we solve the common issue of "I re-configured my Kafka
> > source and removed some topic, but it refers to the old topic due to state
> > *[1]*". I did not mention this because I thought this is more of a detail
> > but I'll make a brief note of it.
> >
> > Regarding (2), definitely. This is something we planned to add later on but
> > so far keeping things common has been working well. In that regard, yes, the
> > metadata service should expose these configurations, but the source should
> > not check them into state, unlike the other metadata. I'm going to add it to
> > a section called "future enhancements". This is also feedback that Ryan, an
> > interested user, gave earlier in this thread.
> >
> > Regarding (3), that's definitely a good point and there are some real use
> > cases, in addition to what you mentioned, to use this in single-cluster
> > mode (see *[1]* above). DynamicKafkaSource may be confusing because it is
> > really similar to the KafkaDynamicSource/Sink (table connectors).
> >
> > Best,
> > Mason
> >
> > On Wed, Jun 7, 2023 at 10:40 AM Tzu-Li (Gordon) Tai  >
> > wrote:
> >
> > > Hi Mason,
> > >
> > > Thanks for updating the FLIP. In principle, I believe this would be a
> > > useful addition. Some comments so far:
> > >
> > > 1. In this sequence diagram [1], why is there a need for a
> > > GetMetadataUpdateEvent from the MultiClusterSourceReader going to the
> > > MultiClusterSourceEnumerator? Shouldn't the enumerator simply start
> > > sending metadata update events to the reader once it is registered at the
> > > enumerator?
> > >
> > > 2. Looking at the new builder API, there are a few configurations that
> > > are common across *all* discovered Kafka clusters / topics, specifically
> > > the deserialization schema, offset initialization strategy, Kafka client
> > > properties, and consumer group ID. Is there any use case where users
> > > would want these configurations to differ across different Kafka
> > > clusters? If that's the case, would it make more sense to encapsulate
> > > these configurations to be owned by the metadata service?
> > >
> > > 3. Is MultiClusterKafkaSource the best name for this connector? I find
> > > the dynamic aspect of Kafka connectivity to be a more defining
> > > characteristic, and that is the main advantage it has compared to the
> > > static KafkaSource. A user may want to use this new connector over
> > > KafkaSource even if they're just consuming from a single Kafka cluster;
> > > for example, one immediate use case I can think of is Kafka repartitioning
> > > with zero Flink job downtime. They create a new topic with higher
> > > parallelism and repartition their Kafka records from the old topic to the
> > > new topic, and they want the consuming Flink job to be able to move from
> > > the old topic

Re: [NOTICE] Flink master branch now uses Maven 3.8.6

2023-05-16 Thread Thomas Weise
Thanks a lot Chesnay for getting this squared away! It was quite painful to
have to keep an outdated maven around specifically for building Flink but
many had probably given up on this by now ;-)


On Sun, May 14, 2023 at 9:33 PM yuxia  wrote:

> Thanks Chesnay for the efforts. Happy to see we can use Maven 3.8 finnally.
>
> Best regards,
> Yuxia
>
> ----- Original Message -----
> From: "Jing Ge"
> To: "dev"
> Sent: Saturday, May 13, 2023, 4:37:58 PM
> Subject: Re: [NOTICE] Flink master branch now uses Maven 3.8.6
>
> Great news! We can finally get rid of additional setup to use maven 3.8.
> Thanks @Chesnay for your effort!
>
> Best regards,
> Jing
>
> On Sat, May 13, 2023 at 5:12 AM David Anderson 
> wrote:
>
> > Chesnay, thank you for all your hard work on this!
> >
> > David
> >
> > On Fri, May 12, 2023 at 4:03 PM Chesnay Schepler 
> > wrote:
> > >
> > >
> > >   What happened?
> > >
> > > I have just merged the last commits to properly support Maven 3.3+ on
> > > the Flink master branch.
> > >
> > > mvnw and CI have been updated to use Maven 3.8.6.
> > >
> > >
> > >   What does this mean for me?
> > >
> > >   * You can now use Maven versions beyond 3.2.5 (duh).
> > >   o Most versions should work, but 3.8.6 was the most tested and is
> > > thus recommended.
> > >   o 3.8.*5* is known to *NOT* work.
> > >   * Starting from 1.18.0 you need to use Maven 3.8.6 for releases.
> > >   o This may change to a later version until the release of 1.18.0.
> > >   o There have been too many issues with recent Maven releases to
> > > make a range acceptable.
> > >   * *All dependencies that are bundled by a module must be marked as
> > > optional.*
> > >   o *This is verified on CI
> > > <
> >
> https://github.com/apache/flink/blob/master/tools/ci/flink-ci-tools/src/main/java/org/apache/flink/tools/ci/optional/ShadeOptionalChecker.java
> > >.*
> > >   o *Background info can be found in the wiki
> > > <
> https://cwiki.apache.org/confluence/display/FLINK/Dependencies
> > >.*
> > >
> > >
> > >   Can I continue using Maven 3.2.5?
> > >
> > > For now, yes, but support will eventually be removed.
> > >
> > >
> > >   Does this affect users?
> > >
> > > No.
> > >
> > >
> > > Please ping me if you run into any issues.
> >
>


Re: [DISCUSS] SqlClient gateway mode: support for URLs

2023-05-03 Thread Thomas Weise
Hi Alex,

Thanks for the investigation and for the ideas on how to improve the SQL
gateway REST client.

I think the solution could come in 2 parts. Near term the existing client
can be patched to support URL mapping and HTTPS based on a standard URL. If
I understand your proposal correctly, that can be implemented in a backward
compatible manner and w/o introducing additional dependencies.

Long term it would make sense to switch to a specialized REST client, or at
least the HTTP client that comes with Java 11 (after Flink says bye to JDK
8 support).

Do you have a JIRA for this work?

Thanks,
Thomas


On Wed, May 3, 2023 at 11:46 AM Alexander Fedulov <
alexander.fedu...@gmail.com> wrote:

> CC'ing some people who worked on the original implementation - curious to
> hear your thoughts.
>
> Alex
>
> On Fri, 28 Apr 2023 at 14:41, Alexander Fedulov <
> alexander.fedu...@gmail.com>
> wrote:
>
> > I would like to discuss the current implementation of the SQL Gateway
> > support in SQL Cli Client and how it can be improved.
> >
> > 1) *hosname:port/v1 vs
> > https://hostname:port/flink-clusters/session-cluster-1/v1 *
> > Currently, the *--endpoint* parameter needs to be specified in the
> > *InetSocketAddress*  format, i.e. *hostname:port.* While this works fine
> > for basic use cases, it does not support placing the gateway behind a
> > proxy or using an Ingress for routing to a specific Flink cluster based on
> > the URL path. I.e. it expects *some.hostname.com:9001* to directly serve
> > requests on *some.hostname.com:9001/v1*. Mapping to a non-root location,
> > i.e. *some.hostname.com:9001/flink-clusters/sql-preview-cluster-1/v1*, is
> > not supported. Since the client talks to the gateway via its REST
> > endpoint, I believe that the right format for the *--endpoint* parameter
> > is a *URL*, not an *InetSocketAddress*.
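> >
> > To make the difference concrete (the gateway subcommand and --endpoint flag
> > exist today; the URL form is what this proposal would add):
> >
> >   # today: host/port pair, gateway assumed at the root path
> >   ./bin/sql-client.sh gateway --endpoint some.hostname.com:9001
> >
> >   # proposed: full URL, enabling path-based routing behind a proxy/Ingress
> >   ./bin/sql-client.sh gateway --endpoint \
> >     https://some.hostname.com:9001/flink-clusters/session-cluster-1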
> >
> > 2) *HTTPS support*
> > Another related issue is that internally SQL Client uses  Flink’s
> > *RestClient* [1].  This client decides whether to enable SSL not on the
> > basis of the URL schema (https://...), but based on Flink configuration,
> > namely a global *security.ssl.rest.enabled*  parameter [2] (which is also
> > used for the REST server-side configuration ). When this parameter is set
> > to true, it automatically requires user-supplied
> > *security.ssl.rest.truststore*  and *security.ssl.rest.keystore* to be
> > configured - there is no default option to use certificates from the JDK.
> > I was wondering if there is any real benefit in handling the low-level
> > Netty channels and certificates manually for the use case of connecting
> > the SQL Cli Client to the SQL Gateway REST API. There is already a
> > dependency on *OkHttpClient* in *flink-metrics*. I would like to hear what
> > you think about switching to *OkHttp* and adding the ability to optionally
> > load custom certificates there rather than patching *RestClient*.
> >
> > [1]
> >
> https://github.com/apache/flink/blob/5dddc0dba2be20806e67769314eecadf56b87a53/flink-table/flink-sql-client/src/main/java/org/apache/flink/table/client/gateway/ExecutorImpl.java#L359
> > [2]
> >
> https://github.com/apache/flink/blob/5d9e63a16f079399c6b51547284bb96db0326bdb/flink-runtime/src/main/java/org/apache/flink/runtime/rest/RestClientConfiguration.java#L103
> >
> > Best,
> > Alex Fedulov
> >
>


Re: Kubernetes Operator 1.5.0 release planning

2023-05-02 Thread Thomas Weise
+1 and thanks for volunteering!


On Mon, May 1, 2023 at 5:04 PM Peter Huang 
wrote:

> +1, thanks.
>
> On Mon, May 1, 2023 at 7:58 AM Márton Balassi 
> wrote:
>
> > +1, thanks.
> >
> > On Mon, May 1, 2023 at 4:23 PM Őrhidi Mátyás 
> > wrote:
> >
> > > +1 SGTM.
> > >
> > > Cheers,
> > > Matyas
> > >
> > > On Wed, Apr 26, 2023 at 11:43 AM Hao t Chang 
> wrote:
> > >
> > > > Agree. I will help.
> > > >
> > > >
> > >
> >
>


Re: [Discussion] - Release major Flink version to support JDK 17 (LTS)

2023-04-27 Thread Thomas Weise
Is the intention to bump the Flink major version and only support Java 17+?
If so, can Scala not be upgraded at the same time?

Thanks,
Thomas


On Thu, Apr 27, 2023 at 4:53 PM Martijn Visser 
wrote:

> Scala 2.12.7 doesn't compile on Java 17, see
> https://issues.apache.org/jira/browse/FLINK-25000.
>
> On Thu, Apr 27, 2023 at 3:11 PM Jing Ge  wrote:
>
> > Thanks Tamir for the information. According to the latest comment on the
> > task FLINK-24998, this bug should be gone when using the latest JDK 17. I
> > was wondering whether that means there are no more issues stopping us from
> > releasing a major Flink version to support Java 17? Did I miss something?
> >
> > Best regards,
> > Jing
> >
> > On Thu, Apr 27, 2023 at 8:18 AM Tamir Sagi 
> > wrote:
> >
> >> More details about the JDK bug here
> >> https://bugs.openjdk.org/browse/JDK-8277529
> >>
> >> Related Jira ticket
> >> https://issues.apache.org/jira/browse/FLINK-24998
> >>
> >> --
> >> *From:* Jing Ge via user 
> >> *Sent:* Monday, April 24, 2023 11:15 PM
> >> *To:* Chesnay Schepler 
> >> *Cc:* Piotr Nowojski ; Alexis Sarda-Espinosa <
> >> sarda.espin...@gmail.com>; Martijn Visser ;
> >> dev@flink.apache.org ; user <
> u...@flink.apache.org>
> >> *Subject:* Re: [Discussion] - Release major Flink version to support JDK
> >> 17 (LTS)
> >>
> >>
> >> *EXTERNAL EMAIL*
> >>
> >>
> >> Thanks Chesnay for working on this. Would you like to share more info
> >> about the JDK bug?
> >>
> >> Best regards,
> >> Jing
> >>
> >> On Mon, Apr 24, 2023 at 11:39 AM Chesnay Schepler 
> >> wrote:
> >>
> >> As it turns out Kryo isn't a blocker; we ran into a JDK bug.
> >>
> >> On 31/03/2023 08:57, Chesnay Schepler wrote:
> >>
> >>
> >>
> https://github.com/EsotericSoftware/kryo/wiki/Migration-to-v5#migration-guide
> >>
> >> Kryo themselves state that v5 likely can't read v2 data.
> >>
> >> However, both versions can be on the classpath without conflict, as v5
> >> offers a versioned artifact that includes the version in the package.
> >>
> >> It probably wouldn't be difficult to migrate a savepoint to Kryo v5,
> >> purely from a read/write perspective.
> >>
> >> The bigger question is how we expose this new Kryo version in the API.
> If
> >> we stick to the versioned jar we need to either duplicate all current
> >> Kryo-related APIs or find a better way to integrate other serialization
> >> stacks.
> >> On 30/03/2023 17:50, Piotr Nowojski wrote:
> >>
> >> Hey,
> >>
> >> > 1. The Flink community agrees that we upgrade Kryo to a later version,
> >> which means breaking all checkpoint/savepoint compatibility and
> releasing a
> >> Flink 2.0 with Java 17 support added and Java 8 and Flink Scala API
> support
> >> dropped. This is probably the quickest way, but would still mean that we
> >> expose Kryo in the Flink APIs, which is the main reason why we haven't
> been
> >> able to upgrade Kryo at all.
> >>
> >> This sounds pretty bad to me.
> >>
> >> Has anyone looked into what it would take to provide a smooth migration
> >> from Kryo2 -> Kryo5?
> >>
> >> Best,
> >> Piotrek
> >>
> >> czw., 30 mar 2023 o 16:54 Alexis Sarda-Espinosa <
> sarda.espin...@gmail.com>
> >> napisał(a):
> >>
> >> Hi Martijn,
> >>
> >> just to be sure, if all state-related classes use a POJO serializer, Kryo
> >> will never come into play, right? Given FLINK-16686 [1], I wonder how many
> >> users actually have jobs with Kryo and RocksDB, but even if there aren't
> >> many, that still leaves those who don't use RocksDB for
> >> checkpoints/savepoints.
> >>
> >> If Kryo were to stay in the Flink APIs in v1.X, is it impossible to let
> >> users choose between v2/v5 jars by separating them like log4j2 jars?
> >>
> >> [1] https://issues.apache.org/jira/browse/FLINK-16686
> >>
> >> Regards,
> >> Alexis.
> >>
> >> Am Do., 30. März 2023 um 14:26 Uhr schrieb Martijn Visser <
> >> martijnvis...@apache.org>:
> >>
> >> Hi all,
> >>
> >> I also saw a thread on this topic from Clayton Wohl [1] on this topic,
> >> which I'm including in this discussion thread to avoid that it gets
> lost.
> >>
> >> From my perspective, there's two main ways to get to Java 17:
> >>
> >> 1. The Flink community agrees that we upgrade Kryo to a later version,
> >> which means breaking all checkpoint/savepoint compatibility and
> releasing a
> >> Flink 2.0 with Java 17 support added and Java 8 and Flink Scala API
> support
> >> dropped. This is probably the quickest way, but would still mean that we
> >> expose Kryo in the Flink APIs, which is the main reason why we haven't
> been
> >> able to upgrade Kryo at all.
> >> 2. There's a contributor who makes a contribution that bumps Kryo, but
> >> either a) automagically reads in all old checkpoints/savepoints using
> >> Kryo v2 and writes them to new snapshots using Kryo v5 (as is mentioned
> >> in the Kryo migration guide [2][3]), or b) provides an offline tool that
> >> allows users that are interested in migrating their snapshots manually
> >> before 

Re: Flink k8s native support - pod deployments and upgrades

2023-03-03 Thread Thomas Weise
Hi,

The Flink k8s native integration does not handle upgrades. That's what
flink-kubernetes-operator was built for. Please check out:

https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main/

Thanks,
Thomas


On Fri, Mar 3, 2023 at 2:02 AM ramkrishna vasudevan 
wrote:

> Hi All,
>
> The native implementation of the application mode and session mode does not
> have any replica set.
> Instead it just allows the JM to create TM pods on demand.
>
> This is simple and easy in terms of resource creation, but for an
> upgrade story, how is this managed? Leaving K8s to manage a replica-set-based
> upgrade might be easier, right?
>
> Just wanted to understand how upgrades are handled in native K8s mode.
>
> Regards
> Ram
>


Re: [VOTE] Flink minor version support policy for old releases

2023-03-01 Thread Thomas Weise
+1 (binding)

Thanks,
Thomas

On Tue, Feb 28, 2023 at 6:53 AM Sergey Nuyanzin  wrote:

> +1 (non-binding)
>
> Thanks for driving this Danny.
>
> On Tue, Feb 28, 2023 at 9:41 AM Samrat Deb  wrote:
>
> > +1 (non binding)
> >
> > Thanks for driving it
> >
> > Bests,
> > Samrat
> >
> > On Tue, 28 Feb 2023 at 1:36 PM, Junrui Lee  wrote:
> >
> > > Thanks Danny for driving it.
> > >
> > > +1 (non-binding)
> > >
> > > Best regards,
> > > Junrui
> > >
> > > > yuxia wrote on Tue, Feb 28, 2023 at 14:04:
> > >
> > > > Thanks Danny for driving it.
> > > >
> > > > +1 (non-binding)
> > > >
> > > > Best regards,
> > > > Yuxia
> > > >
> > > > ----- Original Message -----
> > > > From: "Weihua Hu"
> > > > To: "dev"
> > > > Sent: Tuesday, February 28, 2023, 12:48:09 PM
> > > > Subject: Re: [VOTE] Flink minor version support policy for old releases
> > > >
> > > > Thanks, Danny.
> > > >
> > > > +1 (non-binding)
> > > >
> > > > Best,
> > > > Weihua
> > > >
> > > >
> > > > On Tue, Feb 28, 2023 at 12:38 PM weijie guo <
> guoweijieres...@gmail.com
> > >
> > > > wrote:
> > > >
> > > > > Thanks Danny for bring this.
> > > > >
> > > > > +1 (non-binding)
> > > > >
> > > > > Best regards,
> > > > >
> > > > > Weijie
> > > > >
> > > > >
> > > > > > Jing Ge wrote on Mon, Feb 27, 2023 at 20:23:
> > > > >
> > > > > > +1 (non-binding)
> > > > > >
> > > > > > BTW, should we follow the content style [1] to describe the new
> > rule
> > > > > using
> > > > > > 1.2.x, 1.1.y, 1.1.z?
> > > > > >
> > > > > > [1]
> > > https://flink.apache.org/downloads/#update-policy-for-old-releases
> > > > > >
> > > > > > Best regards,
> > > > > > Jing
> > > > > >
> > > > > > On Mon, Feb 27, 2023 at 1:06 PM Matthias Pohl
> > > > > >  wrote:
> > > > > >
> > > > > > > Thanks, Danny. Sounds good to me.
> > > > > > >
> > > > > > > +1 (non-binding)
> > > > > > >
> > > > > > > On Wed, Feb 22, 2023 at 10:11 AM Danny Cranmer <
> > > > > dannycran...@apache.org>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > I am starting a vote to update the "Update Policy for old
> > > releases"
> > > > > [1]
> > > > > > > to
> > > > > > > > include additional bugfix support for end of life versions.
> > > > > > > >
> > > > > > > > As per the discussion thread [2], the change we are voting on
> > is:
> > > > > > > > - Support policy: updated to include: "Upon release of a new
> > > Flink
> > > > > > minor
> > > > > > > > version, the community will perform one final bugfix release
> > for
> > > > > > resolved
> > > > > > > > critical/blocker issues in the Flink minor version losing
> > > support."
> > > > > > > > - Release process: add a step to start the discussion thread
> > for
> > > > the
> > > > > > > final
> > > > > > > > patch version, if there are resolved critical/blocking issues
> > to
> > > > > flush.
> > > > > > > >
> > > > > > > > Voting schema: since our bylaws [3] do not cover this
> > particular
> > > > > > > scenario,
> > > > > > > > and releases require PMC involvement, we will use a consensus
> > > vote
> > > > > with
> > > > > > > PMC
> > > > > > > > binding votes.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Danny
> > > > > > > >
> > > > > > > > [1]
> > > > > > >
> > > >
> https://flink.apache.org/downloads.html#update-policy-for-old-releases
> > > > > > > > [2]
> > > > https://lists.apache.org/thread/szq23kr3rlkm80rw7k9n95js5vqpsnbv
> > > > > > > > [3]
> > > https://cwiki.apache.org/confluence/display/FLINK/Flink+Bylaws
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
>
> --
> Best regards,
> Sergey
>


Re: [DISCUSS] Flink minor version support policy for old releases

2023-02-17 Thread Thomas Weise
+1

I would not be surprised to also see future requests to extend the window
for older releases in general. Rollout of upgrades may not keep pace with
our current release policy for a variety of reasons.

Thanks,
Thomas


On Fri, Feb 17, 2023 at 12:38 PM Jing Ge  wrote:

> +1
>
> Thanks Danny for starting the discussion! This proposal makes a lot of
> sense. Given that Flink development is so active, there will always be
> some bug fixes for the previous minor version waiting to be "flushed"
> while the next minor version is being prepared for release; i.e., the
> minor version 1.0.z would be cleaned up and retired with a patch release
> 1.0.(z+1) before it officially goes out of support.
>
> Best regards,
> Jing
>
>
> On Fri, Feb 17, 2023 at 5:58 PM Konstantin Knauf 
> wrote:
>
> > Hi Danny,
> >
> > yes, this makes a lot of sense. We should probably by default just do one
> > more patch release shortly after the official support window has ended to
> > flush out all bug fixes.
> >
> > Cheers,
> >
> > Konstantin
> >
> > Am Fr., 17. Feb. 2023 um 17:52 Uhr schrieb Danny Cranmer <
> > dannycran...@apache.org>:
> >
> > > Hello all,
> > >
> > > As proposed by Matthias in a separate thread [1], I would like to start a
> > > discussion on changing the policy wording to include the release of bug
> > > fixes during their support window. Our current policy [2] is to only
> > > support the latest two minor versions: "If 1.17.x is the current release,
> > > 1.16.y is the previous minor supported release. Both versions will receive
> > > bugfixes for critical issues." However, there may be bug fixes that have
> > > been resolved but not released during their support window. Consider this
> > > example:
> > > 1. Current Flink versions are 1.15.3 and 1.16.1
> > > 2. We fix bugs for 1.15.3
> > > 3. 1.17.0 is released
> > > 4. The 1.15 bug fixes will now not be released unless we get an exception
> > >
> > > The current process is subject to race conditions between releases.
> > > Should we update the policy to allow bugfix releases for issues that were
> > > resolved during their support window? I propose we update the policy to
> > > include:
> > >
> > > "Upon release of a new Flink minor version, the community will perform
> > one
> > > final bugfix release for resolved critical/blocker issues in the Flink
> > > version losing support."
> > >
> > > Let's discuss.
> > >
> > > Thanks,
> > > Danny
> > >
> > > [1] https://lists.apache.org/thread/019wsqqtjt6h0cb81781ogzldbjq0v48
> > > [2]
> > https://flink.apache.org/downloads.html#update-policy-for-old-releases
> > >
> >
> >
> > --
> > https://twitter.com/snntrable
> > https://github.com/knaufk
> >
>


Re: [VOTE] Apache Flink Kubernetes Operator Release 1.3.1, release candidate #1

2023-01-31 Thread Thomas Weise
Hi,

Sorry for the late reply to this thread, but in the meantime we learned
that the assumption based on which the above mentioned change to
upgradeMode was approved does not hold true. The assumption was that the
generation id in spec metadata and reconciled spec can be used to determine
if changes are reconciled or not, rather than comparing the full specs.

https://issues.apache.org/jira/browse/FLINK-30858

One way to get around the issue is to repeat the operator's spec diff logic
on the client side, but that introduces a tighter coupling with the
operator implementation than desirable. It would be good to fix the
observed generation update in the operator so that generation ids can be
compared safely, including the scenario where the diff is empty but the
generation id has increased.

Thanks,
Thomas


On Mon, Jan 9, 2023 at 11:27 AM Maximilian Michels  wrote:

> +1 (binding)
>
> Thanks for clarifying. I wanted to make sure this is not an unintended
> regression.
>
> On Mon, Jan 9, 2023 at 4:26 PM Gyula Fóra  wrote:
>
> > @Maximilian Michels  this is a completely intentional
> > improvement and it is required to ensure consistency for some operations
> > within the operator logic.
> >
> > On Mon, Jan 9, 2023 at 4:06 PM Maximilian Michels 
> wrote:
> >
> > > +0
> > >
> > > 1. Downloaded the source archive release staged at
> > >
> > >
> >
> https://dist.apache.org/repos/dist/dev/flink/flink-kubernetes-operator-1.3.1-rc1/
> > > 2. Verified the signature
> > > 3. Inspected the extracted source code for binaries
> > > 4. Compiled the source code
> > > 5. Verified license files / headers
> > > 6. Deployed to test environment
> > >
> > > I see an issue with (6), I noticed that if "upgradeMode" gets set to
> > > "last-state" for a fresh deployment, the `lastReconciledSpec` field
> > yields
> > > `stateless`. This is an issue when users compare the current spec to
> the
> > > lastReconciledSpec to assess whether the spec was reconciled. I suppose
> > > there are other means to ensure reconciliation, e.g. by looking at the
> > > generation id or similar. Just wanted to double check that this is what
> > we
> > > want.
> > >
> > > -Max
> > >
> > > On Wed, Jan 4, 2023 at 10:07 PM Hao t Chang 
> wrote:
> > >
> > > > I did the following:
> > > > Ran OLM bundle CI test suite for Kubernetes.
> > > > Generated and Deployed OLM bundle.
> > > > Created standalone/session jobs.
> > > > All Look good. Thanks for managing the release!
> > > >
> > > > --
> > > > Best,
> > > > Ted Chang | Software Engineer | htch...@us.ibm.com
> > > >
> > > >
> > > >
> > >
> >
>


[jira] [Created] (FLINK-30858) Kubernetes operator does not update reconciled generation

2023-01-31 Thread Thomas Weise (Jira)
Thomas Weise created FLINK-30858:


 Summary: Kubernetes operator does not update reconciled generation
 Key: FLINK-30858
 URL: https://issues.apache.org/jira/browse/FLINK-30858
 Project: Flink
  Issue Type: Bug
  Components: Kubernetes Operator
Affects Versions: kubernetes-operator-1.3.1
Reporter: Thomas Weise


Kubernetes manages the generation field as part of the spec metadata. It will 
be increased when changes are made to the resource. The counterpart in status 
is "observed generation", provided by a controller. By comparing the two, the 
client can determine that the controller has processed the spec and in 
conjunction with other status information conclude that a change has been 
reconciled.

The Flink operator currently tracks the generation as part of the reconciled and 
stable specs, but these cannot be used as an "observed generation" to perform the 
check. The value isn't updated in cases where the operator determines that there 
are no changes to the spec that require deployment. This can be reproduced 
through PUT/replace with the same spec or a change in upgrade mode.

The operator should provide the observed spec, which in conjunction with 
deployment state can then be used by clients to determine that the spec has 
been reconciled.
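
A sketch of the proposed behavior, assuming the status gains a dedicated field (field and setter names are illustrative, not the actual operator code):

    // Record the observed generation on every reconcile pass, even when the
    // spec diff is empty, so clients can compare it against metadata.generation.
    void updateObservedGeneration(FlinkDeployment deployment) {
        long generation = deployment.getMetadata().getGeneration();
        deployment.getStatus().setObservedGeneration(generation); // hypothetical setter
    }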



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] Release flink-connector-opensearch, release candidate #1

2022-12-20 Thread Thomas Weise
+1 (binding)

* Checked hash and signature
* Build from source and run tests
* Checked licenses



On Mon, Dec 19, 2022 at 1:50 PM Maximilian Michels  wrote:

> +1 (binding)
>
> Release looks good.
>
> 1. Downloaded the source archive release staged at
>
> https://dist.apache.org/repos/dist/dev/flink/flink-connector-opensearch-1.0.0-rc1/
> 2. Verified the signature
> 3. Inspect extracted source code for binaries
> 4. Compiled the source code
> 5. Verified license files / headers
>
> -Max
>
> On Mon, Dec 19, 2022 at 11:16 AM Sergey Nuyanzin 
> wrote:
>
> > +1 (non-binding)
> >
> > - Validated hashes and signature
> > - Verified that no binaries exist in the source archive
> > - Build from sources with Maven
> > - Verified licenses
> >
> > On Sat, Dec 17, 2022 at 8:34 AM Martijn Visser  >
> > wrote:
> >
> > > Hi Danny,
> > >
> > > +1 (binding)
> > >
> > > - Validated hashes
> > > - Verified signature
> > > - Verified that no binaries exist in the source archive
> > > - Build the source with Maven
> > > - Verified licenses
> > > - Verified web PRs
> > >
> > > Thanks for the help!
> > >
> > > Best regards, Martijn
> > >
> > > On Fri, Dec 16, 2022 at 5:37 PM Danny Cranmer  >
> > > wrote:
> > >
> > > > Apologies I messed up the link to "the official Apache source release
> > to
> > > be
> > > > deployed to dist.apache.org" [1]
> > > >
> > > > Thanks,
> > > > Danny
> > > >
> > > > [1]
> > > >
> > > >
> > >
> >
> https://dist.apache.org/repos/dist/dev/flink/flink-connector-opensearch-1.0.0-rc1
> > > >
> > > > On Fri, Dec 16, 2022 at 2:04 PM Ahmed Hamdy 
> > > wrote:
> > > >
> > > > > Thank you Danny,
> > > > >
> > > > > +1 (non-binding)
> > > > >
> > > > > * Hashes and Signatures look good
> > > > > * Tag is present in Github
> > > > > * Verified source archive does not contain any binary files
> > > > > * Source archive builds using maven
> > > > > * Verified Notice and Licence files
> > > > >
> > > > > On Fri, 16 Dec 2022 at 12:41, Danny Cranmer <
> dannycran...@apache.org
> > >
> > > > > wrote:
> > > > >
> > > > > > Hi everyone,
> > > > > > Please review and vote on the release candidate #1 for the
> version
> > > > 1.0.0,
> > > > > > as follows:
> > > > > > [ ] +1, Approve the release
> > > > > > [ ] -1, Do not approve the release (please provide specific
> > comments)
> > > > > >
> > > > > >
> > > > > > The complete staging area is available for your review, which
> > > includes:
> > > > > > * JIRA release notes [1],
> > > > > > * the official Apache source release to be deployed to
> > > dist.apache.org
> > > > > > [2],
> > > > > > which are signed with the key with fingerprint 125FD8DB [3],
> > > > > > * all artifacts to be deployed to the Maven Central Repository
> [4],
> > > > > > * source code tag v1.0.0-rc1 [5],
> > > > > > * website pull request listing the new release [6].
> > > > > > * pull request to integrate opensearch docs into the Flink docs
> > [7].
> > > > > >
> > > > > > The vote will be open for at least 72 hours excluding weekends
> > > > (Wednesday
> > > > > > 21st December 13:00 UTC). It is adopted by majority approval,
> with
> > at
> > > > > least
> > > > > > 3 PMC affirmative votes.
> > > > > >
> > > > > > Thanks,
> > > > > > Danny
> > > > > >
> > > > > > [1]
> > https://issues.apache.org/jira/projects/FLINK/versions/12352293
> > > > > > [2]
> https://dist.apache.org/repos/dist/dev/flink/flink-connector-
> > > > > > -${NEW_VERSION}-rc${RC_NUM}
> > > > > > [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> > > > > > [4]
> > > > >
> > https://repository.apache.org/content/repositories/orgapacheflink-1569
> > > > > > [5]
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/flink-connector-opensearch/releases/tag/v1.0.0-rc1
> > > > > > [6] https://github.com/apache/flink-web/pull/596
> > > > > > [7] https://github.com/apache/flink/pull/21518
> > > > > >
> > > > >
> > > >
> > >
> >
> >
> > --
> > Best regards,
> > Sergey
> >
>


Re: Edit Wiki for Release Verification Steps

2022-12-08 Thread Thomas Weise
Hi Jim,

You should have access now (user: jbusche)

Thanks


On Thu, Dec 8, 2022 at 7:30 PM Jim Busche  wrote:

> I'd like to be able to edit the wiki page:
> https://cwiki.apache.org/confluence/display/FLINK/Verifying+a+Flink+Kubernetes+Operator+Release
>
> It says it's unrestricted, but maybe I need to be added to the Flink space
> first?  jbus...@us.ibm.com
>
> Thanks!  Jim
>


Re: [VOTE] FLIP-271: Autoscaling

2022-11-27 Thread Thomas Weise
+1 (binding)


On Sat, Nov 26, 2022 at 8:11 AM Zheng Yu Chen  wrote:

> +1(no-binding)
>
> On Thu, Nov 24, 2022 at 12:25 AM, Maximilian Michels wrote:
>
> > Hi everyone,
> >
> > I'd like to start a vote for FLIP-271 [1] which we previously discussed
> on
> > the dev mailing list [2].
> >
> > I'm planning to keep the vote open for at least until Tuesday, Nov 29.
> >
> > -Max
> >
> > [1]
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-271%3A+Autoscaling
> > [2] https://lists.apache.org/thread/pvfb3fw99mj8r1x8zzyxgvk4dcppwssz
> >
>


Re: [ANNOUNCE] New Apache Flink Committer - Matyas Orhidi

2022-11-21 Thread Thomas Weise
Congrats Matyas!


On Mon, Nov 21, 2022 at 6:28 PM Őrhidi Mátyás 
wrote:

> Thanks folks,
>
> So proud and honored to be part of the pack!
>
> Cheers,
> Matyas
>
> On Mon, Nov 21, 2022 at 12:32 PM Danny Cranmer 
> wrote:
>
> > Congrats Matyas!
> >
> > On Mon, 21 Nov 2022, 17:51 ramkrishna vasudevan, <
> ramvasu.fl...@gmail.com>
> > wrote:
> >
> > > Congrats Matyas
> > >
> > > On Mon, Nov 21, 2022 at 10:06 PM Jim Busche 
> wrote:
> > >
> > > > Congratulations Matyas!
> > > >
> > > > Jim
> > > > --
> > > > James Busche | Sr. Software Engineer, Watson AI and Data Open
> > Technology
> > > |
> > > > 408-460-0737 | jbus...@us.ibm.com
> > > >
> > > >
> > > >
> > > >
> > > > From: Márton Balassi 
> > > > Date: Monday, November 21, 2022 at 6:18 AM
> > > > To: Flink Dev , morh...@apache.org <
> > > > morh...@apache.org>
> > > > Subject: [EXTERNAL] [ANNOUNCE] New Apache Flink Committer - Matyas
> > Orhidi
> > > > Hi everyone,
> > > >
> > > > On behalf of the PMC, I'm very happy to announce Matyas Orhidi as a
> new
> > > > Flink
> > > > committer.
> > > >
> > > > Matyas has over a decade of experience in the Big Data ecosystem and
> > has
> > > > been working with Flink full time for the past 3 years. In the open
> > > source
> > > > community he is one of the key driving members of the Kubernetes
> > Operator
> > > > subproject. He implemented multiple key features in the operator
> > > including
> > > > the metrics system and the ability to dynamically configure watched
> > > > namespaces. He enjoys spreading the word about Flink and regularly
> does
> > > so
> > > > via authoring blogposts and giving talks or interviews representing
> the
> > > > community.
> > > >
> > > > Please join me in congratulating Matyas for becoming a Flink
> committer!
> > > >
> > > > Best,
> > > > Marton
> > > >
> > >
> >
>


Re: Env Vars in the Flink Web UI

2022-11-16 Thread Thomas Weise
+1 for d) don't print any env vars in the web UI (at least by default)

There could be an option to allow-list printing of env vars but it should
be off by default.

Generally I think that those who should be able to see env vars can probably
get at them by other means, like kubectl exec.
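
To illustrate the allow-list idea, a minimal sketch of a filter that a REST handler could apply before rendering env vars (the helper is an assumption, not existing Flink API):

    import java.util.Map;
    import java.util.Set;
    import java.util.stream.Collectors;

    // Keep only explicitly allow-listed env vars; with an empty allow-list
    // (the proposed default) nothing is exposed in the web UI.
    static Map<String, String> filterEnvVars(Map<String, String> env, Set<String> allowList) {
        return env.entrySet().stream()
                .filter(e -> allowList.contains(e.getKey()))
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
    }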

Thanks,
Thomas


On Wed, Nov 16, 2022 at 4:43 PM Konstantin Knauf  wrote:

> Hi everyone,
>
> thanks a lot for your feedback so far. Right now, we have pretty much a
> consensus to not show environment variables at all in the Web UI going
> forward.
>
> I think, we can address this in 1.16.1, as I consider this a vulnerability
> that should be addressed in a patch release rather than waiting for the
> next minor release.
>
> Are there any other suggestions on how to proceed?
>
> Cheers,
>
> Konstantin
>
>
>
> On Wed, Nov 16, 2022 at 09:23, Gyula Fóra <
> gyula.f...@gmail.com
> > wrote:
>
> > I am not opposed to removing this completely based on Chesnay's
> reasoning.
> > In general I agree that this feature probably does more harm than good.
> >
> > Gyula
> >
> > On Wed, Nov 16, 2022 at 9:13 AM Chesnay Schepler 
> > wrote:
> >
> > > I'm inclined to go with d), removing it entirely.
> > >
> > > I must admit that I liked the idea behind the change; exposing more
> > > information about what might impact Flink's behavior is a good thing,
> > > although I'm irked that the statement in the FLIP about env variables
> > > already being exposed in the logs just isn't correct.
> > >
> > > I was quite disappointed when I saw that b) wasn't already implemented.
> > > It is concerning that this actually made its way through the review
> > > rounds as-is.
> > >
> > > That being said I don't think that b) would be sufficient in anyway;
> the
> > > desensitization logic for config options is quite limited (and
> > > inherently not perfect), but config options are too important to not
> log
> > > them. This isn't really the case for environment variables that have
> > > limited effects on Flink, and it's just too easy to leak secrets.
> > > Oh you abbreviated PASSWORD to PW? Well you just leaked it.
> > >
> > > This brings us to c), where my immediate gut instinct was that no one's
> > > going to bother to do that.
> > >
> > > As for e*) (the opt-in flag that Gyula proposed): I think it's too easy to
> > > shoot yourself in the foot somewhere down the line. It may be fine at
> > > one point but setups evolve after all, and this seems like something to
> > > easily slip through.
> > >
> > > On 15/11/2022 15:41, Konstantin Knauf wrote:
> > > > Hi everyone,
> > > >
> > > > important correction, this is since 1.16.0, not 1.17+.
> > > >
> > > > Best,
> > > >
> > > > Konstantin
> > > >
> > > > On Tue, Nov 15, 2022 at 14:25, Gyula Fóra <
> > > gyula.f...@gmail.com
> > > >> wrote:
> > > >> Thanks for bringing this important issue to discussion Konstantin!
> > > >>
> > > >> I am in favor of not showing them by default with an optional
> > > configuration
> > > >> to enable it.
> > > >> Otherwise this poses a big security risk of exposing previously
> hidden
> > > >> information after upgrade.
> > > >>
> > > >> Gyula
> > > >>
> > > >> On Tue, Nov 15, 2022 at 2:15 PM Maximilian Michels 
> > > wrote:
> > > >>
> > > >>> Hey Konstantin,
> > > >>>
> > > >>> I'd be in favor of not printing them at all, i.e. option (d). We
> have
> > > the
> > > >>> configuration page which lists the effective config and already
> > removes
> > > >> any
> > > >>> known secrets.
> > > >>>
> > > >>> -Max
> > > >>>
> > > >>> On Tue, Nov 15, 2022 at 11:26 AM Konstantin Knauf <
> kna...@apache.org
> > >
> > > >>> wrote:
> > > >>>
> > >  Hi all,
> > > 
> > >  since Flink 1.17 [1] the Flink Web UI prints *all* environment
> > > >> variables
> > > >>> of
> > >  the TaskManager or JobManager hosts (JobManager -> Configuration
> ->
> > >  Environment). Given that environment variables are often used to
> > store
> > >  sensitive information, I think, it is wrong and dangerous to print
> > > >> those
> > > >>> in
> > >  the Flink Web UI. Specifically, thinking about how Kubernetes
> > Secrets
> > > >> are
> > >  usually injected into Pods.
> > > 
> > >  One could argue that anyone who can submit a Flink Job to a
> cluster
> > > has
> > >  access to these environment variables anyway, but not everyone who
> > has
> > >  access to the Flink UI can submit a Flink Job.
> > > 
> > >  I see the the following options:
> > >  a) leave as is
> > >  b) apply same obfuscation as in flink-conf.yaml based on some
> > > heuristic
> > > >>> (no
> > >  "secret", "password" in env var name)
> > >  c) only print allow-listed values
> > >  d) don't print any env vars in the web UI (at least by default)
> > > 
> > >  What do you think?
> > > 
> > >  Cheers,
> > > 
> > >  Konstantin
> > > 
> > >  [1] https://issues.apache.org/jira/browse/FLINK-28311
> > > 
> > >  --
> > >  

[jira] [Created] (FLINK-30004) Cannot resume deployment after suspend with savepoint due to leftover configmaps

2022-11-12 Thread Thomas Weise (Jira)
Thomas Weise created FLINK-30004:


 Summary: Cannot resume deployment after suspend with savepoint due 
to leftover configmaps
 Key: FLINK-30004
 URL: https://issues.apache.org/jira/browse/FLINK-30004
 Project: Flink
  Issue Type: Bug
  Components: Kubernetes Operator
Affects Versions: 1.2
Reporter: Thomas Weise
Assignee: Thomas Weise


Due to the possibility of incomplete cleanup of HA data in Flink 1.14, the 
deployment can get into a limbo state that requires manual intervention after 
suspend with savepoint. If the config maps are not cleaned up, the resumed job 
will be considered finished and the operator recognizes the JM deployment as 
missing. Due to the check for HA data, which has by then been cleaned up, the 
job fails to start and manual redeployment with an initial savepoint is necessary.

This can be avoided by removing any leftover HA config maps after the job has 
successfully stopped with savepoint (upgrade mode savepoint).
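
For illustration, a sketch of such a cleanup using the fabric8 Kubernetes client; the label selector is an assumption about how Flink's native Kubernetes HA labels its config maps and should be verified against the actual metadata:

    // Delete leftover HA config maps after the job has stopped with a savepoint.
    // The labels below are assumptions; check the actual HA config map metadata.
    kubernetesClient
            .configMaps()
            .inNamespace(namespace)
            .withLabel("app", clusterId)
            .withLabel("configmap-type", "high-availability")
            .delete();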



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] Create a dedicated aws-base connector repository

2022-10-24 Thread Thomas Weise
Hi Danny,

I'm also leaning slightly towards the single AWS connector repo direction.

Bumps in the underlying AWS SDK would bump all of the connectors in any
case. And if a change occurs that is isolated to a single connector, then
those that do not use that connector can just skip the release.

Cheers,
Thomas


On Mon, Oct 24, 2022 at 3:01 PM Teoh, Hong 
wrote:

> I like the single repo with single version idea.
>
> Pros:
> - Better discoverability for connectors for AWS services means a better
> experience for Flink users
> - Natural placement of AWS-related utils (Credentials, SDK Retry strategy)
>
> Caveats:
> - As you mentioned, it is not desirable if we have to evolve the major
> version of the connector just for a change in a single connector (e.g.
> DynamoDB). However, I think it is reasonable to only evolve the major
> version of the AWS connector repo when there are Flink Source/Sink API
> upgrades or AWS SDK major upgrades (probably quite rare). Any new features
> for individual connectors can be collapsed into minor releases.
> - An additional callout here is that we should be careful adopting any AWS
> connectors that don't use the AWS SDK directly (e.g. how the Kinesis
> connector used KPL for a long time). In my opinion, any new connectors like
> that would be better placed in their own repositories, otherwise we will
> have a complex mesh of dependencies to manage.
>
> Regards,
> Hong
>
>
>
>
> On 21/10/2022, 16:59, "Danny Cranmer"  wrote:
>
> CAUTION: This email originated from outside of the organization. Do
> not click links or open attachments unless you can confirm the sender and
> know the content is safe.
>
>
>
> Thanks Chesnay for the suggestion, I will investigate this option.
>
> Related to the single repo idea, I have considered it in the past. Are
> you
> proposing we also use a single version between all connectors? If we
> have a
> single version then it makes sense to combine them in a single repo, if
> they are separate versions, then splitting them makes sense. This was
> discussed last year more generally [1] and the consensus was "we
> ultimately
> propose to have a single repository per connector".
>
> Combining all AWS connectors into a single repo with a single version
> is
> inline with how the AWS SDK works, therefore AWS users are familiar
> with
> this approach. However it is frustrating that we would have to release
> all
> connectors to fix a bug or add a feature in one of them. Example: a
> user is
> using Kinesis Data Streams only (the most popular and mature
> connector),
> and we evolve the version from 1.x to 2.y (or 1.x to 1.y) for a
> DynamoDB
> change.
>
> I am torn and will think some more, but it would be great to hear other
> people's opinions.
>
> [1] https://lists.apache.org/thread/bywh947r2f5hfocxq598zhyh06zhksrm
>
> Thanks,
> Danny
>
> On Fri, Oct 21, 2022 at 3:11 PM Jing Ge  wrote:
>
> > I agree with Jark. It would be easier for the further development and
> > maintenance, if all aws related connectors and the base module are
> in the
> > same repo. It might make sense to upgrade the
> flink-connector-dynamodb to
> > flink-connector-aws and move the other modules including the
> > flink-connector-aws-base into it. The aws sdk could be managed in
> > flink-connector-aws-base. Any future common connector features could
> also
> > be developed in the base module.
> >
> > Best regards,
> > Jing
> >
> > On Fri, Oct 21, 2022 at 1:26 PM Jark Wu  wrote:
> >
> >> How about creating a new repository flink-connector-aws and merging
> >> dynamodb, kinesis, firehose into it?
> >> This can reduce the maintenance for complex dependencies and make
> the
> >> release easy.
> >> I think the maintainers of aws-releated connectors are the same
> people.
> >>
> >> Best,
> >> Jark
> >>
> > >> > On Oct 21, 2022, at 17:41, Chesnay Schepler wrote:
> >> >
> >> > I would not go with 2); I think it'd just be messy .
> >> >
> >> > Here's another option:
> >> >
> >> > Create another repository (aws-connector-base) (following the
> >> externalization model), add it as a sub-module to the downstream
> >> repositories, and make it part of the release process of said
> connector.
> >> >
> >> > I.e., we never create a release for aws-connector-base, but
> release it
> >> as part of the connector.
> >> > This main benefit here is that we'd always be able to make
> changes to
> >> the aws-base code without delaying connector releases.
> >> > I would assume that any added overhead due to _technically_
> releasing
> >> the aws code multiple times to be negligible.
> >> >
> >> >
> >> > On 20/10/2022 22:38, Danny Cranmer wrote:
> >> >> Hello all,
> >> >>
> >> >> Currently we have 2 AWS Flink connectors in the main Flink
> codebase
> >> >> 

[jira] [Created] (FLINK-29634) Support periodic checkpoint triggering

2022-10-13 Thread Thomas Weise (Jira)
Thomas Weise created FLINK-29634:


 Summary: Support periodic checkpoint triggering
 Key: FLINK-29634
 URL: https://issues.apache.org/jira/browse/FLINK-29634
 Project: Flink
  Issue Type: New Feature
  Components: Kubernetes Operator
Reporter: Thomas Weise


Similar to the support for periodic savepoints, the operator should support 
triggering periodic checkpoints to break the incremental checkpoint chain.

Support for external triggering will come with 1.17: 
https://issues.apache.org/jira/browse/FLINK-27101 
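
For illustration, a minimal sketch of the interval-based trigger decision the operator could make during reconciliation, mirroring the periodic savepoint mechanism (all names are illustrative, not the actual operator API):

    import java.time.Duration;
    import java.time.Instant;

    // Trigger a checkpoint once the configured interval has elapsed since the
    // last periodic trigger recorded in the resource status.
    static boolean shouldTriggerPeriodicCheckpoint(Instant lastTrigger, Duration interval) {
        return !interval.isZero() && Instant.now().isAfter(lastTrigger.plus(interval));
    }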



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] Externalized connector release details​

2022-10-13 Thread Thomas Weise
+1 (binding) for the vote and thanks for the explanation

On Thu, Oct 13, 2022 at 5:58 AM Chesnay Schepler  wrote:

> @Thomas:
> Version-specific modules that either contain a connector or shims to
> support that Flink version.
> Alternatively, since the addition of such code (usually) goes beyond a
> patch release you'd create a new minor version and could have that only
> support the later version.
>
> On 13/10/2022 02:05, Thomas Weise wrote:
> > "Branches are not specific to a Flink version. (i.e., no v3.2-1.15)"
> >
> > Sorry for the late question. I could not find in the discussion thread
> how
> > a connector can make use of features of the latest Flink version that
> were
> > not present in the previous Flink version, when branches cannot be Flink
> > version specific?
> >
> > Thanks,
> > Thomas
> >
> > On Wed, Oct 12, 2022 at 4:09 PM Ferenc Csaky  >
> > wrote:
> >
> >> +1 from my side (non-binding)
> >>
> >> Best,
> >> F
> >>
> >>
> >> --- Original Message ---
> >> On Wednesday, October 12th, 2022 at 15:47, Martijn Visser <
> >> martijnvis...@apache.org> wrote:
> >>
> >>
> >>>
> >>> +1 (binding), I am indeed assuming that Chesnay meant the last two
> minor
> >>> versions as supported.
> >>>
> >>> On Wed, Oct 12, 2022 at 20:18, Danny Cranmer
> >> dannycran...@apache.org wrote:
> >>>> Thanks for the concise summary Chesnay.
> >>>>
> >>>> +1 from me (binding)
> >>>>
> >>>> Just one clarification, for "3.1) The Flink versions supported by the
> >>>> project (last 2 major Flink versions) must be supported.". Do we
> >> actually
> >>>> mean major here, as in Flink 1.x.x and 2.x.x? Right now we would only
> >>>> support Flink 1.15.x and not 1.14.x? I would be inclined to support
> the
> >>>> latest 2 minor Flink versions (major.minor.patch) given that we only
> >> have 1
> >>>> active major Flink version.
> >>>>
> >>>> Danny
> >>>>
> >>>> On Wed, Oct 12, 2022 at 2:12 PM Chesnay Schepler ches...@apache.org
> >>>> wrote:
> >>>>
> >>>>> Since the discussion
> >>>>> (https://lists.apache.org/thread/mpzzlpob9ymkjfybm96vz2y2m5fjyvfo)
> >> has
> >>>>> stalled a bit but we need a conclusion to move forward I'm opening a
> >>>>> vote.
> >>>>>
> >>>>> Proposal summary:
> >>>>>
> >>>>> 1) Branch model
> >>>>> 1.1) The default branch is called "main" and used for the next major
> >>>>> iteration.
> >>>>> 1.2) Remaining branches are called "vmajor.minor". (e.g., v3.2)
> >>>>> 1.3) Branches are not specific to a Flink version. (i.e., no
> >> v3.2-1.15)
> >>>>> 2) Versioning
> >>>>> 2.1) Source releases: major.minor.patch
> >>>>> 2.2) Jar artifacts: major.minor.patch-flink-major.flink-minor
> >>>>> (This may imply releasing the exact same connector jar multiple times
> >>>>> under different versions)
> >>>>>
> >>>>> 3) Flink compatibility
> >>>>> 3.1) The Flink versions supported by the project (last 2 major Flink
> >>>>> versions) must be supported.
> >>>>> 3.2) How this is achieved is left to the connector, as long as it
> >>>>> conforms to the rest of the proposal.
> >>>>>
> >>>>> 4) Support
> >>>>> 4.1) The last 2 major connector releases are supported with only the
> >>>>> latter receiving additional features, with the following exceptions:
> >>>>> 4.1.a) If the older major connector version does not support any
> >>>>> currently supported Flink version, then it is no longer supported.
> >>>>> 4.1.b) If the last 2 major versions do not cover all supported Flink
> >>>>> versions, then the latest connector version that supports the older
> >>>>> Flink version additionally gets patch support.
> >>>>> 4.2) For a given major connector version only the latest minor
> >> version
> >>>>> is supported.
> >>>>> (This means if 1.1.x is released there will be no more 1.0.x release)
> >>>>>
> >>>>> I'd like to clarify that these won't be set in stone for eternity.
> >>>>> We should re-evaluate how well this model works over time and adjust
> >> it
> >>>>> accordingly, consistently across all connectors.
> >>>>> I do believe that as is this strikes a good balance between
> >>>>> maintainability for us and clarity to users.
> >>>>>
> >>>>> Voting schema:
> >>>>>
> >>>>> Consensus, committers have binding votes, open for at least 72 hours.
> >>> --
> >>> Martijn
> >>> https://twitter.com/MartijnVisser82
> >>> https://github.com/MartijnVisser
>
>
>


Re: [VOTE] Externalized connector release details​

2022-10-12 Thread Thomas Weise
"Branches are not specific to a Flink version. (i.e., no v3.2-1.15)"

Sorry for the late question. I could not find in the discussion thread how
a connector can make use of features of the latest Flink version that were
not present in the previous Flink version, when branches cannot be Flink
version specific?

Thanks,
Thomas

On Wed, Oct 12, 2022 at 4:09 PM Ferenc Csaky 
wrote:

> +1 from my side (non-binding)
>
> Best,
> F
>
>
> --- Original Message ---
> On Wednesday, October 12th, 2022 at 15:47, Martijn Visser <
> martijnvis...@apache.org> wrote:
>
>
> >
> >
> > +1 (binding), I am indeed assuming that Chesnay meant the last two minor
> > versions as supported.
> >
> > On Wed, Oct 12, 2022 at 20:18, Danny Cranmer
> dannycran...@apache.org wrote:
> >
> > > Thanks for the concise summary Chesnay.
> > >
> > > +1 from me (binding)
> > >
> > > Just one clarification, for "3.1) The Flink versions supported by the
> > > project (last 2 major Flink versions) must be supported.". Do we
> actually
> > > mean major here, as in Flink 1.x.x and 2.x.x? Right now we would only
> > > support Flink 1.15.x and not 1.14.x? I would be inclined to support the
> > > latest 2 minor Flink versions (major.minor.patch) given that we only
> have 1
> > > active major Flink version.
> > >
> > > Danny
> > >
> > > On Wed, Oct 12, 2022 at 2:12 PM Chesnay Schepler ches...@apache.org
> > > wrote:
> > >
> > > > Since the discussion
> > > > (https://lists.apache.org/thread/mpzzlpob9ymkjfybm96vz2y2m5fjyvfo)
> has
> > > > stalled a bit but we need a conclusion to move forward I'm opening a
> > > > vote.
> > > >
> > > > Proposal summary:
> > > >
> > > > 1) Branch model
> > > > 1.1) The default branch is called "main" and used for the next major
> > > > iteration.
> > > > 1.2) Remaining branches are called "vmajor.minor". (e.g., v3.2)
> > > > 1.3) Branches are not specific to a Flink version. (i.e., no
> v3.2-1.15)
> > > >
> > > > 2) Versioning
> > > > 2.1) Source releases: major.minor.patch
> > > > 2.2) Jar artifacts: major.minor.patch-flink-major.flink-minor
> > > > (This may imply releasing the exact same connector jar multiple times
> > > > under different versions)
> > > >
> > > > 3) Flink compatibility
> > > > 3.1) The Flink versions supported by the project (last 2 major Flink
> > > > versions) must be supported.
> > > > 3.2) How this is achieved is left to the connector, as long as it
> > > > conforms to the rest of the proposal.
> > > >
> > > > 4) Support
> > > > 4.1) The last 2 major connector releases are supported with only the
> > > > latter receiving additional features, with the following exceptions:
> > > > 4.1.a) If the older major connector version does not support any
> > > > currently supported Flink version, then it is no longer supported.
> > > > 4.1.b) If the last 2 major versions do not cover all supported Flink
> > > > versions, then the latest connector version that supports the older
> > > > Flink version additionally gets patch support.
> > > > 4.2) For a given major connector version only the latest minor
> version
> > > > is supported.
> > > > (This means if 1.1.x is released there will be no more 1.0.x release)
> > > >
> > > > I'd like to clarify that these won't be set in stone for eternity.
> > > > We should re-evaluate how well this model works over time and adjust
> it
> > > > accordingly, consistently across all connectors.
> > > > I do believe that as is this strikes a good balance between
> > > > maintainability for us and clarity to users.
> > > >
> > > > Voting schema:
> > > >
> > > > Consensus, committers have binding votes, open for at least 72 hours.
> >
> > --
> > Martijn
> > https://twitter.com/MartijnVisser82
> > https://github.com/MartijnVisser
>


Re: [DISCUSS] Reference operator from Flink Kubernetes deployment docs

2022-10-12 Thread Thomas Weise
+1


On Wed, Oct 12, 2022 at 5:03 PM Martijn Visser 
wrote:

> +1 from my end to include the operator in the related Kubernetes sections
> of the Flink docs
>
> On Wed, Oct 12, 2022 at 5:31 PM Chesnay Schepler 
> wrote:
>
> > I don't see a reason for why we shouldn't at least mention the operator
> > in the kubernetes docs.
> >
> > On 12/10/2022 16:25, Gyula Fóra wrote:
> > > Hi Devs!
> > >
> > > I would like to start a discussion about referencing the Flink
> Kubernetes
> > > Operator directly from the Flink Kubernetes deployment documentation.
> > >
> > > Currently the Flink deployment/resource provider docs provide some
> > > information for the Standalone and Native Kubernetes integration
> without
> > > any reference to the operator.
> > >
> > > I think we reached a point with the operator where we should provide a
> > bit
> > > more visibility and value to the users by directly proposing to use the
> > > operator when considering Flink on Kubernetes. We should definitely
> keep
> > > the current docs but make the point that for most users the easiest way
> > to
> > > use Flink on Kubernetes is probably through the operator (where they
> can
> > > now benefit from both standalone and native integration under the
> hood).
> > > This should help us avoid cases where a new user completely misses the
> > > existence of the operator when starting out based on the Flink docs.
> > >
> > > What do you think?
> > >
> > > Gyula
> > >
> >
> >
>


Re: [VOTE] Apache Flink Kubernetes Operator Release 1.2.0, release candidate #2

2022-10-06 Thread Thomas Weise
+1 (binding)

Built from tag, manually tested upgrade mode change scenarios, upgraded
existing environment


On Thu, Oct 6, 2022 at 4:00 PM Márton Balassi 
wrote:

> +1 (binding)
>
> Verified upgrade process from 1.1 with a running application
>
> On Thu, Oct 6, 2022 at 5:52 PM Biao Geng  wrote:
>
> > +1(non-binding)
> > Thanks a lot for the great work.
> >
> > Successfully verified the following:
> > - Checksums and gpg signatures of the tar files.
> > - No binaries in source release
> > - Build from source, build image from source without errors
> > - Helm Repo works, Helm install works
> > - Run HA/python example in application mode
> > - Check licenses in source code
> >
> > Best,
> > Biao Geng
> >
> >
> > > On Thu, Oct 6, 2022 at 20:16, Maximilian Michels wrote:
> >
> > > Turns out the issue with the Helm installation was that I was using
> > > cert-manager 1.9.1 instead of the recommended version 1.8.2. The
> operator
> > > now deploys cleanly in my local environment.
> > >
> > > On Thu, Oct 6, 2022 at 12:34 PM Maximilian Michels 
> > wrote:
> > >
> > > > +1 (binding) because the source release looks good.
> > > >
> > > > I've verified the following:
> > > >
> > > > 1. Downloaded, compiled, and verified the signature of the source
> > release
> > > > staged at
> > > >
> > >
> >
> https://dist.apache.org/repos/dist/dev/flink/flink-kubernetes-operator-1.2.0-rc2/
> > > > 2. Verified licenses (Not a blocker: the LICENSE file does not
> contain
> > a
> > > > link to the bundled licenses directory, this should be fixed in
> future
> > > > releases)
> > > > 3. Verified the Helm Chart
> > > >
> > > > The Helm chart installation resulted in the following pod events:
> > > >
> > > > Events:
> > > >   Type     Reason       Age                   From               Message
> > > >   ----     ------       ----                  ----               -------
> > > >   Normal   Scheduled    5m42s                 default-scheduler  Successfully assigned default/flink-kubernetes-operator-54fcd9df98-645rf to docker-desktop
> > > >   Warning  FailedMount  3m39s                 kubelet            Unable to attach or mount volumes: unmounted volumes=[keystore], unattached volumes=[kube-api-access-pdnzw keystore flink-operator-config-volume]: timed out waiting for the condition
> > > >   Warning  FailedMount  92s (x10 over 5m42s)  kubelet            MountVolume.SetUp failed for volume "keystore": secret "webhook-server-cert" not found
> > > >   Warning  FailedMount  84s                   kubelet            Unable to attach or mount volumes: unmounted volumes=[keystore], unattached volumes=[flink-operator-config-volume kube-api-access-pdnzw keystore]: timed out waiting for the condition
> > > >
> > > > Do we need to list any additional steps in the docs?
> > > >
> > >
> >
> https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main/docs/try-flink-kubernetes-operator/quick-start/
> > > >
> > > > -Max
> > > >
> > > > On Wed, Oct 5, 2022 at 2:16 PM Őrhidi Mátyás <
> matyas.orh...@gmail.com>
> > > > wrote:
> > > >
> > > >> +1 (non-binding)
> > > >>
> > > >> - Verified source distributions (except the licenses and maven
> > > artifacts)
> > > >> - Verified Helm chart and Docker image
> > > >> - Verified basic examples
> > > >>
> > > >> Everything seems okay to me.
> > > >>
> > > >> Cheers,
> > > >> Matyas
> > > >>
> > > >> On Tue, Oct 4, 2022 at 10:27 PM Gyula Fóra 
> > > wrote:
> > > >>
> > > >> > +1 (binding)
> > > >> >
> > > >> > - Verified Helm repo works as expected, points to correct image
> tag,
> > > >> build,
> > > >> > version
> > > >> > - Verified examples + checked operator logs everything looks as
> > > expected
> > > >> > - Verified hashes, signatures and source release contains no
> > binaries
> > > >> > - Ran built-in tests, built jars + docker image from source
> > > successfully
> > > >> >
> > > >> > Cheers,
> > > >> > Gyula
> > > >> >
> > > >> > On Sat, Oct 1, 2022 at 2:27 AM Jim Busche 
> > wrote:
> > > >> >
> > > >> > > +1 (not-binding)
> > > >> > >
> > > >> > > Thank you Gyula,
> > > >> > >
> > > >> > >
> > > >> > > Helm install from flink-kubernetes-operator-1.2.0-helm.tgz looks
> > > good,
> > > >> > > logs look normal
> > > >> > >
> > > >> > > podman Dockerfile build from source looks good.
> > > >> > >
> > > >> > > twistlock security scans of the proposed image look good:
> > > >> > > ghcr.io/apache/flink-kubernetes-operator:95128bf
> > > >> > >
> > > >> > > UI and basic sample look good.
> > > >> > >
> > > >> > > Checksums looked good.
> > > >> > >
> > > >> > > Tested on OpenShift 4.10.25.  Will try additional versions (4.8
> > and
> > > >> 4.11)
> > > >> > > if I get an opportunity, but I don't expect issues.
> > > >> > >
> > > >> > >
> > > >> > >
> > > >> > > Thank you,
> > > >> > >
> > > >> > > James Busche
> > > >> > >
> > > >> >
> > > >>
> > > >
> > >
> >
>


[jira] [Created] (FLINK-29497) Provide an option to publish the flink-dist jar file artifact

2022-10-03 Thread Thomas Weise (Jira)
Thomas Weise created FLINK-29497:


 Summary: Provide an option to publish the flink-dist jar file 
artifact
 Key: FLINK-29497
 URL: https://issues.apache.org/jira/browse/FLINK-29497
 Project: Flink
  Issue Type: Improvement
  Components: Build System
Affects Versions: 1.16.0
Reporter: Thomas Weise
Assignee: Thomas Weise


Currently, deployment is skipped for the flink-dist jar file. Instead of 
hardcoding that in pom.xml, use a property so that this behavior can be 
controlled from the Maven command line.
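
For example (the property name here is illustrative, not the final one), the deploy plugin's skip flag could be bound to a property defaulting to true, so publishing can be enabled on demand with: mvn clean deploy -Dflink-dist.skip-deploy=false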

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] FLIP-258 Guarantee binary compatibility for Public/-Evolving APIs between patch releases​

2022-09-07 Thread Thomas Weise
+1


On Wed, Sep 7, 2022 at 4:48 AM Danny Cranmer 
wrote:

> +1
>
> On Wed, 7 Sept 2022, 07:32 Zhu Zhu,  wrote:
>
> > +1
> >
> > Thanks,
> > Zhu
> >
> > > On Tue, Sep 6, 2022 at 19:49, Jingsong Li wrote:
> > >
> > > +1
> > >
> > > On Tue, Sep 6, 2022 at 7:11 PM Yu Li  wrote:
> > > >
> > > > +1
> > > >
> > > > Thanks for the efforts, Chesnay
> > > >
> > > > Best Regards,
> > > > Yu
> > > >
> > > >
> > > > On Tue, 6 Sept 2022 at 18:17, Martijn Visser <
> martijnvis...@apache.org
> > >
> > > > wrote:
> > > >
> > > > > +1 (binding)
> > > > >
> > > > > On Tue, Sep 6, 2022 at 11:59, Xingbo Huang <
> hxbks...@gmail.com
> > > wrote:
> > > > >
> > > > > > Thanks Chesnay for driving this,
> > > > > >
> > > > > > +1
> > > > > >
> > > > > > Best,
> > > > > > Xingbo
> > > > > >
> > > > > > > On Tue, Sep 6, 2022 at 17:57, Xintong Song wrote:
> > > > > >
> > > > > > > +1
> > > > > > >
> > > > > > > Best,
> > > > > > >
> > > > > > > Xintong
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Tue, Sep 6, 2022 at 5:55 PM Konstantin Knauf <
> > kna...@apache.org>
> > > > > > wrote:
> > > > > > >
> > > > > > > > +1. Thanks, Chesnay.
> > > > > > > >
> > > > > > > > On Tue, Sep 6, 2022 at 11:51, Chesnay Schepler <
> > > > > > > > ches...@apache.org> wrote:
> > > > > > > >
> > > > > > > > > Since no one objected in the discuss thread, let's vote!
> > > > > > > > >
> > > > > > > > > FLIP:
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=225152857
> > > > > > > > >
> > > > > > > > > The vote will be open for at least 72h.
> > > > > > > > >
> > > > > > > > > Regards,
> > > > > > > > > Chesnay
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > https://twitter.com/snntrable
> > > > > > > > https://github.com/knaufk
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> >
>


Re: [DISCUSS] Switch docker image base to Eclipse Temurin

2022-09-01 Thread Thomas Weise
+1, the change to Ubuntu (hopefully) also reduces the ripple effect for
downstream customizers of the image.


On Thu, Sep 1, 2022 at 10:00 AM Chesnay Schepler  wrote:

> Unless anyone objects I will announce the switch on Monday via the
> mailing lists / twitter and execute it on Wednesday.
>
> On 01/09/2022 14:27, Chesnay Schepler wrote:
> > The e2e tests have passed successfully for the updated
> > 1.14/1.15/master images.
> >
> > On 01/09/2022 11:05, Chesnay Schepler wrote:
> >> Thanks Xingbo. Should've known that the Flink side relies on the
> >> distro name, sorry for the inconvenience.
> >>
> >> On 01/09/2022 06:55, Xingbo Huang wrote:
> >>> Thanks Chesnay for driving this. I found a problem with image name
> >>> change[1] and I have created a PR[2] to fix it.
> >>>
> >>> Best,
> >>> Xingbo
> >>>
> >>> [1] https://issues.apache.org/jira/browse/FLINK-29161
> >>> [2] https://github.com/apache/flink/pull/20726
> >>>
> >>> On Wed, Aug 31, 2022 at 17:15, Chesnay Schepler wrote:
> >>>
>  I will optimistically merge the PRs that make the switch so we can
>  gather some e2e testing data.
> 
>  On 30/08/2022 14:51, Chesnay Schepler wrote:
> > yes, alpine would have similar issues as CentOS. As for usability,
> > from personal experience it has always been a bit of a drag to extend
> > or use manually because it is such a minimal image.
> >
> > On 30/08/2022 14:45, Matthias Pohl wrote:
> >> Thanks for bringing this up, Chesnay. Can you elaborate a bit
> >> more on
> >> what
> >> you mean when referring to Alpine as being "not as user-friendly"?
> >> Doesn't
> >> it come with the same issue that switching to CentOS comes with
> >> that we
> >> have to update our scripts (I'm thinking of apt in particular)?
> >> Or what
> >> else did you have in mind in terms of user-friendliness? I would
> >> imagine
> >> selecting the required packages would be a bit more tedious.
> >>
> >> I'm wondering whether we considered the security aspect. A more
> >> minimal
> >> Alpine base image might reduce the risk of running into CVEs. But
> >> then,
> >> it's also a question of how fast those CVEs are actually fixed on
> >> average
> >> (now comparing Ubuntu and Alpine for instance). Or is this even a
> >> concern
> >> for us?
> >>
> >> I didn't find any clear answers around that topic with a quick
> >> Google
> >> search. [1] was kind of interesting to read.
> >>
> >> Anyway, I definitely see the benefits of just switching to Ubuntu
> >> due to
> >> the fact that it also relies on Debian's package management
> >> (reducing
> >> the
> >> migration effort) and that we're using it for our CI builds
> >> (consistency).
> >>
> >> +1 for going with Ubuntu if security is not a big concern
> >>
> >> Best,
> >> Matthias
> >>
> >> [1]
> >>
> 
> https://jfrog.com/knowledge-base/why-use-ubuntu-as-a-docker-base-image-when-alpine-exists/
> 
> >>
> >> On Tue, Aug 30, 2022 at 11:40 AM Chesnay Schepler
> >> 
> >> wrote:
> >>
> >>> Hello,
> >>>
> >>> during the release of the 1.15.2 images it was
> >>> noted that we use the openjdk:8/11 images, which have been
> >>> deprecated and thus no
> >>> longer receive any updates.
> >>>
> >>> There are a number of alternatives, the most promising being Eclipse
> >>> Temurin, the successor of
> >>> AdoptOpenJDK, since it's vendor neutral.
> >>>
> >>> This would imply a switch of distros from Debian to most likely
> >>> Ubuntu
> >>> 22.04 (Alpine isn't as user-friendly, and CentOS is likely
> >>> incompatible
> >>> with existing images using our images as a base). We are also
> >>> running
> >>> our CI on Ubuntu, so I don't expect any issues.
> >>>
> >>> Let me know what you think.
> >>>
> >>> The required changes on our side appear to be minimal; I have
> >>> already
> >>> prepared a PR.
> >>>
> 
> >>
> >
>
>


[jira] [Created] (FLINK-29159) Revisit/harden initial deployment logic

2022-08-31 Thread Thomas Weise (Jira)
Thomas Weise created FLINK-29159:


 Summary: Revisit/harden initial deployment logic
 Key: FLINK-29159
 URL: https://issues.apache.org/jira/browse/FLINK-29159
 Project: Flink
  Issue Type: Technical Debt
  Components: Kubernetes Operator
Affects Versions: kubernetes-operator-1.1.0
Reporter: Thomas Weise


The isFirstDeployment logic was found not to work as expected for a deployment 
that had never successfully deployed (image pull error). We are probably also 
lacking test coverage for the initialSavepointPath field.

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-29109) Checkpoint path conflict with stateless upgrade mode

2022-08-25 Thread Thomas Weise (Jira)
Thomas Weise created FLINK-29109:


 Summary: Checkpoint path conflict with stateless upgrade mode
 Key: FLINK-29109
 URL: https://issues.apache.org/jira/browse/FLINK-29109
 Project: Flink
  Issue Type: Bug
  Components: Kubernetes Operator
Affects Versions: kubernetes-operator-1.1.0
Reporter: Thomas Weise
Assignee: Thomas Weise


A stateful job with stateless upgrade mode (yes, there are such use cases) 
fails with a checkpoint path conflict due to the constant jobId and FLINK-19358 
(applies to Flink < 1.16.x). Since the checkpoint id resets on restart with 
stateless upgrade mode, the job will write to previously used locations and 
fail. The workaround is to rotate the jobId on every redeploy when the upgrade 
mode is stateless. While this can be done externally, it is best done in the 
operator itself, because reconciliation knows when a restart is actually 
required, whereas rotating the jobId externally may trigger unnecessary 
restarts.
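
For illustration, a sketch of the rotation in the operator; the config key used to pin the job id is an assumption (Flink uses an internal option for this) and should be verified:

    // Rotate the job id on redeploy for stateless upgrade mode so that the new
    // run cannot collide with checkpoint paths written by the previous run.
    void maybeRotateJobId(FlinkDeployment deployment) {
        if (deployment.getSpec().getJob().getUpgradeMode() == UpgradeMode.STATELESS) {
            deployment.getSpec().getFlinkConfiguration()
                    .put("$internal.pipeline.job-id", new JobID().toHexString()); // assumed key
        }
    }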



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-29100) Deployment with last-state upgrade mode stuck after initial error

2022-08-24 Thread Thomas Weise (Jira)
Thomas Weise created FLINK-29100:


 Summary: Deployment with last-state upgrade mode stuck after 
initial error
 Key: FLINK-29100
 URL: https://issues.apache.org/jira/browse/FLINK-29100
 Project: Flink
  Issue Type: Bug
  Components: Kubernetes Operator
Affects Versions: kubernetes-operator-1.1.0
Reporter: Thomas Weise
Assignee: Thomas Weise


A deployment with last-state upgrade mode that never succeeds will be stuck in 
the deploying state because no HA data exists. This can be reproduced by creating 
a deployment with an invalid image or an exception in the entry point. An update 
to the CR that corrects the issue won't be reconciled due to 
"o.a.f.k.o.r.d.ApplicationReconciler [INFO ] 
[default.basic-checkpoint-ha-example] Job is not running yet and HA metadata is 
not available, waiting for upgradeable state". This forces manual intervention 
to delete the CR.

Instead, the operator should check whether this is the initial deployment and, 
if so, skip the HA metadata check.
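
A sketch of the proposed guard in the reconciler (method names are illustrative, not the actual operator code):

    void reconcileUpgrade(FlinkDeployment deployment, Configuration conf) {
        // Only block on HA metadata when a previous deployment ever succeeded;
        // for the very first deployment there is no state to lose.
        if (!isFirstDeployment(deployment) && !haMetadataAvailable(conf)) {
            log.info("Waiting for upgradeable state");
            return;
        }
        deploy(deployment.getSpec(), conf);
    }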



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] Bump Kafka to 3.2.1 for 1.16.0

2022-08-09 Thread Thomas Weise
+1 for bumping the Kafka dependency.

Flink X.Y.0 releases require thorough testing, so considering the severity
of the problem this is still good timing, even that close to the first RC.

Thanks for bringing this up.

Thomas

On Tue, Aug 9, 2022 at 7:51 AM Chesnay Schepler  wrote:

> Hello,
>
> The Kafka upgrade in 1.15.0 resulted in a regression
> (https://issues.apache.org/jira/browse/FLINK-28060) where offsets are
> not committed to Kafka, impeding monitoring and the starting offsets
> functionality of the connector.
>
> This was fixed about a week ago in Kafka 3.2.1.
>
> The question is whether we want to upgrade Kafka so close to the feature
> freeze. I'm usually not a fan of doing that in general, but in this
> case there is a specific issue we'd like to get fixed and we still have
> the entire duration of the feature freeze to observe the behavior.
>
> I'd like to know what you think about this.
>
> For reference, our current Kafka version is 3.1.1, and our CI is passing
> with 3.2.1.
>
>
>


Re: [VOTE] FLIP-217: Support watermark alignment of source splits

2022-08-03 Thread Thomas Weise
+1 (binding)


On Sun, Jul 31, 2022 at 10:57 PM Sebastian Mattheis 
wrote:

> Hi everyone,
>
> I would like to start the vote for FLIP-217 [1]. Thanks for your feedback
> and the discussion in [2].
>
> FLIP-217 is a follow-up on FLIP-182 [3] and adds support for watermark
> alignment of source splits.
>
> The poll will be open until August 4th, 8.00AM GMT (72h) unless there is a
> binding veto or an insufficient number of votes.
>
> Best regards,
> Sebastian
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-217+Support+watermark+alignment+of+source+splits
> [2] https://lists.apache.org/thread/4qwkcr3y1hrnlm2h9d69ofb4vo1lprvr
> [3]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-182%3A+Support+watermark+alignment+of+FLIP-27+Sources
>


Re: [DISCUSS] FLIP-217 Support watermark alignment of source splits

2022-07-26 Thread Thomas Weise
Hi Sebastian,

Thank you for updating the FLIP page. It looks good and I think you
can start a VOTE.

Thomas

On Tue, Jul 26, 2022 at 10:57 AM Sebastian Mattheis
 wrote:
>
> Hi everybody,
>
> I have updated FLIP-217 [1] and have implemented the respective changes in
> [2]. Please review. If there are no concerns, I would initiate the voting
> on Thursday.
>
> Best regards,
> Sebastian
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-217+Support+watermark+alignment+of+source+splits
> [2] https://github.com/smattheis/flink/tree/flip-217-split-wm-alignment
>
> On Mon, Jul 25, 2022 at 9:19 AM Piotr Nowojski  wrote:
>
> > Thanks for the update Sebastian :)
> >
> > Best,
> > Piotrek
> >
> > On Mon, Jul 25, 2022 at 08:12, Sebastian Mattheis
> > wrote:
> >
> >> Hi everybody,
> >>
> >> Last week I discussed the semantics and an implementation strategy of the
> >> configuration parameter with Piotr and did the implementation and some
> >> tests this weekend.
> >>
> >> A short summary of what I discussed and recapped with Piotr:
> >>
> >>- The configuration parameter allows (and tolerates) the use of
> >>`SourceReader`s that do not implement `pauseOrResumeSplits` method. (The
> >>exception is ignored in `SourceOperator`.)
> >>- The configuration parameter allows (and tolerates) the use of
> >>`SourceSplitReader`s that do not implement `pauseOrResumeSplits` method.
> >>(The exception is ignored in the `PauseResumeSplitsTask` of the
> >>`SplitFetcher`.)
> >>
> >> In particular, this means that a `SourceReader` with two `SplitReader`s,
> >> where one does not implement `pauseOrResumeSplits` and the other does, is
> >> allowed: the reader that doesn't is still used, and the runtime will,
> >> nevertheless, still attempt to pause/resume the other. (Consequently, if the
> >> one that doesn't support pausing is ahead, it simply cannot be paused, but if
> >> the other is ahead it will be paused until watermarks are aligned.)
> >>
> >> There is one flaw that I don't really like but which I accept as the outcome
> >> of the discussion and which I will add/update in the FLIP:
> >> If any other mechanism (i.e. other than watermark alignment)
> >> attempts to pause or resume `SplitReader`s, it will have side effects
> >> and potentially unexpected behavior if one or more `SplitReader`s do not
> >> implement `pauseOrResumeSplits` and the user set the configuration
> >> parameter to allow/tolerate it for split-level watermark alignment. (The
> >> reason is simply that we cannot differentiate which mechanism attempts to
> >> pause/resume, i.e., whether it is used for watermark alignment or something else.)
> >> Given that this configuration parameter is supposed to be an intermediate
> >> fallback, this is acceptable to me, but it should be changed at the latest
> >> when some other mechanism uses pauseOrResumeSplits.
> >>
> >> As for the parameter naming, I have implemented it the following way
> >> (reason: There exists a parameter `pipeline.auto-watermark-interval`.):
> >>
> >> pipeline.watermark-alignment.allow-unaligned-source-splits (default:
> >> false)
> >>
> >> Status: I have implemented the configuration parameter (and an IT case).
> >> I still need to update the FLIP and will ping you (tomorrow or so) when I'm
> >> done with that. Please check/review my description from above if you see
> >> any problems with that.
> >>
> >> Thanks a lot and regards,
> >> Sebastian
> >>
> >>
> >> On Wed, Jul 20, 2022 at 11:24 PM Thomas Weise  wrote:
> >>
> >>> Hi Sebastian,
> >>>
> >>> Thank you for updating the FLIP and sorry for my delayed response. As
> >>> Piotr pointed out, we would need to incorporate the fallback flag into
> >>> the design to reflect the outcome of the previous discussion.
> >>>
> >>> Based on the current FLIP and as detailed by Becket, the
> >>> SourceOperator coordinates the alignment. It is responsible for the
> >>> pause/resume decision and knows how many splits are assigned.
> >>> Therefore shouldn't it have all the information needed to efficiently
> >>> handle the case of UnsupportedOperationException thrown by a reader?
> >>>
> >>> Although the fallback requires some extra implementation effort, I
> >>> think that is more than offset by not surprising users

Re: [VOTE] Apache Flink Kubernetes Operator Release 1.1.0, release candidate #1

2022-07-23 Thread Thomas Weise
+1 (binding)

* built from source archive
* run examples

Thanks,
Thomas

On Wed, Jul 20, 2022 at 5:48 AM Gyula Fóra  wrote:
>
> Hi everyone,
>
> Please review and vote on the release candidate #1 for the version 1.1.0 of
> Apache Flink Kubernetes Operator,
> as follows:
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
>
> **Release Overview**
>
> As an overview, the release consists of the following:
> a) Kubernetes Operator canonical source distribution (including the
> Dockerfile), to be deployed to the release repository at dist.apache.org
> b) Kubernetes Operator Helm Chart to be deployed to the release repository
> at dist.apache.org
> c) Maven artifacts to be deployed to the Maven Central Repository
> d) Docker image to be pushed to dockerhub
>
> **Staging Areas to Review**
>
> The staging areas containing the above mentioned artifacts are as follows,
> for your review:
> * All artifacts for a,b) can be found in the corresponding dev repository
> at dist.apache.org [1]
> * All artifacts for c) can be found at the Apache Nexus Repository [2]
> * The docker image for d) is staged on github [3]
>
> All artifacts are signed with the key
> 0B4A34ADDFFA2BB54EB720B221F06303B87DAFF1 [4]
>
> Other links for your review:
> * JIRA release notes [5]
> * source code tag "release-1.1.0-rc1" [6]
> * PR to update the website Downloads page to include Kubernetes Operator
> links [7]
>
> **Vote Duration**
>
> The voting time will run for at least 72 hours.
> It is adopted by majority approval, with at least 3 PMC affirmative votes.
>
> **Note on Verification**
>
> You can follow the basic verification guide here[8].
> Note that you don't need to verify everything yourself, but please make
> note of what you have tested together with your +- vote.
>
> Thanks,
> Gyula Fora
>
> [1]
> https://dist.apache.org/repos/dist/dev/flink/flink-kubernetes-operator-1.1.0-rc1/
> [2] https://repository.apache.org/content/repositories/orgapacheflink-1518/
> [3] ghcr.io/apache/flink-kubernetes-operator:c9dec3f
> [4] https://dist.apache.org/repos/dist/release/flink/KEYS
> [5]
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522=12351723
> [6]
> https://github.com/apache/flink-kubernetes-operator/tree/release-1.1.0-rc1
> [7] https://github.com/apache/flink-web/pull/560
> [8]
> https://cwiki.apache.org/confluence/display/FLINK/Verifying+a+Flink+Kubernetes+Operator+Release


Re: [DISCUSS] FLIP-217 Support watermark alignment of source splits

2022-07-20 Thread Thomas Weise
>>>>> previous behavior
>>>>>
>>>>> I think what we agreed in the last couple of emails was to add a
>>>>> configuration toggle, that would allow Flink 1.15 users, that are using
>>>>> watermark alignment with multiple splits per source operator, to continue
>>>>> using it with the old 1.15 semantic, even if their source doesn't support
>>>>> pausing/resuming splits. It seems to me like the current FLIP and
>>>>> implementation proposal would always throw an exception in that case?
>>>>>
>>>>> Best,
>>>>> Piotrek
>>>>>
>>>>> wt., 12 lip 2022 o 10:18 Sebastian Mattheis 
>>>>> napisał(a):
>>>>>
>>>>> > Hi all,
>>>>> >
>>>>> > I have updated FLIP-217 [1] to the proposed specification and adapted 
>>>>> > the
>>>>> > current implementation [2] respectively.
>>>>> >
>>>>> > This means both, FLIP and implementation, are ready for review from my
>>>>> > side. (I would revise commit history and messages for the final PR but 
>>>>> > left
>>>>> > it as is for now and the records of this discussion.)
>>>>> >
>>>>> > 1. Please review the updated version of FLIP-217 [1]. If there are no
>>>>> > further concerns, I would initiate the voting.
>>>>> > (2. If you want to speed up things, please also have a look into the
>>>>> > updated implementation [2].)
>>>>> >
>>>>> > Please consider the following updated specification in the current 
>>>>> > status
>>>>> > of FLIP-217 where the essence is as follows:
>>>>> >
>>>>> > 1. A method pauseOrResumeSplits is added to SourceReader with default
>>>>> > implementation that throws UnsupportedOperationException.
>>>>> > 2. A method pauseOrResumeSplits is added to SplitReader with default
>>>>> > implementation that throws UnsupportedOperationException.
>>>>> > 3. SourceOperator initiates split alignment only if more than one split 
>>>>> > is
>>>>> > assigned to the source (and, of course, only if withSplitAlignment is 
>>>>> > used).
>>>>> > 4. There is NO "supportsPauseOrResumeSplits" method at any place (to
>>>>> > indicate if the implementation supports pause/resume capabilities).
>>>>> > 5. There is NO configuration option to enable watermark alignment of
>>>>> > splits or disable pause/resume capabilities.
>>>>> >
>>>>> > *Note:* If the SourceReader or some SplitReader do not override
>>>>> > pauseOrResumeSplits but it is required in the application, an exception 
>>>>> > is
>>>>> > thrown at runtime when a split alignment attempt is executed (not 
>>>>> > during
>>>>> > startup or any time earlier).
>>>>> >
>>>>> > Also, I have revised the compatibility/migration section to describe a 
>>>>> > bit
>>>>> > of a rationale for the default implementation with exception throwing
>>>>> > behavior.
>>>>> >
>>>>> > Regards,
>>>>> > Sebastian
>>>>> >
>>>>> > [1]
>>>>> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-217+Support+watermark+alignment+of+source+splits
>>>>> > [2] https://github.com/smattheis/flink/tree/flip-217-split-wm-alignment
>>>>> >
>>>>> > On Mon, Jul 4, 2022 at 3:59 AM Thomas Weise  wrote:
>>>>> >
>>>>> >> Hi,
>>>>> >>
>>>>> >> Thank you Becket and Piotr for ironing out the "case 2" behavior.
>>>>> >> Strictly speaking we are introducing a regression by allowing an
>>>>> >> exception to bubble up that did not exist in the previous release,
>>>>> >> regardless how suboptimal the behavior may be. However, given that the
>>>>> >> feature is still experimental and we are planning to have a
>>>>> >> configuration based way to revert to the previous behavior, I think
>>>>> >> this is a good solution.
>>>>> >>
>>>>> >> +1 from my side
>>>>> >>
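
To make the default-implementation pattern from the updated FLIP-217 spec
above concrete, here is a minimal Java sketch. It is illustrative only: the
real SplitReader interface has more methods and type parameters, and the
exact signature below is an assumption based on the discussion, not the
final Flink API.

    import java.util.Collection;

    // Trimmed stand-in for the proposed SplitReader change: the default
    // implementation throws, so a reader that does not support
    // pause/resume fails only when an alignment attempt actually happens
    // at runtime (see the note in the spec above), not at startup.
    interface SplitReader<SplitT> {

        default void pauseOrResumeSplits(
                Collection<SplitT> splitsToPause,
                Collection<SplitT> splitsToResume) {
            throw new UnsupportedOperationException(
                    "This split reader does not support pausing or resuming splits.");
        }
    }

A source that supports alignment overrides the method; everything else
compiles and runs unchanged until split alignment is actually attempted.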

Re: Python Job Support for the Kubernetes Operator

2022-07-06 Thread Thomas Weise
Since SQL or Python are essentially just examples of how to use the
operator vs. features of the operator itself, they should not affect
the release schedule and can be added anytime, as examples to the
operator or elsewhere.

Thanks,
Thomas

On Wed, Jul 6, 2022 at 8:33 AM Gyula Fóra  wrote:
>
> Hi All!
>
> One thing we could do already now is to add a simple example on how to
> execute Python jobs like java jobs (with the right main class, args etc).
>
> It would be similar to
> https://github.com/apache/flink-kubernetes-operator/tree/main/examples/flink-sql-runner-example
> but
> slightly simpler as we don't need a maven module most likely.
>
> Unfortunately I cannot do it myself as @Geng Biao  
> pointed
> out that Flink Python on M1 macbook is unsupported, so I cannot really test
> this locally.
>
> Cheers,
> Gyula
>
> On Wed, Jul 6, 2022 at 4:56 AM Dian Fu  wrote:
>
> > Thanks for the confirmation Matyas!
> >
> > On Tue, Jul 5, 2022 at 3:00 PM Őrhidi Mátyás 
> > wrote:
> >
> > > Yes, this is the plan Dian. Appreciate your assistance!
> > >
> > > Best,
> > > Matyas
> > >
> > > On Tue, Jul 5, 2022 at 8:55 AM Dian Fu  wrote:
> > >
> > >> Hi Matyas,
> > >>
> > >> According to the release schedule defined in [1], it seems that the
> > >> feature freeze of v1.2 may occur at the beginning of September, is this
> > >> correct? If this is the case, I think it should be reasonable to make
> > it in
> > >> v1.2 for Python support.
> > >>
> > >> Regards,
> > >> Dian
> > >>
> > >> [1]
> > >>
> > https://cwiki.apache.org/confluence/display/FLINK/Release+Schedule+and+Planning
> > >>
> > >> On Tue, Jul 5, 2022 at 2:10 PM Őrhidi Mátyás 
> > >> wrote:
> > >>
> > >>> Both sql and py support are requested frequently. I guess we should aim
> > >>> to support both in v1.2.
> > >>>
> > >>> Matyas
> > >>>
> > >>> On Tue, Jul 5, 2022 at 6:26 AM Gyula Fóra 
> > wrote:
> > >>>
> >  Thank you for the info and help Dian :)
> > 
> >  Gyula
> > 
> >  On Tue, 5 Jul 2022 at 05:14, Yang Wang  wrote:
> > 
> >  > Thanks Dian for the confirmation and nice help.
> >  >
> >  > Best,
> >  > Yang
> >  >
> >  > On Tue, Jul 5, 2022 at 09:27 Dian Fu  wrote:
> >  >
> >  > > @Yang, Yes, you are right. Python jobs could be seen as special
> > JAR
> >  jobs
> >  > > whose main class is always
> >  `org.apache.flink.client.python.PythonDriver`.
> >  > > What we could do in Flink K8s operator is to make it more
> >  convenient and
> >  > > handle properly for the different kinds of dependencies[1].
> >  > >
> >  > > @Gyula, I can help on this. I will find some time to investigate
> >  this in
> >  > > the following days and will let you know when there is any
> > progress.
> >  > >
> >  > > Regards,
> >  > > Dian
> >  > >
> >  > > [1]
> >  > >
> >  >
> > 
> > https://nightlies.apache.org/flink/flink-docs-master/docs/dev/python/dependency_management/
> >  > >
> >  > > On Mon, Jul 4, 2022 at 11:52 AM Yang Wang 
> >  wrote:
> >  > >
> >  > >> AFAIK, the python job could be considered as a special case of
> > jar
> >  job.
> >  > >> The user jar is flink-python-*.jar and is located in the opt
> >  directory.
> >  > >> The python script is just the argument of this user jar. So I
> >  believe
> >  > the
> >  > >> users already could submit python jobs via Flink Kubernetes
> >  operator.
> >  > >> However, they need some manual operations, including specifying
> >  > >> the user jar, downloading the python script via an init container,
> >  > >> etc.
> >  > >>
> >  > >> What we could do in the Flink kubernetes operator is to make the
> >  > >> submission more convenient by introducing a new field(e.g.
> >  pyScript).
> >  > >>
> >  > >> cc @Dian Fu   @biaoge...@gmail.com
> >  > >>  WDYT?
> >  > >>
> >  > >> Best,
> >  > >> Yang
> >  > >>
> >  > >> Gyula Fóra  于2022年7月4日周一 00:12写道:
> >  > >>
> >  > >>> Hi Devs!
> >  > >>>
> >  > >>> Would anyone with a good understanding of the Python execution
> >  layer be
> >  > >>> interested in working on adding Python job support for the Flink
> >  > >>> Kubernetes
> >  > >>> Operator?
> >  > >>>
> >  > >>> This is a feature request that comes up often (
> >  > >>> https://issues.apache.org/jira/browse/FLINK-28364) and it would
> >  be a
> >  > >>> great
> >  > >>> way to fill some missing feature gaps on the operator :)
> >  > >>>
> >  > >>> I am of course happy to help or work together with someone on
> >  this but
> >  > I
> >  > >>> have zero experience with the Python API at this stage and don't
> >  want
> >  > to
> >  > >>> miss some obvious requirements.
> >  > >>>
> >  > >>> Cheers,
> >  > >>> Gyula
> >  > >>>
> >  > >>
> >  >
> > 
> > >>>
> >
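
As a concrete reading of Yang Wang's and Dian Fu's point above (a Python job
is effectively a JAR job whose main class is
org.apache.flink.client.python.PythonDriver), a hypothetical translation step
could look like the Java sketch below. Only the main class name comes from
the thread; the type names, field layout and the "-py" argument convention
are assumptions for illustration and should be checked against the Flink
version in use.

    // Hypothetical sketch: turn a "pyScript" style spec into an ordinary
    // JAR-job submission. Not actual operator code.
    final class PythonJobTranslator {

        static final String PYTHON_DRIVER =
                "org.apache.flink.client.python.PythonDriver";

        record JarJob(String jarUri, String entryClass, String[] args) {}

        static JarJob toJarJob(String flinkPythonJarUri, String pyScript) {
            // Assumed argument convention for passing the script.
            return new JarJob(flinkPythonJarUri, PYTHON_DRIVER,
                    new String[] {"-py", pyScript});
        }
    }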


Re: [DISCUSS] Contribution of Multi Cluster Kafka Source

2022-06-30 Thread Thomas Weise
Hi Mason,

I added mason6345 to the Flink confluence space, you should be able to
add a FLIP now.

Looking forward to the contribution!

Thomas

On Thu, Jun 30, 2022 at 9:25 AM Martijn Visser  wrote:
>
> Hi Mason,
>
> I'm sure there's a PMC (*hint*) out there who can grant you access to
> create a FLIP. Looking forward to it, this sounds like an improvement that
> users are looking forward to.
>
> Best regards,
>
> Martijn
>
> Op di 28 jun. 2022 om 09:21 schreef Mason Chen :
>
> > Hi all,
> >
> > Thanks for the feedback! I'm adding the users, who responded in the user
> > mailing list, to this thread.
> >
> > @Qingsheng - Yes, I would prefer to reuse the existing Kafka connector
> > module. It makes a lot of sense since the dependencies are the same and the
> > implementation can also extend and improve some of the test utilities you
> > have been working on for the FLIP 27 Kafka Source. I will enumerate the
> > migration steps in the FLIP template.
> >
> > @Ryan - I don't have a public branch available yet, but I would appreciate
> > your review on the FLIP design! When the FLIP design is approved by devs
> > and the community, I can start to commit our implementation to a fork.
> >
> > @Andrew - Yup, one of the requirements of the connector is to read
> > multiple clusters within a single source, so it should be able to work well
> > with your use case.
> >
> > @Devs - what do I need to get started on the FLIP design? I see the FLIP
> > template and I have an account (mason6345), but I don't have access to
> > create a page.
> >
> > Best,
> > Mason
> >
> >
> >
> >
> > On Sun, Jun 26, 2022 at 8:08 PM Qingsheng Ren  wrote:
> >
> >> Hi Mason,
> >>
> >> It sounds like an exciting enhancement to the Kafka source and will
> >> benefit a lot of users I believe.
> >>
> >> Would you prefer to reuse the existing flink-connector-kafka module or
> >> create a new one for the new multi-cluster feature? Personally I prefer the
> >> former one because users won’t need to introduce another dependency module
> >> to their projects in order to use the feature.
> >>
> >> Thanks for the effort on this and looking forward to your FLIP!
> >>
> >> Best,
> >> Qingsheng
> >>
> >> > On Jun 24, 2022, at 09:43, Mason Chen  wrote:
> >> >
> >> > Hi community,
> >> >
> >> > We have been working on a Multi Cluster Kafka Source and are looking to
> >> > contribute it upstream. I've given a talk about the features and design
> >> at
> >> > a Flink meetup: https://youtu.be/H1SYOuLcUTI.
> >> >
> >> > The main features that it provides is:
> >> > 1. Reading multiple Kafka clusters within a single source.
> >> > 2. Adjusting the clusters and topics the source consumes from
> >> dynamically,
> >> > without Flink job restart.
> >> >
> >> > Some of the challenging use cases that these features solve are:
> >> > 1. Transparent Kafka cluster migration without Flink job restart.
> >> > 2. Transparent Kafka topic migration without Flink job restart.
> >> > 3. Direct integration with Hybrid Source.
> >> >
> >> > In addition, this is designed with wrapping and managing the existing
> >> > KafkaSource components to enable these features, so it can continue to
> >> > benefit from KafkaSource improvements and bug fixes. It can be
> >> considered
> >> > as a form of a composite source.
> >> >
> >> > I think the contribution of this source could benefit a lot of users who
> >> > have asked in the mailing list about Flink handling Kafka migrations and
> >> > removing topics in the past. I would love to hear and address your
> >> thoughts
> >> > and feedback, and if possible drive a FLIP!
> >> >
> >> > Best,
> >> > Mason
> >>
> >>
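
For readers skimming the design: the "adjust clusters and topics dynamically"
feature described above boils down to the source owning a mutable view of
per-cluster metadata. A purely illustrative Java sketch, with all names
invented (the actual design is what the FLIP specifies):

    import java.util.List;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Invented illustration of the composite-source idea: one logical
    // source tracks several Kafka clusters and their topics, and the set
    // can change at runtime without restarting the Flink job.
    final class MultiClusterTopology {

        record ClusterSpec(String bootstrapServers, List<String> topics) {}

        private final Map<String, ClusterSpec> clusters = new ConcurrentHashMap<>();

        void addOrUpdate(String clusterId, ClusterSpec spec) {
            clusters.put(clusterId, spec); // e.g. a migration target cluster
        }

        void remove(String clusterId) {
            clusters.remove(clusterId); // e.g. a decommissioned cluster
        }

        Map<String, ClusterSpec> snapshot() {
            return Map.copyOf(clusters); // what an enumerator would act on
        }
    }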


Re: [DISCUSS] FLIP-217 Support watermark alignment of source splits

2022-06-14 Thread Thomas Weise
Hi everyone,

Thank you for all the effort that went into this discussion. The split
level watermark alignment will be an important feature for Flink that
will address operational problems for various use cases. From reading
through this thread it appears that not too much remains to bring this
FLIP to acceptance and allow development to move forward. I would like
to contribute if possible.

Regarding option 1 vs. option 2: I don't have a strong preference,
perhaps slightly leaning towards option 1.

However, from a user perspective, should the split level alignment be
an opt-in feature, at least for a few releases? If yes, then we would
require a knob similar to supportsPausingSplits(), which I understand
won't be part of the revised FLIP. Such control may be beneficial:

* Compare runtime behavior with split level alignment on/off
* Allow use of sources that don't implement pausing splits yet

The second point would, from the user's perspective, be necessary for
backward compatibility? While the interface aspect and source
compatibility has been discussed in great detail, I don't think it
would be desirable if an application that already uses alignment fails
after upgrading to the new Flink version, forcing users to lock step
modify sources for the new non-optional split level alignment.

So I think clarification of the compatibility aspect on the FLIP page
would be necessary.

Thanks,
Thomas

On Mon, May 30, 2022 at 3:29 AM Piotr Nowojski  wrote:
>
> Hi Becket,
>
> Thanks for summing this up. Just one correction:
>
> > Piotr prefers option 2, his opinions are:
> >   e) It is OK that the code itself in option 2 indicates the developers
> that a feature is optional. We will rely on the documentation to correct
> that and clarify that the feature is actually obligatory.
>
> I would say based on a) and b) that feature would be still optional. So
> both the implementation and the documentation would be saying that. We
> could add a mention to the docs and release notes, that this feature will
> be obligatory in the next major release and plan such a release accordingly.
>
> Re the option 1., as you mentioned:
> > As for option 1: For developers, the feature is still optional due to the
> default implementation in the interface, regardless of what the default
> implementation does, because the code compiles without overriding these
> methods
>
> Also importantly, the code will work in most cases.
>
> > Obligatory: Jobs may fail if these methods are not implemented properly.
> e..g SourceReader#pauseOrResumeSplits(). This is a common pattern in Java,
> e.g. Iterator.remove() by default throws "UnsupportedOperationException",
> informing the implementation that things may go wrong if this method is not
> implemented.
>
> For me `Iterator#remove()` is an optional feature. Personally, I don't
> remember if I have ever implemented it.
>
> Best,
> Piotrek
>
> On Fri, 27 May 2022 at 10:14 Becket Qin  wrote:
>
> > I had an offline discussion with Piotr and here is the summary. Please
> > correct me if I miss something, Piotr.
> >
> > There are two things we would like to seek more opinions from the
> > community, so we can make progress on this FLIP.
> >
> > 1. The General pattern to add obligatory features to existing interfaces.
> >
> > ***
> > For interfaces exposed to the developers for implementation, they are
> > either intended to be *optional* or *obligatory. *While it is quite clear
> > about how to convey that intention when creating the interfaces, it is not
> > as commonly agreed when we are adding new features to an existing
> > interface. In general, Flink uses decorative interfaces when adding
> > optional features to existing interfaces. Both Piotr and I agree that looks
> > good.
> >
> > Different opinions are mainly about how to add obligatory features to the
> > existing interfaces, probably due to different understandings of
> > "obligatory".
> >
> > We have discussed about four options:
> >
> > *Option 1:*
> >
> >- Just add a new method to the existing interface.
> >- For backwards compatibility, the method would have a default
> >implementation throwing "UnsupportedOperationException".
> >- In the next major version, remove the default implementation.
> >- For the developers, any method with a default implementation
> >throwing an "UnsupportedOperationException" should be taken as 
> > obligatory.
> >
> > *Option 2:*
> >
> >- Always make the features optional by adding a decorative interface,
> >just like ordinary optional features.
> >- Inform the developers via documentation that this feature is
> >obligatory, although it looks like optional from the code.
> >- In case the developers did not implement the decorative interface,
> >throw an exception
> >- In the next major version, move the methods in the decorative
> >interface to the base interface, and 
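
To make the two options easier to compare, here is a self-contained Java
sketch of both patterns. The Reader and PausableReader names are invented
for the example; they are not Flink interfaces.

    // Option 1: add the method to the base interface with a default that
    // throws; overriding it is "obligatory" by convention, in the spirit
    // of the Iterator.remove() example discussed above.
    interface Reader {
        void read();

        default void pause() {
            throw new UnsupportedOperationException("pause is not implemented");
        }
    }

    // Option 2: keep the base interface untouched and add a decorative
    // interface that the framework detects at runtime.
    interface PausableReader {
        void pause();
    }

    final class Framework {
        static void pauseIfPossible(Reader reader) {
            if (reader instanceof PausableReader) {
                ((PausableReader) reader).pause();
            } else {
                // The "throw an exception" step from Option 2 above.
                throw new UnsupportedOperationException(
                        "Reader does not implement PausableReader");
            }
        }
    }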

Re: [DISCUSS] Next Flink Kubernetes Operator release timeline

2022-05-18 Thread Thomas Weise
I think before we release 1.0, we need to define and document the
compatibility guarantees.

At the moment, the CR changes frequently and as was pointed out
earlier, there isn't any automation to ensure changes are backward
compatible. In addition, our documentation still refers to upgrade as
a process that involves removing the prior CRD, which IMO needs to
change for a 1.0 release.

If we feel that we are not ready to put a compatibility guarantee in
place, then perhaps release the next version as 0.2?

Thanks,
Thomas


[1] 
https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main/docs/operations/upgrade/

On Mon, May 16, 2022 at 5:13 PM Aitozi  wrote:
>
> Thanks Gyula. It looks good to me. I could do a favor during the release
> also.
> Please feel free to ping me to help the doc, release and test work :)
>
> Best,
> Aitozi
>
> On Mon, May 16, 2022 at 21:57 Yang Wang  wrote:
>
> > Thanks Gyula for sharing the progress. It is very likely we could have the
> > first release candidate next Monday.
> >
> > Best,
> > Yang
> >
> > On Mon, May 16, 2022 at 20:50 Gyula Fóra  wrote:
> >
> > > Hi Devs!
> > >
> > > We are on track for our planned 1.0.0 release timeline. There are no
> > > outstanding blocker issues on JIRA for the release.
> > >
> > > There are 3 outstanding new feature PRs. They are all in pretty good
> > shape
> > > and should be merged within a day:
> > > https://github.com/apache/flink-kubernetes-operator/pull/213
> > > https://github.com/apache/flink-kubernetes-operator/pull/216
> > > https://github.com/apache/flink-kubernetes-operator/pull/217
> > >
> > > As we agreed previously we should not merge any more new features for
> > > 1.0.0 and focus our efforts on testing, bug fixes and documentation for
> > > this week.
> > >
> > > I will cut the release branch tomorrow once these PRs are merged. And the
> > > target day for the first release candidate is next Monday.
> > >
> > > The release managers for this release will be Yang Wang and myself.
> > >
> > > Cheers,
> > > Gyula
> > >
> > > On Wed, Apr 27, 2022 at 11:28 AM Yang Wang 
> > wrote:
> > >
> > >> Thanks @Chesnay Schepler  for pointing out this.
> > >>
> > >> The only public interface the flink-kubernetes-operator provides is the
> > >> CRD[1]. We are trying to stabilize the CRD from v1beta1.
> > >> If more fields are introduced to support new features(e.g. standalone
> > >> mode,
> > >> SQL jobs), they should have the default value to ensure compatibility.
> > >> Currently, we do not have any tools to enforce the compatibility
> > >> guarantees. But we have created a ticket [2] to follow this and hope it
> > >> could be resolved before releasing 1.0.0.
> > >>
> > >> Just as you said, now is also a good time to think more about the
> > approach
> > >> of releases. Since flink-kubernetes-operator is much simpler than Flink,
> > >> we
> > >> could have a shorter release cycle.
> > >> Two months for a major release (1.0, 1.1, etc.) is reasonable to me.
> > >> And this could be shortened for the minor releases. Also we need to
> > >> support at least the last two major versions.
> > >>
> > >> Maybe the standalone mode support is a big enough feature for version
> > 2.0.
> > >>
> > >>
> > >> [1].
> > >>
> > >>
> > https://github.com/apache/flink-kubernetes-operator/tree/main/helm/flink-kubernetes-operator/crds
> > >> [2]. https://issues.apache.org/jira/browse/FLINK-26955
> > >>
> > >>
> > >> @Hao t Chang  We do not have regular sync up
> > meeting
> > >> so
> > >> far. But I think we could schedule some sync up for the 1.0.0 release if
> > >> necessary. Anyone who is interested are welcome.
> > >>
> > >>
> > >> Best,
> > >> Yang
> > >>
> > >>
> > >>
> > >>
> > >> On Wed, Apr 27, 2022 at 07:45 Hao t Chang  wrote:
> > >>
> > >> > Hi Gyula,
> > >> >
> > >> > Thanks for the release timeline information. I would like to learn the
> > >> > gathered knowledge and volunteer as well. Will there be sync up
> > >> > meeting/call for this collaboration ?
> > >> >
> > >> > From: Gyula Fóra 
> > >> > Date: Monday, April 25, 2022 at 11:22 AM
> > >> > To: dev 
> > >> > Subject: [DISCUSS] Next Flink Kubernetes Operator release timeline
> > >> > Hi Devs!
> > >> >
> > >> > The community has been working hard on cleaning up the operator logic
> > >> and
> > >> > adding some core features that have been missing from the preview
> > >> release
> > >> > (session jobs for example). We have also added some significant
> > >> > improvements around deployment/operations.
> > >> >
> > >> > With the current pace of the development I think in a few weeks we
> > >> should
> > >> > be in a good position to release next version of the operator. This
> > >> would
> > >> > also give us the opportunity to add support for the upcoming 1.15
> > >> release
> > >> > :)
> > >> >
> > >> > We have to decide on 2 main things:
> > >> >  1. Target release date
> > >> >  2. Release version
> > >> >
> > >> > With the current state of the project I am confident that we could
> > cut a
> > >> > really good 

Re: [DISCUSS ] HybridSouce Table & Sql api timeline

2022-05-10 Thread Thomas Weise
fyi there is a related ticket here:
https://issues.apache.org/jira/browse/FLINK-22793

On Mon, May 9, 2022 at 11:34 PM Becket Qin  wrote:
>
> Cool, I've granted you the permission.
>
> Cheers,
>
> Jiangjie (Becket) Qin
>
> On Tue, May 10, 2022 at 1:14 PM Ran Tao  wrote:
>
> > Hi, Becket. Thanks for your suggestions. My id is: Ran Tao
> > And I will draft this FLIP in a few days. Thanks~
> >
> > > On Tue, May 10, 2022 at 12:40 Becket Qin  wrote:
> >
> > > Hi Ran,
> > >
> > > The FLIP process can be found here[1].
> > >
> > > You don't need to pass the vote, in fact the vote is based on the FLIP
> > > wiki. So drafting the FLIP wiki would be the first step. After that you
> > may
> > > start a discussion thread in the mailing list so people can have the
> > > discussion about the feature based on your FLIP wiki. Note that it is
> > very
> > > important to follow the structure of the FLIP template so the discussion
> > > would be more efficient.
> > >
> > > In case you don't have the permission to add a FLIP page yet, please let
> > me
> > > know your Apache Confluence ID so I can grant you the permission.
> > >
> > > Thanks,
> > >
> > > Jiangjie (Becket) Qin
> > >
> > > [1]
> > >
> > >
> > https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals
> > >
> > > On Tue, May 10, 2022 at 12:06 PM Ran Tao  wrote:
> > >
> > > > Hi, Martijn, Jacky. Thanks for your responses. It indeed needs a
> > > > design doc or FLIP to illustrate some details and the concrete
> > > > implementation.
> > > >
> > > > And I'm glad to work on this issue. I wonder whether I can create a
> > > > FLIP under discussion first to write the draft design of the
> > > > implementation for the table & sql api and some details, or whether
> > > > we must pass a vote first?
> > > >
> > >
> >
> >
> > --
> > Best,
> > Ran Tao
> >


Re: JobManager High Availability

2022-05-10 Thread Thomas Weise
+1 to what Konstantin said. There is no real benefit running multiple
JMs on k8s, unless you need to optimize the JM startup time. Often the
time to get a replacement pod is negligible compared to the job
restart time.

Thomas

On Tue, May 10, 2022 at 2:27 AM Őrhidi Mátyás  wrote:
>
> Ah, ok. Thanks, Konstantin for the clarification, I appreciate the quick
> response.
>
> Best,
> Matyas
>
> On Tue, May 10, 2022 at 10:59 AM Konstantin Knauf  wrote:
>
> > Hi Matyas,
> >
> > yes, that's expected. The feature should have never been called "high
> > availability", but something like "Flink Jobmanager failover", because
> > that's what it is.
> >
> > With standby Jobmanagers what you gain is a faster failover, because a new
> > Jobmanager does not need to be started before restarting the Job. That's
> > all.
> >
> > Cheers,
> >
> > Konstantin
> >
> > On Tue, 10 May 2022 at 10:56, Őrhidi Mátyás <
> > matyas.orh...@gmail.com> wrote:
> >
> > > Hi Folks!
> > >
> > > I've been goofing around with the JobManager HA configs using multiple JM
> > > replicas (in the Flink Kubernetes Operator). It's working seemingly fine,
> > > however the job itself is being restarted when you kill the leader JM
> > pod.
> > > Is this expected?
> > >
> > > Thanks,
> > > Matyas
> > >
> >
> >
> > --
> > https://twitter.com/snntrable
> > https://github.com/knaufk
> >


Re: Re: [ANNOUNCE] New Flink PMC member: Yang Wang

2022-05-10 Thread Thomas Weise
Congratulations, Yang!

On Tue, May 10, 2022 at 3:15 AM Márton Balassi  wrote:
>
> Congrats, Yang. Well deserved :-)
>
> On Tue, May 10, 2022 at 9:16 AM Terry Wang  wrote:
>
> > Congrats Yang!
> >
> > On Mon, May 9, 2022 at 11:19 AM LuNing Wang  wrote:
> >
> > > Congrats Yang!
> > >
> > > Best,
> > > LuNing Wang
> > >
> > > > On Sat, May 7, 2022 at 17:21 Dian Fu  wrote:
> > >
> > > > Congrats Yang!
> > > >
> > > > Regards,
> > > > Dian
> > > >
> > > > On Sat, May 7, 2022 at 12:51 PM Jacky Lau 
> > wrote:
> > > >
> > > > > Congrats Yang and well Deserved!
> > > > >
> > > > > Best,
> > > > > Jacky Lau
> > > > >
> > > > > > On Sat, May 7, 2022 at 10:44 Yun Gao  wrote:
> > > > >
> > > > > > Congratulations Yang!
> > > > > >
> > > > > > Best,
> > > > > > Yun Gao
> > > > > >
> > > > > >
> > > > > >
> > > > > >  --Original Mail --
> > > > > > Sender:David Morávek 
> > > > > > Send Date:Sat May 7 01:05:41 2022
> > > > > > Recipients:Dev 
> > > > > > Subject:Re: [ANNOUNCE] New Flink PMC member: Yang Wang
> > > > > > Nice! Congrats Yang, well deserved! ;)
> > > > > >
> > > > > > On Fri 6. 5. 2022 at 17:53, Peter Huang <
> > huangzhenqiu0...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Congrats, Yang!
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > Best Regards
> > > > > > > Peter Huang
> > > > > > >
> > > > > > > On Fri, May 6, 2022 at 8:46 AM Yu Li  wrote:
> > > > > > >
> > > > > > > > Congrats and welcome, Yang!
> > > > > > > >
> > > > > > > > Best Regards,
> > > > > > > > Yu
> > > > > > > >
> > > > > > > >
> > > > > > > > On Fri, 6 May 2022 at 14:48, Paul Lam 
> > > > wrote:
> > > > > > > >
> > > > > > > > > Congrats, Yang! Well Deserved!
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > Paul Lam
> > > > > > > > >
> > > > > > > > > > > On May 6, 2022 at 14:38, Yun Tang  wrote:
> > > > > > > > > >
> > > > > > > > > > Congratulations, Yang!
> > > > > > > > > >
> > > > > > > > > > Best
> > > > > > > > > > Yun Tang
> > > > > > > > > > 
> > > > > > > > > > From: Jing Ge 
> > > > > > > > > > Sent: Friday, May 6, 2022 14:24
> > > > > > > > > > To: dev 
> > > > > > > > > > Subject: Re: [ANNOUNCE] New Flink PMC member: Yang Wang
> > > > > > > > > >
> > > > > > > > > > Congrats Yang and well Deserved!
> > > > > > > > > >
> > > > > > > > > > Best regards,
> > > > > > > > > > Jing
> > > > > > > > > >
> > > > > > > > > > On Fri, May 6, 2022 at 7:38 AM Lincoln Lee <
> > > > > lincoln.8...@gmail.com
> > > > > > >
> > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > >> Congratulations Yang!
> > > > > > > > > >>
> > > > > > > > > >> Best,
> > > > > > > > > >> Lincoln Lee
> > > > > > > > > >>
> > > > > > > > > >>
> > > > > > > > > >> Őrhidi Mátyás  于2022年5月6日周五
> > > 12:46写道:
> > > > > > > > > >>
> > > > > > > > > >>> Congrats Yang! Well deserved!
> > > > > > > > > >>> Best,
> > > > > > > > > >>> Matyas
> > > > > > > > > >>>
> > > > > > > > > >>> On Fri, May 6, 2022 at 5:30 AM huweihua <
> > > > > huweihua@gmail.com>
> > > > > > > > > wrote:
> > > > > > > > > >>>
> > > > > > > > >  Congratulations Yang!
> > > > > > > > > 
> > > > > > > > >  Best,
> > > > > > > > >  Weihua
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > >>>
> > > > > > > > > >>
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> >
> > --
> > Best Regards,
> > Terry Wang
> >


Re: [ANNOUNCE] Apache Flink 1.15.0 released

2022-05-05 Thread Thomas Weise
Thank you to all who contributed for making this release happen!

Thomas

On Thu, May 5, 2022 at 7:41 AM Zhu Zhu  wrote:
>
> Thanks Yun, Till and Joe for the great work and thanks everyone who
> makes this release possible!
>
> Cheers,
> Zhu
>
> > On Thu, May 5, 2022 at 21:09 Jiangang Liu  wrote:
> >
> > Congratulations! This version is really helpful for us . We will explore it
> > and help to improve it.
> >
> > Best
> > Jiangang Liu
> >
> > > On Thu, May 5, 2022 at 18:53 Yu Li  wrote:
> >
> > > Hurray!
> > >
> > > Thanks Yun Gao, Till and Joe for all the efforts as our release managers.
> > > And thanks all contributors for making this happen!
> > >
> > > Best Regards,
> > > Yu
> > >
> > >
> > > On Thu, 5 May 2022 at 18:01, Sergey Nuyanzin  wrote:
> > >
> > > > Great news!
> > > > Congratulations!
> > > > Thanks to the release managers, and everyone involved.
> > > >
> > > > On Thu, May 5, 2022 at 11:57 AM godfrey he  wrote:
> > > >
> > > > > Congratulations~
> > > > >
> > > > > Thanks Yun, Till and Joe for driving this release
> > > > > and everyone who made this release happen.
> > > > >
> > > > > Best,
> > > > > Godfrey
> > > > >
> > > > > > On Thu, May 5, 2022 at 17:39 Becket Qin  wrote:
> > > > > >
> > > > > > Hooray! Thanks Yun, Till and Joe for driving the release!
> > > > > >
> > > > > > Cheers,
> > > > > >
> > > > > > JIangjie (Becket) Qin
> > > > > >
> > > > > > On Thu, May 5, 2022 at 5:20 PM Timo Walther 
> > > > wrote:
> > > > > >
> > > > > > > It took a bit longer than usual. But I'm sure the users will love
> > > > this
> > > > > > > release.
> > > > > > >
> > > > > > > Big thanks to the release managers!
> > > > > > >
> > > > > > > Timo
> > > > > > >
> > > > > > > > On 05.05.22 at 10:45, Yuan Mei wrote:
> > > > > > > > Great!
> > > > > > > >
> > > > > > > > Thanks, Yun Gao, Till, and Joe for driving the release, and
> > > thanks
> > > > to
> > > > > > > > everyone for making this release happen!
> > > > > > > >
> > > > > > > > Best
> > > > > > > > Yuan
> > > > > > > >
> > > > > > > > On Thu, May 5, 2022 at 4:40 PM Leonard Xu 
> > > > wrote:
> > > > > > > >
> > > > > > > >> Congratulations!
> > > > > > > >>
> > > > > > > >> Thanks Yun Gao, Till and Joe for the great work as our release
> > > > > manager
> > > > > > > and
> > > > > > > >> everyone who involved.
> > > > > > > >>
> > > > > > > >> Best,
> > > > > > > >> Leonard
> > > > > > > >>
> > > > > > > >>
> > > > > > > >>
> > > > > > > >>> On May 5, 2022 at 4:30 PM, Yang Wang  wrote:
> > > > > > > >>>
> > > > > > > >>> Congratulations!
> > > > > > > >>>
> > > > > > > >>> Thanks Yun Gao, Till and Joe for driving this release and
> > > > everyone
> > > > > who
> > > > > > > >> made
> > > > > > > >>> this release happen.
> > > > > > > >>
> > > > > > >
> > > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > Best regards,
> > > > Sergey
> > > >
> > >


Re: Source alignment for Iceberg

2022-05-05 Thread Thomas Weise
>> >> >> Hi folks,
>> >> >>
>> >> >> quick input from my side. I think this is from the implementation
>> >> >> perspective what Piotr and I had in mind for a global min watermark
>> >> that
>> >> >> helps in idleness cases. See also point 3 in
>> >> >>
>> >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-180%3A+Adjust+StreamStatus+and+Idleness+definition
>> >> >> .
>> >> >>
>> >> >> Basically, we would like to empower source enumerators to determine the
>> >> >> global min watermark for all source readers factoring in even future
>> >> >> splits. Not all sources can supply that information (think of a general
>> >> >> file source) but most should be able to. Basically, Flink should know
>> >> for a
>> >> >> given source at a given point in time what the min watermark across all
>> >> >> source subtasks is.
>> >> >>
>> >> >> Here is some background:
>> >> >> In the context of idleness, we can deterministically advance the
>> >> >> watermark. In the pre-FLIP-27 era, we had heuristic approaches in
>> >> sources
>> >> >> to switch to idleness and thus allow watermarks to increase in cases
>> >> where
>> >> >> fewer splits than source tasks are available. However, for sources with
>> >> >> dynamic split discovery that actually yields incorrect results. Think
>> >> of a
>> >> >> Kinesis consumer where a shard is split. Then a previously idle source
>> >> >> subtask may receive a new split with time t0 as the lowest timestamp.
>> >> Since
>> >> >> the source subtask did not participate in the global watermark
>> >> generation
>> >> >> (because it was idle), the previously emitted watermark may be past t0
>> >> and
>> >> >> thus results in late records potentially being discarded. A rerun of
>> >> the
>> >> >> same pipeline on historic data would not render the source subtask
>> >> idle and
>> >> >> not result in late records. The solution was to not render source
>> >> subtasks
>> >> >> automatically idle by the framework if there are no spits. That leads
>> >> to
>> >> >> confusion for Kafka users with static topic subscription where #splits
>> >> <
>> >> >> #parallelism stalls pipelines because the watermark is not advancing.
>> >> Here,
>> >> >> your sketched solution can be transferred to KafkaSource to let Flink
>> >> know
>> >> >> that min global watermark on a static assignment is determined by the
>> >> >> slowest partition. Hence, all idle readers emit that min global
>> >> watermark
>> >> >> and the user sees progress.
>> >> >> This whole idea is related to FLIP-182 watermark alignment but I'd go
>> >> >> with another FLIP as the goal is quite different even though the
>> >> >> implementation overlaps.
>> >> >>
>> >> >> Now Iceberg seems to use the same information to actually pause the
>> >> >> consumption of files and create some kind of orderness guarantees as
>> >> far as
>> >> >> I understood. This probably can be applied to any source with dynamic
>> >> split
>> >> >> discovery. However, I wouldn't mix up the concepts and hence I
>> >> appreciate
>> >> >> you not chiming into the FLIP-182 and ff. threads. The goal of
>> >> FLIP-182 is
>> >> >> to pause readers while consuming a split, while your approach pauses
>> >> >> readers before processing another split. So it feels more closely
>> >> related
>> >> >> to the global min watermark - so it could either be part of that FLIP
>> >> or a
>> >> >> FLIP of its own. Afaik API changes should actually happen only on the
>> >> >> enumerator side both for your ideas and for global min watermark.
>> >> >>
>> >> >> Best,
>> >> >>
>> >> >> Arvid
>> >> >>
>> >> >> On Wed, Apr 27, 2022 at 7:31 PM Thomas Weise  wrote:
>> >> >>
>> >> >>> Hi Steven,
>> >> >>>
>> >> >>> Would it be better to bring this as a separate thread related to
>> >> Iceberg
>> >> >>> source to the dev@ list? I think this could benefit from broader
>> >> input?
>> >> >>>
>> >> >>> Thanks
>> >> >>>
>> >> >>> On Wed, Apr 27, 2022 at 9:36 AM Steven Wu 
>> >> wrote:
>> >> >>>
>> >> >>>> + Becket and Sebastian
>> >> >>>>
>> >> >>>> It is also related to the split level watermark alignment discussion
>> >> >>>> thread. Because it is already very long, I don't want to further
>> >> complicate
>> >> >>>> the ongoing discussion there. But I can move the discussion to that
>> >> >>>> existing thread if that is preferred.
>> >> >>>>
>> >> >>>>
>> >> >>>> On Tue, Apr 26, 2022 at 10:03 PM Steven Wu 
>> >> >>>> wrote:
>> >> >>>>
>> >> >>>>> Hi all,
>> >> >>>>>
>> >> >>>>> We are thinking about how to align with the Flink community and
>> >> >>>>> leverage the FLIP-182 watermark alignment in the Iceberg source. I
>> >> put some
>> >> >>>>> context in this google doc. Would love to get hear your thoughts on
>> >> this.
>> >> >>>>>
>> >> >>>>>
>> >> >>>>>
>> >> https://docs.google.com/document/d/1zfwF8e5LszazcOzmUAOeOtpM9v8dKEPlY_BRFSmI3us/edit#
>> >> >>>>>
>> >> >>>>> Thanks,
>> >> >>>>> Steven
>> >> >>>>>
>> >> >>>>
>> >>
>> >
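
The enumerator-side "global min watermark" idea in the quoted discussion can
be illustrated with a small tracker: readers report their local watermarks,
split discovery reports a lower bound for not-yet-assigned splits, and the
enumerator takes the minimum. All names below are invented for the sketch;
this is not a Flink API, and bookkeeping for split assignment and reader
failures is omitted.

    import java.util.HashMap;
    import java.util.Map;

    final class GlobalMinWatermarkTracker {

        private final Map<Integer, Long> latestPerReader = new HashMap<>();
        private long pendingSplitsLowerBound = Long.MAX_VALUE;

        // Readers report monotonically increasing local watermarks.
        void onReaderWatermark(int subtaskId, long watermark) {
            latestPerReader.merge(subtaskId, watermark, Math::max);
        }

        // Split discovery reports the smallest timestamp a future split may
        // contain (e.g. derived from file or shard metadata).
        void onSplitsDiscovered(long minTimestampOfPendingSplits) {
            pendingSplitsLowerBound =
                    Math.min(pendingSplitsLowerBound, minTimestampOfPendingSplits);
        }

        // A watermark that idle subtasks can safely emit: no future record,
        // assigned or still pending, can be earlier than this.
        long globalMinWatermark() {
            long min = pendingSplitsLowerBound;
            for (long wm : latestPerReader.values()) {
                min = Math.min(min, wm);
            }
            return min;
        }
    }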


Re: Source alignment for Iceberg

2022-05-05 Thread Thomas Weise
>> >>> >> > a. User space information (e.g. watermark) --> User space
>> >>> >> Pluggables
>> >>> >> > (e.g. SplitEnumerator) --> User space Actions (e.g. Pause reading a
>> >>> >> split).
>> >>> >> > b. Framework space information (e.g. task failure) --> User space
>> >>> >> > pluggables (e.g. SplitEnumerator) --> Framework space actions (e.g.
>> >>> exit
>> >>> >> > the job)
>> >>> >> > c. User space information (e.g. a custom workload metric) --> User
>> >>> space
>> >>> >> > pluggables (e.g. SplitEnumerator) --> User space actions (e.g.
>> >>> rebalance
>> >>> >> > the workload across the source readers).
>> >>> >> > d. Use space information (e.g. a custom stopping event in the
>> >>> stream)
>> >>> >> -->
>> >>> >> > User space pluggables (e.g. SplitEnumerator) --> Framework space
>> >>> actions
>> >>> >> > (e.g. stop the job).
>> >>> >> >
>> >>> >> > So basically for any user provided pluggables, the input
>> >>> information may
>> >>> >> > either come from another user provided logic or from the framework,
>> >>> and
>> >>> >> > after receiving that information, the pluggable may either want the
>> >>> >> > framework or another pluggable to take an action. So this gives the
>> >>> >> above 4
>> >>> >> > combinations.
>> >>> >> >
>> >>> >> > In our case, when the pluggables are SplitEnumerator and
>> >>> SourceReader,
>> >>> >> the
>> >>> >> > control flows that only involve user space actions are fully
>> >>> supported.
>> >>> >> But
>> >>> >> > it seems that when it comes to control flows involving framework
>> >>> space
>> >>> >> > information, some of the information has not been exposed to the
>> >>> >> pluggable,
>> >>> >> > and some framework actions might also be missing.
>> >>> >> >
>> >>> >> > Thanks,
>> >>> >> >
>> >>> >> > Jiangjie (Becket) Qin
>> >>> >> >
>> >>> >> > On Thu, Apr 28, 2022 at 3:44 PM Arvid Heise 
>> >>> wrote:
>> >>> >> >
>> >>> >> >> Hi folks,
>> >>> >> >>
>> >>> >> >> quick input from my side. I think this is from the implementation
>> >>> >> >> perspective what Piotr and I had in mind for a global min watermark
>> >>> >> that
>> >>> >> >> helps in idleness cases. See also point 3 in
>> >>> >> >>
>> >>> >>
>> >>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-180%3A+Adjust+StreamStatus+and+Idleness+definition
>> >>> >> >> .
>> >>> >> >>
>> >>> >> >> Basically, we would like to empower source enumerators to
>> >>> determine the
>> >>> >> >> global min watermark for all source readers factoring in even
>> >>> future
>> >>> >> >> splits. Not all sources can supply that information (think of a
>> >>> general
>> >>> >> >> file source) but most should be able to. Basically, Flink should
>> >>> know
>> >>> >> for a
>> >>> >> >> given source at a given point in time what the min watermark
>> >>> across all
>> >>> >> >> source subtasks is.
>> >>> >> >>
>> >>> >> >> Here is some background:
>> >>> >> >> In the context of idleness, we can deterministically advance the
>> >>> >> >> watermark. In the pre-FLIP-27 era, we had heuristic approaches in
>> >>> >> sources
>> >>> >> >> to switch to idleness and thus allow watermarks to increase in
>> >>> cases
>> >>> >> where
>> >>> >> >> fewer splits than source tasks are available. However, for sources
>> >>> with
>> >>> >> >> dynamic split discovery that actually yields incorrect results.
>> >>> Think
>> >>> >> of a
>> >>> >> >> Kinesis consumer where a shard is split. Then a previously idle
>> >>> source
>> >>> >> >> subtask may receive a new split with time t0 as the lowest
>> >>> timestamp.
>> >>> >> Since
>> >>> >> >> the source subtask did not participate in the global watermark
>> >>> >> generation
>> >>> >> >> (because it was idle), the previously emitted watermark may be
>> >>> past t0
>> >>> >> and
>> >>> >> >> thus results in late records potentially being discarded. A rerun
>> >>> of
>> >>> >> the
>> >>> >> >> same pipeline on historic data would not render the source subtask
>> >>> >> idle and
>> >>> >> >> not result in late records. The solution was to not render source
>> >>> >> subtasks
>> >>> >> >> automatically idle by the framework if there are no spits. That
>> >>> leads
>> >>> >> to
>> >>> >> >> confusion for Kafka users with static topic subscription where
>> >>> #splits
>> >>> >> <
>> >>> >> >> #parallelism stalls pipelines because the watermark is not
>> >>> advancing.
>> >>> >> Here,
>> >>> >> >> your sketched solution can be transferred to KafkaSource to let
>> >>> Flink
>> >>> >> know
>> >>> >> >> that min global watermark on a static assignment is determined by
>> >>> the
>> >>> >> >> slowest partition. Hence, all idle readers emit that min global
>> >>> >> watermark
>> >>> >> >> and the user sees progress.
>> >>> >> >> This whole idea is related to FLIP-182 watermark alignment but I'd
>> >>> go
>> >>> >> >> with another FLIP as the goal is quite different even though the
>> >>> >> >> implementation overlaps.
>> >>> >> >>
>> >>> >> >> Now Iceberg seems to use the same information to actually pause the
>> >>> >> >> consumption of files and create some kind of orderness guarantees
>> >>> as
>> >>> >> far as
>> >>> >> >> I understood. This probably can be applied to any source with
>> >>> dynamic
>> >>> >> split
>> >>> >> >> discovery. However, I wouldn't mix up the concepts and hence I
>> >>> >> appreciate
>> >>> >> >> you not chiming into the FLIP-182 and ff. threads. The goal of
>> >>> >> FLIP-182 is
>> >>> >> >> to pause readers while consuming a split, while your approach
>> >>> pauses
>> >>> >> >> readers before processing another split. So it feels more closely
>> >>> >> related
>> >>> >> >> to the global min watermark - so it could either be part of that
>> >>> FLIP
>> >>> >> or a
>> >>> >> >> FLIP of its own. Afaik API changes should actually happen only on
>> >>> the
>> >>> >> >> enumerator side both for your ideas and for global min watermark.
>> >>> >> >>
>> >>> >> >> Best,
>> >>> >> >>
>> >>> >> >> Arvid
>> >>> >> >>
>> >>> >> >> On Wed, Apr 27, 2022 at 7:31 PM Thomas Weise 
>> >>> wrote:
>> >>> >> >>
>> >>> >> >>> Hi Steven,
>> >>> >> >>>
>> >>> >> >>> Would it be better to bring this as a separate thread related to
>> >>> >> Iceberg
>> >>> >> >>> source to the dev@ list? I think this could benefit from broader
>> >>> >> input?
>> >>> >> >>>
>> >>> >> >>> Thanks
>> >>> >> >>>
>> >>> >> >>> On Wed, Apr 27, 2022 at 9:36 AM Steven Wu 
>> >>> >> wrote:
>> >>> >> >>>
>> >>> >> >>>> + Becket and Sebastian
>> >>> >> >>>>
>> >>> >> >>>> It is also related to the split level watermark alignment
>> >>> discussion
>> >>> >> >>>> thread. Because it is already very long, I don't want to further
>> >>> >> complicate
>> >>> >> >>>> the ongoing discussion there. But I can move the discussion to
>> >>> that
>> >>> >> >>>> existing thread if that is preferred.
>> >>> >> >>>>
>> >>> >> >>>>
>> >>> >> >>>>> On Tue, Apr 26, 2022 at 10:03 PM Steven Wu 
>> >>> >> >>>> wrote:
>> >>> >> >>>>
>> >>> >> >>>>> Hi all,
>> >>> >> >>>>>
>> >>> >> >>>>> We are thinking about how to align with the Flink community and
>> >>> >> >>>>> leverage the FLIP-182 watermark alignment in the Iceberg
>> >>> source. I
>> >>> >> put some
>> >>> >> >>>>> context in this google doc. Would love to get hear your
>> >>> thoughts on
>> >>> >> this.
>> >>> >> >>>>>
>> >>> >> >>>>>
>> >>> >> >>>>>
>> >>> >>
>> >>> https://docs.google.com/document/d/1zfwF8e5LszazcOzmUAOeOtpM9v8dKEPlY_BRFSmI3us/edit#
>> >>> >> >>>>>
>> >>> >> >>>>> Thanks,
>> >>> >> >>>>> Steven
>> >>> >> >>>>>
>> >>> >> >>>>
>> >>> >>
>> >>> >
>> >>>
>> >>


Re: [RESULT] [VOTE] Release 1.15.0, release candidate #4

2022-05-02 Thread Thomas Weise
Hi,

Thank you for working on the 1.15.0 release. Is there an update wrt
finalizing the release?

Thanks,
Thomas


On Mon, Apr 25, 2022 at 9:17 PM Yun Gao 
wrote:

> I'm happy to announce that we have unanimously approved this release.
>
> There are 6 explicit approving votes, 3 of which are binding:
>
>  * Dawid Wysakowicz (binding)
> * Xingbo Huang
> * Matthias Pohl
> * Yang Wang
> * Zhu Zhu (binding)
> * Guowei Ma (binding)
>
> There are no disapproving votes.
>
> Thanks everyone!
>
> Best,
> Yun Gao


Re: Failed Unit Test on Master Branch

2022-05-01 Thread Thomas Weise
I reproduced the issue.

The inconsistent test result is due to a time zone dependency in the test.

The underlying issue is that convertToTimestamp attempts to set a negative
value with java.sql.Timestamp.setNanos.

https://issues.apache.org/jira/browse/FLINK-27465

Thanks,
Thomas
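
For anyone reproducing this: java.sql.Timestamp#setNanos rejects values
outside [0, 999999999], and naively splitting a pre-epoch millisecond value
into seconds and nanos yields a negative nano component, because Java
division truncates toward zero. A minimal standalone illustration of the
failure pattern and one common normalization (this mirrors the symptom, not
the exact Flink code):

    import java.sql.Timestamp;

    public class NegativeNanosRepro {
        public static void main(String[] args) {
            long millis = -1L; // 1 ms before the epoch, e.g. a pre-1970 timestamp

            // Naive split: the remainder, and hence the nano component,
            // becomes negative for negative inputs.
            Timestamp naive = new Timestamp(millis / 1000 * 1000);
            int nanos = (int) (millis % 1000) * 1_000_000; // -1_000_000
            try {
                naive.setNanos(nanos); // throws IllegalArgumentException
            } catch (IllegalArgumentException e) {
                System.out.println("setNanos(" + nanos + ") rejected: " + e);
            }

            // Floor-based split borrows one second, keeping nanos in range.
            Timestamp fixed = new Timestamp(Math.floorDiv(millis, 1000) * 1000);
            fixed.setNanos((int) Math.floorMod(millis, 1000) * 1_000_000);
            System.out.println(fixed); // prints ...:59.999, i.e. -1 ms
        }
    }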


On Thu, Apr 28, 2022 at 7:23 PM Guowei Ma  wrote:

> Hi Haizhou
>
> I ran the test and there is no problem.
> And commit is "d940af688be90c92ce4f8b9ca883f6753c94aa0f"
>
> Best,
> Guowei
>
>
> On Fri, Apr 29, 2022 at 5:39 AM Haizhou Zhao 
> wrote:
>
> > Hello Flink Community,
> >
> > I was encountering some unit test failure in the flink-avro sub-module
> when
> > I tried to pull down the master branch and build.
> >
> > Here is the command I ran:
> >
> > mvn clean package -pl flink-formats/flink-avro
> >
> > Here is the test that fails:
> >
> >
> https://github.com/apache/flink/blob/master/flink-formats/flink-avro/src/test/java/org/apache/flink/formats/avro/AvroRowDeSerializationSchemaTest.java#L178
> >
> > Here is the exception that was thrown:
> > [ERROR]
> >
> >
> org.apache.flink.formats.avro.AvroRowDeSerializationSchemaTest.testGenericDeserializeSeveralTimes
> >  Time elapsed: 0.008 s  <<< ERROR!
> > java.io.IOException: Failed to deserialize Avro record.
> > ...
> >
> > Here is the latest commit of the HEAD I pulled:
> > commit c5430e2e5d4eeb0aba14ce3ea8401747afe0182d (HEAD -> master,
> > oss/master)
> >
> > Can someone confirm this is indeed a problem on the master branch? If
> yes,
> > any suggestions on fixing it?
> >
> > Thank you,
> > Haizhou Zhao
> >
>


[jira] [Created] (FLINK-27465) AvroRowDeserializationSchema.convertToTimestamp fails with negative nano seconds

2022-05-01 Thread Thomas Weise (Jira)
Thomas Weise created FLINK-27465:


 Summary: AvroRowDeserializationSchema.convertToTimestamp fails 
with negative nano seconds
 Key: FLINK-27465
 URL: https://issues.apache.org/jira/browse/FLINK-27465
 Project: Flink
  Issue Type: Bug
  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
Affects Versions: 1.15.0
Reporter: Thomas Weise
Assignee: Thomas Weise


The issue is exposed due to a time zone dependency in 
AvroRowDeSerializationSchemaTest.
 
The root cause is that convertToTimestamp attempts to set a negative value with 
java.sql.Timestamp.setNanos.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (FLINK-27435) Kubernetes Operator keeps savepoint history

2022-04-27 Thread Thomas Weise (Jira)
Thomas Weise created FLINK-27435:


 Summary: Kubernetes Operator keeps savepoint history
 Key: FLINK-27435
 URL: https://issues.apache.org/jira/browse/FLINK-27435
 Project: Flink
  Issue Type: Improvement
  Components: Kubernetes Operator
Reporter: Thomas Weise


Currently the operator keeps track of the most recent savepoint that was 
triggered through savepointTriggerNonce. In some cases it is necessary to find 
older savepoints. For that, it would be nice if the operator can optionally 
maintain a savepoint history (and perhaps also trigger disposal of savepoints 
that fall out of the history). The maximum number of savepoints retained could 
be configured by count and/or age.

 



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


Re: [DISCUSS] FLIP-217 Support watermark alignment of source splits

2022-04-21 Thread Thomas Weise
Thanks for working on this!

I wonder if "supporting" split alignment in SourceReaderBase and then doing
nothing if the split reader does not implement AlignedSplitReader could be
misleading? Perhaps WithSplitsAlignment can instead be added to the
specific source reader (i.e. KafkaSourceReader) to make it explicit that
the source actually supports it.

Thanks,
Thomas


On Thu, Apr 21, 2022 at 4:57 AM Konstantin Knauf  wrote:

> Hi Sebastian, Hi Dawid,
>
> As part of this FLIP, the `AlignedSplitReader` interface (aka the stop &
> resume behavior) will be implemented for Kafka and Pulsar only, correct?
>
> +1 in general. I believe it is valuable to complete the watermark aligned
> story with this FLIP.
>
> Cheers,
>
> Konstantin
>
>
>
>
>
>
>
> On Thu, Apr 21, 2022 at 12:36 PM Dawid Wysakowicz 
> wrote:
>
> > To be explicit, having worked on it, I support it ;) I think we can
> > start a vote thread soonish, as there are no concerns so far.
> >
> > Best,
> >
> > Dawid
> >
> > On 13/04/2022 11:27, Sebastian Mattheis wrote:
> > > Dear Flink developers,
> > >
> > > I would like to open a discussion on FLIP 217 [1] for an extension of
> > > Watermark Alignment to perform alignment also in SplitReaders. To do
> so,
> > > SplitReaders must be able to suspend and resume reading from split
> > sources
> > > where the SourceOperator coordinates and controls suspend and resume.
> To
> > > gather information about current watermarks of the SplitReaders, we
> > extend
> > > the internal WatermarkOutputMulitplexer and report watermarks to the
> > > SourceOperator.
> > >
> > > There is a PoC for this FLIP [2], prototyped by Arvid Heise and revised
> > and
> > > reworked by Dawid Wysakowicz (He did most of the work.) and me. The
> > changes
> > > are backwards compatible in a way that if affected components do not
> > > support split alignment the behavior is as before.
> > >
> > > Best,
> > > Sebastian
> > >
> > > [1]
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-217+Support+watermark+alignment+of+source+splits
> > > [2] https://github.com/dawidwys/flink/tree/aligned-splits
> > >
> >
>
>
> --
>
> Konstantin Knauf
>
> https://twitter.com/snntrable
>
> https://github.com/knaufk
>


Re: [DISCUSS] Make Kubernetes Operator config "dynamic" and consider merging with flinkConfiguration

2022-04-04 Thread Thomas Weise
Hi Gyula,

Can you please specify which of the current settings you see as
candidates to expose to the user? Force upgrade definitely makes sense
and there will likely be others going forward that govern the upgrade
or shutdown behavior. The main consideration should probably be that
settings that can be controlled through the deployment should not
affect other deployments.

I would also prefer to add these to flinkConfiguaration vs.
introducing a separate operatorConfiguration.

Thanks,
Thomas


On Sat, Apr 2, 2022 at 12:21 AM Gyula Fóra  wrote:
>
> That's a very good point Matyas, we cannot risk any interference with other 
> jobs but I think we don't necessarily have to.
>
> First of all we should only allow users to overwrite selected configs. For 
> deciding what to allow, we can separate the operator related configs into 2 
> main groups:
>
> Group 1: Config options that are specific to the reconciliation logic of a 
> specific job such as feature flags etc (for example 
> https://issues.apache.org/jira/browse/FLINK-26926).
> These configs cannot possibly cause interference, they are part of the 
> natural reconciliation logic.
>
> Group 2: Config options that actually affect the controller scheduling, 
> memory/cpu requirements. These are the problematic ones as they can actually 
> break the operator if we are not careful.
>
> For Group 1 there are no safeguards necessary and I would say this is the 
> primary use-case I wanted to cover with this discussion.
>
> I think Group 2 could also be supported as long as we specifically validate 
> the values that for example scheduling delays are within pre-configured 
> bounds. One example would be configuring client timeouts, there could be 
> special cases where the operator hardcoded timeout is not good enough, but we 
> also want to set a hard max bound on the configurable value.
>
> Cheers,
> Gyula
>
>
>
> On Sat, Apr 2, 2022 at 8:57 AM Őrhidi Mátyás  wrote:
>>
>> Thanks Gyula for bringing this topic up! Although the suggestion would
>> indeed simplify the configuration handling I have some concerns about
>> opening the operator configuration for end users in certain cases. In a
>> multitenant scenario for example, how could we protect against one user
>> messing up the configs and potentially distract others? As I see it, the
>> operator acts as the control plane, ideally totally transparent for end
>> users, often behind a rest API. Let me know what you think.
>>
>> Cheers,
>> Matyas
>>
>> On Sat, Apr 2, 2022 at 5:12 AM Yang Wang  wrote:
>>
>> > I also like the proposal 2. Maybe it could be named with
>> > *KubernetesOperatorConfigOptions*, which just looks like all other
>> > ConfigOption(e.g. *KubernetesConfigOptions, YarnConfigOptions*) in Flink.
>> > The proposal 2 is more natural and easy to use for Flink users.
>> >
>> >
>> > Best,
>> > Yang
>> >
>> > On Sat, Apr 2, 2022 at 02:25 Gyula Fóra  wrote:
>> >
>> >> Hi Devs!
>> >>
>> >> *Background*:
>> >> With more and more features and options added to the flink kubernetes
>> >> operator it would make sense to not expose everything as first class
>> >> options in the deployment/jobspec (same as we do for flink configuration
>> >> currently).
>> >>
>> >> Furthermore it would be beneficial if users could control reconciliation
>> >> specific settings like timeouts, reschedule delays etc on a per deployment
>> >> basis.
>> >>
>> >>
>> >> *Proposal 1*The more conservative proposal would be to add a new
>> >> *operatorConfiguration* field to the deployment spec that the operator
>> >> would use during the controller loop (merged with the default operator
>> >> config). This makes the operator very extensible with new options and
>> >> would
>> >> also allow overrides to the default operator config on a per deployment
>> >> basis.
>> >>
>> >>
>> >> *Proposal 2*I would actually go one step further and propose that we
>> >> should
>> >> merge *flinkConfiguration* and *operatorConfiguration* -as whether
>> >> something affects the flink job submission/job or the operator behaviour
>> >> does not really make a difference to the end user. For users the operator
>> >> is part of flink so having a multiple configuration maps could simply
>> >> cause
>> >> confusion.
>> >> We could simply prefix all operator related configs with
>> >> `kubernetes.operator` to ensure that we do not accidentally conflict with
>> >> flink native config options.
>> >> If we go this route I would even go as far as to naming it simply
>> >> *configuration* for sake of simplicity.
>> >>
>> >> I personally would go with proposal 2 to make this as simple as possible
>> >> for the users.
>> >>
>> >> Please let me know what you think!
>> >> Gyula
>> >>
>> >
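
An editorial sketch of the validation idea from Gyula's Group 1/Group 2
breakdown above: per-deployment overrides are merged over the operator
defaults, restricted to an allowlist, with hard bounds on the
scheduling-sensitive ("Group 2") options. The key names, the bound, and the
duration format are invented for the example; they are not actual operator
options.

    import java.time.Duration;
    import java.util.HashMap;
    import java.util.Map;
    import java.util.Set;

    final class OperatorConfigMerger {

        // Invented key names; a real allowlist would enumerate the options
        // the operator actually exposes for per-deployment override.
        private static final Set<String> ALLOWED = Set.of(
                "kubernetes.operator.some-feature-flag", // "Group 1" style
                "kubernetes.operator.client.timeout");   // "Group 2" style

        private static final Duration MAX_CLIENT_TIMEOUT = Duration.ofMinutes(5);

        static Map<String, String> merge(
                Map<String, String> operatorDefaults,
                Map<String, String> deploymentOverrides) {
            Map<String, String> effective = new HashMap<>(operatorDefaults);
            for (Map.Entry<String, String> e : deploymentOverrides.entrySet()) {
                if (!ALLOWED.contains(e.getKey())) {
                    continue; // ignore (or reject) keys users may not override
                }
                if ("kubernetes.operator.client.timeout".equals(e.getKey())) {
                    // ISO-8601 duration (e.g. "PT30S") for simplicity here.
                    Duration d = Duration.parse(e.getValue());
                    if (d.compareTo(MAX_CLIENT_TIMEOUT) > 0) {
                        throw new IllegalArgumentException(
                                "client timeout exceeds the hard maximum");
                    }
                }
                effective.put(e.getKey(), e.getValue());
            }
            return effective;
        }
    }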


Re: [VOTE] Apache Flink Kubernetes Operator Release 0.1.0, release candidate #3

2022-03-31 Thread Thomas Weise
+1 (binding)

* installed from staged helm chart and run example
* built from source


On Wed, Mar 30, 2022 at 7:39 AM Gyula Fóra  wrote:

> Hi everyone,
>
> Please review and vote on the release candidate #3 for the version 0.1.0 of
> Apache Flink Kubernetes Operator,
> as follows:
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
>
> **Release Overview**
>
> As an overview, the release consists of the following:
> a) Kubernetes Operator canonical source distribution (including the
> Dockerfile), to be deployed to the release repository at dist.apache.org
> b) Kubernetes Operator Helm Chart to be deployed to the release repository
> at dist.apache.org
> c) Maven artifacts to be deployed to the Maven Central Repository
> d) Docker image to be pushed to dockerhub
>
> **Staging Areas to Review**
>
> The staging areas containing the above mentioned artifacts are as follows,
> for your review:
> * All artifacts for a,b) can be found in the corresponding dev repository
> at dist.apache.org [1]
> * All artifacts for c) can be found at the Apache Nexus Repository [2]
> * The docker image is staged on github [7]
>
> All artifacts are signed with the key
> 0B4A34ADDFFA2BB54EB720B221F06303B87DAFF1 [3]
>
> Other links for your review:
> * JIRA release notes [4]
> * source code tag "release-0.1.0-rc3" [5]
> * PR to update the website Downloads page to include Kubernetes Operator
> links [6]
>
> **Vote Duration**
>
> The voting time will run for at least 72 hours.
> It is adopted by majority approval, with at least 3 PMC affirmative votes.
>
> **Note on Verification**
>
> You can follow the basic verification guide here
> <
> https://cwiki.apache.org/confluence/display/FLINK/Verifying+a+Flink+Kubernetes+Operator+Release
> >
> .
> Note that you don't need to verify everything yourself, but please make
> note of what you have tested together with your +- vote.
>
> Thanks,
> Gyula
>
> [1]
>
> https://dist.apache.org/repos/dist/dev/flink/flink-kubernetes-operator-0.1.0-rc3/
> [2]
> https://repository.apache.org/content/repositories/orgapacheflink-1492/
> [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> [4]
>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12351499
> [5]
> https://github.com/apache/flink-kubernetes-operator/tree/release-0.1.0-rc3
> [6] https://github.com/apache/flink-web/pull/519
> [7] ghcr.io/apache/flink-kubernetes-operator:2c166e3
>


Re: [VOTE] Apache Flink Kubernetes Operator Release 0.1.0, release candidate #1

2022-03-28 Thread Thomas Weise
I believe if we as the PMC distribute a docker image (which is optional,
"convenience"), then that image has to follow the rules for binary packages
[1]. (And I would assume that applies regardless where we host the images.)

Now to say that we only publish sources kind of side steps that problem. At
the same time it probably also defeats the purpose of the preview release,
which is to make it easier for folks that are not active contributors to
take the operator for a test drive.

So I'm leaning towards publishing the images with respective NOTICE
requirements (for the layers that we add).

We are also planning to publish the jar files in the future as it helps to
build clients and those would need to have the binary NOTICE also.

Cheers,
Thomas

[1] https://infra.apache.org/licensing-howto.html#binary

On Mon, Mar 28, 2022 at 9:20 AM Gyula Fóra  wrote:

> I see your point and the value for having such a notice added.
>
> I think there are 2 completely distinct questions at play here:
>
> a) Is there a legal requirement for a NOTICE file for the docker image?
> b) If not, should we block the release on this and add it immediately?
>
> For a)
> I think from a legal (and ASF policy) perspective there is one question
> that decides whether this is a requirement for this release or not:
> Is the docker image part of the release?
>
> I think the answer here is clearly no, the image is not part of the
> release. Only the Dockerfile is part of the release.
>
> For b)
> I think adding the NOTICE is a good idea but it will take some work so I
> would not block the preview release on it.
> If someone has some handy utility or experience generating it, I don't mind
> including it in later RCs of course.
> Otherwise we can aim for the next release.
>
> Gyula
>
>
> On Mon, Mar 28, 2022 at 6:03 PM Chesnay Schepler 
> wrote:
>
> > One difference to Flink is that the distribution bundled in the docker
> > image still contains the NOTICE covering the contents of it.
> >
> > It may admittedly not be the most discoverable place, but a reasonable
> one
> > I think.
> >
> > Docker as a whole is very weird when it comes to licensing.
> > Think of all the things that are shipped in an image; I don't think we can
> > (nor should) try to document everything in there.
> > For the most part this is also not necessary as the Flink images are
> based
> > on Debian,
> > where (al)most every installed package already embeds licensing
> > information into the image.
> >
> > However, for content that we copy into the image (i.e., the jars), I
> think
> > it would be reasonable to document that.
> > (and experience from the Flink side has also shown other
> > advantages beyond licensing...)
> >
> > On 28/03/2022 17:41, Gyula Fóra wrote:
> >
> > Thanks for the input!
> >
> > I am not an expert on this topic and have been contemplating this myself
> > also.
> > We are basically trying to follow the precedent set by Flink and Statefun
> > projects where the docker builds that we use to publish images to
> > dockerhub do not declare any notices.
> >
> > We will not use ghcr.io for the final release but will use dockerhub
> like
> > flink and other apache projects.
> >
> > If I look at it from a strictly technical point of view, the docker image
> > is not part of the official release (as it's also not part of the
> > flink/statefun release).
> >
> > It would be good to get some input from others on this. It's not
> > impossible to add the notices but it's considerable work and maintenance
> > overhead.
> > By extension of that logic, would you then also add license information for
> the
> > base images of the docker container (and so on and so forth)?
> >
> > My gut feeling would be that we could highlight this in the NOTICE of the
> > main project  (or some other appropriate place) but we do not explicitly
> > list the dependencies.
> >
> > Would be good to hear how others feel about this!
> >
> > Gyula
> >
> > On Mon, Mar 28, 2022 at 5:26 PM Chesnay Schepler 
> > wrote:
> >
> >> I don't think having users build the fat-jar & docker image absolves us
> >> of all responsibility w.r.t. the licensing of used products.
> >>
> >> At the very least we need to inform users what licenses the fat-jar &
> >> docker image fall under such that they can make an informed decision as
> to
> >> whether they can adhere to said restrictions.
> >> In particular since building it yourself is (apparently) a hard
> >> requirement for using said product.
> >>
> >> Even beyond that though, as *we* push images to ghcr.io we still need
> to
> >> adhere to the licensing requirements in any case afaict.
> >>
> >> On 28/03/2022 17:07, Gyula Fóra wrote:
> >>
> >> Hi Chesnay,
> >>
> >> Let me try to explain the "strange stuff"
> >>
> >> flink-kubernetes-shaded relocates some classes found in flink-kubernetes
> >> in order to not conflict with some of the operator dependencies.
> >> This is necessary as flink-kubernetes packages almost everything in the
> >> fat-jar as-is 

[jira] [Created] (FLINK-26812) Flink native k8s integration should support owner reference

2022-03-22 Thread Thomas Weise (Jira)
Thomas Weise created FLINK-26812:


 Summary: Flink native k8s integration should support owner 
reference
 Key: FLINK-26812
 URL: https://issues.apache.org/jira/browse/FLINK-26812
 Project: Flink
  Issue Type: Improvement
  Components: Deployment / Kubernetes
Affects Versions: 1.14.4
Reporter: Thomas Weise


Currently the JobManager deployment and other resources created by the
integration do not support the owner reference, allowing for the possibility
that they become orphaned when managed by a higher level entity like the CR of
the k8s operator. Any top level resource created should optionally have an
owner reference to ensure safe cleanup when the managing entity gets removed.
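
For illustration, a minimal sketch of how such an owner reference could be set
with the fabric8 Kubernetes client (the helper class and method names below are
assumptions for this sketch, not the actual implementation):

import io.fabric8.kubernetes.api.model.HasMetadata;
import io.fabric8.kubernetes.api.model.OwnerReference;
import io.fabric8.kubernetes.api.model.OwnerReferenceBuilder;
import java.util.Collections;

public final class OwnerReferenceSketch {

    /** Marks 'resource' (e.g. the JobManager deployment) as owned by 'owner' (e.g. the CR). */
    public static void setOwner(HasMetadata resource, HasMetadata owner) {
        OwnerReference ref = new OwnerReferenceBuilder()
                .withApiVersion(owner.getApiVersion())
                .withKind(owner.getKind())
                .withName(owner.getMetadata().getName())
                .withUid(owner.getMetadata().getUid())
                .withController(true) // the owner manages this resource
                .withBlockOwnerDeletion(true)
                .build();
        resource.getMetadata().setOwnerReferences(Collections.singletonList(ref));
    }
}

With such a reference in place, Kubernetes garbage collection removes the owned
resources automatically once the owning entity is deleted.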



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-26811) Document CRD upgrade process

2022-03-22 Thread Thomas Weise (Jira)
Thomas Weise created FLINK-26811:


 Summary: Document CRD upgrade process
 Key: FLINK-26811
 URL: https://issues.apache.org/jira/browse/FLINK-26811
 Project: Flink
  Issue Type: Sub-task
Reporter: Thomas Weise


We need to document how to update the CRD with a newer version. During 
development, we delete the old CRD and create it from scratch. In an 
environment with existing deployments that isn't possible, as deleting the CRD 
would wipe out all existing CRs.

 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


Re: [DISCUSS] Support the session job management in kubernetes operator

2022-03-21 Thread Thomas Weise
Hi Aitozi,

Thanks for the proposal. Can you please clarify in the FLIP the
relationship between the session deployment and the jobs that depend on it?
Will, for example, the operator ensure that the individual jobs are
deleted when the underlying cluster is deleted?

Side note: The discussion thread started 5 days ago, the FLIP vote was
started 2 days later, and there was a weekend in between; this is
probably on the short side for broader feedback.

Thanks,
Thomas


On Fri, Mar 18, 2022 at 4:01 AM Yang Wang  wrote:

> Great work. Since we are introducing a new public API, it deserves a FLIP.
> And the FLIP will help the later contributors catch up soon.
>
> Best,
> Yang
>
> Gyula Fóra  wrote on Fri, Mar 18, 2022 at 18:11:
>
> > Thank Aitozi, a FLIP might be an overkill at this point but no harm in
> > voting on it anyways :)
> >
> > Looks good!
> >
> > Gyula
> >
> > On Fri, Mar 18, 2022 at 10:25 AM Aitozi  wrote:
> >
> > > Hi Guys:
> > >
> > > FYI, I have integrated your comments and drawn the FLIP-215[1], I
> > will
> > > create another thread to vote for it.
> > >
> > > [1]:
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-215%3A+Introduce+FlinkSessionJob+CRD+in+the+kubernetes+operator
> > >
> > > Best,
> > >
> > > Aitozi.
> > >
> > >
> > > Aitozi  wrote on Thu, Mar 17, 2022 at 11:16:
> > >
> > > > Hi Biao Geng:
> > > >
> > > >    Thanks for your feedback, I'm +1 to go with option#2. It's a good
> > > > point that we should improve the error message debugging for the
> > > > session job; I think it can be a follow-up work as an improvement
> > > > after we support the session job operation.
> > > >
> > > > Best,
> > > >
> > > > Aitozi.
> > > >
> > > >
> > > > Geng Biao  wrote on Thu, Mar 17, 2022 at 10:55:
> > > >
> > > >> Thanks Aitozi for the work!
> > > >>
> > > >> I lean to option#2 of using JarRunHeaders with the uber job jar as
> > > >> well. As Yang said, the user defined dependencies may be better
> > > >> supported in upstream Flink.
> > > >> A follow-up thought: I think we should consider the potential
> > > >> influence on user experience: as the job graph is generated in the
> > > >> JM, when generation fails due to issues in the main() method, we
> > > >> should do some work on surfacing such error messages, either in this
> > > >> proposal or in the later k8s operator implementation. The reason for
> > > >> this question is that if users submit many jobs to the same session
> > > >> cluster, it may not be easy for them to find the relevant error logs
> > > >> for the main() method of a specific job. FLINK-25715 could
> > > >> help us later.
> > > >>
> > > >>
> > > >> Best,
> > > >> Biao Geng
> > > >>
> > > >>
> > > >> From: Aitozi 
> > > >> Date: Wednesday, March 16, 2022, 5:19 PM
> > > >> To: dev@flink.apache.org 
> > > >> Subject: Re: [DISCUSS] Support the session job management in kubernetes
> > > >> operator
> > > >> Hi Yang Wang,
> > > >> Thanks for your feedback. Providing the local and http
> > > >> implementations for the first version makes sense to me.
> > > >> +1 for it.
> > > >>
> > > >> Best,
> > > >> Aitozi
> > > >>
> > > >> Yang Wang  wrote on Wed, Mar 16, 2022 at 16:44:
> > > >>
> > > >> > # How to download the user jars
> > > >> > I agree with Gyula that it will be a burden if we bundle the flink
> > > >> > filesystem dependencies in the operator image.
> > > >> > Maybe we could have a *ArtifactFetcher* interface in the
> > > >> > flink-kubernetes-operator. By default, we provide the local and
> http
> > > >> > implementation,
> > > >> > which means we could get the user jars from local files or HTTP
> > URLs.
> > > >> Flink
> > > >> > filesystem support could be done as a follow-up based on the
> > feedback.
> > > >> >
> > > >> > If the user wants to use the local implementation, they need to
> > mount
> > > a
> > > >> > PV(aka persist volume) to the operator first and then put their
> jars
> > > >> into
> > > >> > the PV.
> > > >> >
> > > >> > # How to talk to session JobManager to submit the job
> > > >> > After more consideration, I also prefer the second approach, via
> > REST
> > > >> API
> > > >> > /jars/:jarid/run. If we have strong requirements to support
> > > dependencies
> > > >> > jars and
> > > >> > artifacts, we could try to support this in the upstream project.
> > > >> >
> > > >> > Best,
> > > >> > Yang
> > > >> >
> > > >> >
> > > >> > Aitozi  wrote on Wed, Mar 16, 2022 at 16:11:
> > > >> >
> > > >> > > Hi Gyula,
> > > >> > > Thanks for your quick response. Regarding the different filesystem
> > > >> > > dependencies, I think we can make them optional and pluggable, and
> > > >> > > let the user choose when building their operator image. Users can
> > > >> > > build their image from the base operator image and add the
> > > >> > > filesystem dependency they want to use. BTW, we can support the
> > > >> > > http URI by default.
> > > >> > >
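
One hypothetical shape for the ArtifactFetcher idea sketched in the quoted
discussion above (names and signatures are made up for illustration; no such
interface exists yet):

import java.io.File;
import java.net.URI;

/** Resolves the user jar referenced by a session job spec into a local file. */
public interface ArtifactFetcher {

    /** Whether this fetcher handles the given URI scheme (e.g. "local", "http"). */
    boolean supports(URI artifactUri);

    /** Fetches the artifact into targetDir and returns the local jar file. */
    File fetch(URI artifactUri, File targetDir) throws Exception;
}

Default implementations would then cover local files (jars placed on a mounted
PV) and HTTP URLs, with Flink filesystem based fetchers as a pluggable
follow-up.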

[jira] [Created] (FLINK-26639) Publish flink-kubernetes-operator maven artifacts

2022-03-14 Thread Thomas Weise (Jira)
Thomas Weise created FLINK-26639:


 Summary: Publish flink-kubernetes-operator maven artifacts
 Key: FLINK-26639
 URL: https://issues.apache.org/jira/browse/FLINK-26639
 Project: Flink
  Issue Type: Sub-task
Reporter: Thomas Weise


We should publish the Maven artifacts in addition to the Docker images so that 
downstream Java projects can utilize the CRD classes directly.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-26554) Clean termination of FlinkDeployment

2022-03-09 Thread Thomas Weise (Jira)
Thomas Weise created FLINK-26554:


 Summary: Clean termination of FlinkDeployment
 Key: FLINK-26554
 URL: https://issues.apache.org/jira/browse/FLINK-26554
 Project: Flink
  Issue Type: Bug
  Components: Kubernetes Operator
Reporter: Thomas Weise


After stopping the deployment, the operator attempts to list jobs:


2022-03-09 07:55:37,114 o.a.f.k.o.u.FlinkUtils         [INFO ] [default.basic-example] Waiting for cluster shutdown... (16)
2022-03-09 07:55:38,123 o.a.f.k.o.u.FlinkUtils         [INFO ] [default.basic-example] Cluster shutdown completed.
2022-03-09 07:55:38,160 o.a.f.k.o.c.FlinkDeploymentController [INFO ] [default.basic-example] Stopping cluster basic-example
2022-03-09 07:55:38,160 o.a.f.k.o.o.Observer           [INFO ] [default.basic-example] Getting job statuses for basic-example



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-26538) Ability to restart deployment w/o spec change

2022-03-08 Thread Thomas Weise (Jira)
Thomas Weise created FLINK-26538:


 Summary: Ability to restart deployment w/o spec change
 Key: FLINK-26538
 URL: https://issues.apache.org/jira/browse/FLINK-26538
 Project: Flink
  Issue Type: Sub-task
  Components: Kubernetes Operator
Reporter: Thomas Weise


The operator should allow a restart of the Flink deployment w/o any other spec
change. This provides an escape hatch to recover a deployment that has gone
into a bad state (for whatever reason, including memory leaks, a hung JVM,
etc.) without direct access to the k8s cluster. This can be addressed by
adding a restartNonce to the CRD.
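
As a hypothetical illustration (field and class names invented for this
sketch): the reconciler would compare the nonce in the spec against the last
reconciled value and trigger a restart whenever it changes, without any other
spec diff.

/** Simplified CR spec sketch: bumping restartNonce forces a restart with an otherwise unchanged spec. */
public class FlinkDeploymentSpecSketch {
    // ... image, flinkConfiguration, job spec, etc.
    private Long restartNonce;

    public Long getRestartNonce() {
        return restartNonce;
    }

    public void setRestartNonce(Long restartNonce) {
        this.restartNonce = restartNonce;
    }
}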

 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


Re: [DISCUSS] Flink Kubernetes Operator controller flow

2022-02-28 Thread Thomas Weise
Thanks for bringing up the discussion. It's a good time to revisit this
area as the operator implementation becomes more complex.

I think the most important part of the design is well defined states
and transitions. Unless those can be reasoned about, there will be
confusion. For example:

1) For job upgrade, it wasn't clear in which state reconciliation
should happen and as a result the implementation ended up incomplete.
2) "JobStatusObserver" would attempt to list jobs before the REST API
is ready. And it would be used in session mode, even though the
session mode needs a different state machine.

The implementation currently contains a lot of if-then-else logic,
which is hard to follow and difficult to maintain. It will make it
harder to introduce new states that would be necessary to implement
more advanced upgrade strategies, amongst others.

Did you consider a state machine centric approach? See [1] for example.

As Biao mentioned, observe -> reconcile may repeat and different
states will require different checks. Once a job (in job mode) is
running, there is no need to check the JM deployment etc. A monolithic
observer may not work so well for that. Rather, I think different
states have different monitoring needs that inform transitions,
including the actual state and changes made to the CR.

It would also be good if the current state is directly reflected in
the CR status so that it is easier to check where the deployment is
at.

Cheers,
Thomas


[1] https://github.com/lyft/flinkk8soperator/blob/master/docs/state_machine.md
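
To make this concrete, an illustrative-only sketch of explicit states and
legal transitions (loosely modeled on the state machine in [1]; the state
names and transitions here are assumptions, not the actual operator design):

/** Illustrative deployment lifecycle states. */
enum DeploymentState {
    NEW, CLUSTER_STARTING, SUBMITTING_JOB, RUNNING, UPGRADING, FAILED
}

/** Each state declares its legal successors, so every transition is explicit and checkable. */
final class TransitionTable {
    static boolean isValid(DeploymentState from, DeploymentState to) {
        switch (from) {
            case NEW:              return to == DeploymentState.CLUSTER_STARTING;
            case CLUSTER_STARTING: return to == DeploymentState.SUBMITTING_JOB || to == DeploymentState.FAILED;
            case SUBMITTING_JOB:   return to == DeploymentState.RUNNING || to == DeploymentState.FAILED;
            case RUNNING:          return to == DeploymentState.UPGRADING || to == DeploymentState.FAILED;
            case UPGRADING:        return to == DeploymentState.SUBMITTING_JOB || to == DeploymentState.FAILED;
            case FAILED:           return to == DeploymentState.CLUSTER_STARTING; // recovery/restart
            default:               return false;
        }
    }
}

The current state would then be written to the CR status, which also addresses
the observability point above.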

On Mon, Feb 28, 2022 at 8:16 AM Gyula Fóra  wrote:
>
> Hi All!
>
> Thank you for the feedback, I agree with what has been proposed to include
> as much as possible in the actual resource status and make the reconciler
> completely independent from the observer.
>
> @Biao Geng:
> Validation could depend on the current status (depending on how we
> implement the validation logic) so it might always be necessary (and it is
> also cheap).
> What you are saying with multiple observe -> reconcile cycles ties back to
> what Matyas said, that we should probably have an Observe loop until we
> have a stable state ready for reconciliation, then reconcile once.
>
> So probably validate -> observe until stable -> reconcile -> observe until
> stable
>
> Cheers,
> Gyula
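
A rough sketch of that flow in code (component names and signatures are
illustrative only, not the actual operator interfaces):

/** Illustrative only: validate -> observe until stable -> reconcile. */
final class ControllerFlowSketch {

    interface Validator { boolean validate(Object cr); }
    interface Observer { boolean observeAndIsStable(Object cr); }
    interface Reconciler { void reconcile(Object cr); }

    private final Validator validator;
    private final Observer observer;
    private final Reconciler reconciler;

    ControllerFlowSketch(Validator v, Observer o, Reconciler r) {
        this.validator = v;
        this.observer = o;
        this.reconciler = r;
    }

    /** Invoked repeatedly by the framework; reconciles only once a stable state was observed. */
    void reconcileOnce(Object cr) {
        if (!validator.validate(cr)) {
            return; // surface the validation error on the CR status instead of acting
        }
        if (!observer.observeAndIsStable(cr)) {
            return; // requeue: keep observing until the state is stable
        }
        reconciler.reconcile(cr);
    }
}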
>
> On Mon, Feb 28, 2022 at 4:49 PM Biao Geng  wrote:
>
> > Hi Gyula,
> > Thanks for the discussion. The separation of the 3 components and Yang's
> > proposal also make sense to me.
> > Just 1 follow-up thought after checking your PR: in the reconcile()
> > method of the controller, IIUC, the real flow could be
> > `validate->observe->reconcile->validate->observe->reconcile...`. The
> > validation phase seems to be required only upon creation of the job
> > cluster and upgrades of the config. For phases like waiting for the JM
> > deployment to become ready, it is not mandatory, and thus the flow can
> > look like `validate->observe->reconcile->optional validate due to current
> > state->observe->reconcile...`
> >
> > Őrhidi Mátyás  wrote on Mon, Feb 28, 2022 at 21:26:
> >
> > > It is worth looking at the controller code in the spotify operator too:
> > >
> > >
> > https://github.com/spotify/flink-on-k8s-operator/blob/master/controllers/flinkcluster/flinkcluster_controller.go
> > >
> > > It is looping in the 'observer phase' until it reaches a stable state,
> > then
> > > it performs the necessary changes.
> > >
> > > Based on this I also suggest keeping the logic in separate
> > > modules (Validate->Observe->Reconcile). The three components might not
> > > even be enough as we add more and more complexity to the code.
> > >
> > > Cheers,
> > > Matyas
> > >
> > >
> > > On Mon, Feb 28, 2022 at 2:03 PM Aitozi  wrote:
> > >
> > > > Hi, Gyula
> > > >   Thanks for driving this discussion. I second Yang Wang's idea
> > that
> > > > it's better to make the `validator`, `observer` and `reconciler`
> > > > self-contained. I also prefer to define the `Observer` as an interface
> > > and
> > > > we could define the statuses that `Observer` will expose. It acts like
> > > the
> > > > observer protocol between the `Observer` and `Reconciler`.
> > > >
> > > > Best,
> > > > Aitozi.
> > > >
> > > > Yang Wang  wrote on Mon, Feb 28, 2022 at 16:28:
> > > >
> > > > > Thanks for posting the discussion here.
> > > > >
> > > > >
> > > > > Having the components `validator` `observer` `reconciler` makes lots
> > of
> > > > > sense. And the "Validate -> Observe -> Reconcile"
> > > > > flow seems natural to me.
> > > > >
> > > > > Regarding the implementation in the PR, instead of directly using the
> > > > > observer in the reconciler, I lean to let the observer
> > > > > exports the results to the status(e.g. jobmanager deployment status,
> > > rest
> > > > > port readiness, flink jobs status, etc.) and
> > > > > the reconciler reads it from the status. Then each component is more
> > > > > self-contained and the boundary will be clearer.
> > > > >
> > > > >
> > > > > Best,
> > 

Re: [DISCUSS] Plan to externalize connectors and versioning

2022-02-23 Thread Thomas Weise
This plan LGTM.

Thanks,
Thomas

On Wed, Feb 23, 2022 at 4:28 AM Chesnay Schepler  wrote:
>
> That sounds fine to me.
>
> On 23/02/2022 10:49, Konstantin Knauf wrote:
> > Hi Chesnay, Hi everyone,
> >
> > I think the idea for the migration is the following (with the example of
> > ElasticSearch). I talked to Martijn offline.
> >
> > 1. ElasticSearch Connector is released from the core repository with the
> > Flink 1.15.0 release. No changes.
> >
> > 2. At the beginning of the Flink 1.16 release cycle the connector is
> > removed from `master` of the core repository. It remains on the
> > `release-1.15` branch and earlier release branches.
> >
> > 3. The connector code is "copied" over to the `master` branch of a
> > `flink-connector-elastic-search` repository. Bugfixes to the connector need
> > to go to both `release-1.15` (and earlier branches) in the core repository
> > and `master` of the external repository.
> >
> > 4. Once all the processes required to do a release in the
> > `flink-connector-elastic-search` are in place (docs integration, release
> > automation,...), we release flink-connector-elastic-search:3.0.0, which
> > will be compatible with Flink 1.15. At this point, users can choose whether
> > they use flink-connector-elastic-search:1.15.x (released from the core
> > repository) or flink-connector-elastic-search:3.0.0 already released from
> > the external repository with Flink 1.15. The documentation will already
> > advertise the one released from the external repository. This is the
> > "overlap" that Martijn mentioned.
> >
> > 5. From here onwards, the release cycle of the ElasticSearch Connector is
> > independent. There could be 3.1.0 and 3.0.1 etc. The compatibility matrix
> > will be part of the connector documentation.
> >
> > 6. If there is a patch release for Flink 1.15, this will of course also
> > include a flink-connector-elastic-search release from the core repository.
> >
> > 7. For Flink 1.16, there might or might not be a release of the
> > elastic-search-connector from the external repository. Depends on
> > compatibility.
> >
> > I hope this clarifies it a bit and it makes sense to me.
> >
> > It also makes sense to me to do this as soon as possible (probably once the
> > release-1.15 branch is cut) with the example of ElasticSearch. Afterwards
> > (hopefully still in the Flink 1.16 release cycle) we do the same for other
> > connectors like Kafka, Pulsar, Kinesis. I don't think it's feasible or
> > helpful to make it a condition that this happens for all connectors at the
> > same time.
> >
> > Cheers and thanks,
> >
> > Konstantin
> >
> >
> > On Sun, Feb 20, 2022 at 7:57 PM Chesnay Schepler  wrote:
> >
> >> If we don't make a release, I think it would appear as partially
> >> externalized (since the binaries are still only created with Flink core,
> >> not from the external repository).
> >>
> >> I'm wondering what you are referring to when you say "it appear[s]". Users
> >> don't know about it, and contributors can be easily informed about the
> >> current state. Whose opinion are you worried about?
> >>
> >> doing a release [means] that we have then completely externalized the
> >> connector
> >>
> >> In my mind the connector is completely externalized once the connector
> >> project is complete and can act independently from the core repo. That
> >> includes having all the code, working CI and the documentation being
> >> integrated into the Flink website. And /then/ we can do a release. I
> >> don't see how this could work any other way; how could we possibly argue
> >> that the connector is externalized when development on the connector
> >> isn't even possible in that repository?
> >>
> >> There are also other connectors (like Opensearch and I believe RabbitMQ)
> >> that will end up straight in their own repositories
> >>
> >>
> >> Which is a bit of a different situation because here the only source of
> >> this connector will have been that repository.
> >>
> >> Would you prefer to remove a connector in a Flink patch release?
> >>
> >>
> >> No. I think I misread your statement; when you said that there "is 1
> >> release cycle where the connector both exists in Flink core and the
> >> external repo", you are referring to 1.15, correct? (although this
> >> should also apply to 1.14 so isn't it 2 releases...?)
> >> How I understood it was that we'd keep the connector around until 1.16,
> >> which would obviously be terrible.
> >>
> >> On 19/02/2022 13:30, Martijn Visser wrote:
> >>> Hi Chesnay,
> >>>
> >>> I think the advantage of also doing a release is that we have then
> >>> completely externalized the connector. If we don't make a release, I
> >> think
> >>> it would appear as partially externalized (since the binaries are still
> >>> only created with Flink core, not from the external repository). It would
> >>> also mean that in our documentation we would still point to the binary
> >>> created with the Flink core release.
> >>>
> >>> There are also other connectors (like 

[jira] [Created] (FLINK-26290) Revisit serviceAccount and taskSlots direct fields in CRD

2022-02-21 Thread Thomas Weise (Jira)
Thomas Weise created FLINK-26290:


 Summary: Revisit serviceAccount and taskSlots direct fields in CRD
 Key: FLINK-26290
 URL: https://issues.apache.org/jira/browse/FLINK-26290
 Project: Flink
  Issue Type: Sub-task
Reporter: Thomas Weise
Assignee: Thomas Weise


serviceAccount should be a first class field, not hidden as 
kubernetes.jobmanager.service-account (since the CRD should be agnostic to the 
embedded integration option)

 

taskSlots should be removed and 
flinkConfiguration.taskmanager.numberOfTaskSlots used instead
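
Illustrative sketch only (names invented) of what the resulting spec shape
could look like after the change:

import java.util.HashMap;
import java.util.Map;

/** Sketch: serviceAccount is first class, task slots move into flinkConfiguration. */
public class SpecSketch {
    private String serviceAccount;
    private Map<String, String> flinkConfiguration = new HashMap<>();

    public void example() {
        serviceAccount = "flink";                                     // first-class field (value hypothetical)
        flinkConfiguration.put("taskmanager.numberOfTaskSlots", "2"); // replaces the direct taskSlots field
    }
}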



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-26261) Reconciliation should try to start job when not already started or move to permanent error

2022-02-20 Thread Thomas Weise (Jira)
Thomas Weise created FLINK-26261:


 Summary: Reconciliation should try to start job when not already 
started or move to permanent error
 Key: FLINK-26261
 URL: https://issues.apache.org/jira/browse/FLINK-26261
 Project: Flink
  Issue Type: Sub-task
  Components: Kubernetes Operator
Reporter: Thomas Weise


When job submission fails, the operator currently keeps trying to find the job
status. In the case I'm looking at, the cluster wasn't created because the
image could not be resolved. We either need logic to re-attempt the job
submission or flag the submission as failed so that JobStatusObserver does not
attempt to check again. We should also capture the submission error as an
event on the CR.
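
One hypothetical way to structure that logic (names and the retry limit are
assumptions for this sketch, not the actual implementation):

/** Illustrative sketch of handling a failed job submission in the reconcile loop. */
final class SubmissionFailureSketch {

    private static final int MAX_SUBMISSION_ATTEMPTS = 3; // assumption for this sketch

    static final class StatusSketch {
        int submissionAttempts;
        boolean permanentError;
        String lastErrorMessage;
    }

    void onSubmissionFailure(StatusSketch status, Exception error) {
        status.submissionAttempts++;
        status.lastErrorMessage = error.getMessage(); // also emitted as an event on the CR
        if (status.submissionAttempts >= MAX_SUBMISSION_ATTEMPTS) {
            // Stop here so the job status observer no longer polls a cluster that never came up.
            status.permanentError = true;
        }
        // Otherwise the next reconcile iteration re-attempts the submission.
    }
}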



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


Re: [DISCUSS] FLIP-212: Introduce Flink Kubernetes Operator

2022-02-15 Thread Thomas Weise
> > > > > > > *Basic Admission control using a Webhook*Standard resource
> > > admission
> > > > > > > control in Kubernetes to validate and potentially reject
> > resources
> > > is
> > > > > > done
> > > > > > > through Webhooks.
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/
> > > > > > > This is a necessary mechanism to give the user an upfront error
> > > when
> > > > an
> > > > > > > incorrect resource was submitted. In the Flink operator's case we
> > > > need
> > > > > to
> > > > > > > validate that the FlinkDeployment yaml actually makes sense and
> > > does
> > > > > not
> > > > > > > contain erroneous config options that would inevitably lead to
> > > > > > > deployment/job failures.
> > > > > > >
> > > > > > > We have implemented a simple webhook that we can use for this
> > type
> > > of
> > > > > > > validation, as a separate maven module
> > (flink-kubernetes-webhook).
> > > > The
> > > > > > > webhook is an optional component and can be enabled or disabled
> > > > during
> > > > > > > deployment. To avoid pulling in new external dependencies we have
> > > > used
> > > > > > the
> > > > > > > Flink Shaded Netty module to build the simple rest endpoint
> > > required.
> > > > > If
> > > > > > > the community feels that Netty adds unnecessary complexity to the
> > > > > webhook
> > > > > > > implementation we are open to alternative backends such as
> > > Springboot
> > > > > for
> > > > > > > instance which would practically eliminate all the boilerplate.
> > > > > > >
> > > > > > >
> > > > > > > *Helm Chart for deployment*Helm charts provide an industry
> > standard
> > > > way
> > > > > > of
> > > > > > > managing kubernetes deployments. We have created a helm chart
> > > > prototype
> > > > > > > that can be used to deploy the operator together with all
> > required
> > > > > > > resources. The helm chart allows easy configuration for things
> > like
> > > > > > images,
> > > > > > > namespaces etc and flags to control specific parts of the
> > > deployment
> > > > > such
> > > > > > > as RBAC or the webhook.
> > > > > > >
> > > > > > > The helm chart provided is intended to be a first version that
> > > worked
> > > > > for
> > > > > > > us during development but we expect to have a lot of iterations
> > on
> > > it
> > > > > > based
> > > > > > > on the feedback from the community.
> > > > > > >
> > > > > > > *Acknowledgment*
> > > > > > > We would like to thank everyone who has provided support and
> > > valuable
> > > > > > > feedback on this FLIP.
> > > > > > > We would also like to thank Yang Wang & Alexis Sarda-Espinosa
> > > > > > specifically
> > > > > > > for making their operators open source and available to us which
> > > had
> > > > a
> > > > > > big
> > > > > > > impact on the FLIP and the prototype.
> > > > > > >
> > > > > > > We are looking forward to continuing development on the operator
> > > > > together
> > > > > > > with the broader community.
> > > > > > > All work will be tracked using the ASF Jira from now on.
> > > > > > >
> > > > > > > Cheers,
> > > > > > > Gyula
> > > > > > >
> > > > > > > On Mon, Feb 14, 2022 at 9:21 AM K Fred 
> > > > wrote:
> > > > > > >
> > > > > > > > Hi Gyula,
> > > > > > > >
> > > > > > > > Thanks!
> > >

Re: [DISCUSS] FLINK-25927: Make flink-connector-base dependency usage consistent across all connectors

2022-02-12 Thread Thomas Weise
Hi Chesnay,

My understanding is that source compatibility is the initial step
towards a stronger guarantee that will reduce the pain downstream. In
that spirit, I would anticipate that we are not taking steps to make
the long term goal harder to achieve?

The FLIP [1] states:

"There is no official guarantee that a program compiled against an
earlier version can be executed on a newer Flink cluster (no ABI
backwards compatibility). But eventually we should try provide this
guarantee."

Cheers,
Thomas

[1] 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-196%3A+Source+API+stability+guarantees

On Fri, Feb 11, 2022 at 12:26 AM Chesnay Schepler  wrote:
>
> The conclusion in FLIP-196 was that we only provide source compatibility for 
> Public(Evolving) APIs, which means that _everything_ must be compiled against 
> the Flink version you are using it with.
>
> Doesn't that mean that such dependency conflicts are then user errors?
>
> On 10/02/2022 20:19, Thomas Weise wrote:
>
> Hi Alexander,
>
> It is beneficial for users to be able to replace/choose a connector as
> part of their application. When flink-connector-base is included in
> dist, that becomes more difficult. We can run into hard to understand
> dependency conflicts such as [1]. Depending on the Flink platform
> used, it may also be hard to update something that is part of dist. I
> would prefer to keep the module outside dist.
>
> Thanks,
> Thomas
>
> [1] https://lists.apache.org/thread/0ht5y2tyzpt16ft36zm428182dxfs3zx
>
> On Wed, Feb 9, 2022 at 3:26 AM Alexander Fedulov
>  wrote:
>
> Hi everyone,
>
> I would like to discuss the best approach to address the issue raised
> in FLINK-25927 [1]. It can be summarized as follows:
>
> > flink-connector-base is currently inconsistently used in connectors
> (directly shaded in some and transitively pulled in via
> flink-connector-files which is itself shaded in the table uber jar)
>
> > FLINK-24687 [2] moved flink-connector-files out of the flink-table uber
> jar
>
> > It is necessary to make usage of flink-connector-base consistent across
> all connectors
>
> One approach is to stop shading flink-connector-files in all connectors and
> instead package it in flink-dist, making it a part of Flink-wide provided
> public API. This approach is investigated in the following PoC PR: 18545
> [3].  The issue with this approach is that it breaks any existing CI and
> IDE setups that do not directly rely on flink-dist and also do not include
> flink-connector-files as an explicit dependency.
>
> In theory, a nice alternative would be to make it a part of a dependency
> that is ubiquitously provided, for instance, flink-streaming-java. Doing
> that for flink-streaming-java would, however,  introduce a dependency cycle
> and is currently not feasible.
>
> It would be great to hear your opinions on what could be the best way
> forward here.
>
> [1] https://issues.apache.org/jira/browse/FLINK-25927
> [2] https://issues.apache.org/jira/browse/FLINK-24687
> [3] https://github.com/apache/flink/pull/18545
>
>
> Thanks,
> Alexander Fedulov
>
>


Re: [DISCUSS] FLINK-25927: Make flink-connector-base dependency usage consistent across all connectors

2022-02-10 Thread Thomas Weise
Hi Alexander,

It is beneficial for users to be able to replace/choose a connector as
part of their application. When flink-connector-base is included in
dist, that becomes more difficult. We can run into hard to understand
dependency conflicts such as [1]. Depending on the Flink platform
used, it may also be hard to update something that is part of dist. I
would prefer to keep the module outside dist.

Thanks,
Thomas

[1] https://lists.apache.org/thread/0ht5y2tyzpt16ft36zm428182dxfs3zx

On Wed, Feb 9, 2022 at 3:26 AM Alexander Fedulov
 wrote:
>
> Hi everyone,
>
> I would like to discuss the best approach to address the issue raised
> in FLINK-25927 [1]. It can be summarized as follows:
>
> > flink-connector-base is currently inconsistently used in connectors
> (directly shaded in some and transitively pulled in via
> flink-connector-files which is itself shaded in the table uber jar)
>
> > FLINK-24687 [2] moved flink-connector-files out of the flink-table  uber
> jar
>
> > It is necessary to make usage of flink-connector-base consistent across
> all connectors
>
> One approach is to stop shading flink-connector-files in all connectors and
> instead package it in flink-dist, making it a part of Flink-wide provided
> public API. This approach is investigated in the following PoC PR: 18545
> [3].  The issue with this approach is that it breaks any existing CI and
> IDE setups that do not directly rely on flink-dist and also do not include
> flink-connector-files as an explicit dependency.
>
> In theory, a nice alternative would be to make it a part of a dependency
> that is ubiquitously provided, for instance, flink-streaming-java. Doing
> that for flink-streaming-java would, however,  introduce a dependency cycle
> and is currently not feasible.
>
> It would be great to hear your opinions on what could be the best way
> forward here.
>
> [1] https://issues.apache.org/jira/browse/FLINK-25927
> [2] https://issues.apache.org/jira/browse/FLINK-24687
> [3] https://github.com/apache/flink/pull/18545
>
>
> Thanks,
> Alexander Fedulov


[RESULT] [VOTE] FLIP-212: Introduce Flink Kubernetes Operator

2022-02-09 Thread Thomas Weise
Hi everyone,

FLIP-212 [1] has been accepted. There were 7 binding +1 votes and 6
non-binding +1 votes. No other votes.

binding:
Gyula Fóra
Márton Balassi
Xintong Song
Yang Wang
Danny Cranmer
Yangze Guo
Thomas Weise

non-binding:
Chenya Zhang
Shqiprim Bunjaku
Israel Ekpo
Peter Huang
Biao Geng
K Fred

Thank you all for voting, we are excited to take this forward. The
next step will be the creation of the repository.

Cheers,
Thomas

[1]  
https://cwiki.apache.org/confluence/display/FLINK/FLIP-212%3A+Introduce+Flink+Kubernetes+Operator


On Mon, Feb 7, 2022 at 1:10 AM Yangze Guo  wrote:
>
> +1 (binding)
>
> Best,
> Yangze Guo
>
> On Mon, Feb 7, 2022 at 5:04 PM K Fred  wrote:
> >
> > +1 (non-binding)
> >
> > Best Regards
> > Peng Yuan
> >
> > On Mon, Feb 7, 2022 at 4:49 PM Danny Cranmer 
> > wrote:
> >
> > > +1 (binding)
> > >
> > > Thanks for this effort!
> > > Danny Cranmer
> > >
> > > On Mon, Feb 7, 2022 at 7:59 AM Biao Geng  wrote:
> > >
> > > > +1 (non-binding)
> > > >
> > > > Best,
> > > > Biao Geng
> > > >
> > > > > Peter Huang  wrote on Mon, Feb 7, 2022 at 14:31:
> > > >
> > > > > +1 (non-binding)
> > > > >
> > > > >
> > > > > Best Regards
> > > > > Peter Huang
> > > > >
> > > > > On Sun, Feb 6, 2022 at 7:35 PM Yang Wang 
> > > wrote:
> > > > >
> > > > > > +1 (binding)
> > > > > >
> > > > > > Best,
> > > > > > Yang
> > > > > >
> > > > > > Xintong Song  wrote on Mon, Feb 7, 2022 at 10:25:
> > > > > >
> > > > > > > +1 (binding)
> > > > > > >
> > > > > > > Thank you~
> > > > > > >
> > > > > > > Xintong Song
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Mon, Feb 7, 2022 at 12:52 AM Márton Balassi <
> > > > > balassi.mar...@gmail.com
> > > > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > > > +1 (binding)
> > > > > > > >
> > > > > > > > On Sat, Feb 5, 2022 at 5:35 PM Israel Ekpo  > > >
> > > > > > wrote:
> > > > > > > >
> > > > > > > > > I am very excited to see this.
> > > > > > > > >
> > > > > > > > > Thanks for driving the effort
> > > > > > > > >
> > > > > > > > > +1 (non-binding)
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Sat, Feb 5, 2022 at 10:53 AM Shqiprim Bunjaku <
> > > > > > > > > shqiprimbunj...@gmail.com>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > +1 (non-binding)
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Sat, Feb 5, 2022 at 4:39 PM Chenya Zhang <
> > > > > > > > chenyazhangche...@gmail.com
> > > > > > > > > >
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > +1 (non-binding)
> > > > > > > > > > >
> > > > > > > > > > > Thanks folks for leading this effort and making it happen
> > > so
> > > > > > fast!
> > > > > > > > > > >
> > > > > > > > > > > Best,
> > > > > > > > > > > Chenya
> > > > > > > > > > >
> > > > > > > > > > > On Sat, Feb 5, 2022 at 12:02 AM Gyula Fóra <
> > > > gyf...@apache.org>
> > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hi Thomas!
> > > > > > > > > > > >
> > > > > > > > > > > > +1 (binding) from my side
> > > > > > > > > > > >
> > > > > > > > > > > > Happy to see this effort getting some traction!
> > > > > > > > > > > >
> > > > > > > > > > > > Cheers,
> > > > > > > > > > > > Gyula
> > > > > > > > > > > >
> > > > > > > > > > > > On Sat, Feb 5, 2022 at 3:00 AM Thomas Weise <
> > > > t...@apache.org>
> > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Hi everyone,
> > > > > > > > > > > > >
> > > > > > > > > > > > > I'd like to start a vote on FLIP-212: Introduce Flink
> > > > > > > Kubernetes
> > > > > > > > > > > > > Operator [1] which has been discussed in [2].
> > > > > > > > > > > > >
> > > > > > > > > > > > > The vote will be open for at least 72 hours unless
> > > there
> > > > is
> > > > > > an
> > > > > > > > > > > > > objection or not enough votes.
> > > > > > > > > > > > >
> > > > > > > > > > > > > [1]
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-212%3A+Introduce+Flink+Kubernetes+Operator
> > > > > > > > > > > > > [2]
> > > > > > > > >
> > > https://lists.apache.org/thread/1z78t6rf70h45v7fbd2m93rm2y1bvh0z
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks!
> > > > > > > > > > > > > Thomas
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Israel Ekpo
> > > > > > > > > Lead Instructor, IzzyAcademy.com
> > > > > > > > > https://www.youtube.com/c/izzyacademy
> > > > > > > > > https://izzyacademy.com/
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >


Re: [VOTE] FLIP-212: Introduce Flink Kubernetes Operator

2022-02-09 Thread Thomas Weise
+1 (binding)

On Mon, Feb 7, 2022 at 1:10 AM Yangze Guo  wrote:
>
> +1 (binding)
>
> Best,
> Yangze Guo
>
> On Mon, Feb 7, 2022 at 5:04 PM K Fred  wrote:
> >
> > +1 (non-binding)
> >
> > Best Regards
> > Peng Yuan
> >
> > On Mon, Feb 7, 2022 at 4:49 PM Danny Cranmer 
> > wrote:
> >
> > > +1 (binding)
> > >
> > > Thanks for this effort!
> > > Danny Cranmer
> > >
> > > On Mon, Feb 7, 2022 at 7:59 AM Biao Geng  wrote:
> > >
> > > > +1 (non-binding)
> > > >
> > > > Best,
> > > > Biao Geng
> > > >
> > > > Peter Huang  wrote on Mon, Feb 7, 2022 at 14:31:
> > > >
> > > > > +1 (non-binding)
> > > > >
> > > > >
> > > > > Best Regards
> > > > > Peter Huang
> > > > >
> > > > > On Sun, Feb 6, 2022 at 7:35 PM Yang Wang 
> > > wrote:
> > > > >
> > > > > > +1 (binding)
> > > > > >
> > > > > > Best,
> > > > > > Yang
> > > > > >
> > > > > > Xintong Song  wrote on Mon, Feb 7, 2022 at 10:25:
> > > > > >
> > > > > > > +1 (binding)
> > > > > > >
> > > > > > > Thank you~
> > > > > > >
> > > > > > > Xintong Song
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Mon, Feb 7, 2022 at 12:52 AM Márton Balassi <
> > > > > balassi.mar...@gmail.com
> > > > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > > > +1 (binding)
> > > > > > > >
> > > > > > > > On Sat, Feb 5, 2022 at 5:35 PM Israel Ekpo  > > >
> > > > > > wrote:
> > > > > > > >
> > > > > > > > > I am very excited to see this.
> > > > > > > > >
> > > > > > > > > Thanks for driving the effort
> > > > > > > > >
> > > > > > > > > +1 (non-binding)
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Sat, Feb 5, 2022 at 10:53 AM Shqiprim Bunjaku <
> > > > > > > > > shqiprimbunj...@gmail.com>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > +1 (non-binding)
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Sat, Feb 5, 2022 at 4:39 PM Chenya Zhang <
> > > > > > > > chenyazhangche...@gmail.com
> > > > > > > > > >
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > +1 (non-binding)
> > > > > > > > > > >
> > > > > > > > > > > Thanks folks for leading this effort and making it happen
> > > so
> > > > > > fast!
> > > > > > > > > > >
> > > > > > > > > > > Best,
> > > > > > > > > > > Chenya
> > > > > > > > > > >
> > > > > > > > > > > On Sat, Feb 5, 2022 at 12:02 AM Gyula Fóra <
> > > > gyf...@apache.org>
> > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hi Thomas!
> > > > > > > > > > > >
> > > > > > > > > > > > +1 (binding) from my side
> > > > > > > > > > > >
> > > > > > > > > > > > Happy to see this effort getting some traction!
> > > > > > > > > > > >
> > > > > > > > > > > > Cheers,
> > > > > > > > > > > > Gyula
> > > > > > > > > > > >
> > > > > > > > > > > > On Sat, Feb 5, 2022 at 3:00 AM Thomas Weise <
> > > > t...@apache.org>
> > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Hi everyone,
> > > > > > > > > > > > >
> > > > > > > > > > > > > I'd like to start a vote on FLIP-212: Introduce Flink
> > > > > > > Kubernetes
> > > > > > > > > > > > > Operator [1] which has been discussed in [2].
> > > > > > > > > > > > >
> > > > > > > > > > > > > The vote will be open for at least 72 hours unless
> > > there
> > > > is
> > > > > > an
> > > > > > > > > > > > > objection or not enough votes.
> > > > > > > > > > > > >
> > > > > > > > > > > > > [1]
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-212%3A+Introduce+Flink+Kubernetes+Operator
> > > > > > > > > > > > > [2]
> > > > > > > > >
> > > https://lists.apache.org/thread/1z78t6rf70h45v7fbd2m93rm2y1bvh0z
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks!
> > > > > > > > > > > > > Thomas
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Israel Ekpo
> > > > > > > > > Lead Instructor, IzzyAcademy.com
> > > > > > > > > https://www.youtube.com/c/izzyacademy
> > > > > > > > > https://izzyacademy.com/
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >


Re: [DISCUSS] FLIP-212: Introduce Flink Kubernetes Operator

2022-02-07 Thread Thomas Weise
os that require external job
> > > management. Is
> > > > > there anything on the FLIP page that creates a different impression?
> > > > >
> > > >
> > > > Sounds good to me. I don't remember what created the impression of
> > > > 2
> > step
> > > > submission back then. I revisited the latest version of this FLIP
> > > > and
> > it
> > > > looks good to me.
> > > >
> > > > @Gyula,
> > > >
> > > > Versioning:
> > > > > Versioning will be independent from Flink and the operator will
> > depend
> > > on a
> > > > > fixed flink version (in every given operator version).
> > > > > This should be the exact same setup as with Stateful Functions (
> > > > > https://github.com/apache/flink-statefun). So independent
> > > > > release
> > > cycle
> > > > > but
> > > > > still within the Flink umbrella.
> > > > >
> > > >
> > > > Does this mean if someone wants to upgrade Flink to a version that
> > > > is released after the operator version that is being used, he/she
> > > > would
> > need
> > > > to upgrade the operator version first?
> > > > I'm not questioning this, just trying to make sure I'm
> > > > understanding
> > this
> > > > correctly.
> > > >
> > > > Thank you~
> > > >
> > > > Xintong Song
> > > >
> > > >
> > > >
> > > > On Mon, Feb 7, 2022 at 3:14 AM Gyula Fóra 
> > wrote:
> > > >
> > > > > Thank you Alexis,
> > > > >
> > > > > Will definitely check this out. You are right, Kotlin makes it
> > > difficult to
> > > > > adopt pieces of this code directly but I think it will be good
> > > > > to get inspiration for the architecture and look at how
> > > > > particular problems
> > > have
> > > > > been solved. It will be a great help for us I am sure.
> > > > >
> > > > > Cheers,
> > > > > Gyula
> > > > >
> > > > > On Sat, Feb 5, 2022 at 12:28 PM Alexis Sarda-Espinosa <
> > > > > alexis.sarda-espin...@microfocus.com> wrote:
> > > > >
> > > > > > Hi everyone,
> > > > > >
> > > > > > just wanted to mention that my employer agreed to open source
> > > > > > the
> > > PoC I
> > > > > > developed:
> > > > > > https://github.com/MicroFocus/opsb-flink-k8s-operator
> > > > > >
> > > > > > I understand the concern for maintainability, so Gradle &
> > > > > > Kotlin
> > > might
> > > > > not
> > > > > > be appealing to you, but at least it gives you another reference.
> > The
> > > > > Helm
> > > > > > resources in particular might be useful.
> > > > > >
> > > > > > There are bits and pieces there referring to Flink sessions,
> > > > > > but
> > > those
> > > > > are
> > > > > > just placeholders, the functioning parts use application mode
> > > > > > with
> > > native
> > > > > > integration.
> > > > > >
> > > > > > Regards,
> > > > > > Alexis.
> > > > > >
> > > > > > 
> > > > > > From: Thomas Weise 
> > > > > > Sent: Saturday, February 5, 2022 2:41 AM
> > > > > > To: dev 
> > > > > > Subject: Re: [DISCUSS] FLIP-212: Introduce Flink Kubernetes
> > Operator
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > Thanks for the continued feedback and discussion. Looks like
> > > > > > we are ready to start a VOTE, I will initiate it shortly.
> > > > > >
> > > > > > In parallel it would be good to find the repository name.
> > > > > >
> > > > > > My suggestion would be: flink-kubernetes-operator
> > > > > >
> > > > > > I thought "flink-operator" could be a bit misleading since the
> > > > > > term operator already has a meaning in Flink.
>

Re: [DISCUSS] Checkpointing (partially) failing jobs

2022-02-07 Thread Thomas Weise
Hi,

Thanks for opening this discussion! The proposed enhancement would be
interesting for use cases in our infrastructure as well.

There are scenarios where it makes sense to have multiple disconnected
subgraphs in a single job because it can significantly reduce the
operational burden as well as the runtime overhead. Since we allow
subgraphs to recover independently, why not allow them to make progress
independently as well? That would imply that checkpointing must succeed
for non-affected subgraphs, as certain behavior is tied to checkpoint
completion (Kafka offset commits, file output, etc.).
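
For reference, this is the hook such side effects typically use; a minimal
sketch assuming Flink's CheckpointListener interface:

import org.apache.flink.api.common.state.CheckpointListener;

/** Minimal sketch: side effects are deferred until the checkpoint completes. */
public class CommitOnCheckpointSketch implements CheckpointListener {

    @Override
    public void notifyCheckpointComplete(long checkpointId) throws Exception {
        // e.g. commit Kafka offsets or publish pending files staged for this checkpoint
    }

    @Override
    public void notifyCheckpointAborted(long checkpointId) throws Exception {
        // discard output staged for the aborted checkpoint
    }
}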

As for source offset redistribution, the offset/position needs to be tied
to splits, both in FLIP-27 and in legacy sources. (This applies to the
Kafka and Kinesis legacy sources as well as the FLIP-27 Kafka source.) With
the new source framework, it would be hard to implement a source with
correct behavior that does not track the position along with the split.

In Yuan's example, is there a reason why CP8 could not be promoted to
CP10 by the coordinator for PR2 once it receives the notification that
CP10 did not complete? It appears that should be possible, since in
effect it would be no different from no data having been processed
between CP8 and CP10.

Thanks,
Thomas

On Mon, Feb 7, 2022 at 2:36 AM Till Rohrmann  wrote:
>
> Thanks for the clarification Yuan and Gen,
>
> I agree that the checkpointing of the sources needs to support the
> rescaling case, otherwise it does not work. Is there currently a source
> implementation where this wouldn't work? For Kafka it should work because
> we store the offset per assigned partition. For Kinesis it is probably the
> same. For the Filesource we store the set of unread input splits in the
> source coordinator and the state of the assigned splits in the sources.
> This should probably also work since new splits are only handed out to
> running tasks.
>
> Cheers,
> Till
>
> On Mon, Feb 7, 2022 at 10:29 AM Yuan Mei  wrote:
>
> > Hey Till,
> >
> > > Why rescaling is a problem for pipelined regions/independent execution
> > subgraphs:
> >
> > Take a simplified example :
> > job graph : source  (2 instances) -> sink (2 instances)
> > execution graph:
> > source (1/2)  -> sink (1/2)   [pieplined region 1]
> > source (2/2)  -> sink (2/2)   [pieplined region 2]
> >
> > Let's assume checkpoints are still triggered globally, meaning different
> > pipelined regions share the global checkpoint id (PR1 CP1 matches with PR2
> > CP1).
> >
> > Now let's assume PR1 completes CP10 and PR2 completes CP8.
> >
> > Let's say we want to rescale to parallelism 3 due to increased input.
> >
> > - Notice that we can not simply rescale based on the latest completed
> > checkpoint (CP8), because PR1 has already had data (CP8 -> CP10) output
> > externally.
> > - Can we take CP10 from PR1 and CP8 from PR2? I think it depends on how the
> > source's offset redistribution is implemented.
> >The answer is yes if we treat each input partition as independent from
> > each other, *but I am not sure whether we can make that assumption*.
> >
> > If not, the rescaling cannot happen until PR1 and PR2 are aligned with CPs.
> >
> > Best
> > -Yuan
> >
> >
> >
> >
> >
> >
> >
> > On Mon, Feb 7, 2022 at 4:17 PM Till Rohrmann  wrote:
> >
> > > Hi everyone,
> > >
> > > Yuan and Gen could you elaborate why rescaling is a problem if we say
> > that
> > > separate pipelined regions can take checkpoints independently?
> > > Conceptually, I somehow think that a pipelined region that is failed and
> > > cannot create a new checkpoint is more or less the same as a pipelined
> > > region that didn't get new input or a very very slow pipelined region
> > which
> > > couldn't read new records since the last checkpoint (assuming that the
> > > checkpoint coordinator can create a global checkpoint by combining
> > > individual checkpoints (e.g. taking the last completed checkpoint from
> > each
> > > pipelined region)). If this comparison is correct, then this would mean
> > > that we have rescaling problems under the latter two cases.
> > >
> > > Cheers,
> > > Till
> > >
> > > On Mon, Feb 7, 2022 at 8:55 AM Gen Luo  wrote:
> > >
> > > > Hi Gyula,
> > > >
> > > > Thanks for sharing the idea. As Yuan mentioned, I think we can discuss
> > > this
> > > > within two scopes. One is the job subgraph, the other is the execution
> > > > subgraph, which I suppose is the same as PipelineRegion.
> > > >
> > > > An idea is to individually checkpoint the PipelineRegions, for the
> > > > recovering in a single run.
> > > >
> > > > Flink has now supported PipelineRegion based failover, with a subset
> > of a
> > > > global checkpoint snapshot. The checkpoint barriers are spread within a
> > > > PipelineRegion, so the checkpointing of individual PipelineRegions is
> > > > actually independent. Since in a single run of a job, the
> > PipelineRegions
> > > > are fixed, we can individually checkpoint separated PipelineRegions,
> > > > despite what status the other 

[VOTE] FLIP-212: Introduce Flink Kubernetes Operator

2022-02-04 Thread Thomas Weise
Hi everyone,

I'd like to start a vote on FLIP-212: Introduce Flink Kubernetes
Operator [1] which has been discussed in [2].

The vote will be open for at least 72 hours unless there is an
objection or not enough votes.

[1] 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-212%3A+Introduce+Flink+Kubernetes+Operator
[2] https://lists.apache.org/thread/1z78t6rf70h45v7fbd2m93rm2y1bvh0z

Thanks!
Thomas


Re: [DISCUSS] FLIP-212: Introduce Flink Kubernetes Operator

2022-02-04 Thread Thomas Weise
Hi,

Thanks for the continued feedback and discussion. Looks like we are
ready to start a VOTE, I will initiate it shortly.

In parallel it would be good to find the repository name.

My suggestion would be: flink-kubernetes-operator

I thought "flink-operator" could be a bit misleading since the term
operator already has a meaning in Flink.

I also considered "flink-k8s-operator" but that would be almost
identical to existing operator implementations and could lead to
confusion in the future.

Thoughts?

Thanks,
Thomas



On Fri, Feb 4, 2022 at 5:15 AM Gyula Fóra  wrote:
>
> Hi Danny,
>
> So far we have been focusing our dev efforts on the initial native
> implementation with the team.
> If the discussion and vote goes well for this FLIP we are looking forward
> to contributing the initial version sometime next week (fingers crossed).
>
> At that point I think we can already start the dev work to support the
> standalone mode as well, especially if you can dedicate some effort to
> pushing that side.
> Working together on this sounds like a great idea and we should start as
> soon as possible! :)
>
> Cheers,
> Gyula
>
> On Fri, Feb 4, 2022 at 2:07 PM Danny Cranmer 
> wrote:
>
> > I have been discussing this one with my team. We are interested in the
> > Standalone mode, and are willing to contribute towards the implementation.
> > Potentially we can work together to support both modes in parallel?
> >
> > Thanks,
> >
> > On Wed, Feb 2, 2022 at 4:02 PM Gyula Fóra  wrote:
> >
> > > Hi Danny!
> > >
> > > Thanks for the feedback :)
> > >
> > > Versioning:
> > > Versioning will be independent from Flink and the operator will depend
> > on a
> > > fixed flink version (in every given operator version).
> > > This should be the exact same setup as with Stateful Functions (
> > > https://github.com/apache/flink-statefun). So independent release cycle
> > > but
> > > still within the Flink umbrella.
> > >
> > > Deployment error handling:
> > > I think that's a very good point, as general exception handling for the
> > > different failure scenarios is a tricky problem. I think the exception
> > > classifiers and retry strategies could avoid a lot of manual intervention
> > > from the user. We will definitely need to add something like this. Once
> > we
> > > have the repo created with the initial operator code we should open some
> > > tickets for this and put it on the short term roadmap!
> > >
> > > Cheers,
> > > Gyula
> > >
> > > On Wed, Feb 2, 2022 at 4:50 PM Danny Cranmer 
> > > wrote:
> > >
> > > > Hey team,
> > > >
> > > > Great work on the FLIP, I am looking forward to this one. I agree that
> > we
> > > > can move forward to the voting stage.
> > > >
> > > > I have general feedback around how we will handle job submission
> > failure
> > > > and retry. As discussed in the Rejected Alternatives section, we can
> > use
> > > > Java to handle job submission failures from the Flink client. It would
> > be
> > > > useful to have the ability to configure exception classifiers and retry
> > > > strategy as part of operator configuration.
> > > >
> > > > Given this will be in a separate Github repository I am curious how
> > the
> > > > versioning strategy will work in relation to the Flink version? Do we
> > > have
> > > > any other components with a similar setup I can look at? Will the
> > > operator
> > > > version track Flink or will it use its own versioning strategy with a
> > > Flink
> > > > version support matrix, or similar?
> > > >
> > > > Thanks,
> > > >
> > > >
> > > >
> > > > On Tue, Feb 1, 2022 at 2:33 PM Márton Balassi <
> > balassi.mar...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi team,
> > > > >
> > > > > Thank you for the great feedback, Thomas has updated the FLIP page
> > > > > accordingly. If you are comfortable with the currently existing
> > design
> > > > and
> > > > > depth in the FLIP [1] I suggest moving forward to the voting stage -
> > > once
> > > > > that reaches a positive conclusion it lets us create the separate
> > code
> > > > > repository under the flink project for the operator.
> > > > >
> > > 

[jira] [Created] (FLINK-25963) FLIP-212: Introduce Flink Kubernetes Operator

2022-02-04 Thread Thomas Weise (Jira)
Thomas Weise created FLINK-25963:


 Summary: FLIP-212: Introduce Flink Kubernetes Operator
 Key: FLINK-25963
 URL: https://issues.apache.org/jira/browse/FLINK-25963
 Project: Flink
  Issue Type: Improvement
  Components: Deployment / Kubernetes
Reporter: Thomas Weise
Assignee: Thomas Weise






--
This message was sent by Atlassian Jira
(v8.20.1#820001)


Re: [DISCUSS] FLIP-212: Introduce Flink Kubernetes Operator

2022-01-31 Thread Thomas Weise
Hi Xintong,

Thanks for the feedback and please see responses below -->

On Fri, Jan 28, 2022 at 12:21 AM Xintong Song  wrote:

> Thanks Thomas for drafting this FLIP, and everyone for the discussion.
>
> I also have a few questions and comments.
>
> ## Job Submission
> Deploying a Flink session cluster via kubectl & CR and then submitting jobs
> to the cluster via Flink cli / REST is probably the approach that requires
> the least effort. However, I'd like to point out 2 weaknesses.
> 1. A lot of users use Flink in perjob/application modes. For these users,
> having to run the job in two steps (deploy the cluster, and submit the job)
> is not that convenient.
> 2. One of our motivations is being able to manage Flink applications'
> lifecycles with kubectl. Submitting jobs from cli sounds not aligned with
> this motivation.
> I think it's probably worth it to support submitting jobs via kubectl & CR
> in the first version, both together with deploying the cluster like in
> per-job/application mode and after deploying the cluster like in session
> mode.
>

The intention is to support application management through the operator and
CR, which means there won't be any two-step submission process; as you
allude to, that would defeat the purpose of this project. The CR example
shows the application part. Please note that the bare cluster support is an
*additional* feature for scenarios that require external job management. Is
there anything on the FLIP page that creates a different impression?
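
To make the one-step flow concrete, a CR along the lines of the sketch
below (field names are illustrative assumptions, not the final schema;
see the FLIP for the actual example) describes the whole application, and
a single "kubectl apply -f app.yaml" is the entire submission process:

    # illustrative sketch only, not the final CRD schema
    apiVersion: flink.apache.org/v1alpha1
    kind: FlinkDeployment
    metadata:
      name: my-application
    spec:
      image: flink:1.14
      flinkConfiguration:
        taskmanager.numberOfTaskSlots: "2"
      job:
        jarURI: local:///opt/flink/usrlib/my-job.jar
        parallelism: 4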


>
> ## Versioning
> Which Flink versions does the operator plan to support?
> 1. Native K8s deployment was firstly introduced in Flink 1.10
> 2. Native K8s HA was introduced in Flink 1.12
> 3. The Pod template support was introduced in Flink 1.13
> 4. There were some changes to the Flink Docker image entrypoint script in,
> IIRC, Flink 1.13
>

Great, thanks for providing this. It is also important for compatibility
going forward. We are targeting Flink 1.14.x upwards. Before the
operator is ready there will be another Flink release. Let's see if anyone
is interested in earlier versions.


>
> ## Compatibility
> What kind of API compatibility we can commit to? It's probably fine to have
> alpha / beta version APIs that allow incompatible future changes for the
> first version. But eventually we would need to guarantee backwards
> compatibility, so that an early version CR can work with a new version
> operator.
>

Another great point and please let me include that on the FLIP page. ;-)

I think we should allow incompatible changes for the first one or two
versions, similar to how other major features have evolved recently, such
as FLIP-27.

Would be great to get broader feedback on this one.

Cheers,
Thomas



>
> Thank you~
>
> Xintong Song
>
>
>
> On Fri, Jan 28, 2022 at 1:18 PM Thomas Weise  wrote:
>
> > Thanks for the feedback!
> >
> > >
> > > # 1 Flink Native vs Standalone integration
> > > Maybe we should make this more clear in the FLIP but we agreed to do
> the
> > > first version of the operator based on the native integration.
> > > While this clearly does not cover all use-cases and requirements, it
> > seems
> > > this would lead to a much smaller initial effort and a nicer first
> > version.
> > >
> >
> > I'm also leaning towards the native integration, as long as it reduces
> the
> > MVP effort. Ultimately the operator will need to also support the
> > standalone mode. I would like to gain more confidence that native
> > integration reduces the effort. While it cuts the effort to handle the TM
> > pod creation, some mapping code from the CR to the native integration
> > client and config needs to be created. As mentioned in the FLIP, native
> > integration requires the Flink job manager to have access to the k8s API
> to
> > create pods, which in some scenarios may be seen as unfavorable.
> >
> > > > > # Pod Template
> > > > > Is the pod template in the CR the same as what Flink already
> > > > > supports [4]? If so, I am afraid that not every arbitrary field
> > > > > (e.g. cpu/memory resources) could take effect.
> >
> > Yes, the pod template would look almost identical. There are a few settings
> > that the operator will control (and that may need to be blacklisted), but
> > in general we would not want to place restrictions. I think a mechanism
> > where a pod template is merged from multiple layers would also be
> > interesting to make this more flexible.
> >
> > Cheers,
> > Thomas
> >
>


Re: [VOTE] Deprecate Per-Job Mode in Flink 1.15

2022-01-28 Thread Thomas Weise
+1 (binding)

On Fri, Jan 28, 2022 at 9:27 AM David Morávek  wrote:

> +1 (non-binding)
>
> D.
>
> On Fri 28. 1. 2022 at 17:53, Till Rohrmann  wrote:
>
> > +1 (binding)
> >
> > Cheers,
> > Till
> >
> > On Fri, Jan 28, 2022 at 4:57 PM Gabor Somogyi  >
> > wrote:
> >
> > > +1 (non-binding)
> > >
> > > We're intended to make tests when FLINK-24897
> > >  is fixed.
> > > In case of further issues we're going to create further jiras.
> > >
> > > BR,
> > > G
> > >
> > >
> > > On Fri, Jan 28, 2022 at 4:30 PM Konstantin Knauf 
> > > wrote:
> > >
> > > > Hi everyone,
> > > >
> > > > Based on the discussion in [1], I would like to start a vote on
> > > deprecating
> > > > per-job mode in Flink 1.15. Consequently, we would target to drop it
> in
> > > > Flink 1.16 or Flink 1.17 latest.
> > > >
> > > > The only limitation that would block dropping Per-Job mode mentioned
> in
> > > [1]
> > > > is tracked in https://issues.apache.org/jira/browse/FLINK-24897. In
> > > > general, the implementation of application mode in YARN should be on
> > par
> > > > with the standalone and Kubernetes before we drop per-job mode.
> > > >
> > > > The vote will last for at least 72 hours, and will be accepted by a
> > > > consensus of active committers.
> > > >
> > > > Thanks,
> > > >
> > > > Konstantin
> > > >
> > > > [1] https://lists.apache.org/thread/b8g76cqgtr2c515rd1bs41vy285f317n
> > > >
> > > > --
> > > >
> > > > Konstantin Knauf
> > > >
> > > > https://twitter.com/snntrable
> > > >
> > > > https://github.com/knaufk
> > > >
> > >
> >
>


Re: [DISCUSS] FLIP-212: Introduce Flink Kubernetes Operator

2022-01-27 Thread Thomas Weise
Thanks for the feedback!

>
> # 1 Flink Native vs Standalone integration
> Maybe we should make this more clear in the FLIP but we agreed to do the
> first version of the operator based on the native integration.
> While this clearly does not cover all use-cases and requirements, it seems
> this would lead to a much smaller initial effort and a nicer first version.
>

I'm also leaning towards the native integration, as long as it reduces the
MVP effort. Ultimately the operator will need to also support the
standalone mode. I would like to gain more confidence that native
integration reduces the effort. While it cuts the effort to handle the TM
pod creation, some mapping code from the CR to the native integration
client and config needs to be created. As mentioned in the FLIP, native
integration requires the Flink job manager to have access to the k8s API to
create pods, which in some scenarios may be seen as unfavorable.

> > > # Pod Template
> > > Is the pod template in the CR the same as what Flink already
> > > supports [4]? If so, I am afraid that not every arbitrary field
> > > (e.g. cpu/memory resources) could take effect.

Yes, the pod template would look almost identical. There are a few settings
that the operator will control (and that may need to be blacklisted), but
in general we would not want to place restrictions. I think a mechanism
where a pod template is merged from multiple layers would also be
interesting to make this more flexible.
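
To illustrate the idea (a sketch that assumes the CR embeds the same
Kubernetes Pod schema as Flink's existing pod template support; the exact
structure is not settled):

    # sketch of a pod template fragment as it might appear in the CR
    apiVersion: v1
    kind: Pod
    metadata:
      labels:
        team: data-platform
    spec:
      containers:
        # Flink's pod template support addresses the main container
        # by the fixed name flink-main-container
        - name: flink-main-container
          resources:
            requests:
              cpu: "2"
              memory: 4Gi
          env:
            - name: LOG_FORMAT
              value: json

An operator-supplied base template could then be merged with such a
per-deployment template, with the more specific layer winning on
conflicting fields.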

Cheers,
Thomas


[DISCUSS] FLIP-212: Introduce Flink Kubernetes Operator

2022-01-24 Thread Thomas Weise
Hi,

As promised in [1] we would like to start the discussion on the
addition of a Kubernetes operator to the Flink project as FLIP-212:
https://cwiki.apache.org/confluence/display/FLINK/FLIP-212%3A+Introduce+Flink+Kubernetes+Operator

Please note that the FLIP is currently focussed on the overall
direction; the intention is to fill in more details once we converge
on the high level plan.

Thanks and looking forward to a lively discussion!

Thomas

[1] https://lists.apache.org/thread/l1dkp8v4bhlcyb4tdts99g7w4wdglfy4


[ANNOUNCE] Apache Flink 1.14.3 released

2022-01-19 Thread Thomas Weise
The Apache Flink community is very happy to announce the release of
Apache Flink 1.14.3, which is the second bugfix release for the Apache
Flink 1.14 series.

Apache Flink® is an open-source stream processing framework for
distributed, high-performing, always-available, and accurate data
streaming applications.

The release is available for download at:
https://flink.apache.org/downloads.html

Please check out the release blog post for an overview of the
improvements for this bugfix release:
https://flink.apache.org/news/2022/01/17/release-1.14.3.html

The full release notes are available in Jira:
https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12351075&projectId=12315522


We would like to thank all contributors of the Apache Flink community
who made this release possible!

Regards,
Martijn & Thomas


Re: [VOTE] Moving connectors from Flink to external connector repositories

2022-01-19 Thread Thomas Weise
+1 (binding)

I think it would be a good idea to test the new infrastructure with a
sample connector for the next release before moving existing connectors.


On Wed, Jan 19, 2022 at 7:56 AM Till Rohrmann  wrote:

> +1 (binding)
>
> Cheers,
> Till
>
> On Wed, Jan 19, 2022 at 3:58 PM Timo Walther  wrote:
>
> > +1 (binding)
> >
> > I think this will speed up connector evolution and core development
> > experience.
> >
> > Regards,
> > Timo
> >
> > On 19.01.22 14:47, Martijn Visser wrote:
> > > Hi everyone,
> > >
> > > I'd like to start a vote on moving connectors from Flink to external
> > > connector repositories [1].
> > >
> > > The vote will be open for at least 72 hours unless there is an
> objection
> > or
> > > not enough votes.
> > >
> > > If the vote passes, I'll document the outcome on a new page under
> > > https://cwiki.apache.org/confluence/display/FLINK/Contributors
> > >
> > > Best regards,
> > >
> > > Martijn Visser
> > > https://twitter.com/MartijnVisser82
> > >
> > > [1] https://lists.apache.org/thread/bk9f91o6wk66zdh353j1n7sfshh262tr
> > >
> >
> >
>


Re: [VOTE] Creating external connector repositories

2022-01-19 Thread Thomas Weise
+1 (binding)


On Wed, Jan 19, 2022 at 7:57 AM Till Rohrmann  wrote:

> +1
>
> Cheers,
> Till
>
> On Wed, Jan 19, 2022 at 4:01 PM Timo Walther  wrote:
>
> > +1 (binding)
> >
> > Regards,
> > Timo
> >
> > On 19.01.22 14:47, Martijn Visser wrote:
> > > Hi everyone,
> > >
> > > I'd like to start a vote on creating external repositories [1]. Since
> the
> > > thread is fairly long, I've also previously summarised it [2].
> > >
> > > The vote will be open for at least 72 hours unless there is an
> objection
> > or
> > > not enough votes.
> > >
> > > If the vote passes, I'll document the outcome on a new page under
> > > https://cwiki.apache.org/confluence/display/FLINK/Contributors
> > >
> > > Best regards,
> > >
> > > Martijn Visser
> > > https://twitter.com/MartijnVisser82
> > >
> > > [1] https://lists.apache.org/thread/bywh947r2f5hfocxq598zhyh06zhksrm
> > > [2] https://lists.apache.org/thread/xt01g9vko7v54ddq7h6pt4xpdd19rc4s
> > >
> >
> >
>


Re: [VOTE] Release 1.14.3, release candidate #1

2022-01-18 Thread Thomas Weise
Progress: We are waiting for the Docker Hub update:
https://github.com/docker-library/official-images/pull/11693

Cheers,
Thomas

On Mon, Jan 17, 2022 at 1:39 PM Thomas Weise  wrote:
>
> Release finalization steps have been completed, except for the Docker images.
>
> Can someone please approve/merge:
>
> https://github.com/apache/flink-docker/pull/97
> https://github.com/apache/flink-docker/pull/98
>
> I don't have publishing access to Docker Hub (apache/flink) yet. I'm
> going to request it, but it would be great if someone with access could
> complete that part so as not to delay the release promotion.
>
> Cheers,
> Thomas
>
> On Mon, Jan 17, 2022 at 5:42 AM Martijn Visser  wrote:
> >
> > Hi everyone,
> >
> > The vote duration has passed and we have approved the release.
> >
> > Binding votes:
> > * Till
> > * Timo
> > * Chesnay
> >
> > I would like to ask Thomas to finalise the release and make the 
> > announcement.
> >
> > Best regards,
> >
> > Martijn
> >
> > On Mon, 17 Jan 2022 at 11:29, Chesnay Schepler  wrote:
> >>
> >> +1 (binding)
> >>
> >> Went over the pom changes and double-checked licensing.
> >>
> >> On 17/01/2022 10:44, Timo Walther wrote:
> >> > +1 (binding)
> >> >
> >> > I had an offline chat with Fabian Paul about the mentioned changes.
> >> >
> >> > The serializer changes should be acceptable as the migration would
> >> > have failed/stuck otherwise. At no point in time do we end up with a
> >> > corrupt serializer snapshot. We will update older 1.14 release notes
> >> > to point out the upgrade shortcoming.
> >> >
> >> > The transformation changes should also be fine. The same
> >> > transformation is returned. The type has changed but it is unlikely
> >> > that users query the output type of a sink?
> >> >
> >> > Regards,
> >> > Timo
> >> >
> >> >
> >> >
> >> > On 17.01.22 09:45, Timo Walther wrote:
> >> >> I went through the commit diff and found the following changes that
> >> >> might cause issues in my opinion. Maybe it makes sense to take a
> >> >> second look before I will cast my vote:
> >> >>
> >> >> // change of transformations
> >> >> https://github.com/apache/flink/commit/b7cd97b02d7f39b5190d27fbaf6e6287f127b9a9
> >> >>
> >> >>
> >> >>
> >> >> // change of a serializer
> >> >> https://github.com/apache/flink/commit/564dee0752619ecb739b4bee1cacba856ea53bac
> >> >>
> >> >>
> >> >> In any case, the following commits should be explicitly mentioned in
> >> >> the release notes as they could cause issues during an upgrade from
> >> >> 1.14.2 (mostly SQL related because we don't have an upgrade story
> >> >> there yet):
> >> >>
> >> >> https://github.com/apache/flink/commit/d3df986a75e34e1ed475b2e1236b7770698e7bd1
> >> >>
> >> >>
> >> >> https://github.com/apache/flink/commit/3719c0402ec979c619371fcde9f2e7d2c46d69ed
> >> >>
> >> >>
> >> >> https://github.com/apache/flink/commit/bafb0b4c2377d6d502ed9dba8853631ebf16cfb7
> >> >>
> >> >>
> >> >> https://github.com/apache/flink/commit/309dc0479172b979ee3c951893a04304b5416a08
> >> >>
> >> >>
> >> >> https://github.com/apache/flink/commit/7c5ddbd201005e55ab68b4db7ee74c7cbeb13400
> >> >>
> >> >>
> >> >> Regards,
> >> >> Timo
> >> >>
> >> >>
> >> >> On 13.01.22 12:22, Xingbo Huang wrote:
> >> >>> +1 (non-binding)
> >> >>>
> >> >>> - Verified checksums and signatures
> >> >>> - Verified Python wheel package contents
> >> >>> - Pip install apache-flink-libraries source package and apache-flink
> >> >>> wheel
> >> >>> package in Mac
> >> >>> - Run the examples from Python Table API Tutorial[1] in Python REPL
> >> >>>
> >> >>> [1]
> >> >>> https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/dev/python/table_api_tutorial/
> >> >>>
> >> >>>
> >> >>> Best,
> >> >>> Xingbo

Re: [VOTE] Release 1.14.3, release candidate #1

2022-01-17 Thread Thomas Weise
 >>>>> - Verified checksums and signatures
>> >>>>> - Build from source and ran StateMachineExample
>> >>>>> - Reviewed the flink-web PR
>> >>>>> - Verified that there were no other dependency changes than
>> >>>> testcontainer,
>> >>>>> japicmp-plugin and log4j.
>> >>>>>
>> >>>>> Cheers,
>> >>>>> Till
>> >>>>>
>> >>>>> On Wed, Jan 12, 2022 at 9:58 AM Yun Tang  wrote:
>> >>>>>
>> >>>>>> +1 (non-binding)
>> >>>>>>
>> >>>>>>
>> >>>>>>*   Checked the signature of source code, some of binaries and
>> >>>>>> some
>> >>>> of
>> >>>>>> python packages.
>> >>>>>>*   Launch a local cluster successfully on linux machine with
>> >>>>>> correct
>> >>>>>> flink-version and commit id and run the state machine example
>> >>>> successfully
>> >>>>>> as expected.
>> >>>>>>*   Reviewed the flink-web PR.
>> >>>>>>
>> >>>>>> Best
>> >>>>>> Yun Tang
>> >>>>>> 
>> >>>>>> From: Martijn Visser 
>> >>>>>> Sent: Wednesday, January 12, 2022 0:34
>> >>>>>> To: dev 
>> >>>>>> Subject: [VOTE] Release 1.14.3, release candidate #1
>> >>>>>>
>> >>>>>> Hi everyone,
>> >>>>>> Please review and vote on the release candidate #1 for the version
>> >>>>>> 1.14.3, as follows:
>> >>>>>> [ ] +1, Approve the release
>> >>>>>> [ ] -1, Do not approve the release (please provide specific
>> >>>>>> comments)
>> >>>>>>
>> >>>>>>
>> >>>>>> The complete staging area is available for your review, which
>> >>>>>> includes:
>> >>>>>> * JIRA release notes [1],
>> >>>>>> * the official Apache source release and binary convenience
>> >>>>>> releases to
>> >>>>>> be deployed to dist.apache.org [2], which are signed with the key
>> >>>>>> with
>> >>>>>> fingerprint 12DEE3E4D920A98C [3],
>> >>>>>> * all artifacts to be deployed to the Maven Central Repository [4],
>> >>>>>> * source code tag "release-1.14.3-rc1" [5],
>> >>>>>> * website pull request listing the new release and adding
>> >>>>>> announcement
>> >>>>>> blog post [6].
>> >>>>>>
>> >>>>>> The vote will be open for at least 72 hours. It is adopted by
>> >>>>>> majority
>> >>>>>> approval, with at least 3 PMC affirmative votes.
>> >>>>>>
>> >>>>>> Thanks on behalf of Thomas Weise and myself,
>> >>>>>>
>> >>>>>> Martijn Visser
>> >>>>>> http://twitter.com/MartijnVisser82
>> >>>>>>
>> >>>>>> [1]
>> >>>>>>
>> >>>>>>
> >> >>>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12351075&projectId=12315522
>> >>>>
>> >>>>>> [2] https://dist.apache.org/repos/dist/dev/flink/flink-1.14.3-rc1/
>> >>>>>> [3] https://dist.apache.org/repos/dist/release/flink/KEYS
>> >>>>>> [4]
>> >>>>>>
>> >>>> https://repository.apache.org/content/repositories/orgapacheflink-1481/
>> >>>>
>> >>>>>> [5] https://dist.apache.org/repos/dist/dev/flink/flink-1.14.3-rc1/
>> >>>>>> [6] https://github.com/apache/flink-web/pull/497
>> >>>>>>
>> >>>>
>> >>>
>> >>
>> >
>>


Re: [DISCUSS] Future of Per-Job Mode

2022-01-13 Thread Thomas Weise
Regarding session mode:

## Session Mode
* main() method executed in client

Session mode also supports execution of the main method on the JobManager,
with submission through the REST API. That's how Flink k8s operators like
[1] work. It's actually an important capability because it allows the
cluster resources to be allocated before the previous job is taken down
during an upgrade, when the goal is to optimize for availability.

Thanks,
Thomas

[1] https://github.com/lyft/flinkk8soperator

On Thu, Jan 13, 2022 at 12:32 AM Konstantin Knauf  wrote:
>
> Hi everyone,
>
> I would like to discuss and understand if the benefits of having Per-Job
> Mode in Apache Flink outweigh its drawbacks.
>
>
> *# Background: Flink's Deployment Modes*
> Flink currently has three deployment modes. They differ in the following
> dimensions:
> * main() method executed on Jobmanager or Client
> * dependencies shipped by client or bundled with all nodes
> * number of jobs per cluster & relationship between job and cluster
> lifecycle
> * (supported resource providers)
>
> ## Application Mode
> * main() method executed on Jobmanager
> * dependencies already need to be available on all nodes
> * dedicated cluster for all jobs executed from the same main()-method
> (Note: applications with more than one job currently still have significant
> limitations, like missing high availability). Technically, a session cluster
> dedicated to all jobs submitted from the same main() method.
> * supported by standalone, native kubernetes, YARN
>
> ## Session Mode
> * main() method executed in client
> * dependencies are distributed from and by the client to all nodes
> * cluster is shared by multiple jobs submitted from different clients,
> independent lifecycle
> * supported by standalone, Native Kubernetes, YARN
>
> ## Per-Job Mode
> * main() method executed in client
> * dependencies are distributed from and by the client to all nodes
> * dedicated cluster for a single job
> * supported by YARN only
>
>
> *# Reasons to Keep*
> * There are use cases where you might need the
> combination of a single job per cluster, but main() method execution in the
> client. This combination is only supported by per-job mode.
> * It currently exists. Existing users will need to migrate to either
> session or application mode.
>
>
> *# Reasons to Drop*
> * With Per-Job Mode and Application Mode we have two
> modes that for most users probably do the same thing. Specifically, for
> those users that don't care where the main() method is executed and want to
> submit a single job per cluster. Having two ways to do the same thing is
> confusing.
> * Per-Job Mode is only supported by YARN anyway. If we keep it, we should
> work towards support in Kubernetes and Standalone, too, to reduce special
> casing.
> * Dropping per-job mode would reduce complexity in the code and allow us to
> dedicate more resources to the other two deployment modes.
> * I believe with session mode and application mode we have two easily
> distinguishable and understandable deployment modes that cover Flink's use
> cases:
>* session mode: olap-style, interactive jobs/queries, short lived batch
> jobs, very small jobs, traditional cluster-centric deployment mode (fits
> the "Hadoop world")
>* application mode: long-running streaming jobs, large scale &
> heterogenous jobs (resource isolation!), application-centric deployment
> mode (fits the "Kubernetes world")
>
>
> *# Call to Action*
> * Do you use per-job mode? If so, why & would you be able to migrate to one
> of the other methods?
> * Am I missing any pros/cons?
> * Are you in favor of dropping per-job mode midterm?
>
> Cheers and thank you,
>
> Konstantin
>
> --
>
> Konstantin Knauf
>
> https://twitter.com/snntrable
>
> https://github.com/knaufk


Re: Flink native k8s integration vs. operator

2022-01-12 Thread Thomas Weise
> >>> server directly. It acts like embedding an operator inside Flink's master,
> >>> which manages the resources (pod, deployment, configmap, etc.) and watches
> >>> / reacts to related events.
> >>> - Active resource management means Flink can actively start / terminate
> >>> workers as needed. Its key characteristic is that the resource a Flink
> >>> deployment uses is decided by the job's execution plan, unlike the 
> >>> opposite
> >>> reactive mode (resource available to the deployment decides the execution
> >>> plan) or the standalone mode (both execution plan and deployment resources
> >>> are predefined).
> >>>
> >>> Currently, we have the yarn and native k8s deployments (and the recently
> >>> removed mesos deployment) in active mode, due to their ability to request 
> >>> /
> >>> release worker resources from the underlying cluster. And all the existing
> >>> operators, AFAIK, work with a Flink standalone deployment, where Flink
> >>> cannot request / release resources by itself.
> >>>
> >>> From this perspective, I think a large part of the native k8s
> >>> integration advantages come from the active mode: being able to better
> >>> understand the job's resource requirements and adjust the deployment
> >>> resource accordingly. Both fine-grained resource management (customizing 
> >>> TM
> >>> resources for different tasks / operators) and adaptive batch scheduler
> >>> (rescale the deployment w.r.t. different stages) fall into this category.
> >>>
> >>> I'm wondering if we can have an operator that also works with the active
> >>> mode. Instead of talking to the api server directly for adding / deleting
> >>> resources, Flink's active resource manager can talk to the operator (via
> >>> CR) about the resources the deployment needs, and let the operator to
> >>> actually add / remove the resources. The operator should be able to work
> >>> with (active) or without (standalone) the information of deployment's
> >>> resource requirements. In this way, users are free to choose between 
> >>> active
> >>> and reactive (e.g., HPA) rescaling, while always benefiting from the
> >>> beyond-deployment lifecycle (upgrades, savepoint management, etc.) and
> >>> alignment with the K8s ecosystem (Flink client free, operating via 
> >>> kubectl,
> >>> etc.).
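> >>>
> >>> To sketch that interaction (all field names below are hypothetical,
> >>> nothing like this exists today): the active resource manager would
> >>> declare its requirements in the CR spec, and the operator would reconcile
> >>> and report back through the status, e.g.:
> >>>
> >>>     # hypothetical CR fragment
> >>>     spec:
> >>>       taskManagers:
> >>>         replicas: 8        # requested by Flink's active resource manager
> >>>         resource:
> >>>           cpu: 2
> >>>           memory: 4Gi
> >>>     status:
> >>>       taskManagers:
> >>>         readyReplicas: 6   # reconciled and reported by the operator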
> >>>
> >>> Thank you~
> >>>
> >>> Xintong Song
> >>>
> >>>
> >>>
> >>> On Thu, Jan 6, 2022 at 1:06 PM Thomas Weise  wrote:
> >>>
> >>>> Hi David,
> >>>>
> >>>> Thank you for the reply and context!
> >>>>
> >>>> As for workload types and where native integration might fit: I think
> >>>> that any k8s native solution that satisfies category 3) can also take
> >>>> care of 1) and 2) while the native integration by itself can't achieve
> >>>> that. Existence of [1] might serve as further indication.
> >>>>
> >>>> The k8s operator pattern would be an essential building block for a
> >>>> k8s native solution that is interoperable with k8s ecosystem tooling
> >>>> like kubectl, which is why [2] and subsequent derived art were
> >>>> created. Specifically the CRD allows us to directly express the
> >>>> concept of a Flink application consisting of job manager and task
> >>>> manager pods along with associated create/update/delete operations.
> >>>>
> >>>> Would it make sense to gauge interest to have such an operator as part
> >>>> of Flink? It appears so from discussions like [3]. I think such
> >>>> addition would significantly lower the barrier to adoption, since like
> >>>> you mentioned one cannot really run mission critical streaming
> >>>> workloads with just the Apache Flink release binaries alone. While it
> >>>> is great to have multiple k8s operators to choose from that are
> >>>> managed outside Flink, it is unfortunately also evident that today's
> >>>> hot operator turns into tomorrow's tech debt. I think such fate would
> >>>> be less likely within the project, when multiple parties can join
> >>>> forces and benefit from each other's contributions. There were similar
> >>>> considerations 

Re: [VOTE] FLIP-206: Support PyFlink Runtime Execution in Thread Mode

2022-01-12 Thread Thomas Weise
+1 (binding)

On Wed, Jan 12, 2022 at 4:58 AM Till Rohrmann  wrote:
>
> +1 (binding)
>
> Cheers,
> Till
>
> On Wed, Jan 12, 2022 at 11:07 AM Wei Zhong  wrote:
>
> > +1(binding)
> >
> > Best,
> > Wei
> >
> > > On Jan 12, 2022, at 5:58 PM, Xingbo Huang wrote:
> > >
> > > Hi all,
> > >
> > > I would like to start the vote for FLIP-206[1], which was discussed and
> > > reached a consensus in the discussion thread[2].
> > >
> > > The vote will be open for at least 72h, unless there is an objection or
> > not
> > > enough votes.
> > >
> > > [1]
> > >
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-206%3A+Support+PyFlink+Runtime+Execution+in+Thread+Mode
> > > [2] https://lists.apache.org/thread/c7d2mt1vh8v11x2sl8slm4sy9j3t2pdg
> > >
> > > Best,
> > > Xingbo
> >
> >


Use of JIRA fixVersion

2022-01-11 Thread Thomas Weise
Hi,

As part of preparing the 1.14.3 release, I observed that there were
around 200 JIRA issues with fixVersion 1.14.3 that were unresolved
(after blocking issues had been dealt with). Further cleanup resulted
in removing fixVersion 1.14.3 from most of these, and we are left with
[1] - these are the tickets that rolled over to 1.14.4.

The disassociated issues broadly fell into following categories:

* test infrastructure / stability related (these can be found by label)
* stale tickets (can also be found by label)
* tickets w/o label that pertain to addition of features that don't
fit into or don't have to go into a patch release

I wanted to bring this up so that we can potentially come up with
better guidance for use of the fixVersion field, since it is important
for managing releases [2]. Manual cleanup as done in this case isn't
desirable. A few thoughts:

* In most cases, it is not necessary to set fixVersion upfront.
Instead, we can set it when the issue is actually resolved, and set it
for all versions/branches for which a backport occurred after the
changes are merged
* How to know where to backport? "Affects versions" seems to be the
right field to use for that purpose. While resolving an issue for
master it can guide backporting.
* What if an issue should block a release? The priority of the issue
should be blocker. Blockers are low cardinality and need to be fixed
before release. So that would be the case where fixVersion is set
upfront.

Thanks,
Thomas

[1] 
https://issues.apache.org/jira/issues/?jql=project%20%3D%20Flink%20and%20fixVersion%20%3D%201.14.4%20and%20resolution%20%3D%20Unresolved%20
[2] https://cwiki.apache.org/confluence/display/FLINK/Creating+a+Flink+Release


Re: [DISCUSS] FLIP-206: Support PyFlink Runtime Execution in Thread Mode

2022-01-11 Thread Thomas Weise
> >> > modifications of PyFlink custom Operators, such as
> >> > PythonScalarFunctionOperator[6], which won't affect deployment and
> >> memory
> >> > management.
> >> >
> >> > >>> One more question that came to my mind: How much performance
> >> > improvement do we gain on a real-world Python use case? Were the
> >> > measurements more like micro benchmarks where the Python UDF was called
> >> w/o
> >> > the overhead of Flink? I would just be curious how much the Python
> >> > component contributes to the overall runtime of a real world job. Do we
> >> > have some data on this?
> >> >
> >> > The last figure I put in the FLIP is the performance comparison of three
> >> > real Flink streaming SQL jobs: a Java UDF job, a Python UDF job in
> >> > Process Mode, and a Python UDF job in Thread Mode. The calculated QPS is
> >> > the end-to-end Flink job execution result. As shown in the performance
> >> > comparison chart, a Python UDF with the same function can often only
> >> > reach 20% of the performance of the Java UDF, so the Python UDF will
> >> > often become the performance bottleneck in a PyFlink job.
> >> >
> >> > For Thomas:
> >> >
> >> > The first time I realized that the framework overhead of the various IPC
> >> > mechanisms (socket, gRPC, shared memory) cannot be ignored in some
> >> > scenarios was due to a PyFlink image-prediction job. Its input parameters
> >> > are a series of huge binary image arrays, with a data size bigger than
> >> > 1 GB. The serialization/deserialization overhead became an important part
> >> > of its poor performance. Although this job is a bit extreme, through
> >> > measurement we did find that larger parameters hurt the performance of
> >> > the IPC framework via serialization/deserialization overhead.
> >> >
> >> > >>> As I understand it, you measured the difference in throughput for
> >> UPPER
> >> > between process and embedded mode and the difference is 50% increased
> >> > throughput?
> >> >
> >> > This 50% is the result when the data size is less than 100 bytes. When
> >> > the data size reaches 1 KB, Embedded Mode reaches about 3.5 times the
> >> > performance of the Process Mode shown in the FLIP. When the data reaches
> >> > 1 MB, Embedded Mode can reach 5 times the performance of Process Mode.
> >> > The biggest difference here is that in Embedded Mode, input/result data
> >> > does not need to be serialized/deserialized.
> >> >
> >> > >>> Is that a typical UDF in your usage?
> >> >
> >> > The reason for choosing UPPER is that a simpler UDF implementation makes
> >> > it easier to evaluate the performance of different execution modes.
> >> >
> >> > >>> What do you observe when the function becomes more complex?
> >> >
> >> > We can analyze the QPS of the framework (process mode or embedded mode)
> >> > and the QPS of the UDF calculation logic separately. A more complex UDF
> >> > simply means a UDF with a lower QPS. The main factors that affect the
> >> > framework QPS are the data types, the number, and the size of the
> >> > parameters, which greatly affect the serialization/deserialization
> >> > overhead in Process Mode.
> >> >
> >> > The purpose of introducing Thread Mode is not to replace Process Mode,
> >> > but to complement Python UDF usage scenarios such as CEP and joins, and
> >> > scenarios where higher performance is pursued. Compared with Thread Mode,
> >> > Process Mode has better isolation, which avoids the limitations of
> >> > Thread Mode in some scenarios, such as session mode.
> >> >
> >> > [1] https://www.mail-archive.com/user@flink.apache.org/msg42760.html
> >> > [2] https://www.mail-archive.com/user@flink.apache.org/msg44975.html
> >> > [3] https://pandas.pydata.org/
> >> > [4] https://cython.org/
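> >> >
> >> > For completeness: switching between the two modes is meant to be a
> >> > single configuration change. Assuming the option name proposed in the
> >> > FLIP (subject to change until it is released), e.g. in flink-conf.yaml:
> >> >
> >> >     # "process" keeps the current inter-process execution;
> >> >     # "thread" embeds the Python runtime into the same process
> >> >     python.execution-mode: thread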

Re: [DISCUSS] Releasing Flink 1.14.3

2022-01-10 Thread Thomas Weise
Thank you Xingbo. I meanwhile also got my Azure pipeline working and
was able to build the artifacts, although in general it would be nice
if not every release volunteer had to set up their own separate Azure
environment.

Martijn,

The release is staged, except for the website PR:

https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12351075&projectId=12315522
https://dist.apache.org/repos/dist/dev/flink/flink-1.14.3-rc1/
https://repository.apache.org/content/repositories/orgapacheflink-1481/
https://github.com/apache/flink/releases/tag/release-1.14.3-rc1

Would you like to create the website PR and start a VOTE?

(If not, I can look into that tomorrow.)

Cheers,
Thomas



On Sun, Jan 9, 2022 at 9:17 PM Xingbo Huang  wrote:
>
> Hi Thomas,
>
> Since multiple wheel packages with different python versions for mac and
> linux are generated, building locally requires you have multiple machines
> with different os and Python environments. I have triggered the wheel
> package build of release-1.14.3-rc1 in my private Azure[1] and you can
> download the wheels after building successfully.
>
> [1]
> https://dev.azure.com/hxbks2ks/FLINK-TEST/_build/results?buildId=1704&view=results
>
> Best,
> Xingbo
>
> On Mon, Jan 10, 2022 at 11:12 AM, Thomas Weise wrote:
>
> > Hi Martijn,
> >
> > I started building the release artifacts. The Maven part is ready.
> > Currently blocked on the Azure build for the PyFlink wheel packages.
> >
> > I had to submit an "Azure DevOps Parallelism Request" and that might
> > take a couple of days.
> >
> > Does someone have the steps to build the wheels locally?
> > Alternatively, if someone can build them on their existing setup and
> > point me to the result, that would speed up things as well.
> >
> > The release branch:
> > https://github.com/apache/flink/tree/release-1.14.3-rc1
> >
> > Thanks,
> > Thomas
> >
> > On Thu, Jan 6, 2022 at 9:14 PM Martijn Visser 
> > wrote:
> > >
> > > Hi Thomas,
> > >
> > > Thanks for volunteering! There was no volunteer yet, so would be great if
> > > you could help out.
> > >
> > > Best regards,
> > >
> > > Martijn
> > >
> > > Op vr 7 jan. 2022 om 01:54 schreef Thomas Weise 
> > >
> > > > Hi Martijn,
> > > >
> > > > Thanks for preparing the release. Did a volunteer check in with you?
> > > > If not, I would like to take this up.
> > > >
> > > > Thomas
> > > >
> > > > On Mon, Dec 27, 2021 at 7:11 AM Martijn Visser 
> > > > wrote:
> > > > >
> > > > > Thank you all! That means that there's currently no more blocker to
> > start
> > > > > with the Flink 1.14.3 release.
> > > > >
> > > > > The only thing that's needed is a committer that's willing to follow
> > the
> > > > > release process [1] Any volunteers?
> > > > >
> > > > > Best regards,
> > > > >
> > > > > Martijn
> > > > >
> > > > > [1]
> > > > >
> > > >
> > https://cwiki.apache.org/confluence/display/FLINK/Creating+a+Flink+Release
> > > > >
> > > > > On Mon, 27 Dec 2021 at 03:17, Qingsheng Ren 
> > wrote:
> > > > >
> > > > > > Hi Martijn,
> > > > > >
> > > > > > FLINK-25132 has been merged to master and release-1.14.
> > > > > >
> > > > > > Thanks for your work for releasing 1.14.3!
> > > > > >
> > > > > > Cheers,
> > > > > >
> > > > > > Qingsheng Ren
> > > > > >
> > > > > > > On Dec 26, 2021, at 3:46 PM, Konstantin Knauf  > >
> > > > wrote:
> > > > > > >
> > > > > > > Hi Martijn,
> > > > > > >
> > > > > > > FLINK-25375 is merged to release-1.14.
> > > > > > >
> > > > > > > Cheers,
> > > > > > >
> > > > > > > Konstantin
> > > > > > >
> > > > > > > On Wed, Dec 22, 2021 at 12:02 PM David Morávek 
> > > > wrote:
> > > > > > >
> > > > > > >> Hi Martijn, FLINK-25271 has been merged to 1.14 branch.
> > > > > > >>
> > > > > > >> Best,
> > > > > > >> D.
> > > > > > >>
> >

Re: [DISCUSS] Releasing Flink 1.14.3

2022-01-09 Thread Thomas Weise
Hi Martijn,

I started building the release artifacts. The Maven part is ready.
Currently blocked on the Azure build for the PyFlink wheel packages.

I had to submit an "Azure DevOps Parallelism Request" and that might
take a couple of days.

Does someone have the steps to build the wheels locally?
Alternatively, if someone can build them on their existing setup and
point me to the result, that would speed up things as well.

The release branch: https://github.com/apache/flink/tree/release-1.14.3-rc1

Thanks,
Thomas

On Thu, Jan 6, 2022 at 9:14 PM Martijn Visser  wrote:
>
> Hi Thomas,
>
> Thanks for volunteering! There was no volunteer yet, so would be great if
> you could help out.
>
> Best regards,
>
> Martijn
>
> Op vr 7 jan. 2022 om 01:54 schreef Thomas Weise 
>
> > Hi Martijn,
> >
> > Thanks for preparing the release. Did a volunteer check in with you?
> > If not, I would like to take this up.
> >
> > Thomas
> >
> > On Mon, Dec 27, 2021 at 7:11 AM Martijn Visser 
> > wrote:
> > >
> > > Thank you all! That means that there's currently no more blocker to start
> > > with the Flink 1.14.3 release.
> > >
> > > The only thing that's needed is a committer that's willing to follow the
> > > release process [1] Any volunteers?
> > >
> > > Best regards,
> > >
> > > Martijn
> > >
> > > [1]
> > >
> > https://cwiki.apache.org/confluence/display/FLINK/Creating+a+Flink+Release
> > >
> > > On Mon, 27 Dec 2021 at 03:17, Qingsheng Ren  wrote:
> > >
> > > > Hi Martijn,
> > > >
> > > > FLINK-25132 has been merged to master and release-1.14.
> > > >
> > > > Thanks for your work for releasing 1.14.3!
> > > >
> > > > Cheers,
> > > >
> > > > Qingsheng Ren
> > > >
> > > > > On Dec 26, 2021, at 3:46 PM, Konstantin Knauf 
> > wrote:
> > > > >
> > > > > Hi Martijn,
> > > > >
> > > > > FLINK-25375 is merged to release-1.14.
> > > > >
> > > > > Cheers,
> > > > >
> > > > > Konstantin
> > > > >
> > > > > On Wed, Dec 22, 2021 at 12:02 PM David Morávek 
> > wrote:
> > > > >
> > > > >> Hi Martijn, FLINK-25271 has been merged to 1.14 branch.
> > > > >>
> > > > >> Best,
> > > > >> D.
> > > > >>
> > > > >> On Wed, Dec 22, 2021 at 7:27 AM 任庆盛  wrote:
> > > > >>
> > > > >>> Hi Martijn,
> > > > >>>
> > > > >>> Thanks for the effort on Flink 1.14.3. FLINK-25132 has been merged
> > on
> > > > >>> master and is waiting for CI on release-1.14. I think it can be
> > closed
> > > > >>> today.
> > > > >>>
> > > > >>> Cheers,
> > > > >>>
> > > > >>> Qingsheng Ren
> > > > >>>
> > > > >>>> On Dec 21, 2021, at 6:26 PM, Martijn Visser <
> > mart...@ververica.com>
> > > > >>> wrote:
> > > > >>>>
> > > > >>>> Hi everyone,
> > > > >>>>
> > > > >>>> I'm restarting this thread [1] with a new subject, given that
> > Flink
> > > > >>> 1.14.1 was a (cancelled) emergency release for the Log4j update and
> > > > we've
> > > > >>> released Flink 1.14.2 as an emergency release for Log4j updates
> > [2].
> > > > >>>>
> > > > >>>> To give an update, this is the current blocker for Flink 1.14.3:
> > > > >>>>
> > > > >>>> * https://issues.apache.org/jira/browse/FLINK-25132 - KafkaSource
> > > > >>> cannot work with object-reusing DeserializationSchema -> @
> > > > >>> renqs...@gmail.com can you provide an ETA for this ticket?
> > > > >>>>
> > > > >>>> There are two critical tickets open for Flink 1.14.3. That means
> > that
> > > > >> if
> > > > >>> the above ticket is resolved, these two will not block the
> > release. If
> > > > we
> > > > >>> can merge them in before the above ticket is completed, that's a
> > bonus.
> > > > >>>>
> > > > >

Re: [DISCUSS] Releasing Flink 1.14.3

2022-01-06 Thread Thomas Weise
Hi Martijn,

Thanks for preparing the release. Did a volunteer check in with you?
If not, I would like to take this up.

Thomas

On Mon, Dec 27, 2021 at 7:11 AM Martijn Visser  wrote:
>
> Thank you all! That means that there's currently no more blocker to start
> with the Flink 1.14.3 release.
>
> The only thing that's needed is a committer that's willing to follow the
> release process [1] Any volunteers?
>
> Best regards,
>
> Martijn
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/Creating+a+Flink+Release
>
> On Mon, 27 Dec 2021 at 03:17, Qingsheng Ren  wrote:
>
> > Hi Martijn,
> >
> > FLINK-25132 has been merged to master and release-1.14.
> >
> > Thanks for your work for releasing 1.14.3!
> >
> > Cheers,
> >
> > Qingsheng Ren
> >
> > > On Dec 26, 2021, at 3:46 PM, Konstantin Knauf  wrote:
> > >
> > > Hi Martijn,
> > >
> > > FLINK-25375 is merged to release-1.14.
> > >
> > > Cheers,
> > >
> > > Konstantin
> > >
> > > On Wed, Dec 22, 2021 at 12:02 PM David Morávek  wrote:
> > >
> > >> Hi Martijn, FLINK-25271 has been merged to 1.14 branch.
> > >>
> > >> Best,
> > >> D.
> > >>
> > >> On Wed, Dec 22, 2021 at 7:27 AM 任庆盛  wrote:
> > >>
> > >>> Hi Martijn,
> > >>>
> > >>> Thanks for the effort on Flink 1.14.3. FLINK-25132 has been merged on
> > >>> master and is waiting for CI on release-1.14. I think it can be closed
> > >>> today.
> > >>>
> > >>> Cheers,
> > >>>
> > >>> Qingsheng Ren
> > >>>
> >  On Dec 21, 2021, at 6:26 PM, Martijn Visser 
> > >>> wrote:
> > 
> >  Hi everyone,
> > 
> >  I'm restarting this thread [1] with a new subject, given that Flink
> > >>> 1.14.1 was a (cancelled) emergency release for the Log4j update and
> > we've
> > >>> released Flink 1.14.2 as an emergency release for Log4j updates [2].
> > 
> >  To give an update, this is the current blocker for Flink 1.14.3:
> > 
> >  * https://issues.apache.org/jira/browse/FLINK-25132 - KafkaSource
> > >>> cannot work with object-reusing DeserializationSchema -> @
> > >>> renqs...@gmail.com can you provide an ETA for this ticket?
> > 
> >  There are two critical tickets open for Flink 1.14.3. That means that
> > >> if
> > >>> the above ticket is resolved, these two will not block the release. If
> > we
> > >>> can merge them in before the above ticket is completed, that's a bonus.
> > 
> >  * https://issues.apache.org/jira/browse/FLINK-25199 - fromValues does
> > >>> not emit final MAX watermark -> @Marios Trivyzas any update or thoughts
> > >> on
> > >>> this?
> >  * https://issues.apache.org/jira/browse/FLINK-25227 - Comparing the
> > >>> equality of the same (boxed) numeric values returns false -> @Caizhi
> > Weng
> > >>> any update or thoughts on this?
> > 
> >  Best regards,
> > 
> >  Martijn
> > 
> >  [1] https://lists.apache.org/thread/r0xhs9x01k8hnm0hyq2kk4ptrhkzgdw9
> >  [2]
> > https://flink.apache.org/news/2021/12/16/log4j-patch-releases.html
> > 
> >  On Thu, 9 Dec 2021 at 17:21, David Morávek  wrote:
> >  Hi Martijn, I've just opened a backport PR [1] for FLINK-23946 [2].
> > 
> >  [1] https://github.com/apache/flink/pull/18066
> >  [2] https://issues.apache.org/jira/browse/FLINK-23946
> > 
> >  Best,
> >  D.
> > 
> >  On Thu, Dec 9, 2021 at 4:59 PM Fabian Paul  wrote:
> >  Actually I meant https://issues.apache.org/jira/browse/FLINK-25126
> >  sorry for the confusion.
> > 
> >  On Thu, Dec 9, 2021 at 4:55 PM Fabian Paul  wrote:
> > >
> > > Hi Martijn,
> > >
> > > I just opened the backport for
> > > https://issues.apache.org/jira/browse/FLINK-25132. The changes are
> > > already approved I only wait for a green Azure build.
> > >
> > > Best,
> > > Fabian
> > >
> > > On Thu, Dec 9, 2021 at 4:01 PM Martijn Visser  > >>>
> > >>> wrote:
> > >>
> > >> Hi all,
> > >>
> > >> Thanks for the fixes Jingsong and Zhu!
> > >>
> > >> That means that we still have the following tickets open:
> > >>
> > >> * https://issues.apache.org/jira/browse/FLINK-23946 - Application
> > >>> mode
> > >> fails fatally when being shut down -> A PR is there, just pending a
> > >>> review.
> > >> * https://issues.apache.org/jira/browse/FLINK-25126 - Kafka
> > >>> connector tries
> > >> to commit aborted transaction in batch mode -> I believe this is
> > >>> pending a
> > >> backport, correct @fp...@apache.org  ?
> > >> * https://issues.apache.org/jira/browse/FLINK-25132 - KafkaSource
> > >>> cannot
> > >> work with object-reusing DeserializationSchema -> @
> > >>> renqs...@gmail.com
> > >>  can you provide an ETA for this ticket?
> > >> * https://issues.apache.org/jira/browse/FLINK-25199 - fromValues
> > >>> does not
> > >> emit final MAX watermark -> @Marios Trivyzas 
> > >> can
> > >>> you
> > >> provide an ETA for this ticket?
> > >> * https://issues.apache.org/jira/browse/FLINK-25227 - 

Re: [DISCUSS] Changing the minimal supported version of Hadoop

2022-01-03 Thread Thomas Weise
+1 for bumping the minimum supported Hadoop version to 2.8.5

On Mon, Jan 3, 2022 at 12:25 AM David Morávek  wrote:
>
> As there were no strong objections, we'll proceed with bumping the Hadoop
> version to 2.8.5 and removing the safeguards and the CI for any earlier
> versions. This will effectively make Hadoop 2.8.5 the lowest supported
> version in Flink 1.15.
>
> Best,
> D.
>
> On Thu, Dec 23, 2021 at 11:03 AM Till Rohrmann  wrote:
>
> > If there are no users strongly objecting to dropping Hadoop support for <
> > 2.8, then I am +1 for this since otherwise we won't gain a lot as Xintong
> > said.
> >
> > Cheers,
> > Till
> >
> > On Wed, Dec 22, 2021 at 10:33 AM David Morávek  wrote:
> >
> > > Agreed, if we drop the CI for lower versions, there is actually no point
> > > in having safeguards, as we can't really test for them.
> > >
> > > Maybe one more thought (it's more of a feeling): users running
> > > really old Hadoop versions are usually slower to adopt (they most likely
> > > use what their current HDP / CDH version offers) and are less
> > > likely to use Flink 1.15 any time soon, but I don't have any strong data
> > > to support this.
> > >
> > > D.
> > >
> >

