Re: [VOTE] FLIP-460: Display source/sink I/O metrics on Flink Web UI

2024-07-16 Thread Robert Metzger
+1 (binding)

Nice to see this fixed ;)



On Tue, Jul 16, 2024 at 8:46 AM Yong Fang  wrote:

> +1 (binding)
>
> Best,
> FangYong
>
>
> On Tue, Jul 16, 2024 at 1:14 PM Zhanghao Chen 
> wrote:
>
> > Hi everyone,
> >
> >
> > Thanks for all the feedback about the FLIP-460: Display source/sink I/O
> > metrics on Flink Web UI [1]. The discussion
> > thread is here [2]. I'd like to start a vote on it.
> >
> > The vote will be open for at least 72 hours unless there is an objection
> > or insufficient votes.
> >
> > [1]
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=309496355
> > [2] https://lists.apache.org/thread/sy271nhd2jr1r942f29xbvbgq7fsd841
> >
> > Best,
> > Zhanghao Chen
> >
>


Re: [VOTE] Apache Flink Kubernetes Operator Release 1.9.0, release candidate #1

2024-06-28 Thread Robert Metzger
+1 (binding)

- checked the docker file contents
- installed the operator from the helm chart
- checked if it can still talk to an existing Flink cluster, deployed from
v1.8

On Tue, Jun 25, 2024 at 9:05 AM Gyula Fóra  wrote:

> +1 (binding)
>
> Verified:
>  - Sources/signatures
>  - Install 1.9.0 from helm chart
>  - Stateful example job basic interactions
>  - Operator upgrade from 1.8.0 -> 1.9.0 with running flinkdeployments
>  - Flink-web PR looks good
>
> Cheers,
> Gyula
>
>
> On Wed, Jun 19, 2024 at 12:09 PM Gyula Fóra  wrote:
>
> > Hi,
> >
> > I have updated the KEYs file and extended the expiration date so that
> > should not be an issue. Thanks for pointing that out.
> >
> > Gyula
> >
> > On Wed, 19 Jun 2024 at 12:07, Rui Fan <1996fan...@gmail.com> wrote:
> >
> >> Thanks Gyula and Mate for driving this release!
> >>
> >> +1 (binding)
> >>
> >> Apart from the expired key and a couple of comments I left on the
> >> flink-web PR, the rest looks fine.
> >>
> >> - Downloaded artifacts from dist ( svn co
> >> https://dist.apache.org/repos/dist/dev/flink/flink-kubernetes-operator-1.9.0-rc1/ )
> >> - Verified SHA512 checksums : ( for i in *.tgz; do echo $i; sha512sum
> >> --check $i.sha512; done )
> >> - Verified GPG signatures : ( for i in *.tgz; do echo $i; gpg --verify
> >> $i.asc $i; done)
> >> - Built the source with java-11 and java-17 ( mvn -T 20 clean install
> >> -DskipTests )
> >> - Verified the license headers while building the source
> >> - Verified that chart and appVersion match the target release (less the
> >> index.yaml and Chart.yaml )
> >> - Downloaded Autoscaler standalone: wget
> >> https://repository.apache.org/content/repositories/orgapacheflink-1740/org/apache/flink/flink-autoscaler-standalone/1.9.0/flink-autoscaler-standalone-1.9.0.jar
> >> - Ran Autoscaler standalone locally; it works well with the rescale API.
> >>
> >> Best,
> >> Rui
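For anyone repeating the checksum step from the list above, the loop can be exercised end-to-end. The sketch below substitutes a locally generated dummy file for the real release tarballs (the file name is invented; for a real RC you would first `svn co` the dist.apache.org dev repository), so it demonstrates only the mechanics of the verification loop:

```shell
# Minimal sketch of the SHA512 verification loop from the vote email,
# run against a locally generated dummy artifact instead of a real RC.
set -eu
workdir=$(mktemp -d)
cd "$workdir"
printf 'dummy release payload\n' > flink-kubernetes-operator-1.9.0-rc1-example.tgz
sha512sum flink-kubernetes-operator-1.9.0-rc1-example.tgz \
  > flink-kubernetes-operator-1.9.0-rc1-example.tgz.sha512
# Same loop as in the email: check every tarball against its .sha512 file.
for i in *.tgz; do
  echo "$i"
  sha512sum --check "$i.sha512"   # exits non-zero if the artifact was tampered with
done
```

The GPG half works the same way, with `gpg --verify $i.asc $i` in place of `sha512sum --check`, after importing the release manager's key from the KEYS file.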
> >>
> >> On Wed, Jun 19, 2024 at 1:50 AM Mate Czagany 
> wrote:
> >>
> >> > Hi,
> >> >
> >> > +1 (non-binding)
> >> >
> >> > Note: when using the Apache Flink KEYS file [1] to verify the
> >> > signatures, your key appears to be expired, so that file should be
> >> > updated as well.
> >> >
> >> > - Verified checksums and signatures
> >> > - Built source distribution
> >> > - Verified all pom.xml versions are the same
> >> > - Verified install from RC repo
> >> > - Verified Chart.yaml and values.yaml contents
> >> > - Submitted basic example with 1.17 and 1.19 Flink versions in native
> >> and
> >> > standalone mode
> >> > - Tested operator HA, added new watched namespace dynamically
> >> > - Checked operator logs
> >> >
> >> > Regards,
> >> > Mate
> >> >
> >> > [1] https://dist.apache.org/repos/dist/release/flink/KEYS
> >> >
> >> > Gyula Fóra  wrote (on Tue, Jun 18, 2024 at 8:14):
> >> >
> >> > > Hi Everyone,
> >> > >
> >> > > Please review and vote on the release candidate #1 for the version
> >> 1.9.0
> >> > of
> >> > > Apache Flink Kubernetes Operator,
> >> > > as follows:
> >> > > [ ] +1, Approve the release
> >> > > [ ] -1, Do not approve the release (please provide specific
> comments)
> >> > >
> >> > > **Release Overview**
> >> > >
> >> > > As an overview, the release consists of the following:
> >> > > a) Kubernetes Operator canonical source distribution (including the
> >> > > Dockerfile), to be deployed to the release repository at
> >> dist.apache.org
> >> > > b) Kubernetes Operator Helm Chart to be deployed to the release
> >> > repository
> >> > > at dist.apache.org
> >> > > c) Maven artifacts to be deployed to the Maven Central Repository
> >> > > d) Docker image to be pushed to dockerhub
> >> > >
> >> > > **Staging Areas to Review**
> >> > >
> >> > > The staging areas containing the above mentioned artifacts are as
> >> > follows,
> >> > > for your review:
> >> > > * All artifacts for a,b) can be found in the corresponding dev
> >> repository
> >> > > at dist.apache.org [1]
> >> > > * All artifacts for c) can be found at the Apache Nexus Repository
> [2]
> >> > > * The docker image for d) is staged on github [3]
> >> > >
> >> > > All artifacts are signed with the key 21F06303B87DAFF1 [4]
> >> > >
> >> > > Other links for your review:
> >> > > * JIRA release notes [5]
> >> > > * source code tag "release-1.9.0-rc1" [6]
> >> > > * PR to update the website Downloads page to
> >> > > include Kubernetes Operator links [7]
> >> > >
> >> > > **Vote Duration**
> >> > >
> >> > > The voting time will run for at least 72 hours.
> >> > > It is adopted by majority approval, with at least 3 PMC affirmative
> >> > votes.
> >> > >
> >> > > **Note on Verification**
> >> > >
> >> > > You can follow the basic verification guide here[8].
> >> > > Note that you don't need to verify everything yourself, but please
> >> make
> >> > > note of what you have tested together with your +/- vote.
> >> > >
> >> > > Cheers!
> >> > > Gyula Fora
> >> > >
> >> > > [1]
> >> > >
> >> > >
> >> >
> >>
> 

[jira] [Created] (FLINK-35526) Remove deprecated stedolan/jq Docker image from Flink e2e tests

2024-06-05 Thread Robert Metzger (Jira)
Robert Metzger created FLINK-35526:
--

 Summary: Remove deprecated stedolan/jq Docker image from Flink e2e 
tests
 Key: FLINK-35526
 URL: https://issues.apache.org/jira/browse/FLINK-35526
 Project: Flink
  Issue Type: Bug
  Components: Test Infrastructure
Reporter: Robert Metzger
Assignee: Robert Metzger


Our CI logs contain this warning: 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=60060&view=logs&j=af184cdd-c6d8-5084-0b69-7e9c67b35f7a&t=0f3adb59-eefa-51c6-2858-3654d9e0749d&l=3828

{code}
latest: Pulling from stedolan/jq
[DEPRECATION NOTICE] Docker Image Format v1, and Docker Image manifest version 
2, schema 1 support will be removed in an upcoming release. Suggest the author 
of docker.io/stedolan/jq:latest to upgrade the image to the OCI Format, or 
Docker Image manifest v2, schema 2. More information at 
https://docs.docker.com/go/deprecated-image-specs/
{code}
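One way to remove the dependency is to stop pulling the deprecated image altogether and use a jq binary installed in the CI environment instead. A sketch, assuming the e2e scripts currently shell out to the image (the exact invocation in the tests may differ):

{code}
# Before (pulls the deprecated v1-format image):
#   docker run --rm -i stedolan/jq jq '.status' < response.json
# After (jq installed on the CI host, e.g. via apt on Debian/Ubuntu-based agents):
sudo apt-get update && sudo apt-get install -y jq
jq '.status' < response.json
{code}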



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: Slack Invite

2024-06-03 Thread Robert Metzger
I will update the Flink website.

On Thu, May 30, 2024 at 10:08 AM gongzhongqiang 
wrote:

> Hi,
> The invite  link :
>
> https://join.slack.com/t/apache-flink/shared_invite/zt-2jtsd06wy-31q_aELVkdc4dHsx0GMhOQ
>
> Best,
> Zhongqiang Gong
>
> > Nelson de Menezes Neto  wrote on Thu, May 30, 2024 at 15:01:
>
> > Hey guys!
> >
> > I want to join the slack community but the invite has expired..
> > Can u send me a new one?
> >
>


Re: [VOTE] FLIP-446: Kubernetes Operator State Snapshot CRD

2024-04-25 Thread Robert Metzger
Ah, true -- now I remember. Thanks for fixing the wiki page.

+1 (binding)


On Thu, Apr 25, 2024 at 4:40 PM Gyula Fóra  wrote:

> That's my fault @Robert Metzger  , with the new FLIP
> process and the lack of Confluence access for non-committers it is a bit
> tricky to keep it in sync :)
>
> Gyula
>
> On Thu, Apr 25, 2024 at 4:17 PM Robert Metzger 
> wrote:
>
> > In principle I'm +1 on the proposal, but I think the FLIP in the wiki is
> > not in sync with the Google doc.
> > For example in the Wiki FlinkStateSnapshotSpec.backoffLimit is missing.
> >
> > On Thu, Apr 25, 2024 at 3:27 PM Thomas Weise  wrote:
> >
> > > +1 (binding)
> > >
> > >
> > > On Wed, Apr 24, 2024 at 5:14 AM Yuepeng Pan 
> > wrote:
> > >
> > > > +1(non-binding)
> > > >
> > > >
> > > > Best,
> > > > Yuepeng Pan
> > > >
> > > > At 2024-04-24 16:05:07, "Rui Fan" <1996fan...@gmail.com> wrote:
> > > > >+1(binding)
> > > > >
> > > > >Best,
> > > > >Rui
> > > > >
> > > > >On Wed, Apr 24, 2024 at 4:03 PM Mate Czagany 
> > > wrote:
> > > > >
> > > > >> Hi everyone,
> > > > >>
> > > > >> I'd like to start a vote on the FLIP-446: Kubernetes Operator
> State
> > > > >> Snapshot CRD [1]. The discussion thread is here [2].
> > > > >>
> > > > >> The vote will be open for at least 72 hours unless there is an
> > > > objection or
> > > > >> insufficient votes.
> > > > >>
> > > > >> [1]
> > > > >>
> > > > >>
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-446%3A+Kubernetes+Operator+State+Snapshot+CRD
> > > > >> [2]
> > https://lists.apache.org/thread/q5dzjwj0qk34rbg2sczyypfhokxoc3q7
> > > > >>
> > > > >> Regards,
> > > > >> Mate
> > > > >>
> > > >
> > >
> >
>


Re: [VOTE] FLIP-446: Kubernetes Operator State Snapshot CRD

2024-04-25 Thread Robert Metzger
In principle I'm +1 on the proposal, but I think the FLIP in the wiki is
not in sync with the Google doc.
For example in the Wiki FlinkStateSnapshotSpec.backoffLimit is missing.

On Thu, Apr 25, 2024 at 3:27 PM Thomas Weise  wrote:

> +1 (binding)
>
>
> On Wed, Apr 24, 2024 at 5:14 AM Yuepeng Pan  wrote:
>
> > +1(non-binding)
> >
> >
> > Best,
> > Yuepeng Pan
> >
> > At 2024-04-24 16:05:07, "Rui Fan" <1996fan...@gmail.com> wrote:
> > >+1(binding)
> > >
> > >Best,
> > >Rui
> > >
> > >On Wed, Apr 24, 2024 at 4:03 PM Mate Czagany 
> wrote:
> > >
> > >> Hi everyone,
> > >>
> > >> I'd like to start a vote on the FLIP-446: Kubernetes Operator State
> > >> Snapshot CRD [1]. The discussion thread is here [2].
> > >>
> > >> The vote will be open for at least 72 hours unless there is an
> > objection or
> > >> insufficient votes.
> > >>
> > >> [1]
> > >>
> > >>
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-446%3A+Kubernetes+Operator+State+Snapshot+CRD
> > >> [2] https://lists.apache.org/thread/q5dzjwj0qk34rbg2sczyypfhokxoc3q7
> > >>
> > >> Regards,
> > >> Mate
> > >>
> >
>


Re: [DISCUSS] FLIP-446: Kubernetes Operator State Snapshot CRD

2024-04-23 Thread Robert Metzger
> >> 4. I really like the idea of having something like Pod Conditions, but I
> >> think it wouldn't add too much value here, because the only 2 stages
> >> important to the user are "Triggered" and "Completed", and those
> timestamps
> >> will already be included in the status field. I think it would make more
> >> sense to implement this if there were more lifecycle stages.
> >>
> >> 5. There will be a new field in JobSpec called
> >> "flinkStateSnapshotReference" to reference a FlinkStateSnapshot to
> restore
> >> from.
> >>
> >> > How do you see potential effects on API server performance wrt. number
> >> of
> >> objects vs mutations? Is the proposal more or less neutral in that
> regard?
> >>
> >> While I am not an expert in Kubernetes internals, my understanding is
> >> that for the api-server, editing an existing resource or creating a new
> one
> >> is not different performance-wise, because the whole resource will
> always
> >> be written to etcd anyways.
> >> Retrieving the savepoints from etcd will be different though for some
> >> use-cases, e.g. retrieving all snapshots for a specific FlinkDeployment
> >> would require the api-server to first retrieve every snapshot in the
> >> namespace from etcd, then filter them for that specific
> FlinkDeployment. I
> >> think this is a worst-case scenario, and it will be up to the user to
> >> optimize their queries via e.g. watch queries [1] or resourceVersions
> [2].
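As a concrete (and purely hypothetical) illustration of the kind of query optimization meant here: if the operator were to label each snapshot with its target job — the label name below is invented for this example — the filtering could happen server-side via a label selector plus a watch, instead of listing every snapshot in the namespace:

```shell
# Hypothetical label name; flinkstatesnapshots is the CRD proposed by this FLIP.
kubectl get flinkstatesnapshots \
  -l 'snapshot.flink.apache.org/job-name=my-deployment' \
  --watch
```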
> >>
> >> > Does that mean one would have to create a FlinkStateSnapshot CR when
> >> starting a new deployment from savepoint? If so, that's rather
> >> complicated.
> >> I would prefer something more simple/concise and would rather
> >> keep initialSavepointPath
> >>
> >> Starting a job from a savepoint path will indeed be deprecated with this
> >> FLIP. I agree that it will be more complicated to restore from a
> savepoint
> >> in those cases, but if the user decides to move away from the deprecated
> >> savepoint mechanisms, every savepoint will result in a new
> >> FlinkStateSnapshot CR. So the only situation I expect this to be an
> >> inconvenience is when the user onboards a new Flink job to the operator.
> >> But I may not be thinking this through, so please let me know if you
> >> disagree.
> >>
> >> Thank you very much for your questions and suggestions!
> >>
> >> [1]
> >>
> https://kubernetes.io/docs/reference/using-api/api-concepts/#efficient-detection-of-changes
> >> [2]
> >>
> https://kubernetes.io/docs/reference/using-api/api-concepts/#resource-versions
> >>
> >> Regards,
> >> Mate
> >>
> >> Thomas Weise  wrote (on Fri, Apr 19, 2024 at 11:31):
> >>
> >>> Thanks for the proposal.
> >>>
> >>> How do you see potential effects on API server performance wrt. number
> of
> >>> objects vs mutations? Is the proposal more or less neutral in that
> >>> regard?
> >>>
> >>> Thanks for the thorough feedback Robert.
> >>>
> >>> Couple more questions below.
> >>>
> >>> -->
> >>>
> >>> On Fri, Apr 19, 2024 at 5:07 AM Robert Metzger 
> >>> wrote:
> >>>
> >>> > Hi Mate,
> >>> > thanks for proposing this, I'm really excited about your FLIP. I hope
> >>> my
> >>> > questions make sense to you:
> >>> >
> >>> > 1. I would like to discuss the "FlinkStateSnapshot" name and the fact
> >>> that
> >>> > users have to use either the savepoint or checkpoint spec inside the
> >>> > FlinkStateSnapshot.
> >>> > Wouldn't it be more intuitive to introduce two CRs:
> >>> > FlinkSavepoint and FlinkCheckpoint
> >>> > Ideally they can internally share a lot of code paths, but from a
> users
> >>> > perspective, the abstraction is much clearer.
> >>> >
> >>>
> >>> There are probably pros and cons either way. For example it is
> desirable
> >>> to
> >>> have a single list of state snapshots when looking for the initial
> >>> savepoint for a new deployment etc.
> >>>
> >>>
> >>> >
> >>> > 2. I also would like to discuss SavepointSpec.completed, as this name
> >>> i

Re: [DISCUSS] FLIP-446: Kubernetes Operator State Snapshot CRD

2024-04-19 Thread Robert Metzger
Hi Mate,
thanks for proposing this, I'm really excited about your FLIP. I hope my
questions make sense to you:

1. I would like to discuss the "FlinkStateSnapshot" name and the fact that
users have to use either the savepoint or checkpoint spec inside the
FlinkStateSnapshot.
Wouldn't it be more intuitive to introduce two CRs:
FlinkSavepoint and FlinkCheckpoint
Ideally they can internally share a lot of code paths, but from a users
perspective, the abstraction is much clearer.

2. I also would like to discuss SavepointSpec.completed, as this name is
not intuitive to me. How about "ignoreExisting"?

3. The FLIP proposal seems to leave error handling to the user, e.g. when
you create a FlinkStateSnapshot, it will just move to status FAILED.
Typically in K8s, with the control loop approach, resource creation is
retried until it succeeds. I think it would be really nice if the
FlinkStateSnapshot or FlinkSavepoint resource would retry based on a
property in the resource. A "FlinkStateSnapshot.retries" number would
indicate how often the user wants the operator to retry creating a
savepoint; "retries = -1" means retry forever. In addition, we could
consider a timeout as well; however, I haven't seen such a concept in K8s
CRs yet.
The benefit of this is that other tools relying on the K8s operator
wouldn't have to implement this retry loop (which is quite natural for
K8s); they would just have to wait for the CR they've created to
transition into COMPLETED.

3. FlinkStateSnapshotStatus.error will only show the last error. What about
using Events, so that we can show multiple errors and use the
FlinkStateSnapshotState to report errors?

4. I wonder if it makes sense to use something like Pod Conditions (e.g.
Savepoint Conditions):
https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-conditions
to track the completion status. We could have the following conditions:
- Triggered
- Completed
- Failed
The only benefit of this proposal that I see is that it would tell the user
how long it took to create the savepoint.

5. You mention that "JobSpec.initialSavepointPath" will be deprecated. I
assume we will introduce a new field for referencing a FlinkStateSnapshot
CR? I think it would be good to cover this in the FLIP.


One minor comment:

"/** Dispose the savepoints upon CRD deletion. */"

I think this should be "upon CR deletion", not "CRD deletion".

Thanks again for this great FLIP!

Best,
Robert
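To make the discussion concrete, here is a hypothetical sketch of a savepoint-type FlinkStateSnapshot resource, assembled only from pieces mentioned in this thread (the savepoint-vs-checkpoint spec, a backoff/retry limit, a reference to the job, and disposal of the physical savepoint on CR deletion). All field names are assumptions; the actual API is whatever the FLIP finally specifies.

```yaml
# Illustrative only -- field names are guesses based on this discussion.
apiVersion: flink.apache.org/v1beta1
kind: FlinkStateSnapshot
metadata:
  name: example-savepoint
spec:
  backoffLimit: 3              # how often the operator retries before FAILED
  jobReference:
    kind: FlinkDeployment      # the job to snapshot
    name: example-deployment
  savepoint:                   # mutually exclusive with a checkpoint spec
    disposeOnDelete: true      # dispose the physical savepoint upon CR deletion
```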


On Fri, Apr 19, 2024 at 9:01 AM Gyula Fóra  wrote:

> Cc'ing some folks who gave positive feedback on this idea in the past.
>
> I would love to hear your thoughts on the proposed design
>
> Gyula
>
> On Tue, Apr 16, 2024 at 6:31 PM Őrhidi Mátyás 
> wrote:
>
>> +1 Looking forward to it
>>
>> On Tue, Apr 16, 2024 at 8:56 AM Mate Czagany  wrote:
>>
>> > Thank you Gyula!
>> >
>> > I think that is a great idea. I have updated the Google doc to only
>> have 1
>> > new configuration option of boolean type, which can be used to signal
>> the
>> > Operator to use the old mode.
>> >
>> > Also described in the configuration description, the Operator will
>> fallback
>> > to the old mode if the FlinkStateSnapshot CRD cannot be found on the
>> > Kubernetes cluster.
>> >
>> > Regards,
>> > Mate
>> >
>> > Gyula Fóra  wrote (on Tue, Apr 16, 2024 at 17:01):
>> >
>> > > Thanks Mate, this is great stuff.
>> > >
>> > > Mate, I think the new configs should probably default to the new mode
>> and
>> > > they should only be useful for users to fall back to the old
>> behaviour.
>> > > We could by default use the new Snapshot CRD if the CRD is installed,
>> > > otherwise use the old mode by default and log a warning on startup.
>> > >
>> > > So I am suggesting a "dynamic" default behaviour based on whether the
>> new
>> > > CRD was installed or not because we don't want to break operator
>> startup.
>> > >
>> > > Gyula
>> > >
>> > > On Tue, Apr 16, 2024 at 4:48 PM Mate Czagany 
>> wrote:
>> > >
>> > > > Hi Ferenc,
>> > > >
>> > > > Thank you for your comments, I have updated the Google docs with a
>> new
>> > > > section for the new configs.
>> > > > All of the newly added config keys will have defaults set, and by
>> > default
>> > > > all the savepoint/checkpoint operations will use the old system:
>> write
>> > > > their results to the FlinkDeployment/FlinkSessionJob status field.
>> > > >
>> > > > I have also added a default for the checkpoint type to be FULL
>> (which
>> > is
>> > > > also the default currently). That was an oversight on my part to
>> miss
>> > > that.
>> > > >
>> > > > Regards,
>> > > > Mate
>> > > >
>> > > > Ferenc Csaky  wrote (on Tue, Apr 16, 2024 at 16:10):
>> > > >
>> > > > > Thank you Mate for initiating this discussion. +1 for this idea.
>> > > > > Some Qs:
>> > > > >
>> > > > > Can you specify the newly introduced configurations in more
>> > > > > details? Currently, it is not fully clear to me what are the
>> > > > > possible values of 

Re: [Vote] FLIP-438: Amazon SQS Sink Connector

2024-04-16 Thread Robert Metzger
+1 binding

On Tue, Apr 16, 2024 at 2:05 PM Jeyhun Karimov  wrote:

> Thanks Priya for driving the FLIP.
>
> +1 (non-binding)
>
> Regards,
> Jeyhun
>
> On Tue, Apr 16, 2024 at 12:37 PM Hong Liang  wrote:
>
> > +1 (binding)
> >
> > Thanks Priya for driving this! This has been a requested feature for a
> > while now, and will benefit the community :)
> >
> > Hong
> >
> > On Tue, Apr 16, 2024 at 3:23 AM Muhammet Orazov
> >  wrote:
> >
> > > +1 (non-binding)
> > >
> > > Thanks Priya for the FLIP and driving it!
> > >
> > > Best,
> > > Muhammet
> > >
> > > On 2024-04-12 21:56, Dhingra, Priya wrote:
> > > > Hi devs,
> > > >
> > > >
> > > >
> > > > Thank you to everyone for the feedback on FLIP-438: Amazon SQS Sink
> > > > Connector<
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-438%3A+Amazon+SQS+Sink+Connector
> > > >
> > > >
> > > >
> > > >
> > > > I would like to start a vote for it. The vote will be open for at
> least
> > > > 72
> > > >
> > > > hours unless there is an objection or not enough votes.
> > > >
> > > >
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-438%3A+Amazon+SQS+Sink+Connector
> > > >
> > > > Regards
> > > > Priya
> > >
> >
>


Re: [VOTE] FLIP-399: Flink Connector Doris

2024-04-09 Thread Robert Metzger
+1 (binding)

On Tue, Apr 9, 2024 at 10:33 AM Ahmed Hamdy  wrote:

> Hi Wudi,
>
> +1 (non-binding).
>
> Best Regards
> Ahmed Hamdy
>
>
> On Tue, 9 Apr 2024 at 09:21, Yuepeng Pan  wrote:
>
> > Hi, Di.
> >
> > Thank you for driving it !
> >
> > +1 (non-binding).
> >
> > Best,
> > Yuepeng Pan
> >
> > On 2024/04/09 02:47:55 wudi wrote:
> > > Hi devs,
> > >
> > > I would like to start a vote about FLIP-399 [1]. The FLIP is about
> > contributing the Flink Doris Connector[2] to the Flink community.
> > Discussion thread [3].
> > >
> > > The vote will be open for at least 72 hours unless there is an
> objection
> > or
> > > insufficient votes.
> > >
> > >
> > > Thanks,
> > > Di.Wu
> > >
> > >
> > > [1]
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-399%3A+Flink+Connector+Doris
> > > [2] https://github.com/apache/doris-flink-connector
> > > [3] https://lists.apache.org/thread/p3z4wsw3ftdyfs9p2wd7bbr2gfyl3xnh
> > >
> > >
> >
>


Re: [External] Inquiry Regarding Azure Pipelines

2024-04-09 Thread Robert Metzger
I'm not 100% sure, but I think the manual trigger via a command will use
the latest commit (e.g. your second commit) to trigger a build.
Historically, manual triggering has not been very reliable. If the manual
triggering isn't working, you can also just push a commit with a tiny
change to trigger a new build.


On Sun, Apr 7, 2024 at 5:42 AM Yisha Zhou 
wrote:

> Hi Robert,
>
> Thank you for your prompt response to my previous email. I appreciate the
> information provided. However, I still have a few remaining questions that
> I hope you can assist me with.
>
> I have noticed that other PRs are now able to trigger AZP builds
> automatically. In my case, I have two commits associated with my PR. The
> first commit has already triggered an Azure build successfully, while the
> second commit was made during the service downtime.  My question are :
>
> 1. If I use the command "@flinkbot run azure" now, will it trigger the
> build corresponding to my second commit, or will it rerun the already
> successful build from the first commit?
>
> 2. If the answer for question 1 is the former, how can I trigger AZP for
> my second commit?
>
> I would appreciate any clarification you can provide on this matter. Thank
> you for your attention to this issue, and I look forward to your response.
>
> Best regards,
> Yisha
>
>
>
> > On Apr 4, 2024 at 00:52, Robert Metzger  wrote:
> >
> > Hi Yisha,
> >
> > flinkbot is currently not active, so new PRs are not triggering any AZP
> > builds. We hope to restore the service soon.
> >
> > AZP is still the source of truth for CI builds.
> >
> >
> > On Wed, Apr 3, 2024 at 11:34 AM Yisha Zhou  <mailto:zhouyi...@bytedance.com.invalid>>
> > wrote:
> >
> >> Hi devs,
> >>
> >> I hope this email finds you well. I am writing to seek clarification
> >> regarding the status of Azure Pipelines within the Apache community and
> >> seek assistance with a specific issue I encountered.
> >>
> >> Today, I made some new commits to a pull request in one of the Apache
> >> repositories. However, I noticed that even after approximately six
> hours,
> >> there were no triggers initiated for the Azure Pipeline. I have a
> couple of
> >> questions regarding this matter:
> >>
> >> 1. Is the Apache community still utilizing Azure Pipelines for CI/CD
> >> purposes? I came across an issue discussing the migration from Azure to
> >> GitHub Actions, but I am uncertain about the timeline for discontinuing
> the
> >> use of Azure Pipelines.
> >>
> >> 2. If Azure Pipelines are still in use, where can I find information
> about
> >> the position of my commits in the CI queue, awaiting execution?
> >>
> >> I would greatly appreciate any insights or guidance you can provide
> >> regarding these questions. Thank you for your time and attention.
> >>
> >> My PR link is https://github.com/apache/flink/pull/24567 <
> >> https://github.com/apache/flink/pull/24567 <
> https://github.com/apache/flink/pull/24567>>.
> >>
> >> Best regards,
> >> Yisha
>
>


Re: Inquiry Regarding Azure Pipelines

2024-04-03 Thread Robert Metzger
Hi Yisha,

flinkbot is currently not active, so new PRs are not triggering any AZP
builds. We hope to restore the service soon.

AZP is still the source of truth for CI builds.


On Wed, Apr 3, 2024 at 11:34 AM Yisha Zhou 
wrote:

> Hi devs,
>
> I hope this email finds you well. I am writing to seek clarification
> regarding the status of Azure Pipelines within the Apache community and
> seek assistance with a specific issue I encountered.
>
> Today, I made some new commits to a pull request in one of the Apache
> repositories. However, I noticed that even after approximately six hours,
> there were no triggers initiated for the Azure Pipeline. I have a couple of
> questions regarding this matter:
>
> 1. Is the Apache community still utilizing Azure Pipelines for CI/CD
> purposes? I came across an issue discussing the migration from Azure to
> GitHub Actions, but I am uncertain about the timeline for discontinuing the
> use of Azure Pipelines.
>
> 2. If Azure Pipelines are still in use, where can I find information about
> the position of my commits in the CI queue, awaiting execution?
>
> I would greatly appreciate any insights or guidance you can provide
> regarding these questions. Thank you for your time and attention.
>
> My PR link is https://github.com/apache/flink/pull/24567 <
> https://github.com/apache/flink/pull/24567>.
>
> Best regards,
> Yisha


Re: [DISCUSS] Planning Flink 1.20

2024-03-25 Thread Robert Metzger
Hi, thanks for starting the discussion.

+1 for the proposed timeline and the three proposed release managers.

I'm happy to join the release managers group as well, as a backup for Ufuk
(unless there are objections about the number of release managers)

On Mon, Mar 25, 2024 at 11:04 AM Ufuk Celebi  wrote:

> Hey all,
>
> I'd like to join the release managers for 1.20 as well. I'm looking
> forward to getting more actively involved again.
>
> Cheers,
>
> Ufuk
>
> On Sun, Mar 24, 2024, at 11:27 AM, Ahmed Hamdy wrote:
> > +1 for the proposed timeline and release managers.
> > Best Regards
> > Ahmed Hamdy
> >
> >
> > On Fri, 22 Mar 2024 at 07:41, Xintong Song 
> wrote:
> >
> > > +1 for the proposed timeline and Weijie & Rui as the release managers.
> > >
> > > I think it would be welcomed if another 1-2 volunteers join as the
> release
> > > managers, but that's not a must. We used to have only 1-2 release
> managers
> > > for each release.
> > >
> > > Best,
> > >
> > > Xintong
> > >
> > >
> > >
> > > On Fri, Mar 22, 2024 at 2:55 PM Jark Wu  wrote:
> > >
> > > > Thanks for kicking this off.
> > > >
> > > > +1 for the volunteered release managers (Weijie Guo, Rui Fan) and the
> > > > targeting date (feature freeze: June 15).
> > > >
> > > > Best,
> > > > Jark
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > On Fri, 22 Mar 2024 at 14:00, Rui Fan <1996fan...@gmail.com> wrote:
> > > >
> > > > > Thanks Leonard for this feedback and help!
> > > > >
> > > > > Best,
> > > > > Rui
> > > > >
> > > > > On Fri, Mar 22, 2024 at 12:36 PM weijie guo <
> guoweijieres...@gmail.com
> > > >
> > > > > wrote:
> > > > >
> > > > > > Thanks Leonard!
> > > > > >
> > > > > > > I'd like to help you if you need some help like permissions
> from
> > > PMC
> > > > > > side, please feel free to ping me.
> > > > > >
> > > > > > Nice to know. It'll help a lot!
> > > > > >
> > > > > > Best regards,
> > > > > >
> > > > > > Weijie
> > > > > >
> > > > > >
> > > > > > Leonard Xu  wrote on Fri, Mar 22, 2024 at 12:09:
> > > > > >
> > > > > >> +1 for the proposed release managers (Weijie Guo, Rui Fan),
> both the
> > > > two
> > > > > >> candidates are pretty active committers thus I believe they
> know the
> > > > > >> community development process well. The recent releases have
> four
> > > > > release
> > > > > >> managers, and I am also looking forward to having other
> volunteers
> > > > > >>  join the management of Flink 1.20.
> > > > > >>
> > > > > >> +1 for targeting date (feature freeze: June 15, 2024),
> referring to
> > > > the
> > > > > >> release cycle of recent versions, release cycle of 4 months
> makes
> > > > sense
> > > > > to
> > > > > >> me.
> > > > > >>
> > > > > >>
> > > > > >> I'd like to help you if you need some help like permissions
> from PMC
> > > > > >> side, please feel free to ping me.
> > > > > >>
> > > > > >> Best,
> > > > > >> Leonard
> > > > > >>
> > > > > >>
> > > > > >> > On Mar 19, 2024 at 5:35 PM, Rui Fan <1996fan...@gmail.com> wrote:
> > > > > >> >
> > > > > >> > Hi Weijie,
> > > > > >> >
> > > > > >> > Thanks for kicking off 1.20! I'd like to join you and
> participate
> > > in
> > > > > the
> > > > > >> > 1.20 release.
> > > > > >> >
> > > > > >> > Best,
> > > > > >> > Rui
> > > > > >> >
> > > > > >> > On Tue, Mar 19, 2024 at 5:30 PM weijie guo <
> > > > guoweijieres...@gmail.com
> > > > > >
> > > > > >> > wrote:
> > > > > >> >
> > > > > >> >> Hi everyone,
> > > > > >> >>
> > > > > >> >> With the release announcement of Flink 1.19, it's a good
> time to
> > > > kick
> > > > > >> off
> > > > > >> >> discussion of the next release 1.20.
> > > > > >> >>
> > > > > >> >>
> > > > > >> >> - Release managers
> > > > > >> >>
> > > > > >> >>
> > > > > >> >> I'd like to volunteer as one of the release managers this
> time.
> > > It
> > > > > has
> > > > > >> been
> > > > > >> >> good practice to have a team of release managers from
> different
> > > > > >> >> backgrounds, so please raise your hand if you'd like to
> volunteer
> > > > and
> > > > > >> get
> > > > > >> >> involved.
> > > > > >> >>
> > > > > >> >>
> > > > > >> >>
> > > > > >> >> - Timeline
> > > > > >> >>
> > > > > >> >>
> > > > > >> >> Flink 1.19 has been released. With a target release cycle of
> 4
> > > > > months,
> > > > > >> >> we propose a feature freeze date of *June 15, 2024*.
> > > > > >> >>
> > > > > >> >>
> > > > > >> >>
> > > > > >> >> - Collecting features
> > > > > >> >>
> > > > > >> >>
> > > > > >> >> As usual, we've created a wiki page[1] for collecting new
> > > features
> > > > in
> > > > > >> 1.20.
> > > > > >> >>
> > > > > >> >>
> > > > > >> >> In addition, we already have a number of FLIPs that have been
> > > voted
> > > > > or
> > > > > >> are
> > > > > >> >> in the process, including pre-works for version 2.0.
> > > > > >> >>
> > > > > >> >>
> > > > > >> >> In the meantime, the release management team will be
> finalized in
> > > > the
> > > > > >> next
> > > > > >> >> few days, and we'll continue to create Jira Boards and Sync
> > > > meetings
> > > > > >> 

Re: [DISCUSS] Manual savepoint triggering in flink-kubernetes-operator

2024-03-12 Thread Robert Metzger
Have you guys considered making savepoints a first class citizen in the
Kubernetes operator?
E.g. to trigger a savepoint, you create a "FlinkSavepoint" CR, the K8s
operator picks up that resource and tries to create a savepoint
indefinitely until the savepoint has been successfully created. We report
the savepoint status and location in the "status" field.

We could even add an (optional) finalizer to delete the physical savepoint
from the savepoint storage once the "FlinkSavepoint" CR has been deleted.
optional: the savepoint spec could contain a field "retain
physical savepoint" or something, that controls the delete behavior.
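
To make the idea concrete, a first-class savepoint resource could look roughly like this (a sketch only — the `FlinkSavepoint` kind, field names, and values are hypothetical, not an agreed API):

```yaml
apiVersion: flink.apache.org/v1beta1
kind: FlinkSavepoint            # hypothetical CRD, as proposed above
metadata:
  name: before-upgrade
spec:
  deploymentName: my-flink-job  # the target FlinkDeployment
  retainOnDeletion: true        # keep the physical savepoint after the CR is deleted
status:                         # filled in by the operator
  state: COMPLETED
  location: s3://my-bucket/savepoints/savepoint-abc123
```

Deleting the CR would then trigger the optional finalizer, which removes the physical savepoint unless something like `retainOnDeletion` is set.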


On Thu, Mar 3, 2022 at 4:02 AM Yang Wang  wrote:

> I agree that we could start with the annotation approach and collect the
> feedback at the same time.
>
> Best,
> Yang
>
> Őrhidi Mátyás wrote on Wed, Mar 2, 2022 at 20:06:
>
> > Thank you for your feedback!
> >
> > The annotation on the
> >
> > @ControllerConfiguration(generationAwareEventProcessing = false)
> > FlinkDeploymentController
> >
> > already enables the event triggering based on metadata changes. It was
> set
> > earlier to support some failure scenarios. (It can be used for example to
> > manually reenable the reconcile loop when it got stuck in an error phase)
> >
> > I will go ahead and propose a PR using annotations then.
> >
> > Cheers,
> > Matyas
> >
> > On Wed, Mar 2, 2022 at 12:47 PM Yang Wang  wrote:
> >
> > > I also like the annotation approach since it is more natural.
> > > But I am not sure about whether the meta data change will trigger an
> > event
> > > in java-operator-sdk.
> > >
> > >
> > > Best,
> > > Yang
> > >
> > > Gyula Fóra wrote on Wed, Mar 2, 2022 at 16:29:
> > >
> > > > Thanks Matyas,
> > > >
> > > > From a user perspective I think the annotation is pretty nice and
> user
> > > > friendly so I personally prefer that approach.
> > > >
> > > > You said:
> > > >  "It seems, the java-operator-sdk handles the changes of the
> .metadata
> > > and
> > > > .spec fields of custom resources differently."
> > > >
> > > > What implications does this have on the above mentioned 2 approaches?
> > > Does
> > > > it make one more difficult than the other?
> > > >
> > > > Cheers
> > > > Gyula
> > > >
> > > >
> > > >
> > > > On Tue, Mar 1, 2022 at 1:52 PM Őrhidi Mátyás <
> matyas.orh...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi All!
> > > > >
> > > > > I'd like to start a quick discussion about the way we allow users
> to
> > > > > trigger savepoints manually in the operator [FLINK-26181]
> > > > > . There are
> > > existing
> > > > > solutions already for this functionality in other operators, for
> > > example:
> > > > > - counter based
> > > > > <
> > > > >
> > > >
> > >
> >
> https://github.com/spotify/flink-on-k8s-operator/blob/master/docs/savepoints_guide.md#2-taking-savepoints-by-updating-the-flinkcluster-custom-resource
> > > > > >
> > > > > - annotation based
> > > > > <
> > > > >
> > > >
> > >
> >
> https://github.com/spotify/flink-on-k8s-operator/blob/master/docs/savepoints_guide.md#3-taking-savepoints-by-attaching-annotation-to-the-flinkcluster-custom-resource
> > > > > >
> > > > >
> > > > > We could implement any of these or both or come up with our own
> > > approach.
> > > > > It seems, the java-operator-sdk handles the changes of the
> .metadata
> > > and
> > > > > .spec fields of custom resources differently. For further info see
> > the
> > > > > chapter Generation Awareness and Event Filtering in the docs
> > > > > .
> > > > >
> > > > > Let me know what you think.
> > > > >
> > > > > Cheers,
> > > > > Matyas
> > > > >
> > > >
> > >
> >
>


Re: [DISCUSS] FLINK-34440 Support Debezium Protobuf Confluent Format

2024-02-29 Thread Robert Metzger
Hey Kevin,

Thanks a lot. Then let's contribute the Confluent implementation to
apache/flink. We can't start working on this immediately because of a team
event next week, but within the next two weeks, we will start working on
this.
It probably makes sense for us to open a pull request of what we have
already, so that you can start reviewing and maybe also contributing to the
PR.
I hope this timeline works for you!

Let's also decide if we need a FLIP once the code is public.
We will look into the field ids.


On Tue, Feb 27, 2024 at 8:56 PM Kevin Lam 
wrote:

> Hey Robert,
>
> Thanks for your response. I have a partial implementation, just for the
> decoding portion.
>
> The code I have is pretty rough and doesn't do any of the refactors I
> mentioned, but the decoder logic does pull the schema from the schema
> registry and use that to deserialize the DynamicMessage before converting
> it to RowData using a DynamicMessageToRowDataConverter class. For the other
> aspects, I would need to start from scratch for the encoder.
>
> Would be very happy to see you drive the contribution back to open source
> from Confluent, or collaborate on this.
>
> Another topic I had is Protobuf's field ids. Ideally in Flink it would be
> nice if we are idiomatic about not renumbering them in incompatible ways,
> similar to what's discussed on the Schema Registry issue here:
> https://github.com/confluentinc/schema-registry/issues/2551
>
>
> On Tue, Feb 27, 2024 at 5:51 AM Robert Metzger 
> wrote:
>
> > Hi all,
> >
> > +1 to support the format in Flink.
> >
> > @Kevin: Do you already have an implementation for this inhouse that you
> are
> > looking to upstream, or would you start from scratch?
> > I'm asking because my current employer, Confluent, has a Protobuf Schema
> > registry implementation for Flink, and I could help drive contributing
> this
> > back to open source.
> > If you already have an implementation, let's decide which one to use :)
> >
> > Best,
> > Robert
> >
> > On Thu, Feb 22, 2024 at 2:05 PM David Radley 
> > wrote:
> >
> > > Hi Kevin,
> > > Some thoughts on this.
> > > I suggested an Apicurio registry format in the dev list, and was
> advised
> > > to raise a FLIP for this, I suggest the same would apply here (or the
> > > alternative to FLIPs if you cannot raise one). I am prototyping an Avro
> > > Apicurio format, prior to raising the FLIP, and notice that the
> > readSchema
> > > in the SchemaCoder only takes a byte array, but I need to pass down the
> > > Kafka headers (where the Apicurio globalId identifying the schema
> lives).
> > >
> > > I assume:
> > >
> > >   *   for the confluent Protobuf format you would extend the Protobuf
> > > format to drive some Schema Registry logic for Protobuf (similar to the
> > way
> > > Avro does it) where the magic byte _ schema id can be obtained and the
> > > schema looked up using the Confluent Schema registry.
> > >   *   It would be good if any protobuf format enhancements for Schema
> > > registries pass down the Kafka headers (I am thinking as a Map<String, Object> for Avro) as well as the message payload so Apicurio registry
> > could
> > > work with this.
> > >   *   It would make sense to have the Confluent schema lookup in common
> > > code, which is part of the SchemaCoder readSchema  logic.
> > >   *   I assume the ProtobufSchemaCoder readSchema would return a
> Protobuf
> > > Schema object.
> > >
> > >
> > >
> > > I also wondered whether these Kafka only formats should be moved to the
> > > Kafka connector repo, or whether they might in the future be used
> outside
> > > Kafka – e.g. Avro/Protobuf files in a database.
> > >Kind regards, David.
> > >
> > >
> > > From: Kevin Lam 
> > > Date: Wednesday, 21 February 2024 at 18:51
> > > To: dev@flink.apache.org 
> > > Subject: [EXTERNAL] [DISCUSS] FLINK-34440 Support Debezium Protobuf
> > > Confluent Format
> > > I would love to get some feedback from the community on this JIRA
> issue:
> > > https://issues.apache.org/jira/projects/FLINK/issues/FLINK-34440
> > >
> > > I am looking into creating a PR and would appreciate some review on the
> > > approach.
> > >
> > > In terms of design I think we can mirror the `debezium-avro-confluent`
> > and
> > > `avro-confluent` formats already available in Flink:
> > >
> > >1. `protobuf-confluent` format which uses DynamicM

Re: [DISCUSS] FLINK-34440 Support Debezium Protobuf Confluent Format

2024-02-27 Thread Robert Metzger
Hi all,

+1 to support the format in Flink.

@Kevin: Do you already have an implementation for this inhouse that you are
looking to upstream, or would you start from scratch?
I'm asking because my current employer, Confluent, has a Protobuf Schema
registry implementation for Flink, and I could help drive contributing this
back to open source.
If you already have an implementation, let's decide which one to use :)

Best,
Robert

On Thu, Feb 22, 2024 at 2:05 PM David Radley 
wrote:

> Hi Kevin,
> Some thoughts on this.
> I suggested an Apicurio registry format in the dev list, and was advised
> to raise a FLIP for this, I suggest the same would apply here (or the
> alternative to FLIPs if you cannot raise one). I am prototyping an Avro
> Apicurio format, prior to raising the FLIP, and notice that the readSchema
> in the SchemaCoder only takes a byte array, but I need to pass down the
> Kafka headers (where the Apicurio globalId identifying the schema lives).
>
> I assume:
>
>   *   for the confluent Protobuf format you would extend the Protobuf
> format to drive some Schema Registry logic for Protobuf (similar to the way
> Avro does it) where the magic byte _ schema id can be obtained and the
> schema looked up using the Confluent Schema registry.
>   *   It would be good if any protobuf format enhancements for Schema
> registries pass down the Kafka headers (I am thinking as a Map<String, Object> for Avro) as well as the message payload so Apicurio registry could
> work with this.
>   *   It would make sense to have the Confluent schema lookup in common
> code, which is part of the SchemaCoder readSchema  logic.
>   *   I assume the ProtobufSchemaCoder readSchema would return a Protobuf
> Schema object.
>
>
>
> I also wondered whether these Kafka only formats should be moved to the
> Kafka connector repo, or whether they might in the future be used outside
> Kafka – e.g. Avro/Protobuf files in a database.
>Kind regards, David.
>
>
> From: Kevin Lam 
> Date: Wednesday, 21 February 2024 at 18:51
> To: dev@flink.apache.org 
> Subject: [EXTERNAL] [DISCUSS] FLINK-34440 Support Debezium Protobuf
> Confluent Format
> I would love to get some feedback from the community on this JIRA issue:
> https://issues.apache.org/jira/projects/FLINK/issues/FLINK-34440
>
> I am looking into creating a PR and would appreciate some review on the
> approach.
>
> In terms of design I think we can mirror the `debezium-avro-confluent` and
> `avro-confluent` formats already available in Flink:
>
>1. `protobuf-confluent` format which uses DynamicMessage
><
> https://protobuf.dev/reference/java/api-docs/com/google/protobuf/DynamicMessage
> >
>for encoding and decoding.
>   - For encoding the Flink RowType will be used to dynamically create a
>   Protobuf Schema and register it with the Confluent Schema
> Registry. It will
>   use the same schema to construct a DynamicMessage and serialize it.
>   - For decoding, the schema will be fetched from the registry and use
>   DynamicMessage to deserialize and convert the Protobuf object to a
> Flink
>   RowData.
>   - Note: here there is no external .proto file
>2. `debezium-avro-confluent` format which unpacks the Debezium Envelope
>and collects the appropriate UPDATE_BEFORE, UPDATE_AFTER, INSERT, DELETE
>events.
>   - We may be able to refactor and reuse code from the existing
>   DebeziumAvroDeserializationSchema + DebeziumAvroSerializationSchema
> since
>   the deser logic is largely delegated to and these Schemas are
> concerned
>   with the handling the Debezium envelope.
>3. Move the Confluent Schema Registry Client code to a separate maven
>module, flink-formats/flink-confluent-common, and extend it to support
>ProtobufSchemaProvider
><
> https://github.com/confluentinc/schema-registry/blob/ca226f2e1e2091c67b372338221b57fdd435d9f2/protobuf-provider/src/main/java/io/confluent/kafka/schemaregistry/protobuf/ProtobufSchemaProvider.java#L26
> >
>.
>
>
> Does anyone have any feedback or objections to this approach?
>
> Unless otherwise stated above:
>
> IBM United Kingdom Limited
> Registered in England and Wales with number 741598
> Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU
>
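
As a usage sketch, the `protobuf-confluent` format proposed in this thread would presumably be configured like the existing `avro-confluent` format (the option names below are assumptions mirroring the Avro format, not a finalized API):

```sql
CREATE TABLE orders (
  order_id STRING,
  amount   DECIMAL(10, 2)
) WITH (
  'connector' = 'kafka',
  'topic' = 'orders',
  'format' = 'protobuf-confluent',                   -- proposed format
  'protobuf-confluent.url' = 'http://registry:8081'  -- assumed registry option
);
```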


Re: [VOTE] Accept Flink CDC into Apache Flink

2024-01-09 Thread Robert Metzger
+1 (binding)


On Tue, Jan 9, 2024 at 9:54 AM Guowei Ma  wrote:

> +1 (binding)
> Best,
> Guowei
>
>
> On Tue, Jan 9, 2024 at 4:49 PM Rui Fan <1996fan...@gmail.com> wrote:
>
> > +1 (non-binding)
> >
> > Best,
> > Rui
> >
> > On Tue, Jan 9, 2024 at 4:41 PM Hang Ruan  wrote:
> >
> > > +1 (non-binding)
> > >
> > > Best,
> > > Hang
> > >
> > > > gongzhongqiang wrote on Tue, Jan 9, 2024 at 16:25:
> > >
> > > > +1 non-binding
> > > >
> > > > Best,
> > > > Zhongqiang
> > > >
> > > > > Leonard Xu wrote on Tue, Jan 9, 2024 at 15:05:
> > > >
> > > > > Hello all,
> > > > >
> > > > > This is the official vote whether to accept the Flink CDC code
> > > > contribution
> > > > >  to Apache Flink.
> > > > >
> > > > > The current Flink CDC code, documentation, and website can be
> > > > > found here:
> > > > > code: https://github.com/ververica/flink-cdc-connectors <
> > > > > https://github.com/ververica/flink-cdc-connectors>
> > > > > docs: https://ververica.github.io/flink-cdc-connectors/ <
> > > > > https://ververica.github.io/flink-cdc-connectors/>
> > > > >
> > > > > This vote should capture whether the Apache Flink community is
> > > interested
> > > > > in accepting, maintaining, and evolving Flink CDC.
> > > > >
> > > > > Regarding my original proposal[1] in the dev mailing list, I firmly
> > > > believe
> > > > > that this initiative aligns perfectly with Flink. For the Flink
> > > > community,
> > > > > it represents an opportunity to bolster Flink's competitive edge in
> > > > > streaming
> > > > > data integration, fostering the robust growth and prosperity of the
> > > > Apache
> > > > > Flink
> > > > > ecosystem. For the Flink CDC project, becoming a sub-project of
> > Apache
> > > > > Flink
> > > > > means becoming an integral part of a neutral open-source community,
> > > > > capable of
> > > > > attracting a more diverse pool of contributors.
> > > > >
> > > > > All Flink CDC maintainers are dedicated to continuously
> contributing
> > to
> > > > > achieve
> > > > > seamless integration with Flink. Additionally, PMC members like
> Jark,
> > > > > Qingsheng,
> > > > > and I are willing to facilitate the expansion of contributors and
> > > > > committers to
> > > > > effectively maintain this new sub-project.
> > > > >
> > > > > This is a "Adoption of a new Codebase" vote as per the Flink bylaws
> > > [2].
> > > > > Only PMC votes are binding. The vote will be open at least 7 days
> > > > > (excluding weekends), meaning until Thursday January 18 12:00 UTC,
> or
> > > > > until we
> > > > > achieve the 2/3rd majority. We will follow the instructions in the
> > > Flink
> > > > > Bylaws
> > > > > in the case of insufficient active binding voters:
> > > > >
> > > > > > 1. Wait until the minimum length of the voting passes.
> > > > > > 2. Publicly reach out via personal email to the remaining binding
> > > > voters
> > > > > in the
> > > > > voting mail thread for at least 2 attempts with at least 7 days
> > between
> > > > > two attempts.
> > > > > > 3. If the binding voter being contacted still failed to respond
> > after
> > > > > all the attempts,
> > > > > the binding voter will be considered as inactive for the purpose of
> > > this
> > > > > particular voting.
> > > > >
> > > > > Welcome voting !
> > > > >
> > > > > Best,
> > > > > Leonard
> > > > > [1]
> https://lists.apache.org/thread/o7klnbsotmmql999bnwmdgo56b6kxx9l
> > > > > [2]
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=120731026
> > > >
> > >
> >
>


Re: [DISCUSS] Removal of unused e2e tests

2023-10-24 Thread Robert Metzger
I left a comment in FLINK-5.

On Mon, Oct 23, 2023 at 5:18 AM Alexander Fedulov <
alexander.fedu...@gmail.com> wrote:

> FLINK-17375 [1] removed [2] run-pre-commit-tests.sh in Flink 1.12. Since
> then the following tests are not executed anymore:
> test_state_migration.sh
> test_state_evolution.sh
> test_streaming_kinesis.sh
> test_streaming_classloader.sh
> test_streaming_distributed_cache_via_blob.sh
>
> Certain classes that were prior used for classloading and state evolution
> testing only via the aforementioned scripts are still in the project. I
> would like to understand if the removal was deliberate and if it is OK to
> do a clean up [3].
>
> [1] https://issues.apache.org/jira/browse/FLINK-17375
> [2]
>
> https://github.com/apache/flink/pull/12268/files#diff-39f0aea40d2dd3f026544bb4c2502b2e9eab4c825df5f2b68c6d4ca8c39d7b5e
> [3] https://issues.apache.org/jira/browse/FLINK-5
>
> Best,
> Alexander Fedulov
>


[jira] [Created] (FLINK-33217) Flink SQL: UNNEST fails with on LEFT JOIN with NOT NULL type in array

2023-10-09 Thread Robert Metzger (Jira)
Robert Metzger created FLINK-33217:
--

 Summary: Flink SQL: UNNEST fails with on LEFT JOIN with NOT NULL 
type in array
 Key: FLINK-33217
 URL: https://issues.apache.org/jira/browse/FLINK-33217
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Planner
Affects Versions: 1.15.3, 1.18.0, 1.19.0
Reporter: Robert Metzger


Steps to reproduce:

Take a column of type 
{code:java}
business_data ROW<`id` STRING, `updateEvent` ARRAY<ROW<`name` STRING NOT NULL>>> {code}
Take this query
{code:java}
select id, ue_name from reproduce_unnest LEFT JOIN 
UNNEST(reproduce_unnest.business_data.updateEvent) AS exploded_ue(ue_name) ON 
true {code}
And get this error
{code:java}
Caused by: java.lang.AssertionError: Type mismatch:
rowtype of rel before registration:
RecordType(RecordType:peek_no_expand(VARCHAR(2147483647) CHARACTER SET "UTF-16LE" id, RecordType:peek_no_expand(VARCHAR(2147483647) CHARACTER SET "UTF-16LE" NOT NULL name) NOT NULL ARRAY updateEvent) business_data, VARCHAR(2147483647) CHARACTER SET "UTF-16LE" ue_name) NOT NULL
rowtype of rel after registration:
RecordType(RecordType:peek_no_expand(VARCHAR(2147483647) CHARACTER SET "UTF-16LE" id, RecordType:peek_no_expand(VARCHAR(2147483647) CHARACTER SET "UTF-16LE" NOT NULL name) NOT NULL ARRAY updateEvent) business_data, VARCHAR(2147483647) CHARACTER SET "UTF-16LE" NOT NULL name) NOT NULL
Difference:
ue_name: VARCHAR(2147483647) CHARACTER SET "UTF-16LE" -> VARCHAR(2147483647) CHARACTER SET "UTF-16LE" NOT NULL
at org.apache.calcite.util.Litmus$1.fail(Litmus.java:32)
at org.apache.calcite.plan.RelOptUtil.equal(RelOptUtil.java:2206)
at org.apache.calcite.rel.AbstractRelNode.onRegister(AbstractRelNode.java:275)
at org.apache.calcite.plan.volcano.VolcanoPlanner.registerImpl(VolcanoPlanner.java:1270)
at org.apache.calcite.plan.volcano.VolcanoPlanner.register(VolcanoPlanner.java:598)
at org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:613)
at org.apache.calcite.plan.volcano.VolcanoPlanner.changeTraits(VolcanoPlanner.java:498)
at org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:315)
at org.apache.flink.table.planner.plan.optimize.program.FlinkVolcanoProgram.optimize(FlinkVolcanoProgram.scala:62)
{code}
I have implemented a small test case, which fails against Flink 1.15, 1.18 and 
the latest master branch.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] FLIP 333 - Redesign Apache Flink website

2023-09-29 Thread Robert Metzger
There's now a PR (https://github.com/apache/flink-web/pull/676) and a
preview available for this FLIP (
https://website-refresh.d193kg429zpv7e.amplifyapp.com/).

On Fri, Jul 28, 2023 at 8:45 AM Mohan, Deepthi 
wrote:

> Matthias, Markos, Martijn, thank you for your feedback.
>
> Markos, I've addressed your feedback re: separating use cases from the
> Flink capabilities. I think it is a positive change that will help new
> users distinguish between the two.
> Also attached screenshots for 'light mode'. I am personally partial to
> dark mode, and several developers say they prefer dark mode vs light
> generally. However, this is not a specific question I've posed to customers
> regarding the Flink website. In addition, after talking to engineering,
> I've been told it's not 2x the effort to introduce both modes on the
> website and a toggle to switch between the two. For implementation, we
> could start with one mode, without the toggle.
>
> Mattias, your comments about accessibility were very useful and helped us
> to further improve the design. A UX designer (Kaushal, also new to the
> community) helped evaluate the color and text size for accessibility. He
> can respond to any specific questions that you may have about
> accessibility. I did not quite get the related comment about "the menu
> structure stays the same and there are no plans to replace text with
> images". However, since it's not there in the current website I propose we
> table this conversation for now.
>
> Martijn, thanks for your comment on accessibility. I hope some of your
> concerns are addressed above in my response to Mattias and the screenshots
> now attached to the FLIP. I have purposely kept documentation out of scope
> due to the comments received in the previous discussion thread on this
> topic [1]. We will also include links to the blog and GitHub repo in the
> drop down under the getting started menu as well as include the links at
> the bottom of the page (as seen in a few other Apache websites).
>
> [1] https://lists.apache.org/thread/c3pt00cf77lrtgt242p26lgp9l2z5yc8
>
> Thanks,
> Deepthi
>
>
>
>
> On 7/23/23, 7:22 PM, "liu ron" wrote:
>
>
> CAUTION: This email originated from outside of the organization. Do not
> click links or open attachments unless you can confirm the sender and know
> the content is safe.
>
>
>
>
>
>
> +1,
>
>
> The dark mode looks very cool.
>
>
> Best,
> Ron
>
>
> Matthias Pohl wrote on Thu, Jul 20, 2023 at 15:45:
>
>
> > I think Martijn and Markos brought up a few good points:
> >
> > - We shouldn't degrade the accessibility but ideally improve it as part
> of
> > the redesign. The current proposal doesn't look like we're doing changes
> in
> > a way that it lowers the accessibility (considering that the menu
> structure
> > stays the same and there are no plans to replace text with images). But
> > nonetheless, it would be good to have this topic explicitly covered in
> the
> > FLIP.
> >
> > - I see Markos' concern about the white vs black background (Flink
> > documentation vs Flink website): Does it make a big difference to change
> to
> > a white background design? The switch to dark mode could happen after a
> > similar (or the same) theme is supported by the documentation. Or are
> there
> > other reasons why the dark background is favorable?
> >
> > Best,
> > Matthias
> >
> > On Wed, Jul 19, 2023 at 12:28 PM Markos Sfikas
> > wrote:
> >
> > > +1 Thanks for proposing this FLIP, Deepthi.
> > >
> > > The designs on FLIP-333 [1] look fresh and modern and I feel they
> achieve
> > > the goal in general.
> > >
> > > A couple of suggestions from my side could be the following:
> > >
> > > [a] Assuming that no changes are implemented to the Flink
> documentation,
> > I
> > > would like to see a visual with a 'white background' instead of the
> 'dark
> > > mode'. This is primarily for two reasons: Firstly, it provides a more
> > > consistent experience for the website visitor going from the home page
> to
> > > the documentation (instead of switching from dark to white mode on the
> > > website) and secondly, from an accessibility and inclusivity
> perspective
> > > that was mentioned earlier, we should give the option to either switch
> > > between dark and white mode or have something that is universally easy
> to
> > > read and consume (not everyone is comfortable reading white text on
> dark
> > > background).
> > >
> > > [b] Regarding structuring the home page, right now the Flink website
> has
> > > use cases blending with what seems to be Flink's 'technical
> > > characteristics' (i.e. the sections that talk about 'Guaranteed
> > > correctness', 'Layered APIs', 'Operational Focus', etc.). As someone
> new
> > to
> > > Flink and considering using the technology, I would like to understand
> > > firstly the use cases and secondly dive into the characteristics that
> > make
> > > Flink stand out. I would suggest 

Re: [VOTE] Apache Flink Stateful Functions Release 3.3.0, release candidate #2

2023-09-14 Thread Robert Metzger
I did a shallow pass over the release for it to get the +3 votes. Please
verify other aspects of the release when voting ;)

+1 (binding)

- maven clean install on the source tgz (not on an M1 macbook because of
protoc, but on x86 linux) (not on Java 17 either ;) )
- staging repo seem fine
- statefun-shaded/statefun-protobuf-shaded seems to properly report
bundled dependencies
- statefun-sdk-java reports bundled dependency
.. I believe the concerns raised by Danny have been all addressed, based on
these two random probes


On Tue, Sep 12, 2023 at 12:25 PM Martijn Visser 
wrote:

> Hi everyone,
>
> Please review [1] and vote on the release candidate #2 for the version
> 3.3.0 of Apache Flink Stateful Functions,
> as follows:
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
>
> **Release Overview**
>
> As an overview, the release consists of the following:
> a) Stateful Functions canonical source distribution, to be deployed to the
> release repository at dist.apache.org
> b) Stateful Functions Python SDK distributions to be deployed to PyPI
> c) Maven artifacts to be deployed to the Maven Central Repository
> d) Dockerfiles for new images to be deployed to Docker Hub
>
> **Staging Areas to Review**
>
> The staging areas containing the above mentioned artifacts are as follows,
> for your review:
> * All artifacts for a) and b) can be found in the corresponding dev
> repository at dist.apache.org [2]
> * All artifacts for c) can be found at the Apache Nexus Repository [3]
> * PR for new Dockerfiles for this release [4]
>
> All artifacts are signed with the key with fingerprint
> A5F3BCE4CBE993573EC5966A65321B8382B219AF [5]
>
> Other links for your review:
> * JIRA release notes [6]
> * source code tag "release-3.3.0-rc1" [7]
> * PR to update the website Downloads page to include Stateful Functions
> links [8]
>
> **Vote Duration**
>
> The voting time will run for at least 72 hours.
> It is adopted by majority approval, with at least 3 PMC affirmative votes.
>
> Thanks,
> Release manager
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/Verifying+a+Flink+Stateful+Functions+Release
> [2] https://dist.apache.org/repos/dist/dev/flink/flink-statefun-3.3.0-rc2/
> [3]
> https://repository.apache.org/content/repositories/orgapacheflink-1653/
> [4] https://github.com/apache/flink-statefun-docker/pull/20
> [5] https://dist.apache.org/repos/dist/release/flink/KEYS
> [6]
>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12351276
> [7] https://github.com/apache/flink-statefun/tree/release-3.3.0-rc2
> [8] https://github.com/apache/flink-web/pull/674
>


[jira] [Created] (FLINK-32439) Kubernetes operator is silently overwriting the "execution.savepoint.path" config

2023-06-26 Thread Robert Metzger (Jira)
Robert Metzger created FLINK-32439:
--

 Summary: Kubernetes operator is silently overwriting the 
"execution.savepoint.path" config
 Key: FLINK-32439
 URL: https://issues.apache.org/jira/browse/FLINK-32439
 Project: Flink
  Issue Type: Bug
  Components: Kubernetes Operator
Reporter: Robert Metzger


I recently stumbled across the fact that the K8s operator is silently deleting 
/ overwriting the execution.savepoint.path config option.

I understand why this happens, but I wonder if the operator should write a log 
message if the user configured the execution.savepoint.path option.

And / or add a list to the docs about "Operator managed" config options?

https://github.com/apache/flink-kubernetes-operator/blob/main/flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/reconciler/deployment/ApplicationReconciler.java#L155-L159



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] FLIP-322 Cooldown period for adaptive scheduler

2023-06-15 Thread Robert Metzger
Thanks for the FLIP.

Some comments:
1. Can you specify the full proposed configuration name? "
scaling-cooldown-period" is probably not the full config name?
2. Why is the concept of scaling events and a scaling queue needed? If I
remember correctly, the adaptive scheduler will just check how many
TaskManagers are available and then adjust the execution graph accordingly.
There's no need to store a number of scaling events. We just need to
determine the time to trigger an adjustment of the execution graph.
3. What's the behavior wrt to JobManager failures (e.g. we lose the state
of the Adaptive Scheduler?). My proposal would be to just reset the
cooldown period, so after recovery of a JobManager, we have to wait at
least for the cooldown period until further scaling operations are done.
4. What's the relationship to the
"jobmanager.adaptive-scheduler.resource-stabilization-timeout"
configuration?

Thanks a lot for working on this!

Best,
Robert

On Wed, Jun 14, 2023 at 3:38 PM Etienne Chauchot 
wrote:

> Hi all,
>
> @Yuxia, I updated the FLIP to include the aggregation of the stacked
> operations that we discussed below. PTAL.
>
> Best
>
> Etienne
>
>
> Le 13/06/2023 à 16:31, Etienne Chauchot a écrit :
> > Hi Yuxia,
> >
> > Thanks for your feedback. The number of potentially stacked operations
> > depends on the configured length of the cooldown period.
> >
> >
> >
> > The proposition in the FLIP is to add a minimum delay between 2 scaling
> > operations. But, indeed, an optimization could be to still stack the
> > operations (that arrive during a cooldown period) but maybe not take
> > only the last operation but rather aggregate them in order to end up
> > with a single aggregated operation when the cooldown period ends. For
> > example, let's say 3 taskManagers come up and 1 comes down during the
> > cooldown period, we could generate a single operation of scale up +2
> > when the period ends.
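
The aggregation described above can be sketched as follows (illustrative only — the class and method names are made up for this discussion, this is not the adaptive scheduler's actual code):

```java
import java.time.Duration;
import java.time.Instant;

/**
 * Minimal sketch: events that arrive during the cooldown period are not
 * applied individually but accumulated into one net change, which is
 * applied by the first event that arrives after the cooldown has elapsed.
 */
class ScalingEventAggregator {
    private final Duration cooldown;
    private int pendingDelta = 0;          // net parallelism change stacked during cooldown
    private Instant cooldownEnd = Instant.MIN;

    ScalingEventAggregator(Duration cooldown) {
        this.cooldown = cooldown;
    }

    /** Returns the delta to apply now, or 0 while still cooling down. */
    int onScalingEvent(int delta, Instant now) {
        if (now.isBefore(cooldownEnd)) {
            pendingDelta += delta;         // stack instead of rescaling immediately
            return 0;
        }
        int toApply = pendingDelta + delta; // aggregate stacked events into one operation
        pendingDelta = 0;
        cooldownEnd = now.plus(cooldown);
        return toApply;
    }
}
```

With a 30-second cooldown, three TaskManagers joining and one leaving during the period would collapse into a single "+2" rescale once the period ends, matching the example above.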
> >
> > As a side note regarding your comment on "it'll take a long time to
> > finish all", please keep in mind that the reactive mode (at least for
> > now) is only available for streaming pipeline which are in essence
> > infinite processing.
> >
> > Another side note: when you mention "every taskManagers connecting",
> > if you are referring to the start of the pipeline, please keep in mind
> > that the adaptive scheduler has a "waiting for resources" timeout
> > period before starting the pipeline in which all taskmanagers connect
> > and the parallelism is decided.
> >
> > Best
> >
> > Etienne
> >
> > Le 13/06/2023 à 03:58, yuxia a écrit :
> >> Hi, Etienne. Thanks for driving it. I have one question about the
> >> mechanism of the cooldown timeout.
> >>
> >> From the Proposed Changes part, if a scaling event is received and
> >> it falls during the cooldown period, it'll be stacked to be executed
> >> after the period ends. Also, from the description of FLINK-21883[1],
> >> cooldown timeout is to avoid rescaling the job very frequently,
> >> because TaskManagers are not all connecting at the same time.
> >>
> >> So, is it possible that every TaskManager connecting will produce a
> >> scaling event, and that many scale-up events will be stacked, which
> >> causes it to take a long time to finish them all? Can we just take the
> >> last event?
> >>
> >> [1]: https://issues.apache.org/jira/browse/FLINK-21883
> >>
> >> Best regards, Yuxia
> >>
> >> - Original Message - From: "Etienne Chauchot" 
> >> To:
> >> "dev" , "Robert Metzger" 
> >> Sent: Monday, June 12, 2023, 11:34:25 PM Subject: [DISCUSS] FLIP-322
> >> Cooldown
> >> period for adaptive scheduler
> >>
> >> Hi,
> >>
> >> I’d like to start a discussion about FLIP-322 [1] which introduces a
> >> cooldown period for the adaptive scheduler.
> >>
> >> I'd like to get your feedback especially @Robert as you opened the
> >> related ticket and worked on the reactive mode a lot.
> >>
> >> [1]
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-322+Cooldown+period+for+adaptive+scheduler
> >>
> >>
> >>
> > Best
> >>
> >> Etienne
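The "can we just take the last event" idea raised in this thread can be sketched as a debounce: scaling events arriving during the cooldown window replace any pending event instead of queueing, so only the most recent desired parallelism is applied when the cooldown expires. This is an editorial illustration; all names are hypothetical and do not reflect Flink's actual adaptive-scheduler API.

```python
import time

class CooldownRescaler:
    """Illustrative debounce: keep only the latest scaling event during cooldown."""

    def __init__(self, cooldown_seconds, clock=time.monotonic):
        self.cooldown_seconds = cooldown_seconds
        self.clock = clock
        self.last_rescale_at = float("-inf")
        self.pending_parallelism = None  # latest deferred request, if any

    def on_scaling_event(self, parallelism):
        now = self.clock()
        if now - self.last_rescale_at >= self.cooldown_seconds:
            # Outside the cooldown: rescale immediately.
            self.last_rescale_at = now
            return parallelism
        # Inside the cooldown: overwrite any stacked event; only the last survives.
        self.pending_parallelism = parallelism
        return None

    def on_cooldown_expired(self):
        # Apply at most one (the latest) deferred event.
        p, self.pending_parallelism = self.pending_parallelism, None
        if p is not None:
            self.last_rescale_at = self.clock()
        return p
```

With this shape, ten TaskManagers connecting in quick succession produce at most one deferred rescale instead of ten queued ones.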


[jira] [Created] (FLINK-31840) NullPointerException in operators.window.slicing.SliceAssigners$AbstractSliceAssigner.assignSliceEnd

2023-04-18 Thread Robert Metzger (Jira)
Robert Metzger created FLINK-31840:
--

 Summary: NullPointerException in 
operators.window.slicing.SliceAssigners$AbstractSliceAssigner.assignSliceEnd
 Key: FLINK-31840
 URL: https://issues.apache.org/jira/browse/FLINK-31840
 Project: Flink
  Issue Type: Bug
Reporter: Robert Metzger


While running a Flink SQL Query (with a hop window), I got this error.
{code}
Caused by: 
org.apache.flink.streaming.runtime.tasks.ExceptionInChainedOperatorException: 
Could not forward element to next operator
at 
org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.pushToOperator(CopyingChainingOutput.java:99)
at 
org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:57)
at 
org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:29)
at 
org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:56)
at 
org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:29)
at StreamExecCalc$11.processElement(Unknown Source)
at 
org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.pushToOperator(CopyingChainingOutput.java:82)
... 23 more
Caused by: java.lang.NullPointerException
at 
org.apache.flink.table.runtime.operators.window.slicing.SliceAssigners$AbstractSliceAssigner.assignSliceEnd(SliceAssigners.java:558)
at 
org.apache.flink.table.runtime.operators.aggregate.window.LocalSlicingWindowAggOperator.processElement(LocalSlicingWindowAggOperator.java:114)
at 
org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.pushToOperator(CopyingChainingOutput.java:82)
... 29 more
{code}

It was caused by a timestamp field containing NULL values.
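A minimal sketch of the failure mode: slice assignment arithmetic dereferences the row's timestamp, so a NULL timestamp blows up inside the assigner, and filtering (or defaulting) null timestamps before the window operator avoids it. This is illustrative Python, not Flink's actual SliceAssigner code; the function names are hypothetical.

```python
def assign_slice_end(timestamp_ms, slice_ms):
    # Mirrors the failing call: arithmetic on a null timestamp blows up here.
    return (timestamp_ms // slice_ms + 1) * slice_ms

def assign_slice_end_safe(timestamp_ms, slice_ms):
    # Defensive variant: skip records with a null event time,
    # akin to adding WHERE ts IS NOT NULL before the windowed aggregation.
    if timestamp_ms is None:
        return None
    return assign_slice_end(timestamp_ms, slice_ms)
```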



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-31834) Azure Warning: no space left on device

2023-04-18 Thread Robert Metzger (Jira)
Robert Metzger created FLINK-31834:
--

 Summary: Azure Warning: no space left on device
 Key: FLINK-31834
 URL: https://issues.apache.org/jira/browse/FLINK-31834
 Project: Flink
  Issue Type: Bug
  Components: Build System / Azure Pipelines
Reporter: Robert Metzger


In this CI run: 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=48213=logs=af184cdd-c6d8-5084-0b69-7e9c67b35f7a=841082b6-1a93-5908-4d37-a071f4387a5f=21

There was this warning:
{code}
Loaded image: confluentinc/cp-kafka:6.2.2
Loaded image: testcontainers/ryuk:0.3.3
ApplyLayer exit status 1 stdout:  stderr: write /opt/jdk-15.0.1+9/lib/modules: 
no space left on device
##[error]Bash exited with code '1'.
Finishing: Restore docker images
{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-31810) RocksDBException: Bad table magic number on checkpoint rescale

2023-04-14 Thread Robert Metzger (Jira)
Robert Metzger created FLINK-31810:
--

 Summary: RocksDBException: Bad table magic number on checkpoint 
rescale
 Key: FLINK-31810
 URL: https://issues.apache.org/jira/browse/FLINK-31810
 Project: Flink
  Issue Type: Bug
  Components: Runtime / State Backends
Affects Versions: 1.15.2
Reporter: Robert Metzger


While rescaling a job from a checkpoint, I ran into this exception:

{code:java}
SinkMaterializer[7] -> rob-result[7]: Writer -> rob-result[7]: Committer 
(4/4)#3 (c1b348f7eed6e1ce0e41ef75338ae754) switched from INITIALIZING to FAILED 
with failure cause: java.lang.Exception: Exception while creating 
StreamOperatorStateContext.
at 
org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:255)
at 
org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:265)
at 
org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.initializeStateAndOpenOperators(RegularOperatorChain.java:106)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.restoreGates(StreamTask.java:703)
at 
org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.call(StreamTaskActionExecutor.java:55)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.restoreInternal(StreamTask.java:679)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:646)
at 
org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:948)
at 
org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:917)
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:741)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:563)
at java.base/java.lang.Thread.run(Unknown Source)
Caused by: org.apache.flink.util.FlinkException: Could not restore keyed state 
backend for SinkUpsertMaterializer_7d9b7588bc2ff89baed50d7a4558caa4_(4/4) from 
any of the 1 provided restore options.
at 
org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:160)
at 
org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:346)
at 
org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:164)
... 11 more
Caused by: org.apache.flink.runtime.state.BackendBuildingException: Caught 
unexpected exception.
at 
org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackendBuilder.build(RocksDBKeyedStateBackendBuilder.java:395)
at 
org.apache.flink.contrib.streaming.state.EmbeddedRocksDBStateBackend.createKeyedStateBackend(EmbeddedRocksDBStateBackend.java:483)
at 
org.apache.flink.contrib.streaming.state.EmbeddedRocksDBStateBackend.createKeyedStateBackend(EmbeddedRocksDBStateBackend.java:97)
at 
org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.lambda$keyedStatedBackend$1(StreamTaskStateInitializerImpl.java:329)
at 
org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:168)
at 
org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:135)
... 13 more
Caused by: java.io.IOException: Error while opening RocksDB instance.
at 
org.apache.flink.contrib.streaming.state.RocksDBOperationUtils.openDB(RocksDBOperationUtils.java:92)
at 
org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restoreDBInstanceFromStateHandle(RocksDBIncrementalRestoreOperation.java:465)
at 
org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restoreWithRescaling(RocksDBIncrementalRestoreOperation.java:321)
at 
org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restore(RocksDBIncrementalRestoreOperation.java:164)
at 
org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackendBuilder.build(RocksDBKeyedStateBackendBuilder.java:315)
... 18 more
Caused by: org.rocksdb.RocksDBException: Bad table magic number: expected 
9863518390377041911, found 4096 in 
/tmp/job__op_SinkUpsertMaterializer_7d9b7588bc2ff89baed50d7a4558caa4__4_4__uuid_d5587dfc-78b3-427c-8cb6-35507b71bc4b/46475654-5515-430e-b215-389d42cddb97/000232.sst
at org.rocksdb.RocksDB.open(Native Method)
at org.rocksdb.RocksDB.open(RocksDB.java:306)
at 
org.apache.flink.contrib.streaming.state.RocksDBOperationUtils.openDB(RocksDBOperationUtils.java:80)
... 22 more
{code}
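For context on the error itself: SST files end with a fixed footer magic number, so "Bad table magic number: expected 9863518390377041911, found 4096" means the file's trailing bytes were not an SST footer at all (4096 suggests a truncated or corrupted file). A minimal sketch of such a footer check follows; the expected value is taken from the stack trace, but the footer layout shown is illustrative, not RocksDB's actual format.

```python
import struct

# Expected value copied from the stack trace above.
EXPECTED_MAGIC = 9863518390377041911

def read_trailing_magic(data: bytes) -> int:
    # Interpret the last 8 bytes as a little-endian unsigned 64-bit integer.
    if len(data) < 8:
        raise ValueError("file too short to contain a footer")
    return struct.unpack("<Q", data[-8:])[0]

def check_sst_footer(data: bytes) -> None:
    found = read_trailing_magic(data)
    if found != EXPECTED_MAGIC:
        raise IOError(
            f"Bad table magic number: expected {EXPECTED_MAGIC}, found {found}")
```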

I haven't found any other

Re: [VOTE] Release flink-connector-aws, 4.1.0 for Flink 1.17

2023-04-12 Thread Robert Metzger
+1 (binding)

- tried out the kinesis sql binary with sql client
- staging binaries look fine

On Mon, Apr 3, 2023 at 10:12 PM Elphas Tori  wrote:

> +1 (non-binding)
>
> + verified hashes and signatures
> + checked local build of website pull request and approved
>
> On 2023/04/03 16:19:00 Danny Cranmer wrote:
> > Hi everyone,
> > Please review and vote on the binaries for flink-connector-aws version
> > 4.1.0-1.17, as follows:
> > [ ] +1, Approve the release
> > [ ] -1, Do not approve the release (please provide specific comments)
> >
> > The v4.1.0 source release has already been approved [1], this vote is to
> > distribute the binaries for Flink 1.17 support.
> >
> > The complete staging area is available for your review, which includes:
> > * all artifacts to be deployed to the Maven Central Repository [2], which
> > are signed with the key with fingerprint
> > 0F79F2AFB2351BC29678544591F9C1EC125FD8DB [3],
> > * website pull request listing the new release [4].
> >
> > The vote will be open for at least 72 hours. It is adopted by majority
> > approval, with at least 3 PMC affirmative votes.
> >
> > Thanks,
> > Danny
> >
> > [1] https://lists.apache.org/thread/7q3ysg9jz5cjwdgdmgckbnqhxh44ncmv
> > [2]
> https://repository.apache.org/content/repositories/orgapacheflink-1602/
> > [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> > [4] https://github.com/apache/flink-web/pull/628
> >
>


Re: [VOTE] Release flink-connector-aws 4.1.0, release candidate #1

2023-03-22 Thread Robert Metzger
+1 (binding)

Note, in the NOTICE files, the copyright year still only goes up to 2022. I
don't think it's a blocker, but it should be updated.

+ Checked the website update
+ checked the source archive for copied code (no licenses are listed in the
root NOTICE file)
+ src archive contents build locally


On Thu, Mar 16, 2023 at 1:11 PM Danny Cranmer 
wrote:

> Hi everyone,
> Please review and vote on the flink-connector-aws release candidate #1 for
> the version 4.1.0, as follows:
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
>
>
> The complete staging area is available for your review, which includes:
> * JIRA release notes [1],
> * the official Apache source release to be deployed to dist.apache.org
> [2],
> which are signed with the key with fingerprint
> 0F79F2AFB2351BC29678544591F9C1EC125FD8DB [3],
> * all artifacts to be deployed to the Maven Central Repository [4],
> * source code tag v4.1.0-rc1 [5],
> * website pull request listing the new release [6].
>
> The vote will be open for at least 72 hours (ending 2023-03-21 13:00 UTC).
> It is adopted by majority approval, with at least 3 PMC affirmative votes.
>
> Thanks,
> Danny
>
> [1]
>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522=12352646
> [2]
> https://dist.apache.org/repos/dist/dev/flink/flink-connector-aws-4.1.0-rc1/
> [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> [4]
> https://repository.apache.org/content/repositories/orgapacheflink-1599/
> [5] https://github.com/apache/flink-connector-aws/releases/tag/v4.1.0-rc1
> [6] https://github.com/apache/flink-web/pull/623
>


Re: [VOTE] Release 1.17.0, release candidate #3

2023-03-21 Thread Robert Metzger
+1 (binding)

+ Started Flink locally with the changelog statebackend + rocksdb against
minio
+ used the maven artifacts to develop a little, large-state sample job to
test with the changelog statebackend
+ Opened this PR https://github.com/apache/flink/pull/22232
+ Manually eyeballed the diff for license issues, looked fine
+ build Flink from source (from the src archive)



On Tue, Mar 21, 2023 at 11:41 AM Sergey Nuyanzin 
wrote:

> +1 (non-binding)
> - downloaded, checked hashes
> - verified signatures
> - built from sources
> - checked LICENSE and NOTICE files
> - ran simple jobs
>
>
> On Tue, Mar 21, 2023, 11:16 Etienne Chauchot  wrote:
>
> > Hi all,
> >
> > - I read the release notes: I'd suggest, if possible, that we group the
> > subtasks by main task and show the main tasks to give the reader a
> > better understanding.
> >
> > - tested RC3 on a standalone cluster with this user code:
> > https://github.com/echauchot/tpcds-benchmark-flink
> >
> > +1 (not-binding)
> >
> > Best Etienne
> >
> > Le 17/03/2023 à 15:01, Qingsheng Ren a écrit :
> > > Hi everyone,
> > >
> > > Please review and vote on the release candidate #3 for the version
> > 1.17.0,
> > > as follows:
> > > [ ] +1, Approve the release
> > > [ ] -1, Do not approve the release (please provide specific comments)
> > >
> > > The complete staging area is available for your review, which includes:
> > >
> > > * JIRA release notes [1], and the pull request adding release note for
> > > users [2]
> > > * the official Apache source release and binary convenience releases to
> > be
> > > deployed to dist.apache.org [3], which are signed with the key with
> > > fingerprint A1BD477F79D036D2C30CA7DBCA8AEEC2F6EB040B [4],
> > > * all artifacts to be deployed to the Maven Central Repository [5],
> > > * source code tag "release-1.17.0-rc3" [6],
> > > * website pull request listing the new release and adding announcement
> > blog
> > > post [7].
> > >
> > > The vote will be open for at least 72 hours. It is adopted by majority
> > > approval, with at least 3 PMC affirmative votes.
> > >
> > > [1]
> > >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522=12351585
> > > [2] https://github.com/apache/flink/pull/22146
> > > [3] https://dist.apache.org/repos/dist/dev/flink/flink-1.17.0-rc3/
> > > [4] https://dist.apache.org/repos/dist/release/flink/KEYS
> > > [5]
> > https://repository.apache.org/content/repositories/orgapacheflink-1600
> > > [6] https://github.com/apache/flink/releases/tag/release-1.17.0-rc3
> > > [7] https://github.com/apache/flink-web/pull/618
> > >
> > > Thanks,
> > > Martijn and Matthias, Leonard and Qingsheng
> > >
> >
>


Re: [VOTE] Release flink-connector-dynamodb v3.0.0, release candidate #0

2022-12-02 Thread Robert Metzger
+1 (binding)

- the source tgz archive contents look fine
- staging repo contents are fine
- verified web PR

On Wed, Nov 30, 2022 at 9:26 AM Danny Cranmer 
wrote:

> +1 (binding)
>
> - Validated hashes/signature
> - Verified that no binaries exist in the source archive
> - Build the source with Maven
> - Verified NOTICE file
> - Verified versions in pom files are correct
> - Verified SinkIntoDynamoDb sample application writes to DynamoDB
>
> Thanks,
> Danny
>
> On Tue, Nov 29, 2022 at 9:26 PM Martijn Visser 
> wrote:
>
> > Hi Danny,
> >
> > +1 (binding)
> >
> > - Validated hashes
> > - Verified signature
> > - Verified that no binaries exist in the source archive
> > - Build the source with Maven
> > - Verified licenses
> > - Verified web PR
> >
> > Best regards,
> >
> > Martijn
> >
> > On Tue, Nov 29, 2022 at 8:37 PM Martijn Visser 
> wrote:
> >
> > > Hi Danny,
> > >
> > > +1 (binding)
> > >
> > > - Validated hashes
> > > - Verified signature
> > > - Verified that no binaries exist in the source archive
> > > - Build the source with Maven
> > > - Verified licenses
> > > - Verified web PR
> > >
> > > Best regards,
> > >
> > > Martijn
> > >
> > > On Tue, Nov 29, 2022 at 12:57 PM Hamdy, Ahmed
> > 
> > > wrote:
> > >
> > >> +1 (non-binding)
> > >>
> > >> On 29/11/2022, 08:27, "Teoh, Hong" 
> > wrote:
> > >>
> > >> CAUTION: This email originated from outside of the organization.
> Do
> > >> not click links or open attachments unless you can confirm the sender
> > and
> > >> know the content is safe.
> > >>
> > >>
> > >>
> > >> +1 (non-binding)
> > >>
> > >> * Hashes and Signatures look good
> > >> * All required files on dist.apache.org
> > >> * Tag is present in Github
> > >> * Verified source archive does not contain any binary files
> > >> * Source archive builds using maven
> > >> * Started packaged example SQL job using SQL client. Verified that
> > it
> > >> writes successfully to the sink table.
> > >> * Verified sink metrics look ok.
> > >>
> > >>
> > >> Cheers,
> > >> Hong
> > >>
> > >> On 28/11/2022, 16:44, "Danny Cranmer" 
> > >> wrote:
> > >>
> > >>
> > >>
> > >>
> > >> Hi everyone,
> > >> Please review and vote on the release candidate #0 for the
> > >> version 3.0.0 as
> > >> follows:
> > >> [ ] +1, Approve the release
> > >> [ ] -1, Do not approve the release (please provide specific
> > >> comments)
> > >>
> > >>
> > >> The complete staging area is available for your review, which
> > >> includes:
> > >> * JIRA release notes [1],
> > >> * the official Apache source release to be deployed to
> > >> dist.apache.org [2],
> > >> which are signed with the key with fingerprint 125FD8DB [3],
> > >> * all artifacts to be deployed to the Maven Central Repository
> > >> [4],
> > >> * source code tag v3.0.0-rc0 [5],
> > >> * website pull request listing the new release [6].
> > >>
> > >> The vote will be open for at least 72 hours (Thursday 1st
> > >> December 17:00
> > >> UTC). It is adopted by majority approval, with at least 3 PMC
> > >> affirmative
> > >> votes.
> > >>
> > >> Please note, this is a new connector and the first release.
> > >>
> > >> Thanks,
> > >> Danny
> > >>
> > >> [1]
> > >>
> > >>
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522=12352277
> > >> [2]
> > >>
> > >>
> >
> https://dist.apache.org/repos/dist/dev/flink/flink-connector-aws-3.0.0-rc0
> > >> [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> > >> [4]
> > >>
> https://repository.apache.org/content/repositories/orgapacheflink-1552/
> > >> [5]
> > >> https://github.com/apache/flink-connector-aws/releases/tag/v3.0.0-rc0
> > >> [6] https://github.com/apache/flink-web/pull/588
> > >>
> > >>
> > >>
> >
>


Re: [DISCUSS] Repeatable cleanup of checkpoint data

2022-11-25 Thread Robert Metzger
Yeah, at some point I've investigated performance issues with AWS K8s. They
have somewhat strict rate limits on the K8s api server.

You run into the rate limits by configuring a very high checkpoint
frequency (I guess something like 500ms) and a high
state.checkpoints.num-retained count (e.g. 10). On each checkpoint
completion (every 500ms), a list of 10 checkpoints is written to a
ConfigMap. IIRC the combination of frequent, large ConfigMap updates was
what really killed it.
When you run into the rate limits, the K8s client in Flink throws an
error, causing loss of leadership.

I don't know enough about the internals of the K8s HA store, but for
solving the problem with a high number of retained checkpoints, I wonder
whether it would be beneficial to create one ConfigMap per retained
checkpoint, instead of one big ConfigMap containing the list of all
retained checkpoints.
Otherwise, I don't have any ideas at the moment for how to mitigate this
problem.

I hope this helps.
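To make the "frequent, large ConfigMap updates" point concrete, here is a back-of-envelope sketch using the numbers mentioned in this message (500 ms checkpoint interval, 10 retained checkpoints); the 512-byte per-entry size is an assumption for illustration, not a measured value.

```python
def configmap_write_load(checkpoint_interval_s, retained, entry_bytes=512):
    """Back-of-envelope write load for the single-ConfigMap layout:
    every completed checkpoint rewrites one ConfigMap that holds
    all retained checkpoint entries."""
    writes_per_sec = 1.0 / checkpoint_interval_s
    payload_bytes_per_sec = writes_per_sec * retained * entry_bytes
    return writes_per_sec, payload_bytes_per_sec

# Numbers from the message above: 500 ms interval, 10 retained checkpoints.
wps, bps = configmap_write_load(0.5, 10)
```

Two full-list ConfigMap writes per second, each carrying all retained entries, is the write amplification that a per-checkpoint ConfigMap layout would avoid.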


On Thu, Nov 10, 2022 at 10:56 AM Matthias Pohl
 wrote:

> Thanks for sharing your opinions on the proposal. The concerns sound
> reasonable. I guess, I'm going to follow-up on Chesnay's idea about
> combining multiple requests into one for the k8s implementation. Having a
> performance test for the k8s API server access sounds like a good idea,
> too. Both action items are a prerequisite before continuing with FLIP-270
> [1].
>
> @Yang Wang: Do we have some Jira issue or ML discussion on the k8s API
> server performance issues? I couldn't come up with a good search query
> myself. :-D
>
> @Robert (CC'd): was it you who worked on the k8s API server overload issue
> in 1.15? Do you have some memory about it or some starting point with
> source code or something similar?
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-270%3A+Repeatable+Cleanup+of+Checkpoints
>
> On Mon, Nov 7, 2022 at 12:59 PM Chesnay Schepler 
> wrote:
>
> > This is a nice FLIP. I particularly like how much background it provides
> > on the issue; something that other FLIPs could certainly benefit from...
> >
> > I went over the FLIP and had a chat with Matthias about it.
> >
> > Somewhat unrelated to the FLIP we found a flaw in the current cleanup
> > mechanism of failed checkpoints, where the JM deletes files while a TM
> > may still be in the process of writing checkpoint data. This is because
> > we never wait for an ack from the TMs that that have aborted the
> > checkpoint.
> > We additionally noted that when incremental checkpoints are enabled we
> > might be storing a large number of checkpoints in HA, without a
> > conclusion on what to do about it.
> >
> >
> > As for the FLIP itself, I'm concerned about proposal #2 because it
> > requires iterating over the entire checkpoint directory on /any/
> > failover to find checkpoints that can be deleted. This can be an
> > expensive operation for certain filesystems (S3), particularly when
> > incremental checkpoints are being used.
> > In the interest of fast failovers we ideally don't use mechanisms that
> > scale with.../anything/, really.
> >
> > However, storing more data in HA is also concerning, as Yang Wang
> > pointed out.
> > To not increase the number of requests made against HA we could maybe
> > consider looking into piggy-backing delete operations on other HA
> > operations, like the checkpoint counter increments.
> >
> > On that note, do we have any benchmarks for HA? I remember we looked
> > into that for...1.15 I believe at some point. With HA load being such a
> > major concern for this FLIP it would be good to have _something_ to
> > measure that.
> >
> > On 27/10/2022 14:20, Matthias Pohl wrote:
> > > I would like to bring this topic up one more time. I put some more
> > thought
> > > into it and created FLIP-270 [1] as a follow-up of FLIP-194 [2] with an
> > > updated version of what I summarized in my previous email. It would be
> > > interesting to get some additional perspectives on this; more
> > specifically,
> > > the two included proposals about either just repurposing the
> > > CompletedCheckpointStore into a more generic CheckpointStore or
> > refactoring
> > > the StateHandleStore interface moving all the cleanup logic from the
> > > CheckpointsCleaner and StateHandleStore into what's currently called
> > > CompletedCheckpointStore.
> > >
> > > Looking forward to feedback on that proposal.
> > >
> > > Best,
> > > Matthias
> > >
> > > [1]
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-270%3A+Repeatable+Cleanup+of+Checkpoints
> > > [2]
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-194%3A+Introduce+the+JobResultStore
> > >
> > > On Wed, Sep 28, 2022 at 4:07 PM Matthias Pohl
> > > wrote:
> > >
> > >> Hi everyone,
> > >>
> > >> I’d like to start a discussion on repeatable cleanup of checkpoint
> data.
> > >> In FLIP-194 [1] we introduced repeatable cleanup of HA data along the
> > >> introduction of the JobResultStore component. The 

[jira] [Created] (FLINK-30083) Bump maven-shade-plugin to 3.4.0

2022-11-18 Thread Robert Metzger (Jira)
Robert Metzger created FLINK-30083:
--

 Summary: Bump maven-shade-plugin to 3.4.0
 Key: FLINK-30083
 URL: https://issues.apache.org/jira/browse/FLINK-30083
 Project: Flink
  Issue Type: Improvement
  Components: Build System
Affects Versions: 1.17.0
Reporter: Robert Metzger
 Fix For: 1.17.0


FLINK-24273 proposes to relocate the io.fabric8 dependencies of 
flink-kubernetes.
This is not possible because of a problem with the maven shade plugin ("mvn 
install" doesn't work, it needs to be "mvn clean install").
MSHADE-425 solves this issue, and has been released with maven-shade-plugin 
3.4.0.

Upgrading the shade plugin will solve the problem, unblocking the K8s 
relocation.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-29779) Allow using MiniCluster with a PluginManager to use metrics reporters

2022-10-27 Thread Robert Metzger (Jira)
Robert Metzger created FLINK-29779:
--

 Summary: Allow using MiniCluster with a PluginManager to use 
metrics reporters
 Key: FLINK-29779
 URL: https://issues.apache.org/jira/browse/FLINK-29779
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Coordination
Affects Versions: 1.16.0
Reporter: Robert Metzger
Assignee: Robert Metzger
 Fix For: 1.16.1


Currently, using MiniCluster with a metric reporter loaded as a plugin is not 
supported, because the {{ReporterSetup.fromConfiguration(config, null)}} call 
gets passed {{null}} for the PluginManager.

I think it is generally valuable to allow passing a PluginManager to the 
MiniCluster.

I'll open a PR for this.




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] Issue tracking workflow

2022-10-25 Thread Robert Metzger
Thank you for starting this discussion Xintong!

I would also prefer option 1.

The ASF Jira is probably one of the largest public Jira instances on the
internet. Most other Jiras are internal within companies, so Atlassian is
probably not putting a lot of effort into automatically detecting and
preventing spam and malicious account creation.
If we want to convince Infra to keep the current sign up process, we
probably need to help them find a solution for the problem.
Maybe we can configure the ASF Jira to rely on GitHub as an identity
provider? I've just proposed that in the discussion on
us...@infra.apache.org, let's see ;)

Best,
Robert


On Tue, Oct 25, 2022 at 2:08 PM Konstantin Knauf  wrote:

> Hi everyone,
>
> while I see some benefits in moving to Github Issues completely, we need to
> be aware that Github Issues lacks many features that Jira has. From the top
> of my head:
> * there are no issue types
> * no priorities
> * issues can only be assigned to one milestone
> So, you need to work a lot with labels and conventions and basically need
> bots or actions to manage those. Agreeing on those processes, setting them
> up and getting used to them will be a lot of work for the community.
>
> So, I am also in favor of 1) for now, because I don't really see a good
> alternative option.
>
> Cheers,
>
> Konstantin
>
>
>
> Am Mo., 24. Okt. 2022 um 22:27 Uhr schrieb Matthias Pohl
> :
>
> > I agree that leaving everything as is would be the best option. I also
> tend
> > to lean towards option 4 as a fallback for the reasons already mentioned.
> > I'm still not a big fan of the Github issues. But that's probably only
> > because I'm used to the look-and-feel and the workflows of Jira. I see
> > certain benefits of moving to GitHub, though. We're still considering
> > migrating from AzureCI to GitHub Actions. Moving the issues to GitHub
> as
> > well might improve the user experience even more. I could imagine that
> > reducing the number of services a new contributor needs to be aware of
> > to reach the community is a good way to reduce confusion for newcomers.
> > Additionally, I also like the fact that I wouldn't have to bother with
> the
> > Apache Jira markdown anymore. 8)
> >
> > I agree with Martijn's concern about not being able to track all
> > Flink-related issues in a single system. I'm just wondering whether
> > something is holding us back from collecting all Flink-related issues in
> > Flink's GitHub repository and disabling the issue feature in any
> other
> > Flink-related repository?
> >
> > About migrating the Jira issues: I would be hopeful that migrating is
> > doable in the end. There is a blog post from the spring data guys about
> > their journey on migrating from Jira to GitHub issues [1]. Unfortunately,
> > they didn't provide any scripts.
> >
> > For the case that infra moves forward with disabling the signup without
> us
> > having come up with a decision and its actual execution (i.e. migrating
> the
> > Jira issues to GH), I would prefer having users send a request to the
> > mailing list. I would rather have a temporary phase where there's a bit
> of
> > overhead of registering the users in the Apache Jira than having two
> > locations for bug tracking. I suspect that there are no statistics on how
> > many new users register with Jira in a given timeframe to contribute to
> > Flink?
> >
> > Matthias
> >
> > [1]
> >
> >
> https://spring.io/blog/2021/01/07/spring-data-s-migration-from-jira-to-github-issues
> > [2] https://lists.apache.org/thread/pjb5jzvw41xjtzgf4w0gggpqrt2fq6ov
> >
> >
> > On Mon, Oct 24, 2022 at 10:28 AM Xintong Song 
> > wrote:
> >
> > > I agree with you that option 1) would be the best for us. Let's keep
> > hoping
> > > for the best.
> > >
> > > Option 4), as you said, comes with prices. At the moment, I don't have
> > > thorough answers to your questions.
> > >
> > > Just one quick response, I think there's a good chance that we can
> import
> > > current Jira tickets into GH. Jira supports exporting issues with
> fields
> > > that you specified as CSV/XML/... files. With a brief google search, I
> > > found some tools that help with bulk-creating issues in GH. E.g.,
> > > github-csv-tools [1] is described as supporting the import of issues with
> title,
> > > body, labels, status and milestones from a CSV file. That's probably
> good
> > > enough for us to search/filter the issues in GH, and a link to the Jira
> > > ticket can be posted in the GH issue for complete conversation history
> > and
> > > other details.
> > >
> > > Best,
> > >
> > > Xintong
> > >
> > >
> > > [1] https://github.com/gavinr/github-csv-tools
> > >
> > >
> > >
> > > On Mon, Oct 24, 2022 at 3:58 PM Martijn Visser <
> martijnvis...@apache.org
> > >
> > > wrote:
> > >
> > > > Hi Xintong,
> > > >
> > > > I'm also not in favour of option 2, I think that two systems will
> > result
> > > > in an administrative burden and less-efficient workflow. I'm also not
> > in
> > > > favour of 

[ANNOUNCE] New Apache Flink PMC Member - Danny Cranmer

2022-10-10 Thread Robert Metzger
Hi everyone,

I'm very happy and excited to announce that Danny Cranmer has joined the
Flink PMC!

Danny has been a committer since January 2021. He has been very active with
all Amazon-related connectors and projects in Flink, such as DynamoDB,
Kinesis (EFO, Firehose, ...) and the AsyncSinkBase. Besides that, he takes
wider responsibility in running the project, such as creating releases,
helping with the effort to put connectors in separate repositories etc.

Congratulations and welcome Danny!

Best,
Robert (on behalf of the Flink PMC)


[jira] [Created] (FLINK-29492) Kafka exactly-once sink causes OutOfMemoryError

2022-10-01 Thread Robert Metzger (Jira)
Robert Metzger created FLINK-29492:
--

 Summary: Kafka exactly-once sink causes OutOfMemoryError
 Key: FLINK-29492
 URL: https://issues.apache.org/jira/browse/FLINK-29492
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Kafka
Affects Versions: 1.15.2
Reporter: Robert Metzger


My Kafka exactly-once sinks are periodically failing with an 
`OutOfMemoryError: Java heap space`.

This looks very similar to FLINK-28250. But I am running 1.15.2, which contains 
a fix for FLINK-28250.

Exception:
{code:java}
java.io.IOException: Could not perform checkpoint 2281 for operator 
http_events[3]: Writer (1/1)#1.
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointOnBarrier(StreamTask.java:1210)
at 
org.apache.flink.streaming.runtime.io.checkpointing.CheckpointBarrierHandler.notifyCheckpoint(CheckpointBarrierHandler.java:147)
at 
org.apache.flink.streaming.runtime.io.checkpointing.SingleCheckpointBarrierHandler.triggerCheckpoint(SingleCheckpointBarrierHandler.java:287)
at 
org.apache.flink.streaming.runtime.io.checkpointing.SingleCheckpointBarrierHandler.access$100(SingleCheckpointBarrierHandler.java:64)
at 
org.apache.flink.streaming.runtime.io.checkpointing.SingleCheckpointBarrierHandler$ControllerImpl.triggerGlobalCheckpoint(SingleCheckpointBarrierHandler.java:493)
at 
org.apache.flink.streaming.runtime.io.checkpointing.AbstractAlignedBarrierHandlerState.triggerGlobalCheckpoint(AbstractAlignedBarrierHandlerState.java:74)
at 
org.apache.flink.streaming.runtime.io.checkpointing.AbstractAlignedBarrierHandlerState.barrierReceived(AbstractAlignedBarrierHandlerState.java:66)
at 
org.apache.flink.streaming.runtime.io.checkpointing.SingleCheckpointBarrierHandler.lambda$processBarrier$2(SingleCheckpointBarrierHandler.java:234)
at 
org.apache.flink.streaming.runtime.io.checkpointing.SingleCheckpointBarrierHandler.markCheckpointAlignedAndTransformState(SingleCheckpointBarrierHandler.java:262)
at 
org.apache.flink.streaming.runtime.io.checkpointing.SingleCheckpointBarrierHandler.processBarrier(SingleCheckpointBarrierHandler.java:231)
at 
org.apache.flink.streaming.runtime.io.checkpointing.CheckpointedInputGate.handleEvent(CheckpointedInputGate.java:181)
at 
org.apache.flink.streaming.runtime.io.checkpointing.CheckpointedInputGate.pollNext(CheckpointedInputGate.java:159)
at 
org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.emitNext(AbstractStreamTaskNetworkInput.java:110)
at 
org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:65)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:519)
at 
org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:203)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:804)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:753)
at 
org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:948)
at 
org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:927)
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:741)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:563)
at java.base/java.lang.Thread.run(Unknown Source)
Caused by: org.apache.flink.runtime.checkpoint.CheckpointException: Could not 
complete snapshot 2281 for operator http_events[3]: Writer (1/1)#1. Failure 
reason: Checkpoint was declined.
at 
org.apache.flink.streaming.api.operators.StreamOperatorStateHandler.snapshotState(StreamOperatorStateHandler.java:269)
at 
org.apache.flink.streaming.api.operators.StreamOperatorStateHandler.snapshotState(StreamOperatorStateHandler.java:173)
at 
org.apache.flink.streaming.api.operators.AbstractStreamOperator.snapshotState(AbstractStreamOperator.java:348)
at 
org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.checkpointStreamOperator(RegularOperatorChain.java:227)
at 
org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.buildOperatorSnapshotFutures(RegularOperatorChain.java:212)
at 
org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.snapshotState(RegularOperatorChain.java:192)
at 
org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.takeSnapshotSync(SubtaskCheckpointCoordinatorImpl.java:647)
at 
org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.checkpointState(SubtaskCheckpointCoordinatorImpl.java:320)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$performCheckpoint$12(StreamTask.java:1253)
{code}

[jira] [Created] (FLINK-29212) Properly load Hadoop native libraries in Flink docker images

2022-09-06 Thread Robert Metzger (Jira)
Robert Metzger created FLINK-29212:
--

 Summary: Properly load Hadoop native libraries in Flink docker 
images
 Key: FLINK-29212
 URL: https://issues.apache.org/jira/browse/FLINK-29212
 Project: Flink
  Issue Type: Bug
  Components: flink-docker
Affects Versions: 1.17.0
Reporter: Robert Metzger


On startup, Flink logs:

{code:java}
2022-09-04 12:36:03.559 [main] WARN  org.apache.hadoop.util.NativeCodeLoader  - 
Unable to load native-hadoop library for your platform... using builtin-java 
classes where applicable
{code}

Hadoop native libraries are used for:
- Compression Codecs (bzip2, lz4, zlib)
- Native IO utilities for HDFS Short-Circuit Local Reads and Centralized Cache 
Management in HDFS
- CRC32 checksum implementation
(Source: 
https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/NativeLibraries.html)

Resolving this in the Docker images we provide should be easy; it would remove 
one unnecessary WARN message and provide performance benefits for some users.
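The warning comes from a load-or-fall-back check at startup. A plain-Java sketch of that mechanism (a hypothetical illustration with no Hadoop dependency; only the library name "hadoop" is taken from Hadoop's NativeCodeLoader):

```java
public class NativeCheck {
    // Sketch of what Hadoop's NativeCodeLoader does at startup: try to load
    // the native library, and fall back to the builtin-java implementations
    // (emitting the WARN above) if it cannot be found on java.library.path.
    static boolean tryLoad(String name) {
        try {
            System.loadLibrary(name);
            return true;
        } catch (UnsatisfiedLinkError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        if (!tryLoad("hadoop")) {
            System.out.println(
                    "WARN Unable to load native-hadoop library for your platform..."
                            + " using builtin-java classes where applicable");
        }
    }
}
```

For the Docker images, the fix would presumably amount to shipping the native library in the image and making sure it is on java.library.path.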






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-29122) Improve robustness of FileUtils.expandDirectory()

2022-08-26 Thread Robert Metzger (Jira)
Robert Metzger created FLINK-29122:
--

 Summary: Improve robustness of FileUtils.expandDirectory() 
 Key: FLINK-29122
 URL: https://issues.apache.org/jira/browse/FLINK-29122
 Project: Flink
  Issue Type: Bug
  Components: API / Core
Affects Versions: 1.16.0, 1.17.0
Reporter: Robert Metzger


`FileUtils.expandDirectory()` can write to locations outside the target 
directory if the zip file is malformed (contains entry names with ../).
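A minimal sketch of the kind of per-entry guard that closes this hole (plain Java; the method name `safeResolve` is illustrative, not the actual Flink API):

```java
import java.io.File;
import java.io.IOException;

public class ZipSlipCheck {
    // Resolve a zip entry name against the target directory and reject any
    // entry whose canonical path escapes it (e.g. "../../etc/passwd").
    static File safeResolve(File targetDir, String entryName) throws IOException {
        File dest = new File(targetDir, entryName);
        String canonicalTarget = targetDir.getCanonicalPath() + File.separator;
        if (!dest.getCanonicalPath().startsWith(canonicalTarget)) {
            throw new IOException("Zip entry escapes target directory: " + entryName);
        }
        return dest;
    }

    public static void main(String[] args) throws IOException {
        File target = new File(System.getProperty("java.io.tmpdir"), "extract");
        System.out.println(safeResolve(target, "data/file.txt").getName()); // file.txt
        try {
            safeResolve(target, "../outside.txt");
        } catch (IOException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Canonical-path comparison (rather than simple string checks on the entry name) also catches escapes hidden behind nested `..` segments.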





Re: [SUMMARY] Flink 1.16 release sync of 2022-08-02

2022-08-09 Thread Robert Metzger
Thanks. I had the old invite link in my calendar. I'll join next week again
;)


Re: [SUMMARY] Flink 1.16 release sync of 2022-08-02

2022-08-09 Thread Robert Metzger
Thanks for the summary.

I tried joining the sync today, but nobody joined. On the Flink Slack,
Martijn's status is set to vacation, so I guess the meeting didn't happen
today?

It seems that the pulsar blocker (FLINK-27399) is still open, but the PR
seems to make progress.

On Wed, Aug 3, 2022 at 6:46 AM Xingbo Huang  wrote:

> Hi everyone,
>
> I would like to give you a brief update of the Flink 1.16 release sync
> meeting of 2022-08-02.
>
> Currently, we have 19 features that have been completed for this release
> and 32 features are still expected to make it. We only have one week
> remaining until the feature freeze (at 2022-08-09), which means that we
> will start with the release testing and cut the release branch once the CI
> is stable[1].
>
> We currently have one blocker ticket[2] that is being worked on. Many
> thanks to these contributors and reviewers.
>
> Next, we also have some critical test stability tickets[3] that are not
> picked up. We need to guarantee the CI is stable before the release branch
> cut. You can either directly assign it to yourself (don't forget to mark it
> as In Progress) or ping me (@Huang Xingbo) to get it assigned to you. Your
> help is much appreciated.
>
> For more information about Flink release 1.16, you can refer to
> https://cwiki.apache.org/confluence/display/FLINK/1.16+Release
>
> The next Flink release sync will be on Tuesday the 9th of August at 9am
> CEST/ 3pm China Standard Time / 7am UTC. The link can also be found on the
> first page.
>
> On behalf of all the release managers,
>
> best regards,
> Xingbo
>
> [1]
>
> https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=561=planning=FLINK-28766=2540=100
> [2]
>
> https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=561=planning=FLINK-27399=2539=100
> [3]
>
> https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=561=FLINK=planning=FLINK-28440=2540=2544=100
>


Re: [VOTE] FLIP-252: Amazon DynamoDB Sink Connector

2022-07-21 Thread Robert Metzger
+1

On Wed, Jul 20, 2022 at 10:48 PM Konstantin Knauf  wrote:

> +1. Thanks!
>
> Am Mi., 20. Juli 2022 um 16:48 Uhr schrieb Tzu-Li (Gordon) Tai <
> tzuli...@apache.org>:
>
> > +1
> >
> > On Wed, Jul 20, 2022 at 6:13 AM Danny Cranmer 
> > wrote:
> >
> > > Hi there,
> > >
> > > After the discussion in [1], I’d like to open a voting thread for
> > FLIP-252
> > > [2], which proposes the addition of an Amazon DynamoDB sink based on
> the
> > > Async Sink [3].
> > >
> > > The vote will be open until July 23rd earliest (72h), unless there are
> > any
> > > binding vetos.
> > >
> > > Cheers, Danny
> > >
> > > [1] https://lists.apache.org/thread/ssmf2c86n3xyd5qqmcdft22sqn4qw8mw
> > > [2]
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-252%3A+Amazon+DynamoDB+Sink+Connector
> > > [3]
> > >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-171%3A+Async+Sink
> > >
> >
>
>
> --
> https://twitter.com/snntrable
> https://github.com/knaufk
>


Re: [DISCUSS] FLIP-252: Amazon DynamoDB Sink Connector

2022-07-20 Thread Robert Metzger
Thanks a lot for this nice proposal!

DynamoDB seems to be a connector that Flink is still lacking, and with the
Async Sink interface, it seems that we can implement this fairly easily.

+1 to proceed to the formal vote for this FLIP!

On Fri, Jul 15, 2022 at 7:51 PM Danny Cranmer 
wrote:

> Hello all,
>
> We would like to start a discussion thread on FLIP-252: Amazon DynamoDB
> Sink Connector [1] where we propose to provide a sink connector for Amazon
> DynamoDB [2] based on the Async Sink [3]. Looking forward to comments and
> feedback. Thank you.
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-252%3A+Amazon+DynamoDB+Sink+Connector
> [2] https://aws.amazon.com/dynamodb
> [3]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-171%3A+Async+Sink
>


Re: [VOTE] FLIP-238: Introduce FLIP-27-based Data Generator Source

2022-07-19 Thread Robert Metzger
+1

On Wed, Jul 20, 2022 at 4:41 AM Rui Fan <1996fan...@gmail.com> wrote:

> +1(non-binding)
>
> New Source can better support some features, such as
> Unaligned Checkpoint, Watermark alignment, etc.
> The data generator based on the new Source is very helpful
> for daily testing.
>
> Very much looking forward to using it.
>
> Best wishes
> Rui Fan
>
> On Wed, Jul 20, 2022 at 4:22 AM Martijn Visser 
> wrote:
>
> > +1 (binding)
> >
> > Thanks for the efforts Alex!
> >
> > Op di 19 jul. 2022 om 21:31 schreef Alexander Fedulov <
> > alexan...@ververica.com>:
> >
> > > Hi everyone,
> > >
> > > following the discussion in [1], I would like to open up a vote for
> > > adding a FLIP-27-based Data Generator Source [2].
> > >
> > > The addition of this source also unblocks the currently pending
> > > efforts for deprecating the Source Function API [3].
> > >
> > > The poll will be open until July 25 (72h + weekend), unless there is
> > > an objection or not enough votes.
> > >
> > > [1] https://lists.apache.org/thread/7gjxto1rmkpff4kl54j8nlg5db2rqhkt
> > > [2] https://cwiki.apache.org/confluence/x/9Av1D
> > > [3] https://github.com/apache/flink/pull/20049#issuecomment-1170948767
> > >
> > > Best,
> > > Alexander Fedulov
> > >
> >
>


[jira] [Created] (FLINK-28303) Kafka SQL Connector loses data when restoring from a savepoint with a topic with empty partitions

2022-06-29 Thread Robert Metzger (Jira)
Robert Metzger created FLINK-28303:
--

 Summary: Kafka SQL Connector loses data when restoring from a 
savepoint with a topic with empty partitions
 Key: FLINK-28303
 URL: https://issues.apache.org/jira/browse/FLINK-28303
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Kafka
Affects Versions: 1.14.4
Reporter: Robert Metzger


Steps to reproduce:
- Set up a Kafka topic with 10 partitions
- produce records 0-9 into the topic
- take a savepoint and stop the job
- produce records 10-19 into the topic
- restore the job from the savepoint.

The job will usually be missing 2-4 records from 10-19.

My assumption is that if a partition never had data (which is likely with 10 
partitions and 10 records), the savepoint will only contain offsets for 
partitions with data. 
While the job was offline (and we've written records 10-19 into the topic), all 
partitions got filled. Now, when the job comes back online, the Kafka source 
will use the "latest" offset for those partitions, skipping some data.






Re: [VOTE] Release 1.15.1, release candidate #1

2022-06-28 Thread Robert Metzger
+1 (binding)

- staging repo contents look fine
- KEYS file ok
- binaries start locally properly. WebUI accessible on Mac.

On Mon, Jun 27, 2022 at 11:21 AM Qingsheng Ren  wrote:

> +1 (non-binding)
>
> - checked/verified signatures and hashes
> - checked that all POM files point to the same version
> - built from source, without Hadoop and using Scala 2.12
> - started standalone cluster locally, WebUI is accessible and ran
> WordCount example successfully
> - executed a job with SQL client consuming from Kafka source to collect
> sink
>
> Best,
> Qingsheng
>
>
> > On Jun 27, 2022, at 14:46, Xingbo Huang  wrote:
> >
> > +1 (non-binding)
> >
> > - verify signatures and checksums
> > - no binaries found in source archive
> > - build from source
> > - Reviewed the release note blog
> > - verify python wheel package contents
> > - pip install apache-flink-libraries and apache-flink wheel packages
> > - run the examples from Python Table API tutorial
> >
> > Best,
> > Xingbo
> >
> > Chesnay Schepler  于2022年6月24日周五 21:42写道:
> >
> >> +1 (binding)
> >>
> >> - signatures OK
> >> - all required artifacts appear to be present
> >> - tag exists with the correct version adjustments
> >> - binary shows correct commit and version
> >> - examples run fine
> >> - website PR looks good
> >>
> >> On 22/06/2022 14:20, David Anderson wrote:
> >>> Hi everyone,
> >>>
> >>> Please review and vote on release candidate #1 for version 1.15.1, as
> >>> follows:
> >>> [ ] +1, Approve the release
> >>> [ ] -1, Do not approve the release (please provide specific comments)
> >>>
> >>> The complete staging area is available for your review, which includes:
> >>>
> >>> * JIRA release notes [1],
> >>> * the official Apache source release and binary convenience releases to
> >> be
> >>> deployed to dist.apache.org [2], which are signed with the key with
> >>> fingerprint E982F098 [3],
> >>> * all artifacts to be deployed to the Maven Central Repository [4],
> >>> * source code tag "release-1.15.1-rc1" [5],
> >>> * website pull request listing the new release and adding announcement
> >> blog
> >>> post [6].
> >>>
> >>> The vote will be open for at least 72 hours. It is adopted by majority
> >>> approval, with at least 3 PMC affirmative votes.
> >>>
> >>> Thanks,
> >>> David
> >>>
> >>> [1]
> >>>
> >>>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12351546
> >>> [2] https://dist.apache.org/repos/dist/dev/flink/flink-1.15.1-rc1/
> >>> [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> >>> [4]
> >> https://repository.apache.org/content/repositories/orgapacheflink-1511/
> >>> [5] https://github.com/apache/flink/tree/release-1.15.1-rc1
> >>> [6] https://github.com/apache/flink-web/pull/554
> >>>
> >>
> >>
>
>


[jira] [Created] (FLINK-28265) Inconsistency in Kubernetes HA service: broken state handle

2022-06-27 Thread Robert Metzger (Jira)
Robert Metzger created FLINK-28265:
--

 Summary: Inconsistency in Kubernetes HA service: broken state 
handle
 Key: FLINK-28265
 URL: https://issues.apache.org/jira/browse/FLINK-28265
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination
Affects Versions: 1.14.4
Reporter: Robert Metzger


I have a JobManager, which at some point failed to acknowledge a checkpoint:

{code}
Error while processing AcknowledgeCheckpoint message
org.apache.flink.runtime.checkpoint.CheckpointException: Could not complete the 
pending checkpoint 193393. Failure reason: Failure to finalize checkpoint.
at 
org.apache.flink.runtime.checkpoint.CheckpointCoordinator.completePendingCheckpoint(CheckpointCoordinator.java:1255)
at 
org.apache.flink.runtime.checkpoint.CheckpointCoordinator.receiveAcknowledgeMessage(CheckpointCoordinator.java:1100)
at 
org.apache.flink.runtime.scheduler.ExecutionGraphHandler.lambda$acknowledgeCheckpoint$1(ExecutionGraphHandler.java:89)
at 
org.apache.flink.runtime.scheduler.ExecutionGraphHandler.lambda$processCheckpointCoordinatorMessage$3(ExecutionGraphHandler.java:119)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown 
Source)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown 
Source)
at java.base/java.lang.Thread.run(Unknown Source)
Caused by: 
org.apache.flink.runtime.persistence.StateHandleStore$AlreadyExistException: 
checkpointID-0193393 already exists in ConfigMap 
cm--jobmanager-leader
at 
org.apache.flink.kubernetes.highavailability.KubernetesStateHandleStore.getKeyAlreadyExistException(KubernetesStateHandleStore.java:534)
at 
org.apache.flink.kubernetes.highavailability.KubernetesStateHandleStore.lambda$addAndLock$0(KubernetesStateHandleStore.java:155)
at 
org.apache.flink.kubernetes.kubeclient.Fabric8FlinkKubeClient.lambda$attemptCheckAndUpdateConfigMap$11(Fabric8FlinkKubeClient.java:316)
at 
java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(Unknown Source)
... 3 common frames omitted
{code}

The JobManager creates subsequent checkpoints successfully.
Upon failure, it tries to recover this checkpoint (0193393), but 
fails to do so because of:
{code}
Caused by: org.apache.flink.util.FlinkException: Could not retrieve checkpoint 
193393 from state handle under checkpointID-0193393. This indicates 
that the retrieved state handle is broken. Try cleaning the state handle store 
... Caused by: java.io.FileNotFoundException: No such file or directory: 
s3://xxx/flink-ha/xxx/completedCheckpoint72e30229420c
{code}

I'm running Flink 1.14.4.






[jira] [Created] (FLINK-28260) flink-runtime-web fails to execute "npm ci" on Apple Silicon (arm64) without rosetta

2022-06-27 Thread Robert Metzger (Jira)
Robert Metzger created FLINK-28260:
--

 Summary: flink-runtime-web fails to execute "npm ci" on Apple 
Silicon (arm64) without rosetta
 Key: FLINK-28260
 URL: https://issues.apache.org/jira/browse/FLINK-28260
 Project: Flink
  Issue Type: Bug
Reporter: Robert Metzger


Flink 1.16-SNAPSHOT fails to build in the flink-runtime-web project because we 
are using an outdated frontend-maven-plugin (v 1.11.3).
This is the error:
{code}
[ERROR] Failed to execute goal 
com.github.eirslett:frontend-maven-plugin:1.11.3:npm (npm install) on project 
flink-runtime-web: Failed to run task: 'npm ci --cache-max=0 --no-save 
${npm.proxy}' failed. java.io.IOException: Cannot run program 
"/Users/rmetzger/Projects/flink/flink-runtime-web/web-dashboard/node/node" (in 
directory "/Users/rmetzger/Projects/flink/flink-runtime-web/web-dashboard"): 
error=86, Bad CPU type in executable -> [Help 1]
{code}

Using the latest frontend-maven-plugin (v. 1.12.1) properly passes the build, 
because this version downloads the proper arm64 npm version. However, 
frontend-maven-plugin 1.12.1 requires Maven 3.6.0, which is newer than Flink's 
build supports (Flink builds require Maven 3.2.5).

The best workaround is using Rosetta on M1 Macs.





[jira] [Created] (FLINK-28259) flink-parquet doesn't compile on M1 mac without rosetta

2022-06-27 Thread Robert Metzger (Jira)
Robert Metzger created FLINK-28259:
--

 Summary: flink-parquet doesn't compile on M1 mac without rosetta
 Key: FLINK-28259
 URL: https://issues.apache.org/jira/browse/FLINK-28259
 Project: Flink
  Issue Type: Bug
  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
Affects Versions: 1.16.0
Reporter: Robert Metzger
Assignee: Robert Metzger


Compiling Flink 1.16-SNAPSHOT fails on an M1 Mac (apple silicon) without the 
rosetta translation layer, because the automatically downloaded 
"protoc-3.17.3-osx-aarch_64.exe" file is actually just a copy of 
"protoc-3.17.3-osx-x86_64.exe". (as you can read here: 
https://github.com/os72/protoc-jar/issues/93)

This is the error:
{code}
[ERROR] Failed to execute goal 
org.xolstice.maven.plugins:protobuf-maven-plugin:0.5.1:test-compile (default) 
on project flink-parquet: An error occurred while invoking protoc. Error while 
executing process. Cannot run program 
"/Users/rmetzger/Projects/flink/flink-formats/flink-parquet/target/protoc-plugins/protoc-3.17.3-osx-aarch_64.exe":
 error=86, Bad CPU type in executable -> [Help 1]
{code}







[jira] [Created] (FLINK-28232) Allow for custom pre-flight checks for SQL UDFs

2022-06-23 Thread Robert Metzger (Jira)
Robert Metzger created FLINK-28232:
--

 Summary: Allow for custom pre-flight checks for SQL UDFs
 Key: FLINK-28232
 URL: https://issues.apache.org/jira/browse/FLINK-28232
 Project: Flink
  Issue Type: New Feature
  Components: Table SQL / API
Reporter: Robert Metzger


Currently, implementors of SQL UDFs [1] cannot validate the UDF input before 
submitting a SQL query to the runtime. 
Take, for example, a UDF that compiles a regex from user input. Ideally there 
would be a callback for the UDF implementor to check that the user-provided 
regex is valid and compiles, to avoid errors during the execution of the SQL 
query.

It would also be ideal for that pre-flight validation to have access to the 
schema information resolved by the SQL planner, to allow for schema-related 
checks as well.


[1] 
https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/functions/udfs/
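As a partial workaround today, a UDF can at least fail fast by compiling the user-provided regex eagerly in its lifecycle/validation hook. A plain-Java sketch of that check (class and method names are illustrative, not Flink API):

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexUdfSketch {
    private final String pattern;
    private Pattern compiled;

    public RegexUdfSketch(String pattern) {
        this.pattern = pattern;
    }

    // Analogous to a UDF open()/validation hook: compile the user-provided
    // regex once, surfacing PatternSyntaxException before any row is processed.
    public void open() {
        this.compiled = Pattern.compile(pattern);
    }

    public boolean eval(String input) {
        return compiled.matcher(input).matches();
    }

    public static void main(String[] args) {
        RegexUdfSketch ok = new RegexUdfSketch("[a-z]+");
        ok.open();
        System.out.println(ok.eval("abc")); // true

        try {
            new RegexUdfSketch("[unclosed").open();
        } catch (PatternSyntaxException e) {
            System.out.println("invalid regex rejected before any row is processed");
        }
    }
}
```

The limitation described above remains: such a check runs when the task starts, not when the query is planned, so a true pre-flight hook would still need planner support.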





Re: [DISCUSS] Bi-Weekly Flink Community Sync Meeting

2022-06-02 Thread Robert Metzger
Thanks for your feedback!

Nobody should feel obliged to attend these meetings, or fear that they are
missing something by not attending. Everything relevant discussed there has
to be reflected on the mailing list, either as a meeting summary, or in
existing discussion threads.
My main motivation is to provide a room for people to get to know each
other, float some ideas and have informal conversations about Flink.
Maybe we should call the meeting "Release Sync & Virtual Get Together" or
something, to manage expectations?

Looking at other projects, this is not uncommon:

Apache Cassandra used to have such meetings for some time in 2020:
https://cwiki.apache.org/confluence/display/CASSANDRA/Cassandra+Contributor+Meeting

(also the K8s SIG:
https://cwiki.apache.org/confluence/display/CASSANDRA/Cassandra+Kubernetes+SIG
)

The Kubernetes project seems to have quite many meetings from the various
SIGs:
- Overview:
https://github.com/kubernetes/community/blob/master/events/community-meeting.md
- Calendar:
https://calendar.google.com/calendar/u/0/embed?src=calen...@kubernetes.io

Best,
Robert


On Tue, May 31, 2022 at 2:04 PM Konstantin Knauf  wrote:

> Hi everyone,
>
> can you be more specific what you mean by "current topics in the Flink
> Community"? Shouldn't asynchronous communication be the default, and if
> that doesn't work, we consider a synchronous channel?
>
> Cheers,
>
> Konstantin
>
> Am Di., 31. Mai 2022 um 13:54 Uhr schrieb Jing Ge :
>
> > +1
> > Sounds good! Thanks Robert!
> >
> > On Tue, May 31, 2022 at 1:46 PM Márton Balassi  >
> > wrote:
> >
> > > Hi Robert,
> > >
> > > Thanks for the suggestion +1 from me. You already listed the topic of
> the
> > > timezone on the wiki that I wanted to bring up.
> > >
> > > On Tue, May 31, 2022 at 9:38 AM Robert Metzger 
> > > wrote:
> > >
> > > > Hi everyone,
> > > >
> > > > We currently have a bi-weekly release sync meeting on Google Meet
> every
> > > > Tuesday at 9am CEST / 3pm China Standard Time / 7am UTC.
> > > > I would like to propose extending the purpose of the meeting to a
> > general
> > > > "Flink Community Sync" meeting, to discuss current topics in the
> Flink
> > > > community.
> > > >
> > > > I propose that we just collect agenda items on the mailing list in
> the
> > > two
> > > > weeks prior to the meeting.
> > > > I'm happy to take care of prioritizing agenda items and taking notes
> in
> > > the
> > > > wiki.
> > > > I've created already a page for the next sync:
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/2022-06-14+Community+Sync
> > > >
> > > > Let me know what you think!
> > > >
> > > > Best,
> > > > Robert
> > > >
> > >
> >
>
>
> --
> https://twitter.com/snntrable
> https://github.com/knaufk
>


Re: [DISCUSS] Initializing Apache Flink Slack

2022-05-31 Thread Robert Metzger
I renamed the channel.

On Tue, May 31, 2022 at 10:54 AM Xintong Song  wrote:

> Ok, then let's try it out.
>
> Best,
>
> Xintong
>
>
>
> On Tue, May 31, 2022 at 4:24 PM Robert Metzger 
> wrote:
>
> > +1 to merging #contribution-helps with #dev.
> >
> > I'm pretty sure we'll have to revisit this once we have a bit
> > more experience with running the slack community.
> >
> > On Tue, May 31, 2022 at 10:07 AM Jark Wu  wrote:
> >
> > > I'm fine with the #dev channel.
> > > I remembered in the previous discussion that most people are positive
> > about
> > > the dev channel
> > >  as long as the discussions are properly reflected back to the mailing
> > > lists.
> > >
> > > If we create the dev channel, maybe we can merge #contribution-helps
> into
> > > it as well?
> > > Just like, currently, some contributors are looking for reviewers on
> the
> > > dev mailing list.
> > >
> > > Best,
> > > Jark
> > >
> > > On Tue, 31 May 2022 at 15:57, Gyula Fóra  wrote:
> > >
> > > > I agree with Robert.
> > > >
> > > > I think discussing implementation ideas etc on the dev channel
> > > > briefly before posting discussions on the ML can make the design
> > > > discussions much more productive as the initial iterations can be
> often
> > > > slow and cumbersome via email alone.
> > > >
> > > > I understand the general sentiment against this (based on the general
> > > slack
> > > > discussions) but for me personally (and I think for many Flink
> > > developers)
> > > > this will be one of the more interesting channels :)
> > > > We have to see how this plays out in practice and we can make future
> > > > decisions accordingly.
> > > >
> > > > Gyula
> > > >
> > > >
> > > > On Tue, May 31, 2022 at 9:42 AM Robert Metzger 
> > > > wrote:
> > > >
> > > > > Thanks for your input.
> > > > >
> > > > > I share the concern that we'll potentially have too important
> > > discussions
> > > > > in Slack. However, I would still like to try it out. There are
> valid
> > > > > use-cases, such as requests for VOTEs on releases, or briefly
> > floating
> > > an
> > > > > idea before opening a thread on dev@.
> > > > > If we find out that too much is happening on the channel that
> belongs
> > > to
> > > > > the ML, we can close the channel again.
> > > > >
> > > > >
> > > > >
> > > > > On Tue, May 31, 2022 at 4:01 AM Xintong Song <
> tonysong...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > #contribution-helps is meant for new contributors to ask
> > > non-technical
> > > > > > questions. E.g., looking for starter issues or reviewers (just
> > don't
> > > DM
> > > > > > people).
> > > > > >
> > > > > > For having a #dev/development, I'm a little concerned this may
> > > > encourage
> > > > > > people to start discussions directly in slack. IMHO, we may want
> to
> > > > stick
> > > > > > to the existing communication channels (mailing lists / jira /
> > github
> > > > pr)
> > > > > > for most of the technical discussions, and only come to slack
> when
> > > the
> > > > > > conversation turns into back-and-forth. In such cases, a
> temporary
> > > > > private
> > > > > > channel or DM group could be set up for the specific topic. Maybe
> > I'm
> > > > > being
> > > > > > too conservative. WDYT?
> > > > > >
> > > > > > Best,
> > > > > >
> > > > > > Xintong
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Tue, May 31, 2022 at 2:06 AM Gyula Fóra  >
> > > > wrote:
> > > > > >
> > > > > > > Development channel (#dev/development) makes sense to me. It’s
> > > > related
> > > > > to
> > > > > > > actual development related questions instead of the contrib
> > > process.

Re: [DISCUSS] Initializing Apache Flink Slack

2022-05-31 Thread Robert Metzger
+1 to merging #contribution-helps with #dev.

I'm pretty sure we'll have to revisit this once we have a bit
more experience with running the slack community.

On Tue, May 31, 2022 at 10:07 AM Jark Wu  wrote:

> I'm fine with the #dev channel.
> I remembered in the previous discussion that most people are positive about
> the dev channel
>  as long as the discussions are properly reflected back to the mailing
> lists.
>
> If we create the dev channel, maybe we can merge #contribution-helps into
> it as well?
> Just like, currently, some contributors are looking for reviewers on the
> dev mailing list.
>
> Best,
> Jark
>
> On Tue, 31 May 2022 at 15:57, Gyula Fóra  wrote:
>
> > I agree with Robert.
> >
> > I think discussing implementation ideas etc on the dev channel
> > briefly before posting discussions on the ML can make the design
> > discussions much more productive as the initial iterations can be often
> > slow and cumbersome via email alone.
> >
> > I understand the general sentiment against this (based on the general
> slack
> > discussions) but for me personally (and I think for many Flink
> developers)
> > this will be one of the more interesting channels :)
> > We have to see how this plays out in practice and we can make future
> > decisions accordingly.
> >
> > Gyula
> >
> >
> > On Tue, May 31, 2022 at 9:42 AM Robert Metzger 
> > wrote:
> >
> > > Thanks for your input.
> > >
> > > I share the concern that we'll potentially have too important
> discussions
> > > in Slack. However, I would still like to try it out. There are valid
> > > use-cases, such as requests for VOTEs on releases, or briefly floating
> an
> > > idea before opening a thread on dev@.
> > > If we find out that too much is happening on the channel that belongs
> to
> > > the ML, we can close the channel again.
> > >
> > >
> > >
> > > On Tue, May 31, 2022 at 4:01 AM Xintong Song 
> > > wrote:
> > >
> > > > #contribution-helps is meant for new contributors to ask
> non-technical
> > > > questions. E.g., looking for starter issues or reviewers (just don't
> DM
> > > > people).
> > > >
> > > > For having a #dev/development, I'm a little concerned this may
> > encourage
> > > > people to start discussions directly in slack. IMHO, we may want to
> > stick
> > > > to the existing communication channels (mailing lists / jira / github
> > pr)
> > > > for most of the technical discussions, and only come to slack when
> the
> > > > conversation turns into back-and-forth. In such cases, a temporary
> > > private
> > > > channel or DM group could be set up for the specific topic. Maybe I'm
> > > being
> > > > too conservative. WDYT?
> > > >
> > > > Best,
> > > >
> > > > Xintong
> > > >
> > > >
> > > >
> > > > On Tue, May 31, 2022 at 2:06 AM Gyula Fóra 
> > wrote:
> > > >
> > > > > Development channel (#dev/development) makes sense to me. It’s
> > related
> > > to
> > > > > actual development related questions instead of the contrib
> process.
> > > > >
> > > > > Gyula
> > > > >
> > > > > On Mon, 30 May 2022 at 19:19, Jing Ge  wrote:
> > > > >
> > > > > > Good idea, thanks! Is the channel #contribution-helps a good fit
> > for
> > > > it?
> > > > > Or
> > > > > > should we just rename it to #development? For me, development is
> a
> > > > subset
> > > > > > of contribution.
> > > > > >
> > > > > > Best regards,
> > > > > > Jing
> > > > > >
> > > > > > On Mon, May 30, 2022 at 4:28 PM Robert Metzger <
> > metrob...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Thanks a lot for kicking this off.
> > > > > > >
> > > > > > > Is there a reason why we haven't set up a #development channel
> > yet?
> > > > > > > I have a short question that is more suitable for that channel,
> > > > > compared
> > > > > > to
> > > > > > > the dev@ list ;)
> > > > > > >
> > > > > > >
> > > > > > > On Mon, May 23,

Re: [DISCUSS] Initializing Apache Flink Slack

2022-05-31 Thread Robert Metzger
Thanks for your input.

I share the concern that we'll potentially have too important discussions
in Slack. However, I would still like to try it out. There are valid
use-cases, such as requests for VOTEs on releases, or briefly floating an
idea before opening a thread on dev@.
If we find out that too much is happening on the channel that belongs to
the ML, we can close the channel again.



On Tue, May 31, 2022 at 4:01 AM Xintong Song  wrote:

> #contribution-helps is meant for new contributors to ask non-technical
> questions. E.g., looking for starter issues or reviewers (just don't DM
> people).
>
> For having a #dev/development, I'm a little concerned this may encourage
> people to start discussions directly in slack. IMHO, we may want to stick
> to the existing communication channels (mailing lists / jira / github pr)
> for most of the technical discussions, and only come to slack when the
> conversation turns into back-and-forth. In such cases, a temporary private
> channel or DM group could be set up for the specific topic. Maybe I'm being
> too conservative. WDYT?
>
> Best,
>
> Xintong
>
>
>
> On Tue, May 31, 2022 at 2:06 AM Gyula Fóra  wrote:
>
> > Development channel (#dev/development) makes sense to me. It’s related to
> > actual development related questions instead of the contrib process.
> >
> > Gyula
> >
> > On Mon, 30 May 2022 at 19:19, Jing Ge  wrote:
> >
> > > Good idea, thanks! Is the channel #contribution-helps a good fit for
> it?
> > Or
> > > should we just rename it to #development? For me, development is a
> subset
> > > of contribution.
> > >
> > > Best regards,
> > > Jing
> > >
> > > On Mon, May 30, 2022 at 4:28 PM Robert Metzger 
> > > wrote:
> > >
> > > > Thanks a lot for kicking this off.
> > > >
> > > > Is there a reason why we haven't set up a #development channel yet?
> > > > I have a short question that is more suitable for that channel,
> > compared
> > > to
> > > > the dev@ list ;)
> > > >
> > > >
> > > > On Mon, May 23, 2022 at 7:52 AM Xintong Song 
> > > > wrote:
> > > >
> > > > > Hi Kyle,
> > > > >
> > > > > I've sent an invitation to your gmail.
> > > > >
> > > > > Best,
> > > > >
> > > > > Xintong
> > > > >
> > > > >
> > > > >
> > > > > On Mon, May 23, 2022 at 1:39 PM Kyle Bendickson 
> > > wrote:
> > > > >
> > > > > > Hi Xintong!
> > > > > >
> > > > > > I’d love to take an early look if possible. I’m not a committer
> > here,
> > > > > but I
> > > > > > have a role in the Apache Iceberg slack and am looking forward to
> > the
> > > > > Flink
> > > > > > slack, particularly given the strong integration between Iceberg
> > and
> > > > > Flink.
> > > > > >
> > > > > > Thanks,
> > > > > > Kyle!
> > > > > > kjbendickson[at]gmail.com (preferred invite email)
> > > > > > kyle[at]tabular.io (work email)
> > > > > >
> > > > > > On Sun, May 22, 2022 at 10:14 PM Xintong Song <
> > tonysong...@gmail.com
> > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Hi devs,
> > > > > > >
> > > > > > > As we have approved creating an Apache Flink Slack [1], I'm
> > > > starting
> > > > > > > this new thread to coordinate and give updates on the issues
> > > related
> > > > to
> > > > > > > initializing the slack workspace.
> > > > > > >
> > > > > > > ## Progress
> > > > > > > 1. Jark and I have worked on a draft of Slack Management
> > > Regulations
> > > > > [2],
> > > > > > > including Code of Conduct, Roles and Permissions, etc. Looking
> > > > forward
> > > > > to
> > > > > > > feedback.
> > > > > > > 2. We have created a slack workspace for initial setups. Anyone
> > who
> > > > > wants
> > > > > > > to help with the initialization or just to take an early look,
> > > please
> > > > > > reach
> > > > > > > out for an invitation.
> > > > > > > 3. The URLs "apache-flink.slack.com" and "
> apacheflink.slack.com"
> > > > have
> > > > > > > already been taken. I've already sent a help request to the
> Slack
> > > > team
> > > > > > and
> > > > > > > am waiting for their response. Before this gets resolved, we
> are
> > > > using
> > > > > "
> > > > > > > asf-flink.slack.com" for the moment.
> > > > > > > 4. I've created FLINK-27719 [3] for tracking the remaining
> tasks,
> > > > > > including
> > > > > > > setting up the auto-updated invitation link and the archive.
> > > > > > >
> > > > > > > ## How can you help
> > > > > > > 1. Take a look at the Slack Management Regulations [2] and
> > provide
> > > > your
> > > > > > > feedback
> > > > > > > 2. Check and pick-up tasks from FLINK-27719 [3]
> > > > > > >
> > > > > > > Best,
> > > > > > >
> > > > > > > Xintong
> > > > > > >
> > > > > > >
> > > > > > > [1]
> > > https://lists.apache.org/thread/d556ywochkpqxbo1yh7ojm751whtojxp
> > > > > > >
> > > > > > > [2]
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/WIP%3A+Slack+Management
> > > > > > >
> > > > > > > [3] https://issues.apache.org/jira/browse/FLINK-27719
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>


[DISCUSS] Bi-Weekly Flink Community Sync Meeting

2022-05-31 Thread Robert Metzger
Hi everyone,

We currently have a bi-weekly release sync meeting on Google Meet every
Tuesday at 9am CEST / 3pm China Standard Time / 7am UTC.
I would like to propose extending the purpose of the meeting to a general
"Flink Community Sync" meeting, to discuss current topics in the Flink
community.

I propose that we just collect agenda items on the mailing list in the two
weeks prior to the meeting.
I'm happy to take care of prioritizing agenda items and taking notes in the
wiki.
I've already created a page for the next sync:
https://cwiki.apache.org/confluence/display/FLINK/2022-06-14+Community+Sync

Let me know what you think!

Best,
Robert


Re: [DISCUSS] Planning Flink 1.16 with the community

2022-05-31 Thread Robert Metzger
>
> Is the feature freeze deadline on July 25 fixed or will it be adjusted
> accordingly?


Ideally we try not to push the deadline too much into the future. So I
prefer to consider it fixed.

On Mon, May 30, 2022 at 7:08 PM Jing Ge  wrote:

> Thanks Robert for the reminder. Thanks Martijn for sharing the link.
>
> Is the feature freeze deadline on July 25 fixed or will it be adjusted
> accordingly?
>
> Best regards,
> Jing
>
> On Mon, May 30, 2022 at 4:56 PM Martijn Visser 
> wrote:
>
>> Yes, we will have the release meeting tomorrow. Looking forward to
>> everyone
>> who wants to participate. For those that are looking for the invite link,
>> see https://cwiki.apache.org/confluence/display/FLINK/1.16+Release
>>
>> Best regards,
>>
>> Martijn
>>
>> On Mon, May 30, 2022 at 16:39, Robert Metzger  wrote:
>>
>> > I assume we'll have the next release planning meeting tomorrow?
>> >
>> > I'm also bringing this up as a reminder for other folks who might be
>> > interested in joining.
>> >
>> > On Wed, May 11, 2022 at 5:55 AM Xintong Song 
>> > wrote:
>> >
>> >> I'd like to kindly remind that, if someone adds / modifies release
>> notes
>> >> of
>> >> a JIRA ticket close to or after the release finalization, it would be
>> >> important to sync with the release managers and make sure that change
>> >> appears in the final release notes.
>> >>
>> >> We have recently discovered a breaking change in 1.14 that is
>> mentioned in
>> >> JIRA but not in the final release notes, because the JIRA change
>> happened
>> >> after the release managers assembled the final release notes from JIRA
>> >> tickets.
>> >>
>> >> Thank you~
>> >>
>> >> Xintong Song
>> >>
>> >>
>> >> [1] https://issues.apache.org/jira/browse/FLINK-23652
>> >>
>> >> On Mon, May 9, 2022 at 7:34 PM Johannes Moser 
>> wrote:
>> >>
>> >> > Thanks Chesnay, Godfrey, Xingbo and Martijn for volunteering as
>> release
>> >> > managers.
>> >> >
>> >> > I would also recommend to have an eye on the "Release Notes:" fields
>> in
>> >> > JIRA
>> >> > for related issues early, as this is used to assemble the final
>> release
>> >> > notes.
>> >> >
>> >> > I'd also like to take the chance to ask contributors to fill the
>> >> "Release
>> >> > Notes"
>> >> > field such that users know what to do when they want to upgrade from
>> a
>> >> > previous version. [1] for reference.
>> >> > Please keep in mind it is about enabling users. If no significant
>> >> changes
>> >> > facing the user have been done it should be left empty.
>> >> >
>> >> >
>> >> > [1]
>> >> >
>> >>
>> https://cwiki.apache.org/confluence/display/FLINK/Flink+Jira+Process#FlinkJiraProcess-ReleaseNoteRedImportant
>> >> >
>> >> >
>> >> > > On 09.05.2022, at 12:27, Martijn Visser 
>> >> > wrote:
>> >> > >
>> >> > > Hi everyone,
>> >> > >
>> >> > > With Flink 1.15 released last week, we're starting with the next
>> >> release
>> >> > > cycle for what will become Flink 1.16. As previously discussed [1]
>> >> > Chesnay,
>> >> > > Godfrey, Xingbo and myself have volunteered as release managers.
>> We're
>> >> > > aiming to cut the release branch on the 25th of July. If needed, we
>> >> can
>> >> > > delay this with a maximum of 2 weeks. We're aiming for a release of
>> >> Flink
>> >> > > 1.16 mid September 2022.
>> >> > >
>> >> > > As we've done before, we would like to have a rough overview of new
>> >> > > features or major improvements that are planned and will likely be
>> >> > included
>> >> > > in this release. Please provide your FLIP, or (umbrella) Jira
>> ticket
>> >> in
>> >> > the
>> >> > > wiki page which you can find at
>> >> > > https://cwiki.apache.org/confluence/display/FLINK/1.16+Release.
>> >> > >
>> >> > > Starting Tuesday the 17t

Re: [DISCUSS] Planning Flink 1.16 with the community

2022-05-30 Thread Robert Metzger
I assume we'll have the next release planning meeting tomorrow?

I'm also bringing this up as a reminder for other folks who might be
interested in joining.

On Wed, May 11, 2022 at 5:55 AM Xintong Song  wrote:

> I'd like to kindly remind that, if someone adds / modifies release notes of
> a JIRA ticket close to or after the release finalization, it would be
> important to sync with the release managers and make sure that change
> appears in the final release notes.
>
> We have recently discovered a breaking change in 1.14 that is mentioned in
> JIRA but not in the final release notes, because the JIRA change happened
> after the release managers assembled the final release notes from JIRA
> tickets.
>
> Thank you~
>
> Xintong Song
>
>
> [1] https://issues.apache.org/jira/browse/FLINK-23652
>
> On Mon, May 9, 2022 at 7:34 PM Johannes Moser  wrote:
>
> > Thanks Chesnay, Godfrey, Xingbo and Martijn for volunteering as release
> > managers.
> >
> > I would also recommend to have an eye on the "Release Notes:" fields in
> > JIRA
> > for related issues early, as this is used to assemble the final release
> > notes.
> >
> > I'd also like to take the chance to ask contributors to fill the "Release
> > Notes"
> > field such that users know what to do when they want to upgrade from a
> > previous version. [1] for reference.
> > Please keep in mind it is about enabling users. If no significant changes
> > facing the user have been done it should be left empty.
> >
> >
> > [1]
> >
> https://cwiki.apache.org/confluence/display/FLINK/Flink+Jira+Process#FlinkJiraProcess-ReleaseNoteRedImportant
> >
> >
> > > On 09.05.2022, at 12:27, Martijn Visser 
> > wrote:
> > >
> > > Hi everyone,
> > >
> > > With Flink 1.15 released last week, we're starting with the next
> release
> > > cycle for what will become Flink 1.16. As previously discussed [1]
> > Chesnay,
> > > Godfrey, Xingbo and myself have volunteered as release managers. We're
> > > aiming to cut the release branch on the 25th of July. If needed, we can
> > > delay this with a maximum of 2 weeks. We're aiming for a release of
> Flink
> > > 1.16 mid September 2022.
> > >
> > > As we've done before, we would like to have a rough overview of new
> > > features or major improvements that are planned and will likely be
> > included
> > > in this release. Please provide your FLIP, or (umbrella) Jira ticket in
> > the
> > > wiki page which you can find at
> > > https://cwiki.apache.org/confluence/display/FLINK/1.16+Release.
> > >
> > > Starting Tuesday the 17th of May we will start again with bi-weekly
> > release
> > > sync. This is scheduled for 9am CEST / 3pm China Standard Time / 7am
> UTC.
> > > This meeting is to perform the following:
> > >
> > > * Have a look at the Flink 1.16 progress from the wiki page (please
> > update
> > > this before the meeting).
> > > * Discuss any blocker ticket that might occur and who can help resolve
> this.
> > > * Discuss build instability tickets that need to be followed-up.
> > > * If there's a (new) contributor to get a PR reviewed or committed, we
> > can
> > > see who can help out unblocking the contributor.
> > >
> > > One of the lessons learned during the Flink 1.15 release cycle was that
> > > cross-team testing was done fairly late in the process, which delayed
> the
> > > release. It would be good to make sure that these efforts are done as
> > > quickly as possible.
> > > This also applies to documentation: please make sure that your feature
> is
> > > documented (either in regular documentation, the JavaDocs, inside the
> > repo
> > > in markdown or multiple). Don't start working on your next contribution
> > if
> > > the documentation hasn't been made available yet.
> > >
> > > Please let us know what you think.
> > >
> > > Best regards,
> > >
> > > Chesnay, Godfrey, Xingbo and Martijn
> > >
> > > [1] https://lists.apache.org/thread/ghfb5xdjy7tv0zqlrxvh3hcsc740w4ml
> >
> >
>


Re: [DISCUSS] Initializing Apache Flink Slack

2022-05-30 Thread Robert Metzger
Thanks a lot for kicking this off.

Is there a reason why we haven't set up a #development channel yet?
I have a short question that is more suitable for that channel, compared to
the dev@ list ;)


On Mon, May 23, 2022 at 7:52 AM Xintong Song  wrote:

> Hi Kyle,
>
> I've sent an invitation to your gmail.
>
> Best,
>
> Xintong
>
>
>
> On Mon, May 23, 2022 at 1:39 PM Kyle Bendickson  wrote:
>
> > Hi Xintong!
> >
> > I’d love to take an early look if possible. I’m not a committer here,
> but I
> > have a role in the Apache Iceberg slack and am looking forward to the
> Flink
> > slack, particularly given the strong integration between Iceberg and
> Flink.
> >
> > Thanks,
> > Kyle!
> > kjbendickson[at]gmail.com (preferred invite email)
> > kyle[at]tabular.io (work email)
> >
> > On Sun, May 22, 2022 at 10:14 PM Xintong Song 
> > wrote:
> >
> > > Hi devs,
> > >
> > > As we have approved creating an Apache Flink Slack [1], I'm starting
> > > this new thread to coordinate and give updates on the issues related to
> > > initializing the slack workspace.
> > >
> > > ## Progress
> > > 1. Jark and I have worked on a draft of Slack Management Regulations
> [2],
> > > including Code of Conduct, Roles and Permissions, etc. Looking forward
> to
> > > feedback.
> > > 2. We have created a slack workspace for initial setups. Anyone who
> wants
> > > to help with the initialization or just to take an early look, please
> > reach
> > > out for an invitation.
> > > 3. The URLs "apache-flink.slack.com" and "apacheflink.slack.com" have
> > > already been taken. I've already sent a help request to the Slack team
> > and
> > > am waiting for their response. Before this gets resolved, we are using
> "
> > > asf-flink.slack.com" for the moment.
> > > 4. I've created FLINK-27719 [3] for tracking the remaining tasks,
> > including
> > > setting up the auto-updated invitation link and the archive.
> > >
> > > ## How can you help
> > > 1. Take a look at the Slack Management Regulations [2] and provide your
> > > feedback
> > > 2. Check and pick-up tasks from FLINK-27719 [3]
> > >
> > > Best,
> > >
> > > Xintong
> > >
> > >
> > > [1] https://lists.apache.org/thread/d556ywochkpqxbo1yh7ojm751whtojxp
> > >
> > > [2]
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/WIP%3A+Slack+Management
> > >
> > > [3] https://issues.apache.org/jira/browse/FLINK-27719
> > >
> >
>


Re: [VOTE] Creating an Apache Flink slack workspace

2022-05-17 Thread Robert Metzger
Thanks for starting the VOTE!

+1 (binding)



On Tue, May 17, 2022 at 10:29 AM Jark Wu  wrote:

> Thanks Xintong for driving this work.
>
> +1 from my side (binding)
>
> Best,
> Jark
>
> On Tue, 17 May 2022 at 16:24, Xintong Song  wrote:
>
> > Hi everyone,
> >
> > As previously discussed in [1], I would like to open a vote on creating
> an
> > Apache Flink slack workspace.
> >
> > The proposed actions include:
> > - Creating a dedicated slack workspace with the name Apache Flink that is
> > controlled and maintained by the Apache Flink PMC
> > - Updating the Flink website about rules for using various communication
> > channels
> > - Setting up an Archive for the Apache Flink slack
> > - Revisiting this initiative by the end of 2022
> >
> > The vote will last for at least 72 hours, and will be accepted by a
> > consensus of active PMC members.
> >
> > Best,
> >
> > Xintong
> >
>


Re: [Discuss] Creating an Apache Flink slack workspace

2022-05-17 Thread Robert Metzger
Thanks a lot Kyle!

What do you think of concluding this discussion and starting a VOTE about:
1. Setting up a PMC controlled Slack instance for the Flink community
2. Updating the Flink website about the various communication channels
3. Setting up an Archive for our Slack instance
4. Revisiting this initiative by the end of 2022.

Xintong, do you want to start the VOTE on dev@?

On Fri, May 13, 2022 at 9:41 PM Austin Cawley-Edwards <
austin.caw...@gmail.com> wrote:

> Nice, cool to hear Kyle! How do you all approach moderation? Is there
> anything specific you feel like you've "gotten right"/ other tips?
>
> (as a side note, I also love slack).
>
> Austin
>
>
> On Fri, May 13, 2022 at 2:27 PM Kyle Bendickson  wrote:
>
> > Hi all,
> >
> > Chiming in as I work in the Iceberg space and we have our own slack as
> > well, that I am admittedly proud of.
> >
> > We don’t necessarily encounter issues with vendors, though of course we
> do
> > get some noise now and again.
> >
> > Overall, our slack workspace has been cited in multiple blogs and things
> as
> > one of the bigger benefits of using Iceberg.
> >
> > So I personally can’t recommend a slack workspace enough.
> >
> > Our slack workspace is also one major thing I feel boosts our ability to
> > attract new contributors and even bug reports we’d otherwise not receive
> as
> > quickly.
> >
> > A lot of amazing devs / folks out there who maybe don’t see themselves as
> > “prominent” enough but will speak up on slack.
> >
> > So +1 from your friends in Iceberg (at least me).
> >
> > Feel free to reach out if you have any questions!
> >
> > - Kyle
> >
> > On Fri, May 13, 2022 at 10:17 AM Austin Cawley-Edwards <
> > austin.caw...@gmail.com> wrote:
> >
> > > Hi all,
> > >
> > > Would just like to share an interesting article from the dbt
> > community[1],
> > > which in part describes some of their challenges in managing Slack in a
> > > large community. The biggest point it seems to make is that their Slack
> > has
> > > become a marketing tool for dbt/data vendors instead of a community
> > space —
> > > given the diversity of vendors in the Flink space, we may face similar
> > > challenges. Perhaps their experience can help us with the initial
> > > setup/guidelines.
> > >
> > > Cheers,
> > > Austin
> > >
> > > [1]: https://pedram.substack.com/p/we-need-to-talk-about-dbt?s=r
> > >
> > > On Thu, May 12, 2022 at 6:04 AM Robert Metzger 
> > > wrote:
> > >
> > > > +1 on setting up our own Slack instance (PMC owned)
> > > > +1 for having a separate discussion about setting up a discussion
> forum
> > > (I
> > > > like the idea of using GH discussions)
> > > >
> > > > Besides, we still need to investigate how
> > > >> http://apache-airflow.slack-archives.org works, I think
> > > >> with a slack of our own it can be easier to set up the archive.
> > > >
> > > >
> > > > This is the code used by airflow:
> https://github.com/ashb/slackarchive
> > .
> > > > I'm happy to look into setting up the archive for the community.
> > > >
> > > >
> > > > On Thu, May 12, 2022 at 11:00 AM Jark Wu  wrote:
> > > >
> > > >> Hi,
> > > >>
> > > >> I would +1 to create Apache Flink Slack for the lower barriers to
> > entry
> > > >> as Jingsong mentioned.
> > > >> Besides, we still need to investigate how
> > > >> http://apache-airflow.slack-archives.org works, I think
> > > >> with a slack of our own it can be easier to set up the archive.
> > > >>
> > > >> Regarding Discourse vs Slack, I think they are not exclusive, but
> > > >> complementary.
> > > >> Someday in the future, we might be able to provide them both. But
> what
> > > we
> > > >> are seeking today
> > > >> is a tool that can provide real-time communication and ad-hoc
> > questions
> > > >> and interactions.
> > > >> A forum is more similar to a mailing list. Forum is modern mailing
> > list
> > > >> but can't solve the problems
> > > >> mentioned above. With slack-archives, the information and thoughtful
> > > >> discussion in Slack can also be searchable.
> > > >>
> > > >> I t

Re: [Discuss] Creating an Apache Flink slack workspace

2022-05-12 Thread Robert Metzger
e Discourse forum is much more inviting and vibrant
>> > than a
>> > >>>> mailing list. Just from a tool perspective, discourse would have
>> the
>> > >>>> advantage of being Open Source and so we could probably self-host
>> it
>> > on an
>> > >>>> ASF machine. [1]
>> > >>>>
>> > >>>> When it comes to Slack, I definitely see the benefit of a dedicated
>> > >>>> Apache Flink Slack compared to ASF Slack. For example, we could
>> have
>> > more
>> > >>>> channels (e.g. look how many channels Airflow is using
>> > >>>> http://apache-airflow.slack-archives.org) and we could generally
>> > >>>> customize the experience more towards Apache Flink.  If we go for
>> > Slack,
>> > >>>> let's definitely try to archive it like Airflow did. If we do
>> this, we
>> > >>>> don't necessarily need infinite message retention in Slack itself.
>> > >>>>
>> > >>>> Cheers,
>> > >>>>
>> > >>>> Konstantin
>> > >>>>
>> > >>>> [1] https://github.com/discourse/discourse
>> > >>>>
>> > >>>>
>> > >>>> On Tue, May 10, 2022 at 10:20 AM Timo Walther <
>> > >>>> twal...@apache.org> wrote:
>> > >>>>
>> > >>>>> I also think that a real-time channel is long overdue. The Flink
>> > >>>>> community in China has shown that such a platform can be useful
>> for
>> > >>>>> improving the collaboration within the community. The DingTalk
>> > channel of
>> > >>>>> 10k+ users collectively helping each other is great to see. It
>> could
>> > also
>> > >>>>> reduce the burden from committers for answering frequently asked
>> > questions.
>> > >>>>>
>> > >>>>> Personally, I'm a mailing list fan esp. when it comes to design
>> > >>>>> discussions. In my opinion, the dev@ mailing list should
>> definitely
>> > >>>>> stay where and how it is. However, I understand that users might
>> not
>> > want
>> > >>>>> to subscribe to a mailing list for a single question and get their
>> > mailbox
>> > >>>>> filled with unrelated discussions afterwards. Esp. in a company
>> > setting it
>> > >>>>> might not be easy to setup a dedicated email address for mailing
>> > lists and
>> > >>>>> setting up rules is also not convenient.
>> > >>>>>
>> > >>>>> It would be great if we could use the ASF Slack. We should find an
>> > >>>>> official, accessible channel. I would be open for the right tool.
>> It
>> > might
>> > >>>>> make sense to also look into Discourse or even Reddit? The latter
>> > would
>> > >>>>> definitely be easier to index by a search engine. Discourse is
>> > actually
>> > >>>>> made for modern real-time forums.
>> > >>>>>
>> > >>>>> Regards,
>> > >>>>> Timo
>> > >>>>>
>> > >>>>>
>> > >>>>> Am 10.05.22 um 09:59 schrieb David Anderson:
>> > >>>>>
>> > >>>>> Thank you @Xintong Song  for sharing the
>> > >>>>> experience of the Flink China community.
>> > >>>>>
>> > >>>>> I'm become convinced we should give Slack a try, both for
>> discussions
>> > >>>>> among the core developers, and as a place where the community can
>> > reach out
>> > >>>>> for help. I am in favor of using the ASF slack, as we will need a
>> > paid
>> > >>>>> instance for this to go well, and joining it is easy enough (took
>> me
>> > about
>> > >>>>> 2 minutes). Thanks, Robert, for suggesting we go down this route.
>> > >>>>>
>> > >>>>> David
>> > >>>>>
>> > >>>>> On Tue, May 10, 2022 at 8:21 AM Robert Metzger <
>> rmetz...@apache.org>
>> > >>>>> wrote:
>> > >>>>>
&g

Re: [Discuss] Creating an Apache Flink slack workspace

2022-05-10 Thread Robert Metzger
It seems that we'd have to use invite links on the Flink website for people
to join our Slack (1)
These links can be configured to have no time-expiration, but they will
expire after 100 guests have joined.
I guess we'd have to use a URL shortener (https://s.apache.org) that we
update once the invite link expires. It's not a nice solution, but it'll
work.


(1) https://the-asf.slack.com/archives/CBX4TSBQ8/p1652125017094159
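The rotation described above (refresh the invite and repoint the s.apache.org short link once the 100-guest limit is hit) could be semi-automated with a periodic check. A minimal sketch, not an official Slack API integration — the marker strings for an expired-invite page are assumptions and may not match Slack's actual wording:

```python
import urllib.request

# Assumed marker strings; Slack's real expired-invite page wording may differ.
EXPIRED_MARKERS = ("invite link is no longer active", "link_expired")

def looks_expired(page_html: str) -> bool:
    """Heuristic check: does the fetched invite page look like an expired link?"""
    lowered = page_html.lower()
    return any(marker in lowered for marker in EXPIRED_MARKERS)

def invite_is_expired(url: str) -> bool:
    """Fetch the invite URL and report whether it appears expired."""
    with urllib.request.urlopen(url) as resp:
        return looks_expired(resp.read().decode("utf-8", errors="replace"))

# Pure-function check, no network needed:
print(looks_expired("<p>This invite link is no longer active.</p>"))  # True
print(looks_expired("<h1>Join Apache Flink on Slack</h1>"))           # False
```

Run on a schedule (e.g. cron); when the check reports the link expired, a maintainer generates a fresh invite and updates the short link target.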


On Mon, May 9, 2022 at 3:59 PM Robert Metzger  wrote:

> Thanks a lot for your answer. The onboarding experience to the ASF Slack
> is indeed not ideal:
> https://apisix.apache.org/docs/general/join#join-the-slack-channel
> I'll see if we can improve it
>
> On Mon, May 9, 2022 at 3:38 PM Martijn Visser 
> wrote:
>
>> As far as I recall you can't sign up for the ASF instance of Slack, you
>> can
>> only get there if you're a committer or if you're invited by a committer.
>>
>> On Mon, 9 May 2022 at 15:15, Robert Metzger  wrote:
>>
>> > Sorry for joining this discussion late, and thanks for the summary
>> Xintong!
>> >
>> > Why are we considering a separate slack instance instead of using the
>> ASF
>> > Slack instance?
>> > The ASF instance is paid, so all messages are retained forever, and
>> quite
>> > a few people are already on that Slack instance.
>> > There is already a #flink channel on that Slack instance, that we could
>> > leave as passive as it is right now, or put some more effort into it,
>> on a
>> > voluntary basis.
>> > We could add another #flink-dev channel to that Slack for developer
>> > discussions, and a private flink-committer and flink-pmc chat.
>> >
>> > If we are going that path, we should rework the "Community" and "Getting
>> > Help" pages and explain that the mailing lists are the "ground truth
>> tools"
>> > in Flink, and Slack is only there to facilitate faster communication,
>> but
>> > it is optional / voluntary (e.g. committers won't respond to DMs)
>> >
>> > All public #flink-* channels should be archived and google-indexable.
>> > I've asked Jarek from Airflow who's maintaining
>> > http://apache-airflow.slack-archives.org.
>> > If we can't use slack-archives.org, it would be nice to find some
>> > volunteers in the Flink community to hack a simple indexing tool.
>> > The indexing part is very important for me, because of some bad
>> > experiences with the Kubernetes community, where most of the advanced
>> > stuff is hidden in their Slack, and it took me a few weeks to find that
>> > goldmine of information.
>> >
>> > Overall, I see this as an experiment worth doing, but I would suggest
>> > revisiting it in 6 to 12 months: We should check if really all important
>> > decisions are mirrored to the right mailing lists, and that we get the
>> > benefits we hoped for (more adoption, better experience for users and
>> > developers), and that we can handle the concerns (DMs to developers,
>> > indexing).
>> >
>> >
>> >
>> >
>> >
>> > On Sat, May 7, 2022 at 12:22 PM Xintong Song 
>> > wrote:
>> >
>> >> Thanks all for the valuable feedback.
>> >>
>> >> It seems most people are overall positive about using Slack for dev
>> >> discussions, as long as they are properly reflected back to the MLs.
>> >> - We definitely need a code of conduct that clearly specifies what
>> people
>> >> should / should not do.
>> >> - Contributors pinging well-known reviewers /committers, I think that
>> also
>> >> happens now on JIRA / Github. Personally, I'd understand a no-reply as
>> a
>> >> "soft no". We may consider also putting that in the code of conduct.
>> >>
>> >> Concerning using Slack for user QAs, it seems the major concern is
>> that we
>> >> may end up repeatedly answering the same questions from different
>> users,
>> >> due to lack of capacity for archiving and searching historical
>> >> conversations. TBH, I don't have a good solution for the archivability
>> and
>> >> searchability. I investigated some tools like Zapier [1], but none of
>> them
>> >> seems suitable for us. However, I'd like to share 2 arguments.
>> >> - The purpose of Slack is to make the communication more efficient? By
>> >> *efficient*, I mean saving time for both question askers and helpers
>> with
>> >> instance messa

Re: [Discuss] Creating an Apache Flink slack workspace

2022-05-09 Thread Robert Metzger
Thanks a lot for your answer. The onboarding experience to the ASF Slack is
indeed not ideal:
https://apisix.apache.org/docs/general/join#join-the-slack-channel
I'll see if we can improve it

On Mon, May 9, 2022 at 3:38 PM Martijn Visser 
wrote:

> As far as I recall you can't sign up for the ASF instance of Slack, you can
> only get there if you're a committer or if you're invited by a committer.
>
> On Mon, 9 May 2022 at 15:15, Robert Metzger  wrote:
>
> > Sorry for joining this discussion late, and thanks for the summary
> Xintong!
> >
> > Why are we considering a separate slack instance instead of using the ASF
> > Slack instance?
> > The ASF instance is paid, so all messages are retained forever, and quite
> > a few people are already on that Slack instance.
> > There is already a #flink channel on that Slack instance, that we could
> > leave as passive as it is right now, or put some more effort into it, on
> a
> > voluntary basis.
> > We could add another #flink-dev channel to that Slack for developer
> > discussions, and a private flink-committer and flink-pmc chat.
> >
> > If we are going that path, we should rework the "Community" and "Getting
> > Help" pages and explain that the mailing lists are the "ground truth
> tools"
> > in Flink, and Slack is only there to facilitate faster communication, but
> > it is optional / voluntary (e.g. committers won't respond to DMs)
> >
> > All public #flink-* channels should be archived and google-indexable.
> > I've asked Jarek from Airflow who's maintaining
> > http://apache-airflow.slack-archives.org.
> > If we can't use slack-archives.org, it would be nice to find some
> > volunteers in the Flink community to hack a simple indexing tool.
> > The indexing part is very important for me, because of some bad
> > experiences with the Kubernetes community, where most of the advanced
> > stuff is hidden in their Slack, and it took me a few weeks to find that
> > goldmine of information.
> >
> > Overall, I see this as an experiment worth doing, but I would suggest
> > revisiting it in 6 to 12 months: We should check if really all important
> > decisions are mirrored to the right mailing lists, and that we get the
> > benefits we hoped for (more adoption, better experience for users and
> > developers), and that we can handle the concerns (DMs to developers,
> > indexing).
> >
> >
> >
> >
> >
> > On Sat, May 7, 2022 at 12:22 PM Xintong Song 
> > wrote:
> >
> >> Thanks all for the valuable feedback.
> >>
> >> It seems most people are overall positive about using Slack for dev
> >> discussions, as long as they are properly reflected back to the MLs.
> >> - We definitely need a code of conduct that clearly specifies what
> people
> >> should / should not do.
> >> - Contributors pinging well-known reviewers /committers, I think that
> also
> >> happens now on JIRA / Github. Personally, I'd understand a no-reply as a
> >> "soft no". We may consider also putting that in the code of conduct.
> >>
> >> Concerning using Slack for user QAs, it seems the major concern is that
> we
> >> may end up repeatedly answering the same questions from different users,
> >> due to lack of capacity for archiving and searching historical
> >> conversations. TBH, I don't have a good solution for the archivability
> and
> >> searchability. I investigated some tools like Zapier [1], but none of
> them
> >> seems suitable for us. However, I'd like to share 2 arguments.
> >> - The purpose of Slack is to make the communication more efficient? By
> >> *efficient*, I mean saving time for both question askers and helpers
> with
> >> instant messages, file transmissions, even voice / video calls, etc.
> >> (Especially for cases where back and forth is needed, as David
> mentioned.)
> >> It does not mean questions that do not get enough attention on MLs are
> >> now
> >> guaranteed to be answered immediately. We can probably put that into the
> >> code of conduct, and kindly guide users to first search and initiate
> >> questions on MLs.
> >> - I'd also like to share some experience from the Flink China community.
> >> We
> >> have 3 DingTalk groups with totally 25k members (might be less, I didn't
> >> do
> >> deduplication), posting hundreds of messages daily. What I'm really
> >> excited
> >> about is that, there are way more interactions between users & users
> than
> &

Re: [Discuss] Creating an Apache Flink slack workspace

2022-05-09 Thread Robert Metzger
Sorry for joining this discussion late, and thanks for the summary Xintong!

Why are we considering a separate slack instance instead of using the ASF
Slack instance?
The ASF instance is paid, so all messages are retained forever, and quite a
few people are already on that Slack instance.
There is already a #flink channel on that Slack instance, that we could
leave as passive as it is right now, or put some more effort into it, on a
voluntary basis.
We could add another #flink-dev channel to that Slack for developer
discussions, and a private flink-committer and flink-pmc chat.

If we go down that path, we should rework the "Community" and "Getting
Help" pages and explain that the mailing lists are the "ground truth tools"
in Flink, and Slack is only there to facilitate faster communication, but
it is optional / voluntary (e.g. committers won't respond to DMs).

All public #flink-* channels should be archived and google-indexable.
I've asked Jarek from Airflow who's maintaining
http://apache-airflow.slack-archives.org.
If we can't use slack-archives.org, it would be nice to find some
volunteers in the Flink community to hack a simple indexing tool.
The indexing part is very important for me, because of some bad experiences
with Kubernetes, where most of the advanced stuff is hidden in their Slack,
and it took me a few weeks to find that goldmine of information.

Overall, I see this as an experiment worth doing, but I would suggest
revisiting it in 6 to 12 months: We should check if really all important
decisions are mirrored to the right mailing lists, and that we get the
benefits we hoped for (more adoption, better experience for users and
developers), and that we can handle the concerns (DMs to developers,
indexing).





On Sat, May 7, 2022 at 12:22 PM Xintong Song  wrote:

> Thanks all for the valuable feedback.
>
> It seems most people are overall positive about using Slack for dev
> discussions, as long as they are properly reflected back to the MLs.
> - We definitely need a code of conduct that clearly specifies what people
> should / should not do.
> - Contributors pinging well-known reviewers /committers, I think that also
> happens now on JIRA / Github. Personally, I'd understand a no-reply as a
> "soft no". We may consider to also put that in the cod of conduct.
>
> Concerning using Slack for user QAs, it seems the major concern is that we
> may end up repeatedly answering the same questions from different users,
> due to lack of capacity for archiving and searching historical
> conversations. TBH, I don't have a good solution for the archivability and
> searchability. I investigated some tools like Zapier [1], but none of them
> seems suitable for us. However, I'd like to share 2 arguments.
> - The purpose of Slack is to make the communication more efficient. By
> *efficient*, I mean saving time for both question askers and helpers with
> instant messages, file transmissions, even voice / video calls, etc.
> (Especially for cases where back and forth is needed, as David mentioned.)
> It does not mean questions that do not get enough attention on MLs are now
> guaranteed to be answered immediately. We can probably put that into the
> code of conduct, and kindly guide users to first search and initiate
> questions on MLs.
> - I'd also like to share some experience from the Flink China community. We
> have 3 DingTalk groups with 25k members in total (might be fewer, I didn't
> deduplicate), posting hundreds of messages daily. What I'm really excited
> about is that, there are way more interactions between users & users than
> between users & developers. Users are helping each other, sharing
> experiences, sending screenshots / log files / documentations and solving
> problems together. We the developers seldom get pinged, if not proactively
> joined the conversations. The DingTalk groups are way more active compared
> to the user-zh@ ML, which I'd attribute to the improvement of interaction
> experiences. Admittedly, there are questions being repeatedly asked &
> answered, but TBH I don't think that compares to the benefit of a
> self-driven user community. I'd really love to see if we can bring such
> success to the global English-speaking community.
>
> Concerning StackOverflow, it definitely deserves more attention from the
> community. Thanks for the suggestion / reminder, Piotr & David. I think
> Slack and StackOverflow are probably not mutually exclusive.
>
> Thank you~
>
> Xintong Song
>
>
> [1] https://zapier.com/
>
>
>
> On Sat, May 7, 2022 at 9:50 AM Jingsong Li  wrote:
>
> > Most of the open source communities I know have set up their slack
> > channels, such as Apache Iceberg [1], Apache Druid [2], etc.
> > So I think slack can be worth trying.
> >
> > David is right, there are some cases that need to communicate back and
> > forth, slack communication will be more effective.
> >
> > But back to the question, ultimately it's about whether there are
> > enough core 

Re: Flink 1.15 Stabilisation Sync

2022-04-04 Thread Robert Metzger
Thanks a lot for the update.

From the burndown board [1] it seems that there's only one blocker left,
which has already a fix in review [2].
Are we also planning to address the open Critical tickets before opening
the first voting release candidate?
When do you expect 1.15 to be out?

[1]
https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=505=detail=FLINK-26985
[2] https://issues.apache.org/jira/browse/FLINK-26985


On Mon, Mar 28, 2022 at 11:18 PM Johannes Moser  wrote:

> Dear Community,
>
> We are fairly on track with stabilising the 1.15 release, which is why Yun
> Gao and I think we don’t need the sync anymore.
>
> So we will skip it for now, starting from tomorrow. If the situation
> changes, we will let you know.
>
> Best,
> Joe


[ANNOUNCE] New Apache Flink Committer - David Morávek

2022-03-04 Thread Robert Metzger
Hi everyone,

On behalf of the PMC, I'm very happy to announce David Morávek as a new
Flink committer.

His first contributions to Flink date back to 2019. He has been
increasingly active with reviews and driving major initiatives in the
community. David brings valuable experience from being a committer in the
Apache Beam project to Flink.


Please join me in congratulating David for becoming a Flink committer!

Cheers,
Robert


[ANNOUNCE] New Apache Flink Committer - Martijn Visser

2022-03-03 Thread Robert Metzger
Hi everyone,

On behalf of the PMC, I'm very happy to announce Martijn Visser as a new
Flink committer.

Martijn is a very active Flink community member, driving a lot of efforts
on the dev@flink mailing list. He also pushes projects such as replacing
Google Analytics with Matomo, so that we can generate our web analytics
within the Apache Software Foundation.

Please join me in congratulating Martijn for becoming a Flink committer!

Cheers,
Robert


Re: [DISCUSS] Disable "Automated Checks / Review Progress" GitHub integration

2022-02-25 Thread Robert Metzger
Thank you all for your feedback. I've disabled the bot.

On Fri, Feb 18, 2022 at 5:41 AM Jingsong Li  wrote:

> +1 to remove it.
>
> Thanks for driving.
>
> Best,
> Jingsong
>
> On Thu, Feb 17, 2022 at 8:44 PM Till Rohrmann 
> wrote:
> >
> > +1 to remove it.
> >
> > Cheers,
> > Till
> >
> > On Thu, Feb 17, 2022 at 1:42 PM Martijn Visser 
> > wrote:
> >
> > > +1 to remove it
> > >
> > > On Thu, 17 Feb 2022 at 13:34, Chesnay Schepler 
> wrote:
> > >
> > > > +1 to remove it.
> > > >
> > > > On 17/02/2022 13:31, Konstantin Knauf wrote:
> > > > > +1
> > > > >
> > > > > On Thu, Feb 17, 2022 at 1:11 PM Robert Metzger <
> rmetz...@apache.org>
> > > > wrote:
> > > > >
> > > > >> Hi all,
> > > > >>
> > > > >> Some time ago, we added this "Automated Checks / Review Progress"
> [1]
> > > > bot
> > > > >> to the Flink PRs. I'm not aware of anybody using it, and I'm also
> not
> > > > sure
> > > > >> if it still works properly.
> > > > >>
> > > > >> Therefore, I propose to disable this bot. Please let me know if
> you
> > > > >> disagree, otherwise, I'll soon disable it.
> > > > >>
> > > > >>
> > > > >> Best,
> > > > >> Robert
> > > > >>
> > > > >>
> > > > >> [1]
> https://github.com/apache/flink/pull/18818#issuecomment-1042865516
> > > > >>
> > > > >
> > > >
> > > >
> > >
>


[DISCUSS] Disable "Automated Checks / Review Progress" GitHub integration

2022-02-17 Thread Robert Metzger
Hi all,

Some time ago, we added this "Automated Checks / Review Progress" [1] bot
to the Flink PRs. I'm not aware of anybody using it, and I'm also not sure
if it still works properly.

Therefore, I propose to disable this bot. Please let me know if you
disagree, otherwise, I'll soon disable it.


Best,
Robert


[1]https://github.com/apache/flink/pull/18818#issuecomment-1042865516


[ANNOUNCE] New Apache Flink Committers: Feng Wang, Zhipeng Zhang

2022-02-16 Thread Robert Metzger
Hi everyone,

On behalf of the PMC, I'm very happy to announce two new Flink
committers: Feng Wang and Zhipeng Zhang!

Feng is one of the most active Flink evangelists in China, with plenty of
public talks, blog posts and other evangelization activities. The PMC wants
to recognize and value these efforts by making Feng a committer!

Zhipeng Zhang has made significant contributions to flink-ml, like most of
the FLIPs for our ML efforts.

Please join me in welcoming them as committers!


Best,
Robert


[ANNOUNCE] New Flink PMC members: Igal Shilman, Konstantin Knauf and Yun Gao

2022-02-16 Thread Robert Metzger
Hi all,

I would like to formally announce a few new Flink PMC members on the dev@
list. The PMC has not done a good job of always announcing new PMC members
(and committers) recently. I'll try to keep an eye on this in the future to
improve the situation.

Nevertheless, I'm very happy to announce some very active community members
as new PMC members:

- Igal Shilman, added to the PMC in October 2021
- Konstantin Knauf, added to the PMC in January 2022
- Yun Gao, added to the PMC in February 2022

Please join me in welcoming them to the Flink PMC!

Best,
Robert


Re: Azure Pipelines are dealing with an incident, causing pipeline runs to fail

2022-02-09 Thread Robert Metzger
I filed a support request with Microsoft:
https://developercommunity.visualstudio.com/t/Number-of-Microsoft-hosted-agents-droppe/1658827?from=email=21=newest

On Wed, Feb 9, 2022 at 1:04 PM Martijn Visser  wrote:

> Unfortunately it looks like there are still failures. Will keep you posted
>
> On Wed, 9 Feb 2022 at 11:51, Martijn Visser  wrote:
>
> > Hi everyone,
> >
> > The issue should now be resolved.
> >
> > Best regards,
> >
> > Martijn
> >
> > On Wed, 9 Feb 2022 at 10:55, Martijn Visser 
> wrote:
> >
> >> Hi everyone,
> >>
> >> Please keep in mind that Azure Pipelines currently is dealing with an
> >> incident [1] which causes all CI pipeline runs on Azure to fail. When
> the
> >> incident has been resolved, it will be required to retrigger your
> pipeline
> >> to see if the pipeline then passes.
> >>
> >> Best regards,
> >>
> >> Martijn Visser
> >> https://twitter.com/MartijnVisser82
> >>
> >> [1] https://status.dev.azure.com/_event/287959626
> >>
> >
>


[jira] [Created] (FLINK-25998) Flink akka runs into NoClassDefFoundError on shutdown

2022-02-07 Thread Robert Metzger (Jira)
Robert Metzger created FLINK-25998:
--

 Summary: Flink akka runs into NoClassDefFoundError on shutdown
 Key: FLINK-25998
 URL: https://issues.apache.org/jira/browse/FLINK-25998
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination
Affects Versions: 1.15.0
Reporter: Robert Metzger


When trying to start a standalone jobmanager on an unavailable port, I see the 
following unexpected exception:

{code}
2022-02-08 08:07:18,299 INFO  akka.remote.Remoting  
   [] - Starting remoting
2022-02-08 08:07:18,357 ERROR akka.remote.transport.netty.NettyTransport
   [] - failed to bind to /0.0.0.0:6123, shutting down Netty transport
2022-02-08 08:07:18,373 INFO  
org.apache.flink.runtime.entrypoint.ClusterEntrypoint[] - Shutting 
StandaloneApplicationClusterEntryPoint down with application status FAILED. 
Diagnostics java.net.BindException: Could not start actor system on any port in 
port range 6123
at 
org.apache.flink.runtime.rpc.akka.AkkaBootstrapTools.startRemoteActorSystem(AkkaBootstrapTools.java:133)
at 
org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils$AkkaRpcServiceBuilder.createAndStart(AkkaRpcServiceUtils.java:358)
at 
org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils$AkkaRpcServiceBuilder.createAndStart(AkkaRpcServiceUtils.java:327)
at 
org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils$AkkaRpcServiceBuilder.createAndStart(AkkaRpcServiceUtils.java:247)
at 
org.apache.flink.runtime.rpc.RpcUtils.createRemoteRpcService(RpcUtils.java:191)
at 
org.apache.flink.runtime.entrypoint.ClusterEntrypoint.initializeServices(ClusterEntrypoint.java:334)
at 
org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runCluster(ClusterEntrypoint.java:253)
at 
org.apache.flink.runtime.entrypoint.ClusterEntrypoint.lambda$startCluster$1(ClusterEntrypoint.java:203)
at 
org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:28)
at 
org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:200)
at 
org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runClusterEntrypoint(ClusterEntrypoint.java:684)
at 
org.apache.flink.container.entrypoint.StandaloneApplicationClusterEntryPoint.main(StandaloneApplicationClusterEntryPoint.java:82)
.
2022-02-08 08:07:18,377 INFO  
akka.remote.RemoteActorRefProvider$RemotingTerminator[] - Shutting down 
remote daemon.
2022-02-08 08:07:18,377 ERROR org.apache.flink.util.FatalExitExceptionHandler   
   [] - FATAL: Thread 'flink-akka.remote.default-remote-dispatcher-6' 
produced an uncaught exception. Stopping the process...
java.lang.NoClassDefFoundError: 
akka/actor/dungeon/FaultHandling$$anonfun$handleNonFatalOrInterruptedException$1
at 
akka.actor.dungeon.FaultHandling.handleNonFatalOrInterruptedException(FaultHandling.scala:334)
 ~[flink-rpc-akka_ce724655-52fe-4b3a-8cdc-b79ab446e34d.jar:1.15-SNAPSHOT]
at 
akka.actor.dungeon.FaultHandling.handleNonFatalOrInterruptedException$(FaultHandling.scala:334)
 ~[flink-rpc-akka_ce724655-52fe-4b3a-8cdc-b79ab446e34d.jar:1.15-SNAPSHOT]
at 
akka.actor.ActorCell.handleNonFatalOrInterruptedException(ActorCell.scala:411) 
~[flink-rpc-akka_ce724655-52fe-4b3a-8cdc-b79ab446e34d.jar:1.15-SNAPSHOT]
at akka.actor.ActorCell.invoke(ActorCell.scala:551) 
~[flink-rpc-akka_ce724655-52fe-4b3a-8cdc-b79ab446e34d.jar:1.15-SNAPSHOT]
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:270) 
~[flink-rpc-akka_ce724655-52fe-4b3a-8cdc-b79ab446e34d.jar:1.15-SNAPSHOT]
at akka.dispatch.Mailbox.run(Mailbox.scala:231) 
~[flink-rpc-akka_ce724655-52fe-4b3a-8cdc-b79ab446e34d.jar:1.15-SNAPSHOT]
at akka.dispatch.Mailbox.exec(Mailbox.scala:243) 
[flink-rpc-akka_ce724655-52fe-4b3a-8cdc-b79ab446e34d.jar:1.15-SNAPSHOT]
at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) 
[?:1.8.0_312]
at 
java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056) 
[?:1.8.0_312]
at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692) 
[?:1.8.0_312]
at 
java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175) 
[?:1.8.0_312]
Caused by: java.lang.ClassNotFoundException: 
akka.actor.dungeon.FaultHandling$$anonfun$handleNonFatalOrInterruptedException$1
at java.net.URLClassLoader.findClass(URLClassLoader.java:387) 
~[?:1.8.0_312]
at java.lang.ClassLoader.loadClass(ClassLoader.java:419) ~[?:1.8.0_312]
at 
org.apache.flink.core.classloading.ComponentClassLoader.loadClassFromComponentOnly(ComponentClassLoader.java:149)
 ~[flink-dist-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
at 
org.apache.flink.core.classloading.ComponentClassLoader.loadClass
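
The root trigger of this report is a port conflict at startup. A minimal,
self-contained Java sketch of that condition follows; it is an illustration
I'm adding, not code from the report, and it uses an ephemeral port instead
of 6123 so it runs anywhere:

```java
import java.net.BindException;
import java.net.ServerSocket;

// Reproduce the kind of conflict behind "failed to bind to /0.0.0.0:6123":
// two live listening sockets cannot share a port, so the second bind fails.
public class PortConflictDemo {
    // Returns true if binding the same port twice fails, as expected.
    static boolean secondBindFails() throws Exception {
        try (ServerSocket first = new ServerSocket(0)) { // pick any free port
            int port = first.getLocalPort();
            try (ServerSocket second = new ServerSocket(port)) {
                return false; // unexpected: the port was not exclusive
            } catch (BindException e) {
                return true; // the JobManager hits this on a busy 6123
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("second bind fails: " + secondBindFails());
    }
}
```

The subsequent NoClassDefFoundError is then likely a secondary effect of the
shutdown path loading akka classes through the component class loader after
it has already been torn down.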

Re: [DISCUSS] FLIP-211: Kerberos delegation token framework

2022-01-28 Thread Robert Metzger
Hey Gabor,

let me know your cwiki username, and I can give you write permissions.


On Fri, Jan 28, 2022 at 4:05 PM Gabor Somogyi 
wrote:

> Thanks for making the design better! Nothing further to discuss from my
> side.
>
> Started to reflect the agreement in the FLIP doc.
> Since I don't have access to the wiki, I need to ask Marci to do that, which
> may take some time.
>
> G
>
>
> On Fri, Jan 28, 2022 at 3:52 PM David Morávek  wrote:
>
> > Hi,
> >
> > AFAIU an under registration TM is not added to the registered TMs map
> until
> > > RegistrationResponse ..
> > >
> >
> > I think you're right, with a careful design around threading (delegating
> > update broadcasts to the main thread) + synchronous initial update (that
> > would be nice to avoid) this should be doable.
> >
> > Not sure what you mean "we can't register the TM without providing it
> with
> > > token" but in unsecure configuration registration must happen w/o
> tokens.
> > >
> >
> > Exactly as you describe it, this was meant only for the "kerberized /
> > secured" cluster case, in other cases we wouldn't enforce a non-null
> token
> > in the response
> >
> > I think this is a good idea in general.
> > >
> >
> > +1
> >
> > If you don't have any more thoughts on the RPC / lifecycle part, can you
> > please reflect it into the FLIP?
> >
> > D.
> >
> > On Fri, Jan 28, 2022 at 3:16 PM Gabor Somogyi  >
> > wrote:
> >
> > > > - Make sure DTs issued by single DTMs are monotonically increasing
> (can
> > > be
> > > sorted on TM side)
> > >
> > > AFAIU an under registration TM is not added to the registered TMs map
> > until
> > > RegistrationResponse
> > > is processed which would contain the initial tokens. If that's true
> then
> > > how is it possible to have race with
> > > DTM update which is working on the registered TMs list?
> > > To be more specific "taskExecutors" is the registered map of TMs to
> which
> > > DTM can send updated tokens
> > > but this doesn't contain the under registration TM while
> > > RegistrationResponse is not processed, right?
> > >
> > > Of course if DTM can update while RegistrationResponse is processed
> then
> > > somehow sorting would be
> > > required and that case I would agree.
> > >
> > > - Scope DT updates by the RM ID and ensure that TM only accepts update
> > from
> > > the current leader
> > >
> > > I've planned this initially the mentioned way so agreed.
> > >
> > > - Return initial token with the RegistrationResponse, which should make
> > the
> > > RPC contract bit clearer (ensure that we can't register the TM without
> > > providing it with token)
> > >
> > > I think this is a good idea in general. Not sure what you mean "we
> can't
> > > register the TM without
> > > providing it with token" but in unsecure configuration registration
> must
> > > happen w/o tokens.
> > > All in all the newly added tokens field must be somehow optional.
> > >
> > > G
> > >
> > >
> > > On Fri, Jan 28, 2022 at 2:22 PM David Morávek  wrote:
> > >
> > > > We had a long discussion with Chesnay about the possible edge cases
> and
> > > it
> > > > basically boils down to the following two scenarios:
> > > >
> > > > 1) There is a possible race condition between TM registration (the
> > first
> > > DT
> > > > update) and token refresh if they happen simultaneously. Than the
> > > > registration might beat the refreshed token. This could be easily
> > > addressed
> > > > if DTs could be sorted (eg. by the expiration time) on the TM side.
> In
> > > > other words, if there are multiple updates at the same time we need
> to
> > > make
> > > > sure that we have a deterministic way of choosing the latest one.
> > > >
> > > > One idea by Chesnay that popped up during this discussion was whether
> > we
> > > > could simply return the initial token with the RegistrationResponse
> to
> > > > avoid making an extra call during the TM registration.
> > > >
> > > > 2) When the RM leadership changes (eg. because zookeeper session
> times
> > > out)
> > > > there might be a race condition where the old RM is shutting down and
> > > > updates the tokens, that it might again beat the registration token
> of
> > > the
> > > > new RM. This could be avoided if we scope the token by
> > > _ResourceManagerId_
> > > > and only accept updates for the current leader (basically we'd have
> an
> > > > extra parameter to the _updateDelegationToken_ method).
> > > >
> > > > -
> > > >
> > > > DTM is way simpler then for example slot management, which could
> > receive
> > > > updates from the JobMaster that RM might not know about.
> > > >
> > > > So if you want to go in the path you're describing it should be
> doable
> > > and
> > > > we'd propose following to cover all cases:
> > > >
> > > > - Make sure DTs issued by single DTMs are monotonically increasing
> (can
> > > be
> > > > sorted on TM side)
> > > > - Scope DT updates by the RM ID and ensure that TM only accepts
> update
> > > from
> > > > the current leader
> > > > - Return initial token 
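
To make the agreement above concrete, here is a small sketch of the
deterministic "pick the latest" rule for racing token updates on the TM side.
The type and method names (TokenUpdate, pickLatest) are illustrations I'm
adding, not Flink's actual API; it only shows the design choice that, when
updates can arrive in any order, choosing the update with the latest
expiration makes the outcome order-independent.

```java
import java.time.Instant;
import java.util.Comparator;
import java.util.List;

// Hypothetical names for illustration only -- not Flink's real classes.
// A token update carries the serialized tokens plus their expiration time.
record TokenUpdate(byte[] serializedTokens, Instant expiration) {}

class LatestTokenSelector {
    // If several updates race (e.g. registration vs. a refresh), always keep
    // the one that expires latest, so every arrival order gives the same result.
    static TokenUpdate pickLatest(List<TokenUpdate> candidates) {
        return candidates.stream()
                .max(Comparator.comparing(TokenUpdate::expiration))
                .orElseThrow();
    }

    public static void main(String[] args) {
        TokenUpdate older = new TokenUpdate(new byte[0], Instant.parse("2022-01-01T00:00:00Z"));
        TokenUpdate newer = new TokenUpdate(new byte[0], Instant.parse("2022-02-01T00:00:00Z"));
        System.out.println(pickLatest(List.of(older, newer)) == newer); // prints true
    }
}
```

Scoping updates by the RM leader ID, as proposed above, would then be an
additional guard on top of this rule rather than a replacement for it.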

Re: Flink native k8s integration vs. operator

2022-01-20 Thread Robert Metzger
Hi Alexis,

The usage of Custom Resource Definitions (CRDs). The main reason given to
> me was that such resources are global (for a given cluster) and that is not
> desired. I know that ultimately a CR based on a CRD can be scoped to a
> specific namespace, but customer is king…


I don't think this restriction applies to many organizations. K8s operators
are the de facto standard for deploying all kinds of software. Quite a few
projects that used to have only a Helm chart are now switching over to
providing operators, because operators offer a much better experience.
If you have more specifics on this concern that are relevant for the Flink
community, I'd like to hear them.


Kubernetes Service Accounts (SAs) with roles to create deployments/pods.
> This one is more understandable, particularly after the whole log4j
> debacle. Roles that manage solely deployment.scale subresources would be
> acceptable though.


This requirement is not strictly needed to deploy Flink on K8s. Only with
Flink's native K8s integration do you need to give the Flink JVM a role
that allows it to create other pods.


Best,
Robert

On Tue, Jan 18, 2022 at 5:18 PM Alexis Sarda-Espinosa <
alexis.sarda-espin...@microfocus.com> wrote:

> Hi everyone,
>
>
>
> Since I see this is getting some traction, I’d like to add a couple
> things. I had been developing a Kubernetes controller for Flink as a Proof
> of Concept at my company; I called it Flork because it was to be a Flink
> Orchestrator for Kubernetes. In the end, we will most likely not use this
> controller due to security concerns that were communicated to me. These
> concerns stem from the fact that our product would be used by customers in
> their own Kubernetes clusters, and many customers don’t want:
>
>
>
> - The usage of Custom Resource Definitions (CRDs). The main reason given
> to me was that such resources are global (for a given cluster) and that is
> not desired. I know that ultimately a CR based on a CRD can be scoped to a
> specific namespace, but customer is king…
>
>
>
> - Kubernetes Service Accounts (SAs) with roles to create deployments/pods.
> This one is more understandable, particularly after the whole log4j
> debacle. Roles that manage solely deployment.scale subresources would be
> acceptable though.
>
>
>
> I mention these in case they prove to be relevant for others in the
> current context. For us, it means we have to stick with something like
> standalone Kubernetes + reactive/adaptive.
>
>
>
> Nevertheless, the PoC I had was already functional and, while I would have
> to request permission to contribute the code to the community, it might be
> useful for these efforts. However, I’d first ask if there is actually
> interest in this code, considering these are some of the “features” it
> currently has:
>
>
>
> * The CRD relies on the Pod Template support included in Flink itself. As
> such, some of the fields in the CRD are “vanilla” pod specs, and the schema
> reflects that because it embeds a flattened version of the schema from [1].
> I’d also have a basic Helm chart ready.
>
>
>
> * The code is written in a mixture of Java and Kotlin, and is built with
> Gradle. I made heavy use of Kotlin Coroutines to implement some of the core
> logic in a non-blocking way.
>
>
>
> * The code already supports High Availability by leveraging Kubernetes
> leases and the corresponding helpers in Fabric8’s client.
>
>
>
> * The main deployment logic is delegated to Flink’s own flink-kubernetes
> module [2]. Nevertheless, my build shadows all the fabric8 classes and
> service definitions embedded in said module, so that the rest of the code
> can use other kubernetes-client versions independently.
>
>
>
> * The controller handles savepoint creation for redeployments after CR
> changes, e.g. upgrades. This would also work after controller fail-over
> with/without HA.
>
>
>
> * The code supports some extension for custom container images: classes
> defined in META-INF/services/ files are called as decorators for Flink’s
> conf file and/or the pod specs defined in the CR, and they could be copied
> to the image on top of an official base version.
>
>
>
> * A deployment mode without CRD could be supported --- I have some code
> that can run on top of the core controller and allows “embedding” a CR in a
> Config Map key. The translation between the CM and the core controller code
> is then done transparently.
>
>
>
> * I have a module that integrates the code with Inversion of Control
> containers such as Spring. I only used javax annotations (soon to be
> jakarta), so it’s not tied to Spring.
>
>
>
> Something I haven’t considered at all in my code is ingress for Flink’s UI.
>
>
>
> Let me know what you think.
>
>
>
> [1]
> https://github.com/kubernetes/kubernetes/blob/master/api/openapi-spec/swagger.json
>
> [2] https://github.com/apache/flink/tree/master/flink-kubernetes
>
>
>
> Regards,
>
> Alexis.
>
>
>
> *From:* Gyula Fóra 
> *Sent:* Montag, 17. Januar 2022 

[jira] [Created] (FLINK-25679) Build arm64 Linux images for Apache Flink

2022-01-17 Thread Robert Metzger (Jira)
Robert Metzger created FLINK-25679:
--

 Summary: Build arm64 Linux images for Apache Flink
 Key: FLINK-25679
 URL: https://issues.apache.org/jira/browse/FLINK-25679
 Project: Flink
  Issue Type: Improvement
  Components: flink-docker
Affects Versions: 1.15.0
Reporter: Robert Metzger


Building Flink images for arm64 Linux should be trivial to support, since the 
upstream Docker images support arm64, as does frocksdb.

Building the images locally is also easily possible using Docker's buildx 
features, and the build system of the official docker images most likely 
supports ARM arch.

This improvement would allow us to support development / testing on Apple 
M1-based systems, as well as the ARM architecture at various cloud providers 
(AWS Graviton).



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-25505) Fix NetworkBufferPoolTest, SystemResourcesCounterTest on Apple M1

2022-01-03 Thread Robert Metzger (Jira)
Robert Metzger created FLINK-25505:
--

 Summary: Fix NetworkBufferPoolTest, SystemResourcesCounterTest on 
Apple M1 
 Key: FLINK-25505
 URL: https://issues.apache.org/jira/browse/FLINK-25505
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Metrics, Runtime / Network
Affects Versions: 1.15.0
Reporter: Robert Metzger


As discussed in https://issues.apache.org/jira/browse/FLINK-23230, some tests 
in flink-runtime are not passing on M1 / Apple Silicon Macs.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-25327) ApplicationMode "DELETE /cluster" REST call leads to exit code 2, instead of 0

2021-12-15 Thread Robert Metzger (Jira)
Robert Metzger created FLINK-25327:
--

 Summary: ApplicationMode "DELETE /cluster" REST call leads to exit 
code 2, instead of 0
 Key: FLINK-25327
 URL: https://issues.apache.org/jira/browse/FLINK-25327
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination
Reporter: Robert Metzger


FLINK-24113 introduced a mode to keep the Application Mode JobManager running 
after the Job has been cancelled. Cluster shutdown needs to be initiated for 
example using the DELETE /cluster REST endpoint.

The problem is that there can be a fatal error during the shutdown, making the 
JobManager exit with return code != 0 (making resource managers believe there 
was an error with the Flink application)

Error 
{code}
2021-12-15 08:09:55,708 ERROR 
org.apache.flink.runtime.entrypoint.ClusterEntrypoint[] - Fatal error 
occurred in the cluster entrypoint.
org.apache.flink.util.FlinkException: Application failed unexpectedly.
at 
org.apache.flink.client.deployment.application.ApplicationDispatcherBootstrap.lambda$finishBootstrapTasks$1(ApplicationDispatcherBootstrap.java:177)
 ~[flink-dist-1.15-master-robert.jar:1.15-master-robert]
at 
java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:884)
 ~[?:1.8.0_312]
at 
java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:866)
 ~[?:1.8.0_312]
at 
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) 
~[?:1.8.0_312]
at 
java.util.concurrent.CompletableFuture.cancel(CompletableFuture.java:2278) 
~[?:1.8.0_312]
at 
org.apache.flink.client.deployment.application.ApplicationDispatcherBootstrap.stop(ApplicationDispatcherBootstrap.java:125)
 ~[flink-dist-1.15-master-robert.jar:1.15-master-robert]
at org.apache.flink.runtime.dispatcher.Dispatcher.lambda$onStop$0(Dispatcher.java:284) ~[flink-dist-1.15-master-robert.jar:1.15-master-robert]
at org.apache.flink.util.concurrent.FutureUtils.lambda$runAfterwardsAsync$18(FutureUtils.java:696) ~[flink-dist-1.15-master-robert.jar:1.15-master-robert]
at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774) ~[?:1.8.0_312]
at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750) ~[?:1.8.0_312]
at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:456) ~[?:1.8.0_312]
at org.apache.flink.util.concurrent.DirectExecutorService.execute(DirectExecutorService.java:217) ~[flink-dist-1.15-master-robert.jar:1.15-master-robert]
at java.util.concurrent.CompletableFuture$UniCompletion.claim(CompletableFuture.java:543) ~[?:1.8.0_312]
at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:765) ~[?:1.8.0_312]
at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750) ~[?:1.8.0_312]
at java.util.concurrent.CompletableFuture.uniWhenCompleteStage(CompletableFuture.java:795) ~[?:1.8.0_312]
at java.util.concurrent.CompletableFuture.whenCompleteAsync(CompletableFuture.java:2163) ~[?:1.8.0_312]
at org.apache.flink.util.concurrent.FutureUtils.runAfterwardsAsync(FutureUtils.java:693) ~[flink-dist-1.15-master-robert.jar:1.15-master-robert]
at org.apache.flink.util.concurrent.FutureUtils.runAfterwards(FutureUtils.java:660) ~[flink-dist-1.15-master-robert.jar:1.15-master-robert]
at org.apache.flink.runtime.dispatcher.Dispatcher.onStop(Dispatcher.java:281) ~[flink-dist-1.15-master-robert.jar:1.15-master-robert]
at org.apache.flink.runtime.rpc.RpcEndpoint.internalCallOnStop(RpcEndpoint.java:214) ~[flink-dist-1.15-master-robert.jar:1.15-master-robert]
at org.apache.flink.runtime.rpc.akka.AkkaRpcActor$StartedState.lambda$terminate$0(AkkaRpcActor.java:580) ~[flink-rpc-akka_44e0316d-9cf7-4fc8-9b48-4f6084b0cc47.jar:1.15-master-robert]
at org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:83) ~[flink-rpc-akka_44e0316d-9cf7-4fc8-9b48-4f6084b0cc47.jar:1.15-master-robert]
at org.apache.flink.runtime.rpc.akka.AkkaRpcActor$StartedState.terminate(AkkaRpcActor.java:579) ~[flink-rpc-akka_44e0316d-9cf7-4fc8-9b48-4f6084b0cc47.jar:1.15-master-robert]
at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleControlMessage(AkkaRpcActor.java:191) ~[flink-rpc-akka_44e0316d-9cf7-4fc8-9b48-4f6084b0cc47.jar:1.15-master-robert]
at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:24) [flink-rpc-akka_44e0316d-9cf7-4fc8-9b48-4f6084b0cc47.jar:1.15-master-robert]
at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:20) [flink-rpc-akka_44e0316d-9cf7-4fc8-9b48-4f6084b0cc47.jar:1.15-master-robert]
at scala.PartialFunction.applyOrElse(PartialFunction.scala:123) [flink-rpc-akka_44e0316d-9cf7-4fc8-9b48-4f6084b0cc4

[jira] [Created] (FLINK-25316) BlobServer can get stuck during shutdown

2021-12-14 Thread Robert Metzger (Jira)
Robert Metzger created FLINK-25316:
--

 Summary: BlobServer can get stuck during shutdown
 Key: FLINK-25316
 URL: https://issues.apache.org/jira/browse/FLINK-25316
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination
Affects Versions: 1.15.0
Reporter: Robert Metzger
 Fix For: 1.15.0


The cluster shutdown can get stuck
{code}
"AkkaRpcService-Supervisor-Termination-Future-Executor-thread-1" #89 daemon prio=5 os_prio=0 tid=0x004017d7 nid=0x2ec in Object.wait() [0x00402a9b5000]
   java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0xd6c48368> (a org.apache.flink.runtime.blob.BlobServer)
at java.lang.Thread.join(Thread.java:1252)
- locked <0xd6c48368> (a org.apache.flink.runtime.blob.BlobServer)
at java.lang.Thread.join(Thread.java:1326)
at org.apache.flink.runtime.blob.BlobServer.close(BlobServer.java:319)
at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.stopClusterServices(ClusterEntrypoint.java:406)
- locked <0xd5d27350> (a java.lang.Object)
at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.lambda$shutDownAsync$4(ClusterEntrypoint.java:505)
{code}

because the BlobServer.run() method ignores interrupts:
{code}
"BLOB Server listener at 6124" #30 daemon prio=5 os_prio=0 tid=0x00401c929800 nid=0x2b4 runnable [0x0040263f9000]
   java.lang.Thread.State: RUNNABLE
at java.net.PlainSocketImpl.socketAccept(Native Method)
at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:409)
at java.net.ServerSocket.implAccept(ServerSocket.java:560)
at java.net.ServerSocket.accept(ServerSocket.java:528)
at org.apache.flink.util.NetUtils.acceptWithoutTimeout(NetUtils.java:143)
at org.apache.flink.runtime.blob.BlobServer.run(BlobServer.java:268)
{code}

This issue was introduced in FLINK-24156.
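The underlying JDK behavior can be reproduced outside Flink: a thread blocked in ServerSocket.accept() does not react to Thread.interrupt(), so a shutdown path that only interrupts and then join()s the listener thread hangs; closing the server socket is what actually unblocks the accept loop. A minimal sketch (an illustration, not Flink's actual BlobServer code):

```java
import java.io.IOException;
import java.net.ServerSocket;

class AcceptLoopShutdown {
    // Runs the experiment; returns {aliveAfterInterrupt, aliveAfterClose}.
    static boolean[] runExperiment() throws Exception {
        ServerSocket server = new ServerSocket(0);
        Thread acceptor = new Thread(() -> {
            try {
                server.accept();   // blocks; classic java.net sockets ignore interrupts
            } catch (IOException e) {
                // SocketException from close(): the only way out of the loop
            }
        });
        acceptor.start();

        acceptor.interrupt();      // no effect on a blocked accept()
        acceptor.join(500);
        boolean aliveAfterInterrupt = acceptor.isAlive();

        server.close();            // unblocks accept() with a SocketException
        acceptor.join(5000);
        boolean aliveAfterClose = acceptor.isAlive();
        return new boolean[] {aliveAfterInterrupt, aliveAfterClose};
    }

    public static void main(String[] args) throws Exception {
        boolean[] r = runExperiment();
        System.out.println("alive after interrupt: " + r[0]); // true
        System.out.println("alive after close: " + r[1]);     // false
    }
}
```

This is why a close() of the listening socket (rather than an interrupt) is needed before join()ing the BLOB server thread.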




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


Re: [ANNOUNCE] Flink mailing lists archive service has migrated to Apache Archive service

2021-09-30 Thread Robert Metzger
@Matthias Pohl : I've also been annoyed by this 30
days limit, but I'm not aware of a way to globally change the default. I
would ask in #asfinfra in the asf slack.

On Thu, Sep 30, 2021 at 12:19 PM Till Rohrmann  wrote:

> Thanks for the hint with the managed search engines Matthias. I think this
> is quite helpful.
>
> Cheers,
> Till
>
> On Wed, Sep 15, 2021 at 4:27 PM Matthias Pohl 
> wrote:
>
> > Thanks Leonard for the announcement. I guess that is helpful.
> >
> > @Robert is there any way we can change the default setting to something
> > else (e.g. greater than 0 days)? Only having the last month available as
> a
> > default is kind of annoying considering that the time setting is quite
> > hidden.
> >
> > Matthias
> >
> > PS: As a workaround, one could use the gte=0d parameter which is encoded
> in
> > the URL (e.g. if you use managed search engines in Chrome or Firefox's
> > bookmark keywords:
> > https://lists.apache.org/x/list.html?u...@flink.apache.org:gte=0d:%s).
> > That
> > will make all posts available right-away.
> >
> > On Mon, Sep 6, 2021 at 3:16 PM JING ZHANG  wrote:
> >
> > > Thanks Leonard for driving this.
> > > The information is helpful.
> > >
> > > Best,
> > > JING ZHANG
> > >
> > > Jark Wu  wrote on Mon, Sep 6, 2021 at 4:59 PM:
> > >
> > >> Thanks Leonard,
> > >>
> > >> I have seen many users complaining that the Flink mailing list doesn't
> > >> work (they were using Nabble).
> > >> I think this information would be very helpful.
> > >>
> > >> Best,
> > >> Jark
> > >>
> > >> On Mon, 6 Sept 2021 at 16:39, Leonard Xu  wrote:
> > >>
> > >>> Hi, all
> > >>>
> > >>> The mailing list archive service Nabble Archive was broken at the end
> > of
> > >>> June, the Flink community has migrated the mailing lists archives[1]
> to
> > >>> Apache Archive service by commit[2], you can refer [3] to know more
> > mailing
> > >>> lists archives of Flink.
> > >>>
> > >>> Apache Archive service is maintained by ASF thus the stability is
> > >>> guaranteed, it’s a web-based mail archive service which allows you to
> > >>> browse, search, interact, subscribe, unsubscribe, etc. with mailing
> > lists.
> > >>>
> > >>> Apache Archive service shows mails of the last month by default, you
> > can
> > >>> specify the date range to browse, search the history mails.
> > >>>
> > >>>
> > >>> Hope it would be helpful.
> > >>>
> > >>> Best,
> > >>> Leonard
> > >>>
> > >>> [1] The Flink mailing lists in Apache archive service
> > >>> dev mailing list archives:
> > >>> https://lists.apache.org/list.html?dev@flink.apache.org
> > >>> user mailing list archives :
> > >>> https://lists.apache.org/list.html?u...@flink.apache.org
> > >>> user-zh mailing list archives :
> > >>> https://lists.apache.org/list.html?user...@flink.apache.org
> > >>> [2]
> > >>>
> >
> https://github.com/apache/flink-web/commit/9194dda862da00d93f627fd315056471657655d1
> > >>> [3] https://flink.apache.org/community.html#mailing-lists
> > >>
> > >>
> >
>


[jira] [Created] (FLINK-24395) Checkpoint trigger time difference between log statement and web frontend

2021-09-28 Thread Robert Metzger (Jira)
Robert Metzger created FLINK-24395:
--

 Summary: Checkpoint trigger time difference between log statement 
and web frontend
 Key: FLINK-24395
 URL: https://issues.apache.org/jira/browse/FLINK-24395
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Checkpointing
Affects Versions: 1.14.0
Reporter: Robert Metzger
 Attachments: image-2021-09-28-12-20-34-332.png

Consider this checkpoint (68)

{code}
2021-09-28 10:14:43,644 INFO  
org.apache.flink.runtime.checkpoint.CheckpointCoordinator[] - Triggering 
checkpoint 68 (type=CHECKPOINT) @ 1632823660151 for job 
.
2021-09-28 10:16:41,428 INFO  
org.apache.flink.runtime.checkpoint.CheckpointCoordinator[] - Completed 
checkpoint 68 for job  (128940015376 bytes, 
checkpointDuration=540908 ms, finalizationTime=369 ms).
{code}

And what is shown in the UI about it:

 !image-2021-09-28-12-20-34-332.png! 

The trigger time is off by ~7 minutes. It seems that the trigger message is 
logged too late.
(note that this has happened in a system where savepoint disposal is very slow)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-24392) Upgrade presto s3 fs implementation to Trino >= 348

2021-09-28 Thread Robert Metzger (Jira)
Robert Metzger created FLINK-24392:
--

 Summary: Upgrade presto s3 fs implementation to Trino >= 348
 Key: FLINK-24392
 URL: https://issues.apache.org/jira/browse/FLINK-24392
 Project: Flink
  Issue Type: Improvement
  Components: FileSystems
Affects Versions: 1.14.0
Reporter: Robert Metzger
 Fix For: 1.15.0


The Presto s3 filesystem implementation currently shipped with Flink doesn't 
support streaming uploads. All data needs to be materialized to a single file 
on disk, before it can be uploaded.
This can lead to situations where TaskManagers are running out of disk when 
creating a savepoint.

The Hadoop filesystem implementation supports streaming uploads (by using 
multipart uploads of smaller (say 100mb) files locally), but it does more API 
calls, leading to other issues.

Trino 348 supports streaming uploads.
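The difference between the two upload strategies can be sketched as follows. This is only an illustration of the multipart idea (buffer at most one part locally, upload each full part), not the Presto or Hadoop filesystem code; `PartUploader` and the sizes are made up:

```java
import java.util.ArrayList;
import java.util.List;

class MultipartSketch {
    interface PartUploader { void upload(byte[] part); }

    // Streams data out in bounded parts, so local disk/memory usage stays
    // bounded instead of materializing the whole file before uploading.
    static void write(byte[] data, int partSize, PartUploader uploader) {
        int offset = 0;
        while (offset < data.length) {
            int len = Math.min(partSize, data.length - offset);
            byte[] part = new byte[len];
            System.arraycopy(data, offset, part, 0, len);
            uploader.upload(part);  // one API call per part -> more calls overall
            offset += len;
        }
    }

    public static void main(String[] args) {
        List<Integer> partSizes = new ArrayList<>();
        write(new byte[250], 100, p -> partSizes.add(p.length));
        System.out.println(partSizes);  // [100, 100, 50]
    }
}
```

The trade-off described above is visible here: bounded local buffering costs one upload call per part, while the materialize-everything approach makes a single call but needs disk space for the full file.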



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [VOTE] Drop Scala Shell

2021-09-21 Thread Robert Metzger
+1

On Mon, Sep 20, 2021 at 4:39 PM Seth Wiesman  wrote:

> +1
>
> On Mon, Sep 20, 2021 at 6:04 AM Chesnay Schepler 
> wrote:
>
> > +1
> >
> > On 20/09/2021 09:38, Martijn Visser wrote:
> > > Hi all,
> > >
> > > I would like to start a vote on dropping the Scala Shell. This was
> > > previously discussed in the mailing list [1]
> > >
> > > The vote will be open for at least 72 hours unless there is an
> objection
> > or
> > > not enough votes.
> > >
> > > Best regards,
> > >
> > > Martijn
> > >
> > > [1]
> > >
> >
> https://lists.apache.org/thread.html/rf7a7f935c43d3e98f94193be81b69f1c0d6e60b6fa09570531c3fa67%40%3Cdev.flink.apache.org%3E
> > >
> >
> >
>


[jira] [Created] (FLINK-24320) Show in the Job / Checkpoints / Configuration if checkpoints are incremental

2021-09-17 Thread Robert Metzger (Jira)
Robert Metzger created FLINK-24320:
--

 Summary: Show in the Job / Checkpoints / Configuration if 
checkpoints are incremental
 Key: FLINK-24320
 URL: https://issues.apache.org/jira/browse/FLINK-24320
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Checkpointing, Runtime / Web Frontend
Affects Versions: 1.13.2
Reporter: Robert Metzger
 Attachments: image-2021-09-17-13-31-02-148.png, 
image-2021-09-17-13-31-32-311.png

It would be nice if the overview would also show if incremental checkpoints are 
enabled.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-24208) Allow idempotent savepoint triggering

2021-09-08 Thread Robert Metzger (Jira)
Robert Metzger created FLINK-24208:
--

 Summary: Allow idempotent savepoint triggering
 Key: FLINK-24208
 URL: https://issues.apache.org/jira/browse/FLINK-24208
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Checkpointing
Reporter: Robert Metzger


As a user of Flink, I want to be able to trigger a savepoint from an external 
system in a way that I can detect if I have requested this savepoint already.

By passing a custom ID to the savepoint request, I can check (in case of an 
error of the original request, or the external system) if the request has been 
made already.
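The deduplication idea can be sketched as follows; this is a hypothetical illustration (names and the savepoint path format are made up, not Flink's API), keyed by the client-supplied request ID:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: deduplicate savepoint trigger requests by a
// client-supplied ID, so a retried request returns the original result
// instead of triggering a second savepoint.
class IdempotentSavepointTrigger {
    private final Map<String, String> requests = new ConcurrentHashMap<>();
    private int counter = 0;

    // Repeated calls with the same requestId return the path of the
    // savepoint triggered by the first call.
    public synchronized String trigger(String requestId) {
        return requests.computeIfAbsent(requestId, id -> "s3://savepoints/sp-" + (++counter));
    }

    public static void main(String[] args) {
        IdempotentSavepointTrigger t = new IdempotentSavepointTrigger();
        String first = t.trigger("req-1");
        String retry = t.trigger("req-1");   // external system retries after a failure
        String other = t.trigger("req-2");
        System.out.println(first.equals(retry)); // true: retry is deduplicated
        System.out.println(first.equals(other)); // false: new ID, new savepoint
    }
}
```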




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-24114) Make CompletedOperationCache.COMPLETED_OPERATION_RESULT_CACHE_DURATION_SECONDS configurable (at least for savepoint trigger operations)

2021-09-01 Thread Robert Metzger (Jira)
Robert Metzger created FLINK-24114:
--

 Summary: Make 
CompletedOperationCache.COMPLETED_OPERATION_RESULT_CACHE_DURATION_SECONDS 
configurable (at least for savepoint trigger operations)
 Key: FLINK-24114
 URL: https://issues.apache.org/jira/browse/FLINK-24114
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Coordination
Affects Versions: 1.15.0
Reporter: Robert Metzger


Currently, it can happen that external services triggering savepoints cannot 
persist the savepoint location returned by the savepoint handler, because the 
operation cache evicts entries (which have been accessed at least once) after a 
hardcoded 5 minutes.
To avoid scenarios where the savepoint location has been accessed, but the 
external system failed to persist the location, I propose to make this eviction 
timeout configurable (so that I as a user can configure a value of 24 hours for 
the cache eviction).

(This is related to FLINK-24113)
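The proposal boils down to making the TTL a constructor parameter instead of a constant. An illustrative sketch (not Flink's CompletedOperationCache; the clock is injected so the behavior is easy to see):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative result cache whose eviction timeout is configurable,
// so an external system has e.g. 24h to re-read a savepoint location.
class ResultCache {
    private static final class Entry {
        final String result;
        final long storedAt;
        Entry(String result, long storedAt) { this.result = result; this.storedAt = storedAt; }
    }

    private final long ttlMillis;
    private final Map<String, Entry> entries = new ConcurrentHashMap<>();

    ResultCache(long ttlMillis) { this.ttlMillis = ttlMillis; }

    void put(String key, String result, long now) { entries.put(key, new Entry(result, now)); }

    // Returns the cached result, or null once the entry has expired.
    String get(String key, long now) {
        Entry e = entries.get(key);
        if (e == null || now - e.storedAt > ttlMillis) {
            entries.remove(key);
            return null;
        }
        return e.result;
    }

    public static void main(String[] args) {
        ResultCache hardcoded = new ResultCache(5 * 60 * 1000);            // 5 min, as today
        hardcoded.put("trigger-1", "s3://savepoints/sp-1", 0);
        System.out.println(hardcoded.get("trigger-1", 6 * 60 * 1000));     // null: evicted

        ResultCache configured = new ResultCache(24L * 60 * 60 * 1000);    // 24h, as proposed
        configured.put("trigger-1", "s3://savepoints/sp-1", 0);
        System.out.println(configured.get("trigger-1", 6 * 60 * 1000));    // s3://savepoints/sp-1
    }
}
```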



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-24113) Introduce option in Application Mode to request cluster shutdown

2021-09-01 Thread Robert Metzger (Jira)
Robert Metzger created FLINK-24113:
--

 Summary: Introduce option in Application Mode to request cluster 
shutdown
 Key: FLINK-24113
 URL: https://issues.apache.org/jira/browse/FLINK-24113
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Coordination
Affects Versions: 1.15.0
Reporter: Robert Metzger


Currently a Flink JobManager started in Application Mode will shut down once 
the job has completed.

When doing a "stop with savepoint" operation, we want to keep the JobManager 
alive after the job has stopped to retrieve and persist the final savepoint 
location.
Currently, Flink waits up to 5 minutes and then shuts down.

I'm proposing to introduce a new configuration flag "application mode shutdown 
behavior": "keepalive" (naming things is hard ;) ) which will keep the 
JobManager in ApplicationMode running until a REST call confirms that it can 
shutdown.
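The proposed "keepalive" behavior amounts to gating shutdown on an external confirmation rather than a fixed timeout. A hypothetical sketch (the REST handler and method names here are made up for illustration, not the proposed Flink API):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Sketch: in "keepalive" mode the JobManager does not shut down when the
// job finishes, but waits until an external call confirms that the final
// savepoint location was retrieved and persisted.
class KeepaliveShutdown {
    private final CountDownLatch shutdownConfirmed = new CountDownLatch(1);

    // Would be called by a (hypothetical) REST handler once the caller
    // has persisted the savepoint location.
    public void confirmShutdown() { shutdownConfirmed.countDown(); }

    // Returns true if shutdown was confirmed within the timeout.
    public boolean awaitShutdown(long timeoutMs) throws InterruptedException {
        return shutdownConfirmed.await(timeoutMs, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws Exception {
        KeepaliveShutdown jm = new KeepaliveShutdown();
        System.out.println(jm.awaitShutdown(10));  // false: no confirmation yet
        jm.confirmShutdown();
        System.out.println(jm.awaitShutdown(10));  // true: safe to shut down
    }
}
```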



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-24037) Allow wildcards in ENABLE_BUILT_IN_PLUGINS

2021-08-28 Thread Robert Metzger (Jira)
Robert Metzger created FLINK-24037:
--

 Summary: Allow wildcards in ENABLE_BUILT_IN_PLUGINS
 Key: FLINK-24037
 URL: https://issues.apache.org/jira/browse/FLINK-24037
 Project: Flink
  Issue Type: Improvement
  Components: flink-docker
Reporter: Robert Metzger


As a user of Flink, I would like to be able to specify a certain default 
plugin (such as the S3 Presto FS) without having to specify the Flink version 
again.
The Flink version is already specified by the Docker container I'm using.

When using generic deployment scripts, I don't want to put the Flink 
version in two locations.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-23925) HistoryServer: Archiving job with more than one attempt fails

2021-08-23 Thread Robert Metzger (Jira)
Robert Metzger created FLINK-23925:
--

 Summary: HistoryServer: Archiving job with more than one attempt 
fails
 Key: FLINK-23925
 URL: https://issues.apache.org/jira/browse/FLINK-23925
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination
Affects Versions: 1.13.2
Reporter: Robert Metzger


Error:
{code}
2021-08-23 16:26:01,953 INFO  org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Disconnect job manager 0...@akka.tcp://flink@localhost:6123/user/rpc/jobmanager_2 for job ca9f6a073d311d60f457a1c4243e7dc3 from the resource manager.
2021-08-23 16:26:02,137 INFO  org.apache.flink.runtime.dispatcher.StandaloneDispatcher [] - Could not archive completed job CarTopSpeedWindowingExample(ca9f6a073d311d60f457a1c4243e7dc3) to the history server.
java.util.concurrent.CompletionException: java.lang.IllegalArgumentException: attempt does not exist
at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273) ~[?:1.8.0_252]
at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280) [?:1.8.0_252]
at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1643) [?:1.8.0_252]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_252]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_252]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_252]
Caused by: java.lang.IllegalArgumentException: attempt does not exist
at org.apache.flink.runtime.executiongraph.ArchivedExecutionVertex.getPriorExecutionAttempt(ArchivedExecutionVertex.java:109) ~[flink-dist_2.11-1.14-SNAPSHOT.jar:1.14-SNAPSHOT]
at org.apache.flink.runtime.executiongraph.ArchivedExecutionVertex.getPriorExecutionAttempt(ArchivedExecutionVertex.java:31) ~[flink-dist_2.11-1.14-SNAPSHOT.jar:1.14-SNAPSHOT]
at org.apache.flink.runtime.rest.handler.job.SubtaskExecutionAttemptDetailsHandler.archiveJsonWithPath(SubtaskExecutionAttemptDetailsHandler.java:140) ~[flink-dist_2.11-1.14-SNAPSHOT.jar:1.14-SNAPSHOT]
at org.apache.flink.runtime.webmonitor.history.OnlyExecutionGraphJsonArchivist.archiveJsonWithPath(OnlyExecutionGraphJsonArchivist.java:51) ~[flink-dist_2.11-1.14-SNAPSHOT.jar:1.14-SNAPSHOT]
at org.apache.flink.runtime.webmonitor.WebMonitorEndpoint.archiveJsonWithPath(WebMonitorEndpoint.java:1031) ~[flink-dist_2.11-1.14-SNAPSHOT.jar:1.14-SNAPSHOT]
at org.apache.flink.runtime.dispatcher.JsonResponseHistoryServerArchivist.lambda$archiveExecutionGraph$0(JsonResponseHistoryServerArchivist.java:61) ~[flink-dist_2.11-1.14-SNAPSHOT.jar:1.14-SNAPSHOT]
at org.apache.flink.util.function.ThrowingRunnable.lambda$unchecked$0(ThrowingRunnable.java:49) ~[flink-dist_2.11-1.14-SNAPSHOT.jar:1.14-SNAPSHOT]
at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1640) ~[?:1.8.0_252]
... 3 more
{code}

Steps to reproduce:
- start a Flink reactive mode job manager:
mkdir usrlib
cp ./examples/streaming/TopSpeedWindowing.jar usrlib/
# Submit Job in Reactive Mode
./bin/standalone-job.sh start -Dscheduler-mode=reactive -Dexecution.checkpointing.interval="10s" -j org.apache.flink.streaming.examples.windowing.TopSpeedWindowing
# Start first TaskManager
./bin/taskmanager.sh start

- Add another taskmanager to trigger a restart
- Cancel the job

See the failure in the jobmanager logs.





--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-23913) UnalignedCheckpointITCase fails with exit code 137 (kernel oom) on Azure VMs

2021-08-23 Thread Robert Metzger (Jira)
Robert Metzger created FLINK-23913:
--

 Summary: UnalignedCheckpointITCase fails with exit code 137 
(kernel oom) on Azure VMs
 Key: FLINK-23913
 URL: https://issues.apache.org/jira/browse/FLINK-23913
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Network
Affects Versions: 1.14.0
 Environment: UnalignedCheckpointITCase
Reporter: Robert Metzger
 Fix For: 1.14.0


Cases reported in FLINK-23525:
- 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=22618=logs=4d4a0d10-fca2-5507-8eed-c07f0bdf4887=7b25afdf-cc6c-566f-5459-359dc2585798=10338
- 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=22618=logs=39d5b1d5-3b41-54dc-6458-1e2ddd1cdcf3=0c010d0c-3dec-5bf1-d408-7b18988b1b2b=4743
- 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=22605=logs=39d5b1d5-3b41-54dc-6458-1e2ddd1cdcf3=0c010d0c-3dec-5bf1-d408-7b18988b1b2b=4743
- ... there are a lot more cases.

The problem seems to have started occurring around August 20.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-23589) Support Avro Microsecond precision

2021-08-02 Thread Robert Metzger (Jira)
Robert Metzger created FLINK-23589:
--

 Summary: Support Avro Microsecond precision
 Key: FLINK-23589
 URL: https://issues.apache.org/jira/browse/FLINK-23589
 Project: Flink
  Issue Type: Improvement
  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
Reporter: Robert Metzger
 Fix For: 1.14.0


This was raised by a user: 
https://lists.apache.org/thread.html/r463f748358202d207e4bf9c7fdcb77e609f35bbd670dbc5278dd7615%40%3Cuser.flink.apache.org%3E

Here's the Avro spec: 
https://avro.apache.org/docs/1.8.0/spec.html#Timestamp+%28microsecond+precision%29
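For reference, Avro's timestamp-micros logical type stores a long counting microseconds since the epoch, so mapping it through a millisecond-precision type silently drops the sub-millisecond digits. A sketch of the conversion (helper names are illustrative, not the Avro or Flink API):

```java
import java.time.Instant;

class TimestampMicros {
    // Instant -> microseconds since epoch (the Avro timestamp-micros encoding).
    static long toMicros(Instant i) {
        return i.getEpochSecond() * 1_000_000L + i.getNano() / 1_000L;
    }

    // Microseconds since epoch -> Instant; floorDiv/floorMod handle pre-1970 values.
    static Instant fromMicros(long micros) {
        return Instant.ofEpochSecond(Math.floorDiv(micros, 1_000_000L),
                Math.floorMod(micros, 1_000_000L) * 1_000L);
    }

    public static void main(String[] args) {
        Instant t = Instant.parse("2021-08-02T10:15:30.123456Z");
        long micros = toMicros(t);
        System.out.println(micros % 1_000 != 0);          // true: sub-millisecond info present
        System.out.println(fromMicros(micros).equals(t)); // true: micros round-trip losslessly
    }
}
```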



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-23562) Update CI docker image to latest java version (1.8.0_292)

2021-07-30 Thread Robert Metzger (Jira)
Robert Metzger created FLINK-23562:
--

 Summary: Update CI docker image to latest java version (1.8.0_292)
 Key: FLINK-23562
 URL: https://issues.apache.org/jira/browse/FLINK-23562
 Project: Flink
  Issue Type: Technical Debt
  Components: Build System / Azure Pipelines
Reporter: Robert Metzger
 Fix For: 1.14.0


The java version we are using on our CI is outdated (1.8.0_282 vs 1.8.0_292). 
The latest java version has TLSv1 disabled, which makes the 
KubernetesClusterDescriptorTest fail.

This will be fixed by FLINK-22802.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [VOTE] Release 1.12.5, release candidate #3

2021-07-30 Thread Robert Metzger
Thanks a lot for providing the new staging repository. I dropped the 1440
and 1441 staging repositories to avoid other RC reviewers accidentally
looking into them, or us accidentally releasing them.

+1 (binding)

Checks:
- I didn't find any additional issues in the release announcement
- the pgp signatures on the source archive seem fine
- source archive compilation starts successfully (rat check passes etc.)
- standalone mode, job submission and cli cancellation works. logs look fine
- maven staging repository looks fine

On Fri, Jul 30, 2021 at 7:30 AM Jingsong Li  wrote:

> Hi everyone,
>
> Thanks Robert, I created a new one.
>
> all artifacts to be deployed to the Maven Central Repository [4],
>
> [4]
> https://repository.apache.org/content/repositories/orgapacheflink-1444/
>
> Best,
> Jingsong
>
> On Thu, Jul 29, 2021 at 9:50 PM Robert Metzger 
> wrote:
>
> > The difference is that the 1440 staging repository contains the Scala
> _2.11
> > files, the 1441 repo contains scala_2.12. I'm not sure if this works,
> > because things like "flink-core:1.12.5" will be released twice?
> > I would prefer to have a single staging repository containing all
> binaries
> > we intend to release to maven central, to avoid complications in the
> > release process.
> >
> > Since only the convenience binaries are affected by this, we don't need
> to
> > cancel the release. We just need to create a new staging repository.
> >
> >
> > On Thu, Jul 29, 2021 at 3:36 PM Robert Metzger 
> > wrote:
> >
> > > Thanks a lot for creating a release candidate!
> > >
> > > What is the difference between the two maven staging repos?
> > >
> https://repository.apache.org/content/repositories/orgapacheflink-1440/
> > >  and
> > >
> https://repository.apache.org/content/repositories/orgapacheflink-1441/
> > ?
> > >
> > > On Thu, Jul 29, 2021 at 1:52 PM Xingbo Huang 
> wrote:
> > >
> > >> +1 (non-binding)
> > >>
> > >> - Verified checksums and signatures
> > >> - Built from sources
> > >> - Verified Python wheel package contents
> > >> - Pip install Python wheel package in Mac
> > >> - Run Python UDF job in Python REPL
> > >>
> > >> Best,
> > >> Xingbo
> > >>
> > >> Zakelly Lan  wrote on Thu, Jul 29, 2021 at 5:57 PM:
> > >>
> > >> > +1 (non-binding)
> > >> >
> > >> > * Built from source.
> > >> > * Run wordcount datastream job on yarn
> > >> > * Web UI and checkpoint seem good.
> > >> > * Kill a container to make job failover, everything is good.
> > >> > * Try run job from checkpoint, everything is good.
> > >> >
> > >> > On Thu, Jul 29, 2021 at 2:34 PM Yun Tang  wrote:
> > >> >
> > >> > > +1 (non-binding)
> > >> > >
> > >> > > Checked the signature.
> > >> > >
> > >> > > Reviewed the PR of flink-web.
> > >> > >
> > >> > > Download the pre-built tar package and launched an application
> mode
> > >> > > standalone job successfully.
> > >> > >
> > >> > > Best
> > >> > > Yun Tang
> > >> > >
> > >> > >
> > >> > > 
> > >> > > From: Jingsong Li 
> > >> > > Sent: Tuesday, July 27, 2021 11:54
> > >> > > To: dev 
> > >> > > Subject: [VOTE] Release 1.12.5, release candidate #3
> > >> > >
> > >> > > Hi everyone,
> > >> > >
> > >> > > Please review and vote on the release candidate #3 for the version
> > >> > 1.12.5,
> > >> > > as follows:
> > >> > > [ ] +1, Approve the release
> > >> > > [ ] -1, Do not approve the release (please provide specific
> > comments)
> > >> > >
> > >> > > The complete staging area is available for your review, which
> > >> includes:
> > >> > > * JIRA release notes [1],
> > >> > > * the official Apache source release and binary convenience
> releases
> > >> to
> > >> > be
> > >> > > deployed to dist.apache.org [2], which are signed with the key
> with
> > >> > > fingerprint FBB83C0A4FFB9CA8 [3],
> > >> > > * all artifacts to be deployed to the Maven Central Repository
> [4],
> > >> > > * source code tag "release-1.12.5-rc3" [5],
> > >> > > * website pull request listing the new release and adding
> > announcement
> > >> > blog
> > >> > > post [6].
> > >> > >
> > >> > > The vote will be open for at least 72 hours. It is adopted by
> > majority
> > >> > > approval, with at least 3 PMC affirmative votes.
> > >> > >
> > >> > > Best,
> > >> > > Jingsong Lee
> > >> > >
> > >> > > [1]
> > >> > >
> > >> > >
> > >> >
> > >>
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12350166
> > >> > > [2]
> https://dist.apache.org/repos/dist/dev/flink/flink-1.12.5-rc3/
> > >> > > [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> > >> > > [4]
> > >> > >
> > >>
> https://repository.apache.org/content/repositories/orgapacheflink-1440/
> > >> > >
> > >>
> https://repository.apache.org/content/repositories/orgapacheflink-1441/
> > >> > > [5]
> https://github.com/apache/flink/releases/tag/release-1.12.5-rc3
> > >> > > [6] https://github.com/apache/flink-web/pull/455
> > >> > >
> > >> >
> > >>
> > >
> >
>
>
> --
> Best, Jingsong Lee
>


Re: [VOTE] Release 1.13.2, release candidate #3

2021-07-29 Thread Robert Metzger
Thanks a lot for creating this release candidate

+1 (binding)

- staging repository looks fine
- Diff to 1.13.1 looks fine wrt to dependency changes:
https://github.com/apache/flink/compare/release-1.13.1...release-1.13.2-rc3
- standalone mode works locally
   - I found this issue, which is not specific to 1.13.2:
https://issues.apache.org/jira/browse/FLINK-23546
- src archive signature is matched; sha512 is correct

On Thu, Jul 29, 2021 at 9:10 AM Zakelly Lan  wrote:

> +1 (non-binding)
>
> * Built from source.
> * Run wordcount datastream job on yarn
> * Web UI and checkpoint seem good.
> * Kill a container to make job failover, everything is good.
> * Try run job from checkpoint, everything is good.
>
> On Fri, Jul 23, 2021 at 10:04 PM Yun Tang  wrote:
>
> > Hi everyone,
> > Please review and vote on the release candidate #3 for the version
> 1.13.2,
> > as follows:
> > [ ] +1, Approve the release
> > [ ] -1, Do not approve the release (please provide specific comments)
> >
> >
> > The complete staging area is available for your review, which includes:
> > * JIRA release notes [1],
> > * the official Apache source release and binary convenience releases to
> be
> > deployed to dist.apache.org [2], which are signed with the key with
> > fingerprint 78A306590F1081CC6794DC7F62DAD618E07CF996 [3],
> > * all artifacts to be deployed to the Maven Central Repository [4],
> > * source code tag "release-1.13.2-rc3" [5],
> > * website pull request listing the new release and adding announcement
> > blog post [6].
> >
> > The vote will be open for at least 72 hours. It is adopted by majority
> > approval, with at least 3 PMC affirmative votes.
> >
> > Best,
> > Yun Tang
> >
> > [1]
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12350218&styleName=&projectId=12315522
> > [2] https://dist.apache.org/repos/dist/dev/flink/flink-1.13.2-rc3/
> > [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> > [4]
> > https://repository.apache.org/content/repositories/orgapacheflink-1439/
> > [5] https://github.com/apache/flink/releases/tag/release-1.13.2-rc3
> > [6] https://github.com/apache/flink-web/pull/453
> >
> >
>


[jira] [Created] (FLINK-23546) stop-cluster.sh produces warning on macOS 11.4

2021-07-29 Thread Robert Metzger (Jira)
Robert Metzger created FLINK-23546:
--

 Summary: stop-cluster.sh produces warning on macOS 11.4
 Key: FLINK-23546
 URL: https://issues.apache.org/jira/browse/FLINK-23546
 Project: Flink
  Issue Type: Bug
  Components: Deployment / Scripts
Affects Versions: 1.14.0
Reporter: Robert Metzger


Since FLINK-17470, we are stopping daemons with a timeout, to SIGKILL them if 
they are not gracefully stopping.

I noticed that this mechanism causes warnings on macOS:

{code}
❰robert❙/tmp/flink-1.14-SNAPSHOT❱✔≻ ./bin/start-cluster.sh
Starting cluster.
Starting standalonesession daemon on host MacBook-Pro-2.localdomain.
Starting taskexecutor daemon on host MacBook-Pro-2.localdomain.
❰robert❙/tmp/flink-1.14-SNAPSHOT❱✔≻ ./bin/stop-cluster.sh
Stopping taskexecutor daemon (pid: 50044) on host MacBook-Pro-2.localdomain.
tail: illegal option -- -
usage: tail [-F | -f | -r] [-q] [-b # | -c # | -n #] [file ...]
Stopping standalonesession daemon (pid: 49812) on host 
MacBook-Pro-2.localdomain.
tail: illegal option -- -
usage: tail [-F | -f | -r] [-q] [-b # | -c # | -n #] [file ...]
{code}




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [VOTE] Release 1.12.5, release candidate #3

2021-07-29 Thread Robert Metzger
The difference is that the 1440 staging repository contains the Scala _2.11
files, the 1441 repo contains scala_2.12. I'm not sure if this works,
because things like "flink-core:1.12.5" will be released twice?
I would prefer to have a single staging repository containing all binaries
we intend to release to maven central, to avoid complications in the
release process.

Since only the convenience binaries are affected by this, we don't need to
cancel the release. We just need to create a new staging repository.


On Thu, Jul 29, 2021 at 3:36 PM Robert Metzger  wrote:

> Thanks a lot for creating a release candidate!
>
> What is the difference between the two maven staging repos?
> https://repository.apache.org/content/repositories/orgapacheflink-1440/
>  and
> https://repository.apache.org/content/repositories/orgapacheflink-1441/ ?
>
> On Thu, Jul 29, 2021 at 1:52 PM Xingbo Huang  wrote:
>
>> +1 (non-binding)
>>
>> - Verified checksums and signatures
>> - Built from sources
>> - Verified Python wheel package contents
>> - Pip install Python wheel package in Mac
>> - Run Python UDF job in Python REPL
>>
>> Best,
>> Xingbo
>>
>> Zakelly Lan  wrote on Thu, Jul 29, 2021 at 5:57 PM:
>>
>> > +1 (non-binding)
>> >
>> > * Built from source.
>> > * Run wordcount datastream job on yarn
>> > * Web UI and checkpoint seem good.
>> > * Kill a container to make job failover, everything is good.
>> > * Try run job from checkpoint, everything is good.
>> >
>> > On Thu, Jul 29, 2021 at 2:34 PM Yun Tang  wrote:
>> >
>> > > +1 (non-binding)
>> > >
>> > > Checked the signature.
>> > >
>> > > Reviewed the PR of flink-web.
>> > >
>> > > Download the pre-built tar package and launched an application mode
>> > > standalone job successfully.
>> > >
>> > > Best
>> > > Yun Tang
>> > >
>> > >
>> > > 
>> > > From: Jingsong Li 
>> > > Sent: Tuesday, July 27, 2021 11:54
>> > > To: dev 
>> > > Subject: [VOTE] Release 1.12.5, release candidate #3
>> > >
>> > > Hi everyone,
>> > >
>> > > Please review and vote on the release candidate #3 for the version
>> > 1.12.5,
>> > > as follows:
>> > > [ ] +1, Approve the release
>> > > [ ] -1, Do not approve the release (please provide specific comments)
>> > >
>> > > The complete staging area is available for your review, which
>> includes:
>> > > * JIRA release notes [1],
>> > > * the official Apache source release and binary convenience releases
>> to
>> > be
>> > > deployed to dist.apache.org [2], which are signed with the key with
>> > > fingerprint FBB83C0A4FFB9CA8 [3],
>> > > * all artifacts to be deployed to the Maven Central Repository [4],
>> > > * source code tag "release-1.12.5-rc3" [5],
>> > > * website pull request listing the new release and adding announcement
>> > blog
>> > > post [6].
>> > >
>> > > The vote will be open for at least 72 hours. It is adopted by majority
>> > > approval, with at least 3 PMC affirmative votes.
>> > >
>> > > Best,
>> > > Jingsong Lee
>> > >
>> > > [1]
>> > >
>> > >
>> >
>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12350166
>> > > [2] https://dist.apache.org/repos/dist/dev/flink/flink-1.12.5-rc3/
>> > > [3] https://dist.apache.org/repos/dist/release/flink/KEYS
>> > > [4]
>> > >
>> https://repository.apache.org/content/repositories/orgapacheflink-1440/
>> > >
>> https://repository.apache.org/content/repositories/orgapacheflink-1441/
>> > > [5] https://github.com/apache/flink/releases/tag/release-1.12.5-rc3
>> > > [6] https://github.com/apache/flink-web/pull/455
>> > >
>> >
>>
>


Re: [VOTE] Release 1.12.5, release candidate #3

2021-07-29 Thread Robert Metzger
Thanks a lot for creating a release candidate!

What is the difference between the two maven staging repos?
https://repository.apache.org/content/repositories/orgapacheflink-1440/ and
https://repository.apache.org/content/repositories/orgapacheflink-1441/ ?

On Thu, Jul 29, 2021 at 1:52 PM Xingbo Huang  wrote:

> +1 (non-binding)
>
> - Verified checksums and signatures
> - Built from sources
> - Verified Python wheel package contents
> - Pip install Python wheel package in Mac
> - Run Python UDF job in Python REPL
>
> Best,
> Xingbo
>
> Zakelly Lan  于2021年7月29日周四 下午5:57写道:
>
> > +1 (non-binding)
> >
> > * Built from source.
> > * Run wordcount datastream job on yarn
> > * Web UI and checkpoint seem good.
> > * Kill a container to make job failover, everything is good.
> > * Try run job from checkpoint, everything is good.
> >
> > On Thu, Jul 29, 2021 at 2:34 PM Yun Tang  wrote:
> >
> > > +1 (non-binding)
> > >
> > > Checked the signature.
> > >
> > > Reviewed the PR of flink-web.
> > >
> > > Download the pre-built tar package and launched an application mode
> > > standalone job successfully.
> > >
> > > Best
> > > Yun Tang
> > >
> > >
> > > 
> > > From: Jingsong Li 
> > > Sent: Tuesday, July 27, 2021 11:54
> > > To: dev 
> > > Subject: [VOTE] Release 1.12.5, release candidate #3
> > >
> > > Hi everyone,
> > >
> > > Please review and vote on the release candidate #3 for the version
> > 1.12.5,
> > > as follows:
> > > [ ] +1, Approve the release
> > > [ ] -1, Do not approve the release (please provide specific comments)
> > >
> > > The complete staging area is available for your review, which includes:
> > > * JIRA release notes [1],
> > > * the official Apache source release and binary convenience releases to
> > be
> > > deployed to dist.apache.org [2], which are signed with the key with
> > > fingerprint FBB83C0A4FFB9CA8 [3],
> > > * all artifacts to be deployed to the Maven Central Repository [4],
> > > * source code tag "release-1.12.5-rc3" [5],
> > > * website pull request listing the new release and adding announcement
> > blog
> > > post [6].
> > >
> > > The vote will be open for at least 72 hours. It is adopted by majority
> > > approval, with at least 3 PMC affirmative votes.
> > >
> > > Best,
> > > Jingsong Lee
> > >
> > > [1]
> > >
> > >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12350166
> > > [2] https://dist.apache.org/repos/dist/dev/flink/flink-1.12.5-rc3/
> > > [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> > > [4]
> > >
> https://repository.apache.org/content/repositories/orgapacheflink-1440/
> > >
> https://repository.apache.org/content/repositories/orgapacheflink-1441/
> > > [5] https://github.com/apache/flink/releases/tag/release-1.12.5-rc3
> > > [6] https://github.com/apache/flink-web/pull/455
> > >
> >
>
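
[Editor's note] The "verified checksums and signatures" step mentioned in the votes above can be sketched in shell. This is a minimal illustration, not the release managers' actual tooling: the artifact name below is an example, and the GPG step assumes the KEYS file from dist.apache.org has been imported first.

```shell
#!/bin/sh
# Verify a downloaded release artifact against its published .sha512 file.
# dist.apache.org publishes GNU-style checksum files ("<hash>  <file>"),
# so sha512sum --check can consume them directly.
verify_checksum() {
  artifact="$1"
  sha512sum --check "${artifact}.sha512"
}

# The signature check additionally needs the signing keys imported, e.g.:
#   curl -O https://dist.apache.org/repos/dist/release/flink/KEYS
#   gpg --import KEYS
#   gpg --verify flink-1.12.5-bin-scala_2.11.tgz.asc
```

Run from the directory holding both the artifact and its `.sha512` file, e.g. `verify_checksum flink-1.12.5-bin-scala_2.11.tgz` (file name illustrative).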


Re: [VOTE] Release 1.11.4, release candidate #1

2021-07-29 Thread Robert Metzger
Thanks a lot for creating the release candidate!

+1 (binding)

Checks:
- manually checked the diff [1]. License documentation seems to be properly
maintained in all changes (mostly a jackson version bump, and some ES +
Kinesis bumps)
- checked standalone mode, job submission, logs locally.
- checked the flink-web PR
- checked the maven staging repo


[1]
https://github.com/apache/flink/compare/release-1.11.3...release-1.11.4-rc1
and
https://github.com/apache/flink/compare/ca2ac0108bf4050ba7efc4fa729e5f7fdf3da459...release-1.11.4-rc1
(which excludes the code reformatting)

On Mon, Jul 26, 2021 at 5:26 PM godfrey he  wrote:

> Hi everyone,
> Please review and vote on the release candidate #1 for the version 1.11.4,
> as follows:
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
>
>
> The complete staging area is available for your review, which includes:
> * JIRA release notes [1],
> * the official Apache source release and binary convenience releases to be
> deployed to dist.apache.org [2], which are signed with the key with
> fingerprint 4A978875E56AA2100EB0CF12A244D52CF0A40279 [3],
> * all artifacts to be deployed to the Maven Central Repository [4],
> * source code tag "release-1.11.4-rc1" [5],
> * website pull request listing the new release and adding announcement blog
> post [6].
>
> The vote will be open for at least 72 hours. It is adopted by majority
> approval, with at least 3 PMC affirmative votes.
>
> Best,
> Godfrey
>
> [1]
>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12349404
> [2] https://dist.apache.org/repos/dist/dev/flink/flink-1.11.4-rc1/
> [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> [4] https://repository.apache.org/content/repositories/orgapacheflink-1438
> [5] https://github.com/apache/flink/releases/tag/release-1.11.4-rc1
> [6] https://github.com/apache/flink-web/pull/459
>


Re: [DISCUSS] FLIP-185: Shorter heartbeat timeout and interval default values

2021-07-20 Thread Robert Metzger
+1 to this change!

When I was working on the reactive mode blog post [1] I also ran into this
issue, leading to a poor "out of the box" experience when scaling down.
For my experiments, I've chosen a timeout of 8 seconds, and the cluster has
been running for 76 days (so far) on Kubernetes.
I also consider this change somewhat low-risk, because we can provide a
quick fix for people running into problems.

[1] https://flink.apache.org/2021/05/06/reactive-mode.html


On Fri, Jul 16, 2021 at 7:05 PM Till Rohrmann  wrote:

> Hi everyone,
>
> Since Flink 1.5 we have the same heartbeat timeout and interval default
> values that are defined as heartbeat.timeout: 50s and heartbeat.interval:
> 10s. These values were mainly chosen to compensate for lengthy GC pauses
> and blocking operations that were executed in the main threads of Flink's
> components. Since then, there were quite some advancements wrt the JVM's
> GCs and we also got rid of a lot of blocking calls that were executed in
> the main thread. Moreover, a long heartbeat.timeout causes long recovery
> times in case of a TaskManager loss because the system can only properly
> recover after the dead TaskManager has been removed from the scheduler.
> Hence, I wanted to propose to change the timeout and interval to:
>
> heartbeat.timeout: 15s
> heartbeat.interval: 3s
>
> Since there is no perfect solution that fits all use cases, I would really
> like to hear from you what you think about it and how you configure these
> heartbeat options. Based on your experience we might actually come up with
> better default values that allow us to be resilient but also to detect
> failed components fast. FLIP-185 can be found here [1].
>
> [1] https://cwiki.apache.org/confluence/x/GAoBCw
>
> Cheers,
> Till
>


Re: [DISCUSS] Address deprecation warnings when upgrading dependencies

2021-07-15 Thread Robert Metzger
>
> Maybe we could leverage sonar cloud infrastructure for this. They already
> have built in rules for deprecation warnings [1]. Also they have a free
> offering for public open-source repositories [2].
>

Cool, I didn't know this. There are already quite a lot of Apache Projects
there: https://sonarcloud.io/organizations/apache/projects, and INFRA seems
to be very open to adding more projects:
https://issues.apache.org/jira/browse/INFRA-19555

Do you know if GH PR integration is also available? (so that it shows a
warning in the PR if inspections on the change trigger)


Re: [DISCUSS] Address deprecation warnings when upgrading dependencies

2021-07-14 Thread Robert Metzger
For implementing this in practice, we could also extend our CI pipeline a
bit, and count the number of deprecation warnings while compiling Flink.
We would hard-code the current number of deprecations and fail the build if
that number increases.

We could actually extend this and run a curated list of IntelliJ
inspections during the build (IIRC this was discussed in the past):
https://www.jetbrains.com/help/idea/command-line-code-inspector.html
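
[Editor's note] The warning-count gate described above could be sketched as follows. This is an illustrative assumption of how such a check might look, not Flink's actual CI setup: the baseline value, log file, and grep patterns are all placeholders.

```shell
#!/bin/sh
# Fail the build when the number of deprecation warnings grows beyond a
# hard-coded baseline (the current count, updated whenever warnings are fixed).
BASELINE=42

count_deprecations() {
  # javac emits "[deprecation] ..." with -Xlint:deprecation, or a generic
  # "...uses or overrides a deprecated API" note otherwise; count both.
  grep -c -E '\[deprecation\]|deprecated' "$1"
}

check_deprecations() {
  count=$(count_deprecations "$1")
  if [ "$count" -gt "$BASELINE" ]; then
    echo "Deprecation warnings increased: $count > baseline $BASELINE"
    return 1
  fi
  echo "Deprecation warnings: $count (baseline $BASELINE)"
}
```

A CI job would pipe the Maven build output to a log file and call `check_deprecations build.log` as a post-compile step.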

On Wed, Jul 14, 2021 at 11:14 AM Chesnay Schepler 
wrote:

> It may be better to not do that to ease the migration to junit5, where
> we have to address exactly these usages.
>
> On 14/07/2021 09:57, Till Rohrmann wrote:
> > I actually found
> > myself recently, whenever touching a test class, replacing Junit's
> > assertThat with Hamcrest's version which felt quite tedious.
>
>
>

