Re: [VOTE] Release 1.13.0, release candidate #2

2021-04-29 Thread Jark Wu
+1 (binding)

- checked/verified signatures and hashes
- started a cluster and ran some e2e SQL queries using the SQL Client; the
results are as expected:
 * read from Kafka source, window aggregate, lookup MySQL database, write
into Elasticsearch
 * window aggregate using the legacy window syntax and the new window TVF
 * verified web UI and log output
- reviewed the release PR

I found that the log contains some verbose information when using window
aggregates, but this doesn't block the release; I created FLINK-22522 to
fix it.

Best,
Jark


On Thu, 29 Apr 2021 at 14:46, Dawid Wysakowicz 
wrote:

> Hey Matthias,
>
> I'd like to double confirm what Guowei said. The dependency is Apache 2
> licensed and we do not bundle it in our jar (as it is in the runtime
> scope) thus we do not need to mention it in the NOTICE file (btw, the
> best way to check what is bundled is to check the output of maven shade
> plugin). Thanks for checking it!
>
> Best,
>
> Dawid
>
> On 29/04/2021 05:25, Guowei Ma wrote:
> > Hi, Matthias
> >
> > Thank you very much for your careful inspection.
> > I check the flink-python_2.11-1.13.0.jar and we do not bundle
> > org.conscrypt:conscrypt-openjdk-uber:2.5.1 to it.
> > So I think we may not need to add this to the NOTICE file. (BTW The jar's
> > scope is runtime)
> >
> > Best,
> > Guowei
> >
> >
> > On Thu, Apr 29, 2021 at 2:33 AM Matthias Pohl 
> > wrote:
> >
> >> Thanks Dawid and Guowei for managing this release.
> >>
> >> - downloaded the sources and binaries and checked the checksums
> >> - built Flink from the downloaded sources
> >> - executed example jobs with standalone deployments - I didn't find
> >> anything suspicious in the logs
> >> - reviewed release announcement pull request
> >>
> >> - I did a pass over dependency updates: git diff release-1.12.2
> >> release-1.13.0-rc2 */*.xml
> >> There's one thing someone should double-check whether that's suppose to
> be
> >> like that: We added org.conscrypt:conscrypt-openjdk-uber:2.5.1 as a
> >> dependency but I don't see it being reflected in the NOTICE file of the
> >> flink-python module. Or is this automatically added later on?
> >>
> >> +1 (non-binding; please see remark on dependency above)
> >>
> >> Matthias
> >>
> >> On Wed, Apr 28, 2021 at 1:52 PM Stephan Ewen  wrote:
> >>
> >>> Glad to hear that outcome. And no worries about the false alarm.
> >>> Thank you for doing thorough testing, this is very helpful!
> >>>
> >>> On Wed, Apr 28, 2021 at 1:04 PM Caizhi Weng 
> >> wrote:
>  After the investigation we found that this issue is caused by the
>  implementation of connector, not by the Flink framework.
> 
>  Sorry for the false alarm.
> 
>  Stephan Ewen  于2021年4月28日周三 下午3:23写道:
> 
> > @Caizhi and @Becket - let me reach out to you to jointly debug this
>  issue.
> > I am wondering if there is some incorrect reporting of failed events?
> >
> > On Wed, Apr 28, 2021 at 8:53 AM Caizhi Weng 
>  wrote:
> >> -1
> >>
> >> We're testing this version on batch jobs with large (600~1000)
> >> parallelisms
> >> and the following exception messages appear with high frequency:
> >>
> >> 2021-04-27 21:27:26
> >> org.apache.flink.util.FlinkException: An OperatorEvent from an
> >> OperatorCoordinator to a task was lost. Triggering task failover to
> >> ensure consistency. Event: '[NoMoreSplitEvent]', targetTask:  -
> >> execution #0
> >> at org.apache.flink.runtime.operators.coordination.SubtaskGatewayImpl.lambda$sendEvent$0(SubtaskGatewayImpl.java:81)
> >> at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:822)
> >> at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:797)
> >> at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
> >> at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:440)
> >> at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:208)
> >> at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:77)
> >> at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:158)
> >> at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26)
> >> at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21)
> >> at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
> >> at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21)
> >> at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170)
> >> at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
> >> at scala.PartialFunction$OrElse.applyO

Re: [VOTE] Release 1.13.0, release candidate #2

2021-04-29 Thread Dian Fu
+1 (binding)

- Verified the signature and checksum
- Installed PyFlink successfully using the source package
- Ran a few PyFlink examples: Python UDF, Pandas UDF, Python DataStream API 
with state access, Python DataStream API with batch execution mode
- Reviewed the website PR

Regards,
Dian

> 2021年4月29日 下午3:11,Jark Wu  写道:
> 
> +1 (binding)
> 
> - checked/verified signatures and hashes
> - started cluster and run some e2e sql queries using SQL Client, results
> are as expect:
> * read from kafka source, window aggregate, lookup mysql database, write
> into elasticsearch
> * window aggregate using legacy window syntax and new window TVF
> * verified web ui and log output
> - reviewed the release PR
> 
> I found the log contains some verbose information when using window
> aggregate,
> but I think this doesn't block the release, I created FLINK-22522 to fix
> it.
> 
> Best,
> Jark
> 
> 
> On Thu, 29 Apr 2021 at 14:46, Dawid Wysakowicz 
> wrote:
> 
>> Hey Matthias,
>> 
>> I'd like to double confirm what Guowei said. The dependency is Apache 2
>> licensed and we do not bundle it in our jar (as it is in the runtime
>> scope) thus we do not need to mention it in the NOTICE file (btw, the
>> best way to check what is bundled is to check the output of maven shade
>> plugin). Thanks for checking it!
>> 
>> Best,
>> 
>> Dawid
>> 
>> On 29/04/2021 05:25, Guowei Ma wrote:
>>> Hi, Matthias
>>> 
>>> Thank you very much for your careful inspection.
>>> I check the flink-python_2.11-1.13.0.jar and we do not bundle
>>> org.conscrypt:conscrypt-openjdk-uber:2.5.1 to it.
>>> So I think we may not need to add this to the NOTICE file. (BTW The jar's
>>> scope is runtime)
>>> 
>>> Best,
>>> Guowei
>>> 
>>> 
>>> On Thu, Apr 29, 2021 at 2:33 AM Matthias Pohl 
>>> wrote:
>>> 
 Thanks Dawid and Guowei for managing this release.
 
 - downloaded the sources and binaries and checked the checksums
 - built Flink from the downloaded sources
 - executed example jobs with standalone deployments - I didn't find
 anything suspicious in the logs
 - reviewed release announcement pull request
 
 - I did a pass over dependency updates: git diff release-1.12.2
 release-1.13.0-rc2 */*.xml
 There's one thing someone should double-check whether that's suppose to
>> be
 like that: We added org.conscrypt:conscrypt-openjdk-uber:2.5.1 as a
 dependency but I don't see it being reflected in the NOTICE file of the
 flink-python module. Or is this automatically added later on?
 
 +1 (non-binding; please see remark on dependency above)
 
 Matthias
 
 On Wed, Apr 28, 2021 at 1:52 PM Stephan Ewen  wrote:
 
> Glad to hear that outcome. And no worries about the false alarm.
> Thank you for doing thorough testing, this is very helpful!
> 
> On Wed, Apr 28, 2021 at 1:04 PM Caizhi Weng 
 wrote:
>> After the investigation we found that this issue is caused by the
>> implementation of connector, not by the Flink framework.
>> 
>> Sorry for the false alarm.
>> 
>> Stephan Ewen  于2021年4月28日周三 下午3:23写道:
>> 
>>> @Caizhi and @Becket - let me reach out to you to jointly debug this
>> issue.
>>> I am wondering if there is some incorrect reporting of failed events?
>>> 
>>> On Wed, Apr 28, 2021 at 8:53 AM Caizhi Weng 
>> wrote:
 -1
 
 We're testing this version on batch jobs with large (600~1000)
 parallelisms
 and the following exception messages appear with high frequency:
 
 2021-04-27 21:27:26
 org.apache.flink.util.FlinkException: An OperatorEvent from an
 OperatorCoordinator to a task was lost. Triggering task failover to
 ensure consistency. Event: '[NoMoreSplitEvent]', targetTask:  -
 execution #0
 at org.apache.flink.runtime.operators.coordination.SubtaskGatewayImpl.lambda$sendEvent$0(SubtaskGatewayImpl.java:81)
 at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:822)
 at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:797)
 at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
 at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:440)
 at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:208)
 at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:77)
 at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:158)
 at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26)
 at akka.japi.pf.Un

Re: [VOTE] Release 1.13.0, release candidate #2

2021-04-29 Thread Xingbo Huang
+1 (non-binding)

- verified checksum and signature
- tested uploading `apache-flink` and `apache-flink-libraries` to test.pypi
- pip installed `apache-flink-libraries` and `apache-flink` on macOS
- started a cluster and ran the row-based operation test
- started a cluster and tested the Python general group window aggregation

Best,
Xingbo

Dian Fu  于2021年4月29日周四 下午4:05写道:

> +1 (binding)
>
> - Verified the signature and checksum
> - Installed PyFlink successfully using the source package
> - Run a few PyFlink examples: Python UDF, Pandas UDF, Python DataStream
> API with state access, Python DataStream API with batch execution mode
> - Reviewed the website PR
>
> Regards,
> Dian
>
> > 2021年4月29日 下午3:11,Jark Wu  写道:
> >
> > +1 (binding)
> >
> > - checked/verified signatures and hashes
> > - started cluster and run some e2e sql queries using SQL Client, results
> > are as expect:
> > * read from kafka source, window aggregate, lookup mysql database, write
> > into elasticsearch
> > * window aggregate using legacy window syntax and new window TVF
> > * verified web ui and log output
> > - reviewed the release PR
> >
> > I found the log contains some verbose information when using window
> > aggregate,
> > but I think this doesn't block the release, I created FLINK-22522 to fix
> > it.
> >
> > Best,
> > Jark
> >
> >
> > On Thu, 29 Apr 2021 at 14:46, Dawid Wysakowicz 
> > wrote:
> >
> >> Hey Matthias,
> >>
> >> I'd like to double confirm what Guowei said. The dependency is Apache 2
> >> licensed and we do not bundle it in our jar (as it is in the runtime
> >> scope) thus we do not need to mention it in the NOTICE file (btw, the
> >> best way to check what is bundled is to check the output of maven shade
> >> plugin). Thanks for checking it!
> >>
> >> Best,
> >>
> >> Dawid
> >>
> >> On 29/04/2021 05:25, Guowei Ma wrote:
> >>> Hi, Matthias
> >>>
> >>> Thank you very much for your careful inspection.
> >>> I check the flink-python_2.11-1.13.0.jar and we do not bundle
> >>> org.conscrypt:conscrypt-openjdk-uber:2.5.1 to it.
> >>> So I think we may not need to add this to the NOTICE file. (BTW The
> jar's
> >>> scope is runtime)
> >>>
> >>> Best,
> >>> Guowei
> >>>
> >>>
> >>> On Thu, Apr 29, 2021 at 2:33 AM Matthias Pohl 
> >>> wrote:
> >>>
>  Thanks Dawid and Guowei for managing this release.
> 
>  - downloaded the sources and binaries and checked the checksums
>  - built Flink from the downloaded sources
>  - executed example jobs with standalone deployments - I didn't find
>  anything suspicious in the logs
>  - reviewed release announcement pull request
> 
>  - I did a pass over dependency updates: git diff release-1.12.2
>  release-1.13.0-rc2 */*.xml
>  There's one thing someone should double-check whether that's suppose
> to
> >> be
>  like that: We added org.conscrypt:conscrypt-openjdk-uber:2.5.1 as a
>  dependency but I don't see it being reflected in the NOTICE file of
> the
>  flink-python module. Or is this automatically added later on?
> 
>  +1 (non-binding; please see remark on dependency above)
> 
>  Matthias
> 
>  On Wed, Apr 28, 2021 at 1:52 PM Stephan Ewen 
> wrote:
> 
> > Glad to hear that outcome. And no worries about the false alarm.
> > Thank you for doing thorough testing, this is very helpful!
> >
> > On Wed, Apr 28, 2021 at 1:04 PM Caizhi Weng 
>  wrote:
> >> After the investigation we found that this issue is caused by the
> >> implementation of connector, not by the Flink framework.
> >>
> >> Sorry for the false alarm.
> >>
> >> Stephan Ewen  于2021年4月28日周三 下午3:23写道:
> >>
> >>> @Caizhi and @Becket - let me reach out to you to jointly debug this
> >> issue.
> >>> I am wondering if there is some incorrect reporting of failed
> events?
> >>>
> >>> On Wed, Apr 28, 2021 at 8:53 AM Caizhi Weng 
> >> wrote:
>  -1
> 
>  We're testing this version on batch jobs with large (600~1000)
>  parallelisms
>  and the following exception messages appear with high frequency:
> 
>  2021-04-27 21:27:26
>  org.apache.flink.util.FlinkException: An OperatorEvent from an
>  OperatorCoordinator to a task was lost. Triggering task failover to
>  ensure consistency. Event: '[NoMoreSplitEvent]', targetTask:  -
>  execution #0
>  at org.apache.flink.runtime.operators.coordination.SubtaskGatewayImpl.lambda$sendEvent$0(SubtaskGatewayImpl.java:81)
>  at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:822)
>  at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:797)
>  at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
>  at

Re: [DISCUSS] Using timeouts in JUnit tests

2021-04-29 Thread Till Rohrmann
I think for the Maven tests we use this log4j.properties file [1].

[1] https://github.com/apache/flink/blob/master/tools/ci/log4j.properties

Cheers,
Till

On Wed, Apr 28, 2021 at 4:47 AM Dong Lin  wrote:

> Thanks for the detailed explanations! Regarding the usage of timeout, now I
> agree that it is better to remove per-test timeouts because it helps
> make our testing results more reliable and consistent.
>
> My previous concern is that it might not be a good idea to intentionally
> let the test hang in AZP in order to get the thread dump. Now I get that
> there are a few practical concerns around the usage of timeout which makes
> testing results unreliable (e.g. flakiness in the presence of VM
> migration).
>
> Regarding the level logging on AZP, it appears that we actually set
> "rootLogger.level = OFF" in most log4j2-test.properties, which means that
> no INFO log would be printed on AZP. For example, I tried to increase the
> log level in this  PR and was
> suggested in this
> <
> https://issues.apache.org/jira/browse/FLINK-22085?focusedCommentId=17321055&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17321055
> >
> comment to avoid increasing the log level. Did I miss something here?
>
>
> On Wed, Apr 28, 2021 at 2:22 AM Arvid Heise  wrote:
>
> > Just to add to Dong Lin's list of cons of allowing timeout:
> > - Any timeout value that you manually set is arbitrary. If it's set too
> > low, you get test instabilities. What too low means depends on numerous
> > factors, such as hardware and current utilization (especially I/O). If
> you
> > run in VMs and the VM is migrated while running a test, any reasonable
> > timeout will probably fail. While you could make a similar case for the
> > overall timeout of tests, any smaller hiccup in the range of minutes will
> > not impact the overall runtime much. The probability of having a VM
> > constantly migrating during the same stage is abysmally low.
> > - A timeout is more maintenance-intensive. It's one more knob where you
> can
> > tweak a build or not. If you change the test a bit, you also need to
> > double-check the timeout. Hence, there have been quite a few commits that
> > just increase timeouts.
> > - Whether a test uses a timeout or not is arbitrary: Why do some ITs
> have a
> > timeout and others don't? All IT tests are prone to timeout if there are
> > issues with resource allocation. Similarly, there are quite a few unit
> > tests with timeouts while others don't have them with no obvious pattern.
> > - An ill-set timeout reduces build reproducibility. Imagine having a
> > release with such a timeout and the users cannot build Flink reliably.
> >
> > I'd like to also point out that we should not cater around unstable tests
> > if our overall goal is to have as many green builds as possible. If we
> > assume that our builds fail more often than not, we should also look into
> > the other direction and continue the builds on error. I'm not a big fan
> of
> > that.
> >
> > One argument that I also heard is that it eases local debugging in case
> of
> > refactorings as you can see multiple failures at the same time. But no
> one
> > is keeping you from temporarily adding a timeout on your branch. Then, we
> > can be sure that the timeout is plausible for your hardware and avoid all
> > above mentioned drawbacks.
> >
> > @Robert Metzger 
> >
> > > If we had a global limit of 1 minute per test, we would have caught
> this
> > > case (and we would encourage people to be careful with CI time).
> > >
> > There are quite a few tests that run longer, especially on a well
> utilized
> > build machine. A global limit is even worse than individual limits as
> there
> > is no value that fits it all. If you screwed up and 200 tests hang, you'd
> > also run into the global timeout anyway. I'm also not sure what these
> > additional hangs bring you except a huge log.
> >
> > I'm also not sure if it's really better in terms of CI time. For example,
> > for UnalignedCheckpointRescaleITCase, we test all known partitioners in
> one
> > pipeline for correctness. For higher parallelism, that means the test
> runs
> > over 1 minute regularly. If there is a global limit, I'd need to split
> the
> > test into smaller chunks, where I'm positive that the sum of the chunks
> > will be larger than before.
> >
> > PS: all tests on AZP will print INFO in the artifacts. There you can also
> > retrieve the stacktraces.
> > PPS: I also said that we should revalidate the current timeout on AZP. So
> > the argument that we have >2h of precious CI time wasted is kind of
> > constructed and is just due to some random defaults.
> >
> > On Tue, Apr 27, 2021 at 6:42 PM Till Rohrmann 
> > wrote:
> >
> > > I think we do capture the INFO logs of the test runs on AZP.
> > >
> > > I am also not sure whether we really caught slow tests with Junit's
> > timeout
> > > rule before. I think the default is usually to

Re: [VOTE] Release 1.13.0, release candidate #2

2021-04-29 Thread Leonard Xu
+1 (non-binding)

- verified signatures and hashes
- building from source code with Scala 2.11 succeeded
- started a cluster, WebUI was accessible, ran some simple SQL jobs, no 
suspicious log output
- tested time functions and time zone usage in SQL Client, the query result is 
as expected
- the web PR looks good
- found one minor exception message typo, will improve it later

Best,
Leonard Xu

> 在 2021年4月29日,16:11,Xingbo Huang  写道:
> 
> +1 (non-binding)
> 
> - verified checksum and signature
> - test upload `apache-flink` and `apache-flink-libraries` to test.pypi
> - pip install `apache-flink-libraries` and `apache-flink` in mac os
> - started cluster and run row-based operation test
> - started cluster and test python general group window agg
> 
> Best,
> Xingbo
> 
> Dian Fu  于2021年4月29日周四 下午4:05写道:
> 
>> +1 (binding)
>> 
>> - Verified the signature and checksum
>> - Installed PyFlink successfully using the source package
>> - Run a few PyFlink examples: Python UDF, Pandas UDF, Python DataStream
>> API with state access, Python DataStream API with batch execution mode
>> - Reviewed the website PR
>> 
>> Regards,
>> Dian
>> 
>>> 2021年4月29日 下午3:11,Jark Wu  写道:
>>> 
>>> +1 (binding)
>>> 
>>> - checked/verified signatures and hashes
>>> - started cluster and run some e2e sql queries using SQL Client, results
>>> are as expect:
>>> * read from kafka source, window aggregate, lookup mysql database, write
>>> into elasticsearch
>>> * window aggregate using legacy window syntax and new window TVF
>>> * verified web ui and log output
>>> - reviewed the release PR
>>> 
>>> I found the log contains some verbose information when using window
>>> aggregate,
>>> but I think this doesn't block the release, I created FLINK-22522 to fix
>>> it.
>>> 
>>> Best,
>>> Jark
>>> 
>>> 
>>> On Thu, 29 Apr 2021 at 14:46, Dawid Wysakowicz 
>>> wrote:
>>> 
 Hey Matthias,
 
 I'd like to double confirm what Guowei said. The dependency is Apache 2
 licensed and we do not bundle it in our jar (as it is in the runtime
 scope) thus we do not need to mention it in the NOTICE file (btw, the
 best way to check what is bundled is to check the output of maven shade
 plugin). Thanks for checking it!
 
 Best,
 
 Dawid
 
 On 29/04/2021 05:25, Guowei Ma wrote:
> Hi, Matthias
> 
> Thank you very much for your careful inspection.
> I check the flink-python_2.11-1.13.0.jar and we do not bundle
> org.conscrypt:conscrypt-openjdk-uber:2.5.1 to it.
> So I think we may not need to add this to the NOTICE file. (BTW The
>> jar's
> scope is runtime)
> 
> Best,
> Guowei
> 
> 
> On Thu, Apr 29, 2021 at 2:33 AM Matthias Pohl 
> wrote:
> 
>> Thanks Dawid and Guowei for managing this release.
>> 
>> - downloaded the sources and binaries and checked the checksums
>> - built Flink from the downloaded sources
>> - executed example jobs with standalone deployments - I didn't find
>> anything suspicious in the logs
>> - reviewed release announcement pull request
>> 
>> - I did a pass over dependency updates: git diff release-1.12.2
>> release-1.13.0-rc2 */*.xml
>> There's one thing someone should double-check whether that's suppose
>> to
 be
>> like that: We added org.conscrypt:conscrypt-openjdk-uber:2.5.1 as a
>> dependency but I don't see it being reflected in the NOTICE file of
>> the
>> flink-python module. Or is this automatically added later on?
>> 
>> +1 (non-binding; please see remark on dependency above)
>> 
>> Matthias
>> 
>> On Wed, Apr 28, 2021 at 1:52 PM Stephan Ewen 
>> wrote:
>> 
>>> Glad to hear that outcome. And no worries about the false alarm.
>>> Thank you for doing thorough testing, this is very helpful!
>>> 
>>> On Wed, Apr 28, 2021 at 1:04 PM Caizhi Weng 
>> wrote:
 After the investigation we found that this issue is caused by the
 implementation of connector, not by the Flink framework.
 
 Sorry for the false alarm.
 
 Stephan Ewen  于2021年4月28日周三 下午3:23写道:
 
> @Caizhi and @Becket - let me reach out to you to jointly debug this
 issue.
> I am wondering if there is some incorrect reporting of failed
>> events?
> 
> On Wed, Apr 28, 2021 at 8:53 AM Caizhi Weng 
 wrote:
>> -1
>> 
>> We're testing this version on batch jobs with large (600~1000)
>> parallelisms
>> and the following exception messages appear with high frequency:
>> 
>> 2021-04-27 21:27:26
>> org.apache.flink.util.FlinkException: An OperatorEvent from an
>> OperatorCoordinator to a task was lost. Triggering task failover to
>> ensure consistency. Event: '[NoMoreSplitEvent]', targetTask:  -
>> execution #0
>> at org.apach

Re: Does flink support multi cluster deployment with HA on multi k8s clusters

2021-04-29 Thread Till Rohrmann
Hi Bhagi,

out of the box, Flink does not support this functionality. What you could
try is to configure a stretched K8s cluster spanning the different DCs,
with an HA service (ZK or K8s) and blob storage that can survive a DC
outage. That way, Flink should also be able to survive a DC outage.

Cheers,
Till

On Wed, Apr 28, 2021 at 3:26 PM bhagi@R  wrote:

> Hi Team,
>
> production requirement is to deploy flink in multi cluster mode,
> i.e deploying flink cluster1 with HA on kubernetes cluster1 in data center1
> & another flink cluster2  with HA  on kubernetes cluster2 in data center2
> ..
> if Flink cluster1 goes down on k8s cluster1 on DC1 ,it has to fail over to
> Flink cluster2 on k8s cluster2 on DC2.
>
> It has to failover with automatic HA mechanism.
> Please let us know, whether is this possible ??
> or any solution is provided to have flink in multi cluster mode with HA ..
>
>
> Any solution please share the information
>
> Note: Currently deployed Flink session cluster on standalone kubernetes
> with
> HA(Kubernetes HA)
>
>
>
> --
> Sent from: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/
>


[jira] [Created] (FLINK-22525) The zone id in exception message should be GMT+08:00 instead of GMT+8:00

2021-04-29 Thread Leonard Xu (Jira)
Leonard Xu created FLINK-22525:
--

 Summary: The zone id in exception message should be GMT+08:00 
instead of GMT+8:00
 Key: FLINK-22525
 URL: https://issues.apache.org/jira/browse/FLINK-22525
 Project: Flink
  Issue Type: Bug
  Components: Documentation, Table SQL / API
Affects Versions: 1.13.0
Reporter: Leonard Xu
 Fix For: 1.14.0, 1.13.1


{code:java}
Flink SQL> SET table.local-time-zone=UTC+3;
Flink SQL> select current_row_timestamp();
[ERROR] Could not execute SQL statement. Reason:
java.lang.IllegalArgumentException: The supported Zone ID is either a full name 
such as 'America/Los_Angeles', or a custom timezone id such as 'GMT-8:00', but 
configured Zone ID is 'UTC+3'.
{code}
The valid zone ID here would be 'GMT-08:00'.
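
As a minimal illustration (plain java.time, nothing Flink-specific; the Table 
API path would go through {{TableConfig#setLocalTimeZone}}), these are the two 
id forms the message should point users to:

{code:java}
import java.time.ZoneId;

public class ZoneIdForms {
    public static void main(String[] args) {
        // full region name
        System.out.println(ZoneId.of("America/Los_Angeles"));
        // custom id with a two-digit hour, e.g. 'GMT+08:00' / 'GMT-08:00'
        System.out.println(ZoneId.of("GMT+08:00"));
    }
}
{code}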



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


RE: [VOTE] Release 1.13.0, release candidate #2

2021-04-29 Thread Zhou, Brian
+1 (non-binding)

- Built the Pravega Flink connector with the RC artifacts; all tests pass
- Started a cluster and ran Pravega reader and writer applications on it successfully 

Thanks,
Brian

-Original Message-
From: Leonard Xu  
Sent: Thursday, April 29, 2021 16:53
To: dev
Subject: Re: [VOTE] Release 1.13.0, release candidate #2



+1 (non-binding)

- verified signatures and hashes
- built from source code with scala 2.11 succeeded
- started a cluster, WebUI was accessible, ran some simple SQL jobs, no 
suspicious log output
- tested time functions and time zone usage in SQL Client, the query result is 
as expected
- the web PR looks good
- found one minor exception message typo, will improve it later

Best,
Leonard Xu

> 在 2021年4月29日,16:11,Xingbo Huang  写道:
> 
> +1 (non-binding)
> 
> - verified checksum and signature
> - test upload `apache-flink` and `apache-flink-libraries` to test.pypi
> - pip install `apache-flink-libraries` and `apache-flink` in mac os
> - started cluster and run row-based operation test
> - started cluster and test python general group window agg
> 
> Best,
> Xingbo
> 
> Dian Fu  于2021年4月29日周四 下午4:05写道:
> 
>> +1 (binding)
>> 
>> - Verified the signature and checksum
>> - Installed PyFlink successfully using the source package
>> - Run a few PyFlink examples: Python UDF, Pandas UDF, Python 
>> DataStream API with state access, Python DataStream API with batch 
>> execution mode
>> - Reviewed the website PR
>> 
>> Regards,
>> Dian
>> 
>>> 2021年4月29日 下午3:11,Jark Wu  写道:
>>> 
>>> +1 (binding)
>>> 
>>> - checked/verified signatures and hashes
>>> - started cluster and run some e2e sql queries using SQL Client, 
>>> results are as expect:
>>> * read from kafka source, window aggregate, lookup mysql database, 
>>> write into elasticsearch
>>> * window aggregate using legacy window syntax and new window TVF
>>> * verified web ui and log output
>>> - reviewed the release PR
>>> 
>>> I found the log contains some verbose information when using window 
>>> aggregate, but I think this doesn't block the release, I created 
>>> FLINK-22522 to fix it.
>>> 
>>> Best,
>>> Jark
>>> 
>>> 
>>> On Thu, 29 Apr 2021 at 14:46, Dawid Wysakowicz 
>>> 
>>> wrote:
>>> 
 Hey Matthias,
 
 I'd like to double confirm what Guowei said. The dependency is 
 Apache 2 licensed and we do not bundle it in our jar (as it is in 
 the runtime
 scope) thus we do not need to mention it in the NOTICE file (btw, 
 the best way to check what is bundled is to check the output of 
 maven shade plugin). Thanks for checking it!
 
 Best,
 
 Dawid
 
 On 29/04/2021 05:25, Guowei Ma wrote:
> Hi, Matthias
> 
> Thank you very much for your careful inspection.
> I check the flink-python_2.11-1.13.0.jar and we do not bundle
> org.conscrypt:conscrypt-openjdk-uber:2.5.1 to it.
> So I think we may not need to add this to the NOTICE file. (BTW 
> The
>> jar's
> scope is runtime)
> 
> Best,
> Guowei
> 
> 
> On Thu, Apr 29, 2021 at 2:33 AM Matthias Pohl 
> 
> wrote:
> 
>> Thanks Dawid and Guowei for managing this release.
>> 
>> - downloaded the sources and binaries and checked the checksums
>> - built Flink from the downloaded sources
>> - executed example jobs with standalone deployments - I didn't 
>> find anything suspicious in the logs
>> - reviewed release announcement pull request
>> 
>> - I did a pass over dependency updates: git diff release-1.12.2
>> release-1.13.0-rc2 */*.xml
>> There's one thing someone should double-check whether that's 
>> suppose
>> to
 be
>> like that: We added org.conscrypt:conscrypt-openjdk-uber:2.5.1 as 
>> a dependency but I don't see it being reflected in the NOTICE 
>> file of
>> the
>> flink-python module. Or is this automatically added later on?
>> 
>> +1 (non-binding; please see remark on dependency above)
>> 
>> Matthias
>> 
>> On Wed, Apr 28, 2021 at 1:52 PM Stephan Ewen 
>> wrote:
>> 
>>> Glad to hear that outcome. And no worries about the false alarm.
>>> Thank you for doing thorough testing, this is very helpful!
>>> 
>>> On Wed, Apr 28, 2021 at 1:04 PM Caizhi Weng 
>>> 
>> wrote:
 After the investigation we found that this issue is caused by 
 the implementation of connector, not by the Flink framework.
 
 Sorry for the false alarm.
 
 Stephan Ewen  于2021年4月28日周三 下午3:23写道:
 
> @Caizhi and @Becket - let me reach out to you to jointly debug 
> this
 issue.
> I am wondering if there is some incorrect reporting of failed
>> events?
> 
> On Wed, Apr 28, 2021 at 8:53 AM Caizhi Weng 
> 
 wrote:
>> -1
>> 
>> We're testing this version on batch jobs with large (600~1000)
>> parallelisms

[jira] [Created] (FLINK-22526) Rename TopicId to topic in Kafka related code

2021-04-29 Thread dengziming (Jira)
dengziming created FLINK-22526:
--

 Summary: Rename TopicId to topic in Kafka related code
 Key: FLINK-22526
 URL: https://issues.apache.org/jira/browse/FLINK-22526
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / Kafka
Reporter: dengziming


TopicId is a new concept introduced in Kafka 2.8.0; it can be used to 
produce/consume in future Kafka versions, so we should avoid using "topicId" 
when referring to a topic.

For more details: 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-516:+Topic+Identifiers



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Does flink support multi cluster deployment with HA on multi k8s clusters

2021-04-29 Thread bhagi@R
Could you explain in more detail how to configure the HA service with K8s
across different DCs for a Flink cluster?



--
Sent from: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/


Re: [VOTE] Release 1.13.0, release candidate #2

2021-04-29 Thread Yun Tang
+1 (non-binding)

- building from source code with Scala 2.11 succeeded
- submitted the state machine example and it ran well, with the expected commit 
id shown in the UI
- enabled state latency tracking with the slf4j metrics reporter and everything 
behaves as expected
- clicked 'FlameGraph' but found the web UI did not give a friendly hint to 
enable it via setting rest.flamegraph.enabled: true; will create an issue later

Best
Yun Tang

From: Leonard Xu 
Sent: Thursday, April 29, 2021 16:52
To: dev 
Subject: Re: [VOTE] Release 1.13.0, release candidate #2

+1 (non-binding)

- verified signatures and hashes
- built from source code with scala 2.11 succeeded
- started a cluster, WebUI was accessible, ran some simple SQL jobs, no 
suspicious log output
- tested time functions and time zone usage in SQL Client, the query result is 
as expected
- the web PR looks good
- found one minor exception message typo, will improve it later

Best,
Leonard Xu

> 在 2021年4月29日,16:11,Xingbo Huang  写道:
>
> +1 (non-binding)
>
> - verified checksum and signature
> - test upload `apache-flink` and `apache-flink-libraries` to test.pypi
> - pip install `apache-flink-libraries` and `apache-flink` in mac os
> - started cluster and run row-based operation test
> - started cluster and test python general group window agg
>
> Best,
> Xingbo
>
> Dian Fu  于2021年4月29日周四 下午4:05写道:
>
>> +1 (binding)
>>
>> - Verified the signature and checksum
>> - Installed PyFlink successfully using the source package
>> - Run a few PyFlink examples: Python UDF, Pandas UDF, Python DataStream
>> API with state access, Python DataStream API with batch execution mode
>> - Reviewed the website PR
>>
>> Regards,
>> Dian
>>
>>> 2021年4月29日 下午3:11,Jark Wu  写道:
>>>
>>> +1 (binding)
>>>
>>> - checked/verified signatures and hashes
>>> - started cluster and run some e2e sql queries using SQL Client, results
>>> are as expect:
>>> * read from kafka source, window aggregate, lookup mysql database, write
>>> into elasticsearch
>>> * window aggregate using legacy window syntax and new window TVF
>>> * verified web ui and log output
>>> - reviewed the release PR
>>>
>>> I found the log contains some verbose information when using window
>>> aggregate,
>>> but I think this doesn't block the release, I created FLINK-22522 to fix
>>> it.
>>>
>>> Best,
>>> Jark
>>>
>>>
>>> On Thu, 29 Apr 2021 at 14:46, Dawid Wysakowicz 
>>> wrote:
>>>
 Hey Matthias,

 I'd like to double confirm what Guowei said. The dependency is Apache 2
 licensed and we do not bundle it in our jar (as it is in the runtime
 scope) thus we do not need to mention it in the NOTICE file (btw, the
 best way to check what is bundled is to check the output of maven shade
 plugin). Thanks for checking it!

 Best,

 Dawid

 On 29/04/2021 05:25, Guowei Ma wrote:
> Hi, Matthias
>
> Thank you very much for your careful inspection.
> I check the flink-python_2.11-1.13.0.jar and we do not bundle
> org.conscrypt:conscrypt-openjdk-uber:2.5.1 to it.
> So I think we may not need to add this to the NOTICE file. (BTW The
>> jar's
> scope is runtime)
>
> Best,
> Guowei
>
>
> On Thu, Apr 29, 2021 at 2:33 AM Matthias Pohl 
> wrote:
>
>> Thanks Dawid and Guowei for managing this release.
>>
>> - downloaded the sources and binaries and checked the checksums
>> - built Flink from the downloaded sources
>> - executed example jobs with standalone deployments - I didn't find
>> anything suspicious in the logs
>> - reviewed release announcement pull request
>>
>> - I did a pass over dependency updates: git diff release-1.12.2
>> release-1.13.0-rc2 */*.xml
>> There's one thing someone should double-check whether that's suppose
>> to
 be
>> like that: We added org.conscrypt:conscrypt-openjdk-uber:2.5.1 as a
>> dependency but I don't see it being reflected in the NOTICE file of
>> the
>> flink-python module. Or is this automatically added later on?
>>
>> +1 (non-binding; please see remark on dependency above)
>>
>> Matthias
>>
>> On Wed, Apr 28, 2021 at 1:52 PM Stephan Ewen 
>> wrote:
>>
>>> Glad to hear that outcome. And no worries about the false alarm.
>>> Thank you for doing thorough testing, this is very helpful!
>>>
>>> On Wed, Apr 28, 2021 at 1:04 PM Caizhi Weng 
>> wrote:
 After the investigation we found that this issue is caused by the
 implementation of connector, not by the Flink framework.

 Sorry for the false alarm.

 Stephan Ewen  于2021年4月28日周三 下午3:23写道:

> @Caizhi and @Becket - let me reach out to you to jointly debug this
 issue.
> I am wondering if there is some incorrect reporting of failed
>> events?
>
> On Wed, Apr 28, 2021 at 8:53 AM Caizhi Weng 
 wrote:
>> -1
>>
>>

[jira] [Created] (FLINK-22527) Give friendly hint when click FlameGraph without rest.flamegraph.enabled

2021-04-29 Thread Yun Tang (Jira)
Yun Tang created FLINK-22527:


 Summary: Give friendly hint when click FlameGraph without 
rest.flamegraph.enabled
 Key: FLINK-22527
 URL: https://issues.apache.org/jira/browse/FLINK-22527
 Project: Flink
  Issue Type: Improvement
Affects Versions: 1.13.0
Reporter: Yun Tang
 Fix For: 1.13.1


Currently, if {{rest.flamegraph.enabled}} is not enabled, the web UI just 
tells the user "Unable to load requested file "; instead, it should give a 
friendlier hint to enable {{rest.flamegraph.enabled}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-22528) Document latency tracking metrics for state accesses

2021-04-29 Thread Till Rohrmann (Jira)
Till Rohrmann created FLINK-22528:
-

 Summary: Document latency tracking metrics for state accesses
 Key: FLINK-22528
 URL: https://issues.apache.org/jira/browse/FLINK-22528
 Project: Flink
  Issue Type: Improvement
  Components: Documentation, Runtime / State Backends
Affects Versions: 1.13.0
Reporter: Till Rohrmann
 Fix For: 1.13.0


With FLINK-21736 we introduced latency tracking metrics for state accesses. We 
should also document how users can enable them, what they report, and what 
their limitations are.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Does flink support multi cluster deployment with HA on multi k8s clusters

2021-04-29 Thread Enrique
Hi Till and Bhagi,

As part of our product, we also have a production requirement to deploy
Flink in multi zones and currently have it configured with Kubernetes HA.

Till, is the reason this is not supported out of the box also due to the
fact that Kubernetes HA relies on RWX storage, which is considered an
anti-pattern in the cloud native space? I've seen that in cloud native
deployments the recommended way is to replicate state through software
instead of through a shared filesystem/storage, which is a common pattern
in data centers. The cloud native approach results in each Flink instance
owning its own RWO filesystem and communicating over the network with
instances on other nodes to replicate state. I was just wondering if this
had been considered in the move to a more cloud-native Flink deployment?

Cheers,

Enrique





--
Sent from: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/


[jira] [Created] (FLINK-22529) StateFun Kinesis ingresses should support configs that are available via FlinkKinesisConsumer's ConsumerConfigConstants

2021-04-29 Thread Tzu-Li (Gordon) Tai (Jira)
Tzu-Li (Gordon) Tai created FLINK-22529:
---

 Summary: StateFun Kinesis ingresses should support configs that 
are available via FlinkKinesisConsumer's ConsumerConfigConstants
 Key: FLINK-22529
 URL: https://issues.apache.org/jira/browse/FLINK-22529
 Project: Flink
  Issue Type: Improvement
  Components: Stateful Functions
Reporter: Tzu-Li (Gordon) Tai


The Kinesis ingress should support the configs that are available in 
{{FlinkKinesisConsumer}}'s {{ConsumerConfigConstants}}. Currently, all 
property keys provided to the Kinesis ingress are assumed to be AWS-client 
related keys, and are therefore all appended with the `aws.clientconfigs` 
string.

I'd suggest avoiding mixing the {{ConsumerConfigConstants}} configs into the 
properties as well. Having named methods on the {{KinesisIngressBuilder}} for 
those configurations would provide a cleaner solution.
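
A hypothetical sketch of that named-method approach: the 
withShardDiscoveryInterval(...) call does not exist today and only illustrates 
how a {{ConsumerConfigConstants}} option could surface on the builder; the 
surrounding builder calls follow the existing StateFun Kinesis ingress API 
from memory.

{code:java}
// INGRESS_ID, User, and UserDeserializer are illustrative placeholders.
KinesisIngressSpec<User> kinesisIngress =
    KinesisIngressBuilder.forIdentifier(INGRESS_ID)
        .withAwsRegion("us-west-1")
        .withStream("user-events")
        .withDeserializer(UserDeserializer.class)
        // today: arbitrary properties, all treated as AWS client configs;
        // proposed: explicit, named consumer options instead, e.g.:
        .withShardDiscoveryInterval(Duration.ofSeconds(30)) // hypothetical
        .build();
{code}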



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-22530) RuntimeException after subsequent windowed grouping in TableAPI

2021-04-29 Thread Christopher Rost (Jira)
Christopher Rost created FLINK-22530:


 Summary: RuntimeException after subsequent windowed grouping in 
TableAPI
 Key: FLINK-22530
 URL: https://issues.apache.org/jira/browse/FLINK-22530
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Ecosystem, Table SQL / Runtime
Affects Versions: 1.12.0
Reporter: Christopher Rost


After applying the following using the Table API v1.12.0, an error is thrown:

{code:java}
java.lang.RuntimeException: Error while applying rule 
StreamExecGroupWindowAggregateRule(in:LOGICAL,out:STREAM_PHYSICAL), args 
[rel#505:FlinkLogicalWindowAggregate.LOGICAL.any.None: 
0.[NONE].[NONE](input=RelSubset#504,group={1},window=TumblingGroupWindow('w2, 
w1_rowtime, 1),properties=EXPR$1)]{code}
The code snippet to reproduce:

{code:java}
Table table2 = table1
  .window(Tumble.over(lit(10).seconds()).on($(EVENT_TIME)).as("w1"))
  .groupBy($(ID), $(LABEL), $("w1"))
  .select($(ID), $(LABEL), $("w1").rowtime().as("w1_rowtime"));

// table2.execute().print(); --> work well

Table table3 = table2
  .window(Tumble.over(lit(10).seconds()).on($("w1_rowtime")).as("w2"))
  .groupBy($(LABEL), $("w2"))
  .select(
$(LABEL).as("super_label"),
lit(1).count().as("super_count"),
$("w2").rowtime().as("w2_rowtime")
  );

// table3.execute().print(); //--> work well

   table3.select($("super_label"), $("w2_rowtime"))
  .execute().print(); // --> throws exception

{code}
It seems that the alias "w1_rowtime" is no longer available for further 
usage in table3, since the cause of the exception is:
{noformat}
Caused by: java.lang.IllegalArgumentException: field [w1_rowtime] not found; 
input fields are: [vertex_id, vertex_label, EXPR$0
{noformat}

The full exception:
{code:java}
java.lang.RuntimeException: Error while applying rule 
StreamExecGroupWindowAggregateRule(in:LOGICAL,out:STREAM_PHYSICAL), args 
[rel#197:FlinkLogicalWindowAggregate.LOGICAL.any.None: 
0.[NONE].[NONE](input=RelSubset#196,group={1},window=TumblingGroupWindow('w2, 
w1_rowtime, 1),properties=EXPR$1)]  at 
org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:256)
at 
org.apache.calcite.plan.volcano.IterativeRuleDriver.drive(IterativeRuleDriver.java:58)
at 
org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:510)
at 
org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:312)
at 
org.apache.flink.table.planner.plan.optimize.program.FlinkVolcanoProgram.optimize(FlinkVolcanoProgram.scala:64)
at 
org.apache.flink.table.planner.plan.optimize.program.FlinkChainedProgram$$anonfun$optimize$1.apply(FlinkChainedProgram.scala:62)
at 
org.apache.flink.table.planner.plan.optimize.program.FlinkChainedProgram$$anonfun$optimize$1.apply(FlinkChainedProgram.scala:58)
at 
scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
at 
scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
at scala.collection.Iterator$class.foreach(Iterator.scala:891)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at 
scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:157)
at scala.collection.AbstractTraversable.foldLeft(Traversable.scala:104)
at 
org.apache.flink.table.planner.plan.optimize.program.FlinkChainedProgram.optimize(FlinkChainedProgram.scala:57)
at 
org.apache.flink.table.planner.plan.optimize.StreamCommonSubGraphBasedOptimizer.optimizeTree(StreamCommonSubGraphBasedOptimizer.scala:163)
at 
org.apache.flink.table.planner.plan.optimize.StreamCommonSubGraphBasedOptimizer.doOptimize(StreamCommonSubGraphBasedOptimizer.scala:79)
at 
org.apache.flink.table.planner.plan.optimize.CommonSubGraphBasedOptimizer.optimize(CommonSubGraphBasedOptimizer.scala:77)
at 
org.apache.flink.table.planner.delegation.PlannerBase.optimize(PlannerBase.scala:286)
at 
org.apache.flink.table.planner.delegation.PlannerBase.translate(PlannerBase.scala:165)
at 
org.apache.flink.table.api.internal.TableEnvironmentImpl.translate(TableEnvironmentImpl.java:1267)
at 
org.apache.flink.table.api.internal.TableEnvironmentImpl.executeInternal(TableEnvironmentImpl.java:703)
at 
org.apache.flink.table.api.internal.TableImpl.execute(TableImpl.java:570)
at 
edu.leipzig.impl.algorithm.GraphStreamGroupingTest.testDoubleGrouping(GraphStreamGroupingTest.java:224)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
java.base/jdk.internal.reflect.Nat

[jira] [Created] (FLINK-22531) Improve the support for finite streaming jobs with async operations.

2021-04-29 Thread Igal Shilman (Jira)
Igal Shilman created FLINK-22531:


 Summary: Improve the support for finite streaming jobs with async 
operations.
 Key: FLINK-22531
 URL: https://issues.apache.org/jira/browse/FLINK-22531
 Project: Flink
  Issue Type: Improvement
  Components: Stateful Functions
Affects Versions: statefun-3.0.0
Reporter: Igal Shilman


Finite streaming jobs will terminate even in the presence of asynchronous 
in-flight operations.

Looking at the AsyncWait operator, it seems that this can be mitigated by 
implementing the following interface:

[https://github.com/apache/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/operators/BoundedOneInput.java#L27]
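
A minimal sketch of the mitigation idea (not StateFun's actual operator): an 
operator implementing {{BoundedOneInput}} is notified when its finite input 
ends and can block in {{endInput()}} until all in-flight asynchronous 
operations complete. The drainInFlightOperations() helper is hypothetical.

{code:java}
import org.apache.flink.streaming.api.operators.AbstractStreamOperator;
import org.apache.flink.streaming.api.operators.BoundedOneInput;
import org.apache.flink.streaming.api.operators.OneInputStreamOperator;
import org.apache.flink.streaming.runtime.streamrecord.StreamRecord;

public class AsyncDispatchOperator extends AbstractStreamOperator<String>
        implements OneInputStreamOperator<String, String>, BoundedOneInput {

    @Override
    public void processElement(StreamRecord<String> element) throws Exception {
        // dispatch asynchronous work for the element ...
    }

    @Override
    public void endInput() throws Exception {
        // Called after the last element, but before close():
        // wait here so the job cannot terminate with pending async results.
        drainInFlightOperations(); // hypothetical helper
    }

    private void drainInFlightOperations() {
        // block until all pending asynchronous operations have finished
    }
}
{code}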

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [VOTE] Release 1.13.0, release candidate #2

2021-04-29 Thread Robert Metzger
Thanks for creating the RC and managing the release process so far Guowei
and Dawid!

+1 (binding)

Checks:
- I deployed the RC on AWS EMR (session cluster and per-job cluster). I
confirmed a minor issue Arvid Heise told me offline about:
https://issues.apache.org/jira/browse/FLINK-22509. I believe we can ignore
this issue.
- I tested reactive mode extensively on Kubernetes, letting it scale up and
down for a very long time (multiple weeks)
- Checked the changes to the pom files: all dependency changes seem to be
reflected properly in the NOTICE files
  - netty bump to 4.1.46
  - Elasticsearch to 1.15.1
  - hbase dependency changes
  - the new aws glue schema doesn't deploy anything foreign to maven
  - flink-sql-connector-hbase-1.4 excludes fewer hbase classes, but seems
fine
- the license checker has not been changed in this release cycle (there are
two exclusion lists in there)





On Thu, Apr 29, 2021 at 12:05 PM Yun Tang  wrote:

> +1 (non-binding)
>
> - built from source code with scala 2.11 succeeded
> - submit state machine example and it runed well with expected commit id
> shown in UI.
> - enable state latency tracking with slf4j metrics reporter and all
> behaves as expected.
> - Click 'FlameGraph' but found the we UI did not give friendly hint to
> tell me enable it via setting rest.flamegraph.enabled: true, will create
> issue later.
>
> Best
> Yun Tang
> 
> From: Leonard Xu 
> Sent: Thursday, April 29, 2021 16:52
> To: dev 
> Subject: Re: [VOTE] Release 1.13.0, release candidate #2
>
> +1 (non-binding)
>
> - verified signatures and hashes
> - built from source code with scala 2.11 succeeded
> - started a cluster, WebUI was accessible, ran some simple SQL jobs, no
> suspicious log output
> - tested time functions and time zone usage in SQL Client, the query
> result is as expected
> - the web PR looks good
> - found one minor exception message typo, will improve it later
>
> Best,
> Leonard Xu
>
> > 在 2021年4月29日,16:11,Xingbo Huang  写道:
> >
> > +1 (non-binding)
> >
> > - verified checksum and signature
> > - test upload `apache-flink` and `apache-flink-libraries` to test.pypi
> > - pip install `apache-flink-libraries` and `apache-flink` in mac os
> > - started cluster and run row-based operation test
> > - started cluster and test python general group window agg
> >
> > Best,
> > Xingbo
> >
> > Dian Fu  于2021年4月29日周四 下午4:05写道:
> >
> >> +1 (binding)
> >>
> >> - Verified the signature and checksum
> >> - Installed PyFlink successfully using the source package
> >> - Run a few PyFlink examples: Python UDF, Pandas UDF, Python DataStream
> >> API with state access, Python DataStream API with batch execution mode
> >> - Reviewed the website PR
> >>
> >> Regards,
> >> Dian
> >>
> >>> 2021年4月29日 下午3:11,Jark Wu  写道:
> >>>
> >>> +1 (binding)
> >>>
> >>> - checked/verified signatures and hashes
> >>> - started cluster and run some e2e sql queries using SQL Client,
> results
> >>> are as expect:
> >>> * read from kafka source, window aggregate, lookup mysql database,
> write
> >>> into elasticsearch
> >>> * window aggregate using legacy window syntax and new window TVF
> >>> * verified web ui and log output
> >>> - reviewed the release PR
> >>>
> >>> I found the log contains some verbose information when using window
> >>> aggregate,
> >>> but I think this doesn't block the release, I created FLINK-22522 to
> fix
> >>> it.
> >>>
> >>> Best,
> >>> Jark
> >>>
> >>>
> >>> On Thu, 29 Apr 2021 at 14:46, Dawid Wysakowicz  >
> >>> wrote:
> >>>
>  Hey Matthias,
> 
>  I'd like to double confirm what Guowei said. The dependency is Apache
> 2
>  licensed and we do not bundle it in our jar (as it is in the runtime
>  scope) thus we do not need to mention it in the NOTICE file (btw, the
>  best way to check what is bundled is to check the output of maven
> shade
>  plugin). Thanks for checking it!
> 
>  Best,
> 
>  Dawid
> 
>  On 29/04/2021 05:25, Guowei Ma wrote:
> > Hi, Matthias
> >
> > Thank you very much for your careful inspection.
> > I check the flink-python_2.11-1.13.0.jar and we do not bundle
> > org.conscrypt:conscrypt-openjdk-uber:2.5.1 to it.
> > So I think we may not need to add this to the NOTICE file. (BTW The
> >> jar's
> > scope is runtime)
> >
> > Best,
> > Guowei
> >
> >
> > On Thu, Apr 29, 2021 at 2:33 AM Matthias Pohl <
> matth...@ververica.com>
> > wrote:
> >
> >> Thanks Dawid and Guowei for managing this release.
> >>
> >> - downloaded the sources and binaries and checked the checksums
> >> - built Flink from the downloaded sources
> >> - executed example jobs with standalone deployments - I didn't find
> >> anything suspicious in the logs
> >> - reviewed release announcement pull request
> >>
> >> - I did a pass over dependency updates: git diff release-1.12.2
> >> release-1.13.0-rc2 */*.xml
> >> There's one th

[jira] [Created] (FLINK-22532) Improve the support for remote functions in the DataStream integration

2021-04-29 Thread Igal Shilman (Jira)
Igal Shilman created FLINK-22532:


 Summary: Improve the support for remote functions in the 
DataStream integration 
 Key: FLINK-22532
 URL: https://issues.apache.org/jira/browse/FLINK-22532
 Project: Flink
  Issue Type: Improvement
  Components: Stateful Functions
Affects Versions: statefun-3.0.0
Reporter: Igal Shilman


While looking at 
[RoutableMessageBuilder.java|https://github.com/apache/flink-statefun/blob/master/statefun-flink/statefun-flink-core/src/main/java/org/apache/flink/statefun/flink/core/message/RoutableMessageBuilder.java#L57]
it is not that clear that the argument for a remote function needs to be of 
type TypedValue.

We need to think about how to improve the end-user experience for this use case.
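
A minimal sketch of what the builder expects today (builder and TypedValue 
calls reproduced from memory of the linked class; the type name, address, and 
serializedUser bytes are made-up placeholders):

{code:java}
// wrap the raw payload manually -- this is the non-obvious step
TypedValue body =
    TypedValue.newBuilder()
        .setTypename("com.example/User")
        .setHasValue(true)
        .setValue(ByteString.copyFrom(serializedUser))
        .build();

RoutableMessage message =
    RoutableMessageBuilder.builder()
        .withTargetAddress(new FunctionType("com.example", "greeter"), "user-1")
        .withMessageBody(body)
        .build();
{code}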

 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [VOTE] Release 1.13.0, release candidate #2

2021-04-29 Thread Dawid Wysakowicz
Thank you all for helping to verify the release. Really appreciated! I
will conclude the vote in a separate thread.

Best,

Dawid

On 29/04/2021 14:16, Robert Metzger wrote:
> Thanks for creating the RC and managing the release process so far Guowei
> and Dawid!
>
> +1 (binding)
>
> Checks:
> - I deployed the RC on AWS EMR (session cluster and per-job cluster). I
> confirmed a minor issue Arvid Heise told me offline about:
> https://issues.apache.org/jira/browse/FLINK-22509. I believe we can ignore
> this issue.
> - I tested reactive mode extensively on Kubernetes, letting it scale up and
> down for a very long time (multiple weeks)
> - Checked the changes to the pom files: all dependency changes seem to be
> reflected properly in the NOTICE files
>   - netty bump to 4.1.46
>   - Elasticsearch to 1.15.1
>   - hbase dependency changes
>   - the new aws glue schema doesn't deploy anything foreign to maven
>   -flink-sql-connector-hbase-1.4 excludes fewer hbase classes, but seems
> fine
> - the license checker has not been changed in this release cycle (there are
> two exclusion lists in there)
>
>
>
>
>
> On Thu, Apr 29, 2021 at 12:05 PM Yun Tang  wrote:
>
>> +1 (non-binding)
>>
>> - built from source code with scala 2.11 succeeded
>> - submit state machine example and it runed well with expected commit id
>> shown in UI.
>> - enable state latency tracking with slf4j metrics reporter and all
>> behaves as expected.
>> - Click 'FlameGraph' but found the we UI did not give friendly hint to
>> tell me enable it via setting rest.flamegraph.enabled: true, will create
>> issue later.
>>
>> Best
>> Yun Tang
>> 
>> From: Leonard Xu 
>> Sent: Thursday, April 29, 2021 16:52
>> To: dev 
>> Subject: Re: [VOTE] Release 1.13.0, release candidate #2
>>
>> +1 (non-binding)
>>
>> - verified signatures and hashes
>> - built from source code with scala 2.11 succeeded
>> - started a cluster, WebUI was accessible, ran some simple SQL jobs, no
>> suspicious log output
>> - tested time functions and time zone usage in SQL Client, the query
>> result is as expected
>> - the web PR looks good
>> - found one minor exception message typo, will improve it later
>>
>> Best,
>> Leonard Xu
>>
>>> 在 2021年4月29日,16:11,Xingbo Huang  写道:
>>>
>>> +1 (non-binding)
>>>
>>> - verified checksum and signature
>>> - test upload `apache-flink` and `apache-flink-libraries` to test.pypi
>>> - pip install `apache-flink-libraries` and `apache-flink` in mac os
>>> - started cluster and run row-based operation test
>>> - started cluster and test python general group window agg
>>>
>>> Best,
>>> Xingbo
>>>
>>> Dian Fu  于2021年4月29日周四 下午4:05写道:
>>>
 +1 (binding)

 - Verified the signature and checksum
 - Installed PyFlink successfully using the source package
 - Run a few PyFlink examples: Python UDF, Pandas UDF, Python DataStream
 API with state access, Python DataStream API with batch execution mode
 - Reviewed the website PR

 Regards,
 Dian

> 2021年4月29日 下午3:11,Jark Wu  写道:
>
> +1 (binding)
>
> - checked/verified signatures and hashes
> - started cluster and run some e2e sql queries using SQL Client,
>> results
> are as expect:
> * read from kafka source, window aggregate, lookup mysql database,
>> write
> into elasticsearch
> * window aggregate using legacy window syntax and new window TVF
> * verified web ui and log output
> - reviewed the release PR
>
> I found the log contains some verbose information when using window
> aggregate,
> but I think this doesn't block the release, I created FLINK-22522 to
>> fix
> it.
>
> Best,
> Jark
>
>
> On Thu, 29 Apr 2021 at 14:46, Dawid Wysakowicz  wrote:
>
>> Hey Matthias,
>>
>> I'd like to double confirm what Guowei said. The dependency is Apache
>> 2
>> licensed and we do not bundle it in our jar (as it is in the runtime
>> scope) thus we do not need to mention it in the NOTICE file (btw, the
>> best way to check what is bundled is to check the output of maven
>> shade
>> plugin). Thanks for checking it!
>>
>> Best,
>>
>> Dawid
>>
>> On 29/04/2021 05:25, Guowei Ma wrote:
>>> Hi, Matthias
>>>
>>> Thank you very much for your careful inspection.
>>> I check the flink-python_2.11-1.13.0.jar and we do not bundle
>>> org.conscrypt:conscrypt-openjdk-uber:2.5.1 to it.
>>> So I think we may not need to add this to the NOTICE file. (BTW The
 jar's
>>> scope is runtime)
>>>
>>> Best,
>>> Guowei
>>>
>>>
>>> On Thu, Apr 29, 2021 at 2:33 AM Matthias Pohl <
>> matth...@ververica.com>
>>> wrote:
>>>
 Thanks Dawid and Guowei for managing this release.

 - downloaded the sources and binaries and checked the checksums
 - built Flink from the downloaded sources
 - executed example jobs with standalone deplo

Re: [VOTE] Release 1.13.0, release candidate #2

2021-04-29 Thread David Anderson
+1 (non-binding)

Checks:
- I built from source, successfully.
- I tested the new backpressure metrics and UI. I found one non-critical
bug that's been around for years, and for which a fix has already been
merged for 1.13.1 (https://issues.apache.org/jira/browse/FLINK-22489).
- I tested flame graphs.

On Thu, Apr 29, 2021 at 2:17 PM Robert Metzger  wrote:

> Thanks for creating the RC and managing the release process so far Guowei
> and Dawid!
>
> +1 (binding)
>
> Checks:
> - I deployed the RC on AWS EMR (session cluster and per-job cluster). I
> confirmed a minor issue Arvid Heise told me offline about:
> https://issues.apache.org/jira/browse/FLINK-22509. I believe we can ignore
> this issue.
> - I tested reactive mode extensively on Kubernetes, letting it scale up and
> down for a very long time (multiple weeks)
> - Checked the changes to the pom files: all dependency changes seem to be
> reflected properly in the NOTICE files
>   - netty bump to 4.1.46
>   - Elasticsearch to 1.15.1
>   - hbase dependency changes
>   - the new aws glue schema doesn't deploy anything foreign to maven
>   - flink-sql-connector-hbase-1.4 excludes fewer hbase classes, but seems
> fine
> - the license checker has not been changed in this release cycle (there are
> two exclusion lists in there)
>
> On Thu, Apr 29, 2021 at 12:05 PM Yun Tang  wrote:
>
> > +1 (non-binding)
> >
> > - built from source code with scala 2.11 succeeded
> > - submitted the state machine example and it ran well, with the expected
> > commit id shown in the UI
> > - enabled state latency tracking with the slf4j metrics reporter and all
> > behaves as expected (see the sketch below)
> > - clicked 'FlameGraph' but found the web UI did not give a friendly hint
> > to enable it via setting rest.flamegraph.enabled: true; will create an
> > issue later
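A minimal sketch of wiring these two checks together locally. The option key
state.backend.latency-track.keyed-state-enabled and the slf4j reporter
factory are assumptions drawn from the 1.13 docs, and flink-metrics-slf4j
must be on the classpath; the job is illustrative boilerplate.

import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class LatencyTrackingCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Sample keyed-state accesses and expose latency metrics (1.13 option).
        conf.setString("state.backend.latency-track.keyed-state-enabled", "true");
        // Report all metrics to the log via the slf4j reporter.
        conf.setString("metrics.reporter.slf4j.factory.class",
                "org.apache.flink.metrics.slf4j.Slf4jReporterFactory");
        conf.setString("metrics.reporter.slf4j.interval", "30 SECONDS");

        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment(conf);

        // A small keyed job so there is keyed state whose access latency
        // can actually be tracked.
        env.fromSequence(0, 1_000_000)
           .keyBy(i -> i % 10)
           .reduce(Long::sum)
           .print();

        env.execute("latency-tracking-check");
    }
}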
> >
> > Best
> > Yun Tang
> > 
> > From: Leonard Xu 
> > Sent: Thursday, April 29, 2021 16:52
> > To: dev 
> > Subject: Re: [VOTE] Release 1.13.0, release candidate #2
> >
> > +1 (non-binding)
> >
> > - verified signatures and hashes
> > - built from source code with scala 2.11 succeeded
> > - started a cluster, WebUI was accessible, ran some simple SQL jobs, no
> > suspicious log output
> > - tested time functions and time zone usage in SQL Client, the query
> > result is as expected
> > - the web PR looks good
> > - found one minor exception message typo, will improve it later
> >
> > Best,
> > Leonard Xu
> >
> > > On 29 Apr 2021, at 16:11, Xingbo Huang  wrote:
> > >
> > > +1 (non-binding)
> > >
> > > - verified checksum and signature
> > > - tested uploading `apache-flink` and `apache-flink-libraries` to test.pypi
> > > - pip installed `apache-flink-libraries` and `apache-flink` on macOS
> > > - started cluster and ran the row-based operation tests
> > > - started cluster and tested the Python general group window agg
> > >
> > > Best,
> > > Xingbo
> > >
> > > Dian Fu  wrote on Thursday, 29 Apr 2021 at 16:05:
> > >
> > >> +1 (binding)
> > >>
> > >> - Verified the signature and checksum
> > >> - Installed PyFlink successfully using the source package
> > >> - Run a few PyFlink examples: Python UDF, Pandas UDF, Python
> DataStream
> > >> API with state access, Python DataStream API with batch execution mode
> > >> - Reviewed the website PR
> > >>
> > >> Regards,
> > >> Dian
> > >>

[jira] [Created] (FLINK-22533) Allow creating custom metrics

2021-04-29 Thread Igal Shilman (Jira)
Igal Shilman created FLINK-22533:


 Summary: Allow creating custom metrics
 Key: FLINK-22533
 URL: https://issues.apache.org/jira/browse/FLINK-22533
 Project: Flink
  Issue Type: Improvement
  Components: Stateful Functions
Reporter: Igal Shilman


Currently it is not possible to create custom metrics in StateFun.

Let us consider supporting these.

Mailing list thread:
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Custom-metrics-in-Stateful-Functions-td43282.html
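For context, Flink jobs can already register custom metrics through the
runtime context, so a StateFun-level API would presumably wrap something like
the sketch below. This uses Flink's documented metrics API; it is not an
existing StateFun API.

import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.metrics.Counter;

public class CountingMapper extends RichMapFunction<String, String> {
    private transient Counter invocations;

    @Override
    public void open(Configuration parameters) {
        // Registered metrics are picked up by whatever reporter the
        // cluster is configured with (slf4j, JMX, Prometheus, ...).
        invocations = getRuntimeContext().getMetricGroup().counter("invocations");
    }

    @Override
    public String map(String value) {
        invocations.inc();
        return value;
    }
}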



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[RESULT][VOTE] Release 1.13.0, release candidate #2

2021-04-29 Thread Dawid Wysakowicz
I am happy to announce that we have approved the 1.13.0 release. Thank
you all for making it happen!

There were 12 approving votes, 5 of which were binding:

  * Piotr Nowojski (binding)
  * Zhu Zhu (binding)
  * Jark Wu (binding)
  * Dian Fu (binding)
  * Robert Metzger (binding)
  * Xintong Song
  * Matthias Pohl
  * Xingbo Huang
  * Leonard Xu
  * Brian Zhou
  * Yun Tang
  * David Anderson

There were no disapproving votes.

Thanks again!





[jira] [Created] (FLINK-22534) Set delegation token's service name as credential alias

2021-04-29 Thread Junfan Zhang (Jira)
Junfan Zhang created FLINK-22534:


 Summary: Set delegation token's service name as credential alias
 Key: FLINK-22534
 URL: https://issues.apache.org/jira/browse/FLINK-22534
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / Hadoop Compatibility
Reporter: Junfan Zhang






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[ANNOUNCE] Apache Flink 1.12.3 released

2021-04-29 Thread Arvid Heise
Dear all,

The Apache Flink community is very happy to announce the release of Apache
Flink 1.12.3, which is the third bugfix release for the Apache Flink 1.12
series.

Apache Flink® is an open-source stream processing framework for
distributed, high-performing, always-available, and accurate data streaming
applications.

The release is available for download at:
https://flink.apache.org/downloads.html

Please check out the release blog post for an overview of the improvements
for this bugfix release:
https://flink.apache.org/news/2021/04/29/release-1.12.3.html

The full release notes are available in Jira:
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12349691

We would like to thank all contributors of the Apache Flink community who
made this release possible!

Regards,

Your friendly release manager Arvid


[jira] [Created] (FLINK-22535) Resource leak would happen if exception thrown during AbstractInvokable#restore of task life

2021-04-29 Thread Yun Tang (Jira)
Yun Tang created FLINK-22535:


 Summary: Resource leak would happen if exception thrown during 
AbstractInvokable#restore of task life
 Key: FLINK-22535
 URL: https://issues.apache.org/jira/browse/FLINK-22535
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Task
Affects Versions: 1.13.0
Reporter: Yun Tang
 Fix For: 1.13.1


FLINK-17012 introduced a new initialization phase, 
{{AbstractInvokable#restore}}. However, if 
[invokable.restore()|https://github.com/apache/flink/blob/79a521e08df550d96f97bb6915191d8496bb29ea/flink-runtime/src/main/java/org/apache/flink/runtime/taskmanager/Task.java#L754-L759]
 throws an exception, {{StreamTask#cleanUpInvoke}} is never called, leading 
to a resource leak.

We internally use managed memory in a different way, by registering a 
specific operator identifier in the memory manager; skipping the stream task 
cleanup leaves the stream operators undisposed, so we face a critical 
resource leak.
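The invariant the fix needs to restore is the usual try/finally around the
new phase. A simplified sketch with stand-in types, not the actual
Task/StreamTask code:

// Simplified stand-in types for illustration only.
interface Invokable {
    void restore() throws Exception;  // new phase introduced by FLINK-17012
    void invoke() throws Exception;
    void cleanUp() throws Exception;  // stands in for StreamTask#cleanUpInvoke
}

class TaskRunnerSketch {
    void run(Invokable invokable) throws Exception {
        try {
            invokable.restore();  // may throw while restoring state
            invokable.invoke();
        } finally {
            // Must also run when restore() throws; skipping it leaks
            // resources such as managed memory registered for operators.
            invokable.cleanUp();
        }
    }
}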



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Does flink support multi cluster deployment with HA on multi k8s clusters

2021-04-29 Thread Till Rohrmann
The reason why this feature is not part of Flink is that it is not trivial
to solve. There are definitely different possible solutions. I think a
solution will probably be built around Flink rather than integrated into
it, because it goes a bit beyond Flink's current focus. I can tell you
that companies are looking at this problem and want to solve it within
their commercial offerings.

Cheers,
Till

On Thu, Apr 29, 2021 at 12:40 PM Enrique  wrote:

> Hi Till and Bhagi,
>
> As part of our product, we also have a production requirement to deploy
> Flink in multi zones and currently have it configured with Kubernetes HA.
>
> Till, is the reason this is not supported out of the box also that
> Kubernetes HA relies on RWX storage, which is considered an anti-pattern
> in the cloud-native space? In the cloud-native deployments I've seen, the
> recommended way is to replicate state through software instead of through
> shared filesystem/storage, which is a common pattern in data centers. The
> cloud-native approach results in each Flink instance owning its own RWO
> filesystem and replicating state over the network to the other instances
> across nodes. I was just wondering whether this had been considered in the
> move to a more cloud-native Flink deployment?
>
> Cheers,
>
> Enrique
>
>
>
>
>
> --
> Sent from: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/
>
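For reference, the single-cluster Kubernetes HA setup Enrique mentions boils
down to a few documented options; the values below are illustrative. Note
that high-availability.storageDir must point at durable shared storage (e.g.
S3 or HDFS) reachable by all JobManagers, which is where the storage concern
above comes from; failover across multiple Kubernetes clusters remains out
of scope.

import org.apache.flink.configuration.Configuration;

public class KubernetesHaSettings {
    public static Configuration create() {
        Configuration conf = new Configuration();
        // Unique ID for this (single) cluster.
        conf.setString("kubernetes.cluster-id", "my-flink-cluster");
        // Kubernetes-based HA services (leader election via ConfigMaps).
        conf.setString("high-availability",
                "org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory");
        // Must be durable shared storage visible to all JobManagers.
        conf.setString("high-availability.storageDir", "s3://my-bucket/flink/recovery");
        return conf;
    }
}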


Re: [ANNOUNCE] Apache Flink 1.12.3 released

2021-04-29 Thread Till Rohrmann
Great to hear. Thanks a lot for being our release manager Arvid and to
everyone who has contributed to this release!

Cheers,
Till

On Thu, Apr 29, 2021 at 4:11 PM Arvid Heise  wrote:

> Dear all,
>
> The Apache Flink community is very happy to announce the release of Apache
> Flink 1.12.3, which is the third bugfix release for the Apache Flink 1.12
> series.
>
> Apache Flink® is an open-source stream processing framework for
> distributed, high-performing, always-available, and accurate data streaming
> applications.
>
> The release is available for download at:
> https://flink.apache.org/downloads.html
>
> Please check out the release blog post for an overview of the improvements
> for this bugfix release:
> https://flink.apache.org/news/2021/04/29/release-1.12.3.html
>
> The full release notes are available in Jira:
>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12349691
>
> We would like to thank all contributors of the Apache Flink community who
> made this release possible!
>
> Regards,
>
> Your friendly release manager Arvid
>


[jira] [Created] (FLINK-22536) Promote the critical log in FineGrainedSlotManager to INFO level

2021-04-29 Thread Yangze Guo (Jira)
Yangze Guo created FLINK-22536:
--

 Summary: Promote the critical log in FineGrainedSlotManager to 
INFO level
 Key: FLINK-22536
 URL: https://issues.apache.org/jira/browse/FLINK-22536
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Coordination
Affects Versions: 1.13.0
Reporter: Yangze Guo


Some critical logs, e.g. registering/unregistering task managers and 
allocating/freeing slots, are emitted at DEBUG level in 
FineGrainedSlotManager. This makes debugging harder. We propose to promote 
those logs to INFO level.
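Illustratively, the proposed change is a plain slf4j level promotion; the
fragment below is a made-up sketch, not the actual FineGrainedSlotManager
source.

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class SlotTrackerSketch {
    private static final Logger LOG = LoggerFactory.getLogger(SlotTrackerSketch.class);

    void registerTaskManager(String resourceId) {
        // Before: LOG.debug(...), invisible at the default INFO level,
        // so slot allocation issues are hard to diagnose after the fact.
        LOG.info("Registering task manager {}.", resourceId);
    }

    void freeSlot(String allocationId) {
        LOG.info("Freeing slot {}.", allocationId);
    }
}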



--
This message was sent by Atlassian Jira
(v8.3.4#803005)