Re: [VOTE] Release 1.9.2, release candidate #1

2020-01-30 Thread Kostas Kloudas
Hi all,

+1 (binding)

- Compiled locally
- Built simple jobs using quickstart
- Submitted jobs on YARN, both per-job and to a session cluster

Cheers,
Kostas

On Thu, Jan 30, 2020 at 6:26 AM Hequn Cheng  wrote:
>
> Hi everyone,
>
> Thanks a lot for your checking and voting for the release!
> I’ll summarize the result in another email.
>
> Thanks,
> Hequn
>
> On Thu, Jan 30, 2020 at 1:16 PM Hequn Cheng  wrote:
>
> > +1 (non-binding)
> >
> > - checked release notes
> > - checked signatures and hashes
> > - ran e2e tests on Travis
> > - built from source, without Hadoop, using Scala 2.11 and 2.12
> > - checked that all POM files point to the same version
> > - ran examples on a standalone cluster.
> >
> > Best,
> > Hequn
> >
> > On Thu, Jan 30, 2020 at 12:58 PM jincheng sun 
> > wrote:
> >
> >> +1 (binding)
> >>
> >> - Release notes look good.
> >> - Checked that end-to-end tests passed.
> >> - Signatures and hashes are correct.
> >> - All artifacts have been deployed to the maven central repository.
> >> - Started a local Flink cluster and ran the streaming WordCount example.
> >> - Verified web UI and log output, nothing unexpected.
> >>
> >> Best,
> >> Jincheng
> >>
> >>
> >> On Thu, Jan 30, 2020 at 12:30 AM, Tzu-Li (Gordon) Tai wrote:
> >>
> >> > +1 (binding)
> >> >
> >> > - checked signatures and hashes
> >> > - built from source (mvn clean package, Scala 2.12)
> >> > - ran e2e tests locally
> >> > - checked new dependencies for licenses, no new ones were included since
> >> > 1.9.1
> >> > - ran Kafka example with local cluster
> >> > - quickstart worked without errors, logs looked good
> >> > - Website announcement PR looks good to merge (minus release date to be
> >> > updated once 1.9.2 is out)
> >> > - No missing artifacts in staging area
> >> >
> >> > Thanks for managing the release Hequn!
> >> >
> >> > Cheers,
> >> > Gordon
> >> >
> >> > On Wed, Jan 29, 2020 at 11:05 PM Hequn Cheng 
> >> wrote:
> >> >
> >> > > @Chesnay Schepler  Thanks a lot! The release
> >> notes
> >> > > look
> >> > > much better now.
> >> > > I have updated the release notes in the website PR accordingly.
> >> > >
> >> > > Thanks,
> >> > > Hequn
> >> > >
> >> > > On Wed, Jan 29, 2020 at 5:34 PM Chesnay Schepler 
> >> > > wrote:
> >> > >
> >> > > > +1 (binding)
> >> > > >
> >> > > > - checked signatures
> >> > > > - built from source
> >> > > > - started a cluster and submitted examples via WebUI
> >> > > > - log files look good
> >> > > >
> >> > > > I also went over the release notes and updated a few things.
> >> > > >
> >> > > > On 24/01/2020 11:03, Hequn Cheng wrote:
> >> > > > > Hi everyone,
> >> > > > >
> >> > > > > Please review and vote on the release candidate #1 for the version
> >> > > 1.9.2,
> >> > > > > as follows:
> >> > > > > [ ] +1, Approve the release
> >> > > > > [ ] -1, Do not approve the release (please provide specific
> >> comments)
> >> > > > >
> >> > > > >
> >> > > > > The complete staging area is available for your review, which
> >> > includes:
> >> > > > > * JIRA release notes [1],
> >> > > > > * the official Apache source release and binary convenience
> >> releases
> >> > to
> >> > > > be
> >> > > > > deployed to dist.apache.org [2], which are signed with the key
> >> with
> >> > > > > fingerprint EF88474C564C7A608A822EEC3FF96A2057B6476C [3],
> >> > > > > * all artifacts to be deployed to the Maven Central Repository
> >> [4],
> >> > > > > * source code tag "release-1.9.2-rc1" [5],
> >> > > > > * website pull request listing the new release and adding
> >> > announcement
> >> > > > blog
> >> > > > > post [6].
> >> > > > >
> >> > > > > The vote will be open for at least 72 hours.
> >> > > > > Please cast your votes before *Jan. 29th 2020, 12:00 UTC*.
> >> > > > >
> >> > > > > It is adopted by majority approval, with at least 3 PMC
> >> affirmative
> >> > > > votes.
> >> > > > >
> >> > > > > Thanks,
> >> > > > > Hequn
> >> > > > >
> >> > > > > [1]
> >> > > > >
> >> > > >
> >> > >
> >> >
> >> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12346278
> >> > > > > [2] https://dist.apache.org/repos/dist/dev/flink/flink-1.9.2-rc1/
> >> > > > > [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> >> > > > > [4]
> >> > > >
> >> > https://repository.apache.org/content/repositories/orgapacheflink-1323/
> >> > > > > [5]
> >> > > > >
> >> > > >
> >> > >
> >> >
> >> https://github.com/apache/flink/commit/c9d2c9098d725a2d39e860bde414ecb0c5d6a233
> >> > > > > [6] https://github.com/apache/flink-web/pull/295
> >> > > > >
> >> > > >
> >> > > >
> >> > >
> >> >
> >>
> >
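The "checked signatures and hashes" step that recurs in the checklists above can be made concrete with a small sketch — illustrative Java, not part of the actual release tooling; file names are placeholders, and reviewers normally just run gpg --verify and sha512sum -c from the shell:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/**
 * Minimal sketch of the "check hashes" step from the vote checklists above:
 * recompute the SHA-512 digest of a downloaded artifact and compare it with
 * the contents of the .sha512 file published next to it. GPG signature
 * verification ("gpg --verify") is a separate, complementary step.
 */
public class ChecksumCheck {

    /** Returns the lowercase hex SHA-512 digest of the given file. */
    public static String sha512Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-512");
        byte[] hash = digest.digest(Files.readAllBytes(file));
        StringBuilder hex = new StringBuilder(hash.length * 2);
        for (byte b : hash) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    /** True if the artifact's digest matches the published hex string. */
    public static boolean matches(Path artifact, String expectedHex) throws Exception {
        return sha512Hex(artifact).equalsIgnoreCase(expectedHex.trim());
    }
}
```

The comparison is case-insensitive and trims whitespace, since published .sha512 files often carry a trailing newline.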


Re: [VOTE] Release 1.10.0, release candidate #1

2020-01-30 Thread Piotr Nowojski
Hi,

Thanks for creating this RC Gary & Yu.

+1 (non-binding) from my side

Because of instabilities during the testing period, I've manually tested some
jobs (and streaming examples) on an EMR cluster, writing to S3 using the newly
unshaded/non-relocated S3 plugin. Everything seems to work fine. I'm also not
aware of any blocking issues for this RC.

Piotrek

> On 27 Jan 2020, at 22:06, Gary Yao  wrote:
> 
> Hi everyone,
> Please review and vote on the release candidate #1 for the version 1.10.0,
> as follows:
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
> 
> 
> The complete staging area is available for your review, which includes:
> * JIRA release notes [1],
> * the official Apache source release and binary convenience releases to be
> deployed to dist.apache.org [2], which are signed with the key with
> fingerprint BB137807CEFBE7DD2616556710B12A1F89C115E8 [3],
> * all artifacts to be deployed to the Maven Central Repository [4],
> * source code tag "release-1.10.0-rc1" [5],
> 
> The announcement blog post is in the works. I will update this voting
> thread with a link to the pull request soon.
> 
> The vote will be open for at least 72 hours. It is adopted by majority
> approval, with at least 3 PMC affirmative votes.
> 
> Thanks,
> Yu & Gary
> 
> [1]
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12345845
> [2] https://dist.apache.org/repos/dist/dev/flink/flink-1.10.0-rc1/
> [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> [4] https://repository.apache.org/content/repositories/orgapacheflink-1325
> [5] https://github.com/apache/flink/releases/tag/release-1.10.0-rc1



[ANNOUNCE] Community Discounts for Flink Forward SF 2020 Registrations

2020-01-30 Thread Fabian Hueske
Hi everyone,

The registration for Flink Forward SF 2020 is open now!

Flink Forward San Francisco 2020 will take place from March 23rd to 25th.
The conference will start with one day of training and continue with two
days of keynotes and talks.
We would like to invite you to join the Apache Flink community to connect
with other Flink enthusiasts and learn about Flink use cases, operational
best practices, and technical deep dives.

Please register at
--> https://events.evolutionaryevents.com/flink-forward-sf-2020
if you'd like to attend Flink Forward SF.

IMPORTANT:
* As a member of the Flink community, you get a 50% discount on your
conference pass if you register with the code: FFSF20-MailingList
* If you are an Apache committer (for any project), you get a *free*
conference pass if you register with your Apache email address and the
discount code: FFSF20-ApacheCommitter

Best,
Fabian


Re: [VOTE] Integrate Flink Docker image publication into Flink release process

2020-01-30 Thread Fabian Hueske
Hi Ismael,

> Just one question: will we still be able to be featured as an official
docker image in this case?

Yes, that's the goal. We still want to publish official DockerHub images
for every Flink release.
Since we're mainly migrating the docker-flink/docker-flink repo to
apache/flink-docker, this should just work as before.

Less important images (playgrounds, demos) would be published via ASF Infra
under the Apache DockerHub user [1].

Best,
Fabian

[1] https://hub.docker.com/u/apache

On Thu, Jan 30, 2020 at 06:12, Hequn Cheng wrote:

> +1
>
> Even though I would prefer to contribute the Dockerfiles to the Flink main
> repo,
> I think a dedicated repo is also a good idea.
>
> Thanks a lot for driving this! @Ufuk Celebi
>
> On Thu, Jan 30, 2020 at 12:02 PM Peter Huang 
> wrote:
>
> > +1 (non-binding)
> >
> >
> >
> > On Wed, Jan 29, 2020 at 5:54 PM Yang Wang  wrote:
> >
> > > +1 (non-binding)
> > >
> > >
> > > Best,
> > > Yang
> > >
> > > > On Thu, Jan 30, 2020 at 12:53 AM, Rong Rong wrote:
> > >
> > > > +1
> > > >
> > > > --
> > > > Rong
> > > >
> > > > On Wed, Jan 29, 2020 at 8:51 AM Ismaël Mejía 
> > wrote:
> > > >
> > > > > +1 (non-binding)
> > > > >
> > > > > No more maintenance work for us Patrick! Just kidding :), it was
> > mostly
> > > > > done by Patrick, all kudos to him.
> > > > > > Just one question: will we still be able to be featured as an
> > official
> > > > > docker image in this case?
> > > > >
> > > > > Best,
> > > > > Ismaël
> > > > >
> > > > > ps. Hope having an official Helm chart becomes also a future
> target.
> > > > >
> > > > >
> > > > >
> > > > > On Tue, Jan 28, 2020 at 3:26 PM Fabian Hueske 
> > > > wrote:
> > > > >
> > > > > > +1
> > > > > >
> > > > > > > On Tue, Jan 28, 2020 at 15:23, Yun Tang <
> > > myas...@live.com
> > > > >:
> > > > > >
> > > > > > > +1 (non-binding)
> > > > > > > 
> > > > > > > From: Stephan Ewen 
> > > > > > > Sent: Tuesday, January 28, 2020 21:36
> > > > > > > To: dev ; patr...@ververica.com <
> > > > > > > patr...@ververica.com>
> > > > > > > Subject: Re: [VOTE] Integrate Flink Docker image publication
> into
> > > > Flink
> > > > > > > release process
> > > > > > >
> > > > > > > +1
> > > > > > >
> > > > > > > On Tue, Jan 28, 2020 at 2:20 PM Patrick Lucas <
> > > patr...@ververica.com
> > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Thanks for kicking this off, Ufuk.
> > > > > > > >
> > > > > > > > +1 (non-binding)
> > > > > > > >
> > > > > > > > --
> > > > > > > > Patrick
> > > > > > > >
> > > > > > > > On Mon, Jan 27, 2020 at 5:50 PM Ufuk Celebi 
> > > > wrote:
> > > > > > > >
> > > > > > > > > Hey all,
> > > > > > > > >
> > > > > > > > > there is a proposal to contribute the Dockerfiles and
> scripts
> > > of
> > > > > > > > > https://github.com/docker-flink/docker-flink to the Flink
> > > > project.
> > > > > > The
> > > > > > > > > discussion corresponding to this vote outlines the
> reasoning
> > > for
> > > > > the
> > > > > > > > > proposal and can be found here: [1].
> > > > > > > > >
> > > > > > > > > The proposal is as follows:
> > > > > > > > > * Request a new repository apache/flink-docker
> > > > > > > > > * Migrate all files from docker-flink/docker-flink to
> > > > > > > apache/flink-docker
> > > > > > > > > * Update the release documentation to describe how to
> update
> > > > > > > > > apache/flink-docker for new releases
> > > > > > > > >
> > > > > > > > > Please review and vote on this proposal as follows:
> > > > > > > > > [ ] +1, Approve the proposal
> > > > > > > > > [ ] -1, Do not approve the proposal (please provide
> specific
> > > > > > comments)
> > > > > > > > >
> > > > > > > > > The vote will be open for at least 3 days, ending the
> > earliest
> > > > on:
> > > > > > > > January
> > > > > > > > > 30th 2020, 17:00 UTC.
> > > > > > > > >
> > > > > > > > > Cheers,
> > > > > > > > >
> > > > > > > > > Ufuk
> > > > > > > > >
> > > > > > > > > PS: I'm treating this proposal similar to a "Release Plan"
> as
> > > > > > mentioned
> > > > > > > > in
> > > > > > > > > the project bylaws [2]. Please let me know if you consider
> > > this a
> > > > > > > > different
> > > > > > > > > category.
> > > > > > > > >
> > > > > > > > > [1]
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Integrate-Flink-Docker-image-publication-into-Flink-release-process-td36139.html
> > > > > > > > > [2]
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=120731026
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: [VOTE] Integrate Flink Docker image publication into Flink release process

2020-01-30 Thread Arvid Heise
+1 (non-binding)


[jira] [Created] (FLINK-15811) StreamSourceOperatorWatermarksTest#testNoMaxWatermarkOnAsyncCancel fails on Travis

2020-01-30 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-15811:


 Summary: 
StreamSourceOperatorWatermarksTest#testNoMaxWatermarkOnAsyncCancel fails on 
Travis
 Key: FLINK-15811
 URL: https://issues.apache.org/jira/browse/FLINK-15811
 Project: Flink
  Issue Type: Bug
  Components: API / DataStream, Tests
Affects Versions: 1.11.0
Reporter: Chesnay Schepler


https://api.travis-ci.org/v3/job/643480766/log.txt

{code}
08:06:17.382 [ERROR] Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time 
elapsed: 1.812 s <<< FAILURE! - in 
org.apache.flink.streaming.runtime.operators.StreamSourceOperatorWatermarksTest
08:06:17.382 [ERROR] 
testNoMaxWatermarkOnAsyncCancel(org.apache.flink.streaming.runtime.operators.StreamSourceOperatorWatermarksTest)
  Time elapsed: 0.235 s  <<< FAILURE!
java.lang.AssertionError
at 
org.apache.flink.streaming.runtime.operators.StreamSourceOperatorWatermarksTest.testNoMaxWatermarkOnAsyncCancel(StreamSourceOperatorWatermarksTest.java:127)
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15813) Set default value of jobmanager.execution.failover-strategy to region

2020-01-30 Thread Till Rohrmann (Jira)
Till Rohrmann created FLINK-15813:
-

 Summary: Set default value of 
jobmanager.execution.failover-strategy to region
 Key: FLINK-15813
 URL: https://issues.apache.org/jira/browse/FLINK-15813
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Coordination
Reporter: Till Rohrmann
 Fix For: 1.11.0


We should set the default value of {{jobmanager.execution.failover-strategy}} 
to {{region}}. This might require adapting existing tests to make them pass.
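As a toy illustration of what the default change means for users (plain Java, not Flink's actual ConfigOption API — only the key string is taken from the ticket): jobs that never set the option would now resolve to region failover, which is exactly why existing tests may need adapting.

```java
import java.util.Map;

/**
 * Toy model of resolving a config option against a default value. It only
 * illustrates the proposed change: an unset
 * jobmanager.execution.failover-strategy would now resolve to "region".
 * This is NOT Flink's ConfigOption API; the resolution logic is a stand-in.
 */
public class FailoverStrategyDefault {

    static final String KEY = "jobmanager.execution.failover-strategy";
    static final String NEW_DEFAULT = "region";

    /** An explicit user setting wins; otherwise fall back to the new default. */
    public static String resolve(Map<String, String> conf) {
        return conf.getOrDefault(KEY, NEW_DEFAULT);
    }
}
```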



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15812) HistoryServer archiving is done in Dispatcher main thread

2020-01-30 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-15812:


 Summary: HistoryServer archiving is done in Dispatcher main thread
 Key: FLINK-15812
 URL: https://issues.apache.org/jira/browse/FLINK-15812
 Project: Flink
  Issue Type: Bug
Affects Versions: 1.8.0
Reporter: Chesnay Schepler
 Fix For: 1.11.0, 1.9.3, 1.10.1


{{Dispatcher#archiveExecutionGraph}} should call 
{{HistoryServerArchivist#archiveExecutionGraph}} asynchronously since the 
archiving may involve IO operations.
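A sketch of the fix suggested in the description — hand the archiving off to an IO executor so the dispatcher's main thread returns immediately. The types below are simplified stand-ins, not the real Dispatcher/HistoryServerArchivist signatures:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;

/**
 * Simplified sketch of the suggested fix: instead of invoking the
 * (potentially IO-heavy) archiving call directly on the dispatcher's main
 * thread, wrap it in a CompletableFuture running on a dedicated IO executor.
 * The interfaces here are illustrative stand-ins only.
 */
public class AsyncArchivingSketch {

    /** Stand-in for HistoryServerArchivist#archiveExecutionGraph. */
    public interface Archivist {
        void archiveExecutionGraph(String jobId); // may block on IO
    }

    private final Archivist archivist;
    private final ExecutorService ioExecutor;

    public AsyncArchivingSketch(Archivist archivist, ExecutorService ioExecutor) {
        this.archivist = archivist;
        this.ioExecutor = ioExecutor;
    }

    /**
     * Returns immediately; the blocking archiving work runs on the IO
     * executor, so the caller (the dispatcher main thread) is never blocked.
     */
    public CompletableFuture<Void> archiveAsync(String jobId) {
        return CompletableFuture.runAsync(
                () -> archivist.archiveExecutionGraph(jobId), ioExecutor);
    }
}
```

The returned future lets the caller attach error handling (e.g. logging a failed archive attempt) without ever blocking on the result.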



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: connection timeout during shuffle initialization

2020-01-30 Thread Piotr Nowojski
Hi,

>> I think it's perfectly ok to perform IO ops in netty threads,
(…)
>> Removing synchronization *did solve* the problem for me, because it
>> allows flink to leverage the whole netty event loop pool and it's ok to
>> have a single thread blocked for a little while (we still can accept
>> connections with other threads).


It’s a discouraged pattern: Netty has a thread pool for processing multiple 
channels, but a single channel is always handled by the same pre-defined thread 
(to the best of my knowledge). In Flink we are lucky that Netty threads are not 
doing anything critical besides registering partitions (heartbeats are handled 
independently) that could fail the job if blocked. And I guess you are right: 
if some threads are blocked on IO, new (sub)partition registrations should be 
handled by the non-blocked threads, if not for the global lock.

It sounds very hacky though. It also ignores the performance implications: one 
thread blocked on disk IO wastes the CPU/network potential of ~1/16 of the 
channels (due to the fixed, pre-determined assignment between channels and 
threads). In some scenarios that might be acceptable, e.g. with uniform tasks 
and no data skew. But if multiple tasks with different workload patterns and/or 
a data skew run simultaneously, this can cause visible performance issues.

Having said that, any rework to fully address this issue and make the IO 
non-blocking could be very substantial, so I would be more than happy to just 
kick the can down the road for now ;) 

>> removal of buffer prefetch in BoundedBlockingSubpartitionReader did not
>> help, I've already tried that (there are still other problematic code
>> paths, e.g. releasePartition). 


Are there other problematic parts besides releasePartition that you have 
already analysed? Maybe it would be better to just try moving those calls 
out of the `ResultPartitionManager` call stack somehow?

>> Let me think about how to get a relevant cpu graph from the TM, it's kind
>> of hard to target a "defective node". 

Thanks, I know it’s non-trivial, but I would guess you do not have to target a 
“defective node”. If the defective node is blocked for ~2 minutes during the 
failure, I’m pretty sure other nodes are being blocked constantly for seconds 
at a time, and profiler results from such nodes would allow us to confirm the 
issue and better understand what’s exactly happening.

>> Anyway attached are some graphs from such a busy node in time of failure.

I didn’t get/see any graphs?

>> Is there any special reason for the synchronization I don't see? I have a
>> feeling it's only for synchronizing `registeredPartitions` map access and
>> that it's perfectly ok not to synchronize `createSubpartitionView` and
>> `releasePartition` calls.

I’m not sure. Removing this lock definitely increases concurrency, and with it 
the potential for race conditions, especially on the resource-releasing paths. 
After briefly looking at the code, I didn’t find any obvious issue, but there 
are some callbacks/notifications happening, and generally speaking resource 
releasing paths are pretty hard to reason about.

Zhijiang might spot something, as he had a good eye for catching such problems 
in the past.

Besides that, you could just run a couple of (10? 20?) Travis runs. All of the 
ITCases from various modules (connectors, flink-tests, …) are pretty good at 
catching race conditions in the network stack.

Piotrek

> On 29 Jan 2020, at 17:05, David Morávek  wrote:
> 
> Just to clarify, these are bare metal nodes (128G ram, 16 cpus +
> hyperthreading, 4xHDDS, 10g network), which run yarn, hdfs and hbase.
> 
> D.
> 
> On Wed, Jan 29, 2020 at 5:03 PM David Morávek 
> wrote:
> 
>> Hi Piotr,
>> 
>> removal of buffer prefetch in BoundedBlockingSubpartitionReader did not
>> help, I've already tried that (there are still other problematic code
>> paths, eg. releasePartition). I think it's perfectly ok to perform IO ops
>> in netty threads, we just have to make sure, we can leverage multiple
>> threads at once. Synchronization in ResultPartitionManager effectively
>> decreases parallelism to one, and "netty tasks / unprocessed messages" keep
>> piling up.
>> 
>> Removing synchronization *did solve* the problem for me, because it
>> allows flink to leverage the whole netty event loop pool and it's ok to
>> have a single thread blocked for a little while (we still can accept
>> connections with other threads).
>> 
>> Let me think about how to get a relevant cpu graph from the TM, it's kind
>> of hard to target a "defective node". Anyway attached are some graphs from
>> such a busy node in time of failure.
>> 
>> Is there any special reason for the synchronization I don't see? I have a
>> feeling it's only for synchronizing `registeredPartitions` map access and
>> that it's perfectly ok not to synchronize `createSubpartitionView` and
>> `releasePartition` calls.
>> 
>> Thanks,
>> D.
>> 
>> On Wed, Jan 29, 2020 at 4:45 PM Piotr Nowojski wrote:
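To make the locking trade-off discussed in this thread concrete, here is a toy registry sketch — illustrative names, not Flink's actual ResultPartitionManager code — where only the map itself is thread-safe and the potentially blocking per-partition work runs outside any global lock:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/**
 * Toy model of the locking question debated above: if the partition map is
 * guarded by one global lock, a single Netty thread blocked on disk IO while
 * registering or releasing a partition stalls every other Netty thread. A
 * ConcurrentMap keeps the map access thread-safe without serializing the
 * (potentially blocking) per-partition work. Illustrative only.
 */
public class PartitionRegistrySketch {

    private final ConcurrentMap<String, String> registeredPartitions = new ConcurrentHashMap<>();

    /** Register a partition; access for other keys stays uncontended. */
    public void registerPartition(String partitionId, String descriptor) {
        registeredPartitions.put(partitionId, descriptor);
    }

    /**
     * Create a "view" of a partition. Only the map lookup is synchronized
     * internally by ConcurrentHashMap; the slow IO part would run here
     * without holding any global lock, so other Netty threads can keep
     * registering/releasing unrelated partitions concurrently.
     */
    public String createSubpartitionView(String partitionId) {
        String descriptor = registeredPartitions.get(partitionId);
        if (descriptor == null) {
            throw new IllegalStateException("Unknown partition: " + partitionId);
        }
        // ... potentially blocking disk IO would happen here, outside any lock ...
        return "view-of-" + descriptor;
    }

    /** Release a partition; returns true if it was actually registered. */
    public boolean releasePartition(String partitionId) {
        return registeredPartitions.remove(partitionId) != null;
    }
}
```

As the discussion above notes, dropping a global lock raises the race-condition risk on the release path; this sketch deliberately ignores the callback/notification subtleties the real code has.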

Re: [VOTE] Integrate Flink Docker image publication into Flink release process

2020-01-30 Thread Yu Li
+1 (non-binding)

Best Regards,
Yu



Re: connection timeout during shuffle initialization

2020-01-30 Thread Piotr Nowojski
One more thing. Could you create a JIRA ticket for this issue? We could also 
move the discussion there.

Piotrek

> On 30 Jan 2020, at 12:14, Piotr Nowojski  wrote:
> 
> Hi,
> 
>>> I think it's perfectly ok to perform IO ops in netty threads,
> (…)
>>> Removing synchronization *did solve* the problem for me, because it
>>> allows flink to leverage the whole netty event loop pool and it's ok to
>>> have a single thread blocked for a little while (we still can accept
>>> connections with other threads).
> 
> 
> It’s discouraged pattern, as Netty have a thread pool for processing multiple 
> channels, but a single channel is always handled by the same pre-defined 
> thread (to the best of my knowledge). In Flink we are lucky that Netty 
> threads are not doing anything critical besides registering partitions 
> (heartbeats are handled independently) that could fail the job if blocked. 
> And I guess you are right, if some threads are blocked on the IO, new 
> (sub)partition registration should be handled by the non blocked threads, if 
> not for the global lock. 
> 
> It sounds very hacky though. Also that's ignoring performance implication - 
> one thread blocked on the disks IO, wastes CPU/network potential of ~1/16 
> channels (due to this fix pre determined assignment between channels <-> 
> threads). In some scenarios that might be acceptable, with uniform tasks 
> without data skew. But if there are simultaneously running multiple tasks 
> with different work load patterns and/or a data skew, this can cause visible 
> performance issues.
> 
> Having said that, any rework to fully address this issue and make the IO non 
> blocking, could be very substantial, so I would be more than happy to just 
> kick the can down the road for now ;) 
> 
>>> removal of buffer prefetch in BoundedBlockingSubpartitionReader did not
>>> help, I've already tried that (there are still other problematic code
>>> paths, eg. releasePartition). 
> 
> 
> Are there other problematic parts besides releasePartition that you have 
> already analysed? Maybe it would be better to just try moving those calls 
> out of the `ResultPartitionManager` call stack somehow?
> 
>>> Let me think about how to get a relevant cpu graph from the TM, it's kind
>>> of hard to target a "defective node". 
> 
> Thanks, I know it’s non-trivial, but I would guess you do not have to target 
> a “defective node”. If the defective node is blocking for ~2 minutes during 
> the failure, I’m pretty sure other nodes are being blocked constantly for 
> seconds at a time, and profiler results from such nodes would allow us to 
> confirm the issue and better understand what’s exactly happening.
> 
>>> Anyway attached are some graphs from such a busy node in time of failure.
> 
> I didn’t get/see any graphs?
> 
>>> Is there any special reason for the synchronization I don't see? I have a
>>> feeling it's only for sychronizing `registredPartitions` map access and
>>> that it's perfectly ok not to synchronize `createSubpartitionView` and
>>> `releasePartition` calls.
> 
> I’m not sure. Definitely removing this lock increases concurrency and thus 
> the potential for race conditions, especially on the resource-releasing 
> paths. After briefly looking at the code, I didn’t find any obvious issue, 
> but there are some callbacks/notifications happening, and generally speaking 
> resource-releasing paths are pretty hard to reason about.
> 
> Zhijiang might spot something, as he had a good eye for catching such 
> problems in the past.
> 
> Besides that, you could just run a couple (10? 20?) of Travis runs. All of 
> the ITCases from various modules (connectors, flink-tests, …) are pretty 
> good at catching race conditions in the network stack.
> 
> Piotrek
> 
>> On 29 Jan 2020, at 17:05, David Morávek  wrote:
>> 
>> Just to clarify, these are bare metal nodes (128G ram, 16 cpus +
>> hyperthreading, 4xHDDS, 10g network), which run yarn, hdfs and hbase.
>> 
>> D.
>> 
>> On Wed, Jan 29, 2020 at 5:03 PM David Morávek 
>> wrote:
>> 
>>> Hi Piotr,
>>> 
>>> removal of buffer prefetch in BoundedBlockingSubpartitionReader did not
>>> help, I've already tried that (there are still other problematic code
>>> paths, eg. releasePartition). I think it's perfectly ok to perform IO ops
>>> in netty threads, we just have to make sure, we can leverage multiple
>>> threads at once. Synchronization in ResultPartitionManager effectively
>>> decreases parallelism to one, and "netty tasks / unprocessed messages" keep
>>> piling up.
>>> 
>>> Removing synchronization *did solve* the problem for me, because it
>>> allows flink to leverage the whole netty event loop pool and it's ok to
>>> have a single thread blocked for a little while (we still can accept
>>> connections with other threads).
>>> 
>>> Let me think about how to get a relevant cpu graph from the TM, it's kind
>>> of hard to target a "defective node". Anyway attached are some graphs from
>>> such a busy node in time of failure.
>>> 
>>> Is
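The locking pattern debated in this thread can be illustrated with a small, language-agnostic sketch. The following is a hypothetical Python model of the idea, not Flink's actual `ResultPartitionManager` code: instead of one global lock guarding both the partition registry and the potentially blocking per-partition IO, the registry gets its own short-lived lock and each partition gets its own lock, so a thread blocked on one partition's disk IO does not stall registrations or views for other partitions.

```python
import threading


class PartitionManager:
    """Hypothetical sketch: per-partition locks instead of one global lock."""

    def __init__(self):
        # Guards only the registry map; never held while doing IO.
        self._registry_lock = threading.Lock()
        self._partitions = {}  # partition_id -> (partition_lock, state)

    def register(self, partition_id, state):
        with self._registry_lock:
            self._partitions[partition_id] = (threading.Lock(), state)

    def create_subpartition_view(self, partition_id, blocking_io):
        # Look up under the short registry lock...
        with self._registry_lock:
            lock, state = self._partitions[partition_id]
        # ...then perform the potentially blocking IO under the
        # partition-local lock, leaving other partitions unaffected.
        with lock:
            return blocking_io(state)


mgr = PartitionManager()
mgr.register("p1", {"path": "/tmp/p1.data"})
mgr.register("p2", {"path": "/tmp/p2.data"})
view = mgr.create_subpartition_view("p1", lambda s: "view-of-" + s["path"])
print(view)  # view-of-/tmp/p1.data
```

As the thread notes, the trade-off of such finer-grained locking is more concurrency and therefore more surface for race conditions on the resource-releasing paths.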

Re: [VOTE] Integrate Flink Docker image publication into Flink release process

2020-01-30 Thread Igal Shilman
+1 (non-binding)

On Thu, Jan 30, 2020 at 12:18 PM Yu Li  wrote:

> +1 (non-binding)
>
> Best Regards,
> Yu
>
>
> On Thu, 30 Jan 2020 at 18:35, Arvid Heise  wrote:
>
> > +1 (non-binding)
> >
> > On Thu, Jan 30, 2020 at 11:10 AM Fabian Hueske 
> wrote:
> >
> > > Hi Ismael,
> > >
> > > > Just one question, we will be able to still be featured as an
> official
> > > docker image in this case?
> > >
> > > Yes, that's the goal. We still want to publish official DockerHub
> images
> > > for every Flink release.
> > > Since we're mainly migrating the docker-flink/docker-flink repo to
> > > apache/flink-docker, this should just work as before.
> > >
> > > Less important images (playgrounds, demos) would be published via ASF
> > Infra
> > > under the Apache DockerHub user [1].
> > >
> > > Best,
> > > Fabian
> > >
> > > [1] https://hub.docker.com/u/apache
> > >
> > > Am Do., 30. Jan. 2020 um 06:12 Uhr schrieb Hequn Cheng <
> > > chenghe...@gmail.com
> > > >:
> > >
> > > > +1
> > > >
> > > > Even though I prefer to contribute the Dockerfiles into the Flink main
> > > > repo, I think a dedicated repo is also a good idea.
> > > >
> > > > Thanks a lot for driving this! @Ufuk Celebi
> > > >
> > > > On Thu, Jan 30, 2020 at 12:02 PM Peter Huang <
> > huangzhenqiu0...@gmail.com
> > > >
> > > > wrote:
> > > >
> > > > > +1 (non-binding)
> > > > >
> > > > >
> > > > >
> > > > > On Wed, Jan 29, 2020 at 5:54 PM Yang Wang 
> > > wrote:
> > > > >
> > > > > > +1 (non-binding)
> > > > > >
> > > > > >
> > > > > > Best,
> > > > > > Yang
> > > > > >
> > > > > > Rong Rong  于2020年1月30日周四 上午12:53写道:
> > > > > >
> > > > > > > +1
> > > > > > >
> > > > > > > --
> > > > > > > Rong
> > > > > > >
> > > > > > > On Wed, Jan 29, 2020 at 8:51 AM Ismaël Mejía <
> ieme...@gmail.com>
> > > > > wrote:
> > > > > > >
> > > > > > > > +1 (non-binding)
> > > > > > > >
> > > > > > > > No more maintenance work for us Patrick! Just kidding :), it
> > was
> > > > > mostly
> > > > > > > > done by Patrick, all kudos to him.
> > > > > > > > Just one question, we will be able to still be featured as an
> > > > > official
> > > > > > > > docker image in this case?
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Ismaël
> > > > > > > >
> > > > > > > > ps. Hope having an official Helm chart becomes also a future
> > > > target.
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > On Tue, Jan 28, 2020 at 3:26 PM Fabian Hueske <
> > > fhue...@apache.org>
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > > +1
> > > > > > > > >
> > > > > > > > > Am Di., 28. Jan. 2020 um 15:23 Uhr schrieb Yun Tang <
> > > > > > myas...@live.com
> > > > > > > >:
> > > > > > > > >
> > > > > > > > > > +1 (non-binding)
> > > > > > > > > > 
> > > > > > > > > > From: Stephan Ewen 
> > > > > > > > > > Sent: Tuesday, January 28, 2020 21:36
> > > > > > > > > > To: dev ; patr...@ververica.com <
> > > > > > > > > > patr...@ververica.com>
> > > > > > > > > > Subject: Re: [VOTE] Integrate Flink Docker image
> > publication
> > > > into
> > > > > > > Flink
> > > > > > > > > > release process
> > > > > > > > > >
> > > > > > > > > > +1
> > > > > > > > > >
> > > > > > > > > > On Tue, Jan 28, 2020 at 2:20 PM Patrick Lucas <
> > > > > > patr...@ververica.com
> > > > > > > >
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Thanks for kicking this off, Ufuk.
> > > > > > > > > > >
> > > > > > > > > > > +1 (non-binding)
> > > > > > > > > > >
> > > > > > > > > > > --
> > > > > > > > > > > Patrick
> > > > > > > > > > >
> > > > > > > > > > > On Mon, Jan 27, 2020 at 5:50 PM Ufuk Celebi <
> > > u...@apache.org>
> > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hey all,
> > > > > > > > > > > >
> > > > > > > > > > > > there is a proposal to contribute the Dockerfiles and
> > > > scripts
> > > > > > of
> > > > > > > > > > > > https://github.com/docker-flink/docker-flink to the
> > > Flink
> > > > > > > project.
> > > > > > > > > The
> > > > > > > > > > > > discussion corresponding to this vote outlines the
> > > > reasoning
> > > > > > for
> > > > > > > > the
> > > > > > > > > > > > proposal and can be found here: [1].
> > > > > > > > > > > >
> > > > > > > > > > > > The proposal is as follows:
> > > > > > > > > > > > * Request a new repository apache/flink-docker
> > > > > > > > > > > > * Migrate all files from docker-flink/docker-flink to
> > > > > > > > > > apache/flink-docker
> > > > > > > > > > > > * Update the release documentation to describe how to
> > > > update
> > > > > > > > > > > > apache/flink-docker for new releases
> > > > > > > > > > > >
> > > > > > > > > > > > Please review and vote on this proposal as follows:
> > > > > > > > > > > > [ ] +1, Approve the proposal
> > > > > > > > > > > > [ ] -1, Do not approve the proposal (please provide
> > > > specific
> > > > > > > > > comments)
> > > > > > > > > > > >
> > > > > > > > > > > > The vote will be open for at least 3 days, ending the
> > > > > earliest
> > 

REST Monitoring Savepoint failed

2020-01-30 Thread Ramya Ramamurthy
Hi,

I am trying to dynamically increase the parallelism of the job. In the
process, while triggering the savepoint, I get the following error. Any help
would be appreciated.

The URL triggered is :
http://stgflink.ssdev.in:8081/jobs/2865c1f40340bcb19a88a01d6ef8ff4f/savepoints/
{
"target-directory" :
"gs://-bucket/flink/flink-gcs/flink-savepoints",
"cancel-job" : "false"
}

Error as below:

{"status":{"id":"COMPLETED"},"operation":{"failure-cause":{"class":"java.util.concurrent.CompletionException","stack-trace":"java.util.concurrent.CompletionException:
java.util.concurrent.CompletionException:
org.apache.flink.runtime.checkpoint.CheckpointTriggerException: Failed to
trigger savepoint. Decline reason: An Exception occurred while triggering
the checkpoint.\n\tat
org.apache.flink.runtime.jobmaster.JobMaster.lambda$triggerSavepoint$13(JobMaster.java:972)\n\tat
java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:836)\n\tat
java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:811)\n\tat
java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:456)\n\tat
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:332)\n\tat
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:158)\n\tat
org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:70)\n\tat
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.onReceive(AkkaRpcActor.java:142)\n\tat
org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.onReceive(FencedAkkaRpcActor.java:40)\n\tat
akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:165)\n\tat
akka.actor.Actor$class.aroundReceive(Actor.scala:502)\n\tat
akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:95)\n\tat
akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)\n\tat
akka.actor.ActorCell.invoke(ActorCell.scala:495)\n\tat
akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)\n\tat akka.dis
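For reference, Flink's savepoint REST call is asynchronous: the POST above returns a trigger ("request") id, and the savepoint status is then polled with a GET on `/jobs/<job-id>/savepoints/<trigger-id>`. Also note that `cancel-job` is a JSON boolean in the request body, while the request above passes the string `"false"`; depending on how strictly the server parses JSON this may not be coerced, so it is safest to send a real boolean. A minimal, hypothetical helper sketch (the host and job id are placeholders, and actually sending the request with an HTTP client is omitted):

```python
import json


def build_savepoint_request(target_directory, cancel_job=False):
    """Build the JSON body for POST /jobs/<job-id>/savepoints.

    Sends "cancel-job" as a real JSON boolean rather than the
    string "false" used in the request quoted above.
    """
    return json.dumps({
        "target-directory": target_directory,
        "cancel-job": bool(cancel_job),
    })


def savepoint_url(base, job_id, trigger_id=None):
    """POST to the base savepoint URL triggers the savepoint; the returned
    request id is polled with a GET on .../savepoints/<trigger-id>."""
    url = "{}/jobs/{}/savepoints".format(base, job_id)
    return url if trigger_id is None else "{}/{}".format(url, trigger_id)


body = build_savepoint_request("gs://my-bucket/flink-savepoints")
print(body)
print(savepoint_url("http://example-host:8081", "2865c1f40340bcb19a88a01d6ef8ff4f"))
```

The actual failure in the response ("An Exception occurred while triggering the checkpoint") originates server-side, so as suggested later in this thread, the JobManager/TaskManager logs are the place to look.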


Re: [RESULT] [VOTE] Release 1.9.2, release candidate #1

2020-01-30 Thread Patrick Lucas
A pull request to update the Flink Docker images to 1.9.2 is open:
https://github.com/docker-flink/docker-flink/pull/95

(The "pull" build failed because it rejects changes to the generated
Dockerfiles. I would like to improve this after this repo has been
contributed to Apache.)

On Thu, Jan 30, 2020 at 6:35 AM Hequn Cheng  wrote:

> Hi everyone,
>
> I'm happy to announce that we have unanimously approved this release.
>
> There are 4 approving votes, 3 of which are binding:
> * Chesnay (binding)
> * Gordon (binding)
> * Jincheng (binding)
> * Hequn
>
> There are no disapproving votes.
>
> Thanks, everyone!
>


Re: [DISCUSS] FLIP-75: Flink Web UI Improvement Proposal

2020-01-30 Thread Robert Metzger
Thanks a lot for this work! I believe the web UI is very important, in
particular to new users. I'm very happy to see that you are putting effort
into improving the visibility into Flink through the proposed changes.

I can not judge if all the changes make total sense, but the discussion has
been open since September, and a good number of people have commented in
the document.
I wonder if we can move this FLIP to the VOTing stage?

On Wed, Jan 22, 2020 at 6:27 PM Till Rohrmann  wrote:

> Thanks for the update Yadong. Big +1 for the proposed improvements for
> Flink's web UI. I think they will be super helpful for our users.
>
> Cheers,
> Till
>
> On Tue, Jan 7, 2020 at 10:00 AM Yadong Xie  wrote:
>
> > Hi everyone
> >
> > We have spent some time updating the documentation since the last
> > discussion.
> >
> > In short, the latest FLIP-75 contains the following proposals (including
> > both the frontend and the REST API):
> >
> >1. Job Level
> >   - better job backpressure detection
> >   - load more feature in job exception
> >   - show attempt history in the subtask
> >   - show attempt timeline
> >   - add pending slots
> >2. Task Manager Level
> >   - add more metrics
> >   - better log display
> >3. Job Manager Level
> >   - add metrics tab
> >   - better log display
> >
> > To help everyone better understand the proposal, we spent efforts on
> making
> > an online POC.
> >
> > Now you can compare the difference between the new and old Web/RestAPI
> (the
> > link is inside the doc)!
> >
> > Here is the latest FLIP-75 doc:
> >
> >
> https://docs.google.com/document/d/1tIa8yN2prWWKJI_fa1u0t6h1r6RJpp56m48pXEyh6iI/edit#
> >
> > Looking forward to your feedback
> >
> >
> > Best,
> > Yadong
> >
> > lining jing  于2019年10月24日周四 下午2:11写道:
> >
> > > Hi all, I have updated the backend design in FLIP-75
> > > <
> > >
> >
> https://docs.google.com/document/d/1tIa8yN2prWWKJI_fa1u0t6h1r6RJpp56m48pXEyh6iI/edit?usp=sharing
> > > >
> > > .
> > >
> > > Here are some brief introductions:
> > >
> > >    - Add metric for managed memory (FLINK-14406).
> > >    - Expose TaskExecutor resource configurations to REST API
> > >    (FLINK-14422).
> > >    - Add TaskManagerResourceInfo in TaskManagerDetailsInfo to show
> > >    TaskManager resources (FLINK-14435).
> > >
> > > I will continue to update the rest part of the backend design in the
> doc,
> > > let's keep discuss here, any feedback is appreciated.
> > >
> > > Yadong Xie  于2019年9月27日周五 上午10:13写道:
> > >
> > > > Hi all
> > > >
> > > > Flink Web UI is the main platform for most users to monitor their
> jobs
> > > and
> > > > clusters. We have reconstructed Flink web in 1.9.0 version, but there
> > are
> > > > still some shortcomings.
> > > >
> > > > This discussion thread aims to provide a better experience for Flink
> UI
> > > > users.
> > > >
> > > > Here is the design doc I drafted:
> > > >
> > > >
> > > >
> > >
> >
> https://docs.google.com/document/d/1tIa8yN2prWWKJI_fa1u0t6h1r6RJpp56m48pXEyh6iI/edit?usp=sharing
> > > >
> > > >
> > > > The FLIP can be found at [2].
> > > >
> > > > Please keep the discussion here, in the mailing list.
> > > >
> > > > Looking forward to your opinions, any feedbacks are welcome.
> > > >
> > > > [1]:
> > > >
> > > >
> > >
> >
> https://docs.google.com/document/d/1tIa8yN2prWWKJI_fa1u0t6h1r6RJpp56m48pXEyh6iI/edit?usp=sharing
> > > > <
> > > >
> > >
> >
> https://docs.google.com/document/d/1tIa8yN2prWWKJI_fa1u0t6h1r6RJpp56m48pXEyh6iI/edit#
> > > > >
> > > > [2]:
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-75%3A+Flink+Web+UI+Improvement+Proposal
> > > >
> > >
> >
>


Re: [DISCUSS] FLIP-75: Flink Web UI Improvement Proposal

2020-01-30 Thread Till Rohrmann
Would it be easier if FLIP-75 were the umbrella FLIP and we voted on the
individual improvements as sub-FLIPs? Decreasing the scope should make things
easier.

Cheers,
Till

On Thu, Jan 30, 2020 at 2:35 PM Robert Metzger  wrote:

> Thanks a lot for this work! I believe the web UI is very important, in
> particular to new users. I'm very happy to see that you are putting effort
> into improving the visibility into Flink through the proposed changes.
>
> I can not judge if all the changes make total sense, but the discussion has
> been open since September, and a good number of people have commented in
> the document.
> I wonder if we can move this FLIP to the VOTing stage?
>
> On Wed, Jan 22, 2020 at 6:27 PM Till Rohrmann 
> wrote:
>
> > Thanks for the update Yadong. Big +1 for the proposed improvements for
> > Flink's web UI. I think they will be super helpful for our users.
> >
> > Cheers,
> > Till
> >
> > On Tue, Jan 7, 2020 at 10:00 AM Yadong Xie  wrote:
> >
> > > Hi everyone
> > >
> > > We have spent some time updating the documentation since the last
> > > discussion.
> > >
> > > In short, the latest FLIP-75 contains the following proposal(including
> > both
> > > frontend and RestAPI)
> > >
> > >1. Job Level
> > >   - better job backpressure detection
> > >   - load more feature in job exception
> > >   - show attempt history in the subtask
> > >   - show attempt timeline
> > >   - add pending slots
> > >2. Task Manager Level
> > >   - add more metrics
> > >   - better log display
> > >3. Job Manager Level
> > >   - add metrics tab
> > >   - better log display
> > >
> > > To help everyone better understand the proposal, we spent efforts on
> > making
> > > an online POC .
> > >
> > > Now you can compare the difference between the new and old Web/RestAPI
> > (the
> > > link is inside the doc)!
> > >
> > > Here is the latest FLIP-75 doc:
> > >
> > >
> >
> https://docs.google.com/document/d/1tIa8yN2prWWKJI_fa1u0t6h1r6RJpp56m48pXEyh6iI/edit#
> > >
> > > Looking forward to your feedback
> > >
> > >
> > > Best,
> > > Yadong
> > >
> > > lining jing  于2019年10月24日周四 下午2:11写道:
> > >
> > > > Hi all, I have updated the backend design in FLIP-75
> > > > <
> > > >
> > >
> >
> https://docs.google.com/document/d/1tIa8yN2prWWKJI_fa1u0t6h1r6RJpp56m48pXEyh6iI/edit?usp=sharing
> > > > >
> > > > .
> > > >
> > > > Here are some brief introductions:
> > > >
> > > >    - Add metric for managed memory (FLINK-14406).
> > > >    - Expose TaskExecutor resource configurations to REST API
> > > >    (FLINK-14422).
> > > >    - Add TaskManagerResourceInfo in TaskManagerDetailsInfo to show
> > > >    TaskManager resources (FLINK-14435).
> > > >
> > > > I will continue to update the rest part of the backend design in the
> > doc,
> > > > let's keep discuss here, any feedback is appreciated.
> > > >
> > > > Yadong Xie  于2019年9月27日周五 上午10:13写道:
> > > >
> > > > > Hi all
> > > > >
> > > > > Flink Web UI is the main platform for most users to monitor their
> > jobs
> > > > and
> > > > > clusters. We have reconstructed Flink web in 1.9.0 version, but
> there
> > > are
> > > > > still some shortcomings.
> > > > >
> > > > > This discussion thread aims to provide a better experience for
> Flink
> > UI
> > > > > users.
> > > > >
> > > > > Here is the design doc I drafted:
> > > > >
> > > > >
> > > > >
> > > >
> > >
> >
> https://docs.google.com/document/d/1tIa8yN2prWWKJI_fa1u0t6h1r6RJpp56m48pXEyh6iI/edit?usp=sharing
> > > > >
> > > > >
> > > > > The FLIP can be found at [2].
> > > > >
> > > > > Please keep the discussion here, in the mailing list.
> > > > >
> > > > > Looking forward to your opinions, any feedbacks are welcome.
> > > > >
> > > > > [1]:
> > > > >
> > > > >
> > > >
> > >
> >
> https://docs.google.com/document/d/1tIa8yN2prWWKJI_fa1u0t6h1r6RJpp56m48pXEyh6iI/edit?usp=sharing
> > > > > <
> > > > >
> > > >
> > >
> >
> https://docs.google.com/document/d/1tIa8yN2prWWKJI_fa1u0t6h1r6RJpp56m48pXEyh6iI/edit#
> > > > > >
> > > > > [2]:
> > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-75%3A+Flink+Web+UI+Improvement+Proposal
> > > > >
> > > >
> > >
> >
>


Re: REST Monitoring Savepoint failed

2020-01-30 Thread Till Rohrmann
Hi Ramya,

I think this message is better suited for the user ML list. Which version
of Flink are you using? Have you checked the Flink logs to see whether they
contain anything suspicious?

Cheers,
Till

On Thu, Jan 30, 2020 at 1:09 PM Ramya Ramamurthy  wrote:

> Hi,
>
> I am trying to dynamically increase the parallelism of the job. In the
> process, while triggering the savepoint, I get the following error. Any help
> would be appreciated.
>
> The URL triggered is :
>
> http://stgflink.ssdev.in:8081/jobs/2865c1f40340bcb19a88a01d6ef8ff4f/savepoints/
> {
> "target-directory" :
> "gs://-bucket/flink/flink-gcs/flink-savepoints",
> "cancel-job" : "false"
> }
>
> Error as below:
>
>
> {"status":{"id":"COMPLETED"},"operation":{"failure-cause":{"class":"java.util.concurrent.CompletionException","stack-trace":"java.util.concurrent.CompletionException:
> java.util.concurrent.CompletionException:
> org.apache.flink.runtime.checkpoint.CheckpointTriggerException: Failed to
> trigger savepoint. Decline reason: An Exception occurred while triggering
> the checkpoint.\n\tat
>
> org.apache.flink.runtime.jobmaster.JobMaster.lambda$triggerSavepoint$13(JobMaster.java:972)\n\tat
>
> java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:836)\n\tat
>
> java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:811)\n\tat
>
> java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:456)\n\tat
>
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:332)\n\tat
>
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:158)\n\tat
>
> org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:70)\n\tat
>
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.onReceive(AkkaRpcActor.java:142)\n\tat
>
> org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.onReceive(FencedAkkaRpcActor.java:40)\n\tat
>
> akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:165)\n\tat
> akka.actor.Actor$class.aroundReceive(Actor.scala:502)\n\tat
> akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:95)\n\tat
> akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)\n\tat
> akka.actor.ActorCell.invoke(ActorCell.scala:495)\n\tat
> akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)\n\tat akka.dis
>


Re: [VOTE] Integrate Flink Docker image publication into Flink release process

2020-01-30 Thread aihua li
+1 (non-binding)

> 2020年1月30日 下午7:36,Igal Shilman  写道:
> 
> +1 (non-binding)
> 
> On Thu, Jan 30, 2020 at 12:18 PM Yu Li  wrote:
> 
>> +1 (non-binding)
>> 
>> Best Regards,
>> Yu
>> 
>> 
>> On Thu, 30 Jan 2020 at 18:35, Arvid Heise  wrote:
>> 
>>> +1 (non-binding)
>>> 
>>> On Thu, Jan 30, 2020 at 11:10 AM Fabian Hueske 
>> wrote:
>>> 
 Hi Ismael,
 
> Just one question, we will be able to still be featured as an
>> official
 docker image in this case?
 
 Yes, that's the goal. We still want to publish official DockerHub
>> images
 for every Flink release.
 Since we're mainly migrating the docker-flink/docker-flink repo to
 apache/flink-docker, this should just work as before.
 
 Less important images (playgrounds, demos) would be published via ASF
>>> Infra
 under the Apache DockerHub user [1].
 
 Best,
 Fabian
 
 [1] https://hub.docker.com/u/apache
 
 Am Do., 30. Jan. 2020 um 06:12 Uhr schrieb Hequn Cheng <
 chenghe...@gmail.com
> :
 
> +1
> 
> Even though I prefer to contribute the Dockerfiles into the Flink
>> main
> repo,
> but I think a dedicate repo is also a good idea.
> 
> Thanks a lot for driving this! @Ufuk Celebi
> 
> On Thu, Jan 30, 2020 at 12:02 PM Peter Huang <
>>> huangzhenqiu0...@gmail.com
> 
> wrote:
> 
>> +1 (non-binding)
>> 
>> 
>> 
>> On Wed, Jan 29, 2020 at 5:54 PM Yang Wang 
 wrote:
>> 
>>> +1 (non-binding)
>>> 
>>> 
>>> Best,
>>> Yang
>>> 
>>> Rong Rong  于2020年1月30日周四 上午12:53写道:
>>> 
 +1
 
 --
 Rong
 
 On Wed, Jan 29, 2020 at 8:51 AM Ismaël Mejía <
>> ieme...@gmail.com>
>> wrote:
 
> +1 (non-binding)
> 
> No more maintenance work for us Patrick! Just kidding :), it
>>> was
>> mostly
> done by Patrick, all kudos to him.
> Just one question, we will be able to still be featured as an
>> official
> docker image in this case?
> 
> Best,
> Ismaël
> 
> ps. Hope having an official Helm chart becomes also a future
> target.
> 
> 
> 
> On Tue, Jan 28, 2020 at 3:26 PM Fabian Hueske <
 fhue...@apache.org>
 wrote:
> 
>> +1
>> 
>> Am Di., 28. Jan. 2020 um 15:23 Uhr schrieb Yun Tang <
>>> myas...@live.com
> :
>> 
>>> +1 (non-binding)
>>> 
>>> From: Stephan Ewen 
>>> Sent: Tuesday, January 28, 2020 21:36
>>> To: dev ; patr...@ververica.com <
>>> patr...@ververica.com>
>>> Subject: Re: [VOTE] Integrate Flink Docker image
>>> publication
> into
 Flink
>>> release process
>>> 
>>> +1
>>> 
>>> On Tue, Jan 28, 2020 at 2:20 PM Patrick Lucas <
>>> patr...@ververica.com
> 
>>> wrote:
>>> 
 Thanks for kicking this off, Ufuk.
 
 +1 (non-binding)
 
 --
 Patrick
 
 On Mon, Jan 27, 2020 at 5:50 PM Ufuk Celebi <
 u...@apache.org>
 wrote:
 
> Hey all,
> 
> there is a proposal to contribute the Dockerfiles and
> scripts
>>> of
> https://github.com/docker-flink/docker-flink to the
 Flink
 project.
>> The
> discussion corresponding to this vote outlines the
> reasoning
>>> for
> the
> proposal and can be found here: [1].
> 
> The proposal is as follows:
> * Request a new repository apache/flink-docker
> * Migrate all files from docker-flink/docker-flink to
>>> apache/flink-docker
> * Update the release documentation to describe how to
> update
> apache/flink-docker for new releases
> 
> Please review and vote on this proposal as follows:
> [ ] +1, Approve the proposal
> [ ] -1, Do not approve the proposal (please provide
> specific
>> comments)
> 
> The vote will be open for at least 3 days, ending the
>> earliest
 on:
 January
> 30th 2020, 17:00 UTC.
> 
> Cheers,
> 
> Ufuk
> 
> PS: I'm treating this proposal similar to a "Release
 Plan"
> as
>> mentioned
 in
> the project bylaws [2]. Please let me know if you
 consider
>>> this a
 different
> category.
> 
> [1]
> 
 
>>> 
>> 
> 
 
>>> 
>> 
> 
 
>>> 
>> http://apache-flink-mailing-list-archive

[jira] [Created] (FLINK-15814) Log warning when StreamingFileSink is used with bulk output formats and an ambiguous S3 scheme

2020-01-30 Thread Seth Wiesman (Jira)
Seth Wiesman created FLINK-15814:


 Summary: Log warning when StreamingFileSink is used with bulk 
output formats and an ambiguous S3 scheme
 Key: FLINK-15814
 URL: https://issues.apache.org/jira/browse/FLINK-15814
 Project: Flink
  Issue Type: Improvement
Reporter: Seth Wiesman
 Fix For: 1.11.0


StreamingFileSink does not work properly with bulk output formats and 
s3-presto. We should log a warning when the provided scheme is 's3p' or 's3' 
(which is ambiguous), explaining the potential issues and encouraging users to 
use only 's3a', which always delegates to Hadoop. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15815) Disable inclusion of original pom under META-INF/maven

2020-01-30 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-15815:


 Summary: Disable inclusion of original pom under META-INF/maven
 Key: FLINK-15815
 URL: https://issues.apache.org/jira/browse/FLINK-15815
 Project: Flink
  Issue Type: Improvement
  Components: BuildSystem / Shaded
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: shaded-10.0


The META-INF/maven directory contains the original poms. These are pretty much 
just noise and are safe to exclude, so let's do just that.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15816) Limit the maximum length of the value of kubernetes.cluster-id configuration option

2020-01-30 Thread Canbin Zheng (Jira)
Canbin Zheng created FLINK-15816:


 Summary: Limit the maximum length of the value of 
kubernetes.cluster-id configuration option
 Key: FLINK-15816
 URL: https://issues.apache.org/jira/browse/FLINK-15816
 Project: Flink
  Issue Type: Sub-task
  Components: Deployment / Kubernetes
Affects Versions: 1.10.0
Reporter: Canbin Zheng
 Fix For: 1.11.0, 1.10.1


Two Kubernetes Services are created when a session cluster is deployed: the 
internal Service and the rest Service. We set the internal Service name to the 
value of the _kubernetes.cluster-id_ configuration option, and the rest 
Service name to _${kubernetes.cluster-id}_ with the suffix *-rest* appended. 
For example, if we set _kubernetes.cluster-id_ to *session-test*, the internal 
Service name will be *session-test* and the rest Service name 
*session-test-rest*. Kubernetes constrains a Service name to no more than 63 
characters, so under the current naming convention the value of 
_kubernetes.cluster-id_ must not exceed 58 characters; otherwise the internal 
Service may be created successfully and the deployment then fail with a 
ClusterDeploymentException when trying to create the rest Service.
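The arithmetic behind the proposed limit is simply 63 minus the length of the "-rest" suffix. A hypothetical client-side validation sketch (not Flink's actual code) that would reject an over-long cluster id before any Service is created:

```python
# Hypothetical check mirroring the constraint described in FLINK-15816:
# Kubernetes Service names are limited to 63 characters, and the rest
# Service name is "<cluster-id>-rest".
K8S_SERVICE_NAME_MAX = 63
REST_SUFFIX = "-rest"
CLUSTER_ID_MAX = K8S_SERVICE_NAME_MAX - len(REST_SUFFIX)  # 58


def validate_cluster_id(cluster_id):
    """Raise if the derived rest Service name would exceed the limit."""
    if len(cluster_id) > CLUSTER_ID_MAX:
        raise ValueError(
            "kubernetes.cluster-id '{}' is {} chars; must be <= {} so that "
            "'{}{}' fits the {}-char Service name limit".format(
                cluster_id, len(cluster_id), CLUSTER_ID_MAX,
                cluster_id, REST_SUFFIX, K8S_SERVICE_NAME_MAX))
    return cluster_id


# ok: internal Service "session-test", rest Service "session-test-rest"
validate_cluster_id("session-test")
```

Validating eagerly like this would also avoid the partial-deployment leak described in FLINK-15817, since the failure happens before the internal Service exists.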



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [ANNOUNCE] Yu Li became a Flink committer

2020-01-30 Thread Benchao Li
Congratulations!!

Biao Liu  于2020年1月29日周三 下午9:25写道:

> Congrats!
>
> On Wed, Jan 29, 2020 at 10:37 aihua li  wrote:
>
> > Congratulations Yu LI, well deserved.
> >
> > > 2020年1月23日 下午4:59,Stephan Ewen  写道:
> > >
> > > Hi all!
> > >
> > > We are announcing that Yu Li has joined the rank of Flink committers.
> > >
> > > Yu joined already in late December, but the announcement got lost
> because
> > > of the Christmas and New Years season, so here is a belated proper
> > > announcement.
> > >
> > > Yu is one of the main contributors to the state backend components in
> the
> > > recent year, working on various improvements, for example the RocksDB
> > > memory management for 1.10.
> > > He has also been one of the release managers for the big 1.10 release.
> > >
> > > Congrats for joining us, Yu!
> > >
> > > Best,
> > > Stephan
> >
> > --
>
> Thanks,
> Biao /'bɪ.aʊ/
>


-- 

Benchao Li
School of Electronics Engineering and Computer Science, Peking University
Tel:+86-15650713730
Email: libenc...@gmail.com; libenc...@pku.edu.cn


Re: [ANNOUNCE] Yu Li became a Flink committer

2020-01-30 Thread felixzheng zheng
Congrats!

Benchao Li  于2020年1月31日周五 上午10:02写道:

> Congratulations!!
>
> Biao Liu  于2020年1月29日周三 下午9:25写道:
>
> > Congrats!
> >
> > On Wed, Jan 29, 2020 at 10:37 aihua li  wrote:
> >
> > > Congratulations Yu LI, well deserved.
> > >
> > > > 2020年1月23日 下午4:59,Stephan Ewen  写道:
> > > >
> > > > Hi all!
> > > >
> > > > We are announcing that Yu Li has joined the rank of Flink committers.
> > > >
> > > > Yu joined already in late December, but the announcement got lost
> > because
> > > > of the Christmas and New Years season, so here is a belated proper
> > > > announcement.
> > > >
> > > > Yu is one of the main contributors to the state backend components in
> > the
> > > > recent year, working on various improvements, for example the RocksDB
> > > > memory management for 1.10.
> > > > He has also been one of the release managers for the big 1.10
> release.
> > > >
> > > > Congrats for joining us, Yu!
> > > >
> > > > Best,
> > > > Stephan
> > >
> > > --
> >
> > Thanks,
> > Biao /'bɪ.aʊ/
> >
>
>
> --
>
> Benchao Li
> School of Electronics Engineering and Computer Science, Peking University
> Tel:+86-15650713730
> Email: libenc...@gmail.com; libenc...@pku.edu.cn
>


[jira] [Created] (FLINK-15817) Kubernetes Resource leak while deployment exception happens

2020-01-30 Thread Canbin Zheng (Jira)
Canbin Zheng created FLINK-15817:


 Summary: Kubernetes Resource leak while deployment exception 
happens
 Key: FLINK-15817
 URL: https://issues.apache.org/jira/browse/FLINK-15817
 Project: Flink
  Issue Type: Sub-task
  Components: Deployment / Kubernetes
Affects Versions: 1.10.0
Reporter: Canbin Zheng
 Fix For: 1.11.0, 1.10.1


When we deploy a new session cluster on a Kubernetes cluster, there are 
usually four steps to create the Kubernetes components, in the following 
order: internal Service -> rest Service -> ConfigMap -> JobManager Deployment.

After the internal Service is created, any exception that fails the cluster 
deployment process causes a Kubernetes resource leak, for example:
 # If creating the rest Service fails due to the service name constraint 
([FLINK-15816|https://issues.apache.org/jira/browse/FLINK-15816]), the 
internal Service is not cleaned up when the deployment process terminates.
 # If creating the JobManager Deployment fails (one case is that 
_jobmanager.heap.size_ is too small, such as 512M, which is less than the 
default value of 'containerized.heap-cutoff-min'), the internal Service, the 
rest Service, and the ConfigMap all leak.

This ticket proposes to clean up the residual Services and ConfigMap on the 
client-side if the cluster deployment process terminates unexpectedly.
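The proposed clean-up is the classic create-in-order, undo-on-failure pattern: track which resources were created, and on any failure delete them in reverse order before surfacing the error. A hypothetical sketch (the step names below illustrate the order from the ticket; this is not Flink's actual deployment code):

```python
def deploy_session_cluster(create_steps, delete):
    """Sketch of the clean-up proposed in FLINK-15817: create resources in
    order; if any step fails, delete the already-created ones in reverse
    order (best-effort) and re-raise the original error."""
    created = []
    try:
        for name, create in create_steps:
            create()
            created.append(name)
    except Exception:
        for name in reversed(created):
            try:
                delete(name)  # best-effort clean-up of residual resources
            except Exception:
                pass          # do not mask the original failure
        raise
    return created


# Example: the rest Service fails (e.g. name exceeds 63 chars, FLINK-15816),
# so the already-created internal Service gets cleaned up.
deleted = []
steps = [
    ("internal-service", lambda: None),
    ("rest-service", lambda: (_ for _ in ()).throw(RuntimeError("name too long"))),
    ("configmap", lambda: None),
    ("jobmanager-deployment", lambda: None),
]
try:
    deploy_session_cluster(steps, deleted.append)
except RuntimeError:
    pass
print(deleted)  # ['internal-service']
```

Deleting in reverse creation order ensures dependents are removed before the resources they reference.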



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15818) Add Kafka ingress startup position config to YAML support

2020-01-30 Thread Tzu-Li (Gordon) Tai (Jira)
Tzu-Li (Gordon) Tai created FLINK-15818:
---

 Summary: Add Kafka ingress startup position config to YAML support
 Key: FLINK-15818
 URL: https://issues.apache.org/jira/browse/FLINK-15818
 Project: Flink
  Issue Type: New Feature
  Components: Stateful Functions
Affects Versions: statefun-1.1
Reporter: Tzu-Li (Gordon) Tai
Assignee: Tzu-Li (Gordon) Tai


With FLINK-15769, we've added startup position configuration to the Java Kafka 
Ingress.
The same functionality should also be added to the YAML support.
Part of this work should also cover unifying source creation logic between the 
`ProtobufKafkaSourceProvider` and `KafkaSourceProvider` so that configuration 
validation is shared.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15819) Add more StateFun Kafka ingress features that are already available in FlinkKafkaConsumer

2020-01-30 Thread Tzu-Li (Gordon) Tai (Jira)
Tzu-Li (Gordon) Tai created FLINK-15819:
---

 Summary: Add more StateFun Kafka ingress features that are already 
available in FlinkKafkaConsumer
 Key: FLINK-15819
 URL: https://issues.apache.org/jira/browse/FLINK-15819
 Project: Flink
  Issue Type: New Feature
  Components: Stateful Functions
Reporter: Tzu-Li (Gordon) Tai


This is an umbrella JIRA to list all pending {{FlinkKafkaConsumer}} features 
that are reasonable to expose through Stateful Function's Kafka ingress.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15820) Allow disabling / enabling auto offset committing to Kafka in StateFun Kafka ingress

2020-01-30 Thread Tzu-Li (Gordon) Tai (Jira)
Tzu-Li (Gordon) Tai created FLINK-15820:
---

 Summary: Allow disabling / enabling auto offset committing to 
Kafka in StateFun Kafka ingress
 Key: FLINK-15820
 URL: https://issues.apache.org/jira/browse/FLINK-15820
 Project: Flink
  Issue Type: Sub-task
  Components: Stateful Functions
Affects Versions: statefun-1.1
Reporter: Tzu-Li (Gordon) Tai


This is already supported in {{FlinkKafkaConsumer}}, so it would only be a 
matter of exposing it through the Stateful Functions Kafka ingress.

It would be reasonable to have this disabled by default, so that users do not 
always need to set a consumer group id and can get started with the ingress 
with a minimal setup.
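As a rough illustration of the proposed default: the standard Kafka consumer property that controls this behavior is `enable.auto.commit`, and with auto-commit off no consumer group id has to be configured just to read. The helper below is a hypothetical sketch of such a default, not StateFun or Flink code; the class and method names are invented here.

```java
import java.util.Properties;

public class KafkaIngressProps {

    /**
     * Consumer properties with offset auto-committing disabled by default.
     * With auto-commit off, no consumer group id is required just to read,
     * which keeps the minimal getting-started setup small.
     */
    public static Properties defaultConsumerProps(String bootstrapServers) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("enable.auto.commit", "false");
        return props;
    }
}
```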



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15821) Allow configuring Kafka partition / topic discovery in StateFun Kafka ingress

2020-01-30 Thread Tzu-Li (Gordon) Tai (Jira)
Tzu-Li (Gordon) Tai created FLINK-15821:
---

 Summary: Allow configuring Kafka partition / topic discovery in 
StateFun Kafka ingress
 Key: FLINK-15821
 URL: https://issues.apache.org/jira/browse/FLINK-15821
 Project: Flink
  Issue Type: Bug
  Components: Stateful Functions
Affects Versions: statefun-1.1
Reporter: Tzu-Li (Gordon) Tai


This is already implemented in the {{FlinkKafkaConsumer}}, so it would only be 
a matter of exposing it through Stateful Function's Kafka ingress.

Proposed API:
{code}
KafkaIngressBuilder#withTopics(java.util.regex.Pattern regexPattern)
KafkaIngressBuilder#enableDiscovery(java.time.Duration discoveryInterval)
{code}
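To make the proposal concrete, here is a compilable sketch of the two proposed methods. Only the method signatures `withTopics(Pattern)` and `enableDiscovery(Duration)` come from the ticket; the class name, fields, and the `matchesTopic` helper are hypothetical scaffolding added so the snippet stands alone.

```java
import java.time.Duration;
import java.util.regex.Pattern;

public class KafkaIngressBuilderSketch {

    private Pattern topicPattern;
    private Duration discoveryInterval; // null means discovery disabled

    /** Subscribes to every topic whose name matches the given pattern. */
    public KafkaIngressBuilderSketch withTopics(Pattern regexPattern) {
        this.topicPattern = regexPattern;
        return this;
    }

    /** Enables periodic partition/topic discovery at the given interval. */
    public KafkaIngressBuilderSketch enableDiscovery(Duration discoveryInterval) {
        this.discoveryInterval = discoveryInterval;
        return this;
    }

    public boolean matchesTopic(String topic) {
        return topicPattern != null && topicPattern.matcher(topic).matches();
    }

    public Duration discoveryInterval() {
        return discoveryInterval;
    }
}
```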



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [VOTE] Integrate Flink Docker image publication into Flink release process

2020-01-30 Thread Jingsong Li
+1 (non-binding)

Official docker support really looks good to me.

Best,
Jingsong Lee

On Fri, Jan 31, 2020 at 12:55 AM aihua li  wrote:

> +1 (non-binding)
>
> > 2020年1月30日 下午7:36,Igal Shilman  写道:
> >
> > +1 (non-binding)
> >
> > On Thu, Jan 30, 2020 at 12:18 PM Yu Li  wrote:
> >
> >> +1 (non-binding)
> >>
> >> Best Regards,
> >> Yu
> >>
> >>
> >> On Thu, 30 Jan 2020 at 18:35, Arvid Heise  wrote:
> >>
> >>> +1 (non-binding)
> >>>
> >>> On Thu, Jan 30, 2020 at 11:10 AM Fabian Hueske 
> >> wrote:
> >>>
>  Hi Ismael,
> 
> > Just one question, we will be able to still be featured as an
> >> official
>  docker image in this case?
> 
>  Yes, that's the goal. We still want to publish official DockerHub
> >> images
>  for every Flink release.
>  Since we're mainly migrating the docker-flink/docker-flink repo to
>  apache/flink-docker, this should just work as before.
> 
>  Less important images (playgrounds, demos) would be published via ASF
> >>> Infra
>  under the Apache DockerHub user [1].
> 
>  Best,
>  Fabian
> 
>  [1] https://hub.docker.com/u/apache
> 
>  Am Do., 30. Jan. 2020 um 06:12 Uhr schrieb Hequn Cheng <
>  chenghe...@gmail.com
> > :
> 
> > +1
> >
> > Even though I prefer to contribute the Dockerfiles into the Flink
> >> main
> > repo,
> > but I think a dedicated repo is also a good idea.
> >
> > Thanks a lot for driving this! @Ufuk Celebi
> >
> > On Thu, Jan 30, 2020 at 12:02 PM Peter Huang <
> >>> huangzhenqiu0...@gmail.com
> >
> > wrote:
> >
> >> +1 (non-binding)
> >>
> >>
> >>
> >> On Wed, Jan 29, 2020 at 5:54 PM Yang Wang 
>  wrote:
> >>
> >>> +1 (non-binding)
> >>>
> >>>
> >>> Best,
> >>> Yang
> >>>
> >>> Rong Rong  于2020年1月30日周四 上午12:53写道:
> >>>
>  +1
> 
>  --
>  Rong
> 
>  On Wed, Jan 29, 2020 at 8:51 AM Ismaël Mejía <
> >> ieme...@gmail.com>
> >> wrote:
> 
> > +1 (non-binding)
> >
> > No more maintenance work for us Patrick! Just kidding :), it
> >>> was
> >> mostly
> > done by Patrick, all kudos to him.
> > Just one question, we will be able to still be featured as an
> >> official
> > docker image in this case?
> >
> > Best,
> > Ismaël
> >
> > ps. Hope having an official Helm chart becomes also a future
> > target.
> >
> >
> >
> > On Tue, Jan 28, 2020 at 3:26 PM Fabian Hueske <
>  fhue...@apache.org>
>  wrote:
> >
> >> +1
> >>
> >> Am Di., 28. Jan. 2020 um 15:23 Uhr schrieb Yun Tang <
> >>> myas...@live.com
> > :
> >>
> >>> +1 (non-binding)
> >>> 
> >>> From: Stephan Ewen 
> >>> Sent: Tuesday, January 28, 2020 21:36
> >>> To: dev ; patr...@ververica.com <
> >>> patr...@ververica.com>
> >>> Subject: Re: [VOTE] Integrate Flink Docker image
> >>> publication
> > into
>  Flink
> >>> release process
> >>>
> >>> +1
> >>>
> >>> On Tue, Jan 28, 2020 at 2:20 PM Patrick Lucas <
> >>> patr...@ververica.com
> >
> >>> wrote:
> >>>
>  Thanks for kicking this off, Ufuk.
> 
>  +1 (non-binding)
> 
>  --
>  Patrick
> 
>  On Mon, Jan 27, 2020 at 5:50 PM Ufuk Celebi <
>  u...@apache.org>
>  wrote:
> 
> > Hey all,
> >
> > there is a proposal to contribute the Dockerfiles and
> > scripts
> >>> of
> > https://github.com/docker-flink/docker-flink to the
>  Flink
>  project.
> >> The
> > discussion corresponding to this vote outlines the
> > reasoning
> >>> for
> > the
> > proposal and can be found here: [1].
> >
> > The proposal is as follows:
> > * Request a new repository apache/flink-docker
> > * Migrate all files from docker-flink/docker-flink to
> >>> apache/flink-docker
> > * Update the release documentation to describe how to
> > update
> > apache/flink-docker for new releases
> >
> > Please review and vote on this proposal as follows:
> > [ ] +1, Approve the proposal
> > [ ] -1, Do not approve the proposal (please provide
> > specific
> >> comments)
> >
> > The vote will be open for at least 3 days, ending the
> >> earliest
>  on:
>  January
> > 30th 2020, 17:00 UTC.
> >
> > Cheers,
> >
> > Ufuk
> >
> 

Re: [VOTE] Integrate Flink Docker image publication into Flink release process

2020-01-30 Thread Tzu-Li (Gordon) Tai
+1 (binding)

Cheers,
Gordon

On Fri, Jan 31, 2020 at 2:16 PM Jingsong Li  wrote:

> +1 (non-binding)
>
> Official docker support really looks good to me.
>
> Best,
> Jingsong Lee
>
> On Fri, Jan 31, 2020 at 12:55 AM aihua li  wrote:
>
> > +1 (non-binding)
> >
> > > 2020年1月30日 下午7:36,Igal Shilman  写道:
> > >
> > > +1 (non-binding)
> > >
> > > On Thu, Jan 30, 2020 at 12:18 PM Yu Li  wrote:
> > >
> > >> +1 (non-binding)
> > >>
> > >> Best Regards,
> > >> Yu
> > >>
> > >>
> > >> On Thu, 30 Jan 2020 at 18:35, Arvid Heise 
> wrote:
> > >>
> > >>> +1 (non-binding)
> > >>>
> > >>> On Thu, Jan 30, 2020 at 11:10 AM Fabian Hueske 
> > >> wrote:
> > >>>
> >  Hi Ismael,
> > 
> > > Just one question, we will be able to still be featured as an
> > >> official
> >  docker image in this case?
> > 
> >  Yes, that's the goal. We still want to publish official DockerHub
> > >> images
> >  for every Flink release.
> >  Since we're mainly migrating the docker-flink/docker-flink repo to
> >  apache/flink-docker, this should just work as before.
> > 
> >  Less important images (playgrounds, demos) would be published via
> ASF
> > >>> Infra
> >  under the Apache DockerHub user [1].
> > 
> >  Best,
> >  Fabian
> > 
> >  [1] https://hub.docker.com/u/apache
> > 
> >  Am Do., 30. Jan. 2020 um 06:12 Uhr schrieb Hequn Cheng <
> >  chenghe...@gmail.com
> > > :
> > 
> > > +1
> > >
> > > Even though I prefer to contribute the Dockerfiles into the Flink
> > >> main
> > > repo,
> > > but I think a dedicated repo is also a good idea.
> > >
> > > Thanks a lot for driving this! @Ufuk Celebi
> > >
> > > On Thu, Jan 30, 2020 at 12:02 PM Peter Huang <
> > >>> huangzhenqiu0...@gmail.com
> > >
> > > wrote:
> > >
> > >> +1 (non-binding)
> > >>
> > >>
> > >>
> > >> On Wed, Jan 29, 2020 at 5:54 PM Yang Wang 
> >  wrote:
> > >>
> > >>> +1 (non-binding)
> > >>>
> > >>>
> > >>> Best,
> > >>> Yang
> > >>>
> > >>> Rong Rong  于2020年1月30日周四 上午12:53写道:
> > >>>
> >  +1
> > 
> >  --
> >  Rong
> > 
> >  On Wed, Jan 29, 2020 at 8:51 AM Ismaël Mejía <
> > >> ieme...@gmail.com>
> > >> wrote:
> > 
> > > +1 (non-binding)
> > >
> > > No more maintenance work for us Patrick! Just kidding :), it
> > >>> was
> > >> mostly
> > > done by Patrick, all kudos to him.
> > > Just one question, we will be able to still be featured as an
> > >> official
> > > docker image in this case?
> > >
> > > Best,
> > > Ismaël
> > >
> > > ps. Hope having an official Helm chart becomes also a future
> > > target.
> > >
> > >
> > >
> > > On Tue, Jan 28, 2020 at 3:26 PM Fabian Hueske <
> >  fhue...@apache.org>
> >  wrote:
> > >
> > >> +1
> > >>
> > >> Am Di., 28. Jan. 2020 um 15:23 Uhr schrieb Yun Tang <
> > >>> myas...@live.com
> > > :
> > >>
> > >>> +1 (non-binding)
> > >>> 
> > >>> From: Stephan Ewen 
> > >>> Sent: Tuesday, January 28, 2020 21:36
> > >>> To: dev ; patr...@ververica.com <
> > >>> patr...@ververica.com>
> > >>> Subject: Re: [VOTE] Integrate Flink Docker image
> > >>> publication
> > > into
> >  Flink
> > >>> release process
> > >>>
> > >>> +1
> > >>>
> > >>> On Tue, Jan 28, 2020 at 2:20 PM Patrick Lucas <
> > >>> patr...@ververica.com
> > >
> > >>> wrote:
> > >>>
> >  Thanks for kicking this off, Ufuk.
> > 
> >  +1 (non-binding)
> > 
> >  --
> >  Patrick
> > 
> >  On Mon, Jan 27, 2020 at 5:50 PM Ufuk Celebi <
> >  u...@apache.org>
> >  wrote:
> > 
> > > Hey all,
> > >
> > > there is a proposal to contribute the Dockerfiles and
> > > scripts
> > >>> of
> > > https://github.com/docker-flink/docker-flink to the
> >  Flink
> >  project.
> > >> The
> > > discussion corresponding to this vote outlines the
> > > reasoning
> > >>> for
> > > the
> > > proposal and can be found here: [1].
> > >
> > > The proposal is as follows:
> > > * Request a new repository apache/flink-docker
> > > * Migrate all files from docker-flink/docker-flink to
> > >>> apache/flink-docker
> > > * Update the release documentation to describe how to
> > > update
> > > apache/flink-docker for new releases
> > >
> > > Please review and vote on this proposal as follows:

Re: REST Monitoring Savepoint failed

2020-01-30 Thread Ramya Ramamurthy
Hi Till,

I am using Flink 1.7.
This is my observation:

a) I first trigger a savepoint; it is stored on Google Cloud Storage.
b) When I invoke the rescale HTTP API, I get an error saying the savepoints
directory is not configured. But after triggering (a), I can verify that the
savepoint directory is present in GCS at the mentioned path.

Below is the snapshot of my deployment file.

Environment:
  JOB_MANAGER_RPC_ADDRESS:svc-flink-jobmanager-gcs
  HIGH_AVAILABILITY:  zookeeper
  HIGH_AVAILABILITY_ZOOKEEPER:zookeeper-dev-1:2181
  HIGH_AVAILABILITY_ZOOKEEPER_PATH_ROOT:  /flink-gcs-chk
  HIGH_AVAILABILITY_CLUSTER_ID:   fs.default_ns
  HIGH_AVAILABILITY_STORAGEDIR:
gs://x/flink/flink-gcs/checkpoints
  HIGH_AVAILABILITY_JOBMANAGER_PORT:  6123
  STATE_CHECKPOINTS_DIR:
 gs://x/flink/flink-gcs/flink-checkpoints
  STATE_SAVEPOINTS_DIR:
gs://x/flink/flink-gcs/flink-savepoints

Response to my Savepoints REST API is as below:
{
"status": {
"id": "COMPLETED"
},
"operation": {
"location":
"gs://x/flink/flink-gcs/flink-savepoints/savepoint-d02217-ec0aced1fe9c"
}
}

So why doesn't the job recognize this savepoint directory? Also, during
this operation, I can see that the checkpoints directory for this job gets
deleted, after which no more checkpoints happen. Any thoughts here would
really help us make progress.

Thanks,

On Thu, Jan 30, 2020 at 8:45 PM Till Rohrmann  wrote:

> Hi Ramya,
>
> I think this message is better suited for the user ML list. Which version
> of Flink are you using? Have you checked the Flink logs to see whether they
> contain anything suspicious?
>
> Cheers,
> Till
>
> On Thu, Jan 30, 2020 at 1:09 PM Ramya Ramamurthy 
> wrote:
>
> > Hi,
> >
> > I am trying to dynamically increase the parallelism of the job. In the
> > process of it, while I am trying to trigger the savepoint, i get
> > the following error. Any help would be appreciated.
> >
> > The URL triggered is :
> >
> >
> http://stgflink.ssdev.in:8081/jobs/2865c1f40340bcb19a88a01d6ef8ff4f/savepoints/
> > {
> > "target-directory" :
> > "gs://-bucket/flink/flink-gcs/flink-savepoints",
> > "cancel-job" : "false"
> > }
> >
> > Error as below:
> >
> >
> >
> {"status":{"id":"COMPLETED"},"operation":{"failure-cause":{"class":"java.util.concurrent.CompletionException","stack-trace":"java.util.concurrent.CompletionException:
> > java.util.concurrent.CompletionException:
> > org.apache.flink.runtime.checkpoint.CheckpointTriggerException: Failed to
> > trigger savepoint. Decline reason: An Exception occurred while triggering
> > the checkpoint.\n\tat
> >
> >
> org.apache.flink.runtime.jobmaster.JobMaster.lambda$triggerSavepoint$13(JobMaster.java:972)\n\tat
> >
> >
> java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:836)\n\tat
> >
> >
> java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:811)\n\tat
> >
> >
> java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:456)\n\tat
> >
> >
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:332)\n\tat
> >
> >
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:158)\n\tat
> >
> >
> org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:70)\n\tat
> >
> >
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.onReceive(AkkaRpcActor.java:142)\n\tat
> >
> >
> org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.onReceive(FencedAkkaRpcActor.java:40)\n\tat
> >
> >
> akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:165)\n\tat
> > akka.actor.Actor$class.aroundReceive(Actor.scala:502)\n\tat
> > akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:95)\n\tat
> > akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)\n\tat
> > akka.actor.ActorCell.invoke(ActorCell.scala:495)\n\tat
> > akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)\n\tat akka.dis
> >
>



[jira] [Created] (FLINK-15822) Rethink the necessity of the internal Service

2020-01-30 Thread Canbin Zheng (Jira)
Canbin Zheng created FLINK-15822:


 Summary: Rethink the necessity of the internal Service
 Key: FLINK-15822
 URL: https://issues.apache.org/jira/browse/FLINK-15822
 Project: Flink
  Issue Type: Sub-task
  Components: Deployment / Kubernetes
Affects Versions: 1.10.0
Reporter: Canbin Zheng
 Fix For: 1.11.0, 1.10.1


The current design introduces two Kubernetes Services when deploying a new 
session cluster. The rest Service serves external communication, while the 
internal Service mainly serves two purposes:
 # In non-HA mode, it acts as a discovery service that directs communication 
from the TaskManagers to the JobManager Pod whose labels match the internal 
Service's selector, so that the TM Pods can re-register with the new JM Pod 
after a JM Pod failover. In HA mode there can be one active and multiple 
standby JM Pods, so the Pod IP of the active one is used for internal 
communication instead of the internal Service.
 # It serves as the OwnerReference of all other Kubernetes resources, 
including the rest Service, the ConfigMap, and the JobManager Deployment.

Could we create a single Service instead of two? Things could work quite well 
with only the rest Service, and the design and code would be more succinct.

This ticket proposes to remove the internal Service. The core changes are:
 # In non-HA mode, use the rest Service as the JobManager Pod discovery 
service.
 # Set the JobManager Deployment as the OwnerReference of all other 
Kubernetes resources, including the rest Service and the ConfigMap.
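The second change relies on Kubernetes garbage collection: resources whose `ownerReferences` metadata points at the JobManager Deployment are deleted automatically when that Deployment is deleted. As an illustration, the sketch below assembles such a reference as a plain map. The field names follow the Kubernetes `ownerReferences` schema; the deployment name and uid are placeholders, and a real implementation would use a Kubernetes client library instead of raw maps.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class OwnerRefSketch {

    /**
     * Builds an ownerReferences entry pointing at the JobManager Deployment,
     * so Kubernetes garbage-collects the owned resource (rest Service,
     * ConfigMap) when the Deployment is deleted.
     */
    public static Map<String, Object> deploymentOwnerRef(String deploymentName, String uid) {
        Map<String, Object> ref = new LinkedHashMap<>();
        ref.put("apiVersion", "apps/v1");
        ref.put("kind", "Deployment");
        ref.put("name", deploymentName);
        ref.put("uid", uid);
        // Mark the Deployment as the managing controller and block its
        // deletion until the dependents have been collected.
        ref.put("controller", true);
        ref.put("blockOwnerDeletion", true);
        return ref;
    }
}
```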



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15823) Sync download page translation

2020-01-30 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-15823:


 Summary: Sync download page translation
 Key: FLINK-15823
 URL: https://issues.apache.org/jira/browse/FLINK-15823
 Project: Flink
  Issue Type: Improvement
  Components: chinese-translation, Project Website
Reporter: Chesnay Schepler


Adjust the Chinese version of `downloads.md` to reflect the changes made in 
https://issues.apache.org/jira/browse/FLINK-15800



--
This message was sent by Atlassian Jira
(v8.3.4#803005)