Re: [RESULT] [VOTE] Apache Flink 1.9.0, release candidate #3

2019-08-22 Thread Tzu-Li (Gordon) Tai
@Chesnay

No. Users will have to manually build and install PyFlink themselves in
1.9.0:
https://ci.apache.org/projects/flink/flink-docs-release-1.9/flinkDev/building.html#build-pyflink

This is also mentioned in the announcement blog post (to-be-merged):
https://github.com/apache/flink-web/pull/244/files#diff-0cc840a590f5cab2485934278134c9baR291

On Thu, Aug 22, 2019 at 10:03 AM Chesnay Schepler 
wrote:

> Are we also releasing python artifacts for 1.9?
>
> On 21/08/2019 19:23, Tzu-Li (Gordon) Tai wrote:
> > I'm happy to announce that we have unanimously approved this candidate as
> > the 1.9.0 release.
> >
> > There are 12 approving votes, 5 of which are binding:
> > - Yu Li
> > - Zili Chen
> > - Gordon Tai
> > - Stephan Ewen
> > - Jark Wu
> > - Vino Yang
> > - Gary Yao
> > - Bowen Li
> > - Chesnay Schepler
> > - Till Rohrmann
> > - Aljoscha Krettek
> > - David Anderson
> >
> > There are no disapproving votes.
> >
> > Thanks everyone who has contributed to this release!
> >
> > I will wait until tomorrow morning for the artifacts to be available in
> > Maven central before announcing the release in a separate thread.
> >
> > The release blog post will also be merged tomorrow along with the official
> > announcement.
> >
> > Cheers,
> > Gordon
> >
> > On Wed, Aug 21, 2019, 5:37 PM David Anderson 
> wrote:
> >
> >> +1 (non-binding)
> >>
> >> I upgraded the flink-training-exercises project.
> >>
> >> I encountered a few rough edges, including problems in the docs, but
> >> nothing serious.
> >>
> >> I had to make some modifications to deal with changes in the Table API:
> >>
> >> ExternalCatalogTable.builder became new ExternalCatalogTableBuilder
> >> TableEnvironment.getTableEnvironment became StreamTableEnvironment.create
> >> StreamTableDescriptorValidator.UPDATE_MODE() became
> >> StreamTableDescriptorValidator.UPDATE_MODE
> >> org.apache.flink.table.api.java.Slide moved to
> >> org.apache.flink.table.api.Slide
> >>
> >> I also found myself forced to change a CoProcessFunction to a
> >> KeyedCoProcessFunction (which it should have been).
> >>
> >> I also tried a few complex queries in the SQL console, and wrote a
> >> simple job using the State Processor API. Everything worked.
> >>
> >> David
> >>
> >>
> >> David Anderson | Training Coordinator
> >>
> >> Follow us @VervericaData
> >>
> >> --
> >> Join Flink Forward - The Apache Flink Conference
> >> Stream Processing | Event Driven | Real Time
> >>
> >>
> >> On Wed, Aug 21, 2019 at 1:45 PM Aljoscha Krettek 
> >> wrote:
> >>> +1
> >>>
> >>> I checked the last RC on a GCE cluster and was satisfied with the
> >>> testing. The cherry-picked commits didn’t change anything related, so I’m
> >>> forwarding my vote from there.
> >>> Aljoscha
> >>>
>  On 21. Aug 2019, at 13:34, Chesnay Schepler 
> >> wrote:
>  +1 (binding)
> 
>  On 21/08/2019 08:09, Bowen Li wrote:
> > +1 non-binding
> >
> > - built from source with default profile
> > - manually ran SQL and Table API tests for Flink's metadata integration
> > with Hive Metastore in local cluster
> > - manually ran SQL tests for batch capability with Blink planner and Hive
> > integration (source/sink/udf) in local cluster
> >  - file formats include: csv, orc, parquet
> >
> >
> > On Tue, Aug 20, 2019 at 10:23 PM Gary Yao 
> wrote:
> >
> >> +1 (non-binding)
> >>
> >> Reran Jepsen tests 10 times.
> >>
> >> On Wed, Aug 21, 2019 at 5:35 AM vino yang 
> >> wrote:
> >>> +1 (non-binding)
> >>>
> >>> - checkout source code and build successfully
> >>> - started a local cluster and ran some example jobs successfully
> >>> - verified signatures and hashes
> >>> - checked release notes and post
> >>>
> >>> Best,
> >>> Vino
> >>>
> >>> Stephan Ewen  wrote on Wed, Aug 21, 2019 at 4:20 AM:
> >>>
>  +1 (binding)
> 
>    - Downloaded the binary release tarball
>    - started a standalone cluster with four nodes
>    - ran some examples through the Web UI
>    - checked the logs
>    - created a project from the Java quickstarts maven archetype
>    - ran a multi-stage DataSet job in batch mode
>    - killed a TaskManager and verified correct restart behavior,
>    including failover region backtracking
> 
> 
>  I found a few issues, and a common theme here is confusing error
>  reporting and logging.
> 
>  (1) When testing batch failover and killing a TaskManager, the job
>  reports as the failure cause "org.apache.flink.util.FlinkException: The
>  assigned slot 6d0e469d55a2630871f43ad0f89c786c_0 was removed."
>   I think that is a pretty bad error message, as a user I don't know
>   what that means. Some internal book keeping thing?
>   You need to know a lot about Flink to understand that this means
>   "TaskManager failure".

Re: [RESULT] [VOTE] Apache Flink 1.9.0, release candidate #3

2019-08-22 Thread Chesnay Schepler

Are we also releasing python artifacts for 1.9?

On 21/08/2019 19:23, Tzu-Li (Gordon) Tai wrote:

I'm happy to announce that we have unanimously approved this candidate as
the 1.9.0 release.

There are 12 approving votes, 5 of which are binding:
- Yu Li
- Zili Chen
- Gordon Tai
- Stephan Ewen
- Jark Wu
- Vino Yang
- Gary Yao
- Bowen Li
- Chesnay Schepler
- Till Rohrmann
- Aljoscha Krettek
- David Anderson

There are no disapproving votes.

Thanks everyone who has contributed to this release!

I will wait until tomorrow morning for the artifacts to be available in
Maven central before announcing the release in a separate thread.

The release blog post will also be merged tomorrow along with the official
announcement.

Cheers,
Gordon

On Wed, Aug 21, 2019, 5:37 PM David Anderson  wrote:


+1 (non-binding)

I upgraded the flink-training-exercises project.

I encountered a few rough edges, including problems in the docs, but
nothing serious.

I had to make some modifications to deal with changes in the Table API:

ExternalCatalogTable.builder became new ExternalCatalogTableBuilder
TableEnvironment.getTableEnvironment became StreamTableEnvironment.create
StreamTableDescriptorValidator.UPDATE_MODE() became
StreamTableDescriptorValidator.UPDATE_MODE
org.apache.flink.table.api.java.Slide moved to
org.apache.flink.table.api.Slide
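
As a rough illustration of the environment-creation change, here is a minimal
sketch (variable and class names are illustrative and not taken from
flink-training-exercises):

    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.table.api.Slide;  // 1.9: moved here from org.apache.flink.table.api.java.Slide
    import org.apache.flink.table.api.java.StreamTableEnvironment;

    public class TableEnvMigration {
        public static void main(String[] args) {
            StreamExecutionEnvironment env =
                    StreamExecutionEnvironment.getExecutionEnvironment();

            // Pre-1.9 code typically called TableEnvironment.getTableEnvironment(env);
            // in 1.9 the bridge environments are created via create():
            StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);

            // Sliding windows are declared as before, only the Slide import moved, e.g.
            // table.window(Slide.over("10.minutes").every("5.minutes").on("rowtime").as("w"))
        }
    }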

I also found myself forced to change a CoProcessFunction to a
KeyedCoProcessFunction (which it should have been).
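
For context, the difference is the extra key type parameter and access to the
current key on the Context. A sketch only; the class name and type parameters
below are placeholders, not code from the training exercises:

    import org.apache.flink.streaming.api.functions.co.KeyedCoProcessFunction;
    import org.apache.flink.util.Collector;

    // A CoProcessFunction<IN1, IN2, OUT> used on keyed, connected streams can be
    // rewritten as a KeyedCoProcessFunction<K, IN1, IN2, OUT> (new in 1.9), which
    // exposes the current key through its Context.
    public class MyJoinFunction extends KeyedCoProcessFunction<String, Long, Long, String> {

        @Override
        public void processElement1(Long left, Context ctx, Collector<String> out) {
            // getCurrentKey() is not available on the plain CoProcessFunction context
            out.collect(ctx.getCurrentKey() + ": left " + left);
        }

        @Override
        public void processElement2(Long right, Context ctx, Collector<String> out) {
            out.collect(ctx.getCurrentKey() + ": right " + right);
        }
    }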

I also tried a few complex queries in the SQL console, and wrote a
simple job using the State Processor API. Everything worked.
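
For reference, a read-only job against the new State Processor API can be as
small as the sketch below; the savepoint path, uid, state name and type are
invented for illustration and are not from the job mentioned above:

    import org.apache.flink.api.common.typeinfo.Types;
    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.runtime.state.memory.MemoryStateBackend;
    import org.apache.flink.state.api.ExistingSavepoint;
    import org.apache.flink.state.api.Savepoint;

    public class ReadSavepointJob {
        public static void main(String[] args) throws Exception {
            // The State Processor API (new in 1.9) runs as a regular DataSet job.
            ExecutionEnvironment bEnv = ExecutionEnvironment.getExecutionEnvironment();

            // Load an existing savepoint; path and state backend are illustrative.
            ExistingSavepoint savepoint = Savepoint.load(
                    bEnv, "file:///tmp/savepoints/savepoint-abc123", new MemoryStateBackend());

            // Read operator list state that was registered under uid "my-operator".
            DataSet<Long> counts = savepoint.readListState("my-operator", "counts", Types.LONG);
            counts.print();
        }
    }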

David


David Anderson | Training Coordinator

Follow us @VervericaData

--
Join Flink Forward - The Apache Flink Conference
Stream Processing | Event Driven | Real Time


On Wed, Aug 21, 2019 at 1:45 PM Aljoscha Krettek 
wrote:

+1

I checked the last RC on a GCE cluster and was satisfied with the
testing. The cherry-picked commits didn’t change anything related, so I’m
forwarding my vote from there.

Aljoscha


On 21. Aug 2019, at 13:34, Chesnay Schepler 

wrote:

+1 (binding)

On 21/08/2019 08:09, Bowen Li wrote:

+1 non-binding

- built from source with default profile
- manually ran SQL and Table API tests for Flink's metadata integration
with Hive Metastore in local cluster
- manually ran SQL tests for batch capability with Blink planner and Hive
integration (source/sink/udf) in local cluster
- file formats include: csv, orc, parquet


On Tue, Aug 20, 2019 at 10:23 PM Gary Yao  wrote:


+1 (non-binding)

Reran Jepsen tests 10 times.

On Wed, Aug 21, 2019 at 5:35 AM vino yang 

wrote:

+1 (non-binding)

- checkout source code and build successfully
- started a local cluster and ran some example jobs successfully
- verified signatures and hashes
- checked release notes and post

Best,
Vino

Stephan Ewen  wrote on Wed, Aug 21, 2019 at 4:20 AM:


+1 (binding)

  - Downloaded the binary release tarball
  - started a standalone cluster with four nodes
  - ran some examples through the Web UI
  - checked the logs
  - created a project from the Java quickstarts maven archetype
  - ran a multi-stage DataSet job in batch mode
  - killed a TaskManager and verified correct restart behavior,
including failover region backtracking


I found a few issues, and a common theme here is confusing error
reporting and logging.

(1) When testing batch failover and killing a TaskManager, the job
reports as the failure cause "org.apache.flink.util.FlinkException: The
assigned slot 6d0e469d55a2630871f43ad0f89c786c_0 was removed."
 I think that is a pretty bad error message, as a user I don't know
what that means. Some internal book keeping thing?
 You need to know a lot about Flink to understand that this means
"TaskManager failure".
 https://issues.apache.org/jira/browse/FLINK-13805
 I would not block the release on this, but think this should get
pretty urgent attention.

(2) The Metric Fetcher floods the log with error messages when a
TaskManager is lost.
  There are many exceptions being logged by the Metrics Fetcher due to
not reaching the TM any more.
  This pollutes the log and drowns out the original exception and the
meaningful logs from the scheduler/execution graph.
  https://issues.apache.org/jira/browse/FLINK-13806
  Again, I would not block the release on this, but think this should
get pretty urgent attention.

(3) If you put "web.submit.enable: false" into the configuration, the
web UI will still display the "SubmitJob" page, but errors will
continuously pop up, stating "Unable to load requested file /jars."
 https://issues.apache.org/jira/browse/FLINK-13799

(4) REST endpoint logs ERROR level messages when selecting the
"Checkpoints" tab for batch jobs. That does not seem correct.
  https://issues.apache.org/jira/browse/FLINK-13795

Best,
Stephan




On Tue, Aug 20, 2019 at 11:32 AM Tzu-Li (Gordon) Tai <tzuli...@apache.org> wrote:


+1

Legal checks:
- verified signatures and hashes

Re: [RESULT] [VOTE] Apache Flink 1.9.0, release candidate #3

2019-08-21 Thread Shaoxuan Wang
Congratulations and thanks all for the great efforts on release1.9.

I have verified the RC#3 with the following items:

- Verified signatures and hashes. (OK)
- Built from source archive. (OK)
- Repository contains all artifacts. (OK)
- Test WordCount on local cluster. (OK)
a. Both streaming and batch
b. Web ui works fine
- Test WordCount on yarn cluster. (OK)
a. Both streaming and batch
b. Web ui works fine
c. Test session mode and non-session mode

So +1 (binding) from my side.

Regards,
Shaoxuan


On Thu, Aug 22, 2019 at 1:23 AM Tzu-Li (Gordon) Tai 
wrote:

> I'm happy to announce that we have unanimously approved this candidate as
> the 1.9.0 release.
>
> There are 12 approving votes, 5 of which are binding:
> - Yu Li
> - Zili Chen
> - Gordon Tai
> - Stephan Ewen
> - Jark Wu
> - Vino Yang
> - Gary Yao
> - Bowen Li
> - Chesnay Schepler
> - Till Rohrmann
> - Aljoscha Krettek
> - David Anderson
>
> There are no disapproving votes.
>
> Thanks everyone who has contributed to this release!
>
> I will wait until tomorrow morning for the artifacts to be available in
> Maven central before announcing the release in a separate thread.
>
> The release blog post will also be merged tomorrow along with the official
> announcement.
>
> Cheers,
> Gordon
>
> On Wed, Aug 21, 2019, 5:37 PM David Anderson  wrote:
>
> > +1 (non-binding)
> >
> > I upgraded the flink-training-exercises project.
> >
> > I encountered a few rough edges, including problems in the docs, but
> > nothing serious.
> >
> > I had to make some modifications to deal with changes in the Table API:
> >
> > ExternalCatalogTable.builder became new ExternalCatalogTableBuilder
> > TableEnvironment.getTableEnvironment became StreamTableEnvironment.create
> > StreamTableDescriptorValidator.UPDATE_MODE() became
> > StreamTableDescriptorValidator.UPDATE_MODE
> > org.apache.flink.table.api.java.Slide moved to
> > org.apache.flink.table.api.Slide
> >
> > I also found myself forced to change a CoProcessFunction to a
> > KeyedCoProcessFunction (which it should have been).
> >
> > I also tried a few complex queries in the SQL console, and wrote a
> > simple job using the State Processor API. Everything worked.
> >
> > David
> >
> >
> > David Anderson | Training Coordinator
> >
> > Follow us @VervericaData
> >
> > --
> > Join Flink Forward - The Apache Flink Conference
> > Stream Processing | Event Driven | Real Time
> >
> >
> > On Wed, Aug 21, 2019 at 1:45 PM Aljoscha Krettek 
> > wrote:
> > >
> > > +1
> > >
> > > I checked the last RC on a GCE cluster and was satisfied with the
> > testing. The cherry-picked commits didn’t change anything related, so I’m
> > forwarding my vote from there.
> > >
> > > Aljoscha
> > >
> > > > On 21. Aug 2019, at 13:34, Chesnay Schepler 
> > wrote:
> > > >
> > > > +1 (binding)
> > > >
> > > > On 21/08/2019 08:09, Bowen Li wrote:
> > > >> +1 non-binding
> > > >>
> > > >> - built from source with default profile
> > > >> - manually ran SQL and Table API tests for Flink's metadata integration
> > > >> with Hive Metastore in local cluster
> > > >> - manually ran SQL tests for batch capability with Blink planner and Hive
> > > >> integration (source/sink/udf) in local cluster
> > > >> - file formats include: csv, orc, parquet
> > > >>
> > > >>
> > > >> On Tue, Aug 20, 2019 at 10:23 PM Gary Yao 
> wrote:
> > > >>
> > > >>> +1 (non-binding)
> > > >>>
> > > >>> Reran Jepsen tests 10 times.
> > > >>>
> > > >>> On Wed, Aug 21, 2019 at 5:35 AM vino yang 
> > wrote:
> > > >>>
> > >  +1 (non-binding)
> > > 
> > >  - checkout source code and build successfully
> > >  - started a local cluster and ran some example jobs successfully
> > >  - verified signatures and hashes
> > >  - checked release notes and post
> > > 
> > >  Best,
> > >  Vino
> > > 
> > >  Stephan Ewen  wrote on Wed, Aug 21, 2019 at 4:20 AM:
> > > 
> > > > +1 (binding)
> > > >
> > > >  - Downloaded the binary release tarball
> > > >  - started a standalone cluster with four nodes
> > > >  - ran some examples through the Web UI
> > > >  - checked the logs
> > > >  - created a project from the Java quickstarts maven archetype
> > > >  - ran a multi-stage DataSet job in batch mode
> > > >  - killed a TaskManager and verified correct restart behavior,
> > > > including failover region backtracking
> > > >
> > > >
> > > > I found a few issues, and a common theme here is confusing error
> > > > reporting and logging.
> > > >
> > > > (1) When testing batch failover and killing a TaskManager, the job
> > > > reports as the failure cause "org.apache.flink.util.FlinkException: The
> > > > assigned slot 6d0e469d55a2630871f43ad0f89c786c_0 was removed."
> > > > I think that is a pretty bad error message, as a user I don't know
> > > > what that means. Some internal book keeping thing?
> > > 

[RESULT] [VOTE] Apache Flink 1.9.0, release candidate #3

2019-08-21 Thread Tzu-Li (Gordon) Tai
I'm happy to announce that we have unanimously approved this candidate as
the 1.9.0 release.

There are 12 approving votes, 5 of which are binding:
- Yu Li
- Zili Chen
- Gordon Tai
- Stephan Ewen
- Jark Wu
- Vino Yang
- Gary Yao
- Bowen Li
- Chesnay Schepler
- Till Rohrmann
- Aljoscha Krettek
- David Anderson

There are no disapproving votes.

Thanks everyone who has contributed to this release!

I will wait until tomorrow morning for the artifacts to be available in
Maven central before announcing the release in a separate thread.

The release blog post will also be merged tomorrow along with the official
announcement.

Cheers,
Gordon

On Wed, Aug 21, 2019, 5:37 PM David Anderson  wrote:

> +1 (non-binding)
>
> I upgraded the flink-training-exercises project.
>
> I encountered a few rough edges, including problems in the docs, but
> nothing serious.
>
> I had to make some modifications to deal with changes in the Table API:
>
> ExternalCatalogTable.builder became new ExternalCatalogTableBuilder
> TableEnvironment.getTableEnvironment became StreamTableEnvironment.create
> StreamTableDescriptorValidator.UPDATE_MODE() became
> StreamTableDescriptorValidator.UPDATE_MODE
> org.apache.flink.table.api.java.Slide moved to
> org.apache.flink.table.api.Slide
>
> I also found myself forced to change a CoProcessFunction to a
> KeyedCoProcessFunction (which it should have been).
>
> I also tried a few complex queries in the SQL console, and wrote a
> simple job using the State Processor API. Everything worked.
>
> David
>
>
> David Anderson | Training Coordinator
>
> Follow us @VervericaData
>
> --
> Join Flink Forward - The Apache Flink Conference
> Stream Processing | Event Driven | Real Time
>
>
> On Wed, Aug 21, 2019 at 1:45 PM Aljoscha Krettek 
> wrote:
> >
> > +1
> >
> > I checked the last RC on a GCE cluster and was satisfied with the
> testing. The cherry-picked commits didn’t change anything related, so I’m
> forwarding my vote from there.
> >
> > Aljoscha
> >
> > > On 21. Aug 2019, at 13:34, Chesnay Schepler 
> wrote:
> > >
> > > +1 (binding)
> > >
> > > On 21/08/2019 08:09, Bowen Li wrote:
> > >> +1 non-binding
> > >>
> > >> - built from source with default profile
> > >> - manually ran SQL and Table API tests for Flink's metadata integration
> > >> with Hive Metastore in local cluster
> > >> - manually ran SQL tests for batch capability with Blink planner and Hive
> > >> integration (source/sink/udf) in local cluster
> > >> - file formats include: csv, orc, parquet
> > >>
> > >>
> > >> On Tue, Aug 20, 2019 at 10:23 PM Gary Yao  wrote:
> > >>
> > >>> +1 (non-binding)
> > >>>
> > >>> Reran Jepsen tests 10 times.
> > >>>
> > >>> On Wed, Aug 21, 2019 at 5:35 AM vino yang 
> wrote:
> > >>>
> >  +1 (non-binding)
> > 
> >  - checkout source code and build successfully
> >  - started a local cluster and ran some example jobs successfully
> >  - verified signatures and hashes
> >  - checked release notes and post
> > 
> >  Best,
> >  Vino
> > 
> >  Stephan Ewen  wrote on Wed, Aug 21, 2019 at 4:20 AM:
> > 
> > > +1 (binding)
> > >
> > >  - Downloaded the binary release tarball
> > >  - started a standalone cluster with four nodes
> > >  - ran some examples through the Web UI
> > >  - checked the logs
> > >  - created a project from the Java quickstarts maven archetype
> > >  - ran a multi-stage DataSet job in batch mode
> > >  - killed a TaskManager and verified correct restart behavior,
> > > including failover region backtracking
> > >
> > >
> > > I found a few issues, and a common theme here is confusing error
> > > reporting and logging.
> > >
> > > (1) When testing batch failover and killing a TaskManager, the job
> > > reports as the failure cause "org.apache.flink.util.FlinkException: The
> > > assigned slot 6d0e469d55a2630871f43ad0f89c786c_0 was removed."
> > > I think that is a pretty bad error message, as a user I don't know
> > > what that means. Some internal book keeping thing?
> > > You need to know a lot about Flink to understand that this means
> > > "TaskManager failure".
> > > https://issues.apache.org/jira/browse/FLINK-13805
> > > I would not block the release on this, but think this should get
> > > pretty urgent attention.
> > >
> > > (2) The Metric Fetcher floods the log with error messages when a
> > > TaskManager is lost.
> > >  There are many exceptions being logged by the Metrics Fetcher due to
> > > not reaching the TM any more.
> > >  This pollutes the log and drowns out the original exception and the
> > > meaningful logs from the scheduler/execution graph.
> > >  https://issues.apache.org/jira/browse/FLINK-13806
> > >  Again, I would not block the release on this, but think this should
> > > get pretty urgent attention.