Hi,

Aside from the issues mentioned above, I have some good news as well.

I have finished porting and started testing one of our major production
jobs (RBea) on 1.2 and everything seems to run well so far, with
savepoints, rescaling, externalized checkpoints, metrics etc. on YARN.

In this job I use windowing, RocksDB state, iterations, timers, broadcast
states, repartitionable operator states etc. and everything seems to be
working extremely well under normal circumstances.

So far I have mostly run sunny-day tests, but I will continue testing with a
larger load and some failure scenarios. I will keep you posted.

Great job!
Gyula



On Thu, Jan 26, 2017 at 9:28 PM, Robert Metzger <rmetz...@apache.org> wrote:

Damn. I really hoped that this RC goes through.

I propose keeping RC2 open until we've fixed all issues mentioned here
and gotten some more testing feedback.



On Thu, Jan 26, 2017 at 8:06 PM, Stephan Ewen <se...@apache.org> wrote:

> @Till - I think that FLINK-5667 is a blocker
>
> Good catch finding it!
>
> On Thu, Jan 26, 2017 at 7:51 PM, Till Rohrmann <trohrm...@apache.org>
> wrote:
>
> > I have found another problem: Under certain circumstances Flink can lose
> > state data by completing an invalid checkpoint.
> > https://issues.apache.org/jira/browse/FLINK-5667.
> >
> > Cheers,
> > Till
> >
> > On Thu, Jan 26, 2017 at 6:27 PM, Till Rohrmann <trohrm...@apache.org>
> > wrote:
> >
> > > Robert also found an issue that pending checkpoint files are not
> > > properly cleaned up: https://issues.apache.org/jira/browse/FLINK-5660.
> > > To my surprise, the issue was already fixed in 1.1.4 so I guess I've
> > > forgotten to forward port the fix. There is a pending PR to fix it. The
> > > fix could also be part of a 1.2.1 release.
> > >
> > > Cheers,
> > > Till
> > >
> > > On Thu, Jan 26, 2017 at 6:04 PM, Ufuk Celebi <u...@apache.org> wrote:
> > >
> > >> I ran some tests and found the following issues:
> > >>
> > >> https://issues.apache.org/jira/browse/FLINK-5663: Checkpoint fails
> > >> because of closed registry
> > >> => This happened a couple of times for the first checkpoints after
> > >> submitting a job. If it happened on every submission I would
> > >> definitely make this a blocker, but I ran into it in roughly 3
> > >> out of 10 job submissions. What do we make of this?
> > >>
> > >> https://issues.apache.org/jira/browse/FLINK-5665: When the failures
> > >> happened, I also had some lingering 0-byte files.
> > >>
> > >> https://issues.apache.org/jira/browse/FLINK-5664: I also found the
> > >> logging of the RocksDB backend a little noisy (at least for my local
> > >> setup, with many tasks per TM and a low checkpointing interval).
> > >>
> > >> All in all, I'm not sure whether we want to make these blockers or not.
> > >> I'm fine either way, given a follow-up 1.2.1 release.
> > >>
> > >> ===
> > >>
> > >> - Verified signatures and checksums
> > >> - Checked out the Java quickstarts and ran the jobs
> > >> - All poms point to 1.2.0
> > >> - Migrated multiple jobs via savepoint from 1.1.4 to 1.2.0 with Kryo
> > >> types, session windows (w/o lateness), operator and keyed state for
> > >> all three backends
> > >> - Rescaled the same jobs from 1.2.0 savepoints with all three
> > >> backends
> > >> - Verified the "migration namespace serializer" fix
> > >> - Ran streaming state machine with Kafka source, RocksDB backend and
> > >> master and worker failures (standalone cluster)
> > >>
> > >> On Wed, Jan 25, 2017 at 9:14 PM, Robert Metzger <rmetz...@apache.org>
> > >> wrote:
> > >> > Dear Flink community,
> > >> >
> > >> > Please vote on releasing the following candidate as Apache Flink
> > >> > version 1.2.0.
> > >> >
> > >> > The commit to be voted on:
> > >> > 8b5b6a8b (http://git-wip-us.apache.org/repos/asf/flink/commit/8b5b6a8b)
> > >> >
> > >> > Branch:
> > >> > release-1.2.0-rc2
> > >> > (https://git1-us-west.apache.org/repos/asf/flink/repo?p=flink.git;a=shortlog;h=refs/heads/release-1.2.0-rc2)
> > >> >
> > >> > The release artifacts to be voted on can be found at:
> > >> > http://people.apache.org/~rmetzger/flink-1.2.0-rc2/
> > >> >
> > >> > The release artifacts are signed with the key with fingerprint D9839159:
> > >> > http://www.apache.org/dist/flink/KEYS
> > >> >
> > >> > The staging repository for this release can be found at:
> > >> > https://repository.apache.org/content/repositories/orgapacheflink-1113
> > >> >
> > >> > -------------------------------------------------------------
> > >> >
> > >> > I would like to keep Friday as the target release time. Please let me
> > >> > know if you need more time for testing and want me to move the
> > >> > deadline to Monday.
> > >> >
> > >> > The vote ends on Friday, January 27, 2017, 6pm CET.
> > >> >
> > >> > Please test the release now rather than on Friday morning, so that we
> > >> > can cancel it as early as possible if necessary.
> > >> > To make testing easier, I've created this document to track what has
> > >> > already been tested and what still needs to be tested:
> > >> > https://docs.google.com/document/d/1MX-8l9RrLly3UmZMODHBnuZUrK_n-DGIBLjFKyCrTAs/edit?usp=sharing
> > >> > Feel free to add more tests or change existing ones.
> > >> >
> > >> > [ ] +1 Release this package as Apache Flink 1.2.0
> > >> > [ ] -1 Do not release this package, because ...
> > >>
> > >
> > >
> >
>
