I think this issue that Ufuk opened is also a blocker:
https://issues.apache.org/jira/browse/FLINK-5670

As I commented on the issue, at least one bigger user of Flink has run into
this problem on their cluster.

On Fri, 27 Jan 2017 at 10:50 Ufuk Celebi <u...@apache.org> wrote:

> Thanks Gyula!
>
> The current state of things is:
> - Stefan is working on a fix for
> https://issues.apache.org/jira/browse/FLINK-5663.
> - Till is working on https://issues.apache.org/jira/browse/FLINK-5667.
>
> As far as I can tell, these will be fixed today and we are ready to go for
> RC3.
>
> I resolved the other issues I created.
>
> – Ufuk
>
> On 26 January 2017 at 22:16:26, Gyula Fóra (gyf...@apache.org) wrote:
> > Hi,
> >
> > Aside from the issues mentioned above I have some good news as well.
> >
> > I have finished porting and started testing one of our major production
> > jobs (RBea) on 1.2 and everything seems to run well so far, with
> > savepoints, rescaling, externalized checkpoints, metrics etc. on YARN.
> >
> > In this job I use windowing, RocksDB state, iterations, timers, broadcast
> > states, repartitionable operator states etc., and everything seems to be
> > working extremely well under normal circumstances.
> >
> > So far I have mostly run sunny-day tests, but I will continue testing with a
> > larger load and some failure scenarios. I will keep you posted.
> >
> > Great job!
> > Gyula
> >
> >
> >
> > Robert Metzger wrote (on Thu, 26 Jan 2017, 21:28):
> >
> > Damn. I really hoped that this RC goes through.
> >
> > I propose to keep RC2 open until we've fixed all the issues mentioned here
> > and to get some more testing feedback.
> >
> >
> >
> > On Thu, Jan 26, 2017 at 8:06 PM, Stephan Ewen wrote:
> >
> > > @Till - I think that FLINK-5667 is a blocker
> > >
> > > Good catch finding it!
> > >
> > > On Thu, Jan 26, 2017 at 7:51 PM, Till Rohrmann
> > > wrote:
> > >
> > > > I have found another problem: Under certain circumstances Flink can lose
> > > > state data by completing an invalid checkpoint.
> > > > https://issues.apache.org/jira/browse/FLINK-5667.
> > > >
> > > > Cheers,
> > > > Till
> > > >
> > > > On Thu, Jan 26, 2017 at 6:27 PM, Till Rohrmann
> > > > wrote:
> > > >
> > > > > Robert also found an issue where pending checkpoint files are not properly
> > > > > cleaned up: https://issues.apache.org/jira/browse/FLINK-5660. To my
> > > > > surprise, the issue was already fixed in 1.1.4, so I guess I forgot to
> > > > > forward-port the fix. There is a pending PR to fix it. The fix could also
> > > > > be part of a 1.2.1 release.
> > > > >
> > > > > Cheers,
> > > > > Till
> > > > >
> > > > > On Thu, Jan 26, 2017 at 6:04 PM, Ufuk Celebi wrote:
> > > > >
> > > > >> I ran some tests and found the following issues:
> > > > >>
> > > > >> https://issues.apache.org/jira/browse/FLINK-5663: Checkpoint fails
> > > > >> because of closed registry
> > > > >> => This happened a couple of times for the first checkpoints after
> > > > >> submitting a job. If it happened on every submission I would
> > > > >> definitely make this a blocker, but I ran into it in about 3
> > > > >> out of 10 job submissions. What do we make of this?
> > > > >>
> > > > >> https://issues.apache.org/jira/browse/FLINK-5665: When the failures
> > > > >> happened, I also had some lingering 0-byte files.
> > > > >>
> > > > >> https://issues.apache.org/jira/browse/FLINK-5664: I also found the
> > > > >> logging of the RocksDB backend a little noisy (for my local setup at
> > > > >> least, with many tasks per TM and a low checkpointing interval).
> > > > >>
> > > > >> All in all, I'm not sure if we want to make these blockers or not.
> > > > >> I'm fine both ways with a follow-up 1.2.1 release.
> > > > >>
> > > > >> ===
> > > > >>
> > > > >> - Verified signatures and checksums
> > > > >> - Checked out the Java quickstarts and ran the jobs
> > > > >> - All poms point to 1.2.0
> > > > >> - Migrated multiple jobs via savepoint from 1.1.4 to 1.2.0 with Kryo
> > > > >> types, session windows (w/o lateness), operator and keyed state for
> > > > >> all three backends
> > > > >> - Rescaled the same jobs from 1.2.0 savepoints with all three backends
> > > > >> - Verified the "migration namespace serializer" fix
> > > > >> - Ran streaming state machine with Kafka source, RocksDB backend and
> > > > >> master and worker failures (standalone cluster)
> > > > >>
> > > > >> On Wed, Jan 25, 2017 at 9:14 PM, Robert Metzger
> > > > >> wrote:
> > > > >> > Dear Flink community,
> > > > >> >
> > > > >> > Please vote on releasing the following candidate as Apache Flink
> > > > >> > version 1.2.0.
> > > > >> >
> > > > >> > The commit to be voted on:
> > > > >> > 8b5b6a8b (http://git-wip-us.apache.org/repos/asf/flink/commit/8b5b6a8b)
> > > > >> >
> > > > >> > Branch:
> > > > >> > release-1.2.0-rc2
> > > > >> > (https://git1-us-west.apache.org/repos/asf/flink/repo?p=flink.git;a=shortlog;h=refs/heads/release-1.2.0-rc2)
> > > > >> >
> > > > >> > The release artifacts to be voted on can be found at:
> > > > >> > http://people.apache.org/~rmetzger/flink-1.2.0-rc2/
> > > > >> >
> > > > >> > The release artifacts are signed with the key with fingerprint D9839159:
> > > > >> > http://www.apache.org/dist/flink/KEYS
> > > > >> >
> > > > >> > The staging repository for this release can be found at:
> > > > >> > https://repository.apache.org/content/repositories/orgapacheflink-1113
> > > > >> >
> > > > >> > -------------------------------------------------------------
> > > > >> >
> > > > >> > I would like to keep Friday as the target release time. Please let me
> > > > >> > know if you want me to move the deadline to Monday because you need
> > > > >> > more time for testing.
> > > > >> >
> > > > >> > The vote ends on Friday, January 27, 2017, 6pm CET.
> > > > >> >
> > > > >> > Please test the release now rather than on Friday morning, so that we
> > > > >> > can cancel it as early as possible if necessary.
> > > > >> > To make testing easier, I've created this document to track what has
> > > > >> > already been tested and what still needs to be tested:
> > > > >> > https://docs.google.com/document/d/1MX-8l9RrLly3UmZMODHBnuZUrK_n-DGIBLjFKyCrTAs/edit?usp=sharing
> > > > >> > Feel free to add more tests or change existing ones.
> > > > >> >
> > > > >> > [ ] +1 Release this package as Apache Flink 1.2.0
> > > > >> > [ ] -1 Do not release this package, because ...
> > > > >>
> > > > >
> > > > >
> > > >
> > >
> >
>
>
