Thanks Gyula!

The current state of things is:
- Stefan is working on a fix for https://issues.apache.org/jira/browse/FLINK-5663.
- Till is working on https://issues.apache.org/jira/browse/FLINK-5667.

As far as I can tell, these will be fixed today and we are ready to go for RC3.

I resolved the other issues I created.

– Ufuk

On 26 January 2017 at 22:16:26, Gyula Fóra (gyf...@apache.org) wrote:
> Hi,
>
> Aside from the issues mentioned above I have some good news as well.
>
> I have finished porting and started testing one of our major production
> jobs (RBea) on 1.2 and everything seems to run well so far, with
> savepoints, rescaling, externalized checkpoints, metrics etc. on YARN.
>
> In this job I use windowing, RocksDB state, iterations, timers, broadcast
> states, repartitionable operator states etc. and everything seems to be
> working extremely well under normal circumstances.
>
> So far I mostly ran sunny day tests but I will continue testing with larger
> load and some failure scenarios. I will keep you posted.
>
> Great job!
> Gyula
>
> Robert Metzger wrote (on Thu, Jan 26, 2017, 21:28):
>
> Damn. I really hoped that this RC goes through.
>
> I propose to keep the RC2 open until we've fixed all issues mentioned here
> and to get some more testing feedback.
>
> On Thu, Jan 26, 2017 at 8:06 PM, Stephan Ewen wrote:
>
> > @Till - I think that FLINK-5667 is a blocker
> >
> > Good catch finding it!
> >
> > On Thu, Jan 26, 2017 at 7:51 PM, Till Rohrmann wrote:
> >
> > > I have found another problem: Under certain circumstances Flink can lose
> > > state data by completing an invalid checkpoint.
> > > https://issues.apache.org/jira/browse/FLINK-5667.
> > >
> > > Cheers,
> > > Till
> > >
> > > On Thu, Jan 26, 2017 at 6:27 PM, Till Rohrmann wrote:
> > >
> > > > Robert also found an issue that pending checkpoint files are not properly
> > > > cleaned up: https://issues.apache.org/jira/browse/FLINK-5660. To my
> > > > surprise, the issue was already fixed in 1.1.4 so I guess I've forgotten to
> > > > forward port the fix. There is a pending PR to fix it. The fix could also
> > > > be part of a 1.2.1 release.
> > > >
> > > > Cheers,
> > > > Till
> > > >
> > > > On Thu, Jan 26, 2017 at 6:04 PM, Ufuk Celebi wrote:
> > > >
> > > >> I ran some tests and found the following issues:
> > > >>
> > > >> https://issues.apache.org/jira/browse/FLINK-5663: Checkpoint fails
> > > >> because of closed registry
> > > >> => This happened a couple of times for the first checkpoints after
> > > >> submitting a job. If it happened on every submission I would
> > > >> definitely make this a blocker, but I happen to run into it in like 3
> > > >> out of 10 job submissions. What do we make of this?
> > > >>
> > > >> https://issues.apache.org/jira/browse/FLINK-5665: When the failures
> > > >> happened, I also had some lingering 0-byte files.
> > > >>
> > > >> https://issues.apache.org/jira/browse/FLINK-5664: I also found the
> > > >> logging of the RocksDB backend a little noisy (for my local setup at
> > > >> least with many tasks per TM and low checkpointing interval.)
> > > >>
> > > >> All in all, I'm not sure if we want to make these a blocker or not.
> > > >> I'm fine both ways with a follow up 1.2.1 release.
> > > >>
> > > >> ===
> > > >>
> > > >> - Verified signatures and checksums
> > > >> - Checked out the Java quickstarts and ran the jobs
> > > >> - All poms point to 1.2.0
> > > >> - Migrated multiple jobs via savepoint from 1.1.4 to 1.2.0 with Kryo
> > > >> types, session windows (w/o lateness), operator and keyed state for
> > > >> all three backends
> > > >> - Rescaled the same jobs from 1.2.0 savepoints with all three backends
> > > >> - Verified the "migration namespace serializer" fix
> > > >> - Ran streaming state machine with Kafka source, RocksDB backend and
> > > >> master and worker failures (standalone cluster)
> > > >>
> > > >> On Wed, Jan 25, 2017 at 9:14 PM, Robert Metzger wrote:
> > > >> > Dear Flink community,
> > > >> >
> > > >> > Please vote on releasing the following candidate as Apache Flink version
> > > >> > 1.2.0.
> > > >> >
> > > >> > The commit to be voted on:
> > > >> > 8b5b6a8b (http://git-wip-us.apache.org/repos/asf/flink/commit/8b5b6a8b)
> > > >> >
> > > >> > Branch:
> > > >> > release-1.2.0-rc2
> > > >> > (https://git1-us-west.apache.org/repos/asf/flink/repo?p=flink.git;a=shortlog;h=refs/heads/release-1.2.0-rc2)
> > > >> >
> > > >> > The release artifacts to be voted on can be found at:
> > > >> > http://people.apache.org/~rmetzger/flink-1.2.0-rc2/
> > > >> >
> > > >> > The release artifacts are signed with the key with fingerprint D9839159:
> > > >> > http://www.apache.org/dist/flink/KEYS
> > > >> >
> > > >> > The staging repository for this release can be found at:
> > > >> > https://repository.apache.org/content/repositories/orgapacheflink-1113
> > > >> >
> > > >> > -------------------------------------------------------------
> > > >> >
> > > >> > I would like to keep Friday as the target release time. Please let me know
> > > >> > if you want me to move the deadline to Monday if you need more time for
> > > >> > testing.
> > > >> >
> > > >> > The vote ends on Friday, January 27, 2017, 6pm CET.
> > > >> >
> > > >> > Please test the release now rather than on Friday morning, to be able to
> > > >> > cancel it as early as possible.
> > > >> > For making the testing easier, I've created this document to track what has
> > > >> > already been tested and what needs to be tested:
> > > >> > https://docs.google.com/document/d/1MX-8l9RrLly3UmZMODHBnuZUrK_n-DGIBLjFKyCrTAs/edit?usp=sharing
> > > >> > Feel free to add more tests or change existing ones.
> > > >> >
> > > >> > [ ] +1 Release this package as Apache Flink 1.2.0
> > > >> > [ ] -1 Do not release this package, because ...