Thanks for kicking off the discussion! I have an open PR for:

- https://issues.apache.org/jira/browse/FLINK-5808: Missing verification for setParallelism and setMaxParallelism
- https://issues.apache.org/jira/browse/FLINK-5713: Protect against NPE in WindowOperator window cleanup

On Tue, Mar 14, 2017, at 15:15, Till Rohrmann wrote:
> Thanks for kicking off the discussion Tzu-Li. I'd like to add the following
> issues which have already been merged into the 1.2-release and 1.1-release
> branches:
>
> 1.2.1:
>
> (7) https://issues.apache.org/jira/browse/FLINK-5942
> Hardens the checkpoint recovery in case of corrupted ZooKeeper data.
> Corrupted checkpoints will now be skipped.
> Status: Merged
>
> (8) https://issues.apache.org/jira/browse/FLINK-5940
> Hardens the checkpoint recovery in case that we cannot retrieve the
> completed checkpoint from the meta data state handle retrieved from
> ZooKeeper. This can, for example, happen if the meta data is deleted.
> Checkpoints with unretrievable state handles are skipped.
> Status: Merged
>
> 1.1.5:
>
> (7) https://issues.apache.org/jira/browse/FLINK-5942
> Hardens the checkpoint recovery in case of corrupted ZooKeeper data.
> Corrupted checkpoints will now be skipped.
> Status: Merged
>
> (8) https://issues.apache.org/jira/browse/FLINK-5940
> Hardens the checkpoint recovery in case that we cannot retrieve the
> completed checkpoint from the meta data state handle retrieved from
> ZooKeeper. This can, for example, happen if the meta data is deleted.
> Checkpoints with unretrievable state handles are skipped.
> Status: Merged
>
> Cheers,
> Till
>
> On Tue, Mar 14, 2017 at 12:02 PM, Tzu-Li (Gordon) Tai <tzuli...@apache.org> wrote:
>
> > Hi all!
> >
> > I would like to start a discussion for the next bugfix release for 1.1.x
> > and 1.2.x.
> > There have been quite a few critical fixes for bugs in both releases
> > recently, and I think they deserve a bugfix release soon.
> > Most of the bugs were reported by users.
> >
> > I’m starting the discussion for both bugfix releases because most fixes
> > span both releases (almost identical).
> > Of course, the actual RC votes and RC creation process don’t have to be
> > started together.
> >
> > Here’s an overview of what’s been collected so far, for both bugfix
> > releases -
> > (it’s a list of what I’m aware of so far, and may be missing stuff; please
> > append and bring to attention as necessary :-) )
> >
> > For Flink 1.2.1:
> >
> > (1) https://issues.apache.org/jira/browse/FLINK-5701:
> > Async exceptions in the FlinkKafkaProducer are not checked on checkpoints.
> > This compromises the producer’s at-least-once guarantee.
> > Status: merged
> >
> > (2) https://issues.apache.org/jira/browse/FLINK-5949:
> > Do not check Kerberos credentials for non-Kerberos authentications. MapR
> > users are affected by this, and cannot submit Flink on YARN jobs on a
> > secured MapR cluster.
> > Status: PR - https://github.com/apache/flink/pull/3528, one +1 already
> >
> > (3) https://issues.apache.org/jira/browse/FLINK-6006:
> > Kafka Consumer can lose state if queried partition list is incomplete on
> > restore.
> > Status: PR - https://github.com/apache/flink/pull/3505, one +1 already
> >
> > (4) https://issues.apache.org/jira/browse/FLINK-6025:
> > KryoSerializer may use the wrong classloader when Kryo’s JavaSerializer is
> > used.
> > Status: merged
> >
> > (5) https://issues.apache.org/jira/browse/FLINK-5771:
> > Fix multi-char delimiters in Batch InputFormats.
> > Status: merged
> >
> > (6) https://issues.apache.org/jira/browse/FLINK-5934:
> > Set the Scheduler in the ExecutionGraph via its constructor. This fixes a
> > bug that causes HA recovery to fail.
> > Status: merged
> >
> > For Flink 1.1.5:
> >
> > (1) https://issues.apache.org/jira/browse/FLINK-5701:
> > Async exceptions in the FlinkKafkaProducer are not checked on checkpoints.
> > This compromises the producer’s at-least-once guarantee.
> > Status: This is already merged for 1.2.1. I would personally like to
> > backport the fix for this to 1.1.5 also.
> >
> > (2) https://issues.apache.org/jira/browse/FLINK-6006:
> > Kafka Consumer can lose state if queried partition list is incomplete on
> > restore.
> > Status: PR - https://github.com/apache/flink/pull/3507, one +1 already
> >
> > (3) https://issues.apache.org/jira/browse/FLINK-6025:
> > KryoSerializer may use the wrong classloader when Kryo’s JavaSerializer is
> > used.
> > Status: merged
> >
> > (4) https://issues.apache.org/jira/browse/FLINK-5771:
> > Fix multi-char delimiters in Batch InputFormats.
> > Status: merged
> >
> > (5) https://issues.apache.org/jira/browse/FLINK-5934:
> > Set the Scheduler in the ExecutionGraph via its constructor. This fixes a
> > bug that causes HA recovery to fail.
> > Status: merged
> >
> > (6) https://issues.apache.org/jira/browse/FLINK-5048:
> > Kafka Consumer (0.9/0.10) threading model leads to problematic cancellation
> > behavior.
> > Status: This fix was already released in 1.2.0, but never made it into the
> > 1.1.x bugfixes. Do we want to backport this also for 1.1.5?
> >
> > What do you think? From the list so far, we pretty much already have
> > everything in, so I think it would be nice to aim for RCs by the end of
> > this week.
> > Since both bugfix releases cover almost the same list of issues, I think
> > it shouldn’t be too hard for us to kick off both bugfix releases around the
> > same time.
> >
> > Also FYI, here are the lists of JIRA tickets that are tagged with “1.2.1” /
> > “1.1.5” as the Fix Version and are still open.
> > We should probably check if there’s anything on there that we should
> > block on for the releases:
> >
> > For 1.2.1:
> > https://issues.apache.org/jira/browse/FLINK-5711?jql=project%20%3D%20FLINK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20fixVersion%20%3D%201.2.1
> >
> > For 1.1.5:
> > https://issues.apache.org/jira/browse/FLINK-6006?jql=project%20%3D%20FLINK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20fixVersion%20%3D%201.1.5