Hi all!

I would like to start a discussion about the next bugfix releases for 1.1.x and 1.2.x. There have been quite a few critical fixes for bugs in both release lines recently, and I think they deserve a bugfix release soon. Most of the bugs were reported by users.

I’m starting the discussion for both bugfix releases together because most fixes apply to both (the lists are almost identical). Of course, the RC creation and the actual RC votes don’t have to be started together.

Here’s an overview of what’s been collected so far for both bugfix releases. It’s a list of what I’m aware of so far and may be missing things; please append and bring items to attention as necessary :-)

For Flink 1.2.1:

(1) https://issues.apache.org/jira/browse/FLINK-5701: Async exceptions in the FlinkKafkaProducer are not checked on checkpoints. This compromises the producer’s at-least-once guarantee.
    Status: merged

(2) https://issues.apache.org/jira/browse/FLINK-5949: Do not check Kerberos credentials for non-Kerberos authentications. MapR users are affected by this and cannot submit Flink on YARN jobs on a secured MapR cluster.
    Status: PR - https://github.com/apache/flink/pull/3528, one +1 already

(3) https://issues.apache.org/jira/browse/FLINK-6006: Kafka Consumer can lose state if queried partition list is incomplete on restore.
    Status: PR - https://github.com/apache/flink/pull/3505, one +1 already

(4) https://issues.apache.org/jira/browse/FLINK-6025: KryoSerializer may use the wrong classloader when Kryo’s JavaSerializer is used.
    Status: merged

(5) https://issues.apache.org/jira/browse/FLINK-5771: Fix multi-char delimiters in Batch InputFormats.
    Status: merged

(6) https://issues.apache.org/jira/browse/FLINK-5934: Set the Scheduler in the ExecutionGraph via its constructor. This fixes a bug that causes HA recovery to fail.
    Status: merged

For Flink 1.1.5:

(1) https://issues.apache.org/jira/browse/FLINK-5701: Async exceptions in the FlinkKafkaProducer are not checked on checkpoints. This compromises the producer’s at-least-once guarantee.
    Status: This is already merged for 1.2.1. I would personally like to backport the fix to 1.1.5 as well.

(2) https://issues.apache.org/jira/browse/FLINK-6006: Kafka Consumer can lose state if queried partition list is incomplete on restore.
    Status: PR - https://github.com/apache/flink/pull/3507, one +1 already

(3) https://issues.apache.org/jira/browse/FLINK-6025: KryoSerializer may use the wrong classloader when Kryo’s JavaSerializer is used.
    Status: merged

(4) https://issues.apache.org/jira/browse/FLINK-5771: Fix multi-char delimiters in Batch InputFormats.
    Status: merged

(5) https://issues.apache.org/jira/browse/FLINK-5934: Set the Scheduler in the ExecutionGraph via its constructor. This fixes a bug that causes HA recovery to fail.
    Status: merged

(6) https://issues.apache.org/jira/browse/FLINK-5048: Kafka Consumer (0.9/0.10) threading model leads to problematic cancellation behavior.
    Status: This fix was already released in 1.2.0, but never made it into the 1.1.x bugfix releases. Do we want to backport this to 1.1.5 as well?

What do you think? Going by the list so far, almost everything is already merged, so I think it would be nice to aim for RCs by the end of this week. Since both bugfix releases cover almost the same list of issues, it shouldn’t be too hard for us to kick off both around the same time.

Also FYI, here are the lists of JIRA tickets that have “1.2.1” / “1.1.5” as the Fix Version and are still open. We should check whether there’s anything on them that we need to block the releases on:

For 1.2.1:
https://issues.apache.org/jira/browse/FLINK-5711?jql=project%20%3D%20FLINK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20fixVersion%20%3D%201.2.1

For 1.1.5:
https://issues.apache.org/jira/browse/FLINK-6006?jql=project%20%3D%20FLINK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20fixVersion%20%3D%201.1.5