Thanks for kicking off the discussion, Tzu-Li. I'd like to add the following
issues, which have already been merged into the 1.2-release and 1.1-release
branches:

1.2.1:

(7) https://issues.apache.org/jira/browse/FLINK-5942
Hardens the checkpoint recovery in case of corrupted ZooKeeper data.
Corrupted checkpoints will now be skipped.
Status: Merged

(8) https://issues.apache.org/jira/browse/FLINK-5940
Hardens the checkpoint recovery in case we cannot retrieve the
completed checkpoint from the meta data state handle retrieved from
ZooKeeper. This can, for example, happen if the meta data is deleted.
Checkpoints with unretrievable state handles are skipped.
Status: Merged

1.1.5:


(7) https://issues.apache.org/jira/browse/FLINK-5942
Hardens the checkpoint recovery in case of corrupted ZooKeeper data.
Corrupted checkpoints will now be skipped.
Status: Merged

(8) https://issues.apache.org/jira/browse/FLINK-5940
Hardens the checkpoint recovery in case we cannot retrieve the
completed checkpoint from the meta data state handle retrieved from
ZooKeeper. This can, for example, happen if the meta data is deleted.
Checkpoints with unretrievable state handles are skipped.
Status: Merged
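
To illustrate the "skip instead of fail" behavior described in (7) and (8),
here is a minimal sketch. It is not the code from the actual fixes: the class
CheckpointRecoverySketch, the Checkpoint type, and recoverLatest are all
hypothetical names, and a plain Callable stands in for a state handle read from
ZooKeeper. The assumption is that handles are tried from newest to oldest and
any handle that throws is skipped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.Callable;

// Hypothetical sketch; none of these names are Flink classes.
final class CheckpointRecoverySketch {

    // Stand-in for a completed checkpoint.
    static final class Checkpoint {
        final long id;

        Checkpoint(long id) {
            this.id = id;
        }
    }

    // Walks the handles from newest to oldest and returns the first checkpoint
    // that can be retrieved. A handle that throws (corrupted or deleted data)
    // is skipped instead of failing the whole recovery.
    static Optional<Checkpoint> recoverLatest(List<Callable<Checkpoint>> handlesNewestFirst) {
        for (Callable<Checkpoint> handle : handlesNewestFirst) {
            try {
                Checkpoint checkpoint = handle.call();
                if (checkpoint != null) {
                    return Optional.of(checkpoint);
                }
            } catch (Exception e) {
                // Unretrievable checkpoint: skip it and try the next older one.
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<Callable<Checkpoint>> handles = new ArrayList<>();
        // Newest handle is broken (assumed failure mode for this example).
        handles.add(() -> { throw new java.io.IOException("corrupted ZooKeeper data"); });
        // The next older handle can still be read.
        handles.add(() -> new Checkpoint(41L));

        Optional<Checkpoint> recovered = recoverLatest(handles);
        System.out.println(recovered.map(c -> "recovered checkpoint " + c.id).orElse("no checkpoint"));
    }
}
```

If every handle fails, the sketch returns an empty Optional rather than
throwing, so the caller decides what to do when no checkpoint is usable.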

Cheers,
Till

On Tue, Mar 14, 2017 at 12:02 PM, Tzu-Li (Gordon) Tai <tzuli...@apache.org>
wrote:

> Hi all!
>
> I would like to start a discussion for the next bugfix release for 1.1.x
> and 1.2.x.
> There have been quite a few critical fixes for bugs in both releases
> recently, and I think they deserve a bugfix release soon.
> Most of the bugs were reported by users.
>
> I’m starting the discussion for both bugfix releases because most fixes
> span both releases (almost identical).
> Of course, the actual RC votes and RC creation process don’t have to be
> started together.
>
> Here’s an overview of what’s been collected so far, for both bugfix
> releases -
> (it’s a list of what I’m aware of so far, and may be missing stuff; please
> append and bring to attention as necessary :-) )
>
>
> For Flink 1.2.1:
>
> (1) https://issues.apache.org/jira/browse/FLINK-5701:
> Async exceptions in the FlinkKafkaProducer are not checked on checkpoints.
> This compromises the producer’s at-least-once guarantee.
> Status: merged
>
> (2) https://issues.apache.org/jira/browse/FLINK-5949:
> Do not check Kerberos credentials for non-Kerberos authentications. MapR
> users are affected by this, and cannot submit Flink on YARN jobs on a
> secured MapR cluster.
> Status: PR - https://github.com/apache/flink/pull/3528, one +1 already
>
> (3) https://issues.apache.org/jira/browse/FLINK-6006:
> Kafka Consumer can lose state if queried partition list is incomplete on
> restore.
> Status: PR - https://github.com/apache/flink/pull/3505, one +1 already
>
> (4) https://issues.apache.org/jira/browse/FLINK-6025:
> KryoSerializer may use the wrong classloader when Kryo’s JavaSerializer is
> used.
> Status: merged
>
> (5) https://issues.apache.org/jira/browse/FLINK-5771:
> Fix multi-char delimiters in Batch InputFormats.
> Status: merged
>
> (6) https://issues.apache.org/jira/browse/FLINK-5934:
> Set the Scheduler in the ExecutionGraph via its constructor. This fixes a
> bug that causes HA recovery to fail.
> Status: merged
>
>
>
> For Flink 1.1.5:
>
> (1) https://issues.apache.org/jira/browse/FLINK-5701:
> Async exceptions in the FlinkKafkaProducer are not checked on checkpoints.
> This compromises the producer’s at-least-once guarantee.
> Status: This is already merged for 1.2.1. I would personally like to
> backport the fix for this to 1.1.5 also.
>
> (2) https://issues.apache.org/jira/browse/FLINK-6006:
> Kafka Consumer can lose state if queried partition list is incomplete on
> restore.
> Status: PR - https://github.com/apache/flink/pull/3507, one +1 already
>
> (3) https://issues.apache.org/jira/browse/FLINK-6025:
> KryoSerializer may use the wrong classloader when Kryo’s JavaSerializer is
> used.
> Status: merged
>
> (4) https://issues.apache.org/jira/browse/FLINK-5771:
> Fix multi-char delimiters in Batch InputFormats.
> Status: merged
>
> (5) https://issues.apache.org/jira/browse/FLINK-5934:
> Set the Scheduler in the ExecutionGraph via its constructor. This fixes a
> bug that causes HA recovery to fail.
> Status: merged
>
> (6) https://issues.apache.org/jira/browse/FLINK-5048:
> Kafka Consumer (0.9/0.10) threading model leads to problematic cancellation
> behavior.
> Status: This fix was already released in 1.2.0, but never made it into the
> 1.1.x bugfixes. Do we want to backport this also for 1.1.5?
>
>
> What do you think? From the list so far, we pretty much already have
> everything in, so I think it would be nice to aim for RCs by the end of
> this week.
> Since both bugfix releases cover almost the same list of issues, I think
> it shouldn’t be too hard for us to kick off both bugfix releases around the
> same time.
>
> Also FYI, here are the lists of JIRA tickets tagged with “1.2.1” / “1.1.5”
> as the Fix Version that are still open.
> We probably want to check if there’s anything on there that we
> should block on for the releases:
>
> For 1.2.1:
> https://issues.apache.org/jira/browse/FLINK-5711?jql=project%20%3D%20FLINK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20fixVersion%20%3D%201.2.1
>
> For 1.1.5:
> https://issues.apache.org/jira/browse/FLINK-6006?jql=project%20%3D%20FLINK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20fixVersion%20%3D%201.1.5
