Thanks for the update! Just merged to 1.2.1 also: [FLINK-5962] [checkpoints] Remove scheduled cancel-task from timer queue to prevent memory leaks
The remaining issue list looks good, but I would say that (5) is optional. It is not a critical production bug.

On Wed, Mar 15, 2017 at 5:38 PM, Tzu-Li (Gordon) Tai <tzuli...@apache.org> wrote:

> Thanks a lot for the updates so far everyone!
>
> From the discussion so far, below are the issues still pending for the
> 1.1.5 / 1.2.1 releases.
>
> Since there’s only one backport for 1.1.5 left, I think having an RC for
> 1.1.5 near the end of this week / early next week is very promising, as
> basically everything is already in.
> I’d be happy to volunteer to help manage the release for 1.1.5, and
> prepare the RC when it’s ready :)
>
> For 1.2.1, we can leave the pending list here for tracking, and come back
> to update it in the near future.
>
> If there’s anything I missed, please let me know!
>
>
> =========== Still pending for Flink 1.1.5 ===========
>
> (1) https://issues.apache.org/jira/browse/FLINK-5701
> Broken at-least-once Kafka producer.
> Status: backport PR pending - https://github.com/apache/flink/pull/3549.
> Since it is a relatively self-contained change, I expect this to be a fast
> fix.
>
>
> =========== Still pending for Flink 1.2.1 ===========
>
> (1) https://issues.apache.org/jira/browse/FLINK-5808
> Fix Missing verification for setParallelism and setMaxParallelism
> Status: PR - https://github.com/apache/flink/pull/3509, review in progress
>
> (2) https://issues.apache.org/jira/browse/FLINK-5713
> Protect against NPE in WindowOperator window cleanup
> Status: PR - https://github.com/apache/flink/pull/3535, review pending
>
> (3) https://issues.apache.org/jira/browse/FLINK-6044
> TypeSerializerSerializationProxy.read() doesn't verify the read buffer
> length
> Status: Fixed for master, 1.2 backport pending
>
> (4) https://issues.apache.org/jira/browse/FLINK-5985
> Flink treats every task as stateful (making topology changes impossible)
> Status: PR - https://github.com/apache/flink/pull/3543, review in progress
>
> (5) https://issues.apache.org/jira/browse/FLINK-5650
> Flink-python tests taking up too much time
> Status: I think Chesnay currently has some progress with this one, we can
> see if we want to make this a blocker
>
>
> Cheers,
> Gordon
>
> On March 15, 2017 at 7:16:53 PM, Jinkui Shi (shijinkui...@163.com) wrote:
>
> Can we fix this issue in 1.2.1:
>
> Flink-python tests cost too long time
> https://issues.apache.org/jira/browse/FLINK-5650
>
> > On March 15, 2017, at 6:29 PM, Vladislav Pernin <vladislav.per...@gmail.com> wrote:
> >
> > I just tested it in my reproducer. It works.
> >
> > 2017-03-15 11:22 GMT+01:00 Aljoscha Krettek <aljos...@apache.org>:
> >
> >> I did in fact just open a PR for
> >>> https://issues.apache.org/jira/browse/FLINK-6001
> >>> NPE on TumblingEventTimeWindows with ContinuousEventTimeTrigger and
> >>> allowedLateness
> >>
> >>
> >> On Tue, Mar 14, 2017, at 18:20, Vladislav Pernin wrote:
> >>> Hi,
> >>>
> >>> I would also include the following (not yet resolved) issue in the 1.2.1
> >>> scope:
> >>>
> >>> https://issues.apache.org/jira/browse/FLINK-6001
> >>> NPE on TumblingEventTimeWindows with ContinuousEventTimeTrigger and
> >>> allowedLateness
> >>>
> >>> 2017-03-14 17:34 GMT+01:00 Ufuk Celebi <u...@apache.org>:
> >>>
> >>>> Big +1 Gordon!
> >>>>
> >>>> I think (10) is very critical to have in 1.2.1.
> >>>>
> >>>> – Ufuk
> >>>>
> >>>>
> >>>> On Tue, Mar 14, 2017 at 3:37 PM, Stefan Richter
> >>>> <s.rich...@data-artisans.com> wrote:
> >>>>> Hi,
> >>>>>
> >>>>> I would suggest to also include in 1.2.1:
> >>>>>
> >>>>> (9) https://issues.apache.org/jira/browse/FLINK-6044
> >>>>> Replaces unintentional calls to InputStream#read(…) with the intended
> >>>>> and correct InputStream#readFully(…)
> >>>>> Status: PR
> >>>>>
> >>>>> (10) https://issues.apache.org/jira/browse/FLINK-5985
> >>>>> Flink 1.2 was creating state handles for stateless tasks, which caused trouble
> >>>>> at restore time for users that wanted to make changes to their topology
> >>>>> that only involve stateless operators.
> >>>>> Status: PR
> >>>>>
> >>>>>
> >>>>>> On 14.03.2017 at 15:15, Till Rohrmann <trohrm...@apache.org> wrote:
> >>>>>>
> >>>>>> Thanks for kicking off the discussion Tzu-Li.
> >>>>>> I'd like to add the following
> >>>>>> issues which have already been merged into the 1.2-release and 1.1-release
> >>>>>> branch:
> >>>>>>
> >>>>>> 1.2.1:
> >>>>>>
> >>>>>> (7) https://issues.apache.org/jira/browse/FLINK-5942
> >>>>>> Hardens the checkpoint recovery in case of corrupted ZooKeeper data.
> >>>>>> Corrupted checkpoints will now be skipped.
> >>>>>> Status: Merged
> >>>>>>
> >>>>>> (8) https://issues.apache.org/jira/browse/FLINK-5940
> >>>>>> Hardens the checkpoint recovery in case that we cannot retrieve the
> >>>>>> completed checkpoint from the meta data state handle retrieved from
> >>>>>> ZooKeeper. This can, for example, happen if the meta data is deleted.
> >>>>>> Checkpoints with unretrievable state handles are skipped.
> >>>>>> Status: Merged
> >>>>>>
> >>>>>> 1.1.5:
> >>>>>>
> >>>>>>
> >>>>>> (7) https://issues.apache.org/jira/browse/FLINK-5942
> >>>>>> Hardens the checkpoint recovery in case of corrupted ZooKeeper data.
> >>>>>> Corrupted checkpoints will now be skipped.
> >>>>>> Status: Merged
> >>>>>>
> >>>>>> (8) https://issues.apache.org/jira/browse/FLINK-5940
> >>>>>> Hardens the checkpoint recovery in case that we cannot retrieve the
> >>>>>> completed checkpoint from the meta data state handle retrieved from
> >>>>>> ZooKeeper. This can, for example, happen if the meta data is deleted.
> >>>>>> Checkpoints with unretrievable state handles are skipped.
> >>>>>> Status: Merged
> >>>>>>
> >>>>>> Cheers,
> >>>>>> Till
> >>>>>>
> >>>>>> On Tue, Mar 14, 2017 at 12:02 PM, Tzu-Li (Gordon) Tai <tzuli...@apache.org>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Hi all!
> >>>>>>>
> >>>>>>> I would like to start a discussion for the next bugfix release for 1.1.x
> >>>>>>> and 1.2.x.
> >>>>>>> There’s been quite a few critical fixes for bugs in both the releases
> >>>>>>> recently, and I think they deserve a bugfix release soon.
> >>>>>>> Most of the bugs were reported by users.
> >>>>>>>
> >>>>>>> I’m starting the discussion for both bugfix releases because most fixes
> >>>>>>> span both releases (almost identical).
> >>>>>>> Of course, the actual RC votes and RC creation process doesn’t have to be
> >>>>>>> started together.
> >>>>>>>
> >>>>>>> Here’s an overview of what’s been collected so far, for both bugfix
> >>>>>>> releases -
> >>>>>>> (it’s a list of what I’m aware of so far, and may be missing stuff; please
> >>>>>>> append and bring to attention as necessary :-) )
> >>>>>>>
> >>>>>>>
> >>>>>>> For Flink 1.2.1:
> >>>>>>>
> >>>>>>> (1) https://issues.apache.org/jira/browse/FLINK-5701:
> >>>>>>> Async exceptions in the FlinkKafkaProducer are not checked on checkpoints.
> >>>>>>> This compromises the producer’s at-least-once guarantee.
> >>>>>>> Status: merged
> >>>>>>>
> >>>>>>> (2) https://issues.apache.org/jira/browse/FLINK-5949:
> >>>>>>> Do not check Kerberos credentials for non-Kerberos authentications. MapR
> >>>>>>> users are affected by this, and cannot submit Flink on YARN jobs on a
> >>>>>>> secured MapR cluster.
> >>>>>>> Status: PR - https://github.com/apache/flink/pull/3528, one +1 already
> >>>>>>>
> >>>>>>> (3) https://issues.apache.org/jira/browse/FLINK-6006:
> >>>>>>> Kafka Consumer can lose state if queried partition list is incomplete on
> >>>>>>> restore.
> >>>>>>> Status: PR - https://github.com/apache/flink/pull/3505, one +1 already
> >>>>>>>
> >>>>>>> (4) https://issues.apache.org/jira/browse/FLINK-6025:
> >>>>>>> KryoSerializer may use the wrong classloader when Kryo’s JavaSerializer is
> >>>>>>> used.
> >>>>>>> Status: merged
> >>>>>>>
> >>>>>>> (5) https://issues.apache.org/jira/browse/FLINK-5771:
> >>>>>>> Fix multi-char delimiters in Batch InputFormats.
> >>>>>>> Status: merged
> >>>>>>>
> >>>>>>> (6) https://issues.apache.org/jira/browse/FLINK-5934:
> >>>>>>> Set the Scheduler in the ExecutionGraph via its constructor.
> >>>>>>> This fixes a bug that causes HA recovery to fail.
> >>>>>>> Status: merged
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> For Flink 1.1.5:
> >>>>>>>
> >>>>>>> (1) https://issues.apache.org/jira/browse/FLINK-5701:
> >>>>>>> Async exceptions in the FlinkKafkaProducer are not checked on checkpoints.
> >>>>>>> This compromises the producer’s at-least-once guarantee.
> >>>>>>> Status: This is already merged for 1.2.1. I would personally like to
> >>>>>>> backport the fix for this to 1.1.5 also.
> >>>>>>>
> >>>>>>> (2) https://issues.apache.org/jira/browse/FLINK-6006:
> >>>>>>> Kafka Consumer can lose state if queried partition list is incomplete on
> >>>>>>> restore.
> >>>>>>> Status: PR - https://github.com/apache/flink/pull/3507, one +1 already
> >>>>>>>
> >>>>>>> (3) https://issues.apache.org/jira/browse/FLINK-6025:
> >>>>>>> KryoSerializer may use the wrong classloader when Kryo’s JavaSerializer is
> >>>>>>> used.
> >>>>>>> Status: merged
> >>>>>>>
> >>>>>>> (4) https://issues.apache.org/jira/browse/FLINK-5771:
> >>>>>>> Fix multi-char delimiters in Batch InputFormats.
> >>>>>>> Status: merged
> >>>>>>>
> >>>>>>> (5) https://issues.apache.org/jira/browse/FLINK-5934:
> >>>>>>> Set the Scheduler in the ExecutionGraph via its constructor. This fixes a
> >>>>>>> bug that causes HA recovery to fail.
> >>>>>>> Status: merged
> >>>>>>>
> >>>>>>> (6) https://issues.apache.org/jira/browse/FLINK-5048:
> >>>>>>> Kafka Consumer (0.9/0.10) threading model leads to problematic cancellation
> >>>>>>> behavior.
> >>>>>>> Status: This fix was already released in 1.2.0, but never made it into the
> >>>>>>> 1.1.x bugfixes. Do we want to backport this also for 1.1.5?
> >>>>>>>
> >>>>>>>
> >>>>>>> What do you think? From the list so far, we pretty much already have
> >>>>>>> everything in, so I think it would be nice to aim for RCs by the end of
> >>>>>>> this week.
> >>>>>>>
> >>>>>>> Since both bugfix releases cover almost the same list of issues, I think
> >>>>>>> it shouldn’t be too hard for us to kick off both bugfix releases around the
> >>>>>>> same time.
> >>>>>>>
> >>>>>>> Also FYI, here are the lists of JIRA tickets that are tagged with "1.2.1" / "1.1.5"
> >>>>>>> as the Fix Version and are still open.
> >>>>>>> We should probably check if there’s anything on there that we
> >>>>>>> should block on for the releases:
> >>>>>>>
> >>>>>>> For 1.2.1:
> >>>>>>> https://issues.apache.org/jira/browse/FLINK-5711?jql=project%20%3D%20FLINK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20fixVersion%20%3D%201.2.1
> >>>>>>>
> >>>>>>> For 1.1.5:
> >>>>>>> https://issues.apache.org/jira/browse/FLINK-6006?jql=project%20%3D%20FLINK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20fixVersion%20%3D%201.1.5