Re: [DISCUSS] Release Flink 1.1.5 / Flink 1.2.1

Stephan Ewen Wed, 15 Mar 2017 12:38:04 -0700

Thanks for the update!

Just merged to 1.2.1 also: [FLINK-5962] [checkpoints] Remove scheduled
cancel-task from timer queue to prevent memory leaks


The remaining issue list looks good, but I would say that (5) is optional.
It is not a critical production bug.



On Wed, Mar 15, 2017 at 5:38 PM, Tzu-Li (Gordon) Tai <tzuli...@apache.org>
wrote:

> Thanks a lot for the updates so far everyone!
>
> From the discussion so far, below are the issues still pending for the
> 1.1.5 / 1.2.1 releases.
>
> Since there’s only one backport for 1.1.5 left, I think having an RC for
> 1.1.5 near the end of this week / early next week is very promising, as
> basically everything is already in.
> I’d be happy to volunteer to help manage the release for 1.1.5, and
> prepare the RC when it’s ready :)
>
> For 1.2.1, we can leave the pending list here for tracking, and come back
> to update it in the near future.
>
> If there’s anything I missed, please let me know!
>
>
> =========== Still pending for Flink 1.1.5 ===========
>
> (1) https://issues.apache.org/jira/browse/FLINK-5701
> Broken at-least-once Kafka producer.
> Status: backport PR pending - https://github.com/apache/flink/pull/3549.
> Since it is a relatively self-contained change, I expect this to be a fast
> fix.
>
>
>
> =========== Still pending for Flink 1.2.1 ===========
>
> (1) https://issues.apache.org/jira/browse/FLINK-5808
> Fix Missing verification for setParallelism and setMaxParallelism
> Status: PR - https://github.com/apache/flink/pull/3509, review in progress
>
> (2) https://issues.apache.org/jira/browse/FLINK-5713
> Protect against NPE in WindowOperator window cleanup
> Status: PR - https://github.com/apache/flink/pull/3535, review pending
>
> (3) https://issues.apache.org/jira/browse/FLINK-6044
> TypeSerializerSerializationProxy.read() doesn't verify the read buffer
> length
> Status: Fixed for master, 1.2 backport pending
>
> (4) https://issues.apache.org/jira/browse/FLINK-5985
> Flink treats every task as stateful (making topology changes impossible)
> Status: PR - https://github.com/apache/flink/pull/3543, review in progress
>
> (5) https://issues.apache.org/jira/browse/FLINK-5650
> Flink-python tests taking up too much time
> Status: I think Chesnay has made some progress on this one; we can
> decide whether we want to make this a blocker
>
>
> Cheers,
> Gordon
>
> On March 15, 2017 at 7:16:53 PM, Jinkui Shi (shijinkui...@163.com) wrote:
>
> Can we fix this issue in 1.2.1:
>
> Flink-python tests take too much time
> https://issues.apache.org/jira/browse/FLINK-5650
>
> > On March 15, 2017, at 6:29 PM, Vladislav Pernin <vladislav.per...@gmail.com> wrote:
> >
> > I just tested it in my reproducer. It works.
> >
> > 2017-03-15 11:22 GMT+01:00 Aljoscha Krettek <aljos...@apache.org>:
> >
> >> I did in fact just open a PR for
> >>> https://issues.apache.org/jira/browse/FLINK-6001
> >>> NPE on TumblingEventTimeWindows with ContinuousEventTimeTrigger and
> >>> allowedLateness
> >>
> >>
> >> On Tue, Mar 14, 2017, at 18:20, Vladislav Pernin wrote:
> >>> Hi,
> >>>
> >>> I would also include the following (not yet resolved) issue in the
> >>> 1.2.1 scope:
> >>>
> >>> https://issues.apache.org/jira/browse/FLINK-6001
> >>> NPE on TumblingEventTimeWindows with ContinuousEventTimeTrigger and
> >>> allowedLateness
> >>>
> >>> 2017-03-14 17:34 GMT+01:00 Ufuk Celebi <u...@apache.org>:
> >>>
> >>>> Big +1 Gordon!
> >>>>
> >>>> I think (10) is very critical to have in 1.2.1.
> >>>>
> >>>> – Ufuk
> >>>>
> >>>>
> >>>> On Tue, Mar 14, 2017 at 3:37 PM, Stefan Richter
> >>>> <s.rich...@data-artisans.com> wrote:
> >>>>> Hi,
> >>>>>
> >>>>> I would suggest also including the following in 1.2.1:
> >>>>>
> >>>>> (9) https://issues.apache.org/jira/browse/FLINK-6044
> >>>>> Replaces unintentional calls to InputStream#read(…) with the intended
> >>>>> and correct InputStream#readFully(…)
> >>>>> Status: PR
> >>>>>
> >>>>> (10) https://issues.apache.org/jira/browse/FLINK-5985
> >>>>> Flink 1.2 was creating state handles for stateless tasks, which caused
> >>>>> trouble at restore time for users who wanted to make changes to their
> >>>>> topology that only involve stateless operators.
> >>>>> Status: PR
> >>>>>
> >>>>>
> >>>>>> On 14.03.2017 at 15:15, Till Rohrmann <trohrm...@apache.org> wrote:
> >>>>>>
> >>>>>> Thanks for kicking off the discussion, Tzu-Li. I'd like to add the
> >>>>>> following issues, which have already been merged into the 1.2-release
> >>>>>> and 1.1-release branches:
> >>>>>>
> >>>>>> 1.2.1:
> >>>>>>
> >>>>>> (7) https://issues.apache.org/jira/browse/FLINK-5942
> >>>>>> Hardens the checkpoint recovery in case of corrupted ZooKeeper data.
> >>>>>> Corrupted checkpoints will now be skipped.
> >>>>>> Status: Merged
> >>>>>>
> >>>>>> (8) https://issues.apache.org/jira/browse/FLINK-5940
> >>>>>> Hardens the checkpoint recovery in case we cannot retrieve the
> >>>>>> completed checkpoint from the meta data state handle retrieved from
> >>>>>> ZooKeeper. This can, for example, happen if the meta data is deleted.
> >>>>>> Checkpoints with unretrievable state handles are skipped.
> >>>>>> Status: Merged
> >>>>>>
> >>>>>> 1.1.5:
> >>>>>>
> >>>>>>
> >>>>>> (7) https://issues.apache.org/jira/browse/FLINK-5942
> >>>>>> Hardens the checkpoint recovery in case of corrupted ZooKeeper data.
> >>>>>> Corrupted checkpoints will now be skipped.
> >>>>>> Status: Merged
> >>>>>>
> >>>>>> (8) https://issues.apache.org/jira/browse/FLINK-5940
> >>>>>> Hardens the checkpoint recovery in case we cannot retrieve the
> >>>>>> completed checkpoint from the meta data state handle retrieved from
> >>>>>> ZooKeeper. This can, for example, happen if the meta data is deleted.
> >>>>>> Checkpoints with unretrievable state handles are skipped.
> >>>>>> Status: Merged
> >>>>>>
> >>>>>> Cheers,
> >>>>>> Till
> >>>>>>
> >>>>>> On Tue, Mar 14, 2017 at 12:02 PM, Tzu-Li (Gordon) Tai <tzuli...@apache.org> wrote:
> >>>>>>
> >>>>>>> Hi all!
> >>>>>>>
> >>>>>>> I would like to start a discussion for the next bugfix releases for
> >>>>>>> 1.1.x and 1.2.x.
> >>>>>>> There have been quite a few critical fixes for bugs in both releases
> >>>>>>> recently, and I think they deserve a bugfix release soon.
> >>>>>>> Most of the bugs were reported by users.
> >>>>>>>
> >>>>>>> I’m starting the discussion for both bugfix releases because most
> >>>>>>> fixes span both releases (almost identical).
> >>>>>>> Of course, the actual RC votes and RC creation process don’t have to
> >>>>>>> be started together.
> >>>>>>>
> >>>>>>> Here’s an overview of what’s been collected so far, for both bugfix
> >>>>>>> releases -
> >>>>>>> (it’s a list of what I’m aware of so far, and may be missing stuff;
> >>>>>>> please append and bring to attention as necessary :-) )
> >>>>>>>
> >>>>>>>
> >>>>>>> For Flink 1.2.1:
> >>>>>>>
> >>>>>>> (1) https://issues.apache.org/jira/browse/FLINK-5701:
> >>>>>>> Async exceptions in the FlinkKafkaProducer are not checked on
> >>>>>>> checkpoints. This compromises the producer’s at-least-once guarantee.
> >>>>>>> Status: merged
> >>>>>>>
> >>>>>>> (2) https://issues.apache.org/jira/browse/FLINK-5949:
> >>>>>>> Do not check Kerberos credentials for non-Kerberos authentications.
> >>>>>>> MapR users are affected by this, and cannot submit Flink on YARN jobs
> >>>>>>> on a secured MapR cluster.
> >>>>>>> Status: PR - https://github.com/apache/flink/pull/3528, one +1 already
> >>>>>>>
> >>>>>>> (3) https://issues.apache.org/jira/browse/FLINK-6006:
> >>>>>>> Kafka Consumer can lose state if queried partition list is incomplete
> >>>>>>> on restore.
> >>>>>>> Status: PR - https://github.com/apache/flink/pull/3505, one +1 already
> >>>>>>>
> >>>>>>> (4) https://issues.apache.org/jira/browse/FLINK-6025:
> >>>>>>> KryoSerializer may use the wrong classloader when Kryo’s
> >>>>>>> JavaSerializer is used.
> >>>>>>> Status: merged
> >>>>>>>
> >>>>>>> (5) https://issues.apache.org/jira/browse/FLINK-5771:
> >>>>>>> Fix multi-char delimiters in Batch InputFormats.
> >>>>>>> Status: merged
> >>>>>>>
> >>>>>>> (6) https://issues.apache.org/jira/browse/FLINK-5934:
> >>>>>>> Set the Scheduler in the ExecutionGraph via its constructor. This
> >>>>>>> fixes a bug that causes HA recovery to fail.
> >>>>>>> Status: merged
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> For Flink 1.1.5:
> >>>>>>>
> >>>>>>> (1) https://issues.apache.org/jira/browse/FLINK-5701:
> >>>>>>> Async exceptions in the FlinkKafkaProducer are not checked on
> >>>>>>> checkpoints. This compromises the producer’s at-least-once guarantee.
> >>>>>>> Status: This is already merged for 1.2.1. I would personally like to
> >>>>>>> backport the fix for this to 1.1.5 also.
> >>>>>>>
> >>>>>>> (2) https://issues.apache.org/jira/browse/FLINK-6006:
> >>>>>>> Kafka Consumer can lose state if queried partition list is incomplete
> >>>>>>> on restore.
> >>>>>>> Status: PR - https://github.com/apache/flink/pull/3507, one +1 already
> >>>>>>>
> >>>>>>> (3) https://issues.apache.org/jira/browse/FLINK-6025:
> >>>>>>> KryoSerializer may use the wrong classloader when Kryo’s
> >>>>>>> JavaSerializer is used.
> >>>>>>> Status: merged
> >>>>>>>
> >>>>>>> (4) https://issues.apache.org/jira/browse/FLINK-5771:
> >>>>>>> Fix multi-char delimiters in Batch InputFormats.
> >>>>>>> Status: merged
> >>>>>>>
> >>>>>>> (5) https://issues.apache.org/jira/browse/FLINK-5934:
> >>>>>>> Set the Scheduler in the ExecutionGraph via its constructor. This
> >>>>>>> fixes a bug that causes HA recovery to fail.
> >>>>>>> Status: merged
> >>>>>>>
> >>>>>>> (6) https://issues.apache.org/jira/browse/FLINK-5048:
> >>>>>>> Kafka Consumer (0.9/0.10) threading model leads to problematic
> >>>>>>> cancellation behavior.
> >>>>>>> Status: This fix was already released in 1.2.0, but never made it
> >>>>>>> into the 1.1.x bugfixes. Do we want to backport this also for 1.1.5?
> >>>>>>>
> >>>>>>>
> >>>>>>> What do you think? From the list so far, we pretty much already
> >>>>>>> have everything in, so I think it would be nice to aim for RCs by
> >>>>>>> the end of this week.
> >>>>>>> Since both bugfix releases cover almost the same list of issues, I
> >>>>>>> think it shouldn’t be too hard for us to kick off both bugfix
> >>>>>>> releases around the same time.
> >>>>>>>
> >>>>>>> Also FYI, here are the lists of JIRA tickets that are tagged with
> >>>>>>> “1.2.1” / “1.1.5” as the Fix Version and are still open.
> >>>>>>> We should probably check if there’s anything on there that we
> >>>>>>> should block on for the releases:
> >>>>>>>
> >>>>>>> For 1.2.1:
> >>>>>>> https://issues.apache.org/jira/browse/FLINK-5711?jql=project%20%3D%20FLINK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20fixVersion%20%3D%201.2.1
> >>>>>>>
> >>>>>>> For 1.1.5:
> >>>>>>> https://issues.apache.org/jira/browse/FLINK-6006?jql=project%20%3D%20FLINK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20fixVersion%20%3D%201.1.5
> >>>>>
> >>>>
> >>
> >
>
>
