[jira] [Created] (FLINK-13937) Fix the error of the hive connector dependency version

2019-09-01 Thread Jeff Yang (Jira)
Jeff Yang created FLINK-13937:
-

 Summary: Fix the error of the hive connector dependency version
 Key: FLINK-13937
 URL: https://issues.apache.org/jira/browse/FLINK-13937
 Project: Flink
  Issue Type: Task
  Components: Documentation
Affects Versions: 1.10.0
Reporter: Jeff Yang
 Fix For: 1.10.0


There is an incorrect Maven dependency in the Hive connector's 
[documentation|https://ci.apache.org/projects/flink/flink-docs-master/dev/table/hive/].
 



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


Re: [VOTE] FLIP-58: Flink Python User-Defined Function for Table API

2019-09-01 Thread Becket Qin
+1

It is extremely useful for ML users.

On Mon, Sep 2, 2019 at 9:46 AM Shaoxuan Wang  wrote:

> +1 (binding)
>
> This will be a great feature for Flink users, especially for the data
> science and AI engineers.
>
> Regards,
> Shaoxuan
>
>
> On Fri, Aug 30, 2019 at 1:35 PM Jeff Zhang  wrote:
>
> > +1, very much looking forward to this feature in Flink 1.10
> >
> >
> > Yu Li  于2019年8月30日周五 上午11:08写道:
> >
> > > +1 (non-binding)
> > >
> > > Thanks for driving this!
> > >
> > > Best Regards,
> > > Yu
> > >
> > >
> > > On Fri, 30 Aug 2019 at 11:01, Terry Wang  wrote:
> > >
> > > > +1. That would be very helpful.
> > > > Best,
> > > > Terry Wang
> > > >
> > > >
> > > >
> > > > > 在 2019年8月30日,上午10:18,Jark Wu  写道:
> > > > >
> > > > > +1
> > > > >
> > > > > Thanks for the great work!
> > > > >
> > > > > On Fri, 30 Aug 2019 at 10:04, Xingbo Huang 
> > wrote:
> > > > >
> > > > >> Hi Dian,
> > > > >>
> > > > >> +1,
> > > > >> Thanks a lot for driving this.
> > > > >>
> > > > >> Best,
> > > > >> Xingbo
> > > > >>> 在 2019年8月30日,上午9:39,Wei Zhong  写道:
> > > > >>>
> > > > >>> Hi Dian,
> > > > >>>
> > > > >>> +1 non-binding
> > > > >>> Thanks for driving this!
> > > > >>>
> > > > >>> Best, Wei
> > > > >>>
> > > >  在 2019年8月29日,09:25,Hequn Cheng  写道:
> > > > 
> > > >  Hi Dian,
> > > > 
> > > >  +1
> > > >  Thanks a lot for driving this.
> > > > 
> > > >  Best, Hequn
> > > > 
> > > >  On Wed, Aug 28, 2019 at 2:01 PM jincheng sun <
> > > > sunjincheng...@gmail.com>
> > > >  wrote:
> > > > 
> > > > > Hi Dian,
> > > > >
> > > > > +1, Thanks for your great job!
> > > > >
> > > > > Best,
> > > > > Jincheng
> > > > >
> > > > > Dian Fu  于2019年8月28日周三 上午11:04写道:
> > > > >
> > > > >> Hi all,
> > > > >>
> > > > >> I'd like to start a voting thread for FLIP-58 [1] since that
> we
> > > have
> > > > >> reached an agreement on the design in the discussion thread
> [2],
> > > > >>
> > > > >> This vote will be open for at least 72 hours. Unless there is
> an
> > > > >> objection, I will try to close it by Sept 2, 2019 00:00 UTC if
> > we
> > > > have
> > > > >> received sufficient votes.
> > > > >>
> > > > >> PS: This doesn't mean that we cannot further improve the
> design.
> > > We
> > > > >> can
> > > > >> still discuss the implementation details case by case in the
> > JIRA
> > > as
> > > > >> long
> > > > >> as it doesn't affect the overall design.
> > > > >>
> > > > >> [1]
> > > > >>
> > > > >
> > > > >>
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-58%3A+Flink+Python+User-Defined+Function+for+Table+API
> > > > >> <
> > > > >>
> > > > >
> > > > >>
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-58:+Flink+Python+User-Defined+Function+for+Table+API
> > > > >>>
> > > > >> [2]
> > > > >>
> > > > >
> > > > >>
> > > >
> > >
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-Python-User-Defined-Function-for-Table-API-td31673.html
> > > > >> <
> > > > >>
> > > > >
> > > > >>
> > > >
> > >
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-Python-User-Defined-Function-for-Table-API-td31673.html
> > > > >>>
> > > > >>
> > > > >> Thanks,
> > > > >> Dian
> > > > >
> > > > >>>
> > > > >>
> > > > >>
> > > >
> > > >
> > >
> >
> >
> > --
> > Best Regards
> >
> > Jeff Zhang
> >
>


Re: [DISCUSS] Simplify Flink's cluster level RestartStrategy configuration

2019-09-01 Thread zhijiang
+1 for this proposal.

IMO, it not only simplifies the cluster configuration, but it also makes more 
sense not to rely on low-level specific parameters to determine the 
upper-level strategy.
It is also reasonable to push the restart strategy configuration forward step by 
step and address batch later.

Best,
Zhijiang
--
From:Zhu Zhu 
Send Time:2019年9月2日(星期一) 05:18
To:dev 
Subject:Re: [DISCUSS] Simplify Flink's cluster level RestartStrategy 
configuration

+1 to simplify the RestartStrategy configuration

One thing to confirm is whether the default delay should be "0 s" in the
case of
"If the config option `restart-strategy` is not configured" and "If
checkpointing is enabled".
I see a related discussion([SURVEY] Is the default restart delay of 0s
causing problems) is ongoing and we may need to take the result from that.

Thanks,
Zhu Zhu

Becket Qin  于2019年9月2日周一 上午9:06写道:

> +1. The new behavior makes sense to me.
>
> BTW, we need a FLIP for this :)
>
> On Fri, Aug 30, 2019 at 10:17 PM Till Rohrmann 
> wrote:
>
> > After an offline discussion with Stephan, we concluded that changing the
> > default restart strategy for batch jobs is not that easy because the
> > cluster level restart configuration does not necessarily know about the
> > type of job which is submitted. We concluded that we would like to keep
> the
> > batch behaviour as is (NoRestartStrategy) and revisit this issue at a
> later
> > point in time.
> >
> > On Fri, Aug 30, 2019 at 3:24 PM Till Rohrmann 
> > wrote:
> >
> > > The current default behaviour for batch is `NoRestartStrategy` if
> nothing
> > > is configured. We could say that we set the default value of
> > > `restart-strategy` to `FixedDelayRestartStrategy(Integer.MAX_VALUE, "0
> > s")`
> > > independent of the checkpointing. The only downside I could see is that
> > > some faulty batch jobs might get stuck in a restart loop without
> > reaching a
> > > terminal state.
> > >
> > > @Dawid, I don't intend to touch the ExecutionConfig. This change only
> > > targets the cluster level configuration of the RestartStrategy.
> > >
> > > Cheers,
> > > Till
> > >
> > > On Fri, Aug 30, 2019 at 3:14 PM Dawid Wysakowicz <
> dwysakow...@apache.org
> > >
> > > wrote:
> > >
> > >> Also +1 in general.
> > >>
> > >> I have a few questions though:
> > >>
> > >> - does it only apply to the logic in
> > >>
> > >>
> >
> org.apache.flink.runtime.executiongraph.restart.RestartStrategyFactory#createRestartStrategyFactory,
> > >> which is only the cluster side configuration? Or do you want to change
> > >> the logic also on the job side in ExecutionConfig?
> > >>
> > >> - if the latter, does that mean deprecated methods in ExecutionConfig
> > >> like: setNumberOfExecutionRetries, setExecutionRetryDelay will have no
> > >> effect? I think this would be a good idea, but would suggest to remove
> > >> the corresponding fields and methods. This is not that simple though.
> I
> > >> tried to do that for other parameters that have no effect already like
> > >> codeAnalysisMode & failTaskOnCheckpointError. There are two problems:
> > >>
> > >> 1) setNumberOfExecutionRetries is effectively marked with the @Public
> > >> annotation (the codeAnalysisMode & failTaskOnCheckpointError don't
> have
> > >> this problem). Therefore this would be a binary incompatible change.
> > >>
> > >> 2) ExecutionConfig is stored in state as part of PojoSerializer in
> > >> pre flink 1.7. It should not be a problem for
> numberOfExecutionRetries &
> > >> executionRetryDelays as they are of primitive types. It is a problem
> for
> > >> codeAnalysisMode (we cannot remove the class, as this breaks
> > >> serialization). I wanted to mention that anyway, just to be aware of
> > that.
> > >>
> > >> Best,
> > >>
> > >> Dawid
> > >>
> > >> On 30/08/2019 14:48, Stephan Ewen wrote:
> > >> > +1 in general
> > >> >
> > >> > What is the default in batch, though? No restarts? I always found
> that
> > >> > somewhat uncommon.
> > >> > Should we also change that part, if we are changing the default
> > anyways?
> > >> >
> > >> >
> > >> > On Fri, Aug 30, 2019 at 2:35 PM Till Rohrmann  >
> > >> wrote:
> > >> >
> > >> >> Hi everyone,
> > >> >>
> > >> >> I wanted to discuss how to simplify Flink's cluster level
> > >> RestartStrategy
> > >> >> configuration [1]. Currently, Flink's behaviour with respect to
> > >> configuring
> > >> >> the {{RestartStrategies}} is quite complicated and convoluted. The
> > >> reason
> > >> >> for this is that we evolved the way it has been configured and
> wanted
> > >> to
> > >> >> keep it backwards compatible. Due to this, we have currently the
> > >> following
> > >> >> behaviour:
> > >> >>
> > >> >> * If the config option `restart-strategy` is configured, then Flink
> > >> uses
> > >> >> this `RestartStrategy` (so far so simple)
> > >> >> * If the config option `restart-strategy` is not configured, then
> > >> >> ** If `restart-strategy.fixed-delay.attempts

Re: [DISCUSS] FLIP-53: Fine Grained Resource Management

2019-09-01 Thread Xintong Song
Updated the FLIP wiki page [1], with the following changes.

   - Remove the step of converting pipelined edges between different slot
   sharing groups into blocking edges.
   - Set `allSourcesInSamePipelinedRegion` to true by default.

Thank you~

Xintong Song



On Mon, Sep 2, 2019 at 11:50 AM Xintong Song  wrote:

> Regarding changing edge type, I think actually we don't need to do this
> for batch jobs either, because we don't have public interfaces for users
> to explicitly set slot sharing groups in DataSet API and SQL/Table API. We
> have such interfaces in DataStream API only.
>
> Thank you~
>
> Xintong Song
>
>
>
> On Tue, Aug 27, 2019 at 10:16 PM Xintong Song 
> wrote:
>
>> Thanks for the correction, Till.
>>
>> Regarding your comments:
>> - You are right, we should not change the edge type for streaming jobs.
>> Then I think we can change the option 'allSourcesInSamePipelinedRegion' in
>> step 2 to 'isStreamingJob', and implement the current step 2 before the
>> current step 1 so we can use this option to decide whether should change
>> the edge type. What do you think?
>> - Agree. It should be easier to make the default value of
>> 'allSourcesInSamePipelinedRegion' (or 'isStreamingJob') 'true', and set it
>> to 'false' when using DataSet API or blink planner.
>>
>> Thank you~
>>
>> Xintong Song
>>
>>
>>
>> On Tue, Aug 27, 2019 at 8:59 PM Till Rohrmann 
>> wrote:
>>
>>> Thanks for creating the implementation plan Xintong. Overall, the
>>> implementation plan looks good. I had a couple of comments:
>>>
>>> - What will happen if a user has defined a streaming job with two slot
>>> sharing groups? Would the code insert a blocking data exchange between
>>> these two groups? If yes, then this breaks existing Flink streaming jobs.
>>> - How do we detect unbounded streaming jobs to set
>>> the allSourcesInSamePipelinedRegion to `true`? Wouldn't it be easier to
>>> set
>>> it false if we are using the DataSet API or the Blink planner with a
>>> bounded job?
>>>
>>> Cheers,
>>> Till
>>>
>>> On Tue, Aug 27, 2019 at 2:16 PM Till Rohrmann 
>>> wrote:
>>>
>>> > I guess there is a typo since the link to the FLIP-53 is
>>> >
>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-53%3A+Fine+Grained+Operator+Resource+Management
>>> >
>>> > Cheers,
>>> > Till
>>> >
>>> > On Tue, Aug 27, 2019 at 1:42 PM Xintong Song 
>>> > wrote:
>>> >
>>> >> Added implementation steps for this FLIP on the wiki page [1].
>>> >>
>>> >>
>>> >> Thank you~
>>> >>
>>> >> Xintong Song
>>> >>
>>> >>
>>> >> [1]
>>> >>
>>> >>
>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-49%3A+Unified+Memory+Configuration+for+TaskExecutors
>>> >>
>>> >> On Mon, Aug 19, 2019 at 10:29 PM Xintong Song 
>>> >> wrote:
>>> >>
>>> >> > Hi everyone,
>>> >> >
>>> >> > As Till suggested, the original "FLIP-53: Fine Grained Resource
>>> >> > Management" splits into two separate FLIPs,
>>> >> >
>>> >> >- FLIP-53: Fine Grained Operator Resource Management [1]
>>> >> >- FLIP-56: Dynamic Slot Allocation [2]
>>> >> >
>>> >> > We'll continue using this discussion thread for FLIP-53. For
>>> FLIP-56, I
>>> >> > just started a new discussion thread [3].
>>> >> >
>>> >> > Thank you~
>>> >> >
>>> >> > Xintong Song
>>> >> >
>>> >> >
>>> >> > [1]
>>> >> >
>>> >>
>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-53%3A+Fine+Grained+Operator+Resource+Management
>>> >> >
>>> >> > [2]
>>> >> >
>>> >>
>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-56%3A+Dynamic+Slot+Allocation
>>> >> >
>>> >> > [3]
>>> >> >
>>> >>
>>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-56-Dynamic-Slot-Allocation-td31960.html
>>> >> >
>>> >> > On Mon, Aug 19, 2019 at 2:55 PM Xintong Song >> >
>>> >> > wrote:
>>> >> >
>>> >> >> Thanks for the comments, Yang.
>>> >> >>
>>> >> >> Regarding your questions:
>>> >> >>
>>> >> >>1. How to calculate the resource specification of TaskManagers?
>>> Do
>>> >> they
>>> >> >>>have the same resource spec calculated based on the
>>> >> configuration? I
>>> >> >>> think
>>> >> >>>we still have wasted resources in this situation. Or we could
>>> start
>>> >> >>>TaskManagers with different spec.
>>> >> >>>
>>> >> >> I agree with you that we can further improve the resource utilization
>>> by
>>> >> >> customizing task executors with different resource specifications.
>>> >> However,
>>> >> >> I'm in favor of limiting the scope of this FLIP and leave it as a
>>> >> future
>>> >> >> optimization. The plan for that part is to move the logic of
>>> deciding
>>> >> task
>>> >> >> executor specifications into the slot manager and make slot manager
>>> >> >> pluggable, so inside the slot manager plugin we can have different
>>> >> logics
>>> >> >> for deciding the task executor specifications.
>>> >> >>
>>> >> >>
>>> >> >>>2. If a slot is released and returned to SlotPool, does it
>>> could be
>>> >> >>>reused by other SlotRequest that the request resource is
>>> smaller
>>> >> than
>>>

Re: [DISCUSS] FLIP-53: Fine Grained Resource Management

2019-09-01 Thread Xintong Song
Regarding changing edge type, I think actually we don't need to do this for
batch jobs either, because we don't have public interfaces for users to
explicitly set slot sharing groups in DataSet API and SQL/Table API. We
have such interfaces in DataStream API only.

Thank you~

Xintong Song
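
For readers unfamiliar with the interface referred to above: today the only public way to pin an operator to an explicit slot sharing group is the slotSharingGroup method on DataStream API operators. The following minimal Java sketch is illustrative only (group name and operators are made up, not taken from the FLIP):

    import org.apache.flink.api.common.functions.MapFunction;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class SlotSharingGroupSketch {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            env.fromElements(1, 2, 3)
                // operators stay in the "default" slot sharing group unless told otherwise
                .map(new MapFunction<Integer, Integer>() {
                    @Override
                    public Integer map(Integer value) {
                        return value * 2;
                    }
                })
                // explicitly move this operator into another group; downstream
                // operators inherit the group unless they set their own
                .slotSharingGroup("heavy")
                .print();

            env.execute("slot sharing group sketch");
        }
    }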



On Tue, Aug 27, 2019 at 10:16 PM Xintong Song  wrote:

> Thanks for the correction, Till.
>
> Regarding your comments:
> - You are right, we should not change the edge type for streaming jobs.
> Then I think we can change the option 'allSourcesInSamePipelinedRegion' in
> step 2 to 'isStreamingJob', and implement the current step 2 before the
> current step 1 so we can use this option to decide whether should change
> the edge type. What do you think?
> - Agree. It should be easier to make the default value of
> 'allSourcesInSamePipelinedRegion' (or 'isStreamingJob') 'true', and set it
> to 'false' when using DataSet API or blink planner.
>
> Thank you~
>
> Xintong Song
>
>
>
> On Tue, Aug 27, 2019 at 8:59 PM Till Rohrmann 
> wrote:
>
>> Thanks for creating the implementation plan Xintong. Overall, the
>> implementation plan looks good. I had a couple of comments:
>>
>> - What will happen if a user has defined a streaming job with two slot
>> sharing groups? Would the code insert a blocking data exchange between
>> these two groups? If yes, then this breaks existing Flink streaming jobs.
>> - How do we detect unbounded streaming jobs to set
>> the allSourcesInSamePipelinedRegion to `true`? Wouldn't it be easier to
>> set
>> it false if we are using the DataSet API or the Blink planner with a
>> bounded job?
>>
>> Cheers,
>> Till
>>
>> On Tue, Aug 27, 2019 at 2:16 PM Till Rohrmann 
>> wrote:
>>
>> > I guess there is a typo since the link to the FLIP-53 is
>> >
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-53%3A+Fine+Grained+Operator+Resource+Management
>> >
>> > Cheers,
>> > Till
>> >
>> > On Tue, Aug 27, 2019 at 1:42 PM Xintong Song 
>> > wrote:
>> >
>> >> Added implementation steps for this FLIP on the wiki page [1].
>> >>
>> >>
>> >> Thank you~
>> >>
>> >> Xintong Song
>> >>
>> >>
>> >> [1]
>> >>
>> >>
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-49%3A+Unified+Memory+Configuration+for+TaskExecutors
>> >>
>> >> On Mon, Aug 19, 2019 at 10:29 PM Xintong Song 
>> >> wrote:
>> >>
>> >> > Hi everyone,
>> >> >
>> >> > As Till suggested, the original "FLIP-53: Fine Grained Resource
>> >> > Management" splits into two separate FLIPs,
>> >> >
>> >> >- FLIP-53: Fine Grained Operator Resource Management [1]
>> >> >- FLIP-56: Dynamic Slot Allocation [2]
>> >> >
>> >> > We'll continue using this discussion thread for FLIP-53. For
>> FLIP-56, I
>> >> > just started a new discussion thread [3].
>> >> >
>> >> > Thank you~
>> >> >
>> >> > Xintong Song
>> >> >
>> >> >
>> >> > [1]
>> >> >
>> >>
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-53%3A+Fine+Grained+Operator+Resource+Management
>> >> >
>> >> > [2]
>> >> >
>> >>
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-56%3A+Dynamic+Slot+Allocation
>> >> >
>> >> > [3]
>> >> >
>> >>
>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-56-Dynamic-Slot-Allocation-td31960.html
>> >> >
>> >> > On Mon, Aug 19, 2019 at 2:55 PM Xintong Song 
>> >> > wrote:
>> >> >
>> >> >> Thanks for the comments, Yang.
>> >> >>
>> >> >> Regarding your questions:
>> >> >>
>> >> >>1. How to calculate the resource specification of TaskManagers?
>> Do
>> >> they
>> >> >>>have the same resource spec calculated based on the
>> >> configuration? I
>> >> >>> think
>> >> >>>we still have wasted resources in this situation. Or we could
>> start
>> >> >>>TaskManagers with different spec.
>> >> >>>
>> >> >> I agree with you that we can further improve the resource utilization by
>> >> >> customizing task executors with different resource specifications.
>> >> However,
>> >> >> I'm in favor of limiting the scope of this FLIP and leave it as a
>> >> future
>> >> >> optimization. The plan for that part is to move the logic of
>> deciding
>> >> task
>> >> >> executor specifications into the slot manager and make slot manager
>> >> >> pluggable, so inside the slot manager plugin we can have different
>> >> logics
>> >> >> for deciding the task executor specifications.
>> >> >>
>> >> >>
>> >> >>>2. If a slot is released and returned to SlotPool, does it
>> could be
>> >> >>>reused by other SlotRequest that the request resource is smaller
>> >> than
>> >> >>> it?
>> >> >>>
>> >> >> No, I think slot pool should always return slots if they do not
>> exactly
>> >> >> match the pending requests, so that resource manager can deal with
>> the
>> >> >> extra resources.
>> >> >>
>> >> >>>   - If it is yes, what happens to the available resource in the
>> >> >>
>> >> >>   TaskManager.
>> >> >>>   - What is the SlotStatus of the cached slot in SlotPool? The
>> >> >>>   AllocationId is null?
>> >> >>>
>> >> >> The allocation

Re: [DISCUSS] FLIP-49: Unified Memory Configuration for TaskExecutors

2019-09-01 Thread Xintong Song
I just updated the FLIP wiki page [1], with the following changes:

   - Network memory uses JVM direct memory, and is accounted for when setting
   the JVM max direct memory size parameter.
   - Use dynamic configurations (`-Dkey=value`) to pass calculated memory
   configs into TaskExecutors, instead of ENV variables.
   - Remove 'supporting memory reservation' from the scope of this FLIP.

@till @stephan, please take another look and see if there are any other
concerns.

Thank you~

Xintong Song


[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-49%3A+Unified+Memory+Configuration+for+TaskExecutors
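
To illustrate the `-Dkey=value` mechanism mentioned above: dynamic properties passed on the command line are simply overlaid on top of the values loaded from flink-conf.yaml. The following self-contained Java sketch only shows the idea; it is not Flink's actual startup code, and the config key used is a placeholder:

    import java.util.HashMap;
    import java.util.Map;

    public class DynamicPropertiesSketch {

        /** Overlays -Dkey=value arguments on top of the base configuration values. */
        static Map<String, String> applyDynamicProperties(Map<String, String> base, String[] args) {
            Map<String, String> merged = new HashMap<>(base);
            for (String arg : args) {
                int eq = arg.indexOf('=');
                if (arg.startsWith("-D") && eq > 2) {
                    // dynamic properties win over values from the config file
                    merged.put(arg.substring(2, eq), arg.substring(eq + 1));
                }
            }
            return merged;
        }

        public static void main(String[] args) {
            Map<String, String> fromYaml = new HashMap<>();
            fromYaml.put("taskmanager.heap.size", "4096m"); // value loaded from flink-conf.yaml

            // e.g. the startup logic passes a calculated size as a dynamic property
            String[] dynamic = {"-Dtaskmanager.heap.size=2048m"};

            System.out.println(applyDynamicProperties(fromYaml, dynamic));
        }
    }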

On Mon, Sep 2, 2019 at 11:13 AM Xintong Song  wrote:

> Sorry for the late response.
>
> - Regarding the `TaskExecutorSpecifics` naming, let's discuss the detail
> in PR.
> - Regarding passing parameters into the `TaskExecutor`, +1 for using
> dynamic configuration at the moment, given that there are more questions to
> be discussed to have a general framework for overwriting configurations
> with ENV variables.
> - Regarding memory reservation, I double checked with Yu and he will take
> care of it.
>
> Thank you~
>
> Xintong Song
>
>
>
> On Thu, Aug 29, 2019 at 7:35 PM Till Rohrmann 
> wrote:
>
>> What I forgot to add is that we could tackle specifying the configuration
>> fully in an incremental way and that the full specification should be the
>> desired end state.
>>
>> On Thu, Aug 29, 2019 at 1:33 PM Till Rohrmann 
>> wrote:
>>
>> > I think our goal should be that the configuration is fully specified
>> when
>> > the process is started. By considering the internal calculation step to
>> be
>> > rather validate existing values and calculate missing ones, these two
>> > proposal shouldn't even conflict (given determinism).
>> >
>> > Since we don't want to change an existing flink-conf.yaml, specifying
>> the
>> > full configuration would require to pass in the options differently.
>> >
>> > One way could be the ENV variables approach. The reason why I'm trying
>> to
>> > exclude this feature from the FLIP is that I believe it needs a bit more
>> > discussion. Just some questions which come to my mind: What would be the
>> > exact format (FLINK_KEY_NAME)? Would we support a dot separator which is
>> > supported by some systems (FLINK.KEY.NAME)? If we accept the dot
>> > separator what would be the order of precedence if there are two ENV
>> > variables defined (FLINK_KEY_NAME and FLINK.KEY.NAME)? What is the
>> > precedence of env variable vs. dynamic configuration value specified
>> via -D?
>> >
>> > Another approach could be to pass in the dynamic configuration values
>> via
>> > `-Dkey=value` to the Flink process. For that we don't have to change
>> > anything because the functionality already exists.
>> >
>> > Cheers,
>> > Till
>> >
>> > On Thu, Aug 29, 2019 at 12:50 PM Stephan Ewen  wrote:
>> >
>> >> I see. Under the assumption of strict determinism that should work.
>> >>
>> >> The original proposal had this point "don't compute inside the TM,
>> compute
>> >> outside and supply a full config", because that sounded more intuitive.
>> >>
>> >> On Thu, Aug 29, 2019 at 12:15 PM Till Rohrmann 
>> >> wrote:
>> >>
>> >> > My understanding was that before starting the Flink process we call a
>> >> > utility which calculates these values. I assume that this utility
>> will
>> >> do
>> >> > the calculation based on a set of configured values (process memory,
>> >> flink
>> >> > memory, network memory etc.). Assuming that these values don't differ
>> >> from
>> >> > the values with which the JVM is started, it should be possible to
>> >> > recompute them in the Flink process in order to set the values.
>> >> >
>> >> >
>> >> >
>> >> > On Thu, Aug 29, 2019 at 11:29 AM Stephan Ewen 
>> wrote:
>> >> >
>> >> > > When computing the values in the JVM process after it started, how
>> >> would
>> >> > > you deal with values like Max Direct Memory, Metaspace size. native
>> >> > memory
>> >> > > reservation (reduce heap size), etc? All the values that are
>> >> parameters
>> >> > to
>> >> > > the JVM process and that need to be supplied at process startup?
>> >> > >
>> >> > > On Wed, Aug 28, 2019 at 4:46 PM Till Rohrmann <
>> trohrm...@apache.org>
>> >> > > wrote:
>> >> > >
>> >> > > > Thanks for the clarification. I have some more comments:
>> >> > > >
>> >> > > > - I would actually split the logic to compute the process memory
>> >> > > > requirements and storing the values into two things. E.g. one
>> could
>> >> > name
>> >> > > > the former TaskExecutorProcessUtility and  the latter
>> >> > > > TaskExecutorProcessMemory. But we can discuss this on the PR
>> since
>> >> it's
>> >> > > > just a naming detail.
>> >> > > >
>> >> > > > - Generally, I'm not opposed to making configuration values
>> >> overridable
>> >> > > by
>> >> > > > ENV variables. I think this is a very good idea and makes the
>> >> > > > configurability of Flink processes easier. However, I think that
>> >> adding
>> >> > > > this functionality shoul

Re: [DISCUSS] FLIP-49: Unified Memory Configuration for TaskExecutors

2019-09-01 Thread Yu Li
Yes I'll address the memory reservation functionality in a separate FLIP to
cooperate with FLIP-49 (sorry for being late for the discussion).

Best Regards,
Yu


On Mon, 2 Sep 2019 at 11:14, Xintong Song  wrote:

> Sorry for the late response.
>
> - Regarding the `TaskExecutorSpecifics` naming, let's discuss the detail in
> PR.
> - Regarding passing parameters into the `TaskExecutor`, +1 for using
> dynamic configuration at the moment, given that there are more questions to
> be discussed to have a general framework for overwriting configurations
> with ENV variables.
> - Regarding memory reservation, I double checked with Yu and he will take
> care of it.
>
> Thank you~
>
> Xintong Song
>
>
>
> On Thu, Aug 29, 2019 at 7:35 PM Till Rohrmann 
> wrote:
>
> > What I forgot to add is that we could tackle specifying the configuration
> > fully in an incremental way and that the full specification should be the
> > desired end state.
> >
> > On Thu, Aug 29, 2019 at 1:33 PM Till Rohrmann 
> > wrote:
> >
> > > I think our goal should be that the configuration is fully specified
> when
> > > the process is started. By considering the internal calculation step to
> > be
> > > rather validate existing values and calculate missing ones, these two
> > > proposal shouldn't even conflict (given determinism).
> > >
> > > Since we don't want to change an existing flink-conf.yaml, specifying
> the
> > > full configuration would require to pass in the options differently.
> > >
> > > One way could be the ENV variables approach. The reason why I'm trying
> to
> > > exclude this feature from the FLIP is that I believe it needs a bit
> more
> > > discussion. Just some questions which come to my mind: What would be
> the
> > > exact format (FLINK_KEY_NAME)? Would we support a dot separator which
> is
> > > supported by some systems (FLINK.KEY.NAME)? If we accept the dot
> > > separator what would be the order of precedence if there are two ENV
> > > variables defined (FLINK_KEY_NAME and FLINK.KEY.NAME)? What is the
> > > precedence of env variable vs. dynamic configuration value specified
> via
> > -D?
> > >
> > > Another approach could be to pass in the dynamic configuration values
> via
> > > `-Dkey=value` to the Flink process. For that we don't have to change
> > > anything because the functionality already exists.
> > >
> > > Cheers,
> > > Till
> > >
> > > On Thu, Aug 29, 2019 at 12:50 PM Stephan Ewen 
> wrote:
> > >
> > >> I see. Under the assumption of strict determinism that should work.
> > >>
> > >> The original proposal had this point "don't compute inside the TM,
> > compute
> > >> outside and supply a full config", because that sounded more
> intuitive.
> > >>
> > >> On Thu, Aug 29, 2019 at 12:15 PM Till Rohrmann 
> > >> wrote:
> > >>
> > >> > My understanding was that before starting the Flink process we call
> a
> > >> > utility which calculates these values. I assume that this utility
> will
> > >> do
> > >> > the calculation based on a set of configured values (process memory,
> > >> flink
> > >> > memory, network memory etc.). Assuming that these values don't
> differ
> > >> from
> > >> > the values with which the JVM is started, it should be possible to
> > >> > recompute them in the Flink process in order to set the values.
> > >> >
> > >> >
> > >> >
> > >> > On Thu, Aug 29, 2019 at 11:29 AM Stephan Ewen 
> > wrote:
> > >> >
> > >> > > When computing the values in the JVM process after it started, how
> > >> would
> > >> > > you deal with values like Max Direct Memory, Metaspace size.
> native
> > >> > memory
> > >> > > reservation (reduce heap size), etc? All the values that are
> > >> parameters
> > >> > to
> > >> > > the JVM process and that need to be supplied at process startup?
> > >> > >
> > >> > > On Wed, Aug 28, 2019 at 4:46 PM Till Rohrmann <
> trohrm...@apache.org
> > >
> > >> > > wrote:
> > >> > >
> > >> > > > Thanks for the clarification. I have some more comments:
> > >> > > >
> > >> > > > - I would actually split the logic to compute the process memory
> > >> > > > requirements and storing the values into two things. E.g. one
> > could
> > >> > name
> > >> > > > the former TaskExecutorProcessUtility and  the latter
> > >> > > > TaskExecutorProcessMemory. But we can discuss this on the PR
> since
> > >> it's
> > >> > > > just a naming detail.
> > >> > > >
> > >> > > > - Generally, I'm not opposed to making configuration values
> > >> overridable
> > >> > > by
> > >> > > > ENV variables. I think this is a very good idea and makes the
> > >> > > > configurability of Flink processes easier. However, I think that
> > >> adding
> > >> > > > this functionality should not be part of this FLIP because it
> > would
> > >> > > simply
> > >> > > > widen the scope unnecessarily.
> > >> > > >
> > >> > > > The reasons why I believe it is unnecessary are the following:
> For
> > >> Yarn
> > >> > > we
> > >> > > > already create a flink-conf.yaml which could be populated
> > with
> > >> > the
> > >> > > > 

Re: [DISCUSS] Simplify Flink's cluster level RestartStrategy configuration

2019-09-01 Thread Zhu Zhu
+1 to simplify the RestartStrategy configuration

One thing to confirm is whether the default delay should be "0 s" in the
case of
"If the config option `restart-strategy` is not configured" and "If
checkpointing is enabled".
I see a related discussion([SURVEY] Is the default restart delay of 0s
causing problems) is ongoing and we may need to take the result from that.

Thanks,
Zhu Zhu
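
For reference, the cluster-level options under discussion are set in flink-conf.yaml; a minimal sketch using the keys quoted from Till's original mail below (the values here are illustrative only, not a proposed default) would be:

    # flink-conf.yaml (illustrative values)
    restart-strategy: fixed-delay
    restart-strategy.fixed-delay.attempts: 3
    restart-strategy.fixed-delay.delay: 10 s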

Becket Qin  于2019年9月2日周一 上午9:06写道:

> +1. The new behavior makes sense to me.
>
> BTW, we need a FLIP for this :)
>
> On Fri, Aug 30, 2019 at 10:17 PM Till Rohrmann 
> wrote:
>
> > After an offline discussion with Stephan, we concluded that changing the
> > default restart strategy for batch jobs is not that easy because the
> > cluster level restart configuration does not necessarily know about the
> > type of job which is submitted. We concluded that we would like to keep
> the
> > batch behaviour as is (NoRestartStrategy) and revisit this issue at a
> later
> > point in time.
> >
> > On Fri, Aug 30, 2019 at 3:24 PM Till Rohrmann 
> > wrote:
> >
> > > The current default behaviour for batch is `NoRestartStrategy` if
> nothing
> > > is configured. We could say that we set the default value of
> > > `restart-strategy` to `FixedDelayRestartStrategy(Integer.MAX_VALUE, "0
> > s")`
> > > independent of the checkpointing. The only downside I could see is that
> > > some faulty batch jobs might get stuck in a restart loop without
> > reaching a
> > > terminal state.
> > >
> > > @Dawid, I don't intend to touch the ExecutionConfig. This change only
> > > targets the cluster level configuration of the RestartStrategy.
> > >
> > > Cheers,
> > > Till
> > >
> > > On Fri, Aug 30, 2019 at 3:14 PM Dawid Wysakowicz <
> dwysakow...@apache.org
> > >
> > > wrote:
> > >
> > >> Also +1 in general.
> > >>
> > >> I have a few questions though:
> > >>
> > >> - does it only apply to the logic in
> > >>
> > >>
> >
> org.apache.flink.runtime.executiongraph.restart.RestartStrategyFactory#createRestartStrategyFactory,
> > >> which is only the cluster side configuration? Or do you want to change
> > >> the logic also on the job side in ExecutionConfig?
> > >>
> > >> - if the latter, does that mean deprecated methods in ExecutionConfig
> > >> like: setNumberOfExecutionRetries, setExecutionRetryDelay will have no
> > >> effect? I think this would be a good idea, but would suggest to remove
> > >> the corresponding fields and methods. This is not that simple though.
> I
> > >> tried to do that for other parameters that have no effect already like
> > >> codeAnalysisMode & failTaskOnCheckpointError. There are two problems:
> > >>
> > >> 1) setNumberOfExecutionRetries is effectively marked with the @Public
> > >> annotation (the codeAnalysisMode & failTaskOnCheckpointError don't
> have
> > >> this problem). Therefore this would be a binary incompatible change.
> > >>
> > >> 2) ExecutionConfig is stored in state as part of PojoSerializer in
> > >> pre flink 1.7. It should not be a problem for
> numberOfExecutionRetries &
> > >> executionRetryDelays as they are of primitive types. It is a problem
> for
> > >> codeAnalysisMode (we cannot remove the class, as this breaks
> > >> serialization). I wanted to mention that anyway, just to be aware of
> > that.
> > >>
> > >> Best,
> > >>
> > >> Dawid
> > >>
> > >> On 30/08/2019 14:48, Stephan Ewen wrote:
> > >> > +1 in general
> > >> >
> > >> > What is the default in batch, though? No restarts? I always found
> that
> > >> > somewhat uncommon.
> > >> > Should we also change that part, if we are changing the default
> > anyways?
> > >> >
> > >> >
> > >> > On Fri, Aug 30, 2019 at 2:35 PM Till Rohrmann  >
> > >> wrote:
> > >> >
> > >> >> Hi everyone,
> > >> >>
> > >> >> I wanted to discuss how to simplify Flink's cluster level
> > >> RestartStrategy
> > >> >> configuration [1]. Currently, Flink's behaviour with respect to
> > >> configuring
> > >> >> the {{RestartStrategies}} is quite complicated and convoluted. The
> > >> reason
> > >> >> for this is that we evolved the way it has been configured and
> wanted
> > >> to
> > >> >> keep it backwards compatible. Due to this, we have currently the
> > >> following
> > >> >> behaviour:
> > >> >>
> > >> >> * If the config option `restart-strategy` is configured, then Flink
> > >> uses
> > >> >> this `RestartStrategy` (so far so simple)
> > >> >> * If the config option `restart-strategy` is not configured, then
> > >> >> ** If `restart-strategy.fixed-delay.attempts` or
> > >> >> `restart-strategy.fixed-delay.delay` are defined, then instantiate
> > >> >> `FixedDelayRestartStrategy(restart-strategy.fixed-delay.attempts,
> > >> >> restart-strategy.fixed-delay.delay)`
> > >> >> ** If `restart-strategy.fixed-delay.attempts` and
> > >> >> `restart-strategy.fixed-delay.delay` are not defined, then
> > >> >> *** If checkpointing is disabled, then choose `NoRestartStrategy`
> > >> >> *** If checkpointing is enabled, then choose
> > >> >> `FixedDelayRestartStrategy(Integer.MAX_VALUE,

Re: [DISCUSS] FLIP-49: Unified Memory Configuration for TaskExecutors

2019-09-01 Thread Xintong Song
Sorry for the late response.

- Regarding the `TaskExecutorSpecifics` naming, let's discuss the detail in
PR.
- Regarding passing parameters into the `TaskExecutor`, +1 for using
dynamic configuration at the moment, given that there are more questions to
be discussed to have a general framework for overwriting configurations
with ENV variables.
- Regarding memory reservation, I double checked with Yu and he will take
care of it.

Thank you~

Xintong Song
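
Purely to make the ENV-variable naming question below concrete: one of the conventions being debated (FLINK_KEY_NAME mapping to key.name) could look like the Java sketch below. This is only an illustration of the open question, not a decided format:

    import java.util.Locale;

    public class EnvVarKeySketch {

        /** Maps FLINK_KEY_NAME style variables to key.name config keys (one possible convention). */
        static String toConfigKey(String envVar) {
            if (!envVar.startsWith("FLINK_")) {
                throw new IllegalArgumentException("Not a Flink env variable: " + envVar);
            }
            // note: this mapping is ambiguous for config keys that themselves contain
            // '_' or '-', which is part of why format and precedence still need discussion
            return envVar.substring("FLINK_".length())
                    .toLowerCase(Locale.ROOT)
                    .replace('_', '.');
        }

        public static void main(String[] args) {
            // prints "taskmanager.heap.size"
            System.out.println(toConfigKey("FLINK_TASKMANAGER_HEAP_SIZE"));
        }
    }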



On Thu, Aug 29, 2019 at 7:35 PM Till Rohrmann  wrote:

> What I forgot to add is that we could tackle specifying the configuration
> fully in an incremental way and that the full specification should be the
> desired end state.
>
> On Thu, Aug 29, 2019 at 1:33 PM Till Rohrmann 
> wrote:
>
> > I think our goal should be that the configuration is fully specified when
> > the process is started. By considering the internal calculation step to
> be
> > rather validate existing values and calculate missing ones, these two
> > proposal shouldn't even conflict (given determinism).
> >
> > Since we don't want to change an existing flink-conf.yaml, specifying the
> > full configuration would require to pass in the options differently.
> >
> > One way could be the ENV variables approach. The reason why I'm trying to
> > exclude this feature from the FLIP is that I believe it needs a bit more
> > discussion. Just some questions which come to my mind: What would be the
> > exact format (FLINK_KEY_NAME)? Would we support a dot separator which is
> > supported by some systems (FLINK.KEY.NAME)? If we accept the dot
> > separator what would be the order of precedence if there are two ENV
> > variables defined (FLINK_KEY_NAME and FLINK.KEY.NAME)? What is the
> > precedence of env variable vs. dynamic configuration value specified via
> -D?
> >
> > Another approach could be to pass in the dynamic configuration values via
> > `-Dkey=value` to the Flink process. For that we don't have to change
> > anything because the functionality already exists.
> >
> > Cheers,
> > Till
> >
> > On Thu, Aug 29, 2019 at 12:50 PM Stephan Ewen  wrote:
> >
> >> I see. Under the assumption of strict determinism that should work.
> >>
> >> The original proposal had this point "don't compute inside the TM,
> compute
> >> outside and supply a full config", because that sounded more intuitive.
> >>
> >> On Thu, Aug 29, 2019 at 12:15 PM Till Rohrmann 
> >> wrote:
> >>
> >> > My understanding was that before starting the Flink process we call a
> >> > utility which calculates these values. I assume that this utility will
> >> do
> >> > the calculation based on a set of configured values (process memory,
> >> flink
> >> > memory, network memory etc.). Assuming that these values don't differ
> >> from
> >> > the values with which the JVM is started, it should be possible to
> >> > recompute them in the Flink process in order to set the values.
> >> >
> >> >
> >> >
> >> > On Thu, Aug 29, 2019 at 11:29 AM Stephan Ewen 
> wrote:
> >> >
> >> > > When computing the values in the JVM process after it started, how
> >> would
> >> > > you deal with values like Max Direct Memory, Metaspace size. native
> >> > memory
> >> > > reservation (reduce heap size), etc? All the values that are
> >> parameters
> >> > to
> >> > > the JVM process and that need to be supplied at process startup?
> >> > >
> >> > > On Wed, Aug 28, 2019 at 4:46 PM Till Rohrmann  >
> >> > > wrote:
> >> > >
> >> > > > Thanks for the clarification. I have some more comments:
> >> > > >
> >> > > > - I would actually split the logic to compute the process memory
> >> > > > requirements and storing the values into two things. E.g. one
> could
> >> > name
> >> > > > the former TaskExecutorProcessUtility and  the latter
> >> > > > TaskExecutorProcessMemory. But we can discuss this on the PR since
> >> it's
> >> > > > just a naming detail.
> >> > > >
> >> > > > - Generally, I'm not opposed to making configuration values
> >> overridable
> >> > > by
> >> > > > ENV variables. I think this is a very good idea and makes the
> >> > > > configurability of Flink processes easier. However, I think that
> >> adding
> >> > > > this functionality should not be part of this FLIP because it
> would
> >> > > simply
> >> > > > widen the scope unnecessarily.
> >> > > >
> >> > > > The reasons why I believe it is unnecessary are the following: For
> >> Yarn
> >> > > we
> >> > > > already create a flink-conf.yaml which could be populated
> with
> >> > the
> >> > > > memory settings. For the other processes it should not make a
> >> > difference
> >> > > > whether the loaded Configuration is populated with the memory
> >> settings
> >> > > from
> >> > > > ENV variables or by using TaskExecutorProcessUtility to compute
> the
> >> > > missing
> >> > > > values from the loaded configuration. If the latter would not be
> >> > possible
> >> > > > (wrong or missing configuration values), then we should not have
> >> been
> >> > > able
> >> > > > to actually start the pr

Re: [DISCUSS] Releasing Flink 1.8.2

2019-09-01 Thread Yu Li
+1 for a 1.8.2 release, thanks for bringing this up Jincheng!

Best Regards,
Yu


On Mon, 2 Sep 2019 at 09:19, Thomas Weise  wrote:

> +1 for the 1.8.2 release
>
> I marked https://issues.apache.org/jira/browse/FLINK-13586 for this
> release. It would be good to compensate for the backward incompatible
> change to ClosureCleaner that was introduced in 1.8.1, which affects
> downstream dependencies.
>
> Thanks,
> Thomas
>
>
> On Sun, Sep 1, 2019 at 5:10 PM jincheng sun 
> wrote:
>
> > Hi Jark,
> >
> > Glad to hear that you want to be the Release Manager of flink 1.8.2.
> > I believe that you will be a great RM, and I am very willing to help you
> > with the final release in the final stages. :)
> >
> > The release of Apache Flink involves a number of tasks. For details, you
> > can consult the documentation [1]. If you have any questions, please let
> me
> > know and let us work together.
> >
> >
> >
> https://cwiki.apache.org/confluence/display/FLINK/Creating+a+Flink+Release#CreatingaFlinkRelease-Checklisttoproceedtothenextstep.1
> >
> > Cheers,
> > Jincheng
> >
> > Till Rohrmann  于2019年8月31日周六 上午12:59写道:
> >
> > > +1 for a 1.8.2 bug fix release. Thanks for kicking this discussion off
> > > Jincheng.
> > >
> > > Cheers,
> > > Till
> > >
> > > On Fri, Aug 30, 2019 at 6:45 PM Jark Wu  wrote:
> > >
> > > > Thanks Jincheng for bringing this up.
> > > >
> > > > +1 to the 1.8.2 release, because it already contains a couple of
> > > important
> > > > fixes and it has been a long time since 1.8.1 came out.
> > > > I'm willing to help the community as much as possible. I'm wondering
> > if I
> > > > can be the release manager of 1.8.2 or work with you together
> > @Jincheng?
> > > >
> > > > Best,
> > > > Jark
> > > >
> > > > On Fri, 30 Aug 2019 at 18:58, Hequn Cheng 
> > wrote:
> > > >
> > > > > Hi Jincheng,
> > > > >
> > > > > +1 for a 1.8.2 release.
> > > > > Thanks a lot for raising the discussion. It would be nice to have
> > these
> > > > > critical fixes.
> > > > >
> > > > > Best, Hequn
> > > > >
> > > > >
> > > > > On Fri, Aug 30, 2019 at 6:31 PM Maximilian Michels  >
> > > > wrote:
> > > > >
> > > > > > Hi Jincheng,
> > > > > >
> > > > > > +1 I would be for a 1.8.2 release such that we can fix the
> problems
> > > > with
> > > > > > the nested closure cleaner which currently block 1.8.1 users with
> > > Beam:
> > > > > > https://issues.apache.org/jira/browse/FLINK-13367
> > > > > >
> > > > > > Thanks,
> > > > > > Max
> > > > > >
> > > > > > On 30.08.19 11:25, jincheng sun wrote:
> > > > > > > Hi Flink devs,
> > > > > > >
> > > > > > > It has been nearly 2 months since the 1.8.1 released. So, what
> do
> > > you
> > > > > > think
> > > > > > > about releasing Flink 1.8.2 soon?
> > > > > > >
> > > > > > > We already have some blocker and critical fixes in the
> > release-1.8
> > > > > > branch:
> > > > > > >
> > > > > > > [Blocker]
> > > > > > > - FLINK-13159 java.lang.ClassNotFoundException when restore job
> > > > > > > - FLINK-10368 'Kerberized YARN on Docker test' unstable
> > > > > > > - FLINK-12578 Use secure URLs for Maven repositories
> > > > > > >
> > > > > > > [Critical]
> > > > > > > - FLINK-12736 ResourceManager may release TM with allocated
> slots
> > > > > > > - FLINK-12889 Job keeps in FAILING state
> > > > > > > - FLINK-13484 ConnectedComponents end-to-end test instable with
> > > > > > > NoResourceAvailableException
> > > > > > > - FLINK-13508 CommonTestUtils#waitUntilCondition() may attempt
> to
> > > > sleep
> > > > > > > with negative time
> > > > > > > - FLINK-13806 Metric Fetcher floods the JM log with errors when
> > TM
> > > is
> > > > > > lost
> > > > > > >
> > > > > > > Furthermore, I think the following one blocker issue should be
> > > merged
> > > > > > > before 1.8.2 release.
> > > > > > >
> > > > > > > - FLINK-13897: OSS FS NOTICE file is placed in wrong directory
> > > > > > >
> > > > > > > It would also be great if we can have the fix of
> Elasticsearch6.x
> > > > > > connector
> > > > > > > threads leaking (FLINK-13689) in 1.8.2 release which is
> > identified
> > > as
> > > > > > major.
> > > > > > >
> > > > > > > Please let me know what you think?
> > > > > > >
> > > > > > > Cheers,
> > > > > > > Jincheng
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: [DISCUSS] Releasing Flink 1.8.2

2019-09-01 Thread Jark Wu
Thanks Jincheng, I will look into the release guidelines.

Hi @Thomas Weise  , should we mark FLINK-13586 as a
blocker? And how long do you think this issue will take?

I summarized the current status of issues we need to track:

[Blocker]:
[FLINK-13897] OSS FS NOTICE file is placed in wrong directory (@Chesnay was
working on it, PR was reviewed)
[Major]:
[FLINK-13586] Method ClosureCleaner.clean broke backward compatibility
between 1.8.0 and 1.8.1 (needs a PR)
[FLINK-13689] Rest High Level Client for Elasticsearch6.x connector leaks
threads if no connection could be established (reviewed by @Gordon, PR needs
to be updated)

And we have a new issue, FLINK-13925, targeted to 1.8.2 and marked as major. I'm
not sure whether we should wait for it in 1.8.2; could @Aljoscha Krettek
help to check it?
[FLINK-13925] ClassLoader in BlobLibraryCacheManager is not using context
class loader

Thank you all for the fixing and reviewing.

The issues of this release can be tracked here:
https://issues.apache.org/jira/projects/FLINK/versions/12345670

Best,
Jark


On Mon, 2 Sep 2019 at 09:19, Thomas Weise  wrote:

> +1 for the 1.8.2 release
>
> I marked https://issues.apache.org/jira/browse/FLINK-13586 for this
> release. It would be good to compensate for the backward incompatible
> change to ClosureCleaner that was introduced in 1.8.1, which affects
> downstream dependencies.
>
> Thanks,
> Thomas
>
>
> On Sun, Sep 1, 2019 at 5:10 PM jincheng sun 
> wrote:
>
> > Hi Jark,
> >
> > Glad to hear that you want to be the Release Manager of flink 1.8.2.
> > I believe that you will be a great RM, and I am very willing to help you
> > with the final release in the final stages. :)
> >
> > The release of Apache Flink involves a number of tasks. For details, you
> > can consult the documentation [1]. If you have any questions, please let
> me
> > know and let us work together.
> >
> >
> >
> https://cwiki.apache.org/confluence/display/FLINK/Creating+a+Flink+Release#CreatingaFlinkRelease-Checklisttoproceedtothenextstep.1
> >
> > Cheers,
> > Jincheng
> >
> > Till Rohrmann  于2019年8月31日周六 上午12:59写道:
> >
> > > +1 for a 1.8.2 bug fix release. Thanks for kicking this discussion off
> > > Jincheng.
> > >
> > > Cheers,
> > > Till
> > >
> > > On Fri, Aug 30, 2019 at 6:45 PM Jark Wu  wrote:
> > >
> > > > Thanks Jincheng for bringing this up.
> > > >
> > > > +1 to the 1.8.2 release, because it already contains a couple of
> > > important
> > > > fixes and it has been a long time since 1.8.1 came out.
> > > > I'm willing to help the community as much as possible. I'm wondering
> > if I
> > > > can be the release manager of 1.8.2 or work with you together
> > @Jincheng?
> > > >
> > > > Best,
> > > > Jark
> > > >
> > > > On Fri, 30 Aug 2019 at 18:58, Hequn Cheng 
> > wrote:
> > > >
> > > > > Hi Jincheng,
> > > > >
> > > > > +1 for a 1.8.2 release.
> > > > > Thanks a lot for raising the discussion. It would be nice to have
> > these
> > > > > critical fixes.
> > > > >
> > > > > Best, Hequn
> > > > >
> > > > >
> > > > > On Fri, Aug 30, 2019 at 6:31 PM Maximilian Michels  >
> > > > wrote:
> > > > >
> > > > > > Hi Jincheng,
> > > > > >
> > > > > > +1 I would be for a 1.8.2 release such that we can fix the
> problems
> > > > with
> > > > > > the nested closure cleaner which currently block 1.8.1 users with
> > > Beam:
> > > > > > https://issues.apache.org/jira/browse/FLINK-13367
> > > > > >
> > > > > > Thanks,
> > > > > > Max
> > > > > >
> > > > > > On 30.08.19 11:25, jincheng sun wrote:
> > > > > > > Hi Flink devs,
> > > > > > >
> > > > > > > It has been nearly 2 months since the 1.8.1 released. So, what
> do
> > > you
> > > > > > think
> > > > > > > about releasing Flink 1.8.2 soon?
> > > > > > >
> > > > > > > We already have some blocker and critical fixes in the
> > release-1.8
> > > > > > branch:
> > > > > > >
> > > > > > > [Blocker]
> > > > > > > - FLINK-13159 java.lang.ClassNotFoundException when restore job
> > > > > > > - FLINK-10368 'Kerberized YARN on Docker test' unstable
> > > > > > > - FLINK-12578 Use secure URLs for Maven repositories
> > > > > > >
> > > > > > > [Critical]
> > > > > > > - FLINK-12736 ResourceManager may release TM with allocated
> slots
> > > > > > > - FLINK-12889 Job keeps in FAILING state
> > > > > > > - FLINK-13484 ConnectedComponents end-to-end test instable with
> > > > > > > NoResourceAvailableException
> > > > > > > - FLINK-13508 CommonTestUtils#waitUntilCondition() may attempt
> to
> > > > sleep
> > > > > > > with negative time
> > > > > > > - FLINK-13806 Metric Fetcher floods the JM log with errors when
> > TM
> > > is
> > > > > > lost
> > > > > > >
> > > > > > > Furthermore, I think the following one blocker issue should be
> > > merged
> > > > > > > before 1.8.2 release.
> > > > > > >
> > > > > > > - FLINK-13897: OSS FS NOTICE file is placed in wrong directory
> > > > > > >
> > > > > > > It would also be great if we can have the fix of
> Elasticsearch6.x
> > > > > > connector
> > > > > > > threads leaking (FLINK-13689) in 1

Re: How to handle Flink Job with 400MB+ Uberjar with 800+ containers ?

2019-09-01 Thread Zhu Zhu
Hi Elkhan,

>>Regarding "One optimization that we take is letting yarn to reuse the
flink-dist jar which was localized when running previous jobs."
>>We are intending to use Flink Real-time pipeline for Replay from
Hive/HDFS (from offline source), to have 1 single pipeline for both batch
and real-time. So for batch Flink job, the containers will be released
once the job is done.
>>I guess your job is real-time flink, so  you can share the  jars from
already long-running jobs.

This optimization is done by making the flink-dist jar a public
distributed cache entry in YARN.
In this way, the localized dist jar can be shared by different YARN
applications and it will not be removed when the YARN application which
localized it terminates.
This requires some changes in Flink though.
We will open a JIRA issue to contribute this optimization to the community.

Thanks,
Zhu Zhu
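
For readers unfamiliar with the YARN mechanism described above: registering the dist jar as a PUBLIC local resource is what lets NodeManagers cache the localized copy and reuse it across applications. A hedged Java sketch follows (Hadoop 2.x style APIs; the path comes from the jar previously uploaded to HDFS, and this is not the actual patch that will be contributed):

    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.yarn.api.records.LocalResource;
    import org.apache.hadoop.yarn.api.records.LocalResourceType;
    import org.apache.hadoop.yarn.api.records.LocalResourceVisibility;
    import org.apache.hadoop.yarn.util.ConverterUtils;
    import org.apache.hadoop.yarn.util.Records;

    public class PublicDistJarSketch {

        /** Describes the flink-dist jar as a PUBLIC local resource so that NodeManagers
         *  can cache and share the localized copy across YARN applications. */
        static LocalResource publicDistJar(FileSystem fs, Path distJarOnHdfs) throws Exception {
            FileStatus status = fs.getFileStatus(distJarOnHdfs);
            LocalResource resource = Records.newRecord(LocalResource.class);
            resource.setResource(ConverterUtils.getYarnUrlFromPath(distJarOnHdfs));
            resource.setSize(status.getLen());
            resource.setTimestamp(status.getModificationTime());
            resource.setType(LocalResourceType.FILE);
            // PUBLIC visibility enables reuse across applications; the jar on HDFS
            // must be world-readable for the NodeManager to accept it.
            resource.setVisibility(LocalResourceVisibility.PUBLIC);
            return resource;
        }
    }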

SHI Xiaogang  于2019年8月31日周六 下午12:57写道:

> Hi Dadashov,
>
> You may have a look at method YarnResourceManager#onContainersAllocated
> which will launch containers (via NMClient#startContainer) after containers
> are allocated.
> The launching is performed in the main thread of YarnResourceManager and
> the launching is synchronous/blocking. Consequently, the containers will be
> launched one by one.
>
> Regards,
> Xiaogang
>
> Elkhan Dadashov  于2019年8月31日周六 上午2:37写道:
>
>> Thanks everyone for the valuable input and for sharing your experience in
>> tackling the issue.
>>
>> Regarding suggestions :
>> - We provision some common jars in all cluster nodes  *-->*  but this
>> requires dependence on Infra Team schedule for handling common jars/updating
>> - Making Uberjar slimmer *-->* tried even with 200 MB Uberjar (half
>> size), did not improve much. Only 100 containers could be started in time,
>> but then receiving:
>>
>> org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to 
>> start container.
>> This token is expired. current time is 1566422713305 found 1566422560552
>> Note: System times on machines may be out of sync. Check system time and 
>> time zones.
>>
>>
>> - It would be nice to see FLINK-13184, but the expected version that will
>> get it in is 1.10
>> - Increase replication factor --> It would be nice to have Flink conf for
>> setting replication factor for only Flink job jars, but not the output. It
>> is also challenging to set a replication for yet non-existing directory,
>> the new files will have default replication factor. Will explore HDFS cache
>> option.
>>
>> Maybe another option can be:
>> - Letting yet-to-be-started Task Managers (or NodeManagers) download the
>> jars from already started TaskManagers  in P2P fashion, not to have a
>> blocker on HDFS replication.
>>
>> A Spark job without any tuning, with the exact same size jar and 800 executors,
>> can start without any issue on the same cluster in less than a minute.
>>
>> *Further questions:*
>>
>> *@SHI Xiaogang:*
>>
>> I see that all 800 requests are sent concurrently :
>>
>> 2019-08-30 00:28:28.516 [flink-akka.actor.default-dispatcher-37] INFO
>>  org.apache.flink.yarn.YarnResourceManager  - Requesting new TaskExecutor
>> container with resources . Number pending requests
>> 793.
>> 2019-08-30 00:28:28.516 [flink-akka.actor.default-dispatcher-37] INFO
>>  org.apache.flink.yarn.YarnResourceManager  - Request slot with profile
>> ResourceProfile{cpuCores=-1.0, heapMemoryInMB=-1, directMemoryInMB=0,
>> nativeMemoryInMB=0, networkMemoryInMB=0} for job
>> e908cb4700d5127a0b67be035e4494f7 with allocation id
>> AllocationID{cb016f7ce1eac1342001ccdb1427ba07}.
>>
>> 2019-08-30 00:28:28.516 [flink-akka.actor.default-dispatcher-37] INFO
>>  org.apache.flink.yarn.YarnResourceManager  - Requesting new TaskExecutor
>> container with resources . Number pending requests
>> 794.
>> 2019-08-30 00:28:28.516 [flink-akka.actor.default-dispatcher-37] INFO
>>  org.apache.flink.yarn.YarnResourceManager  - Request slot with profile
>> ResourceProfile{cpuCores=-1.0, heapMemoryInMB=-1, directMemoryInMB=0,
>> nativeMemoryInMB=0, networkMemoryInMB=0} for job
>> e908cb4700d5127a0b67be035e4494f7 with allocation id
>> AllocationID{71bbb917374ade66df4c058c41b81f4e}.
>> ...
>>
>> Can you please elaborate the part  "As containers are launched and
>> stopped one after another" ? Any pointer to class/method in Flink?
>>
>> *@Zhu Zhu:*
>>
>> Regarding "One optimization that we take is letting yarn to reuse the
>> flink-dist jar which was localized when running previous jobs."
>>
>> We are intending to use Flink Real-time pipeline for Replay from
>> Hive/HDFS (from offline source), to have 1 single pipeline for both batch
>> and real-time. So for batch Flink job, the containers will be released once
>> the job is done.
>> I guess your job is real-time flink, so  you can share the  jars from
>> already long-running jobs.
>>
>> Thanks.
>>
>>
>> On Fri, Aug 30, 2019 at 12:46 AM Jeff Zhang  wrote:
>>
>>> I can think of 2 approaches:
>>>
>>> 1. Allow fli

Re: [PROPOSAL] Force rebase on master before merge

2019-09-01 Thread Zili Chen
Hi all,

Thanks for your replies.

For Till's question: as Chesnay said, if we cannot attach Travis checks
via the CIBot workflow, the mechanism provided by GitHub doesn't work at all,
since it states "This setting will not take effect unless at least one
status check is enabled".

Technically we could involve this up-to-date check in the CIBot workflow.
However, given that the project is currently under quite active development,
and that an extra build pass which rarely reveals hidden conflicts would take
too long to run, I agree that enforcing such a rule is not the right fit for us.

Best,
tison.
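
For completeness, the workflow the proposal would have enforced amounts to roughly the following (standard git commands, shown only to make the discussion concrete):

    # bring the PR branch up to date with the latest master
    git fetch origin
    git rebase origin/master
    # push the rebased branch, wait for a green CI run on exactly these commits,
    # and only then merge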


Chesnay Schepler  于2019年8月30日周五 下午4:38写道:

> I think this is a non-issue; every committer I know checks beforehand if
> the build passes.
>
> Piotr has provided good arguments for why this approach isn't practical.
> Additionally, there are simply technical limitations that prevent this
> from working as expected.
>
> a) we cannot attach Travis checks via CiBot due to lack of permissions
> b) It is not possible AFAIK to force a PR to be up-to-date with current
> master when Travis runs. In other words, I can open a PR, travis passes,
> and so long as no new merge conflicts arise I could _still_ merge it 2
> months later.
>
> On 30/08/2019 10:34, Piotr Nowojski wrote:
> > Hi,
> >
> > Thanks for the proposal. I have similar concerns as Kurt.
> >
> > If we enforced such rule I would be afraid that everybody would be
> waiting for tests on his PR to complete, racing other committers to be
> “the first guy that clicks the merge button”, then forcing all of the
> others to rebase manually and race again. For example it wouldn’t be
> possible to push a final version of the PR, wait for the tests to complete
> overnight and merge it next day. Unless we would allow for merging without
> green travis after a final rebase, but that for me would be almost exactly
> what we have now.
> >
> > Is this a big issue in the first place? I don’t feel it that way, but
> maybe I’m working in not very contested parts of the code?
> >
> > If it’s an issue, I would suggest to go for the merging bot, that would
> have a queue of PRs to be:
> > 1. Automatically rebased on the latest master
> > 2. If no conflicts in 1., run the tests
> > 3. If no test failures merge
> >
> > Piotrek
> >
> >> On 30 Aug 2019, at 09:38, Till Rohrmann  wrote:
> >>
> >> Hi Tison,
> >>
> >> thanks for starting this discussion. In general, I'm in favour of
> >> automations which remove human mistakes out of the equation.
> >>
> >> Do you know how these status checks work concretely? Will GitHub reject
> >> commits for which there is no passing Travis run? How would hotfix
> >> commits be distinguished from PR commits for which a Travis run should
> >> exist? So I guess my question is how would enabling the status checks
> >> change how committers interact with the GitHub repository?
> >>
> >> Cheers,
> >> Till
> >>
> >> On Fri, Aug 30, 2019 at 4:46 AM Zili Chen  wrote:
> >>
> >>> Hi Kurt,
> >>>
> >>> Thanks for your reply!
> >>>
> >>> I see two concerns about the downside in your email. Correct
> >>> me if I am misunderstanding.
> >>>
> >>> 1. Rebase times. Typically commits are independent of one another, so a
> >>> rebase just fast-forwards changes and contributors rarely have to
> >>> resolve conflicts themselves. Reviews don't get blocked by this forced
> >>> rebase if there has ever been a green Travis report -- we just require
> >>> the contributor to rebase and test again, which generally doesn't
> >>> involve code changes (unless conflicts need to be resolved).
> >>> Contributors rebase their pull requests when they have spare time or
> >>> when required by a reviewer/before getting merged. This should not
> >>> inflict too much work.
> >>>
> >>> 2. Testing time. That is a separate topic discussed in this thread [1].
> >>> I don't think we will live with such a long testing time forever, so
> >>> triggering multiple tests won't be a problem then.
> >>>
> >>> To sum up: for trivial cases the extra work is trivial and it prevents
> >>> accidental failures; for complicated cases a rebase and full tests are
> >>> already required.
> >>>
> >>> Best,
> >>> tison.
> >>>
> >>> [1]
> >>>
> >>>
> https://lists.apache.org/x/thread.html/b90aa518fcabce94f8e1de4132f46120fae613db6e95a2705f1bd1ea@%3Cdev.flink.apache.org%3E
> >>>
> >>>
> >>> Kurt Young  于2019年8月30日周五 上午9:15写道:
> >>>
>  Hi Zili,
> 
>  Thanks for the proposal; I had similar confusion in the past about your
>  point #2.
>  Forcing a rebase to master before merging can solve some problems, but
>  it also introduces new problems. Given that the CI testing time is quite
>  long now (a couple of hours), it's highly possible that before the test
>  triggered by your rebase finishes, the master will have received some
>  more new commits. This situation will get worse if more people are doing
>  this. One possible solution is to let the committer decide what to do
>  before he/she merges it. If it's a trivial i

Re: [VOTE] FLIP-58: Flink Python User-Defined Function for Table API

2019-09-01 Thread Shaoxuan Wang
+1 (binding)

This will be a great feature for Flink users, especially for the data
science and AI engineers.

Regards,
Shaoxuan


On Fri, Aug 30, 2019 at 1:35 PM Jeff Zhang  wrote:

> +1, very looking forward this feature in flink 1.10
>
>
> Yu Li  于2019年8月30日周五 上午11:08写道:
>
> > +1 (non-binding)
> >
> > Thanks for driving this!
> >
> > Best Regards,
> > Yu
> >
> >
> > On Fri, 30 Aug 2019 at 11:01, Terry Wang  wrote:
> >
> > > +1. That would be very helpful.
> > > Best,
> > > Terry Wang
> > >
> > >
> > >
> > > > 在 2019年8月30日,上午10:18,Jark Wu  写道:
> > > >
> > > > +1
> > > >
> > > > Thanks for the great work!
> > > >
> > > > On Fri, 30 Aug 2019 at 10:04, Xingbo Huang 
> wrote:
> > > >
> > > >> Hi Dian,
> > > >>
> > > >> +1,
> > > >> Thanks a lot for driving this.
> > > >>
> > > >> Best,
> > > >> Xingbo
> > > >>> 在 2019年8月30日,上午9:39,Wei Zhong  写道:
> > > >>>
> > > >>> Hi Dian,
> > > >>>
> > > >>> +1 non-binding
> > > >>> Thanks for driving this!
> > > >>>
> > > >>> Best, Wei
> > > >>>
> > >  在 2019年8月29日,09:25,Hequn Cheng  写道:
> > > 
> > >  Hi Dian,
> > > 
> > >  +1
> > >  Thanks a lot for driving this.
> > > 
> > >  Best, Hequn
> > > 
> > >  On Wed, Aug 28, 2019 at 2:01 PM jincheng sun <
> > > sunjincheng...@gmail.com>
> > >  wrote:
> > > 
> > > > Hi Dian,
> > > >
> > > > +1, Thanks for your great job!
> > > >
> > > > Best,
> > > > Jincheng
> > > >
> > > > Dian Fu  于2019年8月28日周三 上午11:04写道:
> > > >
> > > >> Hi all,
> > > >>
> > > >> I'd like to start a voting thread for FLIP-58 [1] since that we
> > have
> > > >> reached an agreement on the design in the discussion thread [2],
> > > >>
> > > >> This vote will be open for at least 72 hours. Unless there is an
> > > >> objection, I will try to close it by Sept 2, 2019 00:00 UTC if
> we
> > > have
> > > >> received sufficient votes.
> > > >>
> > > >> PS: This doesn't mean that we cannot further improve the design.
> > We
> > > >> can
> > > >> still discuss the implementation details case by case in the
> JIRA
> > as
> > > >> long
> > > >> as it doesn't affect the overall design.
> > > >>
> > > >> [1]
> > > >>
> > > >
> > > >>
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-58%3A+Flink+Python+User-Defined+Function+for+Table+API
> > > >> <
> > > >>
> > > >
> > > >>
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-58:+Flink+Python+User-Defined+Function+for+Table+API
> > > >>>
> > > >> [2]
> > > >>
> > > >
> > > >>
> > >
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-Python-User-Defined-Function-for-Table-API-td31673.html
> > > >> <
> > > >>
> > > >
> > > >>
> > >
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-Python-User-Defined-Function-for-Table-API-td31673.html
> > > >>>
> > > >>
> > > >> Thanks,
> > > >> Dian
> > > >
> > > >>>
> > > >>
> > > >>
> > >
> > >
> >
>
>
> --
> Best Regards
>
> Jeff Zhang
>


Re: [DISCUSS] Flink Python User-Defined Function for Table API

2019-09-01 Thread Shaoxuan Wang
Hi Jincheng, Fudian, and Aljoscha,
I am assuming the proposed Python UDX can also be applied to Flink SQL.
Is this correct? If yes, I would suggest titling the FLIP "Flink Python
User-Defined Function" or "Flink Python User-Defined Function for Table".

Regards,
Shaoxuan


On Wed, Aug 28, 2019 at 12:22 PM jincheng sun 
wrote:

> Thanks for the feedback Bowen!
>
> Great thanks for creating the FLIP and bringing up the VOTE, Dian!
>
> Best, Jincheng
>
> Dian Fu  于2019年8月28日周三 上午11:32写道:
>
> > Hi all,
> >
> > I have started a voting thread [1]. Thanks a lot for your help during
> > creating the FLIP @Jincheng.
> >
> >
> > Hi Bowen,
> >
> > Your comments are very much appreciated. I have replied to you in the design doc.
> > As it seems that the comments don't affect the overall design, I'll not
> > cancel the vote for now and we can continue the discussion in the design
> > doc.
> >
> > [1]
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-FLIP-58-Flink-Python-User-Defined-Function-for-Table-API-td32295.html
> > <
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-FLIP-58-Flink-Python-User-Defined-Function-for-Table-API-td32295.html
> > >
> >
> > Regards,
> > Dian
> >
> > > 在 2019年8月28日,上午11:05,Bowen Li  写道:
> > >
> > > Hi Jincheng and Dian,
> > >
> > > Sorry for being late to the party. I took a glance at the proposal,
> LGTM
> > in
> > > general, and I left only a couple comments.
> > >
> > > Thanks,
> > > Bowen
> > >
> > >
> > > On Mon, Aug 26, 2019 at 8:05 PM Dian Fu  wrote:
> > >
> > >> Hi Jincheng,
> > >>
> > >> Thanks! It works.
> > >>
> > >> Thanks,
> > >> Dian
> > >>
> > >>> 在 2019年8月27日,上午10:55,jincheng sun  写道:
> > >>>
> > >>> Hi Dian, can you check if you have edit access? :)
> > >>>
> > >>>
> > >>> Dian Fu  于2019年8月26日周一 上午10:52写道:
> > >>>
> >  Hi Jincheng,
> > 
> >  Appreciated for the kind tips and offering of help. Definitely need
> > it!
> >  Could you grant me write permission for confluence? My Id: Dian Fu
> > 
> >  Thanks,
> >  Dian
> > 
> > > 在 2019年8月26日,上午9:53,jincheng sun  写道:
> > >
> > > Thanks for your feedback Hequn & Dian.
> > >
> > > Dian, I am glad to see that you want help to create the FLIP!
> > > Everyone will have first time, and I am very willing to help you
> > >> complete
> > > your first FLIP creation. Here some tips:
> > >
> > > - First I'll give your account write permission for confluence.
> > > - Before create the FLIP, please have look at the FLIP Template
> [1],
> >  (It's
> > > better to know more about FLIP by reading [2])
> > > - Create Flink Python UDFs related JIRAs after completing the VOTE
> of
> > > FLIP.(I think you also can bring up the VOTE thread, if you want! )
> > >
> > > Any problems you encounter during this period,feel free to tell me
> > that
> >  we
> > > can solve them together. :)
> > >
> > > Best,
> > > Jincheng
> > >
> > >
> > >
> > >
> > > [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP+Template
> > > [2]
> > >
> > 
> > >>
> >
> https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals
> > >
> > >
> > > Hequn Cheng  于2019年8月23日周五 上午11:54写道:
> > >
> > >> +1 for starting the vote.
> > >>
> > >> Thanks Jincheng a lot for the discussion.
> > >>
> > >> Best, Hequn
> > >>
> > >> On Fri, Aug 23, 2019 at 10:06 AM Dian Fu 
> > >> wrote:
> > >>
> > >>> Hi Jincheng,
> > >>>
> > >>> +1 to start the FLIP create and VOTE on this feature. I'm willing
> > to
> >  help
> > >>> on the FLIP create if you don't mind. As I haven't created a FLIP
> >  before,
> > >>> it will be great if you could help on this. :)
> > >>>
> > >>> Regards,
> > >>> Dian
> > >>>
> >  在 2019年8月22日,下午11:41,jincheng sun 
> 写道:
> > 
> >  Hi all,
> > 
> >  Thanks a lot for your feedback. If there are no more suggestions
> > and
> >  comments, I think it's better to  initiate a vote to create a
> FLIP
> > >> for
> >  Apache Flink Python UDFs.
> >  What do you think?
> > 
> >  Best, Jincheng
> > 
> >  jincheng sun  于2019年8月15日周四
> 上午12:54写道:
> > 
> > > Hi Thomas,
> > >
> > > Thanks for your confirmation and the very important reminder
> > about
> > >>> bundle
> > > processing.
> > >
> > > I have had add the description about how to perform bundle
> > >> processing
> > >>> from
> > > the perspective of checkpoint and watermark. Feel free to leave
> > >>> comments if
> > > there are anything not describe clearly.
> > >
> > > Best,
> > > Jincheng
> > >
> > >
> > > Dian Fu  于2019年8月14日周三 上午10:08写道:
> > >
> > >> Hi Thomas,
> > >>
> > >

Re: [DISCUSS] Releasing Flink 1.8.2

2019-09-01 Thread Thomas Weise
+1 for the 1.8.2 release

I marked https://issues.apache.org/jira/browse/FLINK-13586 for this
release. It would be good to compensate for the backward-incompatible
change to ClosureCleaner that was introduced in 1.8.1, which affects
downstream projects.

Thanks,
Thomas


On Sun, Sep 1, 2019 at 5:10 PM jincheng sun 
wrote:

> Hi Jark,
>
> Glad to hear that you want to be the Release Manager of Flink 1.8.2.
> I believe that you will be a great RM, and I am very willing to help you
> with the final stages of the release. :)
>
> The release of Apache Flink involves a number of tasks. For details, you
> can consult the documentation [1]. If you have any questions, please let me
> know and let us work together.
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/Creating+a+Flink+Release#CreatingaFlinkRelease-Checklisttoproceedtothenextstep.1
>
> Cheers,
> Jincheng
>
> Till Rohrmann  于2019年8月31日周六 上午12:59写道:
>
> > +1 for a 1.8.2 bug fix release. Thanks for kicking this discussion off
> > Jincheng.
> >
> > Cheers,
> > Till
> >
> > On Fri, Aug 30, 2019 at 6:45 PM Jark Wu  wrote:
> >
> > > Thanks Jincheng for bringing this up.
> > >
> > > +1 to the 1.8.2 release, because it already contains a couple of
> > important
> > > fixes and it has been a long time since 1.8.1 came out.
> > > I'm willing to help the community as much as possible. I'm wondering
> if I
> > > can be the release manager of 1.8.2 or work with you together
> @Jincheng?
> > >
> > > Best,
> > > Jark
> > >
> > > On Fri, 30 Aug 2019 at 18:58, Hequn Cheng 
> wrote:
> > >
> > > > Hi Jincheng,
> > > >
> > > > +1 for a 1.8.2 release.
> > > > Thanks a lot for raising the discussion. It would be nice to have
> these
> > > > critical fixes.
> > > >
> > > > Best, Hequn
> > > >
> > > >
> > > > On Fri, Aug 30, 2019 at 6:31 PM Maximilian Michels 
> > > wrote:
> > > >
> > > > > Hi Jincheng,
> > > > >
> > > > > +1 I would be for a 1.8.2 release such that we can fix the problems
> > > with
> > > > > the nested closure cleaner which currently block 1.8.1 users with
> > Beam:
> > > > > https://issues.apache.org/jira/browse/FLINK-13367
> > > > >
> > > > > Thanks,
> > > > > Max
> > > > >
> > > > > On 30.08.19 11:25, jincheng sun wrote:
> > > > > > Hi Flink devs,
> > > > > >
> > > > > > It has been nearly 2 months since the 1.8.1 released. So, what do
> > you
> > > > > think
> > > > > > about releasing Flink 1.8.2 soon?
> > > > > >
> > > > > > We already have some blocker and critical fixes in the
> release-1.8
> > > > > branch:
> > > > > >
> > > > > > [Blocker]
> > > > > > - FLINK-13159 java.lang.ClassNotFoundException when restore job
> > > > > > - FLINK-10368 'Kerberized YARN on Docker test' unstable
> > > > > > - FLINK-12578 Use secure URLs for Maven repositories
> > > > > >
> > > > > > [Critical]
> > > > > > - FLINK-12736 ResourceManager may release TM with allocated slots
> > > > > > - FLINK-12889 Job keeps in FAILING state
> > > > > > - FLINK-13484 ConnectedComponents end-to-end test instable with
> > > > > > NoResourceAvailableException
> > > > > > - FLINK-13508 CommonTestUtils#waitUntilCondition() may attempt to
> > > sleep
> > > > > > with negative time
> > > > > > - FLINK-13806 Metric Fetcher floods the JM log with errors when
> TM
> > is
> > > > > lost
> > > > > >
> > > > > > Furthermore, I think the following one blocker issue should be
> > merged
> > > > > > before 1.8.2 release.
> > > > > >
> > > > > > - FLINK-13897: OSS FS NOTICE file is placed in wrong directory
> > > > > >
> > > > > > It would also be great if we can have the fix of Elasticsearch6.x
> > > > > connector
> > > > > > threads leaking (FLINK-13689) in 1.8.2 release which is
> identified
> > as
> > > > > major.
> > > > > >
> > > > > > Please let me know what you think?
> > > > > >
> > > > > > Cheers,
> > > > > > Jincheng
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: [DISCUSS] Simplify Flink's cluster level RestartStrategy configuration

2019-09-01 Thread Becket Qin
+1. The new behavior makes sense to me.

BTW, we need a FLIP for this :)
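
(Illustration only: under the proposal quoted below, an explicitly configured
strategy still takes precedence. The option keys are the existing ones from
the thread; the concrete values here are made up for the example.)

    # flink-conf.yaml
    restart-strategy: fixed-delay
    restart-strategy.fixed-delay.attempts: 3
    restart-strategy.fixed-delay.delay: 10 s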

On Fri, Aug 30, 2019 at 10:17 PM Till Rohrmann  wrote:

> After an offline discussion with Stephan, we concluded that changing the
> default restart strategy for batch jobs is not that easy because the
> cluster level restart configuration does not necessarily know about the
> type of job which is submitted. We concluded that we would like to keep the
> batch behaviour as is (NoRestartStrategy) and revisit this issue at a later
> point in time.
>
> On Fri, Aug 30, 2019 at 3:24 PM Till Rohrmann 
> wrote:
>
> > The current default behaviour for batch is `NoRestartStrategy` if nothing
> > is configured. We could say that we set the default value of
> > `restart-strategy` to `FixedDelayRestartStrategy(Integer.MAX_VALUE, "0
> s")`
> > independent of the checkpointing. The only downside I could see is that
> > some faulty batch jobs might get stuck in a restart loop without
> reaching a
> > terminal state.
> >
> > @Dawid, I don't intend to touch the ExecutionConfig. This change only
> > targets the cluster level configuration of the RestartStrategy.
> >
> > Cheers,
> > Till
> >
> > On Fri, Aug 30, 2019 at 3:14 PM Dawid Wysakowicz  >
> > wrote:
> >
> >> Also +1 in general.
> >>
> >> I have a few questions though:
> >>
> >> - does it only apply to the logic in
> >>
> >>
> org.apache.flink.runtime.executiongraph.restart.RestartStrategyFactory#createRestartStrategyFactory,
> >> which is only the cluster side configuration? Or do you want to change
> >> the logic also on the job side in ExecutionConfig?
> >>
> >> - if the latter, does that mean deprecated methods in ExecutionConfig
> >> like: setNumberOfExecutionRetries, setExecutionRetryDelay will have no
> >> effect? I think this would be a good idea, but would suggest to remove
> >> the corresponding fields and methods. This is not that simple though. I
> >> tried to do that for other parameters that have no effect already like
> >> codeAnalysisMode & failTaskOnCheckpointError. There are two problems:
> >>
> >> 1) setNumberOfExecutionRetries is effectively marked with the @Public
> >> annotation (codeAnalysisMode & failTaskOnCheckpointError don't have
> >> this problem). Therefore this would be a binary-incompatible change.
> >>
> >> 2) ExecutionConfig is stored in state as part of PojoSerializer in
> >> pre-1.7 Flink. It should not be a problem for numberOfExecutionRetries &
> >> executionRetryDelays as they are of primitive types. It is a problem for
> >> codeAnalysisMode (we cannot remove the class, as this breaks
> >> serialization). I wanted to mention that anyway, just to be aware of
> that.
> >>
> >> Best,
> >>
> >> Dawid
> >>
> >> On 30/08/2019 14:48, Stephan Ewen wrote:
> >> > +1 in general
> >> >
> >> > What is the default in batch, though? No restarts? I always found that
> >> > somewhat uncommon.
> >> > Should we also change that part, if we are changing the default
> anyways?
> >> >
> >> >
> >> > On Fri, Aug 30, 2019 at 2:35 PM Till Rohrmann 
> >> wrote:
> >> >
> >> >> Hi everyone,
> >> >>
> >> >> I wanted to discuss how to simplify Flink's cluster level
> >> RestartStrategy
> >> >> configuration [1]. Currently, Flink's behaviour with respect to
> >> configuring
> >> >> the {{RestartStrategies}} is quite complicated and convoluted. The
> >> reason
> >> >> for this is that we evolved the way it has been configured and wanted
> >> to
> >> >> keep it backwards compatible. Due to this, we have currently the
> >> following
> >> >> behaviour:
> >> >>
> >> >> * If the config option `restart-strategy` is configured, then Flink
> >> uses
> >> >> this `RestartStrategy` (so far so simple)
> >> >> * If the config option `restart-strategy` is not configured, then
> >> >> ** If `restart-strategy.fixed-delay.attempts` or
> >> >> `restart-strategy.fixed-delay.delay` are defined, then instantiate
> >> >> `FixedDelayRestartStrategy(restart-strategy.fixed-delay.attempts,
> >> >> restart-strategy.fixed-delay.delay)`
> >> >> ** If `restart-strategy.fixed-delay.attempts` and
> >> >> `restart-strategy.fixed-delay.delay` are not defined, then
> >> >> *** If checkpointing is disabled, then choose `NoRestartStrategy`
> >> >> *** If checkpointing is enabled, then choose
> >> >> `FixedDelayRestartStrategy(Integer.MAX_VALUE, "0 s")`
> >> >>
> >> >> I would like to simplify the configuration by removing the "If
> >> >> `restart-strategy.fixed-delay.attempts` or
> >> >> `restart-strategy.fixed-delay.delay`, then" condition. That way, the
> >> logic
> >> >> would be the following:
> >> >>
> >> >> * If the config option `restart-strategy` is configured, then Flink
> >> uses
> >> >> this `RestartStrategy`
> >> >> * If the config option `restart-strategy` is not configured, then
> >> >> ** If checkpointing is disabled, then choose `NoRestartStrategy`
> >> >> ** If checkpointing is enabled, then choose
> >> >> `FixedDelayRestartStrategy(Integer.MAX_VALUE, "0 s")`
> >> >>
> >> >> That way we retain the user 

Re: [DISCUSS] Releasing Flink 1.8.2

2019-09-01 Thread jincheng sun
Hi Jark,

Glad to hear that you want to be the Release Manager of Flink 1.8.2.
I believe that you will be a great RM, and I am very willing to help you
with the final stages of the release. :)

The release of Apache Flink involves a number of tasks. For details, you
can consult the documentation [1]. If you have any questions, please let me
know and let us work together.

[1] https://cwiki.apache.org/confluence/display/FLINK/Creating+a+Flink+Release#CreatingaFlinkRelease-Checklisttoproceedtothenextstep.1

Cheers,
Jincheng

Till Rohrmann  于2019年8月31日周六 上午12:59写道:

> +1 for a 1.8.2 bug fix release. Thanks for kicking this discussion off
> Jincheng.
>
> Cheers,
> Till
>
> On Fri, Aug 30, 2019 at 6:45 PM Jark Wu  wrote:
>
> > Thanks Jincheng for bringing this up.
> >
> > +1 to the 1.8.2 release, because it already contains a couple of
> important
> > fixes and it has been a long time since 1.8.1 came out.
> > I'm willing to help the community as much as possible. I'm wondering if I
> > can be the release manager of 1.8.2 or work with you together @Jincheng?
> >
> > Best,
> > Jark
> >
> > On Fri, 30 Aug 2019 at 18:58, Hequn Cheng  wrote:
> >
> > > Hi Jincheng,
> > >
> > > +1 for a 1.8.2 release.
> > > Thanks a lot for raising the discussion. It would be nice to have these
> > > critical fixes.
> > >
> > > Best, Hequn
> > >
> > >
> > > On Fri, Aug 30, 2019 at 6:31 PM Maximilian Michels 
> > wrote:
> > >
> > > > Hi Jincheng,
> > > >
> > > > +1 I would be for a 1.8.2 release such that we can fix the problems
> > with
> > > > the nested closure cleaner which currently block 1.8.1 users with
> Beam:
> > > > https://issues.apache.org/jira/browse/FLINK-13367
> > > >
> > > > Thanks,
> > > > Max
> > > >
> > > > On 30.08.19 11:25, jincheng sun wrote:
> > > > > Hi Flink devs,
> > > > >
> > > > > It has been nearly 2 months since the 1.8.1 released. So, what do
> you
> > > > think
> > > > > about releasing Flink 1.8.2 soon?
> > > > >
> > > > > We already have some blocker and critical fixes in the release-1.8
> > > > branch:
> > > > >
> > > > > [Blocker]
> > > > > - FLINK-13159 java.lang.ClassNotFoundException when restore job
> > > > > - FLINK-10368 'Kerberized YARN on Docker test' unstable
> > > > > - FLINK-12578 Use secure URLs for Maven repositories
> > > > >
> > > > > [Critical]
> > > > > - FLINK-12736 ResourceManager may release TM with allocated slots
> > > > > - FLINK-12889 Job keeps in FAILING state
> > > > > - FLINK-13484 ConnectedComponents end-to-end test instable with
> > > > > NoResourceAvailableException
> > > > > - FLINK-13508 CommonTestUtils#waitUntilCondition() may attempt to
> > sleep
> > > > > with negative time
> > > > > - FLINK-13806 Metric Fetcher floods the JM log with errors when TM
> is
> > > > lost
> > > > >
> > > > > Furthermore, I think the following one blocker issue should be
> merged
> > > > > before 1.8.2 release.
> > > > >
> > > > > - FLINK-13897: OSS FS NOTICE file is placed in wrong directory
> > > > >
> > > > > It would also be great if we can have the fix of Elasticsearch6.x
> > > > connector
> > > > > threads leaking (FLINK-13689) in 1.8.2 release which is identified
> as
> > > > major.
> > > > >
> > > > > Please let me know what you think?
> > > > >
> > > > > Cheers,
> > > > > Jincheng
> > > > >
> > > >
> > >
> >
>


Re: [ANNOUNCE] Kinesis connector becomes part of Flink releases

2019-09-01 Thread Yu Li
Great to know, thanks for the efforts Bowen!

And I believe it's worth a release note in the original JIRA, wdyt? Thanks.

Best Regards,
Yu
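
(For reference, once the artifact is on Maven Central, pulling it into a user
project should look roughly like the snippet below. The Scala suffix and the
version are assumptions based on the 1.10.0 target mentioned in Bowen's
announcement quoted underneath.)

    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-connector-kinesis_2.11</artifactId>
      <version>1.10.0</version>
    </dependency>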


On Sat, 31 Aug 2019 at 11:01, Bowen Li  wrote:

> Hi all,
>
> I'm glad to announce that, as #9494 was merged today,
> flink-connector-kinesis is officially under the Apache 2.0 license in the
> master branch and its artifact will be deployed to Maven Central as part
> of Flink releases starting from Flink 1.10.0. Users can then use the
> artifact off the shelf and no longer have to build and maintain it on
> their own.
>
> It brings a much better user experience to our large AWS customer base by
> making their work simpler, smoother, and more productive!
>
> Thanks everyone who participated in coding and review to drive this
> initiative forward.
>
> Cheers,
> Bowen
>


Re: [SURVEY] Is the default restart delay of 0s causing problems?

2019-09-01 Thread Yu Li
-1 on increasing the default delay to non-zero, for the below reasons:

a) I could see some concerns about setting the delay to zero in the very
original JIRA (FLINK-2993), but later on in FLINK-9158 we still decided to
make the change, so I'm wondering whether that decision also came from a
customer requirement. If so, how could we judge whether one requirement
overrides the other?

b) There could be valid reasons for both default values depending on
different use cases, as well as corresponding workarounds (e.g. with the
latest policy, manually setting the config to 10 s could resolve the
problem mentioned), and from earlier replies to this thread we can see
users have already taken action. Changing it back to non-zero again won't
affect such users but might cause surprises to those depending on 0 as the
default.
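
(As an illustration of the job-level override mentioned above, a minimal
sketch using the DataStream API; the class name is made up and the 10 s value
mirrors what users in this thread reported configuring.)

    import org.apache.flink.api.common.restartstrategy.RestartStrategies;
    import org.apache.flink.api.common.time.Time;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class RestartDelayExample {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env =
                    StreamExecutionEnvironment.getExecutionEnvironment();
            // Keep retrying indefinitely, but wait 10 seconds between attempts
            // so a systematic fault does not turn into a restart storm.
            env.setRestartStrategy(
                    RestartStrategies.fixedDelayRestart(Integer.MAX_VALUE, Time.seconds(10)));
            // ... build the job here and call env.execute(...) ...
        }
    }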

Last but not least, no matter what decision we make this time, I'd suggest
making it final and documenting it explicitly in our release notes. Checking
the 1.5.0 release notes [1] [2], it seems we didn't mention the change of the
default restart delay, and we'd better learn from that this time. Thanks.

[1]
https://flink.apache.org/news/2018/05/25/release-1.5.0.html#release-notes
[2]
https://ci.apache.org/projects/flink/flink-docs-release-1.5/release-notes/flink-1.5.html

Best Regards,
Yu


On Sun, 1 Sep 2019 at 04:33, Steven Wu  wrote:

> +1 on what Zhu Zhu said.
>
> We also override the default to 10 s.
>
> On Fri, Aug 30, 2019 at 8:58 PM Zhu Zhu  wrote:
>
>> In our production, we usually override the restart delay to be 10 s.
>> We once encountered cases that external services are overwhelmed by
>> reconnections from frequent restarted tasks.
>> As a safer though not optimized option, a default delay larger than 0 s
>> is better in my opinion.
>>
>>
>> 未来阳光 <2217232...@qq.com> 于2019年8月30日周五 下午10:23写道:
>>
>>> Hi,
>>>
>>>
>>> I think it's better to increase the default value. +1
>>>
>>>
>>> Best.
>>>
>>>
>>>
>>>
>>> -- Original message --
>>> From: "Till Rohrmann";
>>> Sent: Friday, Aug 30, 2019, 10:07 PM
>>> To: "dev"; "user";
>>> Subject: [SURVEY] Is the default restart delay of 0s causing problems?
>>>
>>>
>>>
>>> Hi everyone,
>>>
>>> I wanted to reach out to you and ask whether decreasing the default delay
>>> to `0 s` for the fixed delay restart strategy [1] is causing trouble. A
>>> user reported that he would like to increase the default value because it
>>> can cause restart storms in case of systematic faults [2].
>>>
>>> The downside of increasing the default delay would be a slightly
>>> increased
>>> restart time if this config option is not explicitly set.
>>>
>>> [1] https://issues.apache.org/jira/browse/FLINK-9158
>>> [2] https://issues.apache.org/jira/browse/FLINK-11218
>>>
>>> Cheers,
>>> Till
>>
>>


Re: State of FLIPs

2019-09-01 Thread Yu Li
Thanks for the reminder, Chesnay. I've just moved FLIP-50 into the accepted
list since it has already passed the vote and is under development.

Best Regards,
Yu


On Fri, 30 Aug 2019 at 22:29, Dian Fu  wrote:

> Hi Chesnay,
>
> Thanks a lot for the remind. FLIP-38 has been released in 1.9 and I have
> updated the status in the wiki page.
>
> Regards,
> Dian
>
> On Fri, Aug 30, 2019 at 9:38 PM Becket Qin  wrote:
>
>> Hi Chesnay,
>>
>> You are right. FLIP-36 actually has not passed the vote yet. In fact some
>> of the key designs may have to change due to the later code changes. I'll
>> update the wiki and start a new vote.
>>
>> Thanks,
>>
>> Jiangjie (Becket) Qin
>>
>> On Fri, Aug 30, 2019 at 8:44 PM Chesnay Schepler 
>> wrote:
>>
>> > The following FLIPs are marked as "Under discussion" in the wiki
>> > <
>> https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals
>> >,
>> > but actually seem to be in progress (i.e. have open pull requests) and
>> some
>> > even  have code merged to master:
>> >
>> >- FLIP-36 (Interactive Programming)
>> >- FLIP-38 (Python Table API)
>> >- FLIP-44 (Support Local Aggregation)
>> >- FLIP-50 (Spill-able Heap Keyed State Backend)
>> >
>> > I would like to find out what the _actual_ state is, and then discuss
>> how
>> > we handle these FLIPs from now on (e.g., retcon history and mark them as
>> > accepted, freeze further development until a vote, ...).
>> >
>> > I've cc'd all people who create the wiki pages for said FLIPs.
>> >
>> >
>> >
>>
>


Re: Potential block size issue with S3 binary files

2019-09-01 Thread Stephan Ewen
Sounds reasonable.

I am adding Arvid to the thread - IIRC he authored that tool in his
Stratosphere days. And by a stroke of luck, he is now working on Flink
again.

@Arvid - what are your thoughts on Ken's suggestions?
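
(To make Ken's workaround from the quoted thread concrete, a minimal sketch;
MyRecord and the S3 path are placeholders, and the 64 MB value is the block
size Ken reported using on the write side.)

    import org.apache.flink.api.common.io.SerializedInputFormat;
    import org.apache.flink.core.fs.Path;

    // MyRecord stands in for the job's record type (it must implement
    // IOReadableWritable, which SerializedInputFormat requires).
    SerializedInputFormat<MyRecord> inputFormat = new SerializedInputFormat<>();
    inputFormat.setFilePath(new Path("s3://my-bucket/records/"));
    // Pin the block size to the value used when the files were written,
    // instead of trusting the file system's fs.getDefaultBlockSize() at read time.
    inputFormat.setBlockSize(64 * 1024 * 1024);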

On Fri, Aug 30, 2019 at 8:56 PM Ken Krugler 
wrote:

> Hi Stephan (switching to dev list),
>
> On Aug 29, 2019, at 2:52 AM, Stephan Ewen  wrote:
>
> That is a good point.
>
> Which way would you suggest to go? Not relying on the FS block size at
> all, but using a fix (configurable) block size?
>
>
> There’s value to not requiring a fixed block size, as then a file that’s
> moved between different file systems can be read using whatever block size
> is optimal for that system.
>
> Hadoop handles this in sequence files by storing a unique “sync marker”
> value in the file header (essentially a 16 byte UUID), injecting one of
> these every 2K bytes or so (in between records), and then code can scan for
> this to find record boundaries without relying on a block size. The idea is
> that 2^128 is a Big Number, so the odds of finding a false-positive sync
> marker in data is low enough to be ignorable.
>
> But that’s a bigger change. Simpler would be to put a header in each part
> file being written, with some signature bytes to aid in detecting
> old-format files.
>
> Or maybe deprecate SerializedOutputFormat/SerializedInputFormat, and
> provide some wrapper glue to make it easier to write/read Hadoop
> SequenceFiles that have a null key value, and store the POJO as the data
> value. Then you could also leverage Hadoop support for compression at
> either record or block level.
>
> — Ken
>
>
> On Thu, Aug 29, 2019 at 4:49 AM Ken Krugler 
> wrote:
>
>> Hi all,
>>
>> Wondering if anyone else has run into this.
>>
>> We write files to S3 using the SerializedOutputFormat.
>> When we read them back, sometimes we get deserialization errors where the
>> data seems to be corrupt.
>>
>> After a lot of logging, the weathervane of blame pointed towards the
>> block size somehow not being the same between the write (where it’s 64MB)
>> and the read (unknown).
>>
>> When I added a call to SerializedInputFormat.setBlockSize(64MB), the
>> problems went away.
>>
>> It looks like both input and output formats use fs.getDefaultBlockSize()
>> to set this value by default, so maybe the root issue is S3 somehow
>> reporting different values.
>>
>> But it does feel a bit odd that we’re relying on this default setting,
>> versus it being recorded in the file during the write phase.
>>
>> And it’s awkward to try to set the block size on the write, as you need
>> to set it in the environment conf, which means it applies to all output
>> files in the job.
>>
>> — Ken
>>
>
> --
> Ken Krugler
> +1 530-210-6378
> http://www.scaleunlimited.com
> Custom big data solutions & training
> Flink, Solr, Hadoop, Cascading & Cassandra
>
>


Re: CiBot Update

2019-09-01 Thread Yun Tang
Thanks to @Chesnay Schepler for this really helpful
command!

I agree with @Dian Fu that we should include
this in the "Bot commands" section. I just wanted to find the exact command but found
nothing in the template and came here for it.

Best
Yun Tang
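
(For reference, the command discussed here is posted as a plain comment on the
pull request, e.g.:)

    @flinkbot run travis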

From: Congxian Qiu 
Sent: Monday, August 26, 2019 20:08
To: dev@flink.apache.org 
Subject: Re: CiBot Update

Thanks Chesnay for the nice work, it's very helpful

Best,
Congxian


Terry Wang  于2019年8月26日周一 下午6:59写道:

> Very helpful! Thanks Chesnay!
> Best,
> Terry Wang
>
>
>
> > 在 2019年8月23日,下午11:47,Ethan Li  写道:
> >
> > Thank you very much Chesnay! This is helpful
> >
> >> On Aug 23, 2019, at 2:58 AM, Chesnay Schepler 
> wrote:
> >>
> >> @Ethan Li The source for the CiBot is available here <
> https://github.com/flink-ci/ci-bot/>. The implementation of this command
> is tightly connected to how the CiBot works; but conceptually it looks at a
> PR, finds the most recent build that ran, and uses the Travis REST API to
> restart the build.
> >> Additionally, it keeps track of which comments have been processed by
> storing the comment ID in the CI report.
> >> If you have further questions, feel free to ping me directly.
> >>
> >> @Dianfu I agree, we should include it somewhere in either the flinkbot
> template or the CI report.
> >>
> >> On 23/08/2019 03:35, Dian Fu wrote:
> >>> Thanks Chesnay for your great work! A very useful feature!
> >>>
> >>> Just one minor suggestion: It will be better if we could add this
> command to the section "Bot commands" in the flinkbot template.
> >>>
> >>> Regards,
> >>> Dian
> >>>
>  在 2019年8月23日,上午2:06,Ethan Li  写道:
> 
>  My question is specifically about implementation of "@flinkbot run
> travis"
> 
> > On Aug 22, 2019, at 1:06 PM, Ethan Li 
> wrote:
> >
> > Hi Chesnay,
> >
> > This is really nice feature!
> >
> > Can I ask how is this implemented? Do you have the related
> Jira/PR/docs that I can take a look? I’d like to introduce it to another
> project if applicable. Thank you very much!
> >
> > Best,
> > Ethan
> >
> >> On Aug 22, 2019, at 8:34 AM, Biao Liu  mmyy1...@gmail.com>> wrote:
> >>
> >> Thanks Chesnay a lot,
> >>
> >> I love this feature!
> >>
> >> Thanks,
> >> Biao /'bɪ.aʊ/
> >>
> >>
> >>
> >> On Thu, 22 Aug 2019 at 20:55, Hequn Cheng  > wrote:
> >>
> >>> Cool, thanks Chesnay a lot for the improvement!
> >>>
> >>> Best, Hequn
> >>>
> >>> On Thu, Aug 22, 2019 at 5:02 PM Zhu Zhu  > wrote:
> >>>
>  Thanks Chesnay for the CI improvement!
>  It is very helpful.
> 
>  Thanks,
>  Zhu Zhu
> 
>  zhijiang  wangzhijiang...@aliyun.com.invalid>> 于2019年8月22日周四 下午4:18写道:
> 
> > It is really very convenient now. Valuable work, Chesnay!
> >
> > Best,
> > Zhijiang
> >
> --
> > From:Till Rohrmann  trohrm...@apache.org>>
> > Send Time:2019年8月22日(星期四) 10:13
> > To:dev mailto:dev@flink.apache.org>>
> > Subject:Re: CiBot Update
> >
> > Thanks for the continuous work on the CiBot Chesnay!
> >
> > Cheers,
> > Till
> >
> > On Thu, Aug 22, 2019 at 9:47 AM Jark Wu  > wrote:
> >
> >> Great work! Thanks Chesnay!
> >>
> >>
> >>
> >> On Thu, 22 Aug 2019 at 15:42, Xintong Song <
> tonysong...@gmail.com >
> > wrote:
> >>> The re-triggering travis feature is so convenient. Thanks
> Chesnay~!
> >>>
> >>> Thank you~
> >>>
> >>> Xintong Song
> >>>
> >>>
> >>>
> >>> On Thu, Aug 22, 2019 at 9:26 AM Stephan Ewen  >
>  wrote:
>  Nice, thanks!
> 
>  On Thu, Aug 22, 2019 at 3:59 AM Zili Chen <
> wander4...@gmail.com >
> >> wrote:
> > Thanks for your announcement. Nice work!
> >
> > Best,
> > tison.
> >
> >
> > vino yang  yanghua1...@gmail.com>> 于2019年8月22日周四 上午8:14写道:
> >
> >> +1 for "@flinkbot run travis", it is very convenient.
> >>
> >> Chesnay Schepler  ches...@apache.org>> 于2019年8月21日周三
> >>> 下午9:12写道:
> >>> Hi everyone,
> >>>
> >>> this is an update on recent changes to the CI bot.
> >>>
> >>>
> >>> The bot now cancels builds if a new commit was added to a
> >>> PR,
> > and
> >>> cancels all builds 

[jira] [Created] (FLINK-13936) NOTICE-binary is outdated

2019-09-01 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-13936:


 Summary: NOTICE-binary is outdated
 Key: FLINK-13936
 URL: https://issues.apache.org/jira/browse/FLINK-13936
 Project: Flink
  Issue Type: Bug
  Components: Build System
Affects Versions: 1.10.0
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.10.0


The NOTICE-binary wasn't updated for the click-event example, the state 
processing API and changes to the table API packaging.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (FLINK-13935) YarnPrioritySchedulingITCase fails on hadoop 2.4.1

2019-09-01 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-13935:


 Summary: YarnPrioritySchedulingITCase fails on hadoop 2.4.1
 Key: FLINK-13935
 URL: https://issues.apache.org/jira/browse/FLINK-13935
 Project: Flink
  Issue Type: Bug
  Components: Deployment / YARN, Tests
Affects Versions: 1.10.0
Reporter: Chesnay Schepler
 Fix For: 1.10.0


The {{YarnPrioritySchedulingITCase}} does an early exit in BeforeClass if run
against a Hadoop version lower than 2.8. The AfterClass method in
YarnTestBase, however, cannot handle this case and fails with an NPE.
{code}
22:33:21.941 [ERROR] 
org.apache.flink.yarn.YarnPrioritySchedulingITCase.org.apache.flink.yarn.YarnPrioritySchedulingITCase
22:33:21.942 [ERROR]   Run 1: YarnPrioritySchedulingITCase.setup:40 » 
AssumptionViolated Priority scheduling...
22:33:21.943 [ERROR]   Run 2: 
YarnPrioritySchedulingITCase>YarnTestBase.teardown:956 » NullPointer
{code}

https://travis-ci.org/apache/flink/jobs/579264625
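
A possible direction for a fix (sketch only; the field and method names below
are hypothetical and not taken from {{YarnTestBase}}): make the teardown
tolerant of a setup that bailed out via an assumption.

{code}
@AfterClass
public static void teardown() throws Exception {
    // setup() may have exited early with an AssumptionViolatedException, in
    // which case the cluster was never started and must not be touched here.
    if (yarnCluster != null) {
        yarnCluster.stop();
        yarnCluster = null;
    }
}
{code}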




--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (FLINK-13934) HistoryServerStaticFileServerHandlerTest failed on Travis

2019-09-01 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-13934:


 Summary: HistoryServerStaticFileServerHandlerTest failed on Travis
 Key: FLINK-13934
 URL: https://issues.apache.org/jira/browse/FLINK-13934
 Project: Flink
  Issue Type: Bug
  Components: Runtime / REST
Affects Versions: 1.10.0
Reporter: Chesnay Schepler
 Fix For: 1.10.0


{code}
23:34:44.903 [ERROR] Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time 
elapsed: 0.28 s <<< FAILURE! - in 
org.apache.flink.runtime.webmonitor.history.HistoryServerStaticFileServerHandlerTest
23:34:44.917 [ERROR] 
testRespondWithFile(org.apache.flink.runtime.webmonitor.history.HistoryServerStaticFileServerHandlerTest)
  Time elapsed: 0.279 s  <<< FAILURE!
java.lang.AssertionError
at 
org.apache.flink.runtime.webmonitor.history.HistoryServerStaticFileServerHandlerTest.testRespondWithFile(HistoryServerStaticFileServerHandlerTest.java:66)
{code}
https://travis-ci.org/apache/flink/jobs/579264633



--
This message was sent by Atlassian Jira
(v8.3.2#803003)