Re: [DISCUSS] Flink client api enhancement for downstream project

2019-08-20 Thread Zili Chen
Thanks for the clarification.

The idea of a JobDeployer came to my mind when I was puzzling over
how to execute per-job mode and session mode with the same user code
and framework code path.

With the JobDeployer concept we return to the statement that the
environment knows every config for cluster deployment and job submission.
We configure, or generate from configuration, a specific JobDeployer in
the environment, and then the code aligns on

JobClient client = env.execute().get();

which in session mode is returned by clusterClient.submitJob and in per-job
mode by clusterDescriptor.deployJobCluster.
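
To illustrate, here is a minimal sketch of that dispatch. Every interface
and signature below is a hypothetical stand-in mirroring the names used in
this thread, not existing Flink API:

// Hypothetical sketch of the JobDeployer idea discussed above.
import java.util.concurrent.CompletableFuture;

interface JobGraph {}
interface JobClient {}

interface ClusterClient {
    CompletableFuture<JobClient> submitJob(JobGraph jobGraph);
}

interface ClusterDescriptor {
    CompletableFuture<JobClient> deployJobCluster(JobGraph jobGraph);
}

interface JobDeployer {
    CompletableFuture<JobClient> deploy(JobGraph jobGraph);
}

// Session mode: submit the job to an already running cluster.
class SessionJobDeployer implements JobDeployer {
    private final ClusterClient clusterClient;

    SessionJobDeployer(ClusterClient clusterClient) {
        this.clusterClient = clusterClient;
    }

    @Override
    public CompletableFuture<JobClient> deploy(JobGraph jobGraph) {
        return clusterClient.submitJob(jobGraph);
    }
}

// Per-job mode: deploy a dedicated cluster bound to this single job.
class PerJobDeployer implements JobDeployer {
    private final ClusterDescriptor clusterDescriptor;

    PerJobDeployer(ClusterDescriptor clusterDescriptor) {
        this.clusterDescriptor = clusterDescriptor;
    }

    @Override
    public CompletableFuture<JobClient> deploy(JobGraph jobGraph) {
        return clusterDescriptor.deployJobCluster(jobGraph);
    }
}

With such a seam, env.execute() only has to pick the configured deployer;
the user code path stays identical in both modes.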

Here comes a problem: currently we directly run ClusterEntrypoint
with the extracted job graph. Following the JobDeployer way, we had better
align the entry point of per-job deployment at the JobDeployer. Users run
their main method directly, or via a CLI (which finally calls the main
method), to deploy the job cluster.

Best,
tison.


On Tue, Aug 20, 2019 at 4:40 PM, Stephan Ewen wrote:

> Till has made some good comments here.
>
> Two things to add:
>
>   - The job mode is very nice in the way that it runs the client inside the
> cluster (in the same image/process that is the JM) and thus unifies both
> applications and what the Spark world calls the "driver mode".
>
>   - Another thing I would add is that during the FLIP-6 design, we were
> thinking about setups where Dispatcher and JobManager are separate
> processes.
> A Yarn or Mesos Dispatcher of a session could run independently (even
> as privileged processes executing no code).
> Then the "per-job" mode could still be helpful: when a job is
> submitted to the dispatcher, it launches the JM again in a per-job mode, so
> that JM and TM processes are bound to the job only. For higher security
> setups, it is important that processes are not reused across jobs.
>
> On Tue, Aug 20, 2019 at 10:27 AM Till Rohrmann 
> wrote:
>
> > I would not be in favour of getting rid of the per-job mode since it
> > simplifies the process of running Flink jobs considerably. Moreover, it
> is
> > not only well suited for container deployments but also for deployments
> > where you want to guarantee job isolation. For example, a user could use
> > the per-job mode on Yarn to execute his job on a separate cluster.
> >
> > I think that having two notions of cluster deployments (session vs.
> per-job
> > mode) does not necessarily contradict your ideas for the client api
> > refactoring. For example one could have the following interfaces:
> >
> > - ClusterDeploymentDescriptor: encapsulates the logic of how to deploy a
> > cluster.
> > - ClusterClient: allows interacting with a cluster
> > - JobClient: allows interacting with a running job
> >
> > Now the ClusterDeploymentDescriptor could have two methods:
> >
> > - ClusterClient deploySessionCluster()
> > - JobClusterClient/JobClient deployPerJobCluster(JobGraph)
> >
> > where JobClusterClient is either a supertype of ClusterClient which does
> > not give you the functionality to submit jobs, or deployPerJobCluster
> > directly returns a JobClient.
> >
> > When setting up the ExecutionEnvironment, one would then not provide a
> > ClusterClient to submit jobs but a JobDeployer which, depending on the
> > selected mode, either uses a ClusterClient (session mode) to submit jobs
> or
> > a ClusterDeploymentDescriptor to deploy a per-job mode cluster with the
> > job to execute.
> >
> > These are just some thoughts on how one could make it work, because I
> > believe there is some value in using the per-job mode from the
> > ExecutionEnvironment.
> >
> > Concerning the web submission, this is indeed a bit tricky. From a
> > cluster management standpoint, I would be in favour of not executing
> > user code on the
> > REST endpoint. Especially when considering security, it would be good to
> > have a well defined cluster behaviour where it is explicitly stated where
> > user code and, thus, potentially risky code is executed. Ideally we limit
> > it to the TaskExecutor and JobMaster.
> >
> > Cheers,
> > Till
> >
> > On Tue, Aug 20, 2019 at 9:40 AM Flavio Pompermaier  >
> > wrote:
> >
> > > In my opinion the client should not use any environment to get the
> > > JobGraph, because the jar should reside ONLY on the cluster (and not on
> > > the client classpath; otherwise there are always inconsistencies
> > > between the client's and the Flink JobManager's classpaths).
> > > In the YARN, Mesos and Kubernetes scenarios you have the jar but you
> > could
> > > start a cluster that has the jar on the Job Manager as well (but this
> is
>

Re: [DISCUSS][CODE STYLE] Breaking long function argument lists and chained method calls

2019-08-20 Thread Zili Chen
Implementation question: how do we apply the line length rules?

If we just turn on the checkstyle rule "LineLength", then a huge
effort is required to break the lines that violate the rule. If
we use an auto-formatter here, it may break lines "just at the
limit" in an ugly way.

Is it possible to require conformance to the rule only for files
touched by a pull request, on the fly?

Best,
tison.


On Tue, Aug 20, 2019 at 5:22 PM, Yu Li wrote:

> I second Stephan's summary, and to be more explicit, +1 on:
> - Set a hard line length limit
> - Allow arguments on the same line if below length limit
> - With consistent argument breaking when that length is exceeded
> - Developers can break before that if they feel it helps with readability
>
> FWIW, the hbase project also sets the line length limit to 100 [1]
> (personally I don't have any preference on this, so JFYI).
>
> [1]
>
> https://github.com/apache/hbase/blob/a59f7d4ffc27ea23b9822c3c26d6aeb76ccdf9aa/hbase-checkstyle/src/main/resources/hbase/checkstyle.xml#L128
>
> Best Regards,
> Yu
>
>
> On Mon, 19 Aug 2019 at 18:22, Stephan Ewen  wrote:
>
> > I personally prefer not to break lines with few parameters.
> > It just feels unnecessarily clumsy to parse the breaks if there are only
> > two or three arguments with short names.
> >
> > So +1
> >   - for a hard line length limit
> >   - allowing arguments on the same line if below that limit
> >   - with consistent argument breaking when that length is exceeded
> >   - developers can break before that if they feel it helps with
> > readability.
> >
> > This should be similar to what we have, except for enforcing the line
> > length limit.
> >
> > I think our Java guide originally suggested a 120-character line length,
> > but we can reduce that to 100 if a majority argues for that; it is a
> > separate discussion.
> > We use shorter lines in Scala (100 chars) because Scala code becomes
> > harder to read faster with long lines.
> >
> >
> > On Mon, Aug 19, 2019 at 10:45 AM Andrey Zagrebin 
> > wrote:
> >
> > > Hi Everybody,
> > >
> > > Thanks for your feedback guys and sorry for not getting back to the
> > > discussion for some time.
> > >
> > > @SHI Xiaogang
> > > About breaking lines for thrown exceptions:
> > > Indeed that would prevent growing the throw clause indefinitely.
> > > I am a bit concerned about putting the right parenthesis and/or throw
> > > clause on the next line, because in general we do not do it and there
> > > are a lot of variations of how and what to put on the next line, so it
> > > needs explicit memorising.
> > > Also, we do not have many checked exceptions and usually avoid them.
> > > Although I am not a big fan of many function arguments either, this
> > > seems to be a bigger problem in the code base.
> > > I would be ok to not enforce anything for exceptions atm.
> > >
> > > @Chesnay Schepler 
> > > Thanks for mentioning automatic checks.
> > > Indeed, pointing out this kind of style issue during PR reviews is very
> > > tedious, and we cannot really enforce it without automated tools.
> > > I would still consider the outcome of this discussion a soft
> > > recommendation atm (as we also have for some other things in the code
> > > style draft).
> > > We need more investigation into how to enforce things. I am not so
> > > knowledgeable about code style/IDE checks.
> > > At first glance I also do not see a simple way. If somebody has more
> > > insight, please share your experience.
> > >
> > > @Biao Liu 
> > > Line length limitation:
> > > I do not see anything for Java, only for Scala: 100 (also enforced by
> > > the build, AFAIK).
> > > From what I heard, there has already been some discussion about a hard
> > > limit for the line length.
> > > Although quite a few people are in favour of it (including me) and it
> > > seems to be a nice limitation, there are some practical implications.
> > > Historically, Flink did not have any code style checks, and huge chunks
> > > of the code base would have to be reformatted, destroying the commit
> > > history.
> > > Another thing is the value for the limit. Nowadays, we have wide
> > > screens and do not often even need to scroll.
> > > Nevertheless, we can kick off another discussion about the line length
> > > limit and enforcing it.
> > > Atm I see people adhere to a soft recommendation of a 120-character
> > > line length for Java because it is usually a bit more verbose compared
> > > to Scala.
> > >
> > > *Question 1*:
> > > I would be ok to always break the line if there is more than one
> > > chained call.
> > > There are a lot of places where there is only one short call; I would
> > > not break the line in that case.
> > > If it is too confusing, I would be ok to stick to the rule of breaking
> > > either all or none.
> > > Thanks for pointing this out explicitly: for chained method calls, the
> > > new line should start with the dot.
> > > I think it should also be part of the rule if enforced.
> > >
> > > *Question 2:*
> > > The indent of the new line should be 1 tab

[SURVEY] How do you use high-availability services in Flink?

2019-08-21 Thread Zili Chen
Hi guys,

We want to have an accurate idea of how users actually use
high-availability services in Flink, especially how you customize
high-availability services via HighAvailabilityServicesFactory.

Basically there are the standalone impl., the ZooKeeper impl., the embedded
impl. used in MiniCluster, a YARN impl. that is not yet implemented, and a
gate for customized implementations.

Generally I think the standalone impl. and the ZooKeeper impl. are the most
widely used implementations. The embedded impl. is used without users'
awareness when they run a MiniCluster.

Besides that, it is helpful to know how you customize high-availability
services using the HighAvailabilityServicesFactory interface, for the
ongoing FLINK-10333 [1] which would evolve high-availability services in
Flink, as well as whether any user takes interest in the
not-yet-implemented YARN impl.

Any use case would be helpful. I really appreciate your time and your
insight.

Best,
tison.

[1] https://issues.apache.org/jira/browse/FLINK-10333
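
For reference, customizing usually boils down to implementing the factory
interface along these lines. This is a hedged sketch: MyHaServices is a
placeholder for your own implementation, and the exact factory method
signature should be verified against your Flink version:

// Sketch of a customized high-availability services factory.
import java.util.concurrent.Executor;

import org.apache.flink.configuration.Configuration;
import org.apache.flink.runtime.highavailability.HighAvailabilityServices;
import org.apache.flink.runtime.highavailability.HighAvailabilityServicesFactory;

public class MyHaServicesFactory implements HighAvailabilityServicesFactory {

    @Override
    public HighAvailabilityServices createHAServices(
            Configuration configuration,
            Executor executor) throws Exception {
        // MyHaServices is a placeholder for your own
        // HighAvailabilityServices implementation.
        return new MyHaServices(configuration, executor);
    }
}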


Re: [SURVEY] How do you use high-availability services in Flink?

2019-08-21 Thread Zili Chen
In addition, FLINK-13750 [1] will also likely introduce a breaking change
to high-availability services. So you who might be affected by the change
are highly encouraged to share your cases :-)

Best,
tison.

[1] https://issues.apache.org/jira/browse/FLINK-13750


On Wed, Aug 21, 2019 at 3:32 PM, Zili Chen wrote:

> Hi guys,
>
> We want to have an accurate idea of how users actually use
> high-availability services in Flink, especially how you customize
> high-availability services via HighAvailabilityServicesFactory.
>
> Basically there are the standalone impl., the ZooKeeper impl., the
> embedded impl. used in MiniCluster, a YARN impl. that is not yet
> implemented, and a gate for customized implementations.
>
> Generally I think the standalone impl. and the ZooKeeper impl. are the
> most widely used implementations. The embedded impl. is used without
> users' awareness when they run a MiniCluster.
>
> Besides that, it is helpful to know how you customize high-availability
> services using the HighAvailabilityServicesFactory interface, for the
> ongoing FLINK-10333 [1] which would evolve high-availability services in
> Flink, as well as whether any user takes interest in the
> not-yet-implemented YARN impl.
>
> Any use case would be helpful. I really appreciate your time and your
> insight.
>
> Best,
> tison.
>
> [1] https://issues.apache.org/jira/browse/FLINK-10333
>


Re: [DISCUSS][CODE STYLE] Usage of Java Optional

2019-08-21 Thread Zili Chen
> - Optional for method return values if not performance critical
> > >>>>>>> - Optional can be used for internal methods if it makes sense
> > >>>>>>> - No optional fields
> > >>>>>>>
> > >>>>>>> Cheers,
> > >>>>>>> Till
> > >>>>>>>
> > >>>>>>> On Mon, Aug 5, 2019 at 12:07 PM Aljoscha Krettek <
> > >>> aljos...@apache.org>
> > >>>>>>> wrote:
> > >>>>>>>
> > >>>>>>>> Hi,
> > >>>>>>>>
> > >>>>>>>> I’m also in favour of using Optional only for method return
> > >> values.
> > >>> I
> > >>>>>>>> could be persuaded to allow them for parameters of internal
> > >> methods
> > >>>> but
> > >>>>>> I’m
> > >>>>>>>> sceptical about that.
> > >>>>>>>>
> > >>>>>>>> Aljoscha
> > >>>>>>>>
> > >>>>>>>>> On 2. Aug 2019, at 15:35, Yu Li  wrote:
> > >>>>>>>>>
> > >>>>>>>>> TL; DR: I second Timo that we should use Optional only as
> method
> > >>>> return
> > >>>>>>>>> type for non-performance critical code.
> > >>>>>>>>>
> > >>>>>>>>>  From the example given on our AvroFactory [1] I also noticed
> > that
> > >>>>>>>> Jetbrains
> > >>>>>>>>> marks the OptionalUsedAsFieldOrParameterType inspection as a
> > >>> warning.
> > >>>>>>>> It's
> > >>>>>>>>> relatively easy to understand why it's not suggested to use
> > >>>> (java.util)
> > >>>>>>>>> Optional as a field since it's not serializable. What made me
> > >> feel
> > >>>>>>>> curious
> > >>>>>>>>> is that why we shouldn't use it as a parameter type, so I did
> > >> some
> > >>>>>>>>> investigation and here is what I found:
> > >>>>>>>>>
> > >>>>>>>>> There's a JB blog talking about java8 top tips [2] where we
> could
> > >>>> find
> > >>>>>>>> the
> > >>>>>>>>> advice around Optional, there I found another blog telling
> about
> > >>> the
> > >>>>>>>>> pragmatic approach of using Optional [3]. Reading further we
> > >> could
> > >>>> see
> > >>>>>>>> the
> > >>>>>>>>> reason why we shouldn't use Optional as parameter type, please
> > >>> allow
> > >>>> me
> > >>>>>>>> to
> > >>>>>>>>> quote here:
> > >>>>>>>>>
> > >>>>>>>>> It is often the case that domain objects hang about in memory
> > >> for a
> > >>>>>> fair
> > >>>>>>>>> while, as processing in the application occurs, making each
> > >>> optional
> > >>>>>>>>> instance rather long-lived (tied to the lifetime of the domain
> > >>>> object).
> > >>>>>>>> By
> > >>>>>>>>> contrast, the Optionalinstance returned from the getter is
> likely
> > >>> to
> > >>>> be
> > >>>>>>>>> very short-lived. The caller will call the getter, interpret
> the
> > >>>>>> result,
> > >>>>>>>>> and then move on. If you know anything about garbage collection
> > >>>> you'll
> > >>>>>>>> know
> > >>>>>>>>> that the JVM handles these short-lived objects well. In
> addition,
> > >>>> there
> > >>>>>>>> is
> > >>>>>>>>> more potential for hotspot to remove the costs of the Optional
> > >>>> instance
> > >>>>>>>>
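
To make the guideline quoted at the top of this thread concrete, here is a
small illustrative Java sketch (a hypothetical class, not Flink code):
Optional only as a return type, a plain nullable field internally.

import java.util.Optional;

// Hypothetical example of the guideline: no Optional fields, Optional
// only as a (non performance-critical) method return type.
public class UserRegistry {

    // Plain nullable field; java.util.Optional is not serializable and
    // would live as long as this object, so we do not store it.
    private String defaultUserName;  // may be null

    public void setDefaultUserName(String defaultUserName) {
        this.defaultUserName = defaultUserName;
    }

    // A short-lived Optional created on demand for the caller; the JVM
    // handles such short-lived objects well, as the quoted blog notes.
    public Optional<String> getDefaultUserName() {
        return Optional.ofNullable(defaultUserName);
    }
}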

Re: [DISCUSS][CODE STYLE] Breaking long function argument lists and chained method calls

2019-08-21 Thread Zili Chen
Thanks Andrey for driving the discussion. Just for clarification:
what we conclude here are several guidelines without an automatic
checker/tool to guard them, right?

Best,
tison.


On Wed, Aug 21, 2019 at 8:18 PM, Andrey Zagrebin wrote:

> Hi All,
>
> I suggest we also conclude this discussion now.
>
> Breaking the line of too-long statements (the exact line length limit is
> yet to be fully defined) to improve code readability in case of:
>
>- Long function argument lists (declaration or call): void func(type1
>arg1, type2 arg2, ...)
>- Long sequence of chained calls:
>list.stream().map(...).reduce(...).collect(...)...
>
> Rules:
>
>- Break the list of arguments/calls if the line exceeds limit or earlier
>if you believe that the breaking would improve the code readability
>- If you break the line then each argument/call should have a separate
>line, including the first one
>- Each new line argument/call should have one extra indentation relative
>to the line of the parent function name or called entity
>- The opening brace always stays on the line of the parent function name
>- The closing brace of the function argument list and the possible
>thrown exception list always stay on the line of the last argument
>- The dot of a chained call is always on the line of that chained call,
>preceding the call at the beginning of the line
>
> Examples of breaking:
>
>- Function arguments
>
> public void func(
>     int arg1,
>     int arg2,
>     ...) throws E1, E2, E3 {
>     ...
> }
>
>
>- Chained method calls:
>
> values
>     .stream()
>     .map(...)
>     .collect(...);
>
>
> I suggest we spawn separate discussion threads (can do as a follow-up)
> about:
>
>- the hard line length limit in Java, possibly to confirm it also for
>Scala (cc @Tison)
>- indentation rules for the broken list of declared function arguments
>
> If there are no more comments/objections/concerns, I will open a PR to
> capture the discussion outcome.
>
> Best,
> Andrey
>
>
>
> On Wed, Aug 21, 2019 at 8:57 AM Zili Chen  wrote:
>
> > Implementation question: how do we apply the line length rules?
> >
> > If we just turn on the checkstyle rule "LineLength", then a huge
> > effort is required to break the lines that violate the rule. If
> > we use an auto-formatter here, it may break lines "just at the
> > limit" in an ugly way.
> >
> > Is it possible to require conformance to the rule only for files
> > touched by a pull request, on the fly?
> >
> > Best,
> > tison.
> >
> >
> > On Tue, Aug 20, 2019 at 5:22 PM, Yu Li wrote:
> >
> > > I second Stephan's summary, and to be more explicit, +1 on:
> > > - Set a hard line length limit
> > > - Allow arguments on the same line if below length limit
> > > - With consistent argument breaking when that length is exceeded
> > > - Developers can break before that if they feel it helps with
> readability
> > >
> > > FWIW, the hbase project also sets the line length limit to 100 [1]
> > > (personally I don't have any preference on this, so JFYI).
> > >
> > > [1]
> > >
> > >
> >
> https://github.com/apache/hbase/blob/a59f7d4ffc27ea23b9822c3c26d6aeb76ccdf9aa/hbase-checkstyle/src/main/resources/hbase/checkstyle.xml#L128
> > >
> > > Best Regards,
> > > Yu
> > >
> > >
> > > On Mon, 19 Aug 2019 at 18:22, Stephan Ewen  wrote:
> > >
> > > > I personally prefer not to break lines with few parameters.
> > > > It just feels unnecessarily clumsy to parse the breaks if there are
> > only
> > > > two or three arguments with short names.
> > > >
> > > > So +1
> > > >   - for a hard line length limit
> > > >   - allowing arguments on the same line if below that limit
> > > >   - with consistent argument breaking when that length is exceeded
> > > >   - developers can break before that if they feel it helps with
> > > > readability.
> > > >
> > > > This should be similar to what we have, except for enforcing the line
> > > > length limit.
> > > >
> > > > I think our Java guide originally suggested a 120-character line
> > > > length, but we can reduce that to 100 if a majority argues for that;
> > > > it is a separate discussion.
> > > > We use shorter lines in Scala (100 chars) because Scala code becomes
> > > > harder to read faster 

Re: CiBot Update

2019-08-21 Thread Zili Chen
Thanks for your announcement. Nice work!

Best,
tison.


On Thu, Aug 22, 2019 at 8:14 AM, vino yang wrote:

> +1 for "@flinkbot run travis", it is very convenient.
>
> On Wed, Aug 21, 2019 at 9:12 PM, Chesnay Schepler wrote:
>
> > Hi everyone,
> >
> > this is an update on recent changes to the CI bot.
> >
> >
> > The bot now cancels builds if a new commit was added to a PR, and
> > cancels all builds if the PR was closed.
> > (This was implemented a while ago; I'm just mentioning it again for
> > discoverability)
> >
> >
> > Additionally, starting today you can now re-trigger a Travis run by
> > writing a comment "@flinkbot run travis"; this means you no longer have
> > to commit an empty commit or do other shenanigans to get another build
> > running.
> > Note that this will /not/ work if the PR was re-opened, until at least 1
> > new build was triggered by a push.
> >
>


Re: [DISCUSS][CODE STYLE] Breaking long function argument lists and chained method calls

2019-08-22 Thread Zili Chen
One more question: what is the difference between

public void func(
    int arg1,
    int arg2,
    ...) throws E1, E2, E3 {
    ...
}

and

public void func(
    int arg1,
    int arg2,
    ...
) throws E1, E2, E3 {
    ...
}

I prefer the latter because the parentheses are aligned in a similar way,
and the border between the declaration and the function body is clear.


On Thu, Aug 22, 2019 at 9:53 AM, Zili Chen wrote:

> Thanks Andrey for driving the discussion. Just for clarification:
> what we conclude here are several guidelines without an automatic
> checker/tool to guard them, right?
>
> Best,
> tison.
>
>
> On Wed, Aug 21, 2019 at 8:18 PM, Andrey Zagrebin wrote:
>
>> Hi All,
>>
>> I suggest we also conclude this discussion now.
>>
>> Breaking the line of too-long statements (the exact line length limit is
>> yet to be fully defined) to improve code readability in case of:
>>
>>- Long function argument lists (declaration or call): void func(type1
>>arg1, type2 arg2, ...)
>>- Long sequence of chained calls:
>>list.stream().map(...).reduce(...).collect(...)...
>>
>> Rules:
>>
>>- Break the list of arguments/calls if the line exceeds limit or
>> earlier
>>if you believe that the breaking would improve the code readability
>>- If you break the line then each argument/call should have a separate
>>line, including the first one
>>- Each new line argument/call should have one extra indentation
>> relative
>>to the line of the parent function name or called entity
>>- The opening brace always stays on the line of the parent function
>> name
>>- The closing brace of the function argument list and the possible
>>thrown exception list always stay on the line of the last argument
>>- The dot of a chained call is always on the line of that chained call,
>>preceding the call at the beginning of the line
>>
>> Examples of breaking:
>>
>>- Function arguments
>>
>> public void func(
>>     int arg1,
>>     int arg2,
>>     ...) throws E1, E2, E3 {
>>     ...
>> }
>>
>>
>>- Chained method calls:
>>
>> values
>>     .stream()
>>     .map(...)
>>     .collect(...);
>>
>>
>> I suggest we spawn separate discussion threads (can do as a follow-up)
>> about:
>>
>>- the hard line length limit in Java, possibly to confirm it also for
>>Scala (cc @Tison)
>>- indentation rules for the broken list of declared function arguments
>>
>> If there are no more comments/objections/concerns, I will open a PR to
>> capture the discussion outcome.
>>
>> Best,
>> Andrey
>>
>>
>>
>> On Wed, Aug 21, 2019 at 8:57 AM Zili Chen  wrote:
>>
>> > Implementation question: how do we apply the line length rules?
>> >
>> > If we just turn on the checkstyle rule "LineLength", then a huge
>> > effort is required to break the lines that violate the rule. If
>> > we use an auto-formatter here, it may break lines "just at the
>> > limit" in an ugly way.
>> >
>> > Is it possible to require conformance to the rule only for files
>> > touched by a pull request, on the fly?
>> >
>> > Best,
>> > tison.
>> >
>> >
>> > On Tue, Aug 20, 2019 at 5:22 PM, Yu Li wrote:
>> >
>> > > I second Stephan's summary, and to be more explicit, +1 on:
>> > > - Set a hard line length limit
>> > > - Allow arguments on the same line if below length limit
>> > > - With consistent argument breaking when that length is exceeded
>> > > - Developers can break before that if they feel it helps with
>> readability
>> > >
>> > > FWIW, the hbase project also sets the line length limit to 100 [1]
>> > > (personally I don't have any preference on this, so JFYI).
>> > >
>> > > [1]
>> > >
>> > >
>> >
>> https://github.com/apache/hbase/blob/a59f7d4ffc27ea23b9822c3c26d6aeb76ccdf9aa/hbase-checkstyle/src/main/resources/hbase/checkstyle.xml#L128
>> > >
>> > > Best Regards,
>> > > Yu
>> > >
>> > >
>> > > On Mon, 19 Aug 2019 at 18:22, Stephan Ewen  wrote:
>> > >
>> > > > I personally prefer not to break lines with few parameters.
>> > > > It just feels unnecessarily clumsy to parse the breaks if there are
>> > only
>> > > > two or three 

Re: [ANNOUNCE] Apache Flink 1.9.0 released

2019-08-22 Thread Zili Chen
Congratulations!

Thanks Gordon and Kurt for being the release manager.

Thanks all the contributors who have made this release possible.

Best,
tison.


On Thu, Aug 22, 2019 at 8:11 PM, Jark Wu wrote:

> Congratulations!
>
> Thanks Gordon and Kurt for being the release manager and thanks a lot to
> all the contributors.
>
>
> Cheers,
> Jark
>
> On Thu, 22 Aug 2019 at 20:06, Oytun Tez  wrote:
>
>> Congratulations team; thanks for the update, Gordon.
>>
>> ---
>> Oytun Tez
>>
>> *M O T A W O R D*
>> The World's Fastest Human Translation Platform.
>> oy...@motaword.com — www.motaword.com
>>
>>
>> On Thu, Aug 22, 2019 at 8:03 AM Tzu-Li (Gordon) Tai 
>> wrote:
>>
>>> The Apache Flink community is very happy to announce the release of
>>> Apache Flink 1.9.0, which is the latest major release.
>>>
>>> Apache Flink® is an open-source stream processing framework for
>>> distributed, high-performing, always-available, and accurate data streaming
>>> applications.
>>>
>>> The release is available for download at:
>>> https://flink.apache.org/downloads.html
>>>
>>> Please check out the release blog post for an overview of the
>>> improvements for this new major release:
>>> https://flink.apache.org/news/2019/08/22/release-1.9.0.html
>>>
>>> The full release notes are available in Jira:
>>>
>>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12344601
>>>
>>> We would like to thank all contributors of the Apache Flink community
>>> who made this release possible!
>>>
>>> Cheers,
>>> Gordon
>>>
>>


Re: [DISCUSS] Flink client api enhancement for downstream project

2019-08-22 Thread Zili Chen
Hi Yang,

It would be helpful if you check Stephan's last comment,
which states that isolation is important.

For per-job mode, we run a dedicated cluster (maybe it
should have been a couple of JMs and TMs in the FLIP-6
design) for a specific job. Thus the process is isolated
from other jobs.

In our case there was a time when we suffered from multiple
jobs submitted by different users that affected
each other, so that all of them ran into an error state. Also,
running the client inside the cluster could save client
resources at some points.

However, we also face several issues, as you mentioned:
in per-job mode it always uses the parent classloader,
thus classloading issues occur.

BTW, one can make an analogy between session/per-job mode
in Flink and client/cluster mode in Spark.

Best,
tison.


On Thu, Aug 22, 2019 at 11:25 AM, Yang Wang wrote:

> From the user's perspective, the scope of the per-job cluster is really
> confusing.
>
> If it means a Flink cluster with a single job, so that we get better
> isolation, then it does not matter how we deploy the cluster: directly
> deploy (mode 1), or start a Flink cluster and then submit the job through
> the cluster client (mode 2).
>
> Otherwise, if it just means direct deployment, how should we name mode 2:
> session with job, or something else?
>
> We could also benefit from mode 2. Users could get the same isolation as
> with mode 1. The user code and dependencies would be loaded by the user
> class loader to avoid class conflicts with the framework.
>
> Anyway, both of the two submission modes are useful.
>
> We just need to clarify the concepts.
>
>
>
>
> Best,
>
> Yang
>
> On Tue, Aug 20, 2019 at 5:58 PM, Zili Chen wrote:
>
> > Thanks for the clarification.
> >
> > The idea of a JobDeployer came to my mind when I was puzzling over
> > how to execute per-job mode and session mode with the same user code
> > and framework code path.
> >
> > With the JobDeployer concept we return to the statement that the
> > environment knows every config for cluster deployment and job
> > submission. We configure, or generate from configuration, a specific
> > JobDeployer in the environment, and then the code aligns on
> >
> > JobClient client = env.execute().get();
> >
> > which in session mode is returned by clusterClient.submitJob and in
> > per-job mode by clusterDescriptor.deployJobCluster.
> >
> > Here comes a problem: currently we directly run ClusterEntrypoint
> > with the extracted job graph. Following the JobDeployer way, we had
> > better align the entry point of per-job deployment at the JobDeployer.
> > Users run their main method directly, or via a CLI (which finally calls
> > the main method), to deploy the job cluster.
> >
> > Best,
> > tison.
> >
> >
> > On Tue, Aug 20, 2019 at 4:40 PM, Stephan Ewen wrote:
> >
> > > Till has made some good comments here.
> > >
> > > Two things to add:
> > >
> > >   - The job mode is very nice in the way that it runs the client inside
> > the
> > > cluster (in the same image/process that is the JM) and thus unifies
> both
> > > applications and what the Spark world calls the "driver mode".
> > >
> > >   - Another thing I would add is that during the FLIP-6 design, we were
> > > thinking about setups where Dispatcher and JobManager are separate
> > > processes.
> > > A Yarn or Mesos Dispatcher of a session could run independently
> (even
> > > as privileged processes executing no code).
> > > Then the "per-job" mode could still be helpful: when a job is
> > > submitted to the dispatcher, it launches the JM again in a per-job
> mode,
> > so
> > > that JM and TM processes are bound to the job only. For higher security
> > > setups, it is important that processes are not reused across jobs.
> > >
> > > On Tue, Aug 20, 2019 at 10:27 AM Till Rohrmann 
> > > wrote:
> > >
> > > > I would not be in favour of getting rid of the per-job mode since it
> > > > simplifies the process of running Flink jobs considerably. Moreover,
> it
> > > is
> > > > not only well suited for container deployments but also for
> deployments
> > > > where you want to guarantee job isolation. For example, a user could
> > use
> > > > the per-job mode on Yarn to execute his job on a separate cluster.
> > > >
> > > > I think that having two notions of cluster deployments (session vs.
> > > per-job
> > > > mode) does not necessarily contradict your ideas for the client api
> > > > refac

Re: [ANNOUNCE] Apache Flink 1.9.0 released

2019-08-23 Thread Zili Chen
Hi Gavin,

I also found a problem in the shell: if the directory contains whitespace,
then the final command to run is incorrect. Could you check the
final command to be executed?

FYI, here is the ticket[1].

Best,
tison.

[1] https://issues.apache.org/jira/browse/FLINK-13827


On Fri, Aug 23, 2019 at 3:36 PM, Gavin Lee wrote:

> Why does bin/start-scala-shell.sh local return the following error?
>
> bin/start-scala-shell.sh local
>
> Error: Could not find or load main class
> org.apache.flink.api.scala.FlinkShell
> For flink 1.8.1 and previous ones, no such issues.
>
> On Fri, Aug 23, 2019 at 2:05 PM qi luo  wrote:
>
>> Congratulations and thanks for the hard work!
>>
>> Qi
>>
>> On Aug 22, 2019, at 8:03 PM, Tzu-Li (Gordon) Tai 
>> wrote:
>>
>> The Apache Flink community is very happy to announce the release of
>> Apache Flink 1.9.0, which is the latest major release.
>>
>> Apache Flink® is an open-source stream processing framework for
>> distributed, high-performing, always-available, and accurate data streaming
>> applications.
>>
>> The release is available for download at:
>> https://flink.apache.org/downloads.html
>>
>> Please check out the release blog post for an overview of the
>> improvements for this new major release:
>> https://flink.apache.org/news/2019/08/22/release-1.9.0.html
>>
>> The full release notes are available in Jira:
>>
>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12344601
>>
>> We would like to thank all contributors of the Apache Flink community who
>> made this release possible!
>>
>> Cheers,
>> Gordon
>>
>>
>>
>
> --
> Gavin
>


Re: [ANNOUNCE] Apache Flink 1.9.0 released

2019-08-23 Thread Zili Chen
I downloaded the 1.9.0 dist from here [1] and didn't see the issue. Where did
you download it? Could you try to download the dist from [1] and see whether
the problem persists?

Best,
tison.

[1]
http://mirrors.tuna.tsinghua.edu.cn/apache/flink/flink-1.9.0/flink-1.9.0-bin-scala_2.11.tgz


On Fri, Aug 23, 2019 at 4:34 PM, Gavin Lee wrote:

> Thanks for your reply @Zili.
> I'm afraid it's not the same issue.
> I found that FlinkShell.class is not included in the flink-dist jar file
> in version 1.9.0.
> This class file cannot be found inside any jar, either in the opt or lib
> directory under the root folder of the Flink distribution.
>
>
> On Fri, Aug 23, 2019 at 4:10 PM Zili Chen  wrote:
>
>> Hi Gavin,
>>
>> I also found a problem in the shell: if the directory contains
>> whitespace, then the final command to run is incorrect. Could you check
>> the final command to be executed?
>>
>> FYI, here is the ticket[1].
>>
>> Best,
>> tison.
>>
>> [1] https://issues.apache.org/jira/browse/FLINK-13827
>>
>>
>> On Fri, Aug 23, 2019 at 3:36 PM, Gavin Lee wrote:
>>
>>> Why does bin/start-scala-shell.sh local return the following error?
>>>
>>> bin/start-scala-shell.sh local
>>>
>>> Error: Could not find or load main class
>>> org.apache.flink.api.scala.FlinkShell
>>> For flink 1.8.1 and previous ones, no such issues.
>>>
>>> On Fri, Aug 23, 2019 at 2:05 PM qi luo  wrote:
>>>
>>>> Congratulations and thanks for the hard work!
>>>>
>>>> Qi
>>>>
>>>> On Aug 22, 2019, at 8:03 PM, Tzu-Li (Gordon) Tai 
>>>> wrote:
>>>>
>>>> The Apache Flink community is very happy to announce the release of
>>>> Apache Flink 1.9.0, which is the latest major release.
>>>>
>>>> Apache Flink® is an open-source stream processing framework for
>>>> distributed, high-performing, always-available, and accurate data streaming
>>>> applications.
>>>>
>>>> The release is available for download at:
>>>> https://flink.apache.org/downloads.html
>>>>
>>>> Please check out the release blog post for an overview of the
>>>> improvements for this new major release:
>>>> https://flink.apache.org/news/2019/08/22/release-1.9.0.html
>>>>
>>>> The full release notes are available in Jira:
>>>>
>>>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12344601
>>>>
>>>> We would like to thank all contributors of the Apache Flink community
>>>> who made this release possible!
>>>>
>>>> Cheers,
>>>> Gordon
>>>>
>>>>
>>>>
>>>
>>> --
>>> Gavin
>>>
>>
>
> --
> Gavin
>


Re: [DISCUSS] Flink client api enhancement for downstream project

2019-08-23 Thread Zili Chen
Hi Till,

Thanks for your update. Nice to hear :-)

Best,
tison.


On Fri, Aug 23, 2019 at 10:39 PM, Till Rohrmann wrote:

> Hi Tison,
>
> just a quick comment concerning the class loading issues when using the
> per-job mode. The community wants to change it so that the
> StandaloneJobClusterEntryPoint actually uses the user code class loader
> with child first class loading [1]. Hence, I hope that this problem will be
> resolved soon.
>
> [1] https://issues.apache.org/jira/browse/FLINK-13840
>
> Cheers,
> Till
>
> On Fri, Aug 23, 2019 at 2:47 PM Kostas Kloudas  wrote:
>
> > Hi all,
> >
> > On the topic of web submission, I agree with Till that it only seems
> > to complicate things.
> > It is bad for security, job isolation (anybody can submit/cancel jobs),
> > and its
> > implementation complicates some parts of the code. So, if it were to
> > redesign the
> > WebUI, maybe this part could be left out. In addition, I would say
> > that the ability to cancel
> > jobs could also be left out.
> >
> > I would also be in favour of removing the "detached" mode, for
> > the reasons mentioned
> > above (i.e. because now we will have a future representing the result
> > on which the user
> > can choose to wait or not).
> >
> > Now, regarding separating job submission and cluster creation, I am in
> > favour of keeping both.
> > Once again, the reasons are mentioned above by Stephan, Till, Aljoscha
> > and also Zili seems
> > to agree. They mainly have to do with security, isolation and ease of
> > resource management
> > for the user as he knows that "when my job is done, everything will be
> > cleared up". This is
> > also the experience you get when launching a process on your local OS.
> >
> > On excluding the per-job mode from returning a JobClient or not, I
> > believe that eventually
> > it would be nice to allow users to get back a jobClient. The reason is
> > that 1) I cannot
> > find any objective reason why the user-experience should diverge, and
> > 2) this will be the
> > way that the user will be able to interact with his running job.
> > Assuming that the necessary
> > ports are open for the REST API to work, then I think that the
> > JobClient can run against the
> > REST API without problems. If the needed ports are not open, then we
> > are safe to not return
> > a JobClient, as the user explicitly chose to close all points of
> > communication to his running job.
> >
> > On the topic of not hijacking the "env.execute()" in order to get the
> > Plan, I definitely agree but
> > for the proposal of having a "compile()" method in the env, I would
> > like to have a better look at
> > the existing code.
> >
> > Cheers,
> > Kostas
> >
> > On Fri, Aug 23, 2019 at 5:52 AM Zili Chen  wrote:
> > >
> > > Hi Yang,
> > >
> > > It would be helpful if you check Stephan's last comment,
> > > which states that isolation is important.
> > >
> > > For per-job mode, we run a dedicated cluster (maybe it
> > > should have been a couple of JMs and TMs in the FLIP-6
> > > design) for a specific job. Thus the process is isolated
> > > from other jobs.
> > >
> > > In our case there was a time when we suffered from multiple
> > > jobs submitted by different users that affected
> > > each other, so that all of them ran into an error state. Also,
> > > running the client inside the cluster could save client
> > > resources at some points.
> > >
> > > However, we also face several issues, as you mentioned:
> > > in per-job mode it always uses the parent classloader,
> > > thus classloading issues occur.
> > >
> > > BTW, one can make an analogy between session/per-job mode
> > > in Flink and client/cluster mode in Spark.
> > >
> > > Best,
> > > tison.
> > >
> > >
> > > On Thu, Aug 22, 2019 at 11:25 AM, Yang Wang wrote:
> > >
> > > > From the user's perspective, the scope of the per-job cluster is
> > > > really confusing.
> > > >
> > > > If it means a Flink cluster with a single job, so that we get better
> > > > isolation, then it does not matter how we deploy the cluster:
> > > > directly deploy (mode 1), or start a Flink cluster and then submit
> > > > the job through the cluster client (mode 2).
> > > >
> > >

Re: [ANNOUNCE] Apache Flink 1.9.0 released

2019-08-23 Thread Zili Chen
Hi Till,

Did we mention this in the release notes (or maybe the previous release
notes where we did the exclusion)?

Best,
tison.


On Fri, Aug 23, 2019 at 10:28 PM, Till Rohrmann wrote:

> Hi Gavin,
>
> if I'm not mistaken, the community has excluded the Scala FlinkShell for
> Scala 2.12 for a couple of versions now. The problem seems to be that
> some of the tests failed. See here [1] for more information.
>
> [1] https://issues.apache.org/jira/browse/FLINK-10911
>
> Cheers,
> Till
>
> On Fri, Aug 23, 2019 at 11:44 AM Gavin Lee  wrote:
>
>> I used the package from the Apache official site, via mirror [1]; the
>> difference is that I used the Scala 2.12 version.
>> I also tried to build from source for both Scala 2.11 and 2.12. When
>> building for 2.12, FlinkShell.class is in the flink-dist jar file, but
>> after running mvn clean package -Dscala-2.12, this class is removed from
>> the flink-dist_2.12-1.9 jar file.
>> Seems broken here for Scala 2.12, right?
>>
>> [1]
>>
>> http://mirror.bit.edu.cn/apache/flink/flink-1.9.0/flink-1.9.0-bin-scala_2.12.tgz
>>
>> On Fri, Aug 23, 2019 at 4:37 PM Zili Chen  wrote:
>>
>> > I downloaded the 1.9.0 dist from here [1] and didn't see the issue.
>> > Where did you download it? Could you try to download the dist from [1]
>> > and see whether the problem persists?
>> >
>> > Best,
>> > tison.
>> >
>> > [1]
>> >
>> http://mirrors.tuna.tsinghua.edu.cn/apache/flink/flink-1.9.0/flink-1.9.0-bin-scala_2.11.tgz
>> >
>> >
>> > On Fri, Aug 23, 2019 at 4:34 PM, Gavin Lee wrote:
>> >
>> >> Thanks for your reply @Zili.
>> >> I'm afraid it's not the same issue.
>> >> I found that FlinkShell.class is not included in the flink-dist jar
>> >> file in version 1.9.0.
>> >> This class file cannot be found inside any jar, either in the opt or
>> >> lib directory under the root folder of the Flink distribution.
>> >>
>> >>
>> >> On Fri, Aug 23, 2019 at 4:10 PM Zili Chen 
>> wrote:
>> >>
>> >>> Hi Gavin,
>> >>>
>> >>> I also found a problem in the shell: if the directory contains
>> >>> whitespace, then the final command to run is incorrect. Could you
>> >>> check the final command to be executed?
>> >>>
>> >>> FYI, here is the ticket[1].
>> >>>
>> >>> Best,
>> >>> tison.
>> >>>
>> >>> [1] https://issues.apache.org/jira/browse/FLINK-13827
>> >>>
>> >>>
>> >>> On Fri, Aug 23, 2019 at 3:36 PM, Gavin Lee wrote:
>> >>>
>> >>>> Why does bin/start-scala-shell.sh local return the following error?
>> >>>>
>> >>>> bin/start-scala-shell.sh local
>> >>>>
>> >>>> Error: Could not find or load main class
>> >>>> org.apache.flink.api.scala.FlinkShell
>> >>>> For flink 1.8.1 and previous ones, no such issues.
>> >>>>
>> >>>> On Fri, Aug 23, 2019 at 2:05 PM qi luo  wrote:
>> >>>>
>> >>>>> Congratulations and thanks for the hard work!
>> >>>>>
>> >>>>> Qi
>> >>>>>
>> >>>>> On Aug 22, 2019, at 8:03 PM, Tzu-Li (Gordon) Tai <
>> tzuli...@apache.org>
>> >>>>> wrote:
>> >>>>>
>> >>>>> The Apache Flink community is very happy to announce the release of
>> >>>>> Apache Flink 1.9.0, which is the latest major release.
>> >>>>>
>> >>>>> Apache Flink® is an open-source stream processing framework for
>> >>>>> distributed, high-performing, always-available, and accurate data
>> streaming
>> >>>>> applications.
>> >>>>>
>> >>>>> The release is available for download at:
>> >>>>> https://flink.apache.org/downloads.html
>> >>>>>
>> >>>>> Please check out the release blog post for an overview of the
>> >>>>> improvements for this new major release:
>> >>>>> https://flink.apache.org/news/2019/08/22/release-1.9.0.html
>> >>>>>
>> >>>>> The full release notes are available in Jira:
>> >>>>>
>> >>>>>
>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12344601
>> >>>>>
>> >>>>> We would like to thank all contributors of the Apache Flink
>> community
>> >>>>> who made this release possible!
>> >>>>>
>> >>>>> Cheers,
>> >>>>> Gordon
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>
>> >>>> --
>> >>>> Gavin
>> >>>>
>> >>>
>> >>
>> >> --
>> >> Gavin
>> >>
>> >
>>
>> --
>> Gavin
>>
>


Re: [DISCUSS] Use Java's Duration instead of Flink's Time

2019-08-23 Thread Zili Chen
Hi Stephan,

I like the idea of unifying the usage of the time/duration API. We actually
use at least five different classes for this purpose (see below).

One thing I'd like to pick up is that duration configuration
in Flink is almost always in a pattern like "60 s", which fits the pattern
parsed by scala.concurrent.duration.Duration. AFAIK Duration
in Java 8 doesn't support this pattern. However, we can solve
that by introducing a DurationUtils.
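
A minimal sketch of such a DurationUtils (purely illustrative; the unit
set, names, and pattern are assumptions, not existing Flink code):

import java.time.Duration;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class DurationUtils {

    // Accepts e.g. "60 s", "500ms", "2 min" -- an assumed format.
    private static final Pattern FORMAT =
        Pattern.compile("(\\d+)\\s*(ms|s|min|h|d)");

    public static Duration parse(String text) {
        Matcher m = FORMAT.matcher(text.trim().toLowerCase(Locale.ROOT));
        if (!m.matches()) {
            throw new IllegalArgumentException("Cannot parse duration: " + text);
        }
        long value = Long.parseLong(m.group(1));
        switch (m.group(2)) {
            case "ms":  return Duration.ofMillis(value);
            case "s":   return Duration.ofSeconds(value);
            case "min": return Duration.ofMinutes(value);
            case "h":   return Duration.ofHours(value);
            default:    return Duration.ofDays(value); // "d"
        }
    }

    private DurationUtils() {}
}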

Also to clarify, we now have (correct me if any other)

java.time.Duration
scala.concurrent.duration.Duration
scala.concurrent.duration.FiniteDuration
org.apache.flink.api.common.time.Time
org.apache.flink.streaming.api.windowing.time.Time

in use. If we'd prefer java.time.Duration, it is worth considering
whether to unify all of them into Java's Duration, i.e., Java's
Duration becomes the first-class time/duration API, while the others
are converted into or out of it.

Best,
tison.


On Fri, Aug 23, 2019 at 10:45 PM, Stephan Ewen wrote:

> Hi all!
>
> Many parts of the code use Flink's "Time" class. The Time really is a "time
> interval" or a "Duration".
>
> Since Java 8, there is a Java class "Duration" that is nice and flexible to
> use.
> I would suggest we start using Java Duration instead and drop Time as much
> as possible in the runtime from now on.
>
> Maybe even drop that class from the API in Flink 2.0.
>
> Best,
> Stephan
>


Re: [DISCUSS] Use Java's Duration instead of Flink's Time

2019-08-23 Thread Zili Chen
Hi vino,

I agree that it introduces extra complexity to replace Duration (Scala)
with Duration (Java) *in Scala code*. We could separate the usage by
language and use a bridge when necessary.
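
Such a bridge could be as small as the following sketch (illustrative
only, not existing Flink code; it assumes the scala-library dependency on
the classpath):

import java.time.Duration;
import java.util.concurrent.TimeUnit;

import scala.concurrent.duration.FiniteDuration;

public final class DurationBridge {

    // Scala FiniteDuration -> Java Duration, via nanoseconds.
    public static Duration toJava(FiniteDuration duration) {
        return Duration.ofNanos(duration.toNanos());
    }

    // Java Duration -> Scala FiniteDuration, via nanoseconds.
    public static FiniteDuration toScala(Duration duration) {
        return new FiniteDuration(duration.toNanos(), TimeUnit.NANOSECONDS);
    }

    private DurationBridge() {}
}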

As a matter of fact, Scala concurrent APIs (including Duration) are used
more than necessary, at least in flink-runtime. Also, we are even trying
to make flink-runtime Scala-free.

Best,
tison.


On Sat, Aug 24, 2019 at 10:05 AM, vino yang wrote:

> +1 to replace the Time class provided by Flink with Java's Duration:
>
>
>- Java's Duration has a better representation than Flink's Time class;
>- As a built-in Java class, Duration has a clear advantage over
>Flink's Time class when interacting with other Java APIs and third-party
>libraries;
>
>
> But I have reservations about replacing the Duration and FiniteDuration
> classes in Scala with the Duration class in Java. Java and Scala have
> different type systems. Currently, Duration (Scala) and FiniteDuration
> (Scala) work well. In addition, this work brings additional complexity and
> cost compared to the gains obtained.
>
> Best,
> Vino
>
> On Fri, Aug 23, 2019 at 11:14 PM, Zili Chen wrote:
>
> > Hi Stephan,
> >
> > I like the idea of unifying the usage of the time/duration API. We
> > actually use at least five different classes for this purpose (see below).
> >
> > One thing I'd like to pick up is that duration configuration
> > in Flink is almost always in a pattern like "60 s", which fits the
> > pattern parsed by scala.concurrent.duration.Duration. AFAIK Duration
> > in Java 8 doesn't support this pattern. However, we can solve
> > that by introducing a DurationUtils.
> >
> > Also to clarify, we now have (correct me if any other)
> >
> > java.time.Duration
> > scala.concurrent.duration.Duration
> > scala.concurrent.duration.FiniteDuration
> > org.apache.flink.api.common.time.Time
> > org.apache.flink.streaming.api.windowing.time.Time
> >
> > in use. If we'd prefer java.time.Duration, it is worth considering
> > whether to unify all of them into Java's Duration, i.e., Java's
> > Duration becomes the first-class time/duration API, while the others
> > are converted into or out of it.
> >
> > Best,
> > tison.
> >
> >
> > On Fri, Aug 23, 2019 at 10:45 PM, Stephan Ewen wrote:
> >
> > > Hi all!
> > >
> > > Many parts of the code use Flink's "Time" class. The Time really is a
> > "time
> > > interval" or a "Duration".
> > >
> > > Since Java 8, there is a Java class "Duration" that is nice and
> flexible
> > to
> > > use.
> > > I would suggest we start using Java Duration instead and drop Time as
> > much
> > > as possible in the runtime from now on.
> > >
> > > Maybe even drop that class from the API in Flink 2.0.
> > >
> > > Best,
> > > Stephan
> > >
> >
>


Re: [DISCUSS] Tolerate temporarily suspended ZooKeeper connections

2019-08-25 Thread Zili Chen
Hi Till,

I'd like to revive this thread since 1.9.0 has been released.

IMHO we already reached a consensus on JIRA and if you can review
the pull request we hopefully address the issue in next release.

Best,
tison.


On Mon, Jul 29, 2019 at 11:05 PM, Zili Chen wrote:

> Hi Till,
>
> Thanks for your explanation. Let's pick up this thread during 1.10
> development.
>
> Best,
> tison.
>
>
> On Mon, Jul 29, 2019 at 9:12 PM, Till Rohrmann wrote:
>
>> Hi Tison,
>>
>> I would consider this a new feature and as such it won't be possible to
>> include it in the 1.9.0 release since the feature freeze has been passed.
>> We might target 1.10, though.
>>
>> Cheers,
>> Till
>>
>> On Mon, Jul 29, 2019 at 3:01 AM Zili Chen  wrote:
>>
>> > Hi committers,
>> >
>> > Now that we have an ongoing pr[1] to this JIRA, we need a committer
>> > to push this thread forward. It would be glad to see this issue fixed
>> > in 1.9.0.
>> >
>> > Best,
>> > tison.
>> >
>> > [1] https://github.com/apache/flink/pull/9158
>> >
>> >
>> > On Tue, Jul 23, 2019 at 9:28 PM, 未来阳光 <2217232...@qq.com> wrote:
>> >
>> > > Ok, if you have any suggestions, we can talk about the details under
>> > > FLINK-10052.
>> > >
>> > >
>> > > Best.
>> > >
>> > >
>> > > -- Original Message --
>> > > From: "Till Rohrmann";
>> > > Sent: Tuesday, Jul 23, 2019 at 9:19 PM
>> > > To: "dev";
>> > > Subject: Re: [DISCUSS] Tolerate temporarily suspended ZooKeeper
>> > > 主题: Re: [DISSCUSS] Tolerate temporarily suspended ZooKeeper
>> connections
>> > >
>> > >
>> > >
>> > > Hi Lamber-Ken,
>> > >
>> > > thanks for starting this discussion. I think there is benefit of not
>> > > directly losing leadership if the ZooKeeper connection goes into the
>> > > SUSPENDED state. In particular if we can guarantee that there is only
>> a
>> > > single JobMaster, it might make sense to not overly eagerly give up
>> > > leadership. I would suggest to continue the technical discussion on
>> the
>> > > JIRA issue thread since it already contains a good amount of details.
>> > >
>> > > Cheers,
>> > > Till
>> > >
>> > > On Sat, Jul 20, 2019 at 12:55 PM QQ邮箱 <2217232...@qq.com> wrote:
>> > >
>> > > > Hi All,
>> > > >
>> > > > Desc
>> > > > We deploy Flink streaming jobs on a Hadoop cluster in per-job mode
>> > > > and use ZooKeeper as the HighAvailabilityService, but we found that
>> > > > Flink jobs restart because the network between the JobManager and
>> > > > ZooKeeper disconnects temporarily. So we analyzed this problem
>> > > > deeply. The Flink JobManager uses Curator's `LeaderLatch` to
>> > > > maintain leadership. When the network disconnects, the `LeaderLatch`
>> > > > will change the leadership to false directly. We think it is too
>> > > > brutal that many long-running Flink jobs restart because of network
>> > > > shake. Instead of directly revoking the leadership upon a SUSPENDED
>> > > > ZooKeeper connection, it would be better to wait until the ZooKeeper
>> > > > connection is LOST.
>> > > >
>> > > > Here are two JIRAs about the problem, FLINK-10052 and FLINK-13189;
>> > > > they are duplicates. Thanks to @Elias Levy for telling us about
>> > > > FLINK-13189, which is why FLINK-13189 was closed.
>> > > >
>> > > > Solution
>> > > > Back to this problem, there are currently two ways to solve it: one
>> > > > is to rewrite the LeaderLatch#handleStateChange method, the other is
>> > > > to upgrade to curator-4.2.0. The first way is hacky but correct; the
>> > > > second way needs to consider compatibility. For more detail, please
>> > > > see FLINK-10052.
>> > > >
>> > > > Hope
>> > > > FLINK-10052 was reported on 2018-08-03 (about a year ago), so we
>> > > > hope this problem can be fixed as soon as possible.
>> > > > BTW, thanks @TisonKun for talking about this problem and reviewing
>> > > > the PR.
>> > > >
>> > > > Links
>> > > > FLINK-10052 https://issues.apache.org/jira/browse/FLINK-10052 <
>> > > > https://issues.apache.org/jira/browse/FLINK-10052>
>> > > > FLINK-13189 https://issues.apache.org/jira/browse/FLINK-13189 <
>> > > > https://issues.apache.org/jira/browse/FLINK-13189>
>> > > >
>> > > > Any suggestion is welcome, what do you think?
>> > > >
>> > > > Best, lamber-ken.
>> >
>>
>
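
For readers following along, the behaviour discussed in FLINK-10052 can be
pictured as a Curator ConnectionStateListener that distinguishes SUSPENDED
from LOST. This is a hedged sketch of the idea only, not Flink's actual
implementation (LeaderLatch itself revokes leadership already on
SUSPENDED); revokeLeadership() is a hypothetical application hook:

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.state.ConnectionState;
import org.apache.curator.framework.state.ConnectionStateListener;

public class LostOnlyListener implements ConnectionStateListener {

    @Override
    public void stateChanged(CuratorFramework client, ConnectionState newState) {
        switch (newState) {
            case SUSPENDED:
                // Connection flaky: keep leadership and wait for either
                // RECONNECTED or LOST instead of revoking immediately.
                break;
            case RECONNECTED:
                // Connection restored before session expiry: nothing to do.
                break;
            case LOST:
                // Session is definitively gone: only now give up leadership.
                revokeLeadership();
                break;
            default:
                break;
        }
    }

    private void revokeLeadership() {
        // Application-specific handling (hypothetical hook).
    }
}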


Re: [ANNOUNCE] Apache Flink 1.9.0 released

2019-08-26 Thread Zili Chen
Hi Oytun,

I think the intent is to publish flink-queryable-state-client-java
without a Scala suffix since it is Scala-free. An artifact without
the Scala suffix has been published [2].

See also [1].

Best,
tison.

[1] https://issues.apache.org/jira/browse/FLINK-12602
[2]
https://mvnrepository.com/artifact/org.apache.flink/flink-queryable-state-client-java/1.9.0



On Mon, Aug 26, 2019 at 3:50 PM, Till Rohrmann wrote:

> The missing support for the Scala shell with Scala 2.12 was documented in
> the 1.7 release notes [1].
>
> @Oytun, the docker image should be updated in a bit. Sorry for the
> inconvenience. Thanks for the pointer that
> flink-queryable-state-client-java_2.11 hasn't been published. We'll upload
> this in a bit.
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-release-1.7/release-notes/flink-1.7.html#scala-shell-does-not-work-with-scala-212
>
> Cheers,
> Till
>
> On Sat, Aug 24, 2019 at 12:14 PM chaojianok  wrote:
>
>> Congratulations and thanks!
>> At 2019-08-22 20:03:26, "Tzu-Li (Gordon) Tai" 
>> wrote:
>> >The Apache Flink community is very happy to announce the release of
>> Apache
>> >Flink 1.9.0, which is the latest major release.
>> >
>> >Apache Flink® is an open-source stream processing framework for
>> >distributed, high-performing, always-available, and accurate data
>> streaming
>> >applications.
>> >
>> >The release is available for download at:
>> >https://flink.apache.org/downloads.html
>> >
>> >Please check out the release blog post for an overview of the
>> improvements
>> >for this new major release:
>> >https://flink.apache.org/news/2019/08/22/release-1.9.0.html
>> >
>> >The full release notes are available in Jira:
>> >
>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12344601
>> >
>> >We would like to thank all contributors of the Apache Flink community who
>> >made this release possible!
>> >
>> >Cheers,
>> >Gordon
>>
>


[DISCUSS] Builder dedicated for testing

2019-08-26 Thread Zili Chen
Hi devs,

I'd like to share an observation: we have too many
@VisibleForTesting constructors that are only used in test scope, for
example in ExecutionGraph and RestClusterClient.

It would be helpful to introduce builders in test scope for building
such instances, and to keep only the necessary constructors in the
production code.

Otherwise, the code becomes a mess and contributors might be confused by
a series of constructors of which some are just for testing. Note that
@VisibleForTesting doesn't mean a method is *only* for testing.
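
For illustration, such a test-scoped builder could look like the following
sketch (all names hypothetical; Component stands in for a class like
RestClusterClient, and the builder would live under the test sources):

// Hypothetical sketch, not Flink code.
public final class ComponentBuilder {

    private String name = "test-component";
    private int retries = 1;

    public ComponentBuilder withName(String name) {
        this.name = name;
        return this;
    }

    public ComponentBuilder withRetries(int retries) {
        this.retries = retries;
        return this;
    }

    public Component build() {
        // Calls the single production constructor, defaults filled in,
        // so the production class needs no test-only overloads.
        return new Component(name, retries);
    }
}

class Component {
    Component(String name, int retries) {
        // the one real production constructor (placeholder body)
    }
}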

Best,
tison.


Re: [VOTE] FLIP-54: Evolve ConfigOption and Configuration

2019-08-28 Thread Zili Chen
The design looks good to me.

+1 go ahead!

Best,
tison.


On Wed, Aug 28, 2019 at 6:08 PM, Jark Wu wrote:

> Hi Timo,
>
> The new changes looks good to me.
>
> +1 to the FLIP.
>
>
> Cheers,
> Jark
>
> On Wed, 28 Aug 2019 at 16:02, Timo Walther  wrote:
>
> > Hi everyone,
> >
> > after some last minute changes yesterday, I would like to start a new
> > vote on FLIP-54. The discussion seems to have reached an agreement. Of
> > course this doesn't mean that we can't propose further improvements on
> > ConfigOption's and Flink configuration in general in the future. It is
> > just one step towards having a better unified configuration for the
> > project.
> >
> > Please vote for the following design document:
> >
> >
> >
> https://docs.google.com/document/d/1IQ7nwXqmhCy900t2vQLEL3N2HIdMg-JO8vTzo1BtyKU/edit#
> >
> > The discussion can be found at:
> >
> >
> >
> https://lists.apache.org/thread.html/a56c6b52e5f828d4a737602b031e36b5dd6eaa97557306696a8063a9@%3Cdev.flink.apache.org%3E
> >
> > This voting will be open for at least 72 hours. I'll try to close it on
> > 2019-09-02 8:00 UTC, unless there is an objection or not enough votes.
> >
> > I will convert it to a Wiki page afterwards.
> >
> > Thanks,
> >
> > Timo
> >
> >
>


Re: [DISCUSS] FLIP-50: Spill-able Heap Keyed State Backend

2019-08-29 Thread Zili Chen
Hi Yu,

Notice that the wiki page is still marked with the "*Under Discussion*"
state.

I think you can update it accordingly.

Best,
tison.


On Tue, Aug 20, 2019 at 10:28 PM, Yu Li wrote:

> Sorry for the lag, but since we reached a consensus days ago, I started a
> vote thread which will have a result by EOD; thus I'm closing this
> discussion thread. Thanks all for the participation and
> comments/suggestions!
>
> Best Regards,
> Yu
>
>
> On Fri, 16 Aug 2019 at 09:09, Till Rohrmann  wrote:
>
> > +1 for this FLIP and the feature. I think this feature will be super
> > helpful for many Flink users.
> >
> > Once the SpillableHeapKeyedStateBackend has proven to be superior to the
> > HeapKeyedStateBackend we should think about removing the latter
> completely
> > to reduce maintenance burden.
> >
> > Cheers,
> > Till
> >
> > On Fri, Aug 16, 2019 at 4:06 AM Congxian Qiu 
> > wrote:
> >
> > > Big +1 for this feature.
> > >
> > > This FLIP can help improve at least the following two scenarios:
> > > - Temporary data peaks when using the Heap StateBackend
> > > - The Heap StateBackend has better performance than
> > > RocksDBStateBackend, especially on SATA disks. Some guys told me that
> > > they increased the parallelism of operators (and used the
> > > HeapStateBackend) rather than using RocksDBStateBackend to get better
> > > performance. But increasing parallelism brings some other problems;
> > > after this FLIP, we can run a Flink job with the same parallelism as
> > > with RocksDBStateBackend and also get better performance.
> > >
> > > Best,
> > > Congxian
> > >
> > >
> > > On Fri, Aug 16, 2019 at 12:14 AM, Yu Li wrote:
> > >
> > > > Thanks all for the reviews and comments!
> > > >
> > > > bq. From the implementation plan, it looks like this exists purely
> in a
> > > new
> > > > module and does not require any changes in other parts of Flink's
> code.
> > > Can
> > > > you confirm that?
> > > > Confirmed, thanks!
> > > >
> > > > Best Regards,
> > > > Yu
> > > >
> > > >
> > > > On Thu, 15 Aug 2019 at 18:04, Tzu-Li (Gordon) Tai <
> tzuli...@apache.org
> > >
> > > > wrote:
> > > >
> > > > > +1 to start a VOTE for this FLIP.
> > > > >
> > > > > Given the properties of this new state backend and that it will
> exist
> > > as
> > > > a
> > > > > new module without touching the original heap backend, I don't see
> a
> > > harm
> > > > > in including this.
> > > > > Regarding design of the feature, I've already mentioned my comments
> > in
> > > > the
> > > > > original discussion thread.
> > > > >
> > > > > Cheers,
> > > > > Gordon
> > > > >
> > > > > On Thu, Aug 15, 2019 at 5:53 PM Yun Tang  wrote:
> > > > >
> > > > > > Big +1 for this feature.
> > > > > >
> > > > > > Our customers, including me, have faced the dilemma where we
> > > > > > have to use windows to aggregate events in applications like
> > > > > > real-time monitoring. The larger the timer and window state, the
> > > > > > poorer the performance of RocksDB. However, switching to the
> > > > > > FsStateBackend would always make me fear OOM errors.
> > > > > >
> > > > > > Looking forward to more powerful enrichment of the state
> > > > > > backends, helping Flink achieve better performance together.
> > > > > >
> > > > > > Best
> > > > > > Yun Tang
> > > > > > 
> > > > > > From: Stephan Ewen 
> > > > > > Sent: Thursday, August 15, 2019 23:07
> > > > > > To: dev 
> > > > > > Subject: Re: [DISCUSS] FLIP-50: Spill-able Heap Keyed State
> Backend
> > > > > >
> > > > > > +1 for this feature. I think this will be appreciated by users,
> as
> > a
> > > > way
> > > > > to
> > > > > > use the HeapStateBackend with a safety-net against OOM errors.
> > > > > > And having had major production exposure is great.
> > > > > >
> > > > > > From the implementation plan, it looks like this exists purely
> in a
> > > new
> > > > > > module and does not require any changes in other parts of Flink's
> > > code.
> > > > > Can
> > > > > > you confirm that?
> > > > > >
> > > > > > Other that that, I have no further questions and we could proceed
> > to
> > > > vote
> > > > > > on this FLIP, from my side.
> > > > > >
> > > > > > Best,
> > > > > > Stephan
> > > > > >
> > > > > >
> > > > > > On Tue, Aug 13, 2019 at 10:00 PM Yu Li  wrote:
> > > > > >
> > > > > > > Sorry for forgetting to give the link of the FLIP, here it is:
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-50%3A+Spill-able+Heap+Keyed+State+Backend
> > > > > > >
> > > > > > > Thanks!
> > > > > > >
> > > > > > > Best Regards,
> > > > > > > Yu
> > > > > > >
> > > > > > >
> > > > > > > On Tue, 13 Aug 2019 at 18:06, Yu Li  wrote:
> > > > > > >
> > > > > > > > Hi All,
> > > > > > > >
> > > > > > > > We ever held a discussion about this feature before [1] but
> now
> > > > > opening
> > > > > > > > another thread because after a second thought introducing a
> new
> > > > > backend
> 

[PROPOSAL] Force rebase on master before merge

2019-08-29 Thread Zili Chen
Hi devs,

GitHub provides a mechanism to require branches to be up to date with
master before they can be merged[1] (point 6). I can see several
advantages in enabling it, and thus propose that our project turn on
this switch. Below are my considerations. Looking forward to your
insights.

1. Avoid CI failures in a PR that are fixed by another commit. We
currently merge a pull request even if CI fails, as long as the failures
are known flaky tests. Turning on the switch doesn't resolve flaky tests
by itself, but it helps to surface any other potentially valid failures.

2. Avoid CI failures on master after a pull request is merged. CI tests
exactly the branch the pull request is bound to. Even if it is green,
a systematic failure can still be introduced by a conflict with another
new commit that was merged into master but is not in this branch.

As for the downside, it might require contributors to rebase their pull
requests a few times before getting merged. But that should not inflict
too much work.

Best,
tison.

[1] https://help.github.com/en/articles/enabling-required-status-checks


Re: [PROPOSAL] Force rebase on master before merge

2019-08-29 Thread Zili Chen
Hi Kurt,

Thanks for your reply!

I see two concerns about the downside in your email. Correct
me if I misunderstand.

1. Rebase effort. Typically commits are independent of one another, so a
rebase just fast-forwards the changes and contributors rarely have to
resolve conflicts themselves. Reviews don't get blocked by this forced
rebase as long as there has ever been a green Travis report -- we just
require the contributor to rebase and test again, which generally
doesn't involve code changes (unless conflicts need to be resolved).
A contributor rebases the pull request when there is spare time, or when
required by a reviewer or before getting merged. This should not inflict
too much work.

2. Testing time. That is a separate topic discussed in this thread[1].
I don't think we will live with a long testing time forever, so it won't
be a problem that we trigger multiple test runs.

To sum up: for trivial cases, the extra work is trivial and it prevents
accidental failures; for complicated cases, a rebase and a full test run
are already required anyway.

Best,
tison.

[1]
https://lists.apache.org/x/thread.html/b90aa518fcabce94f8e1de4132f46120fae613db6e95a2705f1bd1ea@%3Cdev.flink.apache.org%3E


Kurt Young  于2019年8月30日周五 上午9:15写道:

> Hi Zili,
>
> Thanks for the proposal, I had similar confusion in the past with your
> point #2.
> Force rebasing to master before merging can solve some problems, but it
> also introduces a new problem. Given that the CI testing time is quite
> long (a couple of hours) now, it's highly possible that before the test
> triggered by your rebase finishes, the master will get some more new
> commits. This situation will get worse if more people are doing this.
> One possible solution is to let the committer decide what to do before
> he/she merges it. If it's a trivial issue, just merging it once travis
> passes is fine. But if it's a rather big one, and some related code just
> got merged into master, I will choose to rebase to master and push it to
> my own repo to trigger my personal CI test on it, because this can
> guarantee the testing time.
>
> To summarize: I think this should be decided by the committer who is
> merging the PR,
> but not be forced.
>
> Best,
> Kurt
>
>
> On Thu, Aug 29, 2019 at 11:07 PM Zili Chen  wrote:
>
> > Hi devs,
> >
> > GitHub provides a mechanism to require branches to be up to date with
> > master before they can be merged[1] (point 6). I can see several
> > advantages in enabling it, and thus propose that our project turn on
> > this switch. Below are my considerations. Looking forward to your
> > insights.
> >
> > 1. Avoid CI failures in a PR that are fixed by another commit. We
> > currently merge a pull request even if CI fails, as long as the
> > failures are known flaky tests. Turning on the switch doesn't resolve
> > flaky tests by itself, but it helps to surface any other potentially
> > valid failures.
> >
> > 2. Avoid CI failures on master after a pull request is merged. CI
> > tests exactly the branch the pull request is bound to. Even if it is
> > green, a systematic failure can still be introduced by a conflict with
> > another new commit that was merged into master but is not in this
> > branch.
> >
> > As for the downside, it might require contributors to rebase their
> > pull requests a few times before getting merged. But that should not
> > inflict too much work.
> >
> > Best,
> > tison.
> >
> > [1] https://help.github.com/en/articles/enabling-required-status-checks
> >
>


Re: [DISCUSS] Flink client api enhancement for downstream project

2019-08-30 Thread Zili Chen
Great Kostas! Looking forward to your POC!

Best,
tison.


Jeff Zhang  于2019年8月30日周五 下午11:07写道:

> Awesome, @Kostas Looking forward your POC.
>
> Kostas Kloudas  于2019年8月30日周五 下午8:33写道:
>
> > Hi all,
> >
> > I am just writing here to let you know that I am working on a POC that
> > tries to refactor the current state of job submission in Flink.
> > I want to stress that it introduces NO CHANGES to the current
> > behaviour of Flink. It just re-arranges things and introduces the
> > notion of an Executor, which is the entity responsible for taking the
> > user-code and submitting it for execution.
> >
> > Given this, the discussion about the functionality that the JobClient
> > will expose to the user can go on independently and the same
> > holds for all the open questions so far.
> >
> > I hope I will have some more news to share soon.
> >
> > Thanks,
> > Kostas
> >
> > On Mon, Aug 26, 2019 at 4:20 AM Yang Wang  wrote:
> > >
> > > Hi Zili,
> > >
> > > It makes sense to me that a dedicated cluster is started for a per-job
> > > cluster and will not accept more jobs.
> > > Just have a question about the command line.
> > >
> > > Currently we could use the following commands to start different
> > clusters.
> > > *per-job cluster*
> > > ./bin/flink run -d -p 5 -ynm perjob-cluster1 -m yarn-cluster
> > > examples/streaming/WindowJoin.jar
> > > *session cluster*
> > > ./bin/flink run -p 5 -ynm session-cluster1 -m yarn-cluster
> > > examples/streaming/WindowJoin.jar
> > >
> > > What will it look like after client enhancement?
> > >
> > >
> > > Best,
> > > Yang
> > >
> > > Zili Chen  于2019年8月23日周五 下午10:46写道:
> > >
> > > > Hi Till,
> > > >
> > > > Thanks for your update. Nice to hear :-)
> > > >
> > > > Best,
> > > > tison.
> > > >
> > > >
> > > > Till Rohrmann  于2019年8月23日周五 下午10:39写道:
> > > >
> > > > > Hi Tison,
> > > > >
> > > > > just a quick comment concerning the class loading issues when using
> > the
> > > > per
> > > > > job mode. The community wants to change it so that the
> > > > > StandaloneJobClusterEntryPoint actually uses the user code class
> > loader
> > > > > with child first class loading [1]. Hence, I hope that this problem
> > will
> > > > be
> > > > > resolved soon.
> > > > >
> > > > > [1] https://issues.apache.org/jira/browse/FLINK-13840
> > > > >
> > > > > Cheers,
> > > > > Till
> > > > >
> > > > > On Fri, Aug 23, 2019 at 2:47 PM Kostas Kloudas  >
> > > > wrote:
> > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > On the topic of web submission, I agree with Till that it only
> > seems
> > > > > > to complicate things.
> > > > > > It is bad for security, job isolation (anybody can submit/cancel
> > jobs),
> > > > > > and its
> > > > > > implementation complicates some parts of the code. So, if it were
> > to
> > > > > > redesign the
> > > > > > WebUI, maybe this part could be left out. In addition, I would
> say
> > > > > > that the ability to cancel
> > > > > > jobs could also be left out.
> > > > > >
> > > > > > Also I would also be in favour of removing the "detached" mode,
> for
> > > > > > the reasons mentioned
> > > > > > above (i.e. because now we will have a future representing the
> > result
> > > > > > on which the user
> > > > > > can choose to wait or not).
> > > > > >
> > > > > > Now for the separating job submission and cluster creation, I am
> in
> > > > > > favour of keeping both.
> > > > > > Once again, the reasons are mentioned above by Stephan, Till,
> > Aljoscha
> > > > > > and also Zili seems
> > > > > > to agree. They mainly have to do with security, isolation and
> ease
> > of
> > > > > > resource management
> > > > > > for the user as he knows that "when my job is done, everything
> > will be
> > > > > &

Re: [PROPOSAL] Force rebase on master before merge

2019-09-01 Thread Zili Chen
Hi all,

Thanks for your replies.

For Till's question: as Chesnay said, if we cannot attach Travis checks
via the CiBot workflow, the mechanism provided by GitHub doesn't work at
all, since it states "This setting will not take effect unless at least
one status check is enabled".

Technically we could add this up-to-date check to the CiBot workflow.
However, given that our project is currently under quite active
development and that it takes too long to run an extra build pass that
almost never reveals an implicit conflict, I agree that enforcing such a
rule is not for us at this point.

Best,
tison.


Chesnay Schepler  于2019年8月30日周五 下午4:38写道:

> I think this is a non-issue; every committer I know checks beforehand if
> the build passes.
>
> Piotr has provided good arguments for why this approach isn't practical.
> Additionally, there are simply technical limitations that prevent this
> from working as expected.
>
> a) we cannot attach Travis checks via CiBot due to lack of permissions
> b) It is not possible AFAIK to force a PR to be up-to-date with current
> master when Travis runs. In other words, I can open a PR, travis passes,
> and so long as no new merge conflicts arise I could _still_ merge it 2
> months later.
>
> On 30/08/2019 10:34, Piotr Nowojski wrote:
> > Hi,
> >
> > Thanks for the proposal. I have similar concerns as Kurt.
> >
> > If we enforced such a rule I would be afraid that everybody would be
> waiting for tests on his PR to complete, racing other committers to be
> “the first guy that clicks the merge button”, then forcing all of the
> others to rebase manually and race again. For example it wouldn’t be
> possible to push a final version of the PR, wait for the tests to complete
> overnight and merge it next day. Unless we would allow for merging without
> green travis after a final rebase, but that for me would be almost exactly
> what we have now.
> >
> > Is this a big issue in the first place? I don’t feel it that way, but
> maybe I’m working in not very contested parts of the code?
> >
> > If it’s an issue, I would suggest to go for the merging bot, that would
> have a queue of PRs to be:
> > 1. Automatically rebased on the latest master
> > 2. If no conflicts in 1., run the tests
> > 3. If no test failures merge
> >
> > Piotrek
> >
> >> On 30 Aug 2019, at 09:38, Till Rohrmann  wrote:
> >>
> >> Hi Tison,
> >>
> >> thanks for starting this discussion. In general, I'm in favour of
> >> automations which remove human mistakes out of the equation.
> >>
> >> Do you know how these status checks work concretely? Will Github reject
> >> commits for which there is no passed Travis run? How would hotfix
> commits
> >> being distinguished from PR commits for which a Travis run should
> exist? So
> >> I guess my question is how would enabling the status checks change how
> >> committers interact with the Github repository?
> >>
> >> Cheers,
> >> Till
> >>
> >> On Fri, Aug 30, 2019 at 4:46 AM Zili Chen  wrote:
> >>
> >>> Hi Kurt,
> >>>
> >>> Thanks for your reply!
> >>>
> >>> I find two concerns about the downside from your email. Correct
> >>> me if I misunderstanding.
> >>>
> >>> 1. Rebase times. Typically commits are independent one another, rebase
> >>> just fast-forward changes so that contributors rarely resolve conflicts
> >>> by himself. Reviews doesn't get blocked by this force rebase if there
> is
> >>> a green travis report ever -- just require contributor rebase and test
> >>> again, which generally doesn't involve changes(unless resolve
> conflicts).
> >>> Contributor rebases his pull request when he has spare time or is
> required
> >>> by reviewer/before getting merged. This should not inflict too much
> works.
> >>>
> >>> 2. Testing time. It is a separated topic that discussed in this
> thread[1].
> >>> I don't think we finally live with a long testing time, so it won't be
> a
> >>> problem then we trigger multiple tests.
> >>>
> >>> Simply sum up, for trivial cases, works are trivial and it
> >>> prevents accidentally
> >>> failures; for complicated cases, it already requires rebase and fully
> >>> tests.
> >>>
> >>> Best,
> >>> tison.
> >>>
> >>> [1]
> >>>
> >>>
> https://lists.apache.org/x/thread.html/b90aa518fcabce94f8e1de4132f46120fae613db6e95a2705f1bd1ea@%

Re: [DISCUSS] Flink client api enhancement for downstream project

2019-09-04 Thread Zili Chen
Hi Kostas & Aljoscha,

I notice that there is a JIRA (FLINK-13946) which could be included
in this refactoring thread. Since we agree on most directions of the
big picture, would it be reasonable to create an umbrella issue for
refactoring the client APIs, with linked subtasks? That would be a
better way to join the forces of our community.

Best,
tison.


Zili Chen  于2019年8月31日周六 下午12:52写道:

> Great Kostas! Looking forward to your POC!
>
> Best,
> tison.
>
>
> Jeff Zhang  于2019年8月30日周五 下午11:07写道:
>
>> Awesome, @Kostas Looking forward your POC.
>>
>> Kostas Kloudas  于2019年8月30日周五 下午8:33写道:
>>
>> > Hi all,
>> >
>> > I am just writing here to let you know that I am working on a POC that
>> > tries to refactor the current state of job submission in Flink.
>> > I want to stress that it introduces NO CHANGES to the current
>> > behaviour of Flink. It just re-arranges things and introduces the
>> > notion of an Executor, which is the entity responsible for taking the
>> > user-code and submitting it for execution.
>> >
>> > Given this, the discussion about the functionality that the JobClient
>> > will expose to the user can go on independently and the same
>> > holds for all the open questions so far.
>> >
>> > I hope I will have some more news to share soon.
>> >
>> > Thanks,
>> > Kostas
>> >
>> > On Mon, Aug 26, 2019 at 4:20 AM Yang Wang 
>> wrote:
>> > >
>> > > Hi Zili,
>> > >
>> > > It makes sense to me that a dedicated cluster is started for a per-job
>> > > cluster and will not accept more jobs.
>> > > Just have a question about the command line.
>> > >
>> > > Currently we could use the following commands to start different
>> > clusters.
>> > > *per-job cluster*
>> > > ./bin/flink run -d -p 5 -ynm perjob-cluster1 -m yarn-cluster
>> > > examples/streaming/WindowJoin.jar
>> > > *session cluster*
>> > > ./bin/flink run -p 5 -ynm session-cluster1 -m yarn-cluster
>> > > examples/streaming/WindowJoin.jar
>> > >
>> > > What will it look like after client enhancement?
>> > >
>> > >
>> > > Best,
>> > > Yang
>> > >
>> > > Zili Chen  于2019年8月23日周五 下午10:46写道:
>> > >
>> > > > Hi Till,
>> > > >
>> > > > Thanks for your update. Nice to hear :-)
>> > > >
>> > > > Best,
>> > > > tison.
>> > > >
>> > > >
>> > > > Till Rohrmann  于2019年8月23日周五 下午10:39写道:
>> > > >
>> > > > > Hi Tison,
>> > > > >
>> > > > > just a quick comment concerning the class loading issues when
>> using
>> > the
>> > > > per
>> > > > > job mode. The community wants to change it so that the
>> > > > > StandaloneJobClusterEntryPoint actually uses the user code class
>> > loader
>> > > > > with child first class loading [1]. Hence, I hope that this
>> problem
>> > will
>> > > > be
>> > > > > resolved soon.
>> > > > >
>> > > > > [1] https://issues.apache.org/jira/browse/FLINK-13840
>> > > > >
>> > > > > Cheers,
>> > > > > Till
>> > > > >
>> > > > > On Fri, Aug 23, 2019 at 2:47 PM Kostas Kloudas <
>> kklou...@gmail.com>
>> > > > wrote:
>> > > > >
>> > > > > > Hi all,
>> > > > > >
>> > > > > > On the topic of web submission, I agree with Till that it only
>> > seems
>> > > > > > to complicate things.
>> > > > > > It is bad for security, job isolation (anybody can submit/cancel
>> > jobs),
>> > > > > > and its
>> > > > > > implementation complicates some parts of the code. So, if it
>> were
>> > to
>> > > > > > redesign the
>> > > > > > WebUI, maybe this part could be left out. In addition, I would
>> say
>> > > > > > that the ability to cancel
>> > > > > > jobs could also be left out.
>> > > > > >
>> > > > > > Also I would also be in favour of removing the "detached" mode,
>> for
>> > > > > > the reasons

Re: [VOTE] FLIP-61 Simplify Flink's cluster level RestartStrategy configuration

2019-09-04 Thread Zili Chen
+1


zhijiang  于2019年9月5日周四 上午12:36写道:

> +1
> --
> From:Till Rohrmann 
> Send Time:2019年9月4日(星期三) 13:39
> To:dev 
> Cc:Zhu Zhu 
> Subject:Re: [VOTE] FLIP-61 Simplify Flink's cluster level RestartStrategy
> configuration
>
> +1 (binding)
>
> On Wed, Sep 4, 2019 at 12:39 PM Chesnay Schepler 
> wrote:
>
> > +1 (binding)
> >
> > On 04/09/2019 11:13, Zhu Zhu wrote:
> > > +1 (non-binding)
> > >
> > > Thanks,
> > > Zhu Zhu
> > >
> > > Till Rohrmann  于2019年9月4日周三 下午5:05写道:
> > >
> > >> Hi everyone,
> > >>
> > >> I would like to start the voting process for FLIP-61 [1], which is
> > >> discussed and reached consensus in this thread [2].
> > >>
> > >> Since the change is rather small I'd like to shorten the voting period
> > to
> > >> 48 hours. Hence, I'll try to close it September 6th, 11:00 am CET,
> > unless
> > >> there is an objection or not enough votes.
> > >>
> > >> [1]
> > >>
> > >>
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-61+Simplify+Flink%27s+cluster+level+RestartStrategy+configuration
> > >> [2]
> > >>
> > >>
> >
> https://lists.apache.org/thread.html/e206390127bcbd9b24d9c41a838faa75157e468e01552ad241e3e24b@%3Cdev.flink.apache.org%3E
> > >>
> > >> Cheers,
> > >> Till
> > >>
> >
> >
>
>


Re: [ANNOUNCE] Kostas Kloudas joins the Flink PMC

2019-09-06 Thread Zili Chen
Congrats Klou!

Best,
tison.


Till Rohrmann  于2019年9月6日周五 下午9:23写道:

> Congrats Klou!
>
> Cheers,
> Till
>
> On Fri, Sep 6, 2019 at 3:00 PM Dian Fu  wrote:
>
>> Congratulations Kostas!
>>
>> Regards,
>> Dian
>>
>> > 在 2019年9月6日,下午8:58,Wesley Peng  写道:
>> >
>> > On 2019/9/6 8:55 下午, Fabian Hueske wrote:
>> >> I'm very happy to announce that Kostas Kloudas is joining the Flink
>> PMC.
>> >> Kostas is contributing to Flink for many years and puts lots of effort
>> in helping our users and growing the Flink community.
>> >> Please join me in congratulating Kostas!
>> >
>> > congratulation Kostas!
>> >
>> > regards.
>>
>>


[DISCUSS] Retrieval services in non-high-availability scenario

2019-09-09 Thread Zili Chen
Hi devs,

I'd like to start a discussion thread on the topic how we provide
retrieval services in non-high-availability scenario. To clarify
terminology, non-high-availability scenario refers to
StandaloneHaServices and EmbeddedHaServices.

***The problem***

We notice that the retrieval services of the current StandaloneHaServices
(pre-configured) and EmbeddedHaServices (in-memory) have their
respective problems.

For the pre-configured scenario, we now have a
getJobManagerLeaderRetriever(JobID, defaultJMAddress) method
to work around the problem that it is impossible to configure the JM
address up front. The parameter defaultJMAddress is not used with any
other high-availability mode. Also, in the MiniCluster scenario and
anywhere else where pre-configuring the leader address is impossible,
StandaloneHaServices cannot be used.

For the in-memory case, it clearly doesn't fit any distributed
scenario.

***The proposal***

In order to address the inconsistency between the pre-configured
retrieval services and the ZooKeeper-based ones, we reconsider the
promises of "non-high-availability" and regard it as a service similar
to the ZooKeeper-based one, except that it doesn't tolerate node
failure. Thus, we implement a service that acts like a standalone
ZooKeeper cluster, named LeaderServer.

A leader server is an actor that runs on the JobManager actor system and
reacts to leader contender registrations and leader retriever requests.
If the JobManager fails, the associated leader server fails too, which
is where "non-high-availability" stands.

In order to communicate with the leader server, we start a leader client
per high-availability services instance (JM, TM, ClusterClient). When a
leader election service starts, it registers the contender with the
leader server via the leader client (by Akka communication); when a
leader retriever starts, it registers itself with the leader server via
the leader client.

The leader server handles leader election internally, just like the
Embedded implementation, and notifies retrievers with the new leader
information whenever a new leader is elected.
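
To make the interaction concrete, here is a minimal, purely illustrative
sketch of the two roles (all names and signatures are assumptions for
this discussion, not the actual implementation; LeaderContender and
LeaderRetrievalListener refer to the existing Flink interfaces, and in
practice the calls go through Akka):

    // The requests the LeaderServer actor reacts to.
    interface LeaderServerGateway {
        // A leader contender registers for the election of a component.
        void registerContender(String componentId, LeaderContender contender);

        // A retriever subscribes to leader changes of a component and is
        // notified whenever a new leader is elected.
        void registerRetriever(String componentId, LeaderRetrievalListener listener);
    }

    // Runs per high-availability services instance (JM, TM, ClusterClient)
    // and forwards registrations to the LeaderServer.
    final class LeaderClient {
        private final LeaderServerGateway server;

        LeaderClient(LeaderServerGateway server) {
            this.server = server;
        }

        void onElectionServiceStart(String componentId, LeaderContender contender) {
            server.registerContender(componentId, contender);
        }

        void onRetrieverStart(String componentId, LeaderRetrievalListener listener) {
            server.registerRetriever(componentId, listener);
        }
    }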

In this way, we unify the view of retrieval services in all scenarios:

1. Configure a name service to communicate with. In ZooKeeper mode it is
ZooKeeper, and in non-high-availability mode it is the leader server.
2. Any retrieval request is sent to the name service and is handled by
that service.

Apart from the unified view, there are other advantages:

+ We no longer need the special method
getJobManagerLeaderRetriever(JobID, defaultJMAddress); instead we use
getJobManagerLeaderRetriever(JobID), as sketched below. Consequently, we
need not include the JobManager address in the slot request, where it
might become stale during transmission.
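
For reference, the two methods in question look roughly as follows
(signatures abridged from the HighAvailabilityServices interface; the
surrounding election and retrieval methods are elided):

    public interface HighAvailabilityServices {

        // Sufficient once every retrieval goes through a name service
        // (ZooKeeper, or the proposed leader server):
        LeaderRetrievalService getJobManagerLeaderRetriever(JobID jobID);

        // Today's workaround for the pre-configured standalone mode, where
        // the JM address cannot be known when the retriever is created:
        LeaderRetrievalService getJobManagerLeaderRetriever(
                JobID jobID, String defaultJobManagerAddress);

        // ... other election/retrieval services elided ...
    }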

+ Separated configuration concerns for launch and retrieval. The
JobManager address & port and the REST address & port are only
configured when launching a cluster (in the YARN scenario there is even
no need to configure them). When retrieval is requested, one configures
the connection info of the name service (ZooKeeper or the leader
server).

+ The Embedded implementation could also be included in this abstraction
without any regression in simulating multiple leaders for test purposes.
Actually, the leader server acts as a limited standalone ZooKeeper
cluster. Thus -- and this is where this proposal comes from -- when we
refactor the metadata storage with the transaction store proposed in
FLINK-10333, we only need to take care of the ZooKeeper implementation
and one unified non-high-availability implementation.

***Clean up***

It is also noticed that there are several stale & unimplemented
high-availability services implementations, which I'd like to remove for
a clean codebase to work on in this thread and FLINK-10333. They are:

- YarnHighAvailabilityServices
- AbstractYarnNonHaServices
- YarnIntraNonHaMasterServices
- YarnPreConfiguredMasterNonHaServices
- SingleLeaderElectionService
- FsNegativeRunningJobsRegistry

Any feedback is appreciated.

Best,
tison.


Re: [DISCUSS] Retrieval services in non-high-availability scenario

2019-09-10 Thread Zili Chen
to require users to pre-configure the port of the job manager,
especially in cloud-native scenarios. Although the specific
implementation couples the address and port of the LeaderServer with
those of the job manager, this is not a fundamental constraint. Thus,
the LeaderServer-based implementation is more flexible for evolution.

Best,
tison.


Till Rohrmann  于2019年9月9日周一 下午5:37写道:

> Hi Tison,
>
> thanks for starting this discussion. I think your mail includes multiple
> points which are worth being treated separately (might even make sense to
> have separate discussion threads). Please correct me if I understood things
> wrongly:
>
> 1. Adding new non-ha HAServices:
>
> Based on your description I could see the "ZooKeeper-light" non-ha
> HAServices implementation work. Would any changes to the existing
> interfaces be needed? How would the LeaderServer integrate in the lifecycle
> of the cluster entrypoint?
>
> 2. Replacing existing non-ha HAServices with LeaderServer implementation:
>
> I'm not sure whether we need to enforce that every non-ha HAServices
> implementation works as you've described. I think it is pretty much an
> implementation detail whether the services talk to a LeaderServer or are
> being started with a pre-configured address. I also think that it is fair
> to have different implementations with different characteristics and usage
> scenarios. As you've said the EmbeddedHaServices are targeted for single
> process cluster setups and they are only used by the MiniCluster.
>
> What I like about the StandaloneHaServices is that they are dead simple
> (apart from the configuration). With a new implementation based on the
> LeaderServer, the client side implementation becomes much more complex
> because now one needs to handle all kind of network issues properly.
> Moreover, it adds more complexity to the system because it starts a new
> distributed component which needs to be managed. I could see that once the
> new implementation has matured enough that it might replace the
> EmbeddedHaServices. But I wouldn't start with removing them.
>
> You are right that, because we don't know the JM address before the JM
> is started, we need to send the address with every slot request.
> Moreover we have the method #getJobManagerLeaderRetriever(JobID,
> defaultJMAddress) on the HAServices. While this is not super nice, I don't
> think that this is a fundamental problem at the moment. What we pay is a
> couple of extra bytes we need to send over the network.
>
> Configuration-wise, I'm not so sure whether we gain too much by replacing
> the StandaloneHaServices with the LeaderServer based implementation. For
> the new implementation one needs to configure a static address as well at
> cluster start-up time. The only benefit I can see is that we don't need to
> send the JM address to the RM and TMs. But as I've said, I don't think that
> this is a big problem for which we need to introduce new HAServices.
> Instead I could see that we might be able to remove it once the
> LeaderServer HAServices implementation has proven to be stable.
>
> 3. Configuration of HAServices:
>
> I agree that Flink's address and port configuration is not done
> consistently. It might make sense to group the address and port
> configuration under the ha service configuration section. Maybe it also
> makes sense to rename ha services into ServiceDiscovery because it also
> works in the non-ha case. It could be possible to only configure address
> and port if one is using the non-ha services, for example. However, this
> definitely deserves a separate discussion and design because one needs to
> check where exactly the respective configuration options are being used.
>
> I think improving the configuration of HAServices is actually orthogonal to
> introducing the LeaderServer HAServices implementation and could also be
> done for the existing HAServices.
>
> 4. Clean up of HAServices implementations:
>
> You are right that some of the existing HAServices implementations are
> "dead code" at the moment. They are the result of some implementation ideas
> which haven't been completed. I would suggest to start a separate
> discussion to discuss what to do with them.
>
> Cheers,
> Till
>
> On Mon, Sep 9, 2019 at 9:16 AM Zili Chen  wrote:
>
> > Hi devs,
> >
> > I'd like to start a discussion thread on the topic how we provide
> > retrieval services in non-high-availability scenario. To clarify
> > terminology, non-high-availability scenario refers to
> > StandaloneHaServices and EmbeddedHaServices.
> >
> > ***The problem***
> >
> > We notice that retrieval services of cu

Re: [DISCUSS] Retrieval services in non-high-availability scenario

2019-09-10 Thread Zili Chen
Hi Till,

Thanks for your quick reply. I'd like to narrow the intention of this
thread as I posted above:

>Well, I see your concerns about hurriedly replacing existing stable
services with a new implementation. Here I list the pros and cons of
this replacement. If we agree that it does good, I can provide a neat
and full-featured implementation for preview, so we can see concretely
what we add and what we gain. For integration, we can then integrate
with MiniCluster first and extend later.

To clarify, the intention of this thread is narrowed to introducing a
new HighAvailabilityServices implementation based on the LeaderServer
described above. For now, we introduce such an implementation aimed at
use in the MiniCluster scenario; it is a location-transparent version of
EmbeddedHaServices. It would serve the same purpose as EmbeddedHaServices
while being flexible for evolution. Let's defer all topics about the
concrete evolutions until the implementation converges to be stable.

A quick answer for a point above that possibly raises confusion:

>Why would the job manager switch its address in the non-ha case? We
don't support this if I'm not mistaken.

Yes, we don't support this because we fail the whole dispatcher /
resource manager component on job manager failures. That is less than
awesome, since we could let the Dispatcher, as the supervisor, launch a
new job manager to execute the job. However, as described above, let's
defer all topics about the concrete evolutions until the implementation
converges to be stable.

Best,
tison.


Till Rohrmann  于2019年9月10日周二 下午6:06写道:

> Hi Tison,
>
> thanks for the detailed response. I put some comments inline:
>
> On Tue, Sep 10, 2019 at 10:51 AM Zili Chen  wrote:
>
> > Hi Till,
> >
> > Thanks for your reply. I agree point 3 and 4 in your email worth a
> > separated
> > thread to discuss. Let me answer your questions and concerns in point 1
> and
> > 2
> > respectively.
> >
> > 1.Lifecycle of LeaderServer and requirement to implement it
> >
> > LeaderServer starts on cluster entrypoint and its lifecycle is bound to
> the
> > lifecycle of cluster entrypoint. That is, when the cluster entrypoint
> > starts,
> > a LeaderServer also starts; and LeaderServer gets shut down when the
> > cluster
> > entrypoint gets shut down. This is because we need to provide services
> > discovery
> > during the cluster is running.
> >
> > For the implementation part, conceptually it is a service running on
> > the cluster entrypoint which holds services information in memory and
> > can be communicated with. As our internal specific implementation,
> > LeaderServer is an actor running on the actor system of the cluster
> > entrypoint, which is referred to as `commonRpcService`. It is just
> > another unfenced rpc endpoint and requires no extra changes to the
> > existing interfaces.
> >
> > Apart from the LeaderServer, there is another concept in this
> > implementation, the LeaderClient. The LeaderClient forwards
> > registration requests from the election service and the retrieval
> > service, and forwards leader-changed messages from the LeaderServer.
> > As our specific implementation, the LeaderClient is an actor and runs
> > on the cluster entrypoint, task managers and cluster clients.
>
>
> I think these kind of changes to the ClusterEntrypoint deserve a separate
> design and discussion.
>
> >
>
>
> > (1). cluster entrypoint
> >
> > The lifecycle of LeaderClient is like LeaderServer.
> >
> > (2). task manager
> >
> > The lifecycle of the LeaderClient is bound to the lifecycle of the task
> > manager. Specifically, it runs on the `rpcService` started by the task
> > manager runner and stops when that service gets shut down.
> >
> > (3). cluster client
> >
> > The lifecycle of the LeaderClient is bound to the ClusterClient. In our
> > codebase, only RestClusterClient needs the adaptation. When a
> > ClientHAService based on the LeaderClient starts, it starts a dedicated
> > rpc service on which the LeaderClient runs. The service as well as the
> > LeaderClient get shut down when the RestClusterClient is closed, where
> > ClientHAService#close is called. It is a transparent implementation
> > inside a specific ClientHAService; thus, again, no changes to the
> > existing interfaces.
> >
> > 2. The proposal to replace existing non-ha services
> >
> > Well, I see your concerns on replace existing stable services hurriedly
> > with
> > a new implementation. Here I list the pros and cons of this replacement.
> If
> > we
> > agree that it does good I can provide an neat and full featured
> > implementation
> &

Re: [DISCUSS] Flink client api enhancement for downstream project

2019-09-11 Thread Zili Chen
Hi Aljoscha,

I'm OK to use the ASF slack.

Best,
tison.


Jeff Zhang  于2019年9月11日周三 下午4:48写道:

> +1 for using slack for instant communication
>
> Aljoscha Krettek  于2019年9月11日周三 下午4:44写道:
>
>> Hi,
>>
>> We could try and use the ASF slack for this purpose, that would probably
>> be easiest. See https://s.apache.org/slack-invite. We could create a
>> dedicated channel for our work and would still use the open ASF
>> infrastructure and people can have a look if they are interested because
>> discussion would be public. What do you think?
>>
>> P.S. Committers/PMCs should should be able to login with their apache ID.
>>
>> Best,
>> Aljoscha
>>
>> > On 6. Sep 2019, at 14:24, Zili Chen  wrote:
>> >
>> > Hi Aljoscha,
>> >
>> > I'd like to gather all the ideas here and in the documents, and
>> > draft a formal FLIP that keeps us on the same page. Hopefully I'll
>> > start a FLIP thread next week.
>> >
>> > For the implementation, or rather the POC part, I'd like to work
>> > with you guys who proposed the concept of Executor, to make sure
>> > that we go in the same direction. I'm wondering whether a dedicated
>> > thread or a Slack group is the proper place. In my opinion we can
>> > involve the team in a Slack group and, concurrently with the FLIP
>> > process, start our branch; once we reach a consensus on the FLIP, we
>> > open an umbrella issue about the framework and start subtasks. What
>> > do you think?
>> >
>> > Best,
>> > tison.
>> >
>> >
>> > Aljoscha Krettek  于2019年9月5日周四 下午9:39写道:
>> >
>> >> Hi Tison,
>> >>
>> >> To keep this moving forward, maybe you want to start working on a
>> proof of
>> >> concept implementation for the new JobClient interface, maybe with a
>> new
>> >> method executeAsync() in the environment that returns the JobClient and
>> >> implement the methods to see how that works and to see where we get.
>> Would
>> >> you be interested in that?
>> >>
>> >> Also, at some point we should collect all the ideas and start forming
>> an
>> >> actual FLIP.
>> >>
>> >> Best,
>> >> Aljoscha
>> >>
>> >>> On 4. Sep 2019, at 12:04, Zili Chen  wrote:
>> >>>
>> >>> Thanks for your update Kostas!
>> >>>
>> >>> It looks good to me that clean up existing code paths as first
>> >>> pass. I'd like to help on review and file subtasks if I find ones.
>> >>>
>> >>> Best,
>> >>> tison.
>> >>>
>> >>>
>> >>> Kostas Kloudas  于2019年9月4日周三 下午5:52写道:
>> >>> Here is the issue, and I will keep on updating it as I find more
>> issues.
>> >>>
>> >>> https://issues.apache.org/jira/browse/FLINK-13954
>> >>>
>> >>> This will also cover the refactoring of the Executors that we
>> discussed
>> >>> in this thread, without any additional functionality (such as the job
>> >> client).
>> >>>
>> >>> Kostas
>> >>>
>> >>> On Wed, Sep 4, 2019 at 11:46 AM Kostas Kloudas 
>> >> wrote:
>> >>>>
>> >>>> Great idea Tison!
>> >>>>
>> >>>> I will create the umbrella issue and post it here so that we are all
>> >>>> on the same page!
>> >>>>
>> >>>> Cheers,
>> >>>> Kostas
>> >>>>
>> >>>> On Wed, Sep 4, 2019 at 11:36 AM Zili Chen 
>> >> wrote:
>> >>>>>
>> >>>>> Hi Kostas & Aljoscha,
>> >>>>>
>> >>>>> I notice that there is a JIRA(FLINK-13946) which could be included
>> >>>>> in this refactor thread. Since we agree on most of directions in
>> >>>>> big picture, is it reasonable that we create an umbrella issue for
>> >>>>> refactor client APIs and also linked subtasks? It would be a better
>> >>>>> way that we join forces of our community.
>> >>>>>
>> >>>>> Best,
>> >>>>> tison.
>> >>>>>
>> >>>>>
>> >>>>> Zili Chen  于2019年8月31日周六 

Re: Interested in contributing and looking for good first issues

2019-09-12 Thread Zili Chen
Hi Anoop,

Welcome to the Flink community. Although we don't maintain a list of
starter issues so far, it would be helpful if you provided more
information about which topic you are interested in.

Flink serves stateful computations over data streams based on a series
of layered concepts, such as the SQL/Table API, the DataStream API, the
Flink runtime, Flink MLLib, the Flink Python API and so on. Each of them
has its specific focus, and there are community members who are familiar
with each specific topic. If you describe in a bit more detail which
topic you are interested in, our members can point you to good first
issues.

Best,
tison.


Till Rohrmann  于2019年9月12日周四 下午4:25写道:

> Hi Anoop,
>
> welcome to the Flink community. I think you've already read the right
> things in order to get started.
>
> As far as I know, the Flink community does not maintain a list of starter
> issues. Hence, I would recommend to take a look at the open issues and ask
> on issues which are interesting to you whether this could be a good starter
> issue or not.
>
> Cheers,
> Till
>
> On Thu, Sep 12, 2019 at 7:23 AM Anoop Hallur 
> wrote:
>
> > Hello Flink Devs,
> > I am interested in contributing to Apache Flink and I am looking for
> > beginner friendly issues/tasks to hit the ground running. *Can somebody
> > recommend good issues for first time contributors* ?
> >
> > Also, can somebody add me to JIRA project for Flink ? My JIRA username is
> > anoophallur
> >
> > What I have done so far.
> >
> > * I have read the "How to contribute" guide (at
> > https://flink.apache.org/contributing/how-to-contribute.html) and all
> its
> > subpages.
> >
> > * I have set up my development environment by following the guide at
> >
> >
> https://cwiki.apache.org/confluence/display/FLINK/Setting+up+a+Flink+development+environment
> > .
> > "mvn clean packages" builds and packages the build artifacts for me. All
> > the tests are passing locally as well.
> >
> > * I have looked at the open issues at
> >
> >
> https://issues.apache.org/jira/projects/FLINK/issues/FLINK-13613?filter=allopenissues
> > but
> > I am not sure which ones are good for new members. I use JVM based
> > languages + mvn for my day job, so I understand the overall project, but
> > not the relative importance of each of the issues.
> >
> > * I have also read bits and pieces of documentation + the tutorials(at
> > https://training.ververica.com/intro/intro-1.html). I consider myself a
> > novice as a Flink consumer.
> >
> > Thanks,
> > Anoop
> >
> > *Anoop Hallur*
> > (001) 917-285-3445 | anoophal...@gmail.com | Skype: anoophallur
> >
>


Re: [DISCUSS] Use Java's Duration instead of Flink's Time

2019-09-12 Thread Zili Chen
Hi,

I've given it a try to switch to Java's Duration for all runtime Java
code. Generally, most of its usages are for @RpcTimeout and testing
timeouts.

However, doing this cleanly without touching public interfaces would
introduce bridge code that converts between Java's Duration and Flink's
Time. I don't think it is worth making that fine-grained distinction
now, and we would possibly introduce extra bridge code.

Given this, I just filed an issue about this topic (as we should have
:-)), FLINK-14068[1], as a subtask of FLINK-3957 "Breaking changes for
Flink 2.0".

For what we can do now:

1. Deprecate org.apache.flink.api.common.time.Time as well.
2. Stop introducing more usages of Flink's Time, specifically for
testing timeouts. This could be manually checked when reviewing pull
requests (not perfect though :\)

We can do the final removal at once when preparing for 2.0.
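
For illustration, the bridge code in question is essentially one-liners
like the following sketch (Time#toMilliseconds and Time.milliseconds are
the existing accessor/factory; the TimeBridge class itself is
hypothetical):

    import java.time.Duration;
    import org.apache.flink.api.common.time.Time;

    public final class TimeBridge {
        private TimeBridge() {}

        // Flink's Time -> Java's Duration, for migrated call sites.
        public static Duration toDuration(Time time) {
            return Duration.ofMillis(time.toMilliseconds());
        }

        // Java's Duration -> Flink's Time, for legacy call sites.
        public static Time toTime(Duration duration) {
            return Time.milliseconds(duration.toMillis());
        }
    }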

Best,
tison.

[1]https://issues.apache.org/jira/browse/FLINK-14068


Stephan Ewen  于2019年8月27日周二 上午1:19写道:

> Seems everyone is in favor in principle.
>
>   - For public APIs, I would keep Time for now (to not break the API).
> Maybe add a Duration variant and deprecate the Time variant, but not remove
> it before Flink 2.0
>   - For all runtime Java code, switch to Java's Duration now
>   - For all Scala code let's see how much we can switch to Java Durations
> without blowing up stuff. After all, like Tison said, we want to get the
> runtime Scala free in the future.
>
> On Mon, Aug 26, 2019 at 3:45 AM Jark Wu  wrote:
>
> > +1 to use Java's Duration instead of Flink's Time.
> >
> > Regarding to the Duration parsing, we have mentioned this in FLIP-54[1]
> to
> > use `org.apache.flink.util.TimeUtils` for the parsing.
> >
> > Best,
> > Jark
> >
> > [1]:
> >
> >
> https://docs.google.com/document/d/1IQ7nwXqmhCy900t2vQLEL3N2HIdMg-JO8vTzo1BtyKU/edit#heading=h.egdwkc93dn1k
> >
> > On Sat, 24 Aug 2019 at 18:24, Zhu Zhu  wrote:
> >
> > > +1 since Java Duration is more common and powerful than Flink Time.
> > >
> > > For whether to drop scala Duration for parsing duration OptionConfig, I
> > > think it's another question and should be discussed in another thread.
> > >
> > > Thanks,
> > > Zhu Zhu
> > >
> > > Becket Qin  于2019年8月24日周六 下午4:16写道:
> > >
> > > > +1, makes sense. BTW, we probably need a FLIP as this is a public API
> > > > change.
> > > >
> > > > On Sat, Aug 24, 2019 at 8:11 AM SHI Xiaogang  >
> > > > wrote:
> > > >
> > > > > +1 to replace Flink's time with Java's Duration.
> > > > >
> > > > > Besides, i also suggest to use Java's Instant for "point-in-time".
> > > > > It can take care of time units when we calculate Duration between
> > > > different
> > > > > instants.
> > > > >
> > > > > Regards,
> > > > > Xiaogang
> > > > >
> > > > > Zili Chen  于2019年8月24日周六 上午10:45写道:
> > > > >
> > > > > > Hi vino,
> > > > > >
> > > > > > I agree that it introduces extra complexity to replace
> > > Duration(Scala)
> > > > > > with Duration(Java) *in Scala code*. We could separate the usage
> > for
> > > > each
> > > > > > language and use a bridge when necessary.
> > > > > >
> > > > > > As a matter of fact, Scala concurrent APIs(including Duration)
> are
> > > used
> > > > > > more than necessary at least in flink-runtime. Also we even try
> to
> > > make
> > > > > > flink-runtime scala free.
> > > > > >
> > > > > > Best,
> > > > > > tison.
> > > > > >
> > > > > >
> > > > > > vino yang  于2019年8月24日周六 上午10:05写道:
> > > > > >
> > > > > > > +1 to replace the Time class provided by Flink with Java's
> > > Duration:
> > > > > > >
> > > > > > >
> > > > > > >- Java's Duration has better representation than the Flink's
> > > Time
> > > > > > class;
> > > > > > >- As a built-in Java class, Duration class has a clear
> > advantage
> > > > > over
> > > > > > >Java's Time class when interacting with other Java APIs and
> > > > > > third-party
> > > > > > &

Re: [ANNOUNCE] Zili Chen becomes a Flink committer

2019-09-12 Thread Zili Chen
Thanks a lot everyone for the warm welcome. Happy Mid-autumn Festival!

Best,
tison.


Leonard Xu  于2019年9月12日周四 上午11:05写道:

> Congratulations Zili Chen ! !
>
> Best,
> Leonard Xu
> > On 2019年9月12日, at 上午11:02, Yun Tang  wrote:
> >
> > Congratulations Zili
> >
> > Best
> > Yun Tang
> > 
> > From: Yun Gao <yungao...@aliyun.com.invalid>
> > Sent: Thursday, September 12, 2019 10:12
> > To: dev <dev@flink.apache.org>
> > Subject: Re: [ANNOUNCE] Zili Chen becomes a Flink committer
> >
> > Congratulations Zili!
> >
> >   Best,
> >   Yun
> >
> >
> > --
> > From:Yangze Guo 
> > Send Time:2019 Sep. 12 (Thu.) 09:38
> > To:dev 
> > Subject:Re: [ANNOUNCE] Zili Chen becomes a Flink committer
> >
> > Congratulations!
> >
> > Best,
> > Yangze Guo
> >
> > On Thu, Sep 12, 2019 at 9:35 AM Rong Rong  wrote:
> >>
> >> Congratulations Zili!
> >>
> >> --
> >> Rong
> >>
> >> On Wed, Sep 11, 2019 at 6:26 PM Hequn Cheng 
> wrote:
> >>
> >>> Congratulations!
> >>>
> >>> Best, Hequn
> >>>
> >>> On Thu, Sep 12, 2019 at 9:24 AM Jark Wu  wrote:
> >>>
> >>>> Congratulations Zili!
> >>>>
> >>>> Best,
> >>>> Jark
> >>>>
> >>>> On Wed, 11 Sep 2019 at 23:06,  wrote:
> >>>>
> >>>>> Congratulations, Zili.
> >>>>>
> >>>>>
> >>>>>
> >>>>> Best,
> >>>>>
> >>>>> Xingcan
> >>>>>
> >>>>>
> >>>>>
> >>>>> *From:* SHI Xiaogang 
> >>>>> *Sent:* Wednesday, September 11, 2019 7:43 AM
> >>>>> *To:* Guowei Ma 
> >>>>> *Cc:* Fabian Hueske ; Biao Liu <
> mmyy1...@gmail.com>;
> >>>>> Oytun Tez ; bupt_ljy ; dev <
> >>>>> dev@flink.apache.org>; user ; Till Rohrmann <
> >>>>> trohrm...@apache.org>
> >>>>> *Subject:* Re: [ANNOUNCE] Zili Chen becomes a Flink committer
> >>>>>
> >>>>>
> >>>>>
> >>>>> Congratulations!
> >>>>>
> >>>>>
> >>>>>
> >>>>> Regards,
> >>>>>
> >>>>> Xiaogang
> >>>>>
> >>>>>
> >>>>>
> >>>>> Guowei Ma  于2019年9月11日周三 下午7:07写道:
> >>>>>
> >>>>> Congratulations Zili !
> >>>>>
> >>>>>
> >>>>> Best,
> >>>>>
> >>>>> Guowei
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> Fabian Hueske  于2019年9月11日周三 下午7:02写道:
> >>>>>
> >>>>> Congrats Zili Chen :-)
> >>>>>
> >>>>>
> >>>>>
> >>>>> Cheers, Fabian
> >>>>>
> >>>>>
> >>>>>
> >>>>> Am Mi., 11. Sept. 2019 um 12:48 Uhr schrieb Biao Liu <
> >>>> mmyy1...@gmail.com>:
> >>>>>
> >>>>> Congrats Zili!
> >>>>>
> >>>>>
> >>>>>
> >>>>> Thanks,
> >>>>>
> >>>>> Biao /'bɪ.aʊ/
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> On Wed, 11 Sep 2019 at 18:43, Oytun Tez  wrote:
> >>>>>
> >>>>> Congratulations!
> >>>>>
> >>>>>
> >>>>>
> >>>>> ---
> >>>>>
> >>>>> Oytun Tez
> >>>>>
> >>>>>
> >>>>>
> >>>>> *M O T A W O R D*
> >>>>>
> >>>>> *The World's Fastest Human Translation Platform.*
> >>>>>
> >>>>> oy...@motaword.com — www.motaword.com
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> On Wed, Sep 11, 2019 at 6:36 AM bupt_ljy  wrote:
> >>>>>
> >>>>> Congratulations!
> >>>>>
> >>>>>
> >>>>>
> >>>>> Best,
> >>>>>
> >>>>> Jiayi Liao
> >>>>>
> >>>>>
> >>>>>
> >>>>> Original Message
> >>>>>
> >>>>> *Sender:* Till Rohrmann
> >>>>>
> >>>>> *Recipient:* dev; user
> >>>>>
> >>>>> *Date:* Wednesday, Sep 11, 2019 17:22
> >>>>>
> >>>>> *Subject:* [ANNOUNCE] Zili Chen becomes a Flink committer
> >>>>>
> >>>>>
> >>>>>
> >>>>> Hi everyone,
> >>>>>
> >>>>>
> >>>>>
> >>>>> I'm very happy to announce that Zili Chen (some of you might also
> know
> >>>>> him as Tison Kun) accepted the offer of the Flink PMC to become a
> >>>> committer
> >>>>> of the Flink project.
> >>>>>
> >>>>>
> >>>>>
> >>>>> Zili Chen has been an active community member for almost 16 months
> now.
> >>>>> He helped pushing the Flip-6 effort over the finish line, ported a
> lot
> >>>> of
> >>>>> legacy code tests, removed a good part of the legacy code,
> contributed
> >>>>> numerous fixes, is involved in the Flink's client API refactoring,
> >>>> drives
> >>>>> the refactoring of Flink's HighAvailabilityServices and much more.
> Zili
> >>>>> Chen also helped the community by PR reviews, reporting Flink issues,
> >>>>> answering user mails and being very active on the dev mailing list.
> >>>>>
> >>>>>
> >>>>>
> >>>>> Congratulations Zili Chen!
> >>>>>
> >>>>>
> >>>>>
> >>>>> Best, Till
> >>>>>
> >>>>> (on behalf of the Flink PMC)
>
>


[PROPOSAL] Merge NewClusterClient into ClusterClient

2019-09-17 Thread Zili Chen
Hi devs,

FLINK-14096[1] was created yesterday. It is aimed at merging the bridge
class NewClusterClient into ClusterClient, because with the effort
under FLINK-10392 this bridge class is no longer necessary.

Technically, in the current codebase every implementation of the
interface NewClusterClient is a subclass of ClusterClient, so that the
work required is no more than moving method declarations. It lets us use
the type signature ClusterClient instead of the intersection type
<C extends ClusterClient & NewClusterClient>, since the latter can only
be expressed in a type-variable context. This should not affect anything
internal in Flink scope.

However, as mentioned by Kostas in the JIRA and a previous discussion
under a commit[2], it seems that we provide some level of backward
compatibility for ClusterClient, and thus it's better to start a public
discussion here.

[1] https://issues.apache.org/jira/browse/FLINK-14096
[2]
https://github.com/apache/flink/commit/dc9e4494dddfed36432e6bbf6cd3231530bc2e01
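
For reference, a minimal sketch of what the merge amounts to (the two
method signatures follow NewClusterClient; everything else is elided):

    import java.util.concurrent.CompletableFuture;
    import org.apache.flink.api.common.JobID;
    import org.apache.flink.api.common.JobSubmissionResult;
    import org.apache.flink.runtime.jobgraph.JobGraph;
    import org.apache.flink.runtime.jobmaster.JobResult;

    public abstract class ClusterClient<T> {

        // Moved up from NewClusterClient; already implemented by
        // RestClusterClient and MiniClusterClient.
        public abstract CompletableFuture<JobSubmissionResult> submitJob(
                JobGraph jobGraph);

        public abstract CompletableFuture<JobResult> requestJobResult(
                JobID jobId);

        // ... existing ClusterClient methods remain unchanged ...
    }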


Re: [PROPOSAL] Merge NewClusterClient into ClusterClient

2019-09-17 Thread Zili Chen
Hi Jeff,

Thanks for your reply.

The ongoing client API enhancement thread[1] is mainly aimed at dealing
with the issues of our client API; as you mentioned, the current client
API is not so clean.

Because the client API naturally becomes a public & user-facing
interface, it is expected that we start a series of discussions on how
the interface should look. However, it isn't expected that we have to
talk about backward compatibility too much in this scope.

I agree that it is painful if we always keep compatibility for a
non-well-designed API, yet in this specific scenario we would bring such
an API to Public. It is mentioned in the discussion under [2] that I
think it could be about time to discuss our InterfaceAudience policy. At
least it would be a pity if we don't address this InterfaceAudience
issue towards 2.0. But let's say that could be a separate thread.

As for exposing a thin interface and moving all the implementation to an
AbstractClusterClient: I think the community consensus is towards a
ClusterClient interface, and thus there is no need for an
AbstractClusterClient. For the implementation details, it is mainly
about a series of #run methods. We will gradually exclude them from
ClusterClient, and it is the responsibility of the Executor in the
design document[3] to take care of compilation and deployment.

BTW, I took a look at the pull requests you linked to. In fact I created
a similar issue[4] and also considered simplifying the code in
flink-scala-shell. Let's move the detailed discussion to the
corresponding issues and pull requests, or start another thread then. I
don't intend to cover a lot of concerns generally in this thread :-)

Best,
tison.

[1]
https://lists.apache.org/thread.html/ce99cba4a10b9dc40eb729d39910f315ae41d80ec74f09a356c73938@%3Cdev.flink.apache.org%3E
[2]
https://github.com/apache/flink/commit/dc9e4494dddfed36432e6bbf6cd3231530bc2e01#commitcomment-34980406
[3]
https://docs.google.com/document/d/1E-8UjOLz4QPUTxetGWbU23OlsIH9VIdodpTsxwoQTs0/edit#
[4] https://issues.apache.org/jira/browse/FLINK-13961

Jeff Zhang  于2019年9月18日周三 上午10:49写道:

> Thanks for raising this discussion. Overall +1 to merge NewClusterClient
> into ClusterClient.
>
> 1. I think it is OK to break the backward compatibility. The current
> client api is not so clean, which has already caused issues for
> downstream projects and flink itself.
> In flink scala shell, I notice this kind of non-readable code
> Option[Either
> [MiniCluster , ClusterClient[_]]])
>
> https://github.com/apache/flink/blob/master/flink-scala-shell/src/main/scala/org/apache/flink/api/scala/FlinkShell.scala#L138
> I also created tickets and PRs to try to simplify it.
> https://github.com/apache/flink/pull/8546
> https://github.com/apache/flink/pull/8533
>Personally I don't think we need to keep backward compatibility for
> non-well-designed api, otherwise it will bring lots of unnecessary
> overhead.
>
> 2. Another concern is that I notice there are many implementation
> details in
> ClusterClient. I think we should just expose a thin interface, so maybe
> we
> can create an interface ClusterClient which includes as few methods as
> possible, and move all the implementation to an AbstractClusterClient.
>
>
>
> Zili Chen  于2019年9月18日周三 上午9:46写道:
>
> > Hi devs,
> >
> > FLINK-14096[1] was created yesterday. It is aimed at merging the bridge
> > class NewClusterClient into ClusterClient, because with the effort
> > under FLINK-10392 this bridge class is no longer necessary.
> >
> > Technically, in the current codebase every implementation of the
> > interface NewClusterClient is a subclass of ClusterClient, so that the
> > work required is no more than moving method declarations. It lets us
> > use the type signature ClusterClient instead of the intersection type
> > <C extends ClusterClient & NewClusterClient>, since the latter can
> > only be expressed in a type-variable context. This should not affect
> > anything internal in Flink scope.
> >
> > However, as mentioned by Kostas in the JIRA and a previous discussion
> > under a commit[2], it seems that we provide some levels of backward
> > compatibility for ClusterClient and thus it's better to start a public
> > discussion here.
> >
> > There are two concerns from my side.
> >
> > 1. How much impact this proposal brings to users programming directly
> > to ClusterClient?
> >
> > The specific changes here are add two methods `submitJob` and
> > `requestJobResult` which are already implemented by RestClusterClient
> > and MiniClusterClient. Users would only be affected if they create
> > a class that inherits ClusterClient and doesn't implement these
> > methods. Besides, users who create a class that implements
> > NewClusterClient would be affected by the removal of NewClusterClient.
> >
> > If we have to provide bac

Re: [PROPOSAL] Merge NewClusterClient into ClusterClient

2019-09-18 Thread Zili Chen
Hi Xiaogang,

Thanks for your reply.

According to the feature discussion thread[1], client API enhancement is
a planned feature towards 1.10, and thus I think this thread is valid if
we can reach a consensus and introduce the new client API in this
development cycle.

Best,
tison.

[1]
https://lists.apache.org/thread.html/22639ca7de62a18f50e90db53e73910bd99b7f00c82f7494f4cb035f@%3Cdev.flink.apache.org%3E


SHI Xiaogang  于2019年9月18日周三 下午3:03写道:

> Hi Tison,
>
> Thanks for bringing this.
>
> I think it's fine to break the backward compatibility of the client API
> now that
> ClusterClient is not well designed for public usage.
> But from my perspective, we should postpone any modification to existing
> interfaces until we come to an agreement on new client API. Otherwise, our
> users may adapt their implementation more than once.
>
> Regards,
> Xiaogang
>
> Jeff Zhang  于2019年9月18日周三 上午10:49写道:
>
> > Thanks for raising this discussion. Overall +1 to merge NewClusterClient
> > into ClusterClient.
> >
> > 1. I think it is OK to break the backward compatibility. The current
> > client api is not so clean, which has already caused issues for
> > downstream projects and flink itself.
> > In the flink scala shell, I notice this kind of non-readable code:
> > Option[Either[MiniCluster, ClusterClient[_]]]
> >
> https://github.com/apache/flink/blob/master/flink-scala-shell/src/main/scala/org/apache/flink/api/scala/FlinkShell.scala#L138
> > I also created tickets and PRs to try to simplify it:
> > https://github.com/apache/flink/pull/8546
> > https://github.com/apache/flink/pull/8533
> > Personally I don't think we need to keep backward compatibility for a
> > non-well-designed api, otherwise it will bring lots of unnecessary
> > overhead.
> >
> > 2. Another concern is that I notice there are many implementation
> > details in ClusterClient. I think we should just expose a thin
> > interface, so maybe we can create an interface ClusterClient which
> > includes as few methods as possible, and move all the implementation
> > to an AbstractClusterClient.
> >
> >
> >
> >
> >
> >
> >
> > Zili Chen  于2019年9月18日周三 上午9:46写道:
> >
> > > Hi devs,
> > >
> > > FLINK-14096[1] was created yesterday. It is aimed at merging the
> > > bridge class NewClusterClient into ClusterClient, because with the
> > > effort under FLINK-10392 this bridge class is no longer necessary.
> > >
> > > Technically, in the current codebase every implementation of the
> > > interface NewClusterClient is a subclass of ClusterClient, so the work
> > > required is no more than moving method declarations. It helps us use
> > > the type signature ClusterClient instead of something like
> > > <T extends ClusterClient & NewClusterClient>; the latter cannot even
> > > be expressed if we aren't in a type variable context. This should not
> > > affect anything internal in Flink scope.
> > >
> > > However, as mentioned by Kostas in the JIRA and a previous discussion
> > > under a commit[2], it seems that we provide some levels of backward
> > > compatibility for ClusterClient and thus it's better to start a public
> > > discussion here.
> > >
> > > There are two concerns from my side.
> > >
> > > 1. How much impact does this proposal bring to users programming
> > > directly against ClusterClient?
> > >
> > > The specific changes here are adding two methods, `submitJob` and
> > > `requestJobResult`, which are already implemented by RestClusterClient
> > > and MiniClusterClient. Users would only be affected if they create
> > > a class that inherits from ClusterClient and doesn't implement these
> > > methods. Besides, users who create a class that implements
> > > NewClusterClient would be affected by the removal of NewClusterClient.
> > >
> > > If we have to provide backward compatibility and the impact is no
> > > further than those above, we can deprecate NewClusterClient, merge
> > > the methods into ClusterClient with a dummy default like throw
> > > Exception.
> > >
> > > 2. Why do we provide backward compatibility for ClusterClient?
> > >
> > > It already surprised Kostas and me, since we think ClusterClient is a
> > > totally internal class which we can evolve regardless of api
> > > stability. Our community promises api stability by marking classes
> > > and/or methods as @Public/@PublicEvolving. It is weird and even
> > > dangerous that we are somehow forced to provide backward compatibility
> > > for classes without any annotation.
> > >
> > > Besides, as I mention in [2], users who program directly against
> > > internal classes/interfaces should be prepared to make adaptations
> > > when bumping the version of Flink.
> > >
> > > Best,
> > > tison.
> > >
> > > [1] https://issues.apache.org/jira/browse/FLINK-14096
> > > [2]
> > >
> > >
> >
> https://github.com/apache/flink/commit/dc9e4494dddfed36432e6bbf6cd3231530bc2e01
> > >
> >
> >
> > --
> > Best Regards
> >
> > Jeff Zhang
> >
>


Confluence permission for FLIP creation

2019-09-19 Thread Zili Chen
Hi devs,

I'd like to create a page about the ongoing JobClient FLIP. Could you
grant me Confluence permission for FLIP creation?

My Confluence ID is tison.

Best,
tison.


Re: Confluence permission for FLIP creation

2019-09-19 Thread Zili Chen
Thanks!

Best,
tison.


Till Rohrmann  于2019年9月19日周四 下午5:34写道:

> Granted. You should now be able to create pages Tison.
>
> Cheers,
> Till
>
> On Thu, Sep 19, 2019 at 11:01 AM Zili Chen  wrote:
>
> > Hi devs,
> >
> > I'd like to create a page about the ongoing JobClient FLIP. Could you
> > grant me Confluence permission for FLIP creation?
> >
> > My Confluence ID is tison.
> >
> > Best,
> > tison.
> >
>


Re: [PROPOSAL] Merge NewClusterClient into ClusterClient

2019-09-19 Thread Zili Chen
Thanks for your suggestions and information.

I'll therefore reopen the corresponding pull request.

Best,
tison.


Till Rohrmann  于2019年9月18日周三 下午10:55写道:

> No reason to keep the separation. The NewClusterClient interface was only
> introduced to add new methods and not having to implement them for the
> other ClusterClient implementations.
>
> Cheers,
> Till
>
> On Wed, Sep 18, 2019 at 3:17 PM Aljoscha Krettek 
> wrote:
>
> > I agree that NewClusterClient and ClusterClient can be merged now that
> > there is no pre-FLIP-6 code base anymore.
> >
> > Side note, there are a lot of methods in ClusterClient that should not
> > really be there, in my opinion:
> >  - all the getOptimizedPlan*() method
> >  - the run() methods. In the end, only submitJob should be required
> >
> > We should also see what Till (cc’ed) says, maybe he has an opinion on why
> > the separation should be kept.
> >
> > Best,
> > Aljoscha
> >
> > > On 18. Sep 2019, at 11:54, Zili Chen  wrote:
> > >
> > > Hi Xiaogang,
> > >
> > > Thanks for your reply.
> > >
> > > According to the feature discussion thread[1], client API enhancement
> > > is a planned feature for 1.10, and thus I think this thread is valid
> > > if we can reach a consensus and introduce the new client API in this
> > > development cycle.
> > >
> > > Best,
> > > tison.
> > >
> > > [1]
> > >
> >
> https://lists.apache.org/thread.html/22639ca7de62a18f50e90db53e73910bd99b7f00c82f7494f4cb035f@%3Cdev.flink.apache.org%3E
> > >
> > >
> > > SHI Xiaogang  于2019年9月18日周三 下午3:03写道:
> > >
> > >> Hi Tison,
> > >>
> > >> Thanks for bringing this.
> > >>
> > >> I think it's fine to break the backward compatibility of the client
> > >> API now that ClusterClient is not well designed for public usage.
> > >> But from my perspective, we should postpone any modification to
> existing
> > >> interfaces until we come to an agreement on new client API. Otherwise,
> > our
> > >> users may adapt their implementation more than once.
> > >>
> > >> Regards,
> > >> Xiaogang
> > >>
> > >> Jeff Zhang  于2019年9月18日周三 上午10:49写道:
> > >>
> > >>> Thanks for raising this discussion. Overall +1 to merge
> > NewClusterClient
> > >>> into ClusterClient.
> > >>>
> > >>> 1. I think it is OK to break the backward compatibility. The current
> > >>> client api is not so clean, which has already caused issues for
> > >>> downstream projects and flink itself.
> > >>> In the flink scala shell, I notice this kind of non-readable code:
> > >>> Option[Either[MiniCluster, ClusterClient[_]]]
> > >>>
> https://github.com/apache/flink/blob/master/flink-scala-shell/src/main/scala/org/apache/flink/api/scala/FlinkShell.scala#L138
> > >>> I also created tickets and PRs to try to simplify it:
> > >>> https://github.com/apache/flink/pull/8546
> > >>> https://github.com/apache/flink/pull/8533
> > >>> Personally I don't think we need to keep backward compatibility for
> > >>> a non-well-designed api, otherwise it will bring lots of unnecessary
> > >>> overhead.
> > >>>
> > >>> 2. Another concern is that I notice there are many implementation
> > >>> details in ClusterClient. I think we should just expose a thin
> > >>> interface, so maybe we can create an interface ClusterClient which
> > >>> includes as few methods as possible, and move all the implementation
> > >>> to an AbstractClusterClient.
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>
> > >>> Zili Chen  于2019年9月18日周三 上午9:46写道:
> > >>>
> > >>>> Hi devs,
> > >>>>
> > >>>> FLINK-14096[1] was created yesterday. It is aimed at merging the
> > >>>> bridge class NewClusterClient into ClusterClient, because with the
> > >>>> effort under FLINK-10392 this bridge class is no longer necessary.
> > >>>>
> > >>>> Technically, in the current codebase every implementation of

Re: How to prevent from launching 2 jobs at the same time

2019-09-22 Thread Zili Chen
The situation is as Dian said. Flink identifies jobs by job id instead of
job name.

However, I think it is still a valid question whether, as an alternative,
Flink could identify jobs by job name and leave the work of distinguishing
jobs by name to users. The advantages of this approach include a more
readable display and interaction, as well as reducing some hardcoded work
on job ids; for example, we always set the job id to new JobID(0, 0) in
standalone per-job mode to get the same ZK path.
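
For illustration, pinning the job id via the REST API that Dian mentions
below could look roughly like this (a hedged sketch: the endpoint path and
the "jobId" field are from my memory of the jar-run request body, and the
jar id and entry class are made up):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class JarRunWithFixedJobId {
    public static void main(String[] args) throws Exception {
        // POST /jars/:jarid/run with an explicit job id; a second submission
        // under the same id should then fail as a duplicate.
        String body = "{\"entryClass\":\"com.example.MyJob\","
                + "\"jobId\":\"00000000000000000000000000000001\"}";
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://jobmanager:8081/jars/myjob.jar/run"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + ": " + response.body());
    }
}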

Best,
tison.


Dian Fu  于2019年9月23日周一 上午10:55写道:

> Hi David,
>
> The jobs are identified by job id, not by job name internally in Flink and
> so it will only check if there are two jobs with the same job id.
>
> If you submit the job via CLI[1], I'm afraid there are still no built-in
> ways provided as currently the job id is generated randomly when submitting
> a job via CLI and the generated job id has nothing to do with the job name.
> However, if you submit the job via REST API [2], it did provide an option
> to specify the job id when submitting a job. You can generate the job id by
> yourself.
>
> Regards,
> Dian
>
> [1] https://ci.apache.org/projects/flink/flink-docs-master/ops/cli.html
> [2]
> https://ci.apache.org/projects/flink/flink-docs-master/monitoring/rest_api.html#jars-jarid-run
>
> 在 2019年9月23日,上午4:57,David Morin  写道:
>
> Hi,
>
> What is the best way to prevent from launching 2 jobs with the same name
> concurrently ?
> Instead of doing a check in the script that starts the Flink job, I would
> prefer to stop a job if another one with the same name is in progress
> (Exception or something like that).
>
> David
>
>
>


Re: [DISCUSS] FLIP-73: Introducing Executors for job submission

2019-09-24 Thread Zili Chen
Hi Thomas,

>Should the new Executor execute method be defined as asynchronous? It could
>return a job handle to interact with the job and the legacy environments
>can still block to retain their semantics.

During our discussion there will be a method

executeAsync(...): CompletableFuture<JobClient>

where JobClient can be regarded as job handle in your context.
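
As a rough shape (all names here are placeholders for this discussion, not
the final FLIP-73 API), I imagine something like:

import java.util.concurrent.CompletableFuture;

// Placeholder types standing in for the user-program representation and
// the FLIP-74 job handle; the real names may differ.
interface Pipeline {}
interface JobClient {}

public interface Executor {
    // Asynchronously submits the job described by the user program; the
    // future completes with a JobClient once the cluster accepts the job.
    CompletableFuture<JobClient> executeAsync(Pipeline pipeline);
}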

I think we retain

execute(...): JobExecutionResult

just for backward compatibility, because this effort targets 1.10, which is
not a major version bump.

BTW, I am drafting the details of JobClient (as FLIP-74). I will start a
separate discussion thread on that interface as soon as I finish an early
version.

Best,
tison.


Thomas Weise  于2019年9月25日周三 上午1:17写道:

> Thanks for the proposal. These changes will make it significantly easier to
> programmatically use Flink in downstream frameworks.
>
> Should the new Executor execute method be defined as asynchronous? It could
> return a job handle to interact with the job and the legacy environments
> can still block to retain their semantics.
>
> (The blocking execution has also made things more difficult in Beam, we
> could simply switch to use Executor directly.)
>
> Thomas
>
>
> On Tue, Sep 24, 2019 at 6:48 AM Kostas Kloudas 
> wrote:
>
> > Hi all,
> >
> > In the context of the discussion about introducing the Job Client API
> [1],
> > there was a side-discussion about refactoring the way users submit jobs
> in
> > Flink. There were many different interesting ideas on the topic and 3
> > design documents that were trying to tackle both the issue about code
> > submission and the Job Client API.
> >
> > This discussion thread aims at the job submission part and proposes the
> > approach of introducing the Executor abstraction which will abstract the
> > job submission logic from the Environments and will make it API agnostic.
> >
> > The FLIP can be found at [2].
> >
> > Please keep the discussion here, in the mailing list.
> >
> > Looking forward to your opinions,
> > Kostas
> >
> > [1]
> >
> >
> https://lists.apache.org/thread.html/ce99cba4a10b9dc40eb729d39910f315ae41d80ec74f09a356c73938@%3Cdev.flink.apache.org%3E
> > [2]
> >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-73%3A+Introducing+Executors+for+job+submission
> >
>


Re: [DISCUSS] FLIP-73: Introducing Executors for job submission

2019-09-24 Thread Zili Chen
>Since Executor is a new interface, why is backward compatibility a concern?

The backward compatibility concern is about
(Stream)ExecutionEnvironment#execute. You're right that, on the Executor
side, we don't need to stick to blocking to return a JobExecutionResult;
we could implement env.execute on top of a single

Executor#execute (or with the suffix Async): CompletableFuture<JobClient>

what do you think @Kostas Kloudas ?
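
For illustration, a minimal sketch of such a blocking env.execute on top of
the async method (the method names here, e.g. getJobExecutionResult, are
assumptions for the discussion, not a settled API):

// Inside (Stream)ExecutionEnvironment: blocking execute() as a thin wrapper.
public JobExecutionResult execute(String jobName) throws Exception {
    JobClient jobClient = executeAsync(jobName).get(); // wait for submission
    return jobClient.getJobExecutionResult().get();    // block until the job finishes
}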

>I could see that become an issue later when replacing Executor execute with
>executeAsync. Or are both targeted for 1.10?

IIUC both Executors and JobClient are targeted for 1.10.


Thomas Weise  于2019年9月25日周三 上午2:39写道:

> Since Executor is a new interface, why is backward compatibility a concern?
>
> I could see that become an issue later when replacing Executor execute with
> executeAsync. Or are both targeted for 1.10?
>
>
> On Tue, Sep 24, 2019 at 10:24 AM Zili Chen  wrote:
>
> > Hi Thomas,
> >
> > >Should the new Executor execute method be defined as asynchronous? It
> > could
> > >return a job handle to interact with the job and the legacy environments
> > >can still block to retain their semantics.
> >
> > During our discussion there will be a method
> >
> > executeAsync(...): CompletableFuture<JobClient>
> >
> > where JobClient can be regarded as job handle in your context.
> >
> > I think we retain
> >
> > execute(...): JobExecutionResult
> >
> > just for backward compatibility, because this effort targets 1.10,
> > which is not a major version bump.
> >
> > BTW, I am drafting the details of JobClient (as FLIP-74). I will start
> > a separate discussion thread on that interface as soon as I finish an
> > early version.
> >
> > Best,
> > tison.
> >
> >
> > Thomas Weise  于2019年9月25日周三 上午1:17写道:
> >
> > > Thanks for the proposal. These changes will make it significantly
> easier
> > to
> > > programmatically use Flink in downstream frameworks.
> > >
> > > Should the new Executor execute method be defined as asynchronous? It
> > could
> > > return a job handle to interact with the job and the legacy
> environments
> > > can still block to retain their semantics.
> > >
> > > (The blocking execution has also made things more difficult in Beam, we
> > > could simply switch to use Executor directly.)
> > >
> > > Thomas
> > >
> > >
> > > On Tue, Sep 24, 2019 at 6:48 AM Kostas Kloudas 
> > > wrote:
> > >
> > > > Hi all,
> > > >
> > > > In the context of the discussion about introducing the Job Client API
> > > [1],
> > > > there was a side-discussion about refactoring the way users submit
> jobs
> > > in
> > > > Flink. There were many different interesting ideas on the topic and 3
> > > > design documents that were trying to tackle both the issue about code
> > > > submission and the Job Client API.
> > > >
> > > > This discussion thread aims at the job submission part and proposes
> the
> > > > approach of introducing the Executor abstraction which will abstract
> > the
> > > > job submission logic from the Environments and will make it API
> > agnostic.
> > > >
> > > > The FLIP can be found at [2].
> > > >
> > > > Please keep the discussion here, in the mailing list.
> > > >
> > > > Looking forward to your opinions,
> > > > Kostas
> > > >
> > > > [1]
> > > >
> > > >
> > >
> >
> https://lists.apache.org/thread.html/ce99cba4a10b9dc40eb729d39910f315ae41d80ec74f09a356c73938@%3Cdev.flink.apache.org%3E
> > > > [2]
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-73%3A+Introducing+Executors+for+job+submission
> > > >
> > >
> >
>


[DISCUSS] Expose multiple level clients

2019-09-25 Thread Zili Chen
Hi all,

While drafting FLIP-74 I noticed that a job level client (called JobClient)
is always retrieved from a Flink application cluster level client (called
ClusterClient), which is in turn always retrieved from an external cluster
(YARN, Mesos, K8s, etc.) level client (called ClusterDescriptor).

A Flink job management platform possibly requires all levels of client
mentioned above, for customized Flink cluster deployment, Flink cluster
management, job submission and job management. These functions can be
sorted into the different levels of client as follows.

ClusterDescriptor: Flink cluster deployment
ClusterClient (retrieved from ClusterDescriptor): Flink cluster management
and job submission
JobClient (retrieved from ClusterClient): job management
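
In code, the three retrieval steps could look roughly like the following
(a hedged sketch: the descriptor and cluster-client signatures are loosely
based on the current YARN classes, and submitJob returning a future of
JobClient is an assumption from the FLIP-74 draft, not the final API):

// 1) external-cluster level -> 2) Flink-cluster level -> 3) job level
JobClient deployAndSubmit(
        ClusterDescriptor<ApplicationId> descriptor,
        ClusterSpecification spec,
        JobGraph jobGraph) throws Exception {
    ClusterClient<ApplicationId> clusterClient =
            descriptor.deploySessionCluster(spec);  // deploy, get cluster client
    return clusterClient.submitJob(jobGraph).get(); // submit, get job client
}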

Recently we have FLIP-73 and FLIP-74 working on the client API side:
FLIP-74 is aimed at the job level client called JobClient, which takes
responsibility for job management, and FLIP-73 is aimed at a dedicated job
executor which takes responsibility for job submission. However, for full
functionality in handling Flink clusters & jobs, Flink cluster management
and Flink cluster deployment still require public interfaces. These
interfaces are mainly used by downstream project developers instead of
users who are only interested in Flink jobs.

Further, we already use ClusterClient and ClusterDescriptor in CliFrontend,
sql-client, scala-shell and so on, which can be regarded as downstream
projects hosted in the Flink repo. Given this observation, there is no
reason not to expose APIs for Flink cluster management and Flink cluster
deployment.

So here come two questions:

1. Do we want to expose multiple levels of clients as described above? If
not, why? If so, what does the plan look like?
2. Does the Executor introduced in the ongoing FLIP-73 break the
multi-level client layout described above? Since ClusterClient already
implements functions for job submission, where is the Executor in the
layout above? Does it overlap with the existing client concepts?

Best,
tison.


[DISCUSS] FLIP-74: Flink JobClient API

2019-09-25 Thread Zili Chen
Hi all,

Following the discussion about introducing a Flink JobClient API[1], we
drafted FLIP-74[2] to gather thoughts and work towards standard, public,
user-facing interfaces.

This discussion thread aims at standardizing the job level client API. But
I'd like to emphasize that how to retrieve a JobClient may cause further
discussion on the different levels of clients exposed by Flink, so a
follow-up thread will be started later to coordinate FLIP-73 and FLIP-74
on the exposure issue.
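
To make the scope concrete, here is a minimal sketch of what such a job
level client could expose (the method set and packages are assumptions for
discussion, not the final FLIP-74 interface):

import java.util.concurrent.CompletableFuture;
import org.apache.flink.api.common.JobID;
// JobStatus currently lives in a runtime package; assuming here that it
// (or an equivalent enum) would be exposed for a public client API.
import org.apache.flink.runtime.jobgraph.JobStatus;

public interface JobClient {
    JobID getJobID();
    CompletableFuture<JobStatus> getJobStatus();
    CompletableFuture<Void> cancel();
    CompletableFuture<String> stopWithSavepoint(String savepointDirectory);
    CompletableFuture<String> triggerSavepoint(String savepointDirectory);
}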

Looking forward to your opinions.

Best,
tison.

[1]
https://lists.apache.org/thread.html/ce99cba4a10b9dc40eb729d39910f315ae41d80ec74f09a356c73938@%3Cdev.flink.apache.org%3E
[2]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-74%3A+Flink+JobClient+API


Re: CiBot Update

2019-09-25 Thread Zili Chen
Aug 22, 2019 at 5:02 PM Zhu Zhu <reed...@gmail.com> wrote:
> > >>>>>>>
> > >>>>>>>> Thanks Chesnay for the CI improvement!
> > >>>>>>>> It is very helpful.
> > >>>>>>>>
> > >>>>>>>> Thanks,
> > >>>>>>>> Zhu Zhu
> > >>>>>>>>
> > >>>>>>>> zhijiang <wangzhijiang...@aliyun.com.invalid> 于2019年8月22日周四 下午4:18写道:
> > >>>>>>>>
> > >>>>>>>>> It is really very convenient now. Valuable work, Chesnay!
> > >>>>>>>>>
> > >>>>>>>>> Best,
> > >>>>>>>>> Zhijiang
> > >>>>>>>>> --------------------------------------------------------------
> > >>>>>>>>> From: Till Rohrmann <trohrm...@apache.org>
> > >>>>>>>>> Send Time: 2019年8月22日(星期四) 10:13
> > >>>>>>>>> To: dev <dev@flink.apache.org>
> > >>>>>>>>> Subject: Re: CiBot Update
> > >>>>>>>>>
> > >>>>>>>>> Thanks for the continuous work on the CiBot Chesnay!
> > >>>>>>>>>
> > >>>>>>>>> Cheers,
> > >>>>>>>>> Till
> > >>>>>>>>>
> > >>>>>>>>> On Thu, Aug 22, 2019 at 9:47 AM Jark Wu <imj...@gmail.com> wrote:
> > >>>>>>>>>
> > >>>>>>>>>> Great work! Thanks Chesnay!
> > >>>>>>>>>>
> > >>>>>>>>>> On Thu, 22 Aug 2019 at 15:42, Xintong Song <tonysong...@gmail.com> wrote:
> > >>>>>>>>>>
> > >>>>>>>>>>> The re-triggering travis feature is so convenient. Thanks Chesnay~!
> > >>>>>>>>>>>
> > >>>>>>>>>>> Thank you~
> > >>>>>>>>>>>
> > >>>>>>>>>>> Xintong Song
> > >>>>>>>>>>>
> > >>>>>>>>>>> On Thu, Aug 22, 2019 at 9:26 AM Stephan Ewen <se...@apache.org> wrote:
> > >>>>>>>>>>>
> > >>>>>>>>>>>> Nice, thanks!
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> On Thu, Aug 22, 2019 at 3:59 AM Zili Chen <wander4...@gmail.com> wrote:
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>> Thanks for your announcement. Nice work!
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Best,
> > >>>>>>>>>>>>> tison.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> vino yang <yanghua1...@gmail.com> 于2019年8月22日周四 上午8:14写道:
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>> +1 for "@flinkbot run travis", it is very convenient.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Chesnay Schepler <ches...@apache.org> 于2019年8月21日周三 下午9:12写道:
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> Hi everyone,
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> this is an update on recent changes to the CI bot.
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> The bot now cancels builds if a new commit was added to a PR,
> > >>>>>>>>>>>>>>> and cancels all builds if the PR was closed.
> > >>>>>>>>>>>>>>> (This was implemented a while ago; I'm just mentioning it
> > >>>>>>>>>>>>>>> again for discoverability.)
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> Additionally, starting today you can now re-trigger a Travis
> > >>>>>>>>>>>>>>> run by writing a comment "@flinkbot run travis"; this means
> > >>>>>>>>>>>>>>> you no longer have to commit an empty commit or do other
> > >>>>>>>>>>>>>>> shenanigans to get another build running.
> > >>>>>>>>>>>>>>> Note that this will /not/ work if the PR was re-opened, until
> > >>>>>>>>>>>>>>> at least 1 new build was triggered by a push.


[DISCUSS] Remove uncompleted YARNHighAvailabilityService

2019-09-26 Thread Zili Chen
Hi devs,

I noticed that there are several stale & uncompleted high-availability
service implementations, so I am starting this thread to see whether or
not we can remove them, in order to have a clean codebase to work on for
the ongoing high-availability refactoring effort[1].

Below are all of the classes I noticed.

- YarnHighAvailabilityServices
- AbstractYarnNonHaServices
- YarnIntraNonHaMasterServices
- YarnPreConfiguredMasterNonHaServices
- SingleLeaderElectionService
- FsNegativeRunningJobsRegistry
(as well as their dedicated tests)

Any suggestion?
Best,
tison.

[1] https://issues.apache.org/jira/browse/FLINK-10333


Re: [DISCUSS] Expose multiple level clients

2019-09-26 Thread Zili Chen
Thanks for your reply Kostas!

Sorry for the late reply. The most unclear point about the multi-layered
clients is where the Executor sits.
I will also split the corresponding replies into the FLIP-73 and FLIP-74
dedicated threads.

Best,
tison.


Kostas Kloudas  于2019年9月25日周三 下午9:31写道:

> Hi Tison,
>
> I shared some thoughts on this topic in the FLIP-74 thread, as some of
> the questions are related to that.
>
> Cheers,
> Kostas
>
> On Wed, Sep 25, 2019 at 12:30 PM Zili Chen  wrote:
> >
> > Hi all,
> >
> > While drafting FLIP-74 I notice that a job level client(called
> JobClient) is always retrieved from a
> > Flink application cluster level client(called ClusterClient), which is
> then always retrieved from a
> > extern cluster(YARN, mesos, k8s, etc.) level client(called
> ClusterDescriptor).
> >
> > A Flink job management platform possibly requires all levels of client
> mentioned above for
> > customized Flink cluster deployment, Flink cluster management, job
> submission and job
> > management. They can be sorted into interfaces of different level of
> client mentioned above.
> >
> > ClusterDescriptor: Flink cluster deployment
> > ClusterClient(retrieved from ClusterDescriptor): Flink cluster
> management and job submission
> > JobClient(retrieved from ClusterClient): job management
> >
> > Recently we have FLIP-73 and FLIP-74 working on client API side, FLIP-74
> is aimed at the job
> > level client called JobClient which take responsibility of job
> management. FLIP-73 is aimed at
> > a dedicated job executor which take responsibility of job submission.
> However, for full functions to
> > handle Flink clusters & jobs, Flink cluster management and Flink cluster
> deployment still require
> > public interface.These interface is mainly used for downstream project
> developer instead of users
> > who are only interested in Flink job.
> >
> > Further, we already used ClusterClient and ClusterDescriptor in
> CliFrontend, sql-client,
> > scala-shell and so on, which can be regarded as downstream project
> hosted in Flink repo.
> > Given this observation, there is no reason we don't expose API for Flink
> cluster management
> > and Flink cluster deployment.
> >
> > So here comes two question:
> >
> > 1. Do we want to expose multiple level clients as described above? If
> no, why? If so, what does
> > the plan look like?
> > 2. Does Executor introduced in the ongoing FLIP-73 break multiple level
> clients layout
> > described above? Since ClusterClient already implements functions for
> job submission, where
> > is Executor in the layout above? Does it overlap with existing client
> concept?
> >
> > Best,
> > tison.
>


Re: [DISCUSS] FLIP-74: Flink JobClient API

2019-09-26 Thread Zili Chen
future everything is DataStream.
> Essentially, I think of these as layers of an onion with the clients
> being close to the core. The higher you go, the more functionality is
> included and hidden from the public eye.
>
> Point iii) by the way is just a thought and by no means final. I also
> like the idea of multi-layered clients so this may spark up the
> discussion.
>
> Cheers,
> Kostas
>
> On Wed, Sep 25, 2019 at 2:21 PM Aljoscha Krettek 
> wrote:
> >
> > Hi Tison,
> >
> > Thanks for proposing the document! I had some comments on the document.
> >
> > I think the only complex thing that we still need to figure out is how
> to get a JobClient for a job that is already running. As you mentioned in
> the document. Currently I’m thinking that its ok to add a method to
> Executor for retrieving a JobClient for a running job by providing an ID.
> Let’s see what Kostas has to say on the topic.
> >
> > Best,
> > Aljoscha
> >
> > > On 25. Sep 2019, at 12:31, Zili Chen  wrote:
> > >
> > > Hi all,
> > >
> > > Summary from the discussion about introducing Flink JobClient API[1] we
> > > draft FLIP-74[2] to
> > > gather thoughts and towards a standard public user-facing interfaces.
> > >
> > > This discussion thread aims at standardizing job level client API. But
> I'd
> > > like to emphasize that
> > > how to retrieve JobClient possibly causes further discussion on
> different
> > > level clients exposed from
> > > Flink so that a following thread will be started later to coordinate
> > > FLIP-73 and FLIP-74 on
> > > expose issue.
> > >
> > > Looking forward to your opinions.
> > >
> > > Best,
> > > tison.
> > >
> > > [1]
> > >
> https://lists.apache.org/thread.html/ce99cba4a10b9dc40eb729d39910f315ae41d80ec74f09a356c73938@%3Cdev.flink.apache.org%3E
> > > [2]
> > >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-74%3A+Flink+JobClient+API
> >
>


Re: [DISCUSS] FLIP-74: Flink JobClient API

2019-09-27 Thread Zili Chen
About JobCluster

Actually I am not quite sure what we gain from the DETACHED configuration
on the cluster side.
In fact we don't have a NON-DETACHED JobCluster in our codebase, right?

This brings me to one major question we have to answer first.

*What JobCluster conceptually is exactly*

Related discussion can be found in JIRA[1] and mailing list[2]. Stephan
gives a nice
description of JobCluster:

Two things to add:

- The job mode is very nice in the way that it runs the client inside the
cluster (in the same image/process that is the JM) and thus unifies both
applications and what the Spark world calls the "driver mode".

- Another thing I would add is that during the FLIP-6 design, we were
thinking about setups where Dispatcher and JobManager are separate
processes. A Yarn or Mesos Dispatcher of a session could run independently
(even as privileged processes executing no code). Then the "per-job" mode
could still be helpful: when a job is submitted to the dispatcher, it
launches the JM again in a per-job mode, so that JM and TM processes are
bound to the job only. For higher security setups, it is important that
processes are not reused across jobs.

However, currently in "per-job" mode we generate JobGraph in client side,
launching
the JobCluster and retrieve the JobGraph for execution. So actually, we
don't "run the
client inside the cluster".

Besides, refer to the discussion with Till[1], it would be helpful we
follow the same process
of session mode for that of "per-job" mode in user perspective, that we
don't use
OptimizedPlanEnvironment to create JobGraph, but directly deploy Flink
cluster in env.execute.

Generally, 2 points:

1. Run the Flink job by invoking the user main method and executing it
throughout, instead of creating the JobGraph from the main class.
2. Run the client inside the cluster.

If 1 and 2 are implemented, there is obviously no need for a DETACHED mode
on the cluster side, because we just shut down the cluster on the exit of
the client that runs inside the cluster. Whether or not the result is
delivered is up to user code.

[1]
https://issues.apache.org/jira/browse/FLINK-14051?focusedCommentId=16931388&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16931388
[2]
https://lists.apache.org/x/thread.html/e8f14a381be6c027e8945f884c3cfcb309ce49c1ba557d3749fca495@%3Cdev.flink.apache.org%3E


Zili Chen  于2019年9月27日周五 下午2:13写道:

> Thanks for your replies Kostas & Aljoscha!
>
> Below are replies point by point.
>
> 1. For DETACHED mode, what I said there is about the DETACHED mode on the
> client side.
> There are two configurations that overload the item DETACHED[1].
>
> On the client side, it means whether or not client.submitJob blocks until
> the job execution result. Because client.submitJob returns a
> CompletableFuture<JobClient>, NON-DETACHED has no power at all. The caller
> of submitJob makes the decision whether or not to block to get the
> JobClient and request the job execution result. If the client crashes, it
> is a user scope exception that should be handled in user code; if the
> client loses the connection to the cluster, we have a retry times and
> interval configuration that automatically retries and throws a user scope
> exception if exceeded.
>
> Your comment about polling for the result or the job result sounds like a
> concern on the cluster side.
>
> On the cluster side, DETACHED mode exists only in the JobCluster. If
> DETACHED is configured, the JobCluster exits once the job finishes; if
> NON-DETACHED is configured, the JobCluster exits once the job execution
> result is delivered. FLIP-74 doesn't stick to changes in this scope; it is
> just left as is.
>
> However, it is an interesting part and we can revisit this implementation
> a bit.
>
> 
>
> 2. The retrieval of the JobClient is so important that if we don't have a
> way to retrieve a JobClient, it is a dumb public user-facing interface
> (what a strange state :P).
>
> About the retrieval of the JobClient, as mentioned in the document, two
> ways should be supported:
>
> (1). Retrieved as the return type of job submission.
> (2). Retrieve a JobClient of an existing job (with the job id).
>
> I highly respect your thoughts about how Executors should be and your
> thoughts on multi-layered clients.
> Although (2) is not supported by public interfaces per the summary of the
> discussion above, we can discuss a bit the place of Executors in the
> multi-layered clients and find a way to retrieve a JobClient of an
> existing job with the public client API. I will comment in the FLIP-73
> thread[2] since it is almost entirely about Executors.
>
> Best,
> tison.
>
> [1]
> https://docs.google.com/document/d/1E-8UjOLz4QPUTxetGWbU23OlsIH9VIdodpTsxwoQTs0/edit?disco=DnLLvM8
> [2]
> https://lists.apache.org/x/thread.html/dc3a541709f96906b43df4155373af1cd09e08c3f105b0bd0ba3fca2@%3Cdev.flink.apache.org%3E
>

Re: [DISCUSS] FLIP-74: Flink JobClient API

2019-09-27 Thread Zili Chen
modify

/we just shut down the cluster on the exit of the client that runs inside
the cluster/

to

we just shut down the cluster both on the exit of the client that runs
inside the cluster and on the finish of the job.
Since the client runs inside the cluster, we can easily wait for both in
the ClusterEntrypoint.


Zili Chen  于2019年9月27日周五 下午3:13写道:

> About JobCluster
>
> Actually I am not quite sure what we gain from the DETACHED configuration
> on the cluster side.
> In fact we don't have a NON-DETACHED JobCluster in our codebase, right?
>
> This brings me to one major question we have to answer first.
>
> *What JobCluster conceptually is exactly*
>
> Related discussion can be found in JIRA[1] and mailing list[2]. Stephan
> gives a nice
> description of JobCluster:
>
> Two things to add:
>
> - The job mode is very nice in the way that it runs the client inside the
> cluster (in the same image/process that is the JM) and thus unifies both
> applications and what the Spark world calls the "driver mode".
>
> - Another thing I would add is that during the FLIP-6 design, we were
> thinking about setups where Dispatcher and JobManager are separate
> processes. A Yarn or Mesos Dispatcher of a session could run independently
> (even as privileged processes executing no code). Then the "per-job" mode
> could still be helpful: when a job is submitted to the dispatcher, it
> launches the JM again in a per-job mode, so that JM and TM processes are
> bound to the job only. For higher security setups, it is important that
> processes are not reused across jobs.
>
> However, currently in "per-job" mode we generate the JobGraph on the
> client side, launch the JobCluster, and the cluster retrieves the
> JobGraph for execution. So actually, we don't "run the client inside the
> cluster".
>
> Besides, referring to the discussion with Till[1], it would be helpful if
> "per-job" mode followed the same process as session mode from the user's
> perspective: we don't use OptimizedPlanEnvironment to create the
> JobGraph, but directly deploy the Flink cluster in env.execute.
>
> Generally, 2 points:
>
> 1. Run the Flink job by invoking the user main method and executing it
> throughout, instead of creating the JobGraph from the main class.
> 2. Run the client inside the cluster.
>
> If 1 and 2 are implemented, there is obviously no need for a DETACHED
> mode on the cluster side, because we just shut down the cluster on the
> exit of the client that runs inside the cluster. Whether or not the
> result is delivered is up to user code.
>
> [1]
> https://issues.apache.org/jira/browse/FLINK-14051?focusedCommentId=16931388&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16931388
> [2]
> https://lists.apache.org/x/thread.html/e8f14a381be6c027e8945f884c3cfcb309ce49c1ba557d3749fca495@%3Cdev.flink.apache.org%3E
>
>
> Zili Chen  于2019年9月27日周五 下午2:13写道:
>
>> Thanks for your replies Kostas & Aljoscha!
>>
>> Below are replies point by point.
>>
>> 1. For DETACHED mode, what I said there is about the DETACHED mode in
>> client side.
>> There are two configurations overload the item DETACHED[1].
>>
>> In client side, it means whether or not client.submitJob is blocking to
>> job execution result.
>> Because client.submitJob returns a CompletableFuture<JobClient>,
>> NON-DETACHED has no power at all. Caller of submitJob makes the
>> decision whether or not to block to get the
>> JobClient and request for the job execution result. If client crashes, it
>> is a user scope
>> exception that should be handled in user code; if client lost connection
>> to cluster, we have
>> a retry times and interval configuration that automatically retry and
>> throws an user scope
>> exception if exceed.
>>
>> Your comment about poll for result or job result sounds like a concern on
>> cluster side.
>>
>> In cluster side, DETACHED mode is alive only in JobCluster. If DETACHED
>> configured,
>> JobCluster exits on job finished; if NON-DETACHED configured, JobCluster
>> exits on job
>> execution result delivered. FLIP-74 doesn't stick to changes on this
>> scope, it is just remained.
>>
>> However, it is an interesting part we can revisit this implementation a
>> bit.
>>
>> 
>>
>> 2. The retrieval of JobClient is so important that if we don't have a way
>> to retrieve JobClient it is
>> a dumb public user-facing interface(what a strange state :P).
>>
>> About the retrieval of JobClient, as mentioned in the document, two ways
>> should be supported.
>>
>> (1). Retrieved as the return type of job submission.

Re: [DISCUSS] FLIP-73: Introducing Executors for job submission

2019-09-27 Thread Zili Chen
Thanks for your reply Kostas.

As mentioned in the FLIP-74 thread[1], there are two questions on the
Executor design:

(1) Where the Executor sits in a multi-layered clients view.
(2) A bit more detail about the PerJobExecutor implementation.

For (1), where the Executor sits in a multi-layered clients view:

As described in the multi-layered clients thread[2], in our current
codebase, with the JobClient introduced in FLIP-74, clients can be
layered as:

1) ClusterDescriptor: interacts with the external resource manager;
responsible for deploying the Flink application cluster and retrieving the
Flink application cluster client.
2) ClusterClient: interacts with the Flink application cluster; responsible
for querying cluster level status, submitting Flink jobs and retrieving
Flink job clients.
3) JobClient: interacts with the Flink job; responsible for querying job
level status and performing job level operations such as triggering a
savepoint.

However, the odd one out is the JobCluster, which couples cluster
deployment and job submission a bit. From my perspective, with FLIP-73 and
Kostas's thoughts in the FLIP-74 thread, we form a multi-layered client
stack as below:

1) Executor: responsible for job submission; whether the corresponding
cluster is a SessionCluster or a JobCluster doesn't matter. The Executor
always returns a JobClient.
2) ClusterClientFactory: responsible for deploying a session cluster and
retrieving a session cluster client.
3) ClusterClient: interacts with the session cluster; responsible for
querying cluster level status, submitting Flink jobs and retrieving Flink
job clients.
4) JobClient: interacts with the Flink job; responsible for querying job
level status and performing job level operations such as triggering a
savepoint.
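
To illustrate how 1) could sit on top of 3), a hedged sketch reusing the
placeholder shapes from my earlier mail in this thread (class and method
names are assumptions for this discussion, not existing Flink APIs):

// An Executor for session clusters that simply delegates to a ClusterClient.
class SessionExecutor implements Executor {
    private final ClusterClient clusterClient;

    SessionExecutor(ClusterClient clusterClient) {
        this.clusterClient = clusterClient;
    }

    @Override
    public CompletableFuture<JobClient> executeAsync(Pipeline pipeline) {
        JobGraph jobGraph = compile(pipeline);    // translate the user program
        return clusterClient.submitJob(jobGraph); // assumed async submission API
    }

    private JobGraph compile(Pipeline pipeline) {
        throw new UnsupportedOperationException("translation elided in sketch");
    }
}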

I am not sure if the structure above is the same as the one in your mind.
If so, there are two questions:

I) It seems we cannot have a ClusterClient for a JobCluster. Is that
expected (due to the cluster being bound to the job)?
II) It seems we treat session clusters quite differently from job clusters,
but a cluster client can submit a job, which overlaps a bit with the
Executor.

For (2), a bit more detail about the PerJobExecutor implementation:

The content of FLIP-73 doesn't describe what the PerJobExecutor would look
like, although it is discussed a bit in the design document[3]. In the
FLIP-74 thread I forwarded previous insights from our community which point
towards two attributes of the JobCluster:

I) Running the Flink job by invoking the user main method and executing it
throughout, instead of creating the JobGraph from the main class.
II) Running the client inside the cluster.

Does the PerJobExecutor fit these requirements? Anyway, it would be helpful
to describe the abstraction of the Executor in the FLIP; at least the
difference between the PerJobExecutor and the SessionExecutor is essential.

Best,
tison.

[1]
https://lists.apache.org/x/thread.html/b2e22a45aeb94a8d06b50c4de078f7b23d9ff08b8226918a1a903768@%3Cdev.flink.apache.org%3E
[2]
https://lists.apache.org/x/thread.html/240582148eda905a772d59b2424cb38fa16ab993647824d178cacb02@%3Cdev.flink.apache.org%3E
[3]
https://docs.google.com/document/d/1E-8UjOLz4QPUTxetGWbU23OlsIH9VIdodpTsxwoQTs0/edit?ts=5d8cbe34#heading=h.qq4wc2suukg


Kostas Kloudas  于2019年9月25日周三 下午8:27写道:

> Hi,
>
> @Aljoscha, I believe that it is better to be done like this so that we
> do not step on each-other's feet. If the executor already "knew" about
> the JobClient, then we should also know about how we expect the
> JobExecutionResult is retrieved (which is part of FLIP-74). I think it
> is nice to have each discussion self-contained.
>
> Cheers,
> Kostas
>
> On Wed, Sep 25, 2019 at 2:13 PM Aljoscha Krettek 
> wrote:
> >
> > Hi,
> >
> > I’m fine with either signature for the new execute() method but I think
> we should focus on the executor discovery and executor configuration part
> in this FLIP while FLIP-74 is about the evolution of the method signature
> to return a future.
> >
> > I understand that it’s a bit weird, that this FLIP introduces a new
> interface only to be changed within the same Flink release in a follow-up
> FLIP. But I think we can still do it. What do you think?
> >
> > Best,
> > Aljoscha
> >
> > > On 25. Sep 2019, at 10:11, Kostas Kloudas  wrote:
> > >
> > > Hi Thomas and Zili,
> > >
> > > As you both said the Executor is a new addition so there are no
> > > compatibility concerns.
> > > Backwards compatibility comes into play on the
> > > (Stream)ExecutionEnvironment#execute().
> > >
> > > This method has to stay and keep having the same (blocking) semantics,
> > > but we can
> > > add a new one, sth along the lines of executeAsync() that will return
> > > the JobClient and
> > > will allow the caller to interact with the job.
> > >
> > > Cheers,
> > > Kostas
> > >
> > > On Wed, Sep 25, 2019 at 2:44 AM Zili Chen 
> wrote:

Re: REST API / JarRunHandler: More flexibility for launching jobs

2019-09-29 Thread Zili Chen
jump on a call about this because these
> things
> >> are very tricky to figure out and I might be wrong.
> >>
> >> Best,
> >> Aljoscha
> >>
> >> [1]
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-73%3A+Introducing+Executors+for+job+submission
> >> [2]
> >>
> https://docs.google.com/document/d/1E-8UjOLz4QPUTxetGWbU23OlsIH9VIdodpTsxwoQTs0/edit?ts=5d88e631
> >>
> >>> On 6. Aug 2019, at 09:51, Till Rohrmann  wrote:
> >>>
> >>> I think there was the idea to make the JobGraph a "public"/stable
> >> interface
> >>> other projects can rely on at some point. If I remember correctly, then
> >> we
> >>> wanted to define a proto buf definition for the JobGraph so that
> clients
> >>> written in a different language can submit JobGraphs and we could
> extend
> >>> the data structure. As far as I know, this effort hasn't been started
> yet
> >>> and is still in the backlog (I think there doesn't exist a JIRA issue
> >> yet).
> >>>
> >>> The problem came up when discussing additions to the JobGraph because
> >> they
> >>> need to be backwards compatible otherwise newer version of Flink would
> >> not
> >>> be able to recover jobs. I think so far Flink provides backwards
> >>> compatibility between different versions of the JobGraph. However, this
> >> is
> >>> not officially guaranteed.
> >>>
> >>> Cheers,
> >>> Till
> >>>
> >>> On Tue, Aug 6, 2019 at 3:56 AM Zili Chen  wrote:
> >>>
> >>>> It sounds like a request to change the interface Program into
> >>>>
> >>>> public interface Program {
> >>>> JobGraph getJobGraph(String... args);
> >>>> }
> >>>>
> >>>> Also, given that JobGraph is said to be an internal interface that
> >>>> cannot be relied on, we might introduce and use a
> >>>> representation that allows for cross version compatibility.
> >>>>
> >>>>
> >>>> Thomas Weise  于2019年8月6日周二 上午12:11写道:
> >>>>
> >>>>> If the goal is to keep job creation and job submission separate and
> we
> >>>>> agree that there should be more flexibility for the job construction,
> >>>> then
> >>>>> JobGraph and friends should be stable API that the user can depend
> on.
> >> If
> >>>>> that's the case, the path Chesnay pointed to may become viable.
> >>>>>
> >>>>> There was discussion in the past that JobGraph cannot be relied on
> WRT
> >>>>> backward compatibility and I would expect that at some point we want
> to
> >>>>> move to a representation that allows for cross version compatibility.
> >>>> Beam
> >>>>> is an example how this could be accomplished (with its pipeline
> proto).
> >>>>>
> >>>>> So if the Beam job server was able to produce the JobGraph, is there
> >>>>> agreement that we should provide a mechanism that allows the program
> >>>> entry
> >>>>> point to return the JobGraph directly (without using the
> >>>>> ExecutionEnvironment to build it)?
> >>>>>
> >>>>>
> >>>>> On Mon, Aug 5, 2019 at 2:10 AM Zili Chen 
> wrote:
> >>>>>
> >>>>>> Hi Thomas,
> >>>>>>
> >>>>>> If REST handler calls main(), the behavior inside main() is
> >>>>>> unpredictable.
> >>>>>>
> >>>>>> Now the jar run handler extract the job graph and submit
> >>>>>> it with the job id configured in REST request. If REST
> >>>>>> handler calls main() we can hardly even know how much
> >>>>>> jobs are executed.
> >>>>>>
> >>>>>> A new environment, as you said,
> >>>>>> ExtractJobGraphAndSubmitToDispatcherEnvironment can be
> >>>>>> added to satisfy your requirement. However, it is a bit
> >>>>>> out of Flink scope. It might be better to write your own
> >>>>>> REST handler.
> >>>>>>
> >>>>>> WebMonitorExtension is for extending REST handlers but
> >>>>>> it seems also unable to customize

Re: [DISCUSS] Remove uncompleted YARNHighAvailabilityService

2019-09-30 Thread Zili Chen
Any suggestion?

IMO it has been inactive for quite some time; we can remove this
uncompleted code at least for now and re-introduce it if needed.

Best,
tison.


Zili Chen  于2019年9月27日周五 上午9:23写道:

> Hi devs,
>
> I noticed that there are several stale & uncompleted high-availability
> service implementations, so I am starting this thread to see whether or
> not we can remove them, in order to have a clean codebase to work on for
> the ongoing high-availability refactoring effort[1].
>
> Below are all of the classes I noticed.
>
> - YarnHighAvailabilityServices
> - AbstractYarnNonHaServices
> - YarnIntraNonHaMasterServices
> - YarnPreConfiguredMasterNonHaServices
> - SingleLeaderElectionService
> - FsNegativeRunningJobsRegistry
> (as well as their dedicated tests)
>
> Any suggestion?
> Best,
> tison.
>
> [1] https://issues.apache.org/jira/browse/FLINK-10333
>


Re: [DISCUSS] Remove uncompleted YARNHighAvailabilityService

2019-09-30 Thread Zili Chen
Thanks for your reply Till.

I will wait a bit for other thoughts, and create a JIRA issue and start
work if there are no further objections.

Best,
tison.


Till Rohrmann  于2019年9月30日周一 下午5:51写道:

> Hi Tison,
>
> I agree that unused HA implementations can be removed since they are dead
> code. If we should need them in the future, then we can still get them by
> going back a bit in time. Hence +1 for removing unused HA implementations.
>
> Cheers,
> Till
>
> On Mon, Sep 30, 2019 at 10:42 AM Zili Chen  wrote:
>
> > Any suggestion?
> >
> > IMO it has been inactive for quite some time; we can remove this
> > uncompleted code at least for now and re-introduce it if needed.
> >
> > Best,
> > tison.
> >
> >
> > Zili Chen  于2019年9月27日周五 上午9:23写道:
> >
> > > Hi devs,
> > >
> > > I noticed that there are several stale & uncompleted high-availability
> > > service implementations, so I am starting this thread to see whether
> > > or not we can remove them, in order to have a clean codebase to work
> > > on for the ongoing high-availability refactoring effort[1].
> > >
> > > Below are all of the classes I noticed.
> > >
> > > - YarnHighAvailabilityServices
> > > - AbstractYarnNonHaServices
> > > - YarnIntraNonHaMasterServices
> > > - YarnPreConfiguredMasterNonHaServices
> > > - SingleLeaderElectionService
> > > - FsNegativeRunningJobsRegistry
> > > (as well as their dedicated tests)
> > >
> > > Any suggestion?
> > > Best,
> > > tison.
> > >
> > > [1] https://issues.apache.org/jira/browse/FLINK-10333
> > >
> >
>


[CODE STYLE] Parameters of method are always final

2019-10-01 Thread Zili Chen
Hi devs,

Coming from this discussion[1], I'd like to propose that in the Flink
codebase we adopt a code style in which method parameters are always
treated as final. Almost everywhere, method parameters are effectively
final already, and if we have such a consensus we can avoid the redundant
final modifier in method declarations and so be spared that noise.

Here are some cases that might require modifying a parameter:

1. to set a default; especially if (param == null) { param = ... }
2. to refine a parameter; it follows the pattern if ( ... ) { param =
refine(param); }

In either case we can move the refining/defaulting logic to the caller, or
select another name for the refined value, case by case; see the sketch
below.
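
For illustration (made-up example code, not from the Flink codebase):

// Reassigning the parameter, which the proposed style would avoid:
static String normalize(String name) {
    if (name == null) {
        name = "default";
    }
    return name.trim();
}

// Keeping the parameter effectively final by naming the refined value:
static String normalizePreferred(final String nameOrNull) {
    final String name = (nameOrNull == null) ? "default" : nameOrNull;
    return name.trim();
}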

Looking forward to your feedbacks :-)

Best,
tison.

[1] https://github.com/apache/flink/pull/9788#discussion_r329314681


Re: [DISCUSS] FLIP-74: Flink JobClient API

2019-10-02 Thread Zili Chen
Hi Konstantin,

* should we add "cancelWithSavepeoint" to a new public API, when we have
deprecated the corresponding REST API/CLI methods? In my understanding
there is no reason to use it anymore.

Good point. We can exclude "cancelWithSavepoint" from the public API at
least for now, since it is already deprecated. Let's see if there are other
concerns.

* should we call "stopWithSavepoint" simply "stop" as "stop" always
performs a savepoint?

Well, regarding the naming issue, I'm fine with that if it is the consensus
of our community. I can see there is already a "stop" CLI command which
means "stop with savepoint".

Best,
tison.


Konstantin Knauf  于2019年9月30日周一 下午12:16写道:

> Hi Thomas,
>
> maybe there is a misunderstanding. There is no plan to deprecate anything
> in the REST API in the process of introducing the JobClient API, and it
> shouldn't.
>
> Since "cancel with savepoint" was already deprecated in the REST API and
> CLI, I am just raising the question whether to add it to the JobClient API
> in the first place.
>
> Best,
>
> Konstantin
>
>
>
> On Mon, Sep 30, 2019 at 1:16 AM Thomas Weise  wrote:
>
> > I did not realize there was a plan to deprecate anything in the REST API?
> >
> > The REST API is super important for tooling written in non JVM languages,
> > that does not include a Flink client (like FlinkK8sOperator). The REST
> API
> > should continue to support all job management operations, including job
> > submission.
> >
> > Thomas
> >
> >
> > On Sun, Sep 29, 2019 at 1:37 PM Konstantin Knauf <
> konstan...@ververica.com
> > >
> > wrote:
> >
> > > Hi Zili,
> > >
> > > thanks for working on this topic. Just read through the FLIP and I have
> > two
> > > questions:
> > >
> > > * should we add "cancelWithSavepoint" to a new public API, when we
> have
> > > deprecated the corresponding REST API/CLI methods? In my understanding
> > > there is no reason to use it anymore.
> > > * should we call "stopWithSavepoint" simply "stop" as "stop" always
> > > performs a savepoint?
> > >
> > > Best,
> > >
> > > Konstantin
> > >
> > >
> > >
> > > On Fri, Sep 27, 2019 at 10:48 AM Aljoscha Krettek  >
> > > wrote:
> > >
> > > > Hi Flavio,
> > > >
> > > > I agree that this would be good to have. But I also think that this
> is
> > > > outside the scope of FLIP-74, I think it is an orthogonal feature.
> > > >
> > > > Best,
> > > > Aljoscha
> > > >
> > > > > On 27. Sep 2019, at 10:31, Flavio Pompermaier <
> pomperma...@okkam.it>
> > > > wrote:
> > > > >
> > > > > Hi all,
> > > > > just a remark about the Flink REST APIs (and its client as well):
> > > almost
> > > > > all the times we need a way to dynamically know which jobs are
> > > contained
> > > > in
> > > > > a jar file, and this could be exposed by the REST endpoint under
> > > > > /jars/:jarid/entry-points (a simple way to implement this would be
> to
> > > > check
> > > > > the value of Main-class or Main-classes inside the Manifest of the
> > jar
> > > if
> > > > > they exists [1]).
> > > > >
> > > > > I understand that this is something that is not strictly required
> to
> > > > > execute Flink jobs but IMHO it would ease A LOT the work of UI
> > > developers
> > > > > that could have a way to show the users all available jobs inside a
> > > jar +
> > > > > their configurable parameters.
> > > > > For example, right now in the WebUI, you can upload a jar and then
> > you
> > > > have
> > > > > to set (without any autocomplete or UI support) the main class and
> > > their
> > > > > params (for example using a string like --param1 xx --param2 yy).
> > > > > Adding this functionality to the REST API and the respective client
> > > would
> > > > > enable the WebUI (and all UIs interacting with a Flink cluster) to
> > > > prefill
> > > > > a dropdown list containing the list of entry-point classes (i.e.
> > Flink
> > > > > jobs) and, once selected, their required (typed) parameters.
> > > > >
> > > > > Best,
> >

Re: [DISCUSS] FLIP-73: Introducing Executors for job submission

2019-10-02 Thread Zili Chen
Thanks for your thoughts Kostas!

I agree that the Executor is a concept on the client side now, and I
sincerely second the description:

Now the Executor simply uses a client, e.g. a ClusterClient, to submit
the job (JobGraph) that it will create from the user program.
In that sense, the Executor is one level of abstraction above the
clients, as it adds more functionality and it uses the one offered by
the client.

In fact, let's think about the statement that an Executor simply uses a
client to submit the job. I'd like to give a description of how job
submission could work in per-job mode following a similar view, which

(1) achieves running the client on the cluster side @Stephan Ewen
(2) supports multi-part per-job programs, so that we don't have to hack a
fallback to a session cluster in this case @Till Rohrmann
Let's start with an example where we submit a user program via the CLI in
per-job mode.

1) The CLI generates the configuration carrying all information about the
deployment.
2) The CLI deploys a job cluster *with the user jars* and specially marks
the jar that contains the user program.
3) The JobClusterEntrypoint takes care of the bootstrap of the flink
cluster and executes the user program, respecting all configuration passed
from the client.
4) The user program now runs on the cluster side; it starts executing its
main method and gets an environment with information about the associated
job cluster. Since the cluster has already started, it can submit the job
to that cluster just as in a session cluster.
5) The job cluster shuts down once the user program exits *and* the
Dispatcher doesn't maintain any jobs.
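
As a rough sketch of steps 3)-5) inside a hypothetical JobClusterEntrypoint
(all names here are made up for illustration, not existing Flink methods):

// Bootstrap the cluster, then run the user's main() locally, so the program
// submits to the local cluster just as it would to a session cluster.
void runJobCluster(Configuration configuration, String entryClassName,
        String[] programArgs) throws Exception {
    startClusterComponents(configuration);  // Dispatcher, ResourceManager, REST
    Class<?> entryClass = Class.forName(entryClassName);
    entryClass.getMethod("main", String[].class)
            .invoke(null, (Object) programArgs);  // client runs inside the cluster
    shutDownCluster();  // shut down when the user program exits
}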

Since we actually run the client on the cluster side, we can execute
multi-part programs, because we submit them to the local cluster one by
one. And because we change the process from

- start a per-job cluster with a job graph

to

+ start a per-job cluster with the user program

and run the client on the cluster side, we avoid having to "extract" the
job graph from the user program, which is limited to single-part programs
and doesn't respect user logic outside of the Flink related code.

Taking the session scenario into consideration, overall we now have:

1. ClusterDeployer and its factory, which are SPI for platform developers
so that they can deploy a job cluster with a user program, or a session
cluster.

2. Environment and Executor are unified. The Environment helps describe the
user program logic and internally compiles the job as well as submits it
with the Executor. The Executor always makes use of a ClusterClient to
submit the job. Specifically, in per-job mode, the Environment reads the
configuration refined by the job cluster so that it knows how to generate a
ClusterClient.

3. Platform developers get a ClusterClient as the return value of the
deploy method of the ClusterDeployer, or retrieve one from an existing
publicly known session cluster (by a ClusterRetriever, or by extending
ClusterDeployer to another more general concept).

4. The JobClient can be used by user program writers or platform developers
to manage the job in different conditions.

There are many other refactorings we can do to respect this architecture,
but let's re-emphasize the key difference:

** The job cluster doesn't start with a job graph anymore but starts with a
user program, and it runs the program in the same place as the cluster
runs. So for the program, it is nothing different from a session cluster;
it is just an existing cluster. **

Best,
tison.


Re: [DISCUSS] FLIP-73: Introducing Executors for job submission

2019-10-02 Thread Zili Chen
Thomas,

For your requirement on the jar run REST endpoint, we can follow a way
similar to the job cluster way described above. That is, we submit the jar
and run the user program on the cluster side.

Also cc Aljoscha,

There is no JobManager.runJar in mind. All logic that handles jar run is in
the WebMonitor: we prepare the configuration, especially that for looking
up the cluster for the program, and execute the program in another process.

Obviously it might cause some security issues, because the main method can
contain arbitrary code, but

1. We can provide an option to forbid jar run on the cluster when that is
considered important.
2. Session clusters already share resources between jobs; if a job wants to
have its dedicated resources, use per-job mode.

Does this approach fit your requirement, Thomas?


Zili Chen  于2019年10月2日周三 下午4:57写道:

> Thanks for your thoughts Kostas!
>
> I agree that Executor is a concept on the client side now, and I sincerely
> second the description
>
> Now the Executor simply uses a client, e.g. a ClusterClient, to submit
> the job (JobGraph) that it will create from the user program.
> In that sense, the Executor is one level of abstraction above the
> clients, as it adds more functionality and it uses the one offered by
> the client.
>
> In fact, let's think about the statement that an Executor simply uses a
> client to submit the job. I'd like to give a description of how job
> submission works in per-job mode; it follows a similar view, and it
>
> (1) achieves running the client on the cluster side @Stephan Ewen
> (2) supports multi-part per-job programs, so that we don't hack a fallback
> to a session cluster in this case @Till Rohrmann
>
> Let's start with an example where we submit a user program via the CLI in
> per-job mode.
>
> 1) The CLI generates the configuration carrying all information about the
> deployment.
> 2) The CLI deploys a job cluster *with the user jars* and specially marks
> the jar that contains the user program.
> 3) The JobClusterEntrypoint takes care of bootstrapping the Flink cluster
> and executes the user program, respecting all configuration passed from
> the client.
> 4) The user program now runs on the cluster side; it starts executing its
> main method and gets an environment with information about the associated
> job cluster. Since the cluster has already started, it can submit the job
> to that cluster just as in a session cluster.
> 5) The job cluster shuts down when the user program exits, *and* the
> Dispatcher doesn't maintain any jobs.
>
> Since we actually run the client on the cluster side, we can execute
> multi-part programs: we submit them to the local cluster one by one. And
> because we change the process from
>
> - start a per-job cluster with a job graph
>
> to
>
> + start a per-job cluster with the user program
>
> and run the client on the cluster side, we avoid "extracting" the job
> graph from the user program, which only works for single-part programs
> and doesn't respect user logic outside of the Flink-related code.
>
> Taking the session scenario into consideration, overall we now have
>
> 1. ClusterDeployer and its factory, which are SPI for platform developers
> so that they can deploy a job cluster with a user program, or a session
> cluster.
>
> 2. Environment and Executor are unified. Environment helps describe the
> user program logic and internally compiles the job as well as submits it
> with an Executor. The Executor always makes use of a ClusterClient to
> submit the job. Specifically, in per-job mode, the Environment reads
> configuration refined by the job cluster so that it knows how to generate
> a ClusterClient.
>
> 3. Platform developers get a ClusterClient as the return value of the
> deploy method of ClusterDeployer, or retrieve one from an existing
> publicly known session cluster (by a ClusterRetriever, or by extending
> ClusterDeployer into a more general concept).
>
> 4. JobClient can be used by user program writers or platform developers
> to manage jobs in different situations.
>
> There are many other refactorings we can do to respect this architecture,
> but let's re-emphasize the key difference:
>
> ** the job cluster doesn't start with a job graph anymore but starts with
> a user program, and it runs the program in the same place as the cluster
> runs. So for the program, it is no different from a session cluster. It
> is just an existing cluster. **
>
> Best,
> tison.
>


Re: [DISCUSS] FLIP-73: Introducing Executors for job submission

2019-10-02 Thread Zili Chen
To be honest, I originally wanted to first start a thread to discuss
what per-job mode means, because things get quite different depending on
whether per-job mode contains exactly one JobGraph or allows multiple
parts. Add the complexity of whether or not we support post-execution
logic, and it becomes even more unclear what per-job mode looks like from
a user perspective.

But the original purpose is to work towards a concrete PerJobExecutor, and
I want to save bandwidth by reducing the number of concurrent, coupled
threads a bit.


Zili Chen  于2019年10月2日周三 下午5:33写道:

> Hi Till,
>
> The purpose of posting the thoughts above here is that FLIP-73 is unclear
> on how to achieve a PerJobExecutor. In order to discuss this topic it is
> necessary to clarify how per-job mode runs, regardless of what it is now.
>
> With a PerJobExecutor called in the Environment, I don't think we can keep
> the current logic. If we keep the current logic, it looks like:
>
> 1. env.execute calls executor.execute
> 2. the executor gets the current job graph and deploys a job cluster
> 3. for the remaining parts, shall we deploy a new job cluster? reuse the
> previous job cluster?
> or, as in the current logic, abort after the first submission?
>
> These questions should be answered to clarify what PerJobExecutor is and
> how it works.
>
> Best,
> tison
>
>
> Till Rohrmann  于2019年10月2日周三 下午5:19写道:
>
>> I'm not sure whether removing the current per-job mode semantics
>> altogether is a good idea. It has some nice properties, for example the
>> JobGraph stays constant. With your proposal which I would coin the
>> driver mode, the JobGraph would be regenerated in case of a failover.
>> Depending on the user code logic, this could generate a different JobGraph.
>>
>> Aren't we unnecessarily widening the scope of this FLIP here? Wouldn't it
>> be possible to introduce the Executors without changing Flink's deployment
>> options in the first step? I don't fully understand where this
>> need/requirement comes from.
>>
>> Cheers,
>> Till
>>
>> On Wed, Oct 2, 2019 at 10:58 AM Zili Chen  wrote:
>>
>>> Thanks for your thoughts Kostas!
>>>
>>> I agree that Executor is a concept on the client side now, and I
>>> sincerely second the description
>>>
>>> Now the Executor simply uses a client, e.g. a ClusterClient, to submit
>>> the job (JobGraph) that it will create from the user program.
>>> In that sense, the Executor is one level of abstraction above the
>>> clients, as it adds more functionality and it uses the one offered by
>>> the client.
>>>
>>> In fact, let's think about the statement that an Executor simply uses a
>>> client to submit the job. I'd like to give a description of how job
>>> submission works in per-job mode; it follows a similar view, and it
>>>
>>> (1) achieves running the client on the cluster side @Stephan Ewen
>>> (2) supports multi-part per-job programs, so that we don't hack a
>>> fallback to a session cluster in this case @Till Rohrmann
>>>
>>> Let's start with an example where we submit a user program via the CLI
>>> in per-job mode.
>>>
>>> 1) The CLI generates the configuration carrying all information about
>>> the deployment.
>>> 2) The CLI deploys a job cluster *with the user jars* and specially
>>> marks the jar that contains the user program.
>>> 3) The JobClusterEntrypoint takes care of bootstrapping the Flink
>>> cluster and executes the user program, respecting all configuration
>>> passed from the client.
>>> 4) The user program now runs on the cluster side; it starts executing
>>> its main method and gets an environment with information about the
>>> associated job cluster. Since the cluster has already started, it can
>>> submit the job to that cluster just as in a session cluster.
>>> 5) The job cluster shuts down when the user program exits, *and* the
>>> Dispatcher doesn't maintain any jobs.
>>>
>>> Since we actually run the client on the cluster side, we can execute
>>> multi-part programs: we submit them to the local cluster one by one.
>>> And because we change the process from
>>>
>>> - start a per-job cluster with a job graph
>>>
>>> to
>>>
>>> + start a per-job cluster with the user program
>>>
>>> and run the client on the cluster side, we avoid "extracting" the job
>>> graph from the user program, which only works for single-part programs
>>> and doesn't respect user logic outside of the Flink-related code.
>>>
>>> Taking the session scenario into consideration, overall we now

Re: [DISCUSS] FLIP-73: Introducing Executors for job submission

2019-10-02 Thread Zili Chen
Hi Till,

The purpose of posting the thoughts above here is that FLIP-73 is unclear
on how to achieve a PerJobExecutor. In order to discuss this topic it is
necessary to clarify how per-job mode runs, regardless of what it is now.

With a PerJobExecutor called in the Environment, I don't think we can keep
the current logic. If we keep the current logic, it looks like:

1. env.execute calls executor.execute
2. the executor gets the current job graph and deploys a job cluster
3. for the remaining parts, shall we deploy a new job cluster? reuse the
previous job cluster?
or, as in the current logic, abort after the first submission?

These questions should be answered to clarify what PerJobExecutor is and how
it works.

Best,
tison


Till Rohrmann  于2019年10月2日周三 下午5:19写道:

> I'm not sure whether removing the current per-job mode semantics
> altogether is a good idea. It has some nice properties, for example the
> JobGraph stays constant. With your proposal which I would coin the
> driver mode, the JobGraph would be regenerated in case of a failover.
> Depending on the user code logic, this could generate a different JobGraph.
>
> Aren't we unnecessarily widening the scope of this FLIP here? Wouldn't it
> be possible to introduce the Executors without changing Flink's deployment
> options in the first step? I don't fully understand where this
> need/requirement comes from.
>
> Cheers,
> Till
>
> On Wed, Oct 2, 2019 at 10:58 AM Zili Chen  wrote:
>
>> Thanks for your thoughts Kostas!
>>
>> I agree that Executor is a concept on the client side now, and I
>> sincerely second the description
>>
>> Now the Executor simply uses a client, e.g. a ClusterClient, to submit
>> the job (JobGraph) that it will create from the user program.
>> In that sense, the Executor is one level of abstraction above the
>> clients, as it adds more functionality and it uses the one offered by
>> the client.
>>
>> In fact, let's think about the statement that an Executor simply uses a
>> client to submit the job. I'd like to give a description of how job
>> submission works in per-job mode; it follows a similar view, and it
>>
>> (1) achieves running the client on the cluster side @Stephan Ewen
>> (2) supports multi-part per-job programs, so that we don't hack a
>> fallback to a session cluster in this case @Till Rohrmann
>>
>> Let's start with an example where we submit a user program via the CLI
>> in per-job mode.
>>
>> 1) The CLI generates the configuration carrying all information about
>> the deployment.
>> 2) The CLI deploys a job cluster *with the user jars* and specially
>> marks the jar that contains the user program.
>> 3) The JobClusterEntrypoint takes care of bootstrapping the Flink
>> cluster and executes the user program, respecting all configuration
>> passed from the client.
>> 4) The user program now runs on the cluster side; it starts executing
>> its main method and gets an environment with information about the
>> associated job cluster. Since the cluster has already started, it can
>> submit the job to that cluster just as in a session cluster.
>> 5) The job cluster shuts down when the user program exits, *and* the
>> Dispatcher doesn't maintain any jobs.
>>
>> Since we actually run the client on the cluster side, we can execute
>> multi-part programs: we submit them to the local cluster one by one.
>> And because we change the process from
>>
>> - start a per-job cluster with a job graph
>>
>> to
>>
>> + start a per-job cluster with the user program
>>
>> and run the client on the cluster side, we avoid "extracting" the job
>> graph from the user program, which only works for single-part programs
>> and doesn't respect user logic outside of the Flink-related code.
>>
>> Taking the session scenario into consideration, overall we now have
>>
>> 1. ClusterDeployer and its factory, which are SPI for platform
>> developers so that they can deploy a job cluster with a user program,
>> or a session cluster.
>>
>> 2. Environment and Executor are unified. Environment helps describe the
>> user program logic and internally compiles the job as well as submits
>> it with an Executor. The Executor always makes use of a ClusterClient
>> to submit the job. Specifically, in per-job mode, the Environment reads
>> configuration refined by the job cluster so that it knows how to
>> generate a ClusterClient.
>>
>> 3. Platform developers get a ClusterClient as the return value of the
>> deploy method of ClusterDeployer, or retrieve one from an existing
>> publicly known session cluster (by a ClusterRetriever, or by extending
>> ClusterDeployer into a more general concept).
>>
>> 4. JobClient can be used by user program writers or platform developers
>> to manage jobs in different situations.
>>
>> There are many other refactorings we can do to respect this
>> architecture, but let's re-emphasize the key difference:
>>
>> ** the job cluster doesn't start with a job graph anymore but starts
>> with a user program, and it runs the program in the same place as the
>> cluster runs. So for the program, it is no different from a session
>> cluster. It is just an existing cluster. **
>>
>> Best,
>> tison.
>>
>


Re: [DISCUSS] FLIP-73: Introducing Executors for job submission

2019-10-02 Thread Zili Chen
Thanks for your clarification Till.

I agree with the current semantics of per-job mode: one should deploy a
new cluster for each part of the job. Apart from the performance concern,
it also means that PerJobExecutor actually knows how to deploy a cluster,
which differs from the description that an Executor submits a job.

Anyway, it sounds workable and narrows the changes.


Re: [CODE STYLE] Parameters of method are always final

2019-10-02 Thread Zili Chen
Yes exactly.


Piotr Nowojski  于2019年10月2日周三 下午7:03写道:

> Hi Tison,
>
> To clarify your proposal: you are proposing to actually drop the
> `final` keyword from the parameters, and we should implicitly assume that
> it’s always there (in other words, we shouldn’t be modifying the
> parameters). Did I understand this correctly?
>
> Piotrek
>
> > On 1 Oct 2019, at 21:44, Zili Chen  wrote:
> >
> > Hi devs,
> >
> > Coming from this discussion[1] I'd like to propose that in the Flink
> > codebase we adopt a code style where method parameters are always final.
> > Almost everywhere, method parameters are already final, and with such a
> > consensus we can drop the redundant final modifier from method
> > declarations and spare ourselves that noise.
> >
> > Here are some cases that might require modifying a parameter:
> >
> > 1. to set a default; especially if (param == null) { param = ... }
> > 2. to refine a parameter; in the pattern if ( ... ) { param =
> > refine(param); }
> >
> > In either case we can move the refine/set-default logic to the caller,
> > or select another name for the refined value, case by case.
> >
> > Looking forward to your feedbacks :-)
> >
> > Best,
> > tison.
> >
> > [1] https://github.com/apache/flink/pull/9788#discussion_r329314681
>
>


Re: [DISCUSS] FLIP-73: Introducing Executors for job submission

2019-10-02 Thread Zili Chen
Thanks for your thoughts Aljoscha.

Another question, since FLIP-73 might contain refactorings of the
Environment: shall we support something like PreviewPlanEnvironment? If so,
how? From a user perspective a plan preview is useful: it gives a visual
view for modifying topologies and configuration without submitting the job.

Best,
tison.


Aljoscha Krettek  于2019年10月2日周三 下午10:10写道:

> I agree with Till that we should not change the semantics of per-job mode.
> In my opinion per-job mode means that the cluster (JobManager) is brought
> up with one job and it only executes that one job. There should be no open
> ports/anything that would allow submitting further jobs. This is very
> important for deployments in docker/Kubernetes or other environments where
> you bring up jobs without necessarily having the notion of a Flink cluster.
>
> What this means for a user program that has multiple execute() calls is
> that you will get a fresh cluster for each execute call. This also means,
> that further execute() calls will only happen if the “client” is still
> alive, because it is the one driving execution. Currently, this only works
> if you start the job in “attached” mode. If you start in “detached” mode
> only the first execute() will happen and the rest will be ignored.
>
> This brings us to the tricky question about what to do about “detached”
> and “attached”. In the long run, I would like to get rid of the distinction
> and leave it up to the user program, by either blocking or not on the
> Future (or JobClient or whatnot) that job submission returns. This,
> however, means that users cannot simply request “detached” execution when
> using bin/flink, the user program has to “play along”. On the other hand,
> “detached” mode is quite strange for the user program. The execute() call
> either returns with a proper job result after the job ran (in “attached”
> mode) or with a dummy result (in “detached” mode) right after submission. I
> think this can even lead to weird cases where multiple "execute()” run in
> parallel. For per-job detached mode we also “throw” out of the first
> execute so the rest (including result processing logic) is ignored.
>
> For this here FLIP-73 we can (and should) ignore these problems, because
> FLIP-73 only moves the existing submission logic behind a reusable
> abstraction and makes it usable via API. We should closely follow up on the
> above points though because I think they are also important.
>
> Best,
> Aljoscha
>
> > On 2. Oct 2019, at 12:08, Zili Chen  wrote:
> >
> > Thanks for your clarification Till.
> >
> > I agree with the current semantics of per-job mode: one should deploy a
> > new cluster for each part of the job. Apart from the performance concern,
> > it also means that PerJobExecutor actually knows how to deploy a cluster,
> > which differs from the description that an Executor submits a job.
> >
> > Anyway, it sounds workable and narrows the changes.
>
>
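
To make the attached/detached distinction above concrete, here is a minimal
sketch, assuming job submission returns a future as discussed (method names
such as executeAsync and getJobExecutionResult are illustrative):

// submission returns immediately with a handle to the running job
CompletableFuture<JobClient> submission = env.executeAsync();

// "detached" behaviour: the program only waits for the submission itself
JobClient client = submission.get();

// "attached" behaviour: additionally block until the job finishes
JobExecutionResult result = client.getJobExecutionResult().get();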


Re: [DISCUSS] FLIP-73: Introducing Executors for job submission

2019-10-02 Thread Zili Chen
Hi Kostas,

It seems it does no harm to have a configuration parameter on
Executor#execute, since we can merge it with the one configured when the
Executor is created and let the per-call one override it.

I can see it is useful that, conceptually, we can create one Executor for a
series of jobs to the same cluster but with a different job configuration
per pipeline.
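
For instance, a reusable Executor could merge the two configurations roughly
as below (a sketch only: Configuration#addAll exists in Flink, the rest
follows the interface proposed in this thread):

public class ReusableExecutor implements Executor {

  private final Configuration executorOptions;  // given when the Executor is created

  public ReusableExecutor(Configuration executorOptions) {
    this.executorOptions = executorOptions;
  }

  @Override
  public JobExecutionResult execute(Pipeline pipeline, Configuration executionOptions) throws Exception {
    // per-pipeline options take precedence over the executor-level ones
    Configuration effective = new Configuration();
    effective.addAll(executorOptions);
    effective.addAll(executionOptions);
    return submit(pipeline, effective);
  }

  private JobExecutionResult submit(Pipeline pipeline, Configuration configuration) throws Exception {
    // compile the pipeline to a JobGraph and hand it to a ClusterClient (omitted)
    throw new UnsupportedOperationException("sketch only");
  }
}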

Best,
tison.


Kostas Kloudas  于2019年10月3日周四 上午1:37写道:

> Hi again,
>
> I did not include this to my previous email, as this is related to the
> proposal on the FLIP itself.
>
> In the existing proposal, the Executor interface is the following.
>
> public interface Executor {
>
>   JobExecutionResult execute(Pipeline pipeline) throws Exception;
>
> }
>
> This implies that all the necessary information for the execution of a
> Pipeline should be included in the Configuration passed in the
> ExecutorFactory which instantiates the Executor itself. This should
> include, for example, all the parameters currently supplied by the
> ProgramOptions, which are conceptually not executor parameters but
> rather parameters for the execution of the specific pipeline. To this
> end, I would like to propose a change in the current Executor
> interface showcased below:
>
>
> public interface Executor {
>
>   JobExecutionResult execute(Pipeline pipeline, Configuration
> executionOptions) throws Exception;
>
> }
>
> The above will allow to have the Executor specific options passed in
> the configuration given during executor instantiation, while the
> pipeline specific options can be passed in the executionOptions. As a
> positive side-effect, this will make Executors re-usable, i.e.
> instantiate an executor and use it to execute multiple pipelines, if
> in the future we choose to do so.
>
> Let me know what you think,
> Kostas
>
> On Wed, Oct 2, 2019 at 7:23 PM Kostas Kloudas  wrote:
> >
> > Hi all,
> >
> > I agree with Tison that we should disentangle threads so that people
> > can work independently.
> >
> > For FLIP-73:
> >  - for Preview/OptimizedPlanEnv: I think they are orthogonal to the
> > Executors work, as they are using the execute() method because this is
> > the only "entry" to the user program. To this regard, I believe we
> > should just see the fact that they have their dedicated environment as
> > an "implementation detail".
> >  - for getting rid of the per-job mode: as a first note, there was
> > already a discussion here:
> >
> https://lists.apache.org/thread.html/ce99cba4a10b9dc40eb729d39910f315ae41d80ec74f09a356c73938@%3Cdev.flink.apache.org%3E
> > with many people, including myself, expressing their opinion. I am
> > mentioning that to show that this topic already has some history and
> > the discussion does not start from scratch but there are already some
> > contradicting opinions. My opinion is that we should not get rid of
> > the per-job mode but I agree that we should discuss about the
> > semantics in more detail. Although in terms of code it may be tempting
> > to "merge" the two submission modes, one of the main benefits of the
> > per-job mode is isolation, both for resources and security, as the
> > jobGraph to be executed is fixed and the cluster is "locked" just for
> > that specific graph. This would be violated by having a session
> > cluster launched and having all the infrastructure (ports and
> > endpoints) set for submitting any job to that cluster.
> > - for getting rid of the "detached" mode: I agree with getting rid of
> > it but this implies some potential user-facing changes that should be
> > discussed.
> >
> > Given the above, I think that:
> > 1) in the context of FLIP-73 we should not change any semantics but
> > simply push the existing submission logic behind a reusable
> > abstraction and make it usable via public APIs, as Aljoscha said.
> > 2) as Till said, changing the semantics is beyond the scope of this
> > FLIP and as Tison mentioned we should work towards decoupling
> > discussions rather than the opposite. So let's discuss about the
> > future of the per-job and detached modes in a separate thread. This
> > will also allow to give the proper visibility to such an important
> > topic.
> >
> > Cheers,
> > Kostas
> >
> > On Wed, Oct 2, 2019 at 4:40 PM Zili Chen  wrote:
> > >
> > > Thanks for your thoughts Aljoscha.
> > >
> > > Another question, since FLIP-73 might contain refactorings of the
> > > Environment: shall we support
> > > something like PreviewPlanEnvironment? If so, how? From a user
> perspective

Re: [DISCUSS] FLIP-73: Introducing Executors for job submission

2019-10-02 Thread Zili Chen
BTW, correct me if I misunderstand, now I learn more about our community
way. Since FLIP-73 aimed at introducing an interface with community
consensus the discussion is more about the interface in order to properly
define a useful and extensible API. The integration story could be a follow
up
since this one does not affect current behavior at all.

Best,
tison.


Zili Chen  于2019年10月3日周四 上午2:02写道:

> Hi Kostas,
>
> It seems it does no harm to have a configuration parameter on
> Executor#execute, since we can merge it with the one configured when the
> Executor is created and let the per-call one override it.
>
> I can see it is useful that, conceptually, we can create one Executor for
> a series of jobs to the same cluster but with a different job
> configuration per pipeline.
>
> Best,
> tison.
>
>
> Kostas Kloudas  于2019年10月3日周四 上午1:37写道:
>
>> Hi again,
>>
>> I did not include this to my previous email, as this is related to the
>> proposal on the FLIP itself.
>>
>> In the existing proposal, the Executor interface is the following.
>>
>> public interface Executor {
>>
>>   JobExecutionResult execute(Pipeline pipeline) throws Exception;
>>
>> }
>>
>> This implies that all the necessary information for the execution of a
>> Pipeline should be included in the Configuration passed in the
>> ExecutorFactory which instantiates the Executor itself. This should
>> include, for example, all the parameters currently supplied by the
>> ProgramOptions, which are conceptually not executor parameters but
>> rather parameters for the execution of the specific pipeline. To this
>> end, I would like to propose a change in the current Executor
>> interface showcased below:
>>
>>
>> public interface Executor {
>>
>>   JobExecutionResult execute(Pipeline pipeline, Configuration
>> executionOptions) throws Exception;
>>
>> }
>>
>> The above will allow to have the Executor specific options passed in
>> the configuration given during executor instantiation, while the
>> pipeline specific options can be passed in the executionOptions. As a
>> positive side-effect, this will make Executors re-usable, i.e.
>> instantiate an executor and use it to execute multiple pipelines, if
>> in the future we choose to do so.
>>
>> Let me know what you think,
>> Kostas
>>
>> On Wed, Oct 2, 2019 at 7:23 PM Kostas Kloudas 
>> wrote:
>> >
>> > Hi all,
>> >
>> > I agree with Tison that we should disentangle threads so that people
>> > can work independently.
>> >
>> > For FLIP-73:
>> >  - for Preview/OptimizedPlanEnv: I think they are orthogonal to the
>> > Executors work, as they are using the execute() method because this is
>> > the only "entry" to the user program. To this regard, I believe we
>> > should just see the fact that they have their dedicated environment as
>> > an "implementation detail".
>> >  - for getting rid of the per-job mode: as a first note, there was
>> > already a discussion here:
>> >
>> https://lists.apache.org/thread.html/ce99cba4a10b9dc40eb729d39910f315ae41d80ec74f09a356c73938@%3Cdev.flink.apache.org%3E
>> > with many people, including myself, expressing their opinion. I am
>> > mentioning that to show that this topic already has some history and
>> > the discussion does not start from scratch but there are already some
>> > contradicting opinions. My opinion is that we should not get rid of
>> > the per-job mode but I agree that we should discuss about the
>> > semantics in more detail. Although in terms of code it may be tempting
>> > to "merge" the two submission modes, one of the main benefits of the
>> > per-job mode is isolation, both for resources and security, as the
>> > jobGraph to be executed is fixed and the cluster is "locked" just for
>> > that specific graph. This would be violated by having a session
>> > cluster launched and having all the infrastructure (ports and
>> > endpoints) set for submitting any job to that cluster.
>> > - for getting rid of the "detached" mode: I agree with getting rid of
>> > it but this implies some potential user-facing changes that should be
>> > discussed.
>> >
>> > Given the above, I think that:
>> > 1) in the context of FLIP-73 we should not change any semantics but
>> > simply push the existing submission logic behind a reusable
>> > abstraction and make it usable via public APIs, as Aljoscha said.
>> > 2)

Re: [DISCUSS] FLIP-73: Introducing Executors for job submission

2019-10-02 Thread Zili Chen
 - for Preview/OptimizedPlanEnv: I think they are orthogonal to the
Executors work, as they are using the execute() method because this is
the only "entry" to the user program. To this regard, I believe we
should just see the fact that they have their dedicated environment as
an "implementation detail".

The proposal says

In this document, we propose to abstract away from the Environments the job
submission logic and put it in a newly introduced Executor. This will
allow *each
API to have a single Environment* which, based on the provided
configuration, will decide which executor to use, *e.g.* Yarn, Local, etc.
In addition, it will allow different APIs and downstream projects to re-use
the provided executors, thus limiting the amount of code duplication and
the amount of code that has to be written.

note that "This will allow *each API to have a single Environment*" seems
to diverge a bit from your statement above. Or do we treat a single
Environment as a possible advantage after the introduction of Executor, so
that we exclude it from this pass?

Best,
tison.


Zili Chen  于2019年10月3日周四 上午2:07写道:

> BTW, correct me if I misunderstand, now I learn more about our community
> way. Since FLIP-73 aimed at introducing an interface with community
> consensus the discussion is more about the interface in order to properly
> define a useful and extensible API. The integration story could be a
> follow up
> since this one does not affect current behavior at all.
>
> Best,
> tison.
>
>
> Zili Chen  于2019年10月3日周四 上午2:02写道:
>
>> Hi Kostas,
>>
>> It seems it does no harm to have a configuration parameter on
>> Executor#execute, since we can merge it with the one configured when the
>> Executor is created and let the per-call one override it.
>>
>> I can see it is useful that, conceptually, we can create one Executor
>> for a series of jobs to the same cluster but with a different job
>> configuration per pipeline.
>>
>> Best,
>> tison.
>>
>>
>> Kostas Kloudas  于2019年10月3日周四 上午1:37写道:
>>
>>> Hi again,
>>>
>>> I did not include this to my previous email, as this is related to the
>>> proposal on the FLIP itself.
>>>
>>> In the existing proposal, the Executor interface is the following.
>>>
>>> public interface Executor {
>>>
>>>   JobExecutionResult execute(Pipeline pipeline) throws Exception;
>>>
>>> }
>>>
>>> This implies that all the necessary information for the execution of a
>>> Pipeline should be included in the Configuration passed in the
>>> ExecutorFactory which instantiates the Executor itself. This should
>>> include, for example, all the parameters currently supplied by the
>>> ProgramOptions, which are conceptually not executor parameters but
>>> rather parameters for the execution of the specific pipeline. To this
>>> end, I would like to propose a change in the current Executor
>>> interface showcased below:
>>>
>>>
>>> public interface Executor {
>>>
>>>   JobExecutionResult execute(Pipeline pipeline, Configuration
>>> executionOptions) throws Exception;
>>>
>>> }
>>>
>>> The above will allow to have the Executor specific options passed in
>>> the configuration given during executor instantiation, while the
>>> pipeline specific options can be passed in the executionOptions. As a
>>> positive side-effect, this will make Executors re-usable, i.e.
>>> instantiate an executor and use it to execute multiple pipelines, if
>>> in the future we choose to do so.
>>>
>>> Let me know what you think,
>>> Kostas
>>>
>>> On Wed, Oct 2, 2019 at 7:23 PM Kostas Kloudas 
>>> wrote:
>>> >
>>> > Hi all,
>>> >
>>> > I agree with Tison that we should disentangle threads so that people
>>> > can work independently.
>>> >
>>> > For FLIP-73:
>>> >  - for Preview/OptimizedPlanEnv: I think they are orthogonal to the
>>> > Executors work, as they are using the execute() method because this is
>>> > the only "entry" to the user program. To this regard, I believe we
>>> > should just see the fact that they have their dedicated environment as
>>> > an "implementation detail".
>>> >  - for getting rid of the per-job mode: as a first note, there was
>>> > already a discussion here:
>>> >
>>> https://lists.apache.org/thread.html/ce99cba4a10b9dc40eb729d39910f315ae41d80ec74f09a356c73938@%3Cdev.flink.apache.or

Re: [DISCUSS] FLIP-74: Flink JobClient API

2019-10-02 Thread Zili Chen
Hi all,

Narrowing the scope to FLIP-74, we aim to introduce a useful and extensible
user-facing public interface, JobClient. Let me re-emphasize the two major
pieces of work under this thread.

1. standard interface

As in FLIP-74 we introduce the interface JobClient with its methods, we'd
like to make it a standard (non-final, since we can always extend it on
demand) interface.

On this branch I'd like to, following Konstantin's suggestion, 1) exclude
the deprecated cancelWithSavepoint from the proposal and 2) rename
stopWithSavepoint to stop, to keep consistency with our CLI command. If
there is no more concern on these topics I will update the proposal
tomorrow.

2. client interfaces are asynchronous

If the asynchronous JobClient interfaces are approved, a necessary proposed
change is a corresponding update of the ClusterClient interfaces.
ClusterClient remains an internal concept after this FLIP, but the change
might have some impact, so I think it's better to reach a community
consensus as a prerequisite. Note that with all client methods
asynchronous, the client-side detach option has no effect, whether or not
we remove it.

Let me know your ideas on these topics and let's keep moving forward :-)
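
Putting the two points together, the interface could look roughly as follows
(a sketch reflecting the discussion so far: fully asynchronous methods, stop
instead of stopWithSavepoint, no deprecated cancel-with-savepoint; the exact
signatures are illustrative):

public interface JobClient {

  JobID getJobID();

  CompletableFuture<JobStatus> getJobStatus();

  CompletableFuture<Void> cancel();

  // "stop" always takes a savepoint, consistent with the CLI command
  CompletableFuture<String> stop(String savepointDirectory, boolean advanceToEndOfEventTime);

  CompletableFuture<String> triggerSavepoint(String savepointDirectory);

  CompletableFuture<JobExecutionResult> getJobExecutionResult();
}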

Best,
tison.


Zili Chen  于2019年10月2日周三 下午4:10写道:

> Hi Konstantin,
>
> * should we add "cancelWithSavepoint" to a new public API, when we have
> deprecated the corresponding REST API/CLI methods? In my understanding
> there is no reason to use it anymore.
>
> Good point. We can exclude "cancelWithSavepoint" from the public API at
> least for now,
> since it is deprecated already. Let's see if there are other concerns.
>
> * should we call "stopWithSavepoint" simply "stop" as "stop" always
> performs a savepoint?
>
> Well for naming issue I'm fine with that if it is a consensus of our
> community. I can see
> there is a "stop" CLI command which means "stop with savepoint".
>
> Best,
> tison.
>
>
> Konstantin Knauf  于2019年9月30日周一 下午12:16写道:
>
>> Hi Thomas,
>>
>> maybe there is a misunderstanding. There is no plan to deprecate anything
>> in the REST API in the process of introducing the JobClient API, and it
>> shouldn't.
>>
>> Since "cancel with savepoint" was already deprecated in the REST API and
>> CLI, I am just raising the question whether to add it to the JobClient API
>> in the first place.
>>
>> Best,
>>
>> Konstantin
>>
>>
>>
>> On Mon, Sep 30, 2019 at 1:16 AM Thomas Weise  wrote:
>>
>> > I did not realize there was a plan to deprecate anything in the REST
>> API?
>> >
>> > The REST API is super important for tooling written in non JVM
>> languages,
>> > that does not include a Flink client (like FlinkK8sOperator). The REST
>> API
>> > should continue to support all job management operations, including job
>> > submission.
>> >
>> > Thomas
>> >
>> >
>> > On Sun, Sep 29, 2019 at 1:37 PM Konstantin Knauf <
>> konstan...@ververica.com
>> > >
>> > wrote:
>> >
>> > > Hi Zili,
>> > >
>> > > thanks for working on this topic. Just read through the FLIP and I
>> have
>> > two
>> > > questions:
>> > >
>> > > * should we add "cancelWithSavepoint" to a new public API, when we
>> have
>> > > deprecated the corresponding REST API/CLI methods? In my understanding
>> > > there is no reason to use it anymore.
>> > > * should we call "stopWithSavepoint" simply "stop" as "stop" always
>> > > performs a savepoint?
>> > >
>> > > Best,
>> > >
>> > > Konstantin
>> > >
>> > >
>> > >
>> > > On Fri, Sep 27, 2019 at 10:48 AM Aljoscha Krettek <
>> aljos...@apache.org>
>> > > wrote:
>> > >
>> > > > Hi Flavio,
>> > > >
>> > > > I agree that this would be good to have. But I also think that this
>> is
>> > > > outside the scope of FLIP-74, I think it is an orthogonal feature.
>> > > >
>> > > > Best,
>> > > > Aljoscha
>> > > >
>> > > > > On 27. Sep 2019, at 10:31, Flavio Pompermaier <
>> pomperma...@okkam.it>
>> > > > wrote:
>> > > > >
>> > > > > Hi all,
>> > > > > just a remark about the Flink REST APIs (and its client as well):
>> > > almost
>> > > > > all the times we need a way t

Re: [DISCUSS] FLIP-73: Introducing Executors for job submission

2019-10-03 Thread Zili Chen
Thanks for your explanation Kostas, which makes the subtasks under FLIP-73
clear.

As you described, changes to the Environment are included in this FLIP. For
"each API to have a single Environment", it could be helpful to describe
which APIs we'd like to have after FLIP-73. And if we keep multiple
Environments, shall we keep the way a context environment is injected for
each API?


Kostas Kloudas  于2019年10月3日周四 下午1:44写道:

> Hi Tison,
>
> The changes that this FLIP propose are:
> - the introduction of the Executor interface
> - the fact that everything in the current state of job submission in
> Flink can be defined through configuration parameters
> - implementation of Executors that do not change any of the semantics
> of the currently offered "modes" of job submission
>
> In this, and in the FLIP itself where the
> ExecutionEnvironment.execute() method is described, there are details
> about parts of the
> integration with the existing Flink code-base.
>
> So I am not sure what you mean by making the "integration a
> follow-up discussion".
>
> Cheers,
> Kostas
>
> On Wed, Oct 2, 2019 at 8:10 PM Zili Chen  wrote:
> >
> >  - for Preview/OptimizedPlanEnv: I think they are orthogonal to the
> > Executors work, as they are using the execute() method because this is
> > the only "entry" to the user program. To this regard, I believe we
> > should just see the fact that they have their dedicated environment as
> > an "implementation detail".
> >
> > The proposal says
> >
> > In this document, we propose to abstract away from the Environments the
> job
> > submission logic and put it in a newly introduced Executor. This will
> > allow *each
> > API to have a single Environment* which, based on the provided
> > configuration, will decide which executor to use, *e.g.* Yarn, Local,
> etc.
> > In addition, it will allow different APIs and downstream projects to
> re-use
> > the provided executors, thus limiting the amount of code duplication and
> > the amount of code that has to be written.
> >
> > note that "This will allow *each API to have a single Environment*"
> > seems to diverge a bit from your statement above. Or do we treat a single
> > Environment as a possible advantage after the introduction of Executor,
> > so that we exclude it from this pass?
> >
> > Best,
> > tison.
> >
> >
> > Zili Chen  于2019年10月3日周四 上午2:07写道:
> >
> > > BTW, correct me if I misunderstand, now I learn more about our
> community
> > > way. Since FLIP-73 aimed at introducing an interface with community
> > > consensus the discussion is more about the interface in order to
> properly
> > > define a useful and extensible API. The integration story could be a
> > > follow up
> > > since this one does not affect current behavior at all.
> > >
> > > Best,
> > > tison.
> > >
> > >
> > > Zili Chen  于2019年10月3日周四 上午2:02写道:
> > >
> > >> Hi Kostas,
> > >>
> > >> It seems does no harm we have a configuration parameter of
> > >> Executor#execute
> > >> since we can merge this one with the one configured on Executor
> created
> > >> and
> > >> let this one overwhelm that one.
> > >>
> > >> I can see it is useful that conceptually we can create an Executor
> for a
> > >> series jobs
> > >> to the same cluster but with different job configuration per pipeline.
> > >>
> > >> Best,
> > >> tison.
> > >>
> > >>
> > >> Kostas Kloudas  于2019年10月3日周四 上午1:37写道:
> > >>
> > >>> Hi again,
> > >>>
> > >>> I did not include this to my previous email, as this is related to
> the
> > >>> proposal on the FLIP itself.
> > >>>
> > >>> In the existing proposal, the Executor interface is the following.
> > >>>
> > >>> public interface Executor {
> > >>>
> > >>>   JobExecutionResult execute(Pipeline pipeline) throws Exception;
> > >>>
> > >>> }
> > >>>
> > >>> This implies that all the necessary information for the execution of
> a
> > >>> Pipeline should be included in the Configuration passed in the
> > >>> ExecutorFactory which instantiates the Executor itself. This should
> > >>> include, for example, all the parameters currently supplied by the
> > >>>

Re: [DISCUSS] FLIP-74: Flink JobClient API

2019-10-03 Thread Zili Chen
s/on the context if/on the context of/
s/dummy/dumb/


Re: [DISCUSS] FLIP-74: Flink JobClient API

2019-10-03 Thread Zili Chen
Hi Kostas,

By mentioning "integration to be a follow-up discussion" in the FLIP-73
discussion I think I'm more in the context of FLIP-74, because without
including the retrieval of JobClient in FLIP-74 we actually introduce a dumb
public interface.

1. returning JobClient from Executor#execute actually has a dependency on
FLIP-73.
2. retrieving the JobClient of an existing job directly leads to the
discussion of the retrieval chains, which I started as [DISCUSS] Expose
multiple level clients.
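
A sketch of the two retrieval paths (path 1 assumes Executor#execute
eventually hands back a JobClient; the retriever in path 2 is hypothetical,
pending that discussion):

// path 1: the JobClient is the result of a submission (depends on FLIP-73)
JobClient submitted = executor.execute(pipeline, executionOptions).get();

// path 2 (hypothetical): attach to an already-running job by its id
JobClient attached = jobClientRetriever.retrieve(jobId).get();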

Best,
tison.


Zili Chen  于2019年10月3日周四 上午2:35写道:

> Hi all,
>
> Narrowing the scope to FLIP-74, we aim to introduce a useful and
> extensible user-facing public interface, JobClient. Let me re-emphasize
> the two major pieces of work under this thread.
>
> 1. standard interface
>
> As in FLIP-74 we introduce the interface JobClient with its methods, we'd
> like to make it a standard (non-final, since we can always extend it on
> demand) interface.
>
> On this branch I'd like to, following Konstantin's suggestion, 1) exclude
> the deprecated cancelWithSavepoint from the proposal and 2) rename
> stopWithSavepoint to stop, to keep consistency with our CLI command. If
> there is no more concern on these topics I will update the proposal
> tomorrow.
>
> 2. client interfaces are asynchronous
>
> If the asynchronous JobClient interfaces are approved, a necessary
> proposed change is a corresponding update of the ClusterClient interfaces.
> ClusterClient remains an internal concept after this FLIP, but the change
> might have some impact, so I think it's better to reach a community
> consensus as a prerequisite. Note that with all client methods
> asynchronous, the client-side detach option has no effect, whether or not
> we remove it.
>
> Let me know your ideas on these topics and let's keep moving forward :-)
>
> Best,
> tison.
>
>
> Zili Chen  于2019年10月2日周三 下午4:10写道:
>
>> Hi Konstantin,
>>
>> * should we add "cancelWithSavepoint" to a new public API, when we have
>> deprecated the corresponding REST API/CLI methods? In my understanding
>> there is no reason to use it anymore.
>>
>> Good point. We can exclude "cancelWithSavepoint" from the public API at
>> least for now,
>> since it is deprecated already. Let's see if there are other concerns.
>>
>> * should we call "stopWithSavepoint" simply "stop" as "stop" always
>> performs a savepoint?
>>
>> Well, for the naming issue I'm fine with that if it is the consensus of
>> our community. I can see
>> there is a "stop" CLI command which means "stop with savepoint".
>>
>> Best,
>> tison.
>>
>>
>> Konstantin Knauf  于2019年9月30日周一 下午12:16写道:
>>
>>> Hi Thomas,
>>>
>>> maybe there is a misunderstanding. There is no plan to deprecate anything
>>> in the REST API in the process of introducing the JobClient API, and it
>>> shouldn't.
>>>
>>> Since "cancel with savepoint" was already deprecated in the REST API and
>>> CLI, I am just raising the question whether to add it to the JobClient
>>> API
>>> in the first place.
>>>
>>> Best,
>>>
>>> Konstantin
>>>
>>>
>>>
>>> On Mon, Sep 30, 2019 at 1:16 AM Thomas Weise  wrote:
>>>
>>> > I did not realize there was a plan to deprecate anything in the REST
>>> API?
>>> >
>>> > The REST API is super important for tooling written in non JVM
>>> languages,
>>> > that does not include a Flink client (like FlinkK8sOperator). The REST
>>> API
>>> > should continue to support all job management operations, including job
>>> > submission.
>>> >
>>> > Thomas
>>> >
>>> >
>>> > On Sun, Sep 29, 2019 at 1:37 PM Konstantin Knauf <
>>> konstan...@ververica.com
>>> > >
>>> > wrote:
>>> >
>>> > > Hi Zili,
>>> > >
>>> > > thanks for working on this topic. Just read through the FLIP and I
>>> have
>>> > two
>>> > > questions:
>>> > >
>>> > > * should we add "cancelWithSavepoint" to a new public API, when we
>>> have
>>> > > deprecated the corresponding REST API/CLI methods? In my
>>> understanding
>>> > > there is no reason to use it anymore.
>>> > > * should we call "stopWithSavepoint" simply "stop" as "stop" always
>>> > > performs a savepoint?

Re: [DISCUSS] FLIP-74: Flink JobClient API

2019-10-03 Thread Zili Chen
Hi Kostas,

That is exactly how things went in my mind, and the reason I
said "integration to be a follow-up discussion" :-)

Best,
tison.


Kostas Kloudas  于2019年10月3日周四 下午6:23写道:

> Hi Tison,
>
> I see. Then I would say that as a first step, and to see if people are
> happy with the result,
> integration with the production code can be through a new method
> executeAsync() in the Executor
> that we discussed earlier.
>
> This method could potentially be exposed to ExecutionEnvironment as a
> new flavour of
> the execute that returns a JobClient.
>
> In the future we could consider exposing it through a
> ClusterClientFactory (or sth similar).
>
> What do you think?
> Kostas
>
> On Thu, Oct 3, 2019 at 10:12 AM Zili Chen  wrote:
> >
> > s/on the context if/on the context of/
> > s/dummy/dumb/
>


Re: [DISCUSS] FLIP-73: Introducing Executors for job submission

2019-10-04 Thread Zili Chen
Hi Aljoscha,

After clearly narrowing the scope of this FLIP, the Executor interface and
its discovery look good to me, so I'm glad to see the vote thread.

As you said, we should still discuss implementation details, but I don't
think they should block the vote thread, because a vote means we generally
agree on the motivation and overall design.

As for making Executor.execute() async, it is much better than keeping the
sync/async distinction at this level. But I'd like to note that it only
works internally for now, because the user-facing interface is still
env.execute, which blocks and returns a JobExecutionResult. I'm afraid there
are several people who depend on the result for post-execution processing,
although that doesn't work in the current per-job mode.
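
For illustration, the blocking user-facing entry point could simply sit on
top of the asynchronous primitive (a sketch under the assumption that
Executor.execute() returns a future, as proposed; names are illustrative):

public JobExecutionResult execute(String jobName) throws Exception {
  Pipeline pipeline = createPipeline(jobName);  // compile the user program
  CompletableFuture<JobClient> submission = executor.execute(pipeline, configuration);
  // keep the historical blocking contract so post-execution logic keeps working
  JobClient client = submission.get();
  return client.getJobExecutionResult().get();
}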

Best,
tison.


Aljoscha Krettek  于2019年10月4日周五 下午4:40写道:

> Do you all think we could agree on the basic executor primitives and start
> voting on this FLIP? There are still some implementation details but I
> think we can discuss/tackle them when we get to them and the various people
> implementing this should be in close collaboration.
>
> Best,
> Aljoscha
>
> > On 4. Oct 2019, at 10:15, Aljoscha Krettek  wrote:
> >
> > Hi,
> >
> > I think the end goal is to have only one environment per API, but I
> think we won’t be able to achieve that in the short-term because of
> backwards compatibility. This is most notable with the context environment,
> preview environments etc.
> >
> > To keep this FLIP very slim we can make this only about the executors
> and executor discovery. Anything else like job submission semantics,
> detached mode, … can be tackled after this. If we don’t focus I’m afraid
> this will drag on for quite a while.
> >
> > One thing I would like to propose to make this easier is to change
> Executor.execute() to return a CompletableFuture and to completely remove
> the “detached” logic from ClusterClient. That way, the new components make
> no distinction between “detached” and “attached” but we can still do it in
> the CLI (via the ContextEnvironment) to support the existing “detached”
> behaviour of the CLI that users expect. What do you think about this?
> >
> > Best,
> > Aljoscha
> >
> >> On 3. Oct 2019, at 10:03, Zili Chen  wrote:
> >>
> >> Thanks for your explanation Kostas, which makes the subtasks under
> >> FLIP-73 clear.
> >>
> >> As you described, changes to the Environment are included in this FLIP.
> >> For "each API to have a single Environment", it could be helpful to
> >> describe which APIs we'd like to have after FLIP-73. And if we keep
> >> multiple Environments, shall we keep the way a context environment is
> >> injected for each API?
> >>
> >>
> >> Kostas Kloudas  于2019年10月3日周四 下午1:44写道:
> >>
> >>> Hi Tison,
> >>>
> >>> The changes that this FLIP propose are:
> >>> - the introduction of the Executor interface
> >>> - the fact that everything in the current state of job submission in
> >>> Flink can be defined through configuration parameters
> >>> - implementation of Executors that do not change any of the semantics
> >>> of the currently offered "modes" of job submission
> >>>
> >>> In this, and in the FLIP itself where the
> >>> ExecutionEnvironment.execute() method is described, there are details
> >>> about parts of the
> >>> integration with the existing Flink code-base.
> >>>
> >>> So I am not sure what you mean by making the "integration a
> >>> follow-up discussion".
> >>>
> >>> Cheers,
> >>> Kostas
> >>>
> >>> On Wed, Oct 2, 2019 at 8:10 PM Zili Chen  wrote:
> >>>>
> >>>> - for Preview/OptimizedPlanEnv: I think they are orthogonal to the
> >>>> Executors work, as they are using the execute() method because this is
> >>>> the only "entry" to the user program. To this regard, I believe we
> >>>> should just see the fact that they have their dedicated environment as
> >>>> an "implementation detail".
> >>>>
> >>>> The proposal says
> >>>>
> >>>> In this document, we propose to abstract away from the Environments
> the
> >>> job
> >>>> submission logic and put it in a newly introduced Executor. This will
> >>>> allow *each
> >>>> API to have a single Environment* which, based on the provided

Re: [CODE STYLE] Parameters of method are always final

2019-10-04 Thread Zili Chen
Hi Aljoscha,

I totally agree with you on the field topic. Of course it makes a
significant difference whether or not a field is final, and the IDE/compiler
can help with checking.

Here are several thoughts about the final modifier on parameters and why I
propose this:

1. parameters should always be final

As described above, there is no reason for a parameter to be non-final. This
is different from a field, which can be final or non-final depending on
whether or not it is immutable. Thus, with such a code style guide in our
community, we can work towards a codebase where every parameter is
effectively final.

2. a parameter's final modifier cannot be inherited

Unfortunately, Java doesn't keep the final modifier of a method parameter
through inheritance. So even if you mark a parameter as final in an
interface or super class, you have to re-mark it as final in the
implementation or subclass. From another perspective, the final modifier of
a parameter is a local attribute of the parameter, so we can narrow its
possible effect during review.

3. IDEs can lint the difference between effectively final and mutable
parameters

It is true that an IDE such as IntelliJ IDEA can highlight the difference
between an effectively final and a mutable parameter (with an underline). So
with this code style, what we lose is a compile-time error if someone
modifies a parameter in the method body. But as mentioned in 1, by no means
do we allow a parameter to be modified. If we agree on that statement, then
we hopefully converge on a codebase where no parameter is modified.

As for what we gain, I'd like to recall our existing code style of @Nonnull
being the default. It does help the compiler report a compile-time warning
if we possibly pass a nullable value to a non-null field. We made @Nonnull
the default to "reduce code noise", so I think we can reuse that argument
here as well.
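
As a concrete illustration of cases 1 and 2 from the original proposal, the
reassignment can be replaced by a fresh local, without any final keyword (a
minimal sketch):

// before: the parameter is reassigned to apply a default
void register(String name) {
  if (name == null) {
    name = "default";
  }
  // ... uses name ...
}

// after: the parameter stays effectively final; the refined value gets its own name
void register(String name) {
  final String effectiveName = (name == null) ? "default" : name;
  // ... uses effectiveName ...
}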

Best,
tison.


Aljoscha Krettek  于2019年10月4日周五 下午5:58写道:

> I actually think that doing this the other way round would be correct.
> Removing final everywhere and relying on humans to assume that everything
> is final doesn’t seem maintainable to me. The benefit of having final on
> parameters/fields is that the compiler/IDE actually checks that you don’t
> modify it.
>
> In general, I think that as much as possible should be declared final,
> including fields and parameters.
>
> Best,
> Aljoscha
>
> > On 2. Oct 2019, at 13:31, Piotr Nowojski  wrote:
> >
> > +1 from my side.
> >
> >> On 2 Oct 2019, at 13:07, Zili Chen  wrote:
> >>
> >> Yes exactly.
> >>
> >>
> >> Piotr Nowojski  于2019年10月2日周三 下午7:03写道:
> >>
> >>> Hi Tison,
> >>>
> >>> To clarify your proposal: you are proposing to actually drop the
> >>> `final` keyword from the parameters, and we should implicitly assume
> >>> that
> >>> it’s always there (in other words, we shouldn’t be modifying the
> >>> parameters). Did I understand this correctly?
> >>>
> >>> Piotrek
> >>>
> >>>> On 1 Oct 2019, at 21:44, Zili Chen  wrote:
> >>>>
> >>>> Hi devs,
> >>>>
> >>>> Coming from this discussion[1] I'd like to propose that in the Flink
> >>>> codebase we adopt a code style where method parameters are always
> >>>> final. Almost everywhere, method parameters are already final, and
> >>>> with such a consensus we can drop the redundant final modifier from
> >>>> method declarations and spare ourselves that noise.
> >>>>
> >>>> Here are some cases that might require modifying a parameter:
> >>>>
> >>>> 1. to set a default; especially if (param == null) { param = ... }
> >>>> 2. to refine a parameter; in the pattern if ( ... ) { param =
> >>>> refine(param); }
> >>>>
> >>>> In either case we can move the refine/set-default logic to the
> >>>> caller, or select another name for the refined value, case by case.
> >>>>
> >>>> Looking forward to your feedbacks :-)
> >>>>
> >>>> Best,
> >>>> tison.
> >>>>
> >>>> [1] https://github.com/apache/flink/pull/9788#discussion_r329314681
> >>>
> >>>
> >
>
>


Re: [DISCUSS] FLIP-74: Flink JobClient API

2019-10-04 Thread Zili Chen
Thanks for your replies.

Since there has been no objection to Konstantin's proposal so far, I'd like
to update the FLIP correspondingly. The changes are the naming issue and the
exclusion of deprecated functionality.

I hereby infer that our community generally agrees on the introduction of
JobClient and its interfaces as proposed in the FLIP. If there are other
concerns, please throw them into this thread. Otherwise I'm going to start a
vote thread later.

Best,
tison.


Kostas Kloudas  于2019年10月4日周五 下午11:33写道:

> I also agree @Zili Chen !
>
> On Fri, Oct 4, 2019 at 10:17 AM Aljoscha Krettek 
> wrote:
> >
> > This makes sense to me, yes!
> >
> > > On 2. Oct 2019, at 20:35, Zili Chen  wrote:
> > >
> > > Hi all,
> > >
> > > Narrow the scope to FLIP-74 we aimed at introduce a useful and
> extensible
> > > user-facing public interface JobClient. Let me reemphasize two major
> works
> > > under this thread.
> > >
> > > 1. standard interface
> > >
> > > As in FLIP-74 we introduce an interface JobClient with its methods,
> we'd
> > > like to
> > > make it a standard (non-final since we can always extends on demand)
> > > interface.
> > >
> > > On this branch I'd like to, with respect to Konstantin's suggestion, 1)
> > > exclude deprecated
> > > cancelWithSavepoint from the proposal 2) rename stopWithSavepoint to
> stop
> > > to keep
> > > consistency with our CLI command. If there is no more concern on these
> > > topics I will
> > > update proposal tomorrow.
> > >
> > > 2. client interfaces are asynchronous
> > >
> > > If the asynchronous JobClient interfaces approved, a necessary proposed
> > > changed is
> > > corresponding update ClusterClient interfaces. Still ClusterClient is
> an
> > > internal concept
> > > after this FLIP but it might have some impact so I think it's better to
> > > reach a community
> > > consensus as prerequisite. Note that with all client methods are
> > > asynchronous, no matter
> > > whether or not we remove client side detach option it is no power.
> > >
> > > Let me know your ideas on these topic and keep moving forward :-)
> > >
> > > Best,
> > > tison.
> > >
> > >
> > > Zili Chen  于2019年10月2日周三 下午4:10写道:
> > >
> > >> Hi Konstantin,
> > >>
> > >> * should we add "cancelWithSavepoint" to a new public API, when we
> have
> > >> deprecated the corresponding REST API/CLI methods? In my understanding
> > >> there is no reason to use it anymore.
> > >>
> > >> Good point. We can exclude "cancelWithSavepoint" from the public API,
> > >> at least for now,
> > >> since it is deprecated already. Let's see if there are other concerns.
> > >>
> > >> * should we call "stopWithSavepoint" simply "stop" as "stop" always
> > >> performs a savepoint?
> > >>
> > >> Well, for the naming issue I'm fine with that if it is the consensus
> > >> of our community. I can see
> > >> there is a "stop" CLI command which means "stop with savepoint".
> > >>
> > >> Best,
> > >> tison.
> > >>
> > >>
> > >> Konstantin Knauf  于2019年9月30日周一 下午12:16写道:
> > >>
> > >>> Hi Thomas,
> > >>>
> > >>> maybe there is a misunderstanding. There is no plan to deprecate
> anything
> > >>> in the REST API in the process of introducing the JobClient API, and
> it
> > >>> shouldn't.
> > >>>
> > >>> Since "cancel with savepoint" was already deprecated in the REST API
> and
> > >>> CLI, I am just raising the question whether to add it to the
> JobClient API
> > >>> in the first place.
> > >>>
> > >>> Best,
> > >>>
> > >>> Konstantin
> > >>>
> > >>>
> > >>>
> > >>> On Mon, Sep 30, 2019 at 1:16 AM Thomas Weise  wrote:
> > >>>
> > >>>> I did not realize there was a plan to deprecate anything in the REST
> > >>> API?
> > >>>>
> > >>>> The REST API is super important for tooling written in non JVM
> > >>> languages,
> > >>>> that does not include a Flin

Re: [VOTE] FLIP-73: Introducing Executors for job submission

2019-10-04 Thread Zili Chen
Thanks for your work Kostas!

+1 for FLIP-73

Best,
tison


Kostas Kloudas  于2019年10月4日周五 下午11:40写道:

> Hi all,
>
> I would like to start the vote for FLIP-73 [1], which has been discussed
> and reached a consensus in the discussion thread [2].
>
> Given that it is Friday, the vote will be open until Oct. 9th (72h
> starting on Monday), unless there is an objection or not enough votes.
>
> Thanks,
> Kostas
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-73%3A+Introducing+Executors+for+job+submission
>
> [2]
> https://lists.apache.org/thread.html/dc3a541709f96906b43df4155373af1cd09e08c3f105b0bd0ba3fca2@%3Cdev.flink.apache.org%3E
>


[VOTE] FLIP-74: Flink JobClient API

2019-10-07 Thread Zili Chen
Hi all,

I would like to start the vote for FLIP-74 [1], which has been discussed and
reached a consensus in the discussion thread [2].

The vote will be open until Oct. 9th (72h starting on Oct. 7th), unless
there is an objection or not enough votes.

Best,
tison.

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-74%3A+Flink+JobClient+API
[2]
https://lists.apache.org/x/thread.html/b2e22a45aeb94a8d06b50c4de078f7b23d9ff08b8226918a1a903768@%3Cdev.flink.apache.org%3E


Re: [CODE STYLE] Parameters of method are always final

2019-10-07 Thread Zili Chen
Thanks for your thoughts Arvid & Piotr,

I checked out the effect of ParameterAssignment[1] and it does
prevent code from modifying parameters, which I argued above is
the main value introduced by the `final` modifier on parameters.

So firstly, I think it's good to enable ParameterAssignment in our
codebase.

Besides, there is no rule to forbid the `final` modifier on parameters.
Instead, there is a rule to enforce the `final` modifier[2], but given [1]
enabled it is actually redundant.

The main purpose of reducing `final` modifiers on parameters, given that
modifying parameters is already forbidden, is to remove the redundant
modifier so that we don't have declarations like

T fn(
  final ArgumentType1 argument1,
  final ArgumentType2 argument2,
  ...)

because we don't actually mark final everywhere, which might cause some
confusion.

Given [1] is enforced, these `final` modifiers are indeed redundant, so we
can add a convention to omit the `final` modifier on parameters, which is
a net win.
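For illustration, the kind of code the check rejects and accepts (a
hypothetical example, not from our codebase):

class ParameterAssignmentExample {

    // Rejected by the ParameterAssignment check: the parameter is
    // reassigned, even though there is no `final` in the signature.
    int normalize(int value) {
        if (value < 0) {
            value = -value; // checkstyle error here
        }
        return value;
    }

    // Compliant version: introduce a local variable; the parameter stays
    // effectively final without any `final` modifier in the signature.
    int normalizeCompliant(int value) {
        int result = value;
        if (result < 0) {
            result = -result;
        }
        return result;
    }
}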

Best,
tison.

[1] https://checkstyle.sourceforge.io/config_coding.html#ParameterAssignment
[2] https://checkstyle.sourceforge.io/config_misc.html#FinalParameters



Piotr Nowojski  wrote on Mon, Oct 7, 2019 at 3:49 PM:

> Hi,
>
> A couple of arguments to counter the proposal of making the `final`
> keyword obligatory. Can we prepare code style/IDE settings to add it
> automatically? If not, I would be strongly against it, since:
>
> - Intellij’s automatic refactor actions will not work properly.
> - I don’t think it’s a big deal. I don’t remember having any issues with
> the lack or presence of the `final` keyword.
> - `final` is pretty much useless in most of the cases (it’s not `const`:
> it only prevents reassignment of the reference, not mutation of the object).
> - I don’t like the extra overhead of doing something of very little extra
> value. Especially the extra hassle of going back & forth during the reviews
> (both as a contributor & as a reviewer).
> - If missing `final` keyword caused some confusion, because surprisingly a
> parameter was modified somewhere in the function and it wasn’t obviously
> visible, the function is doing probably too many things and it’s too
> long/too complicated…
>
> Generally speaking, I’m against adding minor things to our codestyle that
> can not be enforced and added automatically.
>
> Piotrek
>
> > On 7 Oct 2019, at 09:13, Arvid Heise  wrote:
> >
> > Hi guys,
> >
> > I'm a bit torn. In general, +1 for making parameters effectively final.
> >
> > The question is how to enforce it. We can make it explicit (like Aljoscha
> > said). All IDEs will immediately show warnings/errors for violations. It
> > would allow us to migrate the code softly.
> >
> > Another option is to use a checkstyle rule [1]. Then, we could omit the
> > final keyword and rely on checkstyle checks as we do for quite a few
> other
> > things. A hard checkstyle rule would probably fail on a good portion of
> the
> > current code base. But we would also remove reassignment to parameters
> > (which I consider an anti-pattern).
> >
> > If we opt not to enforce it, then -1 for removing final keywords from my
> > side.
> >
> > Best,
> >
> > Arvid
> >
> > [1]
> >
> https://checkstyle.sourceforge.io/apidocs/com/puppycrawl/tools/checkstyle/checks/coding/ParameterAssignmentCheck.html
> >
> > On Fri, Oct 4, 2019 at 1:22 PM Zili Chen  wrote:
> >
> >> Hi Aljoscha,
> >>
> >> I totally agree with you on the field topic. Of course it makes a
> >> significant difference whether or not a field is final, and the
> >> IDE/compiler can help with checking.
> >>
> >> Here are several thoughts about the final modifier on parameters and
> >> why I propose this one:
> >>
> >> 1. parameters should always be final
> >>
> >> As described above, there is no reason for a parameter to be non-final.
> >> This differs from a field, which can be final or non-final based on
> >> whether or not it is immutable. Thus with such a code style guide in
> >> our community, we can work towards a codebase where every parameter is
> >> effectively final.
> >>
> >> 2. final on a parameter cannot be inherited
> >>
> >> Unfortunately Java doesn't keep the final modifier of a method
> >> parameter through inheritance. So even if you mark a parameter as final
> >> in an interface or super class, you have to re-mark it as final in its
> >> implementation or subclass. From another perspective, the final
> >> modifier of a parameter is a local attribute of the parameter, so we
> >> can narrow possible effect during

Re: [VOTE] Release 1.9.1, release candidate #1

2019-10-07 Thread Zili Chen
Hi Jark,

I noticed a critical bug[1] that is marked as resolved in 1.9.1, but given
that 1.9.1 has already been cut I'd like to raise the issue here so that
we're sure whether or not the fix is included in 1.9.1.

Best,
tison.

[1] https://issues.apache.org/jira/browse/FLINK-14315


Jark Wu  wrote on Mon, Sep 30, 2019 at 3:25 PM:

>  Hi everyone,
>
> Please review and vote on the release candidate #1 for the version 1.9.1,
> as follows:
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
>
>
> The complete staging area is available for your review, which includes:
> * JIRA release notes [1],
> * the official Apache source release and binary convenience releases to be
> deployed to dist.apache.org [2], which are signed with the key with
> fingerprint E2C45417BED5C104154F341085BACB5AEFAE3202 [3],
> * all artifacts to be deployed to the Maven Central Repository [4],
> * source code tag "release-1.9.1-rc1" [5],
> * website pull request listing the new release and adding announcement blog
> post [6].
>
> The vote will be open for at least 72 hours.
> Please cast your votes before *Oct. 3rd 2019, 08:00 UTC*.
>
> It is adopted by majority approval, with at least 3 PMC affirmative votes.
>
> Thanks,
> Jark
>
> [1]
>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12346003
> [2] https://dist.apache.org/repos/dist/dev/flink/flink-1.9.1-rc1/
> [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> [4]
> https://repository.apache.org/content/repositories/orgapacheflink-1272/
> [5]
>
> https://github.com/apache/flink/commit/4d56de81cb692c68a7d1dbfff13087a5079a8252
> [6] https://github.com/apache/flink-web/pull/274
>


Re: [VOTE] FLIP-74: Flink JobClient API

2019-10-08 Thread Zili Chen
Given the ongoing FlinkForward Berlin event, I'm going to extend
this vote for a bit, until Oct. 11th (Friday).

Best,
tison.


Zili Chen  wrote on Mon, Oct 7, 2019 at 4:15 PM:

> Hi all,
>
> I would like to start the vote for FLIP-74[1], which is discussed and
> reached a consensus in the discussion thread[2].
>
> The vote will be open until Oct. 9th (72h starting on Oct. 7th), unless
> there is an objection or not enough votes.
>
> Best,
> tison.
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-74%3A+Flink+JobClient+API
> [2]
> https://lists.apache.org/x/thread.html/b2e22a45aeb94a8d06b50c4de078f7b23d9ff08b8226918a1a903768@%3Cdev.flink.apache.org%3E
>


Re: [DISCUSS] FLIP-73: Introducing Executors for job submission

2019-10-09 Thread Zili Chen
Hi Kostas & Aljoscha,

I'm drafting a plan for exposing multi-layered clients. It is mainly about
how we distinguish the different layers and which clients we're going to
expose.

Within the scope of FLIP-73 I'd like to ask whether Executor becomes a
public interface that downstream project developers can make use of, or
whether it is just an internal concept for unifying job submission.
If it is the latter, I feel the multi-layered client topic is totally
independent from Executor.
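For reference, the primitive I'm asking about is roughly of this shape (my
paraphrase of the FLIP-73 draft; names and signatures are approximate):

import java.util.concurrent.CompletableFuture;

import org.apache.flink.api.dag.Pipeline;
import org.apache.flink.configuration.Configuration;

// Approximate shape of the FLIP-73 Executor: it submits a pipeline and
// completes with the FLIP-74 JobClient for interacting with the running job.
public interface Executor {

    CompletableFuture<JobClient> execute(Pipeline pipeline, Configuration configuration)
            throws Exception;
}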

Best,
tison.


Thomas Weise  wrote on Sat, Oct 5, 2019 at 12:17 AM:

> It might be useful to mention on FLIP-73 that the intention for
> Executor.execute is to be an asynchronous API once it becomes public and
> also refer to FLIP-74 as such.
>
>
> On Fri, Oct 4, 2019 at 2:52 AM Aljoscha Krettek 
> wrote:
>
> > Hi Tison,
> >
> > I agree, for now the async Executor.execute() is an internal detail but
> > during your work for FLIP-74 it will probably also reach the public API.
> >
> > Best,
> > Aljoscha
> >
> > > On 4. Oct 2019, at 11:39, Zili Chen  wrote:
> > >
> > > Hi Aljoscha,
> > >
> > > After clearly narrowing the scope of this FLIP, the Executor interface
> > > and its discovery look good to me, so I'm glad to see the vote thread.
> > >
> > > As you said, we should still discuss implementation details, but I
> > > don't think that should block the vote thread, because a vote means we
> > > generally agree on the motivation and overall design.
> > >
> > > As for making Executor.execute() async, that is much better than
> > > keeping the sync/async distinction at this level. But I'd like to note
> > > that it only works internally for now, because the user-facing
> > > interface is still env.execute(), which blocks and returns a
> > > JobExecutionResult. I'm afraid several people depend on that result
> > > for post-execution processing, although it doesn't work in the current
> > > per-job mode.
> > >
> > > Best,
> > > tison.
> > >
> > >
> > > Aljoscha Krettek  wrote on Fri, Oct 4, 2019 at 4:40 PM:
> > >
> > >> Do you all think we could agree on the basic executor primitives and
> > start
> > >> voting on this FLIP? There are still some implementation details but I
> > >> think we can discuss/tackle them when we get to them and the various
> > people
> > >> implementing this should be in close collaboration.
> > >>
> > >> Best,
> > >> Aljoscha
> > >>
> > >>> On 4. Oct 2019, at 10:15, Aljoscha Krettek 
> > wrote:
> > >>>
> > >>> Hi,
> > >>>
> > >>> I think the end goal is to have only one environment per API, but I
> > >> think we won’t be able to achieve that in the short-term because of
> > >> backwards compatibility. This is most notable with the context
> > environment,
> > >> preview environments etc.
> > >>>
> > >>> To keep this FLIP very slim we can make this only about the executors
> > >> and executor discovery. Anything else like job submission semantics,
> > >> detached mode, … can be tackled after this. If we don’t focus I’m
> afraid
> > >> this will drag on for quite a while.
> > >>>
> > >>> One thing I would like to propose to make this easier is to change
> > >> Executor.execute() to return a CompletableFuture and to completely
> > remove
> > >> the “detached” logic from ClusterClient. That way, the new components
> > make
> > >> no distinction between “detached” and “attached” but we can still do
> it
> > in
> > >> the CLI (via the ContextEnvironment) to support the existing
> “detached”
> > >> behaviour of the CLI that users expect. What do you think about this?
> > >>>
> > >>> Best,
> > >>> Aljoscha
> > >>>
> > >>>> On 3. Oct 2019, at 10:03, Zili Chen  wrote:
> > >>>>
> > >>>> Thanks for your explanation Kostas, which makes the subtasks under
> > >>>> FLIP-73 clear.
> > >>>>
> > >>>> As you described, changes of Environment are included in this FLIP.
> > For
> > >>>> "each
> > >>>> API to have a single Environment", it could be helpful to describ

Re: [DISCUSS] FLIP-73: Introducing Executors for job submission

2019-10-10 Thread Zili Chen
Thanks for your explanation Kostas.

I agree that the Clients are independent from the Executors. From
your text I wonder one thing: does Executor#execute
return a cluster client or a job client? As discussed previously,
I think conceptually it is a job client.

Best,
tison.


Kostas Kloudas  wrote on Thu, Oct 10, 2019 at 5:08 PM:

> Hi Tison,
>
> I would say that as a first step, and until we see that the interfaces
> we introduce cover all intended purposes, we keep the Executors
> non-public.
> From the previous discussion, I think that in general the Clients are
> independent from the Executors, as the Executors simply use the
> clients to submit jobs and return a cluster client.
>
> Cheers,
> Kostas
>
> On Wed, Oct 9, 2019 at 7:01 PM Zili Chen  wrote:
> >
> > Hi Kostas & Aljoscha,
> >
> > I'm drafting a plan exposing multi-layered clients. It is mainly about
> > how we distinguish different layers and what clients we're going to
> > expose.
> >
> > In FLIP-73 scope I'd like to ask a question that whether or not Executor
> > becomes a public interface that can be made use of by downstream
> > project developer? Or it just an internal concept for unifying job
> > submission?
> > If it is the latter, I'm feeling multi-layer client topic is totally
> > independent from
> > Executor.
> >
> > Best,
> > tison.
> >
> >
> > Thomas Weise  wrote on Sat, Oct 5, 2019 at 12:17 AM:
> >
> > > It might be useful to mention on FLIP-73 that the intention for
> > > Executor.execute is to be an asynchronous API once it becomes public
> and
> > > also refer to FLIP-74 as such.
> > >
> > >
> > > On Fri, Oct 4, 2019 at 2:52 AM Aljoscha Krettek 
> > > wrote:
> > >
> > > > Hi Tison,
> > > >
> > > > I agree, for now the async Executor.execute() is an internal detail
> but
> > > > during your work for FLIP-74 it will probably also reach the public
> API.
> > > >
> > > > Best,
> > > > Aljoscha
> > > >
> > > > > On 4. Oct 2019, at 11:39, Zili Chen  wrote:
> > > > >
> > > > > Hi Aljoscha,
> > > > >
> > > > > After clearly narrow the scope of this FLIP it looks good to me the
> > > > > interface
> > > > > Executor and its discovery so that I'm glad to see the vote thread.
> > > > >
> > > > > As you said, we should still discuss on implementation details but
> I
> > > > don't
> > > > > think
> > > > > it should be a blocker of the vote thread because a vote means we
> > > > generally
> > > > > agree on the motivation and overall design.
> > > > >
> > > > > As for Executor.execute() to be async, it is much better than we
> keep
> > > the
> > > > > difference between sync/async in this level. But I'd like to note
> that
> > > it
> > > > > only
> > > > > works internally for now because user-facing interface is still
> > > > env.execute
> > > > > which block and return a JobExecutionResult. I'm afraid that there
> are
> > > > > several
> > > > > people depends on the result for doing post execution process,
> although
> > > > it
> > > > > doesn't
> > > > > work on current per-job mode.
> > > > >
> > > > > Best,
> > > > > tison.
> > > > >
> > > > >
> > > > > Aljoscha Krettek  wrote on Fri, Oct 4, 2019 at 4:40 PM:
> > > > >
> > > > >> Do you all think we could agree on the basic executor primitives
> and
> > > > start
> > > > >> voting on this FLIP? There are still some implementation details
> but I
> > > > >> think we can discuss/tackle them when we get to them and the
> various
> > > > people
> > > > >> implementing this should be in close collaboration.
> > > > >>
> > > > >> Best,
> > > > >> Aljoscha
> > > > >>
> > > > >>> On 4. Oct 2019, at 10:15, Aljoscha Krettek 
> > > > wrote:
> > > > >>>
> > > > >>> Hi,
> > > > >>>
> > > > >>> I think the end goal is to have only one environment per API,
> but I
> > > > >> think we won’t be able to achieve that in the short-term bec

Re: [VOTE] FLIP-74: Flink JobClient API

2019-10-11 Thread Zili Chen
Hi Kostas,

Thanks for your reply.

(1) cancelWithSavepoint() has already been excluded from the FLIP. But
to emphasize that we made the decision to exclude it, I added it to the
rejected alternatives.

(2) Updated FLIP to reflect the consensus :-)

Best,
tison.


Kostas Kloudas  wrote on Fri, Oct 11, 2019 at 5:12 PM:

> Hi all,
>
> I only have two minor comments before voting and they have to do with
> the following:
>
> 1) In the discussion, we agreed to remove the cancelWithSavepoint()
> from the JobClient as this is deprecated in the REST API. This is not
> in the FLIP.
> 2) The section "ClusterDescriptor or Executor(FLIP-73)(integration)"
> does not reflect our discussion where we said that for now only the
> Executor#execute() will give you the JobClient and there will be a
> separate discussion about alternative ways of exposing the JobClient.
>
> I think that these points should be updated in order for the FLIP to
> reflect the discussion in the ML thread.
>
> Cheers,
> Kostas
>
> On Fri, Oct 11, 2019 at 10:58 AM Biao Liu  wrote:
> >
> > +1 (non-binding), glad to have this improvement!
> >
> > Thanks,
> > Biao /'bɪ.aʊ/
> >
> >
> >
> > On Fri, 11 Oct 2019 at 14:44, Jeff Zhang  wrote:
> >
> > > +1, overall design make sense to me
> > >
> > > SHI Xiaogang  wrote on Fri, Oct 11, 2019 at 11:15 AM:
> > >
> > > > +1. The interface looks fine to me.
> > > >
> > > > Regards,
> > > > Xiaogang
> > > >
> > > > Zili Chen  wrote on Wed, Oct 9, 2019 at 2:36 PM:
> > > >
> > > > > Given the ongoing FlinkForward Berlin event, I'm going to extend
> > > > > this vote for a bit, until Oct. 11th (Friday).
> > > > >
> > > > > Best,
> > > > > tison.
> > > > >
> > > > >
> > > > > Zili Chen  wrote on Mon, Oct 7, 2019 at 4:15 PM:
> > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > I would like to start the vote for FLIP-74[1], which is
> discussed and
> > > > > > reached a consensus in the discussion thread[2].
> > > > > >
> > > > > > The vote will be open until Oct. 9th (72h starting on Oct. 7th),
> > > > > > unless there is an objection or not enough votes.
> > > > > >
> > > > > > Best,
> > > > > > tison.
> > > > > >
> > > > > > [1]
> > > > > >
> > > > >
> > > >
> > >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-74%3A+Flink+JobClient+API
> > > > > > [2]
> > > > > >
> > > > >
> > > >
> > >
> https://lists.apache.org/x/thread.html/b2e22a45aeb94a8d06b50c4de078f7b23d9ff08b8226918a1a903768@%3Cdev.flink.apache.org%3E
> > > > > >
> > > > >
> > > >
> > >
> > >
> > > --
> > > Best Regards
> > >
> > > Jeff Zhang
> > >
>


Re: [VOTE] FLIP-74: Flink JobClient API

2019-10-11 Thread Zili Chen
Well. Then I'll remove the requirement to change cancelWithSavepoint
but keep the explanation of why we exclude it from JobClient.

We might still change the signature to return a CompletableFuture for a
consistent async view of ClusterClient, but that is quite an
implementation detail and we don't stick to it at the FLIP level.
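To illustrate the kind of signature change meant here (illustrative only,
not something the FLIP prescribes):

// today (blocking):
String cancelWithSavepoint(JobID jobId, String savepointDirectory) throws Exception;

// a consistent async view would instead be:
CompletableFuture<String> cancelWithSavepoint(JobID jobId, String savepointDirectory);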

Best,
tison.


Kostas Kloudas  wrote on Fri, Oct 11, 2019 at 7:36 PM:

> Hi Tison,
>
> Thanks for integrating the comments!
>
> +1 for accepting the FLIP from my side.
> What I meant is that in the Proposed Changes section, the FLIP still
> has that the cancelWithSavepoin(jobId, savepointDir) of the
> clusterClient should change to return a CompletableFuture. I believe
> that this change is redundant as we will not need it for the
> JobClient. I should have been more clear on what I meant before.
>
> Cheers,
> Kostas
>
> On Fri, Oct 11, 2019 at 11:51 AM Zili Chen  wrote:
> >
> > Hi Kostas,
> >
> > Thanks for your reply.
> >
> > (1) cancelWithSavepoint() has already been excluded from the FLIP. But
> > to emphasize that we make the decision to exclude it I add it to reject
> > alternatives.
> >
> > (2) Updated FLIP to reflect the consensus :-)
> >
> > Best,
> > tison.
> >
> >
> > Kostas Kloudas  wrote on Fri, Oct 11, 2019 at 5:12 PM:
> >
> > > Hi all,
> > >
> > > I only have two minor comments before voting and they have to do with
> > > the following:
> > >
> > > 1) In the discussion, we agreed to remove the cancelWithSavepoint()
> > > from the JobClient as this is deprecated in the rest API. This is not
> > > in the FLIP.
> > > 2) The section "ClusterDescriptor or Executor(FLIP-73)(integration)"
> > > does not reflect our discussion where we said that for now only the
> > > Executor#execute() will give you the JobClient and there will be a
> > > separate discussion about alternative ways of exposing the JobClient.
> > >
> > > I think that these points should be updated in order for the FLIP to
> > > reflect the discussion in the ML thread.
> > >
> > > Cheers,
> > > Kostas
> > >
> > > On Fri, Oct 11, 2019 at 10:58 AM Biao Liu  wrote:
> > > >
> > > > +1 (non-binding), glad to have this improvement!
> > > >
> > > > Thanks,
> > > > Biao /'bɪ.aʊ/
> > > >
> > > >
> > > >
> > > > On Fri, 11 Oct 2019 at 14:44, Jeff Zhang  wrote:
> > > >
> > > > > +1, overall design make sense to me
> > > > >
> > > > > SHI Xiaogang  wrote on Fri, Oct 11, 2019 at 11:15 AM:
> > > > >
> > > > > > +1. The interface looks fine to me.
> > > > > >
> > > > > > Regards,
> > > > > > Xiaogang
> > > > > >
> > > > > > Zili Chen  wrote on Wed, Oct 9, 2019 at 2:36 PM:
> > > > > >
> > > > > > > Given the ongoing FlinkForward Berlin event, I'm going to
> > > > > > > extend this vote for a bit, until Oct. 11th (Friday).
> > > > > > >
> > > > > > > Best,
> > > > > > > tison.
> > > > > > >
> > > > > > >
> > > > > > > Zili Chen  wrote on Mon, Oct 7, 2019 at 4:15 PM:
> > > > > > >
> > > > > > > > Hi all,
> > > > > > > >
> > > > > > > > I would like to start the vote for FLIP-74[1], which is
> > > discussed and
> > > > > > > > reached a consensus in the discussion thread[2].
> > > > > > > >
> > > > > > > > The vote will be open until Oct. 9th (72h starting on
> > > > > > > > Oct. 7th), unless there is an objection or not enough votes.
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > tison.
> > > > > > > >
> > > > > > > > [1]
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-74%3A+Flink+JobClient+API
> > > > > > > > [2]
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > >
> https://lists.apache.org/x/thread.html/b2e22a45aeb94a8d06b50c4de078f7b23d9ff08b8226918a1a903768@%3Cdev.flink.apache.org%3E
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Best Regards
> > > > >
> > > > > Jeff Zhang
> > > > >
> > >
>


Re: [PROPOSAL] Contribute Stateful Functions to Apache Flink

2019-10-14 Thread Zili Chen
+1 to adding Stateful Functions to the Flink core repository.

Best,
tison.


Becket Qin  wrote on Mon, Oct 14, 2019 at 4:16 PM:

> +1 to adding Stateful Function to Flink. It is a very useful addition to
> the Flink ecosystem.
>
> Given this is essentially a new top-level / first-class API of Flink, it
> seems better to have it in the Flink core repo. This will also avoid
> letting this important new API be blocked on potential problems of
> maintaining multiple different repositories.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Sun, Oct 13, 2019 at 4:48 AM Hequn Cheng  wrote:
>
>> Hi Stephan,
>>
>> Big +1 for adding this to Apache Flink!
>>
>> As for the problem of whether this should be added to the Flink main
>> repository, from my side, I prefer to put it in the main repository. Not
>> only does Stateful Functions share very close relations with the current
>> Flink, but other libs or modules in Flink could also make use of it the
>> other way round in the future. At that time the Flink API stack would
>> also change a bit, and this would be cool.
>>
>> Best, Hequn
>>
>> On Sat, Oct 12, 2019 at 9:16 PM Biao Liu  wrote:
>>
>>> Hi Stephan,
>>>
>>> +1 for having Stateful Functions in Flink.
>>>
>>> Before discussing which repository it should belong to, I was wondering
>>> whether we have reached an agreement on "splitting the flink repository"
>>> as Piotr mentioned. It seems there was just no further discussion.
>>> It's OK for me to add it to the core repository. After all, almost
>>> everything is in the core repository now. But if we decide to split the
>>> core repository someday, I would tend to create a separate repository
>>> for Stateful Functions. It might be a good time to take the first step
>>> of splitting.
>>>
>>> Thanks,
>>> Biao /'bɪ.aʊ/
>>>
>>>
>>>
>>> On Sat, 12 Oct 2019 at 19:31, Yu Li  wrote:
>>>
 Hi Stephan,

 Big +1 for adding stateful functions to Flink. I believe a lot of users
 would be interested in trying this out, and I can imagine how this could
 contribute to reducing the TCO for businesses requiring both stream
 processing and stateful functions.

 And my 2 cents is to put it into the flink core repository, since I can
 see a tight connection between this library and flink state.

 Best Regards,
 Yu


 On Sat, 12 Oct 2019 at 17:31, jincheng sun 
 wrote:

> Hi Stephan,
>
> Big +1 for adding this great feature to Apache Flink.
>
> Regarding where we should place it, should we put it into the Flink core
> repository or create a separate repository? I prefer to put it into the
> main repository, and I look forward to the more detailed discussion of
> this decision.
>
> Best,
> Jincheng
>
>
> Jingsong Li  wrote on Sat, Oct 12, 2019 at 11:32 AM:
>
>> Hi Stephan,
>>
>> Big +1 for this contribution. It provides another user interface that
>> is easy to use and popular at this time. These functions are hard for
>> users to write in SQL/Table API, while using DataStream is too complex
>> (we've done some StateFun-like jobs using DataStream before). With
>> StateFun, it is very easy.
>>
>> I think it's also a good opportunity to exercise Flink's core
>> capabilities. I looked at stateful-functions-flink briefly and it is
>> very interesting. I think there are many other things Flink can improve,
>> so I think it's better to put it into Flink, and the improvements to it
>> will come more naturally in the future.
>>
>> Best,
>> Jingsong Lee
>>
>> On Fri, Oct 11, 2019 at 7:33 PM Dawid Wysakowicz <
>> dwysakow...@apache.org> wrote:
>>
>>> Hi Stephan,
>>>
>>> I think this is a nice library, but what I like more about it is
>>> that it suggests exploring different use-cases. I think it definitely 
>>> makes
>>> sense for the Flink community to explore more lightweight applications
>>> that reuse resources. Therefore I definitely think it is a good idea
>>> for the Flink community to accept this contribution and help maintain
>>> it.
>>>
>>> Personally I'd prefer to have it in a separate repository. There
>>> were a few discussions before where different people were suggesting to
>>> extract connectors and other libraries to separate repositories. 
>>> Moreover I
>>> think it could serve as an example for the Flink ecosystem website[1]. 
>>> This
>>> could be the first project in there and give a good impression that the
>>> community sees potential in the ecosystem website.
>>>
>>> Lastly, I'm wondering if this should go through PMC vote according
>>> to our bylaws[2]. In the end the suggestion is to adopt an existing code
>>> base as is. It also proposes a new programming concept that could
>>> result in a shift of priorities for the community in the long run.
>>>
>>> Best,
>>>
>>> Dawid
>>>
>>> [1]
>>> http://apache-flink-mailing-li

Re: [VOTE] FLIP-74: Flink JobClient API

2019-10-14 Thread Zili Chen
Hi all,

+1 from my side.

Given the current state of this voting thread, FLIP-74 is accepted
with 3 binding votes and 2 non-binding votes. Thanks for your
participation!

I will update the wiki to reflect the result of the vote.

Best,
tison.


Zili Chen  wrote on Fri, Oct 11, 2019 at 8:48 PM:

> Well. Then I'd remove the requirement to change cancelWithSavepoint
> but remain why we exclude it from JobClient.
>
> We might still change signature to completable future for a consistent
> async view of ClusterClient but it is quite an implement detail and we
> don't stick to it on FLIP level.
>
> Best,
> tison.
>
>
> Kostas Kloudas  wrote on Fri, Oct 11, 2019 at 7:36 PM:
>
>> Hi Tison,
>>
>> Thanks for integrating the comments!
>>
>> +1 for accepting the FLIP from my side.
>> What I meant is that in the Proposed Changes section, the FLIP still
>> has that the cancelWithSavepoint(jobId, savepointDir) of the
>> clusterClient should change to return a CompletableFuture. I believe
>> that this change is redundant as we will not need it for the
>> JobClient. I should have been more clear on what I meant before.
>>
>> Cheers,
>> Kostas
>>
>> On Fri, Oct 11, 2019 at 11:51 AM Zili Chen  wrote:
>> >
>> > Hi Kostas,
>> >
>> > Thanks for your reply.
>> >
>> > (1) cancelWithSavepoint() has already been excluded from the FLIP. But
>> > to emphasize that we make the decision to exclude it I add it to reject
>> > alternatives.
>> >
>> > (2) Updated FLIP to reflect the consensus :-)
>> >
>> > Best,
>> > tison.
>> >
>> >
>> > > Kostas Kloudas  wrote on Fri, Oct 11, 2019 at 5:12 PM:
>> >
>> > > Hi all,
>> > >
>> > > I only have two minor comments before voting and they have to do with
>> > > the following:
>> > >
>> > > 1) In the discussion, we agreed to remove the cancelWithSavepoint()
>> > > from the JobClient as this is deprecated in the rest API. This is not
>> > > in the FLIP.
>> > > 2) The section "ClusterDescriptor or Executor(FLIP-73)(integration)"
>> > > does not reflect our discussion where we said that for now only the
>> > > Executor#execute() will give you the JobClient and there will be a
>> > > separate discussion about alternative ways of exposing the JobClient.
>> > >
>> > > I think that these points should be updated in order for the FLIP to
>> > > reflect the discussion in the ML thread.
>> > >
>> > > Cheers,
>> > > Kostas
>> > >
>> > > On Fri, Oct 11, 2019 at 10:58 AM Biao Liu  wrote:
>> > > >
>> > > > +1 (non-binding), glad to have this improvement!
>> > > >
>> > > > Thanks,
>> > > > Biao /'bɪ.aʊ/
>> > > >
>> > > >
>> > > >
>> > > > On Fri, 11 Oct 2019 at 14:44, Jeff Zhang  wrote:
>> > > >
>> > > > > +1, overall design make sense to me
>> > > > >
>> > > > > SHI Xiaogang  wrote on Fri, Oct 11, 2019 at 11:15 AM:
>> > > > >
>> > > > > > +1. The interface looks fine to me.
>> > > > > >
>> > > > > > Regards,
>> > > > > > Xiaogang
>> > > > > >
>> > > > > > Zili Chen  wrote on Wed, Oct 9, 2019 at 2:36 PM:
>> > > > > >
>> > > > > > > Given the ongoing FlinkForward Berlin event, I'm going to
>> > > > > > > extend this vote for a bit, until Oct. 11th (Friday).
>> > > > > > >
>> > > > > > > Best,
>> > > > > > > tison.
>> > > > > > >
>> > > > > > >
>> > > > > > > Zili Chen  wrote on Mon, Oct 7, 2019 at 4:15 PM:
>> > > > > > >
>> > > > > > > > Hi all,
>> > > > > > > >
>> > > > > > > > I would like to start the vote for FLIP-74[1], which is
>> > > discussed and
>> > > > > > > > reached a consensus in the discussion thread[2].
>> > > > > > > >
>> > > > > > > > The vote will be open until Oct. 9th (72h starting on
>> > > > > > > > Oct. 7th), unless there is an objection or not enough votes.
>> > > > > > > >
>> > > > > > > > Best,
>> > > > > > > > tison.
>> > > > > > > >
>> > > > > > > > [1]
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > >
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-74%3A+Flink+JobClient+API
>> > > > > > > > [2]
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > >
>> https://lists.apache.org/x/thread.html/b2e22a45aeb94a8d06b50c4de078f7b23d9ff08b8226918a1a903768@%3Cdev.flink.apache.org%3E
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > > >
>> > > > > --
>> > > > > Best Regards
>> > > > >
>> > > > > Jeff Zhang
>> > > > >
>> > >
>>
>


Re: [DISCUSS] FLIP policy for introducing config option keys

2019-10-15 Thread Zili Chen
Hi Aljoscha & Dawid & Kostas,

I agree that changes to config option keys deserve a FLIP, and it is
reasonable that we commit such changes through a standard FLIP process so
that the change is given proper visibility.

My concern is about naming. Take FLIP-73 as an example: if FLIPs
associated with FLIP-73 (which can actually be regarded as sub-FLIPs of
it) consume FLIP numbers and appear as FLIP-80, FLIP-85, FLIP-91 and so
on, then we possibly run into a state flooded by quite a few
config-option-only FLIPs. Maybe it makes sense to number these FLIPs as
FLIP-73.1, FLIP-73.2, which shows the association and doesn't pollute
other FLIPs.

To restate the general thought: IMO changes to config option keys deserve
a standard FLIP process, e.g. FLIP-61.

Best,
tison.


Kostas Kloudas  wrote on Tue, Oct 15, 2019 at 8:20 PM:

> Hi Aljoscha,
>
> Given that config option keys are user-facing and any change there is
> breaking, I think there should be a discussion about them, and a FLIP,
> where people actually have to vote, seems to be the right place.
> I understand that this is tedious (and actually I will also have to
> open some FLIPs as part of FLIP-73), but this contributes to the
> uniformity of our parameters and also gives them some more
> visibility.
>
> Cheers,
> Kostas
>
> On Tue, Oct 15, 2019 at 2:05 PM Aljoscha Krettek 
> wrote:
> >
> > Hi Everyone,
> >
> > The title says it all, do you think we need to cover all config options
> that we introduce/change by FLIPs? I was thinking about this because of the
> FLIP-73 work, which will introduce some new config options and also because
> I just spotted a PR [1] that introduces some config options.
> >
> > Best,
> > Aljoscha
> >
> > [1] https://github.com/apache/flink/pull/9836
>


Re: [DISCUSS] FLIP policy for introducing config option keys

2019-10-15 Thread Zili Chen
The naming concern above can be a separate issue, since it also affects
FLIP-54 and isn't limited to FLIPs about config option changes.

Best,
tison.


Aljoscha Krettek  wrote on Tue, Oct 15, 2019 at 8:37 PM:

> Another PR that introduces new config options:
> https://github.com/apache/flink/pull/9759
>
> > On 15. Oct 2019, at 14:31, Zili Chen  wrote:
> >
> > Hi Aljoscha & Dawid & Kostas,
> >
> > I agree that changes on config option keys deserve a FLIP and it is
> > reasonable
> > we commit the changes with a standard FLIP process so that ensure the
> change
> > given proper visibility.
> >
> > My concern is about naming. Given FLIP-73 as an example, if FLIPs
> > associated to
> > FLIP-73(actually can be regarded as sub-FLIP of it) grows FLIP numbers
> and
> > appears
> > like FLIP-80 FLIP-85 FLIP-91 and so on, then we possibly run into a state
> > flooded by
> > quite a few config option only FLIP. Maybe it makes sense to number these
> > FLIP as
> > FLIP-73.1 FLIP-73.2, which shows the association and doesn't pollute
> other
> > FLIPs.
> >
> > Remind the general thoughts, IMO changes on config option keys deserve a
> > standard
> > FLIP process, e.g. FLIP-61.
> >
> > Best,
> > tison.
> >
> >
> > Kostas Kloudas  wrote on Tue, Oct 15, 2019 at 8:20 PM:
> >
> >> Hi Aljoscha,
> >>
> >> Given that config option keys are user-facing and any change there is
> >> breaking, I think there should be a discussion about them and a FLIP,
> >> where people have to actually vote for it seems to be the right place.
> >> I understand that this is tedious (and actually I will also have to
> >> open some FLIPs as part of FLIP-73), but this contributes to the
> >> uniformity of our parameters and also giving them some more
> >> visibility.
> >>
> >> Cheers,
> >> Kostas
> >>
> >> On Tue, Oct 15, 2019 at 2:05 PM Aljoscha Krettek 
> >> wrote:
> >>>
> >>> Hi Everyone,
> >>>
> >>> The title says it all, do you think we need to cover all config options
> >> that we introduce/change by FLIPs? I was thinking about this because of
> the
> >> FLIP-73 work, which will introduce some new config options and also
> because
> >> I just spotted a PR [1] that introduces some config options.
> >>>
> >>> Best,
> >>> Aljoscha
> >>>
> >>> [1] https://github.com/apache/flink/pull/9836
> >>
>
>


Re: [DISCUSS] FLIP policy for introducing config option keys

2019-10-15 Thread Zili Chen
Hi Jark & Hequn,

Do you insist on introducing a looser FLIP process? We would possibly
introduce a redundant extra type of community consensus if we are able to
just reuse the current FLIP process. Given the activity of our community,
I don't think a config option key change costs too much with at least 3
days of voting and >3 committer votes required.

Best,
tison.


Hequn Cheng  wrote on Wed, Oct 16, 2019 at 2:29 PM:

> Hi all,
>
> +1 to have a looser FLIP policy for these API changes.
>
> I think the concerns raised above are all valid. Besides the feedback from
> @Jark, if we want to keep track of these changes, maybe we can create a new
> kind of FLIP that is dedicated to these minor API changes? For example, we
> can add a single wiki page and list all related JIRAs in it. The design
> details can be described in the JIRA.
> Another option is to simply add a new JIRA label to track these changes.
>
> What do you think?
>
> Best, Hequn
>
> On Tue, Oct 15, 2019 at 8:43 PM Zili Chen  wrote:
>
> > The naming concern above can be a separated issue since it looks also
> > affect FLIP-54 and isn't limited for config option changes FLIP.
> >
> > Best,
> > tison.
> >
> >
> > Aljoscha Krettek  wrote on Tue, Oct 15, 2019 at 8:37 PM:
> >
> > > Another PR that introduces new config options:
> > > https://github.com/apache/flink/pull/9759
> > >
> > > > On 15. Oct 2019, at 14:31, Zili Chen  wrote:
> > > >
> > > > Hi Aljoscha & Dawid & Kostas,
> > > >
> > > > I agree that changes on config option keys deserve a FLIP and it is
> > > > reasonable
> > > > we commit the changes with a standard FLIP process so that ensure the
> > > change
> > > > given proper visibility.
> > > >
> > > > My concern is about naming. Given FLIP-73 as an example, if FLIPs
> > > > associated to
> > > > FLIP-73(actually can be regarded as sub-FLIP of it) grows FLIP
> numbers
> > > and
> > > > appears
> > > > like FLIP-80 FLIP-85 FLIP-91 and so on, then we possibly run into a
> > state
> > > > flooded by
> > > > quite a few config option only FLIP. Maybe it makes sense to number
> > these
> > > > FLIP as
> > > > FLIP-73.1 FLIP-73.2, which shows the association and doesn't pollute
> > > other
> > > > FLIPs.
> > > >
> > > > Remind the general thoughts, IMO changes on config option keys
> deserve
> > a
> > > > standard
> > > > FLIP process, e.g. FLIP-61.
> > > >
> > > > Best,
> > > > tison.
> > > >
> > > >
> > > > Kostas Kloudas  wrote on Tue, Oct 15, 2019 at 8:20 PM:
> > > >
> > > >> Hi Aljoscha,
> > > >>
> > > >> Given that config option keys are user-facing and any change there
> is
> > > >> breaking, I think there should be a discussion about them and a
> FLIP,
> > > >> where people have to actually vote for it seems to be the right
> place.
> > > >> I understand that this is tedious (and actually I will also have to
> > > >> open some FLIPs as part of FLIP-73), but this contributes to the
> > > >> uniformity of our parameters and also giving them some more
> > > >> visibility.
> > > >>
> > > >> Cheers,
> > > >> Kostas
> > > >>
> > > >> On Tue, Oct 15, 2019 at 2:05 PM Aljoscha Krettek <
> aljos...@apache.org
> > >
> > > >> wrote:
> > > >>>
> > > >>> Hi Everyone,
> > > >>>
> > > >>> The title says it all, do you think we need to cover all config
> > options
> > > >> that we introduce/change by FLIPs? I was thinking about this because
> > of
> > > the
> > > >> FLIP-73 work, which will introduce some new config options and also
> > > because
> > > >> I just spotted a PR [1] that introduces some config options.
> > > >>>
> > > >>> Best,
> > > >>> Aljoscha
> > > >>>
> > > >>> [1] https://github.com/apache/flink/pull/9836
> > > >>
> > >
> > >
> >
>


Re: [VOTE] FLIP-77: Introduce ConfigOptions with Data Types

2019-10-16 Thread Zili Chen
+1

Best,
tison.


Aljoscha Krettek  wrote on Wed, Oct 16, 2019 at 6:25 PM:

> +1
>
> > On 16. Oct 2019, at 10:30, Timo Walther  wrote:
> >
> > +1
> >
> > Thanks,
> > Timo
> >
> > On 15.10.19 17:07, Till Rohrmann wrote:
> >> Sorry for the confusion. I should have checked with an external mail
> >> client. Thanks a lot for the clarification.
> >>
> >> Cheers,
> >> Till
> >>
> >> On Tue, Oct 15, 2019 at 2:07 PM Jark Wu  wrote:
> >>
> >>> +1
> >>>
> >>> It is a separate [VOTE] thread in my Mail client.
> >>>
> >>> Best,
> >>> Jark
> >>>
>  On Oct 15, 2019, at 18:58, Dawid Wysakowicz  wrote:
> 
>  Hi Till,
> 
>  It should and it actually is.
> 
>  I think it's some feature of gmail that it groups conversations with a
>  similar topic into a conversation. (I also see it this way in gmail,
> but
>  I see it as separate threads in my mail client)
> 
>  You can see that it is a separate thread e.g. in the ML archives:
> 
>  This thread:
> 
> >>>
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-FLIP-77-Introduce-ConfigOptions-with-Data-Types-td33999.html
>  Discuss thread:
> 
> >>>
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-77-Introduce-ConfigOptions-with-Data-Types-td33902.html
>  Best,
> 
>  Dawid
> 
>  On 15/10/2019 12:00, Till Rohrmann wrote:
> > Shouldn't the voting happen in a distinct [VOTE] thread?
> >
> > Cheers,
> > Till
> >
> > On Tue, Oct 15, 2019 at 10:46 AM Kurt Young 
> wrote:
> >
> >> +1
> >>
> >> Best,
> >> Kurt
> >>
> >>
> >> On Tue, Oct 15, 2019 at 9:30 AM Dawid Wysakowicz <
> >>> dwysakow...@apache.org>
> >> wrote:
> >>
> >>> Hi everyone,
> >>> I would like to start a vote on FLIP-77.
> >>>
> >>> Please vote for the following design document:
> >>>
> >>>
> >>>
> >>>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-77%3A+Introduce+ConfigOptions+with+Data+Types
> >>> The discussions can be found at:
> >>>
> >>>
> >>>
> >>>
> https://lists.apache.org/thread.html/a56c6b52e5f828d4a737602b031e36b5dd6eaa97557306696a8063a9@%3Cdev.flink.apache.org%3E
> >>>
> https://lists.apache.org/thread.html/6f98eae7879764aa3a3991311d700a5ccdb713dbe345e6ecf514b2d7@%3Cdev.flink.apache.org%3E
> >>> This voting will be open for at least 72 hours. I'll try to close
> it
> >>> on
> >>> 2019-10-18 8:00 UTC, unless there is an objection or not enough
> votes.
> >>>
> >>> Best,
> >>>
> >>> Dawid
> >>>
> >>>
> >>>
> >>>
> >
>
>


[DISCUSS] What do we gain by supporting customized High-Availability services

2019-10-17 Thread Zili Chen
Hi devs,

Recently the community excluded customization support for new restart
strategies[1], which reminded me to think about which kinds of customized
support a framework like Flink should provide.

The key idea is that pluggable is not customizable.

We might maintain a series of implementations of restart strategies, as
well as of high-availability services, in our codebase. But that set has a
fixed size, which is definitely different from supporting arbitrary
customization.

A service like the high-availability services relies underneath on quite a
lot of runtime implementation details. For example, JobGraphStore
originally supports #releaseJobGraph due to the ZK lock strategy;
getJobManagerRetriever requires a default address because
StandaloneHighAvailabilityServices is non-HA and pre-configured.

These kinds of interfaces, however, possibly evolve together with Flink
runtime implementation details such as cluster management and
coordination. If we support customizing them, it means such internal
high-availability services become public interfaces. If we keep them
pluggable, we can extend them in reaction to runtime evolution, ensuring
the implementations stay in a fixed set, while still introducing new
implementations (such as etcd[2] or MapDB[3]) if they are a good fit.

We don't have customization support for the ResourceManager, although it
is pluggable so that others can implement a Kubernetes resource
manager[4]. Maybe this is a better way to handle high-availability
services: pluggable, but not customizable.

Looking forward to your ideas. To be clear, I'm not trying to drop
anything now, but I'm a bit confused about this topic and would like to
turn to the wisdom of our community.

Best,
tison.

[1]
https://lists.apache.org/x/thread.html/6ed95eb6a91168dba09901e158bc1b6f4b08f1e176db4641f79de765@%3Cdev.flink.apache.org%3E
[2] https://issues.apache.org/jira/browse/FLINK-11105
[3]
https://lists.apache.org/x/thread.html/eae4cbdf6dac466bc0247e3bc1a7a69fe7e1db7a512fcd607e9c081b@%3Cuser.flink.apache.org%3E
[4] https://github.com/tianchen92/flink/tree/k8s-master/flink-kubernete
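For concreteness, this is roughly the pluggable entry point that exists
today (my recollection of the current interfaces; package names and
signatures are approximate, and EtcdHaServicesFactory/EtcdHaServices are
hypothetical examples):

import java.util.concurrent.Executor;

import org.apache.flink.configuration.Configuration;
import org.apache.flink.runtime.highavailability.HighAvailabilityServices;
import org.apache.flink.runtime.highavailability.HighAvailabilityServicesFactory;

// Selected at runtime via e.g.: high-availability: org.example.EtcdHaServicesFactory
public class EtcdHaServicesFactory implements HighAvailabilityServicesFactory {

    @Override
    public HighAvailabilityServices createHAServices(
            Configuration configuration, Executor executor) throws Exception {
        // EtcdHaServices is hypothetical: it would wire leader election,
        // checkpoint recovery and job graph storage against etcd.
        return new EtcdHaServices(configuration, executor);
    }
}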


Re: [DISCUSS] What do we gain by supporting customized High-Availability services

2019-10-17 Thread Zili Chen
A challenge is how we ensure the support for customized implementations.
When we introduced JobGraphStore#releaseJobGraph we actually changed quite
a bit of the codepath in the Dispatcher. While we are unable to test
arbitrarily customized implementations, our compatibility promise is
actually no more than compilation compatibility.

Customers would still be required to be familiar with implementation
details to figure out the fitment when they bump the Flink version. This
effort is required just the same, and no extra, when we support a
pluggable strategy. In other words, customization support tends to hide
this challenge when customers want to use their own implementations.
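For reference, the hook in question looks roughly like this (signature
from memory, approximate; check SubmittedJobGraphStore for the
authoritative definition):

// Lets a ZooKeeper-based store release the lock node it holds for a job
// graph once this Dispatcher is no longer responsible for the job.
void releaseJobGraph(JobID jobId) throws Exception;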


Re: [NOTICE] Binary licensing is now auto-generated

2019-10-17 Thread Zili Chen
Thanks for this improvement Chesnay ;-)

Best,
tison.


Chesnay Schepler  wrote on Thu, Oct 17, 2019 at 9:37 PM:

> Hello,
>
> I just merged FLINK-14008 to 1.8, 1.9 and 1.10, which means that from
> now on the tricky part of the binary licensing (NOTICE-binary,
> licenses-binary) is automatically generated during the release process.
>
> As such these files have been removed from the root directory of the
> project (thus, you don't have to update these things anymore ;)).
>
> This also means that only builds of flink-dist that were built as part
> of the release process will have these files attached.
>
> I have updated the Licensing guide
>  accordingly.
>
>


Re: [DISCUSS] What do we gain by supporting customized High-Availability services

2019-10-17 Thread Zili Chen
Another perspective is that a stable, carefully-designed interface with
clear semantics would be safer to customize.

Following the discussion in FLINK-10333, our JobGraphStore is actually
required to perform write operations only while holding leadership, which
is a basic requirement for coordination rather than an implementation
detail. Thus it depends on the LeaderElectionService (in the design, we
narrow the specific interface down to a LeaderStore).
HighAvailabilityServices#getJobGraphStore() implies an implicit
dependency, which makes it hard to express the relationship between them.

If the interface is unstable (also, we would introduce a ClientHAService
to separate concerns, and would have to keep backwards compatibility for
customized implementations), we'd better keep it internal so it can evolve
freely. And when we try to support customization, it would be helpful to
start a proposal to revisit the interface so that it is well-designed and
stable, see ref [2].

In short, the current high-availability services, as well as
runtime/coordination, are still under development and active evolution. It
is possibly not a good time to make them public and customizable.

Best,
tison.

[1] https://issues.apache.org/jira/browse/FLINK-10333
[2]
https://github.com/apache/pulsar/wiki/PIP-45%3A-Pluggable-metadata-interface
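To make the leadership requirement concrete, the narrowed interface could
look roughly like this (purely illustrative; LeaderStore is a name from
the FLINK-10333 design discussion, not an existing Flink interface):

import java.util.Optional;
import java.util.concurrent.CompletableFuture;

// Every write is fenced by leadership, so a node that lost leadership
// cannot corrupt the store with a late write.
interface LeaderStore {

    // Fails if leadership has been lost in the meantime.
    CompletableFuture<Void> put(String path, byte[] data);

    CompletableFuture<Optional<byte[]>> get(String path);
}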


Zili Chen  wrote on Thu, Oct 17, 2019 at 8:13 PM:

> A challenge is how we ensure the support for customized implementation.
> When
> we introduce JobGraphStore#releaseJobGraph we actually change quite a bit
> codepath
> in Dispatcher. While we are unable to test arbitrarily customized
> implementation our
> compatibility promise is actually no more than compilation compatible.
>
> Customer should still be required to be familiar with implementation
> details to figure
> out the fitment when they bump Flink version. This effort requires also
> and no extra
> when we support pluggable strategy. In another word, a customized support
> tends
> to hide the challenge when customer want to use their own implementation.
>


Re: [ANNOUNCE] Apache Flink 1.9.1 released

2019-10-19 Thread Zili Chen
Thanks a lot for being release manager Jark. Great work!

Best,
tison.


Till Rohrmann  wrote on Sat, Oct 19, 2019 at 10:15 PM:

> Thanks a lot for being our release manager Jark and thanks to everyone who
> has helped to make this release possible.
>
> Cheers,
> Till
>
> On Sat, Oct 19, 2019 at 3:26 PM Jark Wu  wrote:
>
>>  Hi,
>>
>> The Apache Flink community is very happy to announce the release of
>> Apache Flink 1.9.1, which is the first bugfix release for the Apache Flink
>> 1.9 series.
>>
>> Apache Flink® is an open-source stream processing framework for
>> distributed, high-performing, always-available, and accurate data streaming
>> applications.
>>
>> The release is available for download at:
>> https://flink.apache.org/downloads.html
>>
>> Please check out the release blog post for an overview of the
>> improvements for this bugfix release:
>> https://flink.apache.org/news/2019/10/18/release-1.9.1.html
>>
>> The full release notes are available in Jira:
>> https://issues.apache.org/jira/projects/FLINK/versions/12346003
>>
>> We would like to thank all contributors of the Apache Flink community who
>> helped to verify this release and made this release possible!
>> Great thanks to @Jincheng for helping finalize this release.
>>
>> Regards,
>> Jark Wu
>>
>


Re: [VOTE] Accept Stateful Functions into Apache Flink

2019-10-21 Thread Zili Chen
+1 (non-binding)

Best,
tison.


Kostas Kloudas  wrote on Mon, Oct 21, 2019 at 11:49 PM:

> +1 (binding)
>
> On Mon, Oct 21, 2019 at 5:18 PM Aljoscha Krettek 
> wrote:
> >
> > +1 (binding)
> >
> > Aljoscha
> >
> > > On 21. Oct 2019, at 16:18, Thomas Weise  wrote:
> > >
> > > +1 (binding)
> > >
> > >
> > > On Mon, Oct 21, 2019 at 7:10 AM Timo Walther 
> wrote:
> > >
> > >> +1 (binding)
> > >>
> > >> Thanks,
> > >> Timo
> > >>
> > >>
> > >> On 21.10.19 15:59, Till Rohrmann wrote:
> > >>> +1 (binding)
> > >>>
> > >>> Cheers,
> > >>> Till
> > >>>
> > >>> On Mon, Oct 21, 2019 at 12:13 PM Robert Metzger  >
> > >> wrote:
> > >>>
> >  +1 (binding)
> > 
> >  On Mon, Oct 21, 2019 at 12:06 PM Stephan Ewen 
> wrote:
> > 
> > > This is the official vote whether to accept the Stateful Functions
> code
> > > contribution to Apache Flink.
> > >
> > > The current Stateful Functions code, documentation, and website
> can be
> > > found here:
> > > https://statefun.io/
> > > https://github.com/ververica/stateful-functions
> > >
> > > This vote should capture whether the Apache Flink community is
> > >> interested
> > > in accepting, maintaining, and evolving Stateful Functions.
> > >
> > > Reiterating my original motivation, I believe that this project is
> a
> >  great
> > > match for Apache Flink, because it helps Flink to grow the
> community
> >  into a
> > > new set of use cases. We see current users interested in such use
> > >> cases,
> > > but they are not well supported by Flink as it currently is.
> > >
> > > I also personally commit to put time into making sure this
> integrates
> >  well
> > > with Flink and that we grow contributors and committers to maintain
> > >> this
> > > new component well.
> > >
> > > This is an "Adoption of a new Codebase" vote as per the Flink bylaws
> > >> [1].
> > > Only PMC votes are binding. The vote will be open at least 6 days
> > > (excluding weekends), meaning until Tuesday Oct.29th 12:00 UTC, or
> > >> until
> >  we
> > > achieve the 2/3rd majority.
> > >
> > > Happy voting!
> > >
> > > Best,
> > > Stephan
> > >
> > > [1]
> > >
> > 
> > >>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=120731026
> > >>
> > >>
> > >>
> >
>


Re: [VOTE] FLIP-81: Executor-related new ConfigOptions

2019-10-23 Thread Zili Chen
Thanks for starting this voting process. I have looked at the FLIP and the
discussion thread. The options added make sense to me.
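For example, as I read the FLIP, selecting an executor programmatically
would boil down to setting the new keys (key names approximate; the FLIP
page is authoritative):

import org.apache.flink.configuration.Configuration;

public class Flip81Example {
    public static void main(String[] args) {
        Configuration configuration = new Configuration();
        // Key names as I read them in the FLIP; treat as approximate.
        configuration.setString("execution.target", "yarn-per-job"); // which executor to use
        configuration.setBoolean("execution.attached", false);       // attached vs. detached
    }
}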

+1 from my side.

Best,
tison.


Kostas Kloudas  wrote on Wed, Oct 23, 2019 at 4:12 PM:

> Hi all,
>
> This is the voting thread for FLIP-81, as the title says.
>
> The FLIP can be found in [1] and the discussion thread in [2].
>
> As per the bylaws, the vote will stay open until Friday 26-10 (3 days)
> or until it gets 3 votes.
>
> Thank you,
> Kostas
>
> [1]
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=133631524
> [2]
> https://lists.apache.org/thread.html/a4dd8e0c7b79350c59f5afefc1bc583dac99abcf94caaa8c22017974@%3Cdev.flink.apache.org%3E
>


Re: [DISCUSS] What do we gain by supporting customized High-Availability services

2019-10-23 Thread Zili Chen
Thanks for your clarification Till.

Regarding the terminology pluggable vs. customizable, I am mainly
concerned about the interface audience issue. Pluggable means we have
multiple high-availability implementations but they are closed to
extension in user scope; customizable means the high-availability
interfaces are stable and user-facing.

I agree that we should try to keep backward compatibility if possible. I
started this thread because I was confused about how to progress
FLINK-10333. Specifically, how to deal with the compatibility concerns.

Given that it is an internal interface, I think it is reasonable that we
evolve it in a minor version bump to support leader-store-based
high-availability storage. I'm going to start a discussion thread next
week to gather wider feedback beyond the original (maybe limited) JIRA
audience, which also calls for review among community members. What do you
think?

Best,
tison.


Till Rohrmann  wrote on Mon, Oct 21, 2019 at 5:21 PM:

> Hi Tison,
>
> I'm not sure whether I fully understand your distinction between
> customizable and pluggable. Maybe you could clarify your ideas a bit
> because you seem to favour support for pluggable implementations.
>
> Maybe let me try to answer some other questions you raised. With the
> HighAvailabilityServices interface and the functionality to load custom
> implementations specified via `high-availability: FQDN` it is indeed
> possible to provide a custom implementation of the HA services. This is,
> however, pretty much a power user feature where users have to implement
> against an internal API. As such, it can be subject to change and we don't
> give guarantees that these interfaces won't change. Of course, if possible,
> we should extend/change it in a way that we guarantee backwards
> compatibility.
>
> You are right that at the moment this interface is not stable enough for
> being public API. Once this changes and we are happy with it, then we can
> think about making it a public API and documenting it properly. Afaik,
> there is no documentation how to implement your own HA services at the
> moment. This underlines as well that this interface is an internal API.
>
> Cheers,
> Till
>
> On Fri, Oct 18, 2019 at 5:56 AM Zili Chen  wrote:
>
> > Another perspective is that a stable, carefully-designed interface with
> > clear semantic
> > could be safer to customize.
> >
> > Following the discussion in FLINK-10333 our JobGraphStore is actually
> > required
> > performing write operation only with leadership,  which is a basic
> > requirement
> > for coordination rather than an implementation detail.
> > Thus it depends on LeaderElectionService(in the design, we narrow the
> > specific interface
> > as LeaderStore). HighAvailabilityServices#getJobGraphStore() infers a
> > implicit field for
> > that which is hard to express the relationship between them.
> >
> > If the interface is unstable(also we introduce a ClientHAService for
> > separate concern and
> > have to keep b/w comp. for customized), we'd better keep it internal for
> > freely evolution.
> > And when we try to support customized, it would be helpful to start a
> > proposal to revisit
> > the interface to be well-designed and stable. ref[2].
> >
> > In short, current high-availability services as well as
> > runtime/coordination is still under
> > development and active evolution. It is possibly not a good time for make
> > it public and
> > customizable.
> >
> > Best,
> > tison.
> >
> > [1] https://issues.apache.org/jira/browse/FLINK-10333
> > [2]
> >
> >
> https://github.com/apache/pulsar/wiki/PIP-45%3A-Pluggable-metadata-interface
> >
> >
> > Zili Chen  wrote on Thu, Oct 17, 2019 at 8:13 PM:
> >
> > > A challenge is how we ensure the support for customized implementation.
> > > When
> > > we introduce JobGraphStore#releaseJobGraph we actually change quite a
> bit
> > > codepath
> > > in Dispatcher. While we are unable to test arbitrarily customized
> > > implementation our
> > > compatibility promise is actually no more than compilation compatible.
> > >
> > > Customer should still be required to be familiar with implementation
> > > details to figure
> > > out the fitment when they bump Flink version. This effort requires also
> > > and no extra
> > > when we support pluggable strategy. In another word, a customized
> support
> > > tends
> > > to hide the challenge when customer want to use their own
> implementation.
> > >
> >
>


Re: [ANNOUNCE] Becket Qin joins the Flink PMC

2019-10-28 Thread Zili Chen
Congratulations Becket!

Best,
tison.


Congxian Qiu  wrote on Tue, Oct 29, 2019 at 9:53 AM:

> Congratulations Becket!
>
> Best,
> Congxian
>
>
> Wei Zhong  wrote on Tue, Oct 29, 2019 at 9:42 AM:
>
> > Congratulations Becket!
> >
> > Best,
> > Wei
> >
> > > On Oct 29, 2019, at 09:36, Paul Lam  wrote:
> > >
> > > Congrats Becket!
> > >
> > > Best,
> > > Paul Lam
> > >
> > >> On Oct 29, 2019, at 02:18, Xingcan Cui  wrote:
> > >>
> > >> Congratulations, Becket!
> > >>
> > >> Best,
> > >> Xingcan
> > >>
> > >>> On Oct 28, 2019, at 1:23 PM, Xuefu Z  wrote:
> > >>>
> > >>> Congratulations, Becket!
> > >>>
> > >>> On Mon, Oct 28, 2019 at 10:08 AM Zhu Zhu  wrote:
> > >>>
> >  Congratulations Becket!
> > 
> >  Thanks,
> >  Zhu Zhu
> > 
> >  Peter Huang  wrote on Tue, Oct 29, 2019 at 1:01 AM:
> > 
> > > Congratulations Becket Qin!
> > >
> > >
> > > Best Regards
> > > Peter Huang
> > >
> > > On Mon, Oct 28, 2019 at 9:19 AM Rong Rong 
> > wrote:
> > >
> > >> Congratulations Becket!!
> > >>
> > >> --
> > >> Rong
> > >>
> > >> On Mon, Oct 28, 2019, 7:53 AM Jark Wu  wrote:
> > >>
> > >>> Congratulations Becket!
> > >>>
> > >>> Best,
> > >>> Jark
> > >>>
> > >>> On Mon, 28 Oct 2019 at 20:26, Benchao Li 
> >  wrote:
> > >>>
> >  Congratulations Becket.
> > 
> >  Dian Fu  wrote on Mon, Oct 28, 2019 at 7:22 PM:
> > 
> > > Congrats, Becket.
> > >
> > >> On Oct 28, 2019, at 6:07 PM, Fabian Hueske  wrote:
> > >>
> > >> Hi everyone,
> > >>
> > >> I'm happy to announce that Becket Qin has joined the Flink
> PMC.
> > >> Let's congratulate and welcome Becket as a new member of the
> > > Flink
> > >>> PMC!
> > >>
> > >> Cheers,
> > >> Fabian
> > >
> > >
> > 
> >  --
> > 
> >  Benchao Li
> >  School of Electronics Engineering and Computer Science, Peking
> > >> University
> >  Tel:+86-15650713730
> >  Email: libenc...@gmail.com; libenc...@pku.edu.cn
> > 
> > >>>
> > >>
> > >
> > 
> > >>>
> > >>>
> > >>> --
> > >>> Xuefu Zhang
> > >>>
> > >>> "In Honey We Trust!"
> > >>
> > >
> >
> >
>

