Hi Becket,

Thanks for your suggestion. We will update the FLIP to add/enrich the
following parts.
* CLI option change: use "-R/--remote" to enable the cluster deploy mode (a
rough sketch is below)
* Configuration change: how to specify remote user jars and dependencies
* The whole story of how "application mode" works: upload -> fetch -> submit
job
* The cluster lifecycle: when and how the Flink cluster is destroyed
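
To make the first item concrete, here is a minimal, illustrative sketch of how
such a switch could be wired up with Apache Commons CLI (the option-parsing
library CliFrontend already uses). The option name comes from this thread;
everything else here is an assumption for illustration, not the FLIP's final
design:

    import org.apache.commons.cli.CommandLine;
    import org.apache.commons.cli.DefaultParser;
    import org.apache.commons.cli.Option;
    import org.apache.commons.cli.Options;

    public class RemoteDeployOptionSketch {

        // Hypothetical flag mirroring the "-R/--remote" option discussed here.
        static final Option REMOTE_DEPLOY = Option.builder("R")
                .longOpt("remote")
                .hasArg(false)
                .desc("Run the user main() on the cluster instead of the client.")
                .build();

        public static void main(String[] args) throws Exception {
            // Assumed shape of an invocation, for illustration only:
            //   flink run -R hdfs:///jars/my-job.jar --input hdfs:///data/in
            Options options = new Options().addOption(REMOTE_DEPLOY);
            CommandLine cmd = new DefaultParser().parse(options, args, true);
            System.out.println(
                    "deploy user main on cluster: " + cmd.hasOption(REMOTE_DEPLOY.getOpt()));
        }
    }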


Best,
Yang

Becket Qin <becket....@gmail.com> 于2020年3月9日周一 下午12:34写道:

> Thanks for the reply, tison and Yang,
>
> Regarding the public interface, is the "-R/--remote" option the only change?
> Will the users also need to provide a remote location to upload and store
> the jars, and a list of jars as dependencies to be uploaded?
>
> It would be important that the public interface section in the FLIP
> includes all the user-visible changes, including the CLI / configuration /
> metrics, etc. Can we update the FLIP to include the conclusions we have
> reached here in the ML?
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Mon, Mar 9, 2020 at 11:59 AM Yang Wang <danrtsey...@gmail.com> wrote:
>
>> Hi Becket,
>>
>> Thanks for jumping in and sharing your concerns. I second tison's answers
>> and will just add a few points.
>>
>>
>> > job submission interface
>>
>> This FLIP will introduce an interface for running the user `main()` on the
>> cluster, named `ProgramDeployer`. However, it is not a public interface. It
>> will be used in `CliFrontend` when the remote deploy option
>> (-R/--remote-deploy) is specified. So the only user-facing change is the
>> new CLI option. (A rough sketch of such an internal interface follows.)
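>>
>> Just to illustrate, a minimal sketch of what such an internal interface
>> could look like. The name `ProgramDeployer` comes from this discussion; the
>> method shape and types below are assumptions for illustration, not the
>> final design:
>>
>>     import java.util.concurrent.CompletableFuture;
>>     import org.apache.flink.configuration.Configuration;
>>
>>     /**
>>      * Internal (non-public) interface used by CliFrontend when the remote
>>      * deploy option is set: deploys a cluster that runs the user main()
>>      * on the cluster side.
>>      */
>>     interface ProgramDeployer {
>>         /**
>>          * @param configuration cluster and application configuration
>>          *        (user jar location, main class, program arguments, ...)
>>          * @return completes once the application cluster has been deployed
>>          */
>>         CompletableFuture<Void> deploy(Configuration configuration);
>>     }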
>>
>>
>> > How to fetch the jars?
>>
>> Both "local path" and "DFS path" could be supported for fetching the user
>> jars and dependencies. As tison said, we could ship the user jar and
>> dependencies from the client side to HDFS and let the entrypoint fetch them.
>>
>> Also, we have some other practical ways to use the new "application mode":
>> 1. Upload the user jars and dependencies to a DFS (e.g. HDFS, S3, Aliyun
>> OSS) manually or via some external deployer system. For K8s, the user jars
>> and dependencies could also be built into the Docker image.
>> 2. Specify the remote/local user jar and dependencies in `flink run`.
>> Usually this could also be done by the external deployer system.
>> 3. When the `ClusterEntrypoint` is launched, it will fetch the jars and
>> files automatically. We do not need any specific fetcher implementation,
>> since we can leverage Flink's `FileSystem` to do this (a sketch follows).
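>>
>> For illustration, a minimal sketch of fetching a remote jar with Flink's
>> `FileSystem` abstraction (the paths and target directory are made up for
>> the example; error handling is omitted):
>>
>>     import java.io.InputStream;
>>     import java.io.OutputStream;
>>     import java.nio.file.Files;
>>     import org.apache.flink.core.fs.FileSystem;
>>     import org.apache.flink.core.fs.Path;
>>
>>     public class JarFetchSketch {
>>         public static void main(String[] args) throws Exception {
>>             // Remote location of the user jar; any scheme with a FileSystem
>>             // implementation (hdfs://, s3://, oss://, ...) works the same way.
>>             Path remoteJar = new Path("hdfs:///flink/usrlib/my-job.jar");
>>             java.nio.file.Path localJar =
>>                     java.nio.file.Paths.get("/tmp/usrlib/my-job.jar");
>>             Files.createDirectories(localJar.getParent());
>>
>>             FileSystem fs = remoteJar.getFileSystem();
>>             try (InputStream in = fs.open(remoteJar);
>>                  OutputStream out = Files.newOutputStream(localJar)) {
>>                 byte[] buffer = new byte[8192];
>>                 int read;
>>                 while ((read = in.read(buffer)) != -1) {
>>                     out.write(buffer, 0, read);
>>                 }
>>             }
>>         }
>>     }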
>>
>>
>>
>>
>>
>> Best,
>> Yang
>>
>> tison <wander4...@gmail.com> 于2020年3月9日周一 上午11:34写道:
>>
>>> Hi Becket,
>>>
>>> Thanks for your attention to FLIP-85! I have answered your questions inline.
>>>
>>> 1. What exactly will the job submission interface look like after this
>>> FLIP? The FLIP template has a Public Interface section, but it was removed
>>> from this FLIP.
>>>
>>> As Yang mentioned in this thread above:
>>>
>>> From the user perspective, only a `-R/--remote-deploy` CLI option is
>>> visible. Users are not aware of the application mode.
>>>
>>> 2. How will the new ClusterEntrypoint fetch the jars from external
>>> storage? What external storage will be supported out of the box? Will this
>>> "jar fetcher" be pluggable? If so, how does the API look like and how will
>>> users specify the custom "jar fetcher"?
>>>
>>> It actually depends. Here are several points:
>>>
>>> i. Currently, shipping user files is handled by Flink, so dependencies
>>> fetching can also be handled by Flink.
>>> ii. Currently, we only support shipping files from the local file system.
>>> In Application Mode, to support meaningful jar fetching we should first
>>> support users configuring richer ship-file schemes.
>>> iii. Dependencies fetching varies across deployments. On YARN, the
>>> convention is to go through HDFS; on Kubernetes, the convention is a
>>> configured resource server, with fetching done by an initContainer.
>>>
>>> Thus, in the first phase of Application Mode, dependencies fetching is
>>> handled entirely within Flink.
>>>
>>> 3. It sounds like in this FLIP, the "session cluster" running the
>>> application has the same lifecycle as the user application. How will the
>>> session cluster be torn down after the application finishes? Will the
>>> ClusterEntrypoint do that? Will there be an option of not tearing the
>>> cluster down?
>>>
>>> The precondition for tearing down the cluster is that *both*
>>>
>>> i. the user main has reached its end, and
>>> ii. all submitted jobs (currently, at most one) have reached a globally
>>> terminal state.
>>>
>>> As for the "how", it is an implementation topic, but conceptually it is
>>> the ClusterEntrypoint's responsibility (see the sketch below).
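>>>
>>> As a purely illustrative sketch of that precondition (the helper class and
>>> method names are made up; `JobStatus` is Flink's existing job status enum):
>>>
>>>     import java.util.Collection;
>>>     import org.apache.flink.api.common.JobStatus;
>>>
>>>     /** Illustrative only: when an application cluster may tear itself down. */
>>>     final class ShutdownCondition {
>>>
>>>         /**
>>>          * @param userMainFinished   the user main() has returned (or thrown)
>>>          * @param submittedJobStates status of every job submitted by the main()
>>>          */
>>>         static boolean canTearDownCluster(
>>>                 boolean userMainFinished, Collection<JobStatus> submittedJobStates) {
>>>             boolean allJobsDone = submittedJobStates.stream()
>>>                     .allMatch(JobStatus::isGloballyTerminalState);
>>>             return userMainFinished && allJobsDone;
>>>         }
>>>     }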
>>>
>>> >Will there be an option of not tearing the cluster down?
>>>
>>> I think the answer is "No", because the cluster is designed to be bound to
>>> an Application. User logic that communicates with the job always lives in
>>> its `main()`, and for historical information we have the history server.
>>>
>>> Best,
>>> tison.
>>>
>>>
>>> Becket Qin <becket....@gmail.com> 于2020年3月9日周一 上午8:12写道:
>>>
>>>> Hi Peter and Kostas,
>>>>
>>>> Thanks for creating this FLIP. Moving the JobGraph compilation to the
>>>> cluster makes a lot of sense to me. FLIP-40 had exactly the same idea, but
>>>> it is currently dormant and can probably be superseded by this FLIP. After
>>>> reading the FLIP, I still have a few questions.
>>>>
>>>> 1. What exactly will the job submission interface look like after this
>>>> FLIP? The FLIP template has a Public Interface section, but it was removed
>>>> from this FLIP.
>>>> 2. How will the new ClusterEntrypoint fetch the jars from external
>>>> storage? What external storage will be supported out of the box? Will this
>>>> "jar fetcher" be pluggable? If so, how does the API look like and how will
>>>> users specify the custom "jar fetcher"?
>>>> 3. It sounds like in this FLIP, the "session cluster" running the
>>>> application has the same lifecycle as the user application. How will the
>>>> session cluster be torn down after the application finishes? Will the
>>>> ClusterEntrypoint do that? Will there be an option of not tearing the
>>>> cluster down?
>>>>
>>>> Maybe they have been discussed in the ML earlier, but I think they
>>>> should be part of the FLIP also.
>>>>
>>>> Thanks,
>>>>
>>>> Jiangjie (Becket) Qin
>>>>
>>>> On Thu, Mar 5, 2020 at 10:09 PM Kostas Kloudas <kklou...@gmail.com>
>>>> wrote:
>>>>
>>>>> Also from my side +1 to start voting.
>>>>>
>>>>> Cheers,
>>>>> Kostas
>>>>>
>>>>> On Thu, Mar 5, 2020 at 7:45 AM tison <wander4...@gmail.com> wrote:
>>>>> >
>>>>> > +1 to start voting.
>>>>> >
>>>>> > Best,
>>>>> > tison.
>>>>> >
>>>>> >
>>>>> > Yang Wang <danrtsey...@gmail.com> 于2020年3月5日周四 下午2:29写道:
>>>>> >>
>>>>> >> Hi Peter,
>>>>> >> Thanks a lot for your response.
>>>>> >>
>>>>> >> Hi all @Kostas Kloudas @Zili Chen @Peter Huang @Rong Rong
>>>>> >> It seems that we have reached an agreement. The "application mode" is
>>>>> >> regarded as the enhanced "per-job". It is orthogonal to "cluster
>>>>> >> deploy". Currently, we bind "per-job" to `run-user-main-on-client` and
>>>>> >> "application mode" to `run-user-main-on-cluster`.
>>>>> >>
>>>>> >> Do you have any other concerns about moving FLIP-85 to a vote?
>>>>> >>
>>>>> >>
>>>>> >> Best,
>>>>> >> Yang
>>>>> >>
>>>>> >> Peter Huang <huangzhenqiu0...@gmail.com> 于2020年3月5日周四 下午12:48写道:
>>>>> >>>
>>>>> >>> Hi Yang and Kostas,
>>>>> >>>
>>>>> >>> Thanks for the clarification. It makes more sense to me if the
>>>>> >>> long-term goal is to replace the per-job mode with the application
>>>>> >>> mode in the future (once multiple execute() calls can be supported).
>>>>> >>> Before that, it will be better to keep the concept of application
>>>>> >>> mode internal. As Yang suggested, users only need to use a
>>>>> >>> `-R/--remote-deploy` CLI option to launch a per-job cluster with the
>>>>> >>> main function executed in the cluster entrypoint. +1 for the
>>>>> >>> execution plan.
>>>>> >>>
>>>>> >>>
>>>>> >>>
>>>>> >>> Best Regards
>>>>> >>> Peter Huang
>>>>> >>>
>>>>> >>>
>>>>> >>>
>>>>> >>>
>>>>> >>> On Tue, Mar 3, 2020 at 7:11 AM Yang Wang <danrtsey...@gmail.com>
>>>>> wrote:
>>>>> >>>>
>>>>> >>>> Hi Peter,
>>>>> >>>>
>>>>> >>>> Having the application mode does not mean we will drop the
>>>>> cluster-deploy
>>>>> >>>> option. I just want to share some thoughts about “Application
>>>>> Mode”.
>>>>> >>>>
>>>>> >>>>
>>>>> >>>> 1. The application mode could cover the per-job semantics. Its
>>>>> >>>> lifecycle is bound to the user `main()`, and all the jobs in the
>>>>> >>>> user main will be executed in the same Flink cluster. In the first
>>>>> >>>> phase of the FLIP-85 implementation, running the user main on the
>>>>> >>>> cluster side could be supported in application mode.
>>>>> >>>>
>>>>> >>>> 2. Maybe in the future we also need to support multiple `execute()`
>>>>> >>>> calls on the client side against the same Flink cluster. Then the
>>>>> >>>> per-job mode will evolve into the application mode.
>>>>> >>>>
>>>>> >>>> 3. From the user perspective, only a `-R/--remote-deploy` CLI option
>>>>> >>>> is visible. Users are not aware of the application mode.
>>>>> >>>>
>>>>> >>>> 4. In the first phase, the application mode works as "per-job" (only
>>>>> >>>> one job in the user main). We just leave more potential for the
>>>>> >>>> future.
>>>>> >>>>
>>>>> >>>>
>>>>> >>>> I am not against calling it "cluster deploy mode" if you all think
>>>>> >>>> it is clearer for users.
>>>>> >>>>
>>>>> >>>>
>>>>> >>>>
>>>>> >>>> Best,
>>>>> >>>> Yang
>>>>> >>>>
>>>>> >>>> Kostas Kloudas <kklou...@gmail.com> 于2020年3月3日周二 下午6:49写道:
>>>>> >>>>>
>>>>> >>>>> Hi Peter,
>>>>> >>>>>
>>>>> >>>>> I understand your point. This is why I was also a bit torn about
>>>>> the
>>>>> >>>>> name and my proposal was a bit aligned with yours (something
>>>>> along the
>>>>> >>>>> lines of "cluster deploy" mode).
>>>>> >>>>>
>>>>> >>>>> But many of the other participants in the discussion suggested
>>>>> the
>>>>> >>>>> "Application Mode". I think that the reasoning is that now the
>>>>> user's
>>>>> >>>>> Application is more self-contained.
>>>>> >>>>> It will be submitted to the cluster and the user can just
>>>>> disconnect.
>>>>> >>>>> In addition, as discussed briefly in the doc, in the future
>>>>> there may
>>>>> >>>>> be better support for multi-execute applications which will
>>>>> bring us
>>>>> >>>>> one step closer to the true "Application Mode". But this is how I
>>>>> >>>>> interpreted their arguments, of course they can also express
>>>>> their
>>>>> >>>>> thoughts on the topic :)
>>>>> >>>>>
>>>>> >>>>> Cheers,
>>>>> >>>>> Kostas
>>>>> >>>>>
>>>>> >>>>> On Mon, Mar 2, 2020 at 6:15 PM Peter Huang <
>>>>> huangzhenqiu0...@gmail.com> wrote:
>>>>> >>>>> >
>>>>> >>>>> > Hi Kostas,
>>>>> >>>>> >
>>>>> >>>>> > Thanks for updating the wiki. We have aligned on the
>>>>> >>>>> > implementations in the doc. But I feel the naming is still a
>>>>> >>>>> > little bit confusing from a user's perspective. It is well known
>>>>> >>>>> > that Flink supports per-job clusters and session clusters. That
>>>>> >>>>> > concept is at the layer of how a job is managed within Flink. The
>>>>> >>>>> > method introduced until now is a kind of mix of job and session
>>>>> >>>>> > cluster as a compromise on implementation complexity. We probably
>>>>> >>>>> > don't need to label it "Application Mode" at the same layer as
>>>>> >>>>> > per-job cluster and session cluster. Conceptually, I think it is
>>>>> >>>>> > still a cluster-mode implementation of the per-job cluster.
>>>>> >>>>> >
>>>>> >>>>> > To minimize user confusion, I think it would be better to make it
>>>>> >>>>> > just an option of the per-job cluster for each type of cluster
>>>>> >>>>> > manager. What do you think?
>>>>> >>>>> >
>>>>> >>>>> >
>>>>> >>>>> > Best Regards
>>>>> >>>>> > Peter Huang
>>>>> >>>>> >
>>>>> >>>>> >
>>>>> >>>>> >
>>>>> >>>>> >
>>>>> >>>>> >
>>>>> >>>>> >
>>>>> >>>>> >
>>>>> >>>>> >
>>>>> >>>>> > On Mon, Mar 2, 2020 at 7:22 AM Kostas Kloudas <
>>>>> kklou...@gmail.com> wrote:
>>>>> >>>>> >>
>>>>> >>>>> >> Hi Yang,
>>>>> >>>>> >>
>>>>> >>>>> >> The difference between per-job and application mode is that,
>>>>> as you
>>>>> >>>>> >> described, in the per-job mode the main is executed on the
>>>>> client
>>>>> >>>>> >> while in the application mode, the main is executed on the
>>>>> cluster.
>>>>> >>>>> >> I do not think we have to offer "application mode" with
>>>>> running the
>>>>> >>>>> >> main on the client side as this is exactly what the per-job
>>>>> mode does
>>>>> >>>>> >> currently and, as you described also, it would be redundant.
>>>>> >>>>> >>
>>>>> >>>>> >> Sorry if this was not clear in the document.
>>>>> >>>>> >>
>>>>> >>>>> >> Cheers,
>>>>> >>>>> >> Kostas
>>>>> >>>>> >>
>>>>> >>>>> >> On Mon, Mar 2, 2020 at 3:17 PM Yang Wang <
>>>>> danrtsey...@gmail.com> wrote:
>>>>> >>>>> >> >
>>>>> >>>>> >> > Hi Kostas,
>>>>> >>>>> >> >
>>>>> >>>>> >> > Thanks a lot for your conclusion and for updating the FLIP-85
>>>>> >>>>> >> > wiki. Currently, I have no more questions about the motivation,
>>>>> >>>>> >> > approach, fault tolerance and the first-phase implementation.
>>>>> >>>>> >> >
>>>>> >>>>> >> > I think the new title "Flink Application Mode" makes a lot of
>>>>> >>>>> >> > sense to me. Especially for containerized environments, the
>>>>> >>>>> >> > cluster deploy option will be very useful.
>>>>> >>>>> >> >
>>>>> >>>>> >> > Just one concern: how do we introduce this new application mode
>>>>> >>>>> >> > to our users? Each user program (i.e. `main()`) is an
>>>>> >>>>> >> > application. Currently, we intend to only support one
>>>>> >>>>> >> > `execute()`. So what's the difference between per-job and
>>>>> >>>>> >> > application mode?
>>>>> >>>>> >> >
>>>>> >>>>> >> > For per-job, the user `main()` is always executed on the client
>>>>> >>>>> >> > side. For application mode, the user `main()` could be executed
>>>>> >>>>> >> > on the client or master side (configured via a CLI option).
>>>>> >>>>> >> > Right? We need to have a clear concept; otherwise, users will
>>>>> >>>>> >> > become more and more confused.
>>>>> >>>>> >> >
>>>>> >>>>> >> >
>>>>> >>>>> >> > Best,
>>>>> >>>>> >> > Yang
>>>>> >>>>> >> >
>>>>> >>>>> >> > Kostas Kloudas <kklou...@gmail.com> 于2020年3月2日周一 下午5:58写道:
>>>>> >>>>> >> >>
>>>>> >>>>> >> >> Hi all,
>>>>> >>>>> >> >>
>>>>> >>>>> >> >> I updated
>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-85+Flink+Application+Mode
>>>>> >>>>> >> >> based on the discussion we had here:
>>>>> >>>>> >> >>
>>>>> >>>>> >> >>
>>>>> https://docs.google.com/document/d/1ji72s3FD9DYUyGuKnJoO4ApzV-nSsZa0-bceGXW7Ocw/edit#
>>>>> >>>>> >> >>
>>>>> >>>>> >> >> Please let me know what you think and please keep the
>>>>> discussion in the ML :)
>>>>> >>>>> >> >>
>>>>> >>>>> >> >> Thanks for starting the discussion and I hope that soon we
>>>>> will be
>>>>> >>>>> >> >> able to vote on the FLIP.
>>>>> >>>>> >> >>
>>>>> >>>>> >> >> Cheers,
>>>>> >>>>> >> >> Kostas
>>>>> >>>>> >> >>
>>>>> >>>>> >> >> On Thu, Jan 16, 2020 at 3:40 AM Yang Wang <
>>>>> danrtsey...@gmail.com> wrote:
>>>>> >>>>> >> >> >
>>>>> >>>>> >> >> > Hi all,
>>>>> >>>>> >> >> >
>>>>> >>>>> >> >> > Thanks a lot for the feedback from @Kostas Kloudas. All your
>>>>> >>>>> >> >> > concerns are on point. FLIP-85 is mainly focused on
>>>>> >>>>> >> >> > supporting cluster mode for per-job, since it is more urgent
>>>>> >>>>> >> >> > and has many more use cases in both YARN and Kubernetes
>>>>> >>>>> >> >> > deployments. For the session cluster, we could have more
>>>>> >>>>> >> >> > discussion in a new thread later.
>>>>> >>>>> >> >> >
>>>>> >>>>> >> >> > #1, How to download the user jars and dependencies for
>>>>> >>>>> >> >> > per-job in cluster mode?
>>>>> >>>>> >> >> > For YARN, we could register the user jars and dependencies
>>>>> >>>>> >> >> > as LocalResources; they will be distributed by YARN. Once
>>>>> >>>>> >> >> > the JobManager and TaskManagers are launched, the jars
>>>>> >>>>> >> >> > already exist locally.
>>>>> >>>>> >> >> > For Standalone per-job and K8s, we expect that the user jars
>>>>> >>>>> >> >> > and dependencies are built into the image, or an
>>>>> >>>>> >> >> > InitContainer could be used for downloading. That is
>>>>> >>>>> >> >> > natively distributed, so we will not have a bottleneck.
>>>>> >>>>> >> >> >
>>>>> >>>>> >> >> > #2, Job graph recovery
>>>>> >>>>> >> >> > We could have an optimization to store the job graph on a
>>>>> >>>>> >> >> > DFS. However, I suggest that building a new job graph from
>>>>> >>>>> >> >> > the configuration be the default option, since we will not
>>>>> >>>>> >> >> > always have a DFS store when deploying a Flink per-job
>>>>> >>>>> >> >> > cluster. Of course, we assume that using the same
>>>>> >>>>> >> >> > configuration (e.g. job_id, user_jar, main_class, main_args,
>>>>> >>>>> >> >> > parallelism, savepoint_settings, etc.) will produce the same
>>>>> >>>>> >> >> > job graph. I think standalone per-job already has similar
>>>>> >>>>> >> >> > behavior (a sketch follows).
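>>>>> >>>>> >> >> >
>>>>> >>>>> >> >> > For illustration, a minimal sketch of rebuilding a job graph
>>>>> >>>>> >> >> > from such configuration values (the configuration keys below
>>>>> >>>>> >> >> > are placeholders, not real Flink options; PackagedProgram and
>>>>> >>>>> >> >> > PackagedProgramUtils are existing client classes, but the
>>>>> >>>>> >> >> > exact calls should be treated as an assumption):
>>>>> >>>>> >> >> >
>>>>> >>>>> >> >> >     import java.io.File;
>>>>> >>>>> >> >> >     import org.apache.flink.client.program.PackagedProgram;
>>>>> >>>>> >> >> >     import org.apache.flink.client.program.PackagedProgramUtils;
>>>>> >>>>> >> >> >     import org.apache.flink.configuration.Configuration;
>>>>> >>>>> >> >> >     import org.apache.flink.runtime.jobgraph.JobGraph;
>>>>> >>>>> >> >> >
>>>>> >>>>> >> >> >     public class JobGraphFromConfigSketch {
>>>>> >>>>> >> >> >         public static JobGraph recreate(Configuration conf) throws Exception {
>>>>> >>>>> >> >> >             // Placeholder keys standing in for user_jar, main_class,
>>>>> >>>>> >> >> >             // main_args and parallelism from the per-job configuration.
>>>>> >>>>> >> >> >             File jar = new File(conf.getString("x.user-jar", "/opt/flink/usrlib/job.jar"));
>>>>> >>>>> >> >> >             String mainClass = conf.getString("x.main-class", "com.example.MyJob");
>>>>> >>>>> >> >> >             String[] args = conf.getString("x.main-args", "").split(" ");
>>>>> >>>>> >> >> >             int parallelism = conf.getInteger("x.parallelism", 1);
>>>>> >>>>> >> >> >
>>>>> >>>>> >> >> >             PackagedProgram program = PackagedProgram.newBuilder()
>>>>> >>>>> >> >> >                     .setJarFile(jar)
>>>>> >>>>> >> >> >                     .setEntryPointClassName(mainClass)
>>>>> >>>>> >> >> >                     .setArguments(args)
>>>>> >>>>> >> >> >                     .build();
>>>>> >>>>> >> >> >
>>>>> >>>>> >> >> >             // Same inputs -> same job graph, which is what recovery relies on.
>>>>> >>>>> >> >> >             return PackagedProgramUtils.createJobGraph(program, conf, parallelism, false);
>>>>> >>>>> >> >> >         }
>>>>> >>>>> >> >> >     }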
>>>>> >>>>> >> >> >
>>>>> >>>>> >> >> > #3, What happens with jobs that have multiple execute()
>>>>> >>>>> >> >> > calls?
>>>>> >>>>> >> >> > Currently, it is really a problem. Even if we use a local
>>>>> >>>>> >> >> > client on the Flink master side, it will behave differently
>>>>> >>>>> >> >> > from client mode. In client mode, if we execute multiple
>>>>> >>>>> >> >> > times, we will deploy a separate Flink cluster for each
>>>>> >>>>> >> >> > execute(). I am not quite sure whether that is reasonable.
>>>>> >>>>> >> >> > However, I still think using the local client is a good
>>>>> >>>>> >> >> > choice. We could continue the discussion in a new thread.
>>>>> >>>>> >> >> > @Zili Chen <wander4...@gmail.com> Do you want to drive this?
>>>>> >>>>> >> >> >
>>>>> >>>>> >> >> >
>>>>> >>>>> >> >> >
>>>>> >>>>> >> >> > Best,
>>>>> >>>>> >> >> > Yang
>>>>> >>>>> >> >> >
>>>>> >>>>> >> >> > Peter Huang <huangzhenqiu0...@gmail.com> 于2020年1月16日周四
>>>>> 上午1:55写道:
>>>>> >>>>> >> >> >
>>>>> >>>>> >> >> > > Hi Kostas,
>>>>> >>>>> >> >> > >
>>>>> >>>>> >> >> > > Thanks for this feedback. I can't agree more with this
>>>>> >>>>> >> >> > > opinion. The cluster mode should be added to the per-job
>>>>> >>>>> >> >> > > cluster first.
>>>>> >>>>> >> >> > >
>>>>> >>>>> >> >> > > 1) For the job cluster implementation
>>>>> >>>>> >> >> > > 1. Job graph recovery from configuration, or storing a
>>>>> >>>>> >> >> > > static job graph as in the session cluster. I think the
>>>>> >>>>> >> >> > > static one will be better for shorter recovery time. Let
>>>>> >>>>> >> >> > > me update the doc with details.
>>>>> >>>>> >> >> > >
>>>>> >>>>> >> >> > > 2. For jobs that execute multiple times, I think @Zili Chen
>>>>> >>>>> >> >> > > <wander4...@gmail.com> has proposed the local client
>>>>> >>>>> >> >> > > solution that can actually run the program in the cluster
>>>>> >>>>> >> >> > > entrypoint. We can put the implementation in the second
>>>>> >>>>> >> >> > > stage, or even in a new FLIP for further discussion.
>>>>> >>>>> >> >> > >
>>>>> >>>>> >> >> > > 2) For the session cluster implementation
>>>>> >>>>> >> >> > > We can disable the cluster mode for the session cluster in
>>>>> >>>>> >> >> > > the first stage. I agree the jar downloading will be a
>>>>> >>>>> >> >> > > painful thing. We can consider a PoC and performance
>>>>> >>>>> >> >> > > evaluation first. If the end-to-end experience is good
>>>>> >>>>> >> >> > > enough, then we can consider proceeding with the solution.
>>>>> >>>>> >> >> > >
>>>>> >>>>> >> >> > > Looking forward to more opinions from @Yang Wang <
>>>>> danrtsey...@gmail.com> @Zili
>>>>> >>>>> >> >> > > Chen <wander4...@gmail.com> @Dian Fu <
>>>>> dian0511...@gmail.com>.
>>>>> >>>>> >> >> > >
>>>>> >>>>> >> >> > >
>>>>> >>>>> >> >> > > Best Regards
>>>>> >>>>> >> >> > > Peter Huang
>>>>> >>>>> >> >> > >
>>>>> >>>>> >> >> > > On Wed, Jan 15, 2020 at 7:50 AM Kostas Kloudas <
>>>>> kklou...@gmail.com> wrote:
>>>>> >>>>> >> >> > >
>>>>> >>>>> >> >> > >> Hi all,
>>>>> >>>>> >> >> > >>
>>>>> >>>>> >> >> > >> I am writing here as the discussion on the Google Doc
>>>>> seems to be a
>>>>> >>>>> >> >> > >> bit difficult to follow.
>>>>> >>>>> >> >> > >>
>>>>> >>>>> >> >> > >> I think that in order to be able to make progress, it
>>>>> would be helpful
>>>>> >>>>> >> >> > >> to focus on per-job mode for now.
>>>>> >>>>> >> >> > >> The reason is that:
>>>>> >>>>> >> >> > >>  1) making the (unique) JobSubmitHandler responsible
>>>>> for creating the
>>>>> >>>>> >> >> > >> jobgraphs,
>>>>> >>>>> >> >> > >>   which includes downloading dependencies, is not an
>>>>> optimal solution
>>>>> >>>>> >> >> > >>  2) even if we put the responsibility on the
>>>>> JobMaster, currently each
>>>>> >>>>> >> >> > >> job has its own
>>>>> >>>>> >> >> > >>   JobMaster but they all run on the same process, so
>>>>> we have again a
>>>>> >>>>> >> >> > >> single entity.
>>>>> >>>>> >> >> > >>
>>>>> >>>>> >> >> > >> Of course after this is done, and if we feel
>>>>> comfortable with the
>>>>> >>>>> >> >> > >> solution, then we can go to the session mode.
>>>>> >>>>> >> >> > >>
>>>>> >>>>> >> >> > >> A second comment has to do with fault-tolerance in
>>>>> the per-job,
>>>>> >>>>> >> >> > >> cluster-deploy mode.
>>>>> >>>>> >> >> > >> In the document, it is suggested that upon recovery,
>>>>> the JobMaster of
>>>>> >>>>> >> >> > >> each job re-creates the JobGraph.
>>>>> >>>>> >> >> > >> I am just wondering if it is better to create and
>>>>> store the jobGraph
>>>>> >>>>> >> >> > >> upon submission and only fetch it
>>>>> >>>>> >> >> > >> upon recovery so that we have a static jobGraph.
>>>>> >>>>> >> >> > >>
>>>>> >>>>> >> >> > >> Finally, I have a question which is what happens with
>>>>> jobs that have
>>>>> >>>>> >> >> > >> multiple execute calls?
>>>>> >>>>> >> >> > >> The semantics seem to change compared to the current
>>>>> behaviour, right?
>>>>> >>>>> >> >> > >>
>>>>> >>>>> >> >> > >> Cheers,
>>>>> >>>>> >> >> > >> Kostas
>>>>> >>>>> >> >> > >>
>>>>> >>>>> >> >> > >> On Wed, Jan 8, 2020 at 8:05 PM tison <
>>>>> wander4...@gmail.com> wrote:
>>>>> >>>>> >> >> > >> >
>>>>> >>>>> >> >> > >> > not always, Yang Wang is also not yet a committer
>>>>> but he can join the
>>>>> >>>>> >> >> > >> > channel. I cannot find the id by clicking “Add new
>>>>> member in channel” so
>>>>> >>>>> >> >> > >> > come to you and ask for try out the link. Possibly
>>>>> I will find other
>>>>> >>>>> >> >> > >> ways
>>>>> >>>>> >> >> > >> > but the original purpose is that the slack channel
>>>>> is a public area we
>>>>> >>>>> >> >> > >> > discuss about developing...
>>>>> >>>>> >> >> > >> > Best,
>>>>> >>>>> >> >> > >> > tison.
>>>>> >>>>> >> >> > >> >
>>>>> >>>>> >> >> > >> >
>>>>> >>>>> >> >> > >> > Peter Huang <huangzhenqiu0...@gmail.com>
>>>>> 于2020年1月9日周四 上午2:44写道:
>>>>> >>>>> >> >> > >> >
>>>>> >>>>> >> >> > >> > > Hi Tison,
>>>>> >>>>> >> >> > >> > >
>>>>> >>>>> >> >> > >> > > I am not the committer of Flink yet. I think I
>>>>> can't join it also.
>>>>> >>>>> >> >> > >> > >
>>>>> >>>>> >> >> > >> > >
>>>>> >>>>> >> >> > >> > > Best Regards
>>>>> >>>>> >> >> > >> > > Peter Huang
>>>>> >>>>> >> >> > >> > >
>>>>> >>>>> >> >> > >> > > On Wed, Jan 8, 2020 at 9:39 AM tison <
>>>>> wander4...@gmail.com> wrote:
>>>>> >>>>> >> >> > >> > >
>>>>> >>>>> >> >> > >> > > > Hi Peter,
>>>>> >>>>> >> >> > >> > > >
>>>>> >>>>> >> >> > >> > > > Could you try out this link?
>>>>> >>>>> >> >> > >> > > https://the-asf.slack.com/messages/CNA3ADZPH
>>>>> >>>>> >> >> > >> > > >
>>>>> >>>>> >> >> > >> > > > Best,
>>>>> >>>>> >> >> > >> > > > tison.
>>>>> >>>>> >> >> > >> > > >
>>>>> >>>>> >> >> > >> > > >
>>>>> >>>>> >> >> > >> > > > Peter Huang <huangzhenqiu0...@gmail.com>
>>>>> 于2020年1月9日周四 上午1:22写道:
>>>>> >>>>> >> >> > >> > > >
>>>>> >>>>> >> >> > >> > > > > Hi Tison,
>>>>> >>>>> >> >> > >> > > > >
>>>>> >>>>> >> >> > >> > > > > I can't join the group with shared link.
>>>>> Would you please add me
>>>>> >>>>> >> >> > >> into
>>>>> >>>>> >> >> > >> > > the
>>>>> >>>>> >> >> > >> > > > > group? My slack account is huangzhenqiu0825.
>>>>> >>>>> >> >> > >> > > > > Thank you in advance.
>>>>> >>>>> >> >> > >> > > > >
>>>>> >>>>> >> >> > >> > > > >
>>>>> >>>>> >> >> > >> > > > > Best Regards
>>>>> >>>>> >> >> > >> > > > > Peter Huang
>>>>> >>>>> >> >> > >> > > > >
>>>>> >>>>> >> >> > >> > > > > On Wed, Jan 8, 2020 at 12:02 AM tison <
>>>>> wander4...@gmail.com>
>>>>> >>>>> >> >> > >> wrote:
>>>>> >>>>> >> >> > >> > > > >
>>>>> >>>>> >> >> > >> > > > > > Hi Peter,
>>>>> >>>>> >> >> > >> > > > > >
>>>>> >>>>> >> >> > >> > > > > > As described above, this effort should get
>>>>> attention from people
>>>>> >>>>> >> >> > >> > > > > developing
>>>>> >>>>> >> >> > >> > > > > > FLIP-73 a.k.a. Executor abstractions. I
>>>>> recommend you to join
>>>>> >>>>> >> >> > >> the
>>>>> >>>>> >> >> > >> > > > public
>>>>> >>>>> >> >> > >> > > > > > slack channel[1] for Flink Client API
>>>>> Enhancement and you can
>>>>> >>>>> >> >> > >> try to
>>>>> >>>>> >> >> > >> > > > > share
>>>>> >>>>> >> >> > >> > > > > > you detailed thoughts there. It possibly
>>>>> gets more concrete
>>>>> >>>>> >> >> > >> > > attentions.
>>>>> >>>>> >> >> > >> > > > > >
>>>>> >>>>> >> >> > >> > > > > > Best,
>>>>> >>>>> >> >> > >> > > > > > tison.
>>>>> >>>>> >> >> > >> > > > > >
>>>>> >>>>> >> >> > >> > > > > > [1]
>>>>> >>>>> >> >> > >> > > > > >
>>>>> >>>>> >> >> > >> > > > > >
>>>>> >>>>> >> >> > >> > > > >
>>>>> >>>>> >> >> > >> > > >
>>>>> >>>>> >> >> > >> > >
>>>>> >>>>> >> >> > >>
>>>>> https://slack.com/share/IS21SJ75H/Rk8HhUly9FuEHb7oGwBZ33uL/enQtODg2MDYwNjE5MTg3LTA2MjIzNDc1M2ZjZDVlMjdlZjk1M2RkYmJhNjAwMTk2ZDZkODQ4NmY5YmI4OGRhNWJkYTViMTM1NzlmMzc4OWM
>>>>> >>>>> >> >> > >> > > > > >
>>>>> >>>>> >> >> > >> > > > > >
>>>>> >>>>> >> >> > >> > > > > > Peter Huang <huangzhenqiu0...@gmail.com>
>>>>> 于2020年1月7日周二 上午5:09写道:
>>>>> >>>>> >> >> > >> > > > > >
>>>>> >>>>> >> >> > >> > > > > > > Dear All,
>>>>> >>>>> >> >> > >> > > > > > >
>>>>> >>>>> >> >> > >> > > > > > > Happy new year! According to existing
>>>>> feedback from the
>>>>> >>>>> >> >> > >> community,
>>>>> >>>>> >> >> > >> > > we
>>>>> >>>>> >> >> > >> > > > > > > revised the doc with the consideration of
>>>>> session cluster
>>>>> >>>>> >> >> > >> support,
>>>>> >>>>> >> >> > >> > > > and
>>>>> >>>>> >> >> > >> > > > > > > concrete interface changes needed and
>>>>> execution plan. Please
>>>>> >>>>> >> >> > >> take
>>>>> >>>>> >> >> > >> > > one
>>>>> >>>>> >> >> > >> > > > > > more
>>>>> >>>>> >> >> > >> > > > > > > round of review at your most convenient
>>>>> time.
>>>>> >>>>> >> >> > >> > > > > > >
>>>>> >>>>> >> >> > >> > > > > > >
>>>>> >>>>> >> >> > >> > > > > > >
>>>>> >>>>> >> >> > >> > > > > >
>>>>> >>>>> >> >> > >> > > > >
>>>>> >>>>> >> >> > >> > > >
>>>>> >>>>> >> >> > >> > >
>>>>> >>>>> >> >> > >>
>>>>> https://docs.google.com/document/d/1aAwVjdZByA-0CHbgv16Me-vjaaDMCfhX7TzVVTuifYM/edit#
>>>>> >>>>> >> >> > >> > > > > > >
>>>>> >>>>> >> >> > >> > > > > > >
>>>>> >>>>> >> >> > >> > > > > > > Best Regards
>>>>> >>>>> >> >> > >> > > > > > > Peter Huang
>>>>> >>>>> >> >> > >> > > > > > >
>>>>> >>>>> >> >> > >> > > > > > >
>>>>> >>>>> >> >> > >> > > > > > >
>>>>> >>>>> >> >> > >> > > > > > >
>>>>> >>>>> >> >> > >> > > > > > >
>>>>> >>>>> >> >> > >> > > > > > > On Thu, Jan 2, 2020 at 11:29 AM Peter
>>>>> Huang <
>>>>> >>>>> >> >> > >> > > > > huangzhenqiu0...@gmail.com>
>>>>> >>>>> >> >> > >> > > > > > > wrote:
>>>>> >>>>> >> >> > >> > > > > > >
>>>>> >>>>> >> >> > >> > > > > > > > Hi Dian,
>>>>> >>>>> >> >> > >> > > > > > > > Thanks for giving us valuable feedback.
>>>>> >>>>> >> >> > >> > > > > > > >
>>>>> >>>>> >> >> > >> > > > > > > > 1) It's better to have a whole design
>>>>> for this feature
>>>>> >>>>> >> >> > >> > > > > > > > For the suggestion of enabling the
>>>>> cluster mode also session
>>>>> >>>>> >> >> > >> > > > > cluster, I
>>>>> >>>>> >> >> > >> > > > > > > > think Flink already supported it.
>>>>> WebSubmissionExtension
>>>>> >>>>> >> >> > >> already
>>>>> >>>>> >> >> > >> > > > > allows
>>>>> >>>>> >> >> > >> > > > > > > > users to start a job with the specified
>>>>> jar by using web UI.
>>>>> >>>>> >> >> > >> > > > > > > > But we need to enable the feature from
>>>>> CLI for both local
>>>>> >>>>> >> >> > >> jar,
>>>>> >>>>> >> >> > >> > > > remote
>>>>> >>>>> >> >> > >> > > > > > > jar.
>>>>> >>>>> >> >> > >> > > > > > > > I will align with Yang Wang first about
>>>>> the details and
>>>>> >>>>> >> >> > >> update
>>>>> >>>>> >> >> > >> > > the
>>>>> >>>>> >> >> > >> > > > > > design
>>>>> >>>>> >> >> > >> > > > > > > > doc.
>>>>> >>>>> >> >> > >> > > > > > > >
>>>>> >>>>> >> >> > >> > > > > > > > 2) It's better to consider the
>>>>> convenience for users, such
>>>>> >>>>> >> >> > >> as
>>>>> >>>>> >> >> > >> > > > > debugging
>>>>> >>>>> >> >> > >> > > > > > > >
>>>>> >>>>> >> >> > >> > > > > > > > I am wondering whether we can store the
>>>>> exception in
>>>>> >>>>> >> >> > >> job graph
>>>>> >>>>> >> >> > >> > > > > > > > generation in application master. As no
>>>>> streaming graph can
>>>>> >>>>> >> >> > >> be
>>>>> >>>>> >> >> > >> > > > > > scheduled
>>>>> >>>>> >> >> > >> > > > > > > in
>>>>> >>>>> >> >> > >> > > > > > > > this case, there will be no more TM
>>>>> will be requested from
>>>>> >>>>> >> >> > >> > > FlinkRM.
>>>>> >>>>> >> >> > >> > > > > > > > If the AM is still running, users can
>>>>> still query it from
>>>>> >>>>> >> >> > >> CLI. As
>>>>> >>>>> >> >> > >> > > > it
>>>>> >>>>> >> >> > >> > > > > > > > requires more change, we can get some
>>>>> feedback from <
>>>>> >>>>> >> >> > >> > > > > > aljos...@apache.org
>>>>> >>>>> >> >> > >> > > > > > > >
>>>>> >>>>> >> >> > >> > > > > > > > and @zjf...@gmail.com <zjf...@gmail.com
>>>>> >.
>>>>> >>>>> >> >> > >> > > > > > > >
>>>>> >>>>> >> >> > >> > > > > > > > 3) It's better to consider the impact
>>>>> to the stability of
>>>>> >>>>> >> >> > >> the
>>>>> >>>>> >> >> > >> > > > cluster
>>>>> >>>>> >> >> > >> > > > > > > >
>>>>> >>>>> >> >> > >> > > > > > > > I agree with Yang Wang's opinion.
>>>>> >>>>> >> >> > >> > > > > > > >
>>>>> >>>>> >> >> > >> > > > > > > >
>>>>> >>>>> >> >> > >> > > > > > > >
>>>>> >>>>> >> >> > >> > > > > > > > Best Regards
>>>>> >>>>> >> >> > >> > > > > > > > Peter Huang
>>>>> >>>>> >> >> > >> > > > > > > >
>>>>> >>>>> >> >> > >> > > > > > > >
>>>>> >>>>> >> >> > >> > > > > > > > On Sun, Dec 29, 2019 at 9:44 PM Dian Fu
>>>>> <
>>>>> >>>>> >> >> > >> dian0511...@gmail.com>
>>>>> >>>>> >> >> > >> > > > > wrote:
>>>>> >>>>> >> >> > >> > > > > > > >
>>>>> >>>>> >> >> > >> > > > > > > >> Hi all,
>>>>> >>>>> >> >> > >> > > > > > > >>
>>>>> >>>>> >> >> > >> > > > > > > >> Sorry to jump into this discussion.
>>>>> Thanks everyone for the
>>>>> >>>>> >> >> > >> > > > > > discussion.
>>>>> >>>>> >> >> > >> > > > > > > >> I'm very interested in this topic
>>>>> although I'm not an
>>>>> >>>>> >> >> > >> expert in
>>>>> >>>>> >> >> > >> > > > this
>>>>> >>>>> >> >> > >> > > > > > > part.
>>>>> >>>>> >> >> > >> > > > > > > >> So I'm glad to share my thoughts as
>>>>> following:
>>>>> >>>>> >> >> > >> > > > > > > >>
>>>>> >>>>> >> >> > >> > > > > > > >> 1) It's better to have a whole design
>>>>> for this feature
>>>>> >>>>> >> >> > >> > > > > > > >> As we know, there are two deployment
>>>>> modes: per-job mode
>>>>> >>>>> >> >> > >> and
>>>>> >>>>> >> >> > >> > > > session
>>>>> >>>>> >> >> > >> > > > > > > >> mode. I'm wondering which mode really
>>>>> needs this feature.
>>>>> >>>>> >> >> > >> As the
>>>>> >>>>> >> >> > >> > > > > > design
>>>>> >>>>> >> >> > >> > > > > > > doc
>>>>> >>>>> >> >> > >> > > > > > > >> mentioned, per-job mode is more used
>>>>> for streaming jobs and
>>>>> >>>>> >> >> > >> > > > session
>>>>> >>>>> >> >> > >> > > > > > > mode is
>>>>> >>>>> >> >> > >> > > > > > > >> usually used for batch jobs(Of course,
>>>>> the job types and
>>>>> >>>>> >> >> > >> the
>>>>> >>>>> >> >> > >> > > > > > deployment
>>>>> >>>>> >> >> > >> > > > > > > >> modes are orthogonal). Usually
>>>>> streaming job is only
>>>>> >>>>> >> >> > >> needed to
>>>>> >>>>> >> >> > >> > > be
>>>>> >>>>> >> >> > >> > > > > > > submitted
>>>>> >>>>> >> >> > >> > > > > > > >> once and it will run for days or
>>>>> weeks, while batch jobs
>>>>> >>>>> >> >> > >> will be
>>>>> >>>>> >> >> > >> > > > > > > submitted
>>>>> >>>>> >> >> > >> > > > > > > >> more frequently compared with
>>>>> streaming jobs. This means
>>>>> >>>>> >> >> > >> that
>>>>> >>>>> >> >> > >> > > > maybe
>>>>> >>>>> >> >> > >> > > > > > > session
>>>>> >>>>> >> >> > >> > > > > > > >> mode also needs this feature. However,
>>>>> if we support this
>>>>> >>>>> >> >> > >> > > feature
>>>>> >>>>> >> >> > >> > > > in
>>>>> >>>>> >> >> > >> > > > > > > >> session mode, the application master
>>>>> will become the new
>>>>> >>>>> >> >> > >> > > > centralized
>>>>> >>>>> >> >> > >> > > > > > > >> service(which should be solved). So in
>>>>> this case, it's
>>>>> >>>>> >> >> > >> better to
>>>>> >>>>> >> >> > >> > > > > have
>>>>> >>>>> >> >> > >> > > > > > a
>>>>> >>>>> >> >> > >> > > > > > > >> complete design for both per-job mode
>>>>> and session mode.
>>>>> >>>>> >> >> > >> > > > Furthermore,
>>>>> >>>>> >> >> > >> > > > > > > even
>>>>> >>>>> >> >> > >> > > > > > > >> if we can do it phase by phase, we
>>>>> need to have a whole
>>>>> >>>>> >> >> > >> picture
>>>>> >>>>> >> >> > >> > > of
>>>>> >>>>> >> >> > >> > > > > how
>>>>> >>>>> >> >> > >> > > > > > > it
>>>>> >>>>> >> >> > >> > > > > > > >> works in both per-job mode and session
>>>>> mode.
>>>>> >>>>> >> >> > >> > > > > > > >>
>>>>> >>>>> >> >> > >> > > > > > > >> 2) It's better to consider the
>>>>> convenience for users, such
>>>>> >>>>> >> >> > >> as
>>>>> >>>>> >> >> > >> > > > > > debugging
>>>>> >>>>> >> >> > >> > > > > > > >> After we finish this feature, the job
>>>>> graph will be
>>>>> >>>>> >> >> > >> compiled in
>>>>> >>>>> >> >> > >> > > > the
>>>>> >>>>> >> >> > >> > > > > > > >> application master, which means that
>>>>> users cannot easily
>>>>> >>>>> >> >> > >> get the
>>>>> >>>>> >> >> > >> > > > > > > exception
>>>>> >>>>> >> >> > >> > > > > > > >> message synchorousely in the job
>>>>> client if there are
>>>>> >>>>> >> >> > >> problems
>>>>> >>>>> >> >> > >> > > > during
>>>>> >>>>> >> >> > >> > > > > > the
>>>>> >>>>> >> >> > >> > > > > > > >> job graph compiling (especially for
>>>>> platform users), such
>>>>> >>>>> >> >> > >> as the
>>>>> >>>>> >> >> > >> > > > > > > resource
>>>>> >>>>> >> >> > >> > > > > > > >> path is incorrect, the user program
>>>>> itself has some
>>>>> >>>>> >> >> > >> problems,
>>>>> >>>>> >> >> > >> > > etc.
>>>>> >>>>> >> >> > >> > > > > > What
>>>>> >>>>> >> >> > >> > > > > > > I'm
>>>>> >>>>> >> >> > >> > > > > > > >> thinking is that maybe we should throw
>>>>> the exceptions as
>>>>> >>>>> >> >> > >> early
>>>>> >>>>> >> >> > >> > > as
>>>>> >>>>> >> >> > >> > > > > > > possible
>>>>> >>>>> >> >> > >> > > > > > > >> (during job submission stage).
>>>>> >>>>> >> >> > >> > > > > > > >>
>>>>> >>>>> >> >> > >> > > > > > > >> 3) It's better to consider the impact
>>>>> to the stability of
>>>>> >>>>> >> >> > >> the
>>>>> >>>>> >> >> > >> > > > > cluster
>>>>> >>>>> >> >> > >> > > > > > > >> If we perform the compiling in the
>>>>> application master, we
>>>>> >>>>> >> >> > >> should
>>>>> >>>>> >> >> > >> > > > > > > consider
>>>>> >>>>> >> >> > >> > > > > > > >> the impact of the compiling errors.
>>>>> Although YARN could
>>>>> >>>>> >> >> > >> resume
>>>>> >>>>> >> >> > >> > > the
>>>>> >>>>> >> >> > >> > > > > > > >> application master in case of
>>>>> failures, but in some case
>>>>> >>>>> >> >> > >> the
>>>>> >>>>> >> >> > >> > > > > compiling
>>>>> >>>>> >> >> > >> > > > > > > >> failure may be a waste of cluster
>>>>> resource and may impact
>>>>> >>>>> >> >> > >> the
>>>>> >>>>> >> >> > >> > > > > > stability
>>>>> >>>>> >> >> > >> > > > > > > the
>>>>> >>>>> >> >> > >> > > > > > > >> cluster and the other jobs in the
>>>>> cluster, such as the
>>>>> >>>>> >> >> > >> resource
>>>>> >>>>> >> >> > >> > > > path
>>>>> >>>>> >> >> > >> > > > > > is
>>>>> >>>>> >> >> > >> > > > > > > >> incorrect, the user program itself has
>>>>> some problems(in
>>>>> >>>>> >> >> > >> this
>>>>> >>>>> >> >> > >> > > case,
>>>>> >>>>> >> >> > >> > > > > job
>>>>> >>>>> >> >> > >> > > > > > > >> failover cannot solve this kind of
>>>>> problems) etc. In the
>>>>> >>>>> >> >> > >> current
>>>>> >>>>> >> >> > >> > > > > > > >> implemention, the compiling errors are
>>>>> handled in the
>>>>> >>>>> >> >> > >> client
>>>>> >>>>> >> >> > >> > > side
>>>>> >>>>> >> >> > >> > > > > and
>>>>> >>>>> >> >> > >> > > > > > > there
>>>>> >>>>> >> >> > >> > > > > > > >> is no impact to the cluster at all.
>>>>> >>>>> >> >> > >> > > > > > > >>
>>>>> >>>>> >> >> > >> > > > > > > >> Regarding to 1), it's clearly pointed
>>>>> in the design doc
>>>>> >>>>> >> >> > >> that
>>>>> >>>>> >> >> > >> > > only
>>>>> >>>>> >> >> > >> > > > > > > per-job
>>>>> >>>>> >> >> > >> > > > > > > >> mode will be supported. However, I
>>>>> think it's better to
>>>>> >>>>> >> >> > >> also
>>>>> >>>>> >> >> > >> > > > > consider
>>>>> >>>>> >> >> > >> > > > > > > the
>>>>> >>>>> >> >> > >> > > > > > > >> session mode in the design doc.
>>>>> >>>>> >> >> > >> > > > > > > >> Regarding to 2) and 3), I have not
>>>>> seen related sections
>>>>> >>>>> >> >> > >> in the
>>>>> >>>>> >> >> > >> > > > > design
>>>>> >>>>> >> >> > >> > > > > > > >> doc. It will be good if we can cover
>>>>> them in the design
>>>>> >>>>> >> >> > >> doc.
>>>>> >>>>> >> >> > >> > > > > > > >>
>>>>> >>>>> >> >> > >> > > > > > > >> Feel free to correct me If there is
>>>>> anything I
>>>>> >>>>> >> >> > >> misunderstand.
>>>>> >>>>> >> >> > >> > > > > > > >>
>>>>> >>>>> >> >> > >> > > > > > > >> Regards,
>>>>> >>>>> >> >> > >> > > > > > > >> Dian
>>>>> >>>>> >> >> > >> > > > > > > >>
>>>>> >>>>> >> >> > >> > > > > > > >>
>>>>> >>>>> >> >> > >> > > > > > > >> > 在 2019年12月27日,上午3:13,Peter Huang <
>>>>> >>>>> >> >> > >> huangzhenqiu0...@gmail.com>
>>>>> >>>>> >> >> > >> > > > 写道:
>>>>> >>>>> >> >> > >> > > > > > > >> >
>>>>> >>>>> >> >> > >> > > > > > > >> > Hi Yang,
>>>>> >>>>> >> >> > >> > > > > > > >> >
>>>>> >>>>> >> >> > >> > > > > > > >> > I can't agree more. The effort
>>>>> definitely needs to align
>>>>> >>>>> >> >> > >> with
>>>>> >>>>> >> >> > >> > > > the
>>>>> >>>>> >> >> > >> > > > > > > final
>>>>> >>>>> >> >> > >> > > > > > > >> > goal of FLIP-73.
>>>>> >>>>> >> >> > >> > > > > > > >> > I am thinking about whether we can
>>>>> achieve the goal with
>>>>> >>>>> >> >> > >> two
>>>>> >>>>> >> >> > >> > > > > phases.
>>>>> >>>>> >> >> > >> > > > > > > >> >
>>>>> >>>>> >> >> > >> > > > > > > >> > 1) Phase I
>>>>> >>>>> >> >> > >> > > > > > > >> > As the CliFrontend will not be
>>>>> >>>>> >> >> > >> > > > > > > >> > deprecated soon, we can
>>>>> >>>>> >> >> > >> still
>>>>> >>>>> >> >> > >> > > > use
>>>>> >>>>> >> >> > >> > > > > > the
>>>>> >>>>> >> >> > >> > > > > > > >> > deployMode flag there,
>>>>> >>>>> >> >> > >> > > > > > > >> > pass the program info through Flink
>>>>> configuration,  use
>>>>> >>>>> >> >> > >> the
>>>>> >>>>> >> >> > >> > > > > > > >> > ClassPathJobGraphRetriever
>>>>> >>>>> >> >> > >> > > > > > > >> > to generate the job graph in
>>>>> ClusterEntrypoints of yarn
>>>>> >>>>> >> >> > >> and
>>>>> >>>>> >> >> > >> > > > > > > Kubernetes.
>>>>> >>>>> >> >> > >> > > > > > > >> >
>>>>> >>>>> >> >> > >> > > > > > > >> > 2) Phase II
>>>>> >>>>> >> >> > >> > > > > > > >> > In  AbstractJobClusterExecutor, the
>>>>> job graph is
>>>>> >>>>> >> >> > >> generated in
>>>>> >>>>> >> >> > >> > > > the
>>>>> >>>>> >> >> > >> > > > > > > >> execute
>>>>> >>>>> >> >> > >> > > > > > > >> > function. We can still
>>>>> >>>>> >> >> > >> > > > > > > >> > use the deployMode in it. With
>>>>> deployMode = cluster, the
>>>>> >>>>> >> >> > >> > > execute
>>>>> >>>>> >> >> > >> > > > > > > >> function
>>>>> >>>>> >> >> > >> > > > > > > >> > only starts the cluster.
>>>>> >>>>> >> >> > >> > > > > > > >> >
>>>>> >>>>> >> >> > >> > > > > > > >> > When
>>>>> {Yarn/Kuberneates}PerJobClusterEntrypoint starts,
>>>>> >>>>> >> >> > >> It will
>>>>> >>>>> >> >> > >> > > > > start
>>>>> >>>>> >> >> > >> > > > > > > the
>>>>> >>>>> >> >> > >> > > > > > > >> > dispatch first, then we can use
>>>>> >>>>> >> >> > >> > > > > > > >> > a ClusterEnvironment similar to
>>>>> ContextEnvironment to
>>>>> >>>>> >> >> > >> submit
>>>>> >>>>> >> >> > >> > > the
>>>>> >>>>> >> >> > >> > > > > job
>>>>> >>>>> >> >> > >> > > > > > > >> with
>>>>> >>>>> >> >> > >> > > > > > > >> > jobName the local
>>>>> >>>>> >> >> > >> > > > > > > >> > dispatcher. For the details, we need
>>>>> more investigation.
>>>>> >>>>> >> >> > >> Let's
>>>>> >>>>> >> >> > >> > > > > wait
>>>>> >>>>> >> >> > >> > > > > > > >> > for @Aljoscha
>>>>> >>>>> >> >> > >> > > > > > > >> > Krettek <aljos...@apache.org> @Till
>>>>> Rohrmann <
>>>>> >>>>> >> >> > >> > > > > trohrm...@apache.org
>>>>> >>>>> >> >> > >> > > > > > >'s
>>>>> >>>>> >> >> > >> > > > > > > >> > feedback after the holiday season.
>>>>> >>>>> >> >> > >> > > > > > > >> >
>>>>> >>>>> >> >> > >> > > > > > > >> > Thank you in advance. Merry Chrismas
>>>>> and Happy New
>>>>> >>>>> >> >> > >> Year!!!
>>>>> >>>>> >> >> > >> > > > > > > >> >
>>>>> >>>>> >> >> > >> > > > > > > >> >
>>>>> >>>>> >> >> > >> > > > > > > >> > Best Regards
>>>>> >>>>> >> >> > >> > > > > > > >> > Peter Huang
>>>>> >>>>> >> >> > >> > > > > > > >> >
>>>>> >>>>> >> >> > >> > > > > > > >> >
>>>>> >>>>> >> >> > >> > > > > > > >> >
>>>>> >>>>> >> >> > >> > > > > > > >> >
>>>>> >>>>> >> >> > >> > > > > > > >> >
>>>>> >>>>> >> >> > >> > > > > > > >> >
>>>>> >>>>> >> >> > >> > > > > > > >> >
>>>>> >>>>> >> >> > >> > > > > > > >> >
>>>>> >>>>> >> >> > >> > > > > > > >> > On Wed, Dec 25, 2019 at 1:08 AM Yang
>>>>> Wang <
>>>>> >>>>> >> >> > >> > > > danrtsey...@gmail.com>
>>>>> >>>>> >> >> > >> > > > > > > >> wrote:
>>>>> >>>>> >> >> > >> > > > > > > >> >
>>>>> >>>>> >> >> > >> > > > > > > >> >> Hi Peter,
>>>>> >>>>> >> >> > >> > > > > > > >> >>
>>>>> >>>>> >> >> > >> > > > > > > >> >> I think we need to reconsider
>>>>> tison's suggestion
>>>>> >>>>> >> >> > >> seriously.
>>>>> >>>>> >> >> > >> > > > After
>>>>> >>>>> >> >> > >> > > > > > > >> FLIP-73,
>>>>> >>>>> >> >> > >> > > > > > > >> >> the deployJobCluster has
>>>>> >>>>> >> >> > >> > > > > > > >> >> been moved into
>>>>> `JobClusterExecutor#execute`. It should
>>>>> >>>>> >> >> > >> not be
>>>>> >>>>> >> >> > >> > > > > > > perceived
>>>>> >>>>> >> >> > >> > > > > > > >> >> for `CliFrontend`. That
>>>>> >>>>> >> >> > >> > > > > > > >> >> means the user program will
>>>>> *ALWAYS* be executed on
>>>>> >>>>> >> >> > >> client
>>>>> >>>>> >> >> > >> > > > side.
>>>>> >>>>> >> >> > >> > > > > > This
>>>>> >>>>> >> >> > >> > > > > > > >> is
>>>>> >>>>> >> >> > >> > > > > > > >> >> the by design behavior.
>>>>> >>>>> >> >> > >> > > > > > > >> >> So, we could not just add
>>>>> `if(client mode) .. else
>>>>> >>>>> >> >> > >> if(cluster
>>>>> >>>>> >> >> > >> > > > > mode)
>>>>> >>>>> >> >> > >> > > > > > > >> ...`
>>>>> >>>>> >> >> > >> > > > > > > >> >> codes in `CliFrontend` to bypass
>>>>> >>>>> >> >> > >> > > > > > > >> >> the executor. We need to find a
>>>>> clean way to decouple
>>>>> >>>>> >> >> > >> > > executing
>>>>> >>>>> >> >> > >> > > > > > user
>>>>> >>>>> >> >> > >> > > > > > > >> >> program and deploying per-job
>>>>> >>>>> >> >> > >> > > > > > > >> >> cluster. Based on this, we could
>>>>> support to execute user
>>>>> >>>>> >> >> > >> > > > program
>>>>> >>>>> >> >> > >> > > > > on
>>>>> >>>>> >> >> > >> > > > > > > >> client
>>>>> >>>>> >> >> > >> > > > > > > >> >> or master side.
>>>>> >>>>> >> >> > >> > > > > > > >> >>
>>>>> >>>>> >> >> > >> > > > > > > >> >> Maybe Aljoscha and Jeff could give
>>>>> some good
>>>>> >>>>> >> >> > >> suggestions.
>>>>> >>>>> >> >> > >> > > > > > > >> >>
>>>>> >>>>> >> >> > >> > > > > > > >> >>
>>>>> >>>>> >> >> > >> > > > > > > >> >>
>>>>> >>>>> >> >> > >> > > > > > > >> >> Best,
>>>>> >>>>> >> >> > >> > > > > > > >> >> Yang
>>>>> >>>>> >> >> > >> > > > > > > >> >>
>>>>> >>>>> >> >> > >> > > > > > > >> >> Peter Huang <
>>>>> huangzhenqiu0...@gmail.com> 于2019年12月25日周三
>>>>> >>>>> >> >> > >> > > > > 上午4:03写道:
>>>>> >>>>> >> >> > >> > > > > > > >> >>
>>>>> >>>>> >> >> > >> > > > > > > >> >>> Hi Jingjing,
>>>>> >>>>> >> >> > >> > > > > > > >> >>>
>>>>> >>>>> >> >> > >> > > > > > > >> >>> The improvement proposed is a
>>>>> deployment option for
>>>>> >>>>> >> >> > >> CLI. For
>>>>> >>>>> >> >> > >> > > > SQL
>>>>> >>>>> >> >> > >> > > > > > > based
>>>>> >>>>> >> >> > >> > > > > > > >> >>> Flink application, It is more
>>>>> convenient to use the
>>>>> >>>>> >> >> > >> existing
>>>>> >>>>> >> >> > >> > > > > model
>>>>> >>>>> >> >> > >> > > > > > > in
>>>>> >>>>> >> >> > >> > > > > > > >> >>> SqlClient in which
>>>>> >>>>> >> >> > >> > > > > > > >> >>> the job graph is generated within
>>>>> SqlClient. After
>>>>> >>>>> >> >> > >> adding
>>>>> >>>>> >> >> > >> > > the
>>>>> >>>>> >> >> > >> > > > > > > delayed
>>>>> >>>>> >> >> > >> > > > > > > >> job
>>>>> >>>>> >> >> > >> > > > > > > >> >>> graph generation, I think there is
>>>>> no change is needed
>>>>> >>>>> >> >> > >> for
>>>>> >>>>> >> >> > >> > > > your
>>>>> >>>>> >> >> > >> > > > > > > side.
>>>>> >>>>> >> >> > >> > > > > > > >> >>>
>>>>> >>>>> >> >> > >> > > > > > > >> >>>
>>>>> >>>>> >> >> > >> > > > > > > >> >>> Best Regards
>>>>> >>>>> >> >> > >> > > > > > > >> >>> Peter Huang
>>>>> >>>>> >> >> > >> > > > > > > >> >>>
>>>>> >>>>> >> >> > >> > > > > > > >> >>>
>>>>> >>>>> >> >> > >> > > > > > > >> >>> On Wed, Dec 18, 2019 at 6:01 AM
>>>>> jingjing bai <
>>>>> >>>>> >> >> > >> > > > > > > >> baijingjing7...@gmail.com>
>>>>> >>>>> >> >> > >> > > > > > > >> >>> wrote:
>>>>> >>>>> >> >> > >> > > > > > > >> >>>
>>>>> >>>>> >> >> > >> > > > > > > >> >>>> hi peter:
>>>>> >>>>> >> >> > >> > > > > > > >> >>>>    We have extended SqlClient to
>>>>> >>>>> >> >> > >> > > > > > > >> >>>> support SQL job submission from the
>>>>> >>>>> >> >> > >> > > > > > > >> >>>> web, based on Flink 1.9. We support
>>>>> >>>>> >> >> > >> > > > > > > >> >>>> submitting to YARN in per-job mode too.
>>>>> >>>>> >> >> > >> > > > > > > >> >>>>    In this case, the job graph is
>>>>> >>>>> >> >> > >> > > > > > > >> >>>> generated on the client side. I think
>>>>> >>>>> >> >> > >> > > > > > > >> >>>> this discussion mainly aims to improve
>>>>> >>>>> >> >> > >> > > > > > > >> >>>> API programs, but in my case there is
>>>>> >>>>> >> >> > >> > > > > > > >> >>>> no jar to upload, only a SQL string.
>>>>> >>>>> >> >> > >> > > > > > > >> >>>>    Do you have more suggestions to
>>>>> >>>>> >> >> > >> > > > > > > >> >>>> improve the SQL mode, or is it only a
>>>>> >>>>> >> >> > >> > > > > > > >> >>>> switch for API programs?
>>>>> >>>>> >> >> > >> > > > > > > >> >>>>
>>>>> >>>>> >> >> > >> > > > > > > >> >>>>
>>>>> >>>>> >> >> > >> > > > > > > >> >>>> best
>>>>> >>>>> >> >> > >> > > > > > > >> >>>> bai jj
>>>>> >>>>> >> >> > >> > > > > > > >> >>>>
>>>>> >>>>> >> >> > >> > > > > > > >> >>>>
>>>>> >>>>> >> >> > >> > > > > > > >> >>>> Yang Wang <danrtsey...@gmail.com>
>>>>> 于2019年12月18日周三
>>>>> >>>>> >> >> > >> 下午7:21写道:
>>>>> >>>>> >> >> > >> > > > > > > >> >>>>
>>>>> >>>>> >> >> > >> > > > > > > >> >>>>> I just want to revive this
>>>>> discussion.
>>>>> >>>>> >> >> > >> > > > > > > >> >>>>>
>>>>> >>>>> >> >> > >> > > > > > > >> >>>>> Recently, I have been thinking about
>>>>> >>>>> >> >> > >> > > > > > > >> >>>>> how to natively run a Flink per-job
>>>>> >>>>> >> >> > >> > > > > > > >> >>>>> cluster on Kubernetes.
>>>>> >>>>> >> >> > >> > > > > > > >> >>>>> The per-job mode on Kubernetes
>>>>> is very different
>>>>> >>>>> >> >> > >> from on
>>>>> >>>>> >> >> > >> > > > Yarn.
>>>>> >>>>> >> >> > >> > > > > > And
>>>>> >>>>> >> >> > >> > > > > > > >> we
>>>>> >>>>> >> >> > >> > > > > > > >> >>> will
>>>>> >>>>> >> >> > >> > > > > > > >> >>>>> have
>>>>> >>>>> >> >> > >> > > > > > > >> >>>>> the same deployment requirements
>>>>> to the client and
>>>>> >>>>> >> >> > >> entry
>>>>> >>>>> >> >> > >> > > > > point.
>>>>> >>>>> >> >> > >> > > > > > > >> >>>>>
>>>>> >>>>> >> >> > >> > > > > > > >> >>>>> 1. The Flink client does not always
>>>>> >>>>> >> >> > >> > > > > > > >> >>>>> need a local jar to start a Flink
>>>>> >>>>> >> >> > >> > > > > > > >> >>>>> per-job
>>>>> >>>>> >> >> > >> > > > > > > >> >>>>> cluster. We could
>>>>> >>>>> >> >> > >> > > > > > > >> >>>>> support multiple schemas. For
>>>>> example,
>>>>> >>>>> >> >> > >> > > > file:///path/of/my.jar
>>>>> >>>>> >> >> > >> > > > > > > means
>>>>> >>>>> >> >> > >> > > > > > > >> a
>>>>> >>>>> >> >> > >> > > > > > > >> >>> jar
>>>>> >>>>> >> >> > >> > > > > > > >> >>>>> located
>>>>> >>>>> >> >> > >> > > > > > > >> >>>>> at client side,
>>>>> >>>>> >> >> > >> hdfs://myhdfs/user/myname/flink/my.jar
>>>>> >>>>> >> >> > >> > > > means a
>>>>> >>>>> >> >> > >> > > > > > jar
>>>>> >>>>> >> >> > >> > > > > > > >> >>> located
>>>>> >>>>> >> >> > >> > > > > > > >> >>>>> at
>>>>> >>>>> >> >> > >> > > > > > > >> >>>>> remote hdfs,
>>>>> local:///path/in/image/my.jar means a
>>>>> >>>>> >> >> > >> jar
>>>>> >>>>> >> >> > >> > > > located
>>>>> >>>>> >> >> > >> > > > > > at
>>>>> >>>>> >> >> > >> > > > > > > >> >>>>> jobmanager side.
>>>>> >>>>> >> >> > >> > > > > > > >> >>>>>
>>>>> >>>>> >> >> > >> > > > > > > >> >>>>> 2. Support running user program
>>>>> on master side. This
>>>>> >>>>> >> >> > >> also
>>>>> >>>>> >> >> > >> > > > > means
>>>>> >>>>> >> >> > >> > > > > > > the
>>>>> >>>>> >> >> > >> > > > > > > >> >>> entry
>>>>> >>>>> >> >> > >> > > > > > > >> >>>>> point
>>>>> >>>>> >> >> > >> > > > > > > >> >>>>> will generate the job graph on
>>>>> master side. We could
>>>>> >>>>> >> >> > >> use
>>>>> >>>>> >> >> > >> > > the
>>>>> >>>>> >> >> > >> > > > > > > >> >>>>> ClasspathJobGraphRetriever
>>>>> >>>>> >> >> > >> > > > > > > >> >>>>> or start a local Flink client to
>>>>> achieve this
>>>>> >>>>> >> >> > >> purpose.
>>>>> >>>>> >> >> > >> > > > > > > >> >>>>>
>>>>> >>>>> >> >> > >> > > > > > > >> >>>>>
>>>>> >>>>> >> >> > >> > > > > > > >> >>>>> cc tison, Aljoscha & Kostas Do
>>>>> you think this is the
>>>>> >>>>> >> >> > >> right
>>>>> >>>>> >> >> > >> > > > > > > >> direction we
>>>>> >>>>> >> >> > >> > > > > > > >> >>>>> need to work?
>>>>> >>>>> >> >> > >> > > > > > > >> >>>>>
>>>>> >>>>> >> >> > >> > > > > > > >> >>>>> tison <wander4...@gmail.com>
>>>>> 于2019年12月12日周四
>>>>> >>>>> >> >> > >> 下午4:48写道:
>>>>> >>>>> >> >> > >> > > > > > > >> >>>>>
>>>>> >>>>> >> >> > >> > > > > > > >> >>>>>> A quick idea is that we
>>>>> separate the deployment
>>>>> >>>>> >> >> > >> from user
>>>>> >>>>> >> >> > >> > > > > > program
>>>>> >>>>> >> >> > >> > > > > > > >> >>> that
>>>>> >>>>> >> >> > >> > > > > > > >> >>>>> it
>>>>> >>>>> >> >> > >> > > > > > > >> >>>>>> has always been done
>>>>> >>>>> >> >> > >> > > > > > > >> >>>>>> outside the program. On user
>>>>> program executed there
>>>>> >>>>> >> >> > >> is
>>>>> >>>>> >> >> > >> > > > > always a
>>>>> >>>>> >> >> > >> > > > > > > >> >>>>>> ClusterClient that communicates
>>>>> with
>>>>> >>>>> >> >> > >> > > > > > > >> >>>>>> an existing cluster, remote or
>>>>> local. It will be
>>>>> >>>>> >> >> > >> another
>>>>> >>>>> >> >> > >> > > > > thread
>>>>> >>>>> >> >> > >> > > > > > > so
>>>>> >>>>> >> >> > >> > > > > > > >> >>> just
>>>>> >>>>> >> >> > >> > > > > > > >> >>>>> for
>>>>> >>>>> >> >> > >> > > > > > > >> >>>>>> your information.
>>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>
>>>>> >>>>> >> >> > >> > > > > > > >> >>>>>> Best,
>>>>> >>>>> >> >> > >> > > > > > > >> >>>>>> tison.
>>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>
>>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>
>>>>> >>>>> >> >> > >> > > > > > > >> >>>>>> tison <wander4...@gmail.com>
>>>>> 于2019年12月12日周四
>>>>> >>>>> >> >> > >> 下午4:40写道:
>>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>
>>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>> Hi Peter,
>>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>>
>>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>> Another concern I realized
>>>>> recently is that with
>>>>> >>>>> >> >> > >> current
>>>>> >>>>> >> >> > >> > > > > > > Executors
>>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>> abstraction(FLIP-73)
>>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>> I'm afraid that user program
>>>>> is designed to ALWAYS
>>>>> >>>>> >> >> > >> run
>>>>> >>>>> >> >> > >> > > on
>>>>> >>>>> >> >> > >> > > > > the
>>>>> >>>>> >> >> > >> > > > > > > >> >>> client
>>>>> >>>>> >> >> > >> > > > > > > >> >>>>>> side.
>>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>> Specifically,
>>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>> we deploy the job in executor
>>>>> when env.execute
>>>>> >>>>> >> >> > >> called.
>>>>> >>>>> >> >> > >> > > > This
>>>>> >>>>> >> >> > >> > > > > > > >> >>>>> abstraction
>>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>> possibly prevents
>>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>> Flink runs user program on the
>>>>> cluster side.
>>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>>
>>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>> For your proposal, in this
>>>>> case we already
>>>>> >>>>> >> >> > >> compiled the
>>>>> >>>>> >> >> > >> > > > > > program
>>>>> >>>>> >> >> > >> > > > > > > >> and
>>>>> >>>>> >> >> > >> > > > > > > >> >>>>> run
>>>>> >>>>> >> >> > >> > > > > > > >> >>>>>> on
>>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>> the client side,
>>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>> even we deploy a cluster and
>>>>> retrieve job graph
>>>>> >>>>> >> >> > >> from
>>>>> >>>>> >> >> > >> > > > program
>>>>> >>>>> >> >> > >> > > > > > > >> >>>>> metadata, it
>>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>> doesn't make
>>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>> many sense.
>>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>>
>>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>> cc Aljoscha & Kostas what do
>>>>> you think about this
>>>>> >>>>> >> >> > >> > > > > constraint?
>>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>>
>>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>> Best,
>>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>> tison.
>>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>>
>>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>>
>>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>> Peter Huang <
>>>>> huangzhenqiu0...@gmail.com>
>>>>> >>>>> >> >> > >> 于2019年12月10日周二
>>>>> >>>>> >> >> > >> > > > > > > >> 下午12:45写道:
>>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>>
>>>>> Hi Tison,
>>>>>
>>>>> Yes, you are right. I think I made the wrong argument in the doc.
>>>>> Basically, the packaging jar problem is only for platform users. In
>>>>> our internal deploy service, we further optimized the deployment
>>>>> latency by letting users package flink-runtime together with the
>>>>> uber jar, so that we don't need to consider multiple Flink version
>>>>> support for now. In the session client mode, the Flink libs will be
>>>>> shipped anyway as local resources of YARN, so users actually don't
>>>>> need to package those libs into the job jar.
>>>>>
>>>>> Best Regards
>>>>> Peter Huang
>>>>>
>>>>> On Mon, Dec 9, 2019 at 8:35 PM tison <wander4...@gmail.com> wrote:
>>>>>
>>>>>> > 3. What do you mean about the package? Do users need to compile
>>>>>> > their jars including flink-clients, flink-optimizer, flink-table
>>>>>> > codes?
>>>>>>
>>>>>> The answer should be no, because they exist in the system classpath.
>>>>>>
>>>>>> Best,
>>>>>> tison.
>>>>>>
>>>>>> Yang Wang <danrtsey...@gmail.com> 于2019年12月10日周二 下午12:18写道:
>>>>>>
>>>>>>> Hi Peter,
>>>>>>>
>>>>>>> Thanks a lot for starting this discussion. I think this is a very
>>>>>>> useful feature.
>>>>>>>
>>>>>>> Not only for YARN: I am focused on the Flink on Kubernetes
>>>>>>> integration and came across the same problem. I do not want the
>>>>>>> job graph generated on the client side. Instead, the user jars are
>>>>>>> built into a user-defined image. When the job manager is launched,
>>>>>>> we just need to generate the job graph based on the local user
>>>>>>> jars.
>>>>>>>
>>>>>>> I have some small suggestions about this.
>>>>>>>
>>>>>>> 1. `ProgramJobGraphRetriever` is very similar to
>>>>>>> `ClasspathJobGraphRetriever`; the differences are that the former
>>>>>>> needs `ProgramMetadata` and the latter needs some arguments. Is it
>>>>>>> possible to have a unified `JobGraphRetriever` to support both?
>>>>>>> 2. Is it possible to not use a local user jar to start a per-job
>>>>>>> cluster? In your case, the user jars already exist on HDFS and we
>>>>>>> do need to download the jars to the deployer service. Currently,
>>>>>>> we always need a local user jar to start a Flink cluster. It would
>>>>>>> be great if we could support remote user jars.
>>>>>>>
>>>>>>> >> In the implementation, we assume users package flink-clients,
>>>>>>> >> flink-optimizer, flink-table together within the job jar.
>>>>>>> >> Otherwise, the job graph generation within JobClusterEntryPoint
>>>>>>> >> will fail.
>>>>>>>
>>>>>>> 3. What do you mean about the package? Do users need to compile
>>>>>>> their jars including flink-clients, flink-optimizer, flink-table
>>>>>>> codes?
>>>>>>>
>>>>>>> Best,
>>>>>>> Yang
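
To make suggestion 1 above concrete, a rough sketch of a unified retriever
follows. It is not the FLIP's design: the type names below are illustrative,
and the sketch only assumes that the classpath-based and the
program-metadata-based variants differ in how they obtain a
`PackagedProgram`, while the JobGraph compilation itself is shared. Flink's
`PackagedProgram` / `PackagedProgramUtils` client APIs are assumed roughly as
in recent releases, so exact signatures may vary.

import java.io.File;

import org.apache.flink.client.program.PackagedProgram;
import org.apache.flink.client.program.PackagedProgramUtils;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.runtime.jobgraph.JobGraph;

/**
 * Illustrative sketch: one retriever whose implementations only differ in
 * how they build the PackagedProgram (from the system classpath, or from
 * the proposed ProgramMetadata), while JobGraph compilation is shared.
 */
public interface UnifiedJobGraphRetriever {

    /** How the concrete variant locates the user program. */
    PackagedProgram buildProgram(Configuration configuration) throws Exception;

    /** Shared part: compile the user program into a JobGraph on the cluster side. */
    default JobGraph retrieveJobGraph(Configuration configuration) throws Exception {
        PackagedProgram program = buildProgram(configuration);
        int parallelism = 1; // in practice taken from the configuration
        return PackagedProgramUtils.createJobGraph(program, configuration, parallelism, false);
    }
}

/** Classpath-style variant: the entry class is already on the system classpath. */
class ClasspathVariant implements UnifiedJobGraphRetriever {
    private final String entryClassName;
    private final String[] programArgs;

    ClasspathVariant(String entryClassName, String[] programArgs) {
        this.entryClassName = entryClassName;
        this.programArgs = programArgs;
    }

    @Override
    public PackagedProgram buildProgram(Configuration configuration) throws Exception {
        return PackagedProgram.newBuilder()
                .setEntryPointClassName(entryClassName)
                .setArguments(programArgs)
                .build();
    }
}

/** Metadata-style variant: the user jar location comes from shipped metadata. */
class MetadataVariant implements UnifiedJobGraphRetriever {
    private final File userJar; // e.g. resolved from the proposed ProgramMetadata
    private final String[] programArgs;

    MetadataVariant(File userJar, String[] programArgs) {
        this.userJar = userJar;
        this.programArgs = programArgs;
    }

    @Override
    public PackagedProgram buildProgram(Configuration configuration) throws Exception {
        return PackagedProgram.newBuilder()
                .setJarFile(userJar)
                .setArguments(programArgs)
                .build();
    }
}

Whether such a unification fits the eventual FLIP-85 implementation is for
the authors to decide; the sketch only shows that the two variants could
share the compilation step.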
>>>>>>>
>>>>>>> Peter Huang <huangzhenqiu0...@gmail.com> 于2019年12月10日周二 上午2:37写道:
>>>>>>>
>>>>>>>> Dear All,
>>>>>>>>
>>>>>>>> Recently, the Flink community has started to improve the YARN
>>>>>>>> cluster descriptor to make the job jar and config files
>>>>>>>> configurable from the CLI. It improves the flexibility of Flink
>>>>>>>> deployment in YARN Per Job Mode. For platform users who manage
>>>>>>>> tens of hundreds of streaming pipelines for the whole org or
>>>>>>>> company, we found that job graph generation on the client side
>>>>>>>> is another pain point. Thus, we want to propose a configurable
>>>>>>>> feature for FlinkYarnSessionCli. The feature allows users to
>>>>>>>> choose to do the job graph generation in the Flink
>>>>>>>> ClusterEntryPoint, so that the job jar doesn't need to be local
>>>>>>>> for the job graph generation. The proposal is organized as a FLIP:
>>>>>>>>
>>>>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-85+Delayed+JobGraph+Generation
>>>>>>>>
>>>>>>>> Any questions and suggestions are welcomed. Thank you in advance.
>>>>>>>>
>>>>>>>> Best Regards
>>>>>>>> Peter Huang
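
As a rough illustration of the delayed generation described in the proposal
above (and of the remote-user-jar point raised by Yang), the cluster entry
point would first localize the user jar from a DFS path and then compile the
JobGraph on the cluster side, rather than receiving a pre-built job graph
from the client. The sketch below is not the FLIP's implementation; it only
assumes Flink's `FileSystem` abstraction and the `PackagedProgram` client
utilities roughly as in recent releases, and the HDFS path, arguments, and
parallelism are placeholders.

import java.io.File;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

import org.apache.flink.client.program.PackagedProgram;
import org.apache.flink.client.program.PackagedProgramUtils;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.core.fs.FSDataInputStream;
import org.apache.flink.core.fs.FileSystem;
import org.apache.flink.core.fs.Path;
import org.apache.flink.runtime.jobgraph.JobGraph;

/**
 * Sketch of "delayed" job graph generation on the cluster side:
 * 1) fetch the user jar from a remote file system (HDFS/S3/...) via Flink's
 *    FileSystem abstraction, 2) compile the JobGraph locally on the master.
 * The URI and parallelism below are illustrative placeholders.
 */
public final class DelayedJobGraphGenerationSketch {

    public static void main(String[] args) throws Exception {
        URI remoteJar = URI.create("hdfs:///user/flink/jobs/my-job.jar"); // placeholder path
        File localJar = fetchToLocal(remoteJar);

        Configuration configuration = new Configuration();
        PackagedProgram program = PackagedProgram.newBuilder()
                .setJarFile(localJar)
                .setArguments(args)
                .build();

        // Compile on the cluster side instead of on the client.
        JobGraph jobGraph =
                PackagedProgramUtils.createJobGraph(program, configuration, /* parallelism */ 1, false);
        System.out.println("Generated job graph for job " + jobGraph.getJobID());
    }

    /** Download a remote file to a local temp file using Flink's FileSystem. */
    private static File fetchToLocal(URI remoteUri) throws Exception {
        FileSystem fs = FileSystem.get(remoteUri);
        File target = File.createTempFile("user-job", ".jar");
        try (FSDataInputStream in = fs.open(new Path(remoteUri))) {
            Files.copy(in, target.toPath(), StandardCopyOption.REPLACE_EXISTING);
        }
        return target;
    }
}

In a real entry point the fetched jar (and any dependencies) would also have
to be placed on the user code classpath before compilation; that part is
omitted here.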