It sounds like a request to change the interface Program into

public interface Program {
    JobGraph getJobGraph(String... args);
}
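A minimal sketch of how a user program might implement the contract proposed above. Note this is illustrative only: the real org.apache.flink.runtime.jobgraph.JobGraph is an internal class, so a placeholder `PortableJobGraph` class stands in for whatever stable representation would eventually be used; the class and its field are invented for this example.

```java
public class ProgramSketch {

    // Placeholder for a stable, cross-version job representation;
    // NOT a real Flink class (the real JobGraph is internal).
    static final class PortableJobGraph {
        final String name;
        PortableJobGraph(String name) { this.name = name; }
    }

    // The proposed entry-point interface, mirroring the shape in the mail.
    interface Program {
        PortableJobGraph getJobGraph(String... args);
    }

    // A trivial user program implementing the proposed interface.
    static final class WordCountProgram implements Program {
        @Override
        public PortableJobGraph getJobGraph(String... args) {
            // A real implementation would build the dataflow here
            // instead of returning a stub graph.
            return new PortableJobGraph("WordCount:" + args.length);
        }
    }

    public static void main(String[] args) {
        Program program = new WordCountProgram();
        PortableJobGraph graph = program.getJobGraph("--input", "in.txt");
        if (!graph.name.equals("WordCount:2")) {
            throw new AssertionError(graph.name);
        }
        System.out.println("graph = " + graph.name);
    }
}
```

The point of the shape is that the handler, not the user's main(), decides what to do with the returned graph, which keeps job creation and job submission separate.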
Also, given that JobGraph is said to be an internal interface that cannot be relied on, we might introduce and use a representation that allows for cross-version compatibility.

Thomas Weise <t...@apache.org> wrote on Tue, Aug 6, 2019, 12:11 AM:

> If the goal is to keep job creation and job submission separate and we
> agree that there should be more flexibility for the job construction, then
> JobGraph and friends should be stable API that the user can depend on. If
> that's the case, the path Chesnay pointed to may become viable.
>
> There was discussion in the past that JobGraph cannot be relied on WRT
> backward compatibility, and I would expect that at some point we want to
> move to a representation that allows for cross-version compatibility. Beam
> is an example of how this could be accomplished (with its pipeline proto).
>
> So if the Beam job server was able to produce the JobGraph, is there
> agreement that we should provide a mechanism that allows the program entry
> point to return the JobGraph directly (without using the
> ExecutionEnvironment to build it)?
>
> On Mon, Aug 5, 2019 at 2:10 AM Zili Chen <wander4...@gmail.com> wrote:
>
> > Hi Thomas,
> >
> > If the REST handler calls main(), the behavior inside main() is
> > unpredictable.
> >
> > Currently the jar run handler extracts the job graph and submits
> > it with the job ID configured in the REST request. If the REST
> > handler called main(), we could hardly even know how many
> > jobs were executed.
> >
> > A new environment, as you said,
> > ExtractJobGraphAndSubmitToDispatcherEnvironment, could be
> > added to satisfy your requirement. However, it is a bit
> > out of Flink's scope. It might be better to write your own
> > REST handler.
> >
> > WebMonitorExtension is for extending REST handlers, but
> > it seems it also cannot be customized...
> >
> > Best,
> > tison.
> >
> > Thomas Weise <t...@apache.org> wrote on Sat, Aug 3, 2019, 4:09 AM:
> >
> > > Thanks for looking into this.
> > > I see the "Jar run handler" as a function that takes a few parameters
> > > and returns a job ID. I think it would be nice if the handler doesn't
> > > hard code the function. Perhaps this could be accomplished by pushing
> > > the code into something like
> > > "ExtractJobGraphAndSubmitToDispatcherEnvironment" that the main method
> > > could also bypass if it has an alternative way to provide the jobId via
> > > a context variable?
> > >
> > > Zili: I looked at the client API proposal and left a few comments. I
> > > think it is important to improve programmatic job submission. But it
> > > also seems orthogonal to how the jar run handler operates (i.e. these
> > > issues could be addressed independently).
> > >
> > > Chesnay: You are right that the Beam job server could be hacked to
> > > extract the job graph and other ingredients. This isn't desirable
> > > though, because these Flink internals should not be exposed downstream.
> > > But even if we went down that route, we would still need a way to let
> > > the jar run handler know to just return the ID of an already submitted
> > > job vs. trying to submit one from OptimizerPlanEnvironment.
> > >
> > > The intended sequence would be:
> > >
> > > REST client provides a launcher jar
> > > REST client "runs jar"
> > > REST handler calls main()
> > > main launches the Beam job server, runs the Beam pipeline construction
> > > code against that job server
> > > job server uses RemoteEnvironment to submit the real job
> > > main "returns job id"
> > > REST handler returns job id
> > >
> > > Thomas
> > >
> > > On Wed, Jul 31, 2019 at 4:33 AM Zili Chen <wander4...@gmail.com> wrote:
> > >
> > > > By the way, currently the Dispatcher implements RestfulGateway
> > > > and delegates resource requests to the ResourceManager.
> > > > If we can,
> > > > semantically, let the WebMonitor implement RestfulGateway,
> > > > delegating job requests to the Dispatcher and resource requests to
> > > > the ResourceManager, it seems reasonable that when the WebMonitor
> > > > receives a JarRun request, it spawns a process and runs
> > > > the main method of the main class of that jar.
> > > >
> > > > Best,
> > > > tison.
> > > >
> > > > Zili Chen <wander4...@gmail.com> wrote on Wed, Jul 31, 2019, 7:10 PM:
> > > >
> > > > > I don't think the `Program` interface could solve the problem.
> > > > >
> > > > > The launcher launches the job server, which creates the job graph,
> > > > > submits it, and keeps monitoring it. Even if the user program
> > > > > implements `Program`, Flink still extracts the JobGraph from
> > > > > `getPlan` and submits it, instead of really executing the code in
> > > > > the main method of the user program, so the launcher is never
> > > > > started.
> > > > >
> > > > > @Thomas,
> > > > >
> > > > > There is an ongoing discussion on client refactoring [1], as Till
> > > > > mentioned. However, I'm afraid that the current jar run semantic,
> > > > > i.e., extract the job graph and submit it to the Dispatcher, cannot
> > > > > fit your requirement. The problem is that the REST API directly
> > > > > communicates with the Dispatcher, and thus it's strange to tell the
> > > > > Dispatcher "just run a program in a process".
> > > > >
> > > > > As you mentioned in the document, with the CLI in session mode the
> > > > > whole program would be executed sequentially. I would appreciate it
> > > > > if you could participate in the thread on client refactoring [1].
> > > > > In the design document [2], we propose to provide rich interfaces
> > > > > for downstream project integration. You could customize your CLI to
> > > > > execute your program arbitrarily. Any requirements or advice would
> > > > > be helpful.
> > > > >
> > > > > Best,
> > > > > tison.
> > > > >
> > > > > [1] https://lists.apache.org/thread.html/ce99cba4a10b9dc40eb729d39910f315ae41d80ec74f09a356c73938@%3Cdev.flink.apache.org%3E
> > > > > [2] https://docs.google.com/document/d/1UWJE7eYWiMuZewBKS0YmdVO2LUTqXPd6-pbOCof9ddY/edit
> > > > >
> > > > > Till Rohrmann <trohrm...@apache.org> wrote on Wed, Jul 31, 2019, 4:50 PM:
> > > > >
> > > > > > Are you looking for something similar to the `Program` interface?
> > > > > > This interface, even though it is a bit outdated and might get
> > > > > > removed in the future, offers a `getPlan` method which is called
> > > > > > in order to generate the `JobGraph`. In the client refactoring
> > > > > > discussion thread it is currently being discussed what to do with
> > > > > > this interface.
> > > > > >
> > > > > > Cheers,
> > > > > > Till
> > > > > >
> > > > > > On Wed, Jul 31, 2019 at 10:41 AM Chesnay Schepler <ches...@apache.org> wrote:
> > > > > >
> > > > > > > Couldn't the Beam job server use the same work-around we're
> > > > > > > using in the JarRunHandler to get access to the JobGraph?
> > > > > > >
> > > > > > > On 26/07/2019 17:38, Thomas Weise wrote:
> > > > > > > > Hi Till,
> > > > > > > >
> > > > > > > > Thanks for taking a look!
> > > > > > > >
> > > > > > > > The Beam job server does not currently have the ability to
> > > > > > > > just output the job graph (and related artifacts) that could
> > > > > > > > then be used with the JobSubmitHandler. It is itself using
> > > > > > > > StreamExecutionEnvironment, which in turn will lead to a REST
> > > > > > > > API submission.
> > > > > > > >
> > > > > > > > Here I'm looking at what happens before the Beam job server
> > > > > > > > gets involved: the interaction of the k8s operator with the
> > > > > > > > Flink deployment.
> > > > > > > > The jar run endpoint (ignoring the current handler
> > > > > > > > implementation) is generic and pretty much exactly matches
> > > > > > > > what we would need for a uniform entry point. It's just that
> > > > > > > > in the Beam case the jar file would itself be a "launcher"
> > > > > > > > that doesn't provide the job graph itself, but the
> > > > > > > > dependencies and the mechanism to invoke the actual client.
> > > > > > > >
> > > > > > > > I could accomplish what I'm looking for by creating a
> > > > > > > > separate REST endpoint that looks almost the same. But I
> > > > > > > > would prefer to reuse the Flink REST API interaction that is
> > > > > > > > already implemented for Flink Java jobs to reduce the
> > > > > > > > complexity of the deployment.
> > > > > > > >
> > > > > > > > Thomas
> > > > > > > >
> > > > > > > > On Fri, Jul 26, 2019 at 2:29 AM Till Rohrmann <trohrm...@apache.org> wrote:
> > > > > > > >
> > > > > > > > > Hi Thomas,
> > > > > > > > >
> > > > > > > > > Quick question: why do you want to use the JarRunHandler?
> > > > > > > > > If another process is building the JobGraph, then one could
> > > > > > > > > use the JobSubmitHandler, which expects a JobGraph and then
> > > > > > > > > starts executing it.
> > > > > > > > >
> > > > > > > > > Cheers,
> > > > > > > > > Till
> > > > > > > > >
> > > > > > > > > On Thu, Jul 25, 2019 at 7:45 PM Thomas Weise <t...@apache.org> wrote:
> > > > > > > > >
> > > > > > > > > > Hi,
> > > > > > > > > >
> > > > > > > > > > While considering different options to launch Beam jobs
> > > > > > > > > > through the Flink REST API, I noticed that the
> > > > > > > > > > implementation of JarRunHandler places quite a few
> > > > > > > > > > restrictions on how the entry point shall construct a
> > > > > > > > > > Flink job, by extracting and manipulating the job graph.
> > > > > > > > > > That's normally not a problem for Flink Java programs,
> > > > > > > > > > but in the scenario I'm looking at, the job graph would
> > > > > > > > > > be constructed by a different process and isn't available
> > > > > > > > > > to the REST handler. Instead, I would like to be able to
> > > > > > > > > > just respond with the job ID of the already launched job.
> > > > > > > > > >
> > > > > > > > > > For context, please see:
> > > > > > > > > > https://docs.google.com/document/d/1z3LNrRtr8kkiFHonZ5JJM_L4NWNBBNcqRc_yAf6G0VI/edit#heading=h.fh2f571kms4d
> > > > > > > > > >
> > > > > > > > > > The current JarRunHandler code is here:
> > > > > > > > > > https://github.com/apache/flink/blob/f3c5dd960ff81a022ece2391ed3aee86080a352a/flink-runtime-web/src/main/java/org/apache/flink/runtime/webmonitor/handlers/JarRunHandler.java#L82
> > > > > > > > > >
> > > > > > > > > > It would be nice if there was an option to delegate the
> > > > > > > > > > responsibility for job submission to the user code /
> > > > > > > > > > entry point. That would be useful for Beam and other
> > > > > > > > > > frameworks built on top of Flink that dynamically create
> > > > > > > > > > a job graph from a different representation.
> > > > > > > > > >
> > > > > > > > > > Possible ways to get there:
> > > > > > > > > >
> > > > > > > > > > * an interface that the main class can implement and,
> > > > > > > > > > when present, the jar run handler calls instead of main.
> > > > > > > > > >
> > > > > > > > > > * an annotated method
> > > > > > > > > >
> > > > > > > > > > Either way, query parameters like savepoint path and
> > > > > > > > > > parallelism would be forwarded to the user code, and the
> > > > > > > > > > result would be the ID of the launched job.
> > > > > > > > > >
> > > > > > > > > > Thoughts?
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > > Thomas
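The first option Thomas proposes above (an interface the main class can implement, with savepoint path and parallelism forwarded and a job ID returned) could be sketched roughly as follows. Every name here (JarRunEntryPoint, RunContext, BeamLauncher) is invented for illustration; nothing of this shape existed in Flink at the time of this thread, and a real launcher would start the Beam job server instead of returning a canned ID.

```java
public class EntryPointSketch {

    // Query parameters the jar run handler would forward to user code.
    // Hypothetical type, invented for this sketch.
    static final class RunContext {
        final String savepointPath; // may be null if no savepoint requested
        final int parallelism;
        RunContext(String savepointPath, int parallelism) {
            this.savepointPath = savepointPath;
            this.parallelism = parallelism;
        }
    }

    // Proposed contract: when the main class implements this, the jar run
    // handler calls run() instead of main(), and the user code is
    // responsible for submission, returning the ID of the launched job.
    interface JarRunEntryPoint {
        String run(RunContext context, String... programArgs);
    }

    // A launcher-style entry point, e.g. one that would start a Beam job
    // server, run pipeline construction against it, and report the job ID.
    static final class BeamLauncher implements JarRunEntryPoint {
        @Override
        public String run(RunContext ctx, String... programArgs) {
            // Stub: a real implementation submits the job elsewhere and
            // returns the real JobID (a 32-hex-character string in Flink).
            return "00000000000000000000000000000001";
        }
    }

    public static void main(String[] args) {
        JarRunEntryPoint entry = new BeamLauncher();
        String jobId = entry.run(new RunContext(null, 4));
        System.out.println("job id: " + jobId);
    }
}
```

Under this shape, the handler's response body stays exactly what the existing jar run endpoint returns (a job ID), which is what makes the uniform entry point attractive.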