Re: [DISCUSS] FLIP-85: Delayed Job Graph Generation

Peter Huang Thu, 26 Dec 2019 11:14:28 -0800

Hi Yang,

I can't agree more. The effort definitely needs to align with the final
goal of FLIP-73.
I am thinking about whether we can achieve the goal with two phases.


1) Phase I
As the CliFrontend will not be deprecated soon, we can still use the
deployMode flag there,
pass the program info through the Flink configuration, and use the
ClassPathJobGraphRetriever
to generate the job graph in the ClusterEntrypoints of Yarn and Kubernetes.
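
As a rough illustration of Phase I only: the sketch below shows how program info could travel through a configuration map from the client to the entrypoint. Everything in it is an assumption made for the example. The class name, the `sketch.*` keys and the semicolon-joined argument encoding are hypothetical and are not Flink APIs or Flink configuration options.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of Phase I; none of these names are Flink APIs.
class PhaseOneSketch {
    // Hypothetical configuration keys, invented for this example.
    static final String DEPLOY_MODE = "sketch.deploy-mode";
    static final String ENTRY_CLASS = "sketch.program.entry-class";
    static final String PROGRAM_ARGS = "sketch.program.args";

    // Client side: instead of compiling the job graph, record what the
    // entrypoint needs in order to compile it later.
    static Map<String, String> toConfiguration(
            String deployMode, String entryClass, List<String> args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(DEPLOY_MODE, deployMode);
        conf.put(ENTRY_CLASS, entryClass);
        // Semicolon-joined for brevity; real arguments would need escaping.
        conf.put(PROGRAM_ARGS, String.join(";", args));
        return conf;
    }

    // Cluster side: the entrypoint checks whether it must build the job graph itself.
    static boolean shouldGenerateJobGraphOnCluster(Map<String, String> conf) {
        return "cluster".equals(conf.get(DEPLOY_MODE));
    }

    // Cluster side: recover the program arguments written by the client.
    static List<String> programArgs(Map<String, String> conf) {
        String joined = conf.getOrDefault(PROGRAM_ARGS, "");
        if (joined.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(joined.split(";"));
    }

    public static void main(String[] args) {
        Map<String, String> conf = toConfiguration(
                "cluster", "com.example.MyJob", Arrays.asList("--input", "hdfs:///in"));
        System.out.println("generate on cluster: " + shouldGenerateJobGraphOnCluster(conf));
        System.out.println("entry class: " + conf.get(ENTRY_CLASS));
        System.out.println("args: " + programArgs(conf));
    }
}
```

The point of the sketch is that the client only writes strings into the configuration, so the job jar never has to be opened on the client.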

2) Phase II
In AbstractJobClusterExecutor, the job graph is generated in the execute
function. We can still
use the deployMode in it. With deployMode = cluster, the execute function
only starts the cluster.

When {Yarn/Kubernetes}PerJobClusterEntrypoint starts, it will start the
dispatcher first, then we can use
a ClusterEnvironment similar to ContextEnvironment to submit the job with
jobName to the local
dispatcher. For the details, we need more investigation. Let's wait
for feedback from @Aljoscha Krettek and @Till Rohrmann
after the holiday season.
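
To make the Phase II control flow concrete, here is a minimal sketch of the two halves: what the execute function does on the client, and what the per-job entrypoint does once it starts. It is only a model under stated assumptions. `PhaseTwoSketch`, `DeployMode` and the step strings are hypothetical and do not correspond to real Flink classes; the real design still needs the investigation mentioned above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of Phase II; these methods only mimic the control flow.
class PhaseTwoSketch {
    enum DeployMode { CLIENT, CLUSTER }

    // Stand-in for the executor's execute function: returns the client-side steps.
    static List<String> execute(DeployMode mode, String jobName) {
        List<String> steps = new ArrayList<>();
        if (mode == DeployMode.CLIENT) {
            // Existing behavior: build the job graph first, then deploy with it.
            steps.add("client: generate job graph for " + jobName);
            steps.add("client: deploy cluster with job graph");
        } else {
            // Proposed behavior: only start the cluster.
            steps.add("client: deploy cluster without job graph");
        }
        return steps;
    }

    // Stand-in for the per-job entrypoint: returns the cluster-side steps.
    static List<String> startEntrypoint(DeployMode mode, String jobName) {
        List<String> steps = new ArrayList<>();
        steps.add("cluster: start dispatcher");
        if (mode == DeployMode.CLUSTER) {
            // The user program runs here and submits to the local dispatcher.
            steps.add("cluster: run user program in cluster environment");
            steps.add("cluster: submit " + jobName + " to local dispatcher");
        } else {
            steps.add("cluster: run job graph shipped by client");
        }
        return steps;
    }

    public static void main(String[] args) {
        for (DeployMode mode : DeployMode.values()) {
            System.out.println(mode + ":");
            execute(mode, "wordcount").forEach(s -> System.out.println("  " + s));
            startEntrypoint(mode, "wordcount").forEach(s -> System.out.println("  " + s));
        }
    }
}
```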

Thank you in advance. Merry Christmas and Happy New Year!!!


Best Regards
Peter Huang








On Wed, Dec 25, 2019 at 1:08 AM Yang Wang <[email protected]> wrote:

> Hi Peter,
>
> I think we need to reconsider tison's suggestion seriously. After FLIP-73,
> deployJobCluster has
> been moved into `JobClusterExecutor#execute`. It should not be visible
> to `CliFrontend`. That
> means the user program will *ALWAYS* be executed on the client side. This is
> the behavior by design.
> So, we cannot just add `if (client mode) .. else if (cluster mode) ...`
> code in `CliFrontend` to bypass
> the executor. We need to find a clean way to decouple executing the user
> program from deploying the per-job
> cluster. Based on this, we could support executing the user program on the
> client or master side.
>
> Maybe Aljoscha and Jeff could give some good suggestions.
>
>
>
> Best,
> Yang
>
> On Wed, Dec 25, 2019 at 4:03 AM Peter Huang <[email protected]> wrote:
>
>> Hi Jingjing,
>>
>> The improvement proposed is a deployment option for the CLI. For SQL-based
>> Flink applications, it is more convenient to use the existing model in
>> SqlClient, in which
>> the job graph is generated within SqlClient. After adding the delayed job
>> graph generation, I think no change is needed on your side.
>>
>>
>> Best Regards
>> Peter Huang
>>
>>
>> On Wed, Dec 18, 2019 at 6:01 AM jingjing bai <[email protected]>
>> wrote:
>>
>> > Hi Peter,
>> >     We have extended SqlClient to support SQL job submission from the web,
>> > based on Flink 1.9. We also support submitting to Yarn in per-job mode.
>> >     In this case, the job graph is generated on the client side. I think
>> > this discussion is mainly about improving API programs, but in my case there
>> > is no jar to upload, only a SQL string.
>> >     Do you have more suggestions for improving the SQL mode, or is it only a
>> > switch for API programs?
>> >
>> >
>> > best
>> > bai jj
>> >
>> >
>> > On Wed, Dec 18, 2019 at 7:21 PM Yang Wang <[email protected]> wrote:
>> >
>> >> I just want to revive this discussion.
>> >>
>> >> Recently, I have been thinking about how to natively run a Flink per-job
>> >> cluster on Kubernetes.
>> >> The per-job mode on Kubernetes is very different from that on Yarn, and we
>> >> will have
>> >> the same deployment requirements for the client and the entry point.
>> >>
>> >> 1. The Flink client does not always need a local jar to start a Flink
>> >> per-job cluster. We could
>> >> support multiple schemes. For example, file:///path/of/my.jar means a
>> >> jar located
>> >> on the client side, hdfs://myhdfs/user/myname/flink/my.jar means a jar
>> >> located on
>> >> remote HDFS, and local:///path/in/image/my.jar means a jar located on the
>> >> jobmanager side.
>> >>
>> >> 2. Support running the user program on the master side. This also means the
>> >> entry point
>> >> will generate the job graph on the master side. We could use the
>> >> ClasspathJobGraphRetriever
>> >> or start a local Flink client to achieve this purpose.
>> >>
>> >>
>> >> cc tison, Aljoscha & Kostas: do you think this is the right direction
>> >> for us to work in?
>> >>
>> >> On Thu, Dec 12, 2019 at 4:48 PM tison <[email protected]> wrote:
>> >>
>> >> > A quick idea is that we separate the deployment from the user program
>> >> > so that it is always done
>> >> > outside the program. When the user program is executed, there is always a
>> >> > ClusterClient that communicates with
>> >> > an existing cluster, remote or local. That will be another thread, so this
>> >> > is just for your information.
>> >> >
>> >> > Best,
>> >> > tison.
>> >> >
>> >> >
>> >> > On Thu, Dec 12, 2019 at 4:40 PM tison <[email protected]> wrote:
>> >> >
>> >> > > Hi Peter,
>> >> > >
>> >> > > Another concern I realized recently is that with the current Executors
>> >> > > abstraction (FLIP-73),
>> >> > > I'm afraid that the user program is designed to ALWAYS run on the
>> >> > > client side.
>> >> > > Specifically,
>> >> > > we deploy the job in the executor when env.execute is called. This
>> >> > > abstraction
>> >> > > possibly prevents
>> >> > > Flink from running the user program on the cluster side.
>> >> > >
>> >> > > For your proposal, in this case we have already compiled the program
>> >> > > and run it on
>> >> > > the client side, so
>> >> > > even if we deploy a cluster and retrieve the job graph from program
>> >> > > metadata, it
>> >> > > doesn't make
>> >> > > much sense.
>> >> > >
>> >> > > cc Aljoscha & Kostas: what do you think about this constraint?
>> >> > >
>> >> > > Best,
>> >> > > tison.
>> >> > >
>> >> > >
>> >> > > On Tue, Dec 10, 2019 at 12:45 PM Peter Huang <[email protected]> wrote:
>> >> > >
>> >> > >> Hi Tison,
>> >> > >>
>> >> > >> Yes, you are right. I think I made the wrong argument in the doc.
>> >> > >> Basically, the jar packaging problem only applies to platform users.
>> >> > >> In our
>> >> > >> internal deploy service,
>> >> > >> we further optimized the deployment latency by letting users
>> >> > >> package
>> >> > >> flink-runtime together with the uber jar, so that we don't need to
>> >> > >> consider
>> >> > >> multiple Flink version
>> >> > >> support for now. In the session client mode, Flink libs will be
>> >> > >> shipped
>> >> > >> anyway as local resources of Yarn, so users actually don't need to
>> >> > >> package
>> >> > >> those libs into the job jar.
>> >> > >>
>> >> > >>
>> >> > >>
>> >> > >> Best Regards
>> >> > >> Peter Huang
>> >> > >>
>> >> > >> On Mon, Dec 9, 2019 at 8:35 PM tison <[email protected]> wrote:
>> >> > >>
>> >> > >> > > 3. What do you mean about the package? Do users need to compile
>> >> > >> > > their jars
>> >> > >> > > including flink-clients, flink-optimizer, and flink-table code?
>> >> > >> >
>> >> > >> > The answer should be no, because they exist in the system classpath.
>> >> > >> >
>> >> > >> > Best,
>> >> > >> > tison.
>> >> > >> >
>> >> > >> >
>> >> > >> > On Tue, Dec 10, 2019 at 12:18 PM Yang Wang <[email protected]> wrote:
>> >> > >> >
>> >> > >> > > Hi Peter,
>> >> > >> > >
>> >> > >> > > Thanks a lot for starting this discussion. I think this is a
>> >> > >> > > very useful
>> >> > >> > > feature.
>> >> > >> > >
>> >> > >> > > This is not only relevant for Yarn: I am focused on the Flink on
>> >> > >> > > Kubernetes integration and have
>> >> > >> > > come across the same
>> >> > >> > > problem. I do not want the job graph generated on the client side.
>> >> > >> > > Instead, the
>> >> > >> > > user jars are built into
>> >> > >> > > a user-defined image. When the job manager is launched, we just
>> >> > >> > > need to
>> >> > >> > > generate the job graph
>> >> > >> > > based on local user jars.
>> >> > >> > >
>> >> > >> > > I have some small suggestions about this.
>> >> > >> > >
>> >> > >> > > 1. `ProgramJobGraphRetriever` is very similar to
>> >> > >> > > `ClasspathJobGraphRetriever`; the difference
>> >> > >> > > is that the former needs `ProgramMetadata` and the latter needs
>> >> > >> > > some arguments.
>> >> > >> > > Is it possible to
>> >> > >> > > have a unified `JobGraphRetriever` to support both?
>> >> > >> > > 2. Is it possible to not use a local user jar to start a
>> >> > >> > > per-job cluster?
>> >> > >> > > In your case, the user jars
>> >> > >> > > already exist on HDFS, yet we still need to download the jars to
>> >> > >> > > the deployer
>> >> > >> > > service. Currently, we
>> >> > >> > > always need a local user jar to start a Flink cluster. It would
>> >> > >> > > be great if we
>> >> > >> > > could support remote user jars.
>> >> > >> > > >> In the implementation, we assume users package flink-clients,
>> >> > >> > > >> flink-optimizer, flink-table together within the job jar.
>> >> > >> > > >> Otherwise, the
>> >> > >> > > >> job graph generation within JobClusterEntryPoint will fail.
>> >> > >> > > 3. What do you mean about the package? Do users need to
>> >> > >> > > compile their jars
>> >> > >> > > including flink-clients, flink-optimizer, and flink-table code?
>> >> > >> > >
>> >> > >> > >
>> >> > >> > >
>> >> > >> > > Best,
>> >> > >> > > Yang
>> >> > >> > >
>> >> > >> > > On Tue, Dec 10, 2019 at 2:37 AM Peter Huang <[email protected]> wrote:
>> >> > >> > >
>> >> > >> > > > Dear All,
>> >> > >> > > >
>> >> > >> > > > Recently, the Flink community has started to improve the Yarn
>> >> > >> > > > cluster descriptor
>> >> > >> > > > to make the job jar and config files configurable from the CLI. It
>> >> > >> > > > improves the
>> >> > >> > > > flexibility of Flink deployment in Yarn per-job mode. For
>> >> > >> > > > platform users who
>> >> > >> > > > manage tens of hundreds of streaming pipelines for the whole
>> >> > >> > > > org or
>> >> > >> > > > company, we found that job graph generation on the client side is
>> >> > >> > > > another
>> >> > >> > > > pain point. Thus, we want to propose a configurable feature
>> >> > >> > > > for
>> >> > >> > > > FlinkYarnSessionCli. The feature allows users to choose
>> >> > >> > > > job graph
>> >> > >> > > > generation in the Flink ClusterEntryPoint so that the job jar
>> >> > >> > > > doesn't need to
>> >> > >> > > > be available locally for the job graph generation. The proposal is
>> >> > >> > > > organized as a
>> >> > >> > > > FLIP
>> >> > >> > > >
>> >> > >> > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-85+Delayed+JobGraph+Generation
>> >> > >> > > > .
>> >> > >> > > >
>> >> > >> > > > Any questions and suggestions are welcome. Thank you in
>> >> > >> > > > advance.
>> >> > >> > > >
>> >> > >> > > >
>> >> > >> > > > Best Regards
>> >> > >> > > > Peter Huang
>> >> > >> > > >
