Re: [DISCUSS] FLIP-85: Delayed Job Graph Generation

Yang Wang Wed, 25 Dec 2019 01:09:17 -0800

Hi Peter,

I think we need to seriously reconsider tison's suggestion. After FLIP-73,
`deployJobCluster` has been moved into `JobClusterExecutor#execute`, so it is
no longer visible to `CliFrontend`. That means the user program will *ALWAYS*
be executed on the client side. This behavior is by design. So we cannot
simply add `if (client mode) ... else if (cluster mode) ...` code to
`CliFrontend` to bypass the executor. We need to find a clean way to decouple
executing the user program from deploying the per-job cluster. Based on that,
we could support executing the user program on either the client or the
master side.
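
To make the decoupling concrete, here is a minimal sketch. All names in it
(`Side`, `Step`, `deployCluster`, `runUserProgram`, `submit`) are hypothetical
and are not existing Flink APIs; it only illustrates that once deployment and
program execution are independent steps, the side on which the program runs
just determines their order.

```java
import java.util.ArrayList;
import java.util.List;

public class DecoupledSubmission {

    // Hypothetical: the side on which the user's main() is executed.
    enum Side { CLIENT, MASTER }

    // Hypothetical step abstraction; each step records what it did in a trace.
    interface Step {
        void apply(List<String> trace);
    }

    // Deployment only brings up the per-job cluster and knows nothing about
    // the user program. It is always triggered from the client.
    static Step deployCluster() {
        return trace -> trace.add("CLIENT: deploy per-job cluster");
    }

    // Running the user program (which also builds the job graph) is a
    // separate step that can be placed on either side.
    static Step runUserProgram(Side side) {
        return trace -> trace.add(side + ": run user program and build job graph");
    }

    // Composes the two independent steps; only their order depends on the side.
    static List<String> submit(Side side) {
        List<Step> steps = new ArrayList<>();
        if (side == Side.CLIENT) {
            // Roughly today's flow: program first, deployment afterwards.
            steps.add(runUserProgram(side));
            steps.add(deployCluster());
        } else {
            // Delayed job graph generation: deploy first, then run on the master.
            steps.add(deployCluster());
            steps.add(runUserProgram(side));
        }
        List<String> trace = new ArrayList<>();
        for (Step step : steps) {
            step.apply(trace);
        }
        return trace;
    }

    public static void main(String[] args) {
        System.out.println(submit(Side.CLIENT));
        System.out.println(submit(Side.MASTER));
    }
}
```

The only point of the sketch is that neither step calls the other, so the
choice between client and master does not need to bypass anything.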


Maybe Aljoscha and Jeff could give some good suggestions.



Best,
Yang

Peter Huang <[email protected]> wrote on Wed, Dec 25, 2019 at 4:03 AM:

> Hi Jingjing,
>
> The proposed improvement is a deployment option for the CLI. For SQL-based
> Flink applications, it is more convenient to keep using the existing model
> in SqlClient, in which the job graph is generated within SqlClient. After
> delayed job graph generation is added, I think no change is needed on your
> side.
>
>
> Best Regards
> Peter Huang
>
>
> On Wed, Dec 18, 2019 at 6:01 AM jingjing bai <[email protected]>
> wrote:
>
> > Hi Peter,
> >     We have extended SqlClient to support submitting SQL jobs from the
> > web, based on Flink 1.9. We also support submitting to Yarn in per-job mode.
> >     In this case, the job graph is generated on the client side. I think
> > this discussion is mainly about improving API programs, but in my case
> > there is no jar to upload, only a SQL string.
> >     Do you have any further suggestions for improving SQL mode, or is this
> > only a switch for API programs?
> >
> >
> > best
> > bai jj
> >
> >
> > Yang Wang <[email protected]> wrote on Wed, Dec 18, 2019 at 7:21 PM:
> >
> >> I just want to revive this discussion.
> >>
> >> Recently, I have been thinking about how to natively run a Flink per-job
> >> cluster on Kubernetes. Per-job mode on Kubernetes is very different from
> >> per-job mode on Yarn, and we will have the same deployment requirements
> >> for the client and the entry point.
> >>
> >> 1. The Flink client does not always need a local jar to start a Flink
> >> per-job cluster. We could support multiple schemes. For example,
> >> file:///path/of/my.jar means a jar located on the client side,
> >> hdfs://myhdfs/user/myname/flink/my.jar means a jar located on remote
> >> HDFS, and local:///path/in/image/my.jar means a jar located on the
> >> jobmanager side.
> >>
> >> 2. Support running the user program on the master side. This also means
> >> the entry point will generate the job graph on the master side. We could
> >> use the ClasspathJobGraphRetriever or start a local Flink client to
> >> achieve this.
> >>
> >>
> >> cc tison, Aljoscha & Kostas: do you think this is the right direction to
> >> work in?
> >>
> >> tison <[email protected]> wrote on Thu, Dec 12, 2019 at 4:48 PM:
> >>
> >> > A quick idea is to separate deployment from the user program, so that
> >> > deployment is always done outside the program. When the user program is
> >> > executed, there is always a ClusterClient that communicates with an
> >> > existing cluster, remote or local. That will be a separate thread, so
> >> > this is just for your information.
> >> >
> >> > Best,
> >> > tison.
> >> >
> >> >
> >> > tison <[email protected]> wrote on Thu, Dec 12, 2019 at 4:40 PM:
> >> >
> >> > > Hi Peter,
> >> > >
> >> > > Another concern I realized recently is that with the current Executors
> >> > > abstraction (FLIP-73), I'm afraid the user program is designed to
> >> > > ALWAYS run on the client side. Specifically, we deploy the job in the
> >> > > executor when env.execute is called. This abstraction possibly
> >> > > prevents Flink from running the user program on the cluster side.
> >> > >
> >> > > For your proposal, in this case we have already compiled the program
> >> > > and run it on the client side, so even if we deploy a cluster and
> >> > > retrieve the job graph from program metadata, it doesn't make much
> >> > > sense.
> >> > >
> >> > > cc Aljoscha & Kostas: what do you think about this constraint?
> >> > >
> >> > > Best,
> >> > > tison.
> >> > >
> >> > >
> >> > > Peter Huang <[email protected]> wrote on Tue, Dec 10, 2019 at 12:45 PM:
> >> > >
> >> > >> Hi Tison,
> >> > >>
> >> > >> Yes, you are right. I think I made the wrong argument in the doc.
> >> > >> Basically, the jar packaging problem only affects platform users. In
> >> > >> our internal deploy service, we further optimized the deployment
> >> > >> latency by letting users package flink-runtime together with the uber
> >> > >> jar, so that we don't need to consider multiple Flink version support
> >> > >> for now. In session client mode, since the Flink libs will be shipped
> >> > >> anyway as local resources of Yarn, users actually don't need to
> >> > >> package those libs into the job jar.
> >> > >>
> >> > >>
> >> > >>
> >> > >> Best Regards
> >> > >> Peter Huang
> >> > >>
> >> > >> On Mon, Dec 9, 2019 at 8:35 PM tison <[email protected]> wrote:
> >> > >>
> >> > >> > > 3. What do you mean about the package? Do users need to compile
> >> > >> > > their jars including flink-clients, flink-optimizer, flink-table
> >> > >> > > code?
> >> > >> >
> >> > >> > The answer should be no because they exist in system classpath.
> >> > >> >
> >> > >> > Best,
> >> > >> > tison.
> >> > >> >
> >> > >> >
> >> > >> > Yang Wang <[email protected]> wrote on Tue, Dec 10, 2019 at 12:18 PM:
> >> > >> >
> >> > >> > > Hi Peter,
> >> > >> > >
> >> > >> > > Thanks a lot for starting this discussion. I think this is a very
> >> > >> > > useful feature.
> >> > >> > >
> >> > >> > > This is not only relevant for Yarn. I am focused on the Flink on
> >> > >> > > Kubernetes integration and have come across the same problem. I do
> >> > >> > > not want the job graph to be generated on the client side. Instead,
> >> > >> > > the user jars are built into a user-defined image. When the job
> >> > >> > > manager is launched, we just need to generate the job graph based
> >> > >> > > on the local user jars.
> >> > >> > >
> >> > >> > > I have some small suggestions about this.
> >> > >> > >
> >> > >> > > 1. `ProgramJobGraphRetriever` is very similar to
> >> > >> > > `ClasspathJobGraphRetriever`; the difference is that the former
> >> > >> > > needs `ProgramMetadata` and the latter needs some arguments. Is it
> >> > >> > > possible to have a unified `JobGraphRetriever` that supports both?
> >> > >> > > 2. Is it possible to start a per-job cluster without a local user
> >> > >> > > jar? In your case, the user jars already exist on HDFS, and we do
> >> > >> > > need to download the jars to the deployer service. Currently, we
> >> > >> > > always need a local user jar to start a Flink cluster. It would be
> >> > >> > > great if we could support remote user jars.
> >> > >> > > >> In the implementation, we assume users package flink-clients,
> >> > >> > > >> flink-optimizer, flink-table together within the job jar.
> >> > >> > > >> Otherwise, the job graph generation within JobClusterEntryPoint
> >> > >> > > >> will fail.
> >> > >> > > 3. What do you mean about the package? Do users need to compile
> >> > >> > > their jars including flink-clients, flink-optimizer, flink-table
> >> > >> > > code?
> >> > >> > >
> >> > >> > >
> >> > >> > >
> >> > >> > > Best,
> >> > >> > > Yang
> >> > >> > >
> >> > >> > > Peter Huang <[email protected]> wrote on Tue, Dec 10, 2019 at 2:37 AM:
> >> > >> > >
> >> > >> > > > Dear All,
> >> > >> > > >
> >> > >> > > > Recently, the Flink community started to improve the Yarn cluster
> >> > >> > > > descriptor to make the job jar and config files configurable from
> >> > >> > > > the CLI. This improves the flexibility of Flink deployment in Yarn
> >> > >> > > > per-job mode. For platform users who manage tens of hundreds of
> >> > >> > > > streaming pipelines for the whole org or company, we found that
> >> > >> > > > job graph generation on the client side is another pain point.
> >> > >> > > > Thus, we want to propose a configurable feature for
> >> > >> > > > FlinkYarnSessionCli. The feature allows users to choose job graph
> >> > >> > > > generation in the Flink ClusterEntryPoint, so that the job jar
> >> > >> > > > doesn't need to be available locally for job graph generation.
> >> > >> > > > The proposal is organized as a FLIP:
> >> > >> > > >
> >> > >> > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-85+Delayed+JobGraph+Generation
> >> > >> > > >
> >> > >> > > > Any questions and suggestions are welcome. Thank you in advance.
> >> > >> > > >
> >> > >> > > >
> >> > >> > > > Best Regards
> >> > >> > > > Peter Huang
> >> > >> > > >
> >> > >> > >
> >> > >> >
> >> > >>
> >> > >
> >> >
> >>
> >
>
