Re: [DISCUSS] FLIP-85: Delayed Job Graph Generation

Peter Huang Wed, 18 Dec 2019 10:54:55 -0800

Hi Yang,

Thanks for your input. I can see that master-side job graph generation is a
common requirement for per-job mode.
I think FLIP-73 is mainly for session mode. I think the proposal is a valid
improvement for the existing CLI and per-job mode.
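
To make the jar location schemes from Yang's quoted message below (file://, hdfs://, local://) a bit more concrete, here is a minimal sketch of how a client could tell them apart. JarLocation, Kind, resolve and needsClientUpload are hypothetical names used only for illustration; they are not existing Flink classes, and treating a path without a scheme as a client-side file is an assumption.

```java
import java.net.URI;

// Hypothetical helper, NOT part of Flink: maps a user jar URI to the place
// where the jar lives, following the three schemes discussed in this thread.
final class JarLocation {

    enum Kind { CLIENT_LOCAL, REMOTE_STORAGE, JOBMANAGER_LOCAL }

    private JarLocation() {}

    static Kind resolve(String jarUri) {
        String scheme = URI.create(jarUri).getScheme();
        if (scheme == null) {
            // Assumption: a bare path without a scheme is a client-side file.
            return Kind.CLIENT_LOCAL;
        }
        switch (scheme) {
            case "file":
                return Kind.CLIENT_LOCAL;      // jar on the client machine
            case "hdfs":
                return Kind.REMOTE_STORAGE;    // jar already on remote HDFS
            case "local":
                return Kind.JOBMANAGER_LOCAL;  // jar inside the jobmanager image
            default:
                throw new IllegalArgumentException("Unsupported jar scheme: " + scheme);
        }
    }

    // Only client-side jars would have to be shipped by the client.
    static boolean needsClientUpload(String jarUri) {
        return resolve(jarUri) == Kind.CLIENT_LOCAL;
    }
}
```

With such a check, only jars that resolve to CLIENT_LOCAL would need to be shipped by the client; the other two kinds could be passed to the entry point as plain references.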



Best Regards
Peter Huang

On Wed, Dec 18, 2019 at 3:21 AM Yang Wang <[email protected]> wrote:

> I just want to revive this discussion.
>
> Recently, I have been thinking about how to natively run a Flink per-job
> cluster on Kubernetes.
> Per-job mode on Kubernetes is very different from per-job mode on Yarn, and
> we will have the same deployment requirements for the client and the entry
> point.
>
> 1. The Flink client does not always need a local jar to start a Flink
> per-job cluster. We could support multiple schemes. For example,
> file:///path/of/my.jar means a jar located on the client side,
> hdfs://myhdfs/user/myname/flink/my.jar means a jar located on remote HDFS,
> and local:///path/in/image/my.jar means a jar located on the jobmanager
> side.
>
> 2. Support running the user program on the master side. This also means the
> entry point will generate the job graph on the master side. We could use the
> ClasspathJobGraphRetriever or start a local Flink client to achieve this.
>
>
> cc tison, Aljoscha & Kostas: do you think this is the right direction for
> us to work in?
>
> tison <[email protected]> wrote on Thu, Dec 12, 2019 at 4:48 PM:
>
> > A quick idea is that we separate the deployment from the user program, so
> > that deployment is always done outside the program. When the user program
> > is executed, there is always a ClusterClient that communicates with an
> > existing cluster, remote or local. That will be a separate thread, so this
> > is just for your information.
> >
> > Best,
> > tison.
> >
> >
> > tison <[email protected]> wrote on Thu, Dec 12, 2019 at 4:40 PM:
> >
> > > Hi Peter,
> > >
> > > Another concern I realized recently is that, with the current Executors
> > > abstraction (FLIP-73), I'm afraid the user program is designed to ALWAYS
> > > run on the client side. Specifically, we deploy the job in the executor
> > > when env.execute is called. This abstraction possibly prevents Flink
> > > from running the user program on the cluster side.
> > >
> > > For your proposal, in this case we have already compiled the program and
> > > run it on the client side, so even if we deploy a cluster and retrieve
> > > the job graph from program metadata, it doesn't make much sense.
> > >
> > > cc Aljoscha & Kostas: what do you think about this constraint?
> > >
> > > Best,
> > > tison.
> > >
> > >
> > > Peter Huang <[email protected]> wrote on Tue, Dec 10, 2019 at 12:45 PM:
> > >
> > >> Hi Tison,
> > >>
> > >> Yes, you are right. I think I made the wrong argument in the doc.
> > >> Basically, the jar packaging problem only affects platform users. In
> > >> our internal deploy service, we further optimized the deployment
> > >> latency by letting users package flink-runtime together with the uber
> > >> jar, so that we don't need to consider support for multiple Flink
> > >> versions for now. In session client mode, the Flink libs will be
> > >> shipped anyway as local resources of Yarn, so users actually don't need
> > >> to package those libs into the job jar.
> > >>
> > >>
> > >>
> > >> Best Regards
> > >> Peter Huang
> > >>
> > >> On Mon, Dec 9, 2019 at 8:35 PM tison <[email protected]> wrote:
> > >>
> > >> > > 3. What do you mean about the package? Do users need to compile their
> > >> > > jars including flink-clients, flink-optimizer, flink-table code?
> > >> >
> > >> > The answer should be no, because they exist in the system classpath.
> > >> >
> > >> > Best,
> > >> > tison.
> > >> >
> > >> >
> > >> > Yang Wang <[email protected]> wrote on Tue, Dec 10, 2019 at 12:18 PM:
> > >> >
> > >> > > Hi Peter,
> > >> > >
> > >> > > Thanks a lot for starting this discussion. I think this is a very useful
> > >> > > feature.
> > >> > >
> > >> > > This is not only about Yarn. I am focused on the Flink on Kubernetes
> > >> > > integration and came across the same problem. I do not want the job graph
> > >> > > generated on the client side. Instead, the user jars are built into a
> > >> > > user-defined image. When the job manager is launched, we just need to
> > >> > > generate the job graph based on the local user jars.
> > >> > >
> > >> > > I have some small suggestions about this.
> > >> > >
> > >> > > 1. `ProgramJobGraphRetriever` is very similar to
> > >> > > `ClasspathJobGraphRetriever`. The difference is that the former needs
> > >> > > `ProgramMetadata` and the latter needs some arguments. Is it possible to
> > >> > > have a unified `JobGraphRetriever` that supports both?
> > >> > > 2. Is it possible to not use a local user jar to start a per-job cluster?
> > >> > > In your case, the user jars already exist on HDFS and we do need to
> > >> > > download the jars to the deployer service. Currently, we always need a
> > >> > > local user jar to start a Flink cluster. It would be great if we could
> > >> > > support remote user jars.
> > >> > > >> In the implementation, we assume users package flink-clients,
> > >> > > >> flink-optimizer, flink-table together within the job jar. Otherwise, the
> > >> > > >> job graph generation within JobClusterEntryPoint will fail.
> > >> > > 3. What do you mean about the package? Do users need to compile their
> > >> > > jars including flink-clients, flink-optimizer, flink-table code?
> > >> > >
> > >> > >
> > >> > >
> > >> > > Best,
> > >> > > Yang
> > >> > >
> > >> > > Peter Huang <[email protected]> wrote on Tue, Dec 10, 2019 at 2:37 AM:
> > >> > >
> > >> > > > Dear All,
> > >> > > >
> > >> > > > Recently, the Flink community started to improve the Yarn cluster
> > >> > > > descriptor to make the job jar and config files configurable from the
> > >> > > > CLI. This improves the flexibility of Flink deployment in Yarn per-job
> > >> > > > mode. For platform users who manage tens of hundreds of streaming
> > >> > > > pipelines for the whole org or company, we found that job graph
> > >> > > > generation on the client side is another pain point. Thus, we want to
> > >> > > > propose a configurable feature for FlinkYarnSessionCli. The feature
> > >> > > > allows users to choose job graph generation in the Flink
> > >> > > > ClusterEntryPoint, so that the job jar doesn't need to be available
> > >> > > > locally for job graph generation. The proposal is organized as a FLIP:
> > >> > > >
> > >> > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-85+Delayed+JobGraph+Generation
> > >> > > >
> > >> > > > Any questions and suggestions are welcome. Thank you in advance.
> > >> > > >
> > >> > > >
> > >> > > > Best Regards
> > >> > > > Peter Huang
> > >> > > >
> > >> > >
> > >> >
> > >>
> > >
> >
>
