Hi Yang,

Thanks for your input. I agree that master-side job graph generation is a common requirement for per-job mode. I think FLIP-73 is mainly aimed at session mode, so the proposal is still a valid improvement for the existing CLI and per-job mode.

Best Regards
Peter Huang

On Wed, Dec 18, 2019 at 3:21 AM Yang Wang <danrtsey...@gmail.com> wrote:

> I just want to revive this discussion.
>
> Recently, I have been thinking about how to run a Flink per-job cluster natively on Kubernetes. The per-job mode on Kubernetes is very different from the one on Yarn, but we will have the same deployment requirements for the client and the entry point.
>
> 1. The Flink client does not always need a local jar to start a Flink per-job cluster. We could support multiple URI schemes. For example, file:///path/of/my.jar means a jar located on the client side, hdfs://myhdfs/user/myname/flink/my.jar means a jar located on remote HDFS, and local:///path/in/image/my.jar means a jar located on the JobManager side.
>
> 2. Support running the user program on the master side. This also means the entry point will generate the job graph on the master side. We could use the ClasspathJobGraphRetriever or start a local Flink client to achieve this.
>
> cc tison, Aljoscha & Kostas: do you think this is the right direction for us to work in?
>
> tison <wander4...@gmail.com> wrote on Thu, Dec 12, 2019 at 4:48 PM:
>
> > A quick idea is to separate the deployment from the user program, so that deployment is always done outside the program. When the user program is executed, there is always a ClusterClient that communicates with an existing cluster, remote or local. That belongs in a separate thread, so this is just for your information.
> >
> > Best,
> > tison.
> >
> > tison <wander4...@gmail.com> wrote on Thu, Dec 12, 2019 at 4:40 PM:
> >
> > > Hi Peter,
> > >
> > > Another concern I realized recently is that with the current Executors abstraction (FLIP-73), I'm afraid the user program is designed to ALWAYS run on the client side. Specifically, we deploy the job in the executor when env.execute is called. This abstraction possibly prevents Flink from running the user program on the cluster side.
> > >
> > > For your proposal, this means we have already compiled and run the program on the client side, so even if we deploy a cluster and retrieve the job graph from program metadata, it doesn't make much sense.
> > >
> > > cc Aljoscha & Kostas: what do you think about this constraint?
> > >
> > > Best,
> > > tison.
> > >
> > > Peter Huang <huangzhenqiu0...@gmail.com> wrote on Tue, Dec 10, 2019 at 12:45 PM:
> > >
> > > > Hi Tison,
> > > >
> > > > Yes, you are right. I think I made the wrong argument in the doc. Basically, the jar packaging problem only applies to platform users. In our internal deploy service, we further reduced deployment latency by letting users package flink-runtime together with the uber jar, so that we don't need to consider supporting multiple Flink versions for now. In session client mode, the Flink libs are shipped anyway as Yarn local resources, so users don't actually need to package those libs into the job jar.
> > > >
> > > > Best Regards
> > > > Peter Huang
> > > >
> > > > On Mon, Dec 9, 2019 at 8:35 PM tison <wander4...@gmail.com> wrote:
> > > >
> > > > > > 3. What do you mean about the package? Do users need to compile their jars including flink-clients, flink-optimizer, flink-table codes?
> > > > >
> > > > > The answer should be no, because they exist in the system classpath.
> > > > >
> > > > > Best,
> > > > > tison.
> > > > >
> > > > > Yang Wang <danrtsey...@gmail.com> wrote on Tue, Dec 10, 2019 at 12:18 PM:
> > > > >
> > > > > > Hi Peter,
> > > > > >
> > > > > > Thanks a lot for starting this discussion. I think this is a very useful feature.
> > > > > >
> > > > > > It is not only relevant for Yarn: I am focused on the Flink on Kubernetes integration and came across the same problem. I do not want the job graph to be generated on the client side.
> > > > > > Instead, the user jars are built into a user-defined image. When the JobManager is launched, we just need to generate the job graph based on the local user jars.
> > > > > >
> > > > > > I have some small suggestions about this.
> > > > > >
> > > > > > 1. `ProgramJobGraphRetriever` is very similar to `ClasspathJobGraphRetriever`; the difference is that the former needs `ProgramMetadata` while the latter needs some arguments. Is it possible to have a unified `JobGraphRetriever` that supports both?
> > > > > > 2. Is it possible to start a per-job cluster without a local user jar? In your case, the user jars already exist on HDFS, yet we still need to download them to the deployer service. Currently, we always need a local user jar to start a Flink cluster. It would be great if we could support remote user jars.
> > > > > >
> > > > > > >> In the implementation, we assume users package flink-clients, flink-optimizer, flink-table together within the job jar. Otherwise, the job graph generation within JobClusterEntryPoint will fail.
> > > > > >
> > > > > > 3. What do you mean about the package? Do users need to compile their jars including flink-clients, flink-optimizer, flink-table codes?
> > > > > >
> > > > > > Best,
> > > > > > Yang
> > > > > >
> > > > > > Peter Huang <huangzhenqiu0...@gmail.com> wrote on Tue, Dec 10, 2019 at 2:37 AM:
> > > > > >
> > > > > > > Dear All,
> > > > > > >
> > > > > > > Recently, the Flink community started to improve the Yarn cluster descriptor to make the job jar and config files configurable from the CLI. This improves the flexibility of Flink deployment in Yarn per-job mode.
> > > > > > > For platform users who manage tens or hundreds of streaming pipelines for a whole organization or company, we found that job graph generation on the client side is another pain point. Thus, we want to propose a configurable feature for FlinkYarnSessionCli. The feature allows users to choose to generate the job graph in the Flink ClusterEntryPoint, so that the job jar doesn't need to be available locally for job graph generation. The proposal is organized as a FLIP:
> > > > > > >
> > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-85+Delayed+JobGraph+Generation
> > > > > > >
> > > > > > > Any questions and suggestions are welcome. Thank you in advance.
> > > > > > >
> > > > > > > Best Regards
> > > > > > > Peter Huang
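Yang's first point (the jar location is expressed through the URI scheme, so the client does not always need a local jar) can be illustrated with a small sketch. This is a hypothetical helper, not a Flink API: the class name `JarLocationResolver`, the `Side` enum and `needsUpload` are invented for illustration, and only `java.net.URI` from the JDK is used. It assumes `file://` (or no scheme) means a client-side jar, `local://` means a jar already in the JobManager image, and any other scheme such as `hdfs://` means remote storage.

```java
import java.net.URI;
import java.util.Locale;

/**
 * Hypothetical helper (not part of Flink): decides where a user jar lives
 * based on its URI scheme, following the convention proposed in the thread.
 */
public final class JarLocationResolver {

    /** Where the jar is expected to be found. */
    public enum Side {
        CLIENT,      // file:// or no scheme: jar on the submitting machine
        JOB_MANAGER, // local://: jar already baked into the JobManager image
        REMOTE       // any other scheme (e.g. hdfs://): fetched from remote storage
    }

    private JarLocationResolver() {}

    /** Classifies a jar URI; throws IllegalArgumentException for malformed URIs. */
    public static Side classify(String jarUri) {
        String scheme = URI.create(jarUri).getScheme();
        if (scheme == null) {
            // A plain path such as /tmp/my.jar is treated like file://.
            return Side.CLIENT;
        }
        switch (scheme.toLowerCase(Locale.ROOT)) {
            case "file":
                return Side.CLIENT;
            case "local":
                return Side.JOB_MANAGER;
            default:
                return Side.REMOTE;
        }
    }

    /** In this sketch, only client-side jars would have to be shipped by the client. */
    public static boolean needsUpload(String jarUri) {
        return classify(jarUri) == Side.CLIENT;
    }

    public static void main(String[] args) {
        String[] examples = {
            "file:///path/of/my.jar",
            "hdfs://myhdfs/user/myname/flink/my.jar",
            "local:///path/in/image/my.jar"
        };
        for (String uri : examples) {
            System.out.println(uri + " -> " + classify(uri));
        }
    }
}
```

Running `main` prints the three example URIs from Yang's mail classified as CLIENT, REMOTE and JOB_MANAGER respectively. Treating every unknown scheme as remote keeps the client agnostic of specific file systems; a real implementation would presumably resolve the scheme through the configured file system support instead.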