Hi Peter,

I think we need to seriously reconsider tison's suggestion. After FLIP-73, `deployJobCluster` has been moved into `JobClusterExecutor#execute`, and `CliFrontend` should not be aware of it. That means the user program will *ALWAYS* be executed on the client side. This behavior is by design. So we cannot just add `if (client mode) ... else if (cluster mode) ...` code to `CliFrontend` to bypass the executor. We need to find a clean way to decouple executing the user program from deploying the per-job cluster. Once that is done, we could support executing the user program on either the client or the master side.
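A rough sketch of that decoupling might look like the following. It is not Flink code: `UserProgram`, `ProgramSite`, `PerJobDeployer` and `Submitter` are made-up names for illustration, and the job graph is modeled as a plain String. The only point is that the deployer never runs the user program itself, so where the program runs becomes a separate decision.

```java
// Hypothetical sketch: none of these types are Flink APIs.

/** Stands in for the user's main(): running it yields a job graph (a String here). */
interface UserProgram {
    String run();
}

/** Where the user program is executed. */
enum ProgramSite { CLIENT, MASTER }

/** Deploys a per-job cluster from a payload; it never runs the user program. */
final class PerJobDeployer {
    /** Returns a description of what the cluster was started with. */
    String deploy(String payload) {
        return "cluster[" + payload + "]";
    }
}

final class Submitter {
    /**
     * CLIENT: run the program locally and ship the resulting job graph.
     * MASTER: ship only a reference to the program (for example a jar URI)
     * and leave job graph generation to the cluster entry point.
     */
    static String submit(UserProgram program, String programRef,
                         ProgramSite site, PerJobDeployer deployer) {
        if (site == ProgramSite.CLIENT) {
            String jobGraph = program.run();
            return deployer.deploy("jobGraph=" + jobGraph);
        }
        return deployer.deploy("program=" + programRef);
    }
}
```

In MASTER mode the client side never calls `program.run()`, which is the property the current executor abstraction does not give us.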
Maybe Aljoscha and Jeff could give some good suggestions.

Best,
Yang

Peter Huang <huangzhenqiu0...@gmail.com> wrote on Wed, Dec 25, 2019 at 4:03 AM:

> Hi Jingjing,
>
> The improvement proposed is a deployment option for the CLI. For SQL-based Flink applications, it is more convenient to use the existing model in SqlClient, in which the job graph is generated within SqlClient. After adding the delayed job graph generation, I think no change is needed on your side.
>
> Best Regards
> Peter Huang
>
> On Wed, Dec 18, 2019 at 6:01 AM jingjing bai <baijingjing7...@gmail.com> wrote:
>
>> Hi Peter,
>>
>> We have extended SqlClient to support web-based SQL job submission on Flink 1.9. We support submitting to Yarn in per-job mode too.
>> In this case, the job graph is generated on the client side. I think this discussion is mainly about improving API programs, but in my case there is no jar to upload, only a SQL string.
>> Do you have more suggestions for improving SQL mode, or is it only a switch for API programs?
>>
>> Best,
>> bai jj
>>
>> Yang Wang <danrtsey...@gmail.com> wrote on Wed, Dec 18, 2019 at 7:21 PM:
>>
>>> I just want to revive this discussion.
>>>
>>> Recently, I have been thinking about how to natively run a Flink per-job cluster on Kubernetes. The per-job mode on Kubernetes is very different from that on Yarn. And we will have the same deployment requirements for the client and entry point.
>>>
>>> 1. The Flink client does not always need a local jar to start a Flink per-job cluster. We could support multiple schemes. For example, file:///path/of/my.jar means a jar located on the client side, hdfs://myhdfs/user/myname/flink/my.jar means a jar located on remote HDFS, and local:///path/in/image/my.jar means a jar located on the jobmanager side.
>>>
>>> 2. Support running the user program on the master side. This also means the entry point will generate the job graph on the master side.
>>> We could use the ClasspathJobGraphRetriever or start a local Flink client to achieve this purpose.
>>>
>>> cc tison, Aljoscha & Kostas: do you think this is the right direction for us to work in?
>>>
>>> tison <wander4...@gmail.com> wrote on Thu, Dec 12, 2019 at 4:48 PM:
>>>
>>>> A quick idea is that we separate the deployment from the user program, so that it is always done outside the program. When the user program is executed, there is always a ClusterClient that communicates with an existing cluster, remote or local. That will be another thread, so this is just for your information.
>>>>
>>>> Best,
>>>> tison.
>>>>
>>>> tison <wander4...@gmail.com> wrote on Thu, Dec 12, 2019 at 4:40 PM:
>>>>
>>>>> Hi Peter,
>>>>>
>>>>> Another concern I realized recently is that with the current Executors abstraction (FLIP-73), I'm afraid the user program is designed to ALWAYS run on the client side. Specifically, we deploy the job in the executor when env.execute is called. This abstraction possibly prevents Flink from running the user program on the cluster side.
>>>>>
>>>>> For your proposal, in this case we have already compiled the program and run it on the client side, so even if we deploy a cluster and retrieve the job graph from program metadata, it doesn't make much sense.
>>>>>
>>>>> cc Aljoscha & Kostas: what do you think about this constraint?
>>>>>
>>>>> Best,
>>>>> tison.
>>>>>
>>>>> Peter Huang <huangzhenqiu0...@gmail.com> wrote on Tue, Dec 10, 2019 at 12:45 PM:
>>>>>
>>>>>> Hi Tison,
>>>>>>
>>>>>> Yes, you are right. I think I made the wrong argument in the doc. Basically, the packaging jar problem is only for platform users.
>>>>>> In our internal deploy service, we further optimized the deployment latency by letting users package flink-runtime together with the uber jar, so that we don't need to consider multiple Flink version support for now. In the session client mode, Flink libs will be shipped anyway as local resources of Yarn, so users actually don't need to package those libs into the job jar.
>>>>>>
>>>>>> Best Regards
>>>>>> Peter Huang
>>>>>>
>>>>>> On Mon, Dec 9, 2019 at 8:35 PM tison <wander4...@gmail.com> wrote:
>>>>>>
>>>>>>>> 3. What do you mean about the package? Do users need to compile their jars including flink-clients, flink-optimizer, flink-table codes?
>>>>>>>
>>>>>>> The answer should be no, because they exist in the system classpath.
>>>>>>>
>>>>>>> Best,
>>>>>>> tison.
>>>>>>>
>>>>>>> Yang Wang <danrtsey...@gmail.com> wrote on Tue, Dec 10, 2019 at 12:18 PM:
>>>>>>>
>>>>>>>> Hi Peter,
>>>>>>>>
>>>>>>>> Thanks a lot for starting this discussion. I think this is a very useful feature.
>>>>>>>>
>>>>>>>> This is not only about Yarn. I am focused on the Flink on Kubernetes integration and have come across the same problem. I do not want the job graph generated on the client side. Instead, the user jars are built into a user-defined image. When the job manager is launched, we just need to generate the job graph based on the local user jars.
>>>>>>>>
>>>>>>>> I have some small suggestions about this.
>>>>>>>>
>>>>>>>> 1. `ProgramJobGraphRetriever` is very similar to `ClasspathJobGraphRetriever`. The difference is that the former needs `ProgramMetadata` and the latter needs some arguments. Is it possible to have a unified `JobGraphRetriever` that supports both?
>>>>>>>> 2. Is it possible to not use a local user jar to start a per-job cluster? In your case, the user jars already exist on HDFS and we do need to download the jars to the deployer service. Currently, we always need a local user jar to start a Flink cluster. It would be great if we could support remote user jars.
>>>>>>>> >> In the implementation, we assume users package flink-clients, flink-optimizer, flink-table together within the job jar. Otherwise, the job graph generation within JobClusterEntryPoint will fail.
>>>>>>>> 3. What do you mean about the package? Do users need to compile their jars including flink-clients, flink-optimizer, flink-table codes?
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Yang
>>>>>>>>
>>>>>>>> Peter Huang <huangzhenqiu0...@gmail.com> wrote on Tue, Dec 10, 2019 at 2:37 AM:
>>>>>>>>
>>>>>>>>> Dear All,
>>>>>>>>>
>>>>>>>>> Recently, the Flink community started to improve the Yarn cluster descriptor to make the job jar and config files configurable from the CLI. It improves the flexibility of Flink deployment in Yarn per-job mode. For platform users who manage tens of hundreds of streaming pipelines for the whole org or company, we found that job graph generation on the client side is another pain point.
>>>>>>>>> Thus, we want to propose a configurable feature for FlinkYarnSessionCli. The feature allows users to choose job graph generation in the Flink ClusterEntryPoint, so that the job jar doesn't need to be available locally for the job graph generation. The proposal is organized as a FLIP:
>>>>>>>>>
>>>>>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-85+Delayed+JobGraph+Generation
>>>>>>>>>
>>>>>>>>> Any questions and suggestions are welcome. Thank you in advance.
>>>>>>>>>
>>>>>>>>> Best Regards
>>>>>>>>> Peter Huang
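
For reference, the three jar URI schemes mentioned earlier in this thread (`file://`, `hdfs://`, `local://`) could be told apart roughly as in the sketch below. This is only an illustration under stated assumptions: `JarLocation` and `JarLocations` are hypothetical names, not Flink classes, and the mapping simply follows the meanings given in the thread.

```java
// Hypothetical sketch: JarLocation and JarLocations are not Flink APIs.

/** Where a user jar lives, following the schemes proposed in the thread. */
enum JarLocation { CLIENT_LOCAL, REMOTE_STORAGE, IN_IMAGE }

final class JarLocations {

    /** Maps a jar URI to a location based on its scheme. */
    static JarLocation classify(String jarUri) {
        String scheme = java.net.URI.create(jarUri).getScheme();
        if (scheme == null) {
            throw new IllegalArgumentException("Jar URI has no scheme: " + jarUri);
        }
        switch (scheme) {
            case "file":  return JarLocation.CLIENT_LOCAL;   // jar on the client machine
            case "hdfs":  return JarLocation.REMOTE_STORAGE; // jar on remote HDFS
            case "local": return JarLocation.IN_IMAGE;       // jar already on the jobmanager side
            default:
                throw new IllegalArgumentException("Unsupported scheme: " + scheme);
        }
    }

    /** Only a client-local jar has to be shipped from the client. */
    static boolean needsClientUpload(JarLocation location) {
        return location == JarLocation.CLIENT_LOCAL;
    }

    public static void main(String[] args) {
        String[] uris = {
            "file:///path/of/my.jar",
            "hdfs://myhdfs/user/myname/flink/my.jar",
            "local:///path/in/image/my.jar"
        };
        for (String uri : uris) {
            JarLocation location = classify(uri);
            System.out.println(uri + " -> " + location
                + ", upload from client: " + needsClientUpload(location));
        }
    }
}
```

With such a split, only the `file://` case would require a jar on the client, which matches the request to start a per-job cluster without a local user jar.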