Hi Weihua,

Thanks a lot for your input!
I see the difference here is whether to implement the file distribution mechanism in the generic CLI or in the SQL Driver. The CLI approach could benefit non-pure-SQL applications (which are not covered by the SQL Driver) as well.

I'm not sure if you're proposing the CliFrontend + init containers approach or suggesting a native file distribution mechanism for the flink-kubernetes module. Considering your 2nd question, I'm assuming the latter. I think it's totally worthwhile to discuss, and we may split it into another FLIP, because it's a significant change. Personally, I'm a big +1 for a built-in file distribution mechanism on K8s.

Best,
Paul Lam

> On Jun 7, 2023, at 10:51, Weihua Hu <huweihua....@gmail.com> wrote:
>
> Hi,
>
> Thanks for updating the FLIP.
>
> I have two cents on the distribution of SQLs and resources.
> 1. Should we support a common file distribution mechanism for K8s
> application mode?
> I have seen some issues and requirements on the mailing list.
> In our production environment, we implement the download command in the
> CliFrontend and automatically add an init container to the pod for file
> downloading. The advantage of this is that we can use all Flink-supported
> file systems to store files.
>
> This needs more discussion. I would appreciate hearing more opinions.
>
> 2. In this FLIP, we distribute files in two different ways in YARN and
> Kubernetes. Can we combine them into one way?
> If we don't want to implement a common file distribution for K8s
> application mode, could we use the SQLDriver to download the files in
> both YARN and K8s? IMO, this can reduce the cost of code maintenance.
>
> Best,
> Weihua
>
>
> On Wed, Jun 7, 2023 at 10:18 AM Paul Lam <paullin3...@gmail.com> wrote:
>
>> Hi Mason,
>>
>> Thanks for your input!
>>
>>> +1 for init containers or a more generalized way of obtaining arbitrary
>>> files.
>>> File fetching isn't specific to just SQL--it also matters for Java
>>> applications if the user doesn't want to rebuild a Flink image and just
>>> wants to modify the user application fat jar.
>>
>> I agree that utilizing SQL Drivers in Java applications is equally
>> important as employing them in SQL Gateway. WRT init containers, I think
>> most users use them just as a workaround, e.g. to wget a jar from the
>> Maven repo.
>>
>> We could implement the functionality in SQL Driver in a more graceful
>> way, and the Flink-supported filesystem approach seems to be a good
>> choice.
>>
>>> Also, what do you think about prefixing the config options with
>>> `sql-driver` instead of just `sql` to be more specific?
>>
>> LGTM, since SQL Driver is a public interface and the options are
>> specific to it.
>>
>> Best,
>> Paul Lam
>>
>>> On Jun 6, 2023, at 06:30, Mason Chen <mas.chen6...@gmail.com> wrote:
>>>
>>> Hi Paul,
>>>
>>> +1 for this feature and supporting SQL file + JSON plans. We get a lot
>>> of requests to just be able to submit a SQL file, but the JSON plan
>>> optimizations make sense.
>>>
>>> +1 for init containers or a more generalized way of obtaining arbitrary
>>> files. File fetching isn't specific to just SQL--it also matters for
>>> Java applications if the user doesn't want to rebuild a Flink image and
>>> just wants to modify the user application fat jar.
>>>
>>>> Please note that we could reuse the checkpoint storage like S3/HDFS,
>>>> which should be required to run Flink in production, so I guess that
>>>> would be acceptable for most users. WDYT?
>>>
>>> If you do go this route, it would be nice to support writing these
>>> files to S3/HDFS via Flink. This makes access control and policy
>>> management simpler.
>>>
>>> Also, what do you think about prefixing the config options with
>>> `sql-driver` instead of just `sql` to be more specific?
>>>
>>> Best,
>>> Mason
>>>
>>> On Mon, Jun 5, 2023 at 2:28 AM Paul Lam <paullin3...@gmail.com> wrote:
>>>
>>>> Hi Jark,
>>>>
>>>> Thanks for your input! Please see my comments inline.
>>>>
>>>>> Isn't Table API the same way as DataStream jobs to submit Flink SQL?
>>>>> DataStream API also doesn't provide a default main class for users,
>>>>> why do we need to provide one for SQL?
>>>>
>>>> Sorry for the confusion I caused. By DataStream jobs, I mean jobs
>>>> submitted via the Flink CLI, which could actually be DataStream/Table
>>>> jobs.
>>>>
>>>> I think a default main class would be user-friendly, as it eliminates
>>>> the need for users to write a main class such as the SqlRunner in the
>>>> Flink K8s operator [1].
>>>>
>>>>> I thought the proposed SqlDriver was a dedicated main class accepting
>>>>> SQL files, is that correct?
>>>>
>>>> Both JSON plans and SQL files are accepted. SQL Gateway should use
>>>> JSON plans, while CLI users may use either JSON plans or SQL files.
>>>>
>>>> Please see the updated FLIP [2] for more details.
>>>>
>>>>> Personally, I prefer the way of init containers which doesn't depend
>>>>> on additional components.
>>>>> This can reduce the moving parts of a production environment.
>>>>> Depending on a distributed file system makes the testing, demo, and
>>>>> local setup harder than init containers.
>>>>
>>>> Please note that we could reuse the checkpoint storage like S3/HDFS,
>>>> which should be required to run Flink in production, so I guess that
>>>> would be acceptable for most users. WDYT?
>>>>
>>>> WRT testing, demos, and local setups, I think we could support the
>>>> local filesystem scheme, i.e. file://, as the state backends do.
>>>> It works as long as SQL Gateway and the JobManager (or SQL Driver)
>>>> can access the resource directory (specified via
>>>> `sql-gateway.application.storage-dir`).
>>>>
>>>> Thanks!
>>>>
>>>> [1]
>>>> https://github.com/apache/flink-kubernetes-operator/blob/main/examples/flink-sql-runner-example/src/main/java/org/apache/flink/examples/SqlRunner.java
>>>> [2]
>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-316:+Introduce+SQL+Driver
>>>> [3]
>>>> https://github.com/apache/flink/blob/3245e0443b2a4663552a5b707c5c8c46876c1f6d/flink-runtime/src/test/java/org/apache/flink/runtime/state/filesystem/AbstractFileCheckpointStorageAccessTestBase.java#L161
>>>>
>>>> Best,
>>>> Paul Lam
>>>>
>>>>> On Jun 3, 2023, at 12:21, Jark Wu <imj...@gmail.com> wrote:
>>>>>
>>>>> Hi Paul,
>>>>>
>>>>> Thanks for your reply. I left my comments inline.
>>>>>
>>>>>> As the FLIP said, it's good to have a default main class for Flink
>>>>>> SQLs, which allows users to submit Flink SQLs in the same way as
>>>>>> DataStream jobs, or else users need to write their own main class.
>>>>>
>>>>> Isn't Table API the same way as DataStream jobs to submit Flink SQL?
>>>>> DataStream API also doesn't provide a default main class for users,
>>>>> why do we need to provide one for SQL?
>>>>>
>>>>>> With the help of ExecNodeGraph, do we still need the serialized
>>>>>> SessionState?
>>>>>> If not, we could make SQL Driver accept two serialized formats:
>>>>>
>>>>> No, ExecNodeGraph doesn't need to serialize SessionState. I thought
>>>>> the proposed SqlDriver was a dedicated main class accepting SQL
>>>>> files, is that correct?
>>>>> If true, we have to ship the SessionState for this case, which is a
>>>>> large piece of work.
>>>>> I think we just need a JsonPlanDriver, which is a main class that
>>>>> accepts a JsonPlan as the parameter.
>>>>>
>>>>>> The common solutions I know are to use distributed file systems or
>>>>>> use init containers to localize the resources.
>>>>>
>>>>> Personally, I prefer the way of init containers which doesn't depend
>>>>> on additional components.
>>>>> This can reduce the moving parts of a production environment.
>>>>> Depending on a distributed file system makes the testing, demo, and
>>>>> local setup harder than init containers.
>>>>>
>>>>> Best,
>>>>> Jark
>>>>>
>>>>> On Fri, 2 Jun 2023 at 18:10, Paul Lam <paullin3...@gmail.com> wrote:
>>>>>
>>>>>> The FLIP is in the early phase and some details are not included,
>>>>>> but fortunately, we got lots of valuable ideas from the discussion.
>>>>>>
>>>>>> Thanks to everyone who joined the discussion!
>>>>>> @Weihua @Shanmon @Shengkai @Biao @Jark
>>>>>>
>>>>>> This weekend I'm gonna revisit and update the FLIP, adding more
>>>>>> details. Hopefully, we can further align our opinions.
>>>>>>
>>>>>> Best,
>>>>>> Paul Lam
>>>>>>
>>>>>>> On Jun 2, 2023, at 18:02, Paul Lam <paullin3...@gmail.com> wrote:
>>>>>>>
>>>>>>> Hi Jark,
>>>>>>>
>>>>>>> Thanks a lot for your input!
>>>>>>>
>>>>>>>> If we decide to submit ExecNodeGraph instead of SQL file, is it
>>>>>>>> still necessary to support SQL Driver?
>>>>>>>
>>>>>>> I think so. Apart from its usage in SQL Gateway, SQL Driver could
>>>>>>> simplify Flink SQL execution with the Flink CLI.
>>>>>>>
>>>>>>> As the FLIP said, it's good to have a default main class for Flink
>>>>>>> SQLs, which allows users to submit Flink SQLs in the same way as
>>>>>>> DataStream jobs, or else users need to write their own main class.
>>>>>>>
>>>>>>>> SQL Driver needs to serialize SessionState, which is very
>>>>>>>> challenging but not covered in detail in the FLIP.
>>>>>>>
>>>>>>> With the help of ExecNodeGraph, do we still need the serialized
>>>>>>> SessionState? If not, we could make SQL Driver accept two
>>>>>>> serialized formats:
>>>>>>>
>>>>>>> - SQL files for user-facing public usage
>>>>>>> - ExecNodeGraph for internal usage
>>>>>>>
>>>>>>> It's kind of similar to the relationship between job jars and
>>>>>>> jobgraphs.
>>>>>>>
>>>>>>>> Regarding "K8S doesn't support shipping multiple jars", is that
>>>>>>>> true? Is it possible to support it?
>>>>>>>
>>>>>>> Yes, K8s doesn't distribute any files. It's the users'
>>>>>>> responsibility to make sure the resources are accessible in the
>>>>>>> containers. The common solutions I know are to use distributed
>>>>>>> file systems or use init containers to localize the resources.
>>>>>>>
>>>>>>> Now I lean toward introducing a filesystem to do the distribution
>>>>>>> job. WDYT?
>>>>>>>
>>>>>>> Best,
>>>>>>> Paul Lam
>>>>>>>
>>>>>>>> On Jun 1, 2023, at 20:33, Jark Wu <imj...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> Hi Paul,
>>>>>>>>
>>>>>>>> Thanks for starting this discussion. I like the proposal! This is
>>>>>>>> a frequently requested feature!
>>>>>>>>
>>>>>>>> I agree with Shengkai that ExecNodeGraph as the submission object
>>>>>>>> is a better idea than a SQL file. To be more specific, it should
>>>>>>>> be JsonPlanGraph or CompiledPlan, which is the serializable
>>>>>>>> representation. CompiledPlan is a clear separation between
>>>>>>>> compiling/optimization/validation and execution. This can keep
>>>>>>>> the validation and metadata accessing still on the SQLGateway
>>>>>>>> side. This allows SQLGateway to leverage some metadata caching
>>>>>>>> and UDF JAR caching for better compiling performance.
>>>>>>>>
>>>>>>>> If we decide to submit ExecNodeGraph instead of SQL file, is it
>>>>>>>> still necessary to support SQL Driver? Regarding non-interactive
>>>>>>>> SQL jobs, users can use the Table API program for application
>>>>>>>> mode. SQL Driver needs to serialize SessionState, which is very
>>>>>>>> challenging but not covered in detail in the FLIP.
>>>>>>>>
>>>>>>>> Regarding "K8S doesn't support shipping multiple jars", is that
>>>>>>>> true? Is it possible to support it?
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Jark
>>>>>>>>
>>>>>>>> On Thu, 1 Jun 2023 at 16:58, Paul Lam <paullin3...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Weihua,
>>>>>>>>>
>>>>>>>>> You're right. Distributing the SQLs to the TMs is one of the
>>>>>>>>> challenging parts of this FLIP.
>>>>>>>>>
>>>>>>>>> Web submission is not enabled in application mode currently, as
>>>>>>>>> you said, but that could be changed if we have good reasons.
>>>>>>>>>
>>>>>>>>> What do you think about introducing a distributed storage for
>>>>>>>>> SQL Gateway?
>>>>>>>>>
>>>>>>>>> We could make use of Flink file systems [1] to distribute the
>>>>>>>>> SQL Gateway generated resources, which should solve the problem
>>>>>>>>> at its root cause.
>>>>>>>>>
>>>>>>>>> Users could specify Flink-supported file systems to ship files.
>>>>>>>>> It's only required when using SQL Gateway with K8s application
>>>>>>>>> mode.
>>>>>>>>> >>>>>>>>> [1] >>>>>>>>> >>>>>> >>>> >> https://nightlies.apache.org/flink/flink-docs-stable/docs/deployment/filesystems/overview/ >> >> <https://nightlies.apache.org/flink/flink-docs-stable/docs/deployment/filesystems/overview/> >> < >> https://nightlies.apache.org/flink/flink-docs-stable/docs/deployment/filesystems/overview/ >> >> <https://nightlies.apache.org/flink/flink-docs-stable/docs/deployment/filesystems/overview/> >>> >>>> < >>>> >> https://nightlies.apache.org/flink/flink-docs-stable/docs/deployment/filesystems/overview/ >> >> <https://nightlies.apache.org/flink/flink-docs-stable/docs/deployment/filesystems/overview/> >> < >> https://nightlies.apache.org/flink/flink-docs-stable/docs/deployment/filesystems/overview/ >> >> <https://nightlies.apache.org/flink/flink-docs-stable/docs/deployment/filesystems/overview/> >>> >>>>> >>>>>> < >>>>>> >>>> >> https://nightlies.apache.org/flink/flink-docs-stable/docs/deployment/filesystems/overview/ >> >> <https://nightlies.apache.org/flink/flink-docs-stable/docs/deployment/filesystems/overview/> >> < >> https://nightlies.apache.org/flink/flink-docs-stable/docs/deployment/filesystems/overview/ >> >> <https://nightlies.apache.org/flink/flink-docs-stable/docs/deployment/filesystems/overview/> >>> >>>> < >>>> >> https://nightlies.apache.org/flink/flink-docs-stable/docs/deployment/filesystems/overview/ >> >> <https://nightlies.apache.org/flink/flink-docs-stable/docs/deployment/filesystems/overview/> >> < >> https://nightlies.apache.org/flink/flink-docs-stable/docs/deployment/filesystems/overview/ >> >> <https://nightlies.apache.org/flink/flink-docs-stable/docs/deployment/filesystems/overview/> >>> >>>>> >>>>>>> >>>>>>>>> >>>>>>>>> Best, >>>>>>>>> Paul Lam >>>>>>>>> >>>>>>>>>> 2023年6月1日 13:55,Weihua Hu <huweihua....@gmail.com >>>>>>>>>> <mailto:huweihua....@gmail.com> <mailto: >> huweihua....@gmail.com <mailto:huweihua....@gmail.com>> <mailto: >>>> huweihua....@gmail.com <mailto:huweihua....@gmail.com> >>>> 
<mailto:huweihua....@gmail.com <mailto:huweihua....@gmail.com>>>> 写道: >>>>>>>>>> >>>>>>>>>> Thanks Paul for your reply. >>>>>>>>>> >>>>>>>>>> SQLDriver looks good to me. >>>>>>>>>> >>>>>>>>>> 2. Do you mean a pass the SQL string a configuration or a program >>>>>>>>> argument? >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> I brought this up because we were unable to pass the SQL file to >>>> Flink >>>>>>>>>> using Kubernetes mode. >>>>>>>>>> For DataStream/Python users, they need to prepare their images for >>>> the >>>>>>>>> jars >>>>>>>>>> and dependencies. >>>>>>>>>> But for SQL users, they can use a common image to run different >> SQL >>>>>>>>> queries >>>>>>>>>> if there are no other udf requirements. >>>>>>>>>> It would be great if the SQL query and image were not bound. >>>>>>>>>> >>>>>>>>>> Using strings is a way to decouple these, but just as you >> mentioned, >>>>>> it's >>>>>>>>>> not easy to pass complex SQL. >>>>>>>>>> >>>>>>>>>>> use web submission >>>>>>>>>> AFAIK, we can not use web submission in the Application mode. >> Please >>>>>>>>>> correct me if I'm wrong. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Best, >>>>>>>>>> Weihua >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Wed, May 31, 2023 at 9:37 PM Paul Lam <paullin3...@gmail.com >>>>>>>>>> <mailto:paullin3...@gmail.com> >> <mailto:paullin3...@gmail.com <mailto:paullin3...@gmail.com>> >>>> <mailto:paullin3...@gmail.com <mailto:paullin3...@gmail.com> >>>> <mailto:paullin3...@gmail.com <mailto:paullin3...@gmail.com>>>> >>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> Hi Biao, >>>>>>>>>>> >>>>>>>>>>> Thanks for your comments! >>>>>>>>>>> >>>>>>>>>>>> 1. Scope: is this FLIP only targeted for non-interactive Flink >> SQL >>>>>> jobs >>>>>>>>>>> in >>>>>>>>>>>> Application mode? More specifically, if we use SQL >> client/gateway >>>> to >>>>>>>>>>>> execute some interactive SQLs like a SELECT query, can we ask >>>> flink >>>>>> to >>>>>>>>>>> use >>>>>>>>>>>> Application mode to execute those queries after this FLIP? 
>>>>>>>>>>>
>>>>>>>>>>> Thanks for pointing it out. I think only DMLs would be
>>>>>>>>>>> executed via SQL Driver.
>>>>>>>>>>> I'll add the scope to the FLIP.
>>>>>>>>>>>
>>>>>>>>>>>> 2. Deployment: I believe in YARN mode, the implementation is
>>>>>>>>>>>> trivial as we can ship files via YARN's tool easily, but for
>>>>>>>>>>>> K8s, things can be more complicated as Shengkai said.
>>>>>>>>>>>
>>>>>>>>>>> Your input is very informative. I'm thinking about using web
>>>>>>>>>>> submission, but it requires exposing the JobManager port,
>>>>>>>>>>> which could also be a problem on K8s.
>>>>>>>>>>>
>>>>>>>>>>> Another approach is to explicitly require a distributed
>>>>>>>>>>> storage to ship files, but we may need a new deployment
>>>>>>>>>>> executor for that.
>>>>>>>>>>>
>>>>>>>>>>> What do you think of these two approaches?
>>>>>>>>>>>
>>>>>>>>>>>> 3. Serialization of SessionState: in SessionState, there are
>>>>>>>>>>>> some unserializable fields like
>>>>>>>>>>>> org.apache.flink.table.resource.ResourceManager#userClassLoader.
>>>>>>>>>>>> It may be worthwhile to add more details about the
>>>>>>>>>>>> serialization part.
>>>>>>>>>>>
>>>>>>>>>>> I agree. That's a missing part. But if we use ExecNodeGraph as
>>>>>>>>>>> Shengkai mentioned, do we eliminate the need for serialization
>>>>>>>>>>> of SessionState?
>>>>>>>>>>>
>>>>>>>>>>> Best,
>>>>>>>>>>> Paul Lam
>>>>>>>>>>>
>>>>>>>>>>>> On May 31, 2023, at 13:07, Biao Geng <biaoge...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks Paul for the proposal! I believe it would be very
>>>>>>>>>>>> useful for Flink users.
>>>>>>>>>>>> After reading the FLIP, I have some questions:
>>>>>>>>>>>> 1.
Scope: is this FLIP only targeted for non-interactive Flink
>>>>>>>>>>>> SQL jobs in Application mode? More specifically, if we use
>>>>>>>>>>>> SQL client/gateway to execute some interactive SQLs like a
>>>>>>>>>>>> SELECT query, can we ask Flink to use Application mode to
>>>>>>>>>>>> execute those queries after this FLIP?
>>>>>>>>>>>> 2. Deployment: I believe in YARN mode, the implementation is
>>>>>>>>>>>> trivial as we can ship files via YARN's tool easily, but for
>>>>>>>>>>>> K8s, things can be more complicated as Shengkai said. I have
>>>>>>>>>>>> implemented a simple POC
>>>>>>>>>>>> <https://github.com/bgeng777/flink/commit/5b4338fe52ec343326927f0fc12f015dd22b1133>
>>>>>>>>>>>> based on the SQL client before (i.e. consider the SQL client
>>>>>>>>>>>> which supports executing a SQL file as the SQL driver in this
>>>>>>>>>>>> FLIP). One problem I met is how to ship SQL files (or the
>>>>>>>>>>>> JobGraph) to the K8s side. Without such support, users have
>>>>>>>>>>>> to modify the initContainer or rebuild a new K8s image every
>>>>>>>>>>>> time to fetch the SQL file.
>>>>>>>>>>>> Like the Flink K8s operator, one workaround is to utilize
>>>>>>>>>>>> the Flink config (transforming the SQL file to an escaped
>>>>>>>>>>>> string like Weihua mentioned), which will be converted to a
>>>>>>>>>>>> ConfigMap, but K8s has a size limit for ConfigMaps (no larger
>>>>>>>>>>>> than 1MB
>>>>>>>>>>>> <https://kubernetes.io/docs/concepts/configuration/configmap/>).
>>>>>>>>>>>> Not sure if we have better solutions.
>>>>>>>>>>>> 3. Serialization of SessionState: in SessionState, there are
>>>>>>>>>>>> some unserializable fields like
>>>>>>>>>>>> org.apache.flink.table.resource.ResourceManager#userClassLoader.
>>>>>>>>>>>> It may be worthwhile to add more details about the
>>>>>>>>>>>> serialization part.
>>>>>>>>>>>>
>>>>>>>>>>>> Best,
>>>>>>>>>>>> Biao Geng
>>>>>>>>>>>>
>>>>>>>>>>>> Paul Lam <paullin3...@gmail.com> wrote on Wed, May 31, 2023 at 11:49:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Weihua,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks a lot for your input! Please see my comments inline.
>>>>>>>>>>>>>
>>>>>>>>>>>>>> - Is SQLRunner the better name? We use this to run a SQL Job.
>>>>>>>>>>>>>> (Not strong, the SQLDriver is fine for me)
>>>>>>>>>>>>>
>>>>>>>>>>>>> I've thought about SQL Runner but picked SQL Driver for the
>>>>>>>>>>>>> following reasons, FYI:
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1. There is a PythonDriver doing the same job for PyFlink [1].
>>>>>>>>>>>>> 2. A Flink program's main class is sort of like a Driver in
>>>>>>>>>>>>> JDBC, which translates SQLs into database-specific languages.
>>>>>>>>>>>>>
>>>>>>>>>>>>> In general, I'm +1 for SQL Driver and +0 for SQL Runner.
>>>>>>>>>>>>>
>>>>>>>>>>>>>> - Could we run SQL jobs using SQL in strings? Otherwise, we
>>>>>>>>>>>>>> need to prepare a SQL file in an image for Kubernetes
>>>>>>>>>>>>>> application mode, which may be a bit cumbersome.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Do you mean passing the SQL string as a configuration or a
>>>>>>>>>>>>> program argument?
>>>>>>>>>>>>>
>>>>>>>>>>>>> I thought it might be convenient for testing purposes, but
>>>>>>>>>>>>> not recommended for production, because Flink SQLs could be
>>>>>>>>>>>>> complicated and involve lots of characters that need
>>>>>>>>>>>>> escaping.
>>>>>>>>>>>>>
>>>>>>>>>>>>> WDYT?
>>>>>>>>>>>>>
>>>>>>>>>>>>>> - I noticed that we don't specify the SQLDriver jar in the
>>>>>>>>>>>>>> "run-application" command. Does that mean we need to
>>>>>>>>>>>>>> perform automatic detection in Flink?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yes! It's like running a PyFlink job with the following
>>>>>>>>>>>>> command:
>>>>>>>>>>>>>
>>>>>>>>>>>>> ```
>>>>>>>>>>>>> ./bin/flink run \
>>>>>>>>>>>>>     --pyModule table.word_count \
>>>>>>>>>>>>>     --pyFiles examples/python/table
>>>>>>>>>>>>> ```
>>>>>>>>>>>>>
>>>>>>>>>>>>> The CLI determines whether it's a SQL job, and if so, applies
>>>>>>>>>>>>> the SQL Driver automatically.
>>>>>>>>>>>>>
>>>>>>>>>>>>> [1]
>>>>>>>>>>>>> https://github.com/apache/flink/blob/master/flink-python/src/main/java/org/apache/flink/client/python/PythonDriver.java
>>>>>>>>>>>>>
>>>>>>>>>>>>> Best,
>>>>>>>>>>>>> Paul Lam
>>>>>>>>>>>>>
>>>>>>>>>>>>>> On May 30, 2023, at 21:56, Weihua Hu <huweihua....@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks Paul for the proposal.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> +1 for this. It is valuable in improving ease of use.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I have a few questions.
>>>>>>>>>>>>>> - Is SQLRunner the better name? We use this to run a SQL
>>>>>>>>>>>>>> Job. (Not strong, the SQLDriver is fine for me)
>>>>>>>>>>>>>> - Could we run SQL jobs using SQL in strings?
>>>>>>>>>>>>>> Otherwise, we need to prepare a SQL file in an image for
>>>>>>>>>>>>>> Kubernetes application mode, which may be a bit cumbersome.
>>>>>>>>>>>>>> - I noticed that we don't specify the SQLDriver jar in the
>>>>>>>>>>>>>> "run-application" command. Does that mean we need to
>>>>>>>>>>>>>> perform automatic detection in Flink?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>> Weihua
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mon, May 29, 2023 at 7:24 PM Paul Lam <paullin3...@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi team,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I'd like to start a discussion about FLIP-316 [1], which
>>>>>>>>>>>>>>> introduces a SQL driver as the default main class for
>>>>>>>>>>>>>>> Flink SQL jobs.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Currently, Flink SQL could be executed out of the box
>>>>>>>>>>>>>>> either via SQL Client/Gateway or embedded in a Flink
>>>>>>>>>>>>>>> Java/Python program.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> However, each one has its drawback:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> - SQL Client/Gateway doesn't support the application
>>>>>>>>>>>>>>> deployment mode [2]
>>>>>>>>>>>>>>> - A Flink Java/Python program requires extra work to write
>>>>>>>>>>>>>>> a non-SQL program
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Therefore, I propose adding a SQL driver to act as the
>>>>>>>>>>>>>>> default main class for SQL jobs.
>>>>>>>>>>>>>>> Please see the FLIP docs for details and feel free to
>>>>>>>>>>>>>>> comment. Thanks!
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-316%3A+Introduce+SQL+Driver
>>>>>>>>>>>>>>> [2] https://issues.apache.org/jira/browse/FLINK-26541
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>> Paul Lam
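
As a footnote to the SQL-file mode discussed in this thread: a driver main class along the lines of the SqlRunner example linked above has to split the submitted SQL script into individual statements before handing each one to the table environment. The sketch below shows only that splitting step and is hypothetical (the class and method names are illustrative, not from the FLIP); a real driver would pass each resulting statement to TableEnvironment#executeSql, and proper handling of comments and escaped quotes is omitted here.

```java
// Hypothetical sketch of the statement-handling core a SQL Driver main
// class would need. The Flink call is elided (shown as a comment) so the
// sketch has no Flink dependency.
import java.util.ArrayList;
import java.util.List;

public class SqlScriptSplitter {

    /**
     * Naively splits a SQL script into statements on ';', skipping
     * semicolons inside single-quoted string literals. Comments and
     * escaped quotes are not handled in this sketch.
     */
    public static List<String> split(String script) {
        List<String> statements = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inString = false;
        for (char c : script.toCharArray()) {
            if (c == '\'') {
                inString = !inString;
            }
            if (c == ';' && !inString) {
                String stmt = current.toString().trim();
                if (!stmt.isEmpty()) {
                    statements.add(stmt);
                }
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        String tail = current.toString().trim();
        if (!tail.isEmpty()) {
            statements.add(tail);
        }
        return statements;
    }

    public static void main(String[] args) {
        String script =
                "CREATE TABLE t (s STRING) WITH ('connector' = 'datagen');\n"
                        + "INSERT INTO sink SELECT s FROM t WHERE s <> 'a;b';";
        for (String stmt : split(script)) {
            // A real SQL Driver would call tableEnv.executeSql(stmt) here.
            System.out.println(stmt);
        }
    }
}
```

The quoting subtlety is also exactly why passing whole scripts as escaped strings (as in the ConfigMap workaround discussed above) gets awkward, while a file- or filesystem-based hand-off avoids re-escaping entirely.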