Thanks Paul for the proposal! I believe it would be very useful for Flink users.

After reading the FLIP, I have some questions:

1. Scope: is this FLIP targeted only at non-interactive Flink SQL jobs in Application mode? More specifically, if we use the SQL Client/Gateway to execute interactive SQL such as a SELECT query, can we ask Flink to execute those queries in Application mode after this FLIP?

2. Deployment: I believe the implementation is trivial in YARN mode, since we can easily ship files with YARN's tooling, but for K8s things can be more complicated, as Shengkai said. I previously implemented a simple POC <https://github.com/bgeng777/flink/commit/5b4338fe52ec343326927f0fc12f015dd22b1133> based on the SQL Client (i.e. treating the SQL Client, which supports executing a SQL file, as the SQL driver in this FLIP). One problem I ran into is how to ship SQL files (or the Job Graph) to the K8s side. Without such support, users have to modify the initContainer or rebuild the K8s image every time to fetch the SQL file. One workaround, as in the Flink K8s operator, is to use the Flink config (transforming the SQL file into an escaped string, as Weihua mentioned), which is then converted to a ConfigMap. However, K8s limits the size of a ConfigMap (no larger than 1 MiB <https://kubernetes.io/docs/concepts/configuration/configmap/>). I am not sure whether we have better solutions.

3. Serialization of SessionState: SessionState contains some unserializable fields, such as org.apache.flink.table.resource.ResourceManager#userClassLoader. It may be worthwhile to add more details about the serialization part to the FLIP.
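
To make the ConfigMap concern in point 2 a bit more concrete, here is a rough sketch of the "escaped string" workaround. Everything in it is hypothetical: the class SqlConfigMapCheck and its methods are not part of Flink, the FLIP, or the K8s operator, and the escaping rules are only one possible choice (Flink's own config parsing may treat escapes differently). The only external fact it relies on is the documented 1 MiB cap on ConfigMap data.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for illustration only; not a Flink or FLIP-316 API. */
final class SqlConfigMapCheck {

    // Kubernetes documents a 1 MiB cap on the data stored in a single ConfigMap.
    static final int CONFIG_MAP_LIMIT_BYTES = 1024 * 1024;

    /** Escapes a SQL script so that it fits on a single line of a config file. */
    static String escape(String sql) {
        StringBuilder sb = new StringBuilder(sql.length() + 16);
        for (int i = 0; i < sql.length(); i++) {
            char c = sql.charAt(i);
            switch (c) {
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Reverses escape(); this is what a driver would do before running the script. */
    static String unescape(String escaped) {
        StringBuilder sb = new StringBuilder(escaped.length());
        for (int i = 0; i < escaped.length(); i++) {
            char c = escaped.charAt(i);
            if (c == '\\' && i + 1 < escaped.length()) {
                char next = escaped.charAt(++i);
                switch (next) {
                    case 'n': sb.append('\n'); break;
                    case 'r': sb.append('\r'); break;
                    case 't': sb.append('\t'); break;
                    default: sb.append(next); // covers the escaped backslash
                }
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    /**
     * Returns true if the escaped script, plus the bytes already used by the
     * rest of the configuration, stays within the ConfigMap limit.
     */
    static boolean fitsInConfigMap(String escapedSql, int otherConfigBytes) {
        long sqlBytes = escapedSql.getBytes(StandardCharsets.UTF_8).length;
        return sqlBytes + otherConfigBytes <= CONFIG_MAP_LIMIT_BYTES;
    }

    public static void main(String[] args) {
        String sql = "INSERT INTO sink\nSELECT * FROM source;\n";
        String escaped = escape(sql);
        System.out.println(escaped);
        System.out.println(fitsInConfigMap(escaped, 0));
    }
}
```

Note that the script shares the ConfigMap with the rest of the Flink configuration, and escaping inflates the script, so the usable budget for the SQL itself is somewhat below 1 MiB. That is why I do not think this scales to large scripts.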

Best,
Biao Geng

Paul Lam <paullin3...@gmail.com> wrote on Wed, May 31, 2023 at 11:49:

> Hi Weihua,
>
> Thanks a lot for your input! Please see my comments inline.
>
> > - Is SQLRunner the better name? We use this to run a SQL Job. (No strong
> > opinion; SQLDriver is fine for me.)
>
> I’ve thought about SQL Runner but picked SQL Driver for the following
> reasons, FYI:
>
> 1. We have a PythonDriver doing the same job for PyFlink [1]
> 2. A Flink program's main class is sort of like a Driver in JDBC, which
> translates SQL into database-specific languages.
>
> In general, I’m +1 for SQL Driver and +0 for SQL Runner.
>
> > - Could we run SQL jobs using SQL in strings? Otherwise, we need to prepare
> > a SQL file in an image for Kubernetes application mode, which may be a bit
> > cumbersome.
>
> Do you mean passing the SQL string as a configuration or a program argument?
>
> I think it might be convenient for testing purposes, but it is not
> recommended for production, because Flink SQL can be complicated and involve
> lots of characters that need escaping.
>
> WDYT?
>
> > - I noticed that we don't specify the SQLDriver jar in the "run-application"
> > command. Does that mean we need to perform automatic detection in Flink?
>
> Yes! It’s like running a PyFlink job with the following command:
>
> ```
> ./bin/flink run \
> --pyModule table.word_count \
> --pyFiles examples/python/table
> ```
>
> The CLI determines whether it’s a SQL job and, if so, applies the SQL Driver
> automatically.
>
> [1]
> https://github.com/apache/flink/blob/master/flink-python/src/main/java/org/apache/flink/client/python/PythonDriver.java
>
> Best,
> Paul Lam
>
> > On May 30, 2023, at 21:56, Weihua Hu <huweihua....@gmail.com> wrote:
> >
> > Thanks Paul for the proposal.
> >
> > +1 for this. It is valuable in improving ease of use.
> >
> > I have a few questions.
> > - Is SQLRunner the better name? We use this to run a SQL Job. (No strong
> > opinion; SQLDriver is fine for me.)
> > - Could we run SQL jobs using SQL in strings? Otherwise, we need to prepare
> > a SQL file in an image for Kubernetes application mode, which may be a bit
> > cumbersome.
> > - I noticed that we don't specify the SQLDriver jar in the "run-application"
> > command. Does that mean we need to perform automatic detection in Flink?
> >
> > Best,
> > Weihua
> >
> > On Mon, May 29, 2023 at 7:24 PM Paul Lam <paullin3...@gmail.com> wrote:
> >
> >> Hi team,
> >>
> >> I’d like to start a discussion about FLIP-316 [1], which introduces a SQL
> >> driver as the default main class for Flink SQL jobs.
> >>
> >> Currently, Flink SQL can be executed out of the box either via the SQL
> >> Client/Gateway or embedded in a Flink Java/Python program.
> >>
> >> However, each has its drawbacks:
> >>
> >> - SQL Client/Gateway doesn’t support the application deployment mode [2]
> >> - A Flink Java/Python program requires extra work to write a non-SQL program
> >>
> >> Therefore, I propose adding a SQL driver to act as the default main class
> >> for SQL jobs.
> >> Please see the FLIP docs for details and feel free to comment. Thanks!
> >>
> >> [1]
> >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-316%3A+Introduce+SQL+Driver
> >> [2] https://issues.apache.org/jira/browse/FLINK-26541
> >>
> >> Best,
> >> Paul Lam