Re: [DISCUSS] FLIP-316: Introduce SQL Driver

Biao Geng Tue, 30 May 2023 22:08:05 -0700

Thanks Paul for the proposal! I believe it would be very useful for Flink
users.
After reading the FLIP, I have some questions:
1. Scope: is this FLIP targeted only at non-interactive Flink SQL jobs in
Application mode? More specifically, if we use the SQL client/gateway to
execute interactive SQL statements such as a SELECT query, can we ask Flink
to use Application mode to execute those queries after this FLIP?
2. Deployment: I believe that in YARN mode the implementation is trivial, as
we can easily ship files via YARN's tooling, but for K8s things can be more
complicated, as Shengkai said. I have implemented a simple POC
<https://github.com/bgeng777/flink/commit/5b4338fe52ec343326927f0fc12f015dd22b1133>
based on the SQL client before (i.e. treating the SQL client, which supports
executing a SQL file, as the SQL driver in this FLIP). One problem I ran
into is how to ship SQL files (or the Job Graph) to the K8s side. Without
such support, users have to modify the initContainer or rebuild the K8s
image every time to fetch the SQL file. As in the Flink K8s operator, one
workaround is to use the Flink config (transforming the SQL file into an
escaped string, as Weihua mentioned), which will be converted to a
ConfigMap, but K8s limits the size of ConfigMaps (no larger than 1 MiB
<https://kubernetes.io/docs/concepts/configuration/configmap/>). I'm not
sure whether we have better solutions.
3. Serialization of SessionState: SessionState contains some
non-serializable fields, such as
org.apache.flink.table.resource.ResourceManager#userClassLoader. It
may be worthwhile to add more details about the serialization part.
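
To make the size concern in point 2 concrete, here is a minimal Python
sketch of a pre-flight check that a client could run before putting an
escaped SQL script into the Flink config. It is only an illustration: the
helper names (escape_sql, fits_in_configmap) are hypothetical, and JSON
string escaping stands in for whatever encoding would actually be chosen,
since neither the FLIP nor the operator prescribes one here.

```python
import json

# Kubernetes rejects ConfigMaps whose data exceeds 1 MiB.
CONFIGMAP_LIMIT_BYTES = 1024 * 1024


def escape_sql(sql: str) -> str:
    """Collapse a SQL script into a single-line escaped string.

    JSON string escaping is an assumption made for illustration only.
    """
    return json.dumps(sql)


def fits_in_configmap(sql: str, other_config_bytes: int = 0) -> bool:
    """Return True if the escaped script plus the rest of the config
    stays under the ConfigMap size limit."""
    escaped_size = len(escape_sql(sql).encode("utf-8"))
    return escaped_size + other_config_bytes < CONFIGMAP_LIMIT_BYTES


# A short script fits easily; a script of roughly 2 MB does not.
small = "INSERT INTO sink SELECT * FROM source;\n"
large = "-- padding\n" * 200_000

print(fits_in_configmap(small))
print(fits_in_configmap(large))
```

Note that escaping makes the script larger than the original file, so the
usable script size is somewhat below the nominal limit, and the rest of the
Flink configuration shares the same ConfigMap.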


Best,
Biao Geng

On Wed, May 31, 2023 at 11:49, Paul Lam <[email protected]> wrote:

> Hi Weihua,
>
> Thanks a lot for your input! Please see my comments inline.
>
> > - Is SQLRunner the better name? We use this to run a SQL Job. (Not
> > strong, the SQLDriver is fine for me)
>
> I’ve thought about SQL Runner but picked SQL Driver for the following
> reasons, FYI:
>
> 1. We have a PythonDriver doing the same job for PyFlink [1]
> 2. A Flink program's main class is somewhat like a Driver in JDBC, which
>     translates SQL into database-specific languages.
>
> In general, I’m +1 for SQL Driver and +0 for SQL Runner.
>
> > - Could we run SQL jobs using SQL in strings? Otherwise, we need to
> > prepare a SQL file in an image for Kubernetes application mode, which
> > may be a bit cumbersome.
>
> Do you mean passing the SQL string as a configuration or a program argument?
>
> I thought it might be convenient for testing purposes, but it is not
> recommended for production, because Flink SQL can be complicated and
> involve lots of characters that need escaping.
>
> WDYT?
>
> > - I noticed that we don't specify the SQLDriver jar in the
> > "run-application" command. Does that mean we need to perform automatic
> > detection in Flink?
>
> Yes! It’s like running a PyFlink job with the following command:
>
> ```
> ./bin/flink run \
>       --pyModule table.word_count \
>       --pyFiles examples/python/table
> ```
>
> The CLI determines whether it’s a SQL job and, if so, applies the SQL
> Driver automatically.
>
>
> [1]
> https://github.com/apache/flink/blob/master/flink-python/src/main/java/org/apache/flink/client/python/PythonDriver.java
>
> Best,
> Paul Lam
>
> > On May 30, 2023, at 21:56, Weihua Hu <[email protected]> wrote:
> >
> > Thanks Paul for the proposal.
> >
> > +1 for this. It is valuable in improving ease of use.
> >
> > I have a few questions.
> > - Is SQLRunner the better name? We use this to run a SQL Job. (Not
> > strong, the SQLDriver is fine for me)
> > - Could we run SQL jobs using SQL in strings? Otherwise, we need to
> > prepare a SQL file in an image for Kubernetes application mode, which
> > may be a bit cumbersome.
> > - I noticed that we don't specify the SQLDriver jar in the
> > "run-application" command. Does that mean we need to perform automatic
> > detection in Flink?
> >
> >
> > Best,
> > Weihua
> >
> >
> > On Mon, May 29, 2023 at 7:24 PM Paul Lam <[email protected]> wrote:
> >
> >> Hi team,
> >>
> >> I’d like to start a discussion about FLIP-316 [1], which introduces a
> >> SQL driver as the default main class for Flink SQL jobs.
> >>
> >> Currently, Flink SQL can be executed out of the box either via SQL
> >> Client/Gateway or embedded in a Flink Java/Python program.
> >>
> >> However, each has its drawback:
> >>
> >> - SQL Client/Gateway doesn’t support the application deployment mode [2]
> >> - A Flink Java/Python program requires extra work to write a non-SQL
> >> program
> >>
> >> Therefore, I propose adding a SQL driver to act as the default main
> >> class for SQL jobs.
> >> Please see the FLIP docs for details and feel free to comment. Thanks!
> >>
> >> [1]
> >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-316%3A+Introduce+SQL+Driver
> >> [2] https://issues.apache.org/jira/browse/FLINK-26541
> >>
> >> Best,
> >> Paul Lam
>
>
