Thanks for the reply, Gyula. Also thanks for the input from Thomas and Dongwoo.
Helm charts / templates for creating the SQL Gateway Deployment / Services sound good to me. I'll work on it and also add that to the OLAP quickstart doc.

Best,
Yangze Guo

On Tue, Sep 19, 2023 at 11:46 PM Dongwoo Kim <dongwoo7....@gmail.com> wrote:
>
> A simple Helm chart that covers both scenarios - running the gateway
> 1) as a sidecar, 2) as an independent deployment - with solid docs sounds
> good to me.
> It would be nice if we could also include optional features in the chart,
> such as a k8s Service for exposing the sql gateway.
>
> Best regards,
> Dongwoo
>
> On Tue, Sep 19, 2023 at 10:15 PM, Thomas Weise <t...@apache.org> wrote:
>
> > It is already possible to bring up a SQL Gateway as a sidecar utilizing the
> > pod templates - I tend to also see this more as a documentation/example
> > issue rather than something that calls for a separate CRD or other
> > dedicated operator support.
> >
> > Thanks,
> > Thomas
> >
> > On Tue, Sep 19, 2023 at 3:41 PM Gyula Fóra <gyula.f...@gmail.com> wrote:
> >
> > > Based on this I think we should start with simple Helm charts / templates
> > > for creating the `FlinkDeployment` together with a separate Deployment for
> > > the SQL Gateway.
> > > If the gateway itself doesn't integrate well with the operator managed CRs
> > > (sessionjobs) then I think it's better and simpler to have it separately.
> > >
> > > These Helm charts should be part of the operator repo / examples with nice
> > > docs. If we see that it's useful and popular we can start thinking of
> > > integrating it into the CRD.
> > >
> > > What do you think?
> > > Gyula
> > >
> > > On Tue, Sep 19, 2023 at 6:09 AM Yangze Guo <karma...@gmail.com> wrote:
> > >
> > > > Thanks for the reply, @Gyula.
> > > >
> > > > I would like to first provide more context on OLAP scenarios. In OLAP
> > > > scenarios, users typically submit multiple short batch jobs that have
> > > > execution times typically measured in seconds or even sub-seconds.
> > > > Additionally, due to the lightweight nature of these jobs, they often
> > > > do not require lifecycle management features and can disable high
> > > > availability functionalities such as failover and checkpointing.
> > > >
> > > > Regarding the integration issue, I believe that supporting the
> > > > generation of FlinkSessionJob through a gateway is a "nice to have"
> > > > feature rather than a "must-have." Firstly, it may be overkill to
> > > > create a CRD for such lightweight jobs, and it could potentially
> > > > impact the end-to-end execution time of the OLAP job. Secondly, as
> > > > mentioned earlier, these jobs do not have strong lifecycle management
> > > > requirements, so having an operator manage them would be somewhat
> > > > wasteful. Therefore, atm, we can allow users to directly submit jobs
> > > > using JDBC or the REST API. WDYT?
> > > >
> > > > Best,
> > > > Yangze Guo
> > > >
> > > > On Mon, Sep 18, 2023 at 4:08 PM Gyula Fóra <gyula.f...@gmail.com> wrote:
> > > >
> > > > > As I wrote in my previous answer, this could be done as a helm chart
> > > > > or as part of the operator easily. Both would work.
> > > > > My main concern for adding this into the operator is that the SQL
> > > > > Gateway itself is not properly integrated with the Operator Custom
> > > > > resources.
> > > > >
> > > > > Gyula
> > > > >
> > > > > On Mon, Sep 18, 2023 at 4:24 AM Shammon FY <zjur...@gmail.com> wrote:
> > > > >
> > > > > > Thanks @Gyula, I would like to share our use of sql-gateway with the
> > > > > > Flink session cluster and I hope that it could help you to have a
> > > > > > clearer understanding of our needs :)
> > > > > >
> > > > > > As @Yangze mentioned, currently we use flink as an olap platform by
> > > > > > the following steps
> > > > > > 1. Set up a flink session cluster via flink k8s session mode, with
> > > > > >    k8s or zk high availability.
> > > > > > 2. Write a Helm chart for the Sql-Gateway image and launch multiple
> > > > > >    gateway instances to submit jobs to the same flink session cluster.
> > > > > >
> > > > > > As we mentioned in docs[1], we hope that users can easily launch
> > > > > > sql-gateway instances in k8s. Is it enough to add a Helm chart for
> > > > > > sql-gateway, or do we need to add this feature to the flink
> > > > > > operator? Could you help us reach a conclusion? Thank you very much
> > > > > > @Gyula
> > > > > >
> > > > > > [1]
> > > > > > https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/olap_quickstart/
> > > > > >
> > > > > > Best,
> > > > > > Shammon FY
> > > > > >
> > > > > > On Sun, Sep 17, 2023 at 2:02 PM Gyula Fóra <gyula.f...@gmail.com> wrote:
> > > > > >
> > > > > > > Hi!
> > > > > > > It sounds pretty easy to deploy the gateway automatically with
> > > > > > > session cluster deployments from the operator, but there is a major
> > > > > > > limitation currently. The SQL gateway itself doesn't really support
> > > > > > > any operator integration so jobs submitted through the SQL gateway
> > > > > > > would not be manageable by the operator (they won't show up as
> > > > > > > session jobs).
> > > > > > >
> > > > > > > Without that, this is a very strange feature. We would make
> > > > > > > something much easier for users that is not well supported by the
> > > > > > > operator in the first place. The operator is designed to manage
> > > > > > > clusters and jobs (FlinkDeployment / FlinkSessionJob). It would be
> > > > > > > good to understand if we could make the SQL Gateway create a
> > > > > > > FlinkSessionJob / Deployment (that would require application
> > > > > > > cluster support) and basically submit the job through the operator.
> > > > > > >
> > > > > > > Cheers,
> > > > > > > Gyula
> > > > > > >
> > > > > > > On Sun, Sep 17, 2023 at 1:26 AM Yangze Guo <karma...@gmail.com> wrote:
> > > > > > >
> > > > > > > > > There would be many different ways of doing this. One gateway
> > > > > > > > > per session cluster, one gateway shared across different
> > > > > > > > > clusters...
> > > > > > > >
> > > > > > > > Currently, sql gateway cannot be shared across multiple clusters.
> > > > > > > >
> > > > > > > > > understand the tradeoff and the simplest way of accomplishing
> > > > > > > > > this.
> > > > > > > >
> > > > > > > > I'm not familiar with the Flink operator codebase, so it would be
> > > > > > > > appreciated if you could elaborate more on the cost of adding this
> > > > > > > > feature. I agree that deploying a gateway using the native
> > > > > > > > Kubernetes Deployment can be a simple and straightforward way for
> > > > > > > > users. However, integrating it into an operator can provide
> > > > > > > > additional benefits and be more user-friendly, especially for
> > > > > > > > users who are less familiar with Kubernetes. By using an operator,
> > > > > > > > users can benefit from consistent version management with the
> > > > > > > > session cluster and upgrade capabilities.
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Yangze Guo
> > > > > > > >
> > > > > > > > On Fri, Sep 15, 2023 at 5:38 PM Gyula Fóra <gyula.f...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > There would be many different ways of doing this. One gateway
> > > > > > > > > per session cluster, one gateway shared across different
> > > > > > > > > clusters...
> > > > > > > > > I would not rush to add anything anywhere until we understand
> > > > > > > > > the tradeoff and the simplest way of accomplishing this.
> > > > > > > > >
> > > > > > > > > The operator already supports ingresses for session clusters so
> > > > > > > > > we could have a gateway sitting somewhere else simply using it.
> > > > > > > > >
> > > > > > > > > Gyula
> > > > > > > > >
> > > > > > > > > On Fri, Sep 15, 2023 at 10:18 AM Yangze Guo <karma...@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > Thanks for bringing this up, Dongwoo. Flink SQL Gateway is also
> > > > > > > > > > a key component for OLAP scenarios.
> > > > > > > > > >
> > > > > > > > > > @Gyula
> > > > > > > > > > How about adding the sql gateway as an optional component to
> > > > > > > > > > Session Cluster Deployments? Users can specify the resource /
> > > > > > > > > > instance number and ports of the sql gateway. I think that
> > > > > > > > > > would help a lot for OLAP and batch users.
> > > > > > > > > >
> > > > > > > > > > Best,
> > > > > > > > > > Yangze Guo
> > > > > > > > > >
> > > > > > > > > > On Fri, Sep 15, 2023 at 3:19 PM ConradJam <jam.gz...@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > If we start from the crd direction, I think this mode is more
> > > > > > > > > > > like a sidecar of the session cluster, which is submitted to
> > > > > > > > > > > the session cluster by sending sql commands to the sql
> > > > > > > > > > > gateway. I don't know if my statement is accurate.
> > > > > > > > > > >
> > > > > > > > > > > On Fri, Sep 15, 2023 at 1:27 PM, Xiaolong Wang <xiaolong.w...@smartnews.com.invalid> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hi, Dongwoo,
> > > > > > > > > > > >
> > > > > > > > > > > > Since Flink SQL gateway should run upon a Flink session
> > > > > > > > > > > > cluster, I think it'd be easier to add more fields to the
> > > > > > > > > > > > CRD of `FlinkSessionJob`.
> > > > > > > > > > > >
> > > > > > > > > > > > e.g.
> > > > > > > > > > > >
> > > > > > > > > > > > apiVersion: flink.apache.org/v1beta1
> > > > > > > > > > > > kind: FlinkSessionJob
> > > > > > > > > > > > metadata:
> > > > > > > > > > > >   name: sql-gateway
> > > > > > > > > > > > spec:
> > > > > > > > > > > >   sqlGateway:
> > > > > > > > > > > >     endpoint: "hiveserver2"
> > > > > > > > > > > >     mode: "streaming"
> > > > > > > > > > > >     hiveConf:
> > > > > > > > > > > >       configMap:
> > > > > > > > > > > >         name: hive-config
> > > > > > > > > > > >         items:
> > > > > > > > > > > >           - key: hive-site.xml
> > > > > > > > > > > >             path: hive-site.xml
> > > > > > > > > > > >
> > > > > > > > > > > > On Fri, Sep 15, 2023 at 12:56 PM Dongwoo Kim <dongwoo7....@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Hi all,
> > > > > > > > > > > > >
> > > > > > > > > > > > > *@Gyula*
> > > > > > > > > > > > > Thanks for the consideration Gyula. My initial idea for
> > > > > > > > > > > > > the CR was roughly like below.
> > > > > > > > > > > > > I focused on simplifying the setup in k8s environment,
> > > > > > > > > > > > > but I agree with your opinion that for the sql gateway
> > > > > > > > > > > > > we don't need custom operator logic to handle and most
> > > > > > > > > > > > > of the requirements can be met by existing k8s resources.
> > > > > > > > > > > > > So maybe helm chart that bundles all resources needed
> > > > > > > > > > > > > should be enough.
> > > > > > > > > > > > >
> > > > > > > > > > > > > apiVersion: flink.apache.org/v1beta1
> > > > > > > > > > > > > kind: FlinkSqlGateway
> > > > > > > > > > > > > metadata:
> > > > > > > > > > > > >   name: flink-sql-gateway-example
> > > > > > > > > > > > >   namespace: default
> > > > > > > > > > > > > spec:
> > > > > > > > > > > > >   clusterName: flink-session-cluster-example
> > > > > > > > > > > > >   exposeServiceType: LoadBalancer
> > > > > > > > > > > > >   flinkSqlGatewayConfiguration:
> > > > > > > > > > > > >     sql-gateway.endpoint.type: "hiveserver2"
> > > > > > > > > > > > >     sql-gateway.endpoint.hiveserver2.catalog.name: "hive"
> > > > > > > > > > > > >   hiveConf:
> > > > > > > > > > > > >     configMap:
> > > > > > > > > > > > >       name: hive-config
> > > > > > > > > > > > >       items:
> > > > > > > > > > > > >         - key: hive-site.xml
> > > > > > > > > > > > >           path: hive-site.xml
> > > > > > > > > > > > >
> > > > > > > > > > > > > *@xiaolong, @Shammon*
> > > > > > > > > > > > > Hi xiaolong and Shammon.
> > > > > > > > > > > > > Thanks for taking the time to share.
> > > > > > > > > > > > > I'd also like to add my experience with setting up flink
> > > > > > > > > > > > > sql gateway on k8s.
> > > > > > > > > > > > > Without building a new Docker image, I've added a
> > > > > > > > > > > > > separate container to the existing JobManager pod and
> > > > > > > > > > > > > started the sql gateway using the
> > > > > > > > > > > > > "sql-gateway.sh start-foreground" command.
> > > > > > > > > > > > > I haven't explored deploying the sql gateway as an
> > > > > > > > > > > > > independent deployment yet, but that's something I'm
> > > > > > > > > > > > > considering after modifying the JM address to point to
> > > > > > > > > > > > > the desired session cluster.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks all
> > > > > > > > > > > > >
> > > > > > > > > > > > > Best
> > > > > > > > > > > > > Dongwoo
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Fri, Sep 15, 2023 at 11:55 AM, Xiaolong Wang <xiaolong.w...@smartnews.com.invalid> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Hi, Shammon,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Yes, I want to create a Flink SQL-gateway in a job-manager.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Currently, the above script is generally a work-around
> > > > > > > > > > > > > > and allows me to start a Flink session job manager with
> > > > > > > > > > > > > > a SQL gateway running on it.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I agree that it'd be more elegant if we created a new
> > > > > > > > > > > > > > job type and wrote a script, which would be much easier
> > > > > > > > > > > > > > for users (since they would no longer need to build a
> > > > > > > > > > > > > > separate Flink image).
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Fri, Sep 15, 2023 at 10:29 AM Shammon FY <zjur...@gmail.com> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Hi,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Currently `sql-gateway` can be started with the script
> > > > > > > > > > > > > > > `sql-gateway.sh` in an existing node, it is more like
> > > > > > > > > > > > > > > a simple "standalone" node. I think it's valuable if
> > > > > > > > > > > > > > > we can do more work to start it in k8s.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > For xiaolong:
> > > > > > > > > > > > > > > Do you want to start a sql-gateway instance in the
> > > > > > > > > > > > > > > jobmanager pod? I think maybe we need a script like
> > > > > > > > > > > > > > > `kubernetes-sql-gateway.sh` to start `sql-gateway`
> > > > > > > > > > > > > > > pods with a flink image, what do you think?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > Shammon FY
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Fri, Sep 15, 2023 at 10:02 AM Xiaolong Wang <xiaolong.w...@smartnews.com.invalid> wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Hi, I've experimented with this feature on K8S
> > > > > > > > > > > > > > > > recently, here is what I tried:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > 1. Create a new kubernetes-jobmanager.sh script with
> > > > > > > > > > > > > > > > the following content
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > #!/usr/bin/env bash
> > > > > > > > > > > > > > > > # Start the SQL gateway, then run the original (renamed) entrypoint.
> > > > > > > > > > > > > > > > $FLINK_HOME/bin/sql-gateway.sh start
> > > > > > > > > > > > > > > > $FLINK_HOME/bin/kubernetes-jobmanager1.sh kubernetes-session
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > 2. Build your own Flink docker image something like
> > > > > > > > > > > > > > > > this
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > FROM flink:1.17.1-scala_2.12-java11
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > # Keep the original script under a new name and install the wrapper.
> > > > > > > > > > > > > > > > RUN mv $FLINK_HOME/bin/kubernetes-jobmanager.sh $FLINK_HOME/bin/kubernetes-jobmanager1.sh
> > > > > > > > > > > > > > > > COPY ./kubernetes-jobmanager.sh $FLINK_HOME/bin/kubernetes-jobmanager.sh
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > RUN chmod +x $FLINK_HOME/bin/*.sh
> > > > > > > > > > > > > > > > USER flink
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > 3. Create a Flink session job with the operator
> > > > > > > > > > > > > > > > using the above image.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Thu, Sep 14, 2023 at 9:49 PM Gyula Fóra <gyula.f...@gmail.com> wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Hi!
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > I don't completely understand what the content of
> > > > > > > > > > > > > > > > > such a CRD would be, could you give a minimal
> > > > > > > > > > > > > > > > > example of what the Flink SQL Gateway CR yaml
> > > > > > > > > > > > > > > > > would look like?
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Adding a CRD would mean you need to add some
> > > > > > > > > > > > > > > > > operator/controller logic as well. Why not simply
> > > > > > > > > > > > > > > > > use a Deployment / StatefulSet in Kubernetes?
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Or a Helm chart if you want to make it more user
> > > > > > > > > > > > > > > > > friendly?
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Cheers,
> > > > > > > > > > > > > > > > > Gyula
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Thu, Sep 14, 2023 at 12:57 PM Dongwoo Kim <dongwoo7....@gmail.com> wrote:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Hi all,
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > I've been working on setting up a flink SQL
> > > > > > > > > > > > > > > > > > gateway in a k8s environment and it got me
> > > > > > > > > > > > > > > > > > thinking: what if we had a CRD for this?
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > So I have quick questions below.
> > > > > > > > > > > > > > > > > > 1. Is there ongoing work to create a CRD for the
> > > > > > > > > > > > > > > > > >    Flink SQL Gateway?
> > > > > > > > > > > > > > > > > > 2. If not, would the community be open to
> > > > > > > > > > > > > > > > > >    considering a CRD for this?
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > I've noticed a growing demand for simplified
> > > > > > > > > > > > > > > > > > setup of the flink sql gateway in flink's slack
> > > > > > > > > > > > > > > > > > channel.
> > > > > > > > > > > > > > > > > > Implementing a CRD could make deployments easier
> > > > > > > > > > > > > > > > > > and offer better integration with k8s.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > If this idea is accepted, I'm open to drafting a
> > > > > > > > > > > > > > > > > > FLIP for further discussion.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Thanks for your time and looking forward to your
> > > > > > > > > > > > > > > > > > thoughts!
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Best regards,
> > > > > > > > > > > > > > > > > > Dongwoo
> > > > > > > > > > >
> > > > > > > > > > > --
> > > > > > > > > > > Best
> > > > > > > > > > >
> > > > > > > > > > > ConradJam
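
For reference, the "independent deployment" option discussed in this thread could look roughly like the sketch below: a plain Kubernetes Deployment that runs "sql-gateway.sh start-foreground" (the command Dongwoo mentions) from the stock Flink image, plus an optional Service to expose it. This is only an illustration under assumptions, not the chart agreed on above. The resource names, labels, replica count, the JobManager REST service name `flink-session-cluster-example-rest`, the `/opt/flink` install path, and ports 8081 / 8083 are all assumed, and the `-D` options should be checked against the SQL Gateway documentation for the Flink version in use.

```yaml
# Hypothetical sketch: SQL Gateway as a standalone Deployment next to an
# operator-managed session cluster. All names below are illustrative.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: flink-sql-gateway            # assumed name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: flink-sql-gateway
  template:
    metadata:
      labels:
        app: flink-sql-gateway
    spec:
      containers:
        - name: sql-gateway
          image: flink:1.17.1-scala_2.12-java11   # image tag quoted earlier in the thread
          command: ["/opt/flink/bin/sql-gateway.sh"]   # assumes FLINK_HOME=/opt/flink
          args:
            - start-foreground
            # Address of the gateway's own REST endpoint (assumed value).
            - -Dsql-gateway.endpoint.rest.address=0.0.0.0
            # Point the gateway at the session cluster's JobManager REST
            # service (assumed service name and port).
            - -Drest.address=flink-session-cluster-example-rest
            - -Drest.port=8081
          ports:
            - name: rest
              containerPort: 8083    # assumed gateway REST port
---
apiVersion: v1
kind: Service
metadata:
  name: flink-sql-gateway
spec:
  type: ClusterIP                    # LoadBalancer would expose it externally
  selector:
    app: flink-sql-gateway
  ports:
    - name: rest
      port: 8083
      targetPort: rest
```

Running several gateway instances against the same session cluster, as Shammon describes, would then be a matter of raising `replicas`. Jobs submitted this way are still not visible to the operator as session jobs, which is the limitation Gyula points out.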