Hi Shengkai, Hi Jark,

thanks for the additional explanation and the update of the FLIP. This will help us in the future for documenting our decisions. The arguments why to include the Gateway into the main repo make a lot of sense to me. Esp. also because both CLI and gateway need some parsing functionality that is dependent on the current state of the SQL syntax.

Here is my last set of feedback, other than that +1 for the proposal:

Serialize the LogicalType

The FLIP mentions LogicalTypeJsonSerializer but the shown JSON is different from the current master. We are using the serializable representation of LogicalType as much as possible nowadays. We should definitely just use LogicalTypeJsonSerializer and not a second JSON representation.

1) Serialize the RowData

Side note for serializing ROWs: we should not use field names in JSON object keys. As e.g. `null` and other names with special characters cause issues in JSON.

2) We propose the state machine like HiveServer2

Have you also thought about using Flink's state types from Flink tasks/jobs? If we were using Flink types directly, it would be easier to monitor the execution of a INSERT INTO job via the gateway without having to map state types. Monitoring jobs is the most important functionality and should be in sync with regular Flink job monitoring. A HiveServer2 endpoint can still perform mapping if necessary, but within the Flink code base we use a consistent state transition scheme.

3) Options to configure the REST endpoint

Given that we also want to be able to put endpoint options into flink-conf.yaml, we should make those option a bit more globally understandable:

endpoint.type -> sql-gateway.endpoint.type
endpoint.rest.port -> sql-gateway.endpoint.rest.port
...

4) Naming conversion

This is very nit picking, but please make sure to name the interfaces consistently for abbreviations. Currently, we mostly use `SqlInterface` instead of `SQLInterface`. So in the FLIP:

SQLGatewayEndpoint -> SqlGatewayEndpoint
SQLGatewayEndpointFactory -> SqlGatewayEndpointFactory
e.g. for REST we do it already: RestEndpointVersion

5) Another smaller shortcomings in the FLIP

SQLGatewayService.getFunction / UserDefinedFunctionInfo
Was it a conscious decision to not use FunctionIdentifier here? You will not be able to list system functions.

SQLGatewayService.getGatewayInfo()/getSessionConfig
Why not returning ReadableConfig here?

Thanks,
Timo




On 12.05.22 17:02, Jark Wu wrote:
Hi Shengkai,

One comment on the configure_session VS initialize_session,
I think configure_session is better than initialize_session,
because "initialize" sounds like this method can only be invoked once.
But I can see this method is useful to be invoked several times to configure the session during the lifecycle of a session. If calling it initialize_session,
should we limit it only be called once?

Best,
Jark



On Thu, 12 May 2022 at 22:38, Jark Wu <imj...@gmail.com <mailto:imj...@gmail.com>> wrote:

    Hi Timo,

    I agree with Shengkai that we should keep Gateway in the Flink main
    repo.
    Here are my thoughts:

    1) It is not feasible and maintainable to invoke different versions
    of Flink dependencies
    in one JVM. I'm afraid it would be a nightmare when the code is
    messed up with all the reflections.

    2) Actually, it's very mature to support submitting multi-version by
    deploying one
    gateway process for one version. Many systems are following this
    architecture as Shengkai
    mentioned above, including the Ververica Platform. And I don't see
    any problem with it.

    3) As Jingsong said, SQL CLI should build on top of Gateway to share
    code and have the ability to
      connect to Gateway. If SQL CLI is still kept in the Flink repo, we
    have to put Gateway there as well.

    4) Besides, Gateway is indispensable for a SQL engine (think of
    Trino/Presto, Spark, Hive).
    Otherwise, Flink will always be a processing system. With Gateway
    inside the Flink repo,
    we can provide an out-of-box experience as a SQL query engine. Users
    can try out the gateway
    for the latest version when a new version is released.

    5) Last, Gateway inside the Flink repo can ensure the highest degree
    of version compatibility.

    Best,
    Jark


    On Thu, 12 May 2022 at 19:19, Shengkai Fang <fskm...@gmail.com
    <mailto:fskm...@gmail.com>> wrote:

        Hi Timo.

        Thanks for your feedback!

         > It would be great if you can copy/paste some of the tricky design
        decisions into the FLIP.

        Yes. I have already added a section about the `Rejected
        Alternatives`. It
        includes the discuss about the
        - TableInfo and FunctionInfo VS CatalogTable and CatalogFunction
        - Support the multi-version Flink in the Gateway VS Support the
        multi-version in the external Service
        - Merge Gateway into the Flink code base VS Support Gateway in
        the another
        repo
        ...


         > Separate code base

        Let me summarize all the discussion about the separate code
        base. Please
        correct me if anything is wrong.


        About support for the Gateway in the Flink code base.


        1. It reduces the cost including tests, release and document. We
        don't need
        to upgrade the Gateway when Flink releases.
        2. The SQL Client has the ability to submit the SQL to the
        Gateway. If we
        move the Gateway to the outside repo, the SQL Client doesn't
        have the
        related dependencies.
        3. It is benefit for us to reuse code in the same repo.

        About supporting the Gateway in another repo.

        1. It is easier to support the multi-version Flink;


        First of all, I think we should support the multiple Flink
        version with
        another component that uses the Gateway from the individual
        Flink releases.
        Zeppelin, Livy, and Kyuubi all use similar architecture.

        The current Gateway in the FLIP is bound to the specific Flink
        version,
        which needs to compile the SQL and submit the job to the cluster
        with the
        specified Flink version.
        We can introduce a service in the future(Maybe we can name it
        MultipleVersionFlinkService? ). It may tell the client where the
        Gateway
        with specified version is and the client just communicate with
        the Gateway
        later. It may also just parse the request,  convert the request
        to the REST
        API in the FLIP and send to the Gateway with specified version.

        Therefore, I think we should merge the SQL Gateway inside the
        Flink repo.
        The MultipleVersionFlinkService may be inside or outside the repo.

        I add more details in the `Rejected Alternatives` in the FLIP.

         > Missing REST API spec

        Good point! I add a section in the `Appendix` about the
        serialization of
        the ResultSet. It includes how to serialize the schema, RowData and
        exceptions.

         > Consistency of concepts

        configure_session VS initialize_session

        Okay. I think it's better to use the initialize_session.

        TableInfo + UserDefinedFunctionInfo  VS  CatalogTable and
        CatalogFunction

        The CatalogTable and CatalogFunction are much heavier than the
        TableInfo
        and UserDefinedFunction. The CatalogManager requires reading
        from the
        Catalog to get the schema. But in the listTables only care about
        the table
        name, which is much lighter. Therefore, we propose to use the
        TableInfo
        with required fields.

         > Result Retrieval

        Yes. Currently the API is only used for the preview. I added a
        section in
        the `Rejected Alternatives` about this.

        Best,
        Shengkai


Reply via email to