Hi Jark,

Thanks for starting the discussion. I think this is an important effort. I really like the general concept, but I have a few comments on the details of the FLIP.
1) I don't see any benefit in differentiating computedColumn vs column
in the method names. It only adds cognitive burden. Can we simply use
column in both cases?
2) I think we should use the Expression DSL for defining the expressions
of computed columns instead of just pure strings.
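To make the contrast concrete, here is a minimal, self-contained sketch of why a typed expression tree beats a raw string. None of these classes are Flink's; `Expr`, `FieldRef`, `Literal`, `Minus`, `$`, and `lit` are hypothetical stand-ins that only illustrate the idea:

```java
// Hypothetical sketch: a tiny typed expression tree vs. a raw string.
// None of these types are Flink classes; they only illustrate the idea
// that a DSL gives compile-time structure where a string gives none.
import java.util.Objects;

public class ExprDslSketch {

    // A minimal expression node that can render itself to SQL-like text.
    interface Expr {
        String asSql();
    }

    record FieldRef(String name) implements Expr {
        public String asSql() { return name; }
    }

    record Literal(Object value) implements Expr {
        public String asSql() { return String.valueOf(value); }
    }

    record Minus(Expr left, Expr right) implements Expr {
        public String asSql() { return left.asSql() + " - " + right.asSql(); }
    }

    // DSL entry points, mirroring the $()/lit() style of the Table API.
    static Expr $(String name) { return new FieldRef(Objects.requireNonNull(name)); }
    static Expr lit(Object v) { return new Literal(v); }

    public static void main(String[] args) {
        // With a string the column definition is opaque until runtime parsing:
        String stringBased = "rowtime - 3";
        // With the DSL the structure is known at compile time and composable:
        Expr dslBased = new Minus($("rowtime"), lit(3));
        System.out.println(dslBased.asSql());  // prints "rowtime - 3"
    }
}
```

The string can hide a typo until the planner parses it at runtime; the DSL variant fails at compile time and can be validated and composed before any parsing happens.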
3) I am not convinced about the Schema#proctime and
Schema#watermarkFor#boundedOutOfOrderTimestamps methods. That would
again make it different from the SQL DDL, where we do:

  proctime AS PROCTIME()
  WATERMARK FOR rowtime AS rowtime - INTERVAL '3' SECOND

respectively. Even if we do provide those helper methods, I think the SQL
way should be the recommended approach. I would rather see it as:
  Schema()
    .column("proctime", proctime())
    .watermarkFor("rowtime", $("rowtime").minus(lit(3).seconds()))
    // or .watermarkFor("rowtime", $("rowtime").minus(Duration.ofSeconds(3)))
    // once we properly support interval types
4) I think the section about the LIKE clause requires a second pass.
The example is wrong. Moreover, I am not sure what
LikeOption.INCLUDING.ALL is supposed to be. Is this a constant? Is it
some kind of builder?
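To show where my confusion comes from, here are the two readings such an expression could have in Java. This is purely illustrative; none of these types exist in Flink or in the FLIP, and the names are mine:

```java
// Two plausible readings of "LikeOption.INCLUDING.ALL" — purely
// illustrative, not Flink code. The FLIP should say which one it means.
public class LikeOptionSketch {

    // Reading 1: INCLUDING is a namespace holding plain constants.
    static final class Including {
        static final String ALL = "INCLUDING ALL";
        static final String OPTIONS = "INCLUDING OPTIONS";
    }

    // Reading 2: a builder where each call returns a configured option.
    static final class LikeOption {
        private final String clause;

        private LikeOption(String clause) { this.clause = clause; }

        static LikeOption including(String feature) {
            return new LikeOption("INCLUDING " + feature);
        }

        String clause() { return clause; }
    }

    public static void main(String[] args) {
        System.out.println(Including.ALL);                    // constant reading
        System.out.println(LikeOption.including("ALL").clause()); // builder reading
    }
}
```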
5) The classes like TableDescriptor/Schema described in the FLIP cover
the user-facing helper methods. They do not, however, define the contract
that the planner/TableEnvironment expects. How will the planner use
those classes to create actual tables? From the point of view of
TableEnvironment#createTable or TableEnvironment#from, none of the
methods listed in the FLIP is necessary. At the same time, there are no
methods to retrieve the Schema/Options etc., which are required to
actually create the Table.
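To illustrate what I mean by a contract, here is a hypothetical shape the planner-facing side could take. The `TableDescriptor` below is NOT the class from the FLIP; all names are mine and only sketch the missing accessor side:

```java
// Hypothetical sketch of the accessor methods the planner would need.
// This TableDescriptor is not the FLIP's class, just an illustration.
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

public class DescriptorContractSketch {

    static final class TableDescriptor {
        private final Map<String, String> options = new LinkedHashMap<>();

        // User-facing builder method, in the spirit of the FLIP.
        TableDescriptor option(String key, String value) {
            options.put(key, value);
            return this;
        }

        // The missing piece: a getter the planner/TableEnvironment can read.
        Map<String, String> getOptions() {
            return Collections.unmodifiableMap(options);
        }
    }

    public static void main(String[] args) {
        TableDescriptor descriptor = new TableDescriptor()
                .option("connector", "kafka")
                .option("topic", "orders");
        // Without such getters, the planner has no way to turn the
        // descriptor into an actual catalog table.
        System.out.println(descriptor.getOptions());
    }
}
```

The point is not these particular method names, but that the FLIP should spell out whichever retrieval contract TableEnvironment#createTable will rely on.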
6) Lastly, how about we try to remove the `new` keyword from the syntax?
Personally, I'd very much prefer to have it like:

  Schema
    .column()
    .column()

or

  Connector
    .type("kafka") // I am not strong on the "type" keyword here;
                   // I can also see for/of/...
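A self-contained sketch of how the no-`new` style could be realized with a static factory entry point. All names here are hypothetical, not Flink's API; a separate builder class is used because Java does not allow a static and an instance method with the same signature in one class:

```java
// Hypothetical: a fluent builder entered via a static factory, so user
// code never writes `new`. Not Flink's API, just the pattern I mean.
import java.util.ArrayList;
import java.util.List;

public class NoNewSketch {

    // The mutable builder; users obtain it via Schema.column(...) below.
    static final class SchemaBuilder {
        private final List<String> columns = new ArrayList<>();

        SchemaBuilder column(String name) {
            columns.add(name);
            return this;
        }

        List<String> columnNames() {
            return List.copyOf(columns);
        }
    }

    // Static entry point: `Schema.column(...)` starts the chain, so the
    // `new` keyword never appears in user code.
    static final class Schema {
        private Schema() {}

        static SchemaBuilder column(String name) {
            return new SchemaBuilder().column(name);
        }
    }

    public static void main(String[] args) {
        SchemaBuilder schema = Schema
                .column("id")
                .column("name");
        System.out.println(schema.columnNames());  // prints "[id, name]"
    }
}
```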
Best,
Dawid
On 07/07/2020 04:32, Jark Wu wrote:
> Hi everyone,
>
> Leonard and I prepared a FLIP about refactoring current Descriptor API,
> i.e. TableEnvironment#connect(). We would like to propose a new descriptor
> API to register connectors in Table API.
>
> Since Flink 1.9, the community focused more on the new SQL DDL feature.
> After a series of releases, the SQL DDL is powerful and has many rich
> features now. However, Descriptor API (the `TableEnvironment#connect()`)
> has been stagnant for a long time and missing lots of core features, such
> as computed columns and primary keys. That's frustrating for Table API
> users who want to register tables programmatically. Besides, currently, a
> connector must implement a corresponding Descriptor (e.g. `new Kafka()`)
> before using the "connect" API. Therefore, we hope to reduce this effort
> for connector developers, so that custom sources/sinks can be registered
> via the descriptor API without implementing a Descriptor.
>
> These are the problems we want to resolve in this FLIP. I'm looking forward
> to your comments.
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-129%3A+Refactor+Descriptor+API+to+register+connector+for+Table+API
>
> Best,
> Jark
>
