Thanks Jark bring this discussion and organize the FLIP document.
Thanks Dawid and Timo for the feedback. Here are my thoughts.
1) I’m +1 with using column() for both cases.
2) Expression DSL vs pure SQL string for computed columns
I think we can support them both and implement the pure SQL String first,
I agree that Expression DSL brings more possibility and flexibility, but
using SQL string is a more unified way which can reuse most logic with DDL
like validation and persist in Catalog,
and Converting Expression DSL to SQL Expression is another big topic and
I did not figure out a feasible idea until now.
So, maybe we can postpone the Expression DSL support considered the
reality.
3) Methods Schema#proctime and
Schema#watermarkFor#boundedOutOfOrderTimestamps
+1 with Dawid’s proposal to offer SQL like methods.
Schema()
.column("proctime", proctime());
.watermarkFor("rowtime", $("rowtime").minus(lit(3).seconds()))
And we can simplify watermarkFor(“colName”, Expression
watermarkStrategy)to watermark(“colName”, Expression watermarkStrategy), I
think the later one has can express the meaning of “ WATERMARK FOR
column_name AS watermark_strategy_expression“ well.
5)6)7) The new keyword vs the static method vs builder pattern
I have not strong tendency, the new keyword and the static method on
descriptor can nearly treated as a builder and do same things like
builder.
For the builder pattern, we will introduce six
methods(connector.Builder()、connector.Builder.build(), format.Builder(),
format.Builder.build(), Schema.Builder(),Schema.Builder.build() ),I think
we could reduce these unnecessary methods. I ‘m slightly +1 for new
keyword if we need a choice.
8) `Connector.option(...)` class should also accept `ConfigOption`
I’m slightly -1 for this, ConfigOption may not work because the key for
format configOption has not format prefix eg: FAIL_ON_MISSING_FIELD of
json, we need “json.fail-on-missing-field” rather than
“fail-on-missing-field”.
public static final ConfigOption<Boolean> FAIL_ON_MISSING_FIELD =
ConfigOptions
.key("fail-on-missing-field")
.booleanType()
.defaultValue(false)
WDYT?
Best,
Leonard Xu
在 2020年7月15日,16:37,Timo Walther <twal...@apache.org> 写道:
Hi Jark,
thanks for working on this issue. It is time to fix this last part of
inconsistency in the API. I also like the core parts of the FLIP, esp. that
TableDescriptor is one entity that can be passed to different methods. Here
is some feedback from my side:
1) +1 for just `column(...)`
2) Expression DSL vs pure SQL string for computed columns
I agree with Dawid. Using the Expression DSL is desireable for a
consistent API. Furthermore, otherwise people need to register functions if
they want to use them in an expression. Refactoring TableSchema is
definitely on the list for 1.12. Maybe we can come up with some
intermediate solution where we transform the expression to a SQL expression
for the catalog. Until the discussions around FLIP-80 and
CatalogTableSchema have been finalized.
3) Schema#proctime and Schema#watermarkFor#boundedOutOfOrderTimestamps
We should design the descriptor very close to the SQL syntax. The more
similar the syntax the more likely it is too keep the new descriptor API
stable.
6) static method vs new keyword
Actually, the `new` keyword was one of the things that bothered me
most in the old design. Fluent APIs avoid this nowadays.
7) make the descriptors immutable with builders
The descriptors are some kind of builders already. But they are not
called "builder". Instead of coming up with the new concept of a
"descriptor", we should use terminology that people esp. Java/Scala users
are familiar with already.
We could make the descriptors immutable to pass them around easily.
Btw "Connector" and "Format" should always be in the classname. This
was also a mistake in the past. Instead of calling the descriptor just
`Kafka` we could call it `KafkaConnector`. An entire example could look
like:
tEnv.createTemporaryTable(
"OrdersInKafka",
KafkaConnector.newBuilder() // builder pattern supported by IDE
.topic("user_logs")
.property("bootstrap.servers", "localhost:9092")
.property("group.id", "test-group")
.format(JsonFormat.newInstance()) // shortcut for no parameters
.schema(
Schema.newBuilder()
.column("user_id", DataTypes.BIGINT())
.column("score", DataTypes.DECIMAL(10, 2))
.column("log_ts", DataTypes.TIMESTAMP(3))
.column("my_ts", toTimestamp($("log_ts"))
.build()
)
.build()
);
Instead of refacoring the existing classes, we could also think about
a completly new stack. I think this would avoid confusion for the old
users. We could deprecate the entire `Kafka` class instead of dealing with
backwards compatibility.
8) minor extensions
A general `Connector.option(...)` class should also accept
`ConfigOption` instead of only strings.
A `Schema.column()` should accept `AbstractDataType` that can be
resolved to a `DataType` by access to a `DataTypeFactory`.
What do you think?
Thanks,
Timo
On 09.07.20 18:51, Jark Wu wrote:
Hi Dawid,
Thanks for the great feedback! Here are my responses:
1) computedColumn(..) vs column(..)
I'm fine to use `column(..)` in both cases.
2) Expression DSL vs pure SQL string for computed columns
This is a good point. Actually, I also prefer to use Expression DSL
because
this is more Table API style.
However, this requires to modify TableSchema again to accept & expose
Expression as computed columns.
I'm not convinced about this, because AFAIK, we want to have a
CatalogTableSchema to hold this information
and don't want to extend TableSchema. Maybe Timo can give some points
here.
Besides, this will make the descriptor API can't be persisted in
Catalog
unless FLIP-80 is done.
3) Schema#proctime and Schema#watermarkFor#boundedOutOfOrderTimestamps
The original intention behind these APIs are providing shortcut APIs
for
Table API users.
But I'm also fine to only provide the DDL-like methods if you have
concerns. We can discuss shortcuts in the future if users request.
4) LikeOption
LikeOption.INCLUDING.ALL is a constant (enum values). I have added
more
description about this in the FLIP.
5) implementation?
I don't want to mention too much about implementation details in the
FLIP
at the beginning, because the API is already very long.
But I also added an "Implementation" section to explain them.
6) static method vs new keyword
Personally I prefer the new keyword because it makes the API cleaner.
If we
want remove new keyword and use static methods, we have to:
Either adding a `Schema.builder()/create()` method as the starting
method,
Or duplicating all the methods as static methods, e.g. we have 12
methods
in `Kafka`, any of them can be a starting method, then we will have 24
methods in `Kafka`.
Both are not good, and it's hard to keep all the descriptors having
the
same starting method name, but all the descriptors can start from the
same
new keyword.
Best,
Jark
On Thu, 9 Jul 2020 at 15:48, Dawid Wysakowicz <dwysakow...@apache.org
wrote:
Correction to my point 4. The example is correct. I did not read it
carefully enough. Sorry for the confusion. Nevertheless I'd still
like
to see a bit more explanation on the LikeOptions.
On 07/07/2020 04:32, Jark Wu wrote:
Hi everyone,
Leonard and I prepared a FLIP about refactoring current Descriptor
API,
i.e. TableEnvironment#connect(). We would like to propose a new
descriptor
API to register connectors in Table API.
Since Flink 1.9, the community focused more on the new SQL DDL
feature.
After a series of releases, the SQL DDL is powerful and has many
rich
features now. However, Descriptor API (the
`TableEnvironment#connect()`)
has been stagnant for a long time and missing lots of core
features, such
as computed columns and primary keys. That's frustrating for Table
API
users who want to register tables programmatically. Besides,
currently, a
connector must implement a corresponding Descriptor (e.g. `new
Kafka()`)
before using the "connect" API. Therefore, we hope to reduce this
effort
for connector developers, that custom source/sinks can be
registered via
the descriptor API without implementing a Descriptor.
These are the problems we want to resolve in this FLIP. I'm looking
forward
to your comments.
https://cwiki.apache.org/confluence/display/FLINK/FLIP-129%3A+Refactor+Descriptor+API+to+register+connector+for+Table+API
Best,
Jark