Hi Jark,
I don't think many users use WatermarkSpec. UniqueConstraint could cause
some confusion but this mostly affects catalog or connector
implementers. After deprecating the old APIs it should be obvious when
an outdated interface is used. I'm fine with using a different name, do
we have a better name? But otherwise I would just maintain two classes
in different packages for a 1-2 releases.
Regards,
Timo
On 10.02.21 02:54, Jark Wu wrote:
Hi Timo,
1) I'm fine with `Column`, but are we going to introduce new interfaces
for `UniqueConstraint` and `WatermarkSpec`? If we want to introduce
a new stack, it would be better to have a different name, otherwise,
it's easy to use a wrong class for users.
Best,
Jark
On Wed, 10 Feb 2021 at 09:49, Rui Li <lirui.fu...@gmail.com> wrote:
I see. Makes sense to me. Thanks Timo for the detailed explanation!
On Tue, Feb 9, 2021 at 9:48 PM Timo Walther <twal...@apache.org> wrote:
Hi Rui,
1. It depends whether you would like to declare (unresolved) or use
(resolved) a schema. In catalogs and APIs, people would actually like to
declare a schema. Because the schema might reference objects from other
catalogs etc. However, whenever the schema comes out of the framework it
is fully resolved and people can use to configure their UI, connector,
etc.
2. No, `getTable` doesn't have to return a resolved schema. Actually,
this was my initial design (see Rejected Alternatives 1) where we pass
the SchemaResolver into the Catalog. However, a catalog must not deal
with resolution. When storing a table we need a resolved schema to
perist the fully expanded properties, however, when reading those
properties in again the schema can be resolved in a later stage.
Regards,
Timo
On 09.02.21 14:07, Rui Li wrote:
Hi Timo,
Thanks for the FLIP. It looks good to me overall. I have two questions.
1. When should we use a resolved schema and when to use an unresolved
one?
2. The FLIP mentions only resolved tables/views can be stored into a
catalog. Does that mean the getTable method should also return a
resolved
object?
On Tue, Feb 9, 2021 at 6:29 PM Timo Walther <twal...@apache.org>
wrote:
Hi Jark,
thanks for your feedback. Let me answer some of your comments:
1) Since we decided to build an entire new stack, we can also
introduce
better names for columns, constraints, and watermark spec. My goal was
to shorten the names during this refactoring. Therefore, `TableSchema`
becomes `Schema` and `TableColumn` becomes `Column`. This also fits
better to a `CatalogView` that has a schema but is actually not a
table
but a view. So `Column` is very generic. What do you think?
2) `ComputedColumn` and `WatermarkSpec` of the new generation will
store
`ResolvedExpression`.
3) I adopted most of the methods from `TableSchema` in
`ResolvedSchema`.
However, I skipped `getColumnDataTypes()` because the behavior is not
clear to me. Should it include computed columns or virtual metadata
columns? I think we should force users to think about what they
require.
Otherwise we implicitly introduce bugs.
Regards,
Timo
On 09.02.21 10:56, Jark Wu wrote:
Hi Timo,
The messy TableSchema confuses many developers.
It's great to see we can finally come up with a clean interface
hierarchy
and still backward compatible.
Thanks for preparing the nice FLIP. It looks good to me. I have some
minor
comments:
1) Should `ResolvedSchema#getColumn(int)` returns `TableColumn`
instead
of
`Column`?
2) You mentioned ResolvedSchema should store ResolvedExpression,
should
we
extend
`ComputedColumn` and `WatermarkSpec` to allow
`ResolvedExpression`?
3) `ResolvedSchema` aims to replace `TableSchema`, it would be better
to
add un-deprecated
methods of `TableSchema` into `ResolvedSchema`
(e.g. `getColumnDataTypes()`).
Then users can have a smooth migration.
Best,
Jark
On Mon, 8 Feb 2021 at 20:21, Dawid Wysakowicz <
dwysakow...@apache.org>
wrote:
Hi Timo,
From my perspective the proposed changes look good. I agree it is
an
important step towards FLIP-129 and FLIP-136. Personally I feel
comfortable voting on the document.
Best,
Dawid
On 05/02/2021 16:09, Timo Walther wrote:
Hi everyone,
you might have seen that we discussed a better schema API in past
as
part of FLIP-129 and FLIP-136. We also discussed this topic during
different releases:
https://issues.apache.org/jira/browse/FLINK-17793
Jark and I had an offline discussion how we can finally fix this
shortcoming and maintain backwards compatibile for a couple of
releases to give people time to update their code.
I would like to propose the following FLIP:
https://cwiki.apache.org/confluence/display/FLINK/FLIP-164%3A+Improve+Schema+Handling+in+Catalogs
The FLIP updates the class hierarchy to achieve the following
goals:
- make it visible whether a schema is resolved or unresolved and
when
the resolution happens
- offer a unified API for FLIP-129, FLIP-136, and catalogs
- allow arbitrary data types and expressions in the schema for
watermark spec or columns
- have access to other catalogs for declaring a data type or
expression via CatalogManager
- a cleaned up TableSchema
- remain backwards compatible in the persisted properties and API
Looking forward to your feedback.
Thanks,
Timo
--
Best regards!
Rui Li