[RESULT][VOTE] FLIP-95: New TableSource and TableSink interfaces

2020-04-03 Thread Timo Walther

Hi all,

The voting time for FLIP-95 has passed. I'm closing the vote now.

There were 6 +1 votes, 4 of which are binding:
- Jark (binding)
- Benchao (non-binding)
- Kurt (binding)
- Jingsong (binding)
- Leonard (non-binding)
- Dawid (binding)

There were no disapproving votes.

Thus, FLIP-95 has been accepted.

Thanks everyone for joining the discussion and giving feedback!

Best,
Timo


On 03.04.20 10:52, Timo Walther wrote:

Hi Dawid,

thanks for your response. Let's postpone the discussion around a single 
interface into a later issue. It seems that we reached a consensus on 
the remaining interfaces. Given the size of the FLIP, I think it is fine 
to conclude the voting now.


Thanks,
Timo

On 03.04.20 10:42, Dawid Wysakowicz wrote:

Hi all,

Just to make it clear. I don't want to block the whole effort. I'm big
+1 on the whole document and -0 on using the TableSchema for a
projection pushdown.

My personal opinion was that TableSchema is misleading for Projection
pushdown and I still think that way. That's why I wanted to bring it up
to see if I am the only one. If that's the case, let's just proceed with
the TableSchema.

Best,

Dawid

On 02/04/2020 17:19, Jark Wu wrote:

Hi Timo,

I don't think source should work with `CatalogTableSchema`. So far, a 
table

source doesn't need to know the logic information of computed column and
watermark.
IMO, we should provide a method to convert from `CatalogTableSchema` 
into

`TableSchema` without computed columns in source factory,
and a source should just hold the `TableSchema`.

I agree doing the intersection/diff logic is trivial, but maybe we can
provide utilities to do that? So that we can keep the interface clean.


Best,
Jark


On Thu, 2 Apr 2020 at 20:17, Timo Walther  wrote:


Hi Jark,

if catalogs use `CatalogTableSchema` in the future. The source would
internally also work with `CatalogTableSchema`. I'm fine with cleaning
up the `TableSchema` class but should a source deal with two different
schema classes then?

Another problem that I see is that connectors usually need to perform
some index arithmetics. Dealing with TableSchema and additionally 
within

a field with DataType might be a bit inconvenient. A dedicated class
with utilities might be helpful such that not every source needs to
implement the same intersection/diff logic again.

Regards,
Timo


On 02.04.20 14:06, Jark Wu wrote:

Hi Dawid,


How to express projections with TableSchema?

The TableSource holds the original TableSchema (i.e. from DDL) and the
pushed TableSchema represents the schema after projection.
Thus the table source can compare them to figure out changed field 
orders

or not matched types.
For most sources who maps physical storage by field names (e.g. jdbc,
hbase, json) they can just simply apply the pushed TableSchema.
But sources who maps by field indexes (e.g. csv), they need to 
figure out

the projected indexes by comparing the original and projected schema.
For example, the original schema is [a: String, b: Int, c: Timestamp],

and
b is pruned, then the pushed schema is [a: String, c: Timestamp]. 
So the

source can figure out index=1 is pruned.


How do we express projection of a nested field with TableSchema?

This is the same to the above one. For example, the original schema is

[rk:

String, f1 Row].
If `f1.q1` is pruned, the pushed schema will be [rk: String, f1 Row].


TableSchema might be used at too many different places for different

responsibilities.
Agree. We have recognized that a structure and builder for pure table
schema is required in many places. But we mixed many concepts of 
catalog

table schema in TableSchema.
IIRC, in an offline discussion of FLIP-84, we want to introduce a new
`CatalogTableSchema` to represent the schema part of a DDL,
and remove all the watermark, computed column information from

TableSchema?

Then `TableSchema` can continue to serve as a pure table schema and it
stays in a good package.

Best,
Jark




On Thu, 2 Apr 2020 at 19:39, Timo Walther  wrote:


Hi Dawid,

thanks for your feedback. I agree with your concerns. I also observed
that TableSchema might be used at too many different places for
different responsibilities.

How about we introduce a helper class for 
`SupportsProjectionPushDown`

and also `LookupTableSource#Context#getKeys()` to represent nested
structure of names. Data types, constraints, or computed columns 
are not

necessary at those locations.

We can also add utility methods for connectors to this helper class
there to quickly figuring out differences between the original table
schema and the new one.

SelectedFields {

  private LogicalType orignalRowType; // set by the planner

  private int[][] indices;

  getNames(int... at): String[]

  getNames(String... at): String[]

  getIndices(int... at): int[]

  getNames(String... at): String[]

  toTableSchema(): TableSchema
}

What do others think?

Thanks,
Timo



On 02.04.20 12:28, Dawid W

Re: [VOTE] FLIP-95: New TableSource and TableSink interfaces

2020-04-03 Thread Timo Walther

Hi Dawid,

thanks for your response. Let's postpone the discussion around a single 
interface into a later issue. It seems that we reached a consensus on 
the remaining interfaces. Given the size of the FLIP, I think it is fine 
to conclude the voting now.


Thanks,
Timo

On 03.04.20 10:42, Dawid Wysakowicz wrote:

Hi all,

Just to make it clear. I don't want to block the whole effort. I'm big
+1 on the whole document and -0 on using the TableSchema for a
projection pushdown.

My personal opinion was that TableSchema is misleading for Projection
pushdown and I still think that way. That's why I wanted to bring it up
to see if I am the only one. If that's the case, let's just proceed with
the TableSchema.

Best,

Dawid

On 02/04/2020 17:19, Jark Wu wrote:

Hi Timo,

I don't think source should work with `CatalogTableSchema`. So far, a table
source doesn't need to know the logic information of computed column and
watermark.
IMO, we should provide a method to convert from `CatalogTableSchema` into
`TableSchema` without computed columns in source factory,
and a source should just hold the `TableSchema`.

I agree doing the intersection/diff logic is trivial, but maybe we can
provide utilities to do that? So that we can keep the interface clean.


Best,
Jark


On Thu, 2 Apr 2020 at 20:17, Timo Walther  wrote:


Hi Jark,

if catalogs use `CatalogTableSchema` in the future. The source would
internally also work with `CatalogTableSchema`. I'm fine with cleaning
up the `TableSchema` class but should a source deal with two different
schema classes then?

Another problem that I see is that connectors usually need to perform
some index arithmetics. Dealing with TableSchema and additionally within
a field with DataType might be a bit inconvenient. A dedicated class
with utilities might be helpful such that not every source needs to
implement the same intersection/diff logic again.

Regards,
Timo


On 02.04.20 14:06, Jark Wu wrote:

Hi Dawid,


How to express projections with TableSchema?

The TableSource holds the original TableSchema (i.e. from DDL) and the
pushed TableSchema represents the schema after projection.
Thus the table source can compare them to figure out changed field orders
or not matched types.
For most sources who maps physical storage by field names (e.g. jdbc,
hbase, json) they can just simply apply the pushed TableSchema.
But sources who maps by field indexes (e.g. csv), they need to figure out
the projected indexes by comparing the original and projected schema.
For example, the original schema is [a: String, b: Int, c: Timestamp],

and

b is pruned, then the pushed schema is [a: String, c: Timestamp]. So the
source can figure out index=1 is pruned.


How do we express projection of a nested field with TableSchema?

This is the same to the above one. For example, the original schema is

[rk:

String, f1 Row].
If `f1.q1` is pruned, the pushed schema will be [rk: String, f1 Row].


TableSchema might be used at too many different places for different

responsibilities.
Agree. We have recognized that a structure and builder for pure table
schema is required in many places. But we mixed many concepts of catalog
table schema in TableSchema.
IIRC, in an offline discussion of FLIP-84, we want to introduce a new
`CatalogTableSchema` to represent the schema part of a DDL,
and remove all the watermark, computed column information from

TableSchema?

Then `TableSchema` can continue to serve as a pure table schema and it
stays in a good package.

Best,
Jark




On Thu, 2 Apr 2020 at 19:39, Timo Walther  wrote:


Hi Dawid,

thanks for your feedback. I agree with your concerns. I also observed
that TableSchema might be used at too many different places for
different responsibilities.

How about we introduce a helper class for `SupportsProjectionPushDown`
and also `LookupTableSource#Context#getKeys()` to represent nested
structure of names. Data types, constraints, or computed columns are not
necessary at those locations.

We can also add utility methods for connectors to this helper class
there to quickly figuring out differences between the original table
schema and the new one.

SelectedFields {

  private LogicalType orignalRowType; // set by the planner

  private int[][] indices;

  getNames(int... at): String[]

  getNames(String... at): String[]

  getIndices(int... at): int[]

  getNames(String... at): String[]

  toTableSchema(): TableSchema
}

What do others think?

Thanks,
Timo



On 02.04.20 12:28, Dawid Wysakowicz wrote:

Generally +1

One slight concern I have is about the |SupportsProjectionPushDown.|I
don't necessarily understand how can we express projections with
TableSchema. It's unclear for me what happens when a type of a field
changes, fields are in a different order, when types do not match. How
do we express projection of a nested field with TableSchema?

I don't think this changes the core design presented in the FLIP,
therefore I'm 

Re: [VOTE] FLIP-95: New TableSource and TableSink interfaces

2020-04-03 Thread Dawid Wysakowicz
Hi all,

Just to make it clear. I don't want to block the whole effort. I'm big
+1 on the whole document and -0 on using the TableSchema for a
projection pushdown.

My personal opinion was that TableSchema is misleading for Projection
pushdown and I still think that way. That's why I wanted to bring it up
to see if I am the only one. If that's the case, let's just proceed with
the TableSchema.

Best,

Dawid

On 02/04/2020 17:19, Jark Wu wrote:
> Hi Timo,
>
> I don't think source should work with `CatalogTableSchema`. So far, a table
> source doesn't need to know the logic information of computed column and
> watermark.
> IMO, we should provide a method to convert from `CatalogTableSchema` into
> `TableSchema` without computed columns in source factory,
> and a source should just hold the `TableSchema`.
>
> I agree doing the intersection/diff logic is trivial, but maybe we can
> provide utilities to do that? So that we can keep the interface clean.
>
>
> Best,
> Jark
>
>
> On Thu, 2 Apr 2020 at 20:17, Timo Walther  wrote:
>
>> Hi Jark,
>>
>> if catalogs use `CatalogTableSchema` in the future. The source would
>> internally also work with `CatalogTableSchema`. I'm fine with cleaning
>> up the `TableSchema` class but should a source deal with two different
>> schema classes then?
>>
>> Another problem that I see is that connectors usually need to perform
>> some index arithmetics. Dealing with TableSchema and additionally within
>> a field with DataType might be a bit inconvenient. A dedicated class
>> with utilities might be helpful such that not every source needs to
>> implement the same intersection/diff logic again.
>>
>> Regards,
>> Timo
>>
>>
>> On 02.04.20 14:06, Jark Wu wrote:
>>> Hi Dawid,
>>>
 How to express projections with TableSchema?
>>> The TableSource holds the original TableSchema (i.e. from DDL) and the
>>> pushed TableSchema represents the schema after projection.
>>> Thus the table source can compare them to figure out changed field orders
>>> or not matched types.
>>> For most sources who maps physical storage by field names (e.g. jdbc,
>>> hbase, json) they can just simply apply the pushed TableSchema.
>>> But sources who maps by field indexes (e.g. csv), they need to figure out
>>> the projected indexes by comparing the original and projected schema.
>>> For example, the original schema is [a: String, b: Int, c: Timestamp],
>> and
>>> b is pruned, then the pushed schema is [a: String, c: Timestamp]. So the
>>> source can figure out index=1 is pruned.
>>>
 How do we express projection of a nested field with TableSchema?
>>> This is the same to the above one. For example, the original schema is
>> [rk:
>>> String, f1 Row].
>>> If `f1.q1` is pruned, the pushed schema will be [rk: String, f1 Row>> Double>].
>>>
 TableSchema might be used at too many different places for different
>>> responsibilities.
>>> Agree. We have recognized that a structure and builder for pure table
>>> schema is required in many places. But we mixed many concepts of catalog
>>> table schema in TableSchema.
>>> IIRC, in an offline discussion of FLIP-84, we want to introduce a new
>>> `CatalogTableSchema` to represent the schema part of a DDL,
>>> and remove all the watermark, computed column information from
>> TableSchema?
>>> Then `TableSchema` can continue to serve as a pure table schema and it
>>> stays in a good package.
>>>
>>> Best,
>>> Jark
>>>
>>>
>>>
>>>
>>> On Thu, 2 Apr 2020 at 19:39, Timo Walther  wrote:
>>>
 Hi Dawid,

 thanks for your feedback. I agree with your concerns. I also observed
 that TableSchema might be used at too many different places for
 different responsibilities.

 How about we introduce a helper class for `SupportsProjectionPushDown`
 and also `LookupTableSource#Context#getKeys()` to represent nested
 structure of names. Data types, constraints, or computed columns are not
 necessary at those locations.

 We can also add utility methods for connectors to this helper class
 there to quickly figuring out differences between the original table
 schema and the new one.

 SelectedFields {

  private LogicalType orignalRowType; // set by the planner

  private int[][] indices;

  getNames(int... at): String[]

  getNames(String... at): String[]

  getIndices(int... at): int[]

  getNames(String... at): String[]

  toTableSchema(): TableSchema
 }

 What do others think?

 Thanks,
 Timo



 On 02.04.20 12:28, Dawid Wysakowicz wrote:
> Generally +1
>
> One slight concern I have is about the |SupportsProjectionPushDown.|I
> don't necessarily understand how can we express projections with
> TableSchema. It's unclear for me what happens when a type of a field
> changes, fields are in a different order, when types do not match. How
> do we expr

Re: [VOTE] FLIP-95: New TableSource and TableSink interfaces

2020-04-02 Thread Jark Wu
Hi Timo,

I don't think source should work with `CatalogTableSchema`. So far, a table
source doesn't need to know the logic information of computed column and
watermark.
IMO, we should provide a method to convert from `CatalogTableSchema` into
`TableSchema` without computed columns in source factory,
and a source should just hold the `TableSchema`.

I agree doing the intersection/diff logic is trivial, but maybe we can
provide utilities to do that? So that we can keep the interface clean.


Best,
Jark


On Thu, 2 Apr 2020 at 20:17, Timo Walther  wrote:

> Hi Jark,
>
> if catalogs use `CatalogTableSchema` in the future. The source would
> internally also work with `CatalogTableSchema`. I'm fine with cleaning
> up the `TableSchema` class but should a source deal with two different
> schema classes then?
>
> Another problem that I see is that connectors usually need to perform
> some index arithmetics. Dealing with TableSchema and additionally within
> a field with DataType might be a bit inconvenient. A dedicated class
> with utilities might be helpful such that not every source needs to
> implement the same intersection/diff logic again.
>
> Regards,
> Timo
>
>
> On 02.04.20 14:06, Jark Wu wrote:
> > Hi Dawid,
> >
> >> How to express projections with TableSchema?
> > The TableSource holds the original TableSchema (i.e. from DDL) and the
> > pushed TableSchema represents the schema after projection.
> > Thus the table source can compare them to figure out changed field orders
> > or not matched types.
> > For most sources who maps physical storage by field names (e.g. jdbc,
> > hbase, json) they can just simply apply the pushed TableSchema.
> > But sources who maps by field indexes (e.g. csv), they need to figure out
> > the projected indexes by comparing the original and projected schema.
> > For example, the original schema is [a: String, b: Int, c: Timestamp],
> and
> > b is pruned, then the pushed schema is [a: String, c: Timestamp]. So the
> > source can figure out index=1 is pruned.
> >
> >> How do we express projection of a nested field with TableSchema?
> > This is the same to the above one. For example, the original schema is
> [rk:
> > String, f1 Row].
> > If `f1.q1` is pruned, the pushed schema will be [rk: String, f1 Row > Double>].
> >
> >> TableSchema might be used at too many different places for different
> > responsibilities.
> > Agree. We have recognized that a structure and builder for pure table
> > schema is required in many places. But we mixed many concepts of catalog
> > table schema in TableSchema.
> > IIRC, in an offline discussion of FLIP-84, we want to introduce a new
> > `CatalogTableSchema` to represent the schema part of a DDL,
> > and remove all the watermark, computed column information from
> TableSchema?
> > Then `TableSchema` can continue to serve as a pure table schema and it
> > stays in a good package.
> >
> > Best,
> > Jark
> >
> >
> >
> >
> > On Thu, 2 Apr 2020 at 19:39, Timo Walther  wrote:
> >
> >> Hi Dawid,
> >>
> >> thanks for your feedback. I agree with your concerns. I also observed
> >> that TableSchema might be used at too many different places for
> >> different responsibilities.
> >>
> >> How about we introduce a helper class for `SupportsProjectionPushDown`
> >> and also `LookupTableSource#Context#getKeys()` to represent nested
> >> structure of names. Data types, constraints, or computed columns are not
> >> necessary at those locations.
> >>
> >> We can also add utility methods for connectors to this helper class
> >> there to quickly figuring out differences between the original table
> >> schema and the new one.
> >>
> >> SelectedFields {
> >>
> >>  private LogicalType orignalRowType; // set by the planner
> >>
> >>  private int[][] indices;
> >>
> >>  getNames(int... at): String[]
> >>
> >>  getNames(String... at): String[]
> >>
> >>  getIndices(int... at): int[]
> >>
> >>  getNames(String... at): String[]
> >>
> >>  toTableSchema(): TableSchema
> >> }
> >>
> >> What do others think?
> >>
> >> Thanks,
> >> Timo
> >>
> >>
> >>
> >> On 02.04.20 12:28, Dawid Wysakowicz wrote:
> >>> Generally +1
> >>>
> >>> One slight concern I have is about the |SupportsProjectionPushDown.|I
> >>> don't necessarily understand how can we express projections with
> >>> TableSchema. It's unclear for me what happens when a type of a field
> >>> changes, fields are in a different order, when types do not match. How
> >>> do we express projection of a nested field with TableSchema?
> >>>
> >>> I don't think this changes the core design presented in the FLIP,
> >>> therefore I'm fine with accepting the FLIP. I wanted to mention my
> >>> concerns, so that maybe we can adjust the passed around structures
> >> slightly.
> >>>
> >>> Best,
> >>>
> >>> Dawid
> >>> ||
> >>>
> >>> On 30/03/2020 14:42, Leonard Xu wrote:
>  +1(non-binding)
> 
>  Best,
>  Leonard Xu
> 
> > 在 2020年3月30日,16:43,Jingsong Li  写道:
> 

Re: [VOTE] FLIP-95: New TableSource and TableSink interfaces

2020-04-02 Thread Timo Walther

Hi Jark,

if catalogs use `CatalogTableSchema` in the future. The source would 
internally also work with `CatalogTableSchema`. I'm fine with cleaning 
up the `TableSchema` class but should a source deal with two different 
schema classes then?


Another problem that I see is that connectors usually need to perform 
some index arithmetics. Dealing with TableSchema and additionally within 
a field with DataType might be a bit inconvenient. A dedicated class 
with utilities might be helpful such that not every source needs to 
implement the same intersection/diff logic again.


Regards,
Timo


On 02.04.20 14:06, Jark Wu wrote:

Hi Dawid,


How to express projections with TableSchema?

The TableSource holds the original TableSchema (i.e. from DDL) and the
pushed TableSchema represents the schema after projection.
Thus the table source can compare them to figure out changed field orders
or not matched types.
For most sources who maps physical storage by field names (e.g. jdbc,
hbase, json) they can just simply apply the pushed TableSchema.
But sources who maps by field indexes (e.g. csv), they need to figure out
the projected indexes by comparing the original and projected schema.
For example, the original schema is [a: String, b: Int, c: Timestamp], and
b is pruned, then the pushed schema is [a: String, c: Timestamp]. So the
source can figure out index=1 is pruned.


How do we express projection of a nested field with TableSchema?

This is the same to the above one. For example, the original schema is [rk:
String, f1 Row].
If `f1.q1` is pruned, the pushed schema will be [rk: String, f1 Row].


TableSchema might be used at too many different places for different

responsibilities.
Agree. We have recognized that a structure and builder for pure table
schema is required in many places. But we mixed many concepts of catalog
table schema in TableSchema.
IIRC, in an offline discussion of FLIP-84, we want to introduce a new
`CatalogTableSchema` to represent the schema part of a DDL,
and remove all the watermark, computed column information from TableSchema?
Then `TableSchema` can continue to serve as a pure table schema and it
stays in a good package.

Best,
Jark




On Thu, 2 Apr 2020 at 19:39, Timo Walther  wrote:


Hi Dawid,

thanks for your feedback. I agree with your concerns. I also observed
that TableSchema might be used at too many different places for
different responsibilities.

How about we introduce a helper class for `SupportsProjectionPushDown`
and also `LookupTableSource#Context#getKeys()` to represent nested
structure of names. Data types, constraints, or computed columns are not
necessary at those locations.

We can also add utility methods for connectors to this helper class
there to quickly figuring out differences between the original table
schema and the new one.

SelectedFields {

 private LogicalType orignalRowType; // set by the planner

 private int[][] indices;

 getNames(int... at): String[]

 getNames(String... at): String[]

 getIndices(int... at): int[]

 getNames(String... at): String[]

 toTableSchema(): TableSchema
}

What do others think?

Thanks,
Timo



On 02.04.20 12:28, Dawid Wysakowicz wrote:

Generally +1

One slight concern I have is about the |SupportsProjectionPushDown.|I
don't necessarily understand how can we express projections with
TableSchema. It's unclear for me what happens when a type of a field
changes, fields are in a different order, when types do not match. How
do we express projection of a nested field with TableSchema?

I don't think this changes the core design presented in the FLIP,
therefore I'm fine with accepting the FLIP. I wanted to mention my
concerns, so that maybe we can adjust the passed around structures

slightly.


Best,

Dawid
||

On 30/03/2020 14:42, Leonard Xu wrote:

+1(non-binding)

Best,
Leonard Xu


在 2020年3月30日,16:43,Jingsong Li  写道:

+1

Best,
Jingsong Lee

On Mon, Mar 30, 2020 at 4:41 PM Kurt Young  wrote:


+1

Best,
Kurt


On Mon, Mar 30, 2020 at 4:08 PM Benchao Li

wrote:



+1 (non-binding)

Jark Wu  于2020年3月30日周一 下午3:57写道:


+1 from my side.

Thanks Timo for driving this.

Best,
Jark

On Mon, 30 Mar 2020 at 15:36, Timo Walther

wrote:



Hi all,

I would like to start the vote for FLIP-95 [1], which is discussed

and

reached a consensus in the discussion thread [2].

The vote will be open until April 2nd (72h), unless there is an
objection or not enough votes.

Thanks,
Timo

[1]





https://cwiki.apache.org/confluence/display/FLINK/FLIP-95%3A+New+TableSource+and+TableSink+interfaces

[2]





https://lists.apache.org/thread.html/r03cbce8996fd06c9b0406c9ddc0d271bd456f943f313b9261fa061f9%40%3Cdev.flink.apache.org%3E

--

Benchao Li
School of Electronics Engineering and Computer Science, Peking

University

Tel:+86-15650713730
Email:libenc...@gmail.com;libenc...@pku.edu.cn


--
Best, Jingsong Lee









Re: [VOTE] FLIP-95: New TableSource and TableSink interfaces

2020-04-02 Thread Jark Wu
Hi Dawid,

> How to express projections with TableSchema?
The TableSource holds the original TableSchema (i.e. from DDL) and the
pushed TableSchema represents the schema after projection.
Thus the table source can compare them to figure out changed field orders
or not matched types.
For most sources who maps physical storage by field names (e.g. jdbc,
hbase, json) they can just simply apply the pushed TableSchema.
But sources who maps by field indexes (e.g. csv), they need to figure out
the projected indexes by comparing the original and projected schema.
For example, the original schema is [a: String, b: Int, c: Timestamp], and
b is pruned, then the pushed schema is [a: String, c: Timestamp]. So the
source can figure out index=1 is pruned.

> How do we express projection of a nested field with TableSchema?
This is the same to the above one. For example, the original schema is [rk:
String, f1 Row].
If `f1.q1` is pruned, the pushed schema will be [rk: String, f1 Row].

> TableSchema might be used at too many different places for different
responsibilities.
Agree. We have recognized that a structure and builder for pure table
schema is required in many places. But we mixed many concepts of catalog
table schema in TableSchema.
IIRC, in an offline discussion of FLIP-84, we want to introduce a new
`CatalogTableSchema` to represent the schema part of a DDL,
and remove all the watermark, computed column information from TableSchema?
Then `TableSchema` can continue to serve as a pure table schema and it
stays in a good package.

Best,
Jark




On Thu, 2 Apr 2020 at 19:39, Timo Walther  wrote:

> Hi Dawid,
>
> thanks for your feedback. I agree with your concerns. I also observed
> that TableSchema might be used at too many different places for
> different responsibilities.
>
> How about we introduce a helper class for `SupportsProjectionPushDown`
> and also `LookupTableSource#Context#getKeys()` to represent nested
> structure of names. Data types, constraints, or computed columns are not
> necessary at those locations.
>
> We can also add utility methods for connectors to this helper class
> there to quickly figuring out differences between the original table
> schema and the new one.
>
> SelectedFields {
>
> private LogicalType orignalRowType; // set by the planner
>
> private int[][] indices;
>
> getNames(int... at): String[]
>
> getNames(String... at): String[]
>
> getIndices(int... at): int[]
>
> getNames(String... at): String[]
>
> toTableSchema(): TableSchema
> }
>
> What do others think?
>
> Thanks,
> Timo
>
>
>
> On 02.04.20 12:28, Dawid Wysakowicz wrote:
> > Generally +1
> >
> > One slight concern I have is about the |SupportsProjectionPushDown.|I
> > don't necessarily understand how can we express projections with
> > TableSchema. It's unclear for me what happens when a type of a field
> > changes, fields are in a different order, when types do not match. How
> > do we express projection of a nested field with TableSchema?
> >
> > I don't think this changes the core design presented in the FLIP,
> > therefore I'm fine with accepting the FLIP. I wanted to mention my
> > concerns, so that maybe we can adjust the passed around structures
> slightly.
> >
> > Best,
> >
> > Dawid
> > ||
> >
> > On 30/03/2020 14:42, Leonard Xu wrote:
> >> +1(non-binding)
> >>
> >> Best,
> >> Leonard Xu
> >>
> >>> 在 2020年3月30日,16:43,Jingsong Li  写道:
> >>>
> >>> +1
> >>>
> >>> Best,
> >>> Jingsong Lee
> >>>
> >>> On Mon, Mar 30, 2020 at 4:41 PM Kurt Young  wrote:
> >>>
>  +1
> 
>  Best,
>  Kurt
> 
> 
>  On Mon, Mar 30, 2020 at 4:08 PM Benchao Li
> wrote:
> 
> > +1 (non-binding)
> >
> > Jark Wu  于2020年3月30日周一 下午3:57写道:
> >
> >> +1 from my side.
> >>
> >> Thanks Timo for driving this.
> >>
> >> Best,
> >> Jark
> >>
> >> On Mon, 30 Mar 2020 at 15:36, Timo Walther
> wrote:
> >>
> >>> Hi all,
> >>>
> >>> I would like to start the vote for FLIP-95 [1], which is discussed
>  and
> >>> reached a consensus in the discussion thread [2].
> >>>
> >>> The vote will be open until April 2nd (72h), unless there is an
> >>> objection or not enough votes.
> >>>
> >>> Thanks,
> >>> Timo
> >>>
> >>> [1]
> >>>
> >>>
> 
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-95%3A+New+TableSource+and+TableSink+interfaces
> >>> [2]
> >>>
> >>>
> 
> https://lists.apache.org/thread.html/r03cbce8996fd06c9b0406c9ddc0d271bd456f943f313b9261fa061f9%40%3Cdev.flink.apache.org%3E
> > --
> >
> > Benchao Li
> > School of Electronics Engineering and Computer Science, Peking
> University
> > Tel:+86-15650713730
> > Email:libenc...@gmail.com;libenc...@pku.edu.cn
> >
> >>> --
> >>> Best, Jingsong Lee
>
>


Re: [VOTE] FLIP-95: New TableSource and TableSink interfaces

2020-04-02 Thread Timo Walther

Hi Dawid,

thanks for your feedback. I agree with your concerns. I also observed 
that TableSchema might be used at too many different places for 
different responsibilities.


How about we introduce a helper class for `SupportsProjectionPushDown` 
and also `LookupTableSource#Context#getKeys()` to represent nested 
structure of names. Data types, constraints, or computed columns are not 
necessary at those locations.


We can also add utility methods for connectors to this helper class 
there to quickly figuring out differences between the original table 
schema and the new one.


SelectedFields {

private LogicalType orignalRowType; // set by the planner

private int[][] indices;

getNames(int... at): String[]

getNames(String... at): String[]

getIndices(int... at): int[]

getNames(String... at): String[]

toTableSchema(): TableSchema
}

What do others think?

Thanks,
Timo



On 02.04.20 12:28, Dawid Wysakowicz wrote:

Generally +1

One slight concern I have is about the |SupportsProjectionPushDown.|I 
don't necessarily understand how can we express projections with 
TableSchema. It's unclear for me what happens when a type of a field 
changes, fields are in a different order, when types do not match. How 
do we express projection of a nested field with TableSchema?


I don't think this changes the core design presented in the FLIP, 
therefore I'm fine with accepting the FLIP. I wanted to mention my 
concerns, so that maybe we can adjust the passed around structures slightly.


Best,

Dawid
||

On 30/03/2020 14:42, Leonard Xu wrote:

+1(non-binding)

Best,
Leonard Xu


在 2020年3月30日,16:43,Jingsong Li  写道:

+1

Best,
Jingsong Lee

On Mon, Mar 30, 2020 at 4:41 PM Kurt Young  wrote:


+1

Best,
Kurt


On Mon, Mar 30, 2020 at 4:08 PM Benchao Li  wrote:


+1 (non-binding)

Jark Wu  于2020年3月30日周一 下午3:57写道:


+1 from my side.

Thanks Timo for driving this.

Best,
Jark

On Mon, 30 Mar 2020 at 15:36, Timo Walther  wrote:


Hi all,

I would like to start the vote for FLIP-95 [1], which is discussed

and

reached a consensus in the discussion thread [2].

The vote will be open until April 2nd (72h), unless there is an
objection or not enough votes.

Thanks,
Timo

[1]



https://cwiki.apache.org/confluence/display/FLINK/FLIP-95%3A+New+TableSource+and+TableSink+interfaces

[2]



https://lists.apache.org/thread.html/r03cbce8996fd06c9b0406c9ddc0d271bd456f943f313b9261fa061f9%40%3Cdev.flink.apache.org%3E

--

Benchao Li
School of Electronics Engineering and Computer Science, Peking University
Tel:+86-15650713730
Email:libenc...@gmail.com;libenc...@pku.edu.cn


--
Best, Jingsong Lee




Re: [VOTE] FLIP-95: New TableSource and TableSink interfaces

2020-04-02 Thread Dawid Wysakowicz
Generally +1

One slight concern I have is about the |SupportsProjectionPushDown.|I
don't necessarily understand how can we express projections with
TableSchema. It's unclear for me what happens when a type of a field
changes, fields are in a different order, when types do not match. How
do we express projection of a nested field with TableSchema?

I don't think this changes the core design presented in the FLIP,
therefore I'm fine with accepting the FLIP. I wanted to mention my
concerns, so that maybe we can adjust the passed around structures slightly.

Best,

Dawid
||

On 30/03/2020 14:42, Leonard Xu wrote:
> +1(non-binding)
>
> Best,
> Leonard Xu
>
>> 在 2020年3月30日,16:43,Jingsong Li  写道:
>>
>> +1
>>
>> Best,
>> Jingsong Lee
>>
>> On Mon, Mar 30, 2020 at 4:41 PM Kurt Young  wrote:
>>
>>> +1
>>>
>>> Best,
>>> Kurt
>>>
>>>
>>> On Mon, Mar 30, 2020 at 4:08 PM Benchao Li  wrote:
>>>
 +1 (non-binding)

 Jark Wu  于2020年3月30日周一 下午3:57写道:

> +1 from my side.
>
> Thanks Timo for driving this.
>
> Best,
> Jark
>
> On Mon, 30 Mar 2020 at 15:36, Timo Walther  wrote:
>
>> Hi all,
>>
>> I would like to start the vote for FLIP-95 [1], which is discussed
>>> and
>> reached a consensus in the discussion thread [2].
>>
>> The vote will be open until April 2nd (72h), unless there is an
>> objection or not enough votes.
>>
>> Thanks,
>> Timo
>>
>> [1]
>>
>>
>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-95%3A+New+TableSource+and+TableSink+interfaces
>> [2]
>>
>>
>>> https://lists.apache.org/thread.html/r03cbce8996fd06c9b0406c9ddc0d271bd456f943f313b9261fa061f9%40%3Cdev.flink.apache.org%3E

 --

 Benchao Li
 School of Electronics Engineering and Computer Science, Peking University
 Tel:+86-15650713730
 Email: libenc...@gmail.com; libenc...@pku.edu.cn

>>
>> -- 
>> Best, Jingsong Lee


signature.asc
Description: OpenPGP digital signature


Re: [VOTE] FLIP-95: New TableSource and TableSink interfaces

2020-03-30 Thread Leonard Xu
+1(non-binding)

Best,
Leonard Xu

> 在 2020年3月30日,16:43,Jingsong Li  写道:
> 
> +1
> 
> Best,
> Jingsong Lee
> 
> On Mon, Mar 30, 2020 at 4:41 PM Kurt Young  wrote:
> 
>> +1
>> 
>> Best,
>> Kurt
>> 
>> 
>> On Mon, Mar 30, 2020 at 4:08 PM Benchao Li  wrote:
>> 
>>> +1 (non-binding)
>>> 
>>> Jark Wu  于2020年3月30日周一 下午3:57写道:
>>> 
 +1 from my side.
 
 Thanks Timo for driving this.
 
 Best,
 Jark
 
 On Mon, 30 Mar 2020 at 15:36, Timo Walther  wrote:
 
> Hi all,
> 
> I would like to start the vote for FLIP-95 [1], which is discussed
>> and
> reached a consensus in the discussion thread [2].
> 
> The vote will be open until April 2nd (72h), unless there is an
> objection or not enough votes.
> 
> Thanks,
> Timo
> 
> [1]
> 
> 
 
>>> 
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-95%3A+New+TableSource+and+TableSink+interfaces
> [2]
> 
> 
 
>>> 
>> https://lists.apache.org/thread.html/r03cbce8996fd06c9b0406c9ddc0d271bd456f943f313b9261fa061f9%40%3Cdev.flink.apache.org%3E
> 
 
>>> 
>>> 
>>> --
>>> 
>>> Benchao Li
>>> School of Electronics Engineering and Computer Science, Peking University
>>> Tel:+86-15650713730
>>> Email: libenc...@gmail.com; libenc...@pku.edu.cn
>>> 
>> 
> 
> 
> -- 
> Best, Jingsong Lee



Re: [VOTE] FLIP-95: New TableSource and TableSink interfaces

2020-03-30 Thread Jingsong Li
+1

Best,
Jingsong Lee

On Mon, Mar 30, 2020 at 4:41 PM Kurt Young  wrote:

> +1
>
> Best,
> Kurt
>
>
> On Mon, Mar 30, 2020 at 4:08 PM Benchao Li  wrote:
>
> > +1 (non-binding)
> >
> > Jark Wu  于2020年3月30日周一 下午3:57写道:
> >
> > > +1 from my side.
> > >
> > > Thanks Timo for driving this.
> > >
> > > Best,
> > > Jark
> > >
> > > On Mon, 30 Mar 2020 at 15:36, Timo Walther  wrote:
> > >
> > > > Hi all,
> > > >
> > > > I would like to start the vote for FLIP-95 [1], which is discussed
> and
> > > > reached a consensus in the discussion thread [2].
> > > >
> > > > The vote will be open until April 2nd (72h), unless there is an
> > > > objection or not enough votes.
> > > >
> > > > Thanks,
> > > > Timo
> > > >
> > > > [1]
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-95%3A+New+TableSource+and+TableSink+interfaces
> > > > [2]
> > > >
> > > >
> > >
> >
> https://lists.apache.org/thread.html/r03cbce8996fd06c9b0406c9ddc0d271bd456f943f313b9261fa061f9%40%3Cdev.flink.apache.org%3E
> > > >
> > >
> >
> >
> > --
> >
> > Benchao Li
> > School of Electronics Engineering and Computer Science, Peking University
> > Tel:+86-15650713730
> > Email: libenc...@gmail.com; libenc...@pku.edu.cn
> >
>


-- 
Best, Jingsong Lee


Re: [VOTE] FLIP-95: New TableSource and TableSink interfaces

2020-03-30 Thread Kurt Young
+1

Best,
Kurt


On Mon, Mar 30, 2020 at 4:08 PM Benchao Li  wrote:

> +1 (non-binding)
>
> Jark Wu  于2020年3月30日周一 下午3:57写道:
>
> > +1 from my side.
> >
> > Thanks Timo for driving this.
> >
> > Best,
> > Jark
> >
> > On Mon, 30 Mar 2020 at 15:36, Timo Walther  wrote:
> >
> > > Hi all,
> > >
> > > I would like to start the vote for FLIP-95 [1], which is discussed and
> > > reached a consensus in the discussion thread [2].
> > >
> > > The vote will be open until April 2nd (72h), unless there is an
> > > objection or not enough votes.
> > >
> > > Thanks,
> > > Timo
> > >
> > > [1]
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-95%3A+New+TableSource+and+TableSink+interfaces
> > > [2]
> > >
> > >
> >
> https://lists.apache.org/thread.html/r03cbce8996fd06c9b0406c9ddc0d271bd456f943f313b9261fa061f9%40%3Cdev.flink.apache.org%3E
> > >
> >
>
>
> --
>
> Benchao Li
> School of Electronics Engineering and Computer Science, Peking University
> Tel:+86-15650713730
> Email: libenc...@gmail.com; libenc...@pku.edu.cn
>


Re: [VOTE] FLIP-95: New TableSource and TableSink interfaces

2020-03-30 Thread Benchao Li
+1 (non-binding)

Jark Wu  于2020年3月30日周一 下午3:57写道:

> +1 from my side.
>
> Thanks Timo for driving this.
>
> Best,
> Jark
>
> On Mon, 30 Mar 2020 at 15:36, Timo Walther  wrote:
>
> > Hi all,
> >
> > I would like to start the vote for FLIP-95 [1], which is discussed and
> > reached a consensus in the discussion thread [2].
> >
> > The vote will be open until April 2nd (72h), unless there is an
> > objection or not enough votes.
> >
> > Thanks,
> > Timo
> >
> > [1]
> >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-95%3A+New+TableSource+and+TableSink+interfaces
> > [2]
> >
> >
> https://lists.apache.org/thread.html/r03cbce8996fd06c9b0406c9ddc0d271bd456f943f313b9261fa061f9%40%3Cdev.flink.apache.org%3E
> >
>


-- 

Benchao Li
School of Electronics Engineering and Computer Science, Peking University
Tel:+86-15650713730
Email: libenc...@gmail.com; libenc...@pku.edu.cn


Re: [VOTE] FLIP-95: New TableSource and TableSink interfaces

2020-03-30 Thread Jark Wu
+1 from my side.

Thanks Timo for driving this.

Best,
Jark

On Mon, 30 Mar 2020 at 15:36, Timo Walther  wrote:

> Hi all,
>
> I would like to start the vote for FLIP-95 [1], which is discussed and
> reached a consensus in the discussion thread [2].
>
> The vote will be open until April 2nd (72h), unless there is an
> objection or not enough votes.
>
> Thanks,
> Timo
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-95%3A+New+TableSource+and+TableSink+interfaces
> [2]
>
> https://lists.apache.org/thread.html/r03cbce8996fd06c9b0406c9ddc0d271bd456f943f313b9261fa061f9%40%3Cdev.flink.apache.org%3E
>


[VOTE] FLIP-95: New TableSource and TableSink interfaces

2020-03-30 Thread Timo Walther

Hi all,

I would like to start the vote for FLIP-95 [1], which is discussed and
reached a consensus in the discussion thread [2].

The vote will be open until April 2nd (72h), unless there is an 
objection or not enough votes.


Thanks,
Timo

[1] 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-95%3A+New+TableSource+and+TableSink+interfaces
[2] 
https://lists.apache.org/thread.html/r03cbce8996fd06c9b0406c9ddc0d271bd456f943f313b9261fa061f9%40%3Cdev.flink.apache.org%3E