Hi all, Just to make it clear. I don't want to block the whole effort. I'm big +1 on the whole document and -0 on using the TableSchema for a projection pushdown.
My personal opinion was that TableSchema is misleading for Projection pushdown and I still think that way. That's why I wanted to bring it up to see if I am the only one. If that's the case, let's just proceed with the TableSchema. Best, Dawid On 02/04/2020 17:19, Jark Wu wrote: > Hi Timo, > > I don't think source should work with `CatalogTableSchema`. So far, a table > source doesn't need to know the logic information of computed column and > watermark. > IMO, we should provide a method to convert from `CatalogTableSchema` into > `TableSchema` without computed columns in source factory, > and a source should just hold the `TableSchema`. > > I agree doing the intersection/diff logic is trivial, but maybe we can > provide utilities to do that? So that we can keep the interface clean. > > > Best, > Jark > > > On Thu, 2 Apr 2020 at 20:17, Timo Walther <twal...@apache.org> wrote: > >> Hi Jark, >> >> if catalogs use `CatalogTableSchema` in the future. The source would >> internally also work with `CatalogTableSchema`. I'm fine with cleaning >> up the `TableSchema` class but should a source deal with two different >> schema classes then? >> >> Another problem that I see is that connectors usually need to perform >> some index arithmetics. Dealing with TableSchema and additionally within >> a field with DataType might be a bit inconvenient. A dedicated class >> with utilities might be helpful such that not every source needs to >> implement the same intersection/diff logic again. >> >> Regards, >> Timo >> >> >> On 02.04.20 14:06, Jark Wu wrote: >>> Hi Dawid, >>> >>>> How to express projections with TableSchema? >>> The TableSource holds the original TableSchema (i.e. from DDL) and the >>> pushed TableSchema represents the schema after projection. >>> Thus the table source can compare them to figure out changed field orders >>> or not matched types. >>> For most sources who maps physical storage by field names (e.g. jdbc, >>> hbase, json) they can just simply apply the pushed TableSchema. >>> But sources who maps by field indexes (e.g. csv), they need to figure out >>> the projected indexes by comparing the original and projected schema. >>> For example, the original schema is [a: String, b: Int, c: Timestamp], >> and >>> b is pruned, then the pushed schema is [a: String, c: Timestamp]. So the >>> source can figure out index=1 is pruned. >>> >>>> How do we express projection of a nested field with TableSchema? >>> This is the same to the above one. For example, the original schema is >> [rk: >>> String, f1 Row<q1 Int, q2 Double>]. >>> If `f1.q1` is pruned, the pushed schema will be [rk: String, f1 Row<q2 >>> Double>]. >>> >>>> TableSchema might be used at too many different places for different >>> responsibilities. >>> Agree. We have recognized that a structure and builder for pure table >>> schema is required in many places. But we mixed many concepts of catalog >>> table schema in TableSchema. >>> IIRC, in an offline discussion of FLIP-84, we want to introduce a new >>> `CatalogTableSchema` to represent the schema part of a DDL, >>> and remove all the watermark, computed column information from >> TableSchema? >>> Then `TableSchema` can continue to serve as a pure table schema and it >>> stays in a good package. >>> >>> Best, >>> Jark >>> >>> >>> >>> >>> On Thu, 2 Apr 2020 at 19:39, Timo Walther <twal...@apache.org> wrote: >>> >>>> Hi Dawid, >>>> >>>> thanks for your feedback. I agree with your concerns. I also observed >>>> that TableSchema might be used at too many different places for >>>> different responsibilities. >>>> >>>> How about we introduce a helper class for `SupportsProjectionPushDown` >>>> and also `LookupTableSource#Context#getKeys()` to represent nested >>>> structure of names. Data types, constraints, or computed columns are not >>>> necessary at those locations. >>>> >>>> We can also add utility methods for connectors to this helper class >>>> there to quickly figuring out differences between the original table >>>> schema and the new one. >>>> >>>> SelectedFields { >>>> >>>> private LogicalType orignalRowType; // set by the planner >>>> >>>> private int[][] indices; >>>> >>>> getNames(int... at): String[] >>>> >>>> getNames(String... at): String[] >>>> >>>> getIndices(int... at): int[] >>>> >>>> getNames(String... at): String[] >>>> >>>> toTableSchema(): TableSchema >>>> } >>>> >>>> What do others think? >>>> >>>> Thanks, >>>> Timo >>>> >>>> >>>> >>>> On 02.04.20 12:28, Dawid Wysakowicz wrote: >>>>> Generally +1 >>>>> >>>>> One slight concern I have is about the |SupportsProjectionPushDown.|I >>>>> don't necessarily understand how can we express projections with >>>>> TableSchema. It's unclear for me what happens when a type of a field >>>>> changes, fields are in a different order, when types do not match. How >>>>> do we express projection of a nested field with TableSchema? >>>>> >>>>> I don't think this changes the core design presented in the FLIP, >>>>> therefore I'm fine with accepting the FLIP. I wanted to mention my >>>>> concerns, so that maybe we can adjust the passed around structures >>>> slightly. >>>>> Best, >>>>> >>>>> Dawid >>>>> || >>>>> >>>>> On 30/03/2020 14:42, Leonard Xu wrote: >>>>>> +1(non-binding) >>>>>> >>>>>> Best, >>>>>> Leonard Xu >>>>>> >>>>>>> 在 2020年3月30日,16:43,Jingsong Li<jingsongl...@gmail.com> 写道: >>>>>>> >>>>>>> +1 >>>>>>> >>>>>>> Best, >>>>>>> Jingsong Lee >>>>>>> >>>>>>> On Mon, Mar 30, 2020 at 4:41 PM Kurt Young<k...@apache.org> wrote: >>>>>>> >>>>>>>> +1 >>>>>>>> >>>>>>>> Best, >>>>>>>> Kurt >>>>>>>> >>>>>>>> >>>>>>>> On Mon, Mar 30, 2020 at 4:08 PM Benchao Li<libenc...@gmail.com> >>>> wrote: >>>>>>>>> +1 (non-binding) >>>>>>>>> >>>>>>>>> Jark Wu<imj...@gmail.com> 于2020年3月30日周一 下午3:57写道: >>>>>>>>> >>>>>>>>>> +1 from my side. >>>>>>>>>> >>>>>>>>>> Thanks Timo for driving this. >>>>>>>>>> >>>>>>>>>> Best, >>>>>>>>>> Jark >>>>>>>>>> >>>>>>>>>> On Mon, 30 Mar 2020 at 15:36, Timo Walther<twal...@apache.org> >>>> wrote: >>>>>>>>>>> Hi all, >>>>>>>>>>> >>>>>>>>>>> I would like to start the vote for FLIP-95 [1], which is >> discussed >>>>>>>> and >>>>>>>>>>> reached a consensus in the discussion thread [2]. >>>>>>>>>>> >>>>>>>>>>> The vote will be open until April 2nd (72h), unless there is an >>>>>>>>>>> objection or not enough votes. >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> Timo >>>>>>>>>>> >>>>>>>>>>> [1] >>>>>>>>>>> >>>>>>>>>>> >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-95%3A+New+TableSource+and+TableSink+interfaces >>>>>>>>>>> [2] >>>>>>>>>>> >>>>>>>>>>> >> https://lists.apache.org/thread.html/r03cbce8996fd06c9b0406c9ddc0d271bd456f943f313b9261fa061f9%40%3Cdev.flink.apache.org%3E >>>>>>>>> -- >>>>>>>>> >>>>>>>>> Benchao Li >>>>>>>>> School of Electronics Engineering and Computer Science, Peking >>>> University >>>>>>>>> Tel:+86-15650713730 >>>>>>>>> Email:libenc...@gmail.com;libenc...@pku.edu.cn >>>>>>>>> >>>>>>> -- >>>>>>> Best, Jingsong Lee >>>> >>
signature.asc
Description: OpenPGP digital signature