Re: [DISCUSS] FLIP-348: Support System Columns in SQL and Table API

Alexey Leonov-Vendrovskiy Fri, 04 Aug 2023 14:07:14 -0700

Looks like both kinds of system columns can converge.
We can say that any operator can introduce system (psedo) columns.


cc Eugene who is also interested in the subject.

On Wed, Aug 2, 2023 at 1:03 AM Paul Lam <paullin3...@gmail.com> wrote:

> Hi Timo,
>
> Thanks for starting the discussion! System columns are no doubt a
> good boost on Flink SQL’s usability, and I see the feedbacks are
> mainly concerns about the accessibility of system columns.
>
> I think most of the concerns could be solved by clarifying the
> ownership of the system columns. Different from databases like
> Oracle/BigQuery/PG who owns the data/metadata, Flink uses the
> data/metadata from external systems. That means Flink could
> have 2 kinds of system columns (take ROWID for example):
>
> 1. system columns provided by external systems via catalogs, such
>     as ROWID from the original system.
> 2. system columns generated by Flink, such as ROWID generated by
>     Flink itself.
>
> IIUC, the FLIP is proposing the 1st approach: the catalog defines what
> system columns to provide, and Flink treats them as normal columns
> with a special naming pattern.
>
> On the other hand, Jark is proposing the 2nd one: the system columns
> are defined and owned by Flink, and can be inferred from external
> systems. Therefore, system columns should be predefined by Flink,
> and optionally implemented by the catalogs.
>
> Personally, I’m in favor of the 2nd approach, because it makes the
> system columns very accessible and more aligned across the catalogs.
>
> BTW, I second Alexey that systems columns should not be shown with
> DESCRIBE statements.
>
> WDYT? Thanks!
>
> Best,
> Paul Lam
>
> > 2023年7月31日 23:54，Jark Wu <imj...@gmail.com> 写道：
> >
> > Hi Timo,
> >
> > Thanks for your proposal. I think this is a nice feature for users and I
> > prefer option 3.
> >
> > I only have one concern about the concept of pseudo-column or
> > system-column,
> > because this is the first time we introduce it in Flink SQL. The
> > confusion is similar to the
> > question of Benchao and Sergey about the propagation of pseudo-column.
> >
> > From my understanding, a pseudo-column can be get from an arbitrary
> query,
> > just similar to
> > ROWNUM in Oracle[1], such as :
> >
> > SELECT *
> > FROM (SELECT * FROM employees ORDER BY employee_id)
> > WHERE ROWNUM < 11;
> >
> > However, IIUC, the proposed "$rowtime" pseudo-column can only be got from
> > the physical table
> > and can't be got from queries even if the query propagates the rowtime
> > attribute. There was also
> > a discussion about adding a pseudo-column "_proctime" [2] to make lookup
> > join easier to use
> > which can be got from arbitrary queries. That "_proctime" may conflict
> with
> > the proposed
> > pseudo-column concept.
> >
> > Did you consider making it as a built-in defined pseudo-column "$rowtime"
> > which returns the
> > time attribute value (if exists) or null (if non-exists) for every
> > table/query, and pseudo-column
> > "$proctime" always returns PROCTIME() value for each table/query. In this
> > way, catalogs only need
> > to provide a default rowtime attribute and users can get it in the same
> > way. And we don't need
> > to introduce the contract interface of "Metadata Key Prefix Constraint"
> > which is still a little complex
> > for users and devs to understand.
> >
> > Best,
> > Jark
> >
> > [1]:
> >
> https://docs.oracle.com/cd/E11882_01/server.112/e41084/pseudocolumns009.htm#SQLRF00255
> > [2]: https://lists.apache.org/thread/7ln106qxyw8sp7ljq40hs2p1lb1gdwj5
> >
> >
> >
> >
> > On Fri, 28 Jul 2023 at 06:18, Alexey Leonov-Vendrovskiy <
> > vendrov...@gmail.com> wrote:
> >
> >>>
> >>> `SELECT * FROM (SELECT $rowtime, * FROM t);`
> >>> Am I right that it will show `$rowtime` in output ?
> >>
> >>
> >> Yes, all explicitly selected columns become a part of the result (and
> >> intermediate) schema, and hence propagate.
> >>
> >> On Thu, Jul 27, 2023 at 2:40 PM Alexey Leonov-Vendrovskiy <
> >> vendrov...@gmail.com> wrote:
> >>
> >>> Thank you, Timo, for starting this FLIP!
> >>>
> >>> I propose the following change:
> >>>
> >>> Remove the requirement that DESCRIBE need to show system columns.
> >>>
> >>>
> >>> Some concrete vendor specific catalog implementations might prefer this
> >>> approach.
> >>> Usually the same system columns are available on all (or family) of
> >>> tables, and it can be easily captured in the documentation.
> >>>
> >>> For example, BigQuery does exactly this: there, pseudo-columns do not
> >> show
> >>> up in the table schema in any place, but can be accessed via reference.
> >>>
> >>> So I propose we:
> >>> a) Either we say that DESCRIBE doesn't show system columns,
> >>> b) Or leave this vendor-specific / or configurable via flag (if
> needed).
> >>>
> >>> Regards,
> >>> Alexey
> >>>
> >>> On Thu, Jul 27, 2023 at 3:27 AM Sergey Nuyanzin <snuyan...@gmail.com>
> >>> wrote:
> >>>
> >>>> Hi Timo,
> >>>>
> >>>> Thanks for the FLIP.
> >>>> I also tend to think that Option 3 is better.
> >>>>
> >>>> I would be also interested in a question mentioned by Benchao Li.
> >>>> And a similar question about nested queries like
> >>>> `SELECT * FROM (SELECT $rowtime, * FROM t);`
> >>>> Am I right that it will show `$rowtime` in output ?
> >>>>
> >>>>
> >>>> On Thu, Jul 27, 2023 at 6:58 AM Benchao Li <libenc...@apache.org>
> >> wrote:
> >>>>
> >>>>> Hi Timo,
> >>>>>
> >>>>> Thanks for the FLIP, I also like the idea and option 3 sounds good to
> >>>> me.
> >>>>>
> >>>>> I would like to discuss a case which is not mentioned in the current
> >>>> FLIP.
> >>>>> How are the "System column"s expressed in intermediate result, e.g.
> >>>> Join?
> >>>>> E.g. `SELECT * FROM t1 JOIN t2`, I guess it should not include
> "system
> >>>>> columns" from t1 and t2 as you proposed, and for `SELECT t1.$rowtime,
> >> *
> >>>>> FROM t1 JOIN t2`, it should also be valid.
> >>>>> Then the question is how to you plan to implement the "system
> >> columns",
> >>>> do
> >>>>> we need to add it to `RelNode` level? Or we just need to do it in the
> >>>>> parsing/validating phase?
> >>>>> I'm not sure that Calcite's "system column" feature is fully ready
> for
> >>>> this
> >>>>> since the code about this part is imported from the earlier project
> >>>> before
> >>>>> it gets into Apache, and has not been considered much in the past
> >>>>> development.
> >>>>>
> >>>>>
> >>>>> Jing Ge <j...@ververica.com.invalid> 于2023年7月26日周三 00:01写道：
> >>>>>
> >>>>>> Hi Timo,
> >>>>>>
> >>>>>> Thanks for your proposal. It is a very pragmatic feature. Among all
> >>>>> options
> >>>>>> in the FLIP, option 3 is one I prefer too and I'd like to ask some
> >>>>>> questions to understand your thoughts.
> >>>>>>
> >>>>>> 1. I did some research on pseudo columns, just out of curiosity, do
> >>>> you
> >>>>>> know why most SQL systems do not need any prefix with their pseudo
> >>>>> column?
> >>>>>> 2. Some platform providers will use ${variable_name} to define their
> >>>> own
> >>>>>> configurations and allow them to be embedded into SQL scripts. Will
> >>>> there
> >>>>>> be any conflict with option 3?
> >>>>>>
> >>>>>> Best regards,
> >>>>>> Jing
> >>>>>>
> >>>>>> On Tue, Jul 25, 2023 at 7:00 PM Konstantin Knauf <kna...@apache.org
> >>>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Hi Timo,
> >>>>>>>
> >>>>>>> this makes sense to me. Option 3 seems reasonable, too.
> >>>>>>>
> >>>>>>> Cheers,
> >>>>>>>
> >>>>>>> Konstantin
> >>>>>>>
> >>>>>>> Am Di., 25. Juli 2023 um 12:53 Uhr schrieb Timo Walther <
> >>>>>>> twal...@apache.org
> >>>>>>>> :
> >>>>>>>
> >>>>>>>> Hi everyone,
> >>>>>>>>
> >>>>>>>> I would like to start a discussion about introducing the concept
> >>>> of
> >>>>>>>> "System Columns" in SQL and Table API.
> >>>>>>>>
> >>>>>>>> The subject sounds bigger than it actually is. Luckily, Flink
> >> SQL
> >>>>>>>> already exposes the concept of metadata columns. And this
> >>>> proposal is
> >>>>>>>> just a slight adjustment for how metadata columns can be used as
> >>>>> system
> >>>>>>>> columns.
> >>>>>>>>
> >>>>>>>> The biggest problem of metadata columns currently is that a
> >>>> catalog
> >>>>>>>> implementation can't provide them by default because they would
> >>>>> affect
> >>>>>>>> `SELECT *` when adding another one.
> >>>>>>>>
> >>>>>>>> Looking forward to your feedback on FLIP-348:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-348%3A+Support+System+Columns+in+SQL+and+Table+API
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>> Timo
> >>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> --
> >>>>>>> https://twitter.com/snntrable
> >>>>>>> https://github.com/knaufk
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>>
> >>>>> Best,
> >>>>> Benchao Li
> >>>>>
> >>>>
> >>>>
> >>>> --
> >>>> Best regards,
> >>>> Sergey
> >>>>
> >>>
> >>
>
>

Re: [DISCUSS] FLIP-348: Support System Columns in SQL and Table API

Reply via email to