Hi Timo,

Thanks for starting the discussion! System columns are no doubt a
good boost on Flink SQL’s usability, and I see the feedbacks are
mainly concerns about the accessibility of system columns.

I think most of the concerns could be solved by clarifying the 
ownership of the system columns. Different from databases like 
Oracle/BigQuery/PG who owns the data/metadata, Flink uses the
data/metadata from external systems. That means Flink could
have 2 kinds of system columns (take ROWID for example):

1. system columns provided by external systems via catalogs, such
    as ROWID from the original system.
2. system columns generated by Flink, such as ROWID generated by
    Flink itself.

IIUC, the FLIP is proposing the 1st approach: the catalog defines what
system columns to provide, and Flink treats them as normal columns
with a special naming pattern.

On the other hand, Jark is proposing the 2nd one: the system columns
are defined and owned by Flink, and can be inferred from external 
systems. Therefore, system columns should be predefined by Flink,
and optionally implemented by the catalogs.

Personally, I’m in favor of the 2nd approach, because it makes the
system columns very accessible and more aligned across the catalogs.

BTW, I second Alexey that systems columns should not be shown with
DESCRIBE statements.

WDYT? Thanks!

Best,
Paul Lam

> 2023年7月31日 23:54,Jark Wu <imj...@gmail.com> 写道:
> 
> Hi Timo,
> 
> Thanks for your proposal. I think this is a nice feature for users and I
> prefer option 3.
> 
> I only have one concern about the concept of pseudo-column or
> system-column,
> because this is the first time we introduce it in Flink SQL. The
> confusion is similar to the
> question of Benchao and Sergey about the propagation of pseudo-column.
> 
> From my understanding, a pseudo-column can be get from an arbitrary query,
> just similar to
> ROWNUM in Oracle[1], such as :
> 
> SELECT *
> FROM (SELECT * FROM employees ORDER BY employee_id)
> WHERE ROWNUM < 11;
> 
> However, IIUC, the proposed "$rowtime" pseudo-column can only be got from
> the physical table
> and can't be got from queries even if the query propagates the rowtime
> attribute. There was also
> a discussion about adding a pseudo-column "_proctime" [2] to make lookup
> join easier to use
> which can be got from arbitrary queries. That "_proctime" may conflict with
> the proposed
> pseudo-column concept.
> 
> Did you consider making it as a built-in defined pseudo-column "$rowtime"
> which returns the
> time attribute value (if exists) or null (if non-exists) for every
> table/query, and pseudo-column
> "$proctime" always returns PROCTIME() value for each table/query. In this
> way, catalogs only need
> to provide a default rowtime attribute and users can get it in the same
> way. And we don't need
> to introduce the contract interface of "Metadata Key Prefix Constraint"
> which is still a little complex
> for users and devs to understand.
> 
> Best,
> Jark
> 
> [1]:
> https://docs.oracle.com/cd/E11882_01/server.112/e41084/pseudocolumns009.htm#SQLRF00255
> [2]: https://lists.apache.org/thread/7ln106qxyw8sp7ljq40hs2p1lb1gdwj5
> 
> 
> 
> 
> On Fri, 28 Jul 2023 at 06:18, Alexey Leonov-Vendrovskiy <
> vendrov...@gmail.com> wrote:
> 
>>> 
>>> `SELECT * FROM (SELECT $rowtime, * FROM t);`
>>> Am I right that it will show `$rowtime` in output ?
>> 
>> 
>> Yes, all explicitly selected columns become a part of the result (and
>> intermediate) schema, and hence propagate.
>> 
>> On Thu, Jul 27, 2023 at 2:40 PM Alexey Leonov-Vendrovskiy <
>> vendrov...@gmail.com> wrote:
>> 
>>> Thank you, Timo, for starting this FLIP!
>>> 
>>> I propose the following change:
>>> 
>>> Remove the requirement that DESCRIBE need to show system columns.
>>> 
>>> 
>>> Some concrete vendor specific catalog implementations might prefer this
>>> approach.
>>> Usually the same system columns are available on all (or family) of
>>> tables, and it can be easily captured in the documentation.
>>> 
>>> For example, BigQuery does exactly this: there, pseudo-columns do not
>> show
>>> up in the table schema in any place, but can be accessed via reference.
>>> 
>>> So I propose we:
>>> a) Either we say that DESCRIBE doesn't show system columns,
>>> b) Or leave this vendor-specific / or configurable via flag (if needed).
>>> 
>>> Regards,
>>> Alexey
>>> 
>>> On Thu, Jul 27, 2023 at 3:27 AM Sergey Nuyanzin <snuyan...@gmail.com>
>>> wrote:
>>> 
>>>> Hi Timo,
>>>> 
>>>> Thanks for the FLIP.
>>>> I also tend to think that Option 3 is better.
>>>> 
>>>> I would be also interested in a question mentioned by Benchao Li.
>>>> And a similar question about nested queries like
>>>> `SELECT * FROM (SELECT $rowtime, * FROM t);`
>>>> Am I right that it will show `$rowtime` in output ?
>>>> 
>>>> 
>>>> On Thu, Jul 27, 2023 at 6:58 AM Benchao Li <libenc...@apache.org>
>> wrote:
>>>> 
>>>>> Hi Timo,
>>>>> 
>>>>> Thanks for the FLIP, I also like the idea and option 3 sounds good to
>>>> me.
>>>>> 
>>>>> I would like to discuss a case which is not mentioned in the current
>>>> FLIP.
>>>>> How are the "System column"s expressed in intermediate result, e.g.
>>>> Join?
>>>>> E.g. `SELECT * FROM t1 JOIN t2`, I guess it should not include "system
>>>>> columns" from t1 and t2 as you proposed, and for `SELECT t1.$rowtime,
>> *
>>>>> FROM t1 JOIN t2`, it should also be valid.
>>>>> Then the question is how to you plan to implement the "system
>> columns",
>>>> do
>>>>> we need to add it to `RelNode` level? Or we just need to do it in the
>>>>> parsing/validating phase?
>>>>> I'm not sure that Calcite's "system column" feature is fully ready for
>>>> this
>>>>> since the code about this part is imported from the earlier project
>>>> before
>>>>> it gets into Apache, and has not been considered much in the past
>>>>> development.
>>>>> 
>>>>> 
>>>>> Jing Ge <j...@ververica.com.invalid> 于2023年7月26日周三 00:01写道:
>>>>> 
>>>>>> Hi Timo,
>>>>>> 
>>>>>> Thanks for your proposal. It is a very pragmatic feature. Among all
>>>>> options
>>>>>> in the FLIP, option 3 is one I prefer too and I'd like to ask some
>>>>>> questions to understand your thoughts.
>>>>>> 
>>>>>> 1. I did some research on pseudo columns, just out of curiosity, do
>>>> you
>>>>>> know why most SQL systems do not need any prefix with their pseudo
>>>>> column?
>>>>>> 2. Some platform providers will use ${variable_name} to define their
>>>> own
>>>>>> configurations and allow them to be embedded into SQL scripts. Will
>>>> there
>>>>>> be any conflict with option 3?
>>>>>> 
>>>>>> Best regards,
>>>>>> Jing
>>>>>> 
>>>>>> On Tue, Jul 25, 2023 at 7:00 PM Konstantin Knauf <kna...@apache.org
>>> 
>>>>>> wrote:
>>>>>> 
>>>>>>> Hi Timo,
>>>>>>> 
>>>>>>> this makes sense to me. Option 3 seems reasonable, too.
>>>>>>> 
>>>>>>> Cheers,
>>>>>>> 
>>>>>>> Konstantin
>>>>>>> 
>>>>>>> Am Di., 25. Juli 2023 um 12:53 Uhr schrieb Timo Walther <
>>>>>>> twal...@apache.org
>>>>>>>> :
>>>>>>> 
>>>>>>>> Hi everyone,
>>>>>>>> 
>>>>>>>> I would like to start a discussion about introducing the concept
>>>> of
>>>>>>>> "System Columns" in SQL and Table API.
>>>>>>>> 
>>>>>>>> The subject sounds bigger than it actually is. Luckily, Flink
>> SQL
>>>>>>>> already exposes the concept of metadata columns. And this
>>>> proposal is
>>>>>>>> just a slight adjustment for how metadata columns can be used as
>>>>> system
>>>>>>>> columns.
>>>>>>>> 
>>>>>>>> The biggest problem of metadata columns currently is that a
>>>> catalog
>>>>>>>> implementation can't provide them by default because they would
>>>>> affect
>>>>>>>> `SELECT *` when adding another one.
>>>>>>>> 
>>>>>>>> Looking forward to your feedback on FLIP-348:
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-348%3A+Support+System+Columns+in+SQL+and+Table+API
>>>>>>>> 
>>>>>>>> Thanks,
>>>>>>>> Timo
>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> --
>>>>>>> https://twitter.com/snntrable
>>>>>>> https://github.com/knaufk
>>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> 
>>>>> Best,
>>>>> Benchao Li
>>>>> 
>>>> 
>>>> 
>>>> --
>>>> Best regards,
>>>> Sergey
>>>> 
>>> 
>> 

Reply via email to