Re: [DISCUSS] FLIP-107: Reading table columns from different parts of source records

Aljoscha Krettek Fri, 04 Sep 2020 04:37:21 -0700

I like the proposal! I didn't check the implementation section in detailbut the SQL DDL examples look good as well as the options for specifyinghow fields are mapped to keys/values look good.


Aljoscha


On 04.09.20 11:47, Dawid Wysakowicz wrote:

Hi Timo,

Thank you very much for the update. It indeed covers the full story in
more details. I agree with the proposal.

On 04/09/2020 10:48, Timo Walther wrote:

Hi everyone,

I completely reworked FLIP-107. It now covers the full story how to
read and write metadata from/to connectors and formats. It considers
all of the latest FLIPs, namely FLIP-95, FLIP-132 and FLIP-122. It
introduces the concept of PERSISTED computed columns and leaves out
partitioning for now.

Looking forward to your feedback.

Regards,
Timo


On 04.03.20 09:45, Kurt Young wrote:

Sorry, forgot one question.

4. Can we make the value.fields-include more orthogonal? Like one can
specify it as "EXCEPT_KEY, EXCEPT_TIMESTAMP".
With current  EXCEPT_KEY and EXCEPT_KEY_TIMESTAMP, users can not
config to
just ignore timestamp but keep key.

Best,
Kurt


On Wed, Mar 4, 2020 at 4:42 PM Kurt Young <[email protected]> wrote:

Hi Dawid,

I have a couple of questions around key fields, actually I also have
some
other questions but want to be focused on key fields first.

1. I don't fully understand the usage of "key.fields". Is this
option only
valid during write operation? Because for
reading, I can't imagine how such options can be applied. I would
expect
that there might be a SYSTEM_METADATA("key")
to read and assign the key to a normal field?

2. If "key.fields" is only valid in write operation, I want to
propose we
can simplify the options to not introducing key.format.type and
other related options. I think a single "key.field" (not fields)
would be
enough, users can use UDF to calculate whatever key they
want before sink.

3. Also I don't want to introduce "value.format.type" and
"value.format.xxx" with the "value" prefix. Not every connector has a
concept
of key and values. The old parameter "format.type" already good
enough to
use.

Best,
Kurt


On Tue, Mar 3, 2020 at 10:40 PM Jark Wu <[email protected]> wrote:

Thanks Dawid,

I have two more questions.

SupportsMetadata

Introducing SupportsMetadata sounds good to me. But I have some
questions
regarding to this interface.
1) How do the source know what the expected return type of each
metadata?
2) Where to put the metadata fields? Append to the existing physical
fields?
If yes, I would suggest to change the signature to `TableSource
appendMetadataFields(String[] metadataNames, DataType[]
metadataTypes)`

SYSTEM_METADATA("partition")

Can SYSTEM_METADATA() function be used nested in a computed column
expression? If yes, how to specify the return type of SYSTEM_METADATA?

Best,
Jark

On Tue, 3 Mar 2020 at 17:06, Dawid Wysakowicz <[email protected]>
wrote:

Hi,

1. I thought a bit more on how the source would emit the columns
and I
now see its not exactly the same as regular columns. I see a need to
elaborate a bit more on that in the FLIP as you asked, Jark.

I do agree mostly with Danny on how we should do that. One additional
things I would introduce is an

interface SupportsMetadata {

     boolean supportsMetadata(Set<String> metadataFields);

     TableSource generateMetadataFields(Set<String> metadataFields);

}

This way the source would have to declare/emit only the requested
metadata fields. In order not to clash with user defined fields. When
emitting the metadata field I would prepend the column name with
__system_{property_name}. Therefore when requested
SYSTEM_METADATA("partition") the source would append a field
__system_partition to the schema. This would be never visible to the
user as it would be used only for the subsequent computed columns. If
that makes sense to you, I will update the FLIP with this
description.

2. CAST vs explicit type in computed columns

Here I agree with Danny. It is also the current state of the
proposal.

3. Partitioning on computed column vs function

Here I also agree with Danny. I also think those are orthogonal. I
would
leave out the STORED computed columns out of the discussion. I
don't see
how do they relate to the partitioning. I already put both of those
cases in the document. We can either partition on a computed
column or
use a udf in a partioned by clause. I am fine with leaving out the
partitioning by udf in the first version if you still have some

concerns.


As for your question Danny. It depends which partitioning strategy
you

use.


For the HASH partitioning strategy I thought it would work as you
explained. It would be N = MOD(expr, num). I am not sure though if we
should introduce the PARTITIONS clause. Usually Flink does not own
the
data and the partitions are already an intrinsic property of the
underlying source e.g. for kafka we do not create topics, but we just
describe pre-existing pre-partitioned topic.

4. timestamp vs timestamp.field vs connector.field vs ...

I am fine with changing it to timestamp.field to be consistent with
other value.fields and key.fields. Actually that was also my initial
proposal in a first draft I prepared. I changed it afterwards to
shorten
the key.

Best,

Dawid

On 03/03/2020 09:00, Danny Chan wrote:

Thanks Dawid for bringing up this discussion, I think it is a useful

feature ~


About how the metadata outputs from source

I think it is completely orthogonal, computed column push down is

another topic, this should not be a blocker but a promotion, if we do

not

have any filters on the computed column, there is no need to do any
pushings; the source node just emit the complete record with full

metadata

with the declared physical schema, then when generating the virtual
columns, we would extract the metadata info and output as full

columns(with

full schema).


About the type of metadata column

Personally i prefer explicit type instead of CAST, they are symantic

equivalent though, explict type is more straight-forward and we can

declare

the nullable attribute there.


About option A: partitioning based on acomputed column VS option B:

partitioning with just a function


  From the FLIP, it seems that B's partitioning is just a strategy
when

writing data, the partiton column is not included in the table
schema,

so

it's just useless when reading from that.


- Compared to A, we do not need to generate the partition column
when

selecting from the table(but insert into)

- For A we can also mark the column as STORED when we want to
persist

that


So in my opition they are orthogonal, we can support both, i saw
that

MySQL/Oracle[1][2] would suggest to also define the PARTITIONS
num, and

the

partitions are managed under a "tablenamespace", the partition in
which

the

record is stored is partition number N, where N = MOD(expr, num), for

your

design, which partiton the record would persist ?


[1] https://dev.mysql.com/doc/refman/5.7/en/partitioning-hash.html
[2]

https://docs.oracle.com/database/121/VLDBG/GUID-F023D3ED-262F-4B19-950A-D3C8F8CDB4F4.htm#VLDBG1270


Best,
Danny Chan
在 2020年3月2日 +0800 PM6:16，Dawid Wysakowicz
<[email protected]

，写道：

Hi Jark,
Ad. 2 I added a section to discuss relation to FLIP-63
Ad. 3 Yes, I also tried to somewhat keep hierarchy of properties.

Therefore you have the key.format.type.

I also considered exactly what you are suggesting (prefixing with

connector or kafka). I should've put that into an Option/Rejected
alternatives.

I agree timestamp, key.*, value.* are connector properties. Why I

wanted to suggest not adding that prefix in the first version is that
actually all the properties in the WITH section are connector

properties.

Even format is in the end a connector property as some of the sources

might

not have a format, imo. The benefit of not adding the prefix is
that it
makes the keys a bit shorter. Imagine prefixing all the properties
with
connector (or if we go with FLINK-12557: elasticsearch):

elasticsearch.key.format.type: csv
elasticsearch.key.format.field: ....
elasticsearch.key.format.delimiter: ....
elasticsearch.key.format.*: ....
I am fine with doing it though if this is a preferred approach
in the

community.

Ad in-line comments:
I forgot to update the `value.fields.include` property. It
should be

value.fields-include. Which I think you also suggested in the
comment,
right?

As for the cast vs declaring output type of computed column. I
think

it's better not to use CAST, but declare a type of an expression and

later

on infer the output type of SYSTEM_METADATA. The reason is I think
this

way

it will be easier to implement e.g. filter push downs when working
with

the

native types of the source, e.g. in case of Kafka's offset, i
think it's
better to pushdown long rather than string. This could let us push
expression like e.g. offset > 12345 & offset < 59382. Otherwise we
would
have to push down cast(offset, long) > 12345 && cast(offset, long) <

59382.

Moreover I think we need to introduce the type for computed columns

anyway

to support functions that infer output type based on expected return

type.

As for the computed column push down. Yes, SYSTEM_METADATA would
have

to be pushed down to the source. If it is not possible the planner

should

fail. As far as I know computed columns push down will be part of
source
rework, won't it? ;)

As for the persisted computed column. I think it is completely

orthogonal. In my current proposal you can also partition by a
computed
column. The difference between using a udf in partitioned by vs

partitioned

by a computed column is that when you partition by a computed column

this

column must be also computed when reading the table. If you use a
udf in
the partitioned by, the expression is computed only when inserting
into

the

table.

Hope this answers some of your questions. Looking forward for
further

suggestions.

Best,
Dawid


On 02/03/2020 05:18, Jark Wu wrote:

Hi,

Thanks Dawid for starting such a great discussion. Reaing metadata

and

key-part information from source is an important feature for

streaming

users.

In general, I agree with the proposal of the FLIP.
I will leave my thoughts and comments here:

1) +1 to use connector properties instead of introducing HEADER

keyword as

the reason you mentioned in the FLIP.
2) we already introduced PARTITIONED BY in FLIP-63. Maybe we
should

add a

section to explain what's the relationship between them.
     Do their concepts conflict? Could INSERT PARTITION be used
on the
PARTITIONED table in this FLIP?
3) Currently, properties are hierarchical in Flink SQL. Shall we

make

the

new introduced properties more hierarchical?
     For example, "timestamp" => "connector.timestamp"?
(actually, I

prefer

"kafka.timestamp" which is another improvement for properties

FLINK-12557)

     A single "timestamp" in properties may mislead users that the

field

is

a rowtime attribute.

I also left some minor comments in the FLIP.

Thanks,
Jark



On Sun, 1 Mar 2020 at 22:30, Dawid Wysakowicz <

[email protected]>

wrote:

Hi,

I would like to propose an improvement that would enable reading

table

columns from different parts of source records. Besides the main

payload

majority (if not all of the sources) expose additional

information. It

can be simply a read-only metadata such as offset, ingestion time

or a

read and write  parts of the record that contain data but

additionally

serve different purposes (partitioning, compaction etc.), e.g.
key

or

timestamp in Kafka.

We should make it possible to read and write data from all of
those
locations. In this proposal I discuss reading partitioning data,

for

completeness this proposal discusses also the partitioning when

writing

data out.

I am looking forward to your comments.

You can access the FLIP here:

https://cwiki.apache.org/confluence/display/FLINK/FLIP-107%3A+Reading+table+columns+from+different+parts+of+source+records?src=contextnavpagetreemode


Best,

Dawid

Re: [DISCUSS] FLIP-107: Reading table columns from different parts of source records

Reply via email to