Thanks for your feedback, Kurt and Jark.

I'm wondering why we allow registering instances such as `val bf1 = new BloomFilter(0.001)`, `val bf2 = new BloomFilter(0.005)`. This is not very SQL-like.

Parameters should be passed when calling the function instead.

Users can use the query itself to parameterize the function: `SELECT bf(0.0001, a), bf(0.005, a)`


If anything, we could think about a `CREATE FUNCTION bf1 WITH (precision=0.001)` DDL syntax. But with registering instances we might shoot ourselves in the foot in the future.

Usually, functions exist as classes; it seems unnatural to me that users need to instantiate a function before registering it.
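
Just to sketch what I mean (BloomFilter is a made-up aggregate function with a default constructor, and createTemporaryFunction(path, Class) is the signature I suggested earlier in this thread):

    // tEnv is a TableEnvironment, "t" is a registered table with a column "a"
    tEnv.createTemporaryFunction("bf", classOf[BloomFilter])

    // the precision is a regular call argument, so a single registration
    // covers both precisions within the same query
    val result = tEnv.sqlQuery("SELECT bf(0.001, a), bf(0.005, a) FROM t")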


What do you think?

Thanks,
Timo

On 11.10.19 05:10, Jark Wu wrote:
Hi, Timo

Regarding createTemporaryFunction, I agree with Kurt: we should encourage the class way, but we should keep the instance way.
The reason is that this is an important feature for supporting parameterized functions, as you said, but it can't be addressed by global job parameters and the open() method, because users may want to instantiate more than one instance, e.g. `val bf1 = new BloomFilter(0.001)` and `val bf2 = new BloomFilter(0.005)`, and use both in one query.

Another reason is that this can't make the function call string serializable, because in the Scala Table API path users can still pass in a function instance, e.g. table.select(bf1('a)).
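
To make that concrete, here is the kind of usage I mean (BloomFilter is just a made-up parameterized function):

    // two differently parameterized instances of the same function class,
    // both used in the same Table API query
    val bf1 = new BloomFilter(0.001)
    val bf2 = new BloomFilter(0.005)

    table.select(bf1('a), bf2('a))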

Best,
Jark

On Thu, 10 Oct 2019 at 23:09, Kurt Young <ykt...@gmail.com> wrote:

Regarding createTemporaryFunction, could we just keep both choices? I agree we should encourage users to use the class, and it's the only choice when using DDL. But for some Table API & SQL programs, I think it's actually more natural to use an object instead of a class. But this just depends on personal preference. One use case that came to mind which might be affected by this is if we want to support anonymous lambda functions as UDFs. That would need such create-function-with-objects support.
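
For example, something with this shape (not supported today, just to illustrate: the anonymous instance has no standalone class we could hand to a class-based method):

    // hypothetical: an anonymous function instance
    val toUpper = new ScalarFunction {
      def eval(s: String): String = s.toUpperCase
    }

    // only an object/instance-based registration could accept this
    tEnv.registerFunction("toUpper", toUpper)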

Best,
Kurt


On Thu, Oct 10, 2019 at 4:51 PM Dawid Wysakowicz <dwysakow...@apache.org>
wrote:

Thanks for the comments.

@Timo I think your suggestion makes sense. I changed the signature to

void createTemporaryFunction(String path, Class<? extends UserDefinedFunction> functionClass);

This means that once we make QueryOperations string serializable and expose the type inference in functions, we will have only a single kind of object that cannot be persisted in a catalog and thus has only a "Temporary" method:

void createTemporaryView(String path, DataStream stream);
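
For example, usage would look roughly like this (just a sketch; MyScalarFunction and dataStream are placeholders):

    // class-based registration of a function
    tEnv.createTemporaryFunction("my_func", classOf[MyScalarFunction])

    // a DataStream-backed view is inherently temporary and cannot be
    // persisted in a catalog
    tEnv.createTemporaryView("my_view", dataStream)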

I updated the FLIP page accordingly. I also explicitly listed the methods that we need to deprecate and added a note that the implementation of the methods concerning UserDefinedFunctions has to be postponed until we have the type inference in place.

As the voting period for FLIP-57 has ended, I will start the VOTE thread for FLIP-64 tomorrow morning, unless there are objections before then.

Best,

Dawid


On 10/10/2019 16:16, Kurt Young wrote:

Thanks for the clarification Timo, that sounds good to me. Let's continue to discuss the other things.

Best,
Kurt


On Thu, Oct 10, 2019 at 4:14 PM Timo Walther <twal...@apache.org> wrote:

The purpose of ConnectTableDescriptor was to have a programmatic way of
expressing `CREATE TABLE` with JavaDocs and IDE support. But I agree
that it needs some rework in the future, once we have finalized the DDL
to ensure that both concepts are in sync.

Regards,
Timo


On 10.10.19 16:08, Kurt Young wrote:

Regarding ConnectTableDescriptor: if in the end it becomes a builder of CatalogTable, I would be OK with that. But it doesn't look like a builder now; it's more like another form of TableSource/TableSink.

Best,
Kurt


On Thu, Oct 10, 2019 at 3:44 PM Timo Walther <twal...@apache.org> wrote:

Hi everyone,

thanks for the great discussion with a good outcome.

I have one last comment about:

void createTemporaryFunction(String path, UserDefinedFunction function);

We should encourage users to register a class instead of an instance. We
should enforce a default constructor in the future. If we need metadata
or type inference, the planner can simply instantiate the class and call
the corresponding getters. If people would like to parameterize a
function, they can use global job parameters instead - which are
available in the open() method.
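
A rough sketch of the job-parameter approach (the function class and the parameter key are made up):

    import org.apache.flink.table.functions.{FunctionContext, ScalarFunction}

    class BloomFilterFunction extends ScalarFunction {
      private var precision: Double = _

      override def open(context: FunctionContext): Unit = {
        // read the precision from the global job parameters instead of
        // receiving it via the constructor
        precision = context.getJobParameter("bloomfilter.precision", "0.001").toDouble
      }

      def eval(value: String): Boolean = {
        // ... use `precision` to decide on probable membership ...
        true
      }
    }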

I suggest:

void createTemporaryFunction(String path, Class<? extends UserDefinedFunction> functionClass);

This is also closer to the SQL DDL that is about to be implemented in:
https://issues.apache.org/jira/browse/FLINK-7151

Thanks,
Timo


On 09.10.19 17:07, Bowen Li wrote:

Hi Dawid,

+1 for proposed changes

On Wed, Oct 9, 2019 at 12:15 PM Dawid Wysakowicz <dwysakow...@apache.org> wrote:


Sorry for a very delayed response.

@Kurt Yes, the goal is to have a function created like new Function(...) also be wrapped into a CatalogFunction. This would have to be a temporary function though, as we cannot represent it as a set of properties. Similar to createTemporaryView(DataStream stream).

As for the ConnectTableDescriptor I agree this is very similar to
CatalogTable. I am not sure though if we should get rid of it. In the
end I see it as a builder for a CatalogTable, which is a slightly more
internal API, but we might revisit that some time in the future if we
find that it makes more sense.

@All I updated the FLIP page with some more details from the outcome of the discussions around FLIP-57. Please take a look. I would like to start a vote on this FLIP as soon as the vote on FLIP-57 goes through.

Best,

Dawid


On 19/09/2019 09:24, Kurt Young wrote:

IIUC it's good to see that both serializable (described by DDL) and unserializable (with a DataStream underneath) tables are treated uniformly as CatalogTable.

Can I also assume that functions that come either from a function class (from DDL) or from function objects (newed by the user) will also be treated uniformly as CatalogFunction?

This will greatly simplify and unify the current API-level concepts and design.

And it seems only one thing is left: how do we deal with ConnectTableDescriptor? It's actually very similar to a serializable CatalogTable; both carry some text properties which are even the same. Is there any chance we can further unify this into CatalogTable?

Best,
Kurt


On Thu, Sep 19, 2019 at 3:13 PM Jark Wu <imj...@gmail.com> wrote:

Thanks Dawid for the design doc.

In general, I’m +1 to the FLIP.


+1 to the single-string and parse way to express object path.

+1 to deprecate registerTableSink & registerTableSource.
But I would suggest providing an easy way to register a custom source/sink before we drop them (this is another story). Currently, it's not easy to implement a custom connector descriptor.

Best,
Jark



On Sep 19, 2019, at 11:37, Dawid Wysakowicz <wysakowicz.da...@gmail.com> wrote:

Hi JingsongLee,

From my understanding they can. Underneath they will be CatalogTables. The difference is the lifetime of the tables. Plus, some of the user-facing interfaces cannot be persisted, e.g. DataStream, therefore we must have separate methods for those. In the end the temporary tables are held in memory as CatalogTables.

Best,
Dawid

On Thu, 19 Sep 2019, 10:08 JingsongLee <lzljs3620...@aliyun.com.invalid> wrote:


Hi Dawid,

Can temporary tables achieve the same capabilities as catalog tables?
Like statistics: CatalogTableStatistics, CatalogColumnStatistics, PartitionStatistics.
Like partition support: we have added some catalog-equivalent interfaces on TableSource/TableSink: getPartitions, getPartitionFieldNames.

Maybe it's not a good idea to add these interfaces to TableSource/TableSink. What do you think?

Best,
Jingsong Lee


------------------------------------------------------------------
From: Kurt Young <ykt...@gmail.com>
Send Time: Sep 18, 2019 (Wed) 17:54
To: dev <dev@flink.apache.org>
Subject: Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table module

Hi all,

Sorry to join this party late. Big +1 to this FLIP, especially for the dropping of "registerTableSink & registerTableSource". These are indeed legacy and we should try to unify them through CatalogTable after we introduce the concept of Catalog.

From my understanding, what we register should all be metadata; TableSource/TableSink should only be the one responsible for doing the real work, i.e. reading and writing data according to the schema and other information like computed columns, partitions, etc.

Best,
Kurt


On Wed, Sep 18, 2019 at 5:14 PM JingsongLee <lzljs3620...@aliyun.com.invalid> wrote:


After some development and thinking, I have a general understanding.

+1 to "registering a source/sink does not fit into the SQL world".
I am OK with having a deprecated registerTemporarySource/Sink to stay compatible with the old ways.

Best,
Jingsong Lee




------------------------------------------------------------------
From: Timo Walther <twal...@apache.org>
Send Time: Sep 17, 2019 (Tue) 08:00
To: dev <dev@flink.apache.org>
Subject: Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table module

Hi Dawid,

thanks for the design document. It fixes big conceptual gaps that exist due to historical reasons, with proper support for serializability and catalogs in mind.

I would not mind a registerTemporarySource/Sink, but the problem that I see is that many people think that this is the recommended way of registering a table source/sink, which is not true. We should guide users to either the connect() or DDL API, which can be validated and stored in a catalog.

Also from a conceptual perspective, registering a source/sink does not fit into the SQL world. SQL does not know about sources/sinks but only about tables. If the responsibility of a TableSource/TableSink is just that of a pure physical data consumer/producer that is not connected to the actual logical table schema, we would need a possibility of defining time attributes and interpreting/converting a changelog. This should be done by the framework with information from the DDL/connect() and not be defined in every table source.

Regards,
Timo


On 09.09.19 14:16, JingsongLee wrote:

Hi Dawid,

It is difficult to describe specific examples. Sometimes users will generate some Java converters through Java code, or generate some Java classes through third-party libraries. Of course, these can best be done through properties, but this requires additional work from users. My suggestion is to keep this Java-instance way, which is user-friendly.

Best,
Jingsong Lee




------------------------------------------------------------------
From: Dawid Wysakowicz <dwysakow...@apache.org>
Send Time: Sep 6, 2019 (Fri) 16:21
To: dev <dev@flink.apache.org>
Subject: Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table module

Hi all,

@Jingsong Could you elaborate a bit more on what you mean by "some Connectors are difficult to convert all states to properties"? All the Flink-provided connectors will definitely be expressible with properties (in the end you should be able to use them from DDL). I think a TableSource that is complex enough to handle filter push down, partition support, etc. should rather be made available both from DDL & Java/Scala code. I'm happy to reconsider adding registerTemporaryTable(String path, TableSource source) if you have some concrete examples in mind.

@Xuefu: We also considered the ObjectIdentifier (or actually introducing a new identifier representation to differentiate between resolved and unresolved identifiers) with the same concerns. We decided to suggest the string & parsing logic because of usability.

       tEnv.from("cat.db.table")

is shorter and easier to write than

       tEnv.from(Identifier.for("cat", "db", "name"))

And it also implicitly solves the problem of what happens if a user (e.g. one used to other systems) uses that API in the following manner:

       tEnv.from(Identifier.for("db.name"))

I'm happy to revisit it if the general consensus is that it's better to use the OO approach.

Best,
Dawid

On 06/09/2019 10:00, Xuefu Z wrote:

Thanks to Dawid for starting the discussion and writeup. It looks pretty good to me, except that I'm a little concerned about the object references and string parsing in the code, which seem to be an anti-pattern for OOP. Have we considered using ObjectIdentifier with optional catalog and db parts, esp. if we are worried about arguments of variable length or method overloading? It's quite likely that the result of the string parsing is an ObjectIdentifier instance anyway.

Having string parsing logic in the code is a little dangerous, as it duplicates part of the DDL/DML parsing and the two can easily get out of sync.

Thanks,
Xuefu

On Fri, Sep 6, 2019 at 1:57 PM JingsongLee <lzljs3620...@aliyun.com.invalid> wrote:


Thanks Dawid, +1 for this approach.

One concern is the removal of registerTableSink & registerTableSource in TableEnvironment. It has two alternatives:
1. the properties approach (DDL, descriptor).
2. from/toDataStream.

#1 can only carry properties, not Java state, and for some connectors it is difficult to convert all of their state to properties.
#2 can contain Java state, but can't use TableSource-related features like project & filter push down, partition support, etc.

Any idea about this?

Best,
Jingsong Lee




------------------------------------------------------------------
From: Dawid Wysakowicz <dwysakow...@apache.org>
Send Time: Sep 4, 2019 (Wed) 22:20
To: dev <dev@flink.apache.org>
Subject: [DISCUSS] FLIP-64: Support for Temporary Objects in Table module

Hi all,

As part of FLIP-30, a Catalog API was introduced that enables storing table meta objects permanently. At the same time, the majority of current APIs create temporary objects that cannot be serialized. We should clarify the creation of meta objects (tables, views, functions) in a unified way.

Another current problem in the API is that all the temporary objects are stored in a special built-in catalog, which is not very intuitive for many users, as they must be aware of that catalog to reference temporary objects.

Lastly, different APIs have different ways of providing object paths:

    String path…,
    String path, String pathContinued…
    String name

We should choose one approach and unify it across all APIs.

I suggest a FLIP to address the above issues.

Looking forward to your opinions.

FLIP link:
https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module

