Hi Timo, Sounds good to me.
Thanks, Jark On Fri, 11 Oct 2019 at 17:37, Timo Walther <twal...@apache.org> wrote: > Hi Jark, > > great to see that there is consensus on something like `CREATE FUNCTION > bf1 WITH (precision=0.001)`. > > In the near future, there should be no difference between a catalog > function and a temporary function. This should simplify the stack a lot. > > How about we postpone this topic after FLIP-65? I understand your points > but would also like to have similar semantics for SQL and Table API. We > can just skip the function part for now and also wait for the FLIP > around the function DDL (FLINK-7151). I would suggest to mention both > class and instance in Dawid's FLIP for now. > > Regards, > Timo > > On 11.10.19 10:32, Jark Wu wrote: > > Hi Timo, > > > > The reason to put the parameter in constructor instead of `eval` is that > > 1) the initialization for the parameters may take long > > 2) there might be a lot of parameters, it's verbose to specify them in > > every where it's used. > > > > Having some syntax like `CREATE FUNCTION bf1 WITH (precision=0.001)` is > > great. > > For catalog functions, I totally agree with that. But for temporal > > function, why do we have to enforce it is a class? > > >From my understanding, temporal objects are stored in memory and will > not > > be serialized to catalogs. > > > > If we want to prohibit function instances, are we also planning to drop > the > > support of `table.select(bf1('a))` in Scala Table API? > > > > Regards, > > Jark > > > > > > On Fri, 11 Oct 2019 at 15:34, Timo Walther <twal...@apache.org> wrote: > > > >> Thanks for your feedback Kurt and Jark. > >> > >> I'm wondering why we allow setting `val bf1 = new BloomFilter(0.001)`, > >> `val bf2 = new BloomFilter(0.005)`. This is not very SQL-like. > >> > >> Parameters should be passed when calling the function instead. > >> > >> Users can use the query itself to parametize the function: `SELECT > >> bf(0.0001, a), bf(0.005, a)` > >> > >> > >> If at all, we can think about a `CREATE FUNCTION bf1 WITH > >> (precision=0.001)` DDL syntax. But with registering instances we might > >> shoot ourselves in the foot in the future. > >> > >> Usually, function exist as classes, it seems unnatural to me that users > >> need to instantiate the function first before registering it. > >> > >> > >> What do think? > >> > >> Thanks, > >> Timo > >> > >> On 11.10.19 05:10, Jark Wu wrote: > >>> Hi, Timo > >>> > >>> Regarding to createTemporaryFunction, I agree with Kurt, we should > >>> encourage the class way, but should keep the instance way. > >>> The reason is that this is an important feature to support parameterize > >>> function as you said, but this can't be addressed by global job > >> parameters > >>> and open() methods. > >>> Because users may want to instantiate more than one instance, e.g. `val > >> bf1 > >>> = new BloomFilter(0.001)`, `val bf2 = new BloomFilter(0.005)`, and they > >> are > >>> used in one query. > >>> > >>> The another reason is that this can't make the function call string > >>> serializable, because in the Scala Table API path, users can still pass > >> in > >>> a function instance, e.g. table.select(bf1('a)) > >>> > >>> Best, > >>> Jark > >>> > >>> On Thu, 10 Oct 2019 at 23:09, Kurt Young <ykt...@gmail.com> wrote: > >>> > >>>> Regarding to createTemporaryFunction, could we just keep both > choices? I > >>>> agree > >>>> we should encourage users to use the class, and it's the only choice > >> when > >>>> user > >>>> using DDL. But for some Table API & SQL programs, I think it's > actually > >>>> more nature > >>>> to use an object instead of use the classes. But this just depends on > >>>> personal > >>>> choices. One use case came to mind which might be affected by this is > >> if we > >>>> want > >>>> to support some anonymous lambda functions as UDF. It will need such > >> create > >>>> function > >>>> with objects support. > >>>> > >>>> Best, > >>>> Kurt > >>>> > >>>> > >>>> On Thu, Oct 10, 2019 at 4:51 PM Dawid Wysakowicz < > >> dwysakow...@apache.org> > >>>> wrote: > >>>> > >>>>> Thanks for the comments. > >>>>> > >>>>> @Timo I think your suggestion makes sense. I changed the signature to > >>>>> > >>>>> void createTemporaryFunction(String path, Class<? extends > >>>> UserDefinedFunction> functionClass); > >>>>> This will mean that after we make QueryOperations string serializable > >> and > >>>>> we expose the type inference in functions we will have only a single > >>>> object > >>>>> that cannot be persisted in a catalog and thus have only a > "Temporary" > >>>>> method: > >>>>> > >>>>> void createTemporaryView(String path, DataStream stream); > >>>>> > >>>>> I updated the FLIP page accordingly. I also listed explicitly methods > >>>> that > >>>>> we need to deprecate and added a note that the implementation of the > >>>>> methods concerning UserDefinedFunctions has to be postponed until we > >> have > >>>>> type inference in place. > >>>>> > >>>>> As the voting period for FLIP-57 ended, I will start the VOTE thread > >> for > >>>>> FLIP-64 tomorrow morning, unless there are some objections until > then. > >>>>> > >>>>> Best, > >>>>> > >>>>> Dawid > >>>>> > >>>>> > >>>>> On 10/10/2019 16:16, Kurt Young wrote: > >>>>> > >>>>> Thanks for the clarification Timo, that's sounds good to me. Let's > >>>> continue > >>>>> to discuss other things. > >>>>> > >>>>> Best, > >>>>> Kurt > >>>>> > >>>>> > >>>>> On Thu, Oct 10, 2019 at 4:14 PM Timo Walther <twal...@apache.org> < > >>>> twal...@apache.org> wrote: > >>>>> The purpose of ConnectTableDescriptor was to have a programmatic way > of > >>>>> expressing `CREATE TABLE` with JavaDocs and IDE support. But I agree > >>>>> that it needs some rework in the future, once we have finalized the > DDL > >>>>> to ensure that both concepts are in sync. > >>>>> > >>>>> Regards, > >>>>> Timo > >>>>> > >>>>> > >>>>> On 10.10.19 16:08, Kurt Young wrote: > >>>>> > >>>>> Regarding to ConnectTableDescriptor, if in the end it becomes a > builder > >>>>> of CatalogTable, I would be ok with that. But it doesn't look like a > >>>>> > >>>>> builder > >>>>> > >>>>> now, it's more like another form of TableSource/TableSink. > >>>>> > >>>>> Best, > >>>>> Kurt > >>>>> > >>>>> > >>>>> On Thu, Oct 10, 2019 at 3:44 PM Timo Walther <twal...@apache.org> < > >>>> twal...@apache.org> wrote: > >>>>> Hi everyone, > >>>>> > >>>>> thanks for the great discussion with a good outcome. > >>>>> > >>>>> I have one last comment about: > >>>>> > >>>>> void createTemporaryFunction(String path, UserDefinedFunction > >> function); > >>>>> We should encourage users to register a class instead of an instance. > >> We > >>>>> should enforce a default constructor in the future. If we need > metadata > >>>>> or type inference, the planner can simply instantiate the class and > >> call > >>>>> the corresponding getters. If people would like to parameterize a > >>>>> function, they can use global job parameters instead - which are > >>>>> available in the open() method. > >>>>> > >>>>> I suggest: > >>>>> > >>>>> void createTemporaryFunction(String path, Class<? extends > >>>>> UserDefinedFunction> functionClass); > >>>>> > >>>>> This is also closer to the SQL DDL that is about to be implemented > in: > >>>> https://issues.apache.org/jira/browse/FLINK-7151 > >>>>> Thanks, > >>>>> Timo > >>>>> > >>>>> > >>>>> On 09.10.19 17:07, Bowen Li wrote: > >>>>> > >>>>> Hi Dawid, > >>>>> > >>>>> +1 for proposed changes > >>>>> > >>>>> On Wed, Oct 9, 2019 at 12:15 PM Dawid Wysakowicz < > >>>>> > >>>>> dwysakow...@apache.org > >>>>> > >>>>> wrote: > >>>>> > >>>>> > >>>>> Sorry for a very delayed response. > >>>>> > >>>>> @Kurt Yes, this is the goal to have a function created like new > >>>>> Function(...) also be wrapped into CatalogFunction. This would have > to > >>>>> be though a temporary function as we cannot represent that as a set > of > >>>>> properties. Similar to the createTemporaryView(DataStream stream). > >>>>> > >>>>> As for the ConnectTableDescriptor I agree this is very similar to > >>>>> CatalogTable. I am not sure though if we should get rid of it. In the > >>>>> end I see it as a builder for a CatalogTable, which is a slightly > more > >>>>> internal API, but we might revisit that some time in the future if we > >>>>> find that it makes more sense. > >>>>> > >>>>> @All I updated the FLIP page with some more details from the outcome > >>>>> > >>>>> of > >>>>> > >>>>> the discussions around FLIP-57. Please take a look. I would like to > >>>>> start a vote on this FLIP as soon as the vote on FLIP-57 goes > through. > >>>>> > >>>>> Best, > >>>>> > >>>>> Dawid > >>>>> > >>>>> > >>>>> On 19/09/2019 09:24, Kurt Young wrote: > >>>>> > >>>>> IIUC it's good to see that both serializable (tables description from > >>>>> > >>>>> DDL) > >>>>> > >>>>> and unserializable (tables with DataStream underneath) tables are > >>>>> > >>>>> treated > >>>>> > >>>>> unify with CatalogTable. > >>>>> > >>>>> Can I also assume functions that either come from a function class > >>>>> > >>>>> (from > >>>>> > >>>>> DDL) > >>>>> or function objects (newed by user) will also treated unify with > >>>>> CatalogFunction? > >>>>> > >>>>> This will greatly simplify and unify current API level concepts and > >>>>> > >>>>> design. > >>>>> > >>>>> And it seems only one thing left, how do we deal with > >>>>> ConnectTableDescriptor? > >>>>> It's actually very similar with serializable CatalogTable, both carry > >>>>> > >>>>> some > >>>>> > >>>>> text > >>>>> properties which even are the same. Is there any chance we can > >>>>> > >>>>> further > >>>>> > >>>>> unify > >>>>> > >>>>> this to CatalogTable? > >>>>> > >>>>> object > >>>>> Best, > >>>>> Kurt > >>>>> > >>>>> > >>>>> On Thu, Sep 19, 2019 at 3:13 PM Jark Wu <imj...@gmail.com> < > >>>> imj...@gmail.com> wrote: > >>>>> Thanks Dawid for the design doc. > >>>>> > >>>>> In general, I’m +1 to the FLIP. > >>>>> > >>>>> > >>>>> +1 to the single-string and parse way to express object path. > >>>>> > >>>>> +1 to deprecate registerTableSink & registerTableSource. > >>>>> But I would suggest to provide an easy way to register a custom > >>>>> source/sink before we drop them (this is another story). > >>>>> Currently, it’s not easy to implement a custom connector descriptor. > >>>>> > >>>>> Best, > >>>>> Jark > >>>>> > >>>>> > >>>>> > >>>>> 在 2019年9月19日,11:37,Dawid Wysakowicz <wysakowicz.da...@gmail.com> < > >>>> wysakowicz.da...@gmail.com> > >>>>> 写道: > >>>>> > >>>>> Hi JingsongLee, > >>>>> From my understanding they can. Underneath they will be > >>>>> > >>>>> CatalogTables. > >>>>> > >>>>> The > >>>>> > >>>>> difference is the lifetime of the tables. Plus some of the user > >>>>> > >>>>> facing > >>>>> > >>>>> interfaces cannot be persisted e.g. datastream. Therefore we must > >>>>> > >>>>> have > >>>>> > >>>>> a > >>>>> > >>>>> separate methods for that. In the end the temporary tables are held > >>>>> > >>>>> in > >>>>> > >>>>> memory as CatalogTables. > >>>>> Best, > >>>>> Dawid > >>>>> > >>>>> On Thu, 19 Sep 2019, 10:08 JingsongLee, <lzljs3620...@aliyun.com > >>>>> > >>>>> .invalid> > >>>>> > >>>>> wrote: > >>>>> > >>>>> > >>>>> Hi dawid: > >>>>> Can temporary tables achieve the same capabilities as catalog > >>>>> > >>>>> table? > >>>>> > >>>>> like statistics: CatalogTableStatistics, CatalogColumnStatistics, > >>>>> PartitionStatistics > >>>>> like partition support: we have added some catalog equivalent > >>>>> > >>>>> interfaces > >>>>> > >>>>> on TableSource/TableSink: getPartitions, getPartitionFieldNames > >>>>> Maybe it's not a good idea to add these interfaces to > >>>>> TableSource/TableSink. What do you think? > >>>>> > >>>>> Best, > >>>>> Jingsong Lee > >>>>> > >>>>> > >>>>> ------------------------------------------------------------------ > >>>>> From:Kurt Young <ykt...@gmail.com> <ykt...@gmail.com> > >>>>> Send Time:2019年9月18日(星期三) 17:54 > >>>>> To:dev <dev@flink.apache.org> <dev@flink.apache.org> > >>>>> Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in > >>>>> > >>>>> Table > >>>>> > >>>>> module > >>>>> > >>>>> Hi all, > >>>>> > >>>>> Sorry to join this party late. Big +1 to this flip, especially for > >>>>> > >>>>> the > >>>>> > >>>>> dropping > >>>>> "registerTableSink & registerTableSource" part. These are indeed > >>>>> > >>>>> legacy > >>>>> > >>>>> and we should try to unify them through CatalogTable after we > >>>>> > >>>>> introduce > >>>>> > >>>>> the concept of Catalog. > >>>>> > >>>>> From my understanding, what we can registered should all be > >>>>> > >>>>> metadata, > >>>>> > >>>>> TableSource/TableSink should only be the one who is responsible to > >>>>> > >>>>> do > >>>>> > >>>>> the real work, i.e. reading and writing data according to the > >>>>> > >>>>> schema > >>>>> > >>>>> and > >>>>> > >>>>> other information like computed column, partition, .e.g. > >>>>> > >>>>> Best, > >>>>> Kurt > >>>>> > >>>>> > >>>>> On Wed, Sep 18, 2019 at 5:14 PM JingsongLee < > >>>>> > >>>>> lzljs3620...@aliyun.com > >>>>> > >>>>> .invalid> > >>>>> wrote: > >>>>> > >>>>> > >>>>> After some development and thinking, I have a general > >>>>> > >>>>> understanding. > >>>>> > >>>>> +1 to registering a source/sink does not fit into the SQL world. > >>>>> I am OK to have a deprecated registerTemporarySource/Sink to > >>>>> > >>>>> compatible > >>>>> > >>>>> with old ways. > >>>>> > >>>>> Best, > >>>>> Jingsong Lee > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> ------------------------------------------------------------------ > >>>>> > >>>>> From:Timo Walther <twal...@apache.org> <twal...@apache.org> > >>>>> Send Time:2019年9月17日(星期二) 08:00 > >>>>> To:dev <dev@flink.apache.org> <dev@flink.apache.org> > >>>>> Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in > >>>>> > >>>>> Table > >>>>> > >>>>> module > >>>>> > >>>>> Hi Dawid, > >>>>> > >>>>> thanks for the design document. It fixes big concept gaps due to > >>>>> historical reasons with proper support for serializability and > >>>>> > >>>>> catalog > >>>>> > >>>>> support in mind. > >>>>> > >>>>> I would not mind a registerTemporarySource/Sink, but the problem > >>>>> > >>>>> that I > >>>>> > >>>>> see is that many people think that this is the recommended way of > >>>>> registering a table source/sink which is not true. We should > >>>>> > >>>>> guide > >>>>> > >>>>> users > >>>>> > >>>>> to either use connect() or DDL API which can be validated and > >>>>> > >>>>> stored > >>>>> > >>>>> in > >>>>> > >>>>> catalog. > >>>>> > >>>>> Also from a concept perspective, registering a source/sink does > >>>>> > >>>>> not > >>>>> > >>>>> fit > >>>>> > >>>>> into the SQL world. SQL does not know about source/sinks but only > >>>>> > >>>>> about > >>>>> > >>>>> tables. If the responsibility of a TableSource/TableSink is just > >>>>> > >>>>> a > >>>>> > >>>>> pure > >>>>> > >>>>> physical data consumer/producer that is not connected to the > >>>>> > >>>>> actual > >>>>> > >>>>> logical table schema, we would need a possibility of defining > >>>>> > >>>>> time > >>>>> > >>>>> attributes and interpreting/converting a changelog. This should > >>>>> > >>>>> be > >>>>> > >>>>> done > >>>>> > >>>>> by the framework with information from the DDL/connect() and not > >>>>> > >>>>> be > >>>>> > >>>>> defined in every table source. > >>>>> > >>>>> Regards, > >>>>> Timo > >>>>> > >>>>> > >>>>> On 09.09.19 14:16, JingsongLee wrote: > >>>>> > >>>>> Hi dawid: > >>>>> > >>>>> It is difficult to describe specific examples. > >>>>> Sometimes users will generate some java converters through some > >>>>> Java code, or generate some Java classes through third-party > >>>>> libraries. Of course, these can be best done through > >>>>> > >>>>> properties. > >>>>> > >>>>> But this requires additional work from users.My suggestion is to > >>>>> keep this Java instance class way that is user-friendly. > >>>>> > >>>>> Best, > >>>>> Jingsong Lee > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> ------------------------------------------------------------------ > >>>>> > >>>>> From:Dawid Wysakowicz <dwysakow...@apache.org> < > dwysakow...@apache.org > >>>>> Send Time:2019年9月6日(星期五) 16:21 > >>>>> To:dev <dev@flink.apache.org> <dev@flink.apache.org> > >>>>> Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in > >>>>> > >>>>> Table > >>>>> > >>>>> module > >>>>> > >>>>> Hi all, > >>>>> @Jingsong Could you elaborate a bit more what do you mean by > >>>>> "some Connectors are difficult to convert all states to > >>>>> > >>>>> properties" > >>>>> > >>>>> All the Flink provided connectors will definitely be expressible > >>>>> > >>>>> with > >>>>> > >>>>> properties (In the end you should be able to use them from DDL). > >>>>> > >>>>> I > >>>>> > >>>>> think > >>>>> > >>>>> if > >>>>> > >>>>> a TableSource is complex enough that it handles filter push down, > >>>>> > >>>>> partition > >>>>> > >>>>> support etc. should rather be made available both from DDL & > >>>>> > >>>>> java/scala > >>>>> > >>>>> code. I'm happy to reconsider adding > >>>>> > >>>>> registerTemporaryTable(String > >>>>> > >>>>> path, > >>>>> > >>>>> TableSource source) if you have some concrete examples in mind. > >>>>> > >>>>> @Xuefu: We also considered the ObjectIdentifier (or actually > >>>>> > >>>>> introducing > >>>>> > >>>>> a new identifier representation to differentiate between resolved > >>>>> > >>>>> and > >>>>> > >>>>> unresolved identifiers) with the same concerns. We decided to > >>>>> > >>>>> suggest > >>>>> > >>>>> the > >>>>> > >>>>> string & parsing logic because of usability. > >>>>> > >>>>> tEnv.from("cat.db.table") > >>>>> is shorter and easier to write than > >>>>> tEnv.from(Identifier.for("cat", "db", "name") > >>>>> And also implicitly solves the problem what happens if a user > >>>>> > >>>>> (e.g. > >>>>> > >>>>> used > >>>>> > >>>>> to other systems) uses that API in a following manner: > >>>>> > >>>>> tEnv.from(Identifier.for("db.name") > >>>>> I'm happy to revisit it if the general consensus is that it's > >>>>> > >>>>> better > >>>>> > >>>>> to > >>>>> > >>>>> use the OO aproach. > >>>>> > >>>>> Best, > >>>>> Dawid > >>>>> > >>>>> On 06/09/2019 10:00, Xuefu Z wrote: > >>>>> > >>>>> Thanks to Dawid for starting the discussion and writeup. It > >>>>> > >>>>> looks > >>>>> > >>>>> pretty > >>>>> > >>>>> good to me except that I'm a little concerned about the object > >>>>> > >>>>> reference > >>>>> > >>>>> and string parsing in the code, which seems to an anti-pattern > >>>>> > >>>>> to > >>>>> > >>>>> OOP. > >>>>> > >>>>> Have > >>>>> > >>>>> we considered using ObjectIdenitifier with optional catalog and > >>>>> > >>>>> db > >>>>> > >>>>> parts, > >>>>> > >>>>> esp. if we are worried about arguments of variable length or > >>>>> > >>>>> method > >>>>> > >>>>> overloading? It's quite likely that the result of string parsing > >>>>> > >>>>> is > >>>>> > >>>>> an > >>>>> > >>>>> ObjectIdentifier instance any way. > >>>>> > >>>>> Having string parsing logic in the code is a little dangerous as > >>>>> > >>>>> it > >>>>> > >>>>> duplicates part of the DDL/DML parsing, and they can easily get > >>>>> > >>>>> out > >>>>> > >>>>> of > >>>>> > >>>>> sync. > >>>>> > >>>>> Thanks, > >>>>> Xuefu > >>>>> > >>>>> On Fri, Sep 6, 2019 at 1:57 PM JingsongLee < > >>>>> > >>>>> lzljs3620...@aliyun.com > >>>>> > >>>>> .invalid> > >>>>> > >>>>> wrote: > >>>>> > >>>>> > >>>>> Thanks dawid, +1 for this approach. > >>>>> > >>>>> One concern is the removal of registerTableSink & > >>>>> > >>>>> registerTableSource > >>>>> > >>>>> in TableEnvironment. It has two alternatives: > >>>>> 1.the properties approach (DDL, descriptor). > >>>>> 2.from/toDataStream. > >>>>> > >>>>> #1 can only be properties, not java states, and some Connectors > >>>>> are difficult to convert all states to properties. > >>>>> #2 can contain java state. But can't use TableSource-related > >>>>> > >>>>> features, > >>>>> > >>>>> like project & filter push down, partition support, etc.. > >>>>> > >>>>> Any idea about this? > >>>>> > >>>>> Best, > >>>>> Jingsong Lee > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> ------------------------------------------------------------------ > >>>>> > >>>>> From:Dawid Wysakowicz <dwysakow...@apache.org> < > dwysakow...@apache.org > >>>>> Send Time:2019年9月4日(星期三) 22:20 > >>>>> To:dev <dev@flink.apache.org> <dev@flink.apache.org> > >>>>> Subject:[DISCUSS] FLIP-64: Support for Temporary Objects in > >>>>> > >>>>> Table > >>>>> > >>>>> module > >>>>> > >>>>> Hi all, > >>>>> As part of FLIP-30 a Catalog API was introduced that enables > >>>>> > >>>>> storing > >>>>> > >>>>> table > >>>>> > >>>>> meta objects permanently. At the same time the majority of > >>>>> > >>>>> current > >>>>> > >>>>> APIs > >>>>> > >>>>> create temporary objects that cannot be serialized. We should > >>>>> > >>>>> clarify > >>>>> > >>>>> the > >>>>> > >>>>> creation of meta objects (tables, views, functions) in a unified > >>>>> > >>>>> way. > >>>>> > >>>>> Another current problem in the API is that all the temporary > >>>>> > >>>>> objects > >>>>> > >>>>> are > >>>>> > >>>>> stored in a special built-in catalog, which is not very > >>>>> > >>>>> intuitive > >>>>> > >>>>> for > >>>>> > >>>>> many > >>>>> > >>>>> users, as they must be aware of that catalog to reference > >>>>> > >>>>> temporary > >>>>> > >>>>> objects. > >>>>> > >>>>> Lastly, different APIs have different ways of providing object > >>>>> > >>>>> paths: > >>>>> > >>>>> String path…, > >>>>> String path, String pathContinued… > >>>>> String name > >>>>> We should choose one approach and unify it across all APIs. > >>>>> I suggest a FLIP to address the above issues. > >>>>> Looking forward to your opinions. > >>>>> FLIP link: > >>>>> > >>>>> > >>>>> > >> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module > >> > >