Hi, It is a quite long discussion to follow and I hope I didn’t misunderstand anything. From the proposals presented by Xuefu I would vote:
-1 for #1 and #2 +1 for #3 Besides #3 being IMO more general and more consistent, having qualified names (#3) would help/make easier for someone to use cross databases/catalogs queries (joining multiple data sets/streams). For example with some functions to manipulate/clean up/convert the stored data in different catalogs registered in the respective catalogs. Piotrek > On 19 Sep 2019, at 06:35, Jark Wu <imj...@gmail.com> wrote: > > I agree with Xuefu that inconsistent handling with all the other objects is > not a big problem. > > Regarding to option#3, the special "system.system" namespace may confuse > users. > Users need to know the set of built-in function names to know when to use > "system.system" namespace. > What will happen if user registers a non-builtin function name under the > "system.system" namespace? > Besides, I think it doesn't solve the "explode" problem I mentioned at the > beginning of this thread. > > So here is my vote: > > +1 for #1 > 0 for #2 > -1 for #3 > > Best, > Jark > > > On Thu, 19 Sep 2019 at 08:38, Xuefu Z <usxu...@gmail.com> wrote: > >> @Dawid, Re: we also don't need additional referencing the specialcatalog >> anywhere. >> >> True. But once we allow such reference, then user can do so in any possible >> place where a function name is expected, for which we have to handle. >> That's a big difference, I think. >> >> Thanks, >> Xuefu >> >> On Wed, Sep 18, 2019 at 5:25 PM Dawid Wysakowicz < >> wysakowicz.da...@gmail.com> >> wrote: >> >>> @Bowen I am not suggesting introducing additional catalog. I think we >> need >>> to get rid of the current built-in catalog. >>> >>> @Xuefu in option #3 we also don't need additional referencing the special >>> catalog anywhere else besides in the CREATE statement. The resolution >>> behaviour is exactly the same in both options. >>> >>> On Thu, 19 Sep 2019, 08:17 Xuefu Z, <usxu...@gmail.com> wrote: >>> >>>> Hi Dawid, >>>> >>>> "GLOBAL" is a temporary keyword that was given to the approach. It can >> be >>>> changed to something else for better. >>>> >>>> The difference between this and the #3 approach is that we only need >> the >>>> keyword for this create DDL. For other places (such as function >>>> referencing), no keyword or special namespace is needed. >>>> >>>> Thanks, >>>> Xuefu >>>> >>>> On Wed, Sep 18, 2019 at 4:32 PM Dawid Wysakowicz < >>>> wysakowicz.da...@gmail.com> >>>> wrote: >>>> >>>>> Hi, >>>>> I think it makes sense to start voting at this point. >>>>> >>>>> Option 1: Only 1-part identifiers >>>>> PROS: >>>>> - allows shadowing built-in functions >>>>> CONS: >>>>> - incosistent with all the other objects, both permanent & temporary >>>>> - does not allow shadowing catalog functions >>>>> >>>>> Option 2: Special keyword for built-in function >>>>> I think this is quite similar to the special catalog/db. The thing I >> am >>>>> strongly against in this proposal is the GLOBAL keyword. This keyword >>>> has a >>>>> meaning in rdbms systems and means a function that is present for a >>>>> lifetime of a session in which it was created, but available in all >>> other >>>>> sessions. Therefore I really don't want to use this keyword in a >>>> different >>>>> context. >>>>> >>>>> Option 3: Special catalog/db >>>>> >>>>> PROS: >>>>> - allows shadowing built-in functions >>>>> - allows shadowing catalog functions >>>>> - consistent with other objects >>>>> CONS: >>>>> - we introduce a special namespace for built-in functions >>>>> >>>>> I don't see a problem with introducing the special namespace. In the >>> end >>>> it >>>>> is very similar to the keyword approach. In this case the catalog/db >>>>> combination would be the "keyword" >>>>> >>>>> Therefore my votes: >>>>> Option 1: -0 >>>>> Option 2: -1 (I might change to +0 if we can come up with a better >>>> keyword) >>>>> Option 3: +1 >>>>> >>>>> Best, >>>>> Dawid >>>>> >>>>> >>>>> On Thu, 19 Sep 2019, 05:12 Xuefu Z, <usxu...@gmail.com> wrote: >>>>> >>>>>> Hi Aljoscha, >>>>>> >>>>>> Thanks for the summary and these are great questions to be >> answered. >>>> The >>>>>> answer to your first question is clear: there is a general >> agreement >>> to >>>>>> override built-in functions with temp functions. >>>>>> >>>>>> However, your second and third questions are sort of related, as a >>>>> function >>>>>> reference can be either just function name (like "func") or in the >>> form >>>>> or >>>>>> "cat.db.func". When a reference is just function name, it can mean >>>>> either a >>>>>> built-in function or a function defined in the current cat/db. If >> we >>>>>> support overriding a built-in function with a temp function, such >>>>>> overriding can also cover a function in the current cat/db. >>>>>> >>>>>> I think what Timo referred as "overriding a catalog function" >> means a >>>>> temp >>>>>> function defined as "cat.db.func" overrides a catalog function >> "func" >>>> in >>>>>> cat/db even if cat/db is not current. To support this, temp >> function >>>> has >>>>> to >>>>>> be tied to a cat/db. What's why I said above that the 2nd and 3rd >>>>> questions >>>>>> are related. The problem with such support is the ambiguity when >> user >>>>>> defines a function w/o namespace, "CREATE TEMPORARY FUNCTION func >>> ...". >>>>>> Here "func" can means a global temp function, or a temp function in >>>>> current >>>>>> cat/db. If we can assume the former, this creates an inconsistency >>>>> because >>>>>> "CREATE FUNCTION func" actually means a function in current cat/db. >>> If >>>> we >>>>>> assume the latter, then there is no way for user to create a global >>>> temp >>>>>> function. >>>>>> >>>>>> Giving a special namespace for built-in functions may solve the >>>> ambiguity >>>>>> problem above, but it also introduces artificial catalog/database >>> that >>>>>> needs special treatment and pollutes the cleanness of the code. I >>>> would >>>>>> rather introduce a syntax in DDL to solve the problem, like "CREATE >>>>>> [GLOBAL] TEMPORARY FUNCTION func". >>>>>> >>>>>> Thus, I'd like to summarize a few candidate proposals for voting >>>>> purposes: >>>>>> >>>>>> 1. Support only global, temporary functions without namespace. Such >>>> temp >>>>>> functions overrides built-in functions and catalog functions in >>> current >>>>>> cat/db. The resolution order is: temp functions -> built-in >> functions >>>> -> >>>>>> catalog functions. (Partially or fully qualified functions has no >>>>>> ambiguity!) >>>>>> >>>>>> 2. In addition to #1, support creating and referencing temporary >>>>> functions >>>>>> associated with a cat/db with "GLOBAL" qualifier in DDL for global >>> temp >>>>>> functions. The resolution order is: global temp functions -> >> built-in >>>>>> functions -> temp functions in current cat/db -> catalog function. >>>>>> (Resolution for partially or fully qualified function reference is: >>>> temp >>>>>> functions -> persistent functions.) >>>>>> >>>>>> 3. In addition to #1, support creating and referencing temporary >>>>> functions >>>>>> associated with a cat/db with a special namespace for built-in >>>> functions >>>>>> and global temp functions. The resolution is the same as #2, except >>>> that >>>>>> the special namespace might be prefixed to a reference to a >> built-in >>>>>> function or global temp function. (In absence of the special >>> namespace, >>>>> the >>>>>> resolution order is the same as in #2.) >>>>>> >>>>>> My personal preference is #1, given the unknown use case and >>> introduced >>>>>> complexity for #2 and #3. However, #2 is an acceptable alternative. >>>> Thus, >>>>>> my votes are: >>>>>> >>>>>> +1 for #1 >>>>>> +0 for #2 >>>>>> -1 for #3 >>>>>> >>>>>> Everyone, please cast your vote (in above format please!), or let >> me >>>> know >>>>>> if you have more questions or other candidates. >>>>>> >>>>>> Thanks, >>>>>> Xuefu >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> On Wed, Sep 18, 2019 at 6:42 AM Aljoscha Krettek < >>> aljos...@apache.org> >>>>>> wrote: >>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> I think this discussion and the one for FLIP-64 are very >> connected. >>>> To >>>>>>> resolve the differences, think we have to think about the basic >>>>>> principles >>>>>>> and find consensus there. The basic questions I see are: >>>>>>> >>>>>>> - Do we want to support overriding builtin functions? >>>>>>> - Do we want to support overriding catalog functions? >>>>>>> - And then later: should temporary functions be tied to a >>>>>>> catalog/database? >>>>>>> >>>>>>> I don’t have much to say about these, except that we should >>> somewhat >>>>>> stick >>>>>>> to what the industry does. But I also understand that the >> industry >>> is >>>>>>> already very divided on this. >>>>>>> >>>>>>> Best, >>>>>>> Aljoscha >>>>>>> >>>>>>>> On 18. Sep 2019, at 11:41, Jark Wu <imj...@gmail.com> wrote: >>>>>>>> >>>>>>>> Hi, >>>>>>>> >>>>>>>> +1 to strive for reaching consensus on the remaining topics. We >>> are >>>>>>> close to the truth. It will waste a lot of time if we resume the >>>> topic >>>>>> some >>>>>>> time later. >>>>>>>> >>>>>>>> +1 to “1-part/override” and I’m also fine with Timo’s >>> “cat.db.fun” >>>>> way >>>>>>> to override a catalog function. >>>>>>>> >>>>>>>> I’m not sure about “system.system.fun”, it introduces a >>> nonexistent >>>>> cat >>>>>>> & db? And we still need to do special treatment for the dedicated >>>>>>> system.system cat & db? >>>>>>>> >>>>>>>> Best, >>>>>>>> Jark >>>>>>>> >>>>>>>> >>>>>>>>> 在 2019年9月18日,06:54,Timo Walther <twal...@apache.org> 写道: >>>>>>>>> >>>>>>>>> Hi everyone, >>>>>>>>> >>>>>>>>> @Xuefu: I would like to avoid adding too many things >>>> incrementally. >>>>>>> Users should be able to override all catalog objects consistently >>>>>> according >>>>>>> to FLIP-64 (Support for Temporary Objects in Table module). If >>>>> functions >>>>>>> are treated completely different, we need more code and special >>>> cases. >>>>>> From >>>>>>> an implementation perspective, this topic only affects the lookup >>>> logic >>>>>>> which is rather low implementation effort which is why I would >> like >>>> to >>>>>>> clarify the remaining items. As you said, we have a slight >> consenus >>>> on >>>>>>> overriding built-in functions; we should also strive for reaching >>>>>> consensus >>>>>>> on the remaining topics. >>>>>>>>> >>>>>>>>> @Dawid: I like your idea as it ensures registering catalog >>> objects >>>>>>> consistent and the overriding of built-in functions more >> explicit. >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Timo >>>>>>>>> >>>>>>>>> >>>>>>>>> On 17.09.19 11:59, kai wang wrote: >>>>>>>>>> hi, everyone >>>>>>>>>> I think this flip is very meaningful. it supports functions >>> that >>>>> can >>>>>> be >>>>>>>>>> shared by different catalogs and dbs, reducing the >> duplication >>> of >>>>>>> functions. >>>>>>>>>> >>>>>>>>>> Our group based on flink's sql parser module implements >> create >>>>>> function >>>>>>>>>> feature, stores the parsed function metadata and schema into >>>> mysql, >>>>>> and >>>>>>>>>> also customizes the catalog, customizes sql-client to support >>>>> custom >>>>>>>>>> schemas and functions. Loaded, but the function is currently >>>>> global, >>>>>>> and is >>>>>>>>>> not subdivided according to catalog and db. >>>>>>>>>> >>>>>>>>>> In addition, I very much hope to participate in the >> development >>>> of >>>>>> this >>>>>>>>>> flip, I have been paying attention to the community, but >> found >>> it >>>>> is >>>>>>> more >>>>>>>>>> difficult to join. >>>>>>>>>> thank you. >>>>>>>>>> >>>>>>>>>> Xuefu Z <usxu...@gmail.com> 于2019年9月17日周二 上午11:19写道: >>>>>>>>>> >>>>>>>>>>> Thanks to Tmo and Dawid for sharing thoughts. >>>>>>>>>>> >>>>>>>>>>> It seems to me that there is a general consensus on having >>> temp >>>>>>> functions >>>>>>>>>>> that have no namespaces and overwrite built-in functions. >> (As >>> a >>>>> side >>>>>>> note >>>>>>>>>>> for comparability, the current user defined functions are >> all >>>>>>> temporary and >>>>>>>>>>> having no namespaces.) >>>>>>>>>>> >>>>>>>>>>> Nevertheless, I can also see the merit of having namespaced >>> temp >>>>>>> functions >>>>>>>>>>> that can overwrite functions defined in a specific cat/db. >>>>> However, >>>>>>> this >>>>>>>>>>> idea appears orthogonal to the former and can be added >>>>>> incrementally. >>>>>>>>>>> >>>>>>>>>>> How about we first implement non-namespaced temp functions >> now >>>> and >>>>>>> leave >>>>>>>>>>> the door open for namespaced ones for later releases as the >>>>>>> requirement >>>>>>>>>>> might become more crystal? This also helps shorten the >> debate >>>> and >>>>>>> allow us >>>>>>>>>>> to make some progress along this direction. >>>>>>>>>>> >>>>>>>>>>> As to Dawid's idea of having a dedicated cat/db to host the >>>>>> temporary >>>>>>> temp >>>>>>>>>>> functions that don't have namespaces, my only concern is the >>>>> special >>>>>>>>>>> treatment for a cat/db, which makes code less clean, as >>> evident >>>> in >>>>>>> treating >>>>>>>>>>> the built-in catalog currently. >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> Xuefiu >>>>>>>>>>> >>>>>>>>>>> On Mon, Sep 16, 2019 at 5:07 PM Dawid Wysakowicz < >>>>>>>>>>> wysakowicz.da...@gmail.com> >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>>> Hi, >>>>>>>>>>>> Another idea to consider on top of Timo's suggestion. How >>> about >>>>> we >>>>>>> have a >>>>>>>>>>>> special namespace (catalog + database) for built-in >> objects? >>>> This >>>>>>> catalog >>>>>>>>>>>> would be invisible for users as Xuefu was suggesting. >>>>>>>>>>>> >>>>>>>>>>>> Then users could still override built-in functions, if they >>>> fully >>>>>>> qualify >>>>>>>>>>>> object with the built-in namespace, but by default the >> common >>>>> logic >>>>>>> of >>>>>>>>>>>> current dB & cat would be used. >>>>>>>>>>>> >>>>>>>>>>>> CREATE TEMPORARY FUNCTION func ... >>>>>>>>>>>> registers temporary function in current cat & dB >>>>>>>>>>>> >>>>>>>>>>>> CREATE TEMPORARY FUNCTION cat.db.func ... >>>>>>>>>>>> registers temporary function in cat db >>>>>>>>>>>> >>>>>>>>>>>> CREATE TEMPORARY FUNCTION system.system.func ... >>>>>>>>>>>> Overrides built-in function with temporary function >>>>>>>>>>>> >>>>>>>>>>>> The built-in/system namespace would not be writable for >>>> permanent >>>>>>>>>>> objects. >>>>>>>>>>>> WDYT? >>>>>>>>>>>> >>>>>>>>>>>> This way I think we can have benefits of both solutions. >>>>>>>>>>>> >>>>>>>>>>>> Best, >>>>>>>>>>>> Dawid >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Tue, 17 Sep 2019, 07:24 Timo Walther, < >> twal...@apache.org >>>> >>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Hi Bowen, >>>>>>>>>>>>> >>>>>>>>>>>>> I understand the potential benefit of overriding certain >>>>> built-in >>>>>>>>>>>>> functions. I'm open to such a feature if many people >> agree. >>>>>>> However, it >>>>>>>>>>>>> would be great to still support overriding catalog >> functions >>>>> with >>>>>>>>>>>>> temporary functions in order to prototype a query even >>> though >>>> a >>>>>>>>>>>>> catalog/database might not be available currently or >> should >>>> not >>>>> be >>>>>>>>>>>>> modified yet. How about we support both cases? >>>>>>>>>>>>> >>>>>>>>>>>>> CREATE TEMPORARY FUNCTION abs >>>>>>>>>>>>> -> creates/overrides a built-in function and never >>> consideres >>>>>>> current >>>>>>>>>>>>> catalog and database; inconsistent with other DDL but >>>> acceptable >>>>>> for >>>>>>>>>>>>> functions I guess. >>>>>>>>>>>>> CREATE TEMPORARY FUNCTION cat.db.fun >>>>>>>>>>>>> -> creates/overrides a catalog function >>>>>>>>>>>>> >>>>>>>>>>>>> Regarding "Flink don't have any other built-in objects >>>> (tables, >>>>>>> views) >>>>>>>>>>>>> except functions", this might change in the near future. >>> Take >>>>>>>>>>>>> https://issues.apache.org/jira/browse/FLINK-13900 as an >>>>> example. >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks, >>>>>>>>>>>>> Timo >>>>>>>>>>>>> >>>>>>>>>>>>> On 14.09.19 01:40, Bowen Li wrote: >>>>>>>>>>>>>> Hi Fabian, >>>>>>>>>>>>>> >>>>>>>>>>>>>> Yes, I agree 1-part/no-override is the least favorable >>> thus I >>>>>>> didn't >>>>>>>>>>>>>> include that as a voting option, and the discussion is >>> mainly >>>>>>> between >>>>>>>>>>>>>> 1-part/override builtin and 3-part/not override builtin. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Re > However, it means that temp functions are >> differently >>>>>> treated >>>>>>>>>>> than >>>>>>>>>>>>>> other db objects. >>>>>>>>>>>>>> IMO, the treatment difference results from the fact that >>>>>> functions >>>>>>>>>>> are >>>>>>>>>>>> a >>>>>>>>>>>>>> bit different from other objects - Flink don't have any >>> other >>>>>>>>>>> built-in >>>>>>>>>>>>>> objects (tables, views) except functions. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>> Bowen >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> Xuefu Zhang >>>>>>>>>>> >>>>>>>>>>> "In Honey We Trust!" >>>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>>> -- >>>>>> Xuefu Zhang >>>>>> >>>>>> "In Honey We Trust!" >>>>>> >>>>> >>>> >>>> >>>> -- >>>> Xuefu Zhang >>>> >>>> "In Honey We Trust!" >>>> >>> >> >> >> -- >> Xuefu Zhang >> >> "In Honey We Trust!" >>