Looks like I'm the only person who is willing to +1 to #2 for now :-) But I would suggest to change the keyword from GLOBAL to something like BUILTIN.
I think #2 and #3 are almost the same proposal, just with different format to indicate whether it want to override built-in functions. My biggest reason to choose it is I want this behavior be consistent with temporal tables. I will give some examples to show the behavior and also make sure I'm not misunderstanding anything here. For most DBs, when user create a temporary table with: CREATE TEMPORARY TABLE t1 It's actually equivalent with: CREATE TEMPORARY TABLE `curent_db`.t1 If user change current database, they will not be able to access t1 without fully qualified name, .i.e db1.t1 (assuming db1 is current database when this temporary table is created). Only #2 and #3 followed this behavior and I would vote for this since this makes such behavior consistent through temporal tables and functions. Why I'm not voting for #3 is a special catalog and database just looks very hacky to me. It gave a imply that our built-in functions saved at a special catalog and database, which is actually not. Introducing a dedicated keyword like CREATE TEMPORARY BUILTIN FUNCTION looks more clear and straightforward. One can argue that we should avoid introducing new keyword, but it's also very rare that a system can overwrite built-in functions. Since we decided to support this, introduce a new keyword is not a big deal IMO. Best, Kurt On Thu, Sep 19, 2019 at 3:07 PM Piotr Nowojski <pi...@ververica.com> wrote: > Hi, > > It is a quite long discussion to follow and I hope I didn’t misunderstand > anything. From the proposals presented by Xuefu I would vote: > > -1 for #1 and #2 > +1 for #3 > > Besides #3 being IMO more general and more consistent, having qualified > names (#3) would help/make easier for someone to use cross > databases/catalogs queries (joining multiple data sets/streams). For > example with some functions to manipulate/clean up/convert the stored data > in different catalogs registered in the respective catalogs. > > Piotrek > > > On 19 Sep 2019, at 06:35, Jark Wu <imj...@gmail.com> wrote: > > > > I agree with Xuefu that inconsistent handling with all the other objects > is > > not a big problem. > > > > Regarding to option#3, the special "system.system" namespace may confuse > > users. > > Users need to know the set of built-in function names to know when to use > > "system.system" namespace. > > What will happen if user registers a non-builtin function name under the > > "system.system" namespace? > > Besides, I think it doesn't solve the "explode" problem I mentioned at > the > > beginning of this thread. > > > > So here is my vote: > > > > +1 for #1 > > 0 for #2 > > -1 for #3 > > > > Best, > > Jark > > > > > > On Thu, 19 Sep 2019 at 08:38, Xuefu Z <usxu...@gmail.com> wrote: > > > >> @Dawid, Re: we also don't need additional referencing the specialcatalog > >> anywhere. > >> > >> True. But once we allow such reference, then user can do so in any > possible > >> place where a function name is expected, for which we have to handle. > >> That's a big difference, I think. > >> > >> Thanks, > >> Xuefu > >> > >> On Wed, Sep 18, 2019 at 5:25 PM Dawid Wysakowicz < > >> wysakowicz.da...@gmail.com> > >> wrote: > >> > >>> @Bowen I am not suggesting introducing additional catalog. I think we > >> need > >>> to get rid of the current built-in catalog. > >>> > >>> @Xuefu in option #3 we also don't need additional referencing the > special > >>> catalog anywhere else besides in the CREATE statement. The resolution > >>> behaviour is exactly the same in both options. > >>> > >>> On Thu, 19 Sep 2019, 08:17 Xuefu Z, <usxu...@gmail.com> wrote: > >>> > >>>> Hi Dawid, > >>>> > >>>> "GLOBAL" is a temporary keyword that was given to the approach. It can > >> be > >>>> changed to something else for better. > >>>> > >>>> The difference between this and the #3 approach is that we only need > >> the > >>>> keyword for this create DDL. For other places (such as function > >>>> referencing), no keyword or special namespace is needed. > >>>> > >>>> Thanks, > >>>> Xuefu > >>>> > >>>> On Wed, Sep 18, 2019 at 4:32 PM Dawid Wysakowicz < > >>>> wysakowicz.da...@gmail.com> > >>>> wrote: > >>>> > >>>>> Hi, > >>>>> I think it makes sense to start voting at this point. > >>>>> > >>>>> Option 1: Only 1-part identifiers > >>>>> PROS: > >>>>> - allows shadowing built-in functions > >>>>> CONS: > >>>>> - incosistent with all the other objects, both permanent & temporary > >>>>> - does not allow shadowing catalog functions > >>>>> > >>>>> Option 2: Special keyword for built-in function > >>>>> I think this is quite similar to the special catalog/db. The thing I > >> am > >>>>> strongly against in this proposal is the GLOBAL keyword. This keyword > >>>> has a > >>>>> meaning in rdbms systems and means a function that is present for a > >>>>> lifetime of a session in which it was created, but available in all > >>> other > >>>>> sessions. Therefore I really don't want to use this keyword in a > >>>> different > >>>>> context. > >>>>> > >>>>> Option 3: Special catalog/db > >>>>> > >>>>> PROS: > >>>>> - allows shadowing built-in functions > >>>>> - allows shadowing catalog functions > >>>>> - consistent with other objects > >>>>> CONS: > >>>>> - we introduce a special namespace for built-in functions > >>>>> > >>>>> I don't see a problem with introducing the special namespace. In the > >>> end > >>>> it > >>>>> is very similar to the keyword approach. In this case the catalog/db > >>>>> combination would be the "keyword" > >>>>> > >>>>> Therefore my votes: > >>>>> Option 1: -0 > >>>>> Option 2: -1 (I might change to +0 if we can come up with a better > >>>> keyword) > >>>>> Option 3: +1 > >>>>> > >>>>> Best, > >>>>> Dawid > >>>>> > >>>>> > >>>>> On Thu, 19 Sep 2019, 05:12 Xuefu Z, <usxu...@gmail.com> wrote: > >>>>> > >>>>>> Hi Aljoscha, > >>>>>> > >>>>>> Thanks for the summary and these are great questions to be > >> answered. > >>>> The > >>>>>> answer to your first question is clear: there is a general > >> agreement > >>> to > >>>>>> override built-in functions with temp functions. > >>>>>> > >>>>>> However, your second and third questions are sort of related, as a > >>>>> function > >>>>>> reference can be either just function name (like "func") or in the > >>> form > >>>>> or > >>>>>> "cat.db.func". When a reference is just function name, it can mean > >>>>> either a > >>>>>> built-in function or a function defined in the current cat/db. If > >> we > >>>>>> support overriding a built-in function with a temp function, such > >>>>>> overriding can also cover a function in the current cat/db. > >>>>>> > >>>>>> I think what Timo referred as "overriding a catalog function" > >> means a > >>>>> temp > >>>>>> function defined as "cat.db.func" overrides a catalog function > >> "func" > >>>> in > >>>>>> cat/db even if cat/db is not current. To support this, temp > >> function > >>>> has > >>>>> to > >>>>>> be tied to a cat/db. What's why I said above that the 2nd and 3rd > >>>>> questions > >>>>>> are related. The problem with such support is the ambiguity when > >> user > >>>>>> defines a function w/o namespace, "CREATE TEMPORARY FUNCTION func > >>> ...". > >>>>>> Here "func" can means a global temp function, or a temp function in > >>>>> current > >>>>>> cat/db. If we can assume the former, this creates an inconsistency > >>>>> because > >>>>>> "CREATE FUNCTION func" actually means a function in current cat/db. > >>> If > >>>> we > >>>>>> assume the latter, then there is no way for user to create a global > >>>> temp > >>>>>> function. > >>>>>> > >>>>>> Giving a special namespace for built-in functions may solve the > >>>> ambiguity > >>>>>> problem above, but it also introduces artificial catalog/database > >>> that > >>>>>> needs special treatment and pollutes the cleanness of the code. I > >>>> would > >>>>>> rather introduce a syntax in DDL to solve the problem, like "CREATE > >>>>>> [GLOBAL] TEMPORARY FUNCTION func". > >>>>>> > >>>>>> Thus, I'd like to summarize a few candidate proposals for voting > >>>>> purposes: > >>>>>> > >>>>>> 1. Support only global, temporary functions without namespace. Such > >>>> temp > >>>>>> functions overrides built-in functions and catalog functions in > >>> current > >>>>>> cat/db. The resolution order is: temp functions -> built-in > >> functions > >>>> -> > >>>>>> catalog functions. (Partially or fully qualified functions has no > >>>>>> ambiguity!) > >>>>>> > >>>>>> 2. In addition to #1, support creating and referencing temporary > >>>>> functions > >>>>>> associated with a cat/db with "GLOBAL" qualifier in DDL for global > >>> temp > >>>>>> functions. The resolution order is: global temp functions -> > >> built-in > >>>>>> functions -> temp functions in current cat/db -> catalog function. > >>>>>> (Resolution for partially or fully qualified function reference is: > >>>> temp > >>>>>> functions -> persistent functions.) > >>>>>> > >>>>>> 3. In addition to #1, support creating and referencing temporary > >>>>> functions > >>>>>> associated with a cat/db with a special namespace for built-in > >>>> functions > >>>>>> and global temp functions. The resolution is the same as #2, except > >>>> that > >>>>>> the special namespace might be prefixed to a reference to a > >> built-in > >>>>>> function or global temp function. (In absence of the special > >>> namespace, > >>>>> the > >>>>>> resolution order is the same as in #2.) > >>>>>> > >>>>>> My personal preference is #1, given the unknown use case and > >>> introduced > >>>>>> complexity for #2 and #3. However, #2 is an acceptable alternative. > >>>> Thus, > >>>>>> my votes are: > >>>>>> > >>>>>> +1 for #1 > >>>>>> +0 for #2 > >>>>>> -1 for #3 > >>>>>> > >>>>>> Everyone, please cast your vote (in above format please!), or let > >> me > >>>> know > >>>>>> if you have more questions or other candidates. > >>>>>> > >>>>>> Thanks, > >>>>>> Xuefu > >>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>>>> On Wed, Sep 18, 2019 at 6:42 AM Aljoscha Krettek < > >>> aljos...@apache.org> > >>>>>> wrote: > >>>>>> > >>>>>>> Hi, > >>>>>>> > >>>>>>> I think this discussion and the one for FLIP-64 are very > >> connected. > >>>> To > >>>>>>> resolve the differences, think we have to think about the basic > >>>>>> principles > >>>>>>> and find consensus there. The basic questions I see are: > >>>>>>> > >>>>>>> - Do we want to support overriding builtin functions? > >>>>>>> - Do we want to support overriding catalog functions? > >>>>>>> - And then later: should temporary functions be tied to a > >>>>>>> catalog/database? > >>>>>>> > >>>>>>> I don’t have much to say about these, except that we should > >>> somewhat > >>>>>> stick > >>>>>>> to what the industry does. But I also understand that the > >> industry > >>> is > >>>>>>> already very divided on this. > >>>>>>> > >>>>>>> Best, > >>>>>>> Aljoscha > >>>>>>> > >>>>>>>> On 18. Sep 2019, at 11:41, Jark Wu <imj...@gmail.com> wrote: > >>>>>>>> > >>>>>>>> Hi, > >>>>>>>> > >>>>>>>> +1 to strive for reaching consensus on the remaining topics. We > >>> are > >>>>>>> close to the truth. It will waste a lot of time if we resume the > >>>> topic > >>>>>> some > >>>>>>> time later. > >>>>>>>> > >>>>>>>> +1 to “1-part/override” and I’m also fine with Timo’s > >>> “cat.db.fun” > >>>>> way > >>>>>>> to override a catalog function. > >>>>>>>> > >>>>>>>> I’m not sure about “system.system.fun”, it introduces a > >>> nonexistent > >>>>> cat > >>>>>>> & db? And we still need to do special treatment for the dedicated > >>>>>>> system.system cat & db? > >>>>>>>> > >>>>>>>> Best, > >>>>>>>> Jark > >>>>>>>> > >>>>>>>> > >>>>>>>>> 在 2019年9月18日,06:54,Timo Walther <twal...@apache.org> 写道: > >>>>>>>>> > >>>>>>>>> Hi everyone, > >>>>>>>>> > >>>>>>>>> @Xuefu: I would like to avoid adding too many things > >>>> incrementally. > >>>>>>> Users should be able to override all catalog objects consistently > >>>>>> according > >>>>>>> to FLIP-64 (Support for Temporary Objects in Table module). If > >>>>> functions > >>>>>>> are treated completely different, we need more code and special > >>>> cases. > >>>>>> From > >>>>>>> an implementation perspective, this topic only affects the lookup > >>>> logic > >>>>>>> which is rather low implementation effort which is why I would > >> like > >>>> to > >>>>>>> clarify the remaining items. As you said, we have a slight > >> consenus > >>>> on > >>>>>>> overriding built-in functions; we should also strive for reaching > >>>>>> consensus > >>>>>>> on the remaining topics. > >>>>>>>>> > >>>>>>>>> @Dawid: I like your idea as it ensures registering catalog > >>> objects > >>>>>>> consistent and the overriding of built-in functions more > >> explicit. > >>>>>>>>> > >>>>>>>>> Thanks, > >>>>>>>>> Timo > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> On 17.09.19 11:59, kai wang wrote: > >>>>>>>>>> hi, everyone > >>>>>>>>>> I think this flip is very meaningful. it supports functions > >>> that > >>>>> can > >>>>>> be > >>>>>>>>>> shared by different catalogs and dbs, reducing the > >> duplication > >>> of > >>>>>>> functions. > >>>>>>>>>> > >>>>>>>>>> Our group based on flink's sql parser module implements > >> create > >>>>>> function > >>>>>>>>>> feature, stores the parsed function metadata and schema into > >>>> mysql, > >>>>>> and > >>>>>>>>>> also customizes the catalog, customizes sql-client to support > >>>>> custom > >>>>>>>>>> schemas and functions. Loaded, but the function is currently > >>>>> global, > >>>>>>> and is > >>>>>>>>>> not subdivided according to catalog and db. > >>>>>>>>>> > >>>>>>>>>> In addition, I very much hope to participate in the > >> development > >>>> of > >>>>>> this > >>>>>>>>>> flip, I have been paying attention to the community, but > >> found > >>> it > >>>>> is > >>>>>>> more > >>>>>>>>>> difficult to join. > >>>>>>>>>> thank you. > >>>>>>>>>> > >>>>>>>>>> Xuefu Z <usxu...@gmail.com> 于2019年9月17日周二 上午11:19写道: > >>>>>>>>>> > >>>>>>>>>>> Thanks to Tmo and Dawid for sharing thoughts. > >>>>>>>>>>> > >>>>>>>>>>> It seems to me that there is a general consensus on having > >>> temp > >>>>>>> functions > >>>>>>>>>>> that have no namespaces and overwrite built-in functions. > >> (As > >>> a > >>>>> side > >>>>>>> note > >>>>>>>>>>> for comparability, the current user defined functions are > >> all > >>>>>>> temporary and > >>>>>>>>>>> having no namespaces.) > >>>>>>>>>>> > >>>>>>>>>>> Nevertheless, I can also see the merit of having namespaced > >>> temp > >>>>>>> functions > >>>>>>>>>>> that can overwrite functions defined in a specific cat/db. > >>>>> However, > >>>>>>> this > >>>>>>>>>>> idea appears orthogonal to the former and can be added > >>>>>> incrementally. > >>>>>>>>>>> > >>>>>>>>>>> How about we first implement non-namespaced temp functions > >> now > >>>> and > >>>>>>> leave > >>>>>>>>>>> the door open for namespaced ones for later releases as the > >>>>>>> requirement > >>>>>>>>>>> might become more crystal? This also helps shorten the > >> debate > >>>> and > >>>>>>> allow us > >>>>>>>>>>> to make some progress along this direction. > >>>>>>>>>>> > >>>>>>>>>>> As to Dawid's idea of having a dedicated cat/db to host the > >>>>>> temporary > >>>>>>> temp > >>>>>>>>>>> functions that don't have namespaces, my only concern is the > >>>>> special > >>>>>>>>>>> treatment for a cat/db, which makes code less clean, as > >>> evident > >>>> in > >>>>>>> treating > >>>>>>>>>>> the built-in catalog currently. > >>>>>>>>>>> > >>>>>>>>>>> Thanks, > >>>>>>>>>>> Xuefiu > >>>>>>>>>>> > >>>>>>>>>>> On Mon, Sep 16, 2019 at 5:07 PM Dawid Wysakowicz < > >>>>>>>>>>> wysakowicz.da...@gmail.com> > >>>>>>>>>>> wrote: > >>>>>>>>>>> > >>>>>>>>>>>> Hi, > >>>>>>>>>>>> Another idea to consider on top of Timo's suggestion. How > >>> about > >>>>> we > >>>>>>> have a > >>>>>>>>>>>> special namespace (catalog + database) for built-in > >> objects? > >>>> This > >>>>>>> catalog > >>>>>>>>>>>> would be invisible for users as Xuefu was suggesting. > >>>>>>>>>>>> > >>>>>>>>>>>> Then users could still override built-in functions, if they > >>>> fully > >>>>>>> qualify > >>>>>>>>>>>> object with the built-in namespace, but by default the > >> common > >>>>> logic > >>>>>>> of > >>>>>>>>>>>> current dB & cat would be used. > >>>>>>>>>>>> > >>>>>>>>>>>> CREATE TEMPORARY FUNCTION func ... > >>>>>>>>>>>> registers temporary function in current cat & dB > >>>>>>>>>>>> > >>>>>>>>>>>> CREATE TEMPORARY FUNCTION cat.db.func ... > >>>>>>>>>>>> registers temporary function in cat db > >>>>>>>>>>>> > >>>>>>>>>>>> CREATE TEMPORARY FUNCTION system.system.func ... > >>>>>>>>>>>> Overrides built-in function with temporary function > >>>>>>>>>>>> > >>>>>>>>>>>> The built-in/system namespace would not be writable for > >>>> permanent > >>>>>>>>>>> objects. > >>>>>>>>>>>> WDYT? > >>>>>>>>>>>> > >>>>>>>>>>>> This way I think we can have benefits of both solutions. > >>>>>>>>>>>> > >>>>>>>>>>>> Best, > >>>>>>>>>>>> Dawid > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> On Tue, 17 Sep 2019, 07:24 Timo Walther, < > >> twal...@apache.org > >>>> > >>>>>> wrote: > >>>>>>>>>>>> > >>>>>>>>>>>>> Hi Bowen, > >>>>>>>>>>>>> > >>>>>>>>>>>>> I understand the potential benefit of overriding certain > >>>>> built-in > >>>>>>>>>>>>> functions. I'm open to such a feature if many people > >> agree. > >>>>>>> However, it > >>>>>>>>>>>>> would be great to still support overriding catalog > >> functions > >>>>> with > >>>>>>>>>>>>> temporary functions in order to prototype a query even > >>> though > >>>> a > >>>>>>>>>>>>> catalog/database might not be available currently or > >> should > >>>> not > >>>>> be > >>>>>>>>>>>>> modified yet. How about we support both cases? > >>>>>>>>>>>>> > >>>>>>>>>>>>> CREATE TEMPORARY FUNCTION abs > >>>>>>>>>>>>> -> creates/overrides a built-in function and never > >>> consideres > >>>>>>> current > >>>>>>>>>>>>> catalog and database; inconsistent with other DDL but > >>>> acceptable > >>>>>> for > >>>>>>>>>>>>> functions I guess. > >>>>>>>>>>>>> CREATE TEMPORARY FUNCTION cat.db.fun > >>>>>>>>>>>>> -> creates/overrides a catalog function > >>>>>>>>>>>>> > >>>>>>>>>>>>> Regarding "Flink don't have any other built-in objects > >>>> (tables, > >>>>>>> views) > >>>>>>>>>>>>> except functions", this might change in the near future. > >>> Take > >>>>>>>>>>>>> https://issues.apache.org/jira/browse/FLINK-13900 as an > >>>>> example. > >>>>>>>>>>>>> > >>>>>>>>>>>>> Thanks, > >>>>>>>>>>>>> Timo > >>>>>>>>>>>>> > >>>>>>>>>>>>> On 14.09.19 01:40, Bowen Li wrote: > >>>>>>>>>>>>>> Hi Fabian, > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Yes, I agree 1-part/no-override is the least favorable > >>> thus I > >>>>>>> didn't > >>>>>>>>>>>>>> include that as a voting option, and the discussion is > >>> mainly > >>>>>>> between > >>>>>>>>>>>>>> 1-part/override builtin and 3-part/not override builtin. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Re > However, it means that temp functions are > >> differently > >>>>>> treated > >>>>>>>>>>> than > >>>>>>>>>>>>>> other db objects. > >>>>>>>>>>>>>> IMO, the treatment difference results from the fact that > >>>>>> functions > >>>>>>>>>>> are > >>>>>>>>>>>> a > >>>>>>>>>>>>>> bit different from other objects - Flink don't have any > >>> other > >>>>>>>>>>> built-in > >>>>>>>>>>>>>> objects (tables, views) except functions. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Cheers, > >>>>>>>>>>>>>> Bowen > >>>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> -- > >>>>>>>>>>> Xuefu Zhang > >>>>>>>>>>> > >>>>>>>>>>> "In Honey We Trust!" > >>>>>>>>>>> > >>>>>>>>> > >>>>>>>> > >>>>>>> > >>>>>>> > >>>>>> > >>>>>> -- > >>>>>> Xuefu Zhang > >>>>>> > >>>>>> "In Honey We Trust!" > >>>>>> > >>>>> > >>>> > >>>> > >>>> -- > >>>> Xuefu Zhang > >>>> > >>>> "In Honey We Trust!" > >>>> > >>> > >> > >> > >> -- > >> Xuefu Zhang > >> > >> "In Honey We Trust!" > >> > >