Hi, I think above are some valid points, and we can adopt the suggestions.
To elaborate a bit on the new SQL syntax, it would imply that, unlike "SHOW FUNCTION" which only return function names, "SHOW ALL [TEMPORARY] FUNCTIONS" would return functions' fully qualified names with catalog and db names. On Mon, Sep 30, 2019 at 6:38 AM Timo Walther <twal...@apache.org> wrote: > Hi all, > > I support Fabian's arguments. In my opinion, temporary objects should > just be an additional layer on top of the regular catalog/database > lookup logic. Thus, a temporary table or function has always highest > precedence and should be stable within the local session. Otherwise it > could magically disappear while someone else is performing modifications > in the catalog. > > Furthermore, this feature is very useful for prototyping as users can > simply express that a catalog/database is present even through they > might not have access to it currently. > > Regards, > Timo > > > On 30.09.19 14:57, Fabian Hueske wrote: > > Hi all, > > > > Sorry for the late reply. > > > > I think it might lead to confusing situations if temporary functions (or > > any temporary db objects for that matter) are bound to the life cycle of > an > > (external) db/catalog. > > Imaging a situation where you create a temp function in a db in an > external > > catalog and use it but at some point it does not work anymore because > some > > other dropped the database from the external catalog. > > Shouldn't temporary objects be only controlled by the owner of a session? > > > > I agree that creating temp objects in non-existing db/catalogs sounds a > bit > > strange, but IMO the opposite (the db/catalog must exist for a temp > > function to be created/exist) can have significant implications like the > > one I described. > > I think it would be quite easy for users to understand that temporary > > objects are solely owned by them (and their session). > > The problem of listing temporary objects could be solved by adding a ALL > > [TEMPORARY] clause: > > > > SHOW ALL FUNCTIONS; could show all functions regardless of the > > catalog/database including temporary functions. > > SHOW ALL TEMPORARY FUNCTIONS; could show all temporary functions > regardless > > of the catalog/database. > > > > Best, > > Fabian > > > > Am Sa., 28. Sept. 2019 um 02:21 Uhr schrieb Bowen Li < > bowenl...@gmail.com>: > > > >> @Dawid, do you have any other concerns? If not, I hope we can close the > >> voting. > >> > >> > >> On Thu, Sep 26, 2019 at 8:14 PM Rui Li <lirui.fu...@gmail.com> wrote: > >> > >>> I'm not sure how much benefit #1 can bring us. If users just want to > try > >>> out temporary functions, they can create temporary system functions > which > >>> don't require a catalog/DB. IIUC, the main reason why we allow > temporary > >>> catalog function is to let users override permanent catalog functions. > >>> Therefore a temporary function in a non-existing catalog won't serve > that > >>> purpose. Besides, each session is provided with a default catalog and > DB. > >>> So even if users simply want to create some catalog functions they can > >>> forget about after the session, wouldn't the default catalog/DB be > enough > >>> for such experiments? > >>> > >>> On Thu, Sep 26, 2019 at 4:38 AM Bowen Li <bowenl...@gmail.com> wrote: > >>> > >>>> Re 1) As described in the FLIP, a temp function lookup will first make > >>> sure > >>>> the db exists. If the db doesn't exist, a lazy drop is triggered to > >>> remove > >>>> that temp function. > >>>> > >>>> I agree Hive doesn't handle it consistently, and we are not copying > >> Hive. > >>>> IMHO, allowing registering temp functions in nonexistent catalog/db is > >>>> hacky and problematic. For instance, "SHOW FUNCTIONS" would list > system > >>>> functions and functions in the current catalog/db, since users cannot > >>>> designate a nonexistent catalog/db as current ones, how can they list > >>>> functions in nonexistent catalog/db? They may end up never knowing > what > >>>> temp functions they've created unless trying out with queries or we > >>>> introducing some more nonstandard SQL statements. The same applies to > >>> other > >>>> temp objects like temp tables. > >>>> > >>>> Re 2) A standalone FunctionIdentifier sounds good to me > >>>> > >>>> On Wed, Sep 25, 2019 at 4:46 AM Dawid Wysakowicz < > >>>> wysakowicz.da...@gmail.com> > >>>> wrote: > >>>> > >>>>> Ad. 1 > >>>>> I wouldn't say it is hacky. > >>>>> Moreover, how do you want ensure that the dB always exists when a > >>>> temporary > >>>>> object is used?( in this particular case function). Do you want to > >>> query > >>>>> for the database existence whenever e.g a temporary function is > >> used? I > >>>>> think important aspect here is that the database can be dropped from > >>>>> external system, not just flink or a different flink session. > >>>>> > >>>>> E.g in case of hive, you cannot create a temporary table in a > >> database > >>>> that > >>>>> does not exist, that's true. But if you create a temporary table in a > >>>>> database and drop that database from a different session, you can > >> still > >>>>> query the previously created temporary table from the original > >> session. > >>>> It > >>>>> does not sound like a consistent behaviour to me. Why don't we make > >>> this > >>>>> behaviour of not binding a temporary objects to the lifetime of a > >>>> database > >>>>> explicit as part of the temporary objects contract? In the end they > >>> exist > >>>>> in different layers. Permanent objects & databases in a catalog (in > >>> case > >>>> of > >>>>> hive megastore) whereas temporary objects in flink sessions. That's > >>> also > >>>>> true for the original hive client. The temporary objects live in the > >>> hive > >>>>> client whereas databases are created in the metastore. > >>>>> > >>>>> Ad.2 > >>>>> I'm open for suggestions here. The one thing I wanted to achieve here > >>> is > >>>> so > >>>>> that we do not change the contract of ObjectIdentifier. One important > >>>> thing > >>>>> to remember here is that we need the function identifier to be part > >> of > >>>> the > >>>>> FunctionDefinition object and not only as the result of the function > >>>>> lookup. At some point we want to be able to store QueryOperations in > >>> the > >>>>> catalogs. They can contain function calls within which we need to > >> have > >>>> the > >>>>> identifier. > >>>>> > >>>>> I agree my initial suggestion is over complicated. How about we have > >>> just > >>>>> the FunctionIdentifier as top level class without making the > >>>>> ObjectIdentifier extend from it? I think it's pretty much the same > >> what > >>>> you > >>>>> suggested. The only difference is that it would be a top level class > >>>> with a > >>>>> more descriptive name. > >>>>> > >>>>> > >>>>> On Wed, 25 Sep 2019, 13:57 Bowen Li, <bowenl...@gmail.com> wrote: > >>>>> > >>>>>> Sorry, I missed some parts of the solution. The complete > >> alternative > >>> is > >>>>> the > >>>>>> following, basically having separate APIs in FunctionLookup for > >>>> ambiguous > >>>>>> and precise function lookup since planner is able to tell which API > >>> to > >>>>> call > >>>>>> with parsed queries, and have a unified result: > >>>>>> > >>>>>> ``` > >>>>>> class FunctionLookup { > >>>>>> > >>>>>> Optional<Result> lookupAmbiguousFunction(String name); > >>>>>> > >>>>>> > >>>>>> Optional<Result> lookupPreciseFunction(ObjectIdentifier oi); > >>>>>> > >>>>>> > >>>>>> class Result { > >>>>>> Optional<ObjectIdentifier> getObjectIdentifier() { ... } > >>>>>> String getName() { ... } > >>>>>> // ... > >>>>>> } > >>>>>> > >>>>>> } > >>>>>> ``` > >>>>>> > >>>>>> Thanks, > >>>>>> Bowen > >>>>>> > >>>>>> > >>>>>> On Tue, Sep 24, 2019 at 9:42 PM Bowen Li <bowenl...@gmail.com> > >>> wrote: > >>>>>>> Hi Dawid, > >>>>>>> > >>>>>>> Re 1): I agree making it easy for users to run experiments is > >>>>> important. > >>>>>>> However, I'm not sure allowing users to register temp functions > >> in > >>>>>>> nonexistent catalog/db is the optimal way. It seems a bit hacky, > >>> and > >>>>>> breaks > >>>>>>> the current contract between Flink and users that catalog/db must > >>> be > >>>>>> valid > >>>>>>> in order to operate on. > >>>>>>> > >>>>>>> How about we instead focus on making it convenient to create > >>>> catalogs? > >>>>>>> Users actually can already do it with ease via program or SQL CLI > >>>> yaml > >>>>>> file > >>>>>>> for an in-memory catalog which has neither extra dependency nor > >>>>> external > >>>>>>> connections. What we can further improve is DDL for catalogs, > >> and I > >>>>>> raised > >>>>>>> it in discussion of [FLIP 69 - Flink SQL DDL Enhancement] driven > >> by > >>>>> Terry > >>>>>>> now. > >>>>>>> > >>>>>>> In that case, if users would like to experiment via SQL, they can > >>>>> easily > >>>>>>> create an in memory catalog/database using DDL, then play with > >> temp > >>>>>>> functions. > >>>>>>> > >>>>>>> Re 2): For the assumption, IIUIC, function ObjectIdentifier has > >> not > >>>>> been > >>>>>>> resolved when stack call reaches > >> FunctionCatalog#lookupFunction(), > >>>> but > >>>>>> only > >>>>>>> been parsed? > >>>>>>> > >>>>>>> I agree keeping ObjectIdentifier as-is would be good. I'm ok with > >>> the > >>>>>>> suggested classes, though making ObjectIdentifier a subclass of > >>>>>>> FunctionIdentifier seem a bit counter intuitive. > >>>>>>> > >>>>>>> Another potentially simpler way is: > >>>>>>> > >>>>>>> ``` > >>>>>>> // in class FunctionLookup > >>>>>>> class Result { > >>>>>>> Optional<ObjectIdentifier> getObjectIdentifier() { ... } > >>>>>>> String getName() { ... } > >>>>>>> ... > >>>>>>> } > >>>>>>> ``` > >>>>>>> > >>>>>>> WDYT? > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> On Tue, Sep 24, 2019 at 3:41 PM Dawid Wysakowicz < > >>>>>>> wysakowicz.da...@gmail.com> wrote: > >>>>>>> > >>>>>>>> Hi, > >>>>>>>> I really like the flip and think it clarifies important aspects > >> of > >>>> the > >>>>>>>> system. > >>>>>>>> > >>>>>>>> I have two, I hope small suggestions, which will not take much > >>> time > >>>> to > >>>>>>>> agree on. > >>>>>>>> > >>>>>>>> 1. Could we follow the MySQL approach in regards to the > >> existence > >>> of > >>>>>>>> cat/db > >>>>>>>> for temporary functions? That means not to check it, so e.g. > >> it's > >>>>>> possible > >>>>>>>> to create a temporary function in a database that does not > >> exist. > >>> I > >>>>>> think > >>>>>>>> it's really useful e.g in cases when user wants to perform > >>>> experiments > >>>>>> but > >>>>>>>> does not have access to the db yet or temporarily does not have > >>>>>> connection > >>>>>>>> to a catalog. > >>>>>>>> 2. Could we not change the ObjectIdentifier? Could we not loosen > >>> the > >>>>>>>> requirements for all catalog objects such as tables, views, > >> types > >>>> just > >>>>>> for > >>>>>>>> the functions? It's really important later on from e.g the > >>>>>> serializability > >>>>>>>> perspective. The important aspect of the ObjectIdentifier is > >> that > >>> we > >>>>>> know > >>>>>>>> that it has been resolved. The suggested changes break that > >>>>> assumption. > >>>>>>>> What do you think about adding an interface FunctionIdentifier { > >>>>>>>> > >>>>>>>> String getName(); > >>>>>>>> > >>>>>>>> /** > >>>>>>>> Return 3-part identifier. Empty in case of a built-in > >> function. > >>>>>>>> */ > >>>>>>>> Optional<ObjectIdentifier> getObjectIdentifier() > >>>>>>>> } > >>>>>>>> > >>>>>>>> class ObjectIdentifier implements FunctionIdentifier { > >>>>>>>> Optional<ObjectIdentifier> getObjectIdentifier() { > >>>>>>>> return Optional.of(this); > >>>>>>>> } > >>>>>>>> } > >>>>>>>> > >>>>>>>> class SystemFunctionIdentifier implements FunctionIdentifier > >> {...} > >>>>>>>> WDYT? > >>>>>>>> > >>>>>>>> On Wed, 25 Sep 2019, 04:50 Xuefu Z, <usxu...@gmail.com> wrote: > >>>>>>>> > >>>>>>>>> +1. LGTM > >>>>>>>>> > >>>>>>>>> On Tue, Sep 24, 2019 at 6:09 AM Terry Wang < > >> zjuwa...@gmail.com> > >>>>>> wrote: > >>>>>>>>>> +1 > >>>>>>>>>> > >>>>>>>>>> Best, > >>>>>>>>>> Terry Wang > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>>> 在 2019年9月24日,上午10:42,Kurt Young <ykt...@gmail.com> 写道: > >>>>>>>>>>> > >>>>>>>>>>> +1 > >>>>>>>>>>> > >>>>>>>>>>> Best, > >>>>>>>>>>> Kurt > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> On Tue, Sep 24, 2019 at 2:30 AM Bowen Li < > >>> bowenl...@gmail.com > >>>>>>>> wrote: > >>>>>>>>>>>> Hi all, > >>>>>>>>>>>> > >>>>>>>>>>>> I'd like to start a voting thread for FLIP-57 [1], which > >>>> we've > >>>>>>>> reached > >>>>>>>>>>>> consensus in [2]. > >>>>>>>>>>>> > >>>>>>>>>>>> This voting will be open for minimum 3 days till 6:30pm > >>> UTC, > >>>>> Sep > >>>>>>>> 26. > >>>>>>>>>>>> Thanks, > >>>>>>>>>>>> Bowen > >>>>>>>>>>>> > >>>>>>>>>>>> [1] > >>>>>>>>>>>> > >>>>>>>>>>>> > >> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-57%3A+Rework+FunctionCatalog > >>>>>>>>>>>> [2] > >>>>>>>>>>>> > >>>>>>>>>>>> > >> > http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-57-Rework-FunctionCatalog-td32291.html#a32613 > >>>>>>>>>> > >>>>>>>>> -- > >>>>>>>>> Xuefu Zhang > >>>>>>>>> > >>>>>>>>> "In Honey We Trust!" > >>>>>>>>> > >>> > >>> -- > >>> Best regards! > >>> Rui Li > >>> > >