I'm not sure how much benefit #1 can bring us. If users just want to try out temporary functions, they can create temporary system functions which don't require a catalog/DB. IIUC, the main reason why we allow temporary catalog function is to let users override permanent catalog functions. Therefore a temporary function in a non-existing catalog won't serve that purpose. Besides, each session is provided with a default catalog and DB. So even if users simply want to create some catalog functions they can forget about after the session, wouldn't the default catalog/DB be enough for such experiments?
On Thu, Sep 26, 2019 at 4:38 AM Bowen Li <bowenl...@gmail.com> wrote: > Re 1) As described in the FLIP, a temp function lookup will first make sure > the db exists. If the db doesn't exist, a lazy drop is triggered to remove > that temp function. > > I agree Hive doesn't handle it consistently, and we are not copying Hive. > > IMHO, allowing registering temp functions in nonexistent catalog/db is > hacky and problematic. For instance, "SHOW FUNCTIONS" would list system > functions and functions in the current catalog/db, since users cannot > designate a nonexistent catalog/db as current ones, how can they list > functions in nonexistent catalog/db? They may end up never knowing what > temp functions they've created unless trying out with queries or we > introducing some more nonstandard SQL statements. The same applies to other > temp objects like temp tables. > > Re 2) A standalone FunctionIdentifier sounds good to me > > On Wed, Sep 25, 2019 at 4:46 AM Dawid Wysakowicz < > wysakowicz.da...@gmail.com> > wrote: > > > Ad. 1 > > I wouldn't say it is hacky. > > Moreover, how do you want ensure that the dB always exists when a > temporary > > object is used?( in this particular case function). Do you want to query > > for the database existence whenever e.g a temporary function is used? I > > think important aspect here is that the database can be dropped from > > external system, not just flink or a different flink session. > > > > E.g in case of hive, you cannot create a temporary table in a database > that > > does not exist, that's true. But if you create a temporary table in a > > database and drop that database from a different session, you can still > > query the previously created temporary table from the original session. > It > > does not sound like a consistent behaviour to me. Why don't we make this > > behaviour of not binding a temporary objects to the lifetime of a > database > > explicit as part of the temporary objects contract? In the end they exist > > in different layers. Permanent objects & databases in a catalog (in case > of > > hive megastore) whereas temporary objects in flink sessions. That's also > > true for the original hive client. The temporary objects live in the hive > > client whereas databases are created in the metastore. > > > > Ad.2 > > I'm open for suggestions here. The one thing I wanted to achieve here is > so > > that we do not change the contract of ObjectIdentifier. One important > thing > > to remember here is that we need the function identifier to be part of > the > > FunctionDefinition object and not only as the result of the function > > lookup. At some point we want to be able to store QueryOperations in the > > catalogs. They can contain function calls within which we need to have > the > > identifier. > > > > I agree my initial suggestion is over complicated. How about we have just > > the FunctionIdentifier as top level class without making the > > ObjectIdentifier extend from it? I think it's pretty much the same what > you > > suggested. The only difference is that it would be a top level class > with a > > more descriptive name. > > > > > > On Wed, 25 Sep 2019, 13:57 Bowen Li, <bowenl...@gmail.com> wrote: > > > > > Sorry, I missed some parts of the solution. The complete alternative is > > the > > > following, basically having separate APIs in FunctionLookup for > ambiguous > > > and precise function lookup since planner is able to tell which API to > > call > > > with parsed queries, and have a unified result: > > > > > > ``` > > > class FunctionLookup { > > > > > > Optional<Result> lookupAmbiguousFunction(String name); > > > > > > > > > Optional<Result> lookupPreciseFunction(ObjectIdentifier oi); > > > > > > > > > class Result { > > > Optional<ObjectIdentifier> getObjectIdentifier() { ... } > > > String getName() { ... } > > > // ... > > > } > > > > > > } > > > ``` > > > > > > Thanks, > > > Bowen > > > > > > > > > On Tue, Sep 24, 2019 at 9:42 PM Bowen Li <bowenl...@gmail.com> wrote: > > > > > > > Hi Dawid, > > > > > > > > Re 1): I agree making it easy for users to run experiments is > > important. > > > > However, I'm not sure allowing users to register temp functions in > > > > nonexistent catalog/db is the optimal way. It seems a bit hacky, and > > > breaks > > > > the current contract between Flink and users that catalog/db must be > > > valid > > > > in order to operate on. > > > > > > > > How about we instead focus on making it convenient to create > catalogs? > > > > Users actually can already do it with ease via program or SQL CLI > yaml > > > file > > > > for an in-memory catalog which has neither extra dependency nor > > external > > > > connections. What we can further improve is DDL for catalogs, and I > > > raised > > > > it in discussion of [FLIP 69 - Flink SQL DDL Enhancement] driven by > > Terry > > > > now. > > > > > > > > In that case, if users would like to experiment via SQL, they can > > easily > > > > create an in memory catalog/database using DDL, then play with temp > > > > functions. > > > > > > > > Re 2): For the assumption, IIUIC, function ObjectIdentifier has not > > been > > > > resolved when stack call reaches FunctionCatalog#lookupFunction(), > but > > > only > > > > been parsed? > > > > > > > > I agree keeping ObjectIdentifier as-is would be good. I'm ok with the > > > > suggested classes, though making ObjectIdentifier a subclass of > > > > FunctionIdentifier seem a bit counter intuitive. > > > > > > > > Another potentially simpler way is: > > > > > > > > ``` > > > > // in class FunctionLookup > > > > class Result { > > > > Optional<ObjectIdentifier> getObjectIdentifier() { ... } > > > > String getName() { ... } > > > > ... > > > > } > > > > ``` > > > > > > > > WDYT? > > > > > > > > > > > > > > > > On Tue, Sep 24, 2019 at 3:41 PM Dawid Wysakowicz < > > > > wysakowicz.da...@gmail.com> wrote: > > > > > > > >> Hi, > > > >> I really like the flip and think it clarifies important aspects of > the > > > >> system. > > > >> > > > >> I have two, I hope small suggestions, which will not take much time > to > > > >> agree on. > > > >> > > > >> 1. Could we follow the MySQL approach in regards to the existence of > > > >> cat/db > > > >> for temporary functions? That means not to check it, so e.g. it's > > > possible > > > >> to create a temporary function in a database that does not exist. I > > > think > > > >> it's really useful e.g in cases when user wants to perform > experiments > > > but > > > >> does not have access to the db yet or temporarily does not have > > > connection > > > >> to a catalog. > > > >> 2. Could we not change the ObjectIdentifier? Could we not loosen the > > > >> requirements for all catalog objects such as tables, views, types > just > > > for > > > >> the functions? It's really important later on from e.g the > > > serializability > > > >> perspective. The important aspect of the ObjectIdentifier is that we > > > know > > > >> that it has been resolved. The suggested changes break that > > assumption. > > > >> > > > >> What do you think about adding an interface FunctionIdentifier { > > > >> > > > >> String getName(); > > > >> > > > >> /** > > > >> Return 3-part identifier. Empty in case of a built-in function. > > > >> */ > > > >> Optional<ObjectIdentifier> getObjectIdentifier() > > > >> } > > > >> > > > >> class ObjectIdentifier implements FunctionIdentifier { > > > >> Optional<ObjectIdentifier> getObjectIdentifier() { > > > >> return Optional.of(this); > > > >> } > > > >> } > > > >> > > > >> class SystemFunctionIdentifier implements FunctionIdentifier {...} > > > >> > > > >> WDYT? > > > >> > > > >> On Wed, 25 Sep 2019, 04:50 Xuefu Z, <usxu...@gmail.com> wrote: > > > >> > > > >> > +1. LGTM > > > >> > > > > >> > On Tue, Sep 24, 2019 at 6:09 AM Terry Wang <zjuwa...@gmail.com> > > > wrote: > > > >> > > > > >> > > +1 > > > >> > > > > > >> > > Best, > > > >> > > Terry Wang > > > >> > > > > > >> > > > > > >> > > > > > >> > > > 在 2019年9月24日,上午10:42,Kurt Young <ykt...@gmail.com> 写道: > > > >> > > > > > > >> > > > +1 > > > >> > > > > > > >> > > > Best, > > > >> > > > Kurt > > > >> > > > > > > >> > > > > > > >> > > > On Tue, Sep 24, 2019 at 2:30 AM Bowen Li <bowenl...@gmail.com > > > > > >> wrote: > > > >> > > > > > > >> > > >> Hi all, > > > >> > > >> > > > >> > > >> I'd like to start a voting thread for FLIP-57 [1], which > we've > > > >> reached > > > >> > > >> consensus in [2]. > > > >> > > >> > > > >> > > >> This voting will be open for minimum 3 days till 6:30pm UTC, > > Sep > > > >> 26. > > > >> > > >> > > > >> > > >> Thanks, > > > >> > > >> Bowen > > > >> > > >> > > > >> > > >> [1] > > > >> > > >> > > > >> > > >> > > > >> > > > > > >> > > > > >> > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-57%3A+Rework+FunctionCatalog > > > >> > > >> [2] > > > >> > > >> > > > >> > > >> > > > >> > > > > > >> > > > > >> > > > > > > http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-57-Rework-FunctionCatalog-td32291.html#a32613 > > > >> > > >> > > > >> > > > > > >> > > > > > >> > > > > >> > -- > > > >> > Xuefu Zhang > > > >> > > > > >> > "In Honey We Trust!" > > > >> > > > > >> > > > > > > > > > > -- Best regards! Rui Li