Re: [VOTE] FLIP-57: Rework FunctionCatalog

Timo Walther Mon, 30 Sep 2019 06:38:33 -0700

Hi all,

I support Fabian's arguments. In my opinion, temporary objects shouldjust be an additional layer on top of the regular catalog/databaselookup logic. Thus, a temporary table or function has always highestprecedence and should be stable within the local session. Otherwise itcould magically disappear while someone else is performing modificationsin the catalog.

Furthermore, this feature is very useful for prototyping as users cansimply express that a catalog/database is present even through theymight not have access to it currently.


Regards,
Timo


On 30.09.19 14:57, Fabian Hueske wrote:

Hi all,

Sorry for the late reply.

I think it might lead to confusing situations if temporary functions (or
any temporary db objects for that matter) are bound to the life cycle of an
(external) db/catalog.
Imaging a situation where you create a temp function in a db in an external
catalog and use it but at some point it does not work anymore because some
other dropped the database from the external catalog.
Shouldn't temporary objects be only controlled by the owner of a session?

I agree that creating temp objects in non-existing db/catalogs sounds a bit
strange, but IMO the opposite (the db/catalog must exist for a temp
function to be created/exist) can have significant implications like the
one I described.
I think it would be quite easy for users to understand that temporary
objects are solely owned by them (and their session).
The problem of listing temporary objects could be solved by adding a ALL
[TEMPORARY] clause:

SHOW ALL FUNCTIONS; could show all functions regardless of the
catalog/database including temporary functions.
SHOW ALL TEMPORARY FUNCTIONS; could show all temporary functions regardless
of the catalog/database.

Best,
Fabian

Am Sa., 28. Sept. 2019 um 02:21 Uhr schrieb Bowen Li <[email protected]>:

@Dawid, do you have any other concerns? If not, I hope we can close the
voting.


On Thu, Sep 26, 2019 at 8:14 PM Rui Li <[email protected]> wrote:

I'm not sure how much benefit #1 can bring us. If users just want to try
out temporary functions, they can create temporary system functions which
don't require a catalog/DB. IIUC, the main reason why we allow temporary
catalog function is to let users override permanent catalog functions.
Therefore a temporary function in a non-existing catalog won't serve that
purpose. Besides, each session is provided with a default catalog and DB.
So even if users simply want to create some catalog functions they can
forget about after the session, wouldn't the default catalog/DB be enough
for such experiments?

On Thu, Sep 26, 2019 at 4:38 AM Bowen Li <[email protected]> wrote:

Re 1) As described in the FLIP, a temp function lookup will first make

sure

the db exists. If the db doesn't exist, a lazy drop is triggered to

remove

that temp function.

I agree Hive doesn't handle it consistently, and we are not copying

Hive.

IMHO, allowing registering temp functions in nonexistent catalog/db is
hacky and problematic. For instance, "SHOW FUNCTIONS" would list system
functions and functions in the current catalog/db, since users cannot
designate a nonexistent catalog/db as current ones, how can they list
functions in nonexistent catalog/db? They may end up never knowing what
temp functions they've created unless trying out with queries or we
introducing some more nonstandard SQL statements. The same applies to

other

temp objects like temp tables.

Re 2) A standalone FunctionIdentifier sounds good to me

On Wed, Sep 25, 2019 at 4:46 AM Dawid Wysakowicz <
[email protected]>
wrote:

Ad. 1
I wouldn't say it is hacky.
Moreover, how do you want ensure that the dB always exists when a

temporary

object is used?( in this particular case function). Do you want to

query

for the database existence whenever e.g a temporary function is

used? I

think important aspect here is that the database can be dropped from
external system, not just flink or a different flink session.

E.g in case of hive, you cannot create a temporary table in a

database

that

does not exist, that's true. But if you create a temporary table in a
database and drop that database from a different session, you can

still

query the previously created temporary table from the original

session.

It

does not sound like a consistent behaviour to me. Why don't we make

this

behaviour of not binding a temporary objects to the lifetime of a

database

explicit as part of the temporary objects contract? In the end they

exist

in different layers. Permanent objects & databases in a catalog (in

case

of

hive megastore) whereas temporary objects in flink sessions. That's

also

true for the original hive client. The temporary objects live in the

hive

client whereas databases are created in the metastore.

Ad.2
I'm open for suggestions here. The one thing I wanted to achieve here

is

so

that we do not change the contract of ObjectIdentifier. One important

thing

to remember here is that we need the function identifier to be part

of

the

FunctionDefinition object and not only as the result of the function
lookup. At some point we want to be able to store QueryOperations in

the

catalogs. They can contain function calls within which we need to

have

the

identifier.

I agree my initial suggestion is over complicated. How about we have

just

the FunctionIdentifier as top level class without making the
ObjectIdentifier extend from it? I think it's pretty much the same

what

you

suggested. The only difference is that it would be a top level class

with a

more descriptive name.


On Wed, 25 Sep 2019, 13:57 Bowen Li, <[email protected]> wrote:

Sorry, I missed some parts of the solution. The complete

alternative

is

the

following, basically having separate APIs in FunctionLookup for

ambiguous

and precise function lookup since planner is able to tell which API

to

call

with parsed queries, and have a unified result:

```
class FunctionLookup {

Optional<Result> lookupAmbiguousFunction(String name);


Optional<Result> lookupPreciseFunction(ObjectIdentifier oi);


class Result {
     Optional<ObjectIdentifier>  getObjectIdentifier() { ... }
     String getName() { ... }
     // ...
}

}
```

Thanks,
Bowen


On Tue, Sep 24, 2019 at 9:42 PM Bowen Li <[email protected]>

wrote:

Hi Dawid,

Re 1): I agree making it easy for users to run experiments is

important.

However, I'm not sure allowing users to register temp functions

in

nonexistent catalog/db is the optimal way. It seems a bit hacky,

and

breaks

the current contract between Flink and users that catalog/db must

be

valid

in order to operate on.

How about we instead focus on making it convenient to create

catalogs?

Users actually can already do it with ease via program or SQL CLI

yaml

file

for an in-memory catalog which has neither extra dependency nor

external

connections. What we can further improve is DDL for catalogs,

and I

raised

it in discussion of [FLIP 69 - Flink SQL DDL Enhancement] driven

by

Terry

now.

In that case, if users would like to experiment via SQL, they can

easily

create an in memory catalog/database using DDL, then play with

temp

functions.

Re 2): For the assumption, IIUIC, function ObjectIdentifier has

not

been

resolved when stack call reaches

FunctionCatalog#lookupFunction(),

but

only

been parsed?

I agree keeping ObjectIdentifier as-is would be good. I'm ok with

the

suggested classes, though making ObjectIdentifier a subclass of
FunctionIdentifier seem a bit counter intuitive.

Another potentially simpler way is:

```
// in class FunctionLookup
class Result {
     Optional<ObjectIdentifier>  getObjectIdentifier() { ... }
     String getName() { ... }
     ...
}
```

WDYT?



On Tue, Sep 24, 2019 at 3:41 PM Dawid Wysakowicz <
[email protected]> wrote:

Hi,
I really like the flip and think it clarifies important aspects

of

the

system.

I have two, I hope small suggestions, which will not take much

time

to

agree on.

1. Could we follow the MySQL approach in regards to the

existence

of

cat/db
for temporary functions? That means not to check it, so e.g.

it's

possible

to create a temporary function in a database that does not

exist.

think

it's really useful e.g in cases when user wants to perform

experiments

but

does not have access to the db yet or temporarily does not have

connection

to a catalog.
2. Could we not change the ObjectIdentifier? Could we not loosen

the

requirements for all catalog objects such as tables, views,

types

just

for

the functions? It's really important later on from e.g the

serializability

perspective. The important aspect of the ObjectIdentifier is

that

we

know

that it has been resolved. The suggested changes break that

assumption.

What do you think about adding an interface FunctionIdentifier {

String getName();

/**
   Return 3-part identifier. Empty in case of a built-in

function.

*/
Optional<ObjectIdentifier> getObjectIdentifier()
}

class ObjectIdentifier implements FunctionIdentifier {
Optional<ObjectIdentifier> getObjectIdentifier() {
  return Optional.of(this);
}
}

class SystemFunctionIdentifier implements FunctionIdentifier

{...}

WDYT?

On Wed, 25 Sep 2019, 04:50 Xuefu Z, <[email protected]> wrote:

+1. LGTM

On Tue, Sep 24, 2019 at 6:09 AM Terry Wang <

[email protected]>

wrote:

+1

Best,
Terry Wang

在 2019年9月24日，上午10:42，Kurt Young <[email protected]> 写道：

+1

Best,
Kurt


On Tue, Sep 24, 2019 at 2:30 AM Bowen Li <

[email protected]

wrote:

Hi all,

I'd like to start a voting thread for FLIP-57 [1], which

we've

reached

consensus in [2].

This voting will be open for minimum 3 days till 6:30pm

UTC,

Sep

26.

Thanks,
Bowen

[1]

https://cwiki.apache.org/confluence/display/FLINK/FLIP-57%3A+Rework+FunctionCatalog

[2]

http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-57-Rework-FunctionCatalog-td32291.html#a32613

--
Xuefu Zhang

"In Honey We Trust!"


--
Best regards!
Rui Li

Re: [VOTE] FLIP-57: Rework FunctionCatalog

Reply via email to