Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

Xuefu Z Wed, 04 Sep 2019 14:26:06 -0700

Thank all for the sharing thoughts. I think we have gathered some useful
initial feedback from this long discussion with a couple of focal points
sticking out.


 We will go back to do more research and adapt our proposal. Once it's
ready, we will ask for a new round of review. If there is any disagreement,
we will start a new discussion thread on each rather than having a mega
discussion like this.

Thanks to everyone for participating.

Regards,
Xuefu


On Thu, Sep 5, 2019 at 2:52 AM Bowen Li <[email protected]> wrote:

> Let me try to summarize and conclude the long thread so far:
>
> 1. For order of temp function v.s. built-in function:
>
> I think Dawid's point that temp function should be of fully qualified path
> is a better reasoning to back the newly proposed order, and i agree we
> don't need to follow Hive/Spark.
>
> However, I'd rather not change fundamentals of temporary functions in this
> FLIP. It belongs to a bigger story of how temporary objects should be
> redefined and be handled uniformly - currently temporary tables and views
> (those registered from TableEnv#registerTable()) behave different than what
> Dawid propose for temp functions, and we need a FLIP to just unify their
> APIs and behaviors.
>
> I agree that backward compatibility is not an issue w.r.t Jark's points.
>
> ***Seems we do have consensus that it's acceptable to prevent users
> registering a temp function in the same name as a built-in function. To
> help us move forward, I'd like to propose setting such a restraint on temp
> functions in this FLIP to simplify the design and avoid disputes.*** It
> will also leave rooms for improvements in the future.
>
>
> 2. For Hive built-in function:
>
> Thanks Timo for providing the Presto and Postgres examples. I feel modular
> built-in functions can be a good fit for the geo and ml example as a native
> Flink extension, but not sure if it fits well with external integrations.
> Anyway, I think modular built-in functions is a bigger story and can be on
> its own thread too, and our proposal doesn't prevent Flink from doing that
> in the future.
>
> ***Seems we have consensus that users should be able to use built-in
> functions of Hive or other external systems in SQL explicitly and
> deterministically regardless of Flink built-in functions and the potential
> modular built-in functions, via some new syntax like "mycat::func"? If so,
> I'd like to propose removing Hive built-in functions from ambiguous
> function resolution order, and empower users with such a syntax. This way
> we sacrifice a little convenience for certainty***
>
>
> What do you think?
>
> On Wed, Sep 4, 2019 at 7:02 AM Dawid Wysakowicz <[email protected]>
> wrote:
>
> > Hi,
> >
> > Regarding the Hive & Spark support of TEMPORARY FUNCTIONS. I've just
> > performed some experiments (hive-2.3.2 & spark 2.4.4) and I think they
> are
> > very inconsistent in that manner (spark being way worse on that).
> >
> > Hive:
> >
> > You cannot overwrite all the built-in functions. I could overwrite most
> of
> > the functions I tried e.g. length, e, pi, round, rtrim, but there are
> > functions I cannot overwrite e.g. CAST, ARRAY I get:
> >
> >
> > *    ParseException line 1:29 cannot recognize input near 'array' 'AS' *
> >
> > What is interesting is that I cannot ovewrite *array*, but I can ovewrite
> > *map* or *struct*. Though hive behaves reasonable well if I manage to
> > overwrite a function. When I drop the temporary function the native
> > function is still available.
> >
> > Spark:
> >
> > Spark's behavior imho is super bad.
> >
> > Theoretically I could overwrite all functions. I was able e.g. to
> > overwrite CAST function. I had to use though CREATE OR REPLACE TEMPORARY
> > FUNCTION syntax. Otherwise I get an exception that a function already
> > exists. However when I used the CAST function in a query it used the
> > native, built-in one.
> >
> > When I overwrote current_date() function, it was used in a query, but it
> > completely replaces the built-in function and I can no longer use the
> > native function in any way. I cannot also drop the temporary function. I
> > get:
> >
> > *    Error in query: Cannot drop native function 'current_date';*
> >
> > Additional note, both systems do not allow creating TEMPORARY FUNCTIONS
> > with a database. Temporary functions are always represented as a single
> > name.
> >
> > In my opinion neither of the systems have consistent behavior. Generally
> > speaking I think overwriting any system provided functions is just
> > dangerous.
> >
> > Regarding Jark's concerns. Such functions would be registered in a
> current
> > catalog/database schema, so a user could still use its own function, but
> > would have to fully qualify the function (because built-in functions take
> > precedence). Moreover users would have the same problem with permanent
> > functions. Imagine a user have a permanent function 'cat.db.explode'. In
> > 1.9 the user could use just the 'explode' function as long as the 'cat' &
> > 'db' were the default catalog & database. If we introduce 'explode'
> > built-in function in 1.10, the user has to fully qualify the function.
> >
> > Best,
> >
> > Dawid
> > On 04/09/2019 15:19, Timo Walther wrote:
> >
> > Hi all,
> >
> > thanks for the healthy discussion. It is already a very long discussion
> > with a lot of text. So I will just post my opinion to a couple of
> > statements:
> >
> > > Hive built-in functions are not part of Flink built-in functions, they
> > are catalog functions
> >
> > That is not entirely true. Correct me if I'm wrong but I think Hive
> > built-in functions are also not catalog functions. They are not stored in
> > every Hive metastore catalog that is freshly created but are a set of
> > functions that are listed somewhere and made available.
> >
> > > ambiguous functions reference just shouldn't be resolved to a different
> > catalog
> >
> > I agree. They should not be resolved to a different catalog. That's why I
> > am suggesting to split the concept of built-in functions and catalog
> lookup
> > semantics.
> >
> > > I don't know if any other databases handle built-in functions like that
> >
> > What I called "module" is:
> > - Extension in Postgres [1]
> > - Plugin in Presto [2]
> >
> > Btw. Presto even mentions example modules that are similar to the ones
> > that we will introduce in the near future both for ML and System XYZ
> > compatibility:
> > "See either the presto-ml module for machine learning functions or the
> > presto-teradata-functions module for Teradata-compatible functions, both
> in
> > the root of the Presto source."
> >
> > > functions should be either built-in already or just libraries
> functions,
> > and library functions can be adapted to catalog APIs or of some other
> > syntax to use
> >
> > Regarding "built-in already", of course we can add a lot of functions as
> > built-ins but we will end-up in a dependency hell in the near future if
> we
> > don't introduce a pluggable approach. Library functions is what you also
> > suggest but storing them in a catalog means to always fully qualify them
> or
> > modifying the existing catalog design that was inspired by the standard.
> >
> > I don't think "it brings in even more complicated scenarios to the
> > design", it just does clear separation of concerns. Integrating the
> > functionality into the current design makes the catalog API more
> > complicated.
> >
> > > why would users name a temporary function the same as a built-in
> > function then?
> >
> > Because you never know what users do. If they don't, my suggested
> > resolution order should not be a problem, right?
> >
> > > I don't think hive functions deserves be a function module
> >
> > Our goal is not to create a Hive clone. We need to think forward and Hive
> > is just one of many systems that we can support. Not every built-in
> > function behaves and will behave exactly like Hive.
> >
> > > regarding temporary functions, there are few systems that support it
> >
> > IMHO Spark and Hive are not always the best examples for consistent
> > design. Systems like Postgres, Presto, or SQL Server should be used as a
> > reference. I don't think that a user can overwrite a built-in function
> > there.
> >
> > Regards,
> > Timo
> >
> > [1] https://www.postgresql.org/docs/10/extend-extensions.html
> > [2] https://prestodb.github.io/docs/current/develop/functions.html
> >
> >
> > On 04.09.19 13:44, Jark Wu wrote:
> >
> > Hi all,
> >
> > Regarding #1 temp function <> built-in function and naming.
> > I'm fine with temp functions should precede built-in function and can
> > override built-in functions (we already support to override built-in
> > function in 1.9).
> > If we don't allow the same name as a built-in function, I'm afraid we
> will
> > have compatibility issues in the future.
> > Say users register a user defined function named "explode" in 1.9, and we
> > support a built-in "explode" function in 1.10.
> > Then the user's jobs which call the registered "explode" function in 1.9
> > will all fail in 1.10 because of naming conflict.
> >
> > Regarding #2 "External" built-in functions.
> > I think if we store external built-in functions in catalog, then
> > "hive1::sqrt" is a good way to go.
> > However, I would prefer to support a discovery mechanism (e.g. SPI) for
> > built-in functions as Timo suggested above.
> > This gives us the flexibility to add Hive or MySQL or Geo or whatever
> > function set as built-in functions in an easy way.
> >
> > Best,
> > Jark
> >
> > On Wed, 4 Sep 2019 at 17:47, Xuefu Z <[email protected]>
> > <[email protected]> wrote:
> >
> > Hi David,
> >
> > Thank you for sharing your findings. It seems to me that there is no SQL
> > standard regarding temporary functions. There are few systems that
> support
> > it. Here are what I have found:
> >
> > 1. Hive: no DB qualifier allowed. Can overwrite built-in.
> > 2. Spark: basically follows Hive (
> >
> >
> >
> https://docs.databricks.com/spark/latest/spark-sql/language-manual/create-function.html
> > )
> > 3. SAP SQL Anywhere Server: can have owner (db?). Not sure of overwriting
> > behavior. (
> > http://dcx.sap.com/sqla170/en/html/816bdf316ce210148d3acbebf6d39b18.html
> )
> >
> > Because of lack of standard, it's perfectly fine for Flink to define
> > whatever it sees appropriate. Thus, your proposal (no overwriting and
> must
> > have DB as holder) is one option. The advantage is simplicity, The
> > downside
> > is the deviation from Hive, which is popular and de facto standard in big
> > data world.
> >
> > However, I don't think we have to follow Hive. More importantly, we need
> a
> > consensus. I have no objection if your proposal is generally agreed upon.
> >
> > Thanks,
> > Xuefu
> >
> > On Tue, Sep 3, 2019 at 11:58 PM Dawid Wysakowicz <[email protected]
> >
> > <[email protected]>
> > wrote:
> >
> > Hi all,
> >
> > Just an opinion on the built-in <> temporary functions resolution and
> > NAMING issue. I think we should not allow overriding the built-in
> > functions, as this may pose serious issues and to be honest is rather
> > not feasible and would require major rework. What happens if a user
> > wants to override CAST? Calls to that function are generated at
> > different layers of the stack that unfortunately does not always go
> > through the Catalog API (at least yet). Moreover from what I've checked
> > no other systems allow overriding the built-in functions. All the
> > systems I've checked so far register temporary functions in a
> > database/schema (either special database for temporary functions, or
> > just current database). What I would suggest is to always register
> > temporary functions with a 3 part identifier. The same way as tables,
> > views etc. This effectively means you cannot override built-in
> > functions. With such approach it is natural that the temporary functions
> > end up a step lower in the resolution order:
> >
> > 1. built-in functions (1 part, maybe 2? - this is still under discussion)
> >
> > 2. temporary functions (always 3 part path)
> >
> > 3. catalog functions (always 3 part path)
> >
> > Let me know what do you think.
> >
> > Best,
> >
> > Dawid
> >
> > On 04/09/2019 06:13, Bowen Li wrote:
> >
> > Hi,
> >
> > I agree with Xuefu that the main controversial points are mainly the
> >
> > two
> >
> > places. My thoughts on them:
> >
> > 1) Determinism of referencing Hive built-in functions. We can either
> >
> > remove
> >
> > Hive built-in functions from ambiguous function resolution and require
> > users to use special syntax for their qualified names, or add a config
> >
> > flag
> >
> > to catalog constructor/yaml for turning on and off Hive built-in
> >
> > functions
> >
> > with the flag set to 'false' by default and proper doc added to help
> >
> > users
> >
> > make their decisions.
> >
> > 2) Flink temp functions v.s. Flink built-in functions in ambiguous
> >
> > function
> >
> > resolution order. We believe Flink temp functions should precede Flink
> > built-in functions, and I have presented my reasons. Just in case if we
> > cannot reach an agreement, I propose forbid users registering temp
> > functions in the same name as a built-in function, like MySQL's
> >
> > approach,
> >
> > for the moment. It won't have any performance concern, since built-in
> > functions are all in memory and thus cost of a name check will be
> >
> > really
> >
> > trivial.
> >
> >
> > On Tue, Sep 3, 2019 at 8:01 PM Xuefu Z <[email protected]>
> > <[email protected]> wrote:
> >
> >  From what I have seen, there are a couple of focal disagreements:
> >
> > 1. Resolution order: temp function --> flink built-in function -->
> >
> > catalog
> >
> > function vs flink built-in function --> temp function -> catalog
> >
> > function.
> >
> > 2. "External" built-in functions: how to treat built-in functions in
> > external system and how users reference them
> >
> > For #1, I agree with Bowen that temp function needs to be at the
> >
> > highest
> >
> > priority because that's how a user might overwrite a built-in function
> > without referencing a persistent, overwriting catalog function with a
> >
> > fully
> >
> > qualified name. Putting built-in functions at the highest priority
> > eliminates that usage.
> >
> > For #2, I saw a general agreement on referencing "external" built-in
> > functions such as those in Hive needs to be explicit and deterministic
> >
> > even
> >
> > though different approaches are proposed. To limit the scope and
> >
> > simply
> >
> > the
> >
> > usage, it seems making sense to me to introduce special syntax for
> >
> > user  to
> >
> > explicitly reference an external built-in function such as hive1::sqrt
> >
> > or
> >
> > hive1._built_in.sqrt. This is a DML syntax matching nicely Catalog API
> >
> > call
> >
> > hive1.getFunction(ObjectPath functionName) where the database name is
> > absent for bulit-in functions available in that catalog hive1. I
> >
> > understand
> >
> > that Bowen's original proposal was trying to avoid this, but this
> >
> > could
> >
> > turn out to be a clean and simple solution.
> >
> > (Timo's modular approach is great way to "expand" Flink's built-in
> >
> > function
> >
> > set, which seems orthogonal and complementary to this, which could be
> > tackled in further future work.)
> >
> > I'd be happy to hear further thoughts on the two points.
> >
> > Thanks,
> > Xuefu
> >
> > On Tue, Sep 3, 2019 at 7:11 PM Kurt Young <[email protected]>
> > <[email protected]> wrote:
> >
> > Thanks Timo & Bowen for the feedback. Bowen was right, my proposal is
> >
> > the
> >
> > same
> > as Bowen's. But after thinking about it, I'm currently lean to Timo's
> > suggestion.
> >
> > The reason is backward compatibility. If we follow Bowen's approach,
> >
> > let's
> >
> > say we
> > first find function in Flink's built-in functions, and then hive's
> > built-in. For example, `foo`
> > is not supported by Flink, but hive has such built-in function. So
> >
> > user
> >
> > will have hive's
> > behavior for function `foo`. And in next release, Flink realize this
> >
> > is a
> >
> > very popular function
> > and add it into Flink's built-in functions, but with different
> >
> > behavior
> >
> > as
> >
> > hive's. So in next
> > release, the behavior changes.
> >
> > With Timo's approach, IIUC user have to tell the framework explicitly
> >
> > what
> >
> > kind of
> > built-in functions he would like to use. He can just tell framework
> >
> > to
> >
> > abandon Flink's built-in
> > functions, and use hive's instead. User can only choose between them,
> >
> > but
> >
> > not use
> > them at the same time. I think this approach is more predictable.
> >
> > Best,
> > Kurt
> >
> >
> > On Wed, Sep 4, 2019 at 8:00 AM Bowen Li <[email protected]>
> > <[email protected]> wrote:
> >
> > Hi all,
> >
> > Thanks for the feedback. Just a kindly reminder that the [Proposal]
> >
> > section
> >
> > in the google doc was updated, please take a look first and let me
> >
> > know
> >
> > if
> >
> > you have more questions.
> >
> > On Tue, Sep 3, 2019 at 4:57 PM Bowen Li <[email protected]>
> > <[email protected]>
> >
> > wrote:
> >
> > Hi Timo,
> >
> > Re> 1) We should not have the restriction "hive built-in functions
> >
> > can
> >
> > only
> >
> > be used when current catalog is hive catalog". Switching a catalog
> > should only have implications on the cat.db.object resolution but
> >
> > not
> >
> > functions. It would be quite convinient for users to use Hive
> >
> > built-ins
> >
> > even if they use a Confluent schema registry or just the in-memory
> >
> > catalog.
> >
> > There might be a misunderstanding here.
> >
> > First of all, Hive built-in functions are not part of Flink
> >
> > built-in
> >
> > functions, they are catalog functions, thus if the current catalog
> >
> > is
> >
> > not a
> >
> > HiveCatalog but, say, a schema registry catalog, ambiguous
> >
> > functions
> >
> > reference just shouldn't be resolved to a different catalog.
> >
> > Second, Hive built-in functions can potentially be referenced
> >
> > across
> >
> > catalog, but it doesn't have db namespace and we currently just
> >
> > don't
> >
> > have
> >
> > a SQL syntax for it. It can be enabled when such a SQL syntax is
> >
> > defined,
> >
> > e.g. "catalog::function", but it's out of scope of this FLIP.
> >
> > 2) I would propose to have separate concepts for catalog and
> >
> > built-in
> >
> > functions. In particular it would be nice to modularize built-in
> > functions. Some built-in functions are very crucial (like AS, CAST,
> > MINUS), others are more optional but stable (MD5, CONCAT_WS), and
> >
> > maybe
> >
> > we add more experimental functions in the future or function for
> >
> > some
> >
> > special application area (Geo functions, ML functions). A data
> >
> > platform
> >
> > team might not want to make every built-in function available. Or a
> > function module like ML functions is in a different Maven module.
> >
> > I think this is orthogonal to this FLIP, especially we don't have
> >
> > the
> >
> > "external built-in functions" anymore and currently the built-in
> >
> > function
> >
> > category remains untouched.
> >
> > But just to share some thoughts on the proposal, I'm not sure about
> >
> > it:
> >
> > - I don't know if any other databases handle built-in functions
> >
> > like
> >
> > that.
> >
> > Maybe you can give some examples? IMHO, built-in functions are
> >
> > system
> >
> > info
> >
> > and should be deterministic, not depending on loaded libraries. Geo
> > functions should be either built-in already or just libraries
> >
> > functions,
> >
> > and library functions can be adapted to catalog APIs or of some
> >
> > other
> >
> > syntax to use
> > - I don't know if all use cases stand, and many can be achieved by
> >
> > other
> >
> > approaches too. E.g. experimental functions can be taken good care
> >
> > of
> >
> > by
> >
> > documentations, annotations, etc
> > - the proposal basically introduces some concept like a pluggable
> >
> > built-in
> >
> > function catalog, despite the already existing catalog APIs
> > - it brings in even more complicated scenarios to the design. E.g.
> >
> > how
> >
> > do
> >
> > you handle built-in functions in different modules but different
> >
> > names?
> >
> > In short, I'm not sure if it really stands and it looks like an
> >
> > overkill
> >
> > to me. I'd rather not go to that route. Related discussion can be
> >
> > on
> >
> > its
> >
> > own thread.
> >
> > 3) Following the suggestion above, we can have a separate discovery
> > mechanism for built-in functions. Instead of just going through a
> >
> > static
> >
> > list like in BuiltInFunctionDefinitions, a platform team should be
> >
> > able
> >
> > to select function modules like
> > catalogManager.setFunctionModules(CoreFunctions, GeoFunctions,
> > HiveFunctions) or via service discovery;
> >
> > Same as above. I'll leave it to its own thread.
> >
> > re > 3) Dawid and I discussed the resulution order again. I agree
> >
> > with
> >
> > Kurt
> >
> > that we should unify built-in function (external or internal)
> >
> > under a
> >
> > common layer. However, the resolution order should be:
> >    1. built-in functions
> >    2. temporary functions
> >    3. regular catalog resolution logic
> > Otherwise a temporary function could cause clashes with Flink's
> >
> > built-in
> >
> > functions. If you take a look at other vendors, like SQL Server
> >
> > they
> >
> > also do not allow to overwrite built-in functions.
> >
> > ”I agree with Kurt that we should unify built-in function (external
> >
> > or
> >
> > internal) under a common layer.“ <- I don't think this is what Kurt
> >
> > means.
> >
> > Kurt and I are in favor of unifying built-in functions of external
> >
> > systems
> >
> > and catalog functions. Did you type a mistake?
> >
> > Besides, I'm not sure about the resolution order you proposed.
> >
> > Temporary
> >
> > functions have a lifespan over a session and are only visible to
> >
> > the
> >
> > session owner, they are unique to each user, and users create them
> >
> > on
> >
> > purpose to be the highest priority in order to overwrite system
> >
> > info
> >
> > (built-in functions in this case).
> >
> > In your case, why would users name a temporary function the same
> >
> > as a
> >
> > built-in function then? Since using that name in ambiguous function
> > reference will always be resolved to built-in functions, creating a
> > same-named temp function would be meaningless in the end.
> >
> >
> > On Tue, Sep 3, 2019 at 1:44 PM Bowen Li <[email protected]>
> > <[email protected]>
> >
> > wrote:
> >
> > Hi Jingsong,
> >
> > Re> 1.Hive built-in functions is an intermediate solution. So we
> >
> > should
> >
> > not introduce interfaces to influence the framework. To make
> > Flink itself more powerful, we should implement the functions
> > we need to add.
> >
> > Yes, please see the doc.
> >
> > Re> 2.Non-flink built-in functions are easy for users to change
> >
> > their
> >
> > behavior. If we support some flink built-in functions in the
> > future but act differently from non-flink built-in, this will
> >
> > lead
> >
> > to
> >
> > changes in user behavior.
> >
> > There's no such concept as "external built-in functions" any more.
> > Built-in functions of external systems will be treated as special
> >
> > catalog
> >
> > functions.
> >
> > Re> Another question is, does this fallback include all
> >
> > hive built-in functions? As far as I know, some hive functions
> > have some hacky. If possible, can we start with a white list?
> > Once we implement some functions to flink built-in, we can
> > also update the whitelist.
> >
> > Yes, that's something we thought of too. I don't think it's super
> > critical to the scope of this FLIP, thus I'd like to leave it to
> >
> > future
> >
> > efforts as a nice-to-have feature.
> >
> >
> > On Tue, Sep 3, 2019 at 1:37 PM Bowen Li <[email protected]>
> > <[email protected]>
> >
> > wrote:
> >
> > Hi Kurt,
> >
> > Re: > What I want to propose is we can merge #3 and #4, make them
> >
> > both
> >
> > under
> >
> > "catalog" concept, by extending catalog function to make it have
> >
> > ability to
> >
> > have built-in catalog functions. Some benefits I can see from
> >
> > this
> >
> > approach:
> >
> > 1. We don't have to introduce new concept like external built-in
> >
> > functions.
> >
> > Actually I don't see a full story about how to treat a built-in
> >
> > functions, and it
> >
> > seems a little bit disrupt with catalog. As a result, you have
> >
> > to
> >
> > make
> >
> > some restriction
> >
> > like "hive built-in functions can only be used when current
> >
> > catalog
> >
> > is
> >
> > hive catalog".
> >
> > Yes, I've unified #3 and #4 but it seems I didn't update some
> >
> > part
> >
> > of
> >
> > the doc. I've modified those sections, and they are up to date
> >
> > now.
> >
> > In short, now built-in function of external systems are defined
> >
> > as
> >
> > a
> >
> > special kind of catalog function in Flink, and handled by Flink
> >
> > as
> >
> > following:
> > - An external built-in function must be associated with a catalog
> >
> > for
> >
> > the purpose of decoupling flink-table and external systems.
> > - It always resides in front of catalog functions in ambiguous
> >
> > function
> >
> > reference order, just like in its own external system
> > - It is a special catalog function that doesn’t have a
> >
> > schema/database
> >
> > namespace
> > - It goes thru the same instantiation logic as other user defined
> > catalog functions in the external system
> >
> > Please take another look at the doc, and let me know if you have
> >
> > more
> >
> > questions.
> >
> >
> > On Tue, Sep 3, 2019 at 7:28 AM Timo Walther <[email protected]>
> > <[email protected]>
> >
> > wrote:
> >
> > Hi Kurt,
> >
> > it should not affect the functions and operations we currently
> >
> > have
> >
> > in
> >
> > SQL. It just categorizes the available built-in functions. It is
> >
> > kind
> >
> > of
> > an orthogonal concept to the catalog API but built-in functions
> >
> > deserve
> >
> > this special kind of treatment. CatalogFunction still fits
> >
> > perfectly
> >
> > in
> >
> > there because the regular catalog object resolution logic is not
> > affected. So tables and functions are resolved in the same way
> >
> > but
> >
> > with
> >
> > built-in functions that have priority as in the original design.
> >
> > Regards,
> > Timo
> >
> >
> > On 03.09.19 15:26, Kurt Young wrote:
> >
> > Does this only affect the functions and operations we currently
> >
> > have
> >
> > in SQL
> >
> > and
> > have no effect on tables, right? Looks like this is an
> >
> > orthogonal
> >
> > concept
> >
> > with Catalog?
> > If the answer are both yes, then the catalog function will be a
> >
> > weird
> >
> > concept?
> >
> > Best,
> > Kurt
> >
> >
> > On Tue, Sep 3, 2019 at 8:10 PM Danny Chan <
> >
> > [email protected]
> >
> > wrote:
> >
> > The way you proposed are basically the same as what Calcite
> >
> > does, I
> >
> > think
> >
> > we are in the same line.
> >
> > Best,
> > Danny Chan
> > 在 2019年9月3日 +0800 PM7:57，Timo Walther <[email protected]
> >
> > ，写道：
> >
> > This sounds exactly as the module approach I mentioned, no?
> >
> > Regards,
> > Timo
> >
> > On 03.09.19 13:42, Danny Chan wrote:
> >
> > Thanks Bowen for bring up this topic, I think it’s a useful
> >
> > refactoring to make our function usage more user friendly.
> >
> > For the topic of how to organize the builtin operators and
> >
> > operators
> >
> > of Hive, here is a solution from Apache Calcite, the Calcite
> >
> > way
> >
> > is
> >
> > to make
> >
> > every dialect operators a “Library”, user can specify which
> >
> > libraries they
> >
> > want to use for a sql query. The builtin operators always
> >
> > comes
> >
> > as
> >
> > the
> >
> > first class objects and the others are used from the order
> >
> > they
> >
> > appears.
> >
> > Maybe you can take a reference.
> >
> > [1]
> >
> >
> >
> https://github.com/apache/calcite/commit/9a4eab5240d96379431d14a1ac33bfebaf6fbb28
> >
> > Best,
> > Danny Chan
> > 在 2019年8月28日 +0800 AM2:50，Bowen Li <[email protected]
> >
> > ，写道：
> >
> > Hi folks,
> >
> > I'd like to kick off a discussion on reworking Flink's
> >
> > FunctionCatalog.
> >
> > It's critically helpful to improve function usability in
> >
> > SQL.
> >
> >
> >
> https://docs.google.com/document/d/1w3HZGj9kry4RsKVCduWp82HkW6hhgi2unnvOAUS72t8/edit?usp=sharing
> >
> > In short, it:
> > - adds support for precise function reference with
> >
> > fully/partially
> >
> > qualified name
> > - redefines function resolution order for ambiguous
> >
> > function
> >
> > reference
> >
> > - adds support for Hive's rich built-in functions (support
> >
> > for
> >
> > Hive
> >
> > user
> >
> > defined functions was already added in 1.9.0)
> > - clarifies the concept of temporary functions
> >
> > Would love to hear your thoughts.
> >
> > Bowen
> >
> > --
> > Xuefu Zhang
> >
> > "In Honey We Trust!"
> >
> >
> > --
> > Xuefu Zhang
> >
> > "In Honey We Trust!"
> >
> >
> >
>


-- 
Xuefu Zhang

"In Honey We Trust!"

Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

Reply via email to