[ 
https://issues.apache.org/jira/browse/CALCITE-4564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated CALCITE-4564:
------------------------------------
    Labels: pull-request-available  (was: )

> Initialization context for non-static user-defined functions (UDFs)
> -------------------------------------------------------------------
>
>                 Key: CALCITE-4564
>                 URL: https://issues.apache.org/jira/browse/CALCITE-4564
>             Project: Calcite
>          Issue Type: Bug
>            Reporter: Julian Hyde
>            Assignee: Julian Hyde
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> I propose to allow user-defined functions (UDFs) to read from an 
> initialization context during construction. The initialization context would 
> be a new Java {{interface UdfInitializer}} that provides, among other things, 
> a type factory and the values of those arguments to the function call that 
> are literals.
> The purpose of this feature is to allow functions to do more work at 
> initialization time and less work on each invocation. Suppose I wanted to 
> write a UDF {{regexMatch(pattern, string)}} that matches Java regular 
> expressions. If {{pattern}} is a literal, I would like to create an instance 
> of the function object that calls {{Pattern.compile(pattern)}} in its 
> constructor and stores the resulting {{Pattern}} object as a field. Each 
> invocation of the function can use that {{Pattern}} object, and does not have 
> to pay the cost of compilation.
> In order to use this feature, a UDF class would have a public constructor 
> with a single argument that is a {{UdfInitializer}}. The method that invokes 
> the function, conventionally called {{eval}}, must be non-static.
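
A minimal sketch of the {{regexMatch}} UDF described above, assuming a shape for the proposed interface. The issue does not specify the methods of {{UdfInitializer}}, so the nested interface and its {{literalArgument}} method are hypothetical stand-ins, not Calcite API. The function compiles the pattern once in its constructor when the first argument is a literal, and falls back to compiling per call otherwise.

```java
import java.util.regex.Pattern;

public class RegexMatchSketch {
  /**
   * Hypothetical stand-in for the proposed UdfInitializer.
   * The real interface's methods are not given in the issue.
   */
  public interface UdfInitializer {
    /** Returns the literal value of argument {@code ordinal}, or null if it is not a literal. */
    Object literalArgument(int ordinal);
  }

  /** UDF with a public single-argument constructor and a non-static eval method. */
  public static class RegexMatchFunction {
    // Compiled once at construction; null when the pattern argument is not a literal.
    private final Pattern compiled;

    public RegexMatchFunction(UdfInitializer initializer) {
      Object literal = initializer.literalArgument(0);
      this.compiled = literal instanceof String
          ? Pattern.compile((String) literal)
          : null;
    }

    public boolean eval(String pattern, String string) {
      // Reuse the precompiled pattern if available; otherwise compile on each call.
      Pattern p = compiled != null ? compiled : Pattern.compile(pattern);
      return p.matcher(string).matches();
    }
  }

  public static void main(String[] args) {
    // Simulate a call site where argument 0 is the literal 'a+b'.
    UdfInitializer init = ordinal -> ordinal == 0 ? "a+b" : null;
    RegexMatchFunction f = new RegexMatchFunction(init);
    System.out.println(f.eval("a+b", "aaab"));
    System.out.println(f.eval("a+b", "abc"));
  }
}
```

The fallback branch in {{eval}} is a design choice of this sketch: it keeps the function correct when the pattern is a column or expression rather than a literal.
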
> This feature is optional. A UDF that has a public constructor with zero 
> arguments (which is the current contract for non-static UDFs) will continue 
> to work. [class 
> MyPlusFunction|https://github.com/apache/calcite/blob/4bc916619fd286b2c0cc4d5c653c96a68801d74e/core/src/test/java/org/apache/calcite/util/Smalls.java#L429]
>  is an example of this kind of UDF.
> This feature would apply to all UDFs, including table functions (i.e. those 
> whose arguments are tables or which return tables) and aggregate functions.
> The initialization context would not affect the type-derivation aspects of 
> the function. The return type, operand types, and so forth will already have 
> been derived at validate time, well before any code is generated or 
> executed. If you want to control type derivation, you should create your own 
> sub-class of {{SqlOperator}}, as today.
> There are some implementation challenges:
> * The code generator will need to generate an instance of {{UdfInitializer}} 
> for each UDF call that occurs in the query. Some data structures that are 
> readily available at validate time (e.g. {{RexCall}}) are not easily 
> re-created at run time, so we should be conservative about what information 
> is available via {{UdfInitializer}}.
> * The code generator must ensure that those instances are constructed exactly 
> once during the execution of the query; those instances should not be 
> variables in the {{execute}} method, but should instead be fields, or perhaps 
> static fields, in the generated class.
> * This functionality needs to work through both the interpreter ({{Bindable}} 
> convention) and generated code ({{Enumerable}} convention).
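
The "constructed exactly once" requirement in the second bullet can be illustrated with a rough sketch. The class below is not what Calcite's code generator emits; {{GeneratedQuery}}, its field {{f0}}, and the construction counter are hypothetical names used only for illustration. To stay short, the UDF constructor takes the literal pattern directly instead of a {{UdfInitializer}}. The point is that the UDF instance lives in a field of the generated class, so the row loop in {{execute}} never re-creates it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.regex.Pattern;

public class GeneratedCodeSketch {
  /** Counts UDF constructions so the "exactly once" property can be observed. */
  public static final AtomicInteger CONSTRUCTIONS = new AtomicInteger();

  /** Simplified UDF: takes the literal pattern directly (an assumption of this sketch). */
  public static class RegexMatchFunction {
    private final Pattern pattern;

    public RegexMatchFunction(String literalPattern) {
      CONSTRUCTIONS.incrementAndGet();
      this.pattern = Pattern.compile(literalPattern);
    }

    public boolean eval(String s) {
      return pattern.matcher(s).matches();
    }
  }

  /** Hypothetical shape of a generated class for a filter using regexMatch('a+b', row). */
  public static class GeneratedQuery {
    // Held as a field, not as a local variable of execute, so it is built once per query.
    private final RegexMatchFunction f0 = new RegexMatchFunction("a+b");

    public List<String> execute(List<String> rows) {
      List<String> out = new ArrayList<>();
      for (String row : rows) {
        if (f0.eval(row)) {
          out.add(row);
        }
      }
      return out;
    }
  }

  public static void main(String[] args) {
    GeneratedQuery query = new GeneratedQuery();
    System.out.println(query.execute(Arrays.asList("ab", "aab", "b")));
    System.out.println("constructions: " + CONSTRUCTIONS.get());
  }
}
```
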



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
