Julian Hyde created CALCITE-4564:
------------------------------------

             Summary: Initialization context for non-static user-defined functions (UDFs)
                 Key: CALCITE-4564
                 URL: https://issues.apache.org/jira/browse/CALCITE-4564
             Project: Calcite
          Issue Type: Bug
            Reporter: Julian Hyde


I propose to allow user-defined functions (UDFs) to read from an initialization 
context during construction. The initialization context would be a new Java 
{{interface UdfInitializer}} that provides, among other things, a type factory 
and the values of any arguments to the function call that are literals.

The purpose of this feature is to allow functions to do more work at 
initialization time and less work on each invocation. Suppose I wanted to write 
a UDF {{regexMatch(pattern, string)}} that matches Java regular expressions. If 
{{pattern}} is a literal, I would like to create an instance of the function 
object that calls {{Pattern.compile(pattern)}} in its constructor and stores 
the resulting {{Pattern}} object as a field. Each invocation of the function 
can use that {{Pattern}} object, and does not have to pay the cost of 
compilation.

In order to use this feature, a UDF class would have a public constructor with 
a single argument that is a {{UdfInitializer}}. The method that invokes the 
function, conventionally called {{eval}}, must be non-static.
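
A minimal sketch of the {{regexMatch}} example under this contract follows. {{UdfInitializer}} does not exist yet, so the stand-in interface below and its {{literalArgument}} method are assumptions made only for illustration; the real interface would also expose a type factory and might name things differently.

```java
import java.util.Optional;
import java.util.regex.Pattern;

// Hypothetical stand-in for the proposed interface. The method name
// literalArgument is an assumption of this sketch, not an agreed API.
interface UdfInitializer {
  /** Returns the value of the argument at {@code ordinal} if it is a literal. */
  Optional<Object> literalArgument(int ordinal);
}

// Sketch of regexMatch(pattern, string) as a non-static UDF.
class RegexMatchFunction {
  // Compiled once in the constructor when the pattern argument is a literal;
  // null when the pattern is only known per row.
  private final Pattern literalPattern;

  public RegexMatchFunction(UdfInitializer initializer) {
    this.literalPattern = initializer.literalArgument(0)
        .map(value -> Pattern.compile((String) value))
        .orElse(null);
  }

  // Non-static eval method, invoked once per row.
  public boolean eval(String pattern, String string) {
    Pattern p = literalPattern != null
        ? literalPattern
        : Pattern.compile(pattern); // fall back to per-call compilation
    return p.matcher(string).matches();
  }
}
```

When the pattern is a literal, {{eval}} reuses the stored {{Pattern}}; when it is not, the function behaves as it would today and compiles on each call.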

This feature is optional. A UDF that has a public constructor with zero 
arguments (which is the current contract for non-static UDFs) will continue to 
work. [class MyPlusFunction|https://github.com/apache/calcite/blob/4bc916619fd286b2c0cc4d5c653c96a68801d74e/core/src/test/java/org/apache/calcite/util/Smalls.java#L429]
is an example of this kind of UDF.
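
One way the optional contract could be honored is to look for the one-argument constructor first and fall back to the zero-argument one. The sketch below uses plain reflection; {{UdfInstantiator}}, {{PlusFunction}} and {{InitAwareFunction}} are hypothetical names, and the empty {{UdfInitializer}} is only a placeholder for the proposed interface. It is not how Calcite instantiates UDFs today.

```java
// Placeholder for the proposed interface; its members are omitted here.
interface UdfInitializer { }

// Hypothetical helper: prefers a public constructor taking UdfInitializer,
// otherwise falls back to the public zero-argument constructor.
final class UdfInstantiator {
  private UdfInstantiator() { }

  static Object instantiate(Class<?> udfClass, UdfInitializer initializer) {
    try {
      try {
        return udfClass.getConstructor(UdfInitializer.class)
            .newInstance(initializer);
      } catch (NoSuchMethodException e) {
        // No initializer-aware constructor: use the existing contract.
        return udfClass.getConstructor().newInstance();
      }
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException(
          "Cannot instantiate UDF " + udfClass.getName(), e);
    }
  }
}

// UDF written against the existing contract (zero-argument constructor).
class PlusFunction {
  public PlusFunction() { }

  public int eval(int x, int y) {
    return x + y;
  }
}

// UDF written against the proposed contract.
class InitAwareFunction {
  final UdfInitializer initializer;

  public InitAwareFunction(UdfInitializer initializer) {
    this.initializer = initializer;
  }
}
```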

This feature would apply to all UDFs, including table functions (i.e. those 
whose arguments are tables or which return tables) and aggregate functions.

The initialization context would not affect the type-derivation aspects of the 
function. The return type, operand types, and so forth will already have been 
derived at validate time; type derivation is complete well before any code is 
generated or executed. If you want to control type derivation, you should 
create your own sub-class of {{SqlOperator}}, as today.

There are some implementation challenges:
* The code generator will need to generate an instance of {{UdfInitializer}} 
for each UDF call that occurs in the query. Some data structures that are 
readily available at validate time (e.g. {{RexCall}}) are not easily re-created 
at run time, so we should be conservative about what information is available 
via {{UdfInitializer}}.
* The code generator must ensure that those instances are constructed exactly 
once during the execution of the query; those instances should not be variables 
in the {{execute}} method, but should instead be fields, or perhaps static 
fields, in the generated class.
* This functionality needs to work through both the interpreter ({{Bindable}} 
convention) and generated code ({{Enumerable}} convention).
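
The second point can be illustrated with a hand-written analogue of a generated class. This is only a sketch: {{GeneratedQuerySketch}} and {{CountingUdf}} are hypothetical names, the constructor argument is omitted for brevity, and the code is not what Calcite's code generator emits. The UDF instance is held in a field, so it is constructed once no matter how many rows {{execute}} processes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical UDF that counts how many times it has been constructed.
class CountingUdf {
  static int constructions = 0;

  CountingUdf() {
    constructions++;
  }

  int eval(int x) {
    return x + 1;
  }
}

// Hand-written stand-in for a generated class.
class GeneratedQuerySketch {
  // Field, not a local variable in execute: constructed once per
  // instance of this class rather than once per call or per row.
  private final CountingUdf udf = new CountingUdf();

  List<Integer> execute(List<Integer> rows) {
    List<Integer> out = new ArrayList<>();
    for (int row : rows) {
      out.add(udf.eval(row));
    }
    return out;
  }
}
```

Had {{udf}} been declared inside {{execute}}, every call would construct a new instance and repeat any expensive initialization.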




