Understanding UDF annotations

2022-02-02 Thread X L
Hi,

I was wondering if my understanding about UDFs' annotations in
Calcite is correct. For instance, I notice there is a `@Deterministic`
label to specify a deterministic UDF. Then, relevant query rewrite rules
(e.g., ReduceExpressionsRule) need to check the existence of this label
(e.g., isDeterministic()) in order to decide whether the rule is applicable
(without changing the query semantics).
I do see that this is a way to leverage annotations when optimizing queries
that contain UDFs, but it seems to require adding such annotation checks in
almost every query rewrite rule.

Thanks and Regards,
Xinyu


Re: Understanding UDF annotations

2022-02-02 Thread Julian Hyde
Yes, the UDF annotations are intended to allow better optimizations.

I’m sure there are bugs. If a query with a UDF is not being optimized, log a 
bug. The fix is probably at the level of a particular RelOptRule, and therefore 
the test case would be in RelOptRulesTest.

But I don’t think that each rule needs to look for annotations. The rules 
should* use methods on the RexNode (expression). If the expression is (or 
contains) a call to a UDF then the RexCall to that UDF will use the annotations.

* By ‘should’ I mean ‘in an ideal world’, not ‘I believe that it is currently 
the case that…’.
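
To make the intended layering concrete, here is a minimal, self-contained toy model. It does not use Calcite's classes: `Deterministic`, `Expr`, `Literal`, `Call`, `udfCall` and `canReduce` are all hypothetical names invented for the illustration. The point is only that the annotation is read in one place (the call node), and a rule asks the expression a question instead of inspecting annotations itself.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.List;

// Toy model only: every type here is a hypothetical stand-in, not a Calcite class.
final class AnnotationSketch {

  /** Hypothetical marker, playing the role of the `@Deterministic` label. */
  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.METHOD)
  @interface Deterministic {}

  /** Two example UDFs; only the first one is annotated. */
  static final class Udfs {
    @Deterministic
    public static int plusOne(int x) { return x + 1; }

    public static int randomInt() { return new java.util.Random().nextInt(); }
  }

  /** Expression node. Rules query the node and never see the annotation. */
  interface Expr {
    boolean isDeterministic();
  }

  static final class Literal implements Expr {
    final Object value;
    Literal(Object value) { this.value = value; }
    @Override public boolean isDeterministic() { return true; }
  }

  /** A call to a UDF: the only place where the annotation is consulted. */
  static final class Call implements Expr {
    final Method udf;
    final List<Expr> operands;

    Call(Method udf, List<Expr> operands) {
      this.udf = udf;
      this.operands = operands;
    }

    @Override public boolean isDeterministic() {
      if (!udf.isAnnotationPresent(Deterministic.class)) {
        return false;
      }
      for (Expr operand : operands) {
        if (!operand.isDeterministic()) {
          return false;
        }
      }
      return true;
    }
  }

  /** Looks up a UDF by name (no overload resolution or type checking). */
  static Call udfCall(String name, Expr... operands) {
    for (Method m : Udfs.class.getMethods()) {
      if (m.getName().equals(name)) {
        return new Call(m, Arrays.asList(operands));
      }
    }
    throw new IllegalArgumentException("no such UDF: " + name);
  }

  /** Rule-side check: it depends only on the expression API. */
  static boolean canReduce(Expr e) {
    return e.isDeterministic();
  }

  public static void main(String[] args) {
    System.out.println(canReduce(udfCall("plusOne", new Literal(1))));
    System.out.println(canReduce(udfCall("randomInt")));
  }
}
```

With this shape, adding a new rule does not require a new annotation check; the rule inherits whatever the expression reports, including for nested calls.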

I see that ReduceExpressionsRule has an inner class ReducibleExprLocator which 
uses calls such as SqlOperator.isDynamicFunction(). That functionality should 
(again, in an ideal world) be moved out of ReduceExpressionsRule in a way that 
other code can use it. I don’t know how.
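
One possible shape for such a shared helper, as a rough sketch only. It is not Calcite code: `Op`, `Node`, `Kind`, `classify` and `canFoldAtPlanTime` are hypothetical names. It also assumes a working reading of the terms that has not been rigorously defined: ‘deterministic’ means the same operands always give the same result, and ‘dynamic’ means the value is fixed within one execution of a statement but may differ between executions. Column references are left out to keep the sketch short.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of a reusable classifier; not part of Calcite.
final class ReducibilitySketch {

  /** Ordered from least to most restrictive. */
  enum Kind { CONSTANT, DYNAMIC, NON_DETERMINISTIC }

  /** Operator with the two properties discussed in the thread. */
  static final class Op {
    final String name;
    final boolean deterministic;
    final boolean dynamic;

    Op(String name, boolean deterministic, boolean dynamic) {
      this.name = name;
      this.deterministic = deterministic;
      this.dynamic = dynamic;
    }
  }

  /** Expression tree node; a literal is modeled as an operator with no operands. */
  static final class Node {
    final Op op;
    final List<Node> operands;

    Node(Op op, Node... operands) {
      this.op = op;
      this.operands = Arrays.asList(operands);
    }
  }

  /** An expression is as restricted as its most restricted sub-expression. */
  static Kind classify(Node n) {
    Kind kind = !n.op.deterministic ? Kind.NON_DETERMINISTIC
        : n.op.dynamic ? Kind.DYNAMIC
        : Kind.CONSTANT;
    for (Node operand : n.operands) {
      Kind operandKind = classify(operand);
      if (operandKind.compareTo(kind) > 0) {
        kind = operandKind;
      }
    }
    return kind;
  }

  /** Any rule could call this instead of re-implementing the traversal. */
  static boolean canFoldAtPlanTime(Node n, boolean treatDynamicAsConstant) {
    Kind kind = classify(n);
    return kind == Kind.CONSTANT
        || (kind == Kind.DYNAMIC && treatDynamicAsConstant);
  }

  public static void main(String[] args) {
    Op one = new Op("1", true, false);
    Op plus = new Op("+", true, false);
    Op currentDate = new Op("CURRENT_DATE", true, true);
    Op rand = new Op("RAND", false, false);

    System.out.println(classify(new Node(plus, new Node(one), new Node(one))));
    System.out.println(classify(new Node(plus, new Node(currentDate), new Node(one))));
    System.out.println(classify(new Node(plus, new Node(rand), new Node(currentDate))));
  }
}
```

Keeping the classification separate from the folding decision lets each caller choose its own policy, for example whether a cached plan may treat dynamic calls as constants.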

For years I have been begging for someone to take ownership of this issue. This 
would include rigorous definitions of the terms ‘dynamic’ and ‘deterministic’. 
See https://issues.apache.org/jira/browse/CALCITE-4424 and
https://issues.apache.org/jira/browse/CALCITE-2823. Perhaps also ‘pure’ and
‘strict’ (which are accepted terms in the programming language theory
community).

Julian


