[ 
https://issues.apache.org/jira/browse/CALCITE-3760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17040289#comment-17040289
 ] 

Julian Hyde commented on CALCITE-3760:
--------------------------------------

I don't think this PR approaches the problem the right way:
* I don't yet see a need to add {{boolean isDeterministic()}} method to 
{{org.apache.calcite.schema.Function}}. We already have {{boolean 
SqlOperator.isDeterministic()}} and we have the annotation 
{{org.apache.calcite.linq4j.function.Deterministic}} to be applied to Java UDFs.
* Figuring out whether an expression is deterministic by tree-walking into the 
expression, as {{SqlUtil.isDeterminstic}} does, is the wrong approach. It will 
work on the original query, but the determinism will get broken during rewrite.
* I think the robust solution is to ensure that operators that are 
non-deterministic appear only as the topmost expression in a {{Project}}. That 
way they are computed only once, and any expression referencing them is using a 
field that has already been computed. Probably it would be up to 
{{SqlToRelConverter}} to carve up expressions so that non-deterministic 
operators are on top. And up to {{ProjectMergeRule}} and similar rules to 
ensure that they stay on top.
* The definition of 'deterministic' is ambiguous. The documentation should 
state explicitly whether functions that return the same result for the duration 
of a query (e.g. {{CURRENT_TIMESTAMP}}) or the same result for the current row 
(e.g. {{NEXT_VALUE}} of a sequence) are considered to be deterministic. Really 
we need deterministic to be an enum with 4 values, not a boolean. Maybe there's 
a fifth: a function that must be evaluated even if its result is not used.

> Rewriting non-deterministic function can break query semantics
> --------------------------------------------------------------
>
>                 Key: CALCITE-3760
>                 URL: https://issues.apache.org/jira/browse/CALCITE-3760
>             Project: Calcite
>          Issue Type: Bug
>          Components: core
>            Reporter: Jin Xing
>            Assignee: Jin Xing
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Calcite rewrite some *SqlFunctions* during validation. But whether the 
> function is deterministic is not considered. For a non-deterministic 
> operator, the rewriting can break semantics. Additionally there's no 
> interface for user to specify the determinism for a UDF/UDAF. 
> Say I have non-deterministic UDF & UDAF and run sql like below
> {code:java}
> select coalesce(udf(col0), 100) from foo;
> select nullif(udaf(col0), 1024) from foo;{code}
> They will be rewritten as
> {code:java}
> select case when udf(col0) is not null then udf(col0) else 100 end
> from foo;
> select case when udaf(col0)=1024 then null udaf(col0)
> from foo{code}
> As we can see that non-deterministic UDF & UDAF are called multiple times 
> after written. Thus the condition in WHEN clause might NOT be held all the 
> time.
> We need to provide an interface for user to specify the determinism in 
> UDF/UDAF and consider whether a SqlNode is deterministic when rewriting.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to