subject:"Re\: \[DISCUSS\] \[Calcite\-2683\] ProjectMergeRule should not be performed when Nondeterministic udf has been referenced more than once"

Re: [DISCUSS] [Calcite-2683] ProjectMergeRule should not be performed when Nondeterministic udf has been referenced more than once

2018-11-19 Thread jincheng sun

Hi Hequn,
Thanks for report this issue. I think the premise of rule optimization is
to guarantee the correctness of the semantics. When the rule is matched,
the isDeterministics attribute of UDF must be considered.

+1 to be fixed.

Thanks,
Jincheng

Hequn Cheng  于2018年11月19日周一 下午11:52写道：

> Hi,
>
> Currently, there are some merge rules for Project, such as CalcMergeRule,
> ProjectMergeRule, and ProjectCalcMergeRule. I found that these merge rules
> should not be performed when Nondeterministic expression of the
> bottom(inner) project has been referenced more than once by the top(outer)
> project. Take the following test as an example:
>
>   @Test public void testProjectMergeCalcMergeWithNonDeterministic() throws
> Exception {
> HepProgram program = new HepProgramBuilder()
> .addRuleInstance(FilterProjectTransposeRule.INSTANCE)
> .addRuleInstance(ProjectMergeRule.INSTANCE)
> .build();
>
> checkPlanning(program,
> "select name, a as a1, a as a2 from (\n"
> + "  select *, rand() as a\n"
> + "  from dept)\n"
> + "where deptno = 10\n");
>   }
>
> The first select generates `a` from `rand()` and the second select generate
> `a1` and `a2` from `a`. From the SQL, `a1` should equal to `a2`.
> Let's take a look at the result plan:
>
> LogicalProject(NAME=[$1], A1=[RAND()], A2=[RAND()])
>   LogicalFilter(condition=[=($0, 10)])
> LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
>
> In the plan, a1 may not equal to a2 due to the projects merge which is
> against the SQL(a1 equals to a2).
> In order to let a1 equal to a2, one option to solve the problem is to
> disable these merge rules in such cases, so that the result plan will be:
>
> LogicalProject(NAME=[$1], A1=[$2], A2=[$2])
>   LogicalProject(DEPTNO=[$0], NAME=[$1], A=[RAND()])
> LogicalFilter(condition=[=($0, 10)])
>   LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
>
> Do you guys have any good ideas or encountered similar problems? Any
> suggestions are greatly appreciated.
>
> Best,
> Hequn
>
> [1] jira link: https://issues.apache.org/jira/browse/CALCITE-2683
>

Re: [DISCUSS] [Calcite-2683] ProjectMergeRule should not be performed when Nondeterministic udf has been referenced more than once

2018-11-19 Thread Michael Mior

While I think it's probably true, I'm hesitant to believe that just because
the rewritten version has the same number of calls, that it will always
produce the same results. My preference would probably be to go with the
"strict" semantic as per Julian's specification since that's least likely
to cause problems. If we allow other semantics, I would suggest that strict
should be the default unless we can clearly prove that other semantics will
always produce correct results.

--
Michael Mior
mm...@apache.org


Le lun. 19 nov. 2018 à 13:45, Julian Hyde  a écrit :

> Repeating the comments I made in the JIRA case [1].
>
> I do find your argument compelling, that if the rewritten version contains
> the same number of calls to the UDF, it should be OK.
>
> But there are other possible semantics. For instance, a “strict” semantic
> could allow rewrite only if the calls to the UDF are guaranteed to be the
> same number, and the same order. A “relaxed” semantic would allow
> non-deterministic functions (and dynamic functions, see [2]) to be
> rewritten any time.
>
> Perhaps there could be variants of this rule, one for each semantic, and
> the semantics could be chosen via a connection- or statement-level
> property. To enforce a particular semantic, several rules will need to
> modify their behavior (e.g. FilterProjectTransposeRule), so those rules
> would be parameterized on semantic also.
>
> Julian
>
> [2] https://issues.apache.org/jira/browse/CALCITE-2638 <
> https://issues.apache.org/jira/browse/CALCITE-2638>
>
> > On Nov 19, 2018, at 7:51 AM, Hequn Cheng  wrote:
> >
> > Hi,
> >
> > Currently, there are some merge rules for Project, such as CalcMergeRule,
> > ProjectMergeRule, and ProjectCalcMergeRule. I found that these merge
> rules
> > should not be performed when Nondeterministic expression of the
> > bottom(inner) project has been referenced more than once by the
> top(outer)
> > project. Take the following test as an example:
> >
> >  @Test public void testProjectMergeCalcMergeWithNonDeterministic() throws
> > Exception {
> >HepProgram program = new HepProgramBuilder()
> >.addRuleInstance(FilterProjectTransposeRule.INSTANCE)
> >.addRuleInstance(ProjectMergeRule.INSTANCE)
> >.build();
> >
> >checkPlanning(program,
> >"select name, a as a1, a as a2 from (\n"
> >+ "  select *, rand() as a\n"
> >+ "  from dept)\n"
> >+ "where deptno = 10\n");
> >  }
> >
> > The first select generates `a` from `rand()` and the second select
> generate
> > `a1` and `a2` from `a`. From the SQL, `a1` should equal to `a2`.
> > Let's take a look at the result plan:
> >
> > LogicalProject(NAME=[$1], A1=[RAND()], A2=[RAND()])
> >  LogicalFilter(condition=[=($0, 10)])
> >LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
> >
> > In the plan, a1 may not equal to a2 due to the projects merge which is
> > against the SQL(a1 equals to a2).
> > In order to let a1 equal to a2, one option to solve the problem is to
> > disable these merge rules in such cases, so that the result plan will be:
> >
> > LogicalProject(NAME=[$1], A1=[$2], A2=[$2])
> >  LogicalProject(DEPTNO=[$0], NAME=[$1], A=[RAND()])
> >LogicalFilter(condition=[=($0, 10)])
> >  LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
> >
> > Do you guys have any good ideas or encountered similar problems? Any
> > suggestions are greatly appreciated.
> >
> > Best,
> > Hequn
> >
> > [1] jira link: https://issues.apache.org/jira/browse/CALCITE-2683
>
>

Re: [DISCUSS] [Calcite-2683] ProjectMergeRule should not be performed when Nondeterministic udf has been referenced more than once

2018-11-19 Thread Julian Hyde

Repeating the comments I made in the JIRA case [1].

I do find your argument compelling, that if the rewritten version contains the 
same number of calls to the UDF, it should be OK.

But there are other possible semantics. For instance, a “strict” semantic could 
allow rewrite only if the calls to the UDF are guaranteed to be the same 
number, and the same order. A “relaxed” semantic would allow non-deterministic 
functions (and dynamic functions, see [2]) to be rewritten any time.

Perhaps there could be variants of this rule, one for each semantic, and the 
semantics could be chosen via a connection- or statement-level property. To 
enforce a particular semantic, several rules will need to modify their behavior 
(e.g. FilterProjectTransposeRule), so those rules would be parameterized on 
semantic also.

Julian

[2] https://issues.apache.org/jira/browse/CALCITE-2638 

> On Nov 19, 2018, at 7:51 AM, Hequn Cheng  wrote:
> 
> Hi,
> 
> Currently, there are some merge rules for Project, such as CalcMergeRule,
> ProjectMergeRule, and ProjectCalcMergeRule. I found that these merge rules
> should not be performed when Nondeterministic expression of the
> bottom(inner) project has been referenced more than once by the top(outer)
> project. Take the following test as an example:
> 
>  @Test public void testProjectMergeCalcMergeWithNonDeterministic() throws
> Exception {
>HepProgram program = new HepProgramBuilder()
>.addRuleInstance(FilterProjectTransposeRule.INSTANCE)
>.addRuleInstance(ProjectMergeRule.INSTANCE)
>.build();
> 
>checkPlanning(program,
>"select name, a as a1, a as a2 from (\n"
>+ "  select *, rand() as a\n"
>+ "  from dept)\n"
>+ "where deptno = 10\n");
>  }
> 
> The first select generates `a` from `rand()` and the second select generate
> `a1` and `a2` from `a`. From the SQL, `a1` should equal to `a2`.
> Let's take a look at the result plan:
> 
> LogicalProject(NAME=[$1], A1=[RAND()], A2=[RAND()])
>  LogicalFilter(condition=[=($0, 10)])
>LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
> 
> In the plan, a1 may not equal to a2 due to the projects merge which is
> against the SQL(a1 equals to a2).
> In order to let a1 equal to a2, one option to solve the problem is to
> disable these merge rules in such cases, so that the result plan will be:
> 
> LogicalProject(NAME=[$1], A1=[$2], A2=[$2])
>  LogicalProject(DEPTNO=[$0], NAME=[$1], A=[RAND()])
>LogicalFilter(condition=[=($0, 10)])
>  LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
> 
> Do you guys have any good ideas or encountered similar problems? Any
> suggestions are greatly appreciated.
> 
> Best,
> Hequn
> 
> [1] jira link: https://issues.apache.org/jira/browse/CALCITE-2683

Re: [DISCUSS] [Calcite-2683] ProjectMergeRule should not be performed when Nondeterministic udf has been referenced more than once

Re: [DISCUSS] [Calcite-2683] ProjectMergeRule should not be performed when Nondeterministic udf has been referenced more than once

Re: [DISCUSS] [Calcite-2683] ProjectMergeRule should not be performed when Nondeterministic udf has been referenced more than once

3 matches

Site Navigation

Mail list logo

Footer information