Re: [SUPPORT] Semantics change when applying ConverterRule in CheapestPlanReplacer

2023-01-27 Thread Julian Hyde
Done. Let's continue the conversation at CALCITE-5503.

On Fri, Jan 27, 2023 at 10:08 AM Moritz Mack  wrote:
>
> Julian, I’ve created CALCITE-5503 as suggested.
>
> Would you be able to add me as a contributor in Jira and assign the ticket to 
> me.
> I opened a PR with the test case you’ve asked for and a rather trivial fix.
> https://github.com/apache/calcite/pull/3050
>
> Thanks,
> Moritz
>
> On 27.01.23, 10:00, "Moritz Mack"  wrote:
>
> Thanks a lot, Julian.
> I’ll look into that the next few days.
>
> /Moritz
>
> On 25.01.23, 17:04, "Julian Hyde"  wrote:
>
> It would seem a reasonable and useful enhancement if CheapestPlanReplacer did 
> some memoization. If it sees the same RelNode twice it should emit the same 
> result. I would be conservative, and memoize based on identity of the 
> RelNodes, not just equality.
>
> It would be useful if you could come up with a test case in pure Calcite, 
> then log a jira case.
>
> Julian
>
> > On Jan 25, 2023, at 12:56 AM, Moritz Mack  wrote:
> >
> > Dear Calcite Devs,
> >
> > I’m currently looking into an issue in the SQL extension [1] of Apache Beam 
> > and was hoping to find some advice here.
> > Using a bunch of Calcite ConverterRules [2], we convert a RelNode tree into 
> > a tree of BeamRelNodes which is then used to build a Beam DAG. Pretty 
> > standard I suppose …
> >
> > What I’m scratching my head about is that applying the converter rules 
> > changes the semantics of the graph, which it shouldn’t I thought. Or is 
> > that a wrong expectation?
> > Here’s a very simple SQL example to illustrate this (see also 
> > BeamUnionRelTest [3]):
> >
> > SELECT  order_id, site_id, price FROM ORDER_DETAILS
> > UNION ALL
> > SELECT  order_id, site_id, price FROM ORDER_DETAILS
> >
> > This is where we start at, the corresponding RelNode:
> >
> > LogicalUnion.NONE(input#0=LogicalProject#8,input#1=LogicalProject#10,all=true)
> >
> > Once the corresponding conversion rule [4] is applied to above by the 
> > CheapestPlanReplacer, but before visiting its old inputs, we get the 
> > following:
> >
> > BeamUnionRel.BEAM_LOGICAL(input#0=RelSubset#25,input#1=RelSubset#25,all=true)
> >
> > At this point both inputs refer to the same node (#25). However, once 
> > visiting the inputs [5] in CheapestPlanReplacer, that semantic information 
> > is lost as RelNodes get copied if inputs change [5].
> > In below result, the two inputs refer to different nodes:
> >
> > BeamUnionRel.BEAM_LOGICAL(input#0=BeamCalcRel#49,input#1=BeamCalcRel#50,all=true)
> >
> > This, however, currently prevents caching of intermediate results in the 
> > Beam DAG when this might be beneficial.
> >
> > Would you have any advice how to better approach this?
> > Of course, I could use a stateful copy operation to handle such repeated 
> > copy operations with the same parameters. But this seems wrong to be honest.
> >
> > Thanks a million!
> > Kind regards,
> > Moritz
> >
> >
> > [1] 
> > https://urldefense.com/v3/__https://github.com/apache/beam/issues/24314__;!!CiXD_PY!UU9RIn-JXhMMhXinYi6Iq3oHsLnmlLXkqWMc4A6J85FY83_atUeh4tseWCHR_L5B7bcBk_fj9UbrH1HemJw$
> > [2] 
> > https://urldefense.com/v3/__https://github.com/apache/beam/blob/6ba647333c4c69fb6dfc65929456c7c11570382f/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/planner/BeamRuleSets.java*L136-L152__;Iw!!CiXD_PY!UU9RIn-JXhMMhXinYi6Iq3oHsLnmlLXkqWMc4A6J85FY83_atUeh4tseWCHR_L5B7bcBk_fj9Ubryge2rYo$
> > [3] 
> > https://urldefense.com/v3/__https://github.com/apache/beam/blob/6ba647333c4c69fb6dfc65929456c7c11570382f/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamUnionRelTest.java*L52-L71__;Iw!!CiXD_PY!UU9RIn-JXhMMhXinYi6Iq3oHsLnmlLXkqWMc4A6J85FY83_atUeh4tseWCHR_L5B7bcBk_fj9UbrYLpRpco$
> > [4] 
> > 

Re: [SUPPORT] Semantics change when applying ConverterRule in CheapestPlanReplacer

2023-01-27 Thread Moritz Mack
Julian, I’ve created CALCITE-5503 as suggested.

Would you be able to add me as a contributor in Jira and assign the ticket to 
me.
I opened a PR with the test case you’ve asked for and a rather trivial fix.
https://github.com/apache/calcite/pull/3050

Thanks,
Moritz

On 27.01.23, 10:00, "Moritz Mack"  wrote:

Thanks a lot, Julian.
I’ll look into that the next few days.

/Moritz

On 25.01.23, 17:04, "Julian Hyde"  wrote:

It would seem a reasonable and useful enhancement if CheapestPlanReplacer did 
some memoization. If it sees the same RelNode twice it should emit the same 
result. I would be conservative, and memoize based on identity of the RelNodes, 
not just equality.

It would be useful if you could come up with a test case in pure Calcite, then 
log a jira case.

Julian

> On Jan 25, 2023, at 12:56 AM, Moritz Mack  wrote:
>
> Dear Calcite Devs,
>
> I’m currently looking into an issue in the SQL extension [1] of Apache Beam 
> and was hoping to find some advice here.
> Using a bunch of Calcite ConverterRules [2], we convert a RelNode tree into a 
> tree of BeamRelNodes which is then used to build a Beam DAG. Pretty standard 
> I suppose …
>
> What I’m scratching my head about is that applying the converter rules 
> changes the semantics of the graph, which it shouldn’t I thought. Or is that 
> a wrong expectation?
> Here’s a very simple SQL example to illustrate this (see also 
> BeamUnionRelTest [3]):
>
> SELECT  order_id, site_id, price FROM ORDER_DETAILS
> UNION ALL
> SELECT  order_id, site_id, price FROM ORDER_DETAILS
>
> This is where we start at, the corresponding RelNode:
>
> LogicalUnion.NONE(input#0=LogicalProject#8,input#1=LogicalProject#10,all=true)
>
> Once the corresponding conversion rule [4] is applied to above by the 
> CheapestPlanReplacer, but before visiting its old inputs, we get the 
> following:
>
> BeamUnionRel.BEAM_LOGICAL(input#0=RelSubset#25,input#1=RelSubset#25,all=true)
>
> At this point both inputs refer to the same node (#25). However, once 
> visiting the inputs [5] in CheapestPlanReplacer, that semantic information is 
> lost as RelNodes get copied if inputs change [5].
> In below result, the two inputs refer to different nodes:
>
> BeamUnionRel.BEAM_LOGICAL(input#0=BeamCalcRel#49,input#1=BeamCalcRel#50,all=true)
>
> This, however, currently prevents caching of intermediate results in the Beam 
> DAG when this might be beneficial.
>
> Would you have any advice how to better approach this?
> Of course, I could use a stateful copy operation to handle such repeated copy 
> operations with the same parameters. But this seems wrong to be honest.
>
> Thanks a million!
> Kind regards,
> Moritz
>
>
> [1] 
> https://urldefense.com/v3/__https://github.com/apache/beam/issues/24314__;!!CiXD_PY!UU9RIn-JXhMMhXinYi6Iq3oHsLnmlLXkqWMc4A6J85FY83_atUeh4tseWCHR_L5B7bcBk_fj9UbrH1HemJw$
> [2] 
> https://urldefense.com/v3/__https://github.com/apache/beam/blob/6ba647333c4c69fb6dfc65929456c7c11570382f/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/planner/BeamRuleSets.java*L136-L152__;Iw!!CiXD_PY!UU9RIn-JXhMMhXinYi6Iq3oHsLnmlLXkqWMc4A6J85FY83_atUeh4tseWCHR_L5B7bcBk_fj9Ubryge2rYo$
> [3] 
> https://urldefense.com/v3/__https://github.com/apache/beam/blob/6ba647333c4c69fb6dfc65929456c7c11570382f/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamUnionRelTest.java*L52-L71__;Iw!!CiXD_PY!UU9RIn-JXhMMhXinYi6Iq3oHsLnmlLXkqWMc4A6J85FY83_atUeh4tseWCHR_L5B7bcBk_fj9UbrYLpRpco$
> [4] 
> https://urldefense.com/v3/__https://github.com/apache/beam/blob/a96afe2c57c45a869a622086eaa4f81305f06e72/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamUnionRule.java__;!!CiXD_PY!UU9RIn-JXhMMhXinYi6Iq3oHsLnmlLXkqWMc4A6J85FY83_atUeh4tseWCHR_L5B7bcBk_fj9UbrQdhrgjw$
> [5] 
> 

Re: [SUPPORT] Semantics change when applying ConverterRule in CheapestPlanReplacer

2023-01-27 Thread Moritz Mack
Thanks a lot, Julian.
I’ll look into that the next few days.

/Moritz

On 25.01.23, 17:04, "Julian Hyde"  wrote:

It would seem a reasonable and useful enhancement if CheapestPlanReplacer did 
some memoization. If it sees the same RelNode twice it should emit the same 
result. I would be conservative, and memoize based on identity of the RelNodes, 
not just equality.

It would be useful if you could come up with a test case in pure Calcite, then 
log a jira case.

Julian

> On Jan 25, 2023, at 12:56 AM, Moritz Mack  wrote:
>
> Dear Calcite Devs,
>
> I’m currently looking into an issue in the SQL extension [1] of Apache Beam 
> and was hoping to find some advice here.
> Using a bunch of Calcite ConverterRules [2], we convert a RelNode tree into a 
> tree of BeamRelNodes which is then used to build a Beam DAG. Pretty standard 
> I suppose …
>
> What I’m scratching my head about is that applying the converter rules 
> changes the semantics of the graph, which it shouldn’t I thought. Or is that 
> a wrong expectation?
> Here’s a very simple SQL example to illustrate this (see also 
> BeamUnionRelTest [3]):
>
> SELECT  order_id, site_id, price FROM ORDER_DETAILS
> UNION ALL
> SELECT  order_id, site_id, price FROM ORDER_DETAILS
>
> This is where we start at, the corresponding RelNode:
>
> LogicalUnion.NONE(input#0=LogicalProject#8,input#1=LogicalProject#10,all=true)
>
> Once the corresponding conversion rule [4] is applied to above by the 
> CheapestPlanReplacer, but before visiting its old inputs, we get the 
> following:
>
> BeamUnionRel.BEAM_LOGICAL(input#0=RelSubset#25,input#1=RelSubset#25,all=true)
>
> At this point both inputs refer to the same node (#25). However, once 
> visiting the inputs [5] in CheapestPlanReplacer, that semantic information is 
> lost as RelNodes get copied if inputs change [5].
> In below result, the two inputs refer to different nodes:
>
> BeamUnionRel.BEAM_LOGICAL(input#0=BeamCalcRel#49,input#1=BeamCalcRel#50,all=true)
>
> This, however, currently prevents caching of intermediate results in the Beam 
> DAG when this might be beneficial.
>
> Would you have any advice how to better approach this?
> Of course, I could use a stateful copy operation to handle such repeated copy 
> operations with the same parameters. But this seems wrong to be honest.
>
> Thanks a million!
> Kind regards,
> Moritz
>
>
> [1] 
> https://urldefense.com/v3/__https://github.com/apache/beam/issues/24314__;!!CiXD_PY!UU9RIn-JXhMMhXinYi6Iq3oHsLnmlLXkqWMc4A6J85FY83_atUeh4tseWCHR_L5B7bcBk_fj9UbrH1HemJw$
> [2] 
> https://urldefense.com/v3/__https://github.com/apache/beam/blob/6ba647333c4c69fb6dfc65929456c7c11570382f/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/planner/BeamRuleSets.java*L136-L152__;Iw!!CiXD_PY!UU9RIn-JXhMMhXinYi6Iq3oHsLnmlLXkqWMc4A6J85FY83_atUeh4tseWCHR_L5B7bcBk_fj9Ubryge2rYo$
> [3] 
> https://urldefense.com/v3/__https://github.com/apache/beam/blob/6ba647333c4c69fb6dfc65929456c7c11570382f/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamUnionRelTest.java*L52-L71__;Iw!!CiXD_PY!UU9RIn-JXhMMhXinYi6Iq3oHsLnmlLXkqWMc4A6J85FY83_atUeh4tseWCHR_L5B7bcBk_fj9UbrYLpRpco$
> [4] 
> https://urldefense.com/v3/__https://github.com/apache/beam/blob/a96afe2c57c45a869a622086eaa4f81305f06e72/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamUnionRule.java__;!!CiXD_PY!UU9RIn-JXhMMhXinYi6Iq3oHsLnmlLXkqWMc4A6J85FY83_atUeh4tseWCHR_L5B7bcBk_fj9UbrQdhrgjw$
> [5] 
> 

Re: [SUPPORT] Semantics change when applying ConverterRule in CheapestPlanReplacer

2023-01-25 Thread Julian Hyde
It would seem a reasonable and useful enhancement if CheapestPlanReplacer did 
some memoization. If it sees the same RelNode twice it should emit the same 
result. I would be conservative, and memoize based on identity of the RelNodes, 
not just equality. 

It would be useful if you could come up with a test case in pure Calcite, then 
log a jira case. 

Julian

> On Jan 25, 2023, at 12:56 AM, Moritz Mack  wrote:
> 
> Dear Calcite Devs,
> 
> I’m currently looking into an issue in the SQL extension [1] of Apache Beam 
> and was hoping to find some advice here.
> Using a bunch of Calcite ConverterRules [2], we convert a RelNode tree into a 
> tree of BeamRelNodes which is then used to build a Beam DAG. Pretty standard 
> I suppose …
> 
> What I’m scratching my head about is that applying the converter rules 
> changes the semantics of the graph, which it shouldn’t I thought. Or is that 
> a wrong expectation?
> Here’s a very simple SQL example to illustrate this (see also 
> BeamUnionRelTest [3]):
> 
> SELECT  order_id, site_id, price FROM ORDER_DETAILS
> UNION ALL
> SELECT  order_id, site_id, price FROM ORDER_DETAILS
> 
> This is where we start at, the corresponding RelNode:
> 
> LogicalUnion.NONE(input#0=LogicalProject#8,input#1=LogicalProject#10,all=true)
> 
> Once the corresponding conversion rule [4] is applied to above by the 
> CheapestPlanReplacer, but before visiting its old inputs, we get the 
> following:
> 
> BeamUnionRel.BEAM_LOGICAL(input#0=RelSubset#25,input#1=RelSubset#25,all=true)
> 
> At this point both inputs refer to the same node (#25). However, once 
> visiting the inputs [5] in CheapestPlanReplacer, that semantic information is 
> lost as RelNodes get copied if inputs change [5].
> In below result, the two inputs refer to different nodes:
> 
> BeamUnionRel.BEAM_LOGICAL(input#0=BeamCalcRel#49,input#1=BeamCalcRel#50,all=true)
> 
> This, however, currently prevents caching of intermediate results in the Beam 
> DAG when this might be beneficial.
> 
> Would you have any advice how to better approach this?
> Of course, I could use a stateful copy operation to handle such repeated copy 
> operations with the same parameters. But this seems wrong to be honest.
> 
> Thanks a million!
> Kind regards,
> Moritz
> 
> 
> [1] https://github.com/apache/beam/issues/24314
> [2] 
> https://github.com/apache/beam/blob/6ba647333c4c69fb6dfc65929456c7c11570382f/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/planner/BeamRuleSets.java#L136-L152
> [3] 
> https://github.com/apache/beam/blob/6ba647333c4c69fb6dfc65929456c7c11570382f/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamUnionRelTest.java#L52-L71
> [4] 
> https://github.com/apache/beam/blob/a96afe2c57c45a869a622086eaa4f81305f06e72/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamUnionRule.java
> [5] 
> https://github.com/apache/calcite/blob/a326bd2d0e0b4b6b3336f10217b0ecbb79522239/core/src/main/java/org/apache/calcite/plan/volcano/RelSubset.java#L727-L739
> 
> As a recipient of an email from Talend, your contact personal data will be on 
> our systems. Please see our privacy notice. 
> 
> 


[SUPPORT] Semantics change when applying ConverterRule in CheapestPlanReplacer

2023-01-25 Thread Moritz Mack
Dear Calcite Devs,

I’m currently looking into an issue in the SQL extension [1] of Apache Beam and 
was hoping to find some advice here.
Using a bunch of Calcite ConverterRules [2], we convert a RelNode tree into a 
tree of BeamRelNodes which is then used to build a Beam DAG. Pretty standard I 
suppose …

What I’m scratching my head about is that applying the converter rules changes 
the semantics of the graph, which it shouldn’t I thought. Or is that a wrong 
expectation?
Here’s a very simple SQL example to illustrate this (see also BeamUnionRelTest 
[3]):

SELECT  order_id, site_id, price FROM ORDER_DETAILS
UNION ALL
SELECT  order_id, site_id, price FROM ORDER_DETAILS

This is where we start at, the corresponding RelNode:

LogicalUnion.NONE(input#0=LogicalProject#8,input#1=LogicalProject#10,all=true)

Once the corresponding conversion rule [4] is applied to above by the 
CheapestPlanReplacer, but before visiting its old inputs, we get the following:

BeamUnionRel.BEAM_LOGICAL(input#0=RelSubset#25,input#1=RelSubset#25,all=true)

At this point both inputs refer to the same node (#25). However, once visiting 
the inputs [5] in CheapestPlanReplacer, that semantic information is lost as 
RelNodes get copied if inputs change [5].
In below result, the two inputs refer to different nodes:

BeamUnionRel.BEAM_LOGICAL(input#0=BeamCalcRel#49,input#1=BeamCalcRel#50,all=true)

This, however, currently prevents caching of intermediate results in the Beam 
DAG when this might be beneficial.

Would you have any advice how to better approach this?
Of course, I could use a stateful copy operation to handle such repeated copy 
operations with the same parameters. But this seems wrong to be honest.

Thanks a million!
Kind regards,
Moritz


[1] https://github.com/apache/beam/issues/24314
[2] 
https://github.com/apache/beam/blob/6ba647333c4c69fb6dfc65929456c7c11570382f/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/planner/BeamRuleSets.java#L136-L152
[3] 
https://github.com/apache/beam/blob/6ba647333c4c69fb6dfc65929456c7c11570382f/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamUnionRelTest.java#L52-L71
[4] 
https://github.com/apache/beam/blob/a96afe2c57c45a869a622086eaa4f81305f06e72/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamUnionRule.java
[5] 
https://github.com/apache/calcite/blob/a326bd2d0e0b4b6b3336f10217b0ecbb79522239/core/src/main/java/org/apache/calcite/plan/volcano/RelSubset.java#L727-L739

As a recipient of an email from Talend, your contact personal data will be on 
our systems. Please see our privacy notice.