[jira] [Commented] (CALCITE-3923) Refactor how planner rules are parameterized

2020-04-30 Thread Haisheng Yuan (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-3923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17097081#comment-17097081
 ] 

Haisheng Yuan commented on CALCITE-3923:


The topProject convention should always be the same as the bottomProject 
convention, so the check is not needed.

> Refactor how planner rules are parameterized
> 
>
> Key: CALCITE-3923
> URL: https://issues.apache.org/jira/browse/CALCITE-3923
> Project: Calcite
>  Issue Type: Bug
>Reporter: Julian Hyde
>Assignee: Julian Hyde
>Priority: Major
>
> People often want different variants of planner rules. An example is 
> {{FilterJoinRule}}, which has a 'boolean smart' parameter, a predicate (which 
> returns whether to pull up filter conditions), operands (which determine the 
> precise sub-classes of {{RelNode}} that the rule should match) and a 
> {{RelBuilderFactory}} (which controls the type of {{RelNode}} created by this 
> rule).
> Suppose you have an instance of {{FilterJoinRule}} and you want to change 
> {{smart}} from true to false. The {{smart}} parameter is immutable (good!) 
> but you can’t easily create a clone of the rule because you don’t know the 
> values of the other parameters. Your instance might even be (unbeknownst to 
> you) a sub-class with extra parameters and a private constructor.
> So, my proposal is to put all of the config information of a {{RelOptRule}} 
> into a single {{config}} parameter that contains all relevant properties. 
> Each sub-class of {{RelOptRule}} would have one constructor with just a 
> ‘config’ parameter. Each config knows which sub-class of {{RelOptRule}} to 
> create. Therefore it is easy to copy a config, change one or more properties, 
> and create a new rule instance.
> Adding a property to a rule’s config does not require us to add or deprecate 
> any constructors.
> The operands are part of the config, so if you have a rule that matches a 
> {{EnumerableFilter}} on an {{EnumerableJoin}} and you want to make it match 
> an {{EnumerableFilter}} on an {{EnumerableNestedLoopJoin}}, you can easily 
> create one with one changed operand.
> The config is immutable and self-describing, so we can use it to 
> automatically generate a unique description for each rule instance.
> (See the email thread [[DISCUSS] Refactor how planner rules are 
> parameterized|https://lists.apache.org/thread.html/rfdf6f9b7821988bdd92b0377e3d293443a6376f4773c4c658c891cf9%40%3Cdev.calcite.apache.org%3E].)
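
The proposal in the description above can be illustrated with a minimal sketch. All names here ({{FilterJoinConfig}}, {{withSmart}}, {{withOperands}}, {{toRule}}, {{FilterJoinRuleSketch}}) are hypothetical and are not Calcite API; operands are modeled as plain strings. The point is only that an immutable config can be copied with one property changed and knows which rule to create.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical immutable config; all names are illustrative, not Calcite API. */
final class FilterJoinConfig {
  final boolean smart;
  final List<String> operands; // stand-in for the operand classes the rule matches

  FilterJoinConfig(boolean smart, List<String> operands) {
    this.smart = smart;
    this.operands = Collections.unmodifiableList(new ArrayList<>(operands));
  }

  /** Returns a copy with only 'smart' changed; this instance is untouched. */
  FilterJoinConfig withSmart(boolean smart) {
    return new FilterJoinConfig(smart, operands);
  }

  /** Returns a copy with only the operands changed. */
  FilterJoinConfig withOperands(String... operands) {
    return new FilterJoinConfig(smart, Arrays.asList(operands));
  }

  /** The config knows which rule class to create. */
  FilterJoinRuleSketch toRule() {
    return new FilterJoinRuleSketch(this);
  }

  /** Self-describing, so it could serve as a generated rule description. */
  @Override public String toString() {
    return "FilterJoinRule(smart=" + smart + ", operands=" + operands + ")";
  }
}

/** Hypothetical rule whose only constructor takes the config. */
final class FilterJoinRuleSketch {
  final FilterJoinConfig config;

  FilterJoinRuleSketch(FilterJoinConfig config) {
    this.config = config;
  }
}
```

With this shape, cloning a rule with {{smart}} flipped is {{rule.config.withSmart(false).toRule()}}, and no knowledge of the other parameters is required.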



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-3965) Excessive time waiting on DiffRepository lock

2020-04-30 Thread Laurent Goujon (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-3965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17097079#comment-17097079
 ] 

Laurent Goujon commented on CALCITE-3965:
-

I agree about the unnecessary XML writing, but adding a synchronized keyword 
when it is not necessary (which is my analysis, though I may be wrong here, so I 
would appreciate a second pair of eyes) is also a cause of lock contention.

> Excessive time waiting on DiffRepository lock
> -
>
> Key: CALCITE-3965
> URL: https://issues.apache.org/jira/browse/CALCITE-3965
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Reporter: Laurent Goujon
>Assignee: Laurent Goujon
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When running the whole test suite from the command line, tests are parallelized 
> and gradle/junit tries to use as many cores as possible (16 on my machine). 
> But the tests take a very long time, approximately 90 minutes on my machine, 
> and several of them failed because they took too long to complete.
> Using jstack to look at the thread states while the tests are running shows 
> that most of them are waiting on {{DiffRepository}} methods 
> ({{DiffRepository#expand}} in most cases) while one of the threads holds 
> the lock (and is usually flushing data to disk).





[jira] [Commented] (CALCITE-3965) Excessive time waiting on DiffRepository lock

2020-04-30 Thread Julian Hyde (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-3965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17097077#comment-17097077
 ] 

Julian Hyde commented on CALCITE-3965:
--

I agree, this is very likely a duplicate of CALCITE-3517. Please fix 
CALCITE-3517; CPU use will go way down, and lock contention will be reduced. 
Lock contention is just a symptom, not the cause.



[jira] [Commented] (CALCITE-3923) Refactor how planner rules are parameterized

2020-04-30 Thread Xiening Dai (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-3923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17097074#comment-17097074
 ] 

Xiening Dai commented on CALCITE-3923:
--

I am thinking about this. Once we have the RelBuilder change in place, we could 
do something like this in ProjectMergeRule, for example:

{code:java}
RelBuilder relBuilder = call.builder();
if (topProject.getConvention() == bottomProject.getConvention()) {
  relBuilder = topProject.getConvention().transformRelBuilder(relBuilder);
}
{code}

Based on my test, a simple change like this reduces project merge rule firings 
by 50% for an N-way join query. But I don't like the additional check here; 
maybe we should provide "target convention" as a config parameter? 

I understand this might not necessarily be part of your change. But we might 
need to add that to the config at some point in the future. 



[jira] [Commented] (CALCITE-3965) Excessive time waiting on DiffRepository lock

2020-04-30 Thread Haisheng Yuan (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-3965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17097072#comment-17097072
 ] 

Haisheng Yuan commented on CALCITE-3965:


Is this a duplicate of CALCITE-3517?



[jira] [Commented] (CALCITE-3965) Excessive time waiting on DiffRepository lock

2020-04-30 Thread Laurent Goujon (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-3965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17097071#comment-17097071
 ] 

Laurent Goujon commented on CALCITE-3965:
-

This is somewhat related to CALCITE-3517, but after a look at the code, 
{{DiffRepository#expand}} does not write to disk, and the main issue is really 
contention around the instance lock. The method is synchronized, but most 
operations look thread-safe. There are some calls to get and set (which are 
also synchronized), but it doesn't look like they need to be done "atomically".

Removing {{synchronized}} on the expand() method results in the build 
completing in 2min30s with no test failures.
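
To make the locking argument concrete, here is a minimal sketch of the pattern described above. {{RepoSketch}} is a hypothetical stand-in and is not the real {{DiffRepository}}; the {{${name}}} reference syntax and the method bodies are assumptions made for illustration. Only {{get}} and {{set}} take the instance lock, so a thread calling {{expand}} holds it for a single map operation rather than for the whole call.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a repository of expected test outputs. */
final class RepoSketch {
  private final Map<String, String> values = new HashMap<>();

  /** Lock is held only for the lookup. */
  synchronized String get(String name) {
    return values.get(name);
  }

  /** Lock is held only for the update. */
  synchronized void set(String name, String value) {
    values.put(name, value);
  }

  /**
   * Deliberately NOT synchronized. The get and set calls below are each
   * atomic on their own, but the method as a whole is not; that is the
   * trade-off discussed in the comment above.
   */
  String expand(String tag, String text) {
    if (text.startsWith("${") && text.endsWith("}")) {
      String name = text.substring(2, text.length() - 1);
      String stored = get(name);
      return stored != null ? stored : text; // unknown reference: leave as-is
    }
    set(tag, text); // record the literal value under the given tag
    return text;
  }
}
```

Whether dropping the method-level lock is safe in the real class depends on whether any caller relies on the lookup and the update happening together, which is exactly the question raised above.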



[jira] [Created] (CALCITE-3965) Excessive time waiting on DiffRepository lock

2020-04-30 Thread Laurent Goujon (Jira)
Laurent Goujon created CALCITE-3965:
---

 Summary: Excessive time waiting on DiffRepository lock
 Key: CALCITE-3965
 URL: https://issues.apache.org/jira/browse/CALCITE-3965
 Project: Calcite
  Issue Type: Bug
  Components: core
Reporter: Laurent Goujon
Assignee: Laurent Goujon


When running the whole test suite from the command line, tests are parallelized 
and gradle/junit tries to use as many cores as possible (16 on my machine). But 
the tests take a very long time, approximately 90 minutes on my machine, and 
several of them failed because they took too long to complete.

Using jstack to look at the thread states while the tests are running shows 
that most of them are waiting on {{DiffRepository}} methods 
({{DiffRepository#expand}} in most cases) while one of the threads holds the 
lock (and is usually flushing data to disk).





[jira] [Commented] (CALCITE-2970) Performance issue when enabling abstract converter for EnumerableConvention

2020-04-30 Thread Julian Hyde (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-2970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17097033#comment-17097033
 ] 

Julian Hyde commented on CALCITE-2970:
--

I take some of that back. I see that {{RelBuilder}} has a method {{RelBuilder 
transform(UnaryOperator<Config> transform)}} that creates a copy of a 
{{RelBuilder}} with the same state (including context, cluster) but a different 
config. So I think that {{UnaryOperator<RelBuilder> transform}} would be viable.
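
The "same state, different config" idea can be sketched as follows. {{BuilderSketch}} and its {{Config}} are hypothetical and only loosely modeled on the description above; they are not Calcite's {{RelBuilder}}, and the {{cluster}} field is a plain string standing in for builder state that does not live in the config.

```java
import java.util.function.UnaryOperator;

/** Hypothetical builder; not Calcite's RelBuilder. */
final class BuilderSketch {
  /** Hypothetical immutable config with a single property. */
  static final class Config {
    final boolean simplify;

    Config(boolean simplify) {
      this.simplify = simplify;
    }

    Config withSimplify(boolean simplify) {
      return new Config(simplify);
    }
  }

  final String cluster; // state held by the builder, not by its config
  final Config config;

  BuilderSketch(String cluster, Config config) {
    this.cluster = cluster;
    this.config = config;
  }

  /** Returns a builder with the same state but a transformed config. */
  BuilderSketch transform(UnaryOperator<Config> transform) {
    Config newConfig = transform.apply(config);
    return newConfig == config ? this : new BuilderSketch(cluster, newConfig);
  }
}
```

Usage would look like {{builder.transform(c -> c.withSimplify(false))}}: the caller never has to know how to reconstruct the builder's other state.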

> Performance issue when enabling abstract converter for EnumerableConvention
> ---
>
> Key: CALCITE-2970
> URL: https://issues.apache.org/jira/browse/CALCITE-2970
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Reporter: Haisheng Yuan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 18h 20m
>  Remaining Estimate: 0h
>
> If we enable the use of abstract converter for {{EnumerableConvention}}, by 
> making {{useAbstractConvertersForConversion}} return true, 
> {{JDBCTest.testJoinManyWay}} will not complete.





[jira] [Commented] (CALCITE-3963) Maintains logical properties at RelSet (equivalent group) instead of RelNode

2020-04-30 Thread Xiening Dai (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-3963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17097015#comment-17097015
 ] 

Xiening Dai commented on CALCITE-3963:
--

{quote}We shouldn't rely on the first rel or subset's best.{quote}

To further explain this, consider this example. We have a simple join case 
which has two alternatives.

Plan A:
{code:java}
HashJoin
  TableScanA
  TableScanB
{code}

Plan B:
{code:java}
MergeJoin
  Sort
    TableScanA
  Sort
    TableScanB
{code}

Assuming the self costs of the hash join and the merge join are similar, plan A 
is better since it doesn't incur sorting. But because these two join nodes have 
different input subsets, the input row counts are decided by each subset's best 
node. If for some reason we report a smaller row count in plan B's Sort subset 
(in this simple example that shouldn't happen, but it is possible in the real 
world when the input is much more complex), we could end up picking plan B 
because its overall cost appears lower. 

We've seen issues like this before.
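
A small worked example of the effect, under a made-up cost model: a scan costs its row count, a sort costs its input row count, and a join costs 10 times the sum of the row counts reported by its input subsets. The class name, the cost formulas and the numbers are all assumptions chosen for illustration; they are not Calcite's cost model.

```java
/** Hypothetical cost model illustrating inconsistent row counts across subsets. */
final class JoinCostSketch {
  /** Join self cost, driven by the row counts its input subsets report. */
  static double joinCost(double leftRows, double rightRows) {
    return 10 * (leftRows + rightRows);
  }

  /** Plan A: HashJoin directly over two scans of scanRows rows each. */
  static double planA(double scanRows) {
    double scans = 2 * scanRows;
    return scans + joinCost(scanRows, scanRows);
  }

  /**
   * Plan B: MergeJoin over two Sorts over the same scans.
   * sortSubsetRows is the row count reported by each Sort subset.
   */
  static double planB(double scanRows, double sortSubsetRows) {
    double scans = 2 * scanRows;
    double sorts = 2 * scanRows; // each sort reads all rows of its scan
    return scans + sorts + joinCost(sortSubsetRows, sortSubsetRows);
  }
}
```

With 1000 rows per scan, plan A costs 22000. Plan B costs 24000 when the Sort subsets report the same 1000 rows, but only 6000 if they report 100 rows, so the plan with the extra sorts would win purely because of the inconsistent estimate.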

> Maintains logical properties at RelSet (equivalent group) instead of RelNode
> 
>
> Key: CALCITE-3963
> URL: https://issues.apache.org/jira/browse/CALCITE-3963
> Project: Calcite
>  Issue Type: Bug
>Reporter: Xiening Dai
>Assignee: Xiening Dai
>Priority: Major
>
> Currently the logical properties (such as row count, distinct row count, etc.) 
> are maintained at RelNode level. This creates a number of metadata 
> consistency problems, e.g. CALCITE-1048, CALCITE-2166. 
> In theory, all RelNodes in a RelSet should share the same logical properties 
> per the definition of relational equivalence. So it makes more sense to keep 
> logical properties at RelSet level, rather than at RelNode level. And such 
> properties shouldn't change when a new subset is created or a subset's best 
> is changed.
> Specifically, I think the following built-in metadata should fall into the 
> logical properties category:
> Selectivity
> UniqueKeys
> ColumnUniqueness
> RowCount
> MaxRowCount
> MinRowCount
> DistinctRowCount
> Size (averageRowSize, averageColumnSize)





[jira] [Commented] (CALCITE-3955) Remove the first operand of RexCall from SqlWindowTableFunction

2020-04-30 Thread Julian Hyde (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-3955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17097014#comment-17097014
 ] 

Julian Hyde commented on CALCITE-3955:
--

It would help a lot if {{SqlGroupedWindowFunction}} extended 
{{SqlUserDefinedTableFunction}}. (I should have said this when I was reviewing 
CALCITE-3382.) Or, if they both implemented a new {{interface 
SqlTableFunction}}.

Note how [SqlToRelConverter asks a table function for its element 
type|https://github.com/apache/calcite/blob/f1c0756f33b79904ca3a429bbff79ebf0103ece9/core/src/main/java/org/apache/calcite/sql2rel/SqlToRelConverter.java#L2496-L2501];
 moving to {{SqlTableFunction}} would help with polymorphic table functions.

Currently there are two places in {{SqlToRelConverter}} that add to 
{{bb.cursors}}. If we added {{interface SqlTableFunction}} we could get that 
down to one, which would be awesome.

Run {{SqlToRelConverterTest.testCollectionTableWithCursorParam}} in a debugger 
and see how {{SqlToRelConverter.convertCursor}} gets called. The group window 
table functions should use the same code path.
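
A rough sketch of what a shared interface buys, using hypothetical types only: {{TableFunctionSketch}}, its {{elementType}} method, the two implementing classes and {{ConverterSketch}} are all invented for illustration and do not reflect the actual Calcite classes or the real signature of {{SqlTableFunction}}.

```java
/** Hypothetical common interface; name and method are assumptions. */
interface TableFunctionSketch {
  String elementType();
}

/** Stand-in for a user-defined table function. */
final class UserDefinedTableFunctionSketch implements TableFunctionSketch {
  @Override public String elementType() {
    return "RecordType(INTEGER N)";
  }
}

/** Stand-in for a group window table function such as TUMBLE. */
final class WindowTableFunctionSketch implements TableFunctionSketch {
  @Override public String elementType() {
    return "RecordType(TIMESTAMP window_start, TIMESTAMP window_end)";
  }
}

/** Stand-in for the converter: one branch handles every table function. */
final class ConverterSketch {
  static String convertCall(Object operator) {
    if (operator instanceof TableFunctionSketch) {
      TableFunctionSketch f = (TableFunctionSketch) operator;
      return "TableFunctionScan(" + f.elementType() + ")";
    }
    return "Scalar";
  }
}
```

The converter tests for the interface once, so adding another kind of table function would not require another special case.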

> Remove the first operand of RexCall from SqlWindowTableFunction
> ---
>
> Key: CALCITE-3955
> URL: https://issues.apache.org/jira/browse/CALCITE-3955
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.22.0
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
> Fix For: 1.23.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> In CALCITE-3382, we introduced the TUMBLE window function to replace the 
> deprecated group tumble window.
> But for the query
> {code:sql}
> select *
> from table(tumble(table Shipments, descriptor(rowtime), INTERVAL '1' MINUTE))
> {code}
> the output plan is
> {code:xml}
> LogicalProject(ORDERID=[$0], ROWTIME=[$1], window_start=[$2], window_end=[$3])
>   LogicalTableFunctionScan(invocation=[TUMBLE($1, DESCRIPTOR($1), 
> 6:INTERVAL MINUTE)], rowType=[RecordType(INTEGER ORDERID, TIMESTAMP(0) 
> ROWTIME, TIMESTAMP(0) window_start, TIMESTAMP(0) window_end)])
>     LogicalProject(ORDERID=[$0], ROWTIME=[$1])
>       LogicalTableScan(table=[[CATALOG, SALES, SHIPMENTS]])
> {code}
> The first operand of the TUMBLE rex call is always the last input field, but 
> it actually represents the source table, which is the input rel node.
> This issue removes the first operand from the RexCall because it is useless 
> and confusing.





[jira] [Updated] (CALCITE-2270) [SQL:2016] Polymorphic table functions

2020-04-30 Thread Rui Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-2270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Wang updated CALCITE-2270:
--
Issue Type: New Feature  (was: Bug)

> [SQL:2016] Polymorphic table functions
> --
>
> Key: CALCITE-2270
> URL: https://issues.apache.org/jira/browse/CALCITE-2270
> Project: Calcite
>  Issue Type: New Feature
>Reporter: Chunhui Shi
>Assignee: Rui Wang
>Priority: Major
>
> Polymorphic table functions: table functions without predefined return type





[jira] [Commented] (CALCITE-3963) Maintains logical properties at RelSet (equivalent group) instead of RelNode

2020-04-30 Thread Xiening Dai (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-3963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17097006#comment-17097006
 ] 

Xiening Dai commented on CALCITE-3963:
--

What I mean by "maintain" is more about associating these properties with 
RelSet rather than RelNode. They can still be stored in the metadata cache 
somehow, which would be an implementation detail. But conceptually they should 
belong to RelSet.

For example, when calculating the row count for a RelSubset, the logic today is 
to use the row count of subset.best, and if best is not available, we use the 
row count of the first rel in the set. The logic is flawed in my opinion. 
Essentially the row count should be consistent across the entire set, and only 
changes when a new logical node is added to the set, or the set gets merged. We 
shouldn't rely on the first rel or subset's best.
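
The fallback logic described above can be sketched in a few lines. {{RowCountSketch}} and its nested {{Rel}} class are hypothetical stand-ins, not the planner's real classes, and the row counts are made up.

```java
import java.util.List;

/** Hypothetical model of the "best, else first rel" row count logic. */
final class RowCountSketch {
  /** Stand-in for a RelNode with its own row count estimate. */
  static final class Rel {
    final String name;
    final double rowCount;

    Rel(String name, double rowCount) {
      this.name = name;
      this.rowCount = rowCount;
    }
  }

  /** Uses best's estimate if a best exists, else the first rel in the set. */
  static double subsetRowCount(Rel best, List<Rel> setRels) {
    return best != null ? best.rowCount : setRels.get(0).rowCount;
  }
}
```

If the set contains rels estimating 100 and 80 rows, the subset reports 100 while it has no best and 80 once the second rel becomes best, even though nothing about the logical result has changed.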

One of the clear benefits, which Haisheng already mentioned, is to save a large 
amount of cache memory and avoid unnecessary re-calculation. But more 
importantly we plug this hole in the conceptual design.

In terms of how we derive logical properties for the set, I think in a lot of 
cases, we don't "aggregate" inputs from the nodes, but more likely we choose 
the most convincing, or promising, node to report this stat. In the "unique 
keys" example you mentioned, do you have a real world case where RelNodes 
within one set have different unique keys?

 

 

 



[jira] [Commented] (CALCITE-2270) [SQL:2016] Polymorphic table functions

2020-04-30 Thread Julian Hyde (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-2270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17096990#comment-17096990
 ] 

Julian Hyde commented on CALCITE-2270:
--

Can you add more details about how you would support polymorphic table functions? Is 
the return type defined using some rule based on the types of the input 
columns? In Calcite we have many functions where the return type is defined 
using code (e.g. {{SqlOperator.inferReturnType}}); it is more powerful, 
requires Java code rather than SQL, but is probably less effort to implement 
than a rule-derived return type.
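
As an illustration of a return type defined in code, here is a minimal sketch in which the output columns are computed from the input columns. {{ReturnTypeSketch}} and its string-based column model are hypothetical; this is not the signature of {{SqlOperator.inferReturnType}}, and the appended {{window_start}} / {{window_end}} columns simply mirror what a TUMBLE-style function adds.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical code-defined return type for a windowing table function. */
final class ReturnTypeSketch {
  /** Output row type = all input columns plus two window columns. */
  static List<String> inferReturnType(List<String> inputColumns) {
    List<String> out = new ArrayList<>(inputColumns);
    out.add("window_start TIMESTAMP");
    out.add("window_end TIMESTAMP");
    return out;
  }
}
```

Because the row type depends on whatever table is passed in, it cannot be declared up front, which is what makes such a function polymorphic.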



[jira] [Commented] (CALCITE-3964) Support in DESCRIPTOR operator

2020-04-30 Thread Julian Hyde (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-3964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17096987#comment-17096987
 ] 

Julian Hyde commented on CALCITE-3964:
--

The above description is a bit abstract; the SQL standard folks hate including 
examples (because they are "non-normative") but a SQL example would help me 
understand.

> Support  in DESCRIPTOR operator 
> ---
>
> Key: CALCITE-3964
> URL: https://issues.apache.org/jira/browse/CALCITE-3964
> Project: Calcite
>  Issue Type: Sub-task
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>
> DESCRIPTOR can include an optional data type, which will enable type checking 
> in SQL validator. It is useful for streaming windowing: the windowing is 
> required to be applied on TIMESTAMP type, and we can rely on descriptor to do 
> type validation.
> *The following is copied from SQL standard 2016*: 
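> 8.15 <descriptor argument>
> {code:java}
> <descriptor argument> ::=
>   <descriptor value constructor>
> <descriptor value constructor> ::=
>   DESCRIPTOR <left paren> <descriptor column list> <right paren>
> <descriptor column list> ::=
>   <descriptor column specification>
>     [ { <comma> <descriptor column specification> }... ]
> <descriptor column specification> ::=
>   <column name> [ <data type> ]
> {code}
> A <descriptor argument> is the keyword DESCRIPTOR followed by a parenthesized 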
> list of column names; each column name may optionally have a data type. If 
> every column name has a data type, then the descriptor describes a row type. 
> In the examples, CSVreader and Pivot use descriptor arguments that are just 
> lists of column names; ExecR is an example that uses a descriptor to pass a 
> complete row type.





[jira] [Commented] (CALCITE-3963) Maintains logical properties at RelSet (equivalent group) instead of RelNode

2020-04-30 Thread Haisheng Yuan (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-3963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17096985#comment-17096985
 ] 

Haisheng Yuan commented on CALCITE-3963:


As long as all the alternatives in a RelSet share the same logical properties, 
we don't care where the logical properties are stored.

I am afraid the 'fold' operator will make things complicated. What about 
cardinality and selectivity? We may just end up choosing one blindly. It 
doesn't seem right to use alternative 1's cardinality info, alternative 2's 
selectivity info, and all the alternatives' unique keys ...

Admittedly, each alternative's stats may vary a lot. One of the reasons is that 
Calcite believes all the simplification should be done in VolcanoPlanner and 
selected based on cost, while other systems like SQL Server and Greenplum do 
all the simplification (constant folding, join simplification, predicate 
push-down) before the logical plan goes into the MEMO.

One of the reasons to share logical properties between alternatives in a group 
is that it becomes possible (in the future) to decide early to stop exploring 
this group. If we use the 'fold' operator to decide the group's logical 
properties, when is a good time to decide? 

Option 1: recompute the logical properties whenever there is a new 
alternative. That may be no better than just storing logical properties for 
each relnode.

Option 2: roll it up after all the logical alternatives are generated. But 
there is no logical / physical distinction; we don't know whether an operator 
is logical or not. Judging by convention is not perfect, because systems like 
Flink, Drill, Ignite define their own logical conventions. There is no 
distinction between logical rules and physical rules either; they are matched 
and applied at the same stage. Physical rules can even generate logical 
operators, like ProjectMergeRule; will these generated logical operators be 
counted?

Another reason to share logical properties is to avoid redundant computation. 
For example,
{code:java}
SELECT a,b,c,max(d) FROM foo GROUP BY a,b,c;

HashAggregate
  +-- TableScan
{code}
In a distributed system, suppose we generate HashAgg with distribution 
alternatives for all the 8 key combinations. In SQL Server, there is only 1 
physical operator HashAgg, but in Calcite, there are 8 HashAgg operators: the 
same HashAgg with different traitsets. We will get another 8 exchange operators 
(in Calcite 1.22 and before, there were more than 50 exchange operators), and 
we need to compute the logical properties for all the HashAgg and Exchange 
operators. Even though the results are cached in the metadata system, these 
operators are just throwing away money that the LogicalAggregate operator left 
on the table.
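
For reference, the figure of 8 matches the number of subsets of the three grouping keys (2^3), assuming the count includes the empty subset. A minimal sketch that enumerates them; the class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

/** Enumerates every subset of a list of grouping keys via a bit mask. */
final class KeyCombinationsSketch {
  static List<List<String>> combinations(List<String> keys) {
    List<List<String>> result = new ArrayList<>();
    for (int mask = 0; mask < (1 << keys.size()); mask++) {
      List<String> subset = new ArrayList<>();
      for (int i = 0; i < keys.size(); i++) {
        if ((mask & (1 << i)) != 0) {
          subset.add(keys.get(i)); // bit i set: key i is part of this subset
        }
      }
      result.add(subset);
    }
    return result;
  }
}
```

For keys a, b, c this yields 8 candidate distributions, so properties computed per operator are computed 8 times for HashAgg alone, where a single computation per group would do.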



[jira] [Commented] (CALCITE-3923) Refactor how planner rules are parameterized

2020-04-30 Thread Julian Hyde (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-3923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17096904#comment-17096904
 ] 

Julian Hyde commented on CALCITE-3923:
--

The single convention rule idea is worth pursuing. It is one possible solution 
to the "RelNode.copy problem". (See also CALCITE-2064.) But unless solving it 
completely removes the need to copy rule instances - which I think it won't - 
this change will still be needed. So, can we please treat it as an orthogonal 
issue? 



[jira] [Commented] (CALCITE-2970) Performance issue when enabling abstract converter for EnumerableConvention

2020-04-30 Thread Julian Hyde (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-2970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17096896#comment-17096896
 ] 

Julian Hyde commented on CALCITE-2970:
--

The other reason, besides handling sub-classes of {{RelBuilder}}, to use a 
{{UnaryOperator<RelBuilder>}} is that you can start from an existing 
{{RelBuilder}} that has some properties and state. Other suggestions that have 
been made - using a {{RelBuilderFactory}} or a {{RelBuilder.Config}} - have 
cases where they lose information. (E.g. the current {{RelOptCluster}}, which 
is in the {{RelBuilder}} but not its {{Config}}, or the current 
{{ViewExpander}}, which comes from the {{Context}} but not the {{Config}}.)
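
A minimal sketch of that argument, under loud assumptions: {{BuilderSketch}} and {{BuilderTransforms}} are hypothetical classes, with a {{simplify}} flag standing in for a config property and a {{cluster}} string standing in for state that the config does not carry. It only contrasts "rebuild from config" with "transform an existing builder".

```java
import java.util.Objects;
import java.util.function.UnaryOperator;

// Hypothetical builder; NOT Calcite's RelBuilder.
final class BuilderSketch {
  final boolean simplify; // a property that a config object would carry
  final String cluster;   // state that lives in the builder, not in the config

  BuilderSketch(boolean simplify, String cluster) {
    this.simplify = simplify;
    this.cluster = Objects.requireNonNull(cluster);
  }

  /** Returns a copy with one property changed; all other state is kept. */
  BuilderSketch withSimplify(boolean simplify) {
    return new BuilderSketch(simplify, cluster);
  }
}

final class BuilderTransforms {
  private BuilderTransforms() {}

  /** Rebuilding from config alone: the cluster has to fall back to a default. */
  static BuilderSketch fromConfig(boolean simplify) {
    return new BuilderSketch(simplify, "default-cluster");
  }

  /** Transforming an existing builder: state outside the config survives. */
  static BuilderSketch transform(BuilderSketch existing,
      UnaryOperator<BuilderSketch> op) {
    return op.apply(existing);
  }
}
```

For example, {{BuilderTransforms.transform(existing, b -> b.withSimplify(false))}} keeps whatever cluster {{existing}} had, whereas {{fromConfig(false)}} cannot know it.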

> Performance issue when enabling abstract converter for EnumerableConvention
> ---
>
> Key: CALCITE-2970
> URL: https://issues.apache.org/jira/browse/CALCITE-2970
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Reporter: Haisheng Yuan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 18h 20m
>  Remaining Estimate: 0h
>
> If we enable the use of abstract converter for {{EnumerableConvention}}, by 
> making {{useAbstractConvertersForConversion}} return true, 
> {{JDBCTest.testJoinManyWay}} will not complete.





[jira] [Commented] (CALCITE-3963) Maintains logical properties at RelSet (equivalent group) instead of RelNode

2020-04-30 Thread Julian Hyde (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-3963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17096889#comment-17096889
 ] 

Julian Hyde commented on CALCITE-3963:
--

Minor quibble: in JIRA subject, use the imperative form of the verb 
("Maintain") rather than third-person active ("Maintains")

When you say "maintain", do you mean "store"? I'm not sure I agree. The 
metadata system allows us to derive a property for any {{RelNode}} (e.g. 
calling {{RelMetadataQuery.getUniqueKeys(RelNode rel, boolean ignoreNulls)}} 
on a particular {{LogicalProject}}) and it also maintains a cache, so that once 
derived, the value does not have to be re-computed.

So, the metadata system allows us to not worry too much about whether values 
are stored, which is good.

Now, let's suppose that you want to know the unique keys of a particular 
{{RelSet}} (or {{RelSubSet}} - the reasoning is similar). Unique keys are a 
logical property, so we should be able to derive the set of unique keys by 
taking the union of the unique keys of every {{RelNode}} in that set.

If you add a {{RelNode}} to a set, or merge sets, then the set may acquire 
additional unique keys. And those keys may cause changes to unique keys (and 
other metadata) for any {{RelNode}} that consumes any {{RelNode}} in the set. 
It's complicated, so we should lean on the metadata system to maintain 
everything for us.

I think we need to add a 'fold' operator to each type of metadata to say how 
the metadata of the {{RelSet}} is derived from those of the constituent nodes. 
In the case of {{RelMdUniqueKeys}} the fold operator is 'union'. (In SQL terms, 
the 'fold' operator would be called a 'roll up', that is, an aggregate 
function. {{RelMdMinRowCount}} rolls up using {{MAX}}. Et cetera.)

As I said earlier, we should not focus on where the {{RelSet}}'s metadata is 
stored. Let the metadata system worry about that. Focus instead on how the 
metadata is derived.
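
To make the 'fold' idea concrete, here is a small sketch. It assumes a simplified representation that is not Calcite's: a unique key is a set of column ordinals, and each node reports its own keys and minimum row count. {{MetadataFoldSketch}} and its methods are hypothetical names.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper; NOT part of Calcite's metadata system.
final class MetadataFoldSketch {
  private MetadataFoldSketch() {}

  /**
   * Fold for unique keys: the set's keys are the union of the keys
   * reported by each constituent node.
   */
  static Set<Set<Integer>> foldUniqueKeys(
      Collection<Set<Set<Integer>>> keysPerNode) {
    Set<Set<Integer>> result = new HashSet<>();
    for (Set<Set<Integer>> keys : keysPerNode) {
      result.addAll(keys);
    }
    return result;
  }

  /**
   * Fold for minimum row count: every node gives a valid lower bound,
   * so the tightest bound for the set is the maximum of them.
   */
  static double foldMinRowCount(Collection<Double> minRowCountPerNode) {
    double result = 0d; // a row count is never negative
    for (double minRowCount : minRowCountPerNode) {
      result = Math.max(result, minRowCount);
    }
    return result;
  }
}
```

Adding a node to a set, or merging two sets, then amounts to folding in more inputs; where the folded value is cached is left to the metadata system.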



> Maintains logical properties at RelSet (equivalent group) instead of RelNode
> 
>
> Key: CALCITE-3963
> URL: https://issues.apache.org/jira/browse/CALCITE-3963
> Project: Calcite
>  Issue Type: Bug
>Reporter: Xiening Dai
>Assignee: Xiening Dai
>Priority: Major
>
> Currently the logical properties (such as row count, distinct row count, etc) 
> are maintained at RelNode level. This creates a number of metadata 
> consistency problems, e.g. CALCITE-1048, CALCITE-2166. 
> In theory, all RelNodes in a RelSet should share the same logical properties 
> by the definition of relational equivalence. So it makes more sense to keep 
> logical properties at RelSet level, rather than the RelNode. And such 
> properties shouldn't change when a new subset is created or a subset's best is 
> changed.
> Specifically, I think the built-in metadata below should fall into the logical 
> properties category -
> Selectivity
> UniqueKeys
> ColumnUniqueness
> RowCount
> MaxRowCount
> MinRowCount
> DistinctRowCount
> Size (averageRowSize, averageColumnSize)





[jira] [Commented] (CALCITE-3955) Remove the first operand of RexCall from SqlWindowTableFunction

2020-04-30 Thread Danny Chen (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-3955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17096268#comment-17096268
 ] 

Danny Chen commented on CALCITE-3955:
-

After some attempts I have no solution yet. We may need a way to identify that 
the function is a window function when doing the validation rewrite. 
[~julianhyde], any ideas?

> Remove the first operand of RexCall from SqlWindowTableFunction
> ---
>
> Key: CALCITE-3955
> URL: https://issues.apache.org/jira/browse/CALCITE-3955
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.22.0
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
> Fix For: 1.23.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> In CALCITE-3382, we introduced TUMBLE window function to replace the 
> deprecated group tumble window.
> But for query
> {code:sql}
> select *
> from table(tumble(table Shipments, descriptor(rowtime), INTERVAL '1' MINUTE))
> {code}
> the output plan is
> {code:xml}
> LogicalProject(ORDERID=[$0], ROWTIME=[$1], window_start=[$2], window_end=[$3])
>   LogicalTableFunctionScan(invocation=[TUMBLE($1, DESCRIPTOR($1), 
> 6:INTERVAL MINUTE)], rowType=[RecordType(INTEGER ORDERID, TIMESTAMP(0) 
> ROWTIME, TIMESTAMP(0) window_start, TIMESTAMP(0) window_end)])
> LogicalProject(ORDERID=[$0], ROWTIME=[$1])
>   LogicalTableScan(table=[[CATALOG, SALES, SHIPMENTS]])
> {code}
> The first operand of TUMBLE rex call is always the last input field, but 
> actually it represents the source table which is the input rel node.
> This issue removes the first operand from the RexCall because it is useless 
> and confusing.





[jira] [Commented] (CALCITE-3955) Remove the first operand of RexCall from SqlWindowTableFunction

2020-04-30 Thread Rui Wang (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-3955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17096189#comment-17096189
 ] 

Rui Wang commented on CALCITE-3955:
---

Ok. I have spent some time investigating how to wrap the first argument into a 
CURSOR. The main problem I have seen so far is *WHERE* to wrap it as a CURSOR. 
If we do it in "StandardConvertletTable", it seems too late based on how 
CURSOR/subquery is registered in Calcite. 

The best place might be "SqlValidatorImpl.performUnconditionalRewrites", where 
we can wrap "table t" to CURSOR. However, it is possible that a query could use 
"FROM TUMBLE((SELECT ...) ...)", in which the subquery won't fall into the 
"table t" rewrite path.

I don't have a good idea so far for implementing this CURSOR fix. [~danny0405], I 
would love to hear your opinion when you get a chance to look at this problem.






[jira] [Resolved] (CALCITE-3962) Make JSON_VALUE operands variadic

2020-04-30 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-3962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen resolved CALCITE-3962.
-
Resolution: Fixed

Fixed in 
[f1c0756|https://github.com/apache/calcite/commit/f1c0756f33b79904ca3a429bbff79ebf0103ece9]
 !

> Make JSON_VALUE operands variadic
> 
>
> Key: CALCITE-3962
> URL: https://issues.apache.org/jira/browse/CALCITE-3962
> Project: Calcite
>  Issue Type: Sub-task
>  Components: core
>Affects Versions: 1.21.0, 1.22.0
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
> Fix For: 1.23.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>



