[jira] [Commented] (CALCITE-3923) Refactor how planner rules are parameterized
[ https://issues.apache.org/jira/browse/CALCITE-3923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17097081#comment-17097081 ]

Haisheng Yuan commented on CALCITE-3923:
----------------------------------------

The topProject convention should always be the same as the bottomProject convention, so the check is not needed.

> Refactor how planner rules are parameterized
> --------------------------------------------
>
>                 Key: CALCITE-3923
>                 URL: https://issues.apache.org/jira/browse/CALCITE-3923
>             Project: Calcite
>          Issue Type: Bug
>            Reporter: Julian Hyde
>            Assignee: Julian Hyde
>            Priority: Major
>
> People often want different variants of planner rules. An example is
> {{FilterJoinRule}}, which has a ‘boolean smart’ parameter, a predicate (which
> returns whether to pull up filter conditions), operands (which determine the
> precise sub-classes of {{RelNode}} that the rule should match) and a
> {{RelBuilderFactory}} (which controls the type of {{RelNode}} created by this
> rule).
> Suppose you have an instance of {{FilterJoinRule}} and you want to change
> {{smart}} from true to false. The {{smart}} parameter is immutable (good!)
> but you can’t easily create a clone of the rule because you don’t know the
> values of the other parameters. Your instance might even be (unbeknownst to
> you) a sub-class with extra parameters and a private constructor.
> So, my proposal is to put all of the config information of a {{RelOptRule}}
> into a single {{config}} parameter that contains all relevant properties.
> Each sub-class of {{RelOptRule}} would have one constructor with just a
> ‘config’ parameter. Each config knows which sub-class of {{RelOptRule}} to
> create. Therefore it is easy to copy a config, change one or more properties,
> and create a new rule instance.
> Adding a property to a rule’s config does not require us to add or deprecate
> any constructors.
> The operands are part of the config, so if you have a rule that matches an
> {{EnumerableFilter}} on an {{EnumerableJoin}} and you want to make it match
> an {{EnumerableFilter}} on an {{EnumerableNestedLoopJoin}}, you can easily
> create one with one changed operand.
> The config is immutable and self-describing, so we can use it to
> automatically generate a unique description for each rule instance.
> (See the email thread [[DISCUSS] Refactor how planner rules are
> parameterized|https://lists.apache.org/thread.html/rfdf6f9b7821988bdd92b0377e3d293443a6376f4773c4c658c891cf9%40%3Cdev.calcite.apache.org%3E].)

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
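
The config proposal quoted above can be illustrated with a minimal, self-contained sketch. It does not use Calcite's classes: {{ToyFilterJoinRule}}, {{ToyConfig}}, {{withSmart}}, {{withOperand}} and {{toRule}} are hypothetical names invented for this illustration, and the API that eventually lands in Calcite may look quite different.

```java
import java.util.Objects;

/** Toy stand-in for a planner rule; NOT Calcite's FilterJoinRule. */
final class ToyFilterJoinRule {
  /** Immutable config holding every parameter of the rule (hypothetical). */
  static final class ToyConfig {
    final boolean smart;
    final String operand; // stands in for the rule's operand tree

    ToyConfig(boolean smart, String operand) {
      this.smart = smart;
      this.operand = Objects.requireNonNull(operand);
    }

    /** Returns a copy with one property changed; the original is untouched. */
    ToyConfig withSmart(boolean smart) {
      return new ToyConfig(smart, operand);
    }

    /** Returns a copy that matches a different operand. */
    ToyConfig withOperand(String operand) {
      return new ToyConfig(smart, operand);
    }

    /** The config knows which rule class to create. */
    ToyFilterJoinRule toRule() {
      return new ToyFilterJoinRule(this);
    }

    /** Self-describing, so it can double as a rule description. */
    @Override public String toString() {
      return "ToyFilterJoinRule(smart=" + smart + ", operand=" + operand + ")";
    }
  }

  final ToyConfig config;

  /** The single constructor takes just the config. */
  ToyFilterJoinRule(ToyConfig config) {
    this.config = config;
  }
}
```

Given an existing {{rule}}, a variant with {{smart}} switched off would be {{rule.config.withSmart(false).toRule()}}; the caller never needs to know the other parameters, which is the point of the proposal.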
[jira] [Commented] (CALCITE-3965) Excessive time waiting on DiffRepository lock
[ https://issues.apache.org/jira/browse/CALCITE-3965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17097079#comment-17097079 ]

Laurent Goujon commented on CALCITE-3965:
-----------------------------------------

I agree about the unnecessary XML writing, but adding a {{synchronized}} keyword when it is not necessary (that is my analysis; I may be wrong here, so I would appreciate a second pair of eyes) is also a cause of lock contention.

> Excessive time waiting on DiffRepository lock
> ---------------------------------------------
>
>                 Key: CALCITE-3965
>                 URL: https://issues.apache.org/jira/browse/CALCITE-3965
>             Project: Calcite
>          Issue Type: Bug
>          Components: core
>            Reporter: Laurent Goujon
>            Assignee: Laurent Goujon
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> When running the whole test suite from the command line, tests are parallelized
> and gradle/junit tries to use as many cores as possible (16 on my machine).
> But the tests take a very long time, approximately 90 minutes on my machine,
> and several of them failed because they took too long to complete.
> Using jstack to look at the threads' state while tests are running shows that
> most of them are waiting on {{DiffRepository}} methods
> ({{DiffRepository#expand}} in most cases) while one of the threads has obtained
> the lock (and is usually flushing data to disk).
[jira] [Commented] (CALCITE-3965) Excessive time waiting on DiffRepository lock
[ https://issues.apache.org/jira/browse/CALCITE-3965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17097077#comment-17097077 ]

Julian Hyde commented on CALCITE-3965:
--------------------------------------

I agree, this is very likely a duplicate of CALCITE-3517. Please fix CALCITE-3517; CPU use will go way down, and lock contention will be reduced. Lock contention is just a symptom, not the cause.
[jira] [Commented] (CALCITE-3923) Refactor how planner rules are parameterized
[ https://issues.apache.org/jira/browse/CALCITE-3923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17097074#comment-17097074 ]

Xiening Dai commented on CALCITE-3923:
--------------------------------------

I am thinking about this. Once we have the RelBuilder change in place, we could do something like this in the ProjectMerge rule, for example -

{code:java}
RelBuilder relBuilder = call.builder();
// Only switch builders when both projects share a convention.
if (topProject.getConvention() == bottomProject.getConvention()) {
  relBuilder = topProject.getConvention().transformRelBuilder(relBuilder);
}
{code}

Based on my test, a simple change like this reduces project merge rule firings by 50% for an N-way join query. But I don't like the additional check here; maybe we should provide "target convention" as a config parameter? I understand this might not necessarily be part of your change, but we might need to add that to the config at some point in the future.
[jira] [Commented] (CALCITE-3965) Excessive time waiting on DiffRepository lock
[ https://issues.apache.org/jira/browse/CALCITE-3965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17097072#comment-17097072 ]

Haisheng Yuan commented on CALCITE-3965:
----------------------------------------

Is this a duplicate of CALCITE-3517?
[jira] [Commented] (CALCITE-3965) Excessive time waiting on DiffRepository lock
[ https://issues.apache.org/jira/browse/CALCITE-3965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17097071#comment-17097071 ]

Laurent Goujon commented on CALCITE-3965:
-----------------------------------------

This is somewhat related to CALCITE-3517, but after a look at the code, {{DiffRepository#expand}} does not write to disk, and the main issue is really contention around the instance lock. The method is synchronized, but most operations look thread-safe. There are some calls to get and set (which are also synchronized), but it doesn't look like they need to be done "atomically".

Removing {{synchronized}} from the {{expand()}} method results in the build completing in 2min30s with no test failures.
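
The locking pattern described in this comment can be sketched in isolation. The class below is a toy, not Calcite's {{DiffRepository}}: {{ToyRepository}} and its simplified {{expand}} are assumptions made for illustration. It only shows the shape of the change, where the accessors stay synchronized and the outer method does not hold the lock while it works on strings.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy stand-in for a repository of named resources; NOT Calcite's DiffRepository. */
final class ToyRepository {
  private final Map<String, String> values = new HashMap<>();

  /** Short critical section: guards only the map read. */
  synchronized String get(String key) {
    return values.get(key);
  }

  /** Short critical section: guards only the map write. */
  synchronized void set(String key, String value) {
    values.put(key, value);
  }

  /**
   * Deliberately not synchronized. Replaces "${key}" with the stored value,
   * or returns the text unchanged if it is not a token or has no value.
   * The lock is held only inside get(), so the string handling here can run
   * in parallel across threads.
   */
  String expand(String text) {
    if (text.startsWith("${") && text.endsWith("}")) {
      String value = get(text.substring(2, text.length() - 1));
      if (value != null) {
        return value;
      }
    }
    return text;
  }
}
```

Whether the same change is safe for the real class depends on whether any get-then-set sequence must be atomic, which is exactly the question the comment asks reviewers to double-check.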
[jira] [Created] (CALCITE-3965) Excessive time waiting on DiffRepository lock
Laurent Goujon created CALCITE-3965:
---------------------------------------

             Summary: Excessive time waiting on DiffRepository lock
                 Key: CALCITE-3965
                 URL: https://issues.apache.org/jira/browse/CALCITE-3965
             Project: Calcite
          Issue Type: Bug
          Components: core
            Reporter: Laurent Goujon
            Assignee: Laurent Goujon

When running the whole test suite from the command line, tests are parallelized and gradle/junit tries to use as many cores as possible (16 on my machine). But the tests take a very long time, approximately 90 minutes on my machine, and several of them failed because they took too long to complete.

Using jstack to look at the threads' state while tests are running shows that most of them are waiting on {{DiffRepository}} methods ({{DiffRepository#expand}} in most cases) while one of the threads has obtained the lock (and is usually flushing data to disk).
[jira] [Commented] (CALCITE-2970) Performance issue when enabling abstract converter for EnumerableConvention
[ https://issues.apache.org/jira/browse/CALCITE-2970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17097033#comment-17097033 ]

Julian Hyde commented on CALCITE-2970:
--------------------------------------

I take some of that back. I see that {{RelBuilder}} has a method {{RelBuilder transform(UnaryOperator transform)}} that creates a copy of a {{RelBuilder}} with the same state (including context, cluster) but a different config. So I think that {{UnaryOperator transform}} would be viable.

> Performance issue when enabling abstract converter for EnumerableConvention
> ---------------------------------------------------------------------------
>
>                 Key: CALCITE-2970
>                 URL: https://issues.apache.org/jira/browse/CALCITE-2970
>             Project: Calcite
>          Issue Type: Bug
>          Components: core
>            Reporter: Haisheng Yuan
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 18h 20m
>  Remaining Estimate: 0h
>
> If we enable the use of abstract converter for {{EnumerableConvention}}, by
> making {{useAbstractConvertersForConversion}} return true,
> {{JDBCTest.testJoinManyWay}} will not complete.
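
The behaviour described in this comment (copy the builder, keep its state, swap the config) can be sketched with a toy class. {{ToyBuilder}} and its {{Config}} are hypothetical and only mirror the description in the comment; they are not Calcite's {{RelBuilder}}, and the {{simplify}} property is an invented placeholder.

```java
import java.util.function.UnaryOperator;

/** Toy builder with separate state and config; NOT Calcite's RelBuilder. */
final class ToyBuilder {
  /** Immutable config with one invented property. */
  static final class Config {
    final boolean simplify;

    Config(boolean simplify) {
      this.simplify = simplify;
    }

    Config withSimplify(boolean simplify) {
      return new Config(simplify);
    }
  }

  final String cluster; // state that lives in the builder, not in its config
  final Config config;

  ToyBuilder(String cluster, Config config) {
    this.cluster = cluster;
    this.config = config;
  }

  /** Returns a copy with the same state but a transformed config. */
  ToyBuilder transform(UnaryOperator<Config> transform) {
    return new ToyBuilder(cluster, transform.apply(config));
  }
}
```

Because the operator receives the existing config and the copy keeps the existing state, nothing held by the builder (the cluster, in this toy) is lost, which is the advantage the comment is weighing.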
[jira] [Commented] (CALCITE-3963) Maintains logical properties at RelSet (equivalent group) instead of RelNode
[ https://issues.apache.org/jira/browse/CALCITE-3963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17097015#comment-17097015 ]

Xiening Dai commented on CALCITE-3963:
--------------------------------------

{quote}We shouldn't rely on the first rel or subset's best.{quote}

To further explain this, consider this example. We have a simple join case which has two alternatives.

Plan A:
{code:java}
HashJoin
  TableScanA
  TableScanB
{code}

Plan B:
{code:java}
MergeJoin
  Sort
    TableScanA
  Sort
    TableScanB
{code}

Assuming the self costs of hash join and merge join are similar, plan A is better since it doesn't incur sorting. But because these two join nodes have different input subsets, the input row counts are decided by each subset's best node. If for some reason we report a smaller row count in plan B's Sort subset (in this simple example it shouldn't happen, but it's possible in the real world when the input is much more complex), we could end up picking plan B because its overall cost is lower. We've seen issues like this before.

> Maintains logical properties at RelSet (equivalent group) instead of RelNode
> ----------------------------------------------------------------------------
>
>                 Key: CALCITE-3963
>                 URL: https://issues.apache.org/jira/browse/CALCITE-3963
>             Project: Calcite
>          Issue Type: Bug
>            Reporter: Xiening Dai
>            Assignee: Xiening Dai
>            Priority: Major
>
> Currently the logical properties (such as row count, distinct row count, etc.)
> are maintained at RelNode level. This creates a number of metadata
> consistency problems, e.g. CALCITE-1048, CALCITE-2166.
> In theory, all RelNodes in a RelSet should share the same logical properties
> per the definition of relational equivalence. So it makes more sense to keep
> logical properties at RelSet level, rather than at the RelNode. And such
> properties shouldn't change when a new subset is created or a subset's best is
> changed.
> Specifically, I think the built-in metadata below should fall into the logical
> properties category -
> Selectivity
> UniqueKeys
> ColumnUniqueness
> RowCount
> MaxRowCount
> MinRowCount
> DistinctRowCount
> Size (averageRowSize, averageColumnSize)
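
The plan A versus plan B argument in this comment can be worked through with toy numbers. The cost formulas below are invented purely for illustration (one unit per input row for a scan or a sort, three units per input row for either join) and are not Calcite's cost model; {{ToyJoinCost}} is a hypothetical name.

```java
/** Toy cost arithmetic for the two plans; the cost formulas are invented. */
final class ToyJoinCost {
  /** Self cost of a scan or a sort: one unit per input row (assumption). */
  static double unaryCost(double inputRows) {
    return inputRows;
  }

  /** Self cost of either join: three units per input row (assumption). */
  static double joinCost(double leftRows, double rightRows) {
    return 3 * (leftRows + rightRows);
  }

  /** Plan A: HashJoin over two scans. */
  static double planA(double scanRows) {
    return 2 * unaryCost(scanRows) + joinCost(scanRows, scanRows);
  }

  /** Plan B: MergeJoin over two sorts; sortRows is what each Sort subset reports. */
  static double planB(double scanRows, double sortRows) {
    return 2 * unaryCost(scanRows)      // two scans
        + 2 * unaryCost(scanRows)       // two sorts, each reading scanRows rows
        + joinCost(sortRows, sortRows); // the join sees the Sort subsets' row counts
  }
}
```

With 1,000 rows per scan, {{planA(1000)}} is 8000 and a consistent {{planB(1000, 1000)}} is 10000, so plan A wins. If the Sort subsets misreport 100 rows, {{planB(1000, 100)}} drops to 4600 and plan B is picked even though it does strictly more work.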
[jira] [Commented] (CALCITE-3955) Remove the first operand of RexCall from SqlWindowTableFunction
[ https://issues.apache.org/jira/browse/CALCITE-3955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17097014#comment-17097014 ]

Julian Hyde commented on CALCITE-3955:
--------------------------------------

It would help a lot if {{SqlGroupedWindowFunction}} extended {{SqlUserDefinedTableFunction}}. (I should have said this when I was reviewing CALCITE-3382.) Or, if they both implemented a new {{interface SqlTableFunction}}. Note how [SqlToRelConverter asks a table function for its element type|https://github.com/apache/calcite/blob/f1c0756f33b79904ca3a429bbff79ebf0103ece9/core/src/main/java/org/apache/calcite/sql2rel/SqlToRelConverter.java#L2496-L2501]; moving to {{SqlTableFunction}} would help with polymorphic table functions.

Currently there are two places in {{SqlToRelConverter}} that add to {{bb.cursors}}. If we added {{interface SqlTableFunction}} we could get that down to one, which would be awesome.

Run {{SqlToRelConverterTest.testCollectionTableWithCursorParam}} in a debugger and see how {{SqlToRelConverter.convertCursor}} gets called. The group window table functions should use the same code path.

> Remove the first operand of RexCall from SqlWindowTableFunction
> ---------------------------------------------------------------
>
>                 Key: CALCITE-3955
>                 URL: https://issues.apache.org/jira/browse/CALCITE-3955
>             Project: Calcite
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.22.0
>            Reporter: Danny Chen
>            Assignee: Danny Chen
>            Priority: Major
>             Fix For: 1.23.0
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> In CALCITE-3382, we introduced the TUMBLE window function to replace the
> deprecated group tumble window.
> But for the query
> {code:sql}
> select *
> from table(tumble(table Shipments, descriptor(rowtime), INTERVAL '1' MINUTE))
> {code}
> the output plan is
> {code:xml}
> LogicalProject(ORDERID=[$0], ROWTIME=[$1], window_start=[$2], window_end=[$3])
>   LogicalTableFunctionScan(invocation=[TUMBLE($1, DESCRIPTOR($1), 6:INTERVAL MINUTE)], rowType=[RecordType(INTEGER ORDERID, TIMESTAMP(0) ROWTIME, TIMESTAMP(0) window_start, TIMESTAMP(0) window_end)])
>     LogicalProject(ORDERID=[$0], ROWTIME=[$1])
>       LogicalTableScan(table=[[CATALOG, SALES, SHIPMENTS]])
> {code}
> The first operand of the TUMBLE rex call is always the last input field, but
> actually it represents the source table, which is the input rel node.
> This issue removes the first operand from the RexCall because it is useless
> and confusing.
[jira] [Updated] (CALCITE-2270) [SQL:2016] Polymorphic table functions
[ https://issues.apache.org/jira/browse/CALCITE-2270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rui Wang updated CALCITE-2270:
------------------------------
    Issue Type: New Feature  (was: Bug)

> [SQL:2016] Polymorphic table functions
> --------------------------------------
>
>                 Key: CALCITE-2270
>                 URL: https://issues.apache.org/jira/browse/CALCITE-2270
>             Project: Calcite
>          Issue Type: New Feature
>            Reporter: Chunhui Shi
>            Assignee: Rui Wang
>            Priority: Major
>
> Polymorphic table functions: table functions without predefined return type
[jira] [Commented] (CALCITE-3963) Maintains logical properties at RelSet (equivalent group) instead of RelNode
[ https://issues.apache.org/jira/browse/CALCITE-3963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17097006#comment-17097006 ]

Xiening Dai commented on CALCITE-3963:
--------------------------------------

What I mean by "maintain" is more about associating these properties with the RelSet rather than the RelNode. They can still be stored in the metadata cache somehow, which would be an implementation detail. But conceptually they should belong to the RelSet.

For example, when calculating the row count for a RelSubset, the logic today is to use the row count of subset.best, and if best is not available, we use the row count of the first rel in the set. The logic is flawed in my opinion. Essentially the row count should be consistent across the entire set, and should only change when a new logical node is added to the set, or the set gets merged. We shouldn't rely on the first rel or subset's best.

One of the clear benefits, which Haisheng already mentioned, is to save a large amount of cache memory and avoid unnecessary re-calculation. But more importantly we plug this hole in the conceptual design.

In terms of how we derive logical properties for the set, I think in a lot of cases we don't "aggregate" inputs from the nodes; more likely we choose the most convincing, or promising, node to report this stat. In the "unique keys" example you mentioned, do you have a real-world case where RelNodes within one set have different unique keys?
[jira] [Commented] (CALCITE-2270) [SQL:2016] Polymorphic table functions
[ https://issues.apache.org/jira/browse/CALCITE-2270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17096990#comment-17096990 ]

Julian Hyde commented on CALCITE-2270:
--------------------------------------

Can you add more details on how you would support polymorphic table functions? Is the return type defined using some rule based on the types of the input columns? In Calcite we have many functions where the return type is defined using code (e.g. {{SqlOperator.inferReturnType}}); it is more powerful, requires Java code rather than SQL, but is probably less effort to implement than a rule-derived return type.
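
As a rough illustration of "return type defined using code", the sketch below derives a table function's output row type from its input row type. It is a toy and does not use Calcite's {{SqlOperator}} API: {{ToyTumbleTypeInference}} and {{inferRowType}} are hypothetical names, and the two appended columns are borrowed from the TUMBLE plan shown in the CALCITE-3955 message.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy return-type inference for a table function; NOT Calcite's SqlOperator API. */
final class ToyTumbleTypeInference {
  /**
   * Derives the output row type in code: every input column, in order,
   * followed by two window columns. Maps column name to type name.
   */
  static Map<String, String> inferRowType(Map<String, String> inputRowType) {
    Map<String, String> result = new LinkedHashMap<>(inputRowType);
    result.put("window_start", "TIMESTAMP(0)");
    result.put("window_end", "TIMESTAMP(0)");
    return result;
  }
}
```

The function has no predefined return type; its output depends on whatever table it is applied to, which is the property that makes a table function polymorphic.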
[jira] [Commented] (CALCITE-3964) Support <data type> in DESCRIPTOR operator
[ https://issues.apache.org/jira/browse/CALCITE-3964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17096987#comment-17096987 ]

Julian Hyde commented on CALCITE-3964:
--------------------------------------

The above description is a bit abstract; the SQL standard folks hate including examples (because they are "non-normative") but a SQL example would help me understand.

> Support <data type> in DESCRIPTOR operator
> ------------------------------------------
>
>                 Key: CALCITE-3964
>                 URL: https://issues.apache.org/jira/browse/CALCITE-3964
>             Project: Calcite
>          Issue Type: Sub-task
>            Reporter: Rui Wang
>            Assignee: Rui Wang
>            Priority: Major
>
> DESCRIPTOR can include an optional data type, which will enable type checking
> in SQL validator. It is useful for streaming windowing: the windowing is
> required to be applied on TIMESTAMP type, and we can rely on descriptor to do
> type validation.
> *The following is copied from SQL standard 2016*:
> 8.15 <descriptor argument>
> {code:java}
> <descriptor argument> ::=
>   <descriptor value constructor>
> <descriptor value constructor> ::=
>   DESCRIPTOR <left paren> <descriptor column list> <right paren>
> <descriptor column list> ::=
>   <descriptor column specification>
>   [ { <comma> <descriptor column specification> }... ]
> <descriptor column specification> ::=
>   <column name> [ <data type> ]
> {code}
> A <descriptor argument> is the keyword DESCRIPTOR followed by a parenthesized
> list of column names; each column name may optionally have a data type. If
> every column name has a data type, then the descriptor describes a row type.
> In the examples, CSVreader and Pivot use descriptor arguments that are just
> lists of column names; ExecR is an example that uses a descriptor to pass a
> complete row type.
[jira] [Commented] (CALCITE-3963) Maintains logical properties at RelSet (equivalent group) instead of RelNode
[ https://issues.apache.org/jira/browse/CALCITE-3963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17096985#comment-17096985 ]

Haisheng Yuan commented on CALCITE-3963:
----------------------------------------

As long as all the alternatives in a RelSet share the same logical properties, we don't care where the logical properties are stored.

I am afraid the 'fold' operator will make things complicated. What about cardinality and selectivity? We may just end up choosing one blindly. It doesn't seem right to use alternative 1's cardinality info, alternative 2's selectivity info, and all the alternatives' unique keys ...

Admittedly, each alternative's stats may vary a lot. One of the reasons is that Calcite believes all the simplification should be done in VolcanoPlanner and selected based on cost, while other systems like SQL Server and Greenplum do all the simplification, like constant folding, join simplification and predicate push-down, before the logical plan goes into the MEMO.

One of the reasons to share logical properties between alternatives in a group is that it becomes possible (in the future) to decide early to stop exploring this group. If we use the 'fold' operator to decide the group's logical properties, when is a good time to decide?

Option 1: whenever there is a new alternative, recompute the logical properties. That may be no better than just storing logical properties for each RelNode.

Option 2: roll it up after all the logical alternatives are generated. But there is no logical / physical distinction; we don't know whether an operator is logical or not. Judging by convention is not perfect, because systems like Flink, Drill and Ignite define their own logical conventions. There is no distinction between logical and physical rules either; they are matched and applied at the same stage. Physical rules can even generate logical operators, like ProjectMergeRule; will these generated logical operators be counted?

Another reason to share logical properties is to avoid redundant computation. For example,

{code:java}
SELECT a,b,c,max(d) FROM foo GROUP BY a,b,c;

HashAggregate
  +-- TableScan
{code}

In a distributed system, suppose we generate HashAgg with distribution alternatives for all 8 key combinations. In SQL Server, there is only 1 physical operator HashAgg, but in Calcite there are 8 HashAgg operators: the same HashAgg with different trait sets. We will get another 8 exchange operators (in Calcite 1.22 and before, there were more than 50 exchange operators). We need to compute the logical properties for all the HashAgg and Exchange operators. Even though the results are cached in the metadata system, this is wasted effort, because the same properties were already derived for the LogicalAggregate operator.
[jira] [Commented] (CALCITE-3923) Refactor how planner rules are parameterized
[ https://issues.apache.org/jira/browse/CALCITE-3923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17096904#comment-17096904 ]

Julian Hyde commented on CALCITE-3923:
--------------------------------------

The single convention rule idea is worth pursuing. It is one possible solution to the "RelNode.copy problem". (See also CALCITE-2064.)

But unless solving it completely removes the need to copy rule instances - which I think it won't - this change will still be needed. So, can we please treat it as an orthogonal issue?
[jira] [Commented] (CALCITE-2970) Performance issue when enabling abstract converter for EnumerableConvention
[ https://issues.apache.org/jira/browse/CALCITE-2970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17096896#comment-17096896 ]

Julian Hyde commented on CALCITE-2970:
--------------------------------------

The other reason, besides handling sub-classes of {{RelBuilder}}, to use a {{UnaryOperator}} is that you can start from an existing {{RelBuilder}} that has some properties and state. Other suggestions that have been made - using a {{RelBuilderFactory}} or a {{RelBuilder.Config}} - have cases where they lose information. (E.g. the current {{RelOptCluster}}, which is in the {{RelBuilder}} but not its {{Config}}, or the current {{ViewExpander}}, which comes from the {{Context}} but not the {{Config}}.)
[jira] [Commented] (CALCITE-3963) Maintains logical properties at RelSet (equivalent group) instead of RelNode
[ https://issues.apache.org/jira/browse/CALCITE-3963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17096889#comment-17096889 ] Julian Hyde commented on CALCITE-3963: -- Minor quibble: in the JIRA subject, use the imperative form of the verb ("Maintain") rather than third-person active ("Maintains"). When you say "maintain" do you mean "store"? I'm not sure I agree. The metadata system allows us to derive a property for any {{RelNode}} (e.g. calling {{RelMetadataQuery.getUniqueKeys(RelNode rel, boolean ignoreNulls)}} on a particular {{LogicalProject}}) and it also maintains a cache, so that once derived, the value does not have to be re-computed. So, the metadata system allows us to not worry too much about whether values are stored, which is good. Now, let's suppose that you want to know the unique keys of a particular {{RelSet}} (or {{RelSubSet}} - the reasoning is similar). Unique keys are a logical property, so we should be able to derive the set of unique keys by taking the union of the unique keys of every {{RelNode}} in that set. If you add a {{RelNode}} to a set, or merge sets, then the set may acquire additional unique keys. And those keys may cause changes to unique keys (and other metadata) for any {{RelNode}} that consumes any {{RelNode}} in the set. It's complicated, so we should lean on the metadata system to maintain everything for us. I think we need to add a 'fold' operator to each type of metadata to say how the metadata of the {{RelSet}} is derived from those of the constituent nodes. In the case of {{RelMdUniqueKeys}} the fold operator is 'union'. (In SQL terms, the 'fold' operator would be called a 'roll up', that is, an aggregate function. {{RelMdMinRowCount}} rolls up using {{MAX}}. Et cetera.) As I said earlier, we should not focus on where the {{RelSet}}'s metadata is stored. Let the metadata system worry about that. Focus instead on how the metadata is derived. 
> Maintains logical properties at RelSet (equivalent group) instead of RelNode > > > Key: CALCITE-3963 > URL: https://issues.apache.org/jira/browse/CALCITE-3963 > Project: Calcite > Issue Type: Bug >Reporter: Xiening Dai >Assignee: Xiening Dai >Priority: Major > > Currently the logical properties (such as row count, distinct row count, etc.) > are maintained at the RelNode level. This creates a number of metadata > consistency problems, e.g. CALCITE-1048, CALCITE-2166. > In theory, all RelNodes in a RelSet should share the same logical properties > by definition of relational equivalence. So it makes more sense to keep > logical properties at the RelSet level, rather than the RelNode. And such > properties shouldn't change when a new subset is created or a subset's best is > changed. > Specifically, I think the built-in metadata below should fall into the logical > properties category - > Selectivity > UniqueKeys > ColumnUniqueness > RowCount > MaxRowCount > MinRowCount > DistinctRowCount > Size (averageRowSize, averageColumnSize)
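The 'fold' operator suggested in the comment above could be sketched roughly as follows. This is an illustration under stated assumptions, not Calcite code: {{FoldSketch}}, {{fold}}, {{unionKeys}} and {{key}} are hypothetical names, a unique key is modelled as a set of column ordinals, and the per-node values are assumed to have been derived already by some metadata provider.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.BinaryOperator;

/** Hypothetical sketch of a per-metadata 'fold'; none of these names are Calcite APIs. */
public class FoldSketch {
  /** Rolls up the values derived for each node of a set into one value for the set. */
  static <T> T fold(List<T> perNodeValues, BinaryOperator<T> combiner) {
    return perNodeValues.stream()
        .reduce(combiner)
        .orElseThrow(() -> new IllegalArgumentException("set has no nodes"));
  }

  /** Fold for unique keys: a key proven for any equivalent node holds for the whole set. */
  static Set<Set<Integer>> unionKeys(Set<Set<Integer>> a, Set<Set<Integer>> b) {
    Set<Set<Integer>> result = new LinkedHashSet<>(a);
    result.addAll(b);
    return result;
  }

  /** Builds one key from column ordinals. */
  static Set<Integer> key(Integer... columns) {
    return new LinkedHashSet<>(Arrays.asList(columns));
  }

  public static void main(String[] args) {
    // Unique keys derived for two equivalent nodes in the same set.
    Set<Set<Integer>> node1 = new LinkedHashSet<>();
    node1.add(key(0));
    Set<Set<Integer>> node2 = new LinkedHashSet<>();
    node2.add(key(0));
    node2.add(key(1, 2));

    List<Set<Set<Integer>>> perNodeKeys = new ArrayList<>();
    perNodeKeys.add(node1);
    perNodeKeys.add(node2);

    // Unique keys fold with union.
    System.out.println(fold(perNodeKeys, FoldSketch::unionKeys));

    // Min row count folds with MAX: each node gives a lower bound,
    // and the largest lower bound is the tightest one for the set.
    System.out.println(fold(Arrays.asList(0.0, 1.0, 0.0), Double::max));
  }
}
```

The design point is that each kind of metadata only has to name its combiner; where the folded value is cached is left to the metadata system, as the comment recommends.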
[jira] [Commented] (CALCITE-3955) Remove the first operand of RexCall from SqlWindowTableFunction
[ https://issues.apache.org/jira/browse/CALCITE-3955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17096268#comment-17096268 ] Danny Chen commented on CALCITE-3955: - After some attempts I have no idea yet. We may need a way to identify that the function is a window function when doing the validation rewrite. [~julianhyde], any ideas? > Remove the first operand of RexCall from SqlWindowTableFunction > --- > > Key: CALCITE-3955 > URL: https://issues.apache.org/jira/browse/CALCITE-3955 > Project: Calcite > Issue Type: Bug > Components: core >Affects Versions: 1.22.0 >Reporter: Danny Chen >Assignee: Danny Chen >Priority: Major > Fix For: 1.23.0 > > Time Spent: 20m > Remaining Estimate: 0h > > In CALCITE-3382, we introduced the TUMBLE window function to replace the > deprecated group tumble window. > But for query > {code:sql} > select * > from table(tumble(table Shipments, descriptor(rowtime), INTERVAL '1' MINUTE)) > {code} > the output plan is > {code:xml} > LogicalProject(ORDERID=[$0], ROWTIME=[$1], window_start=[$2], window_end=[$3]) > LogicalTableFunctionScan(invocation=[TUMBLE($1, DESCRIPTOR($1), > 6:INTERVAL MINUTE)], rowType=[RecordType(INTEGER ORDERID, TIMESTAMP(0) > ROWTIME, TIMESTAMP(0) window_start, TIMESTAMP(0) window_end)]) > LogicalProject(ORDERID=[$0], ROWTIME=[$1]) > LogicalTableScan(table=[[CATALOG, SALES, SHIPMENTS]]) > {code} > The first operand of the TUMBLE rex call is always the last input field, but > actually it represents the source table which is the input rel node. > This issue removes the first operand from the RexCall because it is useless > and confusing.
[jira] [Commented] (CALCITE-3955) Remove the first operand of RexCall from SqlWindowTableFunction
[ https://issues.apache.org/jira/browse/CALCITE-3955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17096189#comment-17096189 ] Rui Wang commented on CALCITE-3955: --- Ok. I have spent some time investigating how to wrap the first argument into a CURSOR. The main problem I have seen so far is *WHERE* to wrap it as a CURSOR. If we do it in "StandardConvertletTable", it seems too late based on how CURSOR/subquery is registered in Calcite. The best place might be "SqlValidatorImpl.performUnconditionalRewrites", where we can wrap "table t" to CURSOR. However, it is possible that a query could use "FROM TUMBLE((SELECT ...) ...)", in which case the subquery won't fall into the "table t" rewrite path. I don't have a good idea so far for implementing this CURSOR fix. [~danny0405] I would love to hear your opinion when you get a chance to look at this problem. > Remove the first operand of RexCall from SqlWindowTableFunction > --- > > Key: CALCITE-3955 > URL: https://issues.apache.org/jira/browse/CALCITE-3955 > Project: Calcite > Issue Type: Bug > Components: core >Affects Versions: 1.22.0 >Reporter: Danny Chen >Assignee: Danny Chen >Priority: Major > Fix For: 1.23.0 > > Time Spent: 20m > Remaining Estimate: 0h > > In CALCITE-3382, we introduced the TUMBLE window function to replace the > deprecated group tumble window. 
> But for query > {code:sql} > select * > from table(tumble(table Shipments, descriptor(rowtime), INTERVAL '1' MINUTE)) > {code} > the output plan is > {code:xml} > LogicalProject(ORDERID=[$0], ROWTIME=[$1], window_start=[$2], window_end=[$3]) > LogicalTableFunctionScan(invocation=[TUMBLE($1, DESCRIPTOR($1), > 6:INTERVAL MINUTE)], rowType=[RecordType(INTEGER ORDERID, TIMESTAMP(0) > ROWTIME, TIMESTAMP(0) window_start, TIMESTAMP(0) window_end)]) > LogicalProject(ORDERID=[$0], ROWTIME=[$1]) > LogicalTableScan(table=[[CATALOG, SALES, SHIPMENTS]]) > {code} > The first operand of the TUMBLE rex call is always the last input field, but > actually it represents the source table which is the input rel node. > This issue removes the first operand from the RexCall because it is useless > and confusing.
[jira] [Resolved] (CALCITE-3962) Make JSON_VALUE operands varadic
[ https://issues.apache.org/jira/browse/CALCITE-3962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen resolved CALCITE-3962. - Resolution: Fixed Fixed in [f1c0756|https://github.com/apache/calcite/commit/f1c0756f33b79904ca3a429bbff79ebf0103ece9] ! > Make JSON_VALUE operands varadic > > > Key: CALCITE-3962 > URL: https://issues.apache.org/jira/browse/CALCITE-3962 > Project: Calcite > Issue Type: Sub-task > Components: core >Affects Versions: 1.21.0, 1.22.0 >Reporter: Danny Chen >Assignee: Danny Chen >Priority: Major > Fix For: 1.23.0 > > Time Spent: 20m > Remaining Estimate: 0h