Re: Columns used in query
Hi Gaurav,

You may find this useful:
https://lists.apache.org/thread.html/rc38d12de860f64e8e2926b7bcf6f2e9e6bcbfbc7ca5e886a71323b47%40%3Cdev.calcite.apache.org%3E

On 2020/06/04 01:41:41, Gaurav Sehgal wrote:
> Hi,
>
> I have a use case where I want to find all the columns, and their respective
> tables, accessed in a SQL query. I'm able to find all the table names by
> converting the SQL query to a RelNode and then visiting the TableScan node,
> but I'm not sure what the right approach would be to find all the column names.
>
> Can someone help me with this?
>
> Thanks
> Gaurav Sehgal
Columns used in query
Hi,

I have a use case where I want to find all the columns, and their respective tables, accessed in a SQL query. I'm able to find all the table names by converting the SQL query to a RelNode and then visiting the TableScan node, but I'm not sure what the right approach would be to find all the column names.

Can someone help me with this?

Thanks
Gaurav Sehgal
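[Editor's note: one possible approach, sketched below under the assumption that Calcite is on the classpath; `printColumnOrigins` is a hypothetical helper name, but `RelMetadataQuery.getColumnOrigins` and `RelColumnOrigin` are real Calcite APIs. Note this reports the origins of the query's output columns only, not columns referenced solely in filters.]

```java
import java.util.Set;
import org.apache.calcite.plan.RelOptTable;
import org.apache.calcite.rel.RelNode;
import org.apache.calcite.rel.metadata.RelColumnOrigin;
import org.apache.calcite.rel.metadata.RelMetadataQuery;

public final class ColumnOriginsDemo {
  // For each output column of the root RelNode, ask the metadata query
  // which base-table column(s) it was derived from.
  static void printColumnOrigins(RelNode rel) {
    RelMetadataQuery mq = rel.getCluster().getMetadataQuery();
    int fieldCount = rel.getRowType().getFieldCount();
    for (int i = 0; i < fieldCount; i++) {
      Set<RelColumnOrigin> origins = mq.getColumnOrigins(rel, i);
      if (origins == null) {
        continue; // origin could not be determined for this column
      }
      for (RelColumnOrigin origin : origins) {
        RelOptTable table = origin.getOriginTable();
        String column = table.getRowType().getFieldNames()
            .get(origin.getOriginColumnOrdinal());
        System.out.println(table.getQualifiedName() + "." + column);
      }
    }
  }
}
```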
Calcite-Master - Build # 1773 - Failure
The Apache Jenkins build system has built Calcite-Master (build #1773)

Status: Failure

Check console output at https://builds.apache.org/job/Calcite-Master/1773/ to view the results.
Re: Avatica results mapping
This problem keeps on coming up. Avatica doesn't handle extended data types very well. I think we hard-code support for everything in java.sql.Types but then we hit the wall when things are "OTHER". There are also types that are widely used that are not in java.sql.Types - e.g. GEOMETRY. They may or may not have an accepted integer type id.

I would love to find a robust solution, especially to be able to add a new type on the server without modifying the wire protocol or the Avatica client. We should make more use of JDBC type maps, e.g. "ResultSet.getObject(int, Map)" [1]

Julian

[1] https://docs.oracle.com/javase/8/docs/api/java/sql/ResultSet.html#getObject-int-java.util.Map-

On Wed, Jun 3, 2020 at 10:44 AM Jon Pither wrote:
>
> Hi,
>
> I have a data type I'm mapping as SqlTypeName/OTHER - in this case
> java.util.UUID. When I return this field from an enumerator via an adapter
> it works OK - the UUID is passed through into the JDBC result set.
>
> Over the wire, though, Avatica throws an error: "Remote driver error:
> RuntimeException: Unhandled type in Frame: class java.util.UUID".
>
> I'm contemplating what to do here - is it possible to convert the UUID to a
> string in the 'on the wire' case for Avatica only, but pass the UUID
> straight through for in-process Calcite?
>
> Regards,
>
> Jon.
Avatica results mapping
Hi,

I have a data type I'm mapping as SqlTypeName/OTHER - in this case java.util.UUID. When I return this field from an enumerator via an adapter it works OK - the UUID is passed through into the JDBC result set.

Over the wire, though, Avatica throws an error: "Remote driver error: RuntimeException: Unhandled type in Frame: class java.util.UUID".

I'm contemplating what to do here - is it possible to convert the UUID to a string in the 'on the wire' case for Avatica only, but pass the UUID straight through for in-process Calcite?

Regards,

Jon.
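[Editor's note: one pragmatic workaround, independent of Avatica internals, is to transport the UUID as its canonical string and reconstruct it on the client side; the round trip through java.util.UUID is lossless. The `toWire`/`fromWire` names below are hypothetical, not Avatica APIs - a minimal sketch only:]

```java
import java.util.UUID;

public final class UuidWireDemo {
  // Serialize: what a server-side adapter could emit instead of a raw UUID,
  // so Avatica only ever sees a String on the wire.
  static String toWire(UUID id) {
    return id.toString();
  }

  // Deserialize: what the client (or a JDBC type-map handler) could do
  // on receipt to recover the original value.
  static UUID fromWire(String s) {
    return UUID.fromString(s);
  }

  public static void main(String[] args) {
    UUID original = UUID.fromString("123e4567-e89b-12d3-a456-426614174000");
    String onTheWire = toWire(original);
    UUID roundTripped = fromWire(onTheWire);
    System.out.println(original.equals(roundTripped)); // prints "true"
  }
}
```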
[jira] [Created] (CALCITE-4040) Nonstandard SqlAggFunction rollup NPE Exception
xzh_dz created CALCITE-4040:
-------------------------------

             Summary: Nonstandard SqlAggFunction rollup NPE Exception
                 Key: CALCITE-4040
                 URL: https://issues.apache.org/jira/browse/CALCITE-4040
             Project: Calcite
          Issue Type: Wish
            Reporter: xzh_dz

When I try to roll up some SqlAggFunctions in my project, I find something wrong. A case can be reproduced as below, in MaterializationTest:

{code:java}
@Test public void testSqlAggFunctionRollup() {
  checkNoMaterialize(
      "select \"empid\", stddev_pop(\"deptno\") from \"emps\" group by \"empid\", \"deptno\"",
      "select \"empid\", stddev_pop(\"deptno\") from \"emps\" group by \"empid\"",
      HR_FKUK_MODEL);
}
{code}

If the SQL is not materialized, it should not throw an exception. Exception:

{code:java}
java.sql.SQLException: Error while executing SQL "explain plan for select "empid", stddev_pop("deptno") from "emps" group by "empid"": null
	at org.apache.calcite.avatica.Helper.createException(Helper.java:56)
	at org.apache.calcite.avatica.Helper.createException(Helper.java:41)
	at org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:163)
	at org.apache.calcite.avatica.AvaticaStatement.executeQuery(AvaticaStatement.java:227)
	at org.apache.calcite.test.CalciteAssert.assertQuery(CalciteAssert.java:528)
Caused by: java.lang.NullPointerException
	at java.util.Objects.requireNonNull(Objects.java:203)
	at org.apache.calcite.rel.core.AggregateCall.<init>(AggregateCall.java:98)
	at org.apache.calcite.rel.core.AggregateCall.create(AggregateCall.java:198)
	at org.apache.calcite.plan.SubstitutionVisitor.unifyAggregates(SubstitutionVisitor.java:1854)
	at org.apache.calcite.plan.SubstitutionVisitor$AggregateToAggregateUnifyRule.apply(SubstitutionVisitor.java:1545)
	at org.apache.calcite.plan.SubstitutionVisitor.go(SubstitutionVisitor.java:544)
	at org.apache.calcite.plan.SubstitutionVisitor.go(SubstitutionVisitor.java:478)
{code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
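[Editor's note: background on why STDDEV_POP is awkward here. Unlike SUM or COUNT, re-applying STDDEV_POP to per-group results does not yield the overall value; a correct rollup needs auxiliary aggregates (count, sum, sum of squares). The standalone illustration below is not from the report - names and data are invented:]

```java
public final class RollupDemo {
  static double sum(double[] xs) {
    double s = 0;
    for (double x : xs) s += x;
    return s;
  }

  // Population standard deviation: sqrt(mean of squared deviations).
  static double stddevPop(double[] xs) {
    double mean = sum(xs) / xs.length;
    double ss = 0;
    for (double x : xs) ss += (x - mean) * (x - mean);
    return Math.sqrt(ss / xs.length);
  }

  public static void main(String[] args) {
    double[] g1 = {1, 3};          // group 1
    double[] g2 = {10, 14};        // group 2
    double[] all = {1, 3, 10, 14}; // both groups combined

    // SUM rolls up exactly: the sum of group sums is the total sum.
    System.out.println(sum(all) == sum(g1) + sum(g2)); // prints "true"

    // STDDEV_POP does not: aggregating per-group stddevs is wrong.
    double naive = stddevPop(new double[] {stddevPop(g1), stddevPop(g2)});
    System.out.println(naive == stddevPop(all)); // prints "false"
  }
}
```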
Re: Using indexes rather than table scans with Calcite
Hi Haisheng,

> If I didn't get it wrong, joining with scan_A means it has to read all the
> tuples from table A, this will make index meaningless, because the purpose of
> using index in that example is to avoid reading all the tuples from table A.

I meant 'rowId' as a physical address (ctid in PG or rowId in Oracle). We can combine those rowIds in some way (by analogy with a bitmap index) and then use the output rowIds to fetch only those rows which we actually need. Logically it is equivalent to a join and can be implemented as a nested loop join: we can iterate over the resulting rowIds and fetch the corresponding rows from the row store.

> Perhaps you have different definition of Join, but the Join operator in
> Calcite doesn't have correlation variable, only Correlate operator has, which
> is Calcite's version of Apply. You still need the rule to transform Join to
> Correlate, aka, Apply. In Calcite, the rule is JoinToCorrelateRule. However I
> don't recommend using JoinToCorrelateRule, because it will transform Join to
> Correlate unconditionally, in case of multi-way joins with join reordering
> enabled, it will generate a lot of useless alternatives, unless you want to
> support multi-level correlation, like:

We are going to use Correlate; I just skipped this detail in the previous mail. You are right that it can inflate the search space. Maybe we'll need to think of some pruning techniques later if it becomes a problem.

As I see it, this approach has some advantages because it seems more flexible to me. In addition to the ability you mentioned to use a multi-level correlated join, it provides some kind of freedom because we are not restricted to the particular pattern of a Join2IndexApply rule. In other words, we would need to match a Join2IndexApply rule on some concrete pattern like

Join
  RelNode
  TableScan

But what if there is a filter or project between the join and the scan?
We can add specific patterns for this rule:

Join
  RelNode
  Project
    TableScan

Join
  RelNode
  Filter
    TableScan

but still it's not enough to cover all possible cases. What if there are some other RelNodes, like an aggregate, there? We cannot foresee all possible cases. With the correlation approach we just try to push down the filter with the correlation variable as far as possible. If this filter manages to reach the IndexScan, it becomes a very efficient index join with low cost. Otherwise the optimizer throws this alternative away. It can be helpful in queries like

SELECT *
FROM deps d
JOIN (SELECT depId, COUNT(*) FROM emps GROUP BY depId) AS e
  ON d.depId = e.depId
WHERE d.name = "AAA"

If table deps is relatively small (after filtering) and emps is big, it will be more efficient to do a correlated nested loop join:

CorrelatedJoin(d.depId = e.depId)
  Filter(d.name = "AAA")
    TableScan(deps)
  Agg(depId, COUNT(*))
    IndexScan(emps, lookup: e.depId = d.depId[correlatedVar])

because in this case we don't have to do a full aggregation over the 'emps' table before doing the join:

HashJoin(d.depId = e.depId)
  Filter(d.name = "AAA")
    TableScan(deps)
  Agg(depId, COUNT(*))  <- full aggregation
    TableScan(emps)

So, the tradeoff between the rule-based approach and the MV approach is the classic one: search space inflation vs more plan alternatives.

Thank you!

--
Kind Regards
Roman Kondakov

On 03.06.2020 05:42, Haisheng Yuan wrote:
>> using this approach, bitmap indexes may be represented as set operations
>> (union, intersect) over idx_a and idx_b followed by joining with scan_A
>> using 'rowId'.
>
> If I didn't get it wrong, joining with scan_A means it has to read all the
> tuples from table A, this will make index meaningless, because the purpose of
> using index in that example is to avoid reading all the tuples from table A.
>
>> and it looks like it can be implemented without special
>> Join2IndexApplyRule: we'll use correlation variable from join condition
>> as filter condition.
>
> Perhaps you have different definition of Join, but the Join operator in
> Calcite doesn't have correlation variable, only Correlate operator has, which
> is Calcite's version of Apply. You still need the rule to transform Join to
> Correlate, aka, Apply. In Calcite, the rule is JoinToCorrelateRule. However I
> don't recommend using JoinToCorrelateRule, because it will transform Join to
> Correlate unconditionally, in case of multi-way joins with join reordering
> enabled, it will generate a lot of useless alternatives, unless you want to
> support multi-level correlation, like:
>
> NestedLoopOuterJoin
>   -> Table Scan on A
>   -> HashOuterJoin
>        Join Condition: B.Y = C.W
>        -> Table Scan on B
>        -> Index Scan using C_Z_IDX on C
>             Index Condition: C.Z = A.X
>
> or
>
> NestedLoopOuterJoin
>   -> Table Scan on SmallTable1 A
>   -> NestedLoopOuterJoin
>        -> Table Scan on SmallTable2 B
>        -> Index Scan using XYIndex on LargeTable C
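[Editor's note: the rowId-combination idea discussed earlier in this thread (combine the rowIds produced by several index lookups, then fetch only the surviving rows) can be made concrete with a toy, self-contained sketch. All names and data below are hypothetical - this is plain Java, not a Calcite API:]

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public final class RowIdDemo {
  // Intersect two rowId sets, as a bitmap index would AND its bitmaps.
  static Set<Integer> intersect(Set<Integer> a, Set<Integer> b) {
    Set<Integer> result = new LinkedHashSet<>(a);
    result.retainAll(b);
    return result;
  }

  public static void main(String[] args) {
    // Row store: rowId -> row, for a hypothetical table A.
    List<String> rowStore = Arrays.asList("r0", "r1", "r2", "r3", "r4");

    // idx_a matched rowIds {0, 2, 3}; idx_b matched rowIds {2, 3, 4}.
    Set<Integer> idxA = new LinkedHashSet<>(Arrays.asList(0, 2, 3));
    Set<Integer> idxB = new LinkedHashSet<>(Arrays.asList(2, 3, 4));

    // Only rows 2 and 3 are fetched; table A is never fully scanned.
    for (int rowId : intersect(idxA, idxB)) {
      System.out.println(rowStore.get(rowId)); // prints "r2" then "r3"
    }
  }
}
```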