Re: Columns used in query

2020-06-03 Thread Haisheng Yuan
Hi Gaurav,

You may find this useful:
https://lists.apache.org/thread.html/rc38d12de860f64e8e2926b7bcf6f2e9e6bcbfbc7ca5e886a71323b47%40%3Cdev.calcite.apache.org%3E

On 2020/06/04 01:41:41, Gaurav Sehgal  wrote: 
> Hi,
> 
> I have a use case where I want to find all the columns, and their respective
> tables, accessed in a SQL query. I'm able to find all the table names by
> converting the SQL query to a RelNode and then visiting the TableScan nodes,
> but I'm not sure of the right approach to find all the column names.
> 
> Can someone help me with this?
> 
> 
> Thanks
> Gaurav Sehgal
> 
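Calcite's metadata API can answer this directly (`RelMetadataQuery.getColumnOrigins` resolves an output column back to its source table and ordinal), but the underlying bookkeeping is simple enough to sketch without Calcite on the classpath. The point worth illustrating is that over a Join, input-ref ordinals index the *concatenation* of the left and right row types, so resolving them back to table columns needs an offset adjustment. All names below are invented for illustration, not Calcite classes:

```java
import java.util.*;

public class ColumnResolver {
    // Resolve referenced field ordinals of a two-table join's combined row
    // type back to "table.column" names. In Calcite, a RexInputRef over a
    // Join indexes the concatenation of the left and right row types.
    static List<String> resolve(List<String> leftFields, String leftTable,
                                List<String> rightFields, String rightTable,
                                Collection<Integer> refs) {
        List<String> out = new ArrayList<>();
        for (int i : refs) {
            if (i < leftFields.size()) {
                out.add(leftTable + "." + leftFields.get(i));
            } else {
                out.add(rightTable + "." + rightFields.get(i - leftFields.size()));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // deps(depId, name) joined to emps(empId, depId); ordinals {1, 3}
        // resolve to deps.name and emps.depId.
        System.out.println(resolve(
            List.of("depId", "name"), "deps",
            List.of("empId", "depId"), "emps",
            List.of(1, 3)));
    }
}
```

In a real adapter the referenced ordinals would be collected by visiting RexInputRef nodes (e.g. with a RexShuttle) rather than supplied by hand.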


Calcite-Master - Build # 1773 - Failure

2020-06-03 Thread Apache Jenkins Server
The Apache Jenkins build system has built Calcite-Master (build #1773)

Status: Failure

Check console output at https://builds.apache.org/job/Calcite-Master/1773/ to 
view the results.

Re: Avatica results mapping

2020-06-03 Thread Julian Hyde
This problem keeps on coming up. Avatica doesn't handle extended data
types very well. I think we hard-code support for everything in
java.sql.Types but then we hit the wall when things are "OTHER".

There are also types that are widely used that are not in
java.sql.Types - e.g. GEOMETRY. They may or may not have an accepted
integer type id.

I would love to find a robust solution. Especially to be able to add a
new type on the server without modifying the wire protocol or the
Avatica client.

We should make more use of JDBC type maps, e.g.
"ResultSet.getObject(int, Map)" [1]

Julian

[1] 
https://docs.oracle.com/javase/8/docs/api/java/sql/ResultSet.html#getObject-int-java.util.Map-

On Wed, Jun 3, 2020 at 10:44 AM Jon Pither  wrote:
>
> Hi,
>
> I have a data-type I'm mapping as SqlTypeName/OTHER - in this case
> java.util.UUID. When I return this field from an enumerator via an adapter
> it works OK - the UUID is passed through into the JDBC result set.
>
> Over the wire, though, Avatica throws an error: "Remote driver error:
> RuntimeException: Unhandled type in Frame: class java.util.UUID".
>
> I'm contemplating what to do here - is it possible to convert the UUID to a
> string only 'on the wire' for Avatica, but pass the UUID through for
> straight-through Calcite?
>
> Regards,
>
> Jon.

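One workable pattern for the split Jon describes is to serialize the UUID to its canonical string form only when the value must cross the Avatica wire, and parse it back on the receiving side; the round trip is lossless. A minimal sketch in plain Java (these helper names are illustrative, not Avatica's actual serialization hooks):

```java
import java.util.UUID;

public class UuidWire {
    // Serialize a UUID to its canonical 36-character string for transport.
    static String toWire(UUID id) {
        return id.toString();
    }

    // Reconstruct the UUID on the receiving side.
    static UUID fromWire(String s) {
        return UUID.fromString(s);
    }

    public static void main(String[] args) {
        UUID original = UUID.fromString("123e4567-e89b-12d3-a456-426614174000");
        UUID roundTripped = fromWire(toWire(original));
        System.out.println(original.equals(roundTripped)); // true
    }
}
```

In an adapter, the conversion could live in the enumerator used only by the Avatica server path, while in-process Calcite keeps the raw UUID object.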



[jira] [Created] (CALCITE-4040) Nonstandard SqlAggFunction rollup NPE Exception

2020-06-03 Thread xzh_dz (Jira)
xzh_dz created CALCITE-4040:
---

 Summary: Nonstandard SqlAggFunction rollup NPE Exception
 Key: CALCITE-4040
 URL: https://issues.apache.org/jira/browse/CALCITE-4040
 Project: Calcite
  Issue Type: Wish
Reporter: xzh_dz


When I try to roll up some SqlAggFunctions in my project, something goes wrong.
A case can be reproduced as below:
MaterializationTest:
{code:java}
@Test public void testSqlAggFunctionRollup() {
  checkNoMaterialize(
      "select \"empid\", stddev_pop(\"deptno\") from \"emps\" group by \"empid\", \"deptno\"",
      "select \"empid\", stddev_pop(\"deptno\") from \"emps\" group by \"empid\"",
      HR_FKUK_MODEL);
}
{code}
If the SQL cannot be matched to the materialization, it should not throw an exception.
Exception:
{code:java}
java.sql.SQLException: Error while executing SQL "explain plan for select "empid", stddev_pop("deptno") from "emps" group by "empid"": null

at org.apache.calcite.avatica.Helper.createException(Helper.java:56)
at org.apache.calcite.avatica.Helper.createException(Helper.java:41)
at org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:163)
at org.apache.calcite.avatica.AvaticaStatement.executeQuery(AvaticaStatement.java:227)
at org.apache.calcite.test.CalciteAssert.assertQuery(CalciteAssert.java:528)
Caused by: java.lang.NullPointerException
at java.util.Objects.requireNonNull(Objects.java:203)
at org.apache.calcite.rel.core.AggregateCall.<init>(AggregateCall.java:98)
at org.apache.calcite.rel.core.AggregateCall.create(AggregateCall.java:198)
at org.apache.calcite.plan.SubstitutionVisitor.unifyAggregates(SubstitutionVisitor.java:1854)
at org.apache.calcite.plan.SubstitutionVisitor$AggregateToAggregateUnifyRule.apply(SubstitutionVisitor.java:1545)
at org.apache.calcite.plan.SubstitutionVisitor.go(SubstitutionVisitor.java:544)
at org.apache.calcite.plan.SubstitutionVisitor.go(SubstitutionVisitor.java:478)
{code}
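The `requireNonNull` in the trace suggests that `SubstitutionVisitor.unifyAggregates` passed a null aggregate function to `AggregateCall.create` while trying to roll up STDDEV_POP, which (unlike SUM or MAX) has no single-function rollup. A toy illustration of the failure mode and a guarded alternative; this is not Calcite's actual code, and the table contents are invented:

```java
import java.util.*;

public class RollupLookup {
    // Aggregates whose partial results can be combined by another aggregate.
    // STDDEV_POP is deliberately absent: it has no single-function rollup.
    static final Map<String, String> ROLLUP = Map.of(
        "SUM", "SUM", "COUNT", "SUM", "MIN", "MIN", "MAX", "MAX");

    // Unguarded lookup: throws NullPointerException for STDDEV_POP,
    // mirroring the requireNonNull in the stack trace above.
    static String rollupOrThrow(String agg) {
        return Objects.requireNonNull(ROLLUP.get(agg));
    }

    // Guarded lookup: signal "cannot roll up" instead of crashing, so the
    // planner can simply reject the materialization match.
    static Optional<String> rollup(String agg) {
        return Optional.ofNullable(ROLLUP.get(agg));
    }

    public static void main(String[] args) {
        System.out.println(rollup("SUM"));        // Optional[SUM]
        System.out.println(rollup("STDDEV_POP")); // Optional.empty
    }
}
```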




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Using indexes rather than table scans with Calcite

2020-06-03 Thread Roman Kondakov
Hi Haisheng,

> If I didn't get it wrong, joining with scan_A means it has to read all the 
> tuples from table A, this will make index meaningless, because the purpose of 
> using index in that example is to avoid reading all the tuples from table A.

I meant 'rowId' as a physical address (ctid in Postgres or rowid in Oracle).
We can combine those rowIds in some way (by analogy with a bitmap index)
and then use the output rowIds to fetch only those rows which we actually
need. Logically it is equivalent to a join and can be implemented as a
nested loop join: we can iterate over the resulting rowIds and fetch the
corresponding rows from the row store.
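The scheme above can be sketched with plain collections: each index lookup yields a set of rowIds, the sets are combined with set operations (the bitmap-AND analogue), and only the surviving rowIds are fetched from the row store, so the full table is never scanned. Illustrative Java with invented data, not an actual Calcite operator:

```java
import java.util.*;

public class RowIdFetch {
    // Combine two index results (AND) and fetch matching rows by rowId.
    static List<String> fetchIntersection(Map<Integer, String> rowStore,
                                          Set<Integer> idxA, Set<Integer> idxB) {
        Set<Integer> matching = new TreeSet<>(idxA);
        matching.retainAll(idxB);          // bitmap-AND analogue
        List<String> rows = new ArrayList<>();
        for (int rowId : matching) {
            rows.add(rowStore.get(rowId)); // fetch by physical address
        }
        return rows;
    }

    public static void main(String[] args) {
        // Row store: rowId (physical address) -> row.
        Map<Integer, String> rowStore = Map.of(
            1, "a=1,b=10", 2, "a=1,b=20", 3, "a=2,b=10", 4, "a=3,b=30");
        // idx_a matches a = 1 (rows 1, 2); idx_b matches b = 10 (rows 1, 3).
        System.out.println(fetchIntersection(rowStore, Set.of(1, 2), Set.of(1, 3)));
        // prints [a=1,b=10]
    }
}
```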

> Perhaps you have different definition of Join, but the Join operator in 
> Calcite doesn't have correlation variable, only Correlate operator has, which 
> is Calcite's version of Apply. You still need the rule to transform Join to 
> Correlate, aka, Apply. In Calcite, the rule is JoinToCorrelateRule. However I 
> don't recommend using JoinToCorrelateRule, because it will transform Join to 
> Correlate unconditionally, in case of multi-way joins with join reordering 
> enabled, it will generate a lot of useless alternatives, unless you want to 
> support multi-level correlation, like:

We are going to use Correlate; I just skipped this detail in the
previous mail. You are right that it can inflate the search space. Maybe
we'll need to think of some pruning techniques later if it becomes a
problem. As I see it, this approach has some advantages because it
seems more flexible to me. In addition to the ability you mentioned to
use a multi-level correlated join, it provides some freedom
because we are not restricted to the particular pattern of the
Join2IndexApply rule. In other words, we would need to match the
Join2IndexApply rule on some concrete pattern like

Join
  RelNode
  TableScan

But what if there is a Filter or Project between the join and the scan? We
can add specific patterns for this rule:

Join
  RelNode
  Project
TableScan

Join
  RelNode
  Filter
TableScan

but still it's not enough to cover all possible cases. What if there are
some other RelNodes, like an Aggregate, in between? We cannot foresee all
possible cases. With the Correlate approach we just try to push the filter
with the correlation variable down as far as possible. If this filter
manages to reach the IndexScan, it becomes a very efficient index join
with low cost; otherwise the optimizer throws this alternative away. It
can be helpful in queries like

SELECT * FROM
deps d
JOIN
(SELECT depId, COUNT(*) FROM emps GROUP BY depId) AS e
ON d.depId = e.depId
WHERE d.name = "AAA"

if table deps is relatively small (after filtering) and emps is big, it
will be more efficient to do a correlated nested loop join:

CorrelatedJoin(d.depId = e.depId)
  Filter(d.name = "AAA")
TableScan(deps)
  Agg(depId, COUNT(*))
    IndexScan(emps, lookup: e.depId = d.depId[correlatedVar])

because in this case we don't have to do a full aggregation over the
'emps' table before doing a join:

HashJoin(d.depId = e.depId)
  Filter(d.name = "AAA")
TableScan(deps)
  Agg(depId, COUNT(*)) <- full aggregation
    TableScan(emps)
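The cost difference between the two plans can be made concrete: the correlated plan aggregates only the emps groups for depIds surviving the filter, while the hash-join plan aggregates every group first. A toy Java model with invented data (the helper is not a Calcite operator):

```java
import java.util.*;

public class CorrelatedAgg {
    // Correlated nested loop: for each dept surviving the filter, probe an
    // index on emps.depId and count only the matching rows. Groups for
    // filtered-out depts are never aggregated.
    static Map<Integer, Integer> correlatedCounts(
            Map<Integer, String> deps, String nameFilter,
            Map<Integer, List<Integer>> empsByDep) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (Map.Entry<Integer, String> d : deps.entrySet()) {
            if (!d.getValue().equals(nameFilter)) {
                continue;                                      // Filter(d.name = ...)
            }
            List<Integer> matched =
                empsByDep.getOrDefault(d.getKey(), List.of()); // IndexScan lookup
            counts.put(d.getKey(), matched.size());            // COUNT(*) per depId
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<Integer, String> deps = Map.of(1, "AAA", 2, "BBB", 3, "CCC");
        Map<Integer, List<Integer>> empsByDep = Map.of(
            1, List.of(10, 11), 2, List.of(12, 13, 14), 3, List.of(15));
        // Only depId 1 survives the filter, so only 2 of the 6 emps rows are
        // touched; the hash-join plan would aggregate all 6 first.
        System.out.println(correlatedCounts(deps, "AAA", empsByDep)); // {1=2}
    }
}
```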

So, the tradeoff between the rule-based approach and the MV approach is the
classic one: search space inflation vs. more plan alternatives.

Thank you!

-- 
Kind Regards
Roman Kondakov


On 03.06.2020 05:42, Haisheng Yuan wrote:
>> using this approach, bitmap indexes may be represented as set operations
>> (union, intersect) over idx_a and idx_b followed by joining with scan_A
>> using 'rowId'.
> 
> If I didn't get it wrong, joining with scan_A means it has to read all the 
> tuples from table A, this will make index meaningless, because the purpose of 
> using index in that example is to avoid reading all the tuples from table A.
> 
>> and it looks like it can be implemented without special
>> Join2IndexApplyRule: we'll use correlation variable from join condition
>> as filter condition.
> 
> Perhaps you have different definition of Join, but the Join operator in 
> Calcite doesn't have correlation variable, only Correlate operator has, which 
> is Calcite's version of Apply. You still need the rule to transform Join to 
> Correlate, aka, Apply. In Calcite, the rule is JoinToCorrelateRule. However I 
> don't recommend using JoinToCorrelateRule, because it will transform Join to 
> Correlate unconditionally, in case of multi-way joins with join reordering 
> enabled, it will generate a lot of useless alternatives, unless you want to 
> support multi-level correlation, like:
> 
> NestedLoopOuterJoin
> -> Table Scan on A
> ->  HashOuterJoin
>  Join Condition: B.Y = C.W
> -> Table Scan on B
> -> Index Scan using C_Z_IDX on C
>  Index Condition: C.Z = A.X
> 
> or
> 
> NestedLoopOuterJoin
> -> Table Scan on SmallTable1 A
> -> NestedLoopOuterJoin
> -> Table Scan on SmallTable2 B
> -> Index Scan using XYIndex on LargeTable C
>