[jira] [Created] (CALCITE-1825) Rule to split a Project into expressions that can and cannot be pushed into Druid

2017-06-01 Thread Julian Hyde (JIRA)
Julian Hyde created CALCITE-1825:


 Summary: Rule to split a Project into expressions that can and 
cannot be pushed into Druid
 Key: CALCITE-1825
 URL: https://issues.apache.org/jira/browse/CALCITE-1825
 Project: Calcite
  Issue Type: Bug
  Components: druid
Reporter: Julian Hyde
Assignee: Julian Hyde


Create a rule that can split the expressions in a Project (and optionally also 
a Filter) into pieces that can be pushed down to Druid and pieces that cannot. 

The class CalcRelSplitter can split the expressions in a Project and 
Filter according to specified criteria. An existing rule, ProjectToWindowRule, 
uses a sub-class of CalcRelSplitter. This rule could be built along 
similar lines.
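
As a rough illustration of the splitting idea only, the sketch below partitions whole expressions by a caller-supplied predicate. It does not use Calcite's CalcRelSplitter API; the class {{ProjectSplitSketch}}, its {{split}} method and the {{MY_UDF}} function are hypothetical names, and a real rule would presumably also have to split within an expression tree rather than only between top-level expressions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical sketch; not Calcite's CalcRelSplitter API. */
final class ProjectSplitSketch {
  /** Result of a split: expressions Druid can evaluate, and the rest. */
  static final class Split {
    final List<String> pushable = new ArrayList<>();
    final List<String> remaining = new ArrayList<>();
  }

  /** Partitions expressions, preserving their original order within each part. */
  static Split split(List<String> expressions, Predicate<String> canPush) {
    Split result = new Split();
    for (String e : expressions) {
      (canPush.test(e) ? result.pushable : result.remaining).add(e);
    }
    return result;
  }

  public static void main(String[] args) {
    // Assumption for illustration only: anything calling MY_UDF cannot be pushed.
    Split s = split(
        List.of("\"store_state\"", "SUBSTRING(\"city\", 1, 3)", "MY_UDF(\"city\")"),
        e -> !e.contains("MY_UDF"));
    System.out.println("push to Druid: " + s.pushable);
    System.out.println("keep in Calcite: " + s.remaining);
  }
}
```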



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (CALCITE-1824) GROUP_ID returns wrong result

2017-06-01 Thread Julian Hyde (JIRA)
Julian Hyde created CALCITE-1824:


 Summary: GROUP_ID returns wrong result
 Key: CALCITE-1824
 URL: https://issues.apache.org/jira/browse/CALCITE-1824
 Project: Calcite
  Issue Type: Bug
Reporter: Julian Hyde
Assignee: Julian Hyde


We implemented the {{GROUP_ID()}} function in CALCITE-512 but we got the 
specification wrong, and it returns the wrong result.

{{GROUP_ID}} is not in the SQL standard. It is implemented only by Oracle.

I mistakenly believed that {{GROUP_ID()}} is equivalent to {{GROUPING_ID(g1, 
..., gn)}} (in a query with {{GROUP BY g1, ..., gn}}). In fact, {{GROUP_ID}} is 
useful only if you have duplicate grouping sets. If grouping sets are distinct, 
{{GROUP_ID()}} will always return zero.

Example 1

{code}SELECT deptno, job, GROUP_ID() AS g
FROM Emp
GROUP BY ROLLUP(deptno, job)

DEPTNO JOB        G
------ ---------- --
10     CLERK      0
10     MANAGER    0
10     PRESIDENT  0
10                0
20     CLERK      0
20     ANALYST    0
20     MANAGER    0
20                0
30     CLERK      0
30     MANAGER    0
30     SALESMAN   0
30                0
                  0
{code} produces grouping sets (deptno, job), (deptno), (). These are distinct, 
so GROUP_ID() is 0 for all rows.

Example 2

{code}SELECT deptno, GROUP_ID() AS g
FROM Emp
GROUP BY GROUPING SETS (deptno, (), ());

DEPTNO  G
------ --
10      0
20      0
30      0
        0
        1
{code}

As you can see, the grouping set () occurs twice. So there is one row in the 
result for each occurrence: the first occurrence has g = 0; the second has g = 
1.

In my fix for CALCITE-1069, I will change GROUP_ID() to always return 0. This 
is wrong, but nevertheless closer to the required behavior.
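
The behavior described above can be sketched as follows: each grouping set gets the number of identical grouping sets that precede it in the GROUP BY clause. This is a minimal sketch of the semantics as stated in this issue, not Calcite's implementation; the class {{GroupIdSketch}} and its {{groupIds}} method are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of GROUP_ID semantics; not Calcite code. */
final class GroupIdSketch {
  /** Returns, for each grouping set in order, how many identical sets precede it. */
  static List<Integer> groupIds(List<Set<String>> groupingSets) {
    Map<Set<String>, Integer> seen = new HashMap<>();
    List<Integer> ids = new ArrayList<>();
    for (Set<String> set : groupingSets) {
      int id = seen.getOrDefault(set, 0);
      ids.add(id);
      seen.put(set, id + 1);
    }
    return ids;
  }

  public static void main(String[] args) {
    // ROLLUP(deptno, job): all grouping sets are distinct.
    System.out.println(groupIds(List.of(
        Set.of("deptno", "job"), Set.of("deptno"), Set.<String>of()))); // [0, 0, 0]
    // GROUPING SETS (deptno, (), ()): the empty set occurs twice.
    System.out.println(groupIds(List.of(
        Set.of("deptno"), Set.<String>of(), Set.<String>of()))); // [0, 0, 1]
  }
}
```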





[jira] [Created] (CALCITE-1823) Push join down to Druid, using Druid's query-time lookups

2017-06-01 Thread Julian Hyde (JIRA)
Julian Hyde created CALCITE-1823:


 Summary: Push join down to Druid, using Druid's query-time lookups
 Key: CALCITE-1823
 URL: https://issues.apache.org/jira/browse/CALCITE-1823
 Project: Calcite
  Issue Type: Bug
  Components: druid
Reporter: Julian Hyde
Assignee: Julian Hyde


Push Join down to Druid, using Druid's [query-time 
lookup|http://druid.io/docs/latest/querying/lookups.html] feature. As of Druid 
0.10 this feature is marked experimental. 

bq. Very small lookups (count of keys on the order of a few dozen to a few 
hundred) can be passed at query time as a "map" lookup

We could use map lookups for joins to a Values operator.

bq. Globally cached lookups from local files, remote URIs, or JDBC through 
lookups-cached-global.

If Calcite sees a Join between Druid and a table in a JDBC data source, and it 
knows that that table is [registered as a lookup in Druid's 
schema|http://druid.io/docs/latest/development/extensions-core/lookups-cached-global.html],
 and it knows the name of that lookup, then it could translate to a JDBC lookup 
in Druid. 





[jira] [Updated] (CALCITE-1822) Push Aggregate that follows Aggregate down to Druid

2017-06-01 Thread Julian Hyde (JIRA)

 [ 
https://issues.apache.org/jira/browse/CALCITE-1822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julian Hyde updated CALCITE-1822:
-
Description: 
Push Aggregate that follows Aggregate down to Druid. This can occur if the SQL 
has an aggregate function applied to an aggregate function, or with a sub-query 
in the FROM clause.

{code}
SELECT MAX(COUNT(*))
FROM Emp
GROUP BY deptno

SELECT MAX(c) FROM (
  SELECT deptno, COUNT(*) AS c
  FROM Emp
  GROUP BY deptno)
{code}

And there are other possibilities where there is a Project and/or a Filter 
after the first Aggregate and before the second Aggregate.

[~bslim], you wrote:
{quote}
For instance in druid we can do select count distinct as an inner group by that 
group on the key and the outer one does then count. more complex cases is count 
distinct from unions of multiple queries
{quote}

Can you please write a SQL statement for each of those cases?

  was:
Push Aggregate that follows Aggregate down to Druid. This can occur if the SQL 
has an aggregate function applied to an aggregate function, or with a sub-query 
in the FROM clause.

{code}
SELECT MAX(COUNT(*))
FROM Emp
GROUP BY deptno

SELECT MAX(c) FROM (
  SELECT deptno, COUNT(*) AS c
  FROM Emp
  GROUP BY deptno)
{code}

And there are other possibilities where there is a Project and/or a Filter 
after the first Aggregate and before the second Aggregate.


> Push Aggregate that follows Aggregate down to Druid
> ---
>
> Key: CALCITE-1822
> URL: https://issues.apache.org/jira/browse/CALCITE-1822
> Project: Calcite
>  Issue Type: Bug
>  Components: druid
>Reporter: Julian Hyde
>Assignee: Julian Hyde
>
> Push Aggregate that follows Aggregate down to Druid. This can occur if the 
> SQL has an aggregate function applied to an aggregate function, or with a 
> sub-query in the FROM clause.
> {code}
> SELECT MAX(COUNT(*))
> FROM Emp
> GROUP BY deptno
> SELECT MAX(c) FROM (
>   SELECT deptno, COUNT(*) AS c
>   FROM Emp
>   GROUP BY deptno)
> {code}
> And there are other possibilities where there is a Project and/or a Filter 
> after the first Aggregate and before the second Aggregate.
> [~bslim], you wrote:
> {quote}
> For instance in druid we can do select count distinct as an inner group by 
> that group on the key and the outer one does then count. more complex cases 
> is count distinct from unions of multiple queries
> {quote}
> Can you please write a SQL statement for each of those cases?





[jira] [Created] (CALCITE-1822) Push Aggregate that follows Aggregate down to Druid

2017-06-01 Thread Julian Hyde (JIRA)
Julian Hyde created CALCITE-1822:


 Summary: Push Aggregate that follows Aggregate down to Druid
 Key: CALCITE-1822
 URL: https://issues.apache.org/jira/browse/CALCITE-1822
 Project: Calcite
  Issue Type: Bug
  Components: druid
Reporter: Julian Hyde
Assignee: Julian Hyde


Push Aggregate that follows Aggregate down to Druid. This can occur if the SQL 
has an aggregate function applied to an aggregate function, or with a sub-query 
in the FROM clause.

{code}
SELECT MAX(COUNT(*))
FROM Emp
GROUP BY deptno

SELECT MAX(c) FROM (
  SELECT deptno, COUNT(*) AS c
  FROM Emp
  GROUP BY deptno)
{code}

And there are other possibilities where there is a Project and/or a Filter 
after the first Aggregate and before the second Aggregate.
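
To make the shape of the two stacked Aggregates concrete, here is a small in-memory sketch of the sub-query form: an inner COUNT(*) grouped by deptno, then an outer MAX over those counts. It only illustrates the semantics and is not adapter code; {{NestedAggregateSketch}} and its methods are hypothetical names, and returning 0 for empty input is a simplification (SQL would return NULL).

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** Hypothetical sketch of an Aggregate on top of an Aggregate. */
final class NestedAggregateSketch {
  /** Inner aggregate: SELECT deptno, COUNT(*) AS c FROM Emp GROUP BY deptno. */
  static Map<Integer, Long> countByDept(List<Integer> deptnos) {
    return deptnos.stream()
        .collect(Collectors.groupingBy(d -> d, TreeMap::new, Collectors.counting()));
  }

  /** Outer aggregate: MAX(c) over the inner result; 0 when there are no rows. */
  static long maxCount(List<Integer> deptnos) {
    return countByDept(deptnos).values().stream()
        .mapToLong(Long::longValue)
        .max()
        .orElse(0L);
  }

  public static void main(String[] args) {
    // One entry per employee row, holding that employee's deptno.
    List<Integer> emp = List.of(10, 10, 10, 20, 20, 30);
    System.out.println(countByDept(emp));
    System.out.println(maxCount(emp));
  }
}
```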





[jira] [Updated] (CALCITE-1821) Support typed columns in Druid

2017-06-01 Thread Julian Hyde (JIRA)

 [ 
https://issues.apache.org/jira/browse/CALCITE-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julian Hyde updated CALCITE-1821:
-
Description: 
Druid 0.10 introduces typed columns. We should make Calcite's Druid adapter 
aware of types. Here are the improvements:
* group by Druid metrics;
* filter by Druid metrics;
* send a hint to Druid that a dimension is a number, which can make 
aggregation more efficient;
* virtual columns that can be used as arbitrary expressions over metrics.

[~bslim], By "typed columns" do you mean [numeric 
dimensions|http://druid.io/docs/0.10.0/ingestion/schema-design.html#numeric-dimensions]?
 If so, let's be consistent with Druid's terminology.

  was:
Druid 0.10 introduces typed columns. We should make Calcite's Druid adapter 
aware of types. Here are the improvements:
* group by druid metrics;
* filter by druid metrics;
* we can send a hint to druid that a dimension is a number that can make 
aggregation more efficient;
* virtual column that can be used as arbitrary expression over metrics.

[~bslim], By "typed columns" do you mean [numeric 
dimensions|http://druid.io/docs/0.10.0/ingestion/schema-design.html#numeric-dimensions].
 If so let's be consistent with Druid's terminology.


> Support typed columns in Druid
> --
>
> Key: CALCITE-1821
> URL: https://issues.apache.org/jira/browse/CALCITE-1821
> Project: Calcite
>  Issue Type: Bug
>  Components: druid
>Reporter: Julian Hyde
>Assignee: Julian Hyde
>
> Druid 0.10 introduces typed columns. We should make Calcite's Druid adapter 
> aware of types. Here are the improvements:
> * group by druid metrics;
> * filter by druid metrics;
> * we can send a hint to druid that a dimension is a number that can make 
> aggregation more efficient;
> * virtual column that can be used as arbitrary expression over metrics.
> [~bslim], By "typed columns" do you mean [numeric 
> dimensions|http://druid.io/docs/0.10.0/ingestion/schema-design.html#numeric-dimensions]?
>  If so, let's be consistent with Druid's terminology.





[jira] [Updated] (CALCITE-1821) Support typed columns in Druid

2017-06-01 Thread Julian Hyde (JIRA)

 [ 
https://issues.apache.org/jira/browse/CALCITE-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julian Hyde updated CALCITE-1821:
-
Component/s: druid

> Support typed columns in Druid
> --
>
> Key: CALCITE-1821
> URL: https://issues.apache.org/jira/browse/CALCITE-1821
> Project: Calcite
>  Issue Type: Bug
>  Components: druid
>Reporter: Julian Hyde
>Assignee: Julian Hyde
>
> Druid 0.10 introduces typed columns. We should make Calcite's Druid adapter 
> aware of types. Here are the improvements:
> * group by druid metrics;
> * filter by druid metrics;
> * we can send a hint to druid that a dimension is a number that can make 
> aggregation more efficient;
> * virtual column that can be used as arbitrary expression over metrics.
> [~bslim], By "typed columns" do you mean [numeric 
> dimensions|http://druid.io/docs/0.10.0/ingestion/schema-design.html#numeric-dimensions].
>  If so let's be consistent with Druid's terminology.





[jira] [Created] (CALCITE-1821) Support typed columns in Druid

2017-06-01 Thread Julian Hyde (JIRA)
Julian Hyde created CALCITE-1821:


 Summary: Support typed columns in Druid
 Key: CALCITE-1821
 URL: https://issues.apache.org/jira/browse/CALCITE-1821
 Project: Calcite
  Issue Type: Bug
Reporter: Julian Hyde
Assignee: Julian Hyde


Druid 0.10 introduces typed columns. We should make Calcite's Druid adapter 
aware of types. Here are the improvements:
* group by Druid metrics;
* filter by Druid metrics;
* send a hint to Druid that a dimension is a number, which can make 
aggregation more efficient;
* virtual columns that can be used as arbitrary expressions over metrics.

[~bslim], By "typed columns" do you mean [numeric 
dimensions|http://druid.io/docs/0.10.0/ingestion/schema-design.html#numeric-dimensions].
 If so let's be consistent with Druid's terminology.





[jira] [Comment Edited] (CALCITE-1803) Push Project that follows Aggregate down to Druid

2017-06-01 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/CALCITE-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16033455#comment-16033455
 ] 

Julian Hyde edited comment on CALCITE-1803 at 6/1/17 6:32 PM:
--

Since Calcite rewrites {{AVG\(x)}} to {{SUM\(x) / COUNT\(x)}}, this change 
should enable

{code}SELECT "store_state", AVG("unit_sales") FROM "foodmart" GROUP BY 
"store_state"{code}

to be pushed down to Druid in its entirety.
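
The rewrite can be pictured as two aggregations per group followed by a division, which is the part a Druid post aggregation would take over. The sketch below mimics that in memory; it is an illustration under the assumption of non-null measures, not the adapter's code, and {{AvgRewriteSketch}}, {{Row}} and {{avgByState}} are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of AVG(x) computed as SUM(x) / COUNT(x) per group. */
final class AvgRewriteSketch {
  /** One input row: a grouping key and a non-null measure. */
  static final class Row {
    final String state;
    final double unitSales;

    Row(String state, double unitSales) {
      this.state = state;
      this.unitSales = unitSales;
    }
  }

  static Map<String, Double> avgByState(List<Row> rows) {
    // Aggregation step: acc[0] holds SUM(unit_sales), acc[1] holds COUNT(unit_sales).
    Map<String, double[]> acc = new LinkedHashMap<>();
    for (Row r : rows) {
      double[] a = acc.computeIfAbsent(r.state, k -> new double[2]);
      a[0] += r.unitSales;
      a[1] += 1;
    }
    // Post-aggregation step: divide the two aggregates for each group.
    Map<String, Double> result = new LinkedHashMap<>();
    acc.forEach((state, a) -> result.put(state, a[0] / a[1]));
    return result;
  }

  public static void main(String[] args) {
    System.out.println(avgByState(List.of(
        new Row("CA", 2), new Row("CA", 4), new Row("WA", 3))));
  }
}
```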


was (Author: julianhyde):
Since Calcite rewrites {{AVG(x)}} to {{SUM(x) / COUNT(x)}}, this change should 
enable

{code}SELECT "store_state", AVG("unit_sales") FROM "foodmart" GROUP BY 
"store_state"{code}

to be pushed down to Druid in its entirety.

> Push Project that follows Aggregate down to Druid
> -
>
> Key: CALCITE-1803
> URL: https://issues.apache.org/jira/browse/CALCITE-1803
> Project: Calcite
>  Issue Type: New Feature
>  Components: druid
>Affects Versions: 1.11.0
>Reporter: Junxian Wu
>Assignee: Julian Hyde
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> Druid post aggregations are not supported when parsing SQL queries. By 
> implementing post aggregations, we can offload some computation to the Druid 
> cluster rather than aggregate on the client side.
> Example usage:
> {{SELECT SUM("column1") - SUM("column2") FROM "table";}}
> Under the current rules, this query is parsed into two separate Druid 
> aggregations, and the results are then subtracted in Calcite. By using the 
> {{postAggregations}} field in the Druid query, the subtraction could be done 
> in the Druid cluster. Although the previous example is simple, the difference 
> will be obvious when the number of result rows is large. (Multiple result 
> rows occur when GROUP BY is used.)
> Questions:
> After I push the post aggregation into the Druid query, what should I change in 
> the Project relational expression? In the example above, the 
> {{BindableProject}} holds the expression that represents the 
> subtraction. If I push the post aggregation into the Druid query, the 
> subtraction expression should be replaced by a reference to the post 
> aggregation's result. For now, it seems that a project expression can only 
> point to aggregation results. Since post aggregations also have to point to 
> aggregation results, they cannot be placed at the same level as the 
> aggregations. Where should I put post aggregations?





[jira] [Commented] (CALCITE-1803) Push Project that follows Aggregate down to Druid

2017-06-01 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/CALCITE-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16033455#comment-16033455
 ] 

Julian Hyde commented on CALCITE-1803:
--

Since Calcite rewrites {{AVG(x)}} to {{SUM(x) / COUNT(x)}}, this change should 
enable

{code}SELECT "store_state", AVG("unit_sales") FROM "foodmart" GROUP BY 
"store_state"{code}

to be pushed down to Druid in its entirety.






[jira] [Created] (CALCITE-1820) Push more filters down to Druid

2017-06-01 Thread Julian Hyde (JIRA)
Julian Hyde created CALCITE-1820:


 Summary: Push more filters down to Druid
 Key: CALCITE-1820
 URL: https://issues.apache.org/jira/browse/CALCITE-1820
 Project: Calcite
  Issue Type: Bug
  Components: druid
Reporter: Julian Hyde
Assignee: Julian Hyde


Push more kinds of filters to Druid. For example:
* SUBSTRING
* like regex pattern (SIMILAR TO and LIKE)
* other kinds of extraction functions
* more smart re-writing of time filters into intervals

[~bslim], Can you please fill out the list?
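
For the last item in the list, one possible rewrite turns a filter such as {{EXTRACT(YEAR FROM "timestamp") = 1997}} into a time interval. The sketch below builds ISO-8601 interval strings, treating the end as exclusive. It is an assumption about what such a rewrite could produce, not the adapter's actual output format; {{TimeFilterIntervalSketch}} and its methods are hypothetical names.

```java
import java.time.LocalDate;

/** Hypothetical sketch of rewriting EXTRACT-based time filters as intervals. */
final class TimeFilterIntervalSketch {
  /** Interval covering EXTRACT(YEAR FROM ts) = year, as "start/end", end exclusive. */
  static String yearInterval(int year) {
    LocalDate start = LocalDate.of(year, 1, 1);
    return start + "/" + start.plusYears(1);
  }

  /** Interval covering one calendar month of one year, end exclusive. */
  static String monthInterval(int year, int month) {
    LocalDate start = LocalDate.of(year, month, 1);
    return start + "/" + start.plusMonths(1);
  }

  public static void main(String[] args) {
    System.out.println(yearInterval(1997));      // 1997-01-01/1998-01-01
    System.out.println(monthInterval(1997, 4));  // 1997-04-01/1997-05-01
    System.out.println(monthInterval(1997, 12)); // 1997-12-01/1998-01-01
  }
}
```

A filter such as {{EXTRACT(MONTH FROM "timestamp") IN (4, 6)}} combined with a year filter would then map to one interval per month.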





[jira] [Resolved] (CALCITE-1687) NPE in AggregateNode.getAccumulator

2017-06-01 Thread Julian Hyde (JIRA)

 [ 
https://issues.apache.org/jira/browse/CALCITE-1687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julian Hyde resolved CALCITE-1687.
--
Resolution: Duplicate

> NPE in AggregateNode.getAccumulator
> ---
>
> Key: CALCITE-1687
> URL: https://issues.apache.org/jira/browse/CALCITE-1687
> Project: Calcite
>  Issue Type: Bug
>Reporter: Nishant Bangarwa
>Assignee: Julian Hyde
>
> Faced this while working on https://issues.apache.org/jira/browse/CALCITE-1683
> Stack Trace - 
> java.lang.RuntimeException: exception while executing [select count(*) as c
> from "foodmart"
> where extract(year from "timestamp") = 1997
> and extract(month from "timestamp") in (4, 6)
> ]
>   at 
> org.apache.calcite.test.CalciteAssert$AssertQuery.returns(CalciteAssert.java:1245)
>   at 
> org.apache.calcite.test.CalciteAssert$AssertQuery.returns(CalciteAssert.java:1224)
>   at 
> org.apache.calcite.test.CalciteAssert$AssertQuery.returnsUnordered(CalciteAssert.java:1251)
>   at 
> org.apache.calcite.test.DruidAdapterIT.testFilterTimestamp(DruidAdapterIT.java:1349)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
>   at 
> com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:119)
>   at 
> com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:42)
>   at 
> com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:234)
>   at 
> com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:74)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)
> Caused by: java.lang.RuntimeException: With materializationsEnabled=false, 
> limit=0
>   at 
> org.apache.calcite.test.CalciteAssert.assertQuery(CalciteAssert.java:524)
>   at 
> org.apache.calcite.test.CalciteAssert$AssertQuery.returns(CalciteAssert.java:1241)
>   ... 30 more
> Caused by: java.sql.SQLException: Error while executing SQL "select count(*) 
> as c
> from "foodmart"
> where extract(year from "timestamp") = 1997
> and extract(month from "timestamp") in (4, 6)
> ": null
>   at org.apache.calcite.avatica.Helper.createException(Helper.java:56)
>   at org.apache.calcite.avatica.Helper.createException(Helper.java:41)
>   at 
> org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:156)
>   at 
> org.apache.calcite.avatica.AvaticaStatement.executeQuery(AvaticaStatement.java:218)
>   at 
> org.apache.calcite.test.CalciteAssert.assertQuery(CalciteAssert.java:492)
>   ... 31 more
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.calcite.interpreter.AggregateNode.getAccumulator(AggregateNode.java:144)
>   at 
> org.apache.calcite.interpreter.AggregateNode.<init>(AggregateNode.java:86)
>   at 
> org.apache.calcite.interpreter.Nodes$CoreCompiler.visit(Nodes.java:47)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.Delegati

[jira] [Comment Edited] (CALCITE-1787) thetaSketch Support for Druid Adapter

2017-06-01 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/CALCITE-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16033414#comment-16033414
 ] 

Julian Hyde edited comment on CALCITE-1787 at 6/1/17 6:08 PM:
--

Both of the following queries are valid SQL and we should push them down to 
Druid and use theta sketches (or HLL, whichever is available):

{code}
select count(distinct "customer_id")
from "foodmart"
where "store_city" in ('Chicago', 'Seattle')

select count(distinct "customer_id") filter (where "store_city" in ('Chicago', 
'Seattle'))
from "foodmart"
{code}

That's the what. Now let's figure out the how.


was (Author: julianhyde):
Both of the following queries are valid SQL and we should push them down to 
Druid and use theta sketches (or HLL, whichever is available):

{code}
select count(distinct "customer_id")
from "foodmart"
where "store_city" in ('Chicago', 'Seattle')

select count(distinct "customer_id") filter (where "store_city" in ('Chicago', 
'Seattle')
from "foodmart"
{code}

That's the what. Now let's figure out the how.

> thetaSketch Support for Druid Adapter
> -
>
> Key: CALCITE-1787
> URL: https://issues.apache.org/jira/browse/CALCITE-1787
> Project: Calcite
>  Issue Type: New Feature
>  Components: druid
>Affects Versions: 1.12.0
>Reporter: Zain Humayun
>Assignee: Zain Humayun
>Priority: Minor
>
> Currently, the Druid adapter does not support the 
> [thetaSketch|http://druid.io/docs/latest/development/extensions-core/datasketches-aggregators.html]
>  aggregate type, which is used to measure the cardinality of a column 
> quickly. Many Druid instances support theta sketches, so I think it would be 
> a nice feature to have.
> I've been looking at the Druid adapter, and propose we add a new DruidType 
> called {{thetaSketch}} and then add logic in the {{getJsonAggregation}} 
> method in class {{DruidQuery}} to generate the {{thetaSketch}} aggregate. 
> This will require accessing information about the columns (what data type 
> they are) so that the thetaSketch aggregate is only produced if the column's 
> type is {{thetaSketch}}. 
> Also, I've noticed that a {{hyperUnique}} DruidType is currently defined, but 
> a {{hyperUnique}} aggregate is never produced. Since both are approximate 
> aggregators, I could also couple in the logic for {{hyperUnique}}.
> I'd love to hear your thoughts on my approach, and any suggestions you have 
> for this feature.





[jira] [Commented] (CALCITE-1787) thetaSketch Support for Druid Adapter

2017-06-01 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/CALCITE-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16033414#comment-16033414
 ] 

Julian Hyde commented on CALCITE-1787:
--

Both of the following queries are valid SQL and we should push them down to 
Druid and use theta sketches (or HLL, whichever is available):

{code}
select count(distinct "customer_id")
from "foodmart"
where "store_city" in ('Chicago', 'Seattle')

select count(distinct "customer_id") filter (where "store_city" in ('Chicago', 
'Seattle'))
from "foodmart"
{code}

That's the what. Now let's figure out the how.






[jira] [Commented] (CALCITE-1804) Cannot assign NOT NULL array to NULLABLE array

2017-06-01 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/CALCITE-1804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16033407#comment-16033407
 ] 

Julian Hyde commented on CALCITE-1804:
--

You don't need a formatter - you can format manually. Just make sure your 
editor doesn't make spurious changes.

How about making canAssignFrom call itself recursively on the component type? 
Your code won't handle assigning an array of arrays of not null integers to an 
array of arrays of nullable integers, but the recursive solution would, and 
would be simpler.
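
The recursive idea can be sketched on a toy type model. This is not SqlTypeUtil or Calcite's type system: {{ArrayAssignSketch}} and its nested {{Type}} class are hypothetical, and the nullability rule used here (a NOT NULL target rejects a nullable source) is an assumption made for the illustration.

```java
/** Hypothetical sketch of a recursive canAssignFrom on a toy type model. */
final class ArrayAssignSketch {
  /** A scalar type, or an array type when component is non-null. */
  static final class Type {
    final String name;       // e.g. "INTEGER"; "ARRAY" for arrays
    final boolean nullable;
    final Type component;    // null for scalars

    private Type(String name, boolean nullable, Type component) {
      this.name = name;
      this.nullable = nullable;
      this.component = component;
    }

    static Type scalar(String name, boolean nullable) {
      return new Type(name, nullable, null);
    }

    static Type array(Type component, boolean nullable) {
      return new Type("ARRAY", nullable, component);
    }
  }

  static boolean canAssignFrom(Type to, Type from) {
    // Assumed rule: a NOT NULL target cannot accept a nullable source.
    if (!to.nullable && from.nullable) {
      return false;
    }
    if (to.component != null || from.component != null) {
      // Arrays: both sides must be arrays; then recurse on the component type.
      return to.component != null && from.component != null
          && canAssignFrom(to.component, from.component);
    }
    return to.name.equals(from.name);
  }

  public static void main(String[] args) {
    Type notNullInts = Type.array(Type.array(Type.scalar("INTEGER", false), false), false);
    Type nullableInts = Type.array(Type.array(Type.scalar("INTEGER", true), true), true);
    System.out.println(canAssignFrom(nullableInts, notNullInts)); // true
    System.out.println(canAssignFrom(notNullInts, nullableInts)); // false
  }
}
```

Because the check recurses, arrays of arrays need no special case.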

Please make sure that the tests and validate/verify pass before submitting the 
PR.

> Cannot assign NOT NULL array to NULLABLE array
> --
>
> Key: CALCITE-1804
> URL: https://issues.apache.org/jira/browse/CALCITE-1804
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.13.0
>Reporter: Ankit Singhal
>Assignee: Julian Hyde
>
> Because ArraySqlType returns a family of its own type, comparing families in 
> SqlTypeUtil#canAssignFrom compares the digests, including their nullability 
> constraints, and these do not match when we insert an array into a nullable column.
>  





[jira] [Commented] (CALCITE-1787) thetaSketch Support for Druid Adapter

2017-06-01 Thread slim bouguerra (JIRA)

[ 
https://issues.apache.org/jira/browse/CALCITE-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16033229#comment-16033229
 ] 

slim bouguerra commented on CALCITE-1787:
-

I take back my first point. I am sorry, I misread the example. What I had in 
mind was queries like "how many unique users visited both product A and product 
B", which I guess does not translate in SQL to {code} SELECT COUNT(DISTINCT 
"user_unique") FROM "foodmart" WHERE "store_city" = 'Chicago' AND "store_city" 
= 'Seattle';  {code}







[jira] [Resolved] (CALCITE-1613) Implement EXTRACT for time unit DOW, DOY; and fix CENTURY

2017-06-01 Thread Julian Hyde (JIRA)

 [ 
https://issues.apache.org/jira/browse/CALCITE-1613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julian Hyde resolved CALCITE-1613.
--
   Resolution: Fixed
Fix Version/s: 1.13.0

Fixed in http://git-wip-us.apache.org/repos/asf/calcite/commit/6cb4d45b.

However, we didn't remove {{addMonths}} and related methods yet; CALCITE-1639 
had made some improvements in Calcite which are not in Avatica as of 
avatica-1.10.

> Implement EXTRACT for time unit DOW, DOY; and fix CENTURY
> -
>
> Key: CALCITE-1613
> URL: https://issues.apache.org/jira/browse/CALCITE-1613
> Project: Calcite
>  Issue Type: Bug
>Reporter: Julian Hyde
>Assignee: Julian Hyde
> Fix For: 1.13.0
>
>
> Implement EXTRACT for time units DOW, DOY and others introduced in 
> CALCITE-1606.
> Fix EXTRACT(CENTURY FROM ...), which previously just divided the year by 100, 
> and a similar bug in MILLENNIUM.
> Requires CALCITE-1609 and therefore has to wait for an Avatica release.
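
To show why dividing the year by 100 is wrong, here is a minimal sketch using the conventional definition in which the first century covers years 1 to 100, so that 2000 falls in century 20 and 2001 in century 21. It handles only positive years and is not the code from the commit; {{CenturySketch}} is a hypothetical name.

```java
/** Hypothetical sketch of CENTURY and MILLENNIUM for positive (AD) years. */
final class CenturySketch {
  /** Years 1..100 are century 1; years 1901..2000 are century 20. */
  static int century(int year) {
    requirePositive(year);
    return (year + 99) / 100;
  }

  /** Years 1..1000 are millennium 1; years 1001..2000 are millennium 2. */
  static int millennium(int year) {
    requirePositive(year);
    return (year + 999) / 1000;
  }

  private static void requirePositive(int year) {
    if (year <= 0) {
      throw new IllegalArgumentException("this sketch handles only years >= 1");
    }
  }

  public static void main(String[] args) {
    System.out.println(1999 / 100);       // 19, the naive result
    System.out.println(century(1999));    // 20
    System.out.println(century(2001));    // 21
    System.out.println(millennium(2001)); // 3
  }
}
```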





[jira] [Comment Edited] (CALCITE-1787) thetaSketch Support for Druid Adapter

2017-06-01 Thread slim bouguerra (JIRA)

[ 
https://issues.apache.org/jira/browse/CALCITE-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16032985#comment-16032985
 ] 

slim bouguerra edited comment on CALCITE-1787 at 6/1/17 3:38 PM:
-

[~zhumayun] please read the sketch docs.
1 - I don't agree with the claim that queries like {code} SELECT COUNT(DISTINCT 
"user_unique") FROM "foodmart" WHERE "store_city" = 'Chicago' AND "store_city" 
= 'Seattle'; {code} work fine.
Pushing only filters to Druid will produce wrong results. You need a post 
aggregation and filtered aggregators to compute the intersection between 
sketches; without the intersection, the result you get is the union, which 
means duplicates are counted and you are not getting unique counts.
2 - For filters on metrics, or the more general case where we cannot push the 
filter/query to Druid, Calcite cannot do much: a sketch is a binary blob that 
needs a serialization/deserialization library. I'm not sure what the best path 
is; I don't know Calcite well enough to answer the question.
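
To make the union-versus-intersection point concrete, here is a small sketch that uses exact Python sets as a stand-in for theta sketches. Real sketches are approximate binary structures; the rows, user names, and the helper `users_where` are made up for illustration.

```python
# Each row is (user, store_city); exact sets stand in for sketches here.
rows = [
    ("alice", "Chicago"),
    ("alice", "Seattle"),
    ("bob", "Chicago"),
    ("carol", "Seattle"),
]


def users_where(predicate):
    """Distinct users over the rows whose city satisfies the predicate."""
    return {user for user, city in rows if predicate(city)}


# A row-level AND on a single column can never match any row.
row_level_and = users_where(lambda c: c == "Chicago" and c == "Seattle")

# A row-level OR gives the union: users seen in either city.
either_city = users_where(lambda c: c in ("Chicago", "Seattle"))

# Users seen in both cities need one set per filter, then an intersection.
chicago = users_where(lambda c: c == "Chicago")
seattle = users_where(lambda c: c == "Seattle")
both_cities = chicago & seattle

print(len(row_level_and), len(either_city), len(both_cities))
```

Neither filter-only variant yields the "both" count; it only falls out once the two per-filter sets are kept separate and intersected afterwards.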



> thetaSketch Support for Druid Adapter
> -
>
> Key: CALCITE-1787
> URL: https://issues.apache.org/jira/browse/CALCITE-1787
> Project: Calcite
>  Issue Type: New Feature
>  Components: druid
>Affects Versions: 1.12.0
>Reporter: Zain Humayun
>Assignee: Zain Humayun
>Priority: Minor
>
> Currently, the Druid adapter does not support the 
> [thetaSketch|http://druid.io/docs/latest/development/extensions-core/datasketches-aggregators.html]
>  aggregate type, which is used to measure the cardinality of a column 
> quickly. Many Druid instances support theta sketches, so I think it would be 
> a nice feature to have.
> I've been looking at the Druid adapter, and propose we add a new DruidType 
> called {{thetaSketch}} and then add logic in the {{getJsonAggregation}} 
> method in class {{DruidQuery}} to generate the {{thetaSketch}} aggregate. 
> This will require accessing information about the columns (what data type 
> they are) so that the thetaSketch aggregate is only produced if the column's 
> type is {{thetaSketch}}. 
> Also, I've noticed that a {{hyperUnique}} DruidType is currently defined, but 
> a {{hyperUnique}} aggregate is never produced. Since both are approximate 
> aggregators, I could also couple in the logic for {{hyperUnique}}.
> I'd love to hear your thoughts on my approach, and any suggestions you have 
> for this feature.





[jira] [Commented] (CALCITE-1798) In JDBC adapter, generate dialect-specific SQL for FLOOR operator

2017-06-01 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/CALCITE-1798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16033173#comment-16033173
 ] 

Julian Hyde commented on CALCITE-1798:
--

Let's make a new JIRA case for this. This case is fixed already. I think 
SqlNode.clone should remain a shallow clone. Your proposed fix makes clone 
sometimes deep, sometimes shallow. In fact, I don't think we should alter clone 
at all. My preferred solution would be to treat all SqlNodes as immutable 
during this entire process, so if we want to change anything, we have to create 
a new copy.
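
One way to picture the "treat nodes as immutable" approach is sketched below. It uses a hypothetical `Call` class (a frozen Python dataclass), not Calcite's `SqlNode` API: changing the operator yields a new object, and the original, which other parts of the query may still reference, stays untouched.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class Call:
    """Hypothetical immutable call node: an operator name plus operands."""
    operator: str
    operands: Tuple[str, ...]

    def with_operator(self, operator: str) -> "Call":
        # Build a new node instead of mutating this one.
        return replace(self, operator=operator)


floor_call = Call("FLOOR", ("my_datetime", "YEAR"))
trunc_call = floor_call.with_operator("TRUNC")

print(floor_call)  # still the FLOOR call
print(trunc_call)  # new node with the TRUNC operator
```

Because nothing can be modified in place, the new node can share the operands tuple with the old one, and the question of shallow versus deep clone does not come up.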

> In JDBC adapter, generate dialect-specific SQL for FLOOR operator
> -
>
> Key: CALCITE-1798
> URL: https://issues.apache.org/jira/browse/CALCITE-1798
> Project: Calcite
>  Issue Type: Bug
>  Components: jdbc-adapter
>Reporter: Chris Baynes
>Assignee: Julian Hyde
>  Labels: dialect
> Fix For: 1.13.0
>
>
> The FLOOR operator (on dates) is currently broken for all jdbc dialects.
> The syntax allowed by the parser looks like: "FLOOR(datetime to timeUnit)".
> However, no JDBC dialect (as far as I'm aware) actually names the function 
> FLOOR:
> In postgres: DATE_TRUNC('year', my_datetime)
> In hsqldb: TRUNC ( my_datetime, '' )
> In oracle: TRUNC(my_datetime, 'YEAR')
> In mysql: There's no direct equivalent in mysql (though it could be emulated 
> with some nasty timestamp diffing)
> The other issue is that the timeUnits are sometimes also named differently by 
> each dialect (e.g. '' in hsqldb).
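
The per-dialect forms listed in the description could be captured roughly as follows. This is only a sketch: `render_floor` and the dialect keys are hypothetical names, not Calcite's API. It covers just the PostgreSQL and Oracle forms for the YEAR unit quoted above; the HSQLDB unit string is elided in the description, so that dialect is left out, and MySQL is rejected because it has no direct equivalent.

```python
# Unit names per dialect; only the YEAR example from the description is known.
_UNIT_NAMES = {
    "postgres": {"YEAR": "year"},
    "oracle": {"YEAR": "YEAR"},
}


def render_floor(dialect: str, expr: str, time_unit: str) -> str:
    """Render FLOOR(expr TO time_unit) in a dialect-specific form (sketch)."""
    units = _UNIT_NAMES.get(dialect)
    if units is None or time_unit.upper() not in units:
        raise NotImplementedError(
            f"no known translation for {dialect!r} and unit {time_unit!r}")
    unit = units[time_unit.upper()]
    if dialect == "postgres":
        # PostgreSQL puts the unit first.
        return f"DATE_TRUNC('{unit}', {expr})"
    # Oracle puts the expression first.
    return f"TRUNC({expr}, '{unit}')"


print(render_floor("postgres", "my_datetime", "YEAR"))
print(render_floor("oracle", "my_datetime", "YEAR"))
```

Keeping the unit names in a per-dialect table reflects the second issue raised in the description: both the function name and the unit spelling can vary.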





[jira] [Commented] (CALCITE-1787) thetaSketch Support for Druid Adapter

2017-06-01 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/CALCITE-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16033138#comment-16033138
 ] 

Julian Hyde commented on CALCITE-1787:
--

I don't think that "user_unique" should appear in queries. People writing 
queries want the (approximate) number of distinct users, not the number of 
distinct "user_unique" values. "user_unique" is an implementation detail.

[~bslim], "how many unique users visited both product A and B" is an 
interesting query. But let's first look at how you'd write that in SQL (hint: 
no "user_unique") then figure out how to map onto Druid.

Since we're all meeting this afternoon, can we discuss then?



[jira] [Commented] (CALCITE-1787) thetaSketch Support for Druid Adapter

2017-06-01 Thread slim bouguerra (JIRA)

[ 
https://issues.apache.org/jira/browse/CALCITE-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16032992#comment-16032992
 ] 

slim bouguerra commented on CALCITE-1787:
-

For instance, if you want to know how many unique users visited both product A 
and product B, the query to Druid should look like this:
{code} 
{
  "queryType": "groupBy",
  "dataSource": "test_datasource",
  "granularity": "ALL",
  "dimensions": [],
  "filter": {
    "type": "or",
    "fields": [
      {"type": "selector", "dimension": "product", "value": "A"},
      {"type": "selector", "dimension": "product", "value": "B"}
    ]
  },
  "aggregations": [
    {
      "type": "filtered",
      "filter": {
        "type": "selector",
        "dimension": "product",
        "value": "A"
      },
      "aggregator": {
        "type": "thetaSketch", "name": "A_unique_users",
        "fieldName": "user_id_sketch"
      }
    },
    {
      "type": "filtered",
      "filter": {
        "type": "selector",
        "dimension": "product",
        "value": "B"
      },
      "aggregator": {
        "type": "thetaSketch", "name": "B_unique_users",
        "fieldName": "user_id_sketch"
      }
    }
  ],
  "postAggregations": [
    {
      "type": "thetaSketchEstimate",
      "name": "final_unique_users",
      "field": {
        "type": "thetaSketchSetOp",
        "name": "final_unique_users_sketch",
        "func": "INTERSECT",
        "fields": [
          {
            "type": "fieldAccess",
            "fieldName": "A_unique_users"
          },
          {
            "type": "fieldAccess",
            "fieldName": "B_unique_users"
          }
        ]
      }
    }
  ],
  "intervals": [
    "2014-10-19T00:00:00.000Z/2014-10-22T00:00:00.000Z"
  ]
}
{code}
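
The pattern in this query (one filtered thetaSketch aggregator per value, then an INTERSECT set operation wrapped in an estimate) generalizes to any number of values. Below is a rough sketch in Python that builds the same JSON structure. The function names `selector` and `intersect_query` and their parameters are hypothetical; the field names and values are simply copied from the query above.

```python
import json


def selector(dimension, value):
    """A Druid selector filter on one dimension value."""
    return {"type": "selector", "dimension": dimension, "value": value}


def intersect_query(data_source, dimension, values, sketch_field, interval):
    """Build a groupBy query estimating users seen with every given value."""
    # One filtered thetaSketch aggregator per value.
    aggregations = [
        {
            "type": "filtered",
            "filter": selector(dimension, v),
            "aggregator": {
                "type": "thetaSketch",
                "name": f"{v}_unique_users",
                "fieldName": sketch_field,
            },
        }
        for v in values
    ]
    return {
        "queryType": "groupBy",
        "dataSource": data_source,
        "granularity": "ALL",
        "dimensions": [],
        # Only rows matching at least one value are scanned.
        "filter": {"type": "or",
                   "fields": [selector(dimension, v) for v in values]},
        "aggregations": aggregations,
        # Intersect the per-value sketches, then estimate the cardinality.
        "postAggregations": [
            {
                "type": "thetaSketchEstimate",
                "name": "final_unique_users",
                "field": {
                    "type": "thetaSketchSetOp",
                    "name": "final_unique_users_sketch",
                    "func": "INTERSECT",
                    "fields": [
                        {"type": "fieldAccess",
                         "fieldName": a["aggregator"]["name"]}
                        for a in aggregations
                    ],
                },
            }
        ],
        "intervals": [interval],
    }


query = intersect_query(
    "test_datasource", "product", ["A", "B"], "user_id_sketch",
    "2014-10-19T00:00:00.000Z/2014-10-22T00:00:00.000Z")
print(json.dumps(query, indent=2))
```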



[jira] [Comment Edited] (CALCITE-1787) thetaSketch Support for Druid Adapter

2017-06-01 Thread slim bouguerra (JIRA)

[ 
https://issues.apache.org/jira/browse/CALCITE-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16032985#comment-16032985
 ] 

slim bouguerra edited comment on CALCITE-1787 at 6/1/17 1:32 PM:
-

[~zhumayun] please read the sketch docs.
1 - I don't agree with the claim that queries like {code} SELECT COUNT(DISTINCT 
"user_unique") FROM "foodmart" WHERE "the_month" = 'April' AND "store_city" = 
'Seattle'; {code} work fine.
Pushing only filters to Druid will produce wrong results. You need a post 
aggregation and filtered aggregators to compute the intersection between 
sketches; without the intersection, the result you get is the union, which 
means duplicates are counted and you are not getting unique counts.
2 - For filters on metrics, or the more general case where we cannot push the 
filter/query to Druid, Calcite cannot do much: a sketch is a binary blob that 
needs a serialization/deserialization library. I'm not sure what the best path 
is; I don't know Calcite well enough to answer the question.





[jira] [Commented] (CALCITE-1787) thetaSketch Support for Druid Adapter

2017-06-01 Thread slim bouguerra (JIRA)

[ 
https://issues.apache.org/jira/browse/CALCITE-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16032985#comment-16032985
 ] 

slim bouguerra commented on CALCITE-1787:
-

[~zhumayun] please read the sketch docs.
1 - I don't agree with the claim that queries like {code} SELECT COUNT(DISTINCT 
"user_unique") FROM "foodmart" WHERE "the_month" = 'April' AND "store_city" = 
'Seattle'; {code} work fine.
Pushing only filters to Druid will produce wrong results. You need a post 
aggregation and filtered aggregators to compute the intersection between 
sketches; without the intersection, the result you get is the union, which 
means duplicates are counted and you are not getting unique counts.
2 - For filters on metrics, or the more general case where we cannot push the 
filter/query to Druid, Calcite cannot do much: a sketch is a binary blob that 
needs a serialization/deserialization library. I'm not sure what the best path 
is; I don't know Calcite well enough to answer the question.



[jira] [Commented] (CALCITE-1819) Druid Adapter does not push the boolean operator "<>" as a filter correctly

2017-06-01 Thread slim bouguerra (JIRA)

[ 
https://issues.apache.org/jira/browse/CALCITE-1819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16032910#comment-16032910
 ] 

slim bouguerra commented on CALCITE-1819:
-

+1 Thanks.


> Druid Adapter does not push the boolean operator "<>" as a filter correctly 
> 
>
> Key: CALCITE-1819
> URL: https://issues.apache.org/jira/browse/CALCITE-1819
> Project: Calcite
>  Issue Type: Bug
>  Components: druid
>Affects Versions: 1.12.0
>Reporter: Zain Humayun
>Assignee: Zain Humayun
>
> The query
> {code:sql}
> SELECT COUNT(DISTINCT "the_month") FROM "foodmart" WHERE "the_month" <> 
> 'October';
> {code}
> Will produce a Druid query with the following filter:
> {code:javascript}
> "filter": {
>   "type": "not",
>   "fields": [
>     {
>       "type": "selector",
>       "dimension": "the_month",
>       "value": "October"
>     }
>   ]
> }
> {code}
> But the expected filter should look like:
> {code:javascript}
> "filter": {
>   "type": "not",
>   "field": {
>     "type": "selector",
>     "dimension": "the_month",
>     "value": "October"
>   }
> }
> {code}
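
The two shapes differ because Druid's "not" filter wraps a single filter under "field", whereas "and"/"or" filters take a list under "fields". A small sketch with hypothetical helper names (this is not the adapter's code) that builds each shape:

```python
import json


def selector(dimension, value):
    """A Druid selector filter on one dimension value."""
    return {"type": "selector", "dimension": dimension, "value": value}


def not_filter(inner):
    """Negation wraps exactly one filter, under the singular key "field"."""
    return {"type": "not", "field": inner}


def or_filter(*inners):
    """Disjunction takes a list of filters, under the plural key "fields"."""
    return {"type": "or", "fields": list(inners)}


# Filter for: "the_month" <> 'October'
not_october = not_filter(selector("the_month", "October"))
print(json.dumps(not_october, indent=2))
```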





[jira] [Commented] (CALCITE-1798) In JDBC adapter, generate dialect-specific SQL for FLOOR operator

2017-06-01 Thread Chris Baynes (JIRA)

[ 
https://issues.apache.org/jira/browse/CALCITE-1798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16032794#comment-16032794
 ] 

Chris Baynes commented on CALCITE-1798:
---

I'll log another JIRA case with the proposed mechanism.
I have one other issue with the current implementation: I've just noticed 
that the same node can be referenced by different parts of the query.
For example, if I add a GROUP BY clause using the same FLOOR operator, the 
setOperator call I've used will be applied to the SqlCall of the GROUP BY, and 
the unparse method will then break when it is called on it later on.
To fix that, I'm cloning before calling setOperator, but I needed to adjust the 
clone method on SqlBasicCall to make sure it clones the operands too.
The PR is here: https://github.com/apache/calcite/pull/465
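
The aliasing problem described here can be reproduced in miniature. The class below is a hypothetical stand-in, not Calcite's `SqlBasicCall`: the same mutable call object is referenced from both the SELECT list and the GROUP BY, so rewriting it in place for one also changes the other, while copying first keeps them independent.

```python
import copy


class Call:
    """Hypothetical mutable call node: an operator name plus operands."""

    def __init__(self, operator, operands):
        self.operator = operator
        self.operands = operands

    def unparse(self):
        return f"{self.operator}({', '.join(self.operands)})"


def shared_query():
    """A query whose SELECT and GROUP BY reference the very same node."""
    call = Call("FLOOR", ["my_datetime", "YEAR"])
    return {"select": [call], "group_by": [call]}


# 1. Rewriting in place: the GROUP BY sees the change too.
q1 = shared_query()
q1["select"][0].operator = "TRUNC"
in_place_group_by = q1["group_by"][0].unparse()

# 2. Copying first (operands included): the GROUP BY is unaffected.
q2 = shared_query()
rewritten = copy.deepcopy(q2["select"][0])
rewritten.operator = "TRUNC"
q2["select"][0] = rewritten
copied_group_by = q2["group_by"][0].unparse()

print(in_place_group_by)
print(copied_group_by)
```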
