[jira] [Comment Edited] (CALCITE-1787) thetaSketch Support for Druid Adapter

slim bouguerra (JIRA) Thu, 25 May 2017 22:12:59 -0700

    [ https://issues.apache.org/jira/browse/CALCITE-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025766#comment-16025766 ]


slim bouguerra edited comment on CALCITE-1787 at 5/26/17 5:11 AM:
------------------------------------------------------------------

Filters are applied to prune rows before they reach the aggregation. 
A filtered aggregator is an aggregator that lets you prune rows while the 
aggregation itself is running.
The query-level filter therefore acts as the first funnel, and each aggregator 
can then apply its own finer-grained filter.
The two work together in scenarios like the following.
Assume your task is to compute the ratio of sales between two states, say CA 
and NY.
To do this efficiently, the Druid query has a filter that keeps only the rows 
containing CA or NY, plus two filtered aggregators (the first with 
filter = CA, the second with filter = NY).
Thus, in one pass over the data, we can compute the SUM of sales for each 
state and then compute the ratio as a post-aggregation.
I hope that conveys the idea.
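
As a rough illustration of the scenario above, the following Python sketch builds the JSON body of such a Druid query. The query shape (a top-level filter, {{filtered}} aggregators and an {{arithmetic}} post-aggregation) follows Druid's native query format, but the datasource name {{sales}}, the column names {{state}} and {{sales}}, the output names and the interval are all assumptions made up for this example.

```python
import json


def filtered_sum(name, state):
    """One filtered aggregator: sum 'sales' only for rows of the given state."""
    return {
        "type": "filtered",
        "filter": {"type": "selector", "dimension": "state", "value": state},
        "aggregator": {"type": "doubleSum", "name": name, "fieldName": "sales"},
    }


def build_ratio_query(datasource, interval):
    """Build a timeseries query that yields the CA/NY sales ratio in one pass."""
    return {
        "queryType": "timeseries",
        "dataSource": datasource,          # hypothetical datasource name
        "granularity": "all",
        "intervals": [interval],
        # First funnel: keep only rows for CA or NY.
        "filter": {"type": "in", "dimension": "state", "values": ["CA", "NY"]},
        # Finer-grained filters, one per aggregator.
        "aggregations": [
            filtered_sum("ca_sales", "CA"),
            filtered_sum("ny_sales", "NY"),
        ],
        # The ratio is computed from the two sums after aggregation.
        "postAggregations": [{
            "type": "arithmetic",
            "name": "ca_ny_ratio",
            "fn": "/",
            "fields": [
                {"type": "fieldAccess", "fieldName": "ca_sales"},
                {"type": "fieldAccess", "fieldName": "ny_sales"},
            ],
        }],
    }


query = build_ratio_query("sales", "2017-01-01/2017-06-01")
print(json.dumps(query, indent=2))
```

The top-level filter is not strictly needed for correctness here, since each aggregator already has its own filter; it is there so that rows for other states are discarded before any aggregator sees them.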
I am not sure what the equivalent of this is in the realm of relational 
algebra; maybe [~julianhyde] has better examples.
The Druid docs also have a good 
[explanation|http://druid.io/docs/latest/querying/aggregations.html#filtered-aggregator]. 
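
One possible relational rendering, offered only as a sketch and not as an authoritative answer to the question above: the query-level filter looks like a WHERE clause, and each filtered aggregator looks like a conditional aggregate. Standard SQL can write the latter as {{SUM(x) FILTER (WHERE ...)}}; the example below uses the equivalent {{CASE}} form so that it runs on any SQLite version. The table, its columns and the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (state TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("CA", 100.0), ("NY", 40.0), ("CA", 20.0), ("TX", 70.0), ("NY", 20.0)],
)

# WHERE plays the role of the query-level filter (first funnel);
# each SUM(CASE ...) plays the role of one filtered aggregator.
ca_sales, ny_sales = conn.execute(
    """
    SELECT SUM(CASE WHEN state = 'CA' THEN amount END) AS ca_sales,
           SUM(CASE WHEN state = 'NY' THEN amount END) AS ny_sales
    FROM sales
    WHERE state IN ('CA', 'NY')
    """
).fetchone()

# The division corresponds to the post-aggregation step.
ratio = ca_sales / ny_sales
print(ca_sales, ny_sales, ratio)
```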


> thetaSketch Support for Druid Adapter
> -------------------------------------
>
>                 Key: CALCITE-1787
>                 URL: https://issues.apache.org/jira/browse/CALCITE-1787
>             Project: Calcite
>          Issue Type: New Feature
>          Components: druid
>    Affects Versions: 1.12.0
>            Reporter: Zain Humayun
>            Assignee: Julian Hyde
>            Priority: Minor
>
> Currently, the Druid adapter does not support the 
> [thetaSketch|http://druid.io/docs/latest/development/extensions-core/datasketches-aggregators.html]
>  aggregate type, which is used to measure the cardinality of a column 
> quickly. Many Druid instances support theta sketches, so I think it would be 
> a nice feature to have.
> I've been looking at the Druid adapter, and propose we add a new DruidType 
> called {{thetaSketch}} and then add logic in the {{getJsonAggregation}} 
> method in class {{DruidQuery}} to generate the {{thetaSketch}} aggregate. 
> This will require accessing information about the columns (what data type 
> they are) so that the thetaSketch aggregate is only produced if the column's 
> type is {{thetaSketch}}. 
> Also, I've noticed that a {{hyperUnique}} DruidType is currently defined, but 
> a {{hyperUnique}} aggregate is never produced. Since both are approximate 
> aggregators, I could also couple in the logic for {{hyperUnique}}.
> I'd love to hear your thoughts on my approach, and any suggestions you have 
> for this feature.
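
To make the quoted proposal concrete, here is a minimal Python sketch of the dispatch it describes: emit a {{thetaSketch}} or {{hyperUnique}} aggregator only when the column's type matches. This is not the Calcite implementation (the real {{getJsonAggregation}} is a Java method in {{DruidQuery}}); the function name, its parameters and the column names are hypothetical. Only the JSON keys {{type}}, {{name}} and {{fieldName}} are taken from Druid's aggregator format.

```python
def approx_distinct_aggregation(name, field_name, druid_type):
    """Return the Druid aggregator JSON for an approximate distinct count.

    druid_type is the column's type as recorded in the adapter's metadata
    (an assumption of this sketch); only sketch-typed columns are accepted.
    """
    if druid_type in ("thetaSketch", "hyperUnique"):
        # Both aggregators share the same minimal shape.
        return {"type": druid_type, "name": name, "fieldName": field_name}
    raise ValueError(
        "column %r has type %r, which has no sketch aggregator"
        % (field_name, druid_type)
    )


theta = approx_distinct_aggregation("unique_users", "user_id_sketch", "thetaSketch")
hyper = approx_distinct_aggregation("unique_users", "user_unique", "hyperUnique")
print(theta)
print(hyper)
```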



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
