[jira] [Commented] (CALCITE-1787) thetaSketch Support for Druid Adapter

Julian Hyde (JIRA) Tue, 06 Jun 2017 14:08:32 -0700

    [ 
https://issues.apache.org/jira/browse/CALCITE-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16039640#comment-16039640
 ]


Julian Hyde commented on CALCITE-1787:
--------------------------------------

Regarding [~bslim]'s proposal to use user-defined aggregate functions: the 
experience for the end user wouldn't be quite as pleasant, because the tool 
would have to know about the aggregate function and also about which sketch 
columns are available. But I wouldn't object to it.

My position remains that sketches are an implementation detail, and that if you 
include them in the query the model is no longer declarative. It's exactly 
analogous to requiring users to rewrite their queries to reference the hidden 
table and columns that store a b-tree index if they want to use that index in a 
query. So the abstract "user_id" column, and a mapping onto its "user_unique" 
sketch column applied automatically by the planner, would still be the ideal 
solution.
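
As a rough illustration of that mapping (a minimal sketch only, not code from 
Calcite's Druid adapter): the class name, the lookup table, and the method 
below are all hypothetical. The only part taken from Druid itself is the shape 
of the aggregator JSON ("type", "name", "fieldName"). The idea is that the 
query mentions only "user_id", and a planner-side lookup decides whether a 
sketch column such as "user_unique" can answer a distinct count.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration; not part of Calcite's Druid adapter. */
final class SketchColumnMapping {
  /** Logical column name -> {sketch column name, Druid aggregator type}. */
  private static final Map<String, String[]> SKETCHES = new LinkedHashMap<>();

  static {
    // Assumed metadata: "user_unique" is a thetaSketch column built over "user_id".
    SKETCHES.put("user_id", new String[] {"user_unique", "thetaSketch"});
  }

  /**
   * Returns the Druid JSON aggregation that could answer a distinct count
   * on the given logical column, or null if no sketch is registered for it.
   */
  static String countDistinctAggregation(String column, String outputName) {
    String[] sketch = SKETCHES.get(column);
    if (sketch == null) {
      return null; // no sketch available; the caller must plan the query another way
    }
    return "{\"type\":\"" + sketch[1] + "\","
        + "\"name\":\"" + outputName + "\","
        + "\"fieldName\":\"" + sketch[0] + "\"}";
  }

  public static void main(String[] args) {
    // The user-facing query refers to user_id only; the sketch column is chosen here.
    System.out.println(countDistinctAggregation("user_id", "distinct_users"));
  }
}
```

With this shape, a query that counts distinct "user_id" values never names 
"user_unique"; a column with no registered sketch simply falls through to 
whatever plan would have been used otherwise.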

> thetaSketch Support for Druid Adapter
> -------------------------------------
>
>                 Key: CALCITE-1787
>                 URL: https://issues.apache.org/jira/browse/CALCITE-1787
>             Project: Calcite
>          Issue Type: New Feature
>          Components: druid
>    Affects Versions: 1.12.0
>            Reporter: Zain Humayun
>            Assignee: Zain Humayun
>            Priority: Minor
>
> Currently, the Druid adapter does not support the 
> [thetaSketch|http://druid.io/docs/latest/development/extensions-core/datasketches-aggregators.html]
>  aggregate type, which is used to measure the cardinality of a column 
> quickly. Many Druid instances support theta sketches, so I think it would be 
> a nice feature to have.
> I've been looking at the Druid adapter, and propose we add a new DruidType 
> called {{thetaSketch}} and then add logic in the {{getJsonAggregation}} 
> method in class {{DruidQuery}} to generate the {{thetaSketch}} aggregate. 
> This will require accessing information about the columns (what data type 
> they are) so that the thetaSketch aggregate is only produced if the column's 
> type is {{thetaSketch}}. 
> Also, I've noticed that a {{hyperUnique}} DruidType is currently defined, but 
> a {{hyperUnique}} aggregate is never produced. Since both are approximate 
> aggregators, I could also couple in the logic for {{hyperUnique}}.
> I'd love to hear your thoughts on my approach, and any suggestions you have 
> for this feature.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
