[
https://issues.apache.org/jira/browse/CALCITE-1578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15828493#comment-15828493
]
Nishant Bangarwa commented on CALCITE-1578:
-------------------------------------------
Here is a sample query using extraction function, in case it helps -
{code}
{
"queryType": "groupBy",
"dataSource": "druid_tpcds_ss_sold_time_subset",
"granularity": "ALL",
"dimensions": [
"i_brand_id",
{
"type" : "extraction",
"dimension" : "__time",
"outputName" : "year",
"extractionFn" : {
"type" : "timeFormat",
"granularity" : "YEAR"
}
}
],
"limitSpec": {
"type": "default",
"limit": 10,
"columns": [
{
"dimension": "$f3",
"direction": "ascending"
}
]
},
"aggregations": [
{
"type": "longMax",
"name": "$f2",
"fieldName": "ss_quantity"
},
{
"type": "doubleSum",
"name": "$f3",
"fieldName": "ss_wholesale_cost"
}
],
"intervals": [
"1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z"
]
}
{code}
> Druid adapter: wrong semantics of topN query limit with granularity
> -------------------------------------------------------------------
>
> Key: CALCITE-1578
> URL: https://issues.apache.org/jira/browse/CALCITE-1578
> Project: Calcite
> Issue Type: Bug
> Components: druid
> Affects Versions: 1.11.0
> Reporter: Jesus Camacho Rodriguez
> Assignee: Julian Hyde
> Priority: Critical
>
> Semantics of Druid topN query with limit and granularity is not equivalent to
> input SQL. In particular, limit is applied on each granularity value, not on
> the overall query.
> Currently, the following query will be transformed into a topN query:
> {code:sql}
> SELECT i_brand_id, floor_day(`__time`), max(ss_quantity),
> sum(ss_wholesale_cost) as s
> FROM store_sales_sold_time_subset
> GROUP BY i_brand_id, floor_day(`__time`)
> ORDER BY s DESC
> LIMIT 10;
> {code}
> Previous query outputs at most 10 rows. In turn, the equivalent SQL query for
> a Druid topN query should be expressed as:
> {code:sql}
> SELECT rs.i_brand_id, rs.d, rs.m, rs.s
> FROM (
> SELECT i_brand_id, floor_day(`__time`) as d, max(ss_quantity) as m,
> sum(ss_wholesale_cost) as s,
> ROW_NUMBER() OVER (PARTITION BY floor_day(`__time`) ORDER BY
> sum(ss_wholesale_cost) DESC ) AS rownum
> FROM store_sales_sold_time_subset
> GROUP BY i_brand_id, floor_day(`__time`)
> ) rs
> WHERE rownum <= 10;
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)