[jira] [Commented] (CALCITE-1578) Druid adapter: wrong semantics of topN query limit with granularity

Nishant Bangarwa (JIRA) Wed, 18 Jan 2017 10:15:44 -0800

    [ 
https://issues.apache.org/jira/browse/CALCITE-1578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15828493#comment-15828493
 ]


Nishant Bangarwa commented on CALCITE-1578:
-------------------------------------------

Here is a sample query using extraction function, in case it helps - 
{code}
{
  "queryType": "groupBy",
  "dataSource": "druid_tpcds_ss_sold_time_subset",
  "granularity": "ALL",
  "dimensions": [
    "i_brand_id",
    {
      "type" : "extraction",
      "dimension" : "__time",
      "outputName" :  "year",
      "extractionFn" : {
        "type" : "timeFormat",
        "granularity" : "YEAR"
      }
    }
  ],
  "limitSpec": {
    "type": "default",
    "limit": 10,
    "columns": [
      {
        "dimension": "$f3",
        "direction": "ascending"
      }
    ]
  },
  "aggregations": [
    {
      "type": "longMax",
      "name": "$f2",
      "fieldName": "ss_quantity"
    },
    {
      "type": "doubleSum",
      "name": "$f3",
      "fieldName": "ss_wholesale_cost"
    }
  ],
  "intervals": [
    "1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z"
  ]
}
{code}

> Druid adapter: wrong semantics of topN query limit with granularity
> -------------------------------------------------------------------
>
>                 Key: CALCITE-1578
>                 URL: https://issues.apache.org/jira/browse/CALCITE-1578
>             Project: Calcite
>          Issue Type: Bug
>          Components: druid
>    Affects Versions: 1.11.0
>            Reporter: Jesus Camacho Rodriguez
>            Assignee: Julian Hyde
>            Priority: Critical
>
> Semantics of Druid topN query with limit and granularity is not equivalent to 
> input SQL. In particular, limit is applied on each granularity value, not on 
> the overall query.
> Currently, the following query will be transformed into a topN query:
> {code:sql}
> SELECT i_brand_id, floor_day(`__time`), max(ss_quantity), 
> sum(ss_wholesale_cost) as s
> FROM store_sales_sold_time_subset
> GROUP BY i_brand_id, floor_day(`__time`)
> ORDER BY s DESC
> LIMIT 10;
> {code}
> Previous query outputs at most 10 rows. In turn, the equivalent SQL query for 
> a Druid topN query should be expressed as:
> {code:sql}
> SELECT rs.i_brand_id, rs.d, rs.m, rs.s
> FROM (
>     SELECT i_brand_id, floor_day(`__time`) as d, max(ss_quantity) as m, 
> sum(ss_wholesale_cost) as s,
>            ROW_NUMBER() OVER (PARTITION BY floor_day(`__time`) ORDER BY 
> sum(ss_wholesale_cost) DESC ) AS rownum
>     FROM store_sales_sold_time_subset
>     GROUP BY i_brand_id, floor_day(`__time`)
> ) rs
> WHERE rownum <= 10;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (CALCITE-1578) Druid adapter: wrong semantics of topN query limit with granularity

Reply via email to