Jesus Camacho Rodriguez created CALCITE-1578:
------------------------------------------------
Summary: Druid adapter: wrong semantics of topN query limit with
granularity
Key: CALCITE-1578
URL: https://issues.apache.org/jira/browse/CALCITE-1578
Project: Calcite
Issue Type: Bug
Components: druid
Affects Versions: 1.11.0
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez
Priority: Critical
Semantics of Druid topN query with limit and granularity is not equivalent to
input SQL. In particular, limit is applied on each granularity value, not on
the overall query.
Currently, the following query will be transformed into a topN query:
{code:sql}
SELECT i_brand_id, floor_day(`__time`), max(ss_quantity),
sum(ss_wholesale_cost) as s
FROM store_sales_sold_time_subset
GROUP BY i_brand_id, floor_day(`__time`)
ORDER BY s DESC
LIMIT 10;
{code}
Previous query outputs at most 10 rows. In turn, the equivalent SQL query for a
Druid topN query should be expressed as:
{code:sql}
SELECT rs.i_brand_id, rs.d, rs.m, rs.s
FROM (
SELECT i_brand_id, floor_day(`__time`) as d, max(ss_quantity) as m,
sum(ss_wholesale_cost) as s,
ROW_NUMBER() OVER (PARTITION BY floor_day(`__time`) ORDER BY
sum(ss_wholesale_cost) DESC ) AS rownum
FROM store_sales_sold_time_subset
GROUP BY i_brand_id, floor_day(`__time`)
) rs
WHERE rownum <= 10;
{code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)