kgyrtkirk commented on issue #16728: URL: https://github.com/apache/druid/issues/16728#issuecomment-2224951243
interesting; I think these things could be supported in general with some kind of semijoin however for the above query I think there is an alterante way to write the query: instead of: ``` (SELECT "page", sum(added) AS "added" FROM "wikipedia" WHERE "__time" >= '2015-09-12T00:00:00.000Z' AND "__time" < '2015-09-12T01:00:00.000Z' GROUP BY 1 ORDER BY "added" DESC LIMIT 33) base LEFT OUTER JOIN (SELECT "page", sum(added) AS "added" FROM "wikipedia" WHERE "__time" >= '2015-09-12T01:00:00.000Z' AND "__time" < '2015-09-12T20:00:00.000Z' GROUP BY 1) comparison ON (base."page" IS NOT DISTINCT FROM comparison."page") ``` a single query could be given to collect both aggregates ``` SELECT "page", sum(added) filter (where "__time" >= '2015-09-12T00:00:00.000Z' AND "__time" < '2015-09-12T01:00:00.000Z') as base_added, sum(added) filter (where "__time" >= '2015-09-12T01:00:00.000Z' AND "__time" < '2015-09-12T20:00:00.000Z') AS comparison_added FROM "wikipedia" WHERE "__time" >= '2015-09-12T00:00:00.000Z' AND "__time" < '2015-09-12T20:00:00.000Z' GROUP BY 1 ORDER BY "base_added" DESC LIMIT 33 ``` I believe this rewrite could not be done in all cases; but might worth a try; could you check if this would help? or if yes how much? not sure if the input query could be recognized and rewritten to this one automatically...it doesn't seem impossible; but right now I feel like that would be a bit fragile <details> <summary>queries used/etc </summary> modified base query ```sql SELECT (COALESCE(base."page", comparison."page")) AS "page", (ANY_VALUE(base."added")) AS "added", (ANY_VALUE(comparison."added")) AS "added_prev", (ANY_VALUE(base."added" - comparison."added")) AS "added_delta" FROM (SELECT "page", sum(added) AS "added" FROM "wikipedia" WHERE "__time" >= '2015-09-12T00:00:00.000Z' AND "__time" < '2015-09-12T01:00:00.000Z' GROUP BY 1 ORDER BY "added" DESC LIMIT 33) base LEFT OUTER JOIN (SELECT "page", sum(added) AS "added" FROM "wikipedia" WHERE "__time" >= '2015-09-12T01:00:00.000Z' AND "__time" < '2015-09-12T20:00:00.000Z' GROUP BY 1) comparison ON (base."page" IS NOT DISTINCT FROM comparison."page") GROUP BY 1 ORDER BY "added" DESC LIMIT 33 ``` plan is to join a topN and a gby result rewritten query ``` SELECT page, (ANY_VALUE(base_added)) AS "added", (ANY_VALUE(comparison_added)) AS "added_prev", (ANY_VALUE(base_added - comparison_added)) AS "added_delta" FROM (SELECT "page", sum(added) filter (where "__time" >= '2015-09-12T00:00:00.000Z' AND "__time" < '2015-09-12T01:00:00.000Z') as base_added, sum(added) filter (where "__time" >= '2015-09-12T01:00:00.000Z' AND "__time" < '2015-09-12T20:00:00.000Z') AS comparison_added FROM "wikipedia" WHERE "__time" >= '2015-09-12T00:00:00.000Z' AND "__time" < '2015-09-12T20:00:00.000Z' GROUP BY 1 ORDER BY "base_added" DESC LIMIT 33) base GROUP BY 1 ORDER BY "added" DESC LIMIT 33 ``` plan is to do a gby on a topn </details> -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org For additional commands, e-mail: commits-h...@druid.apache.org