kgyrtkirk commented on issue #16728:
URL: https://github.com/apache/druid/issues/16728#issuecomment-2224951243

   interesting; I think these things could be supported in general with some 
kind of semijoin
   
   however for the above query I think there is an alterante way to write the 
query:
   instead of:
   ```
     (SELECT "page", sum(added) AS "added" FROM "wikipedia"  WHERE "__time" >= 
'2015-09-12T00:00:00.000Z' AND "__time" < '2015-09-12T01:00:00.000Z' GROUP BY 1 
ORDER BY "added" DESC LIMIT 33) base 
   LEFT OUTER JOIN 
     (SELECT "page", sum(added) AS "added" FROM "wikipedia" WHERE "__time" >= 
'2015-09-12T01:00:00.000Z' AND "__time" < '2015-09-12T20:00:00.000Z' GROUP BY 
1) comparison 
   ON 
     (base."page" IS NOT DISTINCT FROM comparison."page") 
   ```
   
   a single query could be given to collect both aggregates
   ```
   SELECT "page", 
       sum(added) filter (where "__time" >= '2015-09-12T00:00:00.000Z' AND 
"__time" < '2015-09-12T01:00:00.000Z') as base_added,
       sum(added) filter (where "__time" >= '2015-09-12T01:00:00.000Z' AND 
"__time" < '2015-09-12T20:00:00.000Z')  AS comparison_added
     FROM "wikipedia"  WHERE "__time" >= '2015-09-12T00:00:00.000Z' AND 
"__time" < '2015-09-12T20:00:00.000Z' GROUP BY 1 ORDER BY "base_added" DESC 
LIMIT 33
   ```
   
   I believe this rewrite could not be done in all cases; but might worth a 
try; could you check if this would help? or if yes how much?
   
   not sure if the input query could be recognized and rewritten to this one 
automatically...it doesn't seem impossible; but right now I feel like that 
would be a bit fragile
   
   <details>
   <summary>queries used/etc
   </summary>
   
   modified base query
   ```sql
   SELECT 
     (COALESCE(base."page", comparison."page")) AS "page", 
     (ANY_VALUE(base."added")) AS "added",
     (ANY_VALUE(comparison."added")) AS "added_prev",
     (ANY_VALUE(base."added" - comparison."added")) AS "added_delta" 
   FROM 
     (SELECT "page", sum(added) AS "added" FROM "wikipedia"  WHERE "__time" >= 
'2015-09-12T00:00:00.000Z' AND "__time" < '2015-09-12T01:00:00.000Z' GROUP BY 1 
ORDER BY "added" DESC LIMIT 33) base 
   LEFT OUTER JOIN 
     (SELECT "page", sum(added) AS "added" FROM "wikipedia" WHERE "__time" >= 
'2015-09-12T01:00:00.000Z' AND "__time" < '2015-09-12T20:00:00.000Z' GROUP BY 
1) comparison 
   ON 
     (base."page" IS NOT DISTINCT FROM comparison."page") 
   GROUP BY 1
   ORDER BY "added" DESC 
   LIMIT 33
   ```
   plan is to join a topN and a gby result
   
   rewritten query
   ```
   SELECT 
     page, 
     (ANY_VALUE(base_added)) AS "added",
     (ANY_VALUE(comparison_added)) AS "added_prev",
     (ANY_VALUE(base_added - comparison_added)) AS "added_delta" 
   FROM 
     (SELECT "page", 
       sum(added) filter (where "__time" >= '2015-09-12T00:00:00.000Z' AND 
"__time" < '2015-09-12T01:00:00.000Z') as base_added,
       sum(added) filter (where "__time" >= '2015-09-12T01:00:00.000Z' AND 
"__time" < '2015-09-12T20:00:00.000Z')  AS comparison_added
     FROM "wikipedia"  WHERE "__time" >= '2015-09-12T00:00:00.000Z' AND 
"__time" < '2015-09-12T20:00:00.000Z' GROUP BY 1 ORDER BY "base_added" DESC 
LIMIT 33) base 
   GROUP BY 1
   ORDER BY "added" DESC 
   LIMIT 33
   ```
   
   plan is to do a gby on a topn
   </details>
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org

Reply via email to