sergei3000 opened a new issue, #16527:
URL: https://github.com/apache/druid/issues/16527

   Query plan in web ui shows the intervals are fully scanned when there's
   1) a `BETWEEN` clause on `DATE_TRUNC('DAY`, __time)`
   2) a `NOT IN` clause containing a list of dates with some adjacent days that 
are inside the `between` interval
   3) some other filter on some other column
   in the query.
   
   ### Affected Versi
   
   29.0.1
   
   ### Description
   
   I believe there's some problem/bug with the query planner.
   
   For example, here's a query that works in my setup (it's a small cluster of 
several servers):
   ```sql
   SELECT *
   FROM mytable
   WHERE __time BETWEEN '2024-03-01' and '2024-03-10' and mycol = 'ABC'
   and DATE_TRUNC('day', __time) not in (TIMESTAMP '2024-03-05',  '2024-03-07')
   ```
   (I changed the names of columns/tables except for `__time`)
   By "works" I mean it works fast and the intervals are optimal in the plan:
   ```
     "intervals": {
       "type": "intervals",
       "intervals": [
         "2024-03-01T00:00:00.000Z/2024-03-05T00:00:00.000Z",
         "2024-03-06T00:00:00.000Z/2024-03-07T00:00:00.000Z",
         "2024-03-08T00:00:00.000Z/2024-03-10T00:00:00.001Z"
       ]
     },
   ```
   
   But once I change the dates list in the `not in` part to have two adjacent 
dates:
   ```sql
   SELECT *
   FROM mytable
   WHERE __time BETWEEN '2024-03-01' and '2024-03-10' and mycol = 'ABC'
   and DATE_TRUNC('day', __time) not in (TIMESTAMP '2024-03-05', '2024-03-06')
   ```
   intervals get a full scan (and the query hangs forever):
   ```
     "intervals": {
       "type": "intervals",
       "intervals": [
         "-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z"
       ]
     },
   ```
   
   Another interesting case is when I take the problematic query and delete the 
additional filter by another column (note the commented out `and mycol = 'ABC'` 
in this one):
   ```sql
   SELECT *
   FROM mytable
   WHERE __time BETWEEN '2024-03-01' and '2024-03-10' --and symbol = 'SPY'
   and DATE_TRUNC('day', __time) not in (TIMESTAMP '2024-03-05', '2024-03-06')
   ```
   This time the intervals are just as expected and the query is fast again:
   ```
     "intervals": {
       "type": "intervals",
       "intervals": [
         "2024-03-01T00:00:00.000Z/2024-03-05T00:00:00.000Z",
         "2024-03-07T00:00:00.000Z/2024-03-10T00:00:00.001Z"
       ]
     },
   ```
   
   Anyways, I expected the query planner/optimizer should always understand the 
constraints set by the dates in my `between` clause in all cases.
   
   Let me know if you want any other info, I'll try to provide whatever I can.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org

Reply via email to