alexch2000 opened a new issue, #12060:
URL: https://github.com/apache/pinot/issues/12060
We are experiencing a significant performance issue in Pinot, specifically
related to how DateTime field filtering is applied in our queries.
**Dataset Description**: Our dataset consists of a large table containing
event records. Each record includes fields like `name`, `stage`, `status`, and
a `timestamp`.
**Query Performance Issue**: We've observed that the location of the
DateTime filter in our query significantly impacts the query's performance.
There is a noticeable performance degradation – up to 90% slower – depending on
how and where the DateTime filter is applied.
**Query Example**: Initially, when the timestamp filter is not the first
condition in the query, we notice that almost 80% of the documents are scanned,
and the query completion time is around 8.5 seconds. Below is an example of the
query generated by our custom connector to Pinot
```
SELECT
name,
sum(CAST(CASE WHEN ((status = 'success') AND (stage = 'start')) THEN 1
ELSE 0 END AS LONG))
FROM
events_table
WHERE
((status = 'success') AND (stage IN ('complete', 'start'))
AND (((timestamp BETWEEN 1699920000000 AND 1701129600000) AND (stage =
'start')) OR (stage = 'complete')
)
)
GROUP BY
name
LIMIT
10000
```
As you can see that almost 80% of docs were scanned + time to finish the
query is around 8.5s.
<img width="685" alt="Screenshot 2023-11-28 at 14 56 39"
src="https://github.com/apache/pinot/assets/994452/7f9764fb-c8e5-4bf5-baac-797a381f4358">
**Improved Performance with Adjusted Query**: However, when we move the
timestamp check to the first condition in the query, there is a significant
improvement. The query completion time drops to approximately 5 seconds, and
far fewer documents are scanned.
```
SELECT
name,
sum(CAST(CASE WHEN ((status = 'success') AND (stage = 'start')) THEN 1
ELSE 0 END AS LONG))
FROM
events_table
WHERE
timestamp BETWEEN 1699920000000 AND 1701129600000
AND (
(
(status = 'success')
AND (stage IN ('complete', 'start'))
)
AND (
(stage = 'start')
OR (stage = 'complete')
)
)
GROUP BY
name
LIMIT
10000
```
<img width="677" alt="Screenshot 2023-11-28 at 14 36 49"
src="https://github.com/apache/pinot/assets/994452/b9126c9c-e885-4129-b04d-282b53a8a9b4">
We expect the query performance to be consistent regardless of the
positioning of the DateTime filter. However, the current behavior shows a
critical variance in execution times.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]