cgivre opened a new pull request #2033: DRILL-7652: Add time_bucket() function for time series analysis
URL: https://github.com/apache/drill/pull/2033

# [DRILL-7652](https://issues.apache.org/jira/browse/DRILL-7652): Add time_bucket() function for Time Series Analysis

## Description

This PR adds two UDFs that facilitate time series analysis. It also updates the `README.md` in the `contrib/udf` folder to document the new UDFs.

## Documentation

These functions are useful for time series analysis because they group data into arbitrary intervals. See https://blog.timescale.com/blog/simplified-time-series-analytics-using-the-time_bucket-function/ for more examples.

There are two versions of the function:

* `time_bucket(<timestamp>, <interval>)`
* `time_bucket_ns(<timestamp>, <interval>)`

Both functions accept a `BIGINT` timestamp and an interval in milliseconds as arguments. The `time_bucket_ns()` function accepts timestamps in nanoseconds and `time_bucket()` accepts timestamps in milliseconds. Both return timestamps in the same unit as the input.

### Example:

The query below calculates the average of the `cpu` metric for every five-minute interval. The interval is given in milliseconds, so five minutes is 300000.

```sql
SELECT time_bucket(time_stamp, 300000) AS five_min, avg(cpu)
FROM metrics
GROUP BY five_min
ORDER BY five_min DESC
LIMIT 12;
```

## Testing

There are a series of unit tests included with this PR.
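The bucketing described under Documentation can be illustrated with a minimal Java sketch. This is not the code from the PR: the class name `TimeBucketSketch` and its method names are hypothetical, and it assumes each bucket starts at a multiple of the interval counted from the epoch, so a timestamp is rounded down to the nearest such multiple. For the nanosecond variant, the millisecond interval is converted to nanoseconds before rounding.

```java
// Hypothetical sketch of the arithmetic behind time_bucket() and
// time_bucket_ns(); the actual UDFs in the PR may be implemented differently.
final class TimeBucketSketch {

    private static final long NANOS_PER_MILLI = 1_000_000L;

    private TimeBucketSketch() {
    }

    // Millisecond timestamp, millisecond interval: round the timestamp down
    // to the start of the bucket that contains it.
    static long timeBucket(long timestampMs, long intervalMs) {
        if (intervalMs <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        // floorMod keeps the remainder non-negative, so timestamps before
        // the epoch also round down rather than toward zero.
        return timestampMs - Math.floorMod(timestampMs, intervalMs);
    }

    // Nanosecond timestamp, millisecond interval: convert the interval to
    // nanoseconds, then apply the same rounding.
    static long timeBucketNs(long timestampNs, long intervalMs) {
        if (intervalMs <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        // multiplyExact throws ArithmeticException instead of overflowing.
        long intervalNs = Math.multiplyExact(intervalMs, NANOS_PER_MILLI);
        return timestampNs - Math.floorMod(timestampNs, intervalNs);
    }
}
```

Under these assumptions, `TimeBucketSketch.timeBucket(1585699437000L, 300000L)` returns `1585699200000L`, the start of the five-minute bucket containing the input, and the nanosecond variant returns the same instant expressed in nanoseconds.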
