domodwyer opened a new issue #1538: URL: https://github.com/apache/arrow-datafusion/issues/1538
**Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
I would like to efficiently aggregate (approximate) quantile values from a column of data - for example, "show me the 99th percentile of the latency column in the requests table".

**Describe the solution you'd like**
Implement TDigest (or a similar algorithm) to provide relatively cheap quantile values/estimations.

**Describe alternatives you've considered**
I've had a look at some other DBs:

* duckdb - tdigest & reservoir sampling
* timescaledb - tdigest & uddsketch
* snowflake - several options, including tdigest for cheap approximations
* presto - qdigest
* influxdb - tdigest

For approximate results, tdigest seems popular, though the uddsketch paper is relatively new and also interesting.

**Additional context**
Tdigest provides quantile estimations; I imagine it would expose an `approx_quantile(column, quantile)` aggregation, in keeping with the naming of the `approx_distinct()` aggregation.

Example:

```sql
SELECT approx_quantile(latency, 0.99) AS p99 FROM requests;
```

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
