[GitHub] [arrow-datafusion] realno commented on a change in pull request #1539: approx_quantile() aggregation function

GitBox Sat, 29 Jan 2022 12:22:08 -0800


realno commented on a change in pull request #1539:
URL: https://github.com/apache/arrow-datafusion/pull/1539#discussion_r795090321




##########
File path: datafusion/tests/sql/aggregates.rs
##########
@@ -316,6 +316,95 @@ async fn csv_query_approx_count() -> Result<()> {
     Ok(())
 }
 
+// This test executes the APPROX_QUANTILE aggregation against the test data,
+// asserting the estimated quantiles are ±5% their actual values.
+//
+// Actual quantiles calculated with:
+//
+// ```r
+// read_csv("./testing/data/csv/aggregate_test_100.csv") |>
+//     select_if(is.numeric) |>
+//     summarise_all(~ quantile(., c(0.1, 0.5, 0.9)))
+// ```
+//
+// Giving:
+//
+// ```text
+//      c2    c3      c4           c5       c6    c7     c8          c9     
c10   c11    c12
+//   <dbl> <dbl>   <dbl>        <dbl>    <dbl> <dbl>  <dbl>       <dbl>   
<dbl> <dbl>  <dbl>
+// 1     1 -95.3 -22925. -1882606710  -7.25e18  18.9  2671.  472608672. 
1.83e18 0.109 0.0714
+// 2     3  15.5   4599    377164262   1.13e18 134.  30634  2365817608. 
9.30e18 0.491 0.551
+// 3     5 102.   25334.  1991374996.  7.37e18 231   57518. 3776538487. 
1.61e19 0.834 0.946
+// ```
+//
+// Column `c12` is omitted due to a large relative error (~10%) due to the 
small
+// float values.
+#[tokio::test]
+async fn csv_query_approx_quantile() -> Result<()> {

Review comment:
       Looked at it again, it should be fine since the compute logic is tested 
in the algo. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [arrow-datafusion] realno commented on a change in pull request #1539: approx_quantile() aggregation function

Reply via email to