[GitHub] [spark] srowen commented on issue #26762: [SPARK-30131] add array_median function
srowen commented on issue #26762: [SPARK-30131] add array_median function
URL: https://github.com/apache/spark/pull/26762#issuecomment-562098756

Yes, those are pretty different use cases; `approx_median` would also not compute a median over a whole column, and indeed, would be prohibitive enough to compute that I think we assume it isn't desirable vs a pretty tight bound on the median. (IIRC there are helper aggregator classes you can use to do it anyway.) The other `array_*` functions are typically there for Hive parity and otherwise would be something users just apply a UDF for. It's not crazy; I just think opinion has turned against adding things that aren't in Hive or standard SQL from here on.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] srowen commented on issue #26762: [SPARK-30131] add array_median function
srowen commented on issue #26762: [SPARK-30131] add array_median function
URL: https://github.com/apache/spark/pull/26762#issuecomment-562091119

That's a pretty different function: it computes a quantile over the whole data set. This is a function of a single array value.
[GitHub] [spark] srowen commented on issue #26762: [SPARK-30131] add array_median function
srowen commented on issue #26762: [SPARK-30131] add array_median function
URL: https://github.com/apache/spark/pull/26762#issuecomment-561892651

Does this exist in any other DBs? That would be the reason to add it, but even then, I think we're generally not adding long-tail non-standard functions from other DBs anymore. You can do this with a UDF.
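
For readers wondering what the UDF route mentioned in this thread might look like, here is a minimal sketch in Python. It is not code from the pull request: the function name `array_median`, the helper `register_array_median`, and the null handling (null or empty arrays yield null, null elements are skipped) are all assumptions made for illustration. The median logic is plain Python; PySpark is only imported if the registration helper is actually called.

```python
from statistics import median
from typing import Optional, Sequence


def array_median(values: Optional[Sequence[Optional[float]]]) -> Optional[float]:
    """Median of a single array value (one row), not of a whole column.

    Assumed semantics: a null or empty array gives None, and null
    elements inside the array are ignored.
    """
    if values is None:
        return None
    present = [v for v in values if v is not None]  # drop null elements
    if not present:
        return None
    # statistics.median averages the two middle values for even lengths
    return float(median(present))


def register_array_median(spark):
    """Register array_median as a SQL UDF on an existing SparkSession.

    Hypothetical helper: PySpark is imported lazily so the function
    above stays usable without Spark installed.
    """
    from pyspark.sql.types import DoubleType

    return spark.udf.register("array_median", array_median, DoubleType())
```

After calling `register_array_median(spark)`, a query such as `SELECT array_median(scores) FROM t` (with `scores` and `t` as placeholder names) would apply the function to each row's array, which is the per-value behaviour the thread distinguishes from a whole-column quantile.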