[GitHub] [spark] srowen commented on issue #26762: [SPARK-30131] add array_median function

2019-12-05 Thread GitBox
srowen commented on issue #26762: [SPARK-30131] add array_median function 
URL: https://github.com/apache/spark/pull/26762#issuecomment-562098756
 
 
   Yes, those are pretty different use cases; `approx_median` would not compute
an exact median over a whole column either. Indeed, an exact median would be
expensive enough to compute that I think we assume it isn't desirable compared
with a pretty tight bound on the median. (IIRC there are helper aggregator
classes you can use to do it anyway.)
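To make "a pretty tight bound on the median" concrete, here is a minimal sketch assuming a rank-error guarantee of the form documented for Spark's `DataFrameStatFunctions.approxQuantile`: for a target quantile p and relative error err over N values, an acceptable answer x satisfies floor((p - err) * N) <= rank(x) <= ceil((p + err) * N). This is plain Python, not Spark code. The helper name `within_rank_bound` is hypothetical, and counting rank as "number of elements <= x" is an assumption, since ties make rank ambiguous.

```python
import math
from typing import Sequence


def within_rank_bound(data: Sequence[float], candidate: float,
                      p: float, err: float) -> bool:
    """Hypothetical helper: does `candidate` satisfy a rank-error bound?

    Accepts candidate if floor((p - err) * N) <= rank <= ceil((p + err) * N),
    where rank is assumed to be the count of elements <= candidate.
    """
    n = len(data)
    # Assumed rank definition: how many values are <= the candidate.
    rank = sum(1 for x in data if x <= candidate)
    lo = math.floor((p - err) * n)
    hi = math.ceil((p + err) * n)
    return lo <= rank <= hi
```

For example, with the integers 1 to 100, p = 0.5 and err = 0.05, any value whose rank falls roughly between 45 and 55 is acceptable, so 52 passes and 80 does not.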
   
   The other `array_*` functions are typically there for Hive parity;
otherwise they would be something users just apply a UDF for. It's not crazy;
I just think opinion has turned against adding things that aren't in Hive or
standard SQL from here on.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] srowen commented on issue #26762: [SPARK-30131] add array_median function

2019-12-05 Thread GitBox
srowen commented on issue #26762: [SPARK-30131] add array_median function 
URL: https://github.com/apache/spark/pull/26762#issuecomment-562091119
 
 
   That's a pretty different function: it computes a quantile over the whole 
data set. This is a function of a single array value.
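The distinction between a per-row array function and an aggregate over the whole data set can be sketched in plain Python, using the standard library's `statistics.median` rather than any Spark API. The toy data is made up for illustration; each inner list stands in for one row's array value.

```python
import statistics
from typing import List

# Hypothetical toy data: each inner list is one row's array value.
rows: List[List[float]] = [[1.0, 5.0, 3.0], [10.0, 20.0], [7.0]]

# Per-row: one median per array value, as an array_median-style function would give.
per_row = [statistics.median(arr) for arr in rows]

# Whole data set: a single median over every element across all rows,
# the kind of result an aggregate quantile targets (computed exactly here).
all_values = [x for arr in rows for x in arr]
whole = statistics.median(all_values)
```

Here `per_row` is `[3.0, 15.0, 7.0]`, one value per row, while `whole` is a single value, `6.0`, for the entire column.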





[GitHub] [spark] srowen commented on issue #26762: [SPARK-30131] add array_median function

2019-12-04 Thread GitBox
srowen commented on issue #26762: [SPARK-30131] add array_median function 
URL: https://github.com/apache/spark/pull/26762#issuecomment-561892651
 
 
   Does this exist in any other DBs? That would be the reason to add it, but
even then, I think we're generally not adding long-tail, non-standard functions
from other DBs anymore. You can do this with a UDF.
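A minimal sketch of the per-row logic such a UDF might wrap, in plain Python. The name `array_median` is borrowed from the PR title for illustration only and is not a Spark built-in; the null handling (null array or no non-null elements gives None, null elements are skipped) and averaging the two middle values for even-length arrays are assumptions, not semantics taken from the PR.

```python
from typing import Optional, Sequence


def array_median(values: Optional[Sequence[Optional[float]]]) -> Optional[float]:
    """Hypothetical exact median of a single array value.

    Assumed semantics: a null array, or one with no non-null elements,
    returns None; null elements are ignored; an even-length array returns
    the mean of its two middle elements.
    """
    if values is None:
        return None
    # Drop null elements, then sort what remains.
    xs = sorted(float(v) for v in values if v is not None)
    n = len(xs)
    if n == 0:
        return None
    mid = n // 2
    if n % 2 == 1:
        return xs[mid]
    # Even count: average the two middle values.
    return (xs[mid - 1] + xs[mid]) / 2.0
```

In PySpark, a function like this could then be wrapped with `pyspark.sql.functions.udf(array_median, DoubleType())` (with `DoubleType` from `pyspark.sql.types`) and applied to an array column, at the usual cost of Python UDF serialization.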

