[GitHub] [spark] hagerf commented on issue #26762: [SPARK-30131] add array_median function

2019-12-05 Thread GitBox

hagerf commented on issue #26762: [SPARK-30131] add array_median function 
URL: https://github.com/apache/spark/pull/26762#issuecomment-562107475
 
 
   @srowen OK, I see. If it's really that restrictive, then users can use other 
functions for this, even though I think it could be a popular addition used by 
many. 
   So should I close this PR, or ask for other people's opinions on the matter?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] hagerf commented on issue #26762: [SPARK-30131] add array_median function

2019-12-05 Thread GitBox

hagerf commented on issue #26762: [SPARK-30131] add array_median function 
URL: https://github.com/apache/spark/pull/26762#issuecomment-562097725
 
 
   Yes, of course. But we have the `approx` prefix because calculating an exact 
median over a whole dataset is difficult to do efficiently. So users who want 
an exact median are forced to use RDDs, or a UDF on arrays if the data fits 
in an array. 
   My point was: there is no exact median or percentile functionality at all in 
Spark. This would help for some subset of those use cases.
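For reference, the UDF workaround described above amounts to a small function like the one below. This is a minimal sketch in plain Python, not the implementation in this PR: the name `array_median` only mirrors the proposed function, and the null handling (ignore `None` elements, return `None` for an empty array) is an assumed semantics. In PySpark, a function like this could be wrapped with `pyspark.sql.functions.udf` and a `DoubleType` return type.

```python
from typing import Optional, Sequence


def array_median(values: Sequence[Optional[float]]) -> Optional[float]:
    """Exact median of one array value, as a user might write it inside a UDF.

    Assumed (hypothetical) semantics: None elements are ignored, and an
    empty or all-None array yields None.
    """
    # Drop nulls, then sort a copy so the input is left untouched.
    xs = sorted(v for v in values if v is not None)
    n = len(xs)
    if n == 0:
        return None
    mid = n // 2
    if n % 2 == 1:
        # Odd length: the middle element is the median.
        return float(xs[mid])
    # Even length: average the two middle elements.
    return (xs[mid - 1] + xs[mid]) / 2.0
```

Sorting makes this O(n log n) per array, which is acceptable when, as noted above, the data already fits in an array.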


[GitHub] [spark] hagerf commented on issue #26762: [SPARK-30131] add array_median function

2019-12-05 Thread GitBox

hagerf commented on issue #26762: [SPARK-30131] add array_median function 
URL: https://github.com/apache/spark/pull/26762#issuecomment-562059403
 
 
   @HyukjinKwon I added some links; I think they are relevant. 
   We already have `approxQuantile`, but this would be an exact function, 
limited to arrays. This function only calculates the median, which is 
probably the most common use case. I can extend it to support exact 
quantiles if people think that would be better.
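Extending the function to exact quantiles could look roughly like the sketch below. It assumes one common definition, linear interpolation between the two nearest ranks at position `p * (n - 1)`, which is the default method of NumPy's `quantile`; other definitions exist, and `array_quantile` is a hypothetical name. With `p = 0.5` it gives the same result as the median.

```python
import math
from typing import Optional, Sequence


def array_quantile(values: Sequence[Optional[float]], p: float) -> Optional[float]:
    """Exact p-quantile of one array using linear interpolation between ranks.

    Hypothetical helper: nulls are dropped, p must lie in [0, 1], and an
    empty array yields None. p = 0.5 reproduces the median.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    xs = sorted(v for v in values if v is not None)
    if not xs:
        return None
    # Fractional position of the quantile among the sorted elements.
    pos = p * (len(xs) - 1)
    lo = math.floor(pos)
    hi = math.ceil(pos)
    frac = pos - lo
    # Interpolate between the two neighbouring ranks (equal when pos is whole).
    return xs[lo] + (xs[hi] - xs[lo]) * frac
```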


[GitHub] [spark] hagerf commented on issue #26762: [SPARK-30131] add array_median function

2019-12-04 Thread GitBox

hagerf commented on issue #26762: [SPARK-30131] add array_median function 
URL: https://github.com/apache/spark/pull/26762#issuecomment-561898965
 
 
   @srowen From a quick search, I see it in AWS Redshift and IBM DB2 as 
aggregate functions. I've seen several Spark tickets requesting a median, and 
I know from my work that people use the median frequently, so my intention was 
to address a common request. 

   But yes, this can of course be done with a UDF or a combination of other 
functions, though that can be a bit cumbersome.
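The aggregate-style median mentioned above can be emulated by combining two steps: gather each group's values into a list, then take the exact median of that list. In Spark this roughly corresponds to `collect_list` inside `groupBy(...).agg(...)` followed by a median UDF; the sketch below shows only that logic in plain Python, using a hypothetical `grouped_median` helper and the standard library's `statistics.median`.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Hashable, Iterable, List, Tuple


def grouped_median(rows: Iterable[Tuple[Hashable, float]]) -> Dict[Hashable, float]:
    """Exact per-key median: gather each group's values, then take the median.

    Mirrors the "collect the group into an array, then compute the array's
    median" workaround; it is exact but holds each group's values in memory.
    """
    groups: Dict[Hashable, List[float]] = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    # statistics.median averages the two middle values for even-sized groups.
    return {key: median(vals) for key, vals in groups.items()}
```

This is exact, but every group's values must fit in memory at once, which is the efficiency concern behind the `approx` functions.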
