[ https://issues.apache.org/jira/browse/SPARK-13131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15130779#comment-15130779 ]

Antonio Piccolboni commented on SPARK-13131:
--------------------------------------------

Arguments against the median
1) You may be interested in outliers; the median is not sensitive to those 
("stability" can be a negative)
2) Worse, the median is not sensitive to half of the data points in the following 
sense: if a benchmark runs in 1s 51 times out of 100, the median is 1s. The 
other 49 runs can take 1, 10, or 100 seconds; the median is still 1s
3) Most SLAs I've run into are based on a high percentile (like the 90th or 99th). 
The median is also known as the 50th percentile. Would you buy software that runs 
fast 50% of the time and slow otherwise?
4) In repeated runs of the same function, the total runtime is the product of the 
number of runs and the average runtime. Not so for the median. In practice you need 
to parametrize by input size, but, again, if AT(s) is the average runtime on 
instances of size s, then the expected time per run is E[AT(S)], where S is the 
random variable for input size, so the expected total is just the number of runs 
times E[AT(S)]. No such simple relation exists for the median (a small sketch 
illustrating points 2 and 4 follows below)
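
To make points 2 and 4 concrete, here is a minimal sketch in plain Scala (it does 
not use Spark's benchmark harness, and the 1s/100s timings are made up purely for 
illustration): the median is blind to the slow half of the runs and bears no simple 
relation to the total runtime, while the mean times the number of runs equals the 
total exactly.

    object MedianVsMean {
      def mean(xs: Seq[Double]): Double = xs.sum / xs.size

      def median(xs: Seq[Double]): Double = {
        val sorted = xs.sorted
        val n = sorted.size
        if (n % 2 == 1) sorted(n / 2)
        else (sorted(n / 2 - 1) + sorted(n / 2)) / 2.0
      }

      def main(args: Array[String]): Unit = {
        // Hypothetical timings: 51 of 100 runs take 1s, the remaining 49 take 100s.
        val times = Seq.fill(51)(1.0) ++ Seq.fill(49)(100.0)

        println(f"median        = ${median(times)}%.2f s")  // 1.00  -- unchanged by the slow runs
        println(f"mean          = ${mean(times)}%.2f s")    // 49.51 -- reflects them
        println(f"total         = ${times.sum}%.2f s")      // 4951.00
        println(f"mean * runs   = ${mean(times) * times.size}%.2f s")   // 4951.00, equals the total
        println(f"median * runs = ${median(times) * times.size}%.2f s") // 100.00, no relation to the total
      }
    }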

Just my 2c before we make a change based on "stability" (the most stable 
statistic is any constant function)

> Use median time in benchmark
> ----------------------------
>
>                 Key: SPARK-13131
>                 URL: https://issues.apache.org/jira/browse/SPARK-13131
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>            Reporter: Davies Liu
>            Assignee: Davies Liu
>
> Median time should be more stable than average time in benchmark.



