[ https://issues.apache.org/jira/browse/SPARK-13131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15130779#comment-15130779 ]
Antonio Piccolboni commented on SPARK-13131:
--------------------------------------------

Arguments against the median:

1) You may be interested in outliers; the median is not sensitive to those ("stability" can be a negative).

2) Worse, the median is insensitive to half of the data points in the following sense: if a benchmark runs in 1s in 51 out of 100 runs, the median is 1s. The other 49 runs can take 1, 10, or 100 seconds each; the median is still 1s.

3) Most SLAs I've run into are based on a high percentile (like the 90th or 99th). The median is also known as the 50th percentile. Would you buy software that runs fast 50% of the time and slow otherwise?

4) In repeated runs of the same function, the total runtime is the product of the number of runs and the average runtime; no such relation holds for the median. In practice you need to parametrize by input size, but, again: if AT(s) is the average runtime on instances of size s, then the expected runtime per run is E[AT(S)], where S is the random variable for input size, so over n runs the expected total time is n * E[AT(S)]. No such simple relation exists for the median.

Just my 2c before we make a change based on "stability" (the most stable statistic is any constant function).

> Use median time in benchmark
> ----------------------------
>
>          Key: SPARK-13131
>          URL: https://issues.apache.org/jira/browse/SPARK-13131
>      Project: Spark
>   Issue Type: Improvement
>   Components: SQL
>     Reporter: Davies Liu
>     Assignee: Davies Liu
>
> Median time should be more stable than average time in benchmark.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
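The insensitivity argued in points 2 and 4 above can be checked with a small Python sketch (the timing values are made up for illustration, not real Spark benchmark numbers):

```python
import statistics

# Hypothetical benchmark: 51 runs finish in 1s, 49 runs take 100s each.
timings = [1.0] * 51 + [100.0] * 49

# The median only sees the middle of the sorted data, so the 49 slow
# runs have no effect on it at all.
print(statistics.median(timings))   # 1.0

# The mean reflects every run, and total runtime is exactly
# (number of runs) * (mean runtime) -- the relation from point 4.
mean = statistics.mean(timings)
print(mean)                          # 49.51
print(mean * len(timings))           # 4951.0, same as sum(timings)
```

Reporting only the median here would suggest the benchmark costs about 1s per run, while the actual cost per run is nearly 50x that.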