Hi,

The approach we initially took was in fact based on the interquartile range (IQR), a method that flags outliers relative to the range between the first and third quartiles. However, I understand from your feedback that working directly with the median and quantiles gives a clearer picture. I will adapt the code in line with your suggestion.

Best,
Goksu
goksu.in

On 18 Aug 2023 1:04 PM +0300, Hin-Tak Leung <ht...@users.sourceforge.net>, wrote:
>
> On Friday, 18 August 2023 at 00:21:41 BST, Ahmet Göksu <ah...@goksu.in> wrote:
>
> > about outliers, I split every test into chunks of size 100, made IQR
> > calculations, and calculated the average time on the valid chunks. You can
> > find the result in the attachment; it is also pushed to GitLab.
> >
> > also, since statistics and benchmarking are sciences in themselves, I am
> > struggling a bit in approaching the problem, and it feels out of the GSoC
> > project scope. I would like to share this with your indulgence. Yet, of
> > course, I will move in accordance with your instructions.
>
> Hmm, this is lacking basic maths skills... Cutting into chunks and
> recombining them isn't going to deal with outliers. Read about "median" and
> "quantile" on Wikipedia or by googling. Anyway, you want to calculate the
> "median" time. E.g. sort 100 numbers by size and take the average of the 50th
> and 51st; your error is the difference between the 91st and the 10th
> quantile (the 10th and the 91st values when you sort them in order of size).
> If you can do that for the entire set, do it for the whole set; if not, use a
> running median, i.e. the median of every chunk of 100. Then combine the
> running medians.
>
> This way, the top 9 and bottom 9 values of each 100 have no contribution at
> all to your outcome. This is dealing with outliers.
>
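
For reference, the median-and-quantile procedure described in the quoted message could be sketched roughly as below. This is only an illustration in Python, not the benchmark's actual code: the names `median_and_spread` and `chunked_median` are hypothetical, the scaling of the "drop 9 from each end" rule to sample sizes other than 100 is my assumption, and since the message does not say how the running medians should be combined, taking the median of the chunk medians is also an assumption.

```python
from statistics import median

CHUNK_SIZE = 100  # chunk size mentioned in the thread


def median_and_spread(times):
    """Return (median, spread) for a list of timings.

    For 100 samples the spread is the 91st sorted value minus the 10th,
    so the top 9 and bottom 9 values do not contribute. For other sample
    sizes the number of ignored values is scaled proportionally
    (an assumption, not something stated in the thread).
    """
    s = sorted(times)
    n = len(s)
    if n == 0:
        raise ValueError("need at least one timing")
    k = (9 * n) // 100  # values ignored at each end; 9 when n == 100
    # statistics.median averages the two middle values for even n,
    # i.e. the 50th and 51st of 100 sorted numbers.
    return median(s), s[n - 1 - k] - s[k]


def chunked_median(times, chunk_size=CHUNK_SIZE):
    """Median of per-chunk medians, for when the whole set is too large.

    Combining by taking the median of the chunk medians is an assumption;
    the thread only says to "combine the running medians".
    """
    chunks = [times[i:i + chunk_size] for i in range(0, len(times), chunk_size)]
    return median([median(c) for c in chunks])
```

As a quick check, for the timings 1, 2, ..., 100 `median_and_spread` gives a median of 50.5 and a spread of 81, and replacing the largest value with 1000000 leaves both numbers unchanged, which is the outlier behaviour the quoted message asks for.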