Hi,
The approach we initially took was, in fact, based on the principle of the 
interquartile range (IQR) – a method that excludes outliers by determining the 
range between the first and third quartiles. However, I understand from your 
feedback that directly focusing on the median and quantiles offers a clearer 
representation. I will adapt the code to align with your suggestion.
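
A minimal sketch of how the suggested calculation might look, assuming Python
and timings held in a plain list. The names `robust_summary` and
`chunked_median` are hypothetical, and combining the per-chunk medians by
taking their median is an assumption, since the suggestion does not say how
to combine them.

```python
from statistics import median


def robust_summary(times):
    """Return (median, spread) for one batch of timings.

    For 100 values the spread is the 91st sorted value minus the 10th,
    so the top 9 and bottom 9 values do not contribute at all.
    """
    ordered = sorted(times)
    n = len(ordered)
    # Zero-based indices of the 10th and 91st of 100 values, scaled to n.
    low = ordered[(n * 9) // 100]
    high = ordered[(n * 90) // 100]
    return median(ordered), high - low


def chunked_median(times, chunk_size=100):
    """Median of per-chunk medians, for when the whole set is not sorted at once.

    Assumption: the chunk medians are combined by taking their median.
    """
    chunks = [times[i:i + chunk_size] for i in range(0, len(times), chunk_size)]
    return median([median(chunk) for chunk in chunks])
```

With the values 1 to 100, replacing the largest value with an extreme one
leaves both the median and the spread unchanged, which is the intended
effect on outliers.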

Best,
Goksu
goksu.in
On 18 Aug 2023 1:04 PM +0300, Hin-Tak Leung <ht...@users.sourceforge.net>, 
wrote:
>
>
> On Friday, 18 August 2023 at 00:21:41 BST, Ahmet Göksu <ah...@goksu.in> wrote:
> >
> > About outliers: I split every test into chunks of 100, made IQR
> > calculations, and calculated the average time over the valid chunks. You
> > can find the result in the attachment; it is also pushed to GitLab.
>
> > Also, since statistics and benchmarking are sciences in themselves, I am
> > struggling a bit with the problem, and it feels outside the GSoC project
> > scope. I wanted to share this, with your indulgence. Of course, I will
> > proceed in accordance with your instructions.
>
> Hmm, this is lacking basic maths skills... Cutting into chunks and
> recombining them isn't going to deal with outliers. Read about "median" and
> "quantile" on Wikipedia or by Googling. Anyway, you want to calculate the
> "median" time. E.g. sort 100 numbers by size and take the average of the
> 50th and 51st; your error is the difference between the 91st and the 10th
> values (the 10th and the 91st when you sort them in order of size). If you
> can do that for the entire set, do it for the whole set; if not, use a
> running median, i.e. the median of every chunk of 100. Then combine the
> running medians.
>
> This way, the top 9 and bottom 9 values of each 100 have no contribution at
> all to your outcome. This is dealing with outliers.
>
>
