[Matplotlib-users] boxplot behaviour in an extreme scenario

2015-08-25 Thread chtan
Hi,

the outliers in the boxplot do not seem to be drawn in the following extreme
scenario:
Data Value: 1, Frequency: 5
Data Value: 2, Frequency: 100
Data Value: 3, Frequency: 5

Here, Q1 = Q2 = Q3 = 2, so IQR = 0.
With the default whis = 1.5, the whisker limits Q1 - whis*IQR and Q3 + whis*IQR
both collapse to 2, so data values 1 and 3 are outliers according to the
definition in the API (see the parameter "whis" under "boxplot":
http://matplotlib.org/api/pyplot_api.html).
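The arithmetic can be checked directly. This is a minimal sketch in plain Python, assuming linear interpolation between ranks (the default rule of numpy.percentile) and the documented default whis = 1.5; `percentile` is a hypothetical helper written for this check, not a matplotlib function.

```python
# Sketch: quartiles by linear interpolation between ranks (assumed to match
# numpy.percentile's default), then the fences Q1 - whis*IQR and Q3 + whis*IQR.

def percentile(values, p):
    """Return the p-th percentile (0-100) using linear interpolation."""
    xs = sorted(values)
    pos = (len(xs) - 1) * p / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    frac = pos - lo
    return xs[lo] + (xs[hi] - xs[lo]) * frac

data = 100 * [2] + 5 * [1] + 5 * [3]
whis = 1.5  # documented default for boxplot's whis parameter

q1, q3 = percentile(data, 25), percentile(data, 75)
iqr = q3 - q1
low_fence, high_fence = q1 - whis * iqr, q3 + whis * iqr
# Anything strictly outside the fences counts as an outlier.
outliers = [x for x in data if x < low_fence or x > high_fence]

print(q1, q3, iqr)                           # 2.0 2.0 0.0
print(low_fence, high_fence)                 # 2.0 2.0
print(sorted(set(outliers)), len(outliers))  # [1, 3] 10
```

Both fences sit at 2, so all ten values equal to 1 or 3 fall outside them.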

But the code below produces a boxplot whose whiskers extend to the minimum and
maximum (1 and 3) instead of drawing those values as fliers:

import matplotlib.pyplot as plt

data = 100 * [2] + 5 * [1] + 5 * [3]
ax = plt.gca()
bp = ax.boxplot(data, showfliers=True)
for flier in bp['fliers']:
    flier.set(marker='o', color='gray')
plt.show()

 


What I thought it would look like is obtained by changing half of the data
points from 2 to 2.01:

 


Is this a bug, or am I not getting something right?

rgds
marcus



--
View this message in context: 
http://matplotlib.1069221.n5.nabble.com/boxplot-behaviour-in-an-extreme-scenario-tp46027.html
Sent from the matplotlib - users mailing list archive at Nabble.com.

--
___
Matplotlib-users mailing list
Matplotlib-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/matplotlib-users


Re: [Matplotlib-users] boxplot behaviour in an extreme scenario

2015-08-26 Thread Paul Hobson
Are you running python 2 or python 3? If you're on python 2, what happens
if you add "from __future__ import division" to the top of your script?



Re: [Matplotlib-users] boxplot behaviour in an extreme scenario

2015-08-26 Thread Paul Hobson
Your perturbed and unperturbed scenarios draw the same figure on my machine
(mpl v1.4.1).

The reason why you don't get any outliers is the following:
Boxplot uses matplotlib.cbook.boxplot_stats under the hood to compute where
everything will be drawn. If you look in there, you'll see this little
nugget:

# interquartile range
stats['iqr'] = q3 - q1
if stats['iqr'] == 0:
    whis = 'range'


When whis = 'range', the whiskers instead extend to the minimum and maximum of
the data. That is at least the intent of the code, though I'm open to a
different interpretation of what should be happening.
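To see the effect of that special case in isolation, here is a simplified re-implementation in plain Python. It is a sketch that mirrors the rule quoted above, not the matplotlib source: `whisker_stats` and its `iqr_fallback` switch are hypothetical names, and the quartiles assume linear interpolation between ranks.

```python
# Simplified sketch of the whisker/flier rule discussed above (NOT matplotlib's
# code). `whisker_stats` and `iqr_fallback` are hypothetical names.

def percentile(values, p):
    """p-th percentile (0-100) with linear interpolation between ranks."""
    xs = sorted(values)
    pos = (len(xs) - 1) * p / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def whisker_stats(data, whis=1.5, iqr_fallback=True):
    xs = sorted(data)
    q1, q3 = percentile(xs, 25), percentile(xs, 75)
    iqr = q3 - q1
    if iqr == 0 and iqr_fallback:
        # The special case from the thread: behave like whis = 'range',
        # so the whisker limits are the data's min and max.
        lo_limit, hi_limit = xs[0], xs[-1]
    else:
        lo_limit, hi_limit = q1 - whis * iqr, q3 + whis * iqr
    # Whiskers end at the most extreme data points inside the limits,
    # but never inside the box itself.
    whislo = min((x for x in xs if x >= lo_limit), default=q1)
    whishi = max((x for x in xs if x <= hi_limit), default=q3)
    whislo, whishi = min(whislo, q1), max(whishi, q3)
    # Everything beyond the whisker ends is drawn as a flier.
    fliers = [x for x in xs if x < whislo or x > whishi]
    return {"q1": q1, "q3": q3, "iqr": iqr,
            "whislo": whislo, "whishi": whishi, "fliers": fliers}


data = 100 * [2] + 5 * [1] + 5 * [3]
print(whisker_stats(data)["fliers"])                      # []
print(whisker_stats(data, iqr_fallback=False)["fliers"])  # [1, 1, 1, 1, 1, 3, 3, 3, 3, 3]
```

With the fallback on, the whiskers run from 1 to 3 and nothing is a flier, which matches the plot in the original post; with it off, both limits collapse to 2 and the values 1 and 3 become fliers. Newer matplotlib releases make this fallback opt-in through the `autorange` parameter of `boxplot` and `cbook.boxplot_stats` (default `False`), so it is worth checking the documentation for the installed version.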



Re: [Matplotlib-users] boxplot behaviour in an extreme scenario

2015-08-26 Thread chtan
I'm on python 2.

I get the same outputs after adding "from __future__ import division".





Re: [Matplotlib-users] boxplot behaviour in an extreme scenario

2015-08-26 Thread chtan
Ah, now I understand why it's behaving this way. Thanks, Paul.

From the documentation, it seems natural to expect the same whisker rule to
apply for every value of the IQR, including IQR = 0.

How can I track down the responsible code on my own in situations like this?
Going from the perplexing behaviour to the little nugget in
matplotlib.cbook.boxplot_stats, the path isn't clear to me.

Any general advice?





Re: [Matplotlib-users] boxplot behaviour in an extreme scenario

2015-08-26 Thread Paul Hobson
Even though I'm familiar with the boxplot source code, I largely use
IPython for quick investigations like this.

In IPython, typing something like "matplotlib.axes.Axes.boxplot??" shows the
full source code for that function.

That reminded me that boxplot now just calls
matplotlib.cbook.boxplot_stats and passes the results to
matplotlib.axes.Axes.bxp.

So then I did "matplotlib.cbook.boxplot_stats??" to see how the whiskers were
computed.
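Outside IPython, the standard-library inspect module can do roughly what `??` does. This is a sketch, not something from the thread: `grep_source` is a hypothetical helper that fetches a function's source with `inspect.getsourcelines` and lists only the lines mentioning a keyword, such as "iqr" in boxplot_stats.

```python
# Plain-Python stand-in for IPython's `obj??`, narrowed to one keyword.
# `grep_source` is a hypothetical helper, not part of any library.
import inspect


def grep_source(obj, keyword):
    """Return (line_number, text) pairs from obj's source that contain keyword."""
    lines, start = inspect.getsourcelines(obj)  # start = line number of first line
    return [(start + i, line.rstrip())
            for i, line in enumerate(lines)
            if keyword in line]


try:
    import matplotlib.cbook as cbook
except ImportError:
    cbook = None  # matplotlib is not installed; nothing to inspect

if cbook is not None:
    # Show where boxplot_stats handles the interquartile range.
    for lineno, text in grep_source(cbook.boxplot_stats, "iqr"):
        print(lineno, text)
```

Searching for a distinctive word from the surprising behaviour (here "iqr" or "whis") narrows a long function down to the few lines worth reading; the line numbers refer to the installed source file, which `inspect.getsourcefile` can locate.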
-paul



Re: [Matplotlib-users] boxplot behaviour in an extreme scenario

2015-08-27 Thread chtan
Great, thanks!

Rgds
marcus


