On Sep 20, 2012, at 02:43, Thomas Lumley wrote:

> On Thu, Sep 20, 2012 at 5:46 AM, Mohamed Radhouane Aniba
> <arad...@gmail.com> wrote:
>> Hello All,
>> 
>> I am writing to ask your opinion on how to interpret this case. I have two 
>> vectors "a" and "b" that I am trying to compare.
>> 
>> The wilcoxon test is giving me a pvalue of 5.139217e-303 of a over b with 
>> the alternative "greater". Now if I make a summary on each of them I have 
>> the following
>> 
>>> summary(a)
>>     Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
>> 0.0000000 0.0001411 0.0002381 0.0002671 0.0003623 0.0012910
>>> summary(c)
>>     Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
>> 0.0000000 0.0000000 0.0000000 0.0004947 0.0002972 1.0000000
>> 
>> The mean ratio is then around 0.5399031 which naively goes in opposite 
>> direction of the wilcoxon test ( I was expecting to find a ratio >> 1)
>> 
> 
> There's nothing conceptually strange about the Wilcoxon test showing a
> difference in the opposite direction to the difference in means.  It's
> probably easiest to think about this in terms of the Mann-Whitney
> version of the same test, which is based on the proportion of pairs of
> one observation from each group where the 'a' observation is higher.
> Your 'c' vector has a lot more zeros, so a randomly chosen observation
> from 'c' is likely to be smaller than one from 'a', but the non-zero
> observations seem to be larger, so the mean of 'c' is higher.
> 
> The Wilcoxon test probably isn't very useful in a setting like this,
> since its results really make sense only under 'stochastic ordering',
> where the shift is in the same direction across the whole
> distribution.
> 
>  -thomas

I was sure I had seen a definition where X was "larger than" Y if P(X>Y) > 
P(Y>X), but that's obviously not the normal definition. Anyway, it is worth 
emphasizing that this is what the Wilcoxon test tests for, not whether the 
means differ, nor whether the medians do. As a counterexample to the latter, try

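x <- rep(0:1, c(60, 40))  # 60 zeros followed by 40 ones
y <- rep(0:1, c(80, 20))  # 80 zeros followed by 20 ones
wilcox.test(x, y)         # small p-value, although ...
median(x)                 # ... this is 0
median(y)                 # ... and so is this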

(and the "location shift" reference in wilcox.test output is a bit of a red 
herring.)
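
To make the "proportion of pairs" reading concrete, here is a minimal sketch that computes P(X>Y), P(X<Y) and the tie proportion directly for the x and y above, and checks that the W statistic reported by wilcox.test is the number of pairs with x > y plus half the tied pairs. The second part uses made-up vectors a2 and c2 (hypothetical, not the original poster's data) to mimic the situation Thomas describes: mostly zeros with a few large values, so the mean and the pairwise comparison point in opposite directions.

```r
# Peter's example: binary vectors whose medians are both 0
x <- rep(0:1, c(60, 40))
y <- rep(0:1, c(80, 20))

# Proportion of (x, y) pairs with x > y, x < y, and ties
p_gt  <- mean(outer(x, y, ">"))
p_lt  <- mean(outer(x, y, "<"))
p_tie <- mean(outer(x, y, "=="))

# R's W statistic counts pairs with x > y, plus half of the tied pairs
w <- unname(wilcox.test(x, y, exact = FALSE)$statistic)
w_by_hand <- sum(outer(x, y, ">")) + 0.5 * sum(outer(x, y, "=="))

# Hypothetical data (not the poster's): 'c2' is mostly zeros with a few
# large values, so its mean is higher although most pairs favour 'a2'
a2 <- rep(2, 100)
c2 <- c(rep(0, 90), rep(100, 10))
p_a2_gt <- mean(outer(a2, c2, ">"))
p_val   <- wilcox.test(a2, c2, alternative = "greater",
                       exact = FALSE)$p.value

print(c(p_gt = p_gt, p_lt = p_lt, p_tie = p_tie))
print(c(W = w, by_hand = w_by_hand))
print(c(mean_a2 = mean(a2), mean_c2 = mean(c2),
        p_a2_gt = p_a2_gt, p_val = p_val))
```

In the first part the medians are equal but P(X>Y) exceeds P(X<Y), which is what the test picks up. In the second part mean(c2) is larger than mean(a2), yet the one-sided test for a2 "greater" still rejects, because most pairs have the a2 value on top.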

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd....@cbs.dk  Priv: pda...@gmail.com

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
