David and Peter, I want to give a *very* belated thanks for your responses. They were enlightening. I ultimately used the wilcox_test function from the coin library.
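Brad settled on wilcox_test() from coin, which David describes further down as using Monte Carlo simulation. For readers who want the exact conditional (permutation) distribution that Peter's simulation samples from, here is a minimal base-R sketch. It is not coin's algorithm. It uses dynamic programming to count how many 25-element subsets of the pooled mid-ranks produce each rank sum. The helper exact_w_dist() is hypothetical. The sketch assumes mid-ranks are multiples of 0.5, which always holds for average ranks. It also assumes choose(N, n1) is small enough for exact double arithmetic, which is true here at about 2.5e14.

```r
# Data from the thread
x <- c(359,359,359,359,359,359,335,359,359,359,359,
       359,359,359,359,359,359,359,359,359,359,303,359,359,359)
y <- c(332,85,359,359,359,220,231,300,359,237,359,183,286,
       355,250,105,359,359,298,359,359,359,28.6,359,359,128)

# Hypothetical helper: exact permutation distribution of
# W = (rank sum of x) - n1*(n1+1)/2, conditional on the observed mid-ranks.
# Assumption: mid-ranks are multiples of 0.5, so doubled ranks are integers.
exact_w_dist <- function(x, y) {
  n1 <- length(x)
  N <- n1 + length(y)
  r2 <- as.integer(round(2 * rank(c(x, y))))        # doubled mid-ranks
  max_sum <- sum(sort(r2, decreasing = TRUE)[seq_len(n1)])
  # counts[j + 1, s + 1] = number of j-element subsets with doubled rank sum s
  counts <- matrix(0, nrow = n1 + 1, ncol = max_sum + 1)
  counts[1, 1] <- 1
  for (r in r2) {
    cols <- (r + 1):(max_sum + 1)
    for (j in n1:1) {   # descending j, so each rank is used at most once
      counts[j + 1, cols] <- counts[j + 1, cols] + counts[j, cols - r]
    }
  }
  s <- 0:max_sum
  keep <- counts[n1 + 1, ] > 0
  data.frame(W = s[keep] / 2 - n1 * (n1 + 1) / 2,
             prob = counts[n1 + 1, keep] / choose(N, n1))
}

d <- exact_w_dist(x, y)
n1 <- length(x); n2 <- length(y)
w_obs <- sum(rank(c(x, y))[seq_len(n1)]) - n1 * (n1 + 1) / 2
centre <- n1 * n2 / 2
# Two-sided p-value, same convention as Peter's: P(|W - 325| >= |485 - 325|)
p_exact <- sum(d$prob[abs(d$W - centre) >= abs(w_obs - centre)])
cat("W =", w_obs, " exact two-sided p =", signif(p_exact, 4), "\n")
```

The result should agree with Peter's simulated 0.000151 up to Monte Carlo error. Enumerating subset sums stays cheap here because the rank sums are bounded integers, so the table is only 26 by about 1700 entries.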
Cheers,
Brad
____________________
W. Bradley Knox, PhD
http://bradknox.net
bradk...@mit.edu

On Wed, Sep 3, 2014 at 4:20 PM, peter dalgaard <pda...@gmail.com> wrote:

> Notice that correct=TRUE for wilcox.test refers to the continuity
> correction, not the correction for ties.
>
> You can fairly easily simulate from the exact distribution of W:
>
> x <- c(359,359,359,359,359,359,335,359,359,359,359,
>        359,359,359,359,359,359,359,359,359,359,303,359,359,359)
> y <- c(332,85,359,359,359,220,231,300,359,237,359,183,286,
>        355,250,105,359,359,298,359,359,359,28.6,359,359,128)
> R <- rank(c(x,y))
> sim <- replicate(1e6,sum(sample(R,25))) - 325
>
> # With no ties, the ranks would be a permutation of 1:51, and we could do
> sim2 <- replicate(1e6,sum(sample(1:51,25))) - 325
>
> In either case, the p-value is the probability that W >= 485 or W <= 165,
> and
>
> > mean(sim >= 485 | sim <= 165)
> [1] 0.000151
> > mean(sim2 >= 485 | sim2 <= 165)
> [1] 0.002182
>
> Also, try
>
> plot(density(sim))
> lines(density(sim2))
>
> and notice that the distribution of sim is narrower than that of sim2
> (hence the smaller p-value with tie correction), but also that the normal
> approximation is not nearly as good as for the untied case. The
> "clumpiness" is due to the fact that 35 of the ranks have the maximum value
> of 34 (corresponding to the original 359's).
>
> -pd
>
> On 03 Sep 2014, at 19:13 , David L Carlson <dcarl...@tamu.edu> wrote:
>
> > Since they all have the same W/U value, it seems likely that the
> > difference is how the different versions adjust the standard error for
> > ties.
> > Here are a couple of posts addressing the issues of ties:
> >
> > http://tolstoy.newcastle.edu.au/R/e8/help/09/12/9200.html
> > http://stats.stackexchange.com/questions/6127/which-permutation-test-implementation-in-r-to-use-instead-of-t-tests-paired-and
> >
> > David C
> >
> > From: wbradleyk...@gmail.com On Behalf Of W Bradley Knox
> > Sent: Wednesday, September 3, 2014 9:20 AM
> > To: David L Carlson
> > Cc: Tal Galili; r-help@r-project.org
> > Subject: Re: [R] wilcox.test - difference between p-values of R and online calculators
> >
> > Tal and David, thanks for your messages.
> >
> > I should have added that I tried all combinations of TRUE/FALSE values for
> > the exact and correct parameters. Running with correct=FALSE makes only a
> > tiny change, resulting in W = 485, p-value = 0.0002481.
> >
> > At one point, I also thought that the discrepancy between R and these
> > online calculators might come from how ties are handled, but the fact that
> > R and two of the online calculators reach the same U/W values seems to
> > indicate that ties aren't the issue, since (I believe) the U or W values
> > contain all of the information needed to calculate the p-value, assuming
> > the number of samples is also known for each condition. (However, it's been
> > a while since I looked into how MWU tests work, so maybe now's the time to
> > refresh.) If that's correct, the discrepancy seems to be based in what R
> > does with the W value that is identical to the U values of two of the
> > online calculators. (I'm also assuming that U and W have the same meaning,
> > which seems likely.)
> >
> > - Brad
> >
> > On Wed, Sep 3, 2014 at 9:10 AM, David L Carlson <dcarl...@tamu.edu> wrote:
> > That does not change the results. The problem is likely to be the way
> > ties are handled.
> > The first sample has 25 values, of which 23 are identical
> > (359). The second sample has 26 values, of which 12 are identical (359). The
> > difference between the implementations may be a result of the way the ties
> > are ranked. For example, the R function rank() offers 5 different ways of
> > handling the rank on tied observations. With so many ties, that could make
> > a substantial difference.
> >
> > Package coin has wilcox_test(), which uses Monte Carlo simulation to
> > estimate the confidence limits.
> >
> > -------------------------------------
> > David L Carlson
> > Department of Anthropology
> > Texas A&M University
> > College Station, TX 77840-4352
> >
> > -----Original Message-----
> > From: r-help-boun...@r-project.org On Behalf Of Tal Galili
> > Sent: Wednesday, September 3, 2014 5:24 AM
> > To: W Bradley Knox
> > Cc: r-help@r-project.org
> > Subject: Re: [R] wilcox.test - difference between p-values of R and online calculators
> >
> > It seems your numbers have ties. What happens if you run wilcox.test with
> > correct=FALSE? Will the results be the same as the online calculators?
> >
> > ----------------Contact Details:-------------------------------------------------------
> > Contact me: tal.gal...@gmail.com |
> > Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) |
> > www.r-statistics.com (English)
> > ----------------------------------------------------------------------------------------------
> >
> > On Wed, Sep 3, 2014 at 3:54 AM, W Bradley Knox <bradk...@mit.edu> wrote:
> >
> >> Hi.
> >>
> >> I'm taking the long-overdue step of moving from using online calculators to
> >> compute results for Mann-Whitney U tests to a more streamlined system
> >> involving R.
> >>
> >> However, I'm finding that R computes a different result than the 3 online
> >> calculators that I've used before (all of which approximately agree). These
> >> calculators are here:
> >>
> >> http://elegans.som.vcu.edu/~leon/stats/utest.cgi
> >> http://vassarstats.net/utest.html
> >> http://www.socscistatistics.com/tests/mannwhitney/
> >>
> >> An example calculation is
> >>
> >> wilcox.test(c(359,359,359,359,359,359,335,359,359,359,359,359,359,359,359,
> >>               359,359,359,359,359,359,303,359,359,359),
> >>             c(332,85,359,359,359,220,231,300,359,237,359,183,286,355,250,
> >>               105,359,359,298,359,359,359,28.6,359,359,128))
> >>
> >> which prints
> >>
> >>         Wilcoxon rank sum test with continuity correction
> >>
> >> data:  c(359, 359, 359, 359, 359, 359, 335, 359, 359, 359, 359, 359, and c(332, 85, 359, 359, 359, 220, 231, 300, 359, 237, 359, 183,
> >> 359, 359, 359, 359, 359, 359, 359, 359, 359, 303, 359, 359, and 286, 355, 250, 105, 359, 359, 298, 359, 359, 359, 28.6, 359,
> >> 359) and 359, 128)
> >> W = 485, p-value = 0.0002594
> >> alternative hypothesis: true location shift is not equal to 0
> >>
> >> Warning message:
> >> In wilcox.test.default(c(359, 359, 359, 359, 359, 359, 335, 359,  :
> >>   cannot compute exact p-value with ties
> >>
> >> However, all of the online calculators find p-values close to 0.0025, 10x
> >> the value output by R. All results are for a two-tailed case. Importantly,
> >> the W value computed by R *does agree* with the U values output by the
> >> first two online calculators listed above, yet it has a different p-value.
> >>
> >> Can anyone shed some light on how and why R's calculation differs from that
> >> of these online calculators? Thanks for your time.
> >>
> >> ______________________________________________
> >> R-help@r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
>
> --
> Peter Dalgaard, Professor,
> Center for Statistics, Copenhagen Business School
> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> Phone: (+45)38153501
> Email: pd....@cbs.dk  Priv: pda...@gmail.com
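The thread's conclusion can be checked directly. With W fixed at 485, the p-value still depends on which variance the normal approximation uses. Below is a minimal base-R sketch of the two-sided p-value with and without the tie correction to the variance of W. The tie correction gives Var(W) = n1*n2/12 * ((N + 1) - sum(t^3 - t) / (N * (N - 1))), where t runs over tie-group sizes. The sketch also applies the 0.5 continuity correction that correct=TRUE controls. The helper normal_w_pvalue() is hypothetical and not part of R. The thread does not say that the online calculators omit the tie term. That is an assumption, although it is consistent with their results near 0.0025.

```r
x <- c(359,359,359,359,359,359,335,359,359,359,359,
       359,359,359,359,359,359,359,359,359,359,303,359,359,359)
y <- c(332,85,359,359,359,220,231,300,359,237,359,183,286,
       355,250,105,359,359,298,359,359,359,28.6,359,359,128)

# Hypothetical helper: normal approximation for the two-sided rank-sum test.
# tie_correct toggles the tie term in the variance; continuity toggles the
# 0.5 continuity correction (what wilcox.test's 'correct' argument controls).
normal_w_pvalue <- function(x, y, tie_correct = TRUE, continuity = TRUE) {
  n1 <- length(x); n2 <- length(y); N <- n1 + n2
  r <- rank(c(x, y))                                  # mid-ranks for ties
  W <- sum(r[seq_len(n1)]) - n1 * (n1 + 1) / 2
  ties <- table(r)                                    # size of each tie group
  tie_term <- if (tie_correct) sum(ties^3 - ties) / (N * (N - 1)) else 0
  sigma <- sqrt(n1 * n2 / 12 * ((N + 1) - tie_term))
  z <- W - n1 * n2 / 2
  if (continuity) z <- z - sign(z) * 0.5
  z <- z / sigma
  c(W = W, z = z, p = 2 * pnorm(-abs(z)))
}

with_ties <- normal_w_pvalue(x, y)                       # tie-corrected
no_ties   <- normal_w_pvalue(x, y, tie_correct = FALSE)  # tie term ignored
r_builtin <- suppressWarnings(wilcox.test(x, y))$p.value
print(rbind(with_ties, no_ties))
print(r_builtin)
```

Here the data contain a single tie group of t = 35 in N = 51. The tie term is therefore (35^3 - 35) / (51 * 50) = 42840 / 2550 = 16.8, which shrinks the variance factor from 52 to 35.2. The standard deviation falls by roughly 18%, and that is enough to move a far-tail p-value from about 0.0026 to about 0.00026. This is why the same W can produce such different p-values.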