Hi Spencer,

Thanks for your test results. I do not know the answer, as I haven't used wilcox.test for many years. I do not know whether it is possible to compute the exact distribution of the Wilcoxon rank sum statistic, but I think it is very likely, as the documentation of `Wilcoxon` says:

    This distribution is obtained as follows. Let x and y be two random,
    independent samples of size m and n. Then the Wilcoxon rank sum
    statistic is the number of all pairs (x[i], y[j]) for which y[j] is
    not greater than x[i]. This statistic takes values between 0 and
    m * n, and its mean and variance are m * n / 2 and
    m * n * (m + n + 1) / 12, respectively.

A nice feature of this non-parametric statistic is that it is distribution-free under the null hypothesis (both samples drawn from the same continuous distribution), so you can pick any such distribution you like and get the same null distribution of the statistic. I wonder if this is the case, but I might be wrong.

Cheers,
Jiefei

On Fri, Mar 19, 2021 at 10:57 PM Spencer Graves <spencer.gra...@effectivedefense.org> wrote:
>
>
> On 2021-3-19 9:52 AM, Jiefei Wang wrote:
> > After digging into the R source, it turns out that the argument `exact` has
> > nothing to do with numeric precision. It only affects the statistical
> > model used to compute the p-value. When `exact=TRUE`, the true distribution
> > of the statistic is used. Otherwise, a normal approximation is used.
> >
> > I think the documentation needs to be improved here: you can compute the
> > exact p-value *only* when you do not have any ties in your data. If you
> > have ties in your data, you will get the p-value from the normal
> > approximation no matter what value you put in `exact`. This behavior should
> > be documented, or a warning should be given when `exact=TRUE` and ties are
> > present.
> >
> > FYI, if the exact p-value is requested, the `pwilcox` function is used to
> > compute it. There are no details on how it computes the p-value, but
> > its C code seems to compute the probability table, so I assume it computes
> > the exact p-value from the true distribution of the statistic, not a
> > permutation or Monte Carlo p-value.
>
>
> My example shows that it does NOT use Monte Carlo: repeated calls
> return identical p-values, so it must use some fixed distribution. I
> believe the term "exact" means that it uses the permutation
> distribution, though I could be mistaken.
> If it's NOT a permutation distribution, I don't know what it is.
>
>
> Spencer
>
>
> > Best,
> > Jiefei
> >
> > On Fri, Mar 19, 2021 at 10:01 PM Jiefei Wang <szwj...@gmail.com> wrote:
> >
> >> Hey,
> >>
> >> I just want to point out that the word "exact" has two meanings. It can
> >> mean the numerically accurate p-value, as Bogdan asked in his first email,
> >> or it can mean the p-value calculated from the exact distribution of the
> >> statistic (in this case, the U statistic). These two are actually not
> >> related, even though both are called "exact".
> >>
> >> Best,
> >> Jiefei
> >>
> >> On Fri, Mar 19, 2021 at 9:31 PM Spencer Graves <
> >> spencer.gra...@effectivedefense.org> wrote:
> >>
> >>>
> >>> On 2021-3-19 12:54 AM, Bogdan Tanasa wrote:
> >>>> Thanks a lot, Vivek! In other words, assuming that we work with 1000
> >>>> data points:
> >>>>
> >>>> if we use EXACT = TRUE, it uses the normal approximation,
> >>>>
> >>>> while if EXACT=FALSE (for these large samples), it does not?
> >>>
> >>> As David Winsemius noted, the documentation is not clear.
> >>> Consider the following:
> >>>
> >>>     > set.seed(1)
> >>>     > x <- rnorm(100)
> >>>     > y <- rnorm(100, 2)
> >>>     >
> >>>     > wilcox.test(x, y)$p.value
> >>>     [1] 1.172189e-25
> >>>     > wilcox.test(x, y)$p.value
> >>>     [1] 1.172189e-25
> >>>     >
> >>>     > wilcox.test(x, y, EXACT=TRUE)$p.value
> >>>     [1] 1.172189e-25
> >>>     > wilcox.test(x, y, EXACT=TRUE)$p.value
> >>>     [1] 1.172189e-25
> >>>     > wilcox.test(x, y, exact=TRUE)$p.value
> >>>     [1] 4.123875e-32
> >>>     > wilcox.test(x, y, exact=TRUE)$p.value
> >>>     [1] 4.123875e-32
> >>>     >
> >>>     > wilcox.test(x, y, EXACT=FALSE)$p.value
> >>>     [1] 1.172189e-25
> >>>     > wilcox.test(x, y, EXACT=FALSE)$p.value
> >>>     [1] 1.172189e-25
> >>>     > wilcox.test(x, y, exact=FALSE)$p.value
> >>>     [1] 1.172189e-25
> >>>     > wilcox.test(x, y, exact=FALSE)$p.value
> >>>     [1] 1.172189e-25
> >>>
> >>> We get two values here: 1.172189e-25 and 4.123875e-32. The first one,
> >>> I think, is the normal approximation, which is the same as exact=FALSE.
> >>> I think that with exact=TRUE, you get a permutation distribution,
> >>> though I'm not sure. You might try looking at "wilcox_test in package
> >>> coin for exact, asymptotic and Monte Carlo conditional p-values,
> >>> including in the presence of ties" to see if it is clearer. NOTE: R is
> >>> case sensitive, so "EXACT" is a different variable from "exact". It is
> >>> treated as an extra optional argument, which is not recognized and is
> >>> therefore ignored in this context.
> >>>
> >>> Hope this helps.
> >>> Spencer
> >>>
> >>>
> >>>> On Thu, Mar 18, 2021 at 10:47 PM Vivek Das <vd4mm...@gmail.com> wrote:
> >>>>
> >>>>> Hi Bogdan,
> >>>>>
> >>>>> You can also get this information from the help page of the
> >>>>> wilcox.test function:
> >>>>>
> >>>>> “By default (if exact is not specified), an exact p-value is computed if
> >>>>> the samples contain less than 50 finite values and there are no ties.
> >>>>> Otherwise, a normal approximation is used.”
> >>>>>
> >>>>> For more:
> >>>>>
> >>>>> https://stat.ethz.ch/R-manual/R-devel/library/stats/html/wilcox.test.html
> >>>>>
> >>>>> Hope this helps!
> >>>>>
> >>>>> Best,
> >>>>>
> >>>>> VD
> >>>>>
> >>>>>
> >>>>> On Thu, Mar 18, 2021 at 10:36 PM Bogdan Tanasa <tan...@gmail.com> wrote:
> >>>>>> Dear Peter, thanks a lot. Yes, we can see a very precise p-value, and
> >>>>>> that was the request from the journal.
> >>>>>>
> >>>>>> If I may ask another question, please: what is the meaning of
> >>>>>> "exact=TRUE" or "exact=FALSE" in wilcox.test?
> >>>>>>
> >>>>>> I can see that the "numerically precise" p-values are different.
> >>>>>> Thanks a lot!
> >>>>>>
> >>>>>> tst = wilcox.test(rnorm(100), rnorm(100, 2), exact=TRUE)
> >>>>>> tst$p.value
> >>>>>> [1] 8.535524e-25
> >>>>>>
> >>>>>> tst = wilcox.test(rnorm(100), rnorm(100, 2), exact=FALSE)
> >>>>>> tst$p.value
> >>>>>> [1] 3.448211e-25
> >>>>>>
> >>>>>> On Thu, Mar 18, 2021 at 10:15 PM Peter Langfelder <
> >>>>>> peter.langfel...@gmail.com> wrote:
> >>>>>>
> >>>>>>> I think the answer is much simpler. The print method for hypothesis
> >>>>>>> tests (class htest) does not show very small p-values: anything below
> >>>>>>> about 2.2e-16 is displayed as "< 2.2e-16". In the above example,
> >>>>>>> instead of using
> >>>>>>>
> >>>>>>> wilcox.test(rnorm(100), rnorm(100, 2), exact=TRUE)
> >>>>>>>
> >>>>>>> and copying the output, just print the p-value:
> >>>>>>>
> >>>>>>> tst = wilcox.test(rnorm(100), rnorm(100, 2), exact=TRUE)
> >>>>>>> tst$p.value
> >>>>>>>
> >>>>>>> [1] 2.988368e-32
> >>>>>>>
> >>>>>>>
> >>>>>>> I think this value is what the journal asks for.
> >>>>>>>
> >>>>>>> HTH,
> >>>>>>>
> >>>>>>> Peter
> >>>>>>>
> >>>>>>> On Thu, Mar 18, 2021 at 10:05 PM Spencer Graves
> >>>>>>> <spencer.gra...@effectivedefense.org> wrote:
> >>>>>>>> I would push back on that from two perspectives:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> 1. I would study exactly what the journal said very
> >>>>>>>> carefully. If they mandated "wilcox.test", that function has an
> >>>>>>>> argument called "exact". If that's what they are asking, then using
> >>>>>>>> that argument gives the exact p-value, e.g.:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>     > wilcox.test(rnorm(100), rnorm(100, 2), exact=TRUE)
> >>>>>>>>
> >>>>>>>>             Wilcoxon rank sum exact test
> >>>>>>>>
> >>>>>>>>     data:  rnorm(100) and rnorm(100, 2)
> >>>>>>>>     W = 691, p-value < 2.2e-16
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> 2. If that's NOT what they are asking, then I'm not
> >>>>>>>> convinced that what they are asking makes sense: there is no such
> >>>>>>>> thing as an "exact p-value" except to the extent that certain
> >>>>>>>> assumptions hold, and all models are wrong (but some are useful), as
> >>>>>>>> George Box famously said years ago.[1] Truth only exists in
> >>>>>>>> mathematics, and that's because it's a fiction to start with ;-)
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Hope this helps.
> >>>>>>>> Spencer Graves
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> [1]
> >>>>>>>> https://en.wikipedia.org/wiki/All_models_are_wrong
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On 2021-3-18 11:12 PM, Bogdan Tanasa wrote:
> >>>>>>>>> <https://meta.stackexchange.com/questions/362285/about-a-p-value-2-2e-16>
> >>>>>>>>> Dear all,
> >>>>>>>>>
> >>>>>>>>> I would appreciate your advice on the following, please:
> >>>>>>>>>
> >>>>>>>>> in R, wilcox.test() reports "p-value < 2.2e-16" when we compare
> >>>>>>>>> sets of 1000 gene expression values (in the genomics field).
> >>>>>>>>>
> >>>>>>>>> However, the journal asks us to provide the exact p-value ...
> >>>>>>>>>
> >>>>>>>>> Would it be legitimate to write "p-value = 0"? Thanks a lot,
> >>>>>>>>>
> >>>>>>>>> -- bogdan
> >>>>>>>>>
> >>>>>>>>> ______________________________________________
> >>>>>>>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>>>>>>>> PLEASE do read the posting guide
> >>>>>>>>> http://www.R-project.org/posting-guide.html
> >>>>>>>>> and provide commented, minimal, self-contained, reproducible code.
> >>>>>
> >>>>> --
> >>>>> Vivek Das, PhD
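
The pair-count definition quoted from the `Wilcoxon` help page at the top of this message can be checked numerically. Below is a minimal sketch, assuming two small samples from continuous distributions (so there are no ties and wilcox.test() uses its exact method); the sample sizes, seed and shift are arbitrary illustrative choices, not taken from the thread. It counts the pairs with y[j] not greater than x[i] via outer(), compares that count with the W reported by wilcox.test(), and derives the mean and variance of the exact null distribution from dwilcox().

```r
# Sketch (assumption: continuous data, hence no ties; sizes and seed arbitrary)
set.seed(1)
m <- 10
n <- 8
x <- rnorm(m)
y <- rnorm(n, 1)

# Number of pairs (x[i], y[j]) for which y[j] is not greater than x[i]
w_pairs <- sum(outer(x, y, ">="))

# W as reported by wilcox.test() (exact method here: < 50 values, no ties)
w_test <- unname(wilcox.test(x, y)$statistic)

# Exact null distribution of W on 0..m*n, taken from dwilcox()
q <- 0:(m * n)
p <- dwilcox(q, m, n)
null_mean <- sum(q * p)
null_var  <- sum((q - null_mean)^2 * p)

c(pairs = w_pairs, wilcox = w_test)
c(mean = null_mean, expected = m * n / 2)
c(var = null_var, expected = m * n * (m + n + 1) / 12)
```

If the definition holds, the two counts agree, and for these sizes the moments come out as m * n / 2 = 40 and m * n * (m + n + 1) / 12, about 126.67.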
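
A second sketch looks at what `exact` changes, reusing the set.seed(1) example from Spencer's message in the thread. It assumes no ties (continuous normal data) and recomputes the two-sided exact p-value directly from pwilcox(). The tail rule in `p_tail` (use the tail of the null distribution of W on the side where the observed W falls, then double it) is an assumption about how the exact p-value is formed, which the comparison with `tst_exact$p.value` is meant to check. It also shows that the misspelled `EXACT` argument leaves the default result unchanged, and that format.pval(), which print.htest() uses, renders a p-value this small as "< ..." while the stored number keeps full precision.

```r
# Sketch: what `exact` changes, reusing Spencer's set.seed(1) setup
set.seed(1)
x <- rnorm(100)
y <- rnorm(100, 2)
m <- length(x)
n <- length(y)

p_default <- wilcox.test(x, y)$p.value                # 100 values each: normal approximation
p_normal  <- wilcox.test(x, y, exact = FALSE)$p.value  # normal approximation
p_EXACT   <- wilcox.test(x, y, EXACT = TRUE)$p.value   # misspelled: absorbed by `...`, ignored
tst_exact <- wilcox.test(x, y, exact = TRUE)           # exact null distribution of W
p_exact   <- tst_exact$p.value

# Assumed rule: take the tail of the null distribution of W on the side
# where the observed W lies, then double it (capped at 1)
W <- unname(tst_exact$statistic)
p_tail <- if (W > m * n / 2) {
  pwilcox(W - 1, m, n, lower.tail = FALSE)  # P(W >= observed)
} else {
  pwilcox(W, m, n)                          # P(W <= observed)
}
p_manual <- min(1, 2 * p_tail)

# Values below .Machine$double.eps are formatted as "< ..." for printing
format.pval(p_exact)
c(default = p_default, normal = p_normal, EXACT = p_EXACT,
  exact = p_exact, manual = p_manual)
```

With these data, p_default, p_normal and p_EXACT should coincide (the normal approximation), while p_exact and p_manual give the smaller exact value.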