Re: [R] about a p-value < 2.2e-16

2021-03-19 Thread Bogdan Tanasa
thanks a lot, Jiefei ! and thanks to all for your time and comments !

have a good weekend !




On Fri, Mar 19, 2021 at 10:01 PM Jiefei Wang  wrote:

> Hi Bogdan,
>
> I think the journal is asking for the exact value of the p-value; it
> doesn't matter whether it comes from the exact distribution or the normal
> approximation. However, it does not make any sense to report such a small
> p-value. If I were you, I would show the reviewers the exact p-value they want
> and gently explain why you did not put it into your paper. If they insist
> that the number must be in the paper, then go ahead and do it.
>
> Best,
> Jiefei
>
>
>
> On Sat, Mar 20, 2021 at 2:39 AM Bogdan Tanasa wrote:
>
>> Thank you Kevin, their wording is "Please note that the exact p value
>> should be provided, when possible, etc"
>>
>> by "exact p-value" i believe that they do mean indeed the actual number,
>> and not to specify "exact=TRUE" ;
>>
>> as we are working with 1000 genes, shall i specify "exact=TRUE" on my PC,
>> it runs out of memory ...
>>
>> wilcox.test(rnorm(1000), rnorm(1000, 2), exact=TRUE)$p.value
>>
>> On Fri, Mar 19, 2021 at 11:10 AM Kevin Thorpe 
>> wrote:
>>
>> > I have to ask since. Are you sure the journal simply means by exact
>> > p-value that they don’t want to see a p-value given as < 0.0001, for
>> > example, and simply want the actual number?
>> >
>> > I cannot imagine they really meant exact as in the p-value from some
>> exact
>> > distribution.
>> >
>> > --
>> > Kevin E. Thorpe
>> > Head of Biostatistics,  Applied Health Research Centre (AHRC)
>> > Li Ka Shing Knowledge Institute of St. Michael's
>> > Assistant Professor, Dalla Lana School of Public Health
>> > University of Toronto
>> > email: kevin.tho...@utoronto.ca  Tel: 416.864.5776  Fax: 416.864.3016
>> >
>> > > On Mar 19, 2021, at 1:22 PM, Bogdan Tanasa  wrote:
>> > >
>> > > Dear all, thank you all for comments and help.
>> > >
>> > > as far as i can see, shall we have samples of 1000 records, only
>> > > "exact=FALSE" allows the code to run:
>> > >
>> > > wilcox.test(rnorm(1000), rnorm(1000, 2), exact=FALSE)$p.value
>> > > [1] 7.304863e-231
>> > >
>> > > shall i use "exact=TRUE", it runs out of memory on my 64GB RAM PC :
>> > >
>> > > wilcox.test(rnorm(1000), rnorm(1000, 2), exact=TRUE)$p.value
>> > > (the job is terminated by OS)
>> > >
>> > > shall you have any other suggestions, please let me know. thanks a
>> lot !
>> > >
>> > > On Fri, Mar 19, 2021 at 9:05 AM Bert Gunter 
>> > wrote:
>> > >
>> > >> I **believe** -- if my old memory still serves-- that the "exact"
>> > >> specification uses a home grown version of the algorithm to calculate
>> > >> exact,  or close approximations to the exact, permutation
>> distribution
>> > >> originally developed by Cyrus Mehta, founder of StatXact software.
>> Of
>> > >> course, examining the C code source would determine this, but I don't
>> > care
>> > >> to attempt this.
>> > >>
>> > >> If this is (no longer?) correct, please point this out.
>> > >>
>> > >> Best,
>> > >>
>> > >> Bert Gunter
>> > >>
>> > >> "The trouble with having an open mind is that people keep coming
>> along
>> > and
>> > >> sticking things into it."
>> > >> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>> > >>
>> > >>
>> > >> On Fri, Mar 19, 2021 at 8:42 AM Jiefei Wang 
>> wrote:
>> > >>
>> > >>> Hi Spencer,
>> > >>>
>> > >>> Thanks for your test results, I do not know the answer as I haven't
>> > >>> used wilcox.test for many years. I do not know if it is possible to
>> > >>> compute
>> > >>> the exact distribution of the Wilcoxon rank sum statistic, but I
>> think
>> > it
>> > >>> is very likely, as the document of `Wilcoxon` says:
>> > >>>
>> > >>> This distribution is obtained as follows. Let x and y be two random,
>> > >>> independent samples of size m and n. Then the Wilcoxon rank sum
>> > statistic
>> > >>> is the number of all pairs (x[i], y[j]) for which y[j] is not
>> greater
>> > than
>> > >>> x[i]. This statistic takes values between 0 and m * n, and its mean
>> and
>> > >>> variance are m * n / 2 and m * n * (m + n + 1) / 12, respectively.
>> > >>>
>> > >>> As a nice feature of the non-parametric statistic, it is usually
>> > >>> distribution-free so you can pick any distribution you like to
>> compute
>> > the
>> > >>> same statistic. I wonder if this is the case, but I might be wrong.
>> > >>>
>> > >>> Cheers,
>> > >>> Jiefei
>> > >>>
>> > >>>

Re: [R] about a p-value < 2.2e-16

2021-03-19 Thread Jiefei Wang
Hi Bogdan,

I think the journal is asking for the exact value of the p-value; it
doesn't matter whether it comes from the exact distribution or the normal
approximation. However, it does not make any sense to report such a small
p-value. If I were you, I would show the reviewers the exact p-value they want
and gently explain why you did not put it into your paper. If they insist
that the number must be in the paper, then go ahead and do it.
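
As a small illustration of the "actual number" the reviewers may be after:
the "< 2.2e-16" in the subject line is only a printing threshold
(.Machine$double.eps), while the p.value component of the test result keeps
the full double. A minimal sketch, reusing the value Bogdan posted earlier in
the thread; the variable names are made up for illustration:

```r
# The p-value Bogdan reported for exact = FALSE (copied from the thread).
p <- 7.304863e-231

# print() of a test result goes through format.pval(), whose default `eps`
# is .Machine$double.eps, so anything smaller is shown as "< eps".
shown_default <- format.pval(p)

# Setting eps = 0 switches the threshold off and shows the stored number.
shown_full <- format.pval(p, eps = 0)

# Plain format() also works, here rounded to three significant digits.
shown_short <- format(p, digits = 3)

print(c(shown_default, shown_full, shown_short))
```

Whether such a number is worth printing in a paper is a separate question,
as discussed above.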

Best,
Jiefei



On Sat, Mar 20, 2021 at 2:39 AM Bogdan Tanasa wrote:

> Thank you Kevin, their wording is "Please note that the exact p value
> should be provided, when possible, etc"
>
> by "exact p-value" i believe that they do mean indeed the actual number,
> and not to specify "exact=TRUE" ;
>
> as we are working with 1000 genes, shall i specify "exact=TRUE" on my PC,
> it runs out of memory ...
>
> wilcox.test(rnorm(1000), rnorm(1000, 2), exact=TRUE)$p.value
>
> On Fri, Mar 19, 2021 at 11:10 AM Kevin Thorpe 
> wrote:
>
> > I have to ask since. Are you sure the journal simply means by exact
> > p-value that they don’t want to see a p-value given as < 0.0001, for
> > example, and simply want the actual number?
> >
> > I cannot imagine they really meant exact as in the p-value from some
> exact
> > distribution.
> >
> > --
> > Kevin E. Thorpe
> > Head of Biostatistics,  Applied Health Research Centre (AHRC)
> > Li Ka Shing Knowledge Institute of St. Michael's
> > Assistant Professor, Dalla Lana School of Public Health
> > University of Toronto
> > email: kevin.tho...@utoronto.ca  Tel: 416.864.5776  Fax: 416.864.3016
> >
> > > On Mar 19, 2021, at 1:22 PM, Bogdan Tanasa  wrote:
> > >
> > > Dear all, thank you all for comments and help.
> > >
> > > as far as i can see, shall we have samples of 1000 records, only
> > > "exact=FALSE" allows the code to run:
> > >
> > > wilcox.test(rnorm(1000), rnorm(1000, 2), exact=FALSE)$p.value
> > > [1] 7.304863e-231
> > >
> > > shall i use "exact=TRUE", it runs out of memory on my 64GB RAM PC :
> > >
> > > wilcox.test(rnorm(1000), rnorm(1000, 2), exact=TRUE)$p.value
> > > (the job is terminated by OS)
> > >
> > > shall you have any other suggestions, please let me know. thanks a lot
> !
> > >
> > > On Fri, Mar 19, 2021 at 9:05 AM Bert Gunter 
> > wrote:
> > >
> > >> I **believe** -- if my old memory still serves-- that the "exact"
> > >> specification uses a home grown version of the algorithm to calculate
> > >> exact,  or close approximations to the exact, permutation distribution
> > >> originally developed by Cyrus Mehta, founder of StatXact software.  Of
> > >> course, examining the C code source would determine this, but I don't
> > care
> > >> to attempt this.
> > >>
> > >> If this is (no longer?) correct, please point this out.
> > >>
> > >> Best,
> > >>
> > >> Bert Gunter
> > >>
> > >> "The trouble with having an open mind is that people keep coming along
> > and
> > >> sticking things into it."
> > >> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> > >>
> > >>
> > >> On Fri, Mar 19, 2021 at 8:42 AM Jiefei Wang 
> wrote:
> > >>
> > >>> Hi Spencer,
> > >>>
> > >>> Thanks for your test results, I do not know the answer as I haven't
> > >>> used wilcox.test for many years. I do not know if it is possible to
> > >>> compute
> > >>> the exact distribution of the Wilcoxon rank sum statistic, but I
> think
> > it
> > >>> is very likely, as the document of `Wilcoxon` says:
> > >>>
> > >>> This distribution is obtained as follows. Let x and y be two random,
> > >>> independent samples of size m and n. Then the Wilcoxon rank sum
> > statistic
> > >>> is the number of all pairs (x[i], y[j]) for which y[j] is not greater
> > than
> > >>> x[i]. This statistic takes values between 0 and m * n, and its mean
> and
> > >>> variance are m * n / 2 and m * n * (m + n + 1) / 12, respectively.
> > >>>
> > >>> As a nice feature of the non-parametric statistic, it is usually
> > >>> distribution-free so you can pick any distribution you like to
> compute
> > the
> > >>> same statistic. I wonder if this is the case, but I might be wrong.
> > >>>
> > >>> Cheers,
> > >>> Jiefei
> > >>>
> > >>>

Re: [R] about a p-value < 2.2e-16

2021-03-19 Thread Bert Gunter
Yes, Bogdan, that sounds *exactly* right.  ;-)  -- it runs out of memory
trying to calculate the exact permutation distribution. What you apparently
get with exact = FALSE is the exact answer (to within floating point
arithmetic's approximation) to a normal approximation.
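
To make that concrete, here is a minimal sketch that rebuilds the
exact = FALSE p-value by hand from the mean m * n / 2 and variance
m * n * (m + n + 1) / 12 quoted from the `Wilcoxon` help page further down
the thread. It assumes no ties, a two-sided test and the default continuity
correction of 0.5; the names W, mu, sigma, p_manual and p_builtin are made up
for illustration.

```r
set.seed(1)
x <- rnorm(1000)
y <- rnorm(1000, 2)
m <- length(x)
n <- length(y)

# Rank-sum statistic in its U form: the number of pairs with x[i] > y[j]
# (continuous data, so ties are not expected here).
W <- sum(rank(c(x, y))[seq_len(m)]) - m * (m + 1) / 2

# Null mean and standard deviation of W when there are no ties.
mu    <- m * n / 2
sigma <- sqrt(m * n * (m + n + 1) / 12)

# Continuity-corrected z score and two-sided normal p-value.
z <- (W - mu - sign(W - mu) * 0.5) / sigma
p_manual <- 2 * min(pnorm(z), pnorm(z, lower.tail = FALSE))

# Compare with what wilcox.test() reports for exact = FALSE.
p_builtin <- wilcox.test(x, y, exact = FALSE)$p.value
print(c(manual = p_manual, builtin = p_builtin))
```

The digits printed are those of the normal tail at that z score, not of the
permutation distribution.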

... and furthermore...
I would imagine any random number below, say, 1e-100 would serve equally
well and would be equally correct/incorrect. I also imagine that a sensible
display of the paired differences  or even just a count of how many of the
thousand are, say, >0, would make even more sense than an overwrought and
unnecessary p-value. But that is just my personal opinion of senseless
standard scientific practice, and if anyone wants to dispute it, please
reply OFFLIST, though I would probably not disagree with any such criticism
of my cynicism.


Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Fri, Mar 19, 2021 at 10:22 AM Bogdan Tanasa  wrote:

> Dear all, thank you all for comments and help.
>
> as far as i can see, shall we have samples of 1000 records, only
> "exact=FALSE" allows the code to run:
>
> wilcox.test(rnorm(1000), rnorm(1000, 2), exact=FALSE)$p.value
> [1] 7.304863e-231
>
> shall i use "exact=TRUE", it runs out of memory on my 64GB RAM PC :
>
> wilcox.test(rnorm(1000), rnorm(1000, 2), exact=TRUE)$p.value
> (the job is terminated by OS)
>
> shall you have any other suggestions, please let me know. thanks a lot !
>
> On Fri, Mar 19, 2021 at 9:05 AM Bert Gunter 
> wrote:
>
>> I **believe** -- if my old memory still serves-- that the "exact"
>> specification uses a home grown version of the algorithm to calculate
>> exact,  or close approximations to the exact, permutation distribution
>> originally developed by Cyrus Mehta, founder of StatXact software.  Of
>> course, examining the C code source would determine this, but I don't care
>> to attempt this.
>>
>> If this is (no longer?) correct, please point this out.
>>
>> Best,
>>
>> Bert Gunter
>>
>> "The trouble with having an open mind is that people keep coming along
>> and sticking things into it."
>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>
>>
>> On Fri, Mar 19, 2021 at 8:42 AM Jiefei Wang  wrote:
>>
>>> Hi Spencer,
>>>
>>> Thanks for your test results, I do not know the answer as I haven't
>>> used wilcox.test for many years. I do not know if it is possible to
>>> compute
>>> the exact distribution of the Wilcoxon rank sum statistic, but I think it
>>> is very likely, as the document of `Wilcoxon` says:
>>>
>>> This distribution is obtained as follows. Let x and y be two random,
>>> independent samples of size m and n. Then the Wilcoxon rank sum statistic
>>> is the number of all pairs (x[i], y[j]) for which y[j] is not greater
>>> than
>>> x[i]. This statistic takes values between 0 and m * n, and its mean and
>>> variance are m * n / 2 and m * n * (m + n + 1) / 12, respectively.
>>>
>>> As a nice feature of the non-parametric statistic, it is usually
>>> distribution-free so you can pick any distribution you like to compute
>>> the
>>> same statistic. I wonder if this is the case, but I might be wrong.
>>>
>>> Cheers,
>>> Jiefei
>>>
>>>
>>> On Fri, Mar 19, 2021 at 10:57 PM Spencer Graves <
>>> spencer.gra...@effectivedefense.org> wrote:
>>>
>>> >
>>> >
>>> > On 2021-3-19 9:52 AM, Jiefei Wang wrote:
>>> > > After digging into the R source, it turns out that the argument
>>> `exact`
>>> > has
>>> > > nothing to do with the numeric precision. It only affects the
>>> statistic
>>> > > model used to compute the p-value. When `exact=TRUE` the true
>>> > distribution
>>> > > of the statistic will be used. Otherwise, a normal approximation
>>> will be
>>> > > used.
>>> > >
>>> > > I think the documentation needs to be improved here, you can compute
>>> the
>>> > > exact p-value *only* when you do not have any ties in your data. If
>>> you
>>> > > have ties in your data you will get the p-value from the normal
>>> > > approximation no matter what value you put in `exact`. This behavior
>>> > should
>>> > > be documented or a warning should be given when `exact=TRUE` and ties
>>> > > present.
>>> > >
>>> > > FYI, if the exact p-value is required, `pwilcox` function will be
>>> used to
>>> > > compute the p-value. There are no details on how it computes the
>>> pvalue
>>> > but
>>> > > its C code seems to compute the probability table, so I assume it
>>> > computes
>>> > > the exact p-value from the true distribution of the statistic, not a
>>> > > permutation or MC p-value.
>>> >
>>> >
>>> >My example shows that it does NOT use Monte Carlo, because
>>> > otherwise it uses some distribution.  I believe the term "exact" means
>>> > that it uses the permutation distribution, though I could be mistaken.
>>> > If it's NOT a permutation distribution, I don't know what it is.
>>> >
>>> >
>>> >Spencer

Re: [R] about a p-value < 2.2e-16

2021-03-19 Thread Bogdan Tanasa
Thank you Kevin, their wording is "Please note that the exact p value
should be provided, when possible, etc"

by "exact p-value" i believe that they do mean indeed the actual number,
and not to specify "exact=TRUE" ;

as we are working with 1000 genes, if i specify "exact=TRUE" on my PC,
it runs out of memory ...

wilcox.test(rnorm(1000), rnorm(1000, 2), exact=TRUE)$p.value
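
For contrast, on small samples the exact route does finish. A minimal sketch,
assuming (as Jiefei's reading of the R source further down suggests) that the
exact two-sided p-value is taken from `pwilcox` when there are no ties; the
sample sizes 10 and 12 and the variable names are made up for illustration:

```r
set.seed(1)
x <- rnorm(10)
y <- rnorm(12, 1)
m <- length(x)
n <- length(y)

# Rank-sum statistic in its U form (no ties expected with continuous data).
W <- sum(rank(c(x, y))[seq_len(m)]) - m * (m + 1) / 2

# One tail of the exact null distribution of W, chosen by which side of the
# null mean m * n / 2 the observed statistic falls on.
p_tail <- if (W > m * n / 2) {
  pwilcox(W - 1, m, n, lower.tail = FALSE)
} else {
  pwilcox(W, m, n)
}

# Two-sided exact p-value, capped at 1.
p_manual <- min(2 * p_tail, 1)

# Compare with wilcox.test() on the same data.
p_builtin <- wilcox.test(x, y, exact = TRUE)$p.value
print(c(manual = p_manual, builtin = p_builtin))
```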

On Fri, Mar 19, 2021 at 11:10 AM Kevin Thorpe 
wrote:

> I have to ask since. Are you sure the journal simply means by exact
> p-value that they don’t want to see a p-value given as < 0.0001, for
> example, and simply want the actual number?
>
> I cannot imagine they really meant exact as in the p-value from some exact
> distribution.
>
> --
> Kevin E. Thorpe
> Head of Biostatistics,  Applied Health Research Centre (AHRC)
> Li Ka Shing Knowledge Institute of St. Michael's
> Assistant Professor, Dalla Lana School of Public Health
> University of Toronto
> email: kevin.tho...@utoronto.ca  Tel: 416.864.5776  Fax: 416.864.3016
>
> > On Mar 19, 2021, at 1:22 PM, Bogdan Tanasa  wrote:
> >
> > Dear all, thank you all for comments and help.
> >
> > as far as i can see, shall we have samples of 1000 records, only
> > "exact=FALSE" allows the code to run:
> >
> > wilcox.test(rnorm(1000), rnorm(1000, 2), exact=FALSE)$p.value
> > [1] 7.304863e-231
> >
> > shall i use "exact=TRUE", it runs out of memory on my 64GB RAM PC :
> >
> > wilcox.test(rnorm(1000), rnorm(1000, 2), exact=TRUE)$p.value
> > (the job is terminated by OS)
> >
> > shall you have any other suggestions, please let me know. thanks a lot !
> >
> > On Fri, Mar 19, 2021 at 9:05 AM Bert Gunter 
> wrote:
> >
> >> I **believe** -- if my old memory still serves-- that the "exact"
> >> specification uses a home grown version of the algorithm to calculate
> >> exact,  or close approximations to the exact, permutation distribution
> >> originally developed by Cyrus Mehta, founder of StatXact software.  Of
> >> course, examining the C code source would determine this, but I don't
> care
> >> to attempt this.
> >>
> >> If this is (no longer?) correct, please point this out.
> >>
> >> Best,
> >>
> >> Bert Gunter
> >>
> >> "The trouble with having an open mind is that people keep coming along
> and
> >> sticking things into it."
> >> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> >>
> >>
> >> On Fri, Mar 19, 2021 at 8:42 AM Jiefei Wang  wrote:
> >>
> >>> Hi Spencer,
> >>>
> >>> Thanks for your test results, I do not know the answer as I haven't
> >>> used wilcox.test for many years. I do not know if it is possible to
> >>> compute
> >>> the exact distribution of the Wilcoxon rank sum statistic, but I think
> it
> >>> is very likely, as the document of `Wilcoxon` says:
> >>>
> >>> This distribution is obtained as follows. Let x and y be two random,
> >>> independent samples of size m and n. Then the Wilcoxon rank sum
> statistic
> >>> is the number of all pairs (x[i], y[j]) for which y[j] is not greater
> than
> >>> x[i]. This statistic takes values between 0 and m * n, and its mean and
> >>> variance are m * n / 2 and m * n * (m + n + 1) / 12, respectively.
> >>>
> >>> As a nice feature of the non-parametric statistic, it is usually
> >>> distribution-free so you can pick any distribution you like to compute
> the
> >>> same statistic. I wonder if this is the case, but I might be wrong.
> >>>
> >>> Cheers,
> >>> Jiefei
> >>>
> >>>

Re: [R] about a p-value < 2.2e-16

2021-03-19 Thread Kevin Thorpe
I have to ask: are you sure that by "exact p-value" the journal simply means
that they don’t want to see a p-value given as < 0.0001, for example, and
simply want the actual number?

I cannot imagine they really meant exact as in the p-value from some exact 
distribution.

-- 
Kevin E. Thorpe
Head of Biostatistics,  Applied Health Research Centre (AHRC)
Li Ka Shing Knowledge Institute of St. Michael's
Assistant Professor, Dalla Lana School of Public Health
University of Toronto
email: kevin.tho...@utoronto.ca  Tel: 416.864.5776  Fax: 416.864.3016

> On Mar 19, 2021, at 1:22 PM, Bogdan Tanasa  wrote:
> 
> Dear all, thank you all for comments and help.
> 
> as far as i can see, shall we have samples of 1000 records, only
> "exact=FALSE" allows the code to run:
> 
> wilcox.test(rnorm(1000), rnorm(1000, 2), exact=FALSE)$p.value
> [1] 7.304863e-231
> 
> shall i use "exact=TRUE", it runs out of memory on my 64GB RAM PC :
> 
> wilcox.test(rnorm(1000), rnorm(1000, 2), exact=TRUE)$p.value
> (the job is terminated by OS)
> 
> shall you have any other suggestions, please let me know. thanks a lot !
> 
> On Fri, Mar 19, 2021 at 9:05 AM Bert Gunter  wrote:
> 
>> I **believe** -- if my old memory still serves-- that the "exact"
>> specification uses a home grown version of the algorithm to calculate
>> exact,  or close approximations to the exact, permutation distribution
>> originally developed by Cyrus Mehta, founder of StatXact software.  Of
>> course, examining the C code source would determine this, but I don't care
>> to attempt this.
>> 
>> If this is (no longer?) correct, please point this out.
>> 
>> Best,
>> 
>> Bert Gunter
>> 
>> "The trouble with having an open mind is that people keep coming along and
>> sticking things into it."
>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>> 
>> 
>> On Fri, Mar 19, 2021 at 8:42 AM Jiefei Wang  wrote:
>> 
>>> Hi Spencer,
>>> 
>>> Thanks for your test results, I do not know the answer as I haven't
>>> used wilcox.test for many years. I do not know if it is possible to
>>> compute
>>> the exact distribution of the Wilcoxon rank sum statistic, but I think it
>>> is very likely, as the document of `Wilcoxon` says:
>>> 
>>> This distribution is obtained as follows. Let x and y be two random,
>>> independent samples of size m and n. Then the Wilcoxon rank sum statistic
>>> is the number of all pairs (x[i], y[j]) for which y[j] is not greater than
>>> x[i]. This statistic takes values between 0 and m * n, and its mean and
>>> variance are m * n / 2 and m * n * (m + n + 1) / 12, respectively.
>>> 
>>> As a nice feature of the non-parametric statistic, it is usually
>>> distribution-free so you can pick any distribution you like to compute the
>>> same statistic. I wonder if this is the case, but I might be wrong.
>>> 
>>> Cheers,
>>> Jiefei
>>> 
>>> 

Re: [R] about a p-value < 2.2e-16

2021-03-19 Thread Bogdan Tanasa
Dear all, thank you all for comments and help.

as far as i can see, when we have samples of 1000 records, only
"exact=FALSE" allows the code to run:

wilcox.test(rnorm(1000), rnorm(1000, 2), exact=FALSE)$p.value
[1] 7.304863e-231

if i use "exact=TRUE", it runs out of memory on my 64GB RAM PC :

wilcox.test(rnorm(1000), rnorm(1000, 2), exact=TRUE)$p.value
(the job is terminated by OS)

if you have any other suggestions, please let me know. thanks a lot !

On Fri, Mar 19, 2021 at 9:05 AM Bert Gunter  wrote:

> I **believe** -- if my old memory still serves-- that the "exact"
> specification uses a home grown version of the algorithm to calculate
> exact,  or close approximations to the exact, permutation distribution
> originally developed by Cyrus Mehta, founder of StatXact software.  Of
> course, examining the C code source would determine this, but I don't care
> to attempt this.
>
> If this is (no longer?) correct, please point this out.
>
> Best,
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along and
> sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>
> On Fri, Mar 19, 2021 at 8:42 AM Jiefei Wang  wrote:
>
>> Hi Spencer,
>>
>> Thanks for your test results, I do not know the answer as I haven't
>> used wilcox.test for many years. I do not know if it is possible to
>> compute
>> the exact distribution of the Wilcoxon rank sum statistic, but I think it
>> is very likely, as the document of `Wilcoxon` says:
>>
>> This distribution is obtained as follows. Let x and y be two random,
>> independent samples of size m and n. Then the Wilcoxon rank sum statistic
>> is the number of all pairs (x[i], y[j]) for which y[j] is not greater than
>> x[i]. This statistic takes values between 0 and m * n, and its mean and
>> variance are m * n / 2 and m * n * (m + n + 1) / 12, respectively.
>>
>> As a nice feature of the non-parametric statistic, it is usually
>> distribution-free so you can pick any distribution you like to compute the
>> same statistic. I wonder if this is the case, but I might be wrong.
>>
>> Cheers,
>> Jiefei
>>
>>
>> On Fri, Mar 19, 2021 at 10:57 PM Spencer Graves <
>> spencer.gra...@effectivedefense.org> wrote:
>>
>> >
>> >
>> > On 2021-3-19 9:52 AM, Jiefei Wang wrote:
>> > > After digging into the R source, it turns out that the argument
>> `exact`
>> > has
>> > > nothing to do with the numeric precision. It only affects the
>> statistic
>> > > model used to compute the p-value. When `exact=TRUE` the true
>> > distribution
>> > > of the statistic will be used. Otherwise, a normal approximation will
>> be
>> > > used.
>> > >
>> > > I think the documentation needs to be improved here, you can compute
>> the
>> > > exact p-value *only* when you do not have any ties in your data. If
>> you
>> > > have ties in your data you will get the p-value from the normal
>> > > approximation no matter what value you put in `exact`. This behavior
>> > should
>> > > be documented or a warning should be given when `exact=TRUE` and ties
>> > > present.
>> > >
>> > > FYI, if the exact p-value is required, `pwilcox` function will be
>> used to
>> > > compute the p-value. There are no details on how it computes the
>> pvalue
>> > but
>> > > its C code seems to compute the probability table, so I assume it
>> > computes
>> > > the exact p-value from the true distribution of the statistic, not a
>> > > permutation or MC p-value.
>> >
>> >
>> >My example shows that it does NOT use Monte Carlo, because
>> > otherwise it uses some distribution.  I believe the term "exact" means
>> > that it uses the permutation distribution, though I could be mistaken.
>> > If it's NOT a permutation distribution, I don't know what it is.
>> >
>> >
>> >Spencer
>> > >
>> > > Best,
>> > > Jiefei
>> > >
>> > >
>> > >
>> > > On Fri, Mar 19, 2021 at 10:01 PM Jiefei Wang 
>> wrote:
>> > >
>> > >> Hey,
>> > >>
>> > >> I just want to point out that the word "exact" has two meanings. It
>> can
>> > >> mean the numerically accurate p-value as Bogdan asked in his first
>> > email,
>> > >> or it could mean the p-value calculated from the exact distribution
>> of
>> > the
>> > >> statistic(In this case, U stat). These two are actually not related,
>> > even
>> > >> though they all called "exact".
>> > >>
>> > >> Best,
>> > >> Jiefei
>> > >>
>> > >> On Fri, Mar 19, 2021 at 9:31 PM Spencer Graves <
>> > >> spencer.gra...@effectivedefense.org> wrote:
>> > >>
>> > >>>
>> > >>> On 2021-3-19 12:54 AM, Bogdan Tanasa wrote:
>> > >>>> thanks a lot, Vivek ! in other words, assuming that we work with
>> > >>>> 1000 data points,
>> > >>>>
>> > >>>> shall we use EXACT = TRUE, it uses the normal approximation,
>> > >>>>
>> > >>>> while if EXACT=FALSE (for these large samples), it does not ?
>> > >>>
>> > >>> As David Winsemius noted, the documentation is not clear.
>> > >>> Consider the following:
>> > >>>
>> >  set.seed(1)  > x <- 

Re: [R] about a p-value < 2.2e-16

2021-03-19 Thread Viechtbauer, Wolfgang (SP)
For me, it was always clear based on the documentation that if there are ties, 
then the normal approximation is used (irrespective of what 'exact' is set to). 
In fact, if there are ties, the output even tells you that this is happening:

wilcox.test(c(1,3,2,2,4), exact=TRUE)

[...]
Warning message:
In wilcox.test.default(c(1, 3, 2, 2, 4), exact = TRUE) :
  cannot compute exact p-value with ties
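
To check programmatically whether that fallback happened, the warning can be
captured instead of being left to scroll past. A minimal sketch on the same
data, assuming the behaviour of the R version discussed in this thread (other
releases may handle ties differently); `msgs` and `res` are made-up names:

```r
x <- c(1, 3, 2, 2, 4)

# Collect any warning messages raised while the test runs.
msgs <- character(0)
res <- withCallingHandlers(
  wilcox.test(x, exact = TRUE),
  warning = function(w) {
    msgs <<- c(msgs, conditionMessage(w))
    invokeRestart("muffleWarning")  # keep going, but do not print the warning
  }
)

# Any messages mentioning ties indicate that the exact p-value was not used.
print(msgs)
print(res$p.value)
```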

Best,
Wolfgang

>-----Original Message-----
>From: Jiefei Wang [mailto:szwj...@gmail.com]
>Sent: Friday, 19 March, 2021 16:32
>To: Viechtbauer, Wolfgang (SP)
>Cc: r-help
>Subject: Re: [R] about a p-value < 2.2e-16
>
>Dear Wolfgang,
>
>Thanks for the documentation, but the document only states the default
>behavior; it does not mention what would happen if we tell it to compute the
>exact p-value but the data has ties. I think this would be misleading, as
>people might think their result is exact by specifying `exact=TRUE` when in
>truth their data contains ties and the result is from the normal approximation.
>
>Best,
>Jiefei
>
>On Fri, Mar 19, 2021 at 11:18 PM Viechtbauer, Wolfgang (SP)
> wrote:
>Dear Jiefei,
>
>This behavior is documented. From help(wilcox.test):
>
>"By default (if exact is not specified), an exact p-value is computed if the
>samples contain less than 50 finite values and there are no ties. Otherwise, a
>normal approximation is used."
>
>Best,
>Wolfgang
>
>>-----Original Message-----
>>From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Jiefei Wang
>>Sent: Friday, 19 March, 2021 15:52
>>To: Spencer Graves
>>Cc: r-help; Bogdan Tanasa
>>Subject: Re: [R] about a p-value < 2.2e-16
>>
>>After digging into the R source, it turns out that the argument `exact` has
>>nothing to do with the numeric precision. It only affects the statistic
>>model used to compute the p-value. When `exact=TRUE` the true distribution
>>of the statistic will be used. Otherwise, a normal approximation will be
>>used.
>>
>>I think the documentation needs to be improved here, you can compute the
>>exact p-value *only* when you do not have any ties in your data. If you
>>have ties in your data you will get the p-value from the normal
>>approximation no matter what value you put in `exact`. This behavior should
>>be documented or a warning should be given when `exact=TRUE` and ties
>>present.
>>
>>FYI, if the exact p-value is required, `pwilcox` function will be used to
>>compute the p-value. There are no details on how it computes the pvalue but
>>its C code seems to compute the probability table, so I assume it computes
>>the exact p-value from the true distribution of the statistic, not a
>>permutation or MC p-value.
>>
>>Best,
>>Jiefei
__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] about a p-value < 2.2e-16

2021-03-19 Thread Bert Gunter
I **believe** -- if my old memory still serves-- that the "exact"
specification uses a home grown version of the algorithm to calculate
exact,  or close approximations to the exact, permutation distribution
originally developed by Cyrus Mehta, founder of StatXact software.  Of
course, examining the C code source would determine this, but I don't care
to attempt this.

If this is (no longer?) correct, please point this out.

Best,

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Fri, Mar 19, 2021 at 8:42 AM Jiefei Wang  wrote:

> Hi Spencer,
>
> Thanks for your test results, I do not know the answer as I haven't
> used wilcox.test for many years. I do not know if it is possible to compute
> the exact distribution of the Wilcoxon rank sum statistic, but I think it
> is very likely, as the document of `Wilcoxon` says:
>
> This distribution is obtained as follows. Let x and y be two random,
> independent samples of size m and n. Then the Wilcoxon rank sum statistic
> is the number of all pairs (x[i], y[j]) for which y[j] is not greater than
> x[i]. This statistic takes values between 0 and m * n, and its mean and
> variance are m * n / 2 and m * n * (m + n + 1) / 12, respectively.
>
> As a nice feature of the non-parametric statistic, it is usually
> distribution-free so you can pick any distribution you like to compute the
> same statistic. I wonder if this is the case, but I might be wrong.
>
> Cheers,
> Jiefei
>
>
> On Fri, Mar 19, 2021 at 10:57 PM Spencer Graves <
> spencer.gra...@effectivedefense.org> wrote:
>
> >
> >
> > On 2021-3-19 9:52 AM, Jiefei Wang wrote:
> > > After digging into the R source, it turns out that the argument `exact`
> > has
> > > nothing to do with the numeric precision. It only affects the statistic
> > > model used to compute the p-value. When `exact=TRUE` the true
> > distribution
> > > of the statistic will be used. Otherwise, a normal approximation will
> be
> > > used.
> > >
> > > I think the documentation needs to be improved here, you can compute
> the
> > > exact p-value *only* when you do not have any ties in your data. If you
> > > have ties in your data you will get the p-value from the normal
> > > approximation no matter what value you put in `exact`. This behavior
> > should
> > > be documented or a warning should be given when `exact=TRUE` and ties
> > > present.
> > >
> > > FYI, if the exact p-value is required, `pwilcox` function will be used
> to
> > > compute the p-value. There are no details on how it computes the pvalue
> > but
> > > its C code seems to compute the probability table, so I assume it
> > computes
> > > the exact p-value from the true distribution of the statistic, not a
> > > permutation or MC p-value.
> >
> >
> >My example shows that it does NOT use Monte Carlo, because
> > otherwise it uses some distribution.  I believe the term "exact" means
> > that it uses the permutation distribution, though I could be mistaken.
> > If it's NOT a permutation distribution, I don't know what it is.
> >
> >
> >Spencer
> > >
> > > Best,
> > > Jiefei
> > >
> > >
> > >
> > > On Fri, Mar 19, 2021 at 10:01 PM Jiefei Wang 
> wrote:
> > >
> > >> Hey,
> > >>
> > >> I just want to point out that the word "exact" has two meanings. It
> can
> > >> mean the numerically accurate p-value as Bogdan asked in his first
> > email,
> > >> or it could mean the p-value calculated from the exact distribution of
> > the
> > >> statistic(In this case, U stat). These two are actually not related,
> > even
> > >> though they all called "exact".
> > >>
> > >> Best,
> > >> Jiefei
> > >>
> > >> On Fri, Mar 19, 2021 at 9:31 PM Spencer Graves <
> > >> spencer.gra...@effectivedefense.org> wrote:
> > >>
> > >>>
> > >>> On 2021-3-19 12:54 AM, Bogdan Tanasa wrote:
> >  thanks a lot, Vivek ! in other words, assuming that we work with
> 1000
> > >>> data
> >  points,
> > 
> >  shall we use EXACT = TRUE, it uses the normal approximation,
> > 
> >  while if EXACT=FALSE (for these large samples), it does not ?
> > >>>
> > >>> As David Winsemius noted, the documentation is not clear.
> > >>> Consider the following:
> > >>>
> > >>> > set.seed(1)
> > >>> > x <- rnorm(100)
> > >>> > y <- rnorm(100, 2)
> > >>>
> > >>> > wilcox.test(x, y)$p.value
> > >>> [1] 1.172189e-25
> > >>> > wilcox.test(x, y)$p.value
> > >>> [1] 1.172189e-25
> > >>>
> > >>> > wilcox.test(x, y, EXACT=TRUE)$p.value
> > >>> [1] 1.172189e-25
> > >>> > wilcox.test(x, y, EXACT=TRUE)$p.value
> > >>> [1] 1.172189e-25
> > >>> > wilcox.test(x, y, exact=TRUE)$p.value
> > >>> [1] 4.123875e-32
> > >>> > wilcox.test(x, y, exact=TRUE)$p.value
> > >>> [1] 4.123875e-32
> > >>>
> > >>> > wilcox.test(x, y, EXACT=FALSE)$p.value
> > >>> [1] 1.172189e-25
> > >>> > wilcox.test(x, y, EXACT=FALSE)$p.value
> > >>> [1] 1.172189e-25
> > >>> > wilcox.test(x, y, exact=FALSE)$p.value
> > >>> [1] 1.172189e-25
> > >>> > wilcox.test(x, y, exact=FALSE)$p.value
> > >>> [1] 1.172189e-25
> > >>>
> > >>> We

Re: [R] about a p-value < 2.2e-16

2021-03-19 Thread Jiefei Wang
Hi Spencer,

Thanks for your test results. I do not know the answer, as I haven't
used wilcox.test for many years. I do not know whether it is possible to
compute the exact distribution of the Wilcoxon rank sum statistic, but I
think it is very likely, as the documentation for `Wilcoxon` says:

This distribution is obtained as follows. Let x and y be two random,
independent samples of size m and n. Then the Wilcoxon rank sum statistic
is the number of all pairs (x[i], y[j]) for which y[j] is not greater than
x[i]. This statistic takes values between 0 and m * n, and its mean and
variance are m * n / 2 and m * n * (m + n + 1) / 12, respectively.

A nice feature of a non-parametric statistic is that it is usually
distribution-free, so you can pick any distribution you like to compute
the distribution of the statistic. I suspect that is the case here, but
I might be wrong.
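
The quoted mean and variance can be checked directly against the exact null distribution that R exposes through `dwilcox`. A small sketch, with arbitrarily chosen sample sizes m = 4 and n = 6 (these values are my own choice for illustration):

```r
m <- 4
n <- 6
w <- 0:(m * n)          # support of the statistic: 0, 1, ..., m * n
p <- dwilcox(w, m, n)   # exact null probabilities of each value

total  <- sum(p)                    # should be 1
mean_w <- sum(w * p)                # compare with m * n / 2
var_w  <- sum((w - mean_w)^2 * p)   # compare with m * n * (m + n + 1) / 12

print(c(total = total, mean = mean_w, variance = var_w))
print(c(mean_formula = m * n / 2,
        variance_formula = m * n * (m + n + 1) / 12))
```

Only m and n enter the computation, which is what "distribution-free" means here: under the null hypothesis, and without ties, the distribution of the statistic does not depend on where the data came from.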

Cheers,
Jiefei


On Fri, Mar 19, 2021 at 10:57 PM Spencer Graves <
spencer.gra...@effectivedefense.org> wrote:

>
>
> On 2021-3-19 9:52 AM, Jiefei Wang wrote:
> > After digging into the R source, it turns out that the argument `exact`
> has
> > nothing to do with the numeric precision. It only affects the statistic
> > model used to compute the p-value. When `exact=TRUE` the true
> distribution
> > of the statistic will be used. Otherwise, a normal approximation will be
> > used.
> >
> > I think the documentation needs to be improved here, you can compute the
> > exact p-value *only* when you do not have any ties in your data. If you
> > have ties in your data you will get the p-value from the normal
> > approximation no matter what value you put in `exact`. This behavior
> should
> > be documented or a warning should be given when `exact=TRUE` and ties
> > present.
> >
> > FYI, if the exact p-value is required, `pwilcox` function will be used to
> > compute the p-value. There are no details on how it computes the pvalue
> but
> > its C code seems to compute the probability table, so I assume it
> computes
> > the exact p-value from the true distribution of the statistic, not a
> > permutation or MC p-value.
>
>
>My example shows that it does NOT use Monte Carlo, because
> otherwise it uses some distribution.  I believe the term "exact" means
> that it uses the permutation distribution, though I could be mistaken.
> If it's NOT a permutation distribution, I don't know what it is.
>
>
>Spencer
> >
> > Best,
> > Jiefei
> >
> >
> >
> > On Fri, Mar 19, 2021 at 10:01 PM Jiefei Wang  wrote:
> >
> >> Hey,
> >>
> >> I just want to point out that the word "exact" has two meanings. It can
> >> mean the numerically accurate p-value as Bogdan asked in his first
> email,
> >> or it could mean the p-value calculated from the exact distribution of
> the
> >> statistic(In this case, U stat). These two are actually not related,
> even
> >> though they all called "exact".
> >>
> >> Best,
> >> Jiefei
> >>
> >> On Fri, Mar 19, 2021 at 9:31 PM Spencer Graves <
> >> spencer.gra...@effectivedefense.org> wrote:
> >>
> >>>
> >>> On 2021-3-19 12:54 AM, Bogdan Tanasa wrote:
>  thanks a lot, Vivek ! in other words, assuming that we work with 1000
> >>> data
>  points,
> 
>  shall we use EXACT = TRUE, it uses the normal approximation,
> 
>  while if EXACT=FALSE (for these large samples), it does not ?
> >>>
> >>> As David Winsemius noted, the documentation is not clear.
> >>> Consider the following:
> >>>
> > >>> > set.seed(1)
> > >>> > x <- rnorm(100)
> > >>> > y <- rnorm(100, 2)
> > >>>
> > >>> > wilcox.test(x, y)$p.value
> > >>> [1] 1.172189e-25
> > >>> > wilcox.test(x, y)$p.value
> > >>> [1] 1.172189e-25
> > >>>
> > >>> > wilcox.test(x, y, EXACT=TRUE)$p.value
> > >>> [1] 1.172189e-25
> > >>> > wilcox.test(x, y, EXACT=TRUE)$p.value
> > >>> [1] 1.172189e-25
> > >>> > wilcox.test(x, y, exact=TRUE)$p.value
> > >>> [1] 4.123875e-32
> > >>> > wilcox.test(x, y, exact=TRUE)$p.value
> > >>> [1] 4.123875e-32
> > >>>
> > >>> > wilcox.test(x, y, EXACT=FALSE)$p.value
> > >>> [1] 1.172189e-25
> > >>> > wilcox.test(x, y, EXACT=FALSE)$p.value
> > >>> [1] 1.172189e-25
> > >>> > wilcox.test(x, y, exact=FALSE)$p.value
> > >>> [1] 1.172189e-25
> > >>> > wilcox.test(x, y, exact=FALSE)$p.value
> > >>> [1] 1.172189e-25
> > >>>
> > >>> We get two values here: 1.172189e-25 and 4.123875e-32. The first
> > >>> one, I think, is the normal approximation, which is the same as
> > >>> exact=FALSE. I think that with exact=TRUE, you get a permutation
> > >>> distribution, though I'm not sure. You might try looking at
> > >>> "wilcox_test in package coin for exact, asymptotic and Monte Carlo
> > >>> conditional p-values, including in the presence of ties" to see if
> > >>> it is clearer.
> > >>>
> > >>> NOTE: R is case sensitive, so "EXACT" is a different variable from
> > >>> "exact". It is interpreted as an optional argument, which is not
> > >>> recognized and therefore ignored in this context.
> >>>Hope this helps.
> >>>Spencer
> >>>
> >>>
>  On Thu, Mar 18, 2021 at 10:47 PM Vivek Das 
> wrote:
> 
> > Hi Bogdan,
> >
> > You can also get the information from the link of the Wilcox.test
> >>> function
> 

Re: [R] about a p-value < 2.2e-16

2021-03-19 Thread Bogdan Tanasa
Dear Jiefei, and all,

many thanks for your time and comments, suggestions, insights.

-- bogdan

On Fri, Mar 19, 2021 at 7:52 AM Jiefei Wang  wrote:

> After digging into the R source, it turns out that the argument `exact`
> has nothing to do with the numeric precision. It only affects the statistic
> model used to compute the p-value. When `exact=TRUE` the true distribution
> of the statistic will be used. Otherwise, a normal approximation will be
> used.
>
> I think the documentation needs to be improved here, you can compute the
> exact p-value *only* when you do not have any ties in your data. If you
> have ties in your data you will get the p-value from the normal
> approximation no matter what value you put in `exact`. This behavior should
> be documented or a warning should be given when `exact=TRUE` and ties
> present.
>
> FYI, if the exact p-value is required, `pwilcox` function will be used to
> compute the p-value. There are no details on how it computes the pvalue but
> its C code seems to compute the probability table, so I assume it computes
> the exact p-value from the true distribution of the statistic, not a
> permutation or MC p-value.
>
> Best,
> Jiefei
>
>
>
> On Fri, Mar 19, 2021 at 10:01 PM Jiefei Wang  wrote:
>
>> Hey,
>>
>> I just want to point out that the word "exact" has two meanings. It can
>> mean the numerically accurate p-value as Bogdan asked in his first email,
>> or it could mean the p-value calculated from the exact distribution of the
>> statistic(In this case, U stat). These two are actually not related, even
>> though they all called "exact".
>>
>> Best,
>> Jiefei
>>
>> On Fri, Mar 19, 2021 at 9:31 PM Spencer Graves <
>> spencer.gra...@effectivedefense.org> wrote:
>>
>>>
>>>
>>> On 2021-3-19 12:54 AM, Bogdan Tanasa wrote:
>>> > thanks a lot, Vivek ! in other words, assuming that we work with 1000
>>> data
>>> > points,
>>> >
>>> > shall we use EXACT = TRUE, it uses the normal approximation,
>>> >
>>> > while if EXACT=FALSE (for these large samples), it does not ?
>>>
>>>
>>>As David Winsemius noted, the documentation is not clear.
>>> Consider the following:
>>>
>>> > set.seed(1)
>>> > x <- rnorm(100)
>>> > y <- rnorm(100, 2)
>>>
>>> > wilcox.test(x, y)$p.value
>>> [1] 1.172189e-25
>>> > wilcox.test(x, y)$p.value
>>> [1] 1.172189e-25
>>>
>>> > wilcox.test(x, y, EXACT=TRUE)$p.value
>>> [1] 1.172189e-25
>>> > wilcox.test(x, y, EXACT=TRUE)$p.value
>>> [1] 1.172189e-25
>>> > wilcox.test(x, y, exact=TRUE)$p.value
>>> [1] 4.123875e-32
>>> > wilcox.test(x, y, exact=TRUE)$p.value
>>> [1] 4.123875e-32
>>>
>>> > wilcox.test(x, y, EXACT=FALSE)$p.value
>>> [1] 1.172189e-25
>>> > wilcox.test(x, y, EXACT=FALSE)$p.value
>>> [1] 1.172189e-25
>>> > wilcox.test(x, y, exact=FALSE)$p.value
>>> [1] 1.172189e-25
>>> > wilcox.test(x, y, exact=FALSE)$p.value
>>> [1] 1.172189e-25
>>>
>>> We get two values here: 1.172189e-25 and 4.123875e-32. The first one,
>>> I think, is the normal approximation, which is the same as
>>> exact=FALSE. I think that with exact=TRUE, you get a permutation
>>> distribution, though I'm not sure. You might try looking at
>>> "wilcox_test in package coin for exact, asymptotic and Monte Carlo
>>> conditional p-values, including in the presence of ties" to see if it
>>> is clearer.
>>>
>>> NOTE: R is case sensitive, so "EXACT" is a different variable from
>>> "exact". It is interpreted as an optional argument, which is not
>>> recognized and therefore ignored in this context.
>>>   Hope this helps.
>>>   Spencer
>>>
>>>
>>> > On Thu, Mar 18, 2021 at 10:47 PM Vivek Das  wrote:
>>> >
>>> >> Hi Bogdan,
>>> >>
>>> >> You can also get the information from the link of the Wilcox.test
>>> function
>>> >> page.
>>> >>
>>> >> “By default (if exact is not specified), an exact p-value is computed
>>> if
>>> >> the samples contain less than 50 finite values and there are no ties.
>>> >> Otherwise, a normal approximation is used.”
>>> >>
>>> >> For more:
>>> >>
>>> >>
>>> https://stat.ethz.ch/R-manual/R-devel/library/stats/html/wilcox.test.html
>>> >>
>>> >> Hope this helps!
>>> >>
>>> >> Best,
>>> >>
>>> >> VD
>>> >>
>>> >>
>>> >> On Thu, Mar 18, 2021 at 10:36 PM Bogdan Tanasa 
>>> wrote:
>>> >>
>>> >>> Dear Peter, thanks a lot. yes, we can see a very precise p-value,
>>> and that
>>> >>> was the request from the journal.
>>> >>>
>>> >>> if I may ask another question please : what is the meaning of
>>> "exact=TRUE"
>>> >>> or "exact=FALSE" in wilcox.test ?
>>> >>>
>>> >>> i can see that the "numerically precise" p-values are different.
>>> thanks a
>>> >>> lot !
>>> >>>
>>> >>> tst = wilcox.test(rnorm(100), rnorm(100, 2), exact=TRUE)
>>> >>> tst$p.value
>>> >>> [1] 8.535524e-25
>>> >>>
>>> >>> tst = wilcox.test(rnorm(100), rnorm(100, 2), exact=FALSE)
>>> >>> tst$p.value
>>> >>> [1] 3.448211e-25
>>> >>>
>>> >>> On Thu, Mar 18, 2021 at 10:15 PM Peter Langfelder <
>>> >>> peter.langfel...@gmail.com> wrote:
>>> >>>
>>>  I think the answer is much simpler. The print method for hypothesis
>>>  tests (class 

Re: [R] about a p-value < 2.2e-16

2021-03-19 Thread Jiefei Wang
Dear Wolfgang,

Thanks for the documentation, but it only states the default behavior.
It does not mention what happens if we ask for the exact p-value but the
data have ties. I think this is misleading: people might think their
result is exact because they specified `exact=TRUE`, when in fact their
data contain ties and the result comes from the normal approximation.

Best,
Jiefei

On Fri, Mar 19, 2021 at 11:18 PM Viechtbauer, Wolfgang (SP) <
wolfgang.viechtba...@maastrichtuniversity.nl> wrote:

> Dear Jiefei,
>
> This behavior is documented. From help(wilcox.test):
>
> "By default (if exact is not specified), an exact p-value is computed if
> the samples contain less than 50 finite values and there are no ties.
> Otherwise, a normal approximation is used."
>
> Best,
> Wolfgang
>
> >-Original Message-
> >From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Jiefei
> Wang
> >Sent: Friday, 19 March, 2021 15:52
> >To: Spencer Graves
> >Cc: r-help; Bogdan Tanasa
> >Subject: Re: [R] about a p-value < 2.2e-16
> >
> >After digging into the R source, it turns out that the argument `exact`
> has
> >nothing to do with the numeric precision. It only affects the statistic
> >model used to compute the p-value. When `exact=TRUE` the true distribution
> >of the statistic will be used. Otherwise, a normal approximation will be
> >used.
> >
> >I think the documentation needs to be improved here, you can compute the
> >exact p-value *only* when you do not have any ties in your data. If you
> >have ties in your data you will get the p-value from the normal
> >approximation no matter what value you put in `exact`. This behavior
> should
> >be documented or a warning should be given when `exact=TRUE` and ties
> >present.
> >
> >FYI, if the exact p-value is required, `pwilcox` function will be used to
> >compute the p-value. There are no details on how it computes the pvalue
> but
> >its C code seems to compute the probability table, so I assume it computes
> >the exact p-value from the true distribution of the statistic, not a
> >permutation or MC p-value.
> >
> >Best,
> >Jiefei
>



Re: [R] about a p-value < 2.2e-16

2021-03-19 Thread Viechtbauer, Wolfgang (SP)
Dear Jiefei,

This behavior is documented. From help(wilcox.test):

"By default (if exact is not specified), an exact p-value is computed if the 
samples contain less than 50 finite values and there are no ties. Otherwise, a 
normal approximation is used."

Best,
Wolfgang

>-Original Message-
>From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Jiefei Wang
>Sent: Friday, 19 March, 2021 15:52
>To: Spencer Graves
>Cc: r-help; Bogdan Tanasa
>Subject: Re: [R] about a p-value < 2.2e-16
>
>After digging into the R source, it turns out that the argument `exact` has
>nothing to do with the numeric precision. It only affects the statistic
>model used to compute the p-value. When `exact=TRUE` the true distribution
>of the statistic will be used. Otherwise, a normal approximation will be
>used.
>
>I think the documentation needs to be improved here, you can compute the
>exact p-value *only* when you do not have any ties in your data. If you
>have ties in your data you will get the p-value from the normal
>approximation no matter what value you put in `exact`. This behavior should
>be documented or a warning should be given when `exact=TRUE` and ties
>present.
>
>FYI, if the exact p-value is required, `pwilcox` function will be used to
>compute the p-value. There are no details on how it computes the pvalue but
>its C code seems to compute the probability table, so I assume it computes
>the exact p-value from the true distribution of the statistic, not a
>permutation or MC p-value.
>
>Best,
>Jiefei


Re: [R] about a p-value < 2.2e-16

2021-03-19 Thread Spencer Graves




On 2021-3-19 9:52 AM, Jiefei Wang wrote:

After digging into the R source, it turns out that the argument `exact` has
nothing to do with the numeric precision. It only affects the statistic
model used to compute the p-value. When `exact=TRUE` the true distribution
of the statistic will be used. Otherwise, a normal approximation will be
used.

I think the documentation needs to be improved here, you can compute the
exact p-value *only* when you do not have any ties in your data. If you
have ties in your data you will get the p-value from the normal
approximation no matter what value you put in `exact`. This behavior should
be documented or a warning should be given when `exact=TRUE` and ties
present.

FYI, if the exact p-value is required, `pwilcox` function will be used to
compute the p-value. There are no details on how it computes the pvalue but
its C code seems to compute the probability table, so I assume it computes
the exact p-value from the true distribution of the statistic, not a
permutation or MC p-value.



  My example shows that it does NOT use Monte Carlo, because repeated
calls return identical p-values; it must therefore use some fixed
distribution.  I believe the term "exact" means that it uses the
permutation distribution, though I could be mistaken.  If it's NOT a
permutation distribution, I don't know what it is.
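
For small samples the two candidate explanations can be compared by brute force. The sketch below enumerates every way of assigning m of the ranks 1..(m + n) to the x sample and compares the resulting distribution with `dwilcox`; the sizes m = 4 and n = 6 are my own choice to keep the enumeration small, and the comparison assumes no ties.

```r
m <- 4
n <- 6
N <- m + n

# Under the null hypothesis (and without ties) every choice of m ranks
# out of 1..N for the x sample is equally likely.
rank_sets <- combn(N, m)                         # one column per assignment
w_perm <- colSums(rank_sets) - m * (m + 1) / 2   # W as wilcox.test() reports it

# Relative frequency of each possible value of W over all assignments
perm_probs <- as.vector(table(factor(w_perm, levels = 0:(m * n)))) /
  ncol(rank_sets)

# Probabilities from R's Wilcoxon distribution functions
exact_probs <- dwilcox(0:(m * n), m, n)

print(all.equal(perm_probs, exact_probs))
```

If the comparison holds, the distribution behind `pwilcox` is the full permutation distribution of the ranks, computed by counting rather than by simulation, which would reconcile the two descriptions in this thread.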



  Spencer


Best,
Jiefei



On Fri, Mar 19, 2021 at 10:01 PM Jiefei Wang  wrote:


Hey,

I just want to point out that the word "exact" has two meanings. It can
mean the numerically accurate p-value as Bogdan asked in his first email,
or it could mean the p-value calculated from the exact distribution of the
statistic(In this case, U stat). These two are actually not related, even
though they all called "exact".

Best,
Jiefei

On Fri, Mar 19, 2021 at 9:31 PM Spencer Graves <
spencer.gra...@effectivedefense.org> wrote:



On 2021-3-19 12:54 AM, Bogdan Tanasa wrote:

thanks a lot, Vivek ! in other words, assuming that we work with 1000

data

points,

shall we use EXACT = TRUE, it uses the normal approximation,

while if EXACT=FALSE (for these large samples), it does not ?


As David Winsemius noted, the documentation is not clear.
Consider the following:


> set.seed(1)
> x <- rnorm(100)
> y <- rnorm(100, 2)

> wilcox.test(x, y)$p.value
[1] 1.172189e-25
> wilcox.test(x, y)$p.value
[1] 1.172189e-25

> wilcox.test(x, y, EXACT=TRUE)$p.value
[1] 1.172189e-25
> wilcox.test(x, y, EXACT=TRUE)$p.value
[1] 1.172189e-25
> wilcox.test(x, y, exact=TRUE)$p.value
[1] 4.123875e-32
> wilcox.test(x, y, exact=TRUE)$p.value
[1] 4.123875e-32

> wilcox.test(x, y, EXACT=FALSE)$p.value
[1] 1.172189e-25
> wilcox.test(x, y, EXACT=FALSE)$p.value
[1] 1.172189e-25
> wilcox.test(x, y, exact=FALSE)$p.value
[1] 1.172189e-25
> wilcox.test(x, y, exact=FALSE)$p.value
[1] 1.172189e-25

We get two values here: 1.172189e-25 and 4.123875e-32. The first one, I
think, is the normal approximation, which is the same as exact=FALSE. I
think that with exact=TRUE, you get a permutation distribution, though
I'm not sure. You might try looking at "wilcox_test in package coin for
exact, asymptotic and Monte Carlo conditional p-values, including in the
presence of ties" to see if it is clearer.

NOTE: R is case sensitive, so "EXACT" is a different variable from
"exact". It is interpreted as an optional argument, which is not
recognized and therefore ignored in this context.
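
The case-sensitivity point can be checked on its own. A minimal sketch, using samples of 60 rather than 100 (my own choice, to keep the exact computation light while staying above the 50-value threshold at which the default switches to the normal approximation):

```r
set.seed(1)
x <- rnorm(60)
y <- rnorm(60, 2)

p_default  <- wilcox.test(x, y)$p.value
# Misspelt argument: it matches no formal argument, is absorbed by `...`
# and has no effect on the result.
p_misspelt <- wilcox.test(x, y, EXACT = TRUE)$p.value
# Correctly spelt argument: requests the exact distribution.
p_exact    <- wilcox.test(x, y, exact = TRUE)$p.value

print(identical(p_default, p_misspelt))
print(c(default = p_default, exact = p_exact))
```

No error or warning is raised for the misspelt name, which is why the mistake is easy to overlook.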
   Hope this helps.
   Spencer



On Thu, Mar 18, 2021 at 10:47 PM Vivek Das  wrote:


Hi Bogdan,

You can also get the information from the link of the Wilcox.test

function

page.

“By default (if exact is not specified), an exact p-value is computed

if

the samples contain less than 50 finite values and there are no ties.
Otherwise, a normal approximation is used.”

For more:



https://stat.ethz.ch/R-manual/R-devel/library/stats/html/wilcox.test.html

Hope this helps!

Best,

VD


On Thu, Mar 18, 2021 at 10:36 PM Bogdan Tanasa 

wrote:

Dear Peter, thanks a lot. yes, we can see a very precise p-value, and

that

was the request from the journal.

if I may ask another question please : what is the meaning of

"exact=TRUE"

or "exact=FALSE" in wilcox.test ?

i can see that the "numerically precise" p-values are different.

thanks a

lot !

tst = wilcox.test(rnorm(100), rnorm(100, 2), exact=TRUE)
tst$p.value
[1] 8.535524e-25

tst = wilcox.test(rnorm(100), rnorm(100, 2), exact=FALSE)
tst$p.value
[1] 3.448211e-25

On Thu, Mar 18, 2021 at 10:15 PM Peter Langfelder <
peter.langfel...@gmail.com> wrote:


I think the answer is much simpler. The print method for hypothesis
tests (class htest) truncates the p-values. In the above example,
instead of using

wilcox.test(rnorm(100), rnorm(100, 2), exact=TRUE)

and copying the output, just print the p-value:

tst = wilcox.test(rnorm(100), rnorm(100, 2), exact=TRUE)
tst$p.value

[1] 2.988368e-32


I think this value is what the journal asks for.
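
A short sketch of the difference between what is printed and what is stored. It reuses the seed and sample sizes from Spencer's example elsewhere in this thread; the call to `format.pval` with `digits = 4` is my assumption about how the print method arrives at its truncated string, not something stated in the thread.

```r
set.seed(1)
tst <- wilcox.test(rnorm(100), rnorm(100, 2))

p <- tst$p.value                  # stored as an ordinary double

below_eps <- p < .Machine$double.eps          # smaller than ~2.2e-16?
truncated <- format.pval(p, digits = 4)       # truncated display form
full      <- format(p, digits = 3)            # stored value, 3 significant digits

print(below_eps)
print(truncated)
print(full)
```

The truncation therefore happens only at display time; the number needed for the journal is already in `tst$p.value`.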

HTH,

Peter

On Thu, Mar 18, 2021 at 10:05 PM Spencer Graves
 wrote:

   

Re: [R] about a p-value < 2.2e-16

2021-03-19 Thread Jiefei Wang
After digging into the R source, it turns out that the argument `exact` has
nothing to do with numeric precision. It only affects the statistical
model used to compute the p-value. When `exact=TRUE`, the true distribution
of the statistic is used. Otherwise, a normal approximation is used.

I think the documentation needs to be improved here: you can compute the
exact p-value *only* when you do not have any ties in your data. If you
have ties in your data, you will get the p-value from the normal
approximation no matter what value you pass to `exact`. This behavior should
be documented, or a warning should be given when `exact=TRUE` and ties are
present.

FYI, if the exact p-value is requested, the `pwilcox` function is used to
compute it. There are no details on how it computes the p-value, but
its C code seems to compute the probability table, so I assume it computes
the exact p-value from the true distribution of the statistic, not a
permutation or MC p-value.
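
Both routes can be reproduced by hand for the two-sample, two-sided case without ties. This is only a sketch based on my reading of the default method: the sample sizes are arbitrary, and the continuity correction of 0.5 assumes the default `correct = TRUE`.

```r
set.seed(1)
x <- rnorm(20)
y <- rnorm(25, 1)
m <- length(x)
n <- length(y)

# W: number of pairs with x[i] > y[j] (no ties for continuous data)
W <- sum(outer(x, y, ">"))

# Exact two-sided p-value from the null distribution of W
p_tail <- if (W > m * n / 2) {
  pwilcox(W - 1, m, n, lower.tail = FALSE)
} else {
  pwilcox(W, m, n)
}
p_exact_manual <- min(2 * p_tail, 1)

# Normal approximation with continuity correction (no tie adjustment)
d <- W - m * n / 2
sigma <- sqrt(m * n * (m + n + 1) / 12)
z <- (d - sign(d) * 0.5) / sigma
p_normal_manual <- 2 * min(pnorm(z), pnorm(z, lower.tail = FALSE))

p_exact  <- wilcox.test(x, y, exact = TRUE)$p.value
p_normal <- wilcox.test(x, y, exact = FALSE)$p.value

print(rbind(manual      = c(exact = p_exact_manual, normal = p_normal_manual),
            wilcox.test = c(exact = p_exact, normal = p_normal)))
```

The two rows should agree, which illustrates that `exact` selects the reference distribution and has nothing to do with how many digits are reported.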

Best,
Jiefei



On Fri, Mar 19, 2021 at 10:01 PM Jiefei Wang  wrote:

> Hey,
>
> I just want to point out that the word "exact" has two meanings. It can
> mean the numerically accurate p-value as Bogdan asked in his first email,
> or it could mean the p-value calculated from the exact distribution of the
> statistic(In this case, U stat). These two are actually not related, even
> though they all called "exact".
>
> Best,
> Jiefei
>
> On Fri, Mar 19, 2021 at 9:31 PM Spencer Graves <
> spencer.gra...@effectivedefense.org> wrote:
>
>>
>>
>> On 2021-3-19 12:54 AM, Bogdan Tanasa wrote:
>> > thanks a lot, Vivek ! in other words, assuming that we work with 1000
>> data
>> > points,
>> >
>> > shall we use EXACT = TRUE, it uses the normal approximation,
>> >
>> > while if EXACT=FALSE (for these large samples), it does not ?
>>
>>
>>As David Winsemius noted, the documentation is not clear.
>> Consider the following:
>>
>> > set.seed(1)
>> > x <- rnorm(100)
>> > y <- rnorm(100, 2)
>>
>> > wilcox.test(x, y)$p.value
>> [1] 1.172189e-25
>> > wilcox.test(x, y)$p.value
>> [1] 1.172189e-25
>>
>> > wilcox.test(x, y, EXACT=TRUE)$p.value
>> [1] 1.172189e-25
>> > wilcox.test(x, y, EXACT=TRUE)$p.value
>> [1] 1.172189e-25
>> > wilcox.test(x, y, exact=TRUE)$p.value
>> [1] 4.123875e-32
>> > wilcox.test(x, y, exact=TRUE)$p.value
>> [1] 4.123875e-32
>>
>> > wilcox.test(x, y, EXACT=FALSE)$p.value
>> [1] 1.172189e-25
>> > wilcox.test(x, y, EXACT=FALSE)$p.value
>> [1] 1.172189e-25
>> > wilcox.test(x, y, exact=FALSE)$p.value
>> [1] 1.172189e-25
>> > wilcox.test(x, y, exact=FALSE)$p.value
>> [1] 1.172189e-25
>>
>> We get two values here: 1.172189e-25 and 4.123875e-32. The first one,
>> I think, is the normal approximation, which is the same as
>> exact=FALSE. I think that with exact=TRUE, you get a permutation
>> distribution, though I'm not sure. You might try looking at
>> "wilcox_test in package coin for exact, asymptotic and Monte Carlo
>> conditional p-values, including in the presence of ties" to see if it
>> is clearer.
>>
>> NOTE: R is case sensitive, so "EXACT" is a different variable from
>> "exact". It is interpreted as an optional argument, which is not
>> recognized and therefore ignored in this context.
>>   Hope this helps.
>>   Spencer
>>
>>
>> > On Thu, Mar 18, 2021 at 10:47 PM Vivek Das  wrote:
>> >
>> >> Hi Bogdan,
>> >>
>> >> You can also get the information from the link of the Wilcox.test
>> function
>> >> page.
>> >>
>> >> “By default (if exact is not specified), an exact p-value is computed
>> if
>> >> the samples contain less than 50 finite values and there are no ties.
>> >> Otherwise, a normal approximation is used.”
>> >>
>> >> For more:
>> >>
>> >>
>> https://stat.ethz.ch/R-manual/R-devel/library/stats/html/wilcox.test.html
>> >>
>> >> Hope this helps!
>> >>
>> >> Best,
>> >>
>> >> VD
>> >>
>> >>
>> >> On Thu, Mar 18, 2021 at 10:36 PM Bogdan Tanasa 
>> wrote:
>> >>
>> >>> Dear Peter, thanks a lot. yes, we can see a very precise p-value, and
>> that
>> >>> was the request from the journal.
>> >>>
>> >>> if I may ask another question please : what is the meaning of
>> "exact=TRUE"
>> >>> or "exact=FALSE" in wilcox.test ?
>> >>>
>> >>> i can see that the "numerically precise" p-values are different.
>> thanks a
>> >>> lot !
>> >>>
>> >>> tst = wilcox.test(rnorm(100), rnorm(100, 2), exact=TRUE)
>> >>> tst$p.value
>> >>> [1] 8.535524e-25
>> >>>
>> >>> tst = wilcox.test(rnorm(100), rnorm(100, 2), exact=FALSE)
>> >>> tst$p.value
>> >>> [1] 3.448211e-25
>> >>>
>> >>> On Thu, Mar 18, 2021 at 10:15 PM Peter Langfelder <
>> >>> peter.langfel...@gmail.com> wrote:
>> >>>
>>  I think the answer is much simpler. The print method for hypothesis
>>  tests (class htest) truncates the p-values. In the above example,
>>  instead of using
>> 
>>  wilcox.test(rnorm(100), rnorm(100, 2), exact=TRUE)
>> 
>>  and copying the output, just print the p-value:
>> 
>>  tst = wilcox.test(rnorm(100), rnorm(100, 2), exact=TRUE)
>>  tst$p.value
>> 

Re: [R] about a p-value < 2.2e-16

2021-03-19 Thread Jiefei Wang
Hey,

I just want to point out that the word "exact" has two meanings. It can
mean the numerically accurate p-value, as Bogdan asked in his first email,
or it can mean the p-value calculated from the exact distribution of the
statistic (in this case, the U statistic). These two are actually not
related, even though they are both called "exact".

Best,
Jiefei

On Fri, Mar 19, 2021 at 9:31 PM Spencer Graves <
spencer.gra...@effectivedefense.org> wrote:

>
>
> On 2021-3-19 12:54 AM, Bogdan Tanasa wrote:
> > thanks a lot, Vivek ! in other words, assuming that we work with 1000
> data
> > points,
> >
> > shall we use EXACT = TRUE, it uses the normal approximation,
> >
> > while if EXACT=FALSE (for these large samples), it does not ?
>
>
>As David Winsemius noted, the documentation is not clear.
> Consider the following:
>
> > set.seed(1)
> > x <- rnorm(100)
> > y <- rnorm(100, 2)
>
> > wilcox.test(x, y)$p.value
> [1] 1.172189e-25
> > wilcox.test(x, y)$p.value
> [1] 1.172189e-25
>
> > wilcox.test(x, y, EXACT=TRUE)$p.value
> [1] 1.172189e-25
> > wilcox.test(x, y, EXACT=TRUE)$p.value
> [1] 1.172189e-25
> > wilcox.test(x, y, exact=TRUE)$p.value
> [1] 4.123875e-32
> > wilcox.test(x, y, exact=TRUE)$p.value
> [1] 4.123875e-32
>
> > wilcox.test(x, y, EXACT=FALSE)$p.value
> [1] 1.172189e-25
> > wilcox.test(x, y, EXACT=FALSE)$p.value
> [1] 1.172189e-25
> > wilcox.test(x, y, exact=FALSE)$p.value
> [1] 1.172189e-25
> > wilcox.test(x, y, exact=FALSE)$p.value
> [1] 1.172189e-25
>
> We get two values here: 1.172189e-25 and 4.123875e-32. The first one,
> I think, is the normal approximation, which is the same as
> exact=FALSE. I think that with exact=TRUE, you get a permutation
> distribution, though I'm not sure. You might try looking at
> "wilcox_test in package coin for exact, asymptotic and Monte Carlo
> conditional p-values, including in the presence of ties" to see if it
> is clearer.
>
> NOTE: R is case sensitive, so "EXACT" is a different variable from
> "exact". It is interpreted as an optional argument, which is not
> recognized and therefore ignored in this context.
>   Hope this helps.
>   Spencer
>
>
> > On Thu, Mar 18, 2021 at 10:47 PM Vivek Das  wrote:
> >
> >> Hi Bogdan,
> >>
> >> You can also get the information from the link of the Wilcox.test
> function
> >> page.
> >>
> >> “By default (if exact is not specified), an exact p-value is computed if
> >> the samples contain less than 50 finite values and there are no ties.
> >> Otherwise, a normal approximation is used.”
> >>
> >> For more:
> >>
> >>
> https://stat.ethz.ch/R-manual/R-devel/library/stats/html/wilcox.test.html
> >>
> >> Hope this helps!
> >>
> >> Best,
> >>
> >> VD
> >>
> >>
> >> On Thu, Mar 18, 2021 at 10:36 PM Bogdan Tanasa 
> wrote:
> >>
> >>> Dear Peter, thanks a lot. yes, we can see a very precise p-value, and
> that
> >>> was the request from the journal.
> >>>
> >>> if I may ask another question please : what is the meaning of
> "exact=TRUE"
> >>> or "exact=FALSE" in wilcox.test ?
> >>>
> >>> i can see that the "numerically precise" p-values are different.
> thanks a
> >>> lot !
> >>>
> >>> tst = wilcox.test(rnorm(100), rnorm(100, 2), exact=TRUE)
> >>> tst$p.value
> >>> [1] 8.535524e-25
> >>>
> >>> tst = wilcox.test(rnorm(100), rnorm(100, 2), exact=FALSE)
> >>> tst$p.value
> >>> [1] 3.448211e-25
> >>>
> >>> On Thu, Mar 18, 2021 at 10:15 PM Peter Langfelder <
> >>> peter.langfel...@gmail.com> wrote:
> >>>
>  I think the answer is much simpler. The print method for hypothesis
>  tests (class htest) truncates the p-values. In the above example,
>  instead of using
> 
>  wilcox.test(rnorm(100), rnorm(100, 2), exact=TRUE)
> 
>  and copying the output, just print the p-value:
> 
>  tst = wilcox.test(rnorm(100), rnorm(100, 2), exact=TRUE)
>  tst$p.value
> 
>  [1] 2.988368e-32
> 
> 
>  I think this value is what the journal asks for.
> 
>  HTH,
> 
>  Peter
> 
>  On Thu, Mar 18, 2021 at 10:05 PM Spencer Graves
>   wrote:
> > I would push back on that from two perspectives:
> >
> >
> >   1.  I would study exactly what the journal said very
> > carefully.  If they mandated "wilcox.test", that function has an
> > argument called "exact".  If that's what they are asking, then using
> > that argument gives the exact p-value, e.g.:
> >
> >
> >   > wilcox.test(rnorm(100), rnorm(100, 2), exact=TRUE)
> >
> >   Wilcoxon rank sum exact test
> >
> > data:  rnorm(100) and rnorm(100, 2)
> > W = 691, p-value < 2.2e-16
> >
> >
> >   2.  If that's NOT what they are asking, then I'm not
> > convinced what they are asking makes sense:  There is no such thing
> > as an "exact p value" except to the extent that certain assumptions
> > hold, and all models are wrong (but some are useful), as George Box
> > famously said years ago.[1]  Truth only exists in mathematics, 

Re: [R] about a p-value < 2.2e-16

2021-03-19 Thread Spencer Graves



On 2021-3-19 12:54 AM, Bogdan Tanasa wrote:
> thanks a lot, Vivek ! in other words, assuming that we work with 1000 data
> points,
>
> shall we use EXACT = TRUE, it uses the normal approximation,
>
> while if EXACT=FALSE (for these large samples), it does not ?


   As David Winsemius noted, the documentation is not clear. 
Consider the following:

> set.seed(1)
> x <- rnorm(100)
> y <- rnorm(100, 2)
>
> wilcox.test(x, y)$p.value
[1] 1.172189e-25
>
> wilcox.test(x, y, EXACT=TRUE)$p.value
[1] 1.172189e-25
> wilcox.test(x, y, exact=TRUE)$p.value
[1] 4.123875e-32
>
> wilcox.test(x, y, EXACT=FALSE)$p.value
[1] 1.172189e-25
> wilcox.test(x, y, exact=FALSE)$p.value
[1] 1.172189e-25

We get two values here: 1.172189e-25 and 4.123875e-32. The first one, I
think, is the normal approximation, which is the same as exact=FALSE. I
think that with exact=TRUE, you get a permutation distribution, though
I'm not sure. You might try looking at "wilcox_test in package coin for
exact, asymptotic and Monte Carlo conditional p-values, including in the
presence of ties" to see if it is clearer.

NOTE: R is case sensitive, so "EXACT" is a different argument name from
"exact". It is interpreted as an optional argument, which is not
recognized and therefore ignored in this context.
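
The case-sensitivity point can be checked directly. This is only a sketch
using the same simulated data as above; the names p_default, p_upper and
p_lower are introduced here purely for illustration.

```r
set.seed(1)
x <- rnorm(100)
y <- rnorm(100, 2)

p_default <- wilcox.test(x, y)$p.value                # no exact argument given
p_upper   <- wilcox.test(x, y, EXACT = TRUE)$p.value  # unknown name, absorbed by `...`
p_lower   <- wilcox.test(x, y, exact = TRUE)$p.value  # the real argument

identical(p_default, p_upper)   # does EXACT change anything?
identical(p_default, p_lower)   # does exact change anything?
```

If the first comparison is TRUE and the second FALSE, that matches the
output shown above: only the lower-case spelling reaches the function.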
  Hope this helps.
  Spencer


> On Thu, Mar 18, 2021 at 10:47 PM Vivek Das  wrote:
>
>> Hi Bogdan,
>>
>> You can also get the information from the link of the Wilcox.test function
>> page.
>>
>> “By default (if exact is not specified), an exact p-value is computed if
>> the samples contain less than 50 finite values and there are no ties.
>> Otherwise, a normal approximation is used.”
>>
>> For more:
>>
>> https://stat.ethz.ch/R-manual/R-devel/library/stats/html/wilcox.test.html
>>
>> Hope this helps!
>>
>> Best,
>>
>> VD
>>
>>
>> On Thu, Mar 18, 2021 at 10:36 PM Bogdan Tanasa  wrote:
>>
>>> Dear Peter, thanks a lot. yes, we can see a very precise p-value, and that
>>> was the request from the journal.
>>>
>>> if I may ask another question please : what is the meaning of "exact=TRUE"
>>> or "exact=FALSE" in wilcox.test ?
>>>
>>> i can see that the "numerically precise" p-values are different. thanks a
>>> lot !
>>>
>>> tst = wilcox.test(rnorm(100), rnorm(100, 2), exact=TRUE)
>>> tst$p.value
>>> [1] 8.535524e-25
>>>
>>> tst = wilcox.test(rnorm(100), rnorm(100, 2), exact=FALSE)
>>> tst$p.value
>>> [1] 3.448211e-25
>>>
>>> On Thu, Mar 18, 2021 at 10:15 PM Peter Langfelder <
>>> peter.langfel...@gmail.com> wrote:
>>>
 I think the answer is much simpler. The print method for hypothesis
 tests (class htest) truncates the p-values. In the above example,
 instead of using

 wilcox.test(rnorm(100), rnorm(100, 2), exact=TRUE)

 and copying the output, just print the p-value:

 tst = wilcox.test(rnorm(100), rnorm(100, 2), exact=TRUE)
 tst$p.value

 [1] 2.988368e-32


 I think this value is what the journal asks for.
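
 A short sketch of why the printed output stops at "< 2.2e-16": the print
 method formats the p-value with format.pval(), whose default threshold is
 the machine epsilon. The seed and the variable names below are
 illustrative assumptions, not part of the original example.

```r
set.seed(1)                      # arbitrary seed for reproducibility
x <- rnorm(100)
y <- rnorm(100, 2)
tst <- wilcox.test(x, y, exact = TRUE)

eps    <- .Machine$double.eps    # about 2.2e-16 for IEEE double precision
shown  <- format.pval(tst$p.value)  # thresholded form, as in the printed test
stored <- tst$p.value               # full value kept in the htest object

shown
format(stored, digits = 7)
stored < eps                     # TRUE means the printout shows only a bound
```

 The object still holds the full number; only the printed summary is
 truncated.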

 HTH,

 Peter

 On Thu, Mar 18, 2021 at 10:05 PM Spencer Graves
  wrote:
> I would push back on that from two perspectives:
>
>
>   1.  I would study exactly what the journal said very
> carefully.  If they mandated "wilcox.test", that function has an
> argument called "exact".  If that's what they are asking, then using
> that argument gives the exact p-value, e.g.:
>
>
>   > wilcox.test(rnorm(100), rnorm(100, 2), exact=TRUE)
>
>   Wilcoxon rank sum exact test
>
> data:  rnorm(100) and rnorm(100, 2)
> W = 691, p-value < 2.2e-16
>
>
>   2.  If that's NOT what they are asking, then I'm not
> convinced what they are asking makes sense:  There is no such thing
> as an "exact p value" except to the extent that certain assumptions
> hold, and all models are wrong (but some are useful), as George Box
> famously said years ago.[1]  Truth only exists in mathematics, and
> that's because it's a fiction to start with ;-)
>
>
> Hope this helps.
> Spencer Graves
>
>
> [1]
> https://en.wikipedia.org/wiki/All_models_are_wrong
>
>
> On 2021-3-18 11:12 PM, Bogdan Tanasa wrote:
>> <https://meta.stackexchange.com/questions/362285/about-a-p-value-2-2e-16>

>> Dear all,
>>
>> i would appreciate having your advice on the following please :
>>
>> in R, the wilcox.test() provides "a p-value < 2.2e-16", when we
>>> compare
>> sets of 1000 genes expression (in the genomics field).
>>
>> however, the journal asks us to provide the exact p 

Re: [R] about a p-value < 2.2e-16

2021-03-19 Thread David Winsemius




> On Mar 18, 2021, at 10:26 PM, Bogdan Tanasa  wrote:
> 
> Dear Spencer, thank you very much for your prompt email and help. When
> using :
> 
>> wilcox.test(rnorm(100), rnorm(100, 2), exact=TRUE)
> W = 698, p-value < 2.2e-16
> 
>> wilcox.test(rnorm(100), rnorm(100, 2), exact=FALSE)
> W = 1443, p-value < 2.2e-16
> 
> and in both cases p-value < 2.2e-16. By "exact" p-value, i have meant the
> "precise" p-value ;
> 
> If I may ask please, could we write p-value = 0 ?
> 
> i have noted a similar conversation on stackexchange, although the answer
> is not very clear (to me).

The reason it wasn’t and couldn’t be “clear” was that the underlying scientific 
question and the statistical methods were not precisely described. 

The same lack of background information still persists in this discussion. 

— 
David
> 
> https://stats.stackexchange.com/questions/78839/how-should-tiny-p-values-be-reported-and-why-does-r-put-a-minimum-on-2-22e-1
> 
> thanks again,
> 
> bogdan
> 
>> On Thu, Mar 18, 2021 at 10:05 PM Spencer Graves <
>> spencer.gra...@effectivedefense.org> wrote:
>>  I would push back on that from two perspectives:
>>1.  I would study exactly what the journal said very
>> carefully.  If they mandated "wilcox.test", that function has an
>> argument called "exact".  If that's what they are asking, then using
>> that argument gives the exact p-value, e.g.:
>>> wilcox.test(rnorm(100), rnorm(100, 2), exact=TRUE)
>>Wilcoxon rank sum exact test
>> data:  rnorm(100) and rnorm(100, 2)
>> W = 691, p-value < 2.2e-16
>>2.  If that's NOT what they are asking, then I'm not
>> convinced what they are asking makes sense:  There is no such thing
>> as an "exact p value" except to the extent that certain assumptions
>> hold, and all models are wrong (but some are useful), as George Box
>> famously said years ago.[1]  Truth only exists in mathematics, and
>> that's because it's a fiction to start with ;-)
>>  Hope this helps.
>>  Spencer Graves
>> [1]
>> https://en.wikipedia.org/wiki/All_models_are_wrong
 On 2021-3-18 11:12 PM, Bogdan Tanasa wrote:
 <
>> https://meta.stackexchange.com/questions/362285/about-a-p-value-2-2e-16>
>>> Dear all,
>>> i would appreciate having your advice on the following please :
>>> in R, the wilcox.test() provides "a p-value < 2.2e-16", when we compare
>>> sets of 1000 genes expression (in the genomics field).
>>> however, the journal asks us to provide the exact p value ...
>>> would it be legitimate to write : "p-value = 0" ? thanks a lot,
>>> -- bogdan
>>> [[alternative HTML version deleted]]
>>> __
>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
> 


Re: [R] Failure in predicting parameters

2021-03-19 Thread Luigi Marongiu
Thank you, I'll try it!

On Thu, Mar 18, 2021 at 9:46 PM Rui Barradas  wrote:
>
> Hello,
>
> Maybe a bit late but there is a contributed package [1] for quantitative
> PCR fitting non-linear models with the Levenberg-Marquardt algorithm.
>
> estim and vector R below are your model and your fitted values vector.
> The root of the residual sum of squares of this fit is smaller than your
> model's.
>
>
> Isn't this simpler?
>
>
> library(qpcR)
>
> df1 <- data.frame(Cycles = seq_along(high), high)
>
> fit <- pcrfit(
>data = df1,
>cyc = 1,
>fluo = 2
> )
> summary(fit)
>
> coef(estim)
> coef(fit)
>
>
> sqrt(sum(resid(estim)^2))
> #[1] 1724.768
> sqrt(sum(resid(fit)^2))
> #[1] 1178.318
>
>
> highpred <- predict(fit, newdata = df1)
>
> plot(1:45, high, type = "l", col = "red")
> points(1:45, R, col = "blue")
> points(1:45, highpred$Prediction, col = "cyan", pch = 3)
>
>
> [1] https://CRAN.R-project.org/package=qpcR
>
> Hope this helps,
>
> Rui Barradas
>
> Às 06:51 de 18/03/21, Luigi Marongiu escreveu:
> > It worked. I rewrote the equation as:
> > ```
> > rutledge_param <- function(p, x, y) ( (p$M / ( 1 + exp(-(x-p$m)/p$s))
> > ) + p$B ) - y
> > ```
> > and used Desmos to estimate the slope, so:
> > ```
> > estim <- nls.lm(par = list(m = halfCycle, s = 2.77, M = MaxFluo, B = 
> > high[1]),
> >  fn = rutledge_param, x = 1:45, y = high)
> > summary(estim)
> > R <- rutledge(list(half_fluorescence = 27.1102, slope = 2.7680,
> > max_fluorescence = 11839.7745, back_fluorescence =
> > -138.8615) , 1:45)
> > points(1:45, R, type="l", col="red")
> > ```
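
For reference, a self-contained sketch of the curve being fitted, evaluated
on cycle numbers rather than on the fluorescence readings. It uses only base
R and the parameter estimates quoted above; the short list names m, s, M and
B follow rutledge_param, and fitted_curve is a name introduced here for
illustration.

```r
# Four-parameter sigmoid: Fc = Fmax / (1 + exp(-(C - Chalf) / k)) + Fb
rutledge <- function(p, x) {
  p$M / (1 + exp(-(x - p$m) / p$s)) + p$B
}

# Estimates reported in the thread (m = Chalf, s = k, M = Fmax, B = Fb)
p <- list(m = 27.1102, s = 2.7680, M = 11839.7745, B = -138.8615)

cycles <- 1:45                        # x must be the cycle index
fitted_curve <- rutledge(p, cycles)

# Sanity check: at the half-maximal cycle the curve equals Fmax / 2 + Fb
rutledge(p, p$m)
p$M / 2 + p$B
```

Passing the fluorescence vector as x instead of the cycle numbers would
evaluate the sigmoid far outside its transition region, which is one way to
end up with an oddly shaped plot.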
> >
> > Thanks
> >
> > On Tue, Mar 16, 2021 at 8:29 AM Luigi Marongiu  
> > wrote:
> >>
> >> Just an update:
> >> I tried with desmos and the fitting looks good. Desmos calculated the
> >> parameters as:
> >> Fmax = 11839.8
> >> Chalf = 27.1102 (which matches my estimate of 27 cycles)
> >> k = 2.76798
> >> Fb = -138.864
> >> I forced R to accept the right parameters using a single named list
> >> and rewrote the formula (it was a bit unclear in the paper):
> >> ```
> >> rutledge <- function(p, x) {
> >>m = p$half_fluorescence
> >>s = p$slope
> >>M = p$max_fluorescence
> >>B = p$back_fluorescence
> >>y = (M / (1+exp( -((x-m)/s) )) ) + B
> >>return(y)
> >> }
> >> ```
> >> but when I apply it I get a funny graph:
> >> ```
> >> desmos <- rutledge(list(half_fluorescence = 27.1102, slope = 2.76798,
> >>  max_fluorescence = 11839.8, back_fluorescence
> >> = -138.864) , high)
> >> ```
> >>
> >> On Mon, Mar 15, 2021 at 7:39 AM Luigi Marongiu  
> >> wrote:
> >>>
> >>> Hello,
> >>> the negative data comes from the machine. Probably I should use raw
> >>> data directly, although in the paper this requirement is not reported.
> >>> The p$x was a typo. Now I corrected it and I got this error:
> >>> ```
> >>>
>  rutledge_param <- function(p, x, y) ((p$M / (1 + exp(-1*(x-p$m)/p$s))) + 
>  p$B) - y
>  estim <- nls.lm(par = list(m = halfFluo, s = slopes, M = MaxFluo, B = 
>  high[1]),
> >>> + fn = rutledge_param, x = 1:45, y = high)
> >>> Error in dimnames(x) <- dn :
> >>>length of 'dimnames' [2] not equal to array extent
> >>> ```
> >>> Probably because 'slopes' is a vector instead of a scalar. Since the
> >>> slope is changing, I don't think it is right to use a scalar, but I tried
> >>> and I got:
> >>> ```
>  estim <- nls.lm(par = list(m = halfFluo, s = 1, M = MaxFluo, B = 
>  high[1]),
> >>> + fn = rutledge_param, x = 1:45, y = high)
>  estim
> >>> Nonlinear regression via the Levenberg-Marquardt algorithm
> >>> parameter estimates: 6010.94, 1, 12021.88, 4700.4928889
> >>> residual sum-of-squares: 1.14e+09
> >>> reason terminated: Relative error in the sum of squares is at most `ftol'.
> >>> ```
> >>> The values reported are the same I used at the beginning apart from
> >>> the last (the background parameter) which is 4700 instead of zero. If
> >>> I plug it, I get an L shaped plot that is worse than that at the
> >>> beginning:
> >>> ```
> >>> after = init = rutledge(halfFluo, 1, MaxFluo, 4700.4928889, high)
> >>> points(1:45, after, type="l", col="blue")
> >>> ```
> >>> What did I get wrong here?
> >>> Thanks
> >>>
> >>> On Sun, Mar 14, 2021 at 8:05 PM Bill Dunlap  
> >>> wrote:
> 
> > rutledge_param <- function(p, x, y) ((p$M / (1 + 
> > exp(-1*(p$x-p$m)/p$s))) + p$B) - y
> 
>  Did you mean that p$x to be just x?  As is, this returns numeric(0)
>  for the p that nls.lm gives it because p$x is NULL and NULL-aNumber is
>  numeric().
> 
>  -Bill
> 
>  On Sun, Mar 14, 2021 at 9:46 AM Luigi Marongiu 
>   wrote:
> >
> > Hello,
> > I would like to use the Rutledge equation
> > (https://pubmed.ncbi.nlm.nih.gov/15601990/) to model PCR data. The
> > equation is:
> > Fc = Fmax / (1+exp(-(C-Chalf)/k)) + Fb
> > I defined the equation and another that subtracts the values from the
> >