Re: [R] trouble with wilcox.test

2005-08-18 Thread Prof Brian Ripley
On Wed, 17 Aug 2005, Greg Hather wrote:

> I'm having trouble with the wilcox.test command in R.

Are you sure it is not the concepts that are giving 'trouble'?
What real problem are you trying to solve here?

> To demonstrate the anomalous behavior of wilcox.test, consider
>
> wilcox.test(c(1.5,5.5), c(1:10000), exact = F)$p.value
> [1] 0.01438390
> wilcox.test(c(1.5,5.5), c(1:10000), exact = T)$p.value
> [1] 6.39808e-07 (this calculation takes noticeably longer).
> wilcox.test(c(1.5,5.5), c(1:20000), exact = T)$p.value
> (R closes/crashes)
>
> I believe that wilcox.test(c(1.5,5.5), c(1:10000), exact = F)$p.value
> yields a bad result because of the normal approximation which R uses
> when exact = F.

Expecting an approximation to be good in the tail for m=2 is pretty 
unrealistic.  But then so is believing the null hypothesis of a common 
*continuous* distribution.  Why worry about the distribution under a 
hypothesis that is patently false?
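
As a check on the two figures quoted above, both p-values can be reproduced by hand. This is only a minimal sketch, assuming the second sample is 1:10000 (the size that the quoted p-values imply) and the default continuity correction; the variable names are illustrative.

```r
# Reproduce the approximate and exact two-sided p-values for
# x = c(1.5, 5.5) against y = 1:n.  n = 10000 is an assumption here.
x <- c(1.5, 5.5)
n <- 10000
m <- length(x)

# W = number of pairs (x_i, y_j) with x_i > y_j.  With y = 1:n and no
# ties this is just how many integers lie below each x value.
W <- sum(pmin(floor(x), n))        # 1 + 5 = 6

# Normal approximation with continuity correction (the form that applies
# when W is below its null mean, as it is here).
mu     <- m * n / 2
sigma  <- sqrt(m * n * (m + n + 1) / 12)
p_norm <- 2 * pnorm((W + 0.5 - mu) / sigma)

# Exact null distribution of W from base R.
p_exact <- 2 * pwilcox(W, m, n)

print(c(normal = p_norm, exact = p_exact))
```

The exact lower-tail probability here is 16 / choose(n + 2, 2), i.e. 16 arrangements out of roughly 5e7, which a Gaussian tail has no reason to match.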

People often refer to this class of tests as `distribution-free', but they 
are not.  The Wilcoxon test is designed for power against shift 
alternatives, but here there appears to be a very large difference in 
spread.  So

 wilcox.test(5000+c(1.5,5.5), c(1:10000), exact = T)$p.value
[1] 0.9989005

even though the two samples differ in important ways.


> Any suggestions for how to compute
> wilcox.test(c(1.5,5.5), c(1:20000), exact = T)$p.value?

I get (current R 2.1.1 on Linux)

 wilcox.test(c(1.5,5.5), c(1:20000), exact = T)$p.value
[1] 1.59976e-07

and no crash.  So the suggestion is to use a machine adequate to the task, 
and that probably means an OS with adequate stack size.

>    [[alternative HTML version deleted]]
>
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Please do heed it.  What version of R and what machine is this?  And do 
take note of the request about HTML mail.

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] trouble with wilcox.test

2005-08-18 Thread P Ehlers

One could also try wilcox.exact() in package exactRankTests (0.8-11)
which also gives (with suitable patience)

[1] 1.59976e-07

even on my puny 256M Windows laptop.

Still, it might be worthwhile adding a "don't do something this silly"
error message to wilcox.test() rather than having it crash R. Low
priority, IMHO.

Windows XP SP2
R version 2.1.1, 2005-08-11

Peter Ehlers



Re: [R] trouble with wilcox.test

2005-08-18 Thread P Ehlers


I should also mention package coin's wilcox_test() which does the
job in about a quarter of the time used by exactRankTests.
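
For anyone wanting to try the two packages named in this thread, the calls might look roughly like the sketch below. It assumes exactRankTests and coin are installed (each is skipped otherwise), uses a much smaller second sample than the thread does so that it finishes quickly, and the names dat and res are purely illustrative.

```r
x <- c(1.5, 5.5)
y <- 1:200   # deliberately smaller than in the thread so this runs quickly

# coin works from a formula, so put the data in long format.
dat <- data.frame(
  value = c(x, y),
  group = factor(rep(c("x", "y"), times = c(length(x), length(y))))
)

res <- list()

# Base R, for comparison.
res$base <- wilcox.test(x, y, exact = TRUE)$p.value

# exactRankTests: same calling convention as wilcox.test().
if (requireNamespace("exactRankTests", quietly = TRUE)) {
  res$exactRankTests <- exactRankTests::wilcox.exact(x, y, exact = TRUE)$p.value
}

# coin: formula interface, exact conditional distribution.
if (requireNamespace("coin", quietly = TRUE)) {
  fit <- coin::wilcox_test(value ~ group, data = dat, distribution = "exact")
  res$coin <- as.numeric(coin::pvalue(fit))
}

print(res)
```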

Peter Ehlers



Re: [R] trouble with wilcox.test

2005-08-18 Thread Prof Brian Ripley
If this is stack overflow (and I don't know that yet: when I tried this on 
Windows the traceback was clearly corrupt, referring to bratio), the issue 
is that it is impossible to catch such an error, and it is not even AFAIK
portably possible to find the stack size limit (or even the current usage) 
to do some estimates.  (The amount of RAM is not relevant.)  On 
Unix-alikes the stack size limit can be controlled from the shell used to 
launch R so we don't have any a priori knowledge.
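
For what it is worth, more recent versions of R than the 2.1.1 discussed here provide Cstack_info() in base, which reports the C stack size where the platform makes it available. A minimal sketch; the size is NA when it cannot be determined.

```r
# Query the C stack limit and current usage (both in bytes).
info <- Cstack_info()
print(info)

if (is.na(info[["size"]])) {
  message("C stack size could not be determined on this platform")
} else {
  message(sprintf("C stack limit: %.1f MB, currently used: %.1f%%",
                  info[["size"]] / 2^20,
                  100 * info[["current"]] / info[["size"]]))
}
```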

The underlying code could be rewritten not to use recursion, but that 
seems not worth the effort involved.
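
To illustrate what a non-recursive version could look like, here is a sketch that fills in the counts iteratively using the usual recurrence c(k, i, j) = c(k - j, i - 1, j) + c(k, i, j - 1). The function name wilcox_counts is hypothetical, this is not the code R uses, and it keeps whole rows in memory, so it is only meant for small sample sizes.

```r
# Number of arrangements giving each value k = 0..m*n of the statistic,
# computed bottom-up instead of by recursion.
wilcox_counts <- function(m, n) {
  u <- m * n
  base <- c(1, numeric(u))           # c(k, 0, j) and c(k, i, 0): only k = 0
  prev <- rep(list(base), n + 1)     # row i = 0, entries for j = 0..n
  for (i in seq_len(m)) {
    cur <- vector("list", n + 1)
    cur[[1]] <- base                 # j = 0
    for (j in seq_len(n)) {
      # c(k - j, i - 1, j): shift the (i - 1, j) counts right by j places
      shifted <- c(numeric(j), prev[[j + 1]])[seq_len(u + 1)]
      # add c(k, i, j - 1)
      cur[[j + 1]] <- shifted + cur[[j]]
    }
    prev <- cur
  }
  prev[[n + 1]]
}

print(wilcox_counts(2, 2))   # 1 1 2 1 1
```

Dividing the counts by choose(m + n, m) gives the null probabilities, which can be compared with dwilcox(0:(m * n), m, n) for small m and n.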

All I can see we can do is to put a warning in the help file.




Re: [R] trouble with wilcox.test

2005-08-18 Thread P Ehlers

I think your guess about stack overflow is probably correct and I
definitely don't think it's worth wasting effort recoding.

Peter Ehlers



Re: [R] trouble with wilcox.test

2005-08-18 Thread Greg Hather
Ok, I will think more about the appropriateness of the Wilcoxon test 
here.  I was using

R version 2.1.1, 2005-06-20
Windows XP SP2
512MB RAM

--Greg
