Re: [R] Randomization Test

2019-02-28 Thread Meyners, Michael
Ogbos,

To share data (in particular lengthy data like yours), check out ?dput

To replicate sampling, look at ?replicate (by default it returns a matrix, one 
column per replication, which you can use further) - that should answer your Q1.
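
For instance (a minimal sketch, with bounds and length taken from your message 
below):

xs <- replicate(5000, sample(8890:9500, 1136, replace = TRUE))
dim(xs)   # 1136 x 5000: one simulated sample per column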

Apart from that (and regarding Q2), the task you are after is getting more and 
more obscure to me. I don't see what you really want to test - between levels 
of n (or A in oodf)? If so, you would need to permute levels within each set 
(are they sets of -5:10, or are all observations completely independent? In the 
latter case, across all sets). What is the null hypothesis you want to test? 
And what is the data exactly? I don't understand what the two columns indicate. 
Then you need to decide on what the test statistic is you want to use. The 
average? The difference between each pair of averages? Anything like that? Do 
you want to test all levels simultaneously, or by pairs? 

Clearly, your way of sampling is inappropriate for a randomization test. You 
are rather simulating data that can take any value between min and max with 
equal probability. That does not seem to be the right null model to me (it 
might be, though, depending on your null hypothesis). It would not make a 
randomization test either way, but rather a Monte Carlo simulation under the 
null hypothesis. Then you would determine your test statistic(s) (whichever 
they are) and subsequently repeat that often enough and check whether your 
observed value is higher than 90 or 99% of the simulated ones. Again, that 
would be a simulation, not a randomization test.
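
To illustrate that Monte Carlo logic (a sketch only; the uniform null model and 
the observed value are made-up assumptions, not a recommendation):

set.seed(1)
obs <- 8950    # observed test statistic (made up for illustration)
sim <- replicate(5000, mean(sample(8890:9500, 1136, replace = TRUE)))
quantile(sim, c(0.90, 0.99))   # 90 and 99% reference points
mean(sim >= obs)               # Monte Carlo tail proportion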

For the latter, you'd need to permute the data in an appropriate way (again 
depending on the null hypothesis, and on the structure, i.e. are they 
coming in blocks of any kind, or all independent), and then recalculate the 
test statistic every time, and then proceed as before. I'm not sure whether the 
data was the result of a randomized experiment - if not, you can still use the 
idea, but fall into something that some refer to as "permutation testing" - the 
difference being that you need to make much stronger assumptions on 
independence etc of the data. Many use the terms equivalently, but just so you 
are aware. 
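
As a sketch of the permutation variant (assuming completely independent 
observations in two groups; data and statistic are made up):

set.seed(2)
y <- rnorm(40)                                   # observations
g <- rep(c("A", "B"), each = 20)                 # group labels
stat <- function(y, g) diff(tapply(y, g, mean))  # test statistic: mean difference
perm <- replicate(4999, stat(y, sample(g)))      # shuffle labels, recompute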

I really think you need to look into a textbook (e.g. Edgington & Onghena) or 
some papers to understand the concept of randomization tests, or consult with a 
statistician with good background in that field. What you are suggesting is not 
near it, and unless you have a clear hypothesis and a good understanding of how 
the data was generated, it is impossible for you (and anyone else) to say how 
such a test might be designed.

Michael 


> -Original Message-
> From: Ogbos Okike 
> Sent: Mittwoch, 27. Februar 2019 22:53
> To: Meyners, Michael 
> Cc: r-help 
> Subject: Re: [R] Randomization Test
> 
> Dear Kind List,
> 
> I am still battling with this. I have, however, made some progress with the
> suggestions of Micheal and others. At least, I have a better picture of what I
> want to do now as I will attempt a detailed description here.
> 
> I am aware I should show you just a small part of my code and data.
> But when I copied out a small portion and ran it to see what you would get when
> I sent that, I was not satisfied with the signal displayed. The epoch analysis
> averages data and is quite sensitive to leveraging, especially if a small 
> sample
> is used.
> 
> So please exercise patience and permit me to display the series of epochs that
> give the averaged values used. You can just run the code and see the signal
> of interest. Here is the code and the data:
> 
> dta <- read.table( text ="n CR
>  -5 8969
.
SNIP...
.
> 10 9566
> ",header=TRUE)
> 
>  data<-matrix(c(dta$CR),ncol=71)
> A<-matrix(rep(-5:10,71))
> B<-matrix(data)
> 
>  oodf <- data.frame(A, B)
>  a <- -5:10
> library(plotrix)
> std.error <- function(x) return(sd(x)/sqrt(sum(!is.na(x))))
> oomean <- as.vector(by(oodf$B, oodf$A, mean))
> oose <- as.vector(by(oodf$B, oodf$A, std.error))
> plot(-5:10, oomean, type="l", ylim=c(8890, 9100))
> A <- oomean - 1.96*oose
>  B <- oomean + 1.96*oose
> lines(a, A, col="red")
>  lines(a, B, col="red")
> 
>  My Question:
> I wish to conduct a randomization test of significance (90 and 99
> percentile) of the reductions/decreases as displayed by the signal.
> 
> I am attempting using:
> x <- sample(8890:9500, 1136, replace=TRUE)
> 
> to generate the random numbers, where 8890 and 9500 are the
> minimum and maximum of the signal and 1136 the length of the sample data.
> Q1: Please how do I generate many samples as x above, say up to 5000 or
> 10,000? I manually generated and stored as x1,x2, x3 up to x100.
> 
> Q2: Please how do I use these randomly generated numbers to test the
> statistical significance

Re: [R] Randomization Test

2019-02-08 Thread Meyners, Michael
Ogbos,

You do not seem to have received a reply over the list yet, which might be due 
to the fact that this seems rather a stats than an R question. Neither did your 
attachment (Figure) get through - see the posting guide. 

I'm not familiar with epoch analysis, so not sure what exactly you are doing / 
trying to achieve, but some general thoughts: 

* You do NOT want to restrict your re-randomizations in a way that "none of the 
dates corresponds with the ones in the real event" - actually, as a general 
principle, the true data must be an admissible re-randomization as well. You 
seem to have excluded that (and a lot of other randomizations at the same time 
which might have occurred, i.e. dates 1 and 2 reversed but all others the 
same), thereby rendering the test invalid. Any restrictions you have on your 
re-randomizations must've applied to the original randomization as well.
* If you have rather observational data (which I suspect, but not sure), 
Edgington & Onghena (2007) would rather refer to this as a permutation test - 
the difference being that you have to make strong assumptions (similar to 
parametric tests) on the nature of the data, which are designed-in to be true 
for randomization tests. It might be a merely linguistic discrimination, but it 
is important to note which assumptions have to be (implicitly) made.
* I'm not sure what you mean by "mean differences" of the events - is that two 
groups you are comparing? If so, that seems reasonable, but just make sure the 
test statistic you use is reasonable and sensitive against the alternatives you 
are mostly interested in. The randomization/permutation test will never prove 
that, e.g., means are significantly different, but only that there is SOME 
difference. By selecting the appropriate test statistic, you can influence what 
will pop up more easily and what not, but you can never be sure (unless you 
make strong assumptions about everything else, like in many parametric tests).
* For any test statistic, you would then determine the proportion of its values 
among the 5000 samples where it is as large or larger than the one observed (or 
as small or smaller, or either, depending on the nature of the test statistic 
and whether you aim for a one- or a two-sided test). That is your p value. If 
small enough, conclude significance. At least conceptually important: the 
observed test statistic is always part of the re-randomization set (i.e. your 5000) 
- so you truly only generate 4999 new ones plus the one you observed. Otherwise the test may 
be more or less liberal. Your p value is hence no smaller than 1/n, where n is 
the total number of samples you looked at (including the observed one), a p 
value of 0 is not possible in randomization tests (nor in other tests, of 
course).
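
In code, the counting step might look like this (a sketch; perm stands in for 
your 4999 re-randomization statistics, obs for the observed one):

set.seed(3)
perm <- rnorm(4999)   # placeholder for the re-randomization statistics
obs <- 2.1            # placeholder for the observed statistic
(sum(perm >= obs) + 1) / (length(perm) + 1)   # one-sided p value, observed included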

I hope this is helpful, but you will need to go through these and refer to your 
own setup to check whether you adhered to the principles or not, which is 
impossible for me to judge based on the information provided (and I won't be 
able to look at excessive code to check either).

Michael

> -Original Message-
> From: R-help  On Behalf Of Ogbos Okike
> Sent: Montag, 28. Januar 2019 19:42
> To: r-help 
> Subject: [R] Randomization Test
> 
> Dear Contributors,
> 
> I am conducting an epoch analysis. I tried to test the significance of my result 
> using
> a randomization test.
> 
> Since I have 71 events, I randomly selected another 71 events, making sure
> that none of the dates in the random events corresponds with the ones in
> the real event.
> 
> Following the code I found here
> (https://www.uvm.edu/~dhowell/StatPages/R/RandomizationTestsWithR/Random2Sample/TwoIndependentSamplesR.html),
> I combined these two data set and used them to generate another 5000
> events. I then plotted the graph of the mean differences for the 5000
> randomly generated events. On the graph, I indicated the region of the
> mean difference between the real 71 epoch and the randomly selected 71
> epoch.
> 
> Since the two tail test shows that the mean difference falls at the extreme of
> the randomly selected events, I concluded that my result is statistically
> significant.
> 
> 
> 
> I am attaching the graph to assist you with your suggestions.
> 
> I can attach both my code and the real and randomly generated events if you
> ask for it.
> 
> My request is that you help me to understand if I am on the right track or not.
> This is the first time I am doing this and unless the experts decide, I am not
> quite sure whether I am right or not.
> 
> Many thanks for your kind concern.
> 
> Best
> Ogbos
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-
> guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] ANOVA Permutation Test

2018-09-03 Thread Meyners, Michael
Juan, 

Your question might be borderline for this list, as it ultimately rather seems 
a stats question coming in R disguise.

Anyway, the short answer is that you *expect* to get a different p value from a 
permutation test unless you are able to do all possible permutations and 
therefore use the so-called systematic reference set. That is rarely the case, 
and only for relatively small problems. 
The permutation test uses a random subset of all possible permutations. Given 
this randomness, you'll get a different p value. In order to get reproducible 
results, you may specify a seed (?set.seed), yet that is only reproducible within 
this environment. Someone else with different software and/or code might come 
out with a different p. The higher the number of permutations used, the smaller 
the variation around the p values, however. For most applications, 1000 seem 
good enough to me, but sometimes I go higher (in particular if the p value is 
borderline and I really need a strict above/below alpha decision).
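
A minimal sketch with aovp and the npk data that ship with R (argument names 
per ?aovp; behavior may differ across lmPerm versions):

library(lmPerm)
set.seed(123)
summary(aovp(yield ~ block + N, data = npk, perm = "Prob"))   # permutation ANOVA
set.seed(123)
summary(aovp(yield ~ block + N, data = npk, perm = "Prob"))   # same seed, same p values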

The permutations do not create an implicit normal distribution, but rather a 
null distribution that can be (and, depending on the non-normality of your data, 
likely is) non-normal. So your respective proposal does not appeal.

I don't think you need an alternative - the permutation test is just fine, and 
recognizing the randomness in the execution does not render the (relatively 
small) variability in p values a major issue.

You may want to have a look at the text book by Edgington & Onghena for details 
on permutation tests, and there are plenty of papers out there addressing them 
in various contexts, which will help to understand *why* you observe what you 
observe here. 

HTH, Michael



> -Original Message-
> From: R-help  On Behalf Of Juan Telleria Ruiz
> de Aguirre
> Sent: Montag, 3. September 2018 17:18
> To: R help Mailing list 
> Subject: [R] ANOVA Permutation Test
> 
> Dear R users,
> 
> I have the following Question related to Package lmPerm:
> 
> This package uses a modified version of aov() function, which uses
> Permutation Tests instead of Normal Theory Tests for fitting an Analysis of
> Variance (ANOVA) Model.
> 
> However, when I run the following code for a simple linear model:
> 
> library(lmPerm)
> library(magrittr)  # for the %>% pipe used below
> 
> e$t_Downtime_per_Intervention_Successful %>%
>   aovp(
> formula = `Downtime per Intervention[h]` ~ `Working Hours`,
> data = .
>   ) %>%
>   summary()
> 
> I obtain different p-values for each run!
> 
> With a regular ANOVA Test, I obtain instead a constant F-statistic, but I do 
> not
> fulfill the required Normality Assumptions.
> 
> So my questions are:
> 
> Would it still be possible to use the regular aov() by generating permutations in
> advance (obtaining therefore a Normal Distribution thanks to the Central
> Limit Theorem)? And applying the aov() function afterwards? Does it make
> sense?
> 
> 
> Or maybe this issue could be due to unbalanced classes? I also tried to weight
> observations based on proportions, but the function failed.
> 
> 
> Any alternative solution for performing a One-Way ANOVA Test over Non-
> Normal Data?
> 
> 
> Thank you.
> 
> Juan
> 
>   [[alternative HTML version deleted]]
> 
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-
> guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Letters group Games-Howell post hoc in R

2018-01-18 Thread Meyners, Michael
Apologies if I missed any earlier replies - did you check
multcompLetters in package {multcompView}?
It allows you to get connecting letters reports (if that's what you are after, 
I didn't check what exactly agricolae is providing here). May have to add some 
manual steps to combine this with any data (means or whatever) you want to 
report.
multcompLetters allows you to use p values or a logical (significant or not).
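
A minimal sketch with made-up p values (the names encode the comparisons):

library(multcompView)
pvals <- c("A-B" = 0.001, "A-C" = 0.150, "B-C" = 0.049)
multcompLetters(pvals, threshold = 0.05)   # connecting letters from p values
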
HTH, Michael

> -Original Message-
> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of David Bars
> Cortina
> Sent: Dienstag, 16. Januar 2018 13:51
> To: R-help@r-project.org
> Subject: [R] Letters group Games-Howell post hoc in R
> 
> Hello everybody,
> 
> I use the sweetpotato database included in R package:
> 
> data(sweetpotato) This dataset contains two variables: yield (continuous
> variable) and virus (factor variable).
> 
> Because the Levene test is significant, I cannot assume homogeneity of variances
> and I apply the Welch test in R instead of one-way ANOVA followed by Tukey
> posthoc.
> 
> Nevertheless, the problems come when I apply the posthoc test. For the Tukey
> posthoc test I use library(agricolae) and it displays the superscript letters
> between virus groups. Therefore there are no problems.
> 
> Nevertheless, to perform Games-Howell posthoc, I use
> library(userfriendlyscience) and I obtain Games-Howell output but it's
> impossible for me to obtain a letter superscript comparison between virus
> groups as it is obtained through library(agricolae).
> 
> The code used it was the following:
> 
> library(userfriendlyscience)
> 
> data(sweetpotato)
> 
> oneway<-oneway(sweetpotato$virus, y=sweetpotato$yield, posthoc =
> 'games-howell')
> 
> oneway
> 
> I tried cld(), importing library(multcompView) beforehand, but it doesn't work.
> 
> Can somebody help me?
> 
> Thanks in advance,
> 
> David Bars.
> 
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-
> guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Jaccard index

2015-07-15 Thread Meyners, Michael
Did you search the internet? At first attempt, 

vegdist {vegan} (worked well for me in the past) and 

dist.binary {ade4}

seem to offer what you need. 
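
For instance (a sketch on made-up presence/absence data; binary = TRUE gives 
the classical Jaccard):

library(vegan)
m <- rbind(g1 = c(1, 0, 1, 1, 0),
           g2 = c(1, 1, 0, 1, 0))
vegdist(m, method = "jaccard", binary = TRUE)   # distance; similarity = 1 - distance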

HTH, Michael

 -Original Message-
 From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of sreenath
 Sent: Mittwoch, 15. Juli 2015 09:35
 To: r-help@r-project.org
 Subject: [R] Jaccard index
 
 How can I find the Jaccard index of two groups, and which package should be used?
 please help
 
 
 
 --
 View this message in context: http://r.789695.n4.nabble.com/Jaccard-index-tp4709883.html
 Sent from the R help mailing list archive at Nabble.com.
 
 __
 R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-
 guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R: RE: Bonferroni post hoc test in R for repeated measure ANOVA with mixed within and between subjects design

2015-07-07 Thread Meyners, Michael
Angelo,

the conservatism of Bonferroni aside for a moment, I don't really see what you 
are after. 

mcp(Emotion = "Tukey")

just defines the contrasts to test; in this case, all pairwise comparisons. 

The test=adjusted("bonferroni") argument then says that these contrasts should 
not be corrected for according to Tukey's procedure, but using Bonferroni 
adjustment. 

So this should give you tests for all pairwise comparisons with multiplicity 
correction according to Bonferroni. I wonder how this differs from the results 
you get from SPSS. 

Seemingly, this is not what you want, so the question is what you really want. 
Bonferroni method does not indicate which comparisons / contrasts to look at, 
but just takes all those that you are interested in and multiply the 
corresponding p values with the number of comparisons (and making sure it does 
not exceed 1). As John indicated, that can easily be handled manually. Yet, you 
need to create the tests for all comparisons that you are interested in - if 
not via linfct=mcp(Emotion = "Tukey"), you need to specify them otherwise (see 
the three options indicated on ?glht). The code I suggested offers a convenient 
shortcut in case you are interested in all pairwise comparisons and want them 
to be corrected according to Bonferroni, but if something else is of interest, 
you'd need to specify this (and let us know, as the Bonferroni method does not 
give a clue about which comparisons to test).
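
The manual route is a one-liner (a sketch with made-up unadjusted p values):

p_raw <- c(0.012, 0.030, 0.200)
pmin(1, p_raw * length(p_raw))           # classical Bonferroni by hand
p.adjust(p_raw, method = "bonferroni")   # the same via base R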

NB: You may want to pay attention to the warning halfway down the help page for 
?mcp; it may not be clear exactly which effects you want to compare; mcp uses 
just the main effects w/o interactions etc.

Michael


 -Original Message-
 From: John Fox [mailto:j...@mcmaster.ca]
 Sent: Dienstag, 7. Juli 2015 03:24
 To: angelo.arc...@virgilio.it
 Cc: Meyners, Michael; r-help@r-project.org
 Subject: Re: [R] R: RE: Bonferroni post hoc test in R for repeated measure
 ANOVA with mixed within and between subjects design
 
 Dear Angelo,
 
 The Bonferroni p-value is just the ordinary p-value times the number of
 tests, so, since R supports multiplication, you can apply the Bonferroni
 adjustment in R. Because Bonferroni tests for multiple comparisons can be
 very conservative, asking why you want to use them is a fair question.
 
 Best,
  John
 
 
 John Fox, Professor
 McMaster University
 Hamilton, Ontario, Canada
 http://socserv.mcmaster.ca/jfox/
 
 
 
 On Tue, 7 Jul 2015 00:50:49 +0200 (CEST)  angelo.arc...@virgilio.it
 angelo.arc...@virgilio.it wrote:
  Dear Michael,
  thank you for your answer, however, I am not asking for the tukey with
  the bonferroni adjustment, but doing the post hoc with the bonferroni
 method.
  Apparently this is done easily in SPSS, I am wondering whether it is 
  possible
 with R.
 
  Can anyone help me?
 
  Thanks in advance
 
 
  Angelo
 
 
 
  Messaggio originale
  Da: meyner...@pg.com
  Data: 6-lug-2015 17.52
  A: angelo.arc...@virgilio.itangelo.arc...@virgilio.it,
  r-help@r-project.orgr-help@r-project.org
  Ogg: RE: [R] Bonferroni post hoc test in R for repeated measure ANOVA
  with mixed within and between subjects design
 
  Untested, but if anything, your best bet is likely something like
 
  summary(glht(lme_H2H, linfct=mcp(Emotion = "Tukey")),
  test=adjusted("bonferroni"))
 
  should work (despite the question why you'd want to use Bonferroni
  rather than Tukey). For a reference, see the book on the topic by the
  package authors. Might be in the paper, too, which is given by
 
  citation("multcomp")
 
  HTH, Michael
 
 
   -Original Message-
   From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of
   angelo.arc...@virgilio.it
   Sent: Montag, 6. Juli 2015 16:01
   To: r-help@r-project.org
   Subject: [R] Bonferroni post hoc test in R for repeated measure
   ANOVA with mixed within and between subjects design
  
   Dear List Members,
  
  
  
   I need to perform a Bonferroni post hoc test in R on a table with
   three within subjects factors (Emotion, having 5 levels, Material,
   having 4 levels, Shoes, having 2 levels) and one between subject factor
 (Musician, having 2 levels).
  
  
   I normally use the Tukey method with the following formula
  
   require(nlme)
   lme_H2H = lme(H2H ~ Emotion*Material*Shoes*Musician, data=scrd,
   random = ~1|Subject)
   require(multcomp)
   summary(glht(lme_H2H, linfct=mcp(Emotion = "Tukey")))
  
  
  
   I am not able to find any reference that explains with an example of
   R  code how to perform a post hoc test with the Bonferroni procedure.
   Can anyone provide an example to perform the same post hoc test in
   the code above but with Bonferroni instead of Tukey?
  
  
   Thank you in advance
  
  
  
   Angelo
  
  
 [[alternative HTML version deleted]]
  
   __
   R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
   https://stat.ethz.ch/mailman/listinfo/r-help

Re: [R] Bonferroni post hoc test in R for repeated measure ANOVA with mixed within and between subjects design

2015-07-06 Thread Meyners, Michael
Untested, but if anything, your best bet is likely something like 

summary(glht(lme_H2H, linfct=mcp(Emotion = "Tukey")), 
test=adjusted("bonferroni"))

should work (despite the question why you'd want to use Bonferroni rather than 
Tukey).
For a reference, see the book on the topic by the package authors. Might be in 
the paper, too, which is given by

citation("multcomp")

HTH, Michael


 -Original Message-
 From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of
 angelo.arc...@virgilio.it
 Sent: Montag, 6. Juli 2015 16:01
 To: r-help@r-project.org
 Subject: [R] Bonferroni post hoc test in R for repeated measure ANOVA with
 mixed within and between subjects design
 
 Dear List Members,
 
 
 
 I need to perform a Bonferroni post hoc test in R on a table with three within
 subjects factors (Emotion, having 5 levels, Material, having 4 levels, Shoes,
 having 2 levels) and one between subject factor (Musician, having 2 levels).
 
 
 I normally use the Tukey method with the following formula
 
 require(nlme)
 lme_H2H = lme(H2H ~ Emotion*Material*Shoes*Musician, data=scrd,
 random = ~1|Subject)
 require(multcomp)
 summary(glht(lme_H2H, linfct=mcp(Emotion = "Tukey")))
 
 
 
 I am not able to find any reference that explains with an example of R  code
 how to perform a post hoc test with the Bonferroni procedure.
 Can anyone provide an example to perform the same post hoc test in the
 code above but with Bonferroni instead of Tukey?
 
 
 Thank you in advance
 
 
 
 Angelo
 
 
   [[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-
 guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] mismatch between match and unique causing ecdf (well, approxfun) to fail

2015-06-09 Thread Meyners, Michael
Thanks Martin. 
Yep, I understand it is documented and my code wasn't as it should've been -- 
the confusion comes from the fact that it worked ok for hundreds of situations 
that seem very much alike, but one situation breaks. I agree that you typically 
can't be sure about having only numerical data in the data frame, but I was 
sure I had by design (numeric results of simulations, so no factors or anything 
else) and was then sloppy in passing the rows of the data frame to ecdf. So 
wondering what makes this situation different from all the others I had... 
Anyway, point taken and working solution found, so all fine :-)
Cheers, Michael

 -Original Message-
 From: Martin Maechler [mailto:maech...@stat.math.ethz.ch]
 Sent: Montag, 8. Juni 2015 16:43
 To: Meyners, Michael
 Cc: r-help@r-project.org
 Subject: Re: [R] mismatch between match and unique causing ecdf (well,
 approxfun) to fail
 
 
  Aehm, adding on this: I incorrectly *assumed* without testing that
 rounding would help; it doesn't:
  ecdf(round(test2,0))# a rounding that is way too rough for my
 application...
  #Error in xy.coords(x, y) : 'x' and 'y' lengths differ
 
  Digging deeper: The initially mentioned call to unique() is not very 
  helpful,
 as test2 is a data frame, so I get what I deserve, an unchanged data frame
 with 1 row. Still, the issue remains and can even be simplified further:
 
   ecdf(data.frame(a=3, b=4))
  Empirical CDF
  Call: ecdf(data.frame(a = 3, b = 4))
   x[1:2] =  3,  4
 
  works ok, but
 
   ecdf(data.frame(a=3, b=3))
  Error in xy.coords(x, y) : 'x' and 'y' lengths differ
 
  doesn't (same for a=b=1 or 2, so likely the same for any a=b).
  Instead,
 
   ecdf(c(a=3, b=3))
  Empirical CDF
  Call: ecdf(c(a = 3, b = 3))
   x[1:1] =  3
 
  does the trick. From ?ecdf, I get that x should be a numeric vector -
 apparently, my misuse of the function by applying it to a row of a data frame
 (i.e. a data frame with one row). In all my other (dozens of) cases that
 worked ok, though but not for this particular one. A simple unlist() helps:
 
 You were lucky.   To use a one-row data frame instead of a
 numerical vector will typically *not* work unless ... well, you are lucky.
 
 No, do *not*  pass data frame rows instead of numeric vectors.
 
 
   ecdf(unlist(data.frame(a=3, b=3)))
  Empirical CDF
  Call: ecdf(unlist(data.frame(a = 3, b = 3)))
   x[1:1] =  3
 
  Yet, I'm even more confused than before: in my other data, there were
 also duplicated values in the vector (1-row-data frame), and it never caused
 any issue. For this particular example, it does. I must be missing something
 fundamental...
 
 
 well.. I'm confused about why you are confused, but if you are thinking
 about passing rows of data frames as numeric vectors, this means you are
 sure that your data frame only contains classical numbers (no factors, no
 'Date's, no...).
 
 In such a case, transform your data frame to a numerical matrix
 *once* preferably using  data.matrix(d.fr) instead of just
 as.matrix(d.fr) but in this case it should not matter.
 Then *check* the result and then work with that matrix from then on.
 
 All other things probably will continue to leave you confused ..
 ;-)
 
 Martin Maechler,
 ETH Zurich
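
A minimal sketch of that advice, assuming the data frame really holds only 
numbers:

df <- data.frame(a = c(3, 5), b = c(3, 7))
m <- data.matrix(df)       # coerce once to a numeric matrix
stopifnot(is.numeric(m))   # check the result before relying on it
ecdf(m[1, ])               # a matrix row is a plain numeric vector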

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] mismatch between match and unique causing ecdf (well, approxfun) to fail

2015-06-08 Thread Meyners, Michael
Aehm, adding on this: I incorrectly *assumed* without testing that rounding 
would help; it doesn't:

ecdf(round(test2,0))# a rounding that is way too rough for my application...
#Error in xy.coords(x, y) : 'x' and 'y' lengths differ

Digging deeper: The initially mentioned call to unique() is not very helpful, 
as test2 is a data frame, so I get what I deserve, an unchanged data frame with 
1 row. Still, the issue remains and can even be simplified further:

 ecdf(data.frame(a=3, b=4))
Empirical CDF 
Call: ecdf(data.frame(a = 3, b = 4))
 x[1:2] =  3,  4

works ok, but

 ecdf(data.frame(a=3, b=3))
Error in xy.coords(x, y) : 'x' and 'y' lengths differ

doesn't (same for a=b=1 or 2, so likely the same for any a=b). Instead, 

 ecdf(c(a=3, b=3))
Empirical CDF 
Call: ecdf(c(a = 3, b = 3))
 x[1:1] =  3

does the trick. From ?ecdf, I get that x should be a numeric vector - 
apparently, my misuse of the function by applying it to a row of a data frame 
(i.e. a data frame with one row). In all my other (dozens of) cases that worked 
ok, though, but not for this particular one. A simple unlist() helps:

 ecdf(unlist(data.frame(a=3, b=3)))
Empirical CDF 
Call: ecdf(unlist(data.frame(a = 3, b = 3)))
 x[1:1] =  3

Yet, I'm even more confused than before: in my other data, there were also 
duplicated values in the vector (1-row-data frame), and it never caused any 
issue. For this particular example, it does. I must be missing something 
fundamental...
 
Michael

 -Original Message-
 From: Meyners, Michael
 Sent: Montag, 8. Juni 2015 12:02
 To: 'r-help@r-project.org'
 Subject: mismatch between match and unique causing ecdf (well,
 approxfun) to fail
 
 All,
 
 I encountered the following issue with ecdf which was originally on a vector
 of length 10,000, but I have been able to reduce it to a minimal reproducible
 example (just to avoid questions why I'd want to do this for a vector of
 length 2...):
 
 test2 = structure(list(X817 = 3.39824670255344, X4789 = 3.39824670255344),
 .Names = c("X817", "X4789"), row.names = 74L, class = "data.frame")
 ecdf(test2)
 
 # Error in xy.coords(x, y) : 'x' and 'y' lengths differ
 
 In an attempt to track this down, it occurs that
 
 unique(test2)
 #   X817X4789
 #74 3.398247 3.398247
 
 while
 
 match(test2, unique(test2))
 #[1] 1 1
 
 matches both values to the first one. This causes a hiccup in the call to 
 ecdf,
 as this uses (an equivalent to) a call to approxfun with x = test2 and y =
 cumsum(tabulate(match(test2, unique(test2)))), the latter now containing
 one entry less than the former, so xy.coords fails.
 
 I understand that the issue should be somehow related  to FAQ 7.31, but I
 would have hoped that unique and match would be using the same precision
 and hence both or neither would consider the two values identical, but not
 one match while unique doesn't.
 
 Last but not least, it doesn't really cause an issue on my end (other than
 breaking my code and hence out of a loop at first place...); rounding will 
 help
 w/o noteworthy changes to the outcome, so no need to propose a
 workaround :-) I'd rather like to raise the issue and learn whether there is a
 purpose for this behavior, and/or whether there is a generic fix to this, or
 whether I am completely missing something.
 
 Version info (under Windows 7):
 R version 3.2.0 (2015-04-16) -- Full of Ingredients
 Platform: x86_64-w64-mingw32/x64 (64-bit)
 
 Cheers, Michael

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] mismatch between match and unique causing ecdf (well, approxfun) to fail

2015-06-08 Thread Meyners, Michael
All,

I encountered the following issue with ecdf which was originally on a vector of 
length 10,000, but I have been able to reduce it to a minimal reproducible 
example (just to avoid questions why I'd want to do this for a vector of length 
2...):

test2 = structure(list(X817 = 3.39824670255344, X4789 = 3.39824670255344), 
.Names = c("X817", "X4789"), row.names = 74L, class = "data.frame")
ecdf(test2) 

# Error in xy.coords(x, y) : 'x' and 'y' lengths differ

In an attempt to track this down, it occurs that 

unique(test2)
#   X817X4789
#74 3.398247 3.398247

while 

match(test2, unique(test2))
#[1] 1 1

matches both values to the first one. This causes a hiccup in the call to ecdf, 
as this uses (an equivalent to) a call to approxfun with x = test2 and y = 
cumsum(tabulate(match(test2, unique(test2)))), the latter now containing one 
entry less than the former, so xy.coords fails.

I understand that the issue should be somehow related  to FAQ 7.31, but I would 
have hoped that unique and match would be using the same precision and hence 
both or neither would consider the two values identical, but not one match 
while unique doesn't. 

Last but not least, it doesn't really cause an issue on my end (other than 
breaking my code and hence out of a loop at first place...); rounding will help 
w/o noteworthy changes to the outcome, so no need to propose a workaround :-) 
I'd rather like to raise the issue and learn whether there is a purpose for 
this behavior, and/or whether there is a generic fix to this, or whether I am 
completely missing something.

Version info (under Windows 7): 
R version 3.2.0 (2015-04-16) -- Full of Ingredients
Platform: x86_64-w64-mingw32/x64 (64-bit)

Cheers, Michael 

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R / JMP interface

2014-05-08 Thread Meyners, Michael
Not sure about JMP 11, but remember that JMP 10 did not run with R version >= 
3.0.0. 
It depends a bit on the changes that come with new R versions; with JMP 10, 
several versions of the 2.x series were compatible even though JMP officially 
only supported earlier versions. I had hoped that with JMP 11 supporting R 
3.0.1, this would work for a while for the following R versions, but that's 
apparently not the case; I haven't checked recently. You may have to downgrade 
(well, have a parallel install) of R 3.0.1 or earlier to get this working, that 
worked for me in the past, though admittedly not optimal. JMP will always be 
behind the latest R versions due to the extended testing phase before the 
release their new version -- R typically releases 2 versions or so during this 
testing phase alone... Alternatively, you might want to find out what changes 
in R render it incompatible with JMP and reverse these changes for your own R 
installation if possible -- that's certainly beyond my capabilities...
My understanding is that it is not to  be expected that R 3.1 will ever be 
supported by JMP 11; you'd likely have to wait for JMP 12; I was once told that 
they do not make updates in this regard with their within-major-release 
patches, though certainly desirable from my pov...
HTH, Michael


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
 On Behalf Of Samuel J Gardner
 Sent: Mittwoch, 7. Mai 2014 18:20
 To: Robert Douglas Kinley; r-help@r-project.org
 Subject: Re: [R] R / JMP interface
 
 https://communities.sas.com/message/199936
 
 see link.  Looks like R 3.1 may not yet be supported with JMP 11.
 
 _
 From: Robert Douglas Kinley
 Sent: Wednesday, May 07, 2014 12:18 PM
 To: r-help@r-project.org
 Cc: Samuel J Gardner
 Subject: R / JMP interface
 
 
 hi useRs
 
 I am trying-out the facility to call R code from JMP.
 
 details:  R 3.1.0 , JMP 11.1.1 , Windows 7 enterprise , all 64 bit.
 
 The test-script from the JMP help pages falls over at the first line :- R 
 Init();
 
 giving the error-message :-
 The installed version of R cannot be used.  The entry point ConsoleIob
 could not be located.
 
 Has anyone else come across this - and ideally has anyone got a solution   ?
 
 (I have submitted a request to JMP/SAS help too).
 
 cheers  Bob
 
 
 
 
 
 
 
 
   [[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-
 guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Add constraints to rbinom

2013-07-01 Thread Meyners, Michael
You don't need a constraint (rbinom won't give x>n), but you need to make sure 
you are using the n you want to use: try
 x <- cbind(x, rbinom(300, n[i], p[i]))    # mind the [i] after the n
at the respective line. Furthermore, you need to remove one transformation of x 
to make sure you divide by the right n, try
rate <- t(x/n)

I think that should do the trick. I didn't check the remainder of the code, 
though.
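
A quick check of the first claim (parameters made up):

all(rbinom(1e5, size = 17, prob = 0.9) <= 17)   # TRUE: draws never exceed size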

HTH, Michael


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
 On Behalf Of Anamika Chaudhuri
 Sent: Freitag, 28. Juni 2013 04:06
 To: r-help@r-project.org
 Subject: [R] Add constraints to rbinom
 
 Hi:
 
 I am trying to generate Beta-Binomial random variables and then calculate
 Binomial exact confidence intervals for the rate. I was wondering, if I needed
 to add a constraint such that x<=n to it, how do I go about it. The reason is I
 am getting data where x>n which is giving a rate>1. Here's my code:
 
 set.seed(111)
 k <- 63
 x <- NULL
 p <- rbeta(k,3,3)   # so that the mean nausea rate is alpha/(alpha+beta)
 min <- 10
 max <- 60
 n <- as.integer(runif(k,min,max))
 for(i in 1:k)
 x <- cbind(x,rbinom(300,n,p[i]))
 x <- t(x)
 rate <- t(t(x)/n)
 se_rate <- sqrt(rate*(1-rate)/n)
 
 
 
 # Exact Confidence Interval
 
 l_cl_exact <- qbeta(.025,x,n-x+1)
 u_cl_exact <- qbeta(.975,x+1,n-x)
 
 for (i in 1:63){
 
 for (j in 1:300)
 {
 
 if (x[i,j]==0)
 {
 l_cl_exact[i,j] <- 0
 u_cl_exact[i,j] <- u_cl_exact[i,j]
 }
 else if (x[i,j]==n[i])
 {
 l_cl_exact[i,j] <- l_cl_exact[i,j]
 u_cl_exact[i,j] <- 1
 }
 else
 l_cl_exact[i,j] <- l_cl_exact[i,j]
 u_cl_exact[i,j] <- u_cl_exact[i,j]
 
 #print(c(i,j))
 
 }
 }
 
 Really appreciate any help.
 Thanks
 Anamika
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-
 guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Anova und Tukey HSD

2013-05-13 Thread Meyners, Michael
Jochen,

a) this is an English-speaking mailing list; other languages are not encouraged 
nor will they typically generate a lot of replies...
b) your code is fine, so this is not an R-issue; you are rather stuck with some 
of the stats background -- you might want to see a friendly local statistician 
for advice :-) or try a stats mailing list (see e.g. the posting guide)
c) your interpretation is ok -- if the CI does not contain 0, you'd have 
significance at the respective level. Alternatively, you might also use the p 
value (column "p adj") and compare that to your significance level (and it 
seems odd that any search engine might fail to give you some links on the 
relationship of p values and CIs -- I got plenty at first try).
d) In brief: you are doing ~100 (rough estimate) paired comparisons. With that, 
you can expect TukeyHSD to be pretty conservative, and it does not come as a 
surprise that you are unable to identify pairs of different groups, even less 
as the p value from the F test is not extremely small (~3%; whatever extremely 
small may mean in this case). Google also easily gets me to this site (and 
many other useful ones): http://www.pmean.com/05/TukeyTest.html which might 
provide some further guidance.

Net, tough luck to encounter such a situation, but apparently none of the group 
differences is strong enough to still show up as significant after correcting 
for multiplicity (which is certainly called for) in this case where you have an 
inflated number of significance tests. 
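
For reference, the standard workflow looks like this (a sketch on toy data, 
not on your Value vector):

d <- data.frame(y = rnorm(60), g = gl(20, 3))   # 20 groups, 3 observations each
fit <- aov(y ~ g, data = d)
summary(fit)    # overall F test
TukeyHSD(fit)   # all pairwise CIs and adjusted p values ("p adj" column)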

HTH, Michael

 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
 On Behalf Of Jochen Schreiber
 Sent: Montag, 13. Mai 2013 14:25
 To: r-help@r-project.org
 Subject: [R] Anova und Tukey HSD
 
 Hello everyone,
 
 I first run an ANOVA to check whether a significant difference exists in the
 data. Then I want to use TukeyHSD to find out between which groups the
 difference lies. However, I do not know how to interpret the data. First,
 here is my R code:
 
 [code]
Value<-c(-0.9944999814033508,-0.3585381469727,0.7063000202178955,-
 1.77435803833,-
 1.080299973487854,0.3055071525574,1.849046325684,-
 0.412440395355,0.5827999711036682,1.750669482422,-
 6.693999767303467,-0.877943869019,-
 1.3408000469207764,1.2560999393463135,-
 0.1004081062317,1.849046325684,-
 0.331905722046,0.4957999885082245,0.877943869019,0.73879998922
 34802,0.877943869019,0.9154000282287598,0.877943869019,0.70630
 00202178955,-1.3408000469207764,0.7063000202178955,-
 0.331905722046,-1.6448999643325806,0.412440395355,-
 1.6448999643325806,-
 0.877943869019,0.7487000226974487,0.4399000108242035,1.8490463
 25684,-1.6448999643325806,-2.4323999881744385,1.2265000343322754,-
 0.4957999885082245,-9.999899864196777,-1.750669482422,-
 1.6448999643325806,-9.999899864196777,0.877943869019,-
 5.06279993057251,0.877943869019,-2.967745776367,-
 5.06279993057251,-6.693999767303467,-
 1.0990500450134277,0.9944999814033508,-0.467745776367,-
 0.3585381469727,-
 9.999899864196777,0.5827999711036682,0.7487000226974487,0.738799989223
 4802,-0.2533000111579895,-9.999899864196777,-
 1.0363999605178833,0.3055071525574,-1.174523162842,-
 0.806410490417,-9.999899864196777,-0.9944999814033508,-
 2.478300094604492,-0.150930858612,0.4957999885082245,-
 4.571800231933594,-6.324900150299072,-0.38530001044273376,-
 1.3408000469207764,-5.93179988861084,-6.693999767303467,-
 2.967745776367,0.877943869019,-0.05020405311584,-
 1.77435803833,-0.150930858612,-0.23725003004074097,-
 0.643268528748,1.2560999393463135,-
 0.1004081062317,0.4399000108242035,-
 0.7063000202178955,0.9154000282287598,-
 0.2181814033508,1.2265000343322754,-
 0.412440395355,0.1764581741333,-1.4758000373840332,-
 0.9944999814033508,-1.080299973487854,-0.643268528748,-
 9.999899864196777,-2.0536999702453613,-
 0.2181814033508,0.7487000226974487,0.02510202655792,-
 1.0363999605178833,-0.05020405311584,-
 0.7387999892234802,0.4957999885082245,-1.4758000373840332,-
 0.7063000202178955,0.1764581741333,-5.06279993057251,-
 0.643268528748,-1.4758000373840332,-0.9944999814033508,-
 0.2533000111579895,0.1764581741333,-
 0.331905722046,0.6776500344276428,0.3055071525574,-
 0.05020405311584,0.5827999711036682,1.2560999393463135,-
 0.4957999885082245,-0.38530001044273376,0.9944999814033508,-
 2.4323999881744385,1.126338964844,-
 0.9944999814033508,1.750669482422,1.080299973487854,-
 0.7387999892234802,-1.3408000469207764,0.612820980835,-
 2.0536999702453613,0.7063000202178955,-0.806410490417,-
 0.877943869019,-0.05020405311584,-2.967745776367,-
 0.877943869019,-2.0536999702453613,-1.3408000469207764,-
 1.3408000469207764,-0.38530001044273376,0.7063000202178955,-
 9.999899864196777,-
 

Re: [R] ks.test and wilcoxon.test results differ from other stat.packages

2013-02-01 Thread Meyners, Michael
Impossible to say w/o a reproducible example, but to start with let me suggest 
looking at the exact= (both functions) and correct= (wilcox.test) arguments. 
Experience shows that some change of the default settings allows you to 
reproduce results from other software (and the help pages will explain you what 
the settings do; by that, you might also learn what your other tools actually 
calculate if you have no documentation at hand).

HTH, Michael

 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
 project.org] On Behalf Of Alexander Favorov
 Sent: Freitag, 1. Februar 2013 09:00
 To: r-help@r-project.org
 Subject: [R] ks.test and wilcoxon.test results differ from other
 stat.packages
 
 Probably, it's an obvious info, but I have not found anything in R FAQ
 concerning this feature/bug.
 
 The results of ks.test and wilcoxon.test (in the Mann-Whitney version,
 paired = 'FALSE') don't coincide with the results from the other
 statistical packages, e.g. Statistica, Medcalc, and (as for MW test)
 from the numerous online MW tests.
 
 E.g.
 Statistica p-value=0.0435353
 Medcalc p-value=0.0435354
 R p-value=0.04635
 
 If I want to obtain the result of a test only once, it does not matter.
 
 But what should I do if I want to perform Monte-Carlo simulations and I
 need 10 000 or even 100 000 p-values, then build some
 distribution, and then use the results of Statistica? Whatever, the
 discrepancy bothers me.
 
 Are there some alternative packages for non-parametric statistics that
 produce results comparable with the other program packages? Or,
 probably, there exists some environment variable to perform
 calculations in a more common way?
 
 Thank you!
 
 Sasha.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-
 guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] unexpected (?) behavior of sort=TRUE in merge function

2012-09-04 Thread Meyners, Michael
All,

I realize from the archive that the sort argument in merge has been subject to 
discussion before, though I couldn't find an explanation for this behavior. I 
tried to simplify this to (kind of) minimal code from a real example to the 
following (and I have no doubts that there are smart people around achieving 
the same with smarter code :-)). I'm running R 2.15.1 64bit under MS Windows 7, 
full session info below.

I do have a list with two dataframes:

test <- list(structure(list(product = structure(c(1L, 2L, 3L, 4L, 5L, 
6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 
4L, 5L, 6L), .Label = c("Y1", "Y2", "G", "F", "L", "K"), class = "factor"), 
cong = c(-1, -1, -1, -1, -1, -1, 0, 0, 0, 0, 0, 0, 1, 1, 
1, 1, 1, 1, 11, 11, 11, 11, 11, 11), x = c(5.85714285714286, 
5.9, 7.3, 5.85714285714286, 7.27272727272727, 4.375, 3.875, 
2.5, 4.8, 3.625, 6.25, 4.71428571428571, 3.53571428571429, 
4.638889, 4.42424242424242, 4.78260869565217, 4.875, 
3.80434782608696, 5.73170731707317, 5.41935483870968, 5.78125, 
6.30188679245283, 6.87755102040816, 5.56603773584906)), .Names = 
c("product", 
"cong", "x"), row.names = c(NA, -24L), class = "data.frame"), 
structure(list(product = structure(c(1L, 2L, 3L, 4L, 5L, 
6L, 1L, 2L, 3L, 4L, 5L, 6L), .Label = c("Y1", "Y2", "G", 
"F", "L", "K"), class = "factor"), cong = c(-1, -1, -1, -1, 
-1, -1, 0, 0, 0, 0, 0, 0), x = c(3.04347826086957, 4.18181818181818, 
3.75, 4.31578947368421, 4.5, 3.73913043478261, 4.8876404494382, 
5.20792079207921, 5.68, 5.70526315789474, 6.38636363636364, 
4.96703296703297)), .Names = c("product", "cong", "x"), row.names = c(NA, 
-12L), class = "data.frame"))


The dataframes are pretty much the same but for the values in the x-column and 
the fact that the second one has only half as many observations, missing the 
second half of the expand.grid if you like. Now if I run

lapply(test, function(x) merge(x, expand.grid(product=c("Y1", "Y2", "G", "F", 
"L", "K"), cong=c(-1,0,1,11)), all=T, sort=TRUE))   # sort=TRUE is the default, 
# so could be omitted

sorts the first dataframe according to the labels of factor "product", while 
for the second one the order is maintained from the first dataframe (x) to 
merge (which is the difference that I could not find being documented). Now I 
run the same code with sort=FALSE instead:

lapply(test, function(x) merge(x, expand.grid(product=c("Y1", "Y2", "G", "F", 
"L", "K"), cong=c(-1,0,1,11)), all=T, sort=FALSE))

The results are at least consistent and fulfill my needs (this is, btw, not 
unexpected from the documentation). Note that I get exactly the same behavior 
if I apply merge subsequently to test[[1]] and test[[2]], so it is not an issue 
from lapply. (I realize that my dataframes are ordered by levels of product, 
but using test[[2]] <- test[[2]][sample(12),] and applying the same code as 
above reveals that indeed no sorting is done but the order is maintained from 
the first dataframe.)

I have a working solution for myself, so I'm not after any advice on how to 
achieve the sorting -- I'd just like to better understand what's going on here 
and/or what I might have missed in the documentation or in the list archives. 

Thanks in advance, 
Michael



Session info:
R version 2.15.1 (2012-06-22)
Platform: x86_64-pc-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252
LC_MONETARY=German_Germany.1252 LC_NUMERIC=C
LC_TIME=German_Germany.1252

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base 

loaded via a namespace (and not attached):
[1] tools_2.15.1

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] unexpected (?) behavior of sort=TRUE in merge function

2012-09-04 Thread Meyners, Michael
Rui, 

Thanks for looking into this. I apologize, I should've added my output, maybe 
it looks different on my machine than on others. I also should have made my 
question more explicit: I'm not looking for a solution to get the sorting one 
way or another, I have that already. I rather want to understand why the same 
code behaves differently on two very similar datasets (one just having less 
rows, see below).

The first call gives the following for me:

 lapply(test, function(x) merge(x, expand.grid(product=c("Y1", "Y2", "G", "F", 
 "L", "K"), cong=c(-1,0,1,11)), all=T, sort=TRUE))
[[1]]
   product congx
1F   -1 5.857143
2F0 3.625000
3F1 4.782609
4F   11 6.301887
5G   -1 7.30
6G0 4.80
7G1 4.424242
8G   11 5.781250
9K   -1 4.375000
10   K0 4.714286
11   K1 3.804348
12   K   11 5.566038
13   L   -1 7.272727
14   L0 6.25
15   L1 4.875000
16   L   11 6.877551
17  Y1   -1 5.857143
18  Y10 3.875000
19  Y11 3.535714
20  Y1   11 5.731707
21  Y2   -1 5.90
22  Y20 2.50
23  Y21 4.638889
24  Y2   11 5.419355

[[2]]
   product congx
1   Y1   -1 3.043478
2   Y10 4.887640
3   Y11   NA
4   Y1   11   NA
5   Y2   -1 4.181818
6   Y20 5.207921
7   Y21   NA
8   Y2   11   NA
9G   -1 3.75
10   G0 5.68
11   G1   NA
12   G   11   NA
13   F   -1 4.315789
14   F0 5.705263
15   F1   NA
16   F   11   NA
17   L   -1 4.50
18   L0 6.386364
19   L1   NA
20   L   11   NA
21   K   -1 3.739130
22   K0 4.967033
23   K1   NA
24   K   11   NA
 

So different from what you may have observed, here the first data set [[1]] is 
sorted by label of "product", not by value. As you correctly stated, "Y1" is 
coded as 1, "Y2" as 2, etc., but the first rows are for "F", followed by "G" etc. 
The second [[2]] is sorted by level (value). So I have different behavior on 
very similar looking data sets, and hence to me at least one of those cannot be 
right according to documentation (but I agree with you that the second is 
correct according to the help). In my larger example, it seems as if data sets 
which do not originally have all combinations of product and cong anyway are 
sorted like [[2]], and those that are complete (all 24 combinations occur) are 
sorted like [[1]] is, which to me is still unexpected.

Hope this clarifies my question.

Any thoughts appreciated.
Michael

 -Original Message-
 From: Rui Barradas [mailto:ruipbarra...@sapo.pt]
 Sent: Dienstag, 4. September 2012 14:01
 To: Meyners, Michael
 Cc: r-help
 Subject: Re: [R] unexpected (?) behavior of sort=TRUE in merge function
 
 Hello,
 
 Inline.
 Em 04-09-2012 12:24, Meyners, Michael escreveu:
 
 All,
 
 I realize from the archive that the sort argument in merge has been
 subject to discussion before, though I couldn't find an explanation for
 this behavior. I tried to simplify this to (kind of) minimal code from
 a real example to the following (and I have no doubts that there are
 smart people around achieving the same with smarter code :-)). I'm
 running R 2.15.1 64bit under MS Windows 7, full session info below.
 
 I do have a list with two dataframes:
 
 test <- list(structure(list(product = structure(c(1L, 2L, 3L, 4L, 5L,
 6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L,
 4L, 5L, 6L), .Label = c("Y1", "Y2", "G", "F", "L", "K"), class =
 "factor"),
 cong = c(-1, -1, -1, -1, -1, -1, 0, 0, 0, 0, 0, 0, 1, 1,
 1, 1, 1, 1, 11, 11, 11, 11, 11, 11), x = c(5.85714285714286,
 5.9, 7.3, 5.85714285714286, 7.27272727272727, 4.375, 3.875,
 2.5, 4.8, 3.625, 6.25, 4.71428571428571, 3.53571428571429,
 4.638889, 4.42424242424242, 4.78260869565217, 4.875,
 3.80434782608696, 5.73170731707317, 5.41935483870968, 5.78125,
 6.30188679245283, 6.87755102040816, 5.56603773584906)), .Names =
 c("product",
 "cong", "x"), row.names = c(NA, -24L), class = "data.frame"),
 structure(list(product = structure(c(1L, 2L, 3L, 4L, 5L,
 6L, 1L, 2L, 3L, 4L, 5L, 6L), .Label = c("Y1", "Y2", "G",
 "F", "L", "K"), class = "factor"), cong = c(-1, -1, -1, -1,
 -1, -1, 0, 0, 0, 0, 0, 0), x = c(3.04347826086957,
 4.18181818181818,
 3.75, 4.31578947368421, 4.5, 3.73913043478261, 4.8876404494382,
 5.20792079207921, 5.68, 5.70526315789474, 6.38636363636364,
 4.96703296703297)), .Names = c("product", "cong", "x"), row.names =
 c(NA,
 -12L), class = "data.frame"))
 
 
 The dataframes are pretty much the same but for the values in the x-
 column and the fact that the second one has only half as many
 observations, missing the second half of the expand.grid if you like.
 Now if I run
 
  lapply(test, function(x) merge(x, expand.grid(product=c("Y1", "Y2",
  "G", "F", "L", "K"), cong=c(-1,0,1,11)), all=T

Re: [R] unexpected (?) behavior of sort=TRUE in merge function

2012-09-04 Thread Meyners, Michael
Rui,

yes, without all=T it works fine, but of course there is no point in the whole 
exercise if I'd drop that, as print(test) would do the same, unless I have 
other values of product or cong in any dataset, which I haven't. :-) 

The purpose of the merge is to have all combinations of the levels of product 
and cong in each dataframe in my list -- there might be smarter ways, but it 
does the trick, and I need a unified setup (layout, size and sorting of the 
data) to ease the following steps in my code. Of course, I could achieve that 
easily by sorting the data subsequently, so there are multiple ways to get what 
I want to have. However, the purpose being a bit beyond this post, it's really 
about the behavior here on data sets that look so similar, and about the fact 
that one of those is not treated like it should be according to documentation). 

Thanks again for taking the time to reply.

Cheers, Michael



 -Original Message-
 From: Rui Barradas [mailto:ruipbarra...@sapo.pt]
 Sent: Dienstag, 4. September 2012 16:58
 To: Meyners, Michael
 Cc: r-help
 Subject: Re: [R] unexpected (?) behavior of sort=TRUE in merge function
 
 Hello,
 
 You're right I had missed the point, sorry.
 I can't see a reason why that behavior, but it seems to have to do with
 all = T, remove it and the problem is gone. But that's probably not
 what you want.
 NA's issue?
 
 Rui Barradas
 
 Em 04-09-2012 15:17, Meyners, Michael escreveu:
  Rui,
 
  Thanks for looking into this. I apologize, I should've added my
 output, maybe it looks differently on my machine than on others. I also
 should have made my question more explicit: I'm not looking for a
 solution to get the sorting one way or another, I have that already. I
 rather want to understand why the same code behaves differently on two
 very similar datasets (one just having less rows, see below).
 
  The first call gives the following for me:
 
  lapply(test, function(x) merge(x, expand.grid(product=c("Y1", "Y2",
  "G", "F", "L", "K"), cong=c(-1,0,1,11)), all=T, sort=TRUE))
  [[1]]
    product cong        x
 1        F   -1 5.857143
 2        F    0 3.625000
 3        F    1 4.782609
 4        F   11 6.301887
 5        G   -1 7.300000
 6        G    0 4.800000
 7        G    1 4.424242
 8        G   11 5.781250
 9        K   -1 4.375000
 10       K    0 4.714286
 11       K    1 3.804348
 12       K   11 5.566038
 13       L   -1 7.272727
 14       L    0 6.250000
 15       L    1 4.875000
 16       L   11 6.877551
 17      Y1   -1 5.857143
 18      Y1    0 3.875000
 19      Y1    1 3.535714
 20      Y1   11 5.731707
 21      Y2   -1 5.900000
 22      Y2    0 2.500000
 23      Y2    1 4.638889
 24      Y2   11 5.419355
 
  [[2]]
    product cong        x
 1       Y1   -1 3.043478
 2       Y1    0 4.887640
 3       Y1    1       NA
 4       Y1   11       NA
 5       Y2   -1 4.181818
 6       Y2    0 5.207921
 7       Y2    1       NA
 8       Y2   11       NA
 9        G   -1 3.750000
 10       G    0 5.680000
 11       G    1       NA
 12       G   11       NA
 13       F   -1 4.315789
 14       F    0 5.705263
 15       F    1       NA
 16       F   11       NA
 17       L   -1 4.500000
 18       L    0 6.386364
 19       L    1       NA
 20       L   11       NA
 21       K   -1 3.739130
 22       K    0 4.967033
 23       K    1       NA
 24       K   11       NA
 
 
  So different from what you may have observed, here the first data set
 [[1]] is sorted by label of product, not by value. As you correctly
 stated, Y1 is coded as 1, Y2 as 2, etc., but the first rows are for
 F, followed by G etc. The second [[2]] is sorted by level (value). So I
 have different behavior on very similar looking data sets, and hence to
 me at least one of those cannot be right according to documentation
 (but I agree with you that the second is correct according to the
 help). In my larger example, it seems as if data sets which do not
 originally have all combinations of product and cong anyway are sorted
 like [[2]], and those that are complete (all 24 combinations occur) are
 sorted like [[1]] is, which to me is still unexpected.
 
  Hope this clarifies my question.
 
  Any thoughts appreciated.
  Michael
 
  -Original Message-
  From: Rui Barradas [mailto:ruipbarra...@sapo.pt]
  Sent: Dienstag, 4. September 2012 14:01
  To: Meyners, Michael
  Cc: r-help
  Subject: Re: [R] unexpected (?) behavior of sort=TRUE in merge
  function
 
  Hello,
 
  Inline.
  Em 04-09-2012 12:24, Meyners, Michael escreveu:
 
  All,
 
 I realize from the archive that the sort argument in merge has been
 subject to discussion before, though I couldn't find an explanation
 for this behavior. I tried to simplify this to (kind of) minimal code
 from a real example to the following (and I have no doubts that there
 are smart people around achieving the same with smarter code :-)).
 I'm running R 2.15.1 64bit under MS Windows 7, full session info

Re: [R] How to measure level of similarity of two data frames

2012-05-28 Thread Meyners, Michael
Kel,
in addition, and depending on how you define similarity, you might want to 
look into the RV coefficient as a measure of it (it is actually related to a 
correlation, so similarity would rather mean similar information though not 
necessarily small Euclidean distance); coeffRV in FactoMineR would be one 
option to determine it.
HTH, Michael
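
For illustration, a minimal sketch of the coeffRV route (toy matrices, not 
Kel's data; the rv and p.value components shown are what I recall coeffRV 
returning):

library(FactoMineR)
set.seed(1)
X <- scale(matrix(rnorm(50), 10, 5))   # two standardized data sets with the
Y <- scale(matrix(rnorm(50), 10, 5))   # same observations in the rows
res <- coeffRV(X, Y)
res$rv        # RV coefficient, between 0 and 1
res$p.value   # p value of the associated test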

 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
 project.org] On Behalf Of Lamke
 Sent: Samstag, 26. Mai 2012 20:05
 To: r-help@r-project.org
 Subject: [R] How to measure level of similarity of two data frames
 
 Hi group,
 
 I've been thinking of calculating euclidean distance between each
 column of a data frames that each consists of standardized numerical
 columns.
 However, I don't know if there's a way of summarizing the overall
 distance by some kind of metrics.  If anyone know a proper way of doing
 so and/or a package I would greatly appreciate your suggestions.
 Thanks very much!
 
 Kel
 
 --
 View this message in context: http://r.789695.n4.nabble.com/How-to-
 measure-level-of-similarity-of-two-data-frames-tp4631466.html
 Sent from the R help mailing list archive at Nabble.com.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-
 guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Non-parametric test for repeated measures and post-hoc single comparisons in R?

2012-02-20 Thread Meyners, Michael
No, the authors are correct: the individuals (i.e. the 17 individuals) you have 
need to be independent (i.e. no correlation between them, let alone any 
individual running through your temporal experiment more than once, as 
indicated in the citation), while the *observations* are of course dependent as 
they are within the same subject (individual -- they have the same subject 
effect). Think of Friedman as a non-parametric 2-way ANOVA with one of the 
factors being subject; observations of the same subject are dependent, but once 
you include the subject effect, the errors are assumed to be independent (which 
implies that subjects need to be independent and should, e.g., not work on the 
assessment together).
The imprecision is in your interpretation of individuals vs. observations. 
HTH, Michael
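
For illustration, a minimal sketch (not from this thread) of the overall test 
followed by one possible post-hoc, paired Wilcoxon tests with Holm correction; 
with Likert data expect ties, so R falls back to a normal approximation and 
warns:

set.seed(123)
x <- matrix(sample(1:7, 17 * 6, replace = TRUE), nrow = 17,
            dimnames = list(1:17, paste0("T", 1:6)))
friedman.test(x)                       # overall test, subjects in rows
d <- data.frame(y    = as.vector(x),   # long format for the post-hoc
                time = factor(rep(colnames(x), each = nrow(x))),
                subj = factor(rep(rownames(x), ncol(x))))
pairwise.wilcox.test(d$y, d$time, paired = TRUE, p.adjust.method = "holm")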

 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
 project.org] On Behalf Of saschav...@gmail.com
 Sent: Montag, 20. Februar 2012 09:59
 To: peter dalgaard
 Cc: r-help@r-project.org
 Subject: Re: [R] Non-parametric test for repeated measures and post-hoc
 single comparisons in R?
 
 Thanks, I got it! (And I think I should have googled what "replicated"
 means!) However, then Bortz, Lienert, Boehnke are imprecise, if not
 wrong: "Der Friedman-Test setzt voraus, dass die N Individuen
 wechselseitig unabhängig sind, dass also nicht etwa ein und dasselbe
 Individuum zweimal oder mehrmals im Untersuchungsplan auftritt" (p.
 271). Which I (hope to) translate: "The Friedman test requires the N
 individuals to be reciprocally independent, which means that one
 individual cannot occur twice or more times in the research design."
 
 *S*
 
 On 19.02.12 22:04, peter dalgaard wrote:
 
  Repeated measures means that you have multiple measurements on the
 same individual. Usually, the same person measured at different time
 points. So if you have N individuals and T times, then you can place
 your observations in an N*T layout.
 
  In this layout, you can have 1 observation per cell or R > 1
 observations. In the former case, the design is referred to as
 "unreplicated".  Got it?
 
  -pd
 
 
  On Feb 19, 2012, at 19:25 , saschav...@gmail.com wrote:
 
  Some attribute x from 17 individuals was recorded repeatedly on 6
 time points using a Likert scale with 7 distractors. Which statistical
 test(s) can I apply to check whether the changes along the 6 time
 points were significant?
 
  set.seed( 123 )
  x <- matrix( sample( 1:7, 17*6, repl=T ),
nrow = 17, byrow = TRUE,
dimnames = list(1:17, paste( 'T', 1:6, sep='' ))
  )
 
  I found the Friedman test and the Quade test for testing the overall
 hypothesis.
 
  friedman.test( x )
  quade.test( x )
 
  However, the R help files, my text books (Bortz, Lienert and
 Boehnke, 2008; Köhler, Schachtel and Voleske, 2007; both German), and
 the Wikipedia texts differ in what they propose as requirements for the
 tests. R says that data need to be unreplicated. I read 'unreplicated'
 as 'not-repeated', but is that right? If so, the example, in contrast,
in friedman.test() appears to use indeed repeated measures. Yet,
Wikipedia says the contrary, that is to say, the test is good especially
if the data represent repeated measures. The text books say either (in the
same paragraph, which is very confusing). What is right?
 
  In addition, what would be an appropriate test for post-hoc single
 comparisons for the indication which column differs from others
 significantly?
 
  Bortz, Lienert, Boehnke (2008). Verteilungsfreie Methoden in der
  Biostatistik. Berlin: Springer Köhler, Schachtel, Voleske (2007).
  Biostatistik: Eine Einführung für Biologen und Agrarwissenschaftler.
  Berlin: Springer
 
  --
  Sascha Vieweg, saschav...@gmail.com
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
  http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 
 
 
 --
 Sascha Vieweg, saschav...@gmail.com
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-
 guide.html
 and provide commented, minimal, self-contained, reproducible code.
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] 2 sample wilcox.test != kruskal.test

2012-01-11 Thread Meyners, Michael
The devil is in the details (and in the arguments in Lukasz code). The defaults 
for the two functions are different: wilcox.test uses an exact test (which is 
not available in kruskal.test afaik) for your data, and uses the continuity 
correction if the normal approximation is requested (neither available in 
kruskal.test). See the manual (in particular ?wilcox.test) for details, or the 
pertinent literature for the theoretical background.

HTH, Michael
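
To make the equivalence explicit, a small sketch (data taken from the quoted 
message below):

x <- c(10, 11, 15, 8, 16, 12, 20)
y <- c(10, 14, 18, 25, 28, 30, 35)
g <- factor(rep(c("a", "b"), each = 7))
# force wilcox.test onto the same normal approximation, without the
# continuity correction, that kruskal.test uses:
wilcox.test(x, y, exact = FALSE, correct = FALSE)$p.value
kruskal.test(c(x, y) ~ g)$p.value   # identical up to numerical noise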


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
 project.org] On Behalf Of Lukasz Reclawowicz
 Sent: Wednesday, January 11, 2012 12:00
 To: syrvn
 Cc: r-help@r-project.org
 Subject: Re: [R] 2 sample wilcox.test != kruskal.test
 
 2012/1/11 syrvn ment...@gmx.net
 
  Hi,
 
  thanks for your answer. Unfortunately I cannot reproduce your
 results.
 
  In my example the results still differ when I use your approach:
 
   x <- c(10,11,15,8,16,12,20)
   y <- c(10,14,18,25,28,30,35)
   f <- as.factor(c(rep("a",7), rep("b",7))); d <- c(x,y)
   kruskal.test(x,y)
 
 
 Try comparing wilcox.test with the right formula in kruskal.test; I got:
  all.equal(wilcox.test(x,y, correct =
 F,exact=F)$p.value,kruskal.test(d~f)$p.value)
 [1] TRUE
 
 --
 Miłego dnia (Have a nice day)
 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Using a mathematical expression in sapply() XXXX

2012-01-04 Thread Meyners, Michael
Dan,

It depends on what you want to achieve. I suspect you just want to remove 
missing values before summing; if so, consider

sapply(x, sum, na.rm=TRUE)

instead. To make your code running, try

sapply(x, function(x) sum(!is.na(x)))

However, this would just count the number of non-missing values per subgroup, 
not the sum of the values. 

Similarly, 

sapply(x, function(x) sum(x, na.rm=TRUE))

would follow the same rationale but give the sums (with missing values 
removed), i.e. the same results as the first line of code. I'd rather choose 
the first option than this one, though.

Of course, you could also define a new function 

sum1 <- function(x) sum(!is.na(x)) # or probably rather sum(x, na.rm=TRUE), 
depending on your needs

and then use 

sapply(x, sum1)

but that seems a bit overkill, I'd say...

HTH, Michael


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
 project.org] On Behalf Of Dan Abner
 Sent: Wednesday, January 04, 2012 14:41
 To: r-help@r-project.org
 Subject: [R] Using a mathematical expression in sapply() 
 
 Hello everyone,
 
 I have the following call to sapply() and error message. Is the most
 efficient way to deal with this to make sum(!is.na(x)) a function in a
 separate line prior to this call? If not, please advise.
 
 N.Valid=sapply(x,sum(!is.na(x)))
 Error in match.fun(FUN) :
   'sum(!is.na(x))' is not a function, character or symbol
 
 
 Thanks!
 
 Dan
 
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-
 guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] generalized procrustes analysis without translation

2011-12-09 Thread Meyners, Michael
I’m not aware of any, but if you really want this, it should be possible to 
modify the code of any of the functions you propose and delete the part doing 
the translation. I’m not sure that this is a good idea, though; either your 
matrices are *truly* centered, then it doesn’t make a difference, or they are 
only centered in theory but not in practice due to random error, then the 
rotation and scaling will depend heavily on the deviation of the center of the 
matrix from the origin. Optimal scaling and rotation in GPA are independent 
from each other so might be (de-)selected depending on needs, but they depend 
on the translation, so it seems to make sense that you cannot deselect that 
part. You wouldn’t do scaling of two variables without centering them around 0 
either, would you? You may transform them back if required, but scaling without 
centering first does not appeal to me, same for rotations.
HTH, Michael
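
To illustrate the point about truly centred matrices, a sketch (using procGPA 
from the shapes package and its example data gorf.dat, which are my 
assumptions, not Mark's setting): once each configuration is centred 
explicitly, the translation step of GPA is a no-op, so rotation and scaling 
are unaffected by it.

library(shapes)
data(gorf.dat)   # k x 2 x n array of landmark configurations
ctr <- sweep(gorf.dat, c(2, 3), apply(gorf.dat, c(2, 3), mean))  # centre each
res <- procGPA(ctr, scale = TRUE, reflect = TRUE)                # ordinary GPA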

From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Mark Heckmann
Sent: Wednesday, December 07, 2011 19:09
To: r-help@r-project.org
Subject: [R] generalized procrustes analysis without translation

I am searching for a GPA function/algorithm that allows to omit translation of 
matrices.
procGPA in the shapes package does not have this option.
I need scaling, rotation and reflection but no translation as my matrices are 
naturally centered.

I am not sure if GPA from FactoMineR might be used for this.
There are no arguments allowing to switch on the above mentioned arguments 
except for scaling.
Frankly, I also do not understand how to set up the data for GPA properly, 
especially how to use the group argument.

Does someone know another GPA function or has some code available?

Thanks in advance
Mark


Mark Heckmann
Blog: www.markheckmann.de
R-Blog: http://ryouready.wordpress.com











    [[alternative HTML version deleted]]
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Pearson chi-square test

2011-09-27 Thread Meyners, Michael
Not sure what you want to test here with two matrices, but reading the manual 
helps here as well: 

y   a vector; ignored if x is a matrix.

x and y are matrices in your example, so it comes as no surprise that you get 
different results. On top of that, your manual calculation is not correct if 
you want to test whether two samples come from the same distribution (so don't 
be surprised if R still gives a different value...).

HTH, Michael

 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
 project.org] On Behalf Of Michael Haenlein
 Sent: Tuesday, September 27, 2011 12:45
 To: r-help@r-project.org
 Subject: [R] Pearson chi-square test
 
 Dear all,
 
 I have some trouble understanding the chisq.test function.
 Take the following example:
 
 set.seed(1)
 A <- cut(runif(100), c(0.0, 0.35, 0.50, 0.65, 1.00), labels=FALSE)
 B <- cut(runif(100), c(0.0, 0.25, 0.40, 0.75, 1.00), labels=FALSE)
 C <- cut(runif(100), c(0.0, 0.25, 0.50, 0.80, 1.00), labels=FALSE)
 x <- table(A,B)
 y <- table(A,C)
 
 When I calculate the test statistic by hand I get a value of
 approximately
 75.9:
 http://en.wikipedia.org/wiki/Pearson's_chi-
 square_test#Calculating_the_test-statistic
 sum((x-y)^2/y)
 
 But when I do chisq.test(x,y) I get a value of 12.2 while
 chisq.test(y,x)
 gives a value of 10.3.
 
 I understand that I must be doing something wrong here, but I'm not
 sure
 what.
 
 Thanks,
 
 Michael
 
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-
 guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Pearson chi-square test

2011-09-27 Thread Meyners, Michael
Just for completeness: the manual calculation you'd want is most likely

sum((x-y)^2  / (x+y))

(that's one you can find on the Wikipedia link you provided). To get the same 
from chisq.test, try something like 

chisq.test(data.frame(x,y)[,c(3,6)])

(there are surely smarter ways, but at least it works here). Note that 
something like 

chisq.test(as.vector(x), as.vector(y)) 

will give a different test, i.e. one based on the contingency table of x cross y.
M. 
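
A quick numerical check of that equivalence (a sketch; A, B and C as in the 
quoted code below -- it relies on both tables having the same total, 100 here):

set.seed(1)
A <- cut(runif(100), c(0.0, 0.35, 0.50, 0.65, 1.00), labels = FALSE)
B <- cut(runif(100), c(0.0, 0.25, 0.40, 0.75, 1.00), labels = FALSE)
C <- cut(runif(100), c(0.0, 0.25, 0.50, 0.80, 1.00), labels = FALSE)
x <- table(A, B); y <- table(A, C)
sum((x - y)^2 / (x + y))                           # hand calculation
chisq.test(data.frame(x, y)[, c(3, 6)])$statistic  # same value (R may warn
                                                   # about small counts)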

 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
 project.org] On Behalf Of Meyners, Michael
 Sent: Tuesday, September 27, 2011 13:28
 To: Michael Haenlein; r-help@r-project.org
 Subject: Re: [R] Pearson chi-square test
 
 Not sure what you want to test here with two matrices, but reading the
 manual helps here as well:
 
 y a vector; ignored if x is a matrix.
 
 x and y are matrices in your example, so it comes as no surprise that
 you get different results. On top of that, your manual calculation is
 not correct if you want to test whether two samples come from the same
 distribution (so don't be surprised if R still gives a different
 value...).
 
 HTH, Michael
 
  -Original Message-
  From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
  project.org] On Behalf Of Michael Haenlein
  Sent: Tuesday, September 27, 2011 12:45
  To: r-help@r-project.org
  Subject: [R] Pearson chi-square test
 
  Dear all,
 
  I have some trouble understanding the chisq.test function.
  Take the following example:
 
  set.seed(1)
  A <- cut(runif(100), c(0.0, 0.35, 0.50, 0.65, 1.00), labels=FALSE)
  B <- cut(runif(100), c(0.0, 0.25, 0.40, 0.75, 1.00), labels=FALSE)
  C <- cut(runif(100), c(0.0, 0.25, 0.50, 0.80, 1.00), labels=FALSE)
  x <- table(A,B)
  y <- table(A,C)
 
  When I calculate the test statistic by hand I get a value of
  approximately
  75.9:
  http://en.wikipedia.org/wiki/Pearson's_chi-
  square_test#Calculating_the_test-statistic
  sum((x-y)^2/y)
 
  But when I do chisq.test(x,y) I get a value of 12.2 while
  chisq.test(y,x)
  gives a value of 10.3.
 
  I understand that I must be doing something wrong here, but I'm not
  sure
  what.
 
  Thanks,
 
  Michael
 
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide http://www.R-project.org/posting-
  guide.html
  and provide commented, minimal, self-contained, reproducible code.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-
 guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Pearson chi-square test

2011-09-27 Thread Meyners, Michael
I suspect that the chisquare-test might not be appropriate, as you have 
constraints (same number of observations for A in both contingency tables). I 
further suspect that there is no test readily available for that, but I might 
be wrong. Maybe randomization tests could help here, but it would require a bit 
of thinking AND programming to accomplish that. chisq.test might give you an 
approximate solution, but I can't say how good this will be (and it might also 
depend on the data, btw).
Best, Michael


From: Michael Haenlein
Sent: Tuesday, September 27, 2011 17:05
To: r-help@r-project.org
Cc: Meyners, Michael
Subject: RE: [R] Pearson chi-square test

Dear Michael,
 
Thanks very much for your answers!
 
The purpose of my analysis is to test whether the contingency table x is 
different from the contingency table y.
Or, to put it differently, whether there is a significant difference between 
the joint distributions AB and AC.
 
Based on your answer I'm wondering whether the best way to do this is really a 
chisq.test?
Or is there probably a different function or package I should use altogether?
 
Thanks,
 
Michael
 
 
 
-Original Message-
From: Meyners, Michael 
Sent: Dienstag, 27. September 2011 17:00
To: Michael Haenlein; r-help@r-project.org
Subject: RE: [R] Pearson chi-square test
 
Just for completeness: the manual calculation you'd want is most likely
 
sum((x-y)^2  / (x+y))
 
(that's one you can find on the Wikipedia link you provided). To get the same 
from chisq.test, try something like 
 
chisq.test(data.frame(x,y)[,c(3,6)])
 
(there are surely smarter ways, but at least it works here). Note that 
something like 
 
chisq.test(as.vector(x), as.vector(y)) 
 
will give a different test, i.e. based on a contingency table of x cross y).
M. 
 
 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
 project.org] On Behalf Of Meyners, Michael
 Sent: Tuesday, September 27, 2011 13:28
 To: Michael Haenlein; r-help@r-project.org
 Subject: Re: [R] Pearson chi-square test
 
 Not sure what you want to test here with two matrices, but reading the
 manual helps here as well:
 
 y   a vector; ignored if x is a matrix.
 
 x and y are matrices in your example, so it comes as no surprise that
 you get different results. On top of that, your manual calculation is
 not correct if you want to test whether two samples come from the same
 distribution (so don't be surprised if R still gives a different
 value...).
 
 HTH, Michael
 
  -Original Message-
  From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
  project.org] On Behalf Of Michael Haenlein
  Sent: Tuesday, September 27, 2011 12:45
  To: r-help@r-project.org
  Subject: [R] Pearson chi-square test
 
  Dear all,
 
  I have some trouble understanding the chisq.test function.
  Take the following example:
 
  set.seed(1)
  A <- cut(runif(100), c(0.0, 0.35, 0.50, 0.65, 1.00), labels=FALSE)
  B <- cut(runif(100), c(0.0, 0.25, 0.40, 0.75, 1.00), labels=FALSE)
  C <- cut(runif(100), c(0.0, 0.25, 0.50, 0.80, 1.00), labels=FALSE)
  x <- table(A,B)
  y <- table(A,C)
 
  When I calculate the test statistic by hand I get a value of
  approximately
  75.9:
  http://en.wikipedia.org/wiki/Pearson's_chi-
  square_test#Calculating_the_test-statistic
  sum((x-y)^2/y)
 
  But when I do chisq.test(x,y) I get a value of 12.2 while
  chisq.test(y,x)
  gives a value of 10.3.
 
  I understand that I must be doing something wrong here, but I'm not
  sure
  what.
 
  Thanks,
 
  Michael

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] library(SenoMineR)- Triangle Test Query

2011-06-08 Thread Meyners, Michael
Vijayan,

I cannot find an error in your code, but I had a look at the code of 
triangle.test -- unless I'm missing something, it contains a bug. If you study 
the way in which the matrix pref is updated, you find that the vector 
preference is compared to 1, 2 and 3 instead of X, Y and Z as it should 
be. That way, some of the non-diagonal entries of pref will always be zero, 
irrespective of the data, which does not make sense. I think it should work if 
you modify the code accordingly. Alternatively, a quick patch (untested!) might 
be to code preferences as 1, 2 and 3 instead of the letters (but I'm not sure 
whether this has any other implications).
I CC the author of the function and maintainer of the package; he should 
correct me if needed or could otherwise update the code for the next release (I 
worked on SensoMineR 1.11).

Hope this helps, 
Michael

 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
 project.org] On Behalf Of Vijayan Padmanabhan
 Sent: Saturday, June 04, 2011 9:21
 To: r-help@r-project.org
 Subject: [R] library(SenoMineR)- Triangle Test Query
 
 Dear R Group
 I was trying to use the triangle.test function in SensoMineR and
 strangely i
 encounter a error in the output of preference matrix from the analysis.
 To illustrate, pl see the following dataframe of a design with the
 response
 and preference collected as shown below:
 
  design <- structure(list(`Product X` = c(3, 1, 4, 2, 4, 2, 1, 3, 4, 2,
  4, 2, 1, 3, 4, 2, 4, 2, 3, 1), `Product Y` = c(1, 1, 4, 4, 4,
  3, 1, 1, 4, 4, 4, 3, 1, 1, 4, 4, 4, 3, 1, 1), `Product Z` = c(3,
  2, 1, 2, 3, 3, 2, 3, 1, 2, 3, 3, 2, 3, 1, 2, 3, 3, 3, 2), Response =
  structure(c(1L,
  2L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L,
  1L, 1L, 2L), .Label = c("X", "Z"), class = "factor"), Preference =
  structure(c(1L,
  3L, 1L, 1L, 1L, 2L, 3L, 1L, 1L, 1L, 1L, 2L, 3L, 1L, 1L, 1L, 1L,
  2L, 1L, 2L), .Label = c("X", "Y", "Z"), class = "factor")), .Names =
  c("Product X",
  "Product Y", "Product Z", "Response", "Preference"), class =
  "data.frame",
  row.names = c("Panelist1.Test1",
  "Panelist1.Test2", "Panelist2.Test1", "Panelist2.Test2",
  "Panelist3.Test1",
  "Panelist3.Test2", "Panelist4.Test1", "Panelist4.Test2",
  "Panelist5.Test1",
  "Panelist5.Test2", "Panelist6.Test1", "Panelist6.Test2",
  "Panelist7.Test1",
  "Panelist7.Test2", "Panelist8.Test1", "Panelist8.Test2",
  "Panelist9.Test1",
  "Panelist9.Test2", "Panelist10.Test1", "Panelist10.Test2"))
 
  If you were to investigate the above dataframe, you would find that for
  the comparison of Product 2 vs Product 3, the preference indicates product
  3 is preferred over product 2 all the time.
 
 ## Read output from the following script to see what i mean above:
  subset(design, `Product X`==2 & `Product Y`==3 & `Product Z`==3)
 
 ##Output of above would be:
  .               Product X Product Y Product Z Response Preference
  Panelist3.Test2         2         3         3        X          Y
  Panelist6.Test2         2         3         3        X          Y
  Panelist9.Test2         2         3         3        X          Y
 
 However when I analyse the design with the answers and preferences
 using the
 following script, I get the $pref output which shows that product 2 is
 preferred over 3 all the time. Can somebody explain what is wrong in my
 script?
 
  answer <- as.vector(design$Response)
  preference <- as.vector(design$Preference)
  triangle.test(design[,1:3], answer, preference)
 
 ##$pref output from the triangle.test function shows as follows:
 
 $pref
   1 2 3 4
 1 0 0 0 0
 2 4 0 3 0
 3 0 0 0 0
 4 0 0 0 0
 
 
 Any help in helping me identify what is going wrong here would be
 highly
 appreciated.
 Regards
 Vijayan Padmanabhan
 
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-
 guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] library(SenoMineR)- Triangle Test Query

2011-06-08 Thread Meyners, Michael
Unless I missed it, neither the OP nor the list was CC'd on this, so for anyone 
interested, I forward this solution (untested from my side) from the package 
maintainer. Not sure whether the file comes through, so I include the updated 
code in the message's body below. 
Cheers, Michael


** updated code for triangle.test 
triangle.test <- function (design,answer,preference=NULL){

answer = gsub("(\\w)", "\\U\\1", as.character(answer), perl=TRUE)
labprod = 
levels(as.factor(c(as.character(design[,1]),as.character(design[,2]),as.character(design[,3]))))
nbprod = length(labprod)
nb.answer = nb.good = pref = matrix(0,nbprod,nbprod)
for (i in 1:nrow(design)){
  for (j in 1:nbprod){
     if (labprod[j] == design[i,1]) i1 = j
     if (labprod[j] == design[i,2]) i2 = j
     if (labprod[j] == design[i,3]) i3 = j
  }
  if (i1!=i2) nb.answer[i1,i2] = nb.answer[i1,i2]+1
  if (i1==i2) nb.answer[i1,i3] = nb.answer[i1,i3]+1
  if ((i1==i2) & (answer[i]=="Z")){
    nb.good[i1,i3]=nb.good[i1,i3]+1
    if (length(preference)>0){
      if (preference[i]!="Z") pref[i3,i1] = pref[i3,i1] +1
      if (preference[i]=="Z") pref[i1,i3] = pref[i1,i3] +1
    }
  }
  if ((i1==i3) & (answer[i]=="Y")){
    nb.good[i1,i2]=nb.good[i1,i2]+1
    if (length(preference)>0){
      if (preference[i]!="Y") pref[i2,i1] = pref[i2,i1] +1
      if (preference[i]=="Y") pref[i1,i2] = pref[i1,i2] +1
    }
  }
  if ((i2==i3) & (answer[i]=="X")){
    nb.good[i1,i2]=nb.good[i1,i2]+1
    if (length(preference)>0){
      if (preference[i]!="X") pref[i1,i2] = pref[i1,i2] +1
      if (preference[i]=="X") pref[i2,i1] = pref[i2,i1] +1
    }
  }
}
nb.good = nb.good + t(nb.good)
nb.answer = nb.answer + t(nb.answer)

diag(nb.answer)=1
prob = pbinom(nb.good-1,nb.answer,1/3,lower.tail=FALSE)
maxML = recognize = minimum = matrix(NA,nbprod,nbprod)
for (i in 1:(nbprod-1)){
  for (j in (i+1):nbprod){
    aux = matrix(0,nb.good[i,j]+1,1)
    for (k in 0:nb.good[i,j]) aux[k] = dbinom(nb.good[i,j]-k,nb.answer[i,j]-k,1/3)
    maxML[i,j] = maxML[j,i] = max(aux)
    recognize[i,j] = recognize[j,i] = rev(order(aux))[1]-1
    mini = 0
    for (k in round(nb.answer[i,j]/3):nb.answer[i,j]) if ((mini==0) & (dbinom(k,nb.answer[i,j],1/3)<0.05)) mini=k
    minimum[i,j]=minimum[j,i]=mini
  }
}

confusion = (nb.answer-recognize) / nb.answer
diag(nb.answer)=diag(recognize)=0
diag(maxML)=diag(confusion)=1

rownames(nb.answer) = colnames(nb.answer) = rownames(nb.good) = colnames(nb.good) = labprod
rownames(prob) = colnames(prob) = rownames(confusion) = colnames(confusion) = labprod
rownames(maxML) = colnames(maxML) = rownames(minimum) = colnames(minimum) = rownames(recognize) = colnames(recognize) = labprod
if (length(preference)>0) rownames(pref) = colnames(pref) = labprod

res = list()
res$nb.comp = nb.answer
res$nb.ident = nb.good
res$p.value = prob
res$nb.recognition = recognize
res$maxML = maxML
res$confusion = confusion
res$minimum = minimum
if (length(preference)>0) res$pref = pref
##res$complete = result
return(res)
}

** end updated code for triangle.test 
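
For reference, a usage sketch with the objects from the original post (design, 
answer and preference as defined there); with the corrected function, the 
preference matrix should now match the raw data:

res <- triangle.test(design[, 1:3], answer, preference)
res$pref      # preference counts by product pair
res$p.value   # per-pair p values of the triangle tests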




-Original Message-
From: Francois Husson 
Sent: Wednesday, June 08, 2011 14:43
To: Meyners, Michael
Subject: Re: [R] library(SenoMineR)- Triangle Test Query

  Dear Vijayan, dear Michael,

Indeed there was an error in the function triangle.test. I attach the new 
function.
Thanks Michael for your answer.
Best
Francois


Le 08/06/2011 12:32, Meyners, Michael a écrit :
 Vijayan,

 I cannot find an error in your code, but I had a look at the code of 
 triangle.test -- unless I'm missing something, it contains a bug. If you 
 study the way in which the matrix pref is updated, you find that the vector 
 "preference" is compared to "1", "2" and "3" instead of "X", "Y" and "Z" as it should 
 be. That way, some of the non-diagonal entries of pref will always be zero, 
 irrespective of the data, which does not make sense. I think it should work 
 if you modify the code accordingly. Alternatively, a quick patch (untested!) 
 might be to code preferences as 1, 2 and 3 instead of the letters (but I'm 
 not sure whether this has any other implications).
 I CC the author of the function and maintainer of the package; he should 
 correct me if needed or could otherwise update the code for the next release 
 (I worked on SensoMineR 1.11).

 Hope this helps,
 Michael

 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
 project.org] On Behalf Of Vijayan Padmanabhan
 Sent: Saturday, June 04, 2011 9:21
 To: r-help@r-project.org
 Subject: [R] library(SenoMineR)- Triangle Test Query

 Dear R Group
 I was trying to use the triangle.test function in SensoMineR and
 strangely i
 encounter a error in the output of preference matrix from the analysis.
 To illustrate, pl see the following dataframe of a design with the
 response
 and preference collected as shown below:

  design <- structure(list(`Product X` = c(3, 1, 4, 2, 4, 2, 1, 3, 4, 2,
 4, 2, 1, 3, 4, 2, 4, 2, 3, 1), `Product Y` = c(1, 1, 4, 4, 4,
 3, 1, 1, 4

Re: [R] help on permutation/randomization test

2011-05-24 Thread Meyners, Michael
I suspect you need to give more information/background on the data (though this 
is not primarily an R-related question; you might want to try other resources 
instead). Unless I'm missing something here, I cannot think of ANY reasonable 
test: A permutation (using permtest or anything else) would destroy the 
correlation structure and hence give invalid results, and the assumptions of 
parametric tests are violated as well. Basically, you only have two 
observations, one for each group; with some good will you might consider these 
as repeated measurements, but still on the same subject or whatsoever. Hence no 
way to discriminate the subject from a treatment effect. There is not enough 
data to permute or to rely a statistical test on. So unless you can get rid of 
the dependency within groups (or at least reasonably assume observations to be 
independent), I'm not very optimistic...
HTH, Michael

 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
 project.org] On Behalf Of Wenjin Mao
 Sent: Monday, May 23, 2011 20:56
 To: r-help@r-project.org
 Subject: [R] help on permutation/randomization test
 
 Hi,
 
 I have two groups of data of different size:
group A: x1, x2, ..., x_n;
group B: y1, y2, ..., y_m; (m is not equal to n)
 
 The two groups are independent but observations within each group are
 not independent,
  i.e., x1, x2, ..., x_n are not independent; but x's are independent
 from y's
 
 I wonder if randomization test is still applicable to this case. Does
 R have any function that can do this test for large m and n? I notice
 that permtest can only handle small (m+n<22) samples.
 
 Thank you very much,
 Wenjin
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-
 guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] lmer() with no intercept

2010-06-14 Thread Meyners, Michael
John, 
Why would you want to fit the model without intercept if you seemingly need it? 
Anyway, I assume that the intercept from your first model just moves into the 
random effects -- you have intercepts there for worker and day, so any of these 
(or both) will absorb it. No surprise that the estimates for the covariates 
only differ slightly, it should be that way. 
What you plot (your second call to panel.lines) is not the correct model, as 
you omit the intercept from the work and day (which is 0 or at least pretty 
close to it if you include the overall intercept in your model). That's why 
your red line is on the top edge (note that the intercept is negative). 
I'm therefore not sure that the model without intercept makes a lot of sense, 
but you consider posting related questions rather to the mixed-models SIG, 
where you might get more erudite comments than from me.
HTH, Michael
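
To see where the intercept goes, a quick sketch (assuming lme4 is loaded and 
"test" is the data from the post; the ranef call is not in the original 
thread):

library(lme4)
m1 <- lmer(y ~ x + x2 + (1 | worker) + (1 | day), data = test)
m0 <- lmer(y ~ x + x2 + (1 | worker) + (1 | day) - 1, data = test)
fixef(m1)   # includes the (negative) overall intercept
fixef(m0)   # slopes barely change...
ranef(m0)   # ...because the random intercepts absorb the overall level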


-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of array chip
Sent: Saturday, June 12, 2010 1:07
To: r-help@r-project.org
Subject: [R] lmer() with no intercept

Hi, I asked this before, but haven't got any response. So would like to have 
another try. thanks for help. Also tried twice to join the  model mailing list 
so that I can ask question there, but still haven't got permission to join that 
list yet.



===


Hi, I am wondering how I can specify no intercept in a mixed model using 
lmer(). 

Here is an example dataset attached (test.txt). There are 3 workers who, on 5 
days, measured a response variable y on an independent variable x. I want to 
use a quadratic term (x2 in the dataset) to model the relationship between y 
and x.

test <- read.table("test.txt", sep='\t', header=T)

If I just simply use lm() and ignore worker and day, so that I can try both a 
linear regression with and without an intercept, here is what I get:

lm(y~x+x2, data=test)
Coefficients:
(Intercept)            x           x2  
 -1.7749104    0.1099160   -0.0006152

lm(y~x+x2-1, data=test)
Coefficients:
         x          x2  
 0.0490097  -0.0001962

Now, I want to try mixed model considering worker and day as random effect. 

With an intercept:

lmer(y~x+x2+(1|worker)+(1|day), data=test)
Fixed effects:
  Estimate Std. Error t value
(Intercept) -1.324e+00  4.490e-01  -2.948
x            1.117e-01  8.563e-03  13.041
x2  -6.357e-04  7.822e-05  -8.127

Without an intercept:

lmer(y~x+x2+(1|worker)+(1|day)-1, data=test)
Fixed effects:
 Estimate Std. Error t value
x   1.107e-01  8.528e-03  12.981
x2 -6.304e-04  7.805e-05  -8.077

It seems working fine. But if you look at the fixed effect coefficients of both 
mixed models, the coefficients for x and x2 are not much different, regardless 
of whether an intercept is included or not. This is not the case for simple 
linear regression using lm() on the top.

If I plot all 4 models in the following plot:

xyplot(y~x, groups=worker, test, col.line = "grey", lwd = 2,
    panel = function(x,y) { 
        panel.xyplot(x,y, type='p')
        x <- sort(x)
        panel.lines(x, -1.324+0.1117*x-0.0006357*x*x) 
        panel.lines(x, 0.1107*x-0.0006304*x*x, col='red') 
        panel.lines(x, 0.04901*x-0.0001962*x*x, col='blue') 
        panel.lines(x, -1.7749+0.10992*x-0.0006152*x*x, col='green') 
    }) 

As you can see, the mixed model without intercept (red line) does not fit the 
data very well (it's at the top edge of the data, instead of in the middle of 
the data), so I guess I did something wrong here.

Can anyone make any suggestions?

Thanks

John


  

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] RWinEdt

2010-05-27 Thread Meyners, Michael
I assume you installed WinEdt 6. See 
https://stat.ethz.ch/pipermail/r-help/2010-May/238540.html
and related messages. RWinEdt does not work (yet) with WinEdt 6, so you'll have 
to downgrade back to WinEdt 5.x (or use another editor for the time being).
HTH, Michael

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Yves Aragon
Sent: Saturday, May 22, 2010 18:25
To: r-help@r-project.org
Subject: [R] RWinEdt

I have installed RWinEdt but when I try to load it,  I get the message :

Error in normalizePath(path) :
  path[1]=C:\Users\yves\AppData\Roaming\WinEdt/R.ini: The specified file 
cannot be found.

I have the problem with R 2.11.0 and R 2.10.1,  since I have re-installed 
WinEdt.

Yves


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Suppressing scientific notation on plot axis tick labels

2010-02-02 Thread Meyners, Michael, LAUSANNE, AppliedMathematics
Strange, the following works reproducibly on my machine (Windows 2000
Pro):

options(scipen = 50, digits = 5)
x  = c(1e7, 2e7)
?barplot
barplot(x) 

while I also get scientific with your code. After I called ?barplot
once, I'm incapable of getting the scientific notation again, though...
But I admit that I am outdated, I still run 2.8.1 (but not for long...)

Michael


 -Original Message-
 From: r-help-boun...@r-project.org 
 [mailto:r-help-boun...@r-project.org] On Behalf Of Dimitri Shvorob
 Sent: Dienstag, 2. Februar 2010 12:59
 To: r-help@r-project.org
 Subject: Re: [R] Suppressing scientific notation on plot axis 
 tick labels
 
 
 options(scipen = 50, digits = 5)
 x  = c(1e7, 2e7)
 barplot(x) 
 
 Still scientific...
 --
 View this message in context: 
 http://n4.nabble.com/Suppressing-scientific-notation-on-plot-a
xis-tick-labels-tp1459697p1459828.html
 Sent from the R help mailing list archive at Nabble.com.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Explanation w.r.t. rbind, please!

2010-01-29 Thread Meyners, Michael, LAUSANNE, AppliedMathematics

 -Original Message-
 From: r-help-boun...@r-project.org 
 [mailto:r-help-boun...@r-project.org] On Behalf Of Uwe Dippel
 Sent: Freitag, 29. Januar 2010 11:57
 To: r-help@r-project.org
 Subject: [R] Explanation w.r.t. rbind, please!
 
 This is what I tried:
 
   num.vec <- c(12.34,56.78,90.12,34.56)   
 names(num.vec) <- c("first","second","third","fourth")
   num.vec
  first second  third fourth
  12.34  56.78  90.12  34.56
   seq <- (1:4)  
   num.mat <- rbind(num.vec,seq)
   num.mat   
         first second third fourth
 num.vec 12.34  56.78 90.12  34.56
 seq      1.00   2.00  3.00   4.00
   num.vec[3:4]
  third fourth
  90.12  34.56
 
 (until here I'm fine)
 
   num.mat[seq]
 [1] 12.34  1.00 56.78  2.00

What you (probably) want here is
num.mat["seq", ]

   num.mat[num.vec]
 [1] NA NA NA NA   

num.mat["num.vec", ]

and so on. You have to tell R that you want the ROW (that's why the
comma is needed) defined by the NAME "seq" or "num.vec" (that's why you
need the quotes). Otherwise, R replaces seq by its value 1:4, so 
num.mat[seq] 
is identical to 
num.mat[1:4] 
(and num.mat[12.34] is NA of course, hence the NAs in num.mat[num.vec])

You should have a closer look at ?Extract
HTH, Michael
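
In code (a quick sketch with the same objects as above):

num.mat["seq", ]                       # the row named "seq"
num.mat["num.vec", ]                   # the row named "num.vec"
identical(num.mat[seq], num.mat[1:4])  # TRUE: unquoted seq is just 1:4 here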

   num.vec [seq]
  first second  third fourth
  12.34  56.78  90.12  34.56
   num.mat [num.vec]
 [1] NA NA NA NA
   num.mat [-seq]  
 [1] 90.12  3.00 34.56  4.00
 
 (and here I'm lost!)
 
 How could I display a row, instead of always seemingly 
 falling back to columns?
 
 Uwe
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] exact wilcox test and Bonferroni correction

2010-01-22 Thread Meyners, Michael, LAUSANNE, AppliedMathematics
?p.adjust
Apply that to a vector containing all p values you get from
wilcox.exact. Alternatively, multiply each p value by the number of
comparisons you perform, see any textbook for that. You might want to
consider a less conservative correction, though.
HTH, Michael
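
A minimal sketch of the mechanics (made-up p values, not netrunner's data):

pvals <- c(0.012, 0.034, 0.151)          # e.g. collected from wilcox.exact calls
p.adjust(pvals, method = "bonferroni")   # same as pmin(pvals * length(pvals), 1)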

 -Original Message-
 From: r-help-boun...@r-project.org 
 [mailto:r-help-boun...@r-project.org] On Behalf Of netrunner
 Sent: Freitag, 22. Januar 2010 10:41
 To: r-help@r-project.org
 Subject: [R] exact wilcox test and Bonferroni correction
 
 
 Hi!
 I need to perform a Mann-Whitney test using Bonferroni 
 correction. I found the function pairwise.wilcox.test. 
 However, my data have ties. I know that when there are ties 
 it is better to perform the Mann-Whitney test using the function 
 wilcox.exact. How can I include the Bonferroni correction in that?
 
 thank you for your help!
 
 netrunner
 --
 View this message in context: 
 http://n4.nabble.com/exact-wilcox-test-and-Bonferroni-correcti
on-tp1099893p1099893.html
 Sent from the R help mailing list archive at Nabble.com.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] exact wilcox test and Bonferroni correction

2010-01-22 Thread Meyners, Michael, LAUSANNE, AppliedMathematics
(i) you EITHER correct the p value (by multiplying by 8 in your case) OR
you use the Bonferroni-threshold of 0.05/8, not both. If you correct the
p values, your threshold remains 0.05. If you use 0.05/8, you use the
original p values.
(ii) Yes, if the p value is 0.15, then the corrected one for 8 tests is
min(0.15*8, 1)=1 (i.e. all p values of 1/8 and higher will be 1 after
correction)
(iii) No, you can NOT AT ALL conclude that your samples are from the
same distribution. You have no evidence that they are from different
ones, but that does not imply that they come from the same (or a
similar) one. In particular as the power of the test can be assumed to
be pretty low.

Regarding (i), it might be helpful to search the web for some
information on Bonferroni correction, you could start with wikipedia,
but there are other resources around. Regarding (iii), I would strongly
recommend (re-)reading a good introduction into stats or consulting an
expert statistician. I'm afraid you are missing some of the basics of
statistical hypothesis testing here. To conclude similarity, you'd
have to define first what exactly that means (you can NEVER show exact
equality), and then construct some appropriate test for that. 

NB: I am sure that you won't succeed to show similarity with a
reasonable definition if you have just 10 observations, though. And for
completeness: If you were aiming at similarity, you should rather NOT
correct for multiplicity (while that depends again on what exactly you
want to show). But I think this goes far beyond the scope of this post
now. 

HTH, Michael
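
Point (i) in numbers, a small sketch using the values from the post below:

p <- 0.15; m <- 8
min(p * m, 1)   # corrected p value, to be compared against 0.05: gives 1
p < 0.05 / m    # or the raw p against the Bonferroni threshold: FALSE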
 

 -Original Message-
 From: r-help-boun...@r-project.org 
 [mailto:r-help-boun...@r-project.org] On Behalf Of netrunner
 Sent: Freitag, 22. Januar 2010 15:25
 To: r-help@r-project.org
 Subject: Re: [R] exact wilcox test and Bonferroni correction
 
 
 Dear Michael,
 thank you very much for your help.
 
 I perfomed the wilcox.exact function on each of the 8 items 
 for the two groups that I am analysing (that is, I performed 
 8 times the wilcox test).
 Here an example for the values (ratings from a questionnaire) 
 of one of the
 8 items:
 
  a1 <- c(5, 9, 10, 10, 10, 10, 10, 10, 10, 10)
  
  b1 <- c(7, 5, 10, NaN, 10, 10, 8, 10, 10, 8, 9, 9)
  
  wilcox.exact(a1, b1, alternative="two.sided", mu=0, 
  paired=FALSE, exact=TRUE,
  conf.level=0.95)
 
 I obtained:
 
 data:  a1 and b1
 W = 73.5, p-value = 0.1514
 alternative hypothesis: true mu is not equal to 0 
 
 Then I adjusted p-values using p.adjust. 
 
  For the example above the p-bonferroni value was 1. The 
  threshold p-value is
  0.00625 (that is, 0.05/8).
  Finally, because p-bonferroni > 0.00625, can I conclude that 
  for each item my samples are from the same distribution?
 
 I am a little bit confused
 
 thank you!
 
 netrunner
 
 
 --
 View this message in context: 
 http://n4.nabble.com/exact-wilcox-test-and-Bonferroni-correcti
on-tp1099893p1100077.html
 Sent from the R help mailing list archive at Nabble.com.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] permutations from vectors out of a matrix

2010-01-20 Thread Meyners, Michael, LAUSANNE, AppliedMathematics
Etienne,
I don't see the point in avoiding some 'special' packages. If you are
willing to change your mind in this regard, try something like

library()
 

 -Original Message-
 From: r-help-boun...@r-project.org 
 [mailto:r-help-boun...@r-project.org] On Behalf Of Etienne Stockhausen
 Sent: Montag, 18. Januar 2010 19:20
 To: r-help@r-project.org
 Subject: [R] permutations from vectors out of a matrix
 
 Dear R-Users,
  I'm relatively new to R and have the following problem. I 
  need all permutations of the vectors created by the columns 
  of a matrix. I will give a small example:

 p=3
 
 n=2^p-1   
   
 #number of obtainable vectors
 mat=matrix(0,p,p)
 for(i in 1:p)
  {
mat[i:1,i]=1/i
  }
 
 mat is now a quadratic matrix and n is the number of the 
 vectors I try to get when I compute all permutations of the 
 vectors built by the individual columns. It should work for 
 all quadratic matrix and I want to avoid using some 'special' 
 packages.
 In the example I need the following vectors at the end:
 (1,0,0); (0,1,0); (0,0,1); (0.5,0.5,0); (0.5,0,0.5); 
 (0,0.5,0.5); (1/3,1/3,1/3).
 I hope my intention becomes clear.
 
 I'm looking foward to any ideas and clues that might help me.
 Thanks in advance and best regards.
 
 Etienne
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] permutations from vectors out of a matrix

2010-01-20 Thread Meyners, Michael, LAUSANNE, AppliedMathematics
Sorry, wrong button. Below a hopefully more helpful solution...

Etienne,
I don't see the point in avoiding some 'special' packages. If you are
willing to change your mind in this regard, try one of the following
solutions that work for me:

library(combinat)
apply(mat, 2, function(x) unique(permn(x)))

# each object in the list contains the permutations from one column of mat:
apply(mat, 2, function(x) do.call(rbind, unique(permn(x))))

# all vectors you wanted in one matrix; note that they are in the rows,
# so you might want to transpose this:
do.call(rbind, apply(mat, 2, function(x) do.call(rbind, unique(permn(x)))))

Not sure about the size of your original problem, though, it might take
a while. If you still want to avoid the (small!) package, you might
consider copying the code for permn from combinat to define the function
within your file. I guess it works (but didn't check) as it does not
seem to require any of the other functions of the package.

HTH, Michael


 -Original Message-
 From: r-help-boun...@r-project.org 
 [mailto:r-help-boun...@r-project.org] On Behalf Of Etienne Stockhausen
 Sent: Montag, 18. Januar 2010 19:20
 To: r-help@r-project.org
 Subject: [R] permutations from vectors out of a matrix
 
 Dear R-Users,
  I'm relatively new to R and have the following problem. I 
  need all permutations of the vectors created by the columns 
  of a matrix. I will give a small example:

 p=3
 
 n=2^p-1   
   
 #number of obtainable vectors
 mat=matrix(0,p,p)
 for(i in 1:p)
  {
mat[i:1,i]=1/i
  }
 
 mat is now a quadratic matrix and n is the number of the 
 vectors I try to get when I compute all permutations of the 
 vectors built by the individual columns. It should work for 
 all quadratic matrix and I want to avoid using some 'special' 
 packages.
 In the example I need the following vectors at the end:
 (1,0,0); (0,1,0); (0,0,1); (0.5,0.5,0); (0.5,0,0.5); 
 (0,0.5,0.5); (1/3,1/3,1/3).
 I hope my intention becomes clear.
 
 I'm looking foward to any ideas and clues that might help me.
 Thanks in advance and best regards.
 
 Etienne
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] permutations from vectors out of a matrix

2010-01-20 Thread Meyners, Michael, LAUSANNE, AppliedMathematics
Well, seems it's an assignment, so you should REALLY get them on your
own and not enquire the list. Foolish me...
M.

 -Original Message-
 From: einohr2...@web.de [mailto:einohr2...@web.de] 
 Sent: Mittwoch, 20. Januar 2010 19:32
 To: Meyners,Michael,LAUSANNE,AppliedMathematics; r-help@r-project.org
 Subject: Re: [R] permutations from vectors out of a matrix
 
 Meyners,Michael,LAUSANNE,AppliedMathematics schrieb:
  Sorry, wrong button. Below a hopefully more helpful solution...
 
  Etienne,
  I don't see the point in avoiding some 'special' packages. 
 If you are 
  willing to change your mind in this regard, try one of the 
 following 
  solutions that work for me:
 
   library(combinat)
   apply(mat, 2, function(x) unique(permn(x)))
  
   # each object in the list contains the permutations from one column of mat:
   apply(mat, 2, function(x) do.call(rbind, unique(permn(x))))
  
   # all vectors you wanted in one matrix; note that they are in the 
   # rows, so you might want to transpose this:
   do.call(rbind, apply(mat, 2, function(x) do.call(rbind, unique(permn(x)))))
 
  Not sure about the size of your original problem, though, it might 
  take a while. If you still want to avoid the (small!) package, you 
  might consider copying the code for permn from combinat to 
 define the 
  function within your file. I guess it works (but didn't 
 check) as it 
  does not seem to require any of the other functions of the package.
 
  HTH, Michael
 
 

  -Original Message-
  From: r-help-boun...@r-project.org
  [mailto:r-help-boun...@r-project.org] On Behalf Of Etienne 
  Stockhausen
  Sent: Montag, 18. Januar 2010 19:20
  To: r-help@r-project.org
  Subject: [R] permutations from vectors out of a matrix
 
  Dear R-Users,
   I'm relatively new to R and have the following problem. I need all 
   permutations of the vectors created by the columns of a matrix. I 
   will give a small example:
 
  p=3
  
  n=2^p-1   

  #number of obtainable vectors
  mat=matrix(0,p,p)
  for(i in 1:p)
   {
 mat[i:1,i]=1/i
   }
 
  mat is now a quadratic matrix and n is the number of the vectors I 
  try to get when I compute all permutations of the vectors built by 
  the individual columns. It should work for all quadratic 
 matrix and I 
  want to avoid using some 'special'
  packages.
  In the example I need the following vectors at the end:
  (1,0,0); (0,1,0); (0,0,1); (0.5,0.5,0); (0.5,0,0.5); (0,0.5,0.5); 
  (1/3,1/3,1/3).
  I hope my intention becomes clear.
 
  I'm looking foward to any ideas and clues that might help me.
  Thanks in advance and best regards.
 
  Etienne
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
  http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
  
  
 
  
 Hey Michael,
 
 thanks a lot for your answer. I hope I can manage it now. The 
  problem with the 'small' package is that I have to solve the 
 task without using any package as a 'black box'. Therefore I 
 need to get the permutations on my own.
 
 Best regards and thanks again
 
 Etienne
 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Quantreg - 'could not find functionrq'

2010-01-07 Thread Meyners, Michael, LAUSANNE, AppliedMathematics
Are you sure you called 
library(Quantreg) 
before calling any function?
M.

 -Original Message-
 From: r-help-boun...@r-project.org 
 [mailto:r-help-boun...@r-project.org] On Behalf Of Gough Lauren
 Sent: Donnerstag, 7. Januar 2010 13:44
 To: r-help@r-project.org
 Subject: [R] Quantreg - 'could not find function rq'
 
 Hi all,
 
  
 
 I'm having some troubles with the Quantreg package.  I am 
 using R version 2.10.0, and have downloaded the most recent 
 version of Quantreg
 (4.44) and SparseM (0.83 - required package).  However, when 
 I try to run an analysis (e.g. fit1 <- rq(y~x, tau=0.5)) I get 
 an error message saying that the function rq could not be 
 found.  I get the same message when I try to search for any 
 of the other functions that should be in the package (e.g. 
 help(anova.rq)).
 
  
 
 I used this package last year and still have the old versions 
 of both quantreg and SparseM saved on my machine.  When I 
 load these older packages and attempt an analysis (which, 
 incidentally, worked fine last
 February) I get the same error messages as I do with the new 
 versions.  
 
  
 
 The only thing I have changed since last February is the 
 version of R I am using - is there any reason why quantreg 
 wouldn't work in version 2.10.0?  I'm not very experienced 
 with R so I'm struggling to work out what may be going wrong 
 - does anyone have any suggestions!?
 
  
 
 Many thanks
 
  
 
 Lauren
 
  
 
 --
 
  
 
 Lauren Gough - Postgraduate Research Student University of 
 Nottingham, School of Geography,  Nottingham,  NG7 2RD  
 
 Tel: +44 (0)115 8467052
 Email: lgx...@nottingham.ac.uk
 
 Consider the environment. Please don't print this e-mail 
 unless you really need to. 
 
  
 
 This message has been checked for viruses but the contents of 
 an attachment may still contain software viruses which could 
 damage your computer system:
 you are advised to perform your own checks. Email 
 communications with the University of Nottingham may be 
 monitored as permitted by UK legislation.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Quantreg - 'could not find function rq'

2010-01-07 Thread Meyners, Michael, LAUSANNE, AppliedMathematics
Frankly, I have never even heard about this package before and have no
clue what it does, but the error messages nevertheless suggested a
missing call of library. And I didn't bother to check the spelling, my
fault. So I guess you are right... I still didn't check :-) So to the
OP: try 
library(asdf)
with "asdf" replaced by the correct package name, and mind the spelling!

Michael
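
An untested sketch of the fix (package names in R are case-sensitive;
the toy x and y only stand in for the OP's variables):

install.packages("quantreg")       # once, if not yet installed
library(quantreg)                  # note the lower-case 'q'
x <- 1:20; y <- 2 * x + rnorm(20)  # toy data in place of the OP's data
fit1 <- rq(y ~ x, tau = 0.5)       # rq() should now be found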

 -Original Message-
 From: David Winsemius [mailto:dwinsem...@comcast.net] 
 Sent: Donnerstag, 7. Januar 2010 14:22
 To: Meyners,Michael,LAUSANNE,AppliedMathematics
 Cc: Gough Lauren; r-help@r-project.org
 Subject: Re: [R] Quantreg - 'could not find function rq'
 
 
 On Jan 7, 2010, at 8:07 AM, Meyners, Michael, LAUSANNE, 
 AppliedMathematics wrote:
 
  Are you sure you called
  library(Quantreg)
  before calling any function?
  M.
 
 Are you both using a package with the exact spelling 
 "Quantreg"? I would have expected it to be:
 
 library(quantreg)
 
 --
 David.
 
 David Winsemius, MD
 Heritage Laboratories
 West Hartford, CT
 
 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] updating arguments of formulae

2009-12-10 Thread Meyners, Michael, LAUSANNE, AppliedMathematics
Moreno, 

I leave the discussion on the mixed models to others (you might consider
the SIG group on mixed models as well for this), but try a few hints to
make your code more accessible:

* The . in updating a formula is substituted with the respective old
formula (depending on the side), but is not mandatory. You could give
the new formula explicitly, i.e. consider something like 
model1 = update(model, . ~ (1 | Sb2) + OS)
if you lose track of your models. See ?update.formula

* I don't see the need for using your construct with
as.formula(paste()), this makes things unnecessarily complicated. See my
above example, which should work as well on your data (and see ?update)

* There is also the - operator available in update.formula to remove
terms (because it uses formula, see ?formula). As to your question on
how to move from 
 depM ~ OS + (1 + OS | Sb2)
 to
 depM ~ OS + VR + (1 + OS + VR | Sb2)
try something like
update(model1, .~. - (1 + OS|Sb2) + VR + (1 + OS + VR | Sb2))
while it goes without saying that in this case, it would be easier to
drop the . and use something like 
update(model1, .~ OS + VR + (1 + OS + VR | Sb2)) 
directly.

* paste accepts more than just two arguments to be pasted: Try something
like
model2 = update(model1, as.formula(paste(". ~ . + (1 + ", "OS", " | ",
"Sb2", ")")))
instead of your construct with several nested calls to paste, and see
?paste. (Note that I added quotes to "OS" and "Sb2", it didn't work for
me otherwise as I have no object OS; not sure what happens if you have
such an object on your search path, but I would suspect you encounter
problems as well.)

If you work yourself through these and thereby simplify your code, you
are more likely to get responses to your questions on which model to use
(which is actually independent from the use of update). As far as I see
it, it doesn't make sense to use a formula like in your model4, but the
mixed model experts might tell me wrong (and I got a bit lost in your
code as well). Please also try to provide commented, minimal,
self-contained, reproducible code for further enquiries (use e.g. one of
the examples on ?lmer to create appropriate examples for your
questions).

HTH, Michael 
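
A minimal sketch of the update() mechanics on a plain formula (no model
fit needed; the variable names are taken from the thread):

f  <- depM ~ (1 | Sb2)
f1 <- update(f, . ~ . + OS)    # the dots stand for the old formula
f1                             # depM ~ (1 | Sb2) + OS
f2 <- update(f1, . ~ OS + VR + (1 + OS + VR | Sb2))  # explicit new RHS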


 -Original Message-
 From: r-help-boun...@r-project.org 
 [mailto:r-help-boun...@r-project.org] On Behalf Of Moreno Ignazio Coco
 Sent: Donnerstag, 10. Dezember 2009 13:35
 To: R-help@r-project.org
 Subject: [R] updating arguments of formulae
 
 Dear R-Community,
 
 I am relatively new with R, so sorry for things which for you 
 might be obvious...
 I am trying to automatically update lmer formulae.
 
 the variables of the model are:
 
 depM= my dependent measure
 Sb2= a random factor
 OS = a predictor
 VR= another predictor
 
 So, I am building the first model with random intercept only:
 
 model = lmer(depM ~ (1 |Sb2))
 
 then I update the formula adding the first predictor
 
 model1 = update(model, as.formula(paste(". ~ . + ", OS)))
 
 the resulting formula will be:
 
 depM ~ (1 |Sb2) + OS
 
 let's suppose now I want to update the model to have OS both as 
 a fixed effect and in the random term, something like:
 
 depM ~ (1 + OS |Sb2) + OS
 
 I can do something very ugly (please tell me if there is a  
 more elegant way to do it) that looks like:
 
 model2 = update(model1, as.formula(paste(paste(paste(paste(". ~ . + (1 + ",
 OS), " | "), Sb2), ")")))
 
 the resulting model2 formula will be:
 
 depM ~ (1 |Sb2) + OS + (1 + OS | Sb2)
 
 one first thing I am wondering at this point is whether having
 (1 |Sb2) and (1 + OS | Sb2) in the same expression is redundant.
 in the output it will obviously tell me that group Sb2 is 
 considered twice:
 
 number of obs: 6514, groups:  Sb2, 23; Sb2, 23
 
 and i am not sure if am doing it correctly...any advice?
 
 So let's suppose now I want to add the new predictor VR again 
 both in the fixed and in the random part of the formula.
 If i just repeat the two steps above:
 
 model3 = update(model2, as.formula(paste(". ~ . + ", VR)))
 
 and then:
 
 model4 = update(model3, as.formula(paste(paste(paste(paste(". ~ . + (1 + ",
 VR), " | "), Sb2), ")")))
 
 the formula I get is:
 
 depM ~ (1 |Sb2) + OS + (1 + OS | Sb2) + VR + (1 + VR | Sb2)
 
 so, basically I am adding new stuff on the right side of the 
 formula...
 
 My first question at this point is whether the above formula 
 is equivalent to:
 
 depM ~ OS + VR + (1 + OS + VR | Sb2)
 
 if is not equivalent, which one of the two is correct?
 
 obviously in the second case, group Sb2, is considered only once.
 
 If the second version of the formula is the correct one, I 
 don't understand how I can update arguments inside the 
 formula rather than adding things on its right side...
 
 thus, in the ideal case,  how do I go from something like this:
 
 depM ~ OS + (1 + OS | Sb2)
 
 to something like this:
 
 depM ~ OS + VR + (1 + OS + VR | Sb2)
 
 Thanks a lot for your help,
 Best,
 
 Moreno

Re: [R] by function ??

2009-12-09 Thread Meyners, Michael, LAUSANNE, AppliedMathematics
Or, more general (if you need to include more than just one variable from 
TestData), something like

by(TestData, LEAID, function(x) median(x$RATIO))

Agreed, this is less appealing for the given example than Ista's code, but 
might help to better understand by and to generalize its use to other 
situations.
Michael 
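
A self-contained analogue with a built-in data set (warpbreaks stands
in for TestData here):

by(warpbreaks, warpbreaks$wool, function(x) median(x$breaks))
# compare with the single-variable form:
by(warpbreaks[, "breaks"], warpbreaks$wool, median)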

 -Original Message-
 From: r-help-boun...@r-project.org 
 [mailto:r-help-boun...@r-project.org] On Behalf Of Ista Zahn
 Sent: Mittwoch, 9. Dezember 2009 02:54
 To: L.A.
 Cc: r-help@r-project.org
 Subject: Re: [R] by function ??
 
 Hi,
 I think you want
 
 by(TestData[ , "RATIO"], LEAID, median)
 
 -Ista
 
 On Tue, Dec 8, 2009 at 8:36 PM, L.A. ro...@millect.com wrote:
 
  I'm just learning and this is probably very simple, but I'm stuck.
    I'm trying to understand the by().
  This works.
  by(TestData, LEAID, summary)
 
  But, This doesn't.
 
  by(TestData, LEAID, median(RATIO))
 
 
 ERROR: could not find function "FUN"
 
  HELP!
  Thanks,
  LA
  --
  View this message in context: 
  http://n4.nabble.com/by-function-tp955789p955789.html
  Sent from the R help mailing list archive at Nabble.com.
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide 
  http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 
 
 
 
 --
 Ista Zahn
 Graduate student
 University of Rochester
 Department of Clinical and Social Psychology http://yourpsyche.org
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] csimtest function in multcomp package

2009-12-04 Thread Meyners, Michael, LAUSANNE, AppliedMathematics
Daniel,
quick guess: There has been a major change in the package some time ago,
with quite a few functions vanishing and others added. So I guess your
recycled code was based on an older version of multcomp. My current
version (1.1-0) does not have csimtest anymore either. I guess you want
to look into ?glht in the same package, and check the archives to find
out more about the changes (and yes, you will have to modify your code
to some extent, at least I had to; it's not just replacing the function
name).
HTH, Michael
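
A minimal glht() sketch on a built-in data set (an aov fit rather than
the OP's mixed-effects model, but the pattern carries over):

library(multcomp)
amod <- aov(breaks ~ tension, data = warpbreaks)
summary(glht(amod, linfct = mcp(tension = "Tukey")))  # all pairwise contrasts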

 -Original Message-
 From: r-help-boun...@r-project.org 
 [mailto:r-help-boun...@r-project.org] On Behalf Of 
 d.m.perk...@qmul.ac.uk
 Sent: Freitag, 4. Dezember 2009 16:47
 To: r-help@r-project.org
 Subject: [R] csimtest function in multcomp package
 
 Hello all,
 
 Quick question: I want to do posthoc contrasts for a linear 
 mixed effects model. However, when trying to use the csimtest 
 function in multcomp package I receive an error message 
 saying it cannot find the function, even after installing and 
 loading package multcomp.
 
 Any pointers would be greatly appreciated
 
 Daniel
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Learning R

2009-11-29 Thread Meyners, Michael, LAUSANNE, AppliedMathematics
Julia, see 

http://www.r-project.org/ -> Documentation -> Manuals (-> An Introduction to R) 
(or use: http://cran.r-project.org/manuals.html)

for a starting point. In addition, you might want to check the annotated list 
of books to see which one might best fit your needs:

http://www.r-project.org/doc/bib/R-books.html

You might also want to browse through the FAQs at some stage, you can learn a 
lot of useful things there. Or look additionally into the Wiki, if that's more 
your style of learning. You see, plenty of options from which you'd need to 
select the one that best meets your requirements. 

HTH, Michael


 -Original Message-
 From: r-help-boun...@r-project.org 
 [mailto:r-help-boun...@r-project.org] On Behalf Of Julia Cains
 Sent: Montag, 30. November 2009 08:23
 To: r-help@r-project.org
 Subject: [R] Learning R
 
 Dear R helpers,
 
 Almost 15 days back I became a member of this very active 
 and wonderful group. So far I have only been raising queries 
 and in turn got them solved too, and I am really thankful for the 
 spirit the group members show when it comes to guidance.
 
 I wish to learn the R language and I have given myself 2 months for 
 this. Can anyone please guide me on how to begin, i.e. from 
 basics to advanced.
 
 R is such a vast thing to learn, so I wish to learn it step 
 by step without getting lost at any stage.
 
 Please guide me where do I start and upgrade myself to higher 
 level step by step.
 
 Regards
 
 Julia
 
 
 
 
 
 
 
 Only a man of Worth sees Worth in other men
 
 
 
 
 
 
   
 
 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] unexpected results in comparison (x == y)

2009-11-04 Thread Meyners, Michael, LAUSANNE, AppliedMathematics
FAQ 7.31:
http://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f 
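
In short (the standard illustration):

x <- 0.1 + 0.2
x == 0.3                   # FALSE: binary floating point representation
isTRUE(all.equal(x, 0.3))  # TRUE: comparison with numerical tolerance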

 -Original Message-
 From: r-help-boun...@r-project.org 
 [mailto:r-help-boun...@r-project.org] On Behalf Of Peter Tillmann
 Sent: Mittwoch, 4. November 2009 13:50
 To: r-help@r-project.org
 Subject: [R] unexpected results in comparison (x == y)
 
 
 Dear readers of the list,
 
 I have a problem a comparison of two data from a vector. The 
 comparison yields FALSE but should be TRUE. I have checked 
 for mode(), length() and attributes(). See the following code 
 (R2.10.0):
 ---
 # data vector of 66 double data
 X = matrix(c(41.41,38.1,39.22,38.1,47.29,46.82,82.46,90.11,45.24,45.74,
              49.96,53.40,38.20,42.65,45.41,47.92,39.82,42.02,48.17,49.47,
              39.67,43.89,47.55,50.05,35.75,37.41,46.13,53.64,52.18,56.30,
              45.15,47.13,41.57,39.08,43.39,44.73,49.38,47.00,45.67,50.53,
              41.08,44.22,49.28,47.83,49.48,46.04,48.37,47.00,33.96,36.30,
              49.40,46.44,24.40,24.79,41.55,46.26,37.43,39.88,40.63,38.64,
              49.92,50.19,47.88,48.61,43.73,44.18), ncol=1)
 i = dim(X)
 
 # calculating pairwise differences for each entry in X
 Y = matrix(rep(0.0,i[1]*i[1]),ncol=i[1])
 for (j in 1:i[1]) {
   Y[j,] = X - X[j,]
 }
 
 # getting pairwise absolute differences to vector Z and 
 # selecting only (xj - xk), omitting (xk - xj)
 # and (xj - xj), i.e. the diagonal
 Z = rep(0, ((i[1]*i[1])-i[1])/2)
 nn <- 1
 for (j in 1:i[1]) { for (k in 1:i[1]) {
   if (j < k)  {
   Z[nn] <- abs(Y[j,k])
   nn <- nn + 1
   }
 }
 }
 nn <- nn - 1
 
 # sorting Z ascending to ZZ, to determine double entries with 
 # same difference
 ii <- 0
 ZZ <- sort(Z)
 
 # here the problem occurs:
 ZZ[1:10]
 ZZ[4]
 ZZ[5]
 ZZ[4] == ZZ[5]
 mode(ZZ[4])
 mode(ZZ[5])
 length(ZZ[4])
 length(ZZ[5])
 attributes(ZZ[4])
 attributes(ZZ[5])
 ---
 
 I get:
  
 > ZZ[1:10]
  [1] 0.00 0.00 0.01 0.02 0.02 0.02 0.04 0.04 0.04 0.05
 > ZZ[4]
 [1] 0.02
 > ZZ[5]
 [1] 0.02
 > ZZ[4] == ZZ[5]
 [1] FALSE
 > mode(ZZ[4])
 [1] "numeric"
 > mode(ZZ[5])
 [1] "numeric"
 > length(ZZ[4])
 [1] 1
 > length(ZZ[5])
 [1] 1
 > attributes(ZZ[4])
 NULL
 > attributes(ZZ[5])
 NULL
 
 Which is ok, except for ZZ[4] == ZZ[5].
 
 Can someone please give me an advice where to look?
 In real-world situations the original vector (X) will contain 
 up to 100 entries. 
 
 
 
 --
 View this message in context: 
 http://old.nabble.com/unexpected-results-in-comparison-%28x-%3D%3D-y%29-tp26195749p26195749.html
 Sent from the R help mailing list archive at Nabble.com.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] fixed color scale for a series of contour plots

2009-11-03 Thread Meyners, Michael, LAUSANNE, AppliedMathematics
You don't tell us which function you use, but fixing the zlim argument
in image or related functions should do the trick.
M.
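
A minimal sketch with simulated data (z1 and z2 are made up for
illustration):

z1 <- matrix(rnorm(100), 10)
z2 <- matrix(rnorm(100, mean = 2), 10)
rng <- range(z1, z2)     # common range across all subsets
par(mfrow = c(1, 2))
image(z1, zlim = rng)    # same value-to-colour mapping
image(z2, zlim = rng)    # in both panels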

 -Original Message-
 From: r-help-boun...@r-project.org 
 [mailto:r-help-boun...@r-project.org] On Behalf Of Marion Dumas
 Sent: Dienstag, 3. November 2009 14:38
 To: r-help@r-project.org
 Subject: [R] fixed color scale for a series of contour plots
 
 Hello,
 I am having trouble figuring out how to create a color scale 
 that will be consistent across multiple contour plots. What I 
 would like is to plot a series of contour or image plots for 
 different subsets of a data set but make sure that the color 
 key stays fixed so that the plots are easily comparable. That 
 is I need a map from the range of values of the entire data 
 set to the color scale and keep then use this color scale for 
 all plots + be able to display the color bar as a legend next 
 to the plot. This sounds trivial, but I can't find useful 
 hints in the documentation of the color arguments of the 
 different plots.
 Thanks so much for any help
 Best
 Marion
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] loop and plot

2009-10-19 Thread Meyners, Michael, LAUSANNE, AppliedMathematics
Or you open a new graphics window / device as part of the loop, see, e.g., 
?windows. Alternatively, you may write the content of the graphics device to a 
file for each iteration, see, e.g., ?savePlot (but you'd want to make sure that 
you have a different filename in each iteration, otherwise, you'll again have 
only one figure...).
Michael
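
A sketch combining the suggestions in this thread; note also that the
original 'for (i in length(f))' loops over the single value length(f),
which by itself explains why only the last pie chart appears:

dataset.table <- table(data.frame(var1 = c(1,2,3,1,2,3,1),
                                  colour = c("a","b","c","c","a","b","b")))
kk <- function(f) {
  ls <- as.character(f)
  par(ask = TRUE)             # wait between plots (see ?par)
  for (i in seq_along(ls)) {  # seq_along, not length(f)
    pie(dataset.table[ls[i], ], main = ls[i])
    box()
  }
}
# kk(1:3)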

 -Original Message-
 From: r-help-boun...@r-project.org 
 [mailto:r-help-boun...@r-project.org] On Behalf Of joris meys
 Sent: Montag, 19. Oktober 2009 13:12
 To: Rene
 Cc: r-help@r-project.org
 Subject: Re: [R] loop and plot
 
 Hi Rene,
 
 the problem is probably due to the fact that R will send all 
 plots to the same graphical output window. Each next plot 
 just replaces the previous one.
 
 if it's only a few plots, you can divide the graphical window 
 with the commands par(mfrow=...) (see ?par) or 
 layout(matrix(...)) (see ?layout). Otherwise you have to ask 
 the window to wait before refreshing. par(ask=TRUE) (see ?par as well)
 
 Hope this helps
 Cheers
 Joris
 
 On Mon, Oct 19, 2009 at 12:14 PM, Rene kaixinma...@gmail.com wrote:
  Dear all,
 
  I am stuck at applying loop function for creating separated plots.
 
  I have coding like below:
 
  dataset.table <-
   table(data.frame(var1=c(1,2,3,1,2,3,1),
                    colour=c("a","b","c","c","a","b","b")))
  kk = function(f)
              {
               ls=as.character(f)
               pie(dataset.table[ls,],main=ls)
               box()
              }
 
  kk(1)
  kk(2)
  kk(3)
 
  By using above code, I can create 3 single plot 
 respectively, but when 
  I type kk(1:3), obviously it will not work.
 
  I know I have to vectorise the code, so that I can use the 
  command kk(1:3). 
  I tried to use a loop:
 
  kk = function(f)
              {
               ls=as.character(f)
               for (i in length(f))
               {
               pie(dataset.table[ls[i],],main=ls[i])
               box()
               }
               }
  kk(1:3)
 
  the above code only gives me the last pie plot (ie. kk(3) plot) 
  instead of 3 plots respectively.
 
  Can someone please guide me how to revise the loop coding, 
 and produce 
  3 separated plots one after another on the screen by typing kk(1:3)?
 
  Thanks a lot.
 
  Rene.
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide 
  http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] calculating p-values by row for data frames

2009-10-16 Thread Meyners, Michael, LAUSANNE, AppliedMathematics
 

 -Original Message-
 From: r-help-boun...@r-project.org 
 [mailto:r-help-boun...@r-project.org] On Behalf Of Christoph Heuck
 Sent: Donnerstag, 15. Oktober 2009 17:51
 To: r-help@r-project.org
 Subject: [R] calculating p-values by row for data frames
 
 Hello R-users,
 I am looking for an elegant way to calculate p-values for 
 each row of a data frame.
 My situation is as follows:
 I have a gene expression results from a microarray with 64 
 samples looking at 25626 genes. The results are in a data 
 frame with the dimensions 64 by 25626 I want to create a 
 volcano plot of difference of means vs. -log(10) of the 
 p-values, comparing normal samples to abnormal samples. The 
 results of both type of samples are all in my data frame.
 Now, I have found a way to calculate the p-value using a for 
 (i in 1:25626) loop (see below):
 
 df.normal  #dataframe, which only contains the normal samples 
 df.samples  #dataframe, which only contains abnormal samples
 
 DM=rowMeans(df.normal)-rowMeans(df.samples) #gives me a 
 dataframe with the difference of means
 
 PV=array(1,c(25626,1))
 for (i in 1:25626){
 VL=t.test(matrix.b[i,],matrix.a[i,])
 V=as.numeric(VL[3])
 V=-log10(V)
 PV[i,1]=V}
 
 plot(DM, PV, main=title, xlab=x.lab, ylab="-log(10) P-Values", pch=20)}
 
 It takes around 3-5 minutes to generate the volcano plot this 
 way. I will be running arrays which will look at 2.2 million 
 sites, and this approach will then take way too long.
 I was wondering if there is a more elegant way to calculate 
 the p-values for an array/fataframe/matrix in a row-by row 
 fashion, which is similar to rowMeans.
 
 I thought writing a function to get the p-value and then using
 apply(x,1,function) would be the best.
 
 I have the function which will give me the p-value
 
 p.value = function (x,y){
 PV=as.numeric(t.test(x,y)[3])
 }
 
 and I can get a result if I test it only on one row (below is 
 a 10 by 6 data frame example of my original data)
 
 RRR
                  X259863    X267862     X267906    X300875    X300877     X300878
 MSPI0406S0183 -3.2257205 -3.2248899  2.85590082 -2.6293602 -3.5054348 -2.62817269
 MSPI0406S0238 -2.6661903 -3.1135020  2.17073881 -3.2357307 -2.3309775 -1.76078452
 MSPI0406S0239 -1.7636439 -0.6702877  0.19471126 -0.7397132 -1.4332662 -0.24822470
 MSPI0406S0300  0.6471381 -0.2638928 -0.61876054 -0.9180127  0.2539848 -0.63122203
 MSPI0406S0301  0.9207208  0.2164267 -0.33238846 -1.1450717 -0.2935584 -1.01659802
 MSPI0406S0321 -0.4073272 -0.2852402 -0.08085746 -0.4109428 -0.2185432 -0.39736137
 MSPI0406S0352 -0.7074175 -0.6987548 -1.22004647 -0.8570551 -0.5083861 -0.09267928
 MSPI0406S0353 -0.2745682  0.3012990 -0.64787221 -0.5654195  0.4265007 -0.65963404
 MSPI0406S0354 -1.1858394 -1.4388609 -0.07329722 -2.0010785 -1.3245696 -1.43216984
 MSPI0406S0360 -1.4599809 -1.4929059  0.63453235 -1.1476760 -1.5849922 -1.03187399
 
 > zz=p.value(RRR[1,1:3],RRR[1,4:6])
 > zz
 $p.value
 [1] 0.485727
 
 but I cannot do this row by row using apply
 
 > xxx=apply(RRR,1,p.value(RRR[,1:3],RRR[,4:6]))

xxx <- apply(RRR, 1, function(x) p.value(x[1:3],x[4:6]))
works for me. Check the examples in ?apply.
HTH, Michael
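
For the speed concern, a base-R sketch of a row-wise Welch t-test that
avoids the loop entirely (standard formulas, but not code from this
thread -- verify against t.test on a few rows before trusting it):

row.t.pvalues <- function(a, b) {       # a, b: numeric matrices, tests by row
  na <- ncol(a); nb <- ncol(b)
  ma <- rowMeans(a); mb <- rowMeans(b)
  va <- rowSums((a - ma)^2) / (na - 1)  # row-wise variances
  vb <- rowSums((b - mb)^2) / (nb - 1)
  s2a <- va / na; s2b <- vb / nb
  tstat <- (ma - mb) / sqrt(s2a + s2b)
  df <- (s2a + s2b)^2 / (s2a^2 / (na - 1) + s2b^2 / (nb - 1))  # Welch df
  2 * pt(-abs(tstat), df)               # two-sided p-values
}
# e.g., with the OP's data frames:
# PV <- -log10(row.t.pvalues(as.matrix(df.samples), as.matrix(df.normal)))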

 
 Error in match.fun(FUN) :
   'p.value(RRR[, 1:3], RRR[, 4:6])' is not a function, 
 character or symbol
 
 Does anyone have any suggestions?
 Thanks in advance
 
 Christoph Heuck
 Albert Einstein College of Medicine
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] reference on permutation test

2009-10-14 Thread Meyners, Michael, LAUSANNE, AppliedMathematics
It's always worthwhile to look at the articles by Pitman (and maybe the
textbook by Fisher, if you have access to it); Welch is a nice paper,
too, but might be pretty technical to learn about the area. I don't
know any of the textbooks except Edgington (which is in its 4th edition
now with co-author P Onghena), a book I can wholeheartedly recommend.
The authors explain the basic concepts in a way that should even be
accessible to non-statisticians, I believe. They also cover a lot of
special cases, and give exhaustive theoretical background for those
interested (you can easily skip these parts if you are not). It goes
along with a CD with some programs to run these test, though I did not
use it so far -- you can do it yourself in R, of course :-) I am not
sure whether they cover any other resampling methods at all, but if so,
only very briefly, so you'd need another reference for that. Efron &
Tibshirani is a classic, while I have heard some people recommend
Davison & Hinkley -- I didn't find the time yet to look more closely
into the latter.
Just my 2 cents,
Michael


 -Original Message-
 From: r-help-boun...@r-project.org 
 [mailto:r-help-boun...@r-project.org] On Behalf Of Peng Yu
 Sent: Mittwoch, 14. Oktober 2009 04:58
 To: r-h...@stat.math.ethz.ch
 Subject: [R] reference on permutation test
 
 I want learn permutation test and resampleing, etc. There are 
 a few references listed below. I'm wondering what is the best 
 book on this topic. Can somebody give me some advice. Thank you!
 

http://en.wikipedia.org/wiki/Resampling_%28statistics%29#Permutation_test
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] post-hoc test with kruskal.test()

2009-10-14 Thread Meyners, Michael, LAUSANNE, AppliedMathematics
Robert,

you can do the corresponding paired comparisons using wilcox.test. As far as I 
know, there is no such general correction as Tukey's HSD for the 
Kruskal-Wallis-Test. However, if you have indeed only 3 groups (resulting in 3 
paired comparisons), the intersection-union principle and the theory of closed 
test procedures should allow you to do these tests without further correction, 
given the global test was statistically significant.

HTH, Michael
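
stats also ships pairwise.wilcox.test, which wraps exactly these paired
comparisons; a sketch on a built-in data set:

with(airquality,
     pairwise.wilcox.test(Ozone, Month, p.adjust.method = "none"))
# p.adjust.method = "none" matches the unadjusted closed-test argument above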



 -Original Message-
 From: r-help-boun...@r-project.org 
 [mailto:r-help-boun...@r-project.org] On Behalf Of Robert Kalicki
 Sent: Mittwoch, 14. Oktober 2009 09:17
 To: r-help@r-project.org
 Subject: [R] post-hoc test with kruskal.test()
 
 Dear R users,
 
 I would like to know if there is a way in R to execute a 
 post-hoc test (factor levels comparison, like Tukey for 
 ANOVA) of a non-parametric analysis of variance with 
 kruskal.test() function. I am comparing three different 
 groups. The preliminary analysis using the 
 kruskal-wallis-test show significance, but I still don't know 
 the relationship and the significance level between each group?
 
  
 
 Do you have any suggestion?
 
  
 
 Many thanks in advance!
 
  
 
 Robert
 
  
 
  
 
 ___
 Robert M. Kalicki, MD
 
 Postdoctoral Fellow
 
 Department of Nephrology and Hypertension
 
 Inselspital
 
 University of Bern
 
 Switzerland
 
 
 
 Address:
 
 Klinik und Poliklinik für Nephrologie und Hypertonie
 
 KiKl G6
 
 Freiburgstrasse 15
 
 CH-3010 Inselspital Bern
 
 
 
 Tel +41(0)31 632 96 63
 
 Fax+41(0)31 632 14 58
 
 
 
 
 
 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] post-hoc test with kruskal.test()

2009-10-14 Thread Meyners, Michael, LAUSANNE, AppliedMathematics
Robert,

What do you mean by not symmetric? If you mean unbalanced in terms of sample 
size, that's not a problem if you choose the right specifications for 
wilcox.test. The Kruskal-Wallis-Test is a generalization of the unpaired 
Wilcoxon test for more than two groups. Not sure whether kruskal.test works 
with just two groups, but if so, it should give the same results as wilcox.test 
if you set the arguments accordingly. 

Having said that, I should mention that unlike some normality-based post-hoc 
tests, the proposed approach is not based on a common error term. The paired 
comparisons will ignore the fact that you had a third group, and this will in 
particular result in (possibly quite) different power of the three comparisons, 
depending on the sample sizes and the noise given in just these two groups. I 
wouldn't know what to do about that, though.

Michael

 -Original Message-
 From: Robert Kalicki 
 Sent: Mittwoch, 14. Oktober 2009 14:11
 To: Meyners,Michael,LAUSANNE,AppliedMathematics
 Subject: RE: [R] post-hoc test with kruskal.test()
 
 Hi Michael,
 Thank you very much for your clear and prompt answer.
 Is it still valid if I use an unpaired comparison with 
 wilcox.test(), since my groups are not symmetric?
 Many thanks
 
 Robert
 
 -----Original Message-----
 From: Meyners,Michael,LAUSANNE,AppliedMathematics
 Sent: Wednesday, 14 October 2009 10:30
 To: Robert Kalicki; r-help@r-project.org
 Subject: RE: [R] post-hoc test with kruskal.test()
 
 Robert,
 
 you can do the corresponding paired comparisons using 
 wilcox.test. As far as I know, there is no such general 
 correction as Tukey's HSD for the Kruskal-Wallis-Test. 
 However, if you have indeed only 3 groups (resulting in
 3 paired comparisons), the intersection-union principle and 
 the theory of closed test procedures should allow you to do 
 these test without further correction, given the global test 
 was statistically significant.
 
 HTH, Michael
 
 
 
  -Original Message-
  From: r-help-boun...@r-project.org
  [mailto:r-help-boun...@r-project.org] On Behalf Of Robert Kalicki
  Sent: Mittwoch, 14. Oktober 2009 09:17
  To: r-help@r-project.org
  Subject: [R] post-hoc test with kruskal.test()
  
  Dear R users,
  
  I would like to know if there is a way in R to execute a 
 post-hoc test 
  (factor levels comparison, like Tukey for
  ANOVA) of a non-parametric analysis of variance with
  kruskal.test() function. I am comparing three different groups. The 
  preliminary analysis using the kruskal-wallis-test show 
 significance, 
  but I still don't know the relationship and the significance level 
  between each group?
  
   
  
  Do you have any suggestion?
  
   
  
  Many thanks in advance!
  
   
  
  Robert
  
   
  
   
  
  ___
  Robert M. Kalicki, MD
  
  Postdoctoral Fellow
  
  Department of Nephrology and Hypertension
  
  Inselspital
  
  University of Bern
  
  Switzerland
  
  
  
  Address:
  
  Klinik und Poliklinik für Nephrologie und Hypertonie
  
  KiKl G6
  
  Freiburgstrasse 15
  
  CH-3010 Inselspital Bern
  
  
  
  Tel +41(0)31 632 96 63
  
  Fax+41(0)31 632 14 58
  
  
  
  
  
  
 
 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Probability of data values form DENSITY function

2009-09-30 Thread Meyners, Michael, LAUSANNE, AppliedMathematics
Lina, 

check ?density (and do so carefully). R uses a kernel estimate, by default 
Gaussian if I remember correctly. Values in a certain grid can be found from 
the code I sent earlier. I didn't check, but these are most likely just 
linearly interpolated by plot.density, and as the grid is sufficiently tight, 
it visually looks smooth. You could mimick this by linearly interpolating from 
the two closest neighbours for any new x values (or, as a rough 
approximation, choosing the density value of the nearest neighbour). I am not 
sure whether there are some specificities for time series in general or GARCH 
models in particular, that's not my area of expertise. Maybe there is some 
method for GARCH objects in the library you are using (you don't tell me which 
one). Try 
str(density(x)) 
from your original post to see whether you might have to adapt the proposed way 
to get these values. I copy the list in case someone can give more details on 
that.

Two side remarks: 
* You should note that the density function in R takes a lot of parameters 
that might heavily influence the outcome, and even though R will propose some 
defaults that will work frequently, they might not be optimal or even 
appropriate for your particular situation, model, and data. 
* From your messages I fear that you have a limited understanding of the notion 
of density functions, at least for continuous distributions; they do NOT give 
the probability that a certain value is taken. Try 
plot(x <- seq(-1,1,0.01), dnorm(x, sd=.25), type="l") 
to see that the density can easily exceed 1 and can hence not represent a 
probability. In fact, the theoretical probability that a specific value is 
actually taken from a continuous distribution is always equal to 0. 

I would strongly suggest to contact a local statistician to clarify these 
issues.

HTH, Michael
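
A sketch of the interpolation idea with placeholder data (rnorm merely
stands in for the simulated GARCH series):

set.seed(1)
yt  <- rnorm(4000)               # placeholder for the simulated series
d   <- density(yt)               # kernel density estimate on a grid
yt1 <- rnorm(10)                 # new points at which values are wanted
approx(d$x, d$y, xout = yt1)$y   # interpolated density values -- NOT probabilities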
 



From: Lina Rusyte [mailto:liner...@yahoo.co.uk] 
Sent: Dienstag, 29. September 2009 18:03
To: Meyners,Michael,LAUSANNE,AppliedMathematics
Subject: RE: [R] Probability of data values form DENSITY function


Dear Mr. Michael Meyners, 

  

Assume, that I have simulated data from GARCH  process: 

  

spec = garchSpec(model = list(omega = 0.01, alpha = 0.13, beta = 0.86, 
shape = 6), cond.dist = "std") 

yt <- garchSim(spec, n = 4000) 

  

The empirical distribution of this data (yt) is assumed to follow no 
parametric distribution. This data is my in-sample data. 

  

I have another set of the data (yt1), generated from the same process 
(but it is not the same as yt). I need to find the probabilities of these data 
(yt1) points in the previously mentioned empirical distribution (of yt). 

  

Simply put, having a continuous EMPIRICAL density function, I need, for 
some points on the x axis (values), to find the corresponding values on the 
y axis (their "probability"). 

  

R plots the empirical density function, so I assume it somehow 
approximates it. 

  

Thanks a lot for the help in advance. 

  

Best regards, 

Lina 


--- On Tue, 29/9/09, Meyners,Michael, wrote:



From: Meyners,Michael,LAUSANNE,AppliedMathematics
Subject: RE: [R] Probability of data values form DENSITY 
function
To: Lina Rusyte liner...@yahoo.co.uk
Cc: R help r-help@r-project.org
Date: Tuesday, 29 September, 2009, 4:59 PM


Lina, check whether something like

data.frame(density(rnorm(10))[1:2]) 

contains the information you want. Otherwise, try to be (much) 
more specific in what you want so that we do not need to guess (and of course 
provide minimal, self-contained, reproducible code). That has a higher chance 
to trigger some responses than posting the same message again. And if you even 
explain what you want to do with these values, you might get even better 
responses.

HTH, Michael


 -Original Message-
 From: r-help-boun...@r-project.org 
 [mailto:r-help-boun...@r-project.org] On Behalf Of Lina Rusyte
 Sent: Dienstag, 29. September 2009 16:45
 To: R help
 Cc: R help
 Subject: [R] Probability of data values form DENSITY function
 
 Hello,
 
 Could someone help me please and tell me how to get the 
 probability from an empirical DENSITY (not parametric

Re: [R] How do I do simple string concatenation in R?

2009-09-30 Thread Meyners, Michael, LAUSANNE, AppliedMathematics
?paste 
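
For example:

A <- "klm"
B <- "jjj"
C <- paste(A, B, sep = "")   # "klmjjj"
C == "klmjjj"                # TRUE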

 -Original Message-
 From: r-help-boun...@r-project.org 
 [mailto:r-help-boun...@r-project.org] On Behalf Of Dr. Arie Ben David
 Sent: Mittwoch, 30. September 2009 10:22
 To: r-help@r-project.org
 Subject: [R] How do I do simple string concatenation in R?
 
 Dear R gurus
 How do I do simple string concatenation in R? 
 For example:
 A = "klm"
 B = "jjj"
 How can I assign a value to C such that C == "klmjjj" is TRUE?
 Thank you
 Arie
 
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Probability of data values form DENSITY function

2009-09-29 Thread Meyners, Michael, LAUSANNE, AppliedMathematics
Lina, check whether something like

data.frame(density(rnorm(10))[1:2]) 

contains the information you want. Otherwise, try to be (much) more specific in 
what you want so that we do not need to guess (and of course provide minimal, 
self-contained, reproducible code). That has a higher chance to trigger some 
responses than posting the same message again. And if you even explain what you 
want to do with these values, you might get even better responses.

HTH, Michael


 -Original Message-
 From: r-help-boun...@r-project.org 
 [mailto:r-help-boun...@r-project.org] On Behalf Of Lina Rusyte
 Sent: Dienstag, 29. September 2009 16:45
 To: R help
 Cc: R help
 Subject: [R] Probability of data values form DENSITY function
 
 Hello,
 
 Could someone help me please and tell me how to get the 
 probability from an empirical DENSITY (not parametric) for each 
 data value (R function). 
 For example, for the normal distribution there is such a function like: 
 
 "dnorm(q, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE)"
 I need the same function, only for the empirical 
 DENSITY function (which does not correspond to any typical distribution). 
 R plots the density function of any data: 
 
 "plot(density(x))"
 
 I need to find out the probability for each data value from 
 this plot-line. 
 
 Best regards,
 Lina
 
 
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] A stat related question

2009-09-18 Thread Meyners, Michael, LAUSANNE, AppliedMathematics
Let's assume you have just three observations, and x = 1:3 for your
observations. 

Predictor 1:  y = x^2
Predictor 2:  y = 1 if x=1
              y = 4 if x=2
              y = 9 if x=3
              y = 0 elsewhere

These predictors are obviously not the same, but will give the same Mean
Squared Error for your data (whatever your observed y-values are). This
should suffice as a counter example. Or did I misunderstand your
question?

HTH, Michael
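
A quick numerical check of the counterexample (the observed y-values
are arbitrary):

x <- 1:3
y <- c(2, 5, 7)               # any observations will do
pred1 <- x^2
pred2 <- c(1, 4, 9)           # predictor 2 evaluated at x = 1:3
mean((y - pred1)^2) == mean((y - pred2)^2)  # TRUE: same MSE, different predictors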


 -Original Message-
 From: r-help-boun...@r-project.org 
 [mailto:r-help-boun...@r-project.org] On Behalf Of RON70
 Sent: Freitag, 18. September 2009 11:23
 To: r-help@r-project.org
 Subject: [R] A stat related question
 
 
 Can I ask a small stat. related question here?
 
 Suppose I have two predictors for a time series process and the 
 accuracy of each predictor is measured by its MSE. My question is: 
 if two predictors give the same MSE, do they necessarily have 
 to be identical? Can anyone provide me any counterexample?
 
 Thanks.
 --
 View this message in context: 
 http://www.nabble.com/A-stat-related-question-tp25505618p25505618.html
 Sent from the R help mailing list archive at Nabble.com.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Equivalence of Mann-Whitney test and Kruskal-Wallis test with k=2

2009-09-08 Thread Meyners, Michael, LAUSANNE, AppliedMathematics
See the respective help files. The continuity correction only affects
the normal approximation in wilcox.test. With such small sample sizes,
the default evaluation is exact, so it doesn't change anything. In
contrast, kruskal.test is incapable of computing exact values but always
uses the chi-square approximation. So the discrepancy is between an exact
test and an approximation (guess you'd be better off with the former).

If you get the urge to reproduce the p value from kruskal.test using
wilcox.test, and maybe to better understand what's happening, try

a <- wilcox.test(x1, x2, paired=FALSE, exact=FALSE, correct=FALSE)

(and yes, now with exact=FALSE, the continuity correction makes a
difference).

HTH, Michael
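
The reproduction in full, using the OP's data (the two p-values should
now coincide):

x1 <- 1:5
x2 <- c(6, 8, 9, 11)
a <- wilcox.test(x1, x2, paired = FALSE, exact = FALSE, correct = FALSE)
b <- kruskal.test(list(x1, x2))
c(a$p.value, b$p.value)   # both should equal ~0.0143 (cf. below)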

 -Original Message-
 From: r-help-boun...@r-project.org 
 [mailto:r-help-boun...@r-project.org] On Behalf Of David Scott
 Sent: Dienstag, 8. September 2009 07:02
 To: Thomas Farrar
 Cc: r-help@r-project.org
 Subject: Re: [R] Equivalence of Mann-Whitney test and 
 Kruskal-Wallis test with k=2
 
 Thomas Farrar wrote:
  Hi all,
  
  The Kruskal-Wallis test is a generalization of the two-sample 
  Mann-Whitney test to *k* samples.  That being the case, the 
  Kruskal-Wallis test with *k*=2 should give an identical 
 p-value to the Mann-Whitney test, should it not?
  
 > x1 <- c(1:5)
 > x2 <- c(6,8,9,11)
 > a <- wilcox.test(x1,x2,paired=FALSE)
 > b <- kruskal.test(list(x1,x2),paired=FALSE)
 > a$p.value
 [1] 0.01587302
 > b$p.value
 [1] 0.01430588
  
  The p-values are slightly different (note that there are no ties in 
  the data, so computed p-values should be exact).
  
  Can anyone explain the discrepancy?  It's been awhile since 
 I studied 
  nonparametric stats and this one has me scratching my head.
  
  Many thanks!
  Tom
  
 
 The continuity correction? It is true by default for 
 wilcox.test and is not apparent in the help for kruskal.test.
 
 David Scott
 
 --
 _
 David Scott   Department of Statistics
   The University of Auckland, PB 92019
   Auckland 1142,NEW ZEALAND
 Phone: +64 9 923 5055, or +64 9 373 7599 ext 85055
 Email:d.sc...@auckland.ac.nz,  Fax: +64 9 373 7018
 
 Director of Consulting, Department of Statistics
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Unexpected behavior in friedman.test and ks.test

2009-09-08 Thread Meyners, Michael, LAUSANNE, AppliedMathematics
Alex,

It's mainly speculation, as I cannot check the Excel add-in nor Vassar, but 
I'll give it a try.

For the Friedman test: Results of R coincide with those reported by Hollander & 
Wolfe, which I'd take as a point in favor of R. In any case, my guess is that 
ties are handled differently (average ranks in R), but you'd have to check with 
the documentation of WinSTAT and Vassar. If it is not documented, see what test 
statistic you'd get manually under each handling of ties.

For the ks.test: See ?ks.test for the meaning of the exact argument of this 
function. I'd assume that Excel gives you the asymptotic p value only, while R 
will by default return an exact one for 32 samples. From the same help page: 
"Otherwise, asymptotic distributions are used whose approximations may be 
inaccurate in small samples." You could check using something like 
ks.test(myData$f1_A, pnorm, exact=FALSE). If that doesn't resolve the issue: do 
the KS test (semi-)manually, which should not be that difficult (even in Excel, 
if need be), and compare the D value with the one obtained from R and 
Excel, respectively.

HTH, Michael
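
A sketch of the exact-vs-asymptotic check with simulated data (n = 32,
as in the OP's setting):

set.seed(42)
x <- rnorm(32)
ks.test(x, "pnorm")$p.value                 # exact by default (n < 100, no ties)
ks.test(x, "pnorm", exact = FALSE)$p.value  # asymptotic approximation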



 -Original Message-
 From: r-help-boun...@r-project.org 
 [mailto:r-help-boun...@r-project.org] On Behalf Of 
 atconsta-rh...@yahoo.com
 Sent: Dienstag, 8. September 2009 15:33
 To: R-help@r-project.org
 Subject: [R] Unexpected behavior in friedman.test and ks.test
 
 I have to start by saying that I am new to R, so I might miss 
 something crucial here. It seems to me that the results of 
 friedman.test and ks.test are wrong. Now, obviously, the 
 first thing which crossed my mind was "it can't be, this is a 
 package used by so many, someone should have noticed", but I 
 can't figure out what it might be.
 
 Problem: let's start with friedman.test. I have a lot of data 
 to analyze and I grew sick and tired of clicking and 
 selecting in Excel (for which we have a statistics Add-In 
 purchased; don't start to flame me on using Excel for stats, 
 please!); so I wanted to automate the analysis in R and 
 figured out the results differ from Excel. Example: take the 
 data from example(friedman.test) (Hollander & Wolfe (1973), 
 p. 140ff.). I ran the example in R and got:
 
 Friedman rank sum test
 data:  RoundingTimes
 Friedman chi-squared = 11.1429, df = 2, p-value = 0.003805
 
 Same data, in Excel, using the WinSTAT for Excel (Fitch 
 software), gives: Friedman chi-squared = 10.6364, df = 2, 
 p-value =0.004902
 
 Puzzled, I entered the data in the calculator from Vassar 
 (http://faculty.vassar.edu/lowry/fried3.html ) and got 
 exactly the same values as in Excel (and, again, different 
 from R). Admittedly, the differences are not large, and both 
 fall below the 0.05 threshold, but, still.
 
 So, question 1 would be why is R different from both Excel 
 and Vassar?
 
 
 Now to the Kolmogorov-Smirnov test, from which my ordeal 
 actually started: the results from ks.test are wildly 
 different from the ones I have got with the Excel add-in. 
 Basically, I have 32 sets of observations (patients) for 100 
 independent variables (different blood analyses). Question 
 was whether the data is normally distributed for each of the 
 analyses and, hence, whether I can apply a parametric test or not.
 Once I had loaded the data in a dataframe (and it looks as 
 expected), I ran:
 ks.test(myData$f1_A, pnorm)
 ks.test(myData$f8_A, pnorm)
 
 They give p-values of < 2.2e-16 (with ties) and 8.882e-16. 
 The Excel Add-In gives p-values of 
  
 0.0074491 and, respectively, 0.2730477
 
 Here the difference is serious, like between "highly 
 significant non-normal" for both f1 and f8 (R), and "one 
 non-normal and one normal" (the Add-in). I first thought that 
 the difference might arise from different probability 
 distributions (but what else, if not pnorm). Then I ran the 
 friedman test, to find out similar discrepancies.
 
 I'd really appreciate some input on this: what's wrong and 
 how should I decide whom to trust?
 
 Many thanks in advance,
 
 Alex
 
 
 
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] biplot graphical options?

2009-09-02 Thread Meyners, Michael, LAUSANNE, AppliedMathematics
Taking the example from ?biplot.princomp with reproducible code (hint,
hint):

biplot(princomp(USArrests))
biplot(princomp(USArrests), col=c(2,3), cex=c(1/2, 2)) 

clearly changes the color and font size on my system. For changing the
point symbols (which are "text" by default), try setting "xlabs" and
"ylabs". But of course, an argument "pch" will not have any effect, as
biplot has no call to points or similar, but to text only (and
hence mainly parameters of text will have an effect on what you aim
at). So for documentation, I'd propose ?biplot.princomp and
?biplot.default.

For changes that are truly impossible (there are indeed some), take the
code for biplot.default (or biplot.princomp) and amend where necessary
to fulfill your needs. 

HTH, Michael
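
For instance, replacing the default text labels by a plotting symbol
via xlabs:

biplot(princomp(USArrests), xlabs = rep("*", nrow(USArrests)), col = c(4, 2))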

 -Original Message-
 From: r-help-boun...@r-project.org 
 [mailto:r-help-boun...@r-project.org] On Behalf Of Manca Marco (PATH)
 Sent: Mittwoch, 2. September 2009 13:02
 To: r-help@r-project.org
 Subject: [R] biplot graphical options?
 Importance: High
 
 
 Dear R-help fellows
 
 good afternoon.
 
 I am struggling in the attempt to impose some graphical 
 conditions (changing point symbols, colors, etc.) on the biplot 
 function (I am using it to visualize the results of princomp), 
 but I can't apparently manage to change anything but the 
 axes... and I have been browsing manuals and vignettes 
 without finding any explicit suggestions on how to operate...
 
 Can anyone, please, point my attention to the relevant documentation?
 
 Thank you in advance and best regards,
 Marco
 
 --
 Marco Manca, MD
 University of Maastricht
 Faculty of Health, Medicine and Life Sciences (FHML) 
 Cardiovascular Research Institute (CARIM) PO Box 616 6200 MD 
 Maastricht
 
 E-mail: m.ma...@path.unimaas.nl
 Office telephone: +31(0)433874633
 Personal mobile: +31(0)626441205
 Twitter: @markomanka
 
 
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Permutation test and R2 problem

2009-08-14 Thread Meyners, Michael, LAUSANNE, AppliedMathematics
 

 -Original Message-
 From: r-help-boun...@r-project.org 
 [mailto:r-help-boun...@r-project.org] On Behalf Of Alex Roy
 Sent: Freitag, 14. August 2009 12:05
 To: r-help@r-project.org
 Subject: [R] Permutation test and R2 problem
 
 Hi,
 
 
 I have optimized the shrinkage parameter (GCV) for ridge regression and 
 got an r2 value of 70%. To check the sensitivity of the 
 result, I did a permutation test. I permuted the response 
 vector, ran it 1000 times and drew a distribution. But 
 now I get r2 values as high as 98%, and some of them more than 
 70%. Is this expected from such a type of test?

Depends on what exactly you are doing and on your data, but surely this
is not unexpected (even less given the information we have).
 
 *I was under the impression that the r2 with the real data set will 
 always be the maximum! And permutation will not affect this, i.e. the 
 permuted r2 will always be less than the real one!*

Why would that be? And even more, why would you do a permutation test
then if you knew in advance that all permuted values are below your
observed one? You optimize the shrinkage parameter for your data, not
your data for your shrinkage parameter. In the latter case you would
have been right. Given any fixed shrinkage parameter, you can always
find some data (e.g. the predicted values) that fit better than the
original. I guess that in most non-artificial cases with a reasonable
amount of data, there are at least some permutations that give a higher
r2.

Not sure what kind of sensitivity you want to check, but probably you'd
have to optimize the shrinkage parameter as well. And of course to make
sure that your permutations correspond to the original constraints. But
this opens a wide field; refer to any textbook on this matter for
further detail (e.g. Edgington & Onghena, Randomization Tests).
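
Purely as an illustration of that last point, a sketch (simulated data;
MASS::lm.ridge and its GCV criterion are chosen here for convenience,
not because they are Alex's actual setup):

library(MASS)                      # for lm.ridge

set.seed(1)
n <- 50; p <- 10
X <- matrix(rnorm(n * p), n, p)
y <- drop(X %*% rnorm(p, sd = 0.3) + rnorm(n))

r2.ridge <- function(y, X, lambdas = seq(0, 10, by = 0.25)) {
  fit  <- lm.ridge(y ~ X, lambda = lambdas)
  b    <- coef(fit)[which.min(fit$GCV), ]   # GCV-optimal coefficients
  yhat <- b[1] + X %*% b[-1]
  drop(cor(y, yhat))^2                      # r2 of the re-tuned fit
}

r2.obs  <- r2.ridge(y, X)
r2.perm <- replicate(1000, r2.ridge(sample(y), X))  # permute the response
mean(r2.perm >= r2.obs)           # proportion of permuted r2 >= observed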

HTH, Michael



Re: [R] Chi-squared approximation may be incorrect

2009-08-11 Thread Meyners, Michael, LAUSANNE, AppliedMathematics
Nancy (?),
see ?chisq.test (in particular the examples and the comment on expected
frequency there). 
A rule of thumb (see any basic textbook) for the chi-squared
approximation being okay is that the expected value in each cell is at
least 5. The warning tells you that this does not hold true for at least
one of your cells. 
You might want to try 
chisq.test(t, simulate.p.value = TRUE) 
instead.
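For instance, with a made-up 2x2 table t:

t <- matrix(c(8, 2, 3, 1), 2, 2)   # made-up counts, deliberately sparse
chisq.test(t)$expected             # some expected counts fall below 5
chisq.test(t, simulate.p.value = TRUE, B = 10000)  # Monte Carlo p-value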
Michael

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
On Behalf Of SNN
Sent: Mittwoch, 12. August 2009 00:34
To: r-help@r-project.org
Subject: [R] Chi-squared approximation may be incorrect



Hi,


I am trying to run a chi-square test but I am getting the following
message:

Warning messages:
1: In chisq.test(t) : Chi-squared approximation may be incorrect

Does anyone know what it means?

Your help is appreciated.





Re: [R] Conditional randomization of groups

2009-08-07 Thread Meyners, Michael, LAUSANNE, AppliedMathematics
Hi unknown,

As a quick patch, try something like

mydata$Set <- 1
mydata$Set[c(sample(mydata$ID[mydata$Type == "A"], ifelse(runif(1) > 1/2, 2, 3)),
             sample(mydata$ID[mydata$Type == "B"], 3))[1:5]] <- 2
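
A quick check of the constraint afterwards (assuming mydata holds the
data from the post below):

table(mydata$Set)                # two Sets of 5 IDs each
table(mydata$Type, mydata$Set)   # 2 or 3 of each Type per Set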
 
HTH, Michael

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
On Behalf Of yabado
Sent: Donnerstag, 6. August 2009 23:29
To: r-help@r-project.org
Subject: [R] Conditional randomization of groups


Hi

I have a data set which is like this:

Data

ID   Type
1   A
2   B 
3   A
4   A
5   A
6   B
7   B
8   A
9   B
10 B

As you can see, there are 5 A and 5 B. Now, I want to do a
randomization of the data set which will assign 5 IDs to a new group
called Set1 and the other 5 to Set2.

The Set1 and Set2 will be either: 
Set1 (A A A B B ) and Set2 (A A B B B)
or
Set1(A A B B B) and Set2 (A A A B B).

Each new group needs to have at least two A and two B, plus one more B
or A. Except for this condition, everything else (ID, permutation,
selection to Set1 and Set2) needs to be random.

Can anyone show me how to do the randomization?





Re: [R] rnorm() converted to daily

2009-04-17 Thread Meyners, Michael, LAUSANNE, AppliedMathematics
The SD in y is more than 15 times (!!!) larger than in x and z,
respectively, and hence the SD of the mean of y is also larger. 100,000
values are not enough to stabilize this. You could have obtained 0.09 or
even larger as well. Try y with different seeds; y varies much more than
x and z do. Or check the variances:

# rnorm sample size restored as 1e5 (the "100,000 values" mentioned above)
var.x <- var(replicate(100, mean( rnorm( 1e5, mean=0.08, sd=0.25 ))))
var.y <- var(replicate(100, mean( rnorm( 1e5, mean=(0.08/252),
                                         sd=(0.25/sqrt(252)) )) * 252))
var.z <- var(replicate(100, mean( rnorm( 1e5, mean=(0.08/252),
                                         sd=0.001 )) * 252))

I guess what you are doing wrong is assuming that 0.25/sqrt(252) is
similar to 0.25/252, the latter being close to 0.001, while the former
is obviously not. Try replacing 0.25 in y by 0.0158; then you should be
close enough.
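
The constants behind this, for the record:

sqrt(252)               # ~ 15.87: factor by which y's SD exceeds x's
252 * (0.25/sqrt(252))  # ~ 3.97: effective SD in y after the *252 scaling
252 * 0.001             # ~ 0.252: effective SD in z, in line with x's 0.25
0.25 / sqrt(252)        # ~ 0.0157: hence the suggested 0.0158 replacement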

HTH, Michael

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
On Behalf Of Ken-JP
Sent: Freitag, 17. April 2009 09:26
To: r-help@r-project.org
Subject: [R] rnorm() converted to daily



yearly 8% drift, 25% sd

why are x and y so different (x and z look ok)?

Does this have something to do with biases due to the relationship
between population and sample-estimates as a function of n samples and
sd?  Or am I doing something wrong?

---

# sample size reconstructed as 1e5 ("100,000 values" per the reply above)
set.seed( 1 )
x <- mean( rnorm( 1e5, mean=0.08, sd=0.25 ))
set.seed( 1 )
y <- mean( rnorm( 1e5, mean=(0.08/252), sd=(0.25/sqrt(252)) )) * 252
set.seed( 1 )
z <- mean( rnorm( 1e5, mean=(0.08/252), sd=0.001 )) * 252
# ---
x # 0.07943898
y # 0.07109407
z # 0.07943449






Re: [R] Genstat into R - Randomisation test

2009-04-16 Thread Meyners, Michael, LAUSANNE, AppliedMathematics
Robert, Tom, Peter and all, 

If I remember correctly (don't have my copy at hand right now),
Edgington and Onghena differentiate between randomization tests and
permutation tests along the following lines:

Randomization test: Apply only to randomized experiments, for which we
consider the theoretically possible re-randomizations given the
constraints for the original randomization. In this setting, inference
is valid for the observed population, and does not require random
sampling (while, of course, this assumption is needed if we want to
generalize the findings). I think they also call it re-randomization
tests at least once, while the prefix is usually omitted for simplicity,
but this makes it clearer that an initial randomization is required.

Permutation test: Can be applied even to data from non-randomized
experiments, requires, however, random sampling. Hence, one additional
assumption is needed (one that is common to inference using classical
tests). This is hardly addressed in Edgington & Onghena. 

In both cases, the answer can be approximated by random draws if full
randomizations/permutations are difficult or impossible to perform, but
this applies to both methods. The distinction in Edgington (1980) as
mentioned by Tom is not used in the latest edition. 
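
To illustrate the computational side only (toy numbers, not from this
thread): for a two-sample mean difference, the two routes look like

z    <- c(12.1, 9.8, 11.4, 10.2, 9.5, 10.9)  # units 1-3 treated, 4-6 control
stat <- function(idx) mean(z[idx]) - mean(z[-idx])
obs  <- stat(1:3)

# exhaustive: all choose(6, 3) = 20 possible treatment assignments
p.exact <- mean(apply(combn(6, 3), 2, stat) >= obs)

# random draws: approximates p.exact when enumeration is infeasible
p.mc <- mean(replicate(10000, stat(sample(6, 3))) >= obs)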

In this regard, the comparison with exact vs. approximate integral
seems inappropriate to me, as the distinction is conceptual, not
computational. Neither is one the (non-)exhaustive variant of the other.
I doubt that the "randomization test" is named for the _random_ choice
of permutations, as implied by Peter; it is rather based on the
randomization (of an experiment) in its best statistical meaning. 

Not sure whether the distinction is needed, but it might be helpful in
some instances, at least if used consistently. I have never seen another
distinction between these two, and the literature is inconsistent in its
use as well.

Just my two cents worth.
Michael

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
On Behalf Of Robert A LaBudde
Sent: Donnerstag, 9. April 2009 17:38
To: Tom Backer Johnsen
Cc: r-help@r-project.org
Subject: Re: [R] Genstat into R - Randomisation test

At 04:43 AM 4/9/2009, Tom Backer Johnsen wrote:
> Peter Dalgaard wrote:
>> Mike Lawrence wrote:
>>> Looks like that code implements a non-exhaustive variant of the
>>> randomization test, sometimes called a permutation test.
>>
>> Isn't it the other way around? (Permutation tests can be exhaustive
>> by looking at all permutations; if a randomization test did that,
>> then it wouldn't be random.)
>
> Eugene Edgington wrote an early book (1980) on this subject called
> "Randomization Tests", published by Dekker. As far as I remember, he
> differentiates between "Systematic" permutation tests, where one looks
> at all possible permutations. This is of course prohibitive for
> anything beyond trivially small samples. For larger samples he uses
> what he calls "Random" permutations, where a random sample of the
> possible permutations is used.
>
> Tom


Edgington and Onghena make a similar distinction in their book, but I
think such a distinction is without merit.

Do we distinguish between exact definite integrals and approximate
ones obtained by numerical integration, of which Monte Carlo sampling is
just one class of algorithms? Don't we just say: 
"The integral was evaluated numerically by the [whatever] method to an
accuracy of [whatever], and the value was found to be [whatever]." 
Ditto for optimization problems.

A randomization test has one correct answer based upon theory. We are
simply trying to calculate that answer's value when it is difficult to
do so. Any approximate method that is used must be performed such that
the error of approximation is trivial with respect to the contemplated
use.

Doing Monte Carlo sampling to find an approximate answer to a
randomization test, or to an optimization problem, or to a bootstrap
distribution should be carried out with enough realizations to make sure
the residual error is trivial.

As Monte Carlo sampling is a random sampling-based approximate method,
the name does create some confusion in terminology, both for
randomization tests and for bootstrapping.


Robert A. LaBudde, PhD, PAS, Dpl. ACAFS  e-mail: r...@lcfltd.com
Least Cost Formulations, Ltd.URL: http://lcfltd.com/
824 Timberlake Drive Tel: 757-467-0954
Virginia Beach, VA 23464-3239Fax: 757-467-2947

Vere scire est per causas scire


Re: [R] OT: A test with dependent samples.

2009-02-11 Thread Meyners, Michael, LAUSANNE, AppliedMathematics
Rolf,

as you explicitly asked for a comment on your proposal: It is generally
equivalent to McNemar's test and maybe even more appropriate because of
the asymptotics involved in the chi-squared distribution, which might
not be too good with n=12. In some more detail:

McNemar's test basically considers the difference between the
off-diagonal elements, normalized by their sum. You do the same,
ignoring all (0,0) and (1,1) pairs (the latter do not occur in your
example anyway). binom.test(12, 12, alternative="greater") gives you
the one-sided p-value.

If, however, you do not use the exact binomial distribution (with n=12)
but the normal approximation, you find E(X)=6 and VAR(X)=3 with X the
number of sequences (0,1), and you observed x=12. This gives you a
z-score of (12-6)/sqrt(3) and a corresponding one-sided p-value:

> pnorm(6/sqrt(3), lower.tail=F)
[1] 0.0002660028

Further, McNemar's test WITHOUT continuity correction gives 

> mcnemar.test(matrix(c(61,0,12,0),2,2), correct=F)

McNemar's Chi-squared test

data:  matrix(c(61, 0, 12, 0), 2, 2) 
McNemar's chi-squared = 12, df = 1, p-value = 0.000532

It comes as no surprise that the reported (two-sided!) p-value is
exactly twice the (one-sided!) p-value from the binomial test with
normal approximation:

> pnorm(6/sqrt(3), lower.tail=F)*2
[1] 0.0005320055

I do not want to stress the pros and cons of continuity corrections, but
if neglected, you see that you get the same results (except that McNemar
is generally two-sided), due to the relation of normal and chi-squared
distribution. 

If you use the binomial test, you can forget about asymptotics and
continuity correction; that's why I indeed consider your approach
superior to the chi-squared approximation used in McNemar's test. But
you might fail to find a reference for the exact approach...


You briefly asked in a later mail about testing for p=0. Indeed, _any_
incident will disprove this hypothesis, and the p-value reported by 

> binom.test(1,73,p=0)

Exact binomial test

data:  1 and 73 
number of successes = 1, number of trials = 73, p-value < 2.2e-16
alternative hypothesis: true probability of success is not equal to 0 
95 percent confidence interval:
 0.0003467592 0.0739763232 
sample estimates:
probability of success 
0.01369863 

is not wrong (as R just tells us it is BELOW a certain value), but could
be refined by saying "p-value = 0". BTW, I am not sure what
"p-value = TRUE" tells me, as derived from binom.test(0,73,p=0). I
personally don't care about either, as to test H0: p=0, I would not use
any software but rather stress some basic statistical theory.


It remains the question whether McNemar (or your proposal) is generally
appropriate here. Like others, I have doubts, except maybe if
observations were made on two subsequent days, on the second of which
the drug was administered. How could you otherwise be sure that the
increase in incidences is not due to the progression of cancer, say,
which would be difficult to rule out? Also, from my experience and in
contrast to your point of view, I'd rather assume it likely that a vet
is much more reluctant to apply the new treatment to an already
vomiting cat; your baseline with 0 out of 73 at least seems suspicious,
and until proven wrong, I'd take the assumption that this is not by
chance only.
I understand that this is an observational study with no
control/randomization, but concluding a side-effect from a statistically
significant change in incidence rates from this study seems questionable
to me. What I'd propose (and have used in comparable situations) is to
confine yourself to the confidence interval and let the investigator
decide / discuss whether the lower bound of this is higher than what she
would expect under control or alternative treatment. She needs to have
some experience, if not historical data; using a test or just this
confidence interval, you will only get a hint at a possible side-effect,
no formal proof (which is always difficult from observational studies).
Discussing the CI would seem fairer and even stronger to me then. In
other words: Mathematically, you'll get an appropriate test, while
conceptually, I'm far from being convinced.
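
For completeness, that interval is a one-liner in R (12 of 73 as in
Rolf's description below):

binom.test(12, 73)$conf.int   # exact 95% CI for the post-treatment rate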

Hope this makes sense, 
Michael




-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
On Behalf Of Rolf Turner
Sent: Dienstag, 10. Februar 2009 22:33
To: R-help Forum
Subject: [R] OT: A test with dependent samples.


I am appealing to the general collective wisdom of this list in respect
of a statistics (rather than R) question.  This question comes to me
from a friend who is a veterinary oncologist.  In a study that she is
writing up there were 73 cats who were treated with a drug called
piroxicam.  None of the cats were observed to be subject to vomiting
prior to treatment; 12 of the cats were subject to vomiting after
treatment commenced.  She wants to be able to say that the treatment had
a ``significant''
impact with respect to