Re: [R] Randomization Test
Ogbos, To share data (in particular a lengthy one like yours), check out ?dput. To replicate sampling, look at ?replicate (its output can be stored, e.g., in a matrix for further use) - that should answer your Q1. Apart from that (and regarding Q2), the task you are after is getting more and more obscure to me. I don't see what you really want to test - between levels of n (or A in oodf)? If so, you would need to permute levels within each set (are they sets of -5:10, or are all observations completely independent? In the latter case, across all sets). What is the null hypothesis you want to test? And what is the data exactly? I don't understand what the two columns indicate. Then you need to decide on what the test statistic is you want to use. The average? The difference between each pair of averages? Anything like that? Do you want to test all levels simultaneously, or by pairs? Clearly, your way of sampling is inappropriate for a randomization test. You are rather simulating data that can take any value between min and max with equal probability. That does not seem to be the right null model to me (it might be, though, depending on your null hypothesis). It would not make a randomization test either way, but rather a Monte Carlo simulation under the null hypothesis. Then you would determine your test statistic(s) (whichever they are), subsequently repeat that often enough, and check whether your observed value is higher than 90 or 99% of the simulated ones. Again, that would be a simulation, not a randomization test. For the latter, you'd need to permute the data in an appropriate way (again depending on the null hypothesis, and on the structure, i.e. are they coming in blocks of any kind, or all independent), then recalculate the test statistic every time, and then proceed as before. I'm not sure whether the data was the result of a randomized experiment - if not, you can still use the idea, but you fall into something that some refer to as "permutation testing" - the difference being that you need to make much stronger assumptions on independence etc. of the data. Many use the terms equivalently, but just so you are aware. I really think you need to look into a textbook (e.g. Edgington & Onghena) or some papers to understand the concept of randomization tests, or consult with a statistician with a good background in that field. What you are suggesting is nowhere near it, and unless you have a clear hypothesis and a good understanding of how the data was generated, it is impossible for you (and anyone else) to say how such a test might be designed. Michael > -Original Message- > From: Ogbos Okike > Sent: Wednesday, 27 February 2019 22:53 > To: Meyners, Michael > Cc: r-help > Subject: Re: [R] Randomization Test > > Dear Kind List, > > I am still battling with this. I have, however, made some progress with the > suggestions of Michael and others. At least, I have a better picture of what I > want to do now, as I will attempt a detailed description here. > > I am aware I should show you just a small part of my code and data. > But when I copied out a small portion and ran it to see what you would get when I > send that, I was not satisfied with the signal displayed. The epoch analysis > averages data and is quite sensitive to leverage, especially if a small sample > is used. > > So please bear with me as I display the series of epochs that > give the averaged values used. You can just run the code and see the signal > of interest.
Here is the code and the data:
>
> dta <- read.table( text = "n CR
> -5 8969
> . SNIP...
> 10 9566
> ", header = TRUE)
>
> data <- matrix(c(dta$CR), ncol = 71)
> A <- matrix(rep(-5:10, 71))
> B <- matrix(data)
>
> oodf <- data.frame(A, B)
> a <- -5:10
> library(plotrix)
> std.error <- function(x) return(sd(x)/sqrt(sum(!is.na(x))))
> oomean <- as.vector(by(oodf$B, oodf$A, mean))
> oose <- as.vector(by(oodf$B, oodf$A, std.error))
> plot(-5:10, oomean, type = "l", ylim = c(8890, 9100))
> A <- oomean - 1.96*oose
> B <- oomean + 1.96*oose
> lines(a, A, col = "red")
> lines(a, B, col = "red")
>
> My Question:
> I wish to conduct a randomization test of significance (90 and 99
> percentile) of the reductions/decreases as displayed by the signal.
>
> I am attempting to use:
> x <- sample(8890:9500, 1136, replace = TRUE)
>
> to generate the random numbers, where 8890 and 9500 are the
> minimum and maximum of the signal and 1136 is the length of the sample data.
> Q1: Please how do I generate many samples like x above, say up to 5000 or
> 10,000? I manually generated and stored them as x1, x2, x3 up to x100.
>
> Q2: Please how do I use these randomly generated numbers to test the
> statistical significance?
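Taking up ?replicate (Q1) and the Monte-Carlo logic described above (simulate under the null model, recompute the statistic, compare the observed value against the simulated distribution): a minimal sketch, in which the stand-in data, the uniform null model, and the mean as test statistic are all placeholders rather than recommendations:

## Stand-in for the poster's CR series (the real dta is snipped above)
set.seed(42)
obs <- sample(8890:9500, 1136, replace = TRUE)
stat_obs <- mean(obs)
## Q1: replicate() produces the 5000 samples in one go - no manual x1, x2, ...
sims <- replicate(5000, mean(sample(8890:9500, length(obs), replace = TRUE)))
## Q2 (as a Monte-Carlo simulation, per the caveats above): compare the
## observed statistic against the simulated null distribution
quantile(sims, c(0.90, 0.99))
mean(sims <= stat_obs)   # lower-tail p-value for a decrease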
Re: [R] Randomization Test
Ogbos, You do not seem to have received a reply over the list yet, which might be due to the fact that this seems rather a stats than an R question. Neither did your attachment (figure) get through - see the posting guide. I'm not familiar with epoch analysis, so I am not sure what exactly you are doing / trying to achieve, but some general thoughts: * You do NOT want to restrict your re-randomizations in a way that "none of the dates corresponds with the ones in the real event" - actually, as a general principle, the true data must be an admissible re-randomization as well. You seem to have excluded that (and a lot of other randomizations at the same time which might have occurred, e.g. dates 1 and 2 reversed but all others the same), thereby rendering the test invalid. Any restrictions you have on your re-randomizations must have applied to the original randomization as well. * If you have rather observational data (which I suspect, but am not sure), Edgington & Onghena (2007) would rather refer to this as a permutation test - the difference being that you have to make strong assumptions (similar to parametric tests) on the nature of the data, which hold by design for randomization tests. It may be a merely linguistic distinction, but it is important to note which assumptions have to be (implicitly) made. * I'm not sure what you mean by "mean differences" of the events - is that two groups you are comparing? If so, that seems reasonable, but just make sure the test statistic you use is reasonable and sensitive to the alternatives you are mostly interested in. The randomization/permutation test will never prove that, e.g., means are significantly different, but only that there is SOME difference. By selecting the appropriate test statistic, you can influence what will pop up more easily and what not, but you can never be sure (unless you make strong assumptions about everything else, like in many parametric tests). * For any test statistic, you would then determine the proportion of its values among the 5000 samples where it is as large or larger than the one observed (or as small or smaller, or either, depending on the nature of the test statistic and whether you aim for a one- or a two-sided test). That is your p value. If small enough, conclude significance. At least conceptually important: The observed test statistic is always part of the re-randomization (i.e. your 5000) - so you truly only do 4999 plus the one you observed. Otherwise the test may be more or less liberal. Your p value is hence no smaller than 1/n, where n is the total number of samples you looked at (including the observed one); a p value of 0 is not possible in randomization tests (nor in other tests, of course). I hope this is helpful, but you will need to go through these points and refer to your own setup to check whether you adhered to the principles or not, which is impossible for me to judge based on the information provided (and I won't be able to look at excessive code to check either). Michael > -Original Message- > From: R-help On Behalf Of Ogbos Okike > Sent: Monday, 28 January 2019 19:42 > To: r-help > Subject: [R] Randomization Test > > Dear Contributors, > > I am conducting an epoch analysis. I tried to test the significance of my result > using a randomization test. > > Since I have 71 events, I randomly selected another 71 events, making sure > that none of the dates in the random events corresponds with the ones in > the real events.
> > Following the code I found here > (https://www.uvm.edu/~dhowell/StatPages/R/RandomizationTestsWithR/Random2Sample/TwoIndependentSamplesR.html), > I combined these two data sets and used them to generate another 5000 > events. I then plotted the graph of the mean differences for the 5000 > randomly generated events. On the graph, I indicated the region of the > mean difference between the real 71 epochs and the randomly selected 71 > epochs. > > Since the two-tailed test shows that the mean difference falls at the extreme of > the randomly selected events, I concluded that my result is statistically > significant. > > > > I am attaching the graph to assist you with your suggestions. > > I can attach both my code and the real and randomly generated events if you > ask for it. > > My request is that you help me to understand if I am on the right track or not. > This is the first time I am doing this and, unless the experts decide, I am not > quite sure whether I am right or not. > > Many thanks for your kind concern. > > Best > Ogbos > __ > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.
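To make the last two bullets of the reply concrete, here is a minimal sketch of a two-group permutation test in which the observed statistic is counted as one of the re-randomizations; the two groups are made up and merely stand in for the real and randomly selected 71 events:

set.seed(1)
real <- rnorm(71, mean = 0.5)    # placeholder for the 71 real events
random <- rnorm(71, mean = 0)    # placeholder for the 71 random events
stat_obs <- mean(real) - mean(random)
pooled <- c(real, random)
perm <- replicate(4999, {        # 4999 + the observed one = 5000
  idx <- sample(length(pooled), 71)
  mean(pooled[idx]) - mean(pooled[-idx])
})
## two-sided p-value; it can never be 0 because the observed statistic counts
p <- (1 + sum(abs(perm) >= abs(stat_obs))) / (1 + length(perm))
p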
Re: [R] ANOVA Permutation Test
Juan, Your question might be borderline for this list, as it ultimately rather seems a stats question coming in R disguise. Anyway, the short answer is that you *expect* to get a different p value from a permutation test unless you are able to do all possible permutations and therefore use the so-called systematic reference set. That is rarely the case, and only feasible for relatively small problems. The permutation test uses a random subset of all possible permutations. Given this randomness, you'll get a different p value each time. In order to get reproducible results, you may specify a seed (?set.seed), yet that is only reproducible within this environment. Someone else with different software and/or code might come out with a different p. The higher the number of permutations used, the smaller the variation around the p values, however. For most applications, 1000 seem good enough to me, but sometimes I go higher (in particular if the p value is borderline and I really need a strict above/below alpha decision). The permutations do not create an implicit normal distribution, but rather a null distribution that can be (and, depending on the non-normality of your data, likely is) non-normal. So your respective proposal does not appeal. I don't think you need an alternative - the permutation test is just fine, and recognizing the randomness in the execution does not render the (relatively small) variability in p values a major issue. You may want to have a look at the textbook by Edgington & Onghena for details on permutation tests, and there are plenty of papers out there addressing them in various contexts, which will help to understand *why* you observe what you observe here. HTH, Michael > -Original Message- > From: R-help On Behalf Of Juan Telleria Ruiz > de Aguirre > Sent: Monday, 3 September 2018 17:18 > To: R help Mailing list > Subject: [R] ANOVA Permutation Test > > Dear R users, > > I have the following question related to package lmPerm: > > This package uses a modified version of the aov() function, which uses > permutation tests instead of normal-theory tests for fitting an analysis of > variance (ANOVA) model. > > However, when I run the following code for a simple linear model: > > library(lmPerm) > > e$t_Downtime_per_Intervention_Successful %>% > aovp( > formula = `Downtime per Intervention[h]` ~ `Working Hours`, > data = . > ) %>% > summary() > > I obtain different p-values for each run! > > With a regular ANOVA test, I obtain instead a constant F-statistic, but I do not > fulfill the required normality assumptions. > > So my questions are: > > Would it still be possible to use the regular aov() by generating permutations in > advance (obtaining therefore a normal distribution thanks to the Central > Limit Theorem)? And applying the aov() function afterwards? Does that make sense? > > > Or maybe this issue could be due to unbalanced classes? I also tried to weight > observations based on proportions, but the function failed. > > > Any alternative solution for performing a one-way ANOVA test over non-normal > data? > > > Thank you. > > Juan > > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.
__ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
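To illustrate the reply's two main points -- a seed makes the permutation p value reproducible, and the permutation null distribution need not be normal -- here is a minimal hand-rolled permutation one-way ANOVA on made-up, deliberately non-normal data; it is not lmPerm's implementation, just the same idea spelled out:

set.seed(123)                                 # fixes the p value within this environment
d <- data.frame(y = rexp(60), g = gl(3, 20))  # skewed response, three groups
f_obs <- summary(aov(y ~ g, data = d))[[1]][1, "F value"]
f_perm <- replicate(4999,
  summary(aov(sample(y) ~ g, data = d))[[1]][1, "F value"])
hist(f_perm)                                  # the permutation null; no normality involved
(1 + sum(f_perm >= f_obs)) / (1 + length(f_perm))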
Re: [R] Letters group Games-Howell post hoc in R
Apologies if I missed any earlier replies - did you check multcompLetters in package {multcompView}? It allows you to get connecting-letters reports (if that's what you are after; I didn't check what exactly agricolae is providing here). You may have to add some manual steps to combine this with any data (means or whatever) you want to report. multcompLetters allows you to use p values or a logical (significant or not). HTH, Michael > -Original Message- > From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of David Bars > Cortina > Sent: Tuesday, 16 January 2018 13:51 > To: R-help@r-project.org > Subject: [R] Letters group Games-Howell post hoc in R > > Hello everybody, > > I use the sweetpotato database included in R package: > > data(sweetpotato) This dataset contains two variables: yield (continuous > variable) and virus (factor variable). > > Because the Levene test is significant, I cannot assume homogeneity of variances, > and I apply the Welch test in R instead of one-way ANOVA followed by Tukey > posthoc. > > Nevertheless, the problems come when I apply the posthoc test. For the Tukey > posthoc test I use library(agricolae), and it displays the superscript letters > between virus groups. Therefore there are no problems. > > Nevertheless, to perform the Games-Howell posthoc, I use > library(userfriendlyscience) and I obtain the Games-Howell output, but it's > impossible for me to obtain a letter superscript comparison between virus > groups as it is obtained through library(agricolae). > > The code used was the following: > > library(userfriendlyscience) > > data(sweetpotato) > > oneway <- oneway(sweetpotato$virus, y = sweetpotato$yield, posthoc = > 'games-howell') > > oneway > > I tried cld(), importing library(multcompView) beforehand, but it doesn't work. > > Can somebody help me? > > Thanks in advance, > > David Bars. > > __ > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
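A minimal sketch of the multcompLetters route suggested above; the pair names follow the "level1-level2" convention described on its help page, and the p values are invented stand-ins for the Games-Howell output (level names borrowed from the sweetpotato virus factor):

library(multcompView)
pvals <- c("cc-fc" = 0.002, "cc-ff" = 0.048, "cc-oo" = 0.001,
           "fc-ff" = 0.650, "fc-oo" = 0.120, "ff-oo" = 0.300)
multcompLetters(pvals, threshold = 0.05)$Letters   # connecting-letters report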
Re: [R] Jaccard index
Did you search the internet? At first attempt, vegdist {vegan} (which worked well for me in the past) and dist.binary {ade4} seem to offer what you need. HTH, Michael -Original Message- From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of sreenath Sent: Wednesday, 15 July 2015 09:35 To: r-help@r-project.org Subject: [R] Jaccard index How can I find the Jaccard index of two groups, and which package should be used? Please help. -- View this message in context: http://r.789695.n4.nabble.com/Jaccard-index-tp4709883.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
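A quick sketch of both suggestions on made-up presence/absence data. Both functions return distances rather than similarities; according to its help page, dist.binary returns sqrt(1 - s), so convert back if you want the Jaccard index itself:

## two groups described by ten binary attributes (invented data)
m <- rbind(g1 = c(1, 1, 0, 1, 0, 1, 1, 0, 1, 0),
           g2 = c(1, 0, 0, 1, 1, 1, 0, 0, 1, 1))

library(vegan)
d1 <- vegdist(m, method = "jaccard", binary = TRUE)
1 - d1                                            # Jaccard similarity index

library(ade4)
d2 <- dist.binary(as.data.frame(m), method = 1)   # method 1 = Jaccard
1 - d2^2                                          # back to the similarity index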
Re: [R] R: RE: Bonferroni post hoc test in R for repeated measure ANOVA with mixed within and between subjects design
Angelo, the conservatism of Bonferroni aside for a moment, I don't really see what you are after. mcp(Emotion = "Tukey") just defines the contrasts to test; in this case, all pairwise comparisons. The test=adjusted("bonferroni") argument then says that these contrasts should not be corrected for according to Tukey's procedure, but using the Bonferroni adjustment. So this should give you tests for all pairwise comparisons with multiplicity correction according to Bonferroni. I wonder how this differs from the results you get from SPSS. Seemingly, this is not what you want, so the question is what you really want. The Bonferroni method does not indicate which comparisons / contrasts to look at, but just takes all those that you are interested in and multiplies the corresponding p values by the number of comparisons (making sure they do not exceed 1). As John indicated, that can easily be handled manually. Yet, you need to create the tests for all comparisons that you are interested in - if not via linfct=mcp(Emotion = "Tukey"), you need to specify them otherwise (see the three options indicated on ?glht). The code I suggested offers a convenient shortcut in case you are interested in all pairwise comparisons and want them to be corrected according to Bonferroni, but if something else is of interest, you'd need to specify this (and let us know, as the Bonferroni method does not give a clue about which comparisons to test). NB: You may want to pay attention to the warning halfway down the help page for ?mcp; it may not be clear exactly which effects you want to compare; mcp uses just the main effects w/o interactions etc. Michael -Original Message- From: John Fox [mailto:j...@mcmaster.ca] Sent: Tuesday, 7 July 2015 03:24 To: angelo.arc...@virgilio.it Cc: Meyners, Michael; r-help@r-project.org Subject: Re: [R] R: RE: Bonferroni post hoc test in R for repeated measure ANOVA with mixed within and between subjects design Dear Angelo, The Bonferroni p-value is just the ordinary p-value times the number of tests, so, since R supports multiplication, you can apply the Bonferroni adjustment in R. Because Bonferroni tests for multiple comparisons can be very conservative, asking why you want to use them is a fair question. Best, John John Fox, Professor McMaster University Hamilton, Ontario, Canada http://socserv.mcmaster.ca/jfox/ On Tue, 7 Jul 2015 00:50:49 +0200 (CEST) angelo.arc...@virgilio.it wrote: Dear Michael, thank you for your answer; however, I am not asking for the Tukey test with the Bonferroni adjustment, but for doing the post hoc with the Bonferroni method. Apparently this is done easily in SPSS; I am wondering whether it is possible with R. Can anyone help me? Thanks in advance Angelo Original message From: meyner...@pg.com Date: 6 July 2015 17:52 To: angelo.arc...@virgilio.it, r-help@r-project.org Subject: RE: [R] Bonferroni post hoc test in R with mixed within and between subjects design Untested, but if anything, your best bet is that something like summary(glht(lme_H2H, linfct=mcp(Emotion = "Tukey")), test=adjusted("bonferroni")) should work (despite the question why you'd want to use Bonferroni rather than Tukey). For a reference, see the book on the topic by the package authors. It might be in the paper, too, which is given by citation("multcomp"). HTH, Michael -Original Message- From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of angelo.arc...@virgilio.it Sent: Monday, 6
July 2015 16:01 To: r-help@r-project.org Subject: [R] Bonferroni post hoc test in R for repeated measure ANOVA with mixed within and between subjects design Dear List Members, I need to perform a Bonferroni post hoc test in R on a table with three within-subjects factors (Emotion, having 5 levels; Material, having 4 levels; Shoes, having 2 levels) and one between-subjects factor (Musician, having 2 levels). I normally use the Tukey method with the following formula:

require(nlme)
lme_H2H = lme(H2H ~ Emotion*Material*Shoes*Musician, data = scrd, random = ~1|Subject)
require(multcomp)
summary(glht(lme_H2H, linfct = mcp(Emotion = "Tukey")))

I am not able to find any reference that explains with an example of R code how to perform a post hoc test with the Bonferroni procedure. Can anyone provide an example to perform the same post hoc test as in the code above but with Bonferroni instead of Tukey? Thank you in advance Angelo [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help
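To spell out John's remark in code: the Bonferroni adjustment is nothing more than multiplying each p value by the number of tests and capping at 1, which base R's p.adjust() also does; the p values below are made up:

p <- c(0.004, 0.020, 0.300)        # hypothetical raw p values from the comparisons
pmin(1, p * length(p))             # Bonferroni by hand
p.adjust(p, method = "bonferroni") # identical result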
Re: [R] Bonferroni post hoc test in R for repeated measure ANOVA with mixed within and between subjects design
Untested, but if anything, your best bet is that something like

summary(glht(lme_H2H, linfct = mcp(Emotion = "Tukey")), test = adjusted("bonferroni"))

should work (despite the question why you'd want to use Bonferroni rather than Tukey). For a reference, see the book on the topic by the package authors. It might be in the paper, too, which is given by citation("multcomp"). HTH, Michael -Original Message- From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of angelo.arc...@virgilio.it Sent: Monday, 6 July 2015 16:01 To: r-help@r-project.org Subject: [R] Bonferroni post hoc test in R for repeated measure ANOVA with mixed within and between subjects design Dear List Members, I need to perform a Bonferroni post hoc test in R on a table with three within-subjects factors (Emotion, having 5 levels; Material, having 4 levels; Shoes, having 2 levels) and one between-subjects factor (Musician, having 2 levels). I normally use the Tukey method with the following formula:

require(nlme)
lme_H2H = lme(H2H ~ Emotion*Material*Shoes*Musician, data = scrd, random = ~1|Subject)
require(multcomp)
summary(glht(lme_H2H, linfct = mcp(Emotion = "Tukey")))

I am not able to find any reference that explains with an example of R code how to perform a post hoc test with the Bonferroni procedure. Can anyone provide an example to perform the same post hoc test as in the code above but with Bonferroni instead of Tukey? Thank you in advance Angelo [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] mismatch between match and unique causing ecdf (well, approxfun) to fail
Thanks, Martin. Yep, I understand it is documented and my code wasn't as it should've been -- the confusion comes from the fact that it worked ok for hundreds of situations that seem very much alike, but one situation breaks. I agree that you typically can't be sure about having only numerical data in the data frame, but I was sure I had by design (numeric results of simulations, so no factors or anything else) and was then sloppy in passing the rows of the data frame to ecdf. So I am wondering what makes this situation different from all the others I had... Anyway, point taken and working solution found, so all fine :-) Cheers, Michael -Original Message- From: Martin Maechler [mailto:maech...@stat.math.ethz.ch] Sent: Monday, 8 June 2015 16:43 To: Meyners, Michael Cc: r-help@r-project.org Subject: Re: [R] mismatch between match and unique causing ecdf (well, approxfun) to fail

Aehm, adding on this: I incorrectly *assumed* without testing that rounding would help; it doesn't:

ecdf(round(test2, 0))  # a rounding that is way too rough for my application...
# Error in xy.coords(x, y) : 'x' and 'y' lengths differ

Digging deeper: The initially mentioned call to unique() is not very helpful, as test2 is a data frame, so I get what I deserve, an unchanged data frame with 1 row. Still, the issue remains and can even be simplified further:

ecdf(data.frame(a = 3, b = 4))
# Empirical CDF
# Call: ecdf(data.frame(a = 3, b = 4))
# x[1:2] = 3, 4

works ok, but

ecdf(data.frame(a = 3, b = 3))
# Error in xy.coords(x, y) : 'x' and 'y' lengths differ

doesn't (same for a=b=1 or 2, so likely the same for any a=b). Instead,

ecdf(c(a = 3, b = 3))
# Empirical CDF
# Call: ecdf(c(a = 3, b = 3))
# x[1:1] = 3

does the trick. From ?ecdf, I get that x should be a numeric vector - apparently, I misused the function by applying it to a row of a data frame (i.e. a data frame with one row). It worked ok in all my other (dozens of) cases, though, but not for this particular one. A simple unlist() helps:

You were lucky. To use a one-row data frame instead of a numerical vector will typically *not* work unless ... well, you are lucky. No, do *not* pass data frame rows instead of numeric vectors.

ecdf(unlist(data.frame(a = 3, b = 3)))
# Empirical CDF
# Call: ecdf(unlist(data.frame(a = 3, b = 3)))
# x[1:1] = 3

Yet, I'm even more confused than before: in my other data, there were also duplicated values in the vector (1-row data frame), and it never caused any issue. For this particular example, it does. I must be missing something fundamental...

well.. I'm confused about why you are confused, but if you are thinking about passing rows of data frames as numeric vectors, this means you are sure that your data frame only contains classical numbers (no factors, no 'Date's, no ...). In such a case, transform your data frame to a numerical matrix *once*, preferably using data.matrix(d.fr) instead of just as.matrix(d.fr), but in this case it should not matter. Then *check* the result and then work with that matrix from then on. All other things probably will continue to leave you confused .. ;-) Martin Maechler, ETH Zurich __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
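A short sketch of the resolution discussed in this thread: coerce the one-row data frame to a numeric vector, or convert the whole frame once with data.matrix() as Martin suggests, before calling ecdf():

df <- data.frame(a = 3, b = 3)
Fn <- ecdf(unlist(df[1, ]))   # or: ecdf(as.numeric(df[1, ]))
Fn(3)                         # 1, as expected

m <- data.matrix(df)          # convert once, then work with rows of the matrix
ecdf(m[1, ])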
Re: [R] mismatch between match and unique causing ecdf (well, approxfun) to fail
Aehm, adding on this: I incorrectly *assumed* without testing that rounding would help; it doesn't:

ecdf(round(test2, 0))  # a rounding that is way too rough for my application...
# Error in xy.coords(x, y) : 'x' and 'y' lengths differ

Digging deeper: The initially mentioned call to unique() is not very helpful, as test2 is a data frame, so I get what I deserve, an unchanged data frame with 1 row. Still, the issue remains and can even be simplified further:

ecdf(data.frame(a = 3, b = 4))
# Empirical CDF
# Call: ecdf(data.frame(a = 3, b = 4))
# x[1:2] = 3, 4

works ok, but

ecdf(data.frame(a = 3, b = 3))
# Error in xy.coords(x, y) : 'x' and 'y' lengths differ

doesn't (same for a=b=1 or 2, so likely the same for any a=b). Instead,

ecdf(c(a = 3, b = 3))
# Empirical CDF
# Call: ecdf(c(a = 3, b = 3))
# x[1:1] = 3

does the trick. From ?ecdf, I get that x should be a numeric vector - apparently, I misused the function by applying it to a row of a data frame (i.e. a data frame with one row). It worked ok in all my other (dozens of) cases, though, but not for this particular one. A simple unlist() helps:

ecdf(unlist(data.frame(a = 3, b = 3)))
# Empirical CDF
# Call: ecdf(unlist(data.frame(a = 3, b = 3)))
# x[1:1] = 3

Yet, I'm even more confused than before: in my other data, there were also duplicated values in the vector (1-row data frame), and it never caused any issue. For this particular example, it does. I must be missing something fundamental... Michael -Original Message- From: Meyners, Michael Sent: Monday, 8 June 2015 12:02 To: 'r-help@r-project.org' Subject: mismatch between match and unique causing ecdf (well, approxfun) to fail

All, I encountered the following issue with ecdf, which originally occurred on a vector of length 10,000, but I have been able to reduce it to a minimal reproducible example (just to avoid questions why I'd want to do this for a vector of length 2...):

test2 = structure(list(X817 = 3.39824670255344, X4789 = 3.39824670255344),
  .Names = c("X817", "X4789"), row.names = 74L, class = "data.frame")
ecdf(test2)
# Error in xy.coords(x, y) : 'x' and 'y' lengths differ

In an attempt to track this down, it occurs that

unique(test2)
#       X817    X4789
# 74 3.398247 3.398247

while

match(test2, unique(test2))
# [1] 1 1

matches both values to the first one. This causes a hiccup in the call to ecdf, as this uses (an equivalent to) a call to approxfun with x = test2 and y = cumsum(tabulate(match(test2, unique(test2)))), the latter now containing one entry less than the former, so xy.coords fails. I understand that the issue should be somehow related to FAQ 7.31, but I would have hoped that unique and match would be using the same precision, and hence both or neither would consider the two values identical, but not that match does while unique doesn't. Last but not least, it doesn't really cause an issue on my end (other than breaking my code, and hence a loop, at the first place...); rounding will help w/o noteworthy changes to the outcome, so no need to propose a workaround :-) I'd rather like to raise the issue and learn whether there is a purpose for this behavior, and/or whether there is a generic fix to this, or whether I am completely missing something.
Version info (under Windows 7):

R version 3.2.0 (2015-04-16) -- "Full of Ingredients"
Platform: x86_64-w64-mingw32/x64 (64-bit)

Cheers, Michael __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] mismatch between match and unique causing ecdf (well, approxfun) to fail
All, I encountered the following issue with ecdf, which originally occurred on a vector of length 10,000, but I have been able to reduce it to a minimal reproducible example (just to avoid questions why I'd want to do this for a vector of length 2...):

test2 = structure(list(X817 = 3.39824670255344, X4789 = 3.39824670255344),
  .Names = c("X817", "X4789"), row.names = 74L, class = "data.frame")
ecdf(test2)
# Error in xy.coords(x, y) : 'x' and 'y' lengths differ

In an attempt to track this down, it occurs that

unique(test2)
#       X817    X4789
# 74 3.398247 3.398247

while

match(test2, unique(test2))
# [1] 1 1

matches both values to the first one. This causes a hiccup in the call to ecdf, as this uses (an equivalent to) a call to approxfun with x = test2 and y = cumsum(tabulate(match(test2, unique(test2)))), the latter now containing one entry less than the former, so xy.coords fails. I understand that the issue should be somehow related to FAQ 7.31, but I would have hoped that unique and match would be using the same precision, and hence both or neither would consider the two values identical, but not that match does while unique doesn't. Last but not least, it doesn't really cause an issue on my end (other than breaking my code, and hence a loop, at the first place...); rounding will help w/o noteworthy changes to the outcome, so no need to propose a workaround :-) I'd rather like to raise the issue and learn whether there is a purpose for this behavior, and/or whether there is a generic fix to this, or whether I am completely missing something.

Version info (under Windows 7):

R version 3.2.0 (2015-04-16) -- "Full of Ingredients"
Platform: x86_64-w64-mingw32/x64 (64-bit)

Cheers, Michael __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R / JMP interface
Not sure about JMP 11, but remember that JMP 10 did not run with R version >= 3.0.0. It depends a bit on the changes that come with new R versions; with JMP 10, several versions of the 2.x series were compatible even though JMP officially only supported earlier versions. I had hoped that with JMP 11 supporting R 3.0.1, this would work for a while for the following R versions, but that's apparently not the case; I haven't checked recently. You may have to downgrade to (well, have a parallel install of) R 3.0.1 or earlier to get this working; that worked for me in the past, though it is admittedly not optimal. JMP will always be behind the latest R versions due to the extended testing phase before they release their new version -- R typically releases 2 versions or so during this testing phase alone... Alternatively, you might want to find out what changes in R render it incompatible with JMP and reverse these changes for your own R installation if possible -- that's certainly beyond my capabilities... My understanding is that it is not to be expected that R 3.1 will ever be supported by JMP 11; you'd likely have to wait for JMP 12; I was once told that they do not make updates in this regard with their within-major-release patches, though it would certainly be desirable from my pov... HTH, Michael -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Samuel J Gardner Sent: Wednesday, 7 May 2014 18:20 To: Robert Douglas Kinley; r-help@r-project.org Subject: Re: [R] R / JMP interface https://communities.sas.com/message/199936 - see link. It looks like R 3.1 may not yet be supported with JMP 11. _ From: Robert Douglas Kinley Sent: Wednesday, May 07, 2014 12:18 PM To: r-help@r-project.org Cc: Samuel J Gardner Subject: R / JMP interface hi useRs, I am trying out the facility to call R code from JMP. Details: R 3.1.0, JMP 11.1.1, Windows 7 enterprise, all 64 bit. The test script from the JMP help pages falls over at the first line:

R Init();

giving the error message: "The installed version of R cannot be used. The entry point ConsoleIob could not be located." Has anyone else come across this - and ideally, has anyone got a solution? (I have submitted a request to JMP/SAS help too). cheers Bob [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Add constraints to rbinom
You don't need a constraint (rbinom won't give x > n), but you need to make sure you are using the n you want to use: try

x <- cbind(x, rbinom(300, n[i], p[i]))  # mind the [i] after the n

at the respective line. Furthermore, you need to remove one transformation of x to make sure you divide by the right n; try

rate <- t(x/n)

I think that should do the trick. I didn't check the remainder of the code, though. HTH, Michael -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Anamika Chaudhuri Sent: Friday, 28 June 2013 04:06 To: r-help@r-project.org Subject: [R] Add constraints to rbinom Hi: I am trying to generate beta-binomial random variables and then calculate binomial exact confidence intervals for the rate. I was wondering, if I needed to add a constraint such that x <= n, how would I go about it? The reason is that I am getting data where x > n, which gives a rate > 1. Here's my code:

set.seed(111)
k <- 63
x <- NULL
p <- rbeta(k, 3, 3)  # so that the mean nausea rate is alpha/(alpha+beta)
min <- 10
max <- 60
n <- as.integer(runif(k, min, max))
for (i in 1:k)
  x <- cbind(x, rbinom(300, n, p[i]))
x <- t(x)
rate <- t(t(x)/n)
se_rate <- sqrt(rate*(1-rate)/n)
# Exact Confidence Interval
l_cl_exact <- qbeta(.025, x, n-x+1)
u_cl_exact <- qbeta(.975, x+1, n-x)
for (i in 1:63) {
  for (j in 1:300) {
    if (x[i,j] == 0) {
      l_cl_exact[i,j] <- 0
      u_cl_exact[i,j] <- u_cl_exact[i,j]
    } else if (x[i,j] == n[i]) {
      l_cl_exact[i,j] <- l_cl_exact[i,j]
      u_cl_exact[i,j] <- 1
    } else {
      l_cl_exact[i,j] <- l_cl_exact[i,j]
      u_cl_exact[i,j] <- u_cl_exact[i,j]
    }
    # print(c(i,j))
  }
}

Really appreciate any help. Thanks, Anamika __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
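For completeness, here are the two corrections from the reply assembled back into the quoted skeleton (same seed and constants as in the post); with n[i] inside rbinom, no count can exceed its own n, so no constraint is needed:

set.seed(111)
k <- 63
p <- rbeta(k, 3, 3)
n <- as.integer(runif(k, 10, 60))
x <- NULL
for (i in 1:k) x <- cbind(x, rbinom(300, n[i], p[i]))  # n[i], not n
x <- t(x)                 # one row per event
rate <- t(x/n)            # row i is divided by n[i]
stopifnot(all(rate <= 1)) # no rate can exceed 1 anymore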
Re: [R] Anova und Tukey HSD
Jochen, a) this is an English-speaking mailing list; other languages are not encouraged, nor will they typically generate a lot of replies... b) your code is fine, so this is not an R issue; you are rather stuck with some of the stats background -- you might want to see a friendly local statistician for advice :-) or try a stats mailing list (see e.g. the posting guide) c) your interpretation is ok -- if the CI does not contain 0, you have significance at the respective level. Alternatively, you might also use the p value (column "p adj") and compare that to your significance level (and it seems odd that any search engine might fail to give you some links on the relationship of p values and CIs -- I got plenty at first try) d) In brief: you are doing ~100 (rough estimate) paired comparisons. With that, you can expect TukeyHSD to be pretty conservative, and it does not come as a surprise that you are unable to identify pairs of different groups, even less so as the p value from the F test is not extremely small (~3%; whatever "extremely small" may mean in this case). Google also easily gets me to this site (and many other useful ones): http://www.pmean.com/05/TukeyTest.html which might provide some further guidance. Net, tough luck to encounter such a situation, but apparently none of the group differences is strong enough to still show up as significant after correcting for multiplicity (which is certainly called for) in this case where you have an inflated number of significance tests. HTH, Michael -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Jochen Schreiber Sent: Monday, 13 May 2013 14:25 To: r-help@r-project.org Subject: [R] Anova und Tukey HSD Hello everyone, I first use an ANOVA to check whether a significant difference exists in the data. Afterwards, I want to use TukeyHSD to find out between which groups the difference lies. However, I do not know how to interpret the data.
Hier erstmal mein R Code: [code] Value-c(-0.9944999814033508,-0.3585381469727,0.7063000202178955,- 1.77435803833,- 1.080299973487854,0.3055071525574,1.849046325684,- 0.412440395355,0.5827999711036682,1.750669482422,- 6.693999767303467,-0.877943869019,- 1.3408000469207764,1.2560999393463135,- 0.1004081062317,1.849046325684,- 0.331905722046,0.4957999885082245,0.877943869019,0.73879998922 34802,0.877943869019,0.9154000282287598,0.877943869019,0.70630 00202178955,-1.3408000469207764,0.7063000202178955,- 0.331905722046,-1.6448999643325806,0.412440395355,- 1.6448999643325806,- 0.877943869019,0.7487000226974487,0.4399000108242035,1.8490463 25684,-1.6448999643325806,-2.4323999881744385,1.2265000343322754,- 0.4957999885082245,-9.999899864196777,-1.750669482422,- 1.6448999643325806,-9.999899864196777,0.877943869019,- 5.06279993057251,0.877943869019,-2.967745776367,- 5.06279993057251,-6.693999767303467,- 1.0990500450134277,0.9944999814033508,-0.467745776367,- 0.3585381469727,- 9.999899864196777,0.5827999711036682,0.7487000226974487,0.738799989223 4802,-0.2533000111579895,-9.999899864196777,- 1.0363999605178833,0.3055071525574,-1.174523162842,- 0.806410490417,-9.999899864196777,-0.9944999814033508,- 2.478300094604492,-0.150930858612,0.4957999885082245,- 4.571800231933594,-6.324900150299072,-0.38530001044273376,- 1.3408000469207764,-5.93179988861084,-6.693999767303467,- 2.967745776367,0.877943869019,-0.05020405311584,- 1.77435803833,-0.150930858612,-0.23725003004074097,- 0.643268528748,1.2560999393463135,- 0.1004081062317,0.4399000108242035,- 0.7063000202178955,0.9154000282287598,- 0.2181814033508,1.2265000343322754,- 0.412440395355,0.1764581741333,-1.4758000373840332,- 0.9944999814033508,-1.080299973487854,-0.643268528748,- 9.999899864196777,-2.0536999702453613,- 0.2181814033508,0.7487000226974487,0.02510202655792,- 1.0363999605178833,-0.05020405311584,- 0.7387999892234802,0.4957999885082245,-1.4758000373840332,- 0.7063000202178955,0.1764581741333,-5.06279993057251,- 0.643268528748,-1.4758000373840332,-0.9944999814033508,- 0.2533000111579895,0.1764581741333,- 0.331905722046,0.6776500344276428,0.3055071525574,- 0.05020405311584,0.5827999711036682,1.2560999393463135,- 0.4957999885082245,-0.38530001044273376,0.9944999814033508,- 2.4323999881744385,1.126338964844,- 0.9944999814033508,1.750669482422,1.080299973487854,- 0.7387999892234802,-1.3408000469207764,0.612820980835,- 2.0536999702453613,0.7063000202178955,-0.806410490417,- 0.877943869019,-0.05020405311584,-2.967745776367,- 0.877943869019,-2.0536999702453613,-1.3408000469207764,- 1.3408000469207764,-0.38530001044273376,0.7063000202178955,- 9.999899864196777,-
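The quoted code and data above ("Hier erstmal mein R Code" - "Here is my R code first") are cut off in the archive, so to illustrate points (c) and (d) of the reply -- reading significance off the confidence intervals or, equivalently, off the "p adj" column of TukeyHSD -- here is a minimal sketch on a built-in dataset (warpbreaks), standing in for the poster's data:

fit <- aov(breaks ~ wool * tension, data = warpbreaks)
tk <- TukeyHSD(fit, conf.level = 0.95)
tk$tension                     # columns: diff, lwr, upr, p adj, one row per pair
tk$tension[, "p adj"] < 0.05   # TRUE exactly when the CI [lwr, upr] excludes 0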
Re: [R] ks.test and wilcoxon.test results differ from other stat.packages
Impossible to say w/o a reproducible example, but to start with, let me suggest looking at the exact= (both functions) and correct= (wilcox.test) arguments. Experience shows that some change of the default settings allows you to reproduce results from other software (and the help pages will explain what the settings do; by that, you might also learn what your other tools actually calculate if you have no documentation at hand). HTH, Michael -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Alexander Favorov Sent: Friday, 1 February 2013 09:00 To: r-help@r-project.org Subject: [R] ks.test and wilcoxon.test results differ from other stat.packages Probably it's obvious info, but I have not found anything in the R FAQ concerning this feature/bug. The results of ks.test and wilcox.test (in the Mann-Whitney version, paired = FALSE) don't coincide with the results from other statistical packages, e.g. Statistica, Medcalc, and (as for the MW test) from the numerous online MW tests. E.g.

Statistica: p-value = 0.0435353
Medcalc:    p-value = 0.0435354
R:          p-value = 0.04635

If I want to obtain the result of a test just once, it does not matter. But what should I do if I want to perform Monte-Carlo simulations and need 1 or even 100 000 p-values, and will then build some distribution and then use the results of Statistica? Whatever, the discrepancy bothers me. Are there some alternative packages for non-parametric statistics that produce results comparable with the other program packages? Or, probably, there exists some environment variable to perform the calculations in a more common way? Thank you! Sasha. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
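A sketch of the settings to experiment with, on made-up data: many packages report the asymptotic p value, with or without continuity correction, whereas R defaults to the exact computation for small, tie-free samples:

set.seed(1)
x <- rnorm(12); y <- rnorm(12, 0.8)
wilcox.test(x, y)$p.value                                 # R default: exact (small n, no ties)
wilcox.test(x, y, exact = FALSE, correct = TRUE)$p.value  # normal approximation, corrected
wilcox.test(x, y, exact = FALSE, correct = FALSE)$p.value # normal approximation, uncorrected
ks.test(x, y, exact = FALSE)$p.value                      # asymptotic Kolmogorov-Smirnov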
[R] unexpected (?) behavior of sort=TRUE in merge function
All, I realize from the archive that the sort argument in merge has been subject to discussion before, though I couldn't find an explanation for this behavior. I tried to simplify this to (kind of) minimal code from a real example to the following (and I have no doubts that there are smart people around achieving the same with smarter code :-)). I'm running R 2.15.1 64bit under MS Windows 7, full session info below. I do have a list with two dataframes:

test <- list(structure(list(product = structure(c(1L, 2L, 3L, 4L, 5L, 6L,
  1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L),
  .Label = c("Y1", "Y2", "G", "F", "L", "K"), class = "factor"),
  cong = c(-1, -1, -1, -1, -1, -1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1,
  11, 11, 11, 11, 11, 11), x = c(5.85714285714286, 5.9, 7.3,
  5.85714285714286, 7.27272727272727, 4.375, 3.875, 2.5, 4.8, 3.625, 6.25,
  4.71428571428571, 3.53571428571429, 4.638889, 4.42424242424242,
  4.78260869565217, 4.875, 3.80434782608696, 5.73170731707317,
  5.41935483870968, 5.78125, 6.30188679245283, 6.87755102040816,
  5.56603773584906)), .Names = c("product", "cong", "x"),
  row.names = c(NA, -24L), class = "data.frame"),
  structure(list(product = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L,
  4L, 5L, 6L), .Label = c("Y1", "Y2", "G", "F", "L", "K"), class = "factor"),
  cong = c(-1, -1, -1, -1, -1, -1, 0, 0, 0, 0, 0, 0),
  x = c(3.04347826086957, 4.18181818181818, 3.75, 4.31578947368421, 4.5,
  3.73913043478261, 4.8876404494382, 5.20792079207921, 5.68,
  5.70526315789474, 6.38636363636364, 4.96703296703297)),
  .Names = c("product", "cong", "x"), row.names = c(NA, -12L),
  class = "data.frame"))

The dataframes are pretty much the same but for the values in the x column and the fact that the second one has only half as many observations, missing the second half of the expand.grid if you like. Now if I run

lapply(test, function(x) merge(x, expand.grid(product = c("Y1", "Y2", "G",
  "F", "L", "K"), cong = c(-1, 0, 1, 11)), all = T, sort = TRUE))
# sort=TRUE is the default, so could be omitted

this sorts the first dataframe according to the labels of factor product, while for the second one the order is maintained from the first dataframe (x) passed to merge (which is the difference that I could not find being documented). Now I run the same code with sort=FALSE instead:

lapply(test, function(x) merge(x, expand.grid(product = c("Y1", "Y2", "G",
  "F", "L", "K"), cong = c(-1, 0, 1, 11)), all = T, sort = FALSE))

The results are at least consistent and fulfill my needs (this is, btw, not unexpected from the documentation). Note that I get exactly the same behavior if I apply merge subsequently to test[[1]] and test[[2]], so it is not an issue with lapply. (I realize that my dataframes are ordered by levels of product, but using test[[2]] <- test[[2]][sample(12),] and applying the same code as above reveals that indeed no sorting is done but the order is maintained from the first dataframe.) I have a working solution for myself, so I'm not after any advice on how to achieve the sorting -- I'd just like to better understand what's going on here and/or what I might have missed in the documentation or in the list archives.
Thanks in advance, Michael

Session info:
R version 2.15.1 (2012-06-22)
Platform: x86_64-pc-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252
[3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C
[5] LC_TIME=German_Germany.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

loaded via a namespace (and not attached):
[1] tools_2.15.1

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] unexpected (?) behavior of sort=TRUE in merge function
Rui, Thanks for looking into this. I apologize, I should've added my output; maybe it looks different on my machine than on others. I also should have made my question more explicit: I'm not looking for a solution to get the sorting one way or another, I have that already. I rather want to understand why the same code behaves differently on two very similar datasets (one just having fewer rows, see below). The first call gives the following for me:

lapply(test, function(x) merge(x, expand.grid(product = c("Y1", "Y2", "G",
  "F", "L", "K"), cong = c(-1, 0, 1, 11)), all = T, sort = TRUE))

[[1]]
   product cong        x
1        F   -1 5.857143
2        F    0 3.625000
3        F    1 4.782609
4        F   11 6.301887
5        G   -1 7.30
6        G    0 4.80
7        G    1 4.424242
8        G   11 5.781250
9        K   -1 4.375000
10       K    0 4.714286
11       K    1 3.804348
12       K   11 5.566038
13       L   -1 7.272727
14       L    0 6.25
15       L    1 4.875000
16       L   11 6.877551
17      Y1   -1 5.857143
18      Y1    0 3.875000
19      Y1    1 3.535714
20      Y1   11 5.731707
21      Y2   -1 5.90
22      Y2    0 2.50
23      Y2    1 4.638889
24      Y2   11 5.419355

[[2]]
   product cong        x
1       Y1   -1 3.043478
2       Y1    0 4.887640
3       Y1    1       NA
4       Y1   11       NA
5       Y2   -1 4.181818
6       Y2    0 5.207921
7       Y2    1       NA
8       Y2   11       NA
9        G   -1 3.75
10       G    0 5.68
11       G    1       NA
12       G   11       NA
13       F   -1 4.315789
14       F    0 5.705263
15       F    1       NA
16       F   11       NA
17       L   -1 4.50
18       L    0 6.386364
19       L    1       NA
20       L   11       NA
21       K   -1 3.739130
22       K    0 4.967033
23       K    1       NA
24       K   11       NA

So different from what you may have observed, here the first data set [[1]] is sorted by the labels of product, not by value. As you correctly stated, Y1 is coded as 1, Y2 as 2, etc., but the first rows are for F, followed by G etc. The second [[2]] is sorted by level (value). So I have different behavior on very similar looking data sets, and hence to me at least one of those cannot be right according to the documentation (but I agree with you that the second is correct according to the help). In my larger example, it seems as if data sets which do not originally have all combinations of product and cong anyway are sorted like [[2]], and those that are complete (all 24 combinations occur) are sorted like [[1]] is, which to me is still unexpected. Hope this clarifies my question. Any thoughts appreciated. Michael -Original Message- From: Rui Barradas [mailto:ruipbarra...@sapo.pt] Sent: Tuesday, 4 September 2012 14:01 To: Meyners, Michael Cc: r-help Subject: Re: [R] unexpected (?) behavior of sort=TRUE in merge function Hello, Inline. On 04-09-2012 12:24, Meyners, Michael wrote: All, I realize from the archive that the sort argument in merge has been subject to discussion before, though I couldn't find an explanation for this behavior. I tried to simplify this to (kind of) minimal code from a real example to the following (and I have no doubts that there are smart people around achieving the same with smarter code :-)). I'm running R 2.15.1 64bit under MS Windows 7, full session info below.
I do have a list with two dataframes: test - list(structure(list(product = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L), .Label = c(Y1, Y2, G, F, L, K), class = factor), cong = c(-1, -1, -1, -1, -1, -1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 11, 11, 11, 11, 11, 11), x = c(5.85714285714286, 5.9, 7.3, 5.85714285714286, 7.27272727272727, 4.375, 3.875, 2.5, 4.8, 3.625, 6.25, 4.71428571428571, 3.53571428571429, 4.638889, 4.42424242424242, 4.78260869565217, 4.875, 3.80434782608696, 5.73170731707317, 5.41935483870968, 5.78125, 6.30188679245283, 6.87755102040816, 5.56603773584906)), .Names = c(product, cong, x), row.names = c(NA, -24L), class = data.frame), structure(list(product = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L), .Label = c(Y1, Y2, G, F, L, K), class = factor), cong = c(-1, -1, -1, -1, -1, -1, 0, 0, 0, 0, 0, 0), x = c(3.04347826086957, 4.18181818181818, 3.75, 4.31578947368421, 4.5, 3.73913043478261, 4.8876404494382, 5.20792079207921, 5.68, 5.70526315789474, 6.38636363636364, 4.96703296703297)), .Names = c(product, cong, x), row.names = c(NA, -12L), class = data.frame)) The dataframes are pretty much the same but for the values in the x- column and the fact that the second one has only half as many observations, missing the second half of the expand.grid if you like. Now if I run lapply(test, function(x) merge(x, expand.grid(product=c(Y1, Y2, G, F, L, K), cong=c(-1,0,1,11)), all=T
Re: [R] unexpected (?) behavior of sort=TRUE in merge function
Rui, yes, without all=T it works fine, but of course there is no point in the whole exercise if I'd drop that, as print(test) would do the same, unless I have other values of product or cong in any dataset, which I haven't. :-) The purpose of the merge is to have all combinations of the levels of product and cong in each dataframe in my list -- there might be smarter ways, but it does the trick, and I need a unified setup (layout, size and sorting of the data) to ease the following steps in my code. Of course, I could achieve that easily by sorting the data subsequently, so there are multiple ways to get what I want to have. However, the purpose being a bit beyond this post, it's really about the behavior here on data sets that look so similar, and about the fact that one of those is not treated like it should be according to documentation). Thanks again for taking the time to reply. Cheers, Michael -Original Message- From: Rui Barradas [mailto:ruipbarra...@sapo.pt] Sent: Dienstag, 4. September 2012 16:58 To: Meyners, Michael Cc: r-help Subject: Re: [R] unexpected (?) behavior of sort=TRUE in merge function Hello, You're right I had missed the point, sorry. I can't see a reason why that behavior, but it seems to have to do with all = T, remove it and the problem is gone. But that's probably not what you want. NA's issue? Rui Barradas Em 04-09-2012 15:17, Meyners, Michael escreveu: Rui, Thanks for looking into this. I apologize, I should've added my output, maybe it looks differently on my machine than on others. I also should have made my question more explicit: I'm not looking for a solution to get the sorting one way or another, I have that already. I rather want to understand why the same code behaves differently on two very similar datasets (one just having less rows, see below). The first call gives the following for me: lapply(test, function(x) merge(x, expand.grid(product=c(Y1, Y2, G, F, L, K), cong=c(-1,0,1,11)), all=T, sort=TRUE)) [[1]] product congx 1F -1 5.857143 2F0 3.625000 3F1 4.782609 4F 11 6.301887 5G -1 7.30 6G0 4.80 7G1 4.424242 8G 11 5.781250 9K -1 4.375000 10 K0 4.714286 11 K1 3.804348 12 K 11 5.566038 13 L -1 7.272727 14 L0 6.25 15 L1 4.875000 16 L 11 6.877551 17 Y1 -1 5.857143 18 Y10 3.875000 19 Y11 3.535714 20 Y1 11 5.731707 21 Y2 -1 5.90 22 Y20 2.50 23 Y21 4.638889 24 Y2 11 5.419355 [[2]] product congx 1 Y1 -1 3.043478 2 Y10 4.887640 3 Y11 NA 4 Y1 11 NA 5 Y2 -1 4.181818 6 Y20 5.207921 7 Y21 NA 8 Y2 11 NA 9G -1 3.75 10 G0 5.68 11 G1 NA 12 G 11 NA 13 F -1 4.315789 14 F0 5.705263 15 F1 NA 16 F 11 NA 17 L -1 4.50 18 L0 6.386364 19 L1 NA 20 L 11 NA 21 K -1 3.739130 22 K0 4.967033 23 K1 NA 24 K 11 NA So different from what you may have observed, here the first data set [[1]] is sorted by label of product, not by value. As you correctly stated, Y1 is coded as 1, Y2 as 2, etc., but the first rows are for F, followed by G etc. The second [[2]] is sorted by level (value). So I have different behavior on very similar looking data sets, and hence to me at least one of those cannot be right according to documentation (but I agree with you that the second is correct according to the help). In my larger example, it seems as if data sets which do not originally have all combinations of product and cong anyway are sorted like [[2]], and those that are complete (all 24 combinations occur) are sorted like [[1]] is, which to me is still unexpected. Hope this clarifies my question. Any thoughts appreciated. 
Michael -Original Message- From: Rui Barradas [mailto:ruipbarra...@sapo.pt] Sent: Dienstag, 4. September 2012 14:01 To: Meyners, Michael Cc: r-help Subject: Re: [R] unexpected (?) behavior of sort=TRUE in merge function Hello, Inline. Em 04-09-2012 12:24, Meyners, Michael escreveu: All, I realize from the archive that the sort argument in merge has been subject to discussion before, though I couldn't find an explanation for this behavior. I tried to simplify this to (kind of) minimal code from a real example to the following (and I have no doubts that there are smart people around achieving the same with smarter code :-)). I'm running R 2.15.1 64bit under MS Windows 7, full session info
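As a footnote to this thread: the workaround hinted at above ("I could achieve that easily by sorting the data subsequently") can be spelled out as follows, using the test list defined at the top of the thread; the explicit order() call makes the result independent of merge()'s sort= behavior:

res <- lapply(test, function(x)
  merge(x, expand.grid(product = c("Y1", "Y2", "G", "F", "L", "K"),
                       cong = c(-1, 0, 1, 11)),
        all = TRUE, sort = FALSE))
## explicit, predictable ordering: by factor level (not label), then by cong
res <- lapply(res, function(m) m[order(m$product, m$cong), ])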
Re: [R] How to measure level of similarity of two data frames
Kel, in addition, and depending on how you define similarity, you might want to look into the RV coefficient as a measure of it (it is actually related to a correlation, so similarity would rather mean similar information, though not necessarily small Euclidean distance); coeffRV in FactoMineR would be one option to determine it. HTH, Michael -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Lamke Sent: Saturday, 26 May 2012 20:05 To: r-help@r-project.org Subject: [R] How to measure level of similarity of two data frames Hi group, I've been thinking of calculating the Euclidean distance between corresponding columns of two data frames that each consist of standardized numerical columns. However, I don't know if there's a way of summarizing the overall distance by some kind of metric. If anyone knows a proper way of doing so and/or a package, I would greatly appreciate your suggestions. Thanks very much! Kel -- View this message in context: http://r.789695.n4.nabble.com/How-to-measure-level-of-similarity-of-two-data-frames-tp4631466.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
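A minimal sketch of the coeffRV() suggestion on random stand-in matrices; the function expects two sets of variables measured on the same observations and returns the RV coefficient together with a significance test (see its help page for the components):

library(FactoMineR)
set.seed(1)
X <- scale(matrix(rnorm(100), nrow = 20))  # two data sets on the same 20 rows
Y <- scale(matrix(rnorm(100), nrow = 20))
coeffRV(X, Y)                              # RV coefficient plus a p-value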
Re: [R] Non-parametric test for repeated measures and post-hoc single comparisons in R?
No, the authors are correct: the individuals (i.e. the 17 individuals) you have need to be independent (i.e. no correlation between them, let alone any individual running through your temporal experiment more than once, as indicated in the citation), while the *observations* are of course dependent as they are within the same subject (individual -- they have the same subject effect). Think of Friedman as a non-parametric 2-way ANOVA with one of the factors being subject; observations of the same subject are dependent, but once you include the subject effect, the errors are assumed to be independent (which implies that subjects need to be independent and should, e.g., not work on the assessment together). The imprecision is in your interpretation of individuals vs. observations. HTH, Michael

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-project.org] On Behalf Of saschav...@gmail.com
Sent: Montag, 20. Februar 2012 09:59
To: peter dalgaard
Cc: r-help@r-project.org
Subject: Re: [R] Non-parametric test for repeated measures and post-hoc single comparisons in R?

Thanks, I got it! (And I think I should have googled what "replicated" means!) However, then Bortz, Lienert, Boehnke are imprecise, if not wrong: "Der Friedman-Test setzt voraus, dass die N Individuen wechselseitig unabhängig sind, dass also nicht etwa ein und dasselbe Individuum zweimal oder mehrmals im Untersuchungsplan auftritt" (p. 271). Which I (hope to) translate as: "The Friedman test requires the N individuals to be mutually independent, which means that one individual cannot occur twice or more times in the research design." *S*

On 19.02.12 22:04, peter dalgaard wrote:

Repeated measures means that you have multiple measurements on the same individual. Usually, the same person measured at different time points. So if you have N individuals and T times, then you can place your observations in an N*T layout. In this layout, you can have 1 observation per cell or R > 1 observations. In the former case, the design is referred to as unreplicated. Got it? -pd

On Feb 19, 2012, at 19:25 , saschav...@gmail.com wrote:

Some attribute x from 17 individuals was recorded repeatedly on 6 time points using a Likert scale with 7 distractors. Which statistical test(s) can I apply to check whether the changes along the 6 time points were significant?

set.seed(123)
x <- matrix(sample(1:7, 17*6, replace = TRUE), nrow = 17, byrow = TRUE,
            dimnames = list(1:17, paste('T', 1:6, sep='')))

I found the Friedman test and the Quade test for testing the overall hypothesis.

friedman.test(x)
quade.test(x)

However, the R help files, my text books (Bortz, Lienert and Boehnke, 2008; Köhler, Schachtel and Voleske, 2007; both German), and the Wikipedia texts differ in what they propose as requirements for the tests. R says that the data need to be unreplicated. I read 'unreplicated' as 'not-repeated', but is that right? If so, the example in friedman.test(), in contrast, appears indeed to use repeated measures. Yet, Wikipedia says the contrary, that is to say, the test is good especially if the data represent repeated measures. The text books say either (in the same paragraph, which is very confusing). What is right? In addition, what would be an appropriate test for post-hoc single comparisons, indicating which column differs from the others significantly?

Bortz, Lienert, Boehnke (2008). Verteilungsfreie Methoden in der Biostatistik. Berlin: Springer
Köhler, Schachtel, Voleske (2007).
Biostatistik: Eine Einführung für Biologen und Agrarwissenschaftler. Berlin: Springer

-- Sascha Vieweg, saschav...@gmail.com
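On the post-hoc question left open above, a hedged sketch (an illustration, not the thread's own recommendation): one common follow-up to a significant Friedman test is pairwise paired Wilcoxon tests between time points with a multiplicity correction, reusing the matrix x from the post:

g <- rep(colnames(x), each = nrow(x))  # time point of each observation
pairwise.wilcox.test(as.vector(x), g, paired = TRUE,
                     p.adjust.method = "bonferroni")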
Re: [R] 2 sample wilcox.test != kruskal.test
The devil is in the details (and in the arguments in Lukasz' code). The defaults for the two functions are different: wilcox.test uses an exact test for your data (which is not available in kruskal.test, afaik), and uses the continuity correction if the normal approximation is requested (neither is available in kruskal.test). See the manual (in particular ?wilcox.test) for details, or the pertinent literature for the theoretical background. HTH, Michael

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-project.org] On Behalf Of Lukasz Reclawowicz
Sent: Wednesday, January 11, 2012 12:00
To: syrvn
Cc: r-help@r-project.org
Subject: Re: [R] 2 sample wilcox.test != kruskal.test

2012/1/11 syrvn <ment...@gmx.net>

Hi, thanks for your answer. Unfortunately I cannot reproduce your results. In my example the results still differ when I use your approach:

x <- c(10,11,15,8,16,12,20)
y <- c(10,14,18,25,28,30,35)
f <- as.factor(c(rep("a",7), rep("b",7)))
d <- c(x,y)
kruskal.test(x,y)

Try to compare wilcox.test and the right formula in kruskal.test; I got:

all.equal(wilcox.test(x, y, correct = F, exact = F)$p.value, kruskal.test(d ~ f)$p.value)
[1] TRUE

-- Miłego dnia
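To spell out the defaults mentioned above, a hedged illustration with the same x, y, d and f (note that with the tied value 10 in both samples, wilcox.test cannot compute an exact p value anyway and warns):

wilcox.test(x, y, exact = FALSE, correct = TRUE)$p.value   # with continuity correction
wilcox.test(x, y, exact = FALSE, correct = FALSE)$p.value  # without; this one
kruskal.test(d ~ f)$p.value                                # matches the line above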
Re: [R] Using a mathematical expression in sapply()
Dan, it depends on what you want to achieve. I suspect you just want to remove missing values before summing; if so, consider

sapply(x, sum, na.rm=TRUE)

instead. To make your code run, try

sapply(x, function(x) sum(!is.na(x)))

However, this would just count the number of non-missing values per subgroup, not the sum of the values. Similarly,

sapply(x, function(x) sum(x, na.rm=TRUE))

would follow the same rationale but give the sums (with missing values removed), i.e. the same results as the first line of code. I'd rather choose the first option than this one, though. Of course, you could also define a new function

sum1 <- function(x) sum(!is.na(x)) # or probably rather sum(x, na.rm=TRUE), depending on your needs

and then use

sapply(x, sum1)

but that seems a bit overkill, I'd say... HTH, Michael

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-project.org] On Behalf Of Dan Abner
Sent: Wednesday, January 04, 2012 14:41
To: r-help@r-project.org
Subject: [R] Using a mathematical expression in sapply()

Hello everyone, I have the following call to sapply() and error message. Is the most efficient way to deal with this to make sum(!is.na(x)) a function in a separate line prior to this call? If not, please advise.

N.Valid=sapply(x,sum(!is.na(x)))
Error in match.fun(FUN) : 'sum(!is.na(x))' is not a function, character or symbol

Thanks! Dan
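A self-contained toy example of the difference (the list x here is made up):

x <- list(a = c(1, 2, NA), b = c(4, NA, NA))
sapply(x, sum, na.rm = TRUE)           # sums without NAs: a = 3, b = 4
sapply(x, function(x) sum(!is.na(x)))  # counts of non-missing values: a = 2, b = 1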
Re: [R] generalized procrustes analysis without translation
I'm not aware of any, but if you really want this, it should be possible to modify the code of any of the functions you propose and delete the part doing the translation. I'm not sure that this is a good idea, though; either your matrices are *truly* centered, then it doesn't make a difference, or they are only centered in theory but not in practice due to random error, then the rotation and scaling will depend heavily on the deviation of the center of the matrix from the origin. Optimal scaling and rotation in GPA are independent from each other, so they might be (de-)selected depending on needs, but they depend on the translation, so it seems to make sense that you cannot deselect that part. You wouldn't do scaling of two variables without centering them around 0 either, would you? You may transform them back if required, but scaling without centering first does not appeal to me, same for rotations. HTH, Michael

From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Mark Heckmann
Sent: Wednesday, December 07, 2011 19:09
To: r-help@r-project.org
Subject: [R] generalized procrustes analysis without translation

I am searching for a GPA function/algorithm that allows me to omit the translation of matrices. procGPA in the shapes package does not have this option. I need scaling, rotation and reflection but no translation, as my matrices are naturally centered. I am not sure if GPA from FactoMineR might be used for this. There are no arguments allowing one to switch the above-mentioned transformations on or off, except for scaling. Frankly, I also do not understand how to set up the data for GPA properly, especially how to use the group argument. Does someone know another GPA function or has some code available? Thanks in advance, Mark

Mark Heckmann
Blog: www.markheckmann.de
R-Blog: http://ryouready.wordpress.com
Re: [R] Pearson chi-square test
Not sure what you want to test here with two matrices, but reading the manual helps here as well: "y: a vector; ignored if x is a matrix." x and y are matrices in your example, so it comes as no surprise that you get different results. On top of that, your manual calculation is not correct if you want to test whether two samples come from the same distribution (so don't be surprised if R still gives a different value...). HTH, Michael

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-project.org] On Behalf Of Michael Haenlein
Sent: Tuesday, September 27, 2011 12:45
To: r-help@r-project.org
Subject: [R] Pearson chi-square test

Dear all, I have some trouble understanding the chisq.test function. Take the following example:

set.seed(1)
A <- cut(runif(100), c(0.0, 0.35, 0.50, 0.65, 1.00), labels=FALSE)
B <- cut(runif(100), c(0.0, 0.25, 0.40, 0.75, 1.00), labels=FALSE)
C <- cut(runif(100), c(0.0, 0.25, 0.50, 0.80, 1.00), labels=FALSE)
x <- table(A,B)
y <- table(A,C)

When I calculate the test statistic by hand, following http://en.wikipedia.org/wiki/Pearson's_chi-square_test#Calculating_the_test-statistic, I get a value of approximately 75.9:

sum((x-y)^2/y)

But when I do chisq.test(x,y) I get a value of 12.2, while chisq.test(y,x) gives a value of 10.3. I understand that I must be doing something wrong here, but I'm not sure what. Thanks, Michael
Re: [R] Pearson chi-square test
Just for completeness: the manual calculation you'd want is most likely

sum((x-y)^2 / (x+y))

(that's one you can find on the Wikipedia link you provided). To get the same from chisq.test, try something like

chisq.test(data.frame(x,y)[,c(3,6)])

(there are surely smarter ways, but at least it works here). Note that something like chisq.test(as.vector(x), as.vector(y)) will give a different test, i.e. one based on a contingency table of x cross y. M.
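What the data.frame trick above amounts to, written out (a hedged sketch; this is the transpose of the same table, which gives the same chi-square statistic): stack the two sets of cell counts as the rows of a 2 x k table and test homogeneity.

tab <- rbind(B = as.vector(x), C = as.vector(y))  # 2 x 16 table of counts
chisq.test(tab)  # same statistic as chisq.test(data.frame(x,y)[,c(3,6)])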
Re: [R] Pearson chi-square test
I suspect that the chi-square test might not be appropriate, as you have constraints (same number of observations for A in both contingency tables). I further suspect that there is no test readily available for that, but I might be wrong. Maybe randomization tests could help here, but it would require a bit of thinking AND programming to accomplish that. chisq.test might give you an approximate solution, but I can't say how good this will be (and it might also depend on the data, btw). Best, Michael

From: Michael Haenlein
Sent: Tuesday, September 27, 2011 17:05
To: r-help@r-project.org
Cc: Meyners, Michael
Subject: RE: [R] Pearson chi-square test

Dear Michael, thanks very much for your answers! The purpose of my analysis is to test whether the contingency table x is different from the contingency table y. Or, to put it differently, whether there is a significant difference between the joint distributions AB and AC. Based on your answer I'm wondering whether the best way to do this is really a chisq.test? Or is there probably a different function or package I should use altogether? Thanks, Michael
Re: [R] library(SenoMineR)- Triangle Test Query
Vijayan, I cannot find an error in your code, but I had a look at the code of triangle.test -- unless I'm missing something, it contains a bug. If you study the way in which the matrix pref is updated, you find that the vector preference is compared to 1, 2 and 3 instead of "X", "Y" and "Z" as it should be. That way, some of the non-diagonal entries of pref will always be zero, irrespective of the data, which does not make sense. I think it should work if you modify the code accordingly. Alternatively, a quick patch (untested!) might be to code the preferences as 1, 2 and 3 instead of the letters (but I'm not sure whether this has any other implications). I CC the author of the function and maintainer of the package; he should correct me if needed or could otherwise update the code for the next release (I worked on SensoMineR 1.11). Hope this helps, Michael

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-project.org] On Behalf Of Vijayan Padmanabhan
Sent: Saturday, June 04, 2011 9:21
To: r-help@r-project.org
Subject: [R] library(SenoMineR)- Triangle Test Query

Dear R Group, I was trying to use the triangle.test function in SensoMineR and strangely I encounter an error in the output of the preference matrix from the analysis. To illustrate, please see the following dataframe of a design with the response and preference collected as shown below:

design <- structure(list(
  `Product X` = c(3, 1, 4, 2, 4, 2, 1, 3, 4, 2, 4, 2, 1, 3, 4, 2, 4, 2, 3, 1),
  `Product Y` = c(1, 1, 4, 4, 4, 3, 1, 1, 4, 4, 4, 3, 1, 1, 4, 4, 4, 3, 1, 1),
  `Product Z` = c(3, 2, 1, 2, 3, 3, 2, 3, 1, 2, 3, 3, 2, 3, 1, 2, 3, 3, 3, 2),
  Response = structure(c(1L, 2L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L),
    .Label = c("X", "Z"), class = "factor"),
  Preference = structure(c(1L, 3L, 1L, 1L, 1L, 2L, 3L, 1L, 1L, 1L, 1L, 2L, 3L, 1L, 1L, 1L, 1L, 2L, 1L, 2L),
    .Label = c("X", "Y", "Z"), class = "factor")),
  .Names = c("Product X", "Product Y", "Product Z", "Response", "Preference"),
  class = "data.frame",
  row.names = c("Panelist1.Test1", "Panelist1.Test2", "Panelist2.Test1", "Panelist2.Test2",
    "Panelist3.Test1", "Panelist3.Test2", "Panelist4.Test1", "Panelist4.Test2",
    "Panelist5.Test1", "Panelist5.Test2", "Panelist6.Test1", "Panelist6.Test2",
    "Panelist7.Test1", "Panelist7.Test2", "Panelist8.Test1", "Panelist8.Test2",
    "Panelist9.Test1", "Panelist9.Test2", "Panelist10.Test1", "Panelist10.Test2"))

If you were to investigate the above dataframe, you would find that for the comparison of Product 2 vs Product 3, the preference indicates product 3 is preferred over product 2 all the time.

## Read the output from the following script to see what I mean above:
subset(design, `Product X`==2 & `Product Y`==3 & `Product Z`==3)
## Output of the above would be:
                Product X Product Y Product Z Response Preference
Panelist3.Test2         2         3         3        X          Y
Panelist6.Test2         2         3         3        X          Y
Panelist9.Test2         2         3         3        X          Y

However, when I analyse the design with the answers and preferences using the following script, I get the $pref output which shows that product 2 is preferred over 3 all the time. Can somebody explain what is wrong in my script?

answer <- as.vector(design$Response)
preference <- as.vector(design$Preference)
triangle.test(design[,1:3], answer, preference)

## $pref output from the triangle.test function shows as follows:
$pref
  1 2 3 4
1 0 0 0 0
2 4 0 3 0
3 0 0 0 0
4 0 0 0 0

Any help in identifying what is going wrong here would be highly appreciated.
Regards, Vijayan Padmanabhan
Re: [R] library(SenoMineR)- Triangle Test Query
Unless I missed it, neither the OP nor the list was CC'd on this, so for anyone interested, I forward this solution (untested from my side) from the package maintainer. Not sure whether the file comes through, so I include the updated code in the message's body below. Cheers, Michael

** updated code for triangle.test

triangle.test <- function (design, answer, preference=NULL){
  answer = gsub("(\\w)", "\\U\\1", as.character(answer), perl=TRUE)
  labprod = levels(as.factor(c(as.character(design[,1]), as.character(design[,2]), as.character(design[,3]))))
  nbprod = length(labprod)
  nb.answer = nb.good = pref = matrix(0, nbprod, nbprod)
  for (i in 1:nrow(design)){
    for (j in 1:nbprod){
      if (labprod[j] == design[i,1]) i1 = j
      if (labprod[j] == design[i,2]) i2 = j
      if (labprod[j] == design[i,3]) i3 = j
    }
    if (i1!=i2) nb.answer[i1,i2] = nb.answer[i1,i2]+1
    if (i1==i2) nb.answer[i1,i3] = nb.answer[i1,i3]+1
    if ((i1==i2) & (answer[i]=="Z")){
      nb.good[i1,i3] = nb.good[i1,i3]+1
      if (length(preference) > 0){
        if (preference[i]!="Z") pref[i3,i1] = pref[i3,i1]+1
        if (preference[i]=="Z") pref[i1,i3] = pref[i1,i3]+1
      }
    }
    if ((i1==i3) & (answer[i]=="Y")){
      nb.good[i1,i2] = nb.good[i1,i2]+1
      if (length(preference) > 0){
        if (preference[i]!="Y") pref[i2,i1] = pref[i2,i1]+1
        if (preference[i]=="Y") pref[i1,i2] = pref[i1,i2]+1
      }
    }
    if ((i2==i3) & (answer[i]=="X")){
      nb.good[i1,i2] = nb.good[i1,i2]+1
      if (length(preference) > 0){
        if (preference[i]!="X") pref[i1,i2] = pref[i1,i2]+1
        if (preference[i]=="X") pref[i2,i1] = pref[i2,i1]+1
      }
    }
  }
  nb.good = nb.good + t(nb.good)
  nb.answer = nb.answer + t(nb.answer)
  diag(nb.answer) = 1
  prob = pbinom(nb.good-1, nb.answer, 1/3, lower.tail=FALSE)
  maxML = recognize = minimum = matrix(NA, nbprod, nbprod)
  for (i in 1:(nbprod-1)){
    for (j in (i+1):nbprod){
      aux = matrix(0, nb.good[i,j]+1, 1)
      for (k in 0:nb.good[i,j]) aux[k] = dbinom(nb.good[i,j]-k, nb.answer[i,j]-k, 1/3)
      maxML[i,j] = maxML[j,i] = max(aux)
      recognize[i,j] = recognize[j,i] = rev(order(aux))[1]-1
      mini = 0
      for (k in round(nb.answer[i,j]/3):nb.answer[i,j]) if ((mini==0) & (dbinom(k, nb.answer[i,j], 1/3) < 0.05)) mini = k
      minimum[i,j] = minimum[j,i] = mini
    }
  }
  confusion = (nb.answer-recognize) / nb.answer
  diag(nb.answer) = diag(recognize) = 0
  diag(maxML) = diag(confusion) = 1
  rownames(nb.answer) = colnames(nb.answer) = rownames(nb.good) = colnames(nb.good) = labprod
  rownames(prob) = colnames(prob) = rownames(confusion) = colnames(confusion) = labprod
  rownames(maxML) = colnames(maxML) = rownames(minimum) = colnames(minimum) = rownames(recognize) = colnames(recognize) = labprod
  if (length(preference) > 0) rownames(pref) = colnames(pref) = labprod
  res = list()
  res$nb.comp = nb.answer
  res$nb.ident = nb.good
  res$p.value = prob
  res$nb.recognition = recognize
  res$maxML = maxML
  res$confusion = confusion
  res$minimum = minimum
  if (length(preference) > 0) res$pref = pref
  ##res$complete = result
  return(res)
}

** end updated code for triangle.test

-Original Message-
From: Francois Husson
Sent: Wednesday, June 08, 2011 14:43
To: Meyners, Michael
Subject: Re: [R] library(SenoMineR)- Triangle Test Query

Dear Vijayan, dear Michael, indeed there was an error in the function triangle.test. I attach the new function. Thanks Michael for your answer. Best, Francois
Re: [R] help on permutation/randomization test
I suspect you need to give more information/background on the data (though this is not primarily an R-related question; you might want to try other resources instead). Unless I'm missing something here, I cannot think of ANY reasonable test: A permutation (using permtest or anything else) would destroy the correlation structure and hence give invalid results, and the assumptions of parametric tests are violated as well. Basically, you only have two observations, one for each group; with some good will you might consider these as repeated measurements, but still on the same subject or whatsoever. Hence there is no way to discriminate the subject from a treatment effect. There is not enough data to permute or to base a statistical test on. So unless you can get rid of the dependency within groups (or at least reasonably assume observations to be independent), I'm not very optimistic... HTH, Michael

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-project.org] On Behalf Of Wenjin Mao
Sent: Monday, May 23, 2011 20:56
To: r-help@r-project.org
Subject: [R] help on permutation/randomization test

Hi, I have two groups of data of different size:

group A: x1, x2, ..., x_n;
group B: y1, y2, ..., y_m; (m is not equal to n)

The two groups are independent, but observations within each group are not independent, i.e., x1, x2, ..., x_n are not independent; but the x's are independent from the y's. I wonder if a randomization test is still applicable to this case. Does R have any function that can do this test for large m and n? I notice that permtest can only handle small (m+n < 22) samples. Thank you very much, Wenjin
Re: [R] lmer() with no intercept
John, why would you want to fit the model without intercept if you seemingly need it? Anyway, I assume that the intercept from your first model just moves into the random effects -- you have intercepts there for worker and day, so any of these (or both) will absorb it. No surprise that the estimates for the covariates differ only slightly, it should be that way. What you plot (your second call to panel.lines) is not the correct model, as you omit the intercepts for worker and day (which are 0, or at least pretty close to it, if you include the overall intercept in your model). That's why your red line is on the top edge (note that the intercept is negative). I'm therefore not sure that the model without intercept makes a lot of sense, but you might consider posting related questions to the mixed-models SIG, where you might get more erudite comments than from me. HTH, Michael

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of array chip
Sent: Saturday, June 12, 2010 1:07
To: r-help@r-project.org
Subject: [R] lmer() with no intercept

Hi, I asked this before, but haven't got any response, so I would like to have another try. Thanks for the help. I also tried twice to join the model mailing list so that I can ask questions there, but still haven't got permission to join that list yet. ===

Hi, I am wondering how I can specify no intercept in a mixed model using lmer(). Here is an example dataset attached (test.txt). There are 3 workers who, on 5 days, measured a response variable y on an independent variable x. I want to use a quadratic term (x2 in the dataset) to model the relationship between y and x.

test <- read.table("test.txt", sep='\t', header=TRUE)

If I just simply use lm() and ignore worker and day, so that I can try both a linear regression with and without an intercept, here is what I get:

lm(y~x+x2, data=test)
Coefficients:
(Intercept)            x           x2
 -1.7749104    0.1099160   -0.0006152

lm(y~x+x2-1, data=test)
Coefficients:
         x         x2
 0.0490097 -0.0001962

Now, I want to try a mixed model considering worker and day as random effects. With an intercept:

lmer(y~x+x2+(1|worker)+(1|day), data=test)
Fixed effects:
              Estimate Std. Error t value
(Intercept) -1.324e+00  4.490e-01  -2.948
x            1.117e-01  8.563e-03  13.041
x2          -6.357e-04  7.822e-05  -8.127

Without an intercept:

lmer(y~x+x2+(1|worker)+(1|day)-1, data=test)
Fixed effects:
     Estimate Std. Error t value
x   1.107e-01  8.528e-03  12.981
x2 -6.304e-04  7.805e-05  -8.077

It seems to work fine. But if you look at the fixed effect coefficients of both mixed models, the coefficients for x and x2 are not much different, regardless of whether an intercept is included or not. This is not the case for the simple linear regression using lm() at the top. If I plot all 4 models in the following plot:

xyplot(y~x, groups=worker, test, col.line = "grey", lwd = 2,
  panel = function(x,y) {
    panel.xyplot(x, y, type='p')
    x <- sort(x)
    panel.lines(x, -1.324+0.1117*x-0.0006357*x*x)
    panel.lines(x, 0.1107*x-0.0006304*x*x, col='red')
    panel.lines(x, 0.04901*x-0.0001962*x*x, col='blue')
    panel.lines(x, -1.7749+0.10992*x-0.0006152*x*x, col='green')
  })

As you can see, the mixed model without intercept (red line) does not fit the data very well (it's at the top edge of the data, instead of in the middle of the data), so I guess I did something wrong here. Can anyone make any suggestions?
Thanks, John
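A hedged sketch of how to inspect the absorption described in the reply above (assumes lme4 and the test data from the post; not the thread's own code):

library(lme4)
m1 <- lmer(y ~ x + x2 + (1 | worker) + (1 | day), data = test)
m0 <- lmer(y ~ x + x2 - 1 + (1 | worker) + (1 | day), data = test)
fixef(m1); fixef(m0)  # covariate estimates barely change, as noted above
ranef(m0)             # the dropped fixed intercept may reappear as a common
                      # shift in these random-effect modes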
Re: [R] RWinEdt
I assume you installed WinEdt 6. See https://stat.ethz.ch/pipermail/r-help/2010-May/238540.html and related messages. RWinEdt does not work (yet) with WinEdt 6, so you'll have to downgrade back to WinEdt 5.x (or use another editor for the time being). HTH, Michael

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Yves Aragon
Sent: Saturday, May 22, 2010 18:25
To: r-help@r-project.org
Subject: [R] RWinEdt

I have installed RWinEdt but when I try to load it, I get the message:

Error in normalizePath(path) : path[1]=C:\Users\yves\AppData\Roaming\WinEdt/R.ini: The specified file cannot be found.

I have the problem with R 2.11.0 and R 2.10.1, since I have re-installed WinEdt. Yves
Re: [R] Suppressing scientific notation on plot axis tick labels
Strange, the following works reproducibly on my machine (Windows 2000 Pro):

options(scipen = 50, digits = 5)
x = c(1e7, 2e7)
?barplot
barplot(x)

while I also get scientific notation with your code. After I called ?barplot once, I'm incapable of getting the scientific notation again, though... But I admit that I am outdated, I still run 2.8.1 (but not for long...). Michael

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Dimitri Shvorob
Sent: Dienstag, 2. Februar 2010 12:59
To: r-help@r-project.org
Subject: Re: [R] Suppressing scientific notation on plot axis tick labels

options(scipen = 50, digits = 5)
x = c(1e7, 2e7)
barplot(x)

Still scientific...
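A hedged alternative that does not depend on scipen at all: suppress the default axis and format the tick labels explicitly (standard graphics functions; the data are from the post):

x <- c(1e7, 2e7)
barplot(x, yaxt = "n")     # draw the bars without the y axis
ticks <- axTicks(2)        # the tick positions barplot would have used
axis(2, at = ticks, labels = format(ticks, scientific = FALSE))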
Re: [R] Explanation w.r.t. rbind, please!
-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Uwe Dippel
Sent: Freitag, 29. Januar 2010 11:57
To: r-help@r-project.org
Subject: [R] Explanation w.r.t. rbind, please!

This is what I tried:

num.vec <- c(12.34, 56.78, 90.12, 34.56)
names(num.vec) <- c("first", "second", "third", "fourth")
num.vec
 first second  third fourth
 12.34  56.78  90.12  34.56
seq <- (1:4)
num.mat <- rbind(num.vec, seq)
num.mat
        first second third fourth
num.vec 12.34  56.78 90.12  34.56
seq      1.00   2.00  3.00   4.00
num.vec[3:4]
 third fourth
 90.12  34.56

(until here I'm fine)

num.mat[seq]
[1] 12.34  1.00 56.78  2.00

What you (probably) want here is num.mat["seq", ]

num.mat[num.vec]
[1] NA NA NA NA

num.mat["num.vec", ] and so on. You have to use quotes to tell R that you want the ROW (that's why the comma is needed) defined by the NAME "seq" or "num.vec" (that's why you need the quotes). Otherwise, R replaces seq by its value 1:4, so num.mat[seq] is identical to num.mat[1:4] (and num.mat[12.34] is of course NA in num.mat[num.vec]). You should have a closer look at ?Extract. HTH, Michael

num.vec[seq]
 first second  third fourth
 12.34  56.78  90.12  34.56
num.mat[num.vec]
[1] NA NA NA NA
num.mat[-seq]
[1] 90.12  3.00 34.56  4.00

(and here I'm lost!) How could I display a row, instead of always seemingly falling back to columns? Uwe
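A compact recap of the indexing rules discussed above (hedged; same num.mat as in the post):

num.mat["seq", ]    # row by NAME: quotes plus the comma
num.mat[2, ]        # the same row, by position
num.mat[, "first"]  # a column, for comparison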
Re: [R] exact wilcox test and Bonferroni correction
?p.adjust

Apply that to a vector containing all p values you get from wilcox.exact. Alternatively, multiply each p value by the number of comparisons you perform; see any textbook for that. You might want to consider a less conservative correction, though. HTH, Michael

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of netrunner
Sent: Freitag, 22. Januar 2010 10:41
To: r-help@r-project.org
Subject: [R] exact wilcox test and Bonferroni correction

Hi! I need to perform a Mann-Whitney test using Bonferroni correction. I found the function pairwise.wilcox.test. However, my data have ties. I know that when there are ties it is better to perform the Mann-Whitney test using the function wilcox.exact. How can I include the Bonferroni correction in that? Thank you for your help! netrunner
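A hedged sketch of the two equivalent routes (wilcox.exact is in package exactRankTests; the vectors a1, b1, a2, b2 stand for the group pairs and are placeholders):

library(exactRankTests)
pvals <- c(wilcox.exact(a1, b1, exact = TRUE)$p.value,
           wilcox.exact(a2, b2, exact = TRUE)$p.value)  # ... one per comparison
p.adjust(pvals, method = "bonferroni")  # or: pmin(pvals * length(pvals), 1)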
Re: [R] exact wilcox test and Bonferroni correction
(i) You EITHER correct the p value (by multiplying by 8 in your case) OR you use the Bonferroni threshold of 0.05/8, not both. If you correct the p values, your threshold remains 0.05. If you use 0.05/8, you use the original p values. (ii) Yes, if the p value is 0.15, then the corrected one for 8 tests is min(0.15*8, 1) = 1 (i.e. all p values of 1/8 and higher will be 1 after correction). (iii) No, you can NOT AT ALL conclude that your samples are from the same distribution. You have no evidence that they are from different ones, but that does not imply that they come from the same (or a similar) one, in particular as the power of the test can be assumed to be pretty low. Regarding (i), it might be helpful to search the web for some information on the Bonferroni correction; you could start with Wikipedia, but there are other resources around. Regarding (iii), I would strongly recommend (re-)reading a good introduction to stats or consulting an expert statistician. I'm afraid you are missing some of the basics of statistical hypothesis testing here. To conclude similarity, you'd have to define first what exactly that means (you can NEVER show exact equality), and then construct some appropriate test for that. NB: I am sure that you won't succeed in showing similarity with a reasonable definition if you have just 10 observations, though. And for completeness: If you were aiming at similarity, you should rather NOT correct for multiplicity (while that depends again on what exactly you want to show). But I think this goes far beyond the scope of this post now. HTH, Michael

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of netrunner
Sent: Freitag, 22. Januar 2010 15:25
To: r-help@r-project.org
Subject: Re: [R] exact wilcox test and Bonferroni correction

Dear Michael, thank you very much for your help. I performed the wilcox.exact function on each of the 8 items for the two groups that I am analysing (that is, I performed the wilcox test 8 times). Here is an example for the values (ratings from a questionnaire) of one of the 8 items:

a1 <- c(5, 9, 10, 10, 10, 10, 10, 10, 10, 10)
b1 <- c(7, 5, 10, NaN, 10, 10, 8, 10, 10, 8, 9, 9)
wilcox.exact(a1, b1, alternative="two.sided", mu=0, paired=FALSE, exact=TRUE, conf.level=0.95)

I obtained:

data: a1 and b1
W = 73.5, p-value = 0.1514
alternative hypothesis: true mu is not equal to 0

Then I adjusted the p-values using p.adjust. For the example above, the Bonferroni p value was 1. The threshold p-value is 0.00625 (that is, 0.05/8). Finally, because p-bonferroni > 0.00625, can I conclude that for each item my samples are from the same distribution? I am a little bit confused. Thank you! netrunner
Re: [R] permutations from vectors out of a matrix
Etienne, I don't see the point in avoiding some 'special' packages. If you are willing to change your mind in this regard, try something like library()

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Etienne Stockhausen
Sent: Montag, 18. Januar 2010 19:20
To: r-help@r-project.org
Subject: [R] permutations from vectors out of a matrix

Dear R-Users, I'm relatively new to R and have the following problem: I need all permutations of the vectors created by the columns of a matrix. I will give a small example:

p = 3
n = 2^p - 1   # number of obtainable vectors
mat = matrix(0, p, p)
for (i in 1:p) {
  mat[i:1, i] = 1/i
}

mat is now a quadratic matrix and n is the number of vectors I try to get when I compute all permutations of the vectors built by the individual columns. It should work for any quadratic matrix, and I want to avoid using some 'special' packages. In the example I need the following vectors at the end: (1,0,0); (0,1,0); (0,0,1); (0.5,0.5,0); (0.5,0,0.5); (0,0.5,0.5); (1/3,1/3,1/3). I hope my intention becomes clear. I'm looking forward to any ideas and clues that might help me. Thanks in advance and best regards, Etienne
Re: [R] permutations from vectors out of a matrix
Sorry, wrong button. Below a hopefully more helpful solution... Etienne, I don't see the point in avoiding some 'special' packages. If you are willing to change your mind in this regard, try one of the following solutions that work for me:

library(combinat)
apply(mat, 2, function(x) unique(permn(x)))
# each object in the list contains the permutations from one column of mat:
apply(mat, 2, function(x) do.call(rbind, unique(permn(x))))
# all vectors you wanted in one matrix; note that they are in the rows, so
# you might want to transpose this:
do.call(rbind, apply(mat, 2, function(x) do.call(rbind, unique(permn(x)))))

Not sure about the size of your original problem, though, it might take a while. If you still want to avoid the (small!) package, you might consider copying the code for permn from combinat to define the function within your file. I guess it works (but didn't check), as it does not seem to require any of the other functions of the package. HTH, Michael
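If packages are truly off limits, a hedged base-R sketch in the same spirit as copying permn (the function name perms is made up here):

# all distinct permutations of a vector, built recursively
perms <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v))
    for (p in perms(v[-i]))
      out[[length(out) + 1]] <- c(v[i], p)
  unique(out)
}
# one row per distinct permutation, over all columns of mat (7 rows for the example)
res <- unique(do.call(rbind,
  unlist(lapply(seq_len(ncol(mat)), function(j) perms(mat[, j])),
         recursive = FALSE)))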
Re: [R] permutations from vectors out of a matrix
Well, seems it's an assignment, so you should REALLY get them on your own and not enquire the list. Foolish me... M.

-Original Message-
From: einohr2...@web.de [mailto:einohr2...@web.de]
Sent: Mittwoch, 20. Januar 2010 19:32
To: Meyners,Michael,LAUSANNE,AppliedMathematics; r-help@r-project.org
Subject: Re: [R] permutations from vectors out of a matrix

Hey Michael, thanks a lot for your answer. I hope I can manage it now. The problem with the 'small' package is that I have to solve the task without using any package as a 'black box'. Therefore I need to get the permutations on my own. Best regards and thanks again, Etienne
Re: [R] Quantreg - 'could not find function rq'
Are you sure you called library(Quantreg) before calling any function? M.

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Gough Lauren
Sent: Donnerstag, 7. Januar 2010 13:44
To: r-help@r-project.org
Subject: [R] Quantreg - 'could not find function rq'

Hi all, I'm having some troubles with the Quantreg package. I am using R version 2.10.0, and have downloaded the most recent version of Quantreg (4.44) and SparseM (0.83 - required package). However, when I try to run an analysis (e.g. fit1 <- rq(y~x, tau=0.5)) I get an error message saying that the function rq could not be found. I get the same message when I try to search for any of the other functions that should be in the package (e.g. help(anova.rq)). I used this package last year and still have the old versions of both quantreg and SparseM saved on my machine. When I load these older packages and attempt an analysis (which, incidentally, worked fine last February) I get the same error messages as I do with the new versions. The only thing I have changed since last February is the version of R I am using - is there any reason why quantreg wouldn't work in version 2.10.0? I'm not very experienced with R, so I'm struggling to work out what may be going wrong - does anyone have any suggestions!? Many thanks, Lauren

Lauren Gough - Postgraduate Research Student
University of Nottingham, School of Geography, Nottingham, NG7 2RD
Tel: +44 (0)115 8467052
Email: lgx...@nottingham.ac.uk
Re: [R] Quantreg - 'could not find function rq'
Frankly, I have never even heard about this package before and have no clue what it does, but the error messages nevertheless suggested a missing call of library. And I didn't bother to check the spelling, my fault. So I guess you are right... I still didn't check :-) So to the OP: try library(asdf) with "asdf" replaced by the correct package name, and mind the spelling! Michael

-Original Message-
From: David Winsemius [mailto:dwinsem...@comcast.net]
Sent: Donnerstag, 7. Januar 2010 14:22
To: Meyners,Michael,LAUSANNE,AppliedMathematics
Cc: Gough Lauren; r-help@r-project.org
Subject: Re: [R] Quantreg - 'could not find function rq'

On Jan 7, 2010, at 8:07 AM, Meyners, Michael, LAUSANNE, AppliedMathematics wrote:

Are you sure you called library(Quantreg) before calling any function? M.

Are you both using a package with the exact spelling "Quantreg"? I would have expected it to be:

library(quantreg)

-- David.

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
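A hedged smoke test once the correctly spelled package is loaded (engel is a dataset shipped with quantreg):

library(quantreg)
data(engel)
fit1 <- rq(foodexp ~ income, tau = 0.5, data = engel)  # median regression
summary(fit1)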
Re: [R] updating arguments of formulae
Moreno, I leave the discussion on the mixed models to others (you might consider the SIG group on mixed models as well for this), but let me try a few hints to make your code more accessible:

* The . in updating a formula is substituted with the respective old formula (depending on the side), but is not mandatory. You could give the new formula explicitly, i.e. consider something like model1 = update(model, . ~ (1 |Sb2) + OS) if you lose control over your models. See ?update.formula

* I don't see the need for your construct with as.formula(paste()); this makes things unnecessarily complicated. See my above example, which should work as well on your data (and see ?update)

* There is also the - operator available in update.formula to remove terms (because it uses formula, see ?formula). As to your question on how to move from depM ~ OS + (1 + OS | Sb2) to depM ~ OS + VR + (1 + OS + VR | Sb2), try something like update(model1, .~. - (1 + OS|Sb2) + VR + (1 + OS + VR | Sb2)), while it goes without saying that in this case it would be easier to drop the . and use something like update(model1, .~ OS + VR + (1 + OS + VR | Sb2)) directly.

* paste accepts more than just two arguments to be pasted: Try something like model2 = update(model1, as.formula(paste(". ~ . + (1 + ", "OS", " | ", "Sb2", ")"))) instead of your construct with several nested calls to paste, and see ?paste. (Note that I added quotes to OS and Sb2; it didn't work for me otherwise, as I have no object OS. Not sure what happens if you have such an object on your search path, but I would suspect you encounter problems as well.)

If you work yourself through these and thereby simplify your code, you are more likely to get responses to your questions on which model to use (which is actually independent from the use of update). As far as I see it, it doesn't make sense to use a formula like in your model4, but the mixed model experts might tell me wrong (and I got a bit lost in your code as well). Please also try to provide commented, minimal, self-contained, reproducible code for further enquiries (use e.g. one of the examples in ?lmer to create appropriate examples for your questions). HTH, Michael

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Moreno Ignazio Coco
Sent: Donnerstag, 10. Dezember 2009 13:35
To: R-help@r-project.org
Subject: [R] updating arguments of formulae

Dear R-Community, I am relatively new with R, so sorry for things which for you might be obvious... I am trying to automatically update lmer formulae. The variables of the model are:

depM = my dependent measure
Sb2 = a random factor
OS = a predictor
VR = another predictor

So, I am building the first model with random intercept only:

model = lmer(depM ~ (1 |Sb2))

then I update the formula, adding the first predictor:

model1 = update(model, as.formula(paste(". ~ . + ", OS)))

The resulting formula will be: depM ~ (1 |Sb2) + OS. Let's suppose now I want to update the model to have OS both as a fixed effect and in the random term, something like: depM ~ (1 + OS |Sb2) + OS. I can do something very ugly (please tell me if there is a more elegant way to do it) that looks like:

model2 = update(model1, as.formula(paste(paste(paste(paste(". ~ . + (1 + ", OS), " | "), Sb2), ")")))

The resulting model2 formula will be: depM ~ (1 |Sb2) + OS + (1 + OS | Sb2). One first thing I am wondering at this point is whether having (1 |Sb2) and (1 + OS | Sb2) in the same expression is redundant.
in the output it will obviously tell me that group Sb2 is considered twice: number of obs: 6514, groups: Sb2, 23; Sb2, 23 and I am not sure if I am doing it correctly... any advice? So let's suppose now I want to add the new predictor VR, again both in the fixed and in the random part of the formula. If I just repeat the two steps above: model3 = update(model2, as.formula(paste(". ~ . + ", VR))) and then: model4 = update(model3, as.formula(paste(paste(paste(paste(". ~ . + (1 + ", VR), " | "), Sb2), ")"))) the formula I get is: depM ~ (1 |Sb2) + OS + (1 + OS | Sb2) + VR + (1 + VR | Sb2) so, basically I am adding new stuff on the right side of the formula... My first question at this point is whether the above formula is equivalent to: depM ~ OS + VR + (1 + OS + VR | Sb2) if it is not equivalent, which one of the two is correct? obviously in the second case, group Sb2 is considered only once. If the second version of the formula is the correct one, I don't understand how I can update arguments inside the formula rather than adding things on its right side... thus, in the ideal case, how do I go from something like this: depM ~ OS + (1 + OS | Sb2) to something like this: depM ~ OS + VR + (1 + OS + VR | Sb2) Thanks a lot for your help, Best, Moreno -- The University of Edinburgh is
Re: [R] by function ??
Or, more generally (if you need to include more than just one variable from TestData), something like by(TestData, LEAID, function(x) median(x$RATIO)) Agreed, this is less appealing for the given example than Ista's code, but might help to better understand by and to generalize its use to other situations. Michael
-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Ista Zahn Sent: Mittwoch, 9. Dezember 2009 02:54 To: L.A. Cc: r-help@r-project.org Subject: Re: [R] by function ?? Hi, I think you want by(TestData[ , "RATIO"], LEAID, median) -Ista On Tue, Dec 8, 2009 at 8:36 PM, L.A. ro...@millect.com wrote: I'm just learning and this is probably very simple, but I'm stuck. I'm trying to understand the by(). This works: by(TestData, LEAID, summary) But, this doesn't: by(TestData, LEAID, median(RATIO)) Error: could not find function "FUN" HELP! Thanks, LA -- View this message in context: http://n4.nabble.com/by-function-tp955789p955789.html Sent from the R help mailing list archive at Nabble.com.
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Ista Zahn Graduate student University of Rochester Department of Clinical and Social Psychology http://yourpsyche.org
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
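A self-contained sketch of both variants (TestData and its columns are made up here to mirror the post):

TestData <- data.frame(LEAID = rep(c("d1", "d2", "d3"), each = 4),
                       RATIO = runif(12))
by(TestData[, "RATIO"], TestData$LEAID, median)            # Ista's variant
by(TestData, TestData$LEAID, function(x) median(x$RATIO))  # the general variant

The second form hands the whole sub-data-frame to the anonymous function, so it generalizes to statistics that need several columns at once.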
Re: [R] csimtest function in multcomp package
Daniel, quick guess: there was a major change in the package some time ago, with quite a few functions vanishing and others added. So I guess your recycled code was based on an older version of multcomp. My current version (1.1-0) does not have csimtest anymore either. I guess you want to look into ?glht in the same package, and check the archives to find out more about the changes (and yes, you will have to modify your code to some extent, at least I had to; it's not just replacing the function name). HTH, Michael
-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of d.m.perk...@qmul.ac.uk Sent: Freitag, 4. Dezember 2009 16:47 To: r-help@r-project.org Subject: [R] csimtest function in multcomp package Hello all, Quick question: I want to do post-hoc contrasts for a linear mixed effects model. However, when trying to use the csimtest function in the multcomp package I receive an error message saying it cannot find the function, even after installing and loading package multcomp. Any pointers would be greatly appreciated Daniel
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
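A rough sketch of what the glht() route can look like (untested here; the model, data and factor names are all made up):

library(nlme)
library(multcomp)
# hypothetical mixed model with a three-level treatment factor 'treat'
mod <- lme(response ~ treat, random = ~ 1 | subject, data = mydat)
summary(glht(mod, linfct = mcp(treat = "Tukey")))  # all pairwise contrasts

mcp(treat = "Tukey") requests Tukey-style all-pairwise comparisons of the levels of treat; see ?glht and ?mcp for details on what changed relative to the old csimtest interface.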
Re: [R] Learning R
Julia, see http://www.r-project.org/ -> Documentation -> Manuals (-> An Introduction to R) (or use: http://cran.r-project.org/manuals.html) for a starting point. In addition, you might want to check the annotated list of books to see which one might best fit your needs: http://www.r-project.org/doc/bib/R-books.html You might also want to browse through the FAQs at some stage, you can learn a lot of useful things there. Or look additionally into the Wiki, if that's more your style of learning. You see, plenty of options from which you'd need to select the one that best meets your requirements. HTH, Michael
-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Julia Cains Sent: Montag, 30. November 2009 08:23 To: r-help@r-project.org Subject: [R] Learning R Dear R helpers, Almost 15 days back I became a member of this very active and wonderful group. So far I have only been raising queries and in turn got them solved too, and I really appreciate the spirit the group members show when it comes to guidance. I wish to learn the R language and I have given myself 2 months time for this. Can anyone please guide me as to how I should begin, i.e. from basics to advanced? R is such a vast thing to learn, so I wish to learn it step by step without getting lost at any stage. Please guide me where to start and how to upgrade myself to a higher level step by step. Regards Julia Only a man of Worth sees Worth in other men
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] unexpected results in comparison (x == y)
FAQ 7.31: http://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f
-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Peter Tillmann Sent: Mittwoch, 4. November 2009 13:50 To: r-help@r-project.org Subject: [R] unexpected results in comparison (x == y) Dear readers of the list, I have a problem with a comparison of two values from a vector. The comparison yields FALSE but should be TRUE. I have checked for mode(), length() and attributes(). See the following code (R 2.10.0):
--- # data vector of 66 double data
X = matrix(c(41.41, 38.1, 39.22, 38.1, 47.29, 46.82, 82.46, 90.11, 45.24, 45.74, 49.96, 53.40, 38.20, 42.65, 45.41, 47.92, 39.82, 42.02, 48.17, 49.47, 39.67, 43.89, 47.55, 50.05, 35.75, 37.41, 46.13, 53.64, 52.18, 56.30, 45.15, 47.13, 41.57, 39.08, 43.39, 44.73, 49.38, 47.00, 45.67, 50.53, 41.08, 44.22, 49.28, 47.83, 49.48, 46.04, 48.37, 47.00, 33.96, 36.30, 49.40, 46.44, 24.40, 24.79, 41.55, 46.26, 37.43, 39.88, 40.63, 38.64, 49.92, 50.19, 47.88, 48.61, 43.73, 44.18), ncol=1)
i = dim(X)
# calculating pairwise differences for each entry in X
Y = matrix(rep(0.0, i[1]*i[1]), ncol=i[1])
for (j in 1:i[1]) { Y[j,] = X - X[j,] }
# getting pairwise absolute differences to vector Z and selecting only (xj - xk), omitting (xk - xj) and (xj - xj), i.e. the diagonal vector
Z = rep(0, ((i[1]*i[1])-i[1])/2)
nn <- 1
for (j in 1:i[1]) { for (k in 1:i[1]) { if (j < k) { Z[nn] <- abs(Y[j,k]); nn <- nn + 1 } } }
nn <- nn - 1
# sorting Z ascending to ZZ, to determine double entries with the same difference
ii <- 0
ZZ <- sort(Z)
# here the problem occurs:
ZZ[1:10] ZZ[4] ZZ[5] ZZ[4] == ZZ[5] mode(ZZ[4]) mode(ZZ[5]) length(ZZ[4]) length(ZZ[5]) attributes(ZZ[4]) attributes(ZZ[5]) --- I get: ZZ[1:10] [1] 0.00 0.00 0.01 0.02 0.02 0.02 0.04 0.04 0.04 0.05 ZZ[4] [1] 0.02 ZZ[5] [1] 0.02 ZZ[4] == ZZ[5] [1] FALSE mode(ZZ[4]) [1] numeric mode(ZZ[5]) [1] numeric length(ZZ[4]) [1] 1 length(ZZ[5]) [1] 1 attributes(ZZ[4]) NULL attributes(ZZ[5]) NULL Which is ok, except for ZZ[4] == ZZ[5]. Can someone please give me advice where to look? In real world situations the original vector (X) will contain up to 100 entries. -- View this message in context: http://old.nabble.com/unexpected-results-in-comparison-%28x-%3D%3D-y%29-tp26195749p26195749.html Sent from the R help mailing list archive at Nabble.com.
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
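The standard workaround from that FAQ entry, in two lines:

0.1 + 0.2 == 0.3                   # FALSE: the classic floating-point effect
isTRUE(all.equal(0.1 + 0.2, 0.3))  # TRUE: compares with a tolerance instead of ==

For the sorted differences above, replacing ZZ[4] == ZZ[5] by isTRUE(all.equal(ZZ[4], ZZ[5])), or rounding both sides to the two decimals the data actually carry, should give the expected TRUE.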
Re: [R] fixed color scale for a series of contour plots
You don't tell us which function you use, but fixing the zlim argument in image or related functions should do the trick. M.
-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Marion Dumas Sent: Dienstag, 3. November 2009 14:38 To: r-help@r-project.org Subject: [R] fixed color scale for a series of contour plots Hello, I am having trouble figuring out how to create a color scale that will be consistent across multiple contour plots. What I would like is to plot a series of contour or image plots for different subsets of a data set, but make sure that the color key stays fixed so that the plots are easily comparable. That is, I need a map from the range of values of the entire data set to the color scale, and then use this color scale for all plots, plus be able to display the color bar as a legend next to the plot. This sounds trivial, but I can't find useful hints in the documentation of the color arguments of the different plots. Thanks so much for any help Best Marion
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
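A minimal sketch of the idea with image() (the two subsets are made up):

z1 <- matrix(runif(100, 0, 1), 10)  # subset 1
z2 <- matrix(runif(100, 0, 2), 10)  # subset 2, different value range
rng <- range(z1, z2)                # common range over the entire data set
par(mfrow = c(1, 2))
image(z1, zlim = rng)               # same zlim (and same col palette)
image(z2, zlim = rng)               # => identical colour mapping in both

filled.contour() also accepts zlim and additionally draws the color key as a legend.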
Re: [R] loop and plot
Or you open a new graphics window / device as part of the loop, see, e.g., ?windows. Alternatively, you may write the content of the graphics device to a file for each iteration, see, e.g., ?savePlot (but you'd want to make sure that you have a different filename in each iteration, otherwise you'll again have only one figure...). Michael
-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of joris meys Sent: Montag, 19. Oktober 2009 13:12 To: Rene Cc: r-help@r-project.org Subject: Re: [R] loop and plot Hi Rene, the problem is probably due to the fact that R will send all plots to the same graphical output window. Each next plot just replaces the previous one. If it's only a few plots, you can divide the graphical window with the commands par(mfrow=...) (see ?par) or layout(matrix(...)) (see ?layout). Otherwise you have to ask the window to wait before refreshing: par(ask=TRUE) (see ?par as well). Hope this helps Cheers Joris On Mon, Oct 19, 2009 at 12:14 PM, Rene kaixinma...@gmail.com wrote: Dear all, I am stuck at applying a loop function for creating separate plots. I have code like below:
dataset.table <- table(data.frame(var1=c(1,2,3,1,2,3,1), colour=c("a","b","c","c","a","b","b")))
kk = function(f) {
ls=as.character(f)
pie(dataset.table[ls,],main=ls)
box()
}
kk(1) kk(2) kk(3)
By using the above code, I can create 3 single plots respectively, but when I type kk(1:3), obviously it will not work. I know I have to vectorise the coding, then I can use the command kk(1:3). I try to use a loop:
kk = function(f) {
ls=as.character(f)
for (i in length(f)) {
pie(dataset.table[ls[i],],main=ls[i])
box()
}
}
kk(1:3)
the above code only gives me the last pie plot (i.e. the kk(3) plot) instead of 3 plots respectively. Can someone please guide me how to revise the loop coding, and produce 3 separate plots one after another on the screen by typing kk(1:3)? Thanks a lot. Rene.
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
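A sketch combining both suggestions, reusing Rene's pie data (note the loop also needs 1:length(f) rather than length(f), which on its own already discards all but the last plot):

kk <- function(f) {
  ls <- as.character(f)
  for (i in 1:length(f)) {
    png(paste("pie", ls[i], ".png", sep = ""))  # a different file per iteration
    pie(dataset.table[ls[i], ], main = ls[i])
    box()
    dev.off()
  }
}
kk(1:3)  # writes pie1.png, pie2.png, pie3.png

For on-screen viewing, drop the png()/dev.off() pair and call par(ask = TRUE) once before kk(1:3); R then waits for confirmation before each new plot.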
Re: [R] calculating p-values by row for data frames
-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Christoph Heuck Sent: Donnerstag, 15. Oktober 2009 17:51 To: r-help@r-project.org Subject: [R] calculating p-values by row for data frames Hello R-users, I am looking for an elegant way to calculate p-values for each row of a data frame. My situation is as follows: I have gene expression results from a microarray with 64 samples looking at 25626 genes. The results are in a data frame with the dimensions 64 by 25626. I want to create a volcano plot of difference of means vs. -log(10) of the p-values, comparing normal samples to abnormal samples. The results of both types of samples are all in my data frame. Now, I have found a way to calculate the p-value using a for (i in 1:25626) loop (see below):
df.normal  # dataframe which only contains the normal samples
df.samples # dataframe which only contains the abnormal samples
DM = rowMeans(df.normal) - rowMeans(df.samples) # the difference of means
PV = array(1, c(25626,1))
for (i in 1:25626){
VL = t.test(matrix.b[i,], matrix.a[i,])
V = as.numeric(VL[3])
V = -log10(V)
PV[i,1] = V}
plot(DM, PV, main=title, xlab=x.lab, ylab="-log(10) P-Values", pch=20)
It takes around 3-5 minutes to generate the volcano plot this way. I will be running arrays which will look at 2.2 million sites; this approach will then take way too long. I was wondering if there is a more elegant way to calculate the p-values for an array/dataframe/matrix in a row-by-row fashion, similar to rowMeans. I thought writing a function to get the p-value and then using apply(x,1,function) would be best. I have the function which will give me the p-value
p.value = function (x,y){ PV=as.numeric(t.test(x,y)[3]) }
and I can get a result if I test it only on one row (below is a 10 by 6 data frame example of my original data)
RRR X259863 X267862 X267906 X300875 X300877 X300878
MSPI0406S0183 -3.2257205 -3.2248899 2.85590082 -2.6293602 -3.5054348 -2.62817269
MSPI0406S0238 -2.6661903 -3.1135020 2.17073881 -3.2357307 -2.3309775 -1.76078452
MSPI0406S0239 -1.7636439 -0.6702877 0.19471126 -0.7397132 -1.4332662 -0.24822470
MSPI0406S0300 0.6471381 -0.2638928 -0.61876054 -0.9180127 0.2539848 -0.63122203
MSPI0406S0301 0.9207208 0.2164267 -0.33238846 -1.1450717 -0.2935584 -1.01659802
MSPI0406S0321 -0.4073272 -0.2852402 -0.08085746 -0.4109428 -0.2185432 -0.39736137
MSPI0406S0352 -0.7074175 -0.6987548 -1.22004647 -0.8570551 -0.5083861 -0.09267928
MSPI0406S0353 -0.2745682 0.3012990 -0.64787221 -0.5654195 0.4265007 -0.65963404
MSPI0406S0354 -1.1858394 -1.4388609 -0.07329722 -2.0010785 -1.3245696 -1.43216984
MSPI0406S0360 -1.4599809 -1.4929059 0.63453235 -1.1476760 -1.5849922 -1.03187399
zz=p.value(RRR[1,1:3],RRR[1,4:6])
zz
$p.value [1] 0.485727
but I cannot do this row by row using apply
xxx=apply(RRR,1,p.value(RRR[,1:3],RRR[,4:6]))
xxx <- apply(RRR, 1, function(x) p.value(x[1:3],x[4:6])) works for me. Check the examples in ?apply. HTH, Michael
Error in match.fun(FUN) : 'p.value(RRR[, 1:3], RRR[, 4:6])' is not a function, character or symbol Does anyone have any suggestions? Thanks in advance Christoph Heuck Albert Einstein College of Medicine
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
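For the speed concern, a fully vectorized sketch of Welch's t-test across rows (assuming, as in the example above, a numeric matrix or data frame RRR with the first three columns from one group and the last three from the other; this avoids calling t.test() once per row):

m1 <- rowMeans(RRR[, 1:3]); m2 <- rowMeans(RRR[, 4:6])
v1 <- apply(RRR[, 1:3], 1, var); v2 <- apply(RRR[, 4:6], 1, var)
se2   <- v1/3 + v2/3                        # squared standard error, n = 3 per group
tstat <- (m1 - m2) / sqrt(se2)
df    <- se2^2 / ((v1/3)^2/2 + (v2/3)^2/2)  # Welch-Satterthwaite degrees of freedom
PV    <- -log10(2 * pt(-abs(tstat), df))    # ready for the volcano plot

Up to numerical accuracy this matches t.test()'s default (var.equal = FALSE) p-values, and it scales to millions of rows far better than an explicit loop.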
Re: [R] reference on permutation test
It's always worthwhile to look at the articles by Pitman (and maybe the textbook by Fisher, if you have access to it); Welch is a nice paper, too, but might be pretty technical to learn about the area. I don't know any of the textbooks except Edgington (which is in its 4th edition now with co-author P. Onghena), a book I can wholeheartedly recommend. The authors explain the basic concepts in a way that should even be accessible to non-statisticians, I believe. They also cover a lot of special cases, and give exhaustive theoretical background for those interested (you can easily skip these parts if you are not). It goes along with a CD with some programs to run these tests, though I did not use it so far -- you can do it yourself in R, of course :-) I am not sure whether they cover any other resampling methods at all, but if so, only very briefly, so you'd need another reference for that. Efron & Tibshirani is a classic, while I have heard some people recommend Davison & Hinkley -- I didn't find the time yet to look more closely into the latter. Just my 2 cents, Michael
-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Peng Yu Sent: Mittwoch, 14. Oktober 2009 04:58 To: r-h...@stat.math.ethz.ch Subject: [R] reference on permutation test I want to learn permutation tests and resampling, etc. There are a few references listed below. I'm wondering what is the best book on this topic. Can somebody give me some advice? Thank you! http://en.wikipedia.org/wiki/Resampling_%28statistics%29#Permutation_test
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] post-hoc test with kruskal.test()
Robert, you can do the corresponding pairwise comparisons using wilcox.test. As far as I know, there is no such general correction as Tukey's HSD for the Kruskal-Wallis test. However, if you have indeed only 3 groups (resulting in 3 pairwise comparisons), the intersection-union principle and the theory of closed test procedures should allow you to do these tests without further correction, given the global test was statistically significant. HTH, Michael
-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Robert Kalicki Sent: Mittwoch, 14. Oktober 2009 09:17 To: r-help@r-project.org Subject: [R] post-hoc test with kruskal.test() Dear R users, I would like to know if there is a way in R to execute a post-hoc test (factor level comparisons, like Tukey for ANOVA) of a non-parametric analysis of variance with the kruskal.test() function. I am comparing three different groups. The preliminary analysis using the Kruskal-Wallis test shows significance, but I still don't know the relationship and the significance level between each group. Do you have any suggestion? Many thanks in advance! Robert ___ Robert M. Kalicki, MD Postdoctoral Fellow Department of Nephrology and Hypertension Inselspital University of Bern Switzerland Address: Klinik und Poliklinik für Nephrologie und Hypertonie KiKl G6 Freiburgstrasse 15 CH-3010 Inselspital Bern Tel +41(0)31 632 96 63 Fax +41(0)31 632 14 58
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] post-hoc test with kruskal.test()
Robert, what do you mean by not symmetric? If you mean unbalanced in terms of sample size, that's not a problem if you choose the right specifications for wilcox.test. The Kruskal-Wallis test is a generalization of the unpaired Wilcoxon test to more than two groups. Not sure whether kruskal.test works with just two groups, but if so, it should give the same results as wilcox.test if you set the arguments accordingly. Having said that, I should mention that unlike some normality-based post-hoc tests, the proposed approach is not based on a common error term. The pairwise comparisons will ignore the fact that you had a third group, and this will in particular result in (possibly quite) different power of the three comparisons, depending on the sample sizes and the noise given in just these two groups. I wouldn't know what to do about that, though. Michael
-Original Message- From: Robert Kalicki Sent: Mittwoch, 14. Oktober 2009 14:11 To: Meyners,Michael,LAUSANNE,AppliedMathematics Subject: RE: [R] post-hoc test with kruskal.test() Hi Michael, Thank you very much for your clear and prompt answer. Is it still valid if I use an unpaired comparison with wilcox.test(), since my groups are not symmetric? Many thanks Robert
-Message d'origine- De : Meyners,Michael,LAUSANNE,AppliedMathematics Envoyé : mercredi 14 octobre 2009 10:30 À : Robert Kalicki; r-help@r-project.org Objet : RE: [R] post-hoc test with kruskal.test() Robert, you can do the corresponding pairwise comparisons using wilcox.test. As far as I know, there is no such general correction as Tukey's HSD for the Kruskal-Wallis test. However, if you have indeed only 3 groups (resulting in 3 pairwise comparisons), the intersection-union principle and the theory of closed test procedures should allow you to do these tests without further correction, given the global test was statistically significant. HTH, Michael
-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Robert Kalicki Sent: Mittwoch, 14. Oktober 2009 09:17 To: r-help@r-project.org Subject: [R] post-hoc test with kruskal.test() Dear R users, I would like to know if there is a way in R to execute a post-hoc test (factor level comparisons, like Tukey for ANOVA) of a non-parametric analysis of variance with the kruskal.test() function. I am comparing three different groups. The preliminary analysis using the Kruskal-Wallis test shows significance, but I still don't know the relationship and the significance level between each group. Do you have any suggestion? Many thanks in advance! Robert ___ Robert M. Kalicki, MD Postdoctoral Fellow Department of Nephrology and Hypertension Inselspital University of Bern Switzerland Address: Klinik und Poliklinik für Nephrologie und Hypertonie KiKl G6 Freiburgstrasse 15 CH-3010 Inselspital Bern Tel +41(0)31 632 96 63 Fax +41(0)31 632 14 58
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
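The stats package also ships pairwise.wilcox.test(), which wraps exactly these pairwise comparisons; a sketch with made-up data:

resp  <- c(rnorm(10), rnorm(10, 1), rnorm(10, 2))   # hypothetical responses
group <- factor(rep(c("g1", "g2", "g3"), each = 10))
kruskal.test(resp ~ group)                          # global test first
pairwise.wilcox.test(resp, group, p.adjust.method = "none")

With only three groups and a significant global test, the closed testing argument above justifies p.adjust.method = "none"; with more groups, pick a correction (see ?p.adjust).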
Re: [R] Probability of data values from DENSITY function
Lina, check ?density (and do so carefully). R uses a kernel estimate, by default Gaussian if I remember correctly. Values on a certain grid can be found from the code I sent earlier. I didn't check, but these are most likely just linearly interpolated by plot.density, and as the grid is sufficiently tight, it visually looks smooth. You could mimic this by linearly interpolating from the two closest neighbours for any new x values (or, as a rough approximation, choosing the density value of the nearest neighbour). I am not sure whether there are some specificities for time series in general or GARCH models in particular, that's not my area of expertise. Maybe there is some method for GARCH objects in the library you are using (you don't tell me which one). Try str(density(x)) from your original post to see whether you might have to adapt the proposed way to get these values. I copy the list in case someone can give more details on that. Two side remarks: * You should note that the density function in R takes a lot of parameters that might heavily influence the outcome, and even though R will propose some defaults that will work frequently, they might not be optimal or even appropriate for your particular situation, model, and data. * From your messages I fear that you have a limited understanding of the notion of density functions, at least for continuous distributions; they do NOT give the probability that a certain value is taken. Try plot(x <- seq(-1,1,0.01), dnorm(x, sd=.25), type="l") to see that the density can easily exceed 1 and can hence not represent a probability. In fact, the theoretical probability that a specific value is actually taken from a continuous distribution is always equal to 0. I would strongly suggest to contact a local statistician to clarify these issues. HTH, Michael
From: Lina Rusyte [mailto:liner...@yahoo.co.uk] Sent: Dienstag, 29. September 2009 18:03 To: Meyners,Michael,LAUSANNE,AppliedMathematics Subject: RE: [R] Probability of data values from DENSITY function Dear Mr. Michael Meyners, Assume that I have simulated data from a GARCH process:
spec = garchSpec(model = list(omega = 0.01, alpha = 0.13, beta = 0.86, shape = 6), cond.dist = "std")
yt <- garchSim(spec, n = 4000)
The empirical distribution of this data (yt) is assumed to be a non-parametrized distribution. This data is my in-sample data. I have another set of data (yt1), generated from the same process (but it is not the same as yt). I need to find the probabilities of these data (yt1) points in the previously mentioned empiric distribution (of yt). Simply said, having a continuous EMPIRIC density function, I need for some points on the x axis (values) to find out the corresponding values on the y axis (their "probability"). R plots the empiric density function, so I assume it approximates the empirical density function somehow. Thanks a lot for the help in advance. Best regards, Lina --- On Tue, 29/9/09, Meyners,Michael, wrote: From: Meyners,Michael,LAUSANNE,AppliedMathematics Subject: RE: [R] Probability of data values from DENSITY function To: Lina Rusyte liner...@yahoo.co.uk Cc: R help r-help@r-project.org Date: Tuesday, 29 September, 2009, 4:59 PM Lina, check whether something like data.frame(density(rnorm(10))[1:2]) contains the information you want. Otherwise, try to be (much) more specific in what you want so that we do not need to guess (and of course provide minimal, self-contained, reproducible code). That has a higher chance to trigger some responses than posting the same message again.
And if you even explain what you want to do with these values, you might get even better responses. HTH, Michael
-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Lina Rusyte Sent: Dienstag, 29. September 2009 16:45 To: R help Cc: R help Subject: [R] Probability of data values from DENSITY function Hello, could someone help me please and tell me how to get the probability from an empirical DENSITY (not parametric
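The interpolation idea from the reply above, as a small sketch (yt and yt1 as in the GARCH example; whether this is adequate for your model is a separate question):

d <- density(yt)                            # kernel density estimate on a grid
dens.yt1 <- approx(d$x, d$y, xout = yt1)$y  # linear interpolation at the new points
# points of yt1 outside range(d$x) come back as NA; rule = 2 in approx()
# would carry the boundary values over instead

Keep in mind that these are density values, not probabilities.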
Re: [R] How do I do simple string concatenation in R?
?paste
-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Dr. Arie Ben David Sent: Mittwoch, 30. September 2009 10:22 To: r-help@r-project.org Subject: [R] How do I do simple string concatenation in R? Dear R gurus How do I do simple string concatenation in R? For example: A = "klm" B = "jjj" How can I assign a value to C such that C == "klmjjj" is TRUE? Thank you Arie
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
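Spelled out, for instance (a two-line sketch):

A <- "klm"; B <- "jjj"
C <- paste(A, B, sep = "")  # "klmjjj"
C == "klmjjj"               # TRUE

The sep = "" is what makes it plain concatenation; paste's default separator is a single space.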
Re: [R] Probability of data values from DENSITY function
Lina, check whether something like data.frame(density(rnorm(10))[1:2]) contains the information you want. Otherwise, try to be (much) more specific in what you want so that we do not need to guess (and of course provide minimal, self-contained, reproducible code). That has a higher chance to trigger some responses than posting the same message again. And if you even explain what you want to do with these values, you might get even better responses. HTH, Michael
-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Lina Rusyte Sent: Dienstag, 29. September 2009 16:45 To: R help Cc: R help Subject: [R] Probability of data values from DENSITY function Hello, could someone help me please and tell me how to get the probability from an empirical DENSITY (not parametric) for each data value (R function). For example, for the normal distribution there is a function like: dnorm(q, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE) I need the same function, only for the empirical DENSITY function (which does not correspond to any typical distribution). R plots the density function of any data: plot(density(x)) I need to find out the "probability" for each data value from this plot-line. Best regards, Lina
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] A stat related question
Let's assume you have just three observations, and x <- 1:3 for your observations. Predictor 1: y = x^2. Predictor 2: y = 1 if x=1, y = 4 if x=2, y = 9 if x=3, y = 0 elsewhere. These predictors are obviously not the same, but will give the same Mean Squared Error for your data (whatever your observed y-values are). This should suffice as a counterexample. Or did I misunderstand your question? HTH, Michael
-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of RON70 Sent: Freitag, 18. September 2009 11:23 To: r-help@r-project.org Subject: [R] A stat related question Can I ask a small stat. related question here? Suppose I have two predictors for a time series process, and the accuracy of a predictor is measured by its MSE. My question is: if two predictors give the same MSE, do they necessarily have to be identical? Can anyone provide me any counterexample? Thanks. -- View this message in context: http://www.nabble.com/A-stat-related-question-tp25505618p25505618.html Sent from the R help mailing list archive at Nabble.com.
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
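The counterexample in three lines of R (y is an arbitrary made-up response):

x <- 1:3; y <- c(2, 3, 10)
pred1 <- x^2                                 # predictor 1 evaluated at the data
pred2 <- c(1, 4, 9)                          # predictor 2 at x = 1:3; it differs only off the data
mean((y - pred1)^2) == mean((y - pred2)^2)   # TRUE: identical MSE

Both predictors agree on every observed x, so no MSE computed from these data can tell them apart.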
Re: [R] Equivalence of Mann-Whitney test and Kruskal-Wallis test with k=2
See the respective help files. The continuity correction only affects the normal approximation in wilcox.test. With such small sample sizes, the default evaluation is exact, so it doesn't change anything. In contrast, kruskal.test is incapable of computing exact values but always uses the chi-squared approximation. So the discrepancy is between an exact test and an approximation (guess you'd be better off with the former). If you get the urge to reproduce the p-value from kruskal.test using wilcox.test, and maybe to better understand what's happening, try a <- wilcox.test(x1, x2, paired=FALSE, exact=FALSE, correct=FALSE) (and yes, now with exact=FALSE, the continuity correction makes a difference). HTH, Michael
-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of David Scott Sent: Dienstag, 8. September 2009 07:02 To: Thomas Farrar Cc: r-help@r-project.org Subject: Re: [R] Equivalence of Mann-Whitney test and Kruskal-Wallis test with k=2 Thomas Farrar wrote: Hi all, The Kruskal-Wallis test is a generalization of the two-sample Mann-Whitney test to *k* samples. That being the case, the Kruskal-Wallis test with *k*=2 should give an identical p-value to the Mann-Whitney test, should it not?
x1 <- c(1:5)
x2 <- c(6,8,9,11)
a <- wilcox.test(x1,x2,paired=FALSE)
b <- kruskal.test(list(x1,x2),paired=FALSE)
a$p.value [1] 0.01587302
b$p.value [1] 0.01430588
The p-values are slightly different (note that there are no ties in the data, so computed p-values should be exact). Can anyone explain the discrepancy? It's been a while since I studied nonparametric stats and this one has me scratching my head. Many thanks! Tom The continuity correction? It is true by default for wilcox.test and is not apparent in the help for kruskal.test. David Scott -- _ David Scott Department of Statistics The University of Auckland, PB 92019 Auckland 1142, NEW ZEALAND Phone: +64 9 923 5055, or +64 9 373 7599 ext 85055 Email: d.sc...@auckland.ac.nz, Fax: +64 9 373 7018 Director of Consulting, Department of Statistics
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Unexpected behavior in friedman.test and ks.test
Alex, It's mainly speculation, as I cannot check the Excel add-in nor Vassar, but I'll give it a try. For the Friedman test: Results of R coincide with those reported by Hollander & Wolfe, which I'd take as a point in favor of R. In any case, my guess is that ties are handled differently (average ranks in R), but you'd have to check with the documentation of WinSTAT and Vassar. If it is not documented, see what test statistic you'd get manually according to each handling of ties. For the ks.test: See ?ks.test for the meaning of the exact argument of this function. I'd assume that Excel gives you the asymptotic p-value only, while R will by default return an exact one for 32 samples. From the same help page: "Otherwise, asymptotic distributions are used whose approximations may be inaccurate in small samples." You could check using something like ks.test(myData$f1_A, pnorm, exact=FALSE). If that doesn't resolve the issue: do the KS test (semi-)manually, which should not be that difficult (even in Excel, if the need may be), and compare the D value with the one obtained from R and Excel, respectively. HTH, Michael
-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of atconsta-rh...@yahoo.com Sent: Dienstag, 8. September 2009 15:33 To: R-help@r-project.org Subject: [R] Unexpected behavior in friedman.test and ks.test I have to start by saying that I am new to R, so I might miss something crucial here. It seems to me that the results of friedman.test and ks.test are wrong. Now, obviously, the first thing which crossed my mind was "it can't be, this is a package used by so many, someone should have observed it", but I can't figure out what it might be. Problem: let's start with friedman.test. I have a lot of data to analyze and I grew sick and tired of clicking and selecting in Excel (for which we have a statistics Add-In purchased, don't start to flame me on using Excel for stats, please!); so I wanted to automate the analysis in R and figured out the results differ from Excel. Example: Take the data from example(friedman.test) (Hollander & Wolfe (1973), p. 140ff.). I ran the example in R and got: Friedman rank sum test data: RoundingTimes Friedman chi-squared = 11.1429, df = 2, p-value = 0.003805 Same data, in Excel, using the WinSTAT for Excel (Fitch software), gives: Friedman chi-squared = 10.6364, df = 2, p-value = 0.004902 Puzzled, I entered the data in the calculator from Vassar (http://faculty.vassar.edu/lowry/fried3.html) and got exactly the same values as in Excel (and, again, different from R). Admittedly, the differences are not large, and both fall below the 0.05 threshold, but, still. So, question 1 would be why is R different from both Excel and Vassar? Now to the Kolmogorov-Smirnov test, from which my ordeal actually started: the results from ks.test are wildly different from the ones I have got with the Excel add-in. Basically, I have 32 sets of observations (patients) for 100 independent variables (different blood analyses). Question was whether the data is normally distributed for each of the analyses and, hence, whether I can apply a parametric test or not. Once I had loaded the data in a dataframe (and it looks as expected), I ran: ks.test(myData$f1_A, pnorm) ks.test(myData$f8_A, pnorm) They give p-values of < 2.2e-16 (with ties) and 8.882e-16.
The Excel Add-In gives p-values of 0.0074491 and, respectively, 0.2730477. Here the difference is serious, like between "highly significant non-normal" for both f1 and f8 (R), or one non-normal and one normal (the Add-In). I first thought that the difference might arise from different probability distributions (but what else, if not pnorm). Then I ran the Friedman test, to find out similar discrepancies. I'd really appreciate some input on this: what's wrong and how should I decide whom to trust? Many thanks in advance, Alex
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
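A semi-manual version of the KS statistic, along the lines suggested above (f1_A as one example column; a sketch only, not a validated routine):

x  <- myData$f1_A
xs <- sort((x - mean(x)) / sd(x))  # standardize if the add-in fits mean and sd
n  <- length(xs)
Fx <- pnorm(xs)
D  <- max(pmax((1:n)/n - Fx, Fx - (0:(n-1))/n))  # the KS distance
D

Comparing this D with the statistic reported by R and by the add-in should narrow down whether the two disagree on the statistic itself (e.g. standardized vs. raw data) or only on the p-value (exact vs. asymptotic).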
Re: [R] biplot graphical options?
Taking the example from ?biplot.princomp with reproducible code (hint, hint): biplot(princomp(USArrests)) biplot(princomp(USArrests), col=c(2,3), cex=c(1/2, 2)) clearly changes the color and font size on my system. For changing the point symbols (which are text by default), try setting xlabs and ylabs. But of course, an argument pch will not have any effect, as biplot has no call to points or similar, but to text only (and hence mainly parameters of text will have an effect on what you aim at). So for documentation, I'd propose ?biplot.princomp and ?biplot.default. For changes that are truly impossible (there are indeed some), take the code for biplot.default (or biplot.princomp) and amend it where necessary to fulfill your needs. HTH, Michael
-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Manca Marco (PATH) Sent: Mittwoch, 2. September 2009 13:02 To: r-help@r-project.org Subject: [R] biplot graphical options? Importance: High Dear R-help fellows good afternoon. I am struggling in the attempt to impose some graphical conditions (changing point symbols, colors, etc) on the biplot function (I am using it to visualize the results of princomp) but apparently I can't manage to change anything but the axes... and I have been browsing manuals and vignettes without finding any explicit suggestions on how to operate... Can anyone, please, point my attention to the relevant documentation? Thank you in advance and best regards, Marco -- Marco Manca, MD University of Maastricht Faculty of Health, Medicine and Life Sciences (FHML) Cardiovascular Research Institute (CARIM) PO Box 616 6200 MD Maastricht E-mail: m.ma...@path.unimaas.nl Office telephone: +31(0)433874633 Personal mobile: +31(0)626441205 Twitter: @markomanka This email and any files transmitted with it are confide...{{dropped:15}}
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
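Two more lines on the point 'symbols', which biplot draws as text labels (same USArrests example):

pc <- princomp(USArrests)
biplot(pc, xlabs = rep("*", nrow(USArrests)))           # one '*' per observation
biplot(pc, xlabs = abbreviate(rownames(USArrests), 3))  # or shortened labels

Since the observations are rendered with text() rather than points(), any 'symbol' has to be supplied as a character string via xlabs (and ylabs for the variables).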
Re: [R] Permutation test and R2 problem
-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Alex Roy Sent: Freitag, 14. August 2009 12:05 To: r-help@r-project.org Subject: [R] Permutation test and R2 problem Hi, I have optimized the shrinkage parameter (GCV) for ridge regression and got an r2 value of 70%. To check the sensitivity of the result, I did a permutation test. I permuted the response vector, ran it 1000 times and drew a distribution. But now, I get r2 values as high as 98%, and some of them more than 70%. Is this expected from such a test? Depends on what exactly you are doing and on your data, but surely this is not unexpected (even less given the information we have). *I was under the impression that r2 with the real data set will always be the maximum! And permutation will not affect this, i.e. permuted r2 will always be less than the real one!* Why would that be? And even more, why would you do a permutation test then if you knew in advance that all permuted values are below your observed one? You optimize the shrinkage parameter for your data, not your data for your shrinkage parameter. In the latter case you would have been right. Given any fixed shrinkage parameter, you can always find some data (e.g. the predicted values) that fit better than the original. I guess that in most non-artificial cases with a reasonable amount of data, there are at least some permutations that give a higher r2. Not sure what kind of sensitivity you want to check, but probably you'd have to optimize the shrinkage parameter as well. And of course make sure that your permutations correspond to the original constraints. But this opens a wide field; refer to any textbook on this matter for further detail (e.g. Edgington & Onghena, Randomization Tests). HTH, Michael
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Chi-squared approximation may be incorrect
Nancy (?), see ?chisq.test (in particular the examples and the comment on expected frequencies there). A rule of thumb (see any basic textbook) for the chi-squared approximation being okay is that the expected value in each cell is at least 5. The warning tells you that this does not hold true for at least one of your cells. You might want to try chisq.test(t, simulate.p.value = TRUE) instead. Michael
-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of SNN Sent: Mittwoch, 12. August 2009 00:34 To: r-help@r-project.org Subject: [R] Chi-squared approximation may be incorrect Hi, I am trying to run a chi-squared test but I am getting the following message: Warning messages: 1: In chisq.test(t) : Chi-squared approximation may be incorrect Does anyone know what it means? Your help is appreciated. -- View this message in context: http://www.nabble.com/Chi-squared-approximation-may-be-incorrect-tp24926932p24926932.html Sent from the R help mailing list archive at Nabble.com.
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
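A reproducible toy case that triggers the warning (the table is made up):

tab <- matrix(c(8, 1, 2, 3), nrow = 2)
chisq.test(tab)            # warns: approximation may be incorrect
chisq.test(tab)$expected   # several expected counts are below 5
chisq.test(tab, simulate.p.value = TRUE, B = 10000)  # Monte Carlo p-value instead

For small 2 x 2 tables, fisher.test(tab) is an exact alternative.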
Re: [R] Conditional randomization of groups
Hi unknown, as a quick patch, try something like
mydata$Set <- 1
mydata$Set[c(sample(mydata$ID[mydata$Type=="A"], ifelse(runif(1) < 1/2, 2, 3)), sample(mydata$ID[mydata$Type=="B"], 3))[1:5]] <- 2
HTH, Michael
-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of yabado Sent: Donnerstag, 6. August 2009 23:29 To: r-help@r-project.org Subject: [R] Conditional randomization of groups Hi I have a data set which is like this: Data ID Type 1 A 2 B 3 A 4 A 5 A 6 B 7 B 8 A 9 B 10 B As you can see, there are 5 A and 5 B. Now, I want a randomization of the data set which will assign 5 IDs to a new group called Set1 and the other 5 to Set2. Set1 and Set2 will be either: Set1 (A A A B B) and Set2 (A A B B B), or Set1 (A A B B B) and Set2 (A A A B B). Each new group needs to have at least two A and two B, and one more B or A. Except for this condition, everything else (ID, permutation, selection to Set1 and Set2) needs to be random. Can anyone show me how to do the randomization? -- View this message in context: http://www.nabble.com/Conditional-randomization-of-groups-tp24854783p24854783.html Sent from the R help mailing list archive at Nabble.com.
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
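To see the patch at work, with the data frame rebuilt from the post (the seed is arbitrary, only there to make the draw repeatable):

mydata <- data.frame(ID = 1:10,
                     Type = c("A","B","A","A","A","B","B","A","B","B"))
set.seed(42)
mydata$Set <- 1
mydata$Set[c(sample(mydata$ID[mydata$Type == "A"], ifelse(runif(1) < 1/2, 2, 3)),
             sample(mydata$ID[mydata$Type == "B"], 3))[1:5]] <- 2
table(mydata$Type, mydata$Set)  # each Set holds 5 IDs, 2-3 of each Type

The [1:5] trims the candidate vector to five IDs: either 2 A's plus all 3 sampled B's, or 3 A's plus the first 2 of the 3 sampled B's.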
Re: [R] rnorm() converted to daily
SD in y is more than 15 times (!!!) larger than in x and z, respectively, and hence the SD of the mean of y is also larger. 100,000 values are not enough to stabilize this. You could have obtained 0.09 or even larger as well. Try y with different seeds; y varies much more than x and z do. Or check the variances:
var.x <- var(replicate(100, mean( rnorm( 100000, mean=0.08, sd=0.25 ))))
var.y <- var(replicate(100, mean( rnorm( 100000, mean=(0.08/252), sd=(0.25/sqrt(252)) )) * 252))
var.z <- var(replicate(100, mean( rnorm( 100000, mean=(0.08/252), sd=0.001 )) * 252))
I guess what you are doing wrong is assuming that 0.25/sqrt(252) is similar to 0.25/252, the latter being close to 0.001, while the former is obviously not. Try to replace 0.25 in y by 0.0158, then you should be close enough. HTH, Michael
-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Ken-JP Sent: Freitag, 17. April 2009 09:26 To: r-help@r-project.org Subject: [R] rnorm() converted to daily Yearly 8% drift, 25% sd. Why are x and y so different (x and z look ok)? Does this have something to do with biases due to the relationship between population and sample estimates as a function of n samples and sd? Or am I doing something wrong?
---
set.seed( 1 ); x <- mean( rnorm( 100000, mean=0.08, sd=0.25 ));
set.seed( 1 ); y <- mean( rnorm( 100000, mean=(0.08/252), sd=(0.25/sqrt(252)) )) * 252;
set.seed( 1 ); z <- mean( rnorm( 100000, mean=(0.08/252), sd=0.001 )) * 252;
# ---
x # 0.07943898
y # 0.07109407
z # 0.07943449
-- View this message in context: http://www.nabble.com/rnorm%28%29-converted-to-daily-tp23092444p23092444.html Sent from the R help mailing list archive at Nabble.com.
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Genstat into R - Randomisation test
Robert, Tom, Peter and all, if I remember correctly (don't have my copy at hand right now), Edgington and Onghena differentiate between randomization tests and permutation tests along the following lines: Randomization test: Applies only to randomized experiments, for which we consider the theoretically possible re-randomizations given the constraints for the original randomization. In this setting, inference is valid for the observed population, and does not require random sampling (while, of course, this assumption is needed if we want to generalize the findings). I think they also call it re-randomization tests at least once, while the prefix is usually omitted for simplicity, but this makes it clearer that an initial randomization is required. Permutation test: Can be applied even to data from non-randomized experiments; requires, however, random sampling. Hence, one additional assumption is needed (one that is common to inference using classical tests). This is hardly addressed in Edgington & Onghena. In both cases, the answer can be approximated by random draws if full randomization/permutation is difficult or impossible to perform, but this applies to both methods. The distinction in Edgington (1980) as mentioned by Tom is not used in the latest edition. In this regard, the comparison with exact vs. approximate integrals seems inappropriate to me, as the distinction is conceptual, not computational. Neither is one the (non-)exhaustive variant of the other. I doubt that the randomization test is named after a _random_ choice of permutations as implied by Peter; it's rather based on randomization (of an experiment) in its best statistical meaning. Not sure whether the distinction is needed, but it might be helpful in some instances, at least if used consistently. I have never seen another distinction between these two, and the literature is inconsistent in its use as well. Just my two cents worth. Michael
-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Robert A LaBudde Sent: Donnerstag, 9. April 2009 17:38 To: Tom Backer Johnsen Cc: r-help@r-project.org Subject: Re: [R] Genstat into R - Randomisation test At 04:43 AM 4/9/2009, Tom Backer Johnsen wrote: Peter Dalgaard wrote: Mike Lawrence wrote: Looks like that code implements a non-exhaustive variant of the randomization test, sometimes called a permutation test. Isn't it the other way around? (Permutation tests can be exhaustive by looking at all permutations; if a randomization test did that, then it wouldn't be random.) Eugene Edgington wrote an early book (1980) on this subject called "Randomization Tests", published by Dekker. As far as I remember, he differentiates between "systematic" permutation tests, where one looks at all possible permutations. This is of course prohibitive for anything beyond trivially small samples. For larger samples he uses what he calls "random" permutations, where a random sample of the possible permutations is used. Tom Peter Dalgaard wrote: Mike Lawrence wrote: Looks like that code implements a non-exhaustive variant of the randomization test, sometimes called a permutation test. Isn't it the other way around? (Permutation tests can be exhaustive by looking at all permutations; if a randomization test did that, then it wouldn't be random.) Edgington and Onghena make a similar distinction in their book, but I think such a distinction is without merit.
Do we distinguish between exact definite integrals and approximate ones obtained by numerical integration, of which Monte Carlo sampling is just one class of algorithms? Don't we just say: "The integral was evaluated numerically by the [whatever] method to an accuracy of [whatever], and the value was found to be [whatever]." Ditto for optimization problems. A randomization test has one correct answer based upon theory. We are simply trying to calculate that answer's value when it is difficult to do so. Any approximate method that is used must be performed such that the error of approximation is trivial with respect to the contemplated use. Doing Monte Carlo sampling to find an approximate answer to a randomization test, or to an optimization problem, or to a bootstrap distribution should be carried out with enough realizations to make sure the residual error is trivial. As Monte Carlo sampling is a random sampling-based approximate method, the name does create confusion in terminology for randomization tests and for bootstrapping. Robert A. LaBudde, PhD, PAS, Dpl. ACAFS e-mail: r...@lcfltd.com Least Cost Formulations, Ltd. URL: http://lcfltd.com/ 824 Timberlake Drive Tel: 757-467-0954 Virginia Beach, VA 23464-3239 Fax: 757-467-2947 "Vere scire est per causas scire" __ R-help@r-project.org mailing list
Re: [R] OT: A test with dependent samples.
Rolf, as you explicitly asked for a comment on your proposal: It is generally equivalent to McNemar's test and maybe even more appropriate because of the asymptotics involved in the chi-squared distribution, which might not be too good with n=12. In some more detail: McNemar's test basically considers the difference between the off-diagonal elements, normalized by their sum. You do the same, ignoring all (0,0) and (1,1) pairs (the latter do not occur in your example anyway). binom.test(12, 12, alternative="greater") gives you the one-sided p-value. If, however, you do not use the exact binomial distribution (with n=12) but the normal approximation, you find E(X)=6 and VAR(X)=3, with X the number of sequences (0,1), and you observed x=12. This gives you a z-score of (12-6)/sqrt(3) and a corresponding one-sided p-value: pnorm(6/sqrt(3), lower.tail=F) [1] 0.0002660028 Further, McNemar's test WITHOUT continuity correction gives mcnemar.test(matrix(c(61,0,12,0),2,2), correct=F) McNemar's Chi-squared test data: matrix(c(61, 0, 12, 0), 2, 2) McNemar's chi-squared = 12, df = 1, p-value = 0.000532 It comes as no surprise that the reported (two-sided!) p-value is exactly twice the (one-sided!) p-value from the binomial test with normal approximation: pnorm(6/sqrt(3), lower.tail=F)*2 [1] 0.0005320055 I do not want to stress the pros and cons of continuity corrections, but if neglected, you see that you get the same results (except that McNemar is generally two-sided), due to the relation of the normal and chi-squared distributions. If you use the binomial test, you can forget about asymptotics and continuity corrections; that's why I indeed consider your approach superior to the chi-squared approximation used in McNemar's test. But you might fail to find a reference for the exact approach... You briefly asked in a later mail about testing for p=0. Indeed, _any_ incident will disprove this hypothesis, and the p-value reported by binom.test(1,73,p=0) Exact binomial test data: 1 and 73 number of successes = 1, number of trials = 73, p-value < 2.2e-16 alternative hypothesis: true probability of success is not equal to 0 95 percent confidence interval: 0.0003467592 0.0739763232 sample estimates: probability of success 0.01369863 is not wrong (as R just tells us it is BELOW a certain value), but could be refined by saying p-value = 0. BTW, I am not sure what p-value=TRUE tells me, as derived from binom.test(0,73,p=0). I personally don't care about either, as to test H0: p=0, I would not use any software but rather stress some basic statistical theory. It remains the question whether McNemar (or your proposal) is generally appropriate here. Like others, I have doubts, except maybe if observations were made on two subsequent days, on the second of which the drug was administered. How could you otherwise be sure that the increase in incidences is not due to the progression of cancer, say, which would be difficult to rule out. Also, from my experience and in contrast to your point of view, I'd rather assume it likely that a vet is much more reluctant to apply the new treatment to an already vomiting cat; your baseline with 0 out of 73 at least seems suspicious, and until proven wrong, I'd take the assumption that this is not by chance only. I understand that this is an observational study with no control/randomization, but concluding a side-effect from a statistically significant change in incidence rates from this study seems questionable to me.
What I'd propose (and have used in comparable situations) is to confine yourself to the confidence interval and let the investigator decide / discuss whether the lower bound of this is higher than what she would expect under control or alternative treatment. She needs to have some experience, if not historical data, and using a test or just this confidence interval, you will only get a hint at a possible side-effect, no formal proof (which is always difficult with observational studies). Discussing the CI would seem fairer and even stronger to me then. In other words: Mathematically, you'll get an appropriate test, while conceptually, I'm far from being convinced. Hope this makes sense, Michael
-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Rolf Turner Sent: Dienstag, 10. Februar 2009 22:33 To: R-help Forum Subject: [R] OT: A test with dependent samples. I am appealing to the general collective wisdom of this list in respect of a statistics (rather than R) question. This question comes to me from a friend who is a veterinary oncologist. In a study that she is writing up there were 73 cats who were treated with a drug called piroxicam. None of the cats were observed to be subject to vomiting prior to treatment; 12 of the cats were subject to vomiting after treatment commenced. She wants to be able to say that the treatment had a ``significant'' impact with respect to