Re: [R] slow computation of mixed ANOVA using aov
Steven Lacey [EMAIL PROTECTED] writes: Dear R-help list, I am trying to do a mixed ANOVA on a 8960 x 5 dataframe. I have 3 factors for which I want to test all main effects and interactions : f1 (40 levels), f2 (7 levels), and f3 (4 levels). I also have a subject factor, subject, and a dependent variable, dv. Some more information about the factors: f2 is a between-subject factor. That is, for each level of f2 there are 8 nested levels of the subject factor. For example, levels 1-8 of subject are nested in level 1 of f2. Levels 9-16 of subject are nested in level 2 of f2. In other words, the subjects that participated in any level of f2 are different from the subjects that participated in any other level of f2. In contrast, f1 and f3 are within-subject factors. That is, for any one of the 56 subjects, we have a 160 medians corresponding to each condition from a complete crossing of factors f1 and f2. While it is true that we do have replicate observations for any subject in each of these conditions, we take the median of those values and operate as if there is only a single observation for each subject in each of the 160 within-subject conditions. Below is code that will generate dataframe d, which is just like the one I am working with: f1-gl(40,1,8960,ordered=T) f2-gl(7,1280,8960,ordered=T) f3-gl(4,40,8960,ordered=T) subject-gl(56,160,8960,ordered=T) dv-rnorm(8960,mean=500,sd=50) d - data.frame(f1,f2,f3,f4,dv) To run the mixed ANOVA I use the following call (modeled after J. Baron): aov(dv~f1*f2*f3+Error(subject/(f1*f2)),data=d) [You mean subject/(f1*f3), right? f2 is a coarsening of subject] WARNING: Exert caution when running the aov command. I have run the exact same command on Windows and Unix machines (each with 1GB of RAM; allocated up to 3 or 4GB of memory for the analysis ) and it has taken many, many hours to finish. That said, this is not a new problem posted on the R-help list. There are several posts where analysts have run into similar problems. My general impression of these posts, and correct me if I am wrong, is that because aov is a wrapper around lm, the extra time is required to build and manipulate a design matrix (via qr decomposition) that is 8960 x several thousand columns large! Is that so? It seems fitting because if I call aov with only a single factor, then it returns in a few seconds. Yes, this is basically correct. The main issue is the calculation of the projection onto the terms in the Error model, which is done using the generic lm algorithm even though the design is typicaly orthogonal so that there are much faster ways to get to the result. To paraphrase: if you have double determinations and an error term at the level of each pair, the algorithm fits a model with N/2 parameters in order to calculate the mean and difference within each pair. For large designs, this becomes an issue... This is in some sense a tradeoff of generality for speed, but the results for non-orthogonal designs are typically very hard to interpret. The topic does come up off and on, and we have been considering the option of adding an algorithm where it is assumed that the Error model consists of mutually orthogonal and balanced terms (in the sense that all factor levels appear equally frequently). Just a simple matter of programming... For the near term, there are a couple of things that you can do: - avoid having an error term that is equivalent to the full data set. In your case, subject:f1:f3 is such a term, and subject/(f1+f3) is actually equivalent (the second order interaction term becomes the residual stratum). This at least saves you from inverting an NxN matrix. - use a version of R compiled with a fast BLAS, on a fast computer with a lot of RAM... (A ~2K square matrix inversion will take a while, but hours sounds a bit excessive). - (not too sure of this, but R 2.1.0 will have some new code for multivariate lm, with intra-individual designs, and tests under the sphericity assumptions; it is possible that your models can be reformulated in that framework. You'd have to set it up as a model with 160 responses on 56 subjects, though, and the code wasn't really designed with that sort of intrasubject dimensionality in mind.) In order to test if the computational slowness was something unique to R, I ran the same analysis, including all 3 factors, in SPSS. To my surprise SPSS returned almost instantaneously. I do not know much about the algorithm in SPSS, but I suspect it may be calculating condition means and sums of squares rather than generating a design matrix. Does that sound plausible? At this point I am a dedicated R user. However, I do the kind of analysis described above quite often. It is important that my statistical package be able to handle it more efficiently than what I have been able to get R to do at this point. Am I doing anything obviously wrong? Is there a
[R] How I calculate nCr with R ? (Como calculo nCr con R? )
En español (In Spanish) Necesito calcular la en numeros de combinaciones de n cosas tomando k al tiempo. Como hago eso en R ??? Yo escribí mi propia función pero pienso que de esa forma no es fácil para mis estudiantes . He estado buscando en la ayuda y no he encontrado información sobre una función que calcule eso directamente. Podrían ayudarme In English (en Inglés ) I need calculate the combination of n things taking k at time. How do I do that in R ? I wrote my own function but in this form isn't easy for my students. Can you help me ? __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] slow computation of mixed ANOVA using aov
Peter Dalgaard [EMAIL PROTECTED] writes: - use a version of R compiled with a fast BLAS, on a fast computer with a lot of RAM... (A ~2K square matrix inversion will take a while, but hours sounds a bit excessive). To wit: system.time(aov(dv~f1*f2*f3+Error(subject/(f1+f3)),data=d)) [1] 582.46 9.31 619.01 0.00 0.00 i.e. about 10 minutes on an Opteron 240 (dual, but it only seemed to use one cpu for this task) with 1GB. -- O__ Peter Dalgaard Blegdamsvej 3 c/ /'_ --- Dept. of Biostatistics 2200 Cph. N (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] How I calculate nCr with R ? (Como calculo nCr con R? )
Mario Morales wrote: En español (In Spanish) Necesito calcular la en numeros de combinaciones de n cosas tomando k al tiempo. Como hago eso en R ??? Yo escribí mi propia función pero pienso que de esa forma no es fácil para mis estudiantes . He estado buscando en la ayuda y no he encontrado información sobre una función que calcule eso directamente. Podrían ayudarme In English (en Inglés ) I need calculate the combination of n things taking k at time. Just the number of combinations: ?choose For listing all of them (you could have found those packages yourself, BTW), e.g: - combinations() in package gtools in bundle gregmisc - combn() in package combinat Uwe Ligges How do I do that in R ? I wrote my own function but in this form isn't easy for my students. Can you help me ? __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] How I calculate nCr with R ? (Como calculo nCr con R? )
do you mean n choose k which is a built in function see ?choose On Sat, 19 Mar 2005, Mario Morales wrote: En español (In Spanish) Necesito calcular la en numeros de combinaciones de n cosas tomando k al tiempo. Como hago eso en R ??? Yo escribí mi propia función pero pienso que de esa forma no es fácil para mis estudiantes . He estado buscando en la ayuda y no he encontrado información sobre una función que calcule eso directamente. Podrían ayudarme In English (en Inglés ) I need calculate the combination of n things taking k at time. How do I do that in R ? I wrote my own function but in this form isn't easy for my students. Can you help me ? __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] the number of cluster
XP Sun wrote: hi, all, how to decide the number of cluster before you use kmeans and hclust? thank you in advance! Depends on your criterion. Best idea is always to use the brain and think about how many clusters are sensible for the particular task/problem/data. For hclust, you can also look at the dendrogram's height and distances of dissimilarities in order to cut. Uwe Ligges best -xpsun __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] How to use a R package with C code
C function name not in load table means that youe DLL/SO file is not loaded. You should use .First.lib - function(lib, pkg) { library.dynam(your_pkg_name, pkg, lib) } Regads, Samuel. Duncan Murdoch [EMAIL PROTECTED] wrote: On Wed, 9 Mar 2005 18:54:36 -0500, [EMAIL PROTECTED] wrote : Hello, everybody, I created a R package which includes C code. But I load this package, and carry out the R function in it. It shows C function is not in load table as follows. Would you tell me what is the problem? Where do I make mistake? You aren't giving enough information for anyone to know that. You need to tell us exactly what you did to create your package, and what operating system you're on. Duncan Murdoch Maggie [Previously saved workspace restored] library(var) Attaching package 'var': The following object(s) are masked _by_ .GlobalEnv : b wxt0124() Error in .C(wxt1221) : C function name not in load table __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html - [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
RE: [R] How I calculate nCr with R ? (Como calculo nCr con R? )
Try choose(k,r) choose(10,2) [1] 45 JB -Original Message- From: Mario Morales [mailto:[EMAIL PROTECTED] Sent: March 19, 2005 5:37 AM To: r-help@stat.math.ethz.ch Subject: [R] How I calculate nCr with R ? (Como calculo nCr con R? ) En español (In Spanish) Necesito calcular la en numeros de combinaciones de n cosas tomando k al tiempo. Como hago eso en R ??? Yo escribí mi propia función pero pienso que de esa forma no es fácil para mis estudiantes . He estado buscando en la ayuda y no he encontrado información sobre una función que calcule eso directamente. Podrían ayudarme In English (en Inglés ) I need calculate the combination of n things taking k at time. How do I do that in R ? I wrote my own function but in this form isn't easy for my students. Can you help me ? __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] problem with legend
I have problem with legend command. Please look this script: dcbv.fm Time Series: Start = 1980 End = 2002 Frequency = 1 [1] 2994.023 3388.414 3111.762 2990.967 3077.438 3058.274 3049.934 2974.130 [9] 2889.659 2801.790 2631.391 2661.700 2312.526 2518.968 2567.044 2443.952 [17] 2117.638 2042.461 2025.816 1939.560 1640.775 1583.609 1659.912 dcbv.ms Time Series: Start = 1980 End = 2002 Frequency = 1 [1] 3700.239 4076.438 3856.495 3680.345 3871.887 3789.770 3831.173 3768.876 [9] 3585.572 3754.374 3372.859 3419.667 3185.194 3319.215 3445.845 3265.214 [17] 2773.961 2661.904 2669.835 2569.190 2187.719 2217.756 2196.378 plot(dcbv.ms,ylim=c(min(dcbv.fm),max(dcbv.ms))) lines(dcbv.fm,col=2) legend(1984,2500,c(DCVB-MS,DCBV-FM),col=c(1,2),cex=.6,fill=T) At end of script the legend of plot have only one color: black. I think the legend will have two colors: black and red. Where I make mistake? version _ platform i386-pc-mingw32 arch i386 os mingw32 system i386, mingw32 status major2 minor0.1 year 2004 month11 day 15 language R Thanks in advance Bernardo Rangel Tura, MD, MSc National Institute of Cardiology Laranjeiras Rio de Janeiro Brazil -- No virus found in this outgoing message. Checked by AVG Anti-Virus. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] How I calculate nCr with R ? (Como calculo nCr con R? )
At 10:37 19/03/05, Mario Morales wrote: En español (In Spanish) Necesito calcular la en numeros de combinaciones de n cosas tomando k al tiempo. In English we usually read this as N choose r and with that clue you might go ?choose Incidentally your English is fine although I see the logic in giving us both. Como hago eso en R ??? Yo escribí mi propia función pero pienso que de esa forma no es fácil para mis estudiantes . He estado buscando en la ayuda y no he encontrado información sobre una función que calcule eso directamente. Podrían ayudarme In English (en Inglés ) I need calculate the combination of n things taking k at time. How do I do that in R ? I wrote my own function but in this form isn't easy for my students. Can you help me ? Michael Dewey [EMAIL PROTECTED] http://www.aghmed.fsnet.co.uk/home.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] simple problem, but not for me
Hello, I'm new in R and I want to do one thing that is very easy in excel, however, I cant do it in R. Suppose we have the data frame: data- data.frame(A=c(a1,a2,a3,a4,a5)) I need to obtain another column in the same data frame (lets say B=c(b1,b2,b3,b4,b5) in the following way: b1=a1/(a1+a2+a3+a4+a5) b2=a2/(a2+a3+a4+a5) b3=a3/(a3+a4+a5) b4=a4/(a4+a5) b5=a5/a5 a1..a5 and b1...b5 are always numeric values (this is just an example, what I really want is apply this kind of formula to a much larger data frame) I think this is easy for those who are used to work in R (and I suppose there are many different ways), but I can make it in this moment, because I cannot leave behind, the excel thinking way. I hope you understand my problem and please, help me soon. Alexandre [EMAIL PROTECTED] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Control of vertical spacing in Lattice key text?
On Tuesday 15 March 2005 17:18, Patrick Connolly wrote: I find the key and legend functions in Lattice very useful. Trouble is, now I can see what else I'd like to be able to do with them. If I put a title on a key, it appears too close to the key itself, and if there's a line break in the title (which often happens), the leading between the lines is too much. What I can do is print to a postscript file, then find the lines in the postscript file where my text appears and fiddle with the postscript code to move the text up or down as I wish. For single plots that's OK, but I'd prefer to be able to do it within R especially when I wish to do dozens of them. Is there something in the documentation I overlooked? Nope. The space used for the title is currently hard-coded (to be 1.2 times the height of the title string). It should be easy enough to add a new option in draw.key, provided I can think up a good name (suggestions welcome). Deepayan __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] simple problem, but not for me
alexbri wrote: Hello, I'm new in R and I want to do one thing that is very easy in excel, however, I cant do it in R. Suppose we have the data frame: data- data.frame(A=c(a1,a2,a3,a4,a5)) I need to obtain another column in the same data frame (lets say B=c(b1,b2,b3,b4,b5) in the following way: b1=a1/(a1+a2+a3+a4+a5) b2=a2/(a2+a3+a4+a5) b3=a3/(a3+a4+a5) b4=a4/(a4+a5) b5=a5/a5 temp - rev(data$A) data$B - rev(temp / cumsum(temp)) Uwe Ligges a1..a5 and b1...b5 are always numeric values (this is just an example, what I really want is apply this kind of formula to a much larger data frame) I think this is easy for those who are used to work in R (and I suppose there are many different ways), but I can make it in this moment, because I cannot leave behind, the excel thinking way. I hope you understand my problem and please, help me soon. Alexandre [EMAIL PROTECTED] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] problem with legend
Bernardo Rangel Tura wrote: I have problem with legend command. Please look this script: dcbv.fm Time Series: Start = 1980 End = 2002 Frequency = 1 [1] 2994.023 3388.414 3111.762 2990.967 3077.438 3058.274 3049.934 2974.130 [9] 2889.659 2801.790 2631.391 2661.700 2312.526 2518.968 2567.044 2443.952 [17] 2117.638 2042.461 2025.816 1939.560 1640.775 1583.609 1659.912 dcbv.ms Time Series: Start = 1980 End = 2002 Frequency = 1 [1] 3700.239 4076.438 3856.495 3680.345 3871.887 3789.770 3831.173 3768.876 [9] 3585.572 3754.374 3372.859 3419.667 3185.194 3319.215 3445.845 3265.214 [17] 2773.961 2661.904 2669.835 2569.190 2187.719 2217.756 2196.378 plot(dcbv.ms,ylim=c(min(dcbv.fm),max(dcbv.ms))) lines(dcbv.fm,col=2) legend(1984,2500,c(DCVB-MS,DCBV-FM),col=c(1,2),cex=.6,fill=T) So you want filles boxes? Then you should specify the color in the fill argument: legend(1984, 2500, c(DCVB-MS, DCBV-FM), cex=.6, fill=1:2) or do you want some lines? legend(1984, 2500, c(DCVB-MS, DCBV-FM), cex=.6, col=1:2, lwd=2) Uwe Ligges At end of script the legend of plot have only one color: black. I think the legend will have two colors: black and red. Where I make mistake? version _ platform i386-pc-mingw32 arch i386 os mingw32 system i386, mingw32 status major2 minor0.1 year 2004 month11 day 15 language R Thanks in advance Bernardo Rangel Tura, MD, MSc National Institute of Cardiology Laranjeiras Rio de Janeiro Brazil __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] simple problem, but not for me
[EMAIL PROTECTED] wrote: Hello, I'm new in R and I want to do one thing that is very easy in excel, however, I cant do it in R. Well, if you've deadened your brain by using Excel, no wonder. Suppose we have the data frame: data- data.frame(A=c(a1,a2,a3,a4,a5)) Oh, for Pete's sake! This makes ``data'' (NOT a good name for an object!) into a data frame with a single column named ``A''. That column will be a factor with 5 entries (an 5 levels) with these levels being (the character strings) a1,a2,a3,a4, and a5. Nothing to do with what you actually want. I need to obtain another column in the same data frame (lets say B=c(b1,b2,b3,b4,b5) in the following way: This would make B a ***vector*** equal to the concatenation of b1, ..., b5. Perhaps you mean: B - cbind(b1,b2,b3,b4,b5) b1=a1/(a1+a2+a3+a4+a5) b2=a2/(a2+a3+a4+a5) b3=a3/(a3+a4+a5) b4=a4/(a4+a5) b5=a5/a5 a1..a5 and b1...b5 are always numeric values (this is just an example, what I really want is apply this kind of formula to a much larger data frame) You are very confused. Your notation and your use of the function c() are all wrong. If you are going to use R, get the basic syntax straight. You probably should be using matrices rather than data frames given that the entries are all numeric. Be that as it may, if A is a (numeric) matrix then B - A/t(apply(A,1,function(x){rev(cumsum(x))})) will give what you appear to want. cheers, Rolf Turner [EMAIL PROTECTED] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] slow computation of mixed ANOVA using aov
Peter Dalgaard [EMAIL PROTECTED] writes: - (not too sure of this, but R 2.1.0 will have some new code for multivariate lm, with intra-individual designs, and tests under the sphericity assumptions; it is possible that your models can be reformulated in that framework. You'd have to set it up as a model with 160 responses on 56 subjects, though, and the code wasn't really designed with that sort of intrasubject dimensionality in mind.) OK, I've tried this now and it does seem to work, and much faster than the aov() approach. I had to fix a buglet in the code (eigenvalues coming up with small imaginary parts due to numerics), so currently available versions won't quite work, but it should be in the alpha releases that are supposed to start Monday. Here's how it works: ### orig setup, edited so as to actually work... f1-gl(40,1,8960,ordered=T) f2-gl(7,1280,8960,ordered=T) f3-gl(4,40,8960,ordered=T) subject-gl(56,160,8960,ordered=T) dv-rnorm(8960,mean=500,sd=50) d - data.frame(f1,f2,f3,subject,dv) ### Regroup to 56x160 matrix response f2a - f2[seq(1,8801,160)] idata - d[1:160,] # intrasubj. design, actually need only f1,f3 from this Y - matrix(dv,56,byrow=T) ### now fit models with effect of f2a, constant, and empty fit - lm(Y~f2a) fit2 - lm(Y~1) fit3 - lm(Y~0) ## The main trick is to do anova on within-subject effects. First look ## at the interactions, alias residuals from an additive model ~f1+f3. ## The reduction Model 1- Model 2 corresponds to testing the f1:f2:f3 ## interaction in aov (tests whether the f1:f3 interaction depends on ## f2a) and Model 2 - Model 3 is the test for the f1:f3 interaction ## being zero. ## I'm not quite sure what to make of the correction terms when the ## estimated SSD matrix is singular (it has to be, the dimension is ## 117, but the df is only 49). Probably the G-G epsilon is bogus, but ## the H-F one seems to fare rather well. anova(fit,fit2,fit3,test=Spherical,X=~f1+f3,idata=idata) Analysis of Variance Table Model 1: Y ~ f2a Model 2: Y ~ 1 Model 3: Y ~ 0 Contrasts orthogonal to ~f1 + f3 Greenhouse-Geisser epsilon: 0.2903 Huynh-Feldt epsilon:0.9639 Res.Df Df Gen.var. F num Df den Df Pr(F) G-G Pr H-F Pr 1 49 0.0 2 556 0.0 0.9036702 5733 0.96034 0.82268 0.95748 3 561 0.0 1.0460117 5733 0.35003 0.39624 0.35206 ## Next, we can look at the f1 effects. This is basically the ## difference between two projections, on ~f1+f3 and ~f3 ## This gives us the tests for f2:f1 and f1 anova(fit,fit2,fit3,test=Spherical,M=~f1+f3,X=~f3,idata=idata) Analysis of Variance Table Model 1: Y ~ f2a Model 2: Y ~ 1 Model 3: Y ~ 0 Contrasts orthogonal to ~f3 Contrasts spanned by ~f1 + f3 Greenhouse-Geisser epsilon: 0.5598 Huynh-Feldt epsilon:1.0284 Res.Df Df Gen.var. F num Df den Df Pr(F) G-G Pr H-F Pr 1 49315.74 2 556 344.98 0.9883234 1911 0.54 0.52 0.54 3 561 347.15 0.8393 39 1911 0.75 0.68 0.75 ## Same thing with f3 anova(fit,fit2,fit3,test=Spherical,M=~f1+f3,X=~f1,idata=idata) Analysis of Variance Table Model 1: Y ~ f2a Model 2: Y ~ 1 Model 3: Y ~ 0 Contrasts orthogonal to ~f1 Contrasts spanned by ~f1 + f3 Greenhouse-Geisser epsilon: 0.9482 Huynh-Feldt epsilon:1.0128 Res.Df Df Gen.var. F num Df den Df Pr(F) G-G Pr H-F Pr 1 49 35.171 2 55 6 34.679 0.9010 18147 0.578 0.574 0.578 3 56 1 34.658 1.0039 3147 0.393 0.390 0.393 ## Actually, because of the complete design, we can ignore f1 and get ## the same analysis: anova(fit,fit2,fit3,test=Spherical,M=~f3,X=~1,idata=idata) Analysis of Variance Table Model 1: Y ~ f2a Model 2: Y ~ 1 Model 3: Y ~ 0 Contrasts orthogonal to ~1 Contrasts spanned by ~f3 Greenhouse-Geisser epsilon: 0.9482 Huynh-Feldt epsilon:1.0128 Res.Df Df Gen.var. F num Df den Df Pr(F) G-G Pr H-F Pr 1 49 35.171 2 55 6 34.679 0.9010 18147 0.578 0.574 0.578 3 56 1 34.658 1.0039 3147 0.393 0.390 0.393 ## Finally we get the main effect of f2a as follows. Notice that the ## Model 2 - Model 3 reduction is the test for zero overall mean, ## which you might (and aov does) want to omit. anova(fit,fit2,fit3,test=Spherical,M=~1,X=~0,idata=idata) Analysis of Variance Table Model 1: Y ~ f2a Model 2: Y ~ 1 Model 3: Y ~ 0 Contrasts orthogonal to ~0 Contrasts spanned by ~1 Greenhouse-Geisser epsilon: 1 Huynh-Feldt epsilon:1 Res.Df Df Gen.var. F num Df den Df Pr(F) G-G Pr H-F Pr 1 49 11 2 55 6 12 1.5010e+00 6 49 1.976e-01 1.976e-01 1.976e-01 3 56 1 249956 1.2699e+06 1 49 8.346e-110 8.346e-110 8.346e-110 ## Finally aov() results for comparison: system.time(aa - aov(dv~f1*f2*f3+Error(subject/(f1+f3)),data=d)) [1] 526.50 9.27 561.29 0.00 0.00 summary(aa) Error:
Re: [R] simple problem, but not for me
The suggestions by Uwe and Rolf use some of the subtler features of R. A simpler (to me) if more tedious approach is provided by the following: Data - data.frame(a1=1:4, a2=5:8, a3=9:12) Data$b1 - Data$a1/(Data$a1+Data$a2+Data$a3) Data$b2 - Data$a2/(Data$a2+Data$a3) Data a1 a2 a3 b1b2 1 1 5 9 0.0667 0.3571429 2 2 6 10 0. 0.375 3 3 7 11 0.14285714 0.389 4 4 8 12 0.1667 0.400 The above can be written in fewer characters using with: Data$b1 - with(Data, a1/(a1+a2+a3)) Data$b2 - with(Data, a2/(a2+a3)) If you want something that will work with an arbitrary number of columns, consider the following: Data - data.frame(a1=1:4, a2=5:8, a3=9:12) dat2 - Data k - length(dat2) for(i in (k-1):1) dat2[[i]] - (dat2[[i]]+dat2[[i+1]]) Dat2 - cbind(Data, Data/dat2) names(Dat2)[k+(1:k)] - paste(b, 1:k, sep=) Data a1 a2 a3 1 1 5 9 2 2 6 10 3 3 7 11 4 4 8 12 dat2 a1 a2 a3 1 15 14 9 2 18 16 10 3 21 18 11 4 24 20 12 Dat2 a1 a2 a3 b1b2 b3 1 1 5 9 0.0667 0.3571429 1 2 2 6 10 0. 0.375 1 3 3 7 11 0.14285714 0.389 1 4 4 8 12 0.1667 0.400 1 If you want to do this more than once, you can wrap the above code in a function; see Writing your own functions in An Introduction to R, available, e.g., vial help.start(). hope this helps. spencer graves p.s. The posting guide http://www.R-project.org/posting-guide.html; also seems quite valuable. In an discussion on (and off) this list earlier this week, several people reported they had found solutions to their own problems in the process of preparing a question to post to this list. And even if you don't find a solution, the result will more likely elicit useful replies. Rolf Turner wrote: [EMAIL PROTECTED] wrote: Hello, I'm new in R and I want to do one thing that is very easy in excel, however, I cant do it in R. Well, if you've deadened your brain by using Excel, no wonder. Suppose we have the data frame: data- data.frame(A=c(a1,a2,a3,a4,a5)) Oh, for Pete's sake! This makes ``data'' (NOT a good name for an object!) into a data frame with a single column named ``A''. That column will be a factor with 5 entries (an 5 levels) with these levels being (the character strings) a1,a2,a3,a4, and a5. Nothing to do with what you actually want. I need to obtain another column in the same data frame (lets say B=c(b1,b2,b3,b4,b5) in the following way: This would make B a ***vector*** equal to the concatenation of b1, ..., b5. Perhaps you mean: B - cbind(b1,b2,b3,b4,b5) b1=a1/(a1+a2+a3+a4+a5) b2=a2/(a2+a3+a4+a5) b3=a3/(a3+a4+a5) b4=a4/(a4+a5) b5=a5/a5 a1..a5 and b1...b5 are always numeric values (this is just an example, what I really want is apply this kind of formula to a much larger data frame) You are very confused. Your notation and your use of the function c() are all wrong. If you are going to use R, get the basic syntax straight. You probably should be using matrices rather than data frames given that the entries are all numeric. Be that as it may, if A is a (numeric) matrix then B - A/t(apply(A,1,function(x){rev(cumsum(x))})) will give what you appear to want. cheers, Rolf Turner [EMAIL PROTECTED] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
RE: [R] Re: Repeated Measures, groupedData and lme
Emma, I am not an expert, but I have been trying to fit similar models. Adding to Keith's reply to your question, I can suggest what I concluded was the most reasonable model for my case. Based on Keith's Model1, you might also want to allow for a correlation among years within each experimental unit (I am assuming the experiment was conducted at the exact same location over the 3 years). Say you want to impose an autoregressive, order 1 structure (you can change this to any other structure you may consider appropriate) To do this at the block*treatment unit (the smallest experimental unit size in your experiment), you can add to keith's code: correlation=corAR1(form=~1|block/treatment) thus the entire code would be Model1-lme(mg~treatment + year + treatment:year, random=~1|block, correlation=corAR1(form=~1|block/treatment),data=magnesium) This results in a model with a certain covariance among all exp.units within the same block, plus another covariance between pairs of years within the same exp.unit, and this covariance decreases as the difference in time increases. You can see graphically the structure of this covariance by doing: rho-0.3 ar1-corAR1(value=rho,form=~1|block/treatment) ar1-Initialize(ar1,data=yourdata) m1-corMatrix(ar1) plot(m1$1/name of first treatment[,1]) Now, I really hope someone more knowledgeable is checking this out there and will point out whether this is incorrect, as I have used it for my analysis assuming was the correct approach. Ignacio -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Keith Wong Sent: Friday, March 18, 2005 5:41 PM To: r-help@stat.math.ethz.ch Subject: [R] Re: Repeated Measures, groupedData and lme Hello, I'm an R-newbie, but I've been learning to use lme for repeated measures experiments as well. If I understand correctly: Outcome variable: Mg (Kg/ha) Subject/grouping variable: block Condition/treatment: treatment (19 levels) Repeated factor: time (3 levels: 99, 02, 04) I think if you specify the model formula in the lme call, then the formula structure specified in the groupedData object is ignored. One suggestion for the model: Model1-lme(mg~treatment + year + treatment:year, random=~1|block, data=magnesium) If the question of interest is the treatment:year interaction Or Model2 - lme(mg~treatment, random=~1|block, data=magnesium) Hope this helps ... and hope the experts chime in at this point to give more guidance. Keith --quoting original post--- Hello I am trying to fit a REML to some soil mineral data which has been collected over the time period 1999 - 2004. I want to know if the 19 different treatments imposed, differ in terms of their soil mineral content. A tree model of the data has shown differences between the treatments can be attributed to the Magnesium, Potassium and organic matter content of the soil, with Magnesium being the primary separating variable. I am looking at soil mineral data were collected : 99, 02, 04. In the experiment, there are 19 different treatments (treatmentcontrol, treatment6TFYM, treatment 12TFYM etc), which are replicated in 3 blocks. For the magnesium soil data, I have created the following groupedData object: magnesium-groupedData(Mg~year|treatment, inner=~block) Where mg=magnesium Kg/ha As it is a repeated measures I was going to use an lme. I have looked at Pinherio and Bates : Mixed-Effects models in S and S-plus and I am getting slightly confused. In order to fit the lme, should I specify the data to use in the model as the grouped structure model? If so is the following command correct: Model1-lme(mg~treatment, random=block|year, data=magnesium)? I am slightly worried that it isn't, because in model summary, instead of listing the 19 different treatments in the fixed effects section, it writes intercept (as normal), then treatment^1, treatment^2 etc. However if I don't specify the groupedData object in the model, then in the fixed effects section, it names the treatments (i.e. intercept, treatmentcontrol, treatment6TFYM. Should I be fitting the model using the whole data set rather than the groupedData object? Thank you very much for your help Emma Pilgrim __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] newbie question about beta distribution
hi everyone, I'm still a newbie in statistics, I have a question about beta distribution, that is, On the ref/tutorials I've found on the net, why beta distribution always have value p(x) more than 1? As I know, any probability density function always have value not more than 1? is there any one who can explain to me, I'm not statistics people, but I need to code that needing some of this distribution function. thx before __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] newbie question about beta distribution
A probability density must integrate to 1. The specific values of the density can be either more or less than 1. -roger [EMAIL PROTECTED] wrote: hi everyone, I'm still a newbie in statistics, I have a question about beta distribution, that is, On the ref/tutorials I've found on the net, why beta distribution always have value p(x) more than 1? As I know, any probability density function always have value not more than 1? is there any one who can explain to me, I'm not statistics people, but I need to code that needing some of this distribution function. thx before __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html