[R] Combining two ANOVA outputs of different lengths
Dear R users, I have been trying to combine two anova outputs into one single table (for later publication). The outputs are of different length, and share only some common explanatory variables. Using merge() or melt() (from the reshape package) did not work out. Here are the model outputs and what I would like to have: anova(model1) numDF denDF F-value p-value (Intercept) 174 0.063446 0.8018 days174 6.613997 0.0121 logdiv 174 1.587983 0.2116 leg 174 4.425843 0.0388 anova(model2) numDF denDF F-value p-value (Intercept) 173 165.94569 <.0001 funcgr 173 7.91999 0.0063 grass173 42.16909 <.0001 leg 173 4.72108 0.0330 funcgr:grass 173 8.49068 0.0047 #"merge(anova(model1),anova(model2),...)" F-value 1 p-val1 F-value 2 p-value 2 (Intercept) 0.0634460.8018 165.94569 <.0001 days6.6139970.0121 NA NA logdiv 1.5879830.2116 NA NA leg 4.4258430.0388 4.72108 0.033 funcgr NA NA 7.91999 0.0063 grass NA NA 42.16909<.0001 funcgr:grassNA NA 8.49068 0.0047 I would be glad if someone would have an idea of how to do this in principle. I am using R 2.5.1 on Windows XP. Thanks very much in advance! Best wishes Christoph -- Dr. Christoph Scherber DNPW, Agroecology University of Goettingen Waldweg 26 D-37073 Goettingen Germany phone +49(0)551 39 8807 fax +49(0)551 39 8806 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Selecting all values smaller than X in a dataframe
Dear R users, I have a correlation matrix for a dataframe called "synth", for which I now want to select only those cells that have correlations larger than +/-0.6: synth=data.frame(x=rnorm(10,1),y=rnorm(10,2),z=rnorm(10,0.5)) w=cor(synth,use="pairwise.complete.obs") w=as.data.frame(w) w[,sapply(w,abs(w),">",0.6)] The problem is that using sapply with ">" or "<" doesn´t seem to work. How could I solve this problem? Thank you very much in advance for your help! Best wishes Christoph (I am using R 2.5.0 on Windows XP). -- Christoph Scherber DNPW, Agroecology University of Goettingen Waldweg 26 D-37073 Goettingen +49-(0)551-39-8807 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] reading a big file
Dear Remigijus, You should change memory allocation in Windows XP, as described in http://cran.r-project.org/bin/windows/base/rw-FAQ.html#There-seems-to-be-a-limit-on-the-memory-it-uses_0021 Hope this helps. Best wishes Christoph -- Christoph Scherber DNPW, Agroecology University of Goettingen Waldweg 26 D-37073 Goettingen +49-(0)551-39-8807 Remigijus Lapinskas schrieb: > Dear All, > > I am on WindowsXP with 512 MB of RAM, R 2.4.0, and I want to read in a > big file mln100.txt. The file is 390MB big, it contains a column of 100 > millions integers. > >> mln100=scan("mln100.txt") > Error: cannot allocate vector of size 512000 Kb > In addition: Warning messages: > 1: Reached total allocation of 511Mb: see help(memory.size) > 2: Reached total allocation of 511Mb: see help(memory.size) > > In fact, I would be quite happy if I could read, say, every tenth > integer (line) of the file. Is it possible to do this? > > Cheers, > Rem > > __ > R-help@stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > > . > __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Chosing a subset of a non-sorted vector
Dear all, I have a tricky problem here: I have a dataframe with biodiversity data in which suplots are a repeated sequence from 1 to 4 (1234,1234,...) Now, I want to randomly pick two subplots each from each diversity level (DL). The problem is that it works up to that point - but if I try to subset the whole dataframe, I get stuck: DL=gl(3,4) subplot=rep(1:4,3) diversity.data=data.frame(DL,subplot) subplot.sampled=NULL for(i in 1:3) subplot.sampled=c(subplot.sampled,sort(sample(4,2,replace=F))) subplot.sampled [1] 3 4 1 3 1 3 subplot[subplot.sampled] [1] 3 4 1 3 1 3 ## here comes the tricky bit: diversity.data[subplot.sampled,] DL subplot 31 3 41 4 11 1 3.1 1 3 1.1 1 1 3.2 1 3 How can I select those rows of diversity.data that match the exact subplots in "subplot.sampled"? Thank you very much for your help! Best wishes, Christoph (I am using R 2.4.1 on Windows XP) ## Christoph Scherber DNPW, Agroecology University of Goettingen Waldweg 26 D-37073 Goettingen +49-(0)551-39-8807 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] lme4/lmer: P-Values from mcmc samples or chi2-tests?
Dear R users, I have now tried out several options of obtaining p-values for (quasi)poisson lmer models, including Markov-chain Monte Carlo sampling and single-term deletions with subsequent chi-square tests (although I am aware that the latter may be problematic). However, I encountered several problems that can be classified as (1) the quasipoisson lmer model does not give p-values when summary() is called (see below) (2) dependence on the size of the mcmc sample (3) lack of correspondence between different p-value estimates. How can I proceed, left with these uncertainties in the estimations of the p-values? Below is the corresponding R code with some output so that you can see it all for your own: ## m1<-lmer(number_pollinators~logpatch+loghab+landscape_diversity+(1|site),primula,poisson,method="ML") m2<-lmer(number_pollinators~logpatch+loghab+landscape_diversity+(1|site),primula,quasipoisson,method="ML") summary(m1);summary(m2) #m1: [...] Fixed effects: Estimate Std. Error z value Pr(>|z|) (Intercept) -0.403020.57403 -0.7021 0.48262 logpatch 0.109150.04111 2.6552 0.00793 ** loghab 0.087500.06128 1.4279 0.15331 landscape_diversity 1.023380.40604 2.5204 0.01172 * #m2: [...] #for the quasipoisson model, no p values are shown?! Fixed effects: Estimate Std. Error t value (Intercept) -0.4030 0.6857 -0.5877 logpatch 0.1091 0.0491 2.2228 loghab0.0875 0.0732 1.1954 landscape_diversity 1.0234 0.4850 2.1099 ## anova(m1) #here, no p-values appear (only in the current version of lme4) Analysis of Variance Table Df Sum Sq Mean Sq logpatch 1 11.6246 11.6246 loghab 1 6.0585 6.0585 landscape_diversity 1 6.3024 6.3024 anova(m2) Analysis of Variance Table Df Sum Sq Mean Sq logpatch 1 11.6244 11.6244 loghab 1 6.0581 6.0581 landscape_diversity 1 6.3023 6.3023 So I am left with the p-values estimated from the poisson model; single-term deletion tests for each of the individual terms lead to different p-values; I restrict this here to m2: ## m2a=update(m2,~.-loghab) anova(m2,m2a,test="Chi") Data: primula Models: m2a: number_pollinators ~ logpatch + landscape_diversity + (1 | site) m2: number_pollinators ~ logpatch + loghab + landscape_diversity + (1 | site) Df AIC BIC logLik Chisq Chi Df Pr(>Chisq) m2a 4 84.713 91.850 -38.357 m2 5 84.834 93.755 -37.417 1.8793 1 0.1704 ## m2b=update(m2,~.-logpatch) anova(m2,m2b,test="Chi") Data: primula Models: m2b: number_pollinators ~ loghab + landscape_diversity + (1 | site) m2: number_pollinators ~ logpatch + loghab + landscape_diversity + m2b: (1 | site) Df AIC BIC logLik Chisq Chi Df Pr(>Chisq) m2b 4 90.080 97.217 -41.040 m2 5 84.834 93.755 -37.417 7.246 1 0.007106 ** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## m2c=update(m2,~.-landscape_diversity) anova(m2,m2c,test="Chi") Data: primula Models: m2c: number_pollinators ~ logpatch + loghab + (1 | site) m2: number_pollinators ~ logpatch + loghab + landscape_diversity + m2c: (1 | site) Df AIC BIC logLik Chisq Chi Df Pr(>Chisq) m2c 4 89.036 96.173 -40.518 m2 5 84.834 93.755 -37.417 6.2023 10.01276 * --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 The p-values from mcmc are: ## markov1=mcmcsamp(m2,5000) HPDinterval(markov1) lower upper (Intercept) -1.394287660 0.6023229 logpatch 0.031154910 0.1906861 loghab0.002961281 0.2165285 landscape_diversity 0.245623183 1.6442544 log(site.(In)) -41.156007604 -1.6993996 attr(,"Probability") [1] 0.95 ## mcmcpvalue(as.matrix(markov1[,1])) #i.e. the p value for the intercept [1] 0.3668 > mcmcpvalue(as.matrix(markov1[,2])) #i.e. the p-value for logpatch [1] 0.004 > mcmcpvalue(as.matrix(markov1[,3])) #i.e. the p-value for loghab [1] 0.0598 > mcmcpvalue(as.matrix(markov1[,4])) #i.e. the p-value for landscape.div [1] 0.0074 If one runs the mcmcsamp function for, say, 50,000 runs, the p-values are slightly different (not shown here). ##here are the p-values summarized in tabular form: (Intercept) 0.3668 logpatch0.004 loghab 0.0598 landscape_diversity 0.0074 ##and from the single-term deletions: (Intercept)N.A. logpatch0.007106 loghab 0.1704 landscape_diversity 0.01276 ## Compare this with the p-values from the poisson model: Fixed effects: Estimate Std. Error z value Pr(>|z|) (Intercept) -0.403020.57403 -0.7021 0.48262 logpatch 0.109150.04111 2.6552 0.00793 ** loghab 0.087500.06128 1.4279 0.15331 landscape_diversity 1.023
Re: [R] lmer and estimation of p-values: error with mcmcpvalue()
Dear Henric, Thanks, now it works; but how reliable are these estimates? Especially with p-values close to 0.05 it is of course important that the range of the estimates is not too large. I´ve just run several simulations, each of which yielding sometimes quite different p-values. Best wishes Christoph ## Dr. rer. nat. Christoph Scherber DNPW, Agroecology University of Goettingen Waldweg 26 D-37073 Goettingen http://wwwuser.gwdg.de/~uaoe/Agroecology.html +49-(0)551-39-8807 Henric Nilsson (Public) schrieb: > Den Må, 2007-02-12, 13:58 skrev Christoph Scherber: >> Dear all, >> >> I am currently analyzing count data from a hierarchical design, and I?ve >> tried to follow the suggestions for a correct estimation of p-values as >> discusssed at R-Wiki >> (http://wiki.r-project.org/rwiki/doku.php?id=guides:lmer-tests&s=lme%20and%20aov). >> >> However, I have the problem that my model only consists of parameters >> with just 1 d.f. (intercepts, slopes), so that the "mcmcpvalue" function >> defined below obviously produces error messages. >> >> How can I proceed in estimating the p-values, then? >> >> I very much acknowledge any suggestions. >> >> Best regards >> Christoph. >> >> ## >> mcmcpvalue <- function(samp) >> { std <- backsolve(chol(var(samp)), >> >> cbind(0, t(samp)) - colMeans(samp), >> transpose = TRUE) >> >> >> sqdist <- colSums(std * std) >> sum(sqdist[-1] > sqdist[1])/nrow(samp) } >> >> m1<-lmer(number_pollinators~logpatch+loghab+landscape_diversity+(1|site),quasipoisson) >> >> Generalized linear mixed model fit using Laplace >> Formula: number_pollinators ~ logpatch + loghab + landscape_diversity + >> (1 | site) >> Data: primula >> Family: quasipoisson(log link) >> AIC BIC logLik deviance >> 84.83 93.75 -37.4274.83 >> Random effects: >> Groups NameVariance Std.Dev. >> site (Intercept) 0.036592 0.19129 >> Residual 1.426886 1.19452 >> number of obs: 44, groups: site, 15 >> >> Fixed effects: >> Estimate Std. Error t value >> (Intercept) -0.4030 0.6857 -0.5877 >> logpatch 0.1091 0.0491 2.2228 >> loghab0.0875 0.0732 1.1954 >> landscape_diversity 1.0234 0.4850 2.1099 >> >> Correlation of Fixed Effects: >> (Intr) lgptch loghab >> logpatch 0.091 >> loghab -0.637 -0.121 >> lndscp_dvrs -0.483 -0.098 -0.348 >> >> >> markov1=mcmcsamp(m1,5000) >> HPDinterval(markov1) >> >> mcmcpvalue(as.matrix(markov1)[,1]) > > Try `mcmcpvalue(as.matrix(markov1[,1]))'. > > > HTH, > Henric > > > >> Error in colMeans(samp) : 'x' must be an array of at least two dimensions >> >> __ >> R-help@stat.math.ethz.ch mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> >> > > > . > __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] lmer and estimation of p-values: error with mcmcpvalue()
Dear all, I am currently analyzing count data from a hierarchical design, and I?ve tried to follow the suggestions for a correct estimation of p-values as discusssed at R-Wiki (http://wiki.r-project.org/rwiki/doku.php?id=guides:lmer-tests&s=lme%20and%20aov). However, I have the problem that my model only consists of parameters with just 1 d.f. (intercepts, slopes), so that the "mcmcpvalue" function defined below obviously produces error messages. How can I proceed in estimating the p-values, then? I very much acknowledge any suggestions. Best regards Christoph. ## mcmcpvalue <- function(samp) { std <- backsolve(chol(var(samp)), cbind(0, t(samp)) - colMeans(samp), transpose = TRUE) sqdist <- colSums(std * std) sum(sqdist[-1] > sqdist[1])/nrow(samp) } m1<-lmer(number_pollinators~logpatch+loghab+landscape_diversity+(1|site),quasipoisson) Generalized linear mixed model fit using Laplace Formula: number_pollinators ~ logpatch + loghab + landscape_diversity + (1 | site) Data: primula Family: quasipoisson(log link) AIC BIC logLik deviance 84.83 93.75 -37.4274.83 Random effects: Groups NameVariance Std.Dev. site (Intercept) 0.036592 0.19129 Residual 1.426886 1.19452 number of obs: 44, groups: site, 15 Fixed effects: Estimate Std. Error t value (Intercept) -0.4030 0.6857 -0.5877 logpatch 0.1091 0.0491 2.2228 loghab0.0875 0.0732 1.1954 landscape_diversity 1.0234 0.4850 2.1099 Correlation of Fixed Effects: (Intr) lgptch loghab logpatch 0.091 loghab -0.637 -0.121 lndscp_dvrs -0.483 -0.098 -0.348 markov1=mcmcsamp(m1,5000) HPDinterval(markov1) mcmcpvalue(as.matrix(markov1)[,1]) Error in colMeans(samp) : 'x' must be an array of at least two dimensions __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Poisson example from Snedechor & Cochran
Dear R users, I am trying to reproduce table 7.4.12 (page 131) from Snedechor & Cochran (eigth edition); the example is counts of weed seeds with a fitted Poisson distribution, tested for goodness-of-fit using a Chi-square: observed=c(3,17,26,16,18,9,3,5,0,1,0,0) expected=dpois(0:11,lambda=3.020408)*98 chisq.test(observed,p=expected,rescale.p=T) Now the problem I have is that chisq.test gives me the chi-squared value of roughly 8.30 (which is the value given by Snedechor & Cochran), but I am wondering why the warning message occurs at the end of the test. Further, it is not clear to me how these calculations could be done using the full dataset of N=98 observations: observed.full=rep(0:11,c(3,17,26,16,18,9,3,5,0,1,0,0)) What would the correct specification for a chisq.test against a poisson distribution look like in this case? Thanks for your help! Best wishes Christoph ## Chi-squared test for given probabilities data: observed X-squared = 8.2628, df = 8, p-value = 0.4082 Warning message: Chi-squared approximation may be incorrect in: chisq.test(observed, p = expected, rescale.p = T) __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Poisson example from Snedechor & Cochran
Dear R users, I am trying to reproduce table 7.4.12 (page 131) from Snedechor & Cochran (eigth edition); the example is counts of weed seeds with a fitted Poisson distribution, tested for goodness-of-fit using a Chi-square: observed=c(3,17,26,16,18,9,3,5,0,1,0,0) expected=dpois(0:11,lambda=3.020408)*98 chisq.test(observed,p=expected,rescale.p=T) Now the problem I have is that chisq.test gives me the chi-squared value of roughly 8.30 (which is the value given by Snedechor & Cochran), but I am wondering why the warning message occurs at the end of the test. Further, it is not clear to me how these calculations could be done using the full dataset of N=98 observations: observed.full=rep(0:11,c(3,17,26,16,18,9,3,5,0,1,0,0)) What would the correct specification for a chisq.test against a poisson distribution look like in this case? Thanks for your help! Best wishes Christoph ## Chi-squared test for given probabilities data: observed X-squared = 8.2628, df = 8, p-value = 0.4082 Warning message: Chi-squared approximation may be incorrect in: chisq.test(observed, p = expected, rescale.p = T) __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Batch printing of existing postscript files with file names included
Dear R users, I have created a series of postscript files and I´d like to print them with the file name added to the printout. Is there a way of reading these files into R (e.g. using rimage after conversion to jpeg), adding the file name, and then sending the files to a windows printer? I have already tried the rimage package, and several image batch conversion program, but none was so far able to do the job, unfortunately. I´m using R 2.3.0 on Windows XP. Thank you very much for your help! Best regards Christoph __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Use of step() with an unbalanced ANOVA model
Dear all, Is it dangerous to use step() during model simplification when I have an ANOVA design that is unbalanced (i.e. there is order dependence when entering the terms into the full model)? Thanks very much for your help! Kind regards, Christoph __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] xyplot question
Dear R users, I have a question regarding the use of xyplot in the lattice() package. I have two factors (each with two levels), and I´d like to change the order of the panels in a 2x2 panel layout from the default alphabetic order that R uses based on the names of the factor levels. My approach is (in principle) xyplot(y~x|Factor1+Factor2) Let´s assume, my factor levels for Factor1 are A and B, and for Factor2 they´re C and D, respectively. Now the default arrangement of my panels would be (from bottom top left to bottom right): "BC","CA","BD","AD" What I´d like to have is "BD","AC","BC","AD". Can anyone tell me how to solve this problem easily? I´ve read that using "perm.cond" and/or "index.cond" could solve this problem, but couldn´t find an appropriate example, unfortunately... Thank you very much for your help! Regards, Christoph __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Calculation of the median in survfit()
Dear R users, Can anyone tell me how the medians in survfit() are computed? I´ve looked it up in the source code (print.survfit.s version 4.19 07/09/00), but I´m not a programmer...;-) Especially, I´d like to know what the pfun() function inside print.survfit.s() works. The help file does describe the calculation of the median, but I´d like to know if and how this differs from just directly calculating the median of the survival times (e.g. using tapply(time,factor,median)). I would be most grateful for any help. Thanks a lot! Christoph ### [I am using R 2.1.1 on WinXP with the latest version of the survival package] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Mean survival times
Dear list, I have data on insect survival in different cages; these have the following structure: deathtime status id cageS F G L S 1.5 1 1 C1 8 2 1 1 1 1.5 1 2 C1 8 2 1 1 1 11.5 1 3 C1 8 2 1 1 1 11.5 1 4 C1 8 2 1 1 1 There are 81 cages and each 20 individuals whose survival was followed over time. The columns S,F,G,L and S are experimentally manipulated factors thought to have an influence on survival. Using survfit(Surv(deathtime,status)~cage) gives me the survivorship curves for every cage. But what I´d like to have is a mean survivorship value for every cage. Obviously, using tapply (deathtime,cage,mean) gives me mean values, but I´d like to have a better estimate of this using a proper statistical model. I´ve tried a glm with poisson errors (as suggested in Crawley´s book, page 628), but the back-transformed estimates (using status as the response variable and deathtime as an offset) were totally unrealistic. As I´m new to survival analysis, it would be great if anyone could give me some hints on what method would be best. Thanks a lot! Christoph __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] trellis: positioning of key
Dear R users, Using xyplot(), how can I position the key in the *margin* outside the plotting area ? My problem is that the key always overlaps with the x axis labels, no matter how I try to specify any of the par() arguments (e.g. oma()). Many thanks for any suggestions! Christoph ### for information, here´s the code I use par(oma=c(0,0,3,0)) ###this, I think, is what should be changed xyplot(jitter(height,factor=1) ~ sowndiv | time2, groups=treatment, data=caging04051, subscripts=T, xlab=list("Diversity",cex=1.5), ylab=list("Height",cex=1.5), layout=c(3, 1), key=list(space="top",border=TRUE, columns=1, transparent=FALSE, points=list(pch=c(1,2),cex=2), lines=list(col=c(1,1), lty=c(1,2), lwd=c(1.5,1.5)), text=list(levels(treatment), cex=1.2)), par.strip.text=list(cex=1.2), strip = function(..., strip.names) strip.default(..., strip.names=c(FALSE, TRUE), style = 1), scales=list( x=list(at=c(1, 2, 4, 8, 16, 60), labels=c(1,2,4,8,16,60),log=TRUE,cex=1.4,tck=0.02), y=list(tck=0.02)), panel=function(x, y, subscripts, groups){ panel.superpose(x, y, subscripts, groups,cex=1.4) which <- groups[subscripts] panel.loess(x[which=="C"],y[which=="C"],lty=1) panel.loess(x[which=="H"],y[which=="H"],lty=2) }) __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] standard errors in summary.lm
Dear R users, If I have an aov object, how are the standard errors of the estimates in summary.lm calculated? Using treatment contrasts, I would like to use the estimated differences in mean values ("intercepts") to calculate the mean values per factor level, and for these mean values I´f like to use the model output to calculate the standard errors. Many thanks for your help! Regards, Christoph. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] aggregating using several functions
Dear R users, I would like to aggregate a data frame using several functions at once (e.g., mean plus standard error). How can I make this work using aggregate()? The help file says scalar functions are needed; can anyone help? Below is the code for my "meanse" function which I´d like to use like this: aggregate(dataframe, list(factorA,factoB),meanse) Thanks for your help! Christoph meanse<-function(x,...,na.rm=T){ if(na.rm) x<-x[!is.na(x)] m<-mean(x) s<-sd(x) l<-length(x) serr<-s/sqrt(l) return(data.frame(list("Mean"=m,"Std.Error"=serr))) } __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] dataframe (matrix) multiplication
Dear R users, Suppose I have 2 parts of a dataframe, say ABCD 2143 3245 2154 (the real dataframe is 160 columns with each 120 rows) and I want to multiply every element in [,A:B] with every element in [,C:D]; What is the most elegant way to do this? I´ve been thinking of converting [,A:B] to a matrix, and then multiplying it with the inverse of [,C:D]; would that be correct? The result should look like E;F 8;3 12;10 10;4 Thanks very much for any suggestions Christoph __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] lme and ordering of terms
Dear R users, When fitting a lme() object (from the nlme library), is it possible to test interactions *before* main effects? As I understand, R conventionally re-orders all terms such that highest-order interactions come last - but I´d like to know if it´s possible (and sensible) to change this ordering of terms. I´ve tried the terms() command (from aov) but I don´t know if something similar exists for lme() objects. Thanks a lot for your help! Best wishes Christoph __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] regression with more than one observation per x value
Dear R users, How can I do a regression analysis in R where there is more than one observation per x value? I tried the example in Sokal&Rohlf (3rd edn., 1995), page 476 ff., but I somehow couldn´t find a way to partition the sums of squares into "linear regression", "deviations from regression", and within-groups SS. I tried model1<-lm(y~as.numeric(x)+as.factor(x) #with treatment contrasts but I am sure there´s a better way around it. I would be very happy if anyone could give me some suggestions on this. Best regards Christoph. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] JGR and R 2.0.X
Dear all, I am running Windows XP with several parallel installations of R (2.0.1; 2.1 and so on). How can I install JGR for the 2.0.1 version? I keep on getting error messages when trying to install it. Best wishes Christoph __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] "apply" question
Dear R users, I´ve got a simple question but somehow I can´t find the solution: I have a data frame with columns 1-5 containing one set of integer values, and columns 6-10 containing another set of integer values. Columns 6-10 contain NA´s at some places. I now want to calculate (1) the number of values in each row of columns 6-10 that were NA´s (2) the sum of all values on columns 1-5 for which there were no missing values in the corresponding cells of columns 6-10. Example: (let´s call the data frame "data") Col1 Col2 Col3 Col4 Col5 Col6 Col7 Col8 Col9 Col10 1 2 5 2 3 NA 5 NA1 4 3 1 4 5 2 6 NA 4 NA 1 The result would then be (for the first row) (1) "There were 2 NA´s in columns 6-10." (2) The mean of Columns 1-5 was 2+2+3=7" (because there were NA´s in the 1st and 3rd position in rows 6-10) So far, I know how to calculate the rowSums for the data.frame, but I don´t know how to condition these on the values of columns 6-10 rowSums(data[,1:5]) #that´s straightforward apply(data[,6:19],1,function(x)sum(is.na(x))) #this also works fine But I don´t know how to select just the desired values of columns 1-5 (as described above) Can anyone help me? Thanks a lot in advance! Best regards Christoph __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] glm with poisson errors
Dear R users, I have data on percentage leaf area damaged (in classes, e.g. 1%, 5%, 10%) in plants. My two questions are: (1) Could I use a glm with poisson errors on these data? (2) Could I still use this glm with poisson errors after arcsine transformation of the data? Thank you very much for your help! Best regards Christoph __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] some help interpreting ANOVA results, please?
Dear RenE, Can you explain a bit more how you derive your T.SPart? That´s what I think is the tricky part of your analysis. I would suggest you should try to end up with something like this: model1<-aov(SR~WasSick*Time+Error(Subject/Time) model2<-aov(SR~SC*Time+Error(Subject/Time) This way it would be like a repeated measures ANOVA, where WasSick (or SC) are the primary covariates, and Time is nested within Subject. I think the correct specification of "time" is crucial for the whole analysis. It´s like in a split-plot ANOVA, where finding the appropriate codings for plots of different sizes can sometimes take a very long time. Regards, Christoph 0) Subject, the subject identifier 1) physiological recordings, say SR (skin resistance): time series 2) a SessionPart variable (parts R1 and R2, separated in time by a pause) 3) time, T.SPart: normalised per subject and per SessionPart, so twice 0..1 4) a subjective sickness estimate (SC): time series 5) a per-subject classification: WasSick or not (available as a time series, but constant in time of course) RenE J.V. Bertin wrote: On Sun, 10 Oct 2004 19:55:41 +0200, "RenE J.V. Bertin" <[EMAIL PROTECTED]> wrote regarding "Re: [R] some help interpreting ANOVA results, please?" I'm would like to come back to a question I posted quite a while ago, concerning the analysis of data of an ongoing experiment. I have, for a given number of subjects: 0) Subject, the subject identifier 1) physiological recordings, say SR (skin resistance): time series 2) a SessionPart variable (parts R1 and R2, separated in time by a pause) 3) time, T.SPart: normalised per subject and per SessionPart, so twice 0..1 4) a subjective sickness estimate (SC): time series 5) a per-subject classification: WasSick or not (available as a time series, but constant in time of course) I would like to make statements on whether or not sickness (measured by 4 or 5) can be deduced from the physiological recordings, e.g. something like aov( SR ~ WasSick * T.SPart ) expecting a significant effect of time (sickness building up), of WasSick, and a significant interaction showing that the effect is stronger (or only significant) in the WasSick=TRUE subjects. A simple t.test(SR~WasSick) gives a significant difference, as well as t.test( SR~ (T.SPart>=0.5) ) . The problem I'm having is that WasSick (and SC) are not independent variables properly speaking. So I cannot do aov( SR ~ WasSick * T.SPart + Error(Subject/WasSick*T.SPart) ) R would remove WasSick from the Error term, and do the analysis without it, giving a significant T.SPart effect and WasSick:T.SPart interaction (?), both listed under Error: Subject:T.SPart : Error: Subject:T.SPart Df Sum Sq Mean Sq F value Pr(>F) T.SPart 5 318.263.6 8.336 7.46e-07 *** WasSick:T.SPart 5 125.525.1 3.289 0.0079 ** Residuals 129 984.9 7.6 There is no trace of a WasSick effect other than in that interaction (of which I'm not sure it is truly one). I have 2 questions at this point: A) I think one could assimilate WasSick to a grouping variable (like in a clinical stdudy), forgetting it is actually an observation on the subjects. In that case, I could do aov( SR ~ WasSick * T.SPart ) which gives me the expected two significant main effects and the significant interaction (which agrees with visual inspection of the data). Is this an acceptable approach/model? B) Should I contine putting the Subject id in an Error term, e.g. aov( SR ~ WasSick + Error(Subject) ) WithOUT this error term, that anova gives a significant effect, confirming the t.test mentioned above. If I include the error term, the effect is no longer significant. Is that because the model does not make sense, rather because my data are so non-normal that a t.test cannot be used? (?Error has a similar model, and calls it "not particularly sensible statistically".) I would really appreciate some more constructive comments! Thanks, RenE Bertin PS: I must add that it has been suggested to try lme. I went over what docs I have (help and MASS 4), but these are far to specialistic for me, so I haven't gotten anywhere in that direction :( __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] xyplot() question
Dear R Users, I have an xyplot() where different plotting symbols are used for subgroups (originally used within S-Plus, but hopefully it´s also applicable to R users). How can I fit separate regression lines for every subgroup? So far, I can only plot the overall fitted line. The code looks like this: trellis.device() sps<-trellis.par.get("superpose.symbol") sps$pch<-1:7 trellis.par.set("superpose.symbol",sps) spl <- trellis.par.get("superpose.line") ps$lty <- 1:7 trellis.par.set("superpose.line",spl) xyplot(a~b|factor+treatment, groups=external,data=ownframe,layout=c(2,2), panel=function(x,y,subscripts,...){panel.superpose(x,y,subscripts,...);panel.lmline(x,y)}) Thank you very much for your help! Regards Christioph __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] panel plot with log-scaled x-axis
Hi, I´ll try to rephrase my question using the following example: xyplot(conc~uptake|Treatment+Type,data=CO2) Would it be possible to make each of the x axes log-scaled (but with the same tick labels)? Thanks very much! Christoph Deepayan Sarkar wrote: On Tuesday 08 February 2005 10:19, Christoph Scherber wrote: Dear all, Can anyone help me plotting a panel plot with log-scaled x axes? My formula looks like this: coplot(response~div|treatment+grass, panel=function(x,y,...){panel.xyplot(x,y,...);panel.lmline(x,y,...)}) Are you sure that actually works? And I´d like to have "div" plotted in log scale. How to do that would depend at least on whether you are using coplot or xyplot (neither may have good solutions implemented). Deepayan __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] panel plot with log-scaled x-axis
Dear all, Can anyone help me plotting a panel plot with log-scaled x axes? My formula looks like this: coplot(response~div|treatment+grass, panel=function(x,y,...){panel.xyplot(x,y,...);panel.lmline(x,y,...)}) And I´d like to have "div" plotted in log scale. Thanks very much for your help! Best regards Christoph __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] integration function
Dear R users, I have tried to write a function which gives the step-wise integral for an exponential function (moving from -3 to 3 in steps of 0.1, where the output for every step shall be the integral under the curve of y against x. However, something seems to be wrong with this function; can anyone please help me? x<-seq(-3,3,0.1) y<-exp(x) integral<-function(z,a,b,step){ for(i in (1:((b-a)/step))){ c<-0 c[i]<-integrate(z,lower=a+(i-1)*step,upper=a+i*step) print(c$integral) }} integral(y,-3,3,0.1) Best regards Christoph __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] publishing random effects from lme
Hi Dieter, Yes, I´ve tried both options. The anova(lme(...)) gives me good results for the fixed effects part, but what I´m specifically interested in is what to do with the random effects. I have tried glmmPQL (generalized linear mixed-effects models), which did in fact greatly help account for heteroscedasticity, but I can´t do model simplification with these models (and they´re still heavily debated, as I read from previous postings to "R Help". How would you deal with the random effects part of the models when publishing results from lme? Thanks for your help! Christoph ### Here are my original questions once again (with an example below): 1) What is the total variance of the random effects at each level? (2) How can I test the significance of the variance components? (3) Is there something like an "r squared" for the whole model which I can state? ##it seems, there isn´t (as I learned from a previous posting The data come from an experiment on plant performance with and without insecticide, with and without grasses present, and across different levels of plant diversity ("div"). Thanks for your help! Christoph. lme(asin(sqrt(response)) ~ treatment + logb(div + 1, 2) + grass, random = ~ 1 | plotcode/treatment, na.action = na.exclude, method = "ML") Linear mixed-effects model fit by maximum likelihood Data: NULL AIC BIC logLik -290.4181 -268.719 152.209 Random effects: Formula: ~ 1 | plotcode (Intercept) StdDev: 0.04176364 Formula: ~ 1 | treatment %in% plotcode (Intercept) Residual StdDev: 0.08660458 0.00833387 Fixed effects: asin(sqrt(response)) ~ treatment + logb(div + 1, 2) + grass Value Std.Error DF t-value p-value (Intercept) 0.1858065 0.01858581 81 9.997225 <.0001 treatment 0.0201384 0.00687832 81 2.927803 0.0044 logb(div + 1, 2) -0.0203301 0.00690074 79 -2.946073 0.0042 grass 0.0428934 0.01802506 79 2.379656 0.0197 Standardized Within-Group Residuals: Min Q1 Med Q3 Max -0.2033155 -0.05739679 -0.00943737 0.04045958 0.3637217 Number of Observations: 164 Number of Groups: plotcode ansatz %in% plotcode 82 164 Dieter Menne wrote: Suppose I have a linear mixed-effects model (from the package nlme) with nested random effects (see below); how would I present the results from the random effects part in a publication? Have you tried anova(lme())? Your asin(sqrt()) looks a bit like these are percentages of counts. The method is still quoted in old books, but has fallen a bit out of favor. Have you thought of some glm model instead (http://www.stats.ox.ac.uk/pub/MASS4/)? Dieter Menne __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Split-split plot ANOVA
Hi Mike, Do you have a schematic drawing of how exactly your treatments were applied? In split-plot experiments, it is generally very important to clearly define the sequence of plot sizes, because if you don´t do this properly, then the output will be confusing. Checking if your degrees of freedom at each level are correct should give you a good idea about whether you´ve specified the model in the right way. Generally, I see some problem with your model specification as you seem to have two (not one) treatments in some of your subplots. If I got it right, the sequence of terms should be something like Block/Whole.plot/Caging/Competition/Species at least if it´s a full split-plot. Can you send me some more details on the design? Regards, Christoph [EMAIL PROTECTED] wrote: I have been going over and over the examples in MASS and the Pinheiro and Bates example, but cannot get my model to run correctly with either aov or lme. Could someone give me a hand with the correct model statement? It would help to see some of the things you have tried already ... First a description of the design. We are studying germination rates for various species under a variety of treaments. This is a blocked split-split plot design. The levels and treatments are: Blocks: 1-6 Whole plot treatment: Overstory: Yes or No Split plot treatments: Caging (to protect against seed predators): Yes or No Herbaceous competition (i.e., grass): Yes or No Split-split plot treatment: Tree species: 7 kinds The response variable is Lag, which is a indication of when the seeds first germinated. I would try somthing like lme (fixed= Lag ~ Caging + herbaceous + tree, data= your.data, random= ~ 1 | Overstory/split/splitsplit) Perhaps you want/need to add some interactions as well. Overstory, split and splitsplit would be factors with specific levels for each of the plots, split plots and split-split plots, respectively. Thus what I attempted here is to separate the variables of the hierarchical design of data gathering (which go into the random effects) and the treatments (which go into the fixed effects). The degrees of freedom for the fixed effects are automatically adjusted to the correct level in the hierarchy. Did you try that? What did not work out with it? Lastly, I have unbalanced data since some treatment combinations never had any germination. In principle, the REML estimates in lme are not effected by unbalanced data. BUT I do not think that the missing germinations by themselves lead to an unbalanced data set: I assume it is informative that in some treatment combinations there was no germination. Thus, your lag there is something close to infinity (or at least longer than you cared to wait ;-). Thus, I would argue you have to somehow include these data points as well, otherwise you can only make a very restricted statement of the kind: if there was germination, this depended on such and such. Since the data are highly nonnormal, I hope to do a permutations test on the F-values for each main effect and interaction in order to get my p-values. As these are durations a log transformation of your response might be enough. Regards, Lorenz - Lorenz Gygax, Dr. sc. nat. Centre for proper housing of ruminants and pigs Swiss Federal Veterinary Office agroscope FAT Tänikon, CH-8356 Ettenhausen / Switzerland __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] random effects in lme
Dear all, Suppose I have a linear mixed-effects model (from the package nlme) with nested random effects (see below); how would I present the results from the random effects part in a publication? Specifically, I´d like to know: (1) What is the total variance of the random effects at each level? (2) How can I test the significance of the variance components? (3) Is there something like an "r squared" for the whole model which I can state? The data come from an experiment on plant performance with and without insecticide, with and without grasses present, and across different levels of plant diversity ("div"). Thanks for your help! Christoph. lme(asin(sqrt(response)) ~ treatment + logb(div + 1, 2) + grass, random = ~ 1 | plotcode/treatment, na.action = na.exclude, method = "ML") Linear mixed-effects model fit by maximum likelihood Data: NULL AIC BIC logLik -290.4181 -268.719 152.209 Random effects: Formula: ~ 1 | plotcode (Intercept) StdDev: 0.04176364 Formula: ~ 1 | treatment %in% plotcode (Intercept) Residual StdDev: 0.08660458 0.00833387 Fixed effects: asin(sqrt(response)) ~ treatment + logb(div + 1, 2) + grass Value Std.Error DF t-value p-value (Intercept) 0.1858065 0.01858581 81 9.997225 <.0001 treatment 0.0201384 0.00687832 81 2.927803 0.0044 logb(div + 1, 2) -0.0203301 0.00690074 79 -2.946073 0.0042 grass 0.0428934 0.01802506 79 2.379656 0.0197 Standardized Within-Group Residuals: Min Q1 Med Q3 Max -0.2033155 -0.05739679 -0.00943737 0.04045958 0.3637217 Number of Observations: 164 Number of Groups: plotcode ansatz %in% plotcode 82 164 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] publishing random effects from lme
Dear all, Suppose I have a linear mixed-effects model (from the package nlme) with nested random effects (see below); how would I present the results from the random effects part in a publication? Specifically, I´d like to know: (1) What is the total variance of the random effects at each level? (2) How can I test the significance of the variance components? (3) Is there something like an "r squared" for the whole model which I can state? The data come from an experiment on plant performance with and without insecticide, with and without grasses present, and across different levels of plant diversity ("div"). Thanks for your help! Christoph. lme(asin(sqrt(response)) ~ treatment + logb(div + 1, 2) + grass, random = ~ 1 | plotcode/treatment, na.action = na.exclude, method = "ML") Linear mixed-effects model fit by maximum likelihood Data: NULL AIC BIC logLik -290.4181 -268.719 152.209 Random effects: Formula: ~ 1 | plotcode (Intercept) StdDev: 0.04176364 Formula: ~ 1 | treatment %in% plotcode (Intercept) Residual StdDev: 0.08660458 0.00833387 Fixed effects: asin(sqrt(response)) ~ treatment + logb(div + 1, 2) + grass Value Std.Error DF t-value p-value (Intercept) 0.1858065 0.01858581 81 9.997225 <.0001 treatment 0.0201384 0.00687832 81 2.927803 0.0044 logb(div + 1, 2) -0.0203301 0.00690074 79 -2.946073 0.0042 grass 0.0428934 0.01802506 79 2.379656 0.0197 Standardized Within-Group Residuals: Min Q1 Med Q3 Max -0.2033155 -0.05739679 -0.00943737 0.04045958 0.3637217 Number of Observations: 164 Number of Groups: plotcode ansatz %in% plotcode 82 164 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] self-written function
Dear Jari, I had included the test for NA´s just to be sure the function can handle datasets with missing values. Here´s an updated version without this specification: backsin<-function(x,y){ backtransf<-list() back<-((sin(x))^2)*100 backtransf$mback<-tapply(back,y,mean) backtransf$sdback<-tapply(back,y,sd)/sqrt(length(y)) backtransf } Jari Oksanen wrote: Christoph, I guess your "NA" refers to Not Available (missing, NA). In that case, you should check it with is.na() instead of != "NA". See this: x <- c(3,4,NA,5) x != "NA" [1] TRUE TRUE NA TRUE !is.na(x) [1] TRUE TRUE FALSE TRUE No idea if this helps, but it was a problem with the code anyway. cheers, jari oksanen On Fri, 2005-01-28 at 11:04 +0100, Christoph Scherber wrote: Hi! OK, here are some more details on the function: My dataframe consists of several columns of categorical variables (let´s call them A,B,C) plus a column with a response variable y (which is arcsine-square root transformed proportions) I am now trying to write a function that automatically gives me the back-transformed mean values + standard errors for y for each column in the dataframe. Ideally, this would be something like tapply(y,list(A,B,C,D),backtransformed.mean) Here is the correct version of the function: backsin<-function(x,y){ backtransf<-list() back<-((sin(x[x!="NA"]))^2)*100 backtransf$mback<-tapply(back,y[x!="NA"],mean) backtransf$sdback<-tapply(back,y[x!="NA"],std)/sqrt(length(y[x!="NA"])) backtransf } Regards, Christoph Petr Pikal wrote: Hi I am not sure if anybody gave you a reply, but do you think you gave enough detail about your data, what you expect, what you did, what was the response and how it differ from expected output and best of all ***working*** example? BTW, what is stdev? If you wanted to compute standard deviation sd is enough. Cheers Petr On 27 Jan 2005 at 12:20, Christoph Scherber wrote: Dear all, I´ve got a simple self-written function to calculate the mean + s.e. from arcsine-transformed data: backsin<-function(x,y,...){ backtransf<-list() backtransf$back<-((sin(x[x!="NA"]))^2)*100 backtransf$mback<-tapply(backtransf$back,y[x!="NA"],mean) backtransf$sdback<-tapply(backtransf$back,y[x!="NA"],std)/sqrt(length(y[x!="NA"])) backtransf } I would like to apply this function to whole datasets, such as tapply(variable,list(A,B,C,D),backsin) Of course, this doesn´t work with the way in which the backsin() function is specified. Does anyone have suggestions on how I could improve my function? Regards, Christoph __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html Petr Pikal [EMAIL PROTECTED] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] self-written function
Hi! OK, here are some more details on the function: My dataframe consists of several columns of categorical variables (let´s call them A,B,C) plus a column with a response variable y (which is arcsine-square root transformed proportions) I am now trying to write a function that automatically gives me the back-transformed mean values + standard errors for y for each column in the dataframe. Ideally, this would be something like tapply(y,list(A,B,C,D),backtransformed.mean) Here is the correct version of the function: backsin<-function(x,y){ backtransf<-list() back<-((sin(x[x!="NA"]))^2)*100 backtransf$mback<-tapply(back,y[x!="NA"],mean) backtransf$sdback<-tapply(back,y[x!="NA"],std)/sqrt(length(y[x!="NA"])) backtransf } Regards, Christoph Petr Pikal wrote: Hi I am not sure if anybody gave you a reply, but do you think you gave enough detail about your data, what you expect, what you did, what was the response and how it differ from expected output and best of all ***working*** example? BTW, what is stdev? If you wanted to compute standard deviation sd is enough. Cheers Petr On 27 Jan 2005 at 12:20, Christoph Scherber wrote: Dear all, I´ve got a simple self-written function to calculate the mean + s.e. from arcsine-transformed data: backsin<-function(x,y,...){ backtransf<-list() backtransf$back<-((sin(x[x!="NA"]))^2)*100 backtransf$mback<-tapply(backtransf$back,y[x!="NA"],mean) backtransf$sdback<-tapply(backtransf$back,y[x!="NA"],std)/sqrt(length(y[x!="NA"])) backtransf } I would like to apply this function to whole datasets, such as tapply(variable,list(A,B,C,D),backsin) Of course, this doesn´t work with the way in which the backsin() function is specified. Does anyone have suggestions on how I could improve my function? Regards, Christoph __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html Petr Pikal [EMAIL PROTECTED] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] self-written function
Hi! I´m using the function for a data frame, but up to now it only works with single vectors, such as backsin(variable,grouping.factor) Actually, you´re right, the "..." in the function definition is not needed...:-) Regards Christoph Erin Hodgess wrote: Hello Christoph! I have a question about your question, please: In your first line of code, backsin <- function(x,y,...){ why do you have the three dots at the end? Also, what kind of data sets are you applying this to, please? Data frames or Matrixes? Thanks, Erin mailto: [EMAIL PROTECTED] Dear all, I´ve got a simple self-written function to calculate the mean + s.e. from arcsine-transformed data: backsin<-function(x,y,...){ backtransf<-list() backtransf$back<-((sin(x[x!="NA"]))^2)*100 backtransf$mback<-tapply(backtransf$back,y[x!="NA"],mean) backtransf$sdback<-tapply(backtransf$back,y[x!="NA"],stdev)/sqrt(length(y[x!="NA"])) backtransf } I would like to apply this function to whole datasets, such as tapply(variable,list(A,B,C,D),backsin) Of course, this doesn´t work with the way in which the backsin() function is specified. Does anyone have suggestions on how I could improve my function? Regards, Christoph __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] self-written function
Dear all, I´ve got a simple self-written function to calculate the mean + s.e. from arcsine-transformed data: backsin<-function(x,y,...){ backtransf<-list() backtransf$back<-((sin(x[x!="NA"]))^2)*100 backtransf$mback<-tapply(backtransf$back,y[x!="NA"],mean) backtransf$sdback<-tapply(backtransf$back,y[x!="NA"],stdev)/sqrt(length(y[x!="NA"])) backtransf } I would like to apply this function to whole datasets, such as tapply(variable,list(A,B,C,D),backsin) Of course, this doesn´t work with the way in which the backsin() function is specified. Does anyone have suggestions on how I could improve my function? Regards, Christoph __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] lme and varFunc()
Dear all, I am expecting a Poisson error distribution in my lme with weights=varFunc(). The "weigths= varPower (form= fitted (.))" doesn´t work due to missing values in the response: Problem in lme.formula(fixed = sqrt(nrmainaxes + 0...: Maximum number of iterations reached without convergence. Use traceback() to see the call stack That´s why I´ve used one of my most important explanatory variables as a variance covariate: weigths= varPower (form=~explanatory) With that, it worked out properly so far. What would your suggestion in such a case be? Regards Christoph [EMAIL PROTECTED] wrote: I am currently analyzing a dataset using lme(). The model I use has the following structure: model<-lme(response~Covariate+TreatmentA+TreatmentB, random=~1|Block/Plot,method="ML") When I plot the residuals against the fitted values, I see a clear positive trend (meaning that the variance increases with the mean). I tried to solve this issue using weights=varPower(), but it doesn´t change the residual plot at all. You are aware that you need to use something like weigths= varPower (form= fitted (.)) and the plot residuals using e.g. scatter.smooth (fitted (model), resid (model, type= 'n')) Maybe the latter should also be ok with the default pearson residuals, but I am not sure. Possibly a look into the following would help? @Book{Pin:00a, author ={Pinheiro, Jose C and Bates, Douglas M}, title = {Mixed-Effects Models in {S} and {S}-{P}{L}{U}{S}}, publisher = {Springer}, year = {2000}, address = {New York} } How would you implement such a positive trend in the variance? I´ve tried glmmPQL (which works great with poisson errors), but using glmmPQL I can´t do model simplification. Well, what error distribution do you have / do you expect? Regards, Lorenz - Lorenz Gygax, Dr. sc. nat. Centre for proper housing of ruminants and pigs Swiss Federal Veterinary Office __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Box-Cox / data transformation question
Dear R users, Is it reasonable to transform data (measurements of plant height) to the power of 1/4? I´ve used boxcox(response~A*B) and lambda was close to 0.25. Regards, Christoph __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] lme and varFunc()
Dear all, Regarding the lme with varFunc() question I posted a few days ago: I have used the following two approaches: model1<-lme(response~Covariate+Block+TreatmentA+TreatmentB,random=~1|Plot/Subplot,method="ML") model2a<-update(model1,weights=varPower(form=~ fitted(.))) model2b<-update(model1,weights=varPower(form=~block)) While model2a produces an error "Problem in .C("mixed_loglik",: subroutine mixed_loglik: Missing values in argument 1 Use traceback() to see the call stack" Model 2b seems to work fine, now. I´m not sure why model2a doesn´t work, but using an important explanatory variable (block) as a variance covariate seems to do a better job (although I don´t really understand why) Does anyone have an explanation for this? Regards, Chris. Andrew Robinson wrote: Dear Christoph, what command are you using to plot the residuals? If you use the default residuals it will not reflect the variance model. If you use the argument type="p" then you get the Pearson residuals, which will reflect the weights model. Try something like this: plot(model, resid(., type = "p") ~ fitted(.), abline = 0) I hope that this helps, Andrew On Mon, Jan 24, 2005 at 02:28:44PM +0100, Christoph Scherber wrote: Dear R users, I am currently analyzing a dataset using lme(). The model I use has the following structure: model<-lme(response~Covariate+TreatmentA+TreatmentB,random=~1|Block/Plot,method="ML") When I plot the residuals against the fitted values, I see a clear positive trend (meaning that the variance increases with the mean). I tried to solve this issue using weights=varPower(), but it doesn?t change the residual plot at all. How would you implement such a positive trend in the variance? I?ve tried glmmPQL (which works great with poisson errors), but using glmmPQL I can?t do model simplification. Many thanks for your help! Regards Chris. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] lme and varFunc()
Dear R users, I am currently analyzing a dataset using lme(). The model I use has the following structure: model<-lme(response~Covariate+TreatmentA+TreatmentB,random=~1|Block/Plot,method="ML") When I plot the residuals against the fitted values, I see a clear positive trend (meaning that the variance increases with the mean). I tried to solve this issue using weights=varPower(), but it doesn´t change the residual plot at all. How would you implement such a positive trend in the variance? I´ve tried glmmPQL (which works great with poisson errors), but using glmmPQL I can´t do model simplification. Many thanks for your help! Regards Chris. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] comparing glmmPQL models
Dear R users, Is there a way to compare glmmPQL models differing in their fixed-effects structure (similar to the ANOVA approach in lme) ? Thank you very much for your help! Chris. This mail was sent through http://webmail.uni-jena.de __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] mixed effects model:how to include initial conditions
Dear R users, I am analyzing a dataset on growth of plants in response to several factors. I am using a mixed-effects model of the following structure: model<-lme(growth~block*treatment*factor1*factor2, random=~1|plot/treatment/initialsize) I have measured the initial size of the plants (in 2003) and thought it might be sensible to include this (random) variation into the random effects term of the model. Is that correct? Or should "initialsize" rather be included as a covariate into the fixed effects term, as in: alternative<-lme(growth~block*initialsize*treatment*factor1*factor2, random=~1|plot/treatment) I would very much appreciate any suggestions on how to analyze these data correctly. Best regards Chris. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Naming Convention
Dear Reinhold, All entries are allowed except "price swap" or "price_swap" Of course it´s most convenient to use short names and small letters for quicker typing. Regards Christoph Quoting "Hafner, Reinhold (Risklab)" <[EMAIL PROTECTED]>: > I was wondering whether there exists a naming convention for row and column > names in R data frames and matrices. > E.g: PriceSwap or PRICESWAP or PRICE.SWAP > > Many thanks. > Reinhold > > > [[alternative HTML version deleted]] > > __ > R-help@stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html > This mail was sent through http://webmail.uni-jena.de __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Calculate Mean of Column Vectors?
Dear Thomas, My suggestion would be as follows: y <- rnorm(3000) dim(y) <- c(3, 1000). #then create three column vectors out of y using cbind(): w<-cbind(y[1,],y[2,],y[3,]) # and calculate the row means for (i in 1:1000) z[i]<-mean(w[i,1:3]) z Hope I got it right! Regards, Christoph Thomas Hopper wrote: Hello, I've got an array defined as y <- rnorm(3000), dim(y) <- c(3, 1000). I'd like to produce a 1000-element vector z that is the mean of the corresponding elements of y (like z[1,1] <- mean(y[1,1], y[2,1], y[3,1])), but being new to R, I'm not sure how to do this for all elements at once (or, at least, simply). Any help is appreciated. Thanks, Tom __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] coercing columns
Dear all, I have a data frame that looks like this: c1 c2 c3 ABC BCA AAB and so on; I´d like to produce one single vector consisting of the columns c1,c2, c3, such that vector=("A","B","A","B","C","A","C","A","B") I guess it´s easy to do but I don´t know how...Can anyone help me? Thanks a lot! Christoph __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Dominant factors in aov?
Dear Rene, At least from the part of the data.frame attached to your mail, I assumed that C,D and E changed in identical ways (but maybe I got this wrong). With your following combination of factors: A (four levels) B (three levels) C (two levels) D (29 levels) with E (four replicates) And assuming independence of the treatment levels, you should get 3 d.f. for A 2 d.f. for B 28 d.f. for D 3 d.f. for E ? residual d.f. (how big is total number of Y values?) The problem arises if parts of treatments B,D and E are applied to the same subjects, e.g. B D E Y 1 1 1 400 2 2 2 300 2 2 3 420 2 2 4 350 (etc) then you immediately run into problems because treatments B and D (in this case) change in an identical way, i.e. the variances calculated for each level of B and D are the same; this is what causes the ´singularities´. Errors need to be independent, otherwise you will have order dependence in your analyses. i.e. the output of your aov model will change depending on the sequence in which the terms A,B,C,D,E are entered. Did I get this right? It would probably help to see the full dataset Best wishes Christoph Rene Eschen wrote: Dear Christoph, The reason for the "singularities" is that B, C and D are not independent (in fact, they´re identical in their factor levels, and hence in their effect on Y). I do not understand this. You gave the correct levels for A, B and E, but I do not see how they are identical. They have different levels and different codings, or is it because A has the same number of levels as E, and E shares some of the coding with B? René Eschen. --- For this reason, only the effects of A, B and E can be estimated: Df Sum Sq Mean Sq F valuePr(>F) A3 302286 100762 7.9887 0.002396 ** B1 422869 422869 33.5263 4.683e-05 *** E3 222817427 0.5888 0.632334 Residuals 14 176583 12613 A has 4 levels so there should be 3 d.f. (that´s correct in the table) B has 2 levels so there is only 1 d.f. (that´s also correct) E has 4 levels so there should be 3 d.f. (also O.K.) In total, there are [(n=22)-(3)-(1)-(3)] -1 = 14 residual d.f., so that´s OK, too. Hope this helps, Christoph levels(A) [1] "0""250" "500" "1000" > levels(B) [1] "1" "2" > levels(E) [1] "1" "2" "3" "4" Rene Eschen wrote: Hi all, I'm using R 2.0.1. for Windows to analyze the influence of following factors on response Y: A (four levels) B (three levels) C (two levels) D (29 levels) with E (four replicates) The dataset looks like this: A B C D E Y 0 1 1 1 1 491.9 0 1 1 1 2 618.7 0 1 1 1 3 448.2 0 1 1 1 4 632.9 250 1 1 1 1 92.4 250 1 1 1 2 117 250 1 1 1 3 35.5 250 1 1 1 4 102.7 500 1 1 1 1 47 500 1 1 1 2 57.4 500 1 1 1 3 6.5 500 1 1 1 4 50.9 10001 1 1 1 0.7 10001 1 1 2 6.2 10001 1 1 3 0.5 10001 1 1 4 1.1 0 2 2 2 1 6 0 2 2 2 2 4.2 0 2 2 2 3 20.3 0 2 2 2 4 3.5 250 2 2 2 1 8.4 250 2 2 2 2 2.8 etc. If I ask the following: summary(aov(Y~A+B+C+D+E)) R gives me this answer: Df Sum Sq Mean Sq F value Pr(>F) A 3 135.602 45.201 310.2166 <2e-16 *** B 2 0.553 0.276 1.8976 0.1512 C 1 0.281 0.281 1.9264 0.1659 D 25 92.848 3.714 25.4890 <2e-16 *** E 3 0.231 0.077 0.5279 0.6634 Residuals 411 59.885 0.146 Can someone explain me why factor C has only 25 Df (in stead of 28, what I expected), and why this number changes when I leave out factors B or C (but not A)? Why do factors B and C (but again: not A) not show up in the calculation if they appear later in the formula than D? When I ask summary.lm(aov(Y~A+B+C+D+E)), R tells me that three levels of D were not defined because of "singularities" (what does this word mean?). After checking and playing around with the dataset, I find no logical reason for which levels are not defined. Even if I construct a "perfect" dataset (balanced, no missing values) I never get the correct number of Df. My other datasets are analyzed as expected using the similar function calls and similar datasets. Am I doing something wrong here? Many thanks, René Eschen. ___ drs. René Eschen CABI Bioscience Switzerland Centre 1 Rue des Grillons CH-2800 Delémont Switzerland +41 32
Re: [R] Dominant factors in aov?
Dear Rene, First of all, note that A,B,C,D, and E need to be declared as factors in the beginning, using factor() (but I think you did this already). Also, make sure that the data are read into R in the correct way (i.e. "." separating decimal places). The reason for the "singularities" is that B, C and D are not independent (in fact, they´re identical in their factor levels, and hence in their effect on Y). For this reason, only the effects of A, B and E can be estimated: Df Sum Sq Mean Sq F valuePr(>F) A3 302286 100762 7.9887 0.002396 ** B1 422869 422869 33.5263 4.683e-05 *** E3 222817427 0.5888 0.632334 Residuals 14 176583 12613 A has 4 levels so there should be 3 d.f. (that´s correct in the table) B has 2 levels so there is only 1 d.f. (that´s also correct) E has 4 levels so there should be 3 d.f. (also O.K.) In total, there are [(n=22)-(3)-(1)-(3)] -1 = 14 residual d.f., so that´s OK, too. Hope this helps, Christoph levels(A) [1] "0""250" "500" "1000" > levels(B) [1] "1" "2" > levels(E) [1] "1" "2" "3" "4" Rene Eschen wrote: Hi all, I'm using R 2.0.1. for Windows to analyze the influence of following factors on response Y: A (four levels) B (three levels) C (two levels) D (29 levels) with E (four replicates) The dataset looks like this: A B C D E Y 0 1 1 1 1 491.9 0 1 1 1 2 618.7 0 1 1 1 3 448.2 0 1 1 1 4 632.9 250 1 1 1 1 92.4 250 1 1 1 2 117 250 1 1 1 3 35.5 250 1 1 1 4 102.7 500 1 1 1 1 47 500 1 1 1 2 57.4 500 1 1 1 3 6.5 500 1 1 1 4 50.9 10001 1 1 1 0.7 10001 1 1 2 6.2 10001 1 1 3 0.5 10001 1 1 4 1.1 0 2 2 2 1 6 0 2 2 2 2 4.2 0 2 2 2 3 20.3 0 2 2 2 4 3.5 250 2 2 2 1 8.4 250 2 2 2 2 2.8 etc. If I ask the following: summary(aov(Y~A+B+C+D+E)) R gives me this answer: Df Sum Sq Mean Sq F value Pr(>F) A 3 135.602 45.201 310.2166 <2e-16 *** B 2 0.553 0.276 1.8976 0.1512 C 1 0.281 0.281 1.9264 0.1659 D 25 92.848 3.714 25.4890 <2e-16 *** E 3 0.231 0.077 0.5279 0.6634 Residuals 411 59.885 0.146 Can someone explain me why factor C has only 25 Df (in stead of 28, what I expected), and why this number changes when I leave out factors B or C (but not A)? Why do factors B and C (but again: not A) not show up in the calculation if they appear later in the formula than D? When I ask summary.lm(aov(Y~A+B+C+D+E)), R tells me that three levels of D were not defined because of "singularities" (what does this word mean?). After checking and playing around with the dataset, I find no logical reason for which levels are not defined. Even if I construct a "perfect" dataset (balanced, no missing values) I never get the correct number of Df. My other datasets are analyzed as expected using the similar function calls and similar datasets. Am I doing something wrong here? Many thanks, René Eschen. ___ drs. René Eschen CABI Bioscience Switzerland Centre 1 Rue des Grillons CH-2800 Delémont Switzerland +41 32 421 48 87 (Direct) +41 32 421 48 70 (Secretary) +41 32 421 48 71 (Fax) http://www.unifr.ch/biol/ecology/muellerschaerer/group/eschen/eschen.html __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] treatment contrasts and summary.lm
Dear list members, I have a 2-factor ANOVA where the summary.lm output looks like this (using treatment contrasts): Value Std. Error t value Pr(>|t|) (Intercept) 0.0389 0.0220 1.7695 0.0817 as.factor(Block)1 0.0156 0.0066 2.3597 0.0215 as.factor(Block)2 -0.0018 0.0037-0.4857 0.6289 as.factor(Block)3 -0.0007 0.0026-0.2812 0.7795 as.factor(AZ)1 -0.0066 0.0076-0.8670 0.3893 as.factor(AZ)2 0.0064 0.0047 1.3530 0.1810 as.factor(AZ)3 -0.0015 0.0031-0.4863 0.6284 as.factor(AZ)4 0.0054 0.0025 2.1499 0.0355 as.factor(AZ)5 0.0062 0.0037 1.6653 0.1009 Block has 4 levels and AZ has 6 levels. My question now is: What exactly do the values for AZ show? I know it´s somehow differences between intercepts, but it would be great if someone could tell me more on how exactly to interprete the output. Thanks very much in advance! Christoph __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] integration problem
Dear R users, I have spectral data (say, wavelength vs. extinction coefficient) for which I´d like to calculate an integral (i.e. the area underneath the curve). Suppose the (artificial) dataset is lambda E 1 2 2 4 3 5 4 8 5 1 6 5 7 4 8 9 9 8 10 2 How can I calculate an integral for these values using R? Many thanks for any help! Regards Christoph __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] error bars
Dear all, is there an easy way to create error bars for the following types of plots: a) barplots b) interaction plots Many other statistics packages (e.g. Statistica) offer very nice interaction plots with error bars, and I´d like to be able to do the same in R. Best regards Christoph __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] analysis of life tables
Dear all, How can I analyze a life table (e.g. for a cohort of insects) in R? I have 20 insects in 200 cages with two different treatments, whose survival is followed over time, such that, e.g., in one treatment, the number of animals surviving is c(20,18,16,12,10,8,4,0), while in the other treatment the survival is c(20,20,18,18,16,15,15,14) at 8 subsequent time intervals. I would very much appreciate any suggestions on how to analyze such a dataset. Best regards Christoph. __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] log-linear contrasts
Dear list members, suppose I have a one-way ANOVA where the explanatory variable is of class ordered(). How can I define, instead of the default linear, quadratic, cubic contrasts, a set of log-linear contrasts corresponding to logb(c(1,2,4,8,16,60)+1,2) ? Regards, Christoph __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] reading data
Just one small remark: why don´t you try ("C:\\dados10.txt") ? It seems to me that the double "\\" is important! Cheers Chris Christoph Lange wrote: (Reply to Margarida Júlia Rodrigues Igreja) Hello! When i print: a<-read.table(file="C:/dados10.txt") The next error appears: Error in file(file, "r") : unable to open connection In addition: Warning message: cannot open file `C:/dados10.txt' Can you help me? Contrary to what Professor Ripley wrote, this file name is of course totally valid under unixoid systems. But 'C:' looks like a windows hard drive mounted under a Unix directory named 'C:'. Usually this is done directly under '/', so just try: a<-read.table(file="/C:/dados10.txt") -cl __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] RE: more on lm(y~x) question: removing NA´s
Great!!! This works, many thanks! ** using lsfit(x,y) instead of lm(y~x) produces a perfectly correct output. *** Thomas Lumley wrote: On Tue, 4 May 2004, Christoph Scherber wrote: it all works fine (the regression lines fit correctly to the data) as long as there are not both missing values in j and k. That's very strange. The lines for (k in 1:length(foranalysis[93:174,i])) number[k]_substring(plotcode[foranalysis[k,1]],1,5) should set result in k being the scalar value 81 after the loop is over. In R (unlike S-PLUS), loop indices are just ordinary variables in the environment where the loop is executed. I'd expect this code to work in S-PLUS but not in R. That loop is actually redundant, since substring() is vectorised: number <- substring(plotcode[foranalysis[93:174,1]],1,5) should work just as well. It's also strange that you create a data frame df from j and k but don't use it in the lm() call (or AFAICS anywhere else). What suggestions would you have for this? Or, more precisely, how would you create multiple graphs from subsequent columns of a data.frame? I'd probably use lsfit. The following is obviously not tested, since I don't have the data (or even understand fully the data layout). L <- length(93:174) for(i in p) { X<-foranalysis[93:174, i] Y<-foranalysis[93:174, i+1] corr<-cor(X,Y) corrtrunc<-cor(X[X<0.9], Y[X<0.9]) mainlab <- paste(substring(names(foranalysis[i]), 2, 8), "; corr.:", corr, ";excl.Mono", corrtrunc)) plot(X,Y,main=mainlab, xlab="% of total biomass",ylab="% of total cover",pch="n") number <- substring(plotcode[foranalysis[1:L,1]], 1, 5) text(X, Y, number) model <- lsfit(X,Y) abline(model) abline(0, 1, lty=2) } -thomas par(mfrow=c(5,5)) p_seq(3,122,2) i_0 k_0 number_0 for (i in p) { j_foranalysis[93:174,i+1] k_foranalysis[93:174,i] df_data.frame(j,k) mainlab1_substring(names(foranalysis[i]),2,8) mainlab2_"; corr.:" mainlab3_round(cor(j,k,na.method="available"),4) mainlab4_"; excl.Mono:" mainlab5_round(cor(j[j<0.9],k[j<0.9],na.method="available"),4) mainlab_paste(mainlab1,mainlab2,mainlab3,mainlab4,mainlab5) plot(k,j,main=mainlab,xlab="% of total biomass",ylab="% of total cover",pch="n") for (k in 1:length(foranalysis[93:174,i])) number[k]_substring(plotcode[foranalysis[k,1]],1,5) text(foranalysis[93:174,i],foranalysis[93:174,i+1],number) ** model_lm(j~k,na.action=na.exclude]) ** abline(model) abline(0,1,lty=2) } Does anyone have any suggestions on this? Best regards Chris., Liaw, Andy wrote: By (`factory') default that's done for you automagically, because options("na.action") is `na.omit'. If you really want to do it `by hand', and have the data in a data frame, you can use something like: lm(y ~ x, df[complete.cases(df),]) HTH, Andy From: Christoph Scherber Dear all, I have a data frame with different numbers of NA´s in each column, e.g.: x y 1 2 NA 3 NA 4 4 NA 1 5 NA NA I now want to do a linear regression on y~x with all the NA´s removed. The problem now is that is.na(x) (and is.na(y) obviously gives vectors with different lengths. How could I solve this problem? Thank you very much for any help. Best regards Chris __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html Thomas Lumley Assoc. Professor, Biostatistics [EMAIL PROTECTED] University of Washington, Seattle Thomas Lumley Assoc. Professor, Biostatistics [EMAIL PROTECTED] University of Washington, Seattle __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] RE: more on lm(y~x) question: removing NA´s
it all works fine (the regression lines fit correctly to the data) as long as there are not both missing values in j and k. What suggestions would you have for this? Or, more precisely, how would you create multiple graphs from subsequent columns of a data.frame? Thomas Lumley wrote: On Tue, 4 May 2004, Liaw, Andy wrote: 1. If your code actually runs, you should upgrade R, and quit using `_' for assignment... 8-) 2. You seem to have an extraneous `]' after the na.exclude. Could that be the problem? More seriously, the for() loop over k will mess up the value of k that you want to use for lm. -thomas Andy From: Christoph Scherber actually, the situation is much more complicated. I am producing multiple graphs within a "for" loop. For some strange reason, the plotting routine always stops once lm(y~x) encounters more than one missing value (I have marked the important bit with "***"): par(mfrow=c(5,5)) p_seq(3,122,2) i_0 k_0 number_0 for (i in p) { j_foranalysis[93:174,i+1] k_foranalysis[93:174,i] df_data.frame(j,k) mainlab1_substring(names(foranalysis[i]),2,8) mainlab2_"; corr.:" mainlab3_round(cor(j,k,na.method="available"),4) mainlab4_"; excl.Mono:" mainlab5_round(cor(j[j<0.9],k[j<0.9],na.method="available"),4) mainlab_paste(mainlab1,mainlab2,mainlab3,mainlab4,mainlab5) plot(k,j,main=mainlab,xlab="% of total biomass",ylab="% of total cover",pch="n") for (k in 1:length(foranalysis[93:174,i])) number[k]_substring(plotcode[foranalysis[k,1]],1,5) text(foranalysis[93:174,i],foranalysis[93:174,i+1],number) ** model_lm(j~k,na.action=na.exclude]) ** abline(model) abline(0,1,lty=2) } Does anyone have any suggestions on this? Best regards Chris., Liaw, Andy wrote: By (`factory') default that's done for you automagically, because options("na.action") is `na.omit'. If you really want to do it `by hand', and have the data in a data frame, you can use something like: lm(y ~ x, df[complete.cases(df),]) HTH, Andy From: Christoph Scherber Dear all, I have a data frame with different numbers of NA´s in each column, e.g.: x y 1 2 NA 3 NA 4 4 NA 1 5 NA NA I now want to do a linear regression on y~x with all the NA´s removed. The problem now is that is.na(x) (and is.na(y) obviously gives vectors with different lengths. How could I solve this problem? Thank you very much for any help. Best regards Chris __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html Thomas Lumley Assoc. Professor, Biostatistics [EMAIL PROTECTED] University of Washington, Seattle __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] more on lm(y~x) question: removing NA´s
actually, the situation is much more complicated. I am producing multiple graphs within a "for" loop. For some strange reason, the plotting routine always stops once lm(y~x) encounters more than one missing value (I have marked the important bit with "***"): par(mfrow=c(5,5)) p_seq(3,122,2) i_0 k_0 number_0 for (i in p) { j_foranalysis[93:174,i+1] k_foranalysis[93:174,i] df_data.frame(j,k) mainlab1_substring(names(foranalysis[i]),2,8) mainlab2_"; corr.:" mainlab3_round(cor(j,k,na.method="available"),4) mainlab4_"; excl.Mono:" mainlab5_round(cor(j[j<0.9],k[j<0.9],na.method="available"),4) mainlab_paste(mainlab1,mainlab2,mainlab3,mainlab4,mainlab5) plot(k,j,main=mainlab,xlab="% of total biomass",ylab="% of total cover",pch="n") for (k in 1:length(foranalysis[93:174,i])) number[k]_substring(plotcode[foranalysis[k,1]],1,5) text(foranalysis[93:174,i],foranalysis[93:174,i+1],number) ** model_lm(j~k,na.action=na.exclude]) ** abline(model) abline(0,1,lty=2) } Does anyone have any suggestions on this? Best regards Chris., Liaw, Andy wrote: By (`factory') default that's done for you automagically, because options("na.action") is `na.omit'. If you really want to do it `by hand', and have the data in a data frame, you can use something like: lm(y ~ x, df[complete.cases(df),]) HTH, Andy From: Christoph Scherber Dear all, I have a data frame with different numbers of NA´s in each column, e.g.: x y 1 2 NA 3 NA 4 4 NA 1 5 NA NA I now want to do a linear regression on y~x with all the NA´s removed. The problem now is that is.na(x) (and is.na(y) obviously gives vectors with different lengths. How could I solve this problem? Thank you very much for any help. Best regards Chris __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html -- Notice: This e-mail message, together with any attachments, contains information of Merck & Co., Inc. (One Merck Drive, Whitehouse Station, New Jersey, USA 08889), and/or its affiliates (which may be known outside the United States as Merck Frosst, Merck Sharp & Dohme or MSD and in Japan as Banyu) that may be confidential, proprietary copyrighted and/or legally privileged. It is intended solely for the use of the individual or entity named on this message. If you are not the intended recipient, and have received this message in error, please notify us immediately by reply e-mail and then delete it from your system. -- __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] lm(y~x) question: removing NA´s
Dear all, I have a data frame with different numbers of NA´s in each column, e.g.: x y 1 2 NA 3 NA 4 4 NA 1 5 NA NA I now want to do a linear regression on y~x with all the NA´s removed. The problem now is that is.na(x) (and is.na(y) obviously gives vectors with different lengths. How could I solve this problem? Thank you very much for any help. Best regards Chris __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] quantile regression
OK, Thank you all very much for the help! Best regards Chris. Christoph Scherber wrote: Dear colleagues, How can I do quantile regression with R? Best regards Chris. __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] quantile regression
Dear colleagues, How can I do quantile regression with R? Best regards Chris. __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] binomial errors in split-plot design
Thanks! And how can I then plot a Q-Q line for model checking? qqnorm works fine, but I couldn´t find how to use qqline for mixed effects models of this type so far, I have tried (e.g.) qqnorm(glm1,~resid(.)|TREATMENT) but I don´t know how to specify qqline for this the full model is glm1_glmmPQL(cbindarea~BLOCK+targetweight+TREATMENT+SOWNDIV+GRASS+LEG+SHERB+THERB,random=~1|PLOTCODE/TREATMENT,family=binomial) where categorical variables are in capital letters Best regards, Chris. Prof Brian Ripley wrote: >There are several possibilities, including glmmPQL (MASS) and GLMM (lme4). >Be careful with the interpretation, as you don't have the advantages of >balance that the classical AoV gives you. > >On Thu, 4 Mar 2004, Christoph Scherber wrote: > > > >>I have proportion data with binomial errors. The problem is that the >>whole experiment was laid out as a split-plot design. >> >>Ideally, what I would like is having a glm with an Error term such as >>glm(y~x+Error(A/B)) but I fear this is not possible. Would using lme be >>an alternative? How could I state the variance structure, then? >> >> > > > [[alternative HTML version deleted]] __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] binomial errors in split-plot design
Dear all, I have proportion data with binomial errors. The problem is that the whole experiment was laid out as a split-plot design. Ideally, what I would like is having a glm with an Error term such as glm(y~x+Error(A/B)) but I fear this is not possible. Would using lme be an alternative? How could I state the variance structure, then? I would very much appreciate any suggestions! Best regards Christoph __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] zoom graphics
Interactive Plots with zoom options etc. can be performed using the "iplots" library. It´s really very useful and can be downloaded from http://stats.math.uni-augsburg.de/iPlots/index.shtml Best regards Chris Barry Rowlingson wrote: [EMAIL PROTECTED] wrote: Hi, i don't understand how i cant zoom in and zoom out a graphics (plots) exist a package for that? Thanks Ruben I dont think you can do it quite like you can zoom in and out in a program like 'photoshop'. All you can really do is redraw the plot with a different set of X and Y limits. Example: # some data xy=list(x=rnorm(100),y=rnorm(100)) # default plot shows all the data: plot(xy) # define an interactive 'zoom' function: zoom=function(){reg=locator(2);plot(reg,type='n')} # click two points at the corners of the zoom region: zoom() # add the points - dont use 'plot' since this resets the axes: points(xy) If you've got a complex plot with lots of things on it, then yes, you have to redraw it all again, but then you probably should have put that complex plot into a single function! Barry __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] interactive graphic s
Dear Ruben, Yes, the iplots package works fine! You can download it from http://stats.math.uni-augsburg.de/iPlots/index.shtml Good luck! Chris. [EMAIL PROTECTED] wrote: Hi, i dont understand ¿Graphics in R are interactives or not?, I hear the the package iplots can do it (zoom, scaling etc), is true that?, Thanks Ruben __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] boxplot notches
In McGill et al. (1978) there´s a description of the calculation as follows (p. 16): "The widths [are] computed from the midspread or interquartile range (R) of the data (...), and the number of observations (N) for each group. The Gaussian-based asymptotic approximation (Kendall and Stuart 1967) of the standard deviation s of the median (M) is given by s=1.25 R/1.35 sqrt(N) and can be shown to be reasonably broadly applicable to other distributions (...) The notch around each median can then be calculated as M +- Cs, where C is a constant. Should one desire a notch indicating 95 percent confidence interval about each median, C = 1.96 would be used (...) It can be shown that C=1.96 would only be appropriate if the standard deviations of the two groups were vastly different (...) Thus, the notches were computed as M+-1.7(1.25R/1.35 sqrt(N)) Hope this helps. Best regards Chris. REF: McGill, R; Tukey, JW & Larsen, WA (1978) Variations of Box Plots. The American Statistician, Vol.32 No. 1, pp.12-16. Kendall, MG & Stuart, A (1967): The Advanced Theory of Statistics, Vol.1, 2nd ed., Ch14., New York, Hafner Publishing Co. * Michael Friendly wrote: I think John Tukey's idea was that this formula (or just the fact of using median and quartiles) is still often approximately correct for quite a few kinds of moderate contaminations... It may be approximately correct for the width of a CI (and when I checked it was only appproximately correct for a normal), but I would seriously doubt if it were approximately correct for a significance level of 5%. Remember how fast the tails of the asymptotic normal distribution decay: a 20% error turns 5% into 2%. BTW, if there is a precise reference for this it would be good to add it to boxplot.stats.Rd, as the confidence limits are unexplained there. The factor 1.58 for H-spr/\sqrt{n} comes from the product of three approximations going from a 95% confidence interval for a difference in means, to one for a difference in medians, using the H-spr=IQR instead of the standard deviation: H-spr/1.349 \approx \sigma in a N(0,1) dist/n \sqrt{ \pi / 2} \approx std error of a median 1.7 / sqrt{n} is the average of 1.96 and 1.39=1.96/\sqrt{2}, factors for the standard error of the difference between two means, in the cases where one variance is tiny, and where both are equal. I believe this is explained in @Article{McGill-etal:78, author = "R. McGill and J. W. Tukey and W. Larsen", year = "1978", title ="Variations of Box Plots", journal = TAS, volume = "32", pages ="12--16", } __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] boxplot notches
Dear colleagues, I think it would be a good idea to include a short note in the R boxplot() help file, stating exactly how the confidence levels are calculated ("the notches are +/- 1.58 IQR/sqrt(n)") - at least as a guidance for users not advanced enough to directly interpret the code. Would this be possible? Regards, Christoph. David James wrote: Prof Brian Ripley wrote: On Mon, 1 Mar 2004, Martin Maechler wrote: "TL" == Thomas Lumley <[EMAIL PROTECTED]> on Mon, 1 Mar 2004 09:54:48 -0800 (PST) writes: TL> On Mon, 1 Mar 2004, Christoph Scherber wrote: >> Dear list members, >> >> Can anyone tell me how the notches in boxplot(Y~X,notch=T) are >> calculated? What do these notches represent exactly? I´d suppose they >> are Conficence Intervals for the median, but I´ve also been told they >> might show Least Significant Difference (LSD) equivalents. TL> The help page says that TL> " If the notches of two plots do not overlap then TL> the medians are significantly different at the 5 percent level." TL> The only thing wrong with this is that it isn't true. TL> The code says that the notches are +/- 1.58 IQR/sqrt(n), TL> so I think the claimed confidence level holds only for TL> normal distribuitons with small amounts of contamination. I think John Tukey's idea was that this formula (or just the fact of using median and quartiles) is still often approximately correct for quite a few kinds of moderate contaminations... It may be approximately correct for the width of a CI (and when I checked it was only appproximately correct for a normal), but I would seriously doubt if it were approximately correct for a significance level of 5%. Remember how fast the tails of the asymptotic normal distribution decay: a 20% error turns 5% into 2%. BTW, if there is a precise reference for this it would be good to add it to boxplot.stats.Rd, as the confidence limits are unexplained there. @article{McGi:Tuke:Lars:1978, author = {McGill, Robert and Tukey, John W. and Larsen, Wayne A.}, title = {Variations of {B}ox plots}, year = {1978}, journal = {The American Statistician}, volume = {32}, pages = {12--16}, keywords = {Exploratory data analysis; Graphics} } @book{Cham:Clev:Klei:Tuke:1983, author = {Chambers, John M. and Cleveland, William S. and Kleiner, Beat and Tukey, Paul A.}, title = {Graphical methods for data analysis}, year = {1983}, pages = {395}, publisher = {Wadsworth Publishing Co Inc} } -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595 __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] boxplot notches
Dear list members, Can anyone tell me how the notches in boxplot(Y~X,notch=T) are calculated? What do these notches represent exactly? I´d suppose they are Conficence Intervals for the median, but I´ve also been told they might show Least Significant Difference (LSD) equivalents. I would very much appreciate any help from you. Best regards Chris. __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html