[R] Annotate forest plot 'forest.rma()' for meta-analysis with metafor package
Dear R-experts, The forest.rma() function from the metafor package creates nice forest plots for presenting the results of a meta-analysis. These plots can be annotated for e.g. giving names to the columns. E.g. as in the documentation of the package: data(dat.bcg) ### meta-analysis of the log relative risks using a random-effects model res <- rma(ai=tpos, bi=tneg, ci=cpos, di=cneg, data=dat.bcg, measure="RR", slab=paste(author, year, sep=", "), method="REML") forest(res, slab=paste(dat.bcg$author, dat.bcg$year, sep=", "), xlim=c(-16, 6), at=log(c(.05, .25, 1, 4)), atransf=exp, ilab=cbind(dat.bcg$tpos, dat.bcg$tneg, dat.bcg$cpos, dat.bcg$cneg), ilab.xpos=c(-9.5,-8,-6,-4.5), cex=.75) op <- par(cex=.75, font=2) text(c(-9.5,-8,-6,-4.5), 15, c("TB+", "TB-", "TB+", "TB-")) text(c(-8.75,-5.25), 16, c("Vaccinated", "Control")) text(-16,15, "Author(s) and Year", pos=4) text(6, 15, "Relative Risk [95% CI]", pos=2) par(op) However the column names have to be re-positioned everytime the number of included studies in the meta-analysis changes. E.g.: dat.bcg <- dat.bcg[1:12,] ### meta-analysis of the log relative risks using a random-effects model res <- rma(ai=tpos, bi=tneg, ci=cpos, di=cneg, data=dat.bcg, measure="RR", slab=paste(author, year, sep=", "), method="REML") forest(res, slab=paste(dat.bcg$author, dat.bcg$year, sep=", "), xlim=c(-16, 6), at=log(c(.05, .25, 1, 4)), atransf=exp, ilab=cbind(dat.bcg$tpos, dat.bcg$tneg, dat.bcg$cpos, dat.bcg$cneg), ilab.xpos=c(-9.5,-8,-6,-4.5), cex=.75) op <- par(cex=.75, font=2) text(c(-9.5,-8,-6,-4.5), 15, c("TB+", "TB-", "TB+", "TB-")) text(c(-8.75,-5.25), 16, c("Vaccinated", "Control")) text(-16,15, "Author(s) and Year", pos=4) text(6, 15, "Relative Risk [95% CI]", pos=2) par(op) Is there a way to define the colum names' position relatively to the plot so they would be adequately positioned no matter what is the number of studies included in the plot? Many thanks for your help!, Jokel [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Power calculation using pwr.t.test()
Dear R experts, I have conducted a power calculation in order to estimate the number of subjects needed to detect an effect size of d=0.28 (cohen's d) for a difference between two independent groups (alpha level should be 0.05 and the effect should be detected with 80% probability). The results from the code below indicates that I would need n=400 subjects (200 in each group). This is seems so incredibly high that I mistrust my results & wanted to ask whether I miscalculated n? library(pwr) pwr.t.test(d=0.28,sig.level=0.05,power=0.8,type="two.sample",alternative="two.sided") Many thanks for you help! Jokel [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] correct implementation of a mixed-model ANOVA in R
Dear R-experts! I having trouble with the correct implementation of a mixed-model ANOVA in R. I my dataset subjects were tested in a cognitive performance test (numerical outcome variable 'score'). This cognitive performance test is devided into five blocks (categorical factor 'block'). All subjects were tested two times (in random order once following placebo treatment and once following drug treatment: categorical factor 'treatment'). In order to account for re-test effects I coded the testing session (categorical factor 'session' with 1 coding for the first and 2 coding for the second session). Also all subjects once underwent a blood screening before the study in which a stable blood marker was measured which is hypothesized to mediate treatment effects (covariate 'blood'). I would like to look for treatment effects on my outcome variable 'score' and whether the treatment effect is mediated by the covariate 'blood'. Also I would like to use 'subject' as a random factor, include 'block' as a within-subjects factor and control for re-test effects by including 'session'. Could you advice me regarding the proper implementation of my model? Also I am happy about any pointers to R-related literature. Please find below some dummy data and initially model formulation. Many thanks & best wishes, Jokel # some dummy data data <- structure(list(subject = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L), .Label = c("1", "2", "3", "4", "5"), class = "factor"), block = structure(c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), .Label = c("1", "2", "3"), class = "factor"), treatment = structure(c(2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L), .Label = c("drug", "placebo"), class = "factor"), session = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L), .Label = c("1st", "2nd"), class = "factor"), blood = c(3.04, 3.04, 3.04, 3.04, 3.04, 3.04, 4.93, 4.93, 4.93, 4.93, 4.93, 4.93, 5.65, 5.65, 5.65, 5.65, 5.65, 5.65, 5.7, 5.7, 5.7, 5.7, 5.7, 5.7, 5.77, 5.77, 5.77, 5.77, 5.77, 5.77), score = c(10.98, 11.66, 9.99, 9.15, 9.9, 10.3, 10.78, 9.43, 8.6, 9.67, 10.75, 9.09, 11.18, 10.33, 8.61, 10.04, 9.13, 8.71, 10.03, 9.38, 11.17, 9.82, 10.07, 8.34, 9.94, 11.38, 9.19, 8.35, 10.17, 9.04)), .Names = c("subject", "block", "treatment", "session", "blood", "score"), row.names = c(NA, -30L), class = "data.frame") # some first try which however throws a warning aov1 <- aov(score~blood*treatment*session*block+Error(subject/(treatment*session*block)), data=data) summary(aov1) [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Using dcast with multiple functions to aggregate
Dear R communitiy, I am trying to use multiple functions for aggregation within a function call for dcast. However this seems to result in an error. Also I have not managed to make dcast() work with fun.aggregate=sd. Please find attached some example code using the ChickWeight data. Many thanks for your help! Jokel #Chick weight example names(ChickWeight) <- tolower(names(ChickWeight)) sd(ChickWeight$weight) # works fine mean(ChickWeight$weight) # works fine length(ChickWeight$weight) # works fine chick_m <- melt(ChickWeight, id=2:4, na.rm=TRUE) dcast(chick_m, time~variable, mean) # works fine dcast(chick_m, time~variable, length) # works fine dcast(chick_m, time~variable, fun.aggregate=sd) # gives an error dcast(chick_m, time~variable, c(mean, length)) # gives an error [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Embed R code in online database
Dear R-List! I would like to embed R code in an online database such as i.e. a google spreadsheet in way that users can add data to the database and that R's calculations are updated automatically and i.e. given out in the spreadsheet. Maybe even graphs could be updated online? Is there a way to implement this? Many thanks! J. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Workflow for binary classification problem using svm via e1071 package
Dear R-list! I am using the e1071 package in R to solve a binary classification problem in a dataset of round 180 predictor variables (blood metabolites) of two groups of subjects (patients and healthy controls). I am confused regarding the correct way to assess the classification accuracy of the trained svm. (A) The svm command allows to specificy via the 'cross=k' parameter to specify a k-fold crossvalidation. This results in k values for classification accuracy and their corresponding mean. (B) On the other hand most textbooks and tutorials I was browsing, recommend separating the data into a training and a test-set and then to assess prediction accuarcy by checking the accuracy of the trained svm when applied to the test-set. I am not sure whether (A) and (B) would be alternative ways to assess prediction accuracy? Or is option (A) only the accuracy of the svm when applied to the test set and therefore I should implement option (B) after I used option (A)? So would it be the correct way to use first (A) then do grid-search (via the tune command) to find the best hyperparameters and then test the trained svm by applying it to the test set? And in case I use a linear kernel instead of RBF, I guess I do not need to run grid-search as there are no hyperparameters to estimate? BEst, Jokel [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Assessing prediction performance of SVM using e1071 package
Dear R-Users! I am using the svm function (e1071 package) to classify two groups using a set of 180 indicator variables. Now I am confused about the cross-validation procedure. (A) On one hand I use the setting cross=10 in the svm function to run 10 cross-validation iterations and to get an estimate of the svm's performance in prediction. (B) On the other hand most tutorials I found recommended separating the set into two sets using one set for training of the svm and the other for testing the predictive power of the trained svm. My understanding would be that (A) and (B) are alternative ways to estimate the performance of the svm. Or would I have to implement both? Many thanks for your help! Jokel [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Main-effect of categorical variables in meta-analysis (metafor)
Dear R-experts! In a meta-analysis (metafor) I would like to assess the effect of two categorical covariates (A & B) whereas they both have 4 levels. Is my understanding correct that this would require to dummy-code (0,1) each level of each covariate (A & B)? However I am interested in the main-effects and the interaction of these two covariates and the dummy-coding would only allow to detect the effect of one level of one factor. Would there be a way to assess main-effects and interactions (something like an meta-analysis-ANOVA)? Many thanks, Jokel [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Using NCBI E-Utilities in R to extract data from PubMed
Dear R-experts! I found a great post on the "getting genetics done" blog which describes how to make use of NCBI E-utilities in Python to extract data from PubMed: http://gettinggeneticsdone.blogspot.com/2011/05/using-ncbi-e-utilities.html I was wondering whether there would be a way to directly make use of the NBCI E-utilities from R? It would be amazing to e.g. extract the number of articles by an author? Or to investigate co-authorship networks? Many thanks for the help! Jokel [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Converting F-value from ANOVA to cohen's d in meta-analysis (metafor-package)
Dear R-experts! Running a meta-analysis (using the magnificent metafor-package), I use cohen's d as a main outcome measure in a random-effects model. For most of the samples cohen's d is derived form a comparison of two groups (A & B). However some studies report results from an ANOVA (one-factor with three levels: C,D,E) whereas two groups (C,D) correspond to one group in the other studies (B=C,D). Is there a way? The handbook of research synthesis and meta-analysis By Harris M. Cooper says that: d=sqrt((F*(n1+n2)/n1*n2)) for ANOVA (one-factor with two-levels) but does this also hold for ANOVA (one-factor with three levels)? Many thanks for your help! Jokel [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Extract confidence intervals from rma object (metafor package)
Dear R-experts! I am working on some meta-analysis using the metafor package. I would like to extract values of the confidence intervals of the effect sizes of the single studies from an rma object. Those values are printed out when plotting a forest plot using the forest function on the rma object, however I was not able to locate them. Many thanks for your help! Jokel [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Estimate average standard deviation of mean of two dependent groups
Dear R-experts! I am currently running a meta-analysis with the help of the great metafor package. However I have some difficulties setting up my raw data to enter it to the meta-analysis models. I have a group of subjects that have been measured in two continuous variables (A & B). I have the information about the mean of the two variables, the group size (n) and the standard deviations of the two variables. Now I would like to average both variables (A & B) and get the mean and standard deviation of the merged variable (C). As for the mean this would be quiet easy: I would just take the mean of mean A and mean B to get the mean of C. However for the standard deviation this seems more tricky as it is to assume that standard deviations in A & B correlate. I assume (based on further analysis) a correlation of r =0.5. I found the formula to get the standard deviation of the SUM (not the mean) of two variables: SD=SQRT(SD_A^2 + SD_B^2 + 2*r*SD_A*SD_B) with SD_B and SD_B being the standard deviation of A and B. And r*SD_A*SD_B being the covariance of A and B. Would this formula also be valid if I want to average (and not sum) my two variables? Many thanks for any help & best wishes, Jokel [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.