Re: [R] Common elements in columns
Hi,

Maybe this might help:

set.seed(1)
df1 <- data.frame(C1=sample(LETTERS[1:25],20,replace=FALSE), value=sample(50,20,replace=FALSE))
set.seed(15)
df2 <- data.frame(C1=sample(LETTERS[1:25],15,replace=FALSE), C2=1:15)
set.seed(3)
df3 <- data.frame(C1=sample(LETTERS[1:10],10,replace=FALSE), B1=rnorm(10,3))
set.seed(5)
df4 <- data.frame(C1=sample(LETTERS[1:15],10,replace=FALSE), A2=rnorm(10,15))

df1$C1[df1$C1 %in% df2$C1[df2$C1 %in% df3$C1[df3$C1 %in% df4$C1]]]
#[1] G E H J

A.K.

----- Original Message -----
From: Silvano Cesar da Costa
To: r-help@r-project.org
Sent: Sunday, September 2, 2012 7:05 PM
Subject: [R] Common elements in columns

Hi,

I have 4 files with 1 individuals in each file and 10 columns each. One of the columns, say C1, may have elements in common with the C1 columns of the other files.

If I have only 2 files, I can do this check with the commands:

data1[data1 %in% data2]
data2[data2 %in% data1]

How do I check which elements are common to the C1 columns of all 4 files?

Thanks,

Silvano Cesar da Costa
Universidade Estadual de Londrina
Centro de Ciências Exatas
Departamento de Estatística
Fone: (43) 3371-4346

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
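The nested %in% chain generalizes awkwardly beyond a few files; a more scalable sketch of the same intersection (using the df1..df4 defined in the example above) folds intersect() over the columns with Reduce():

```r
# Fold intersect() pairwise over the four C1 columns; as.character()
# avoids factor-level surprises on older R versions.
Reduce(intersect, list(as.character(df1$C1), as.character(df2$C1),
                       as.character(df3$C1), as.character(df4$C1)))
# same elements as the nested version: "G" "E" "H" "J"
```

This scales to any number of files: just extend the list.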
Re: [R] splits with 0s in middle columns
Hi, It's working for me. Try this: dat1<-read.csv("test.csv") dat2<-na.omit(dat1) nrow(dat1) #[1] 635 nrow(dat2) #[1] 627 B<-gsub(" slope{0,1}\\s\\((.*)\\)","\\1",dat2$Composition_percent_part) fun1<-function(x){ C<-gsub("(.*)/(.*)","\\1 0 0 \\2", gsub("(.*)/(.*)/(.*)","\\1 \\2 0 \\3", gsub("(.*)/(.*)/(.*)/(.*)","\\1 \\2 \\3 \\4",x))) res<-data.frame(do.call(rbind,strsplit(C," "))) colnames(res)<-paste("A",1:4,sep="") res } fun1(B) head(fun1(B),15) # A1 A2 A3 A4 #1 60 25 0 15 #2 60 25 0 15 #3 40 35 15 10 #4 40 35 15 10 #5 40 35 15 10 #6 40 35 15 10 #7 40 35 15 10 #8 50 40 0 10 #9 50 40 0 10 #10 50 40 0 10 #11 50 40 0 10 #12 50 30 0 20 #13 50 30 0 20 #14 50 30 0 20 #15 50 30 0 20 #or, fun1<-function(x){ B<-gsub(" slope{0,1}\\s\\((.*)\\)","\\1",x) C<-gsub("(.*)/(.*)","\\1 0 0 \\2", gsub("(.*)/(.*)/(.*)","\\1 \\2 0 \\3", gsub("(.*)/(.*)/(.*)/(.*)","\\1 \\2 \\3 \\4",B))) res<-data.frame(do.call(rbind,strsplit(C," "))) colnames(res)<-paste("A",1:4,sep="") res } fun1(dat2$Composition_percent_part) head(fun1(dat2$Composition_percent_part),5) # A1 A2 A3 A4 #1 60 25 0 15 #2 60 25 0 15 #3 40 35 15 10 #4 40 35 15 10 #5 40 35 15 10 A.K. From: Sapana Lohani To: arun Sent: Sunday, September 2, 2012 8:39 PM Subject: Re: [R] splits with 0s in middle columns Hi Arun, I do not know whats wrong with my data, so am sending you the whole column I wanted to split. Could you please have a look and suggest me the error? 
I am totally stuck at this point of my analysis.

From: arun
To: Rui Barradas
Cc: Sapana Lohani; R help
Sent: Sunday, September 2, 2012 1:54 PM
Subject: Re: [R] splits with 0s in middle columns

Hi,

You can also try this as a function:

fun1 <- function(x){
  B <- gsub("percent{0,1}\\s\\((.*)\\)", "\\1", x)
  C <- gsub("(.*)/(.*)", "\\1 0 0 \\2",
       gsub("(.*)/(.*)/(.*)", "\\1 \\2 0 \\3",
       gsub("(.*)/(.*)/(.*)/(.*)", "\\1 \\2 \\3 \\4", B)))
  dat1 <- data.frame(do.call(rbind, strsplit(C, " ")))
  colnames(dat1) <- paste0("A", 1:4)
  dat1
}
fun1(A)
#  A1 A2 A3 A4
#1 10 20  0 30
#2 40  0  0 20
#3 60 10 10  5
#4 80  0  0 10

A.K.

----- Original Message -----
From: Rui Barradas
To: Sapana Lohani
Cc: r-help
Sent: Sunday, September 2, 2012 1:05 PM
Subject: Re: [R] splits with 0s in middle columns

Hello,

You don't need a new function; what you need is to prepare your data in such a way that the function can process it.

A <- c("percent (10/20/30)", "percent (40/20)", "percent (60/10/10/5)", "percent (80/10)")
B <- gsub("\\(|\\)|percent| ", "", A)
fun(B)

Also, please use dput() to post data examples:

dput(A)
c("percent (10/20/30)", "percent (40/20)", "percent (60/10/10/5)", "percent (80/10)")

Then copy & paste it in your post.

Rui Barradas

Em 02-09-2012 04:22, Sapana Lohani escreveu:
> Dear Rui,
>
> The new code works fine for what I wanted. I have another similar column, but it looks like
>
> A
> percent (10/20/30)
> percent (40/20)
> percent (60/10/10/5)
> percent (80/10)
>
> I want a similar split but with the "percent" in front deleted. The output should look like
>
> A1 A2 A3 A4
> 10 20  0 30
> 40  0  0 20
> 60 10 10  5
> 80  0  0 10
>
> Could you please make the small change in the code that you gave me? It must be a small edit, but I could not figure it out.
> FYI, the code that worked was
>
> fun <- function(X){
>   xname <- deparse(substitute(X))
>   s <- strsplit(X, "/")
>   n <- max(sapply(s, length))
>   tmp <- numeric(n)
>
>   f <- function(x){
>     x <- as.numeric(x)
>     m <- length(x)
>     if(m > 1){
>       tmp[n] <- x[m]
>       tmp[seq_len(m - 1)] <- x[seq_len(m - 1)]
>     } else tmp[1] <- x
>     tmp
>   }
>
>   res <- do.call(rbind, lapply(s, f))
>   colnames(res) <- paste(xname, seq_along(s), sep = "")
>   data.frame(res)
> }
>
> fun(A)
>
> Thank you so very much.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
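As an aside, the zero-padding done by the chained gsub() calls can also be written with strsplit() directly; this is a sketch, assuming each string holds between 2 and 4 slash-separated numbers, and pad_split is a hypothetical helper name:

```r
pad_split <- function(x, n = 4) {
  t(sapply(strsplit(x, "/"), function(v) {
    v <- as.numeric(v)
    out <- numeric(n)
    m <- length(v)
    out[seq_len(m - 1)] <- v[seq_len(m - 1)]  # leading parts keep their position
    out[n] <- v[m]                            # last part always lands in column n
    out
  }))
}
pad_split(c("60/25/15", "50/40/10", "40/35/15/10"))
#      [,1] [,2] [,3] [,4]
# [1,]   60   25    0   15
# [2,]   50   40    0   10
# [3,]   40   35   15   10
```

Unlike the regex version, this needs no changes if the maximum number of parts grows: just raise n.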
Re: [R] Coxph not converging with continuous variable
The coxph function in R is not working for me when I use a continuous predictor in the model. Specifically, it fails to converge, even when bumping up the number of max iterations or setting reasonable initial values. The estimated hazard ratio from the model is incorrect (verified by an AFT model). I've isolated it to the "x1" variable in the example below, which is log-normally distributed. The x1 here has extreme values, but I've been able to reproduce the problem intermittently with less extreme values. It seemed odd that I couldn't find this question asked anywhere, so I'm wondering if I'm just not seeing a mistake I've made. Any help is appreciated to get this model to converge. Code, output, and sessionInfo follow the signature.

Alex Keil
UNC Chapel Hill

Here's an example:

library(survival)
set.seed(8182828)
# data
N = 10
shape = 0.75
hr = 3.5
betax <- -log(hr)/(shape)
ctime = 5
z1 <- rbinom(N, 1, 0.5)
z2 <- rbinom(N, 1, 0.5)
x1 <- exp(rnorm(N, 0 + 2*z1, 1))
wscale1 <- exp(0 + betax*x1 + z1 + z2)
time1 <- rweibull(N, shape, wscale1)
t1 <- pmin(time1, rep(ctime, N))
d1 <- 1*(t1 == time1)
dat <- as.data.frame(cbind(t1, time1, d1, z1, z2, x1))
summary(dat)

#output
> summary(dat)
       t1                time1             d1               z1            z2               x1
 Min.   :0.00       Min.   : 0.       Min.   :0.       Min.   :0.0   Min.   :0.       Min.   : 0.0147
 1st Qu.:0.04       1st Qu.: 0.       1st Qu.:1.       1st Qu.:0.0   1st Qu.:0.       1st Qu.: 0.9552
 Median :0.008400   Median : 0.0084   Median :1.       Median :0.5   Median :0.       Median : 2.7052
 Mean   :0.295397   Mean   : 0.3260   Mean   :0.9906   Mean   :0.5   Mean   :0.4997   Mean   : 6.8948
 3rd Qu.:0.178325   3rd Qu.: 0.1783   3rd Qu.:1.       3rd Qu.:1.0   3rd Qu.:1.       3rd Qu.: 7.7409
 Max.   :5.00       Max.   :52.8990   Max.   :1.       Max.   :1.0   Max.   :1.       Max.   :431.2779

> exp((cox1 <- coxph(Surv(t1, d1) ~ x1 + z1 + z2, ties="breslow"))[[1]])  #hrs
       x1        z1        z2
3.3782387 0.4925040 0.4850214
Warning message:
In fitter(X, Y, strats, offset, init, control, weights = weights, :
  Ran out of iterations and did not converge

#accelerated failure time model confirms estimate is off from coxph
> aft1 <- survreg(Surv(t1, d1) ~ x1 + z1 + z2, dist="weibull"); coefs <- aft1$coefficients; scale <- aft1$scale
> data.frame(psi = -coefs[2], scale, hr = exp(-coefs[2]*(1/scale)))  #hr; scale is "weibull shape" in SAS
        psi  scale       hr
x1 1.670406   1.91 3.499958

#end output
> sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] splines   stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] survival_2.36-14

loaded via a namespace (and not attached):
[1] timereg_1.6-5

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
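Not a fix endorsed by the thread, but one standard thing to try when coxph() stalls on a heavily right-skewed covariate is rescaling it: the Cox partial likelihood is invariant to linear rescaling of x1, so the hazard ratio can be converted back afterwards. A sketch, using the simulated dat from the example above:

```r
library(survival)
# Standardize the skewed covariate before fitting; rescaling changes the
# optimizer's step sizes but not the model being fit.
dat$x1s <- as.numeric(scale(dat$x1))
cox2 <- coxph(Surv(t1, d1) ~ x1s + z1 + z2, data = dat, ties = "breslow")
# Hazard ratio per one unit of the original x1:
exp(coef(cox2)[["x1s"]] / sd(dat$x1))
```

Whether this converges for these data would need to be checked; with Mean(d1) near 1 and x1 spanning four orders of magnitude, near-monotone likelihood problems are also plausible.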
Re: [R] a newbie seeking for a simple problem
As df2 has only one column and is thus effectively a variable in this case, you don't even need to merge:

df1[df1$gene_name %in% df2$gene_name, ]

should do.

HTH,
Daniel

wong, honkit (Stephen) wrote
> Dear Experienced R users,
>
> I have a looks-like-simple but complicated problem that urgently needs to be solved. Below are the details:
>
> I have two data frames, df1 and df2. df1 contains two columns and many thousands of rows: column 1 is "gene_name", column 2 is "value". df2 contains only one column, "gene_name", with a couple hundred rows. I want to change "value" of df1 for those "gene_name" that also appear in df2's "gene_name". How to do that? Millions of thanks.
>
> Ste

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
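A sketch of the update itself, with toy data and a placeholder replacement value of 0 (both are assumptions; the poster did not say what the new value should be):

```r
df1 <- data.frame(gene_name = c("g1", "g2", "g3"), value = c(1.2, 3.4, 5.6))
df2 <- data.frame(gene_name = c("g2", "g3", "g9"))

hit <- df1$gene_name %in% df2$gene_name  # which rows of df1 occur in df2
df1$value[hit] <- 0                      # 0 is a placeholder replacement
df1
#   gene_name value
# 1        g1   1.2
# 2        g2   0.0
# 3        g3   0.0
```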
[R] R suitability for development project
Hello All,

Eric Langley here with my first post to this list. I am looking to determine whether R is suitable for a development project I am working on and, if so, possibly to find someone proficient in R who would be interested in doing the coding. I would like to preface my inquiry by saying that, while I am not a programmer, I can communicate my objectives in a dialog.

An array of rank-ordered data looks like this:

Item     First   Second   Third   Fourth   Totals
Item1    6       8        0       0        14
Item2    7       5        2       0        14
Item3    1       1        11      1        14
Item4    0       0        1       13       14
Totals   14      14       14      14

The required output of R will be twofold:

1. A numerical score for each of the Items (1-4), from highest to lowest and lowest to highest, on a scale of 0-99 that is statistically accurate. For this example the scores would run from Item1 with the highest number down to Item4 with the lowest number. In reverse, Item4 would have the highest number down to Item1 with the lowest. For the highest, something like this: Item1=94, Item2=88, Item3=48, Item4=2 (just guessing here on the scores...:)

2. A graphical output of the data based on the scores, in three special graphs with a middle line at '0' and increasing numbers to the left AND right. The graphs plot the highest-ranked Items, the lowest-ranked Items, and a combination of the two. Sample graphs are here: http://community.abeo.us/sample-graphs/

Looking forward to hearing whether R will be able to accomplish this.

TYIA,
~eric

--
Eric Langley
Founder, abeo
e...@abeo.us
404-326-5382

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] [newbie] scripting remote check for R package
I would call something like this via ssh (please note: I used "ggplot2" in the example):

Rscript -e 'as.numeric(suppressWarnings(suppressPackageStartupMessages(require(ggplot2))))'

You could easily extend this to loop through the required packages and tweak the output for your needs.

Best,
Gergely

On Sun, Sep 2, 2012 at 10:42 PM, Tom Roche wrote:
>
> summary: e.g., how to replace '' in the following:
>
> for RSERVER in 'foo' 'bar' 'baz' ; do
>     ssh ${RSERVER} ''
> done
>
> or is there a better way to script checking for an R package?
>
> details:
>
> For my work I need a few non-standard R packages. I do that work on 2 clusters. One uses environment modules, so any given server on that cluster can provide (via `module add`) pretty much the same resources.
>
> The other cluster, on which I must also unfortunately work, has servers managed individually, so I get in the habit of doing, e.g.,
>
> EXEC_NAME='whatever'
> for S in 'foo' 'bar' 'baz' ; do
>     ssh ${RSERVER} "${EXEC_NAME} --version"
> done
>
> periodically. I'm wondering what incantation to utter (bash preferred) via ssh to query a given server's R for a given package?
>
> TIA, Tom Roche

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
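An R-side variant of the same check that avoids attaching the package: requireNamespace() returns TRUE/FALSE without loading the package into the search path. The package name "ggplot2" is kept from the example, and the ssh quoting shown in the comment is an untested sketch:

```r
# Prints 1 if the package is installed, 0 otherwise. From the shell, e.g.:
#   ssh host "Rscript -e 'cat(as.numeric(requireNamespace(\"ggplot2\", quietly = TRUE)))'"
ok <- requireNamespace("ggplot2", quietly = TRUE)
cat(as.numeric(ok), "\n")
```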
Re: [R] Help on finding specific columns in matrix
Hi Andras,

No problem. But I guess using colMeans(), as suggested by Rainer, should be faster on large datasets.

A.K.

----- Original Message -----
From: Andras Farkas
To: smartpink...@yahoo.com
Sent: Sunday, September 2, 2012 4:59 PM
Subject: Re: [R] Help on finding specific columns in matrix

this works nicely, thanks arun...

Andras

--
On Sun, Sep 2, 2012 11:50 AM EDT arun wrote:

>Hi,
>Try this:
> which.min(apply(a,2,mean))
>#[1] 1
> which.max(apply(a,2,mean))
>#[1] 4
>A.K.
>
>- Original Message -
>From: Andras Farkas
>To: "r-help@r-project.org"
>Sent: Sunday, September 2, 2012 5:49 AM
>Subject: [R] Help on finding specific columns in matrix
>
>Dear All,
>
>I have a matrix with 33 columns and 5000 rows. I would like to find 2 specific columns in the set: the one that holds the highest values and the one that holds the lowest values. In this case the column's mean would be appropriate to use to try to find those specific columns, because each column's mean is different and they all change together based on the same "change of rate constants" as a function of time (i.e., time is the most important determinant of the magnitude of that mean). The columns are not to be named, if possible. Any thoughts on that? Would appreciate the help.
>
>example:
>
>a <- matrix(c(runif(500,10,15),runif(500,15,20),runif(500,20,25),runif(500,25,30)),ncol=4)
>
>I would need to find, and plot with a box percentile plot, column 1 (the column with the lowest mean) and column 4 (the column with the highest mean).
>
>thanks,
>
>Andras
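Rainer's colMeans() suggestion, spelled out against the example matrix (the set.seed() call is an addition so the sketch is reproducible; the original example had none):

```r
set.seed(42)
a <- matrix(c(runif(500, 10, 15), runif(500, 15, 20),
              runif(500, 20, 25), runif(500, 25, 30)), ncol = 4)

m <- colMeans(a)  # one pass over the matrix; faster than apply(a, 2, mean)
c(lowest = which.min(m), highest = which.max(m))
# lowest = 1, highest = 4 (the column ranges do not overlap)
```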
Re: [R] splits with 0s in middle columns
Hi,

You can also try this as a function:

fun1 <- function(x){
  B <- gsub("percent{0,1}\\s\\((.*)\\)", "\\1", x)
  C <- gsub("(.*)/(.*)", "\\1 0 0 \\2",
       gsub("(.*)/(.*)/(.*)", "\\1 \\2 0 \\3",
       gsub("(.*)/(.*)/(.*)/(.*)", "\\1 \\2 \\3 \\4", B)))
  dat1 <- data.frame(do.call(rbind, strsplit(C, " ")))
  colnames(dat1) <- paste0("A", 1:4)
  dat1
}
fun1(A)
#  A1 A2 A3 A4
#1 10 20  0 30
#2 40  0  0 20
#3 60 10 10  5
#4 80  0  0 10

A.K.

----- Original Message -----
From: Rui Barradas
To: Sapana Lohani
Cc: r-help
Sent: Sunday, September 2, 2012 1:05 PM
Subject: Re: [R] splits with 0s in middle columns

Hello,

You don't need a new function; what you need is to prepare your data in such a way that the function can process it.

A <- c("percent (10/20/30)", "percent (40/20)", "percent (60/10/10/5)", "percent (80/10)")
B <- gsub("\\(|\\)|percent| ", "", A)
fun(B)

Also, please use dput() to post data examples:

dput(A)
c("percent (10/20/30)", "percent (40/20)", "percent (60/10/10/5)", "percent (80/10)")

Then copy & paste it in your post.

Rui Barradas

Em 02-09-2012 04:22, Sapana Lohani escreveu:
> Dear Rui,
>
> The new code works fine for what I wanted. I have another similar column, but it looks like
>
> A
> percent (10/20/30)
> percent (40/20)
> percent (60/10/10/5)
> percent (80/10)
>
> I want a similar split but with the "percent" in front deleted. The output should look like
>
> A1 A2 A3 A4
> 10 20  0 30
> 40  0  0 20
> 60 10 10  5
> 80  0  0 10
>
> Could you please make the small change in the code that you gave me? It must be a small edit, but I could not figure it out.
> FYI, the code that worked was
>
> fun <- function(X){
>   xname <- deparse(substitute(X))
>   s <- strsplit(X, "/")
>   n <- max(sapply(s, length))
>   tmp <- numeric(n)
>
>   f <- function(x){
>     x <- as.numeric(x)
>     m <- length(x)
>     if(m > 1){
>       tmp[n] <- x[m]
>       tmp[seq_len(m - 1)] <- x[seq_len(m - 1)]
>     } else tmp[1] <- x
>     tmp
>   }
>
>   res <- do.call(rbind, lapply(s, f))
>   colnames(res) <- paste(xname, seq_along(s), sep = "")
>   data.frame(res)
> }
>
> fun(A)
>
> Thank you so very much.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
[R] Common elements in columns
Hi,

I have 4 files with 1 individuals in each file and 10 columns each. One of the columns, say C1, may have elements in common with the C1 columns of the other files.

If I have only 2 files, I can do this check with the commands:

data1[data1 %in% data2]
data2[data2 %in% data1]

How do I check which elements are common to the C1 columns of all 4 files?

Thanks,

Silvano Cesar da Costa
Universidade Estadual de Londrina
Centro de Ciências Exatas
Departamento de Estatística
Fone: (43) 3371-4346

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
[R] Skew-Normal CDF using psn
Dear R-users,

I have been using the code below in order to verify how the CDF of a skew-normal distribution is calculated:

library(sn)
s <- seq(-30, 30, by=0.1)
a <- matrix(nrow=length(s), ncol=5)
lambda <- 1
for(i in 1:length(s)){
  a[i,1] <- pnorm(s[i], mean=0, sd=1)
  a[i,2] <- T.Owen(s[i], lambda)
  a[i,3] <- a[i,1] - 2*a[i,2]
  a[i,4] <- pnorm(s[i]) - 2*T.Owen(s[i], lambda)
  a[i,5] <- psn(s[i], shape=lambda)
}

From the literature I was expecting columns 3, 4 and 5 to give exactly the same results, but they don't: there seem to be some approximations for small values of x. Does anyone know where this comes from?

Any help would be greatly appreciated.

Cheers,
Boris

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
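For reference, the identity being checked is F(x) = pnorm(x) - 2*T.Owen(x, lambda), with T.Owen computing Owen's T function. A minimal vectorized check, assuming the sn package is installed (newer versions of sn name the shape argument alpha; older ones used shape, as in the post):

```r
library(sn)  # assumes the sn package is installed

x <- seq(-5, 5, by = 0.5)
lambda <- 1
direct  <- psn(x, alpha = lambda)            # skew-normal CDF
viaOwen <- pnorm(x) - 2 * T.Owen(x, lambda)  # Owen's T identity
max(abs(direct - viaOwen))                   # should be tiny away from the extreme tails
```

Discrepancies far out in the tails (the post uses s down to -30) are plausibly numerical: T.Owen is evaluated by a truncated series, so the identity holds only to the accuracy of that approximation.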
Re: [R] splits with 0s in middle columns
Hello,

You should Cc the list; there are others presenting solutions.

What's going on should be obvious: your data example had "percent" in it, and your data file has "slope"! How could you expect it to work? Just in case, I'm changing the regular expression to remove everything but slashes and digits.

dat <- read.csv("test.csv")
B <- gsub("[^/[:digit:]]+", "", dat$Composition_percent_part)

Rui Barradas

Em 03-09-2012 01:38, Sapana Lohani escreveu:

Hello Rui,

I do not know what's wrong with my data, so am sending you the whole column I wanted to split. Could you please have a look and suggest the error? I am totally stuck at this point of my analysis.

From: Rui Barradas
To: Sapana Lohani
Cc: r-help
Sent: Sunday, September 2, 2012 10:05 AM
Subject: Re: [R] splits with 0s in middle columns

Hello,

You don't need a new function; what you need is to prepare your data in such a way that the function can process it.

A <- c("percent (10/20/30)", "percent (40/20)", "percent (60/10/10/5)", "percent (80/10)")
B <- gsub("\\(|\\)|percent| ", "", A)
fun(B)

Also, please use dput() to post data examples:

dput(A)
c("percent (10/20/30)", "percent (40/20)", "percent (60/10/10/5)", "percent (80/10)")

Then copy & paste it in your post.

Rui Barradas

Em 02-09-2012 04:22, Sapana Lohani escreveu:

Dear Rui,

The new code works fine for what I wanted. I have another similar column, but it looks like

A
percent (10/20/30)
percent (40/20)
percent (60/10/10/5)
percent (80/10)

I want a similar split but with the "percent" in front deleted. The output should look like

A1 A2 A3 A4
10 20  0 30
40  0  0 20
60 10 10  5
80  0  0 10

Could you please make the small change in the code that you gave me? It must be a small edit, but I could not figure it out.
FYI, the code that worked was

fun <- function(X){
  xname <- deparse(substitute(X))
  s <- strsplit(X, "/")
  n <- max(sapply(s, length))
  tmp <- numeric(n)

  f <- function(x){
    x <- as.numeric(x)
    m <- length(x)
    if(m > 1){
      tmp[n] <- x[m]
      tmp[seq_len(m - 1)] <- x[seq_len(m - 1)]
    } else tmp[1] <- x
    tmp
  }

  res <- do.call(rbind, lapply(s, f))
  colnames(res) <- paste(xname, seq_along(s), sep = "")
  data.frame(res)
}

fun(A)

Thank you so very much.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] Impact of cex changing as a function of mfrow
Duncan and I are getting closer to an understanding.

1. "index" is not relevant to the discussion. I am focused solely on margin text.
2. The size of the margin text does not change between pages.
3. However, the location does change, more so on the top/right than on the left/bottom.

I rewrote the code to better illustrate the problem:

###
FINDCEX <- function() {
  CORRECT <- 1
  MFROW <- par()$mfrow
  MFCOL <- par()$mfcol
  TEST <- all(MFROW == MFCOL)
  if (TEST && MFROW == c(2,2)) CORRECT <- 1 / 0.83
  if (TEST && (MFROW[1] >= 3 | MFROW[2] >= 3)) CORRECT <- 1 / 0.66
  if (!TEST) cat("MFROW does not equal MFCOL\n")
  cat(MFROW, CORRECT, "\n")
  return(CORRECT)
}

DRAW <- function(N) {
  for (i in 1:N) {
    plot(1, bty="n")
    box(which="inner")
  }
  for (SIDE in 1:2) {
    mtext(outer=T, side=SIDE, line=0, cex=1, "line 0")
    mtext(outer=T, side=SIDE, line=FINDCEX(), cex=1, "line FINDCEX")
    mtext(outer=T, side=SIDE, line=-FINDCEX(), cex=1, "line FINDCEX")
  }
  for (SIDE in 3:4) {
    mtext(outer=T, side=SIDE, line=0, cex=1, "line 0")
    mtext(outer=T, side=SIDE, line=1, cex=1, "line 1")
  }
}

pdf("TestCEX.pdf", 10, 8)
par(mfrow=c(1,1), omi=c(1,1,1,1))
DRAW(1)
par(mfrow=c(2,2), omi=c(1,1,1,1))
DRAW(4)
par(mfrow=c(3,3), omi=c(1,1,1,1))
DRAW(9)
graphics.off()
system("open TestCEX.pdf")  ## in Windows, "start" instead of "open"
###

If you execute this code, you will see:

1. The box at the outer border is identical on all pages (as expected).
2. The margin text on the left side superimposes between pages.
3. On the bottom, margin text is spaced the same on each page (unlike at the top). However, the vertical position differs between pages (most obvious when comparing pages 1 and 3).

I could correct the vertical positioning by brute force, i.e., adding a constant to each mtext command (applying a different value based on the value of mfrow) -- it turns out that a correction of 0.2 (for mfrow=c(2,2)) and 0.4 (for mfrow=c(3,3)) aligns everything properly. However, this is not elegant.
The correction factor took care of line-to-line spacing but not vertical position. Dennis Dennis Fisher MD P < (The "P Less Than" Company) Phone: 1-866-PLessThan (1-866-753-7784) Fax: 1-866-PLessThan (1-866-753-7784) www.PLessThan.com On Sep 2, 2012, at 10:40 AM, Dennis Fisher wrote: > R 2.15.1 > OS X (MLion) > > Colleagues, > > I am aware that changes in mfrow / mfcol in par() affect cex (from help: In a > layout with exactly two rows and columns the base value of ‘"cex"’ is reduced > by a factor of 0.83: if there are three or more of either rows or columns, > the reduction factor is 0.66). > > I generate a multipage PDF in which mfrow varies such that cex is impacted. > This affect mtext in the outer margin. Sample code is pasted at the bottom > of this email. The impact is most obvious if one examines the text at the > bottom of each page as one moves page-to-page. > > Does anyone have a suggestion for how to overcome this (other than using > brute force). > > Dennis > > Dennis Fisher MD > P < (The "P Less Than" Company) > Phone: 1-866-PLessThan (1-866-753-7784) > Fax: 1-866-PLessThan (1-866-753-7784) > www.PLessThan.com > > # In a layout with exactly two rows and columns the base value of ‘"cex"’ is > reduced by a factor of 0.83: if > # there are three or more of either rows or columns, the reduction factor is > 0.66. 
> > FINDCEX <- function() > { > CORRECT <- 1 > MFROW <- par()$mfrow > MFCOL <- par()$mfcol > TEST<- all(MFROW == MFCOL) > if (TEST && MFROW == c(2,2))CORRECT <- 1 / > 0.83 > if (TEST && (MFROW[1] >= 3 | MFROW[2] >= 3))CORRECT <- 1 / 0.66 > if (!TEST) cat("MFROW does not equal MFCOL\n") > cat(MFROW, CORRECT, "\n") > return(CORRECT) > } > pdf("TestCEX.pdf", 8, 6) > par(mfrow=c(1,1), omi=c(1,1,1,1)) > plot(1) > mtext(outer=T, side=1, line=0, cex=1, "line 0") > mtext(outer=T, side=1, line=FINDCEX(), cex=1, "line FINDCEX") > mtext(outer=T, side=3, line=0, cex=1, "line 0") > mtext(outer=T, side=3, line=1, cex=1, "line 1") > mtext(outer=T, side=2, line=0, cex=1, "line 0") > mtext(outer=T, side=2, line=FINDCEX(), cex=1, "line FINDCEX") > mtext(outer=T, side=4, line=0, cex=1, "line 0") > mtext(outer=T, side=4, line=1, cex=1, "line 1") > > > par(mfrow=c(1,2), omi=c(1,1,1,1)) > plot(1) > mtext(outer=T, side=1, line=0, cex=1, "line 0") > mtext(outer=T, side=1, line=FINDCEX(), cex=1, "line FINDCEX")
Re: [R] a newbie seeking for a simple problem
Read the posting guide... you need to provide more specific information such as sample data (?dput). For this problem you should probably read ?merge.

---
Jeff Newmiller
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
Sent from my phone. Please excuse my brevity.

"Hon Kit (Stephen) Wong" wrote:

>Dear Experienced R users,
>
>I have a looks-like-simple but complicated problem that urgently needs to be solved. Below are the details:
>
>I have two data frames, df1 and df2. df1 contains two columns and many thousands of rows: column 1 is "gene_name", column 2 is "value". df2 contains only one column, "gene_name", with a couple hundred rows. I want to change "value" of df1 for those "gene_name" that also appear in df2's "gene_name". How to do that? Millions of thanks.
>
>Ste

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
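A sketch of the merge()-based route with toy data (the gene names and values are made up for illustration):

```r
df1 <- data.frame(gene_name = c("g1", "g2", "g3"), value = c(1.2, 3.4, 5.6))
df2 <- data.frame(gene_name = c("g2", "g3", "g9"))

# Inner join: keeps only the genes present in both data frames.
merge(df1, df2, by = "gene_name")
#   gene_name value
# 1        g2   3.4
# 2        g3   5.6
```

merge() earns its keep over %in% once df2 carries extra columns that need to travel with the match; for a bare lookup list, the %in% subset is simpler.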
[R] a newbie seeking for a simple problem
Dear Experienced R users,

I have a looks-like-simple but complicated problem that urgently needs to be solved. Below are the details:

I have two data frames, df1 and df2. df1 contains two columns and many thousands of rows: column 1 is "gene_name", column 2 is "value". df2 contains only one column, "gene_name", with a couple hundred rows. I want to change "value" of df1 for those "gene_name" that also appear in df2's "gene_name". How to do that? Millions of thanks.

Ste

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] why variations in accuracy between R to ARCGIS for the same point reprojection?
I apologize, I did not intend to blame anyone. I actually thought there might be some underlying differences in the formulas used. The patterns overall are very similar, but they are not that precise relative to one another. The function I use in R is called spTransform, from the package "rgdal". Again, my guess is that it has something to do with the functions used. As an example, I found two different ways to measure the radius of the Earth. So perhaps there is an additional parameter that differentiates the scripts of R and ArcGIS? Sorry again for the confusion, and thanks,

Camilo

p.s. I always praise R. It is just hard to imagine academic life without it...

Camilo Mora, Ph.D.
Department of Geography, University of Hawaii
Currently available in Colombia
Phone: Country code: 57; Provider code: 313; Phone: 776 2282
From the USA or Canada you have to dial 011 57 313 776 2282
http://www.soc.hawaii.edu/mora/

Quoting Prof Brian Ripley:

There is no 'reprojection' in R (which is upper case). Please attribute blame correctly. You seem to be talking about some contributed add-on package, not specified. But I think you should be asking this on R-sig-geo.

On 02/09/2012 19:24, Camilo Mora wrote:

Hi everyone,

I wonder if anyone knows the reason why the outputs of the same reprojection in r and arcgis are different? The magnitude of the change can be up to 40 km at the poles.

Basically, I have a database of points equally separated by one degree over the globe. In ArcGIS, I project the data in GCS-WGS-1984 and then reproject it to Behrmann to ensure an equal-area distribution of the points. In R, I am using:

spPoint <- SpatialPoints(coords=coordinates(Data), proj4string=CRS("+proj=longlat +datum=WGS84"))

and then reprojecting it to Behrmann with:

spPointReprj <- spTransform(Data, CRS("+proj=cea +lon_0=0 +lat_ts=30 +x_0=0 +y_0=0 +ellps=WGS84"))

If I put the two outputs of the reprojections on the same map, they are off by a few metres in the tropics and by up to 40 km at the poles.
I decided to investigate how the reprojections are done, and my calculations are different from both R and ArcGIS.

First, I calculated the radius of the Earth in two different ways:

= Re * (1 - e^2) / (1 - e^2 * SIN(RADIANS(Latitude))^2)^(3/2)
= Re * SQRT(1 - e^2) / (1 - e^2 * (SIN(RADIANS(Latitude)))^2)

where Re is the radius of the Earth at the equator (6378 km) and e is the eccentricity of the ellipsoid (0.081082).

According to several books I used, the position of a point on the Y-axis in the Behrmann projection can be estimated as:

= Re * (SIN(RADIANS(Latitude)) / COS(RADIANS(Spll)))

where Spll is the standard parallel, which in the Behrmann projection is 30.

Unfortunately, the resulting output, although similar in shape to the outputs of R and ArcGIS, is still not quite the same.

Any thoughts on why these differences arise in supposedly the same calculations? Any input will be greatly appreciated.

Thanks,

Camilo

Camilo Mora, Ph.D.
Department of Geography, University of Hawaii
Currently available in Colombia
Phone: Country code: 57; Provider code: 313; Phone: 776 2282
From the USA or Canada you have to dial 011 57 313 776 2282
http://www.soc.hawaii.edu/mora/

--
Brian D. Ripley, rip...@stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK, Fax: +44 1865 272595

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
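The spherical form of the quoted Y-axis formula can be checked directly. This sketch uses a spherical mean radius (an assumption; R's spTransform call above uses +ellps=WGS84, an ellipsoid), so it will not match the ellipsoidal result exactly, which is one plausible source of the pole-ward discrepancy:

```r
# Cylindrical equal-area projection with standard parallel 30 deg (Behrmann),
# spherical case:  y = R * sin(lat) / cos(30 deg)
R_m  <- 6371000                  # mean Earth radius in metres (spherical assumption)
lat  <- c(0, 30, 60, 89)         # latitudes in degrees
y_km <- R_m * sin(lat * pi/180) / cos(30 * pi/180) / 1000
round(y_km)                      # projected northings in km
```

The ellipsoidal cea projection replaces sin(lat) with a function of the authalic latitude, and the difference between the two grows toward the poles, consistent with the metres-at-the-equator versus tens-of-kilometres-at-the-poles pattern described above.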
Re: [R] NANs in test Breslow-Day for svytable??
Hello, Thanks for the guidance on using epi.2by2 with a svytable; however, when I run the code it throws warnings that reference NaNs. Could you help me with this? Below are the results: > int19<-svytable(~Perdida_peso_no_intensionada+APES_DICOT+tabaco,Muestra.comp,Ntotal > = NULL) > int19 , , tabaco = Nunca fumador APES_DICOT Perdida_peso_no_intensionada Buena Mala No/No sabe 2108.3 599.8 Si 264.8 253.6 , , tabaco = Activo pasivo APES_DICOT Perdida_peso_no_intensionada Buena Mala No/No sabe 200.8 32.0 Si 25.0 32.0 > epi.2by2(dat = int19, method = "case.control", conf.level = 0.95, + units = 100, homogeneity = "breslow.day", verbose = TRUE)$OR.homog test.statistic df p.value 1 6.566923 1 0.01038914 There were 12 warnings (use warnings() to see them) > warnings(int19) Warning messages: 1: In qbinom(p, size, prob, lower.tail, log.p) : NaNs produced 2108.3 264.8 599.8 253.6 200.8 25 32 32 (the remaining 11 warnings are identical) Thanks > Date: Sat, 1 Sep 2012 12:34:11 +1200 > Subject: Re: [R] test Breslow-Day for svytable?? > From: tlum...@uw.edu > To: dwinsem...@comcast.net > CC: dianamm...@hotmail.com; r-help@r-project.org; m.steven...@massey.ac.nz > > On Sat, Sep 1, 2012 at 4:27 AM, David Winsemius > wrote: > > > > On Aug 31, 2012, at 7:20 AM, Diana Marcela Martinez Ruiz wrote: > > > >> Hi all, > >> > >> I want to know how to perform the Breslow-Day test for homogeneity of > >> odds ratios (OR) stratified for svytable. This test is obtained with the > >> following code: > >> > >> epi.2by2 (dat = daty, method = "case.control" conf.level = 0.95, > > > > missing comma here ...^ > > > >>units = 100, homogeneity = "breslow.day", verbose = TRUE) > >> > >> where "daty" is the object type table svytable consider it, but when I > >> run the code > >> does not throw the homogeneity test. > > > > You are asked in the Posting guide to copy all errors and warnings when > > asking about unexpected behavior. When I run epi.2by2 on the output of a > > svytable object I get no errors, but I do get warnings which I think are > > due to non-integer entries in the weighted table. I also get from a > > svytable() using its first example on the help page an object that is NOT a > > set of 2 x 2 tables in an array of the structure as expected by epi.2by2(). > > The fact that epi.2by2() will report numbers with labels for a 2 x 3 table > > means that its error checking is weak. > > > > This is the output of str(dat) from one of the examples on epi.2by2's help > > page: > > > >> str(dat) > > table [1:2, 1:2, 1:3] 41 13 6 53 66 37 25 83 23 37 ... > > - attr(*, "dimnames")=List of 3 > > ..$ Exposure: chr [1:2] "+" "-" > > ..$ Disease : chr [1:2] "+" "-" > > ..$ Strata : chr [1:3] "20-29 yrs" "30-39 yrs" "40+ yrs" > > > > Notice that it is a 2 x 2 x n array.
(Caveat: from here on out I am simply > > reading the help pages and using str() to look at the objects created to > > get an idea regarding success or failure. I am not an experienced user of > > either package.) I doubt that what you got from svytable is a 2 x 2 > > table. As another example you can build a 2 x 2 x n table from the built-in > > dataset: UCBAdmissions > > > > DF <- as.data.frame(UCBAdmissions) > > ## Now 'DF' is a data frame with a grid of the factors and the counts > > ## in variable 'Freq'. > > dat2 <- xtabs(Freq ~ Gender + Admit + Dept, DF) > > epiR::epi.2by2(dat = dat2, method = "case.control", conf.level = 0.95, > > units = 100, homogeneity = "breslow.day", verbose = TRUE)$OR.homog > > #- > > test.statistic df p.value >
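One workaround worth trying for the NaN warnings in this thread (a guess on my part, not verified against the original survey design object): round the weighted svytable to integer counts before handing it to epi.2by2, since qbinom() returns NaN with a "NaNs produced" warning whenever its size argument is not an integer:

```r
## Hypothetical fix: epi.2by2()'s exact confidence limits call qbinom(),
## which warns "NaNs produced" when the cell counts are non-integer.
## 'int19' stands for the 2 x 2 x 2 weighted svytable shown in the post.
library(epiR)
int19r <- round(int19)   # coerce the survey-weighted counts to whole numbers
epi.2by2(dat = int19r, method = "case.control", conf.level = 0.95,
         units = 100, homogeneity = "breslow.day", verbose = TRUE)$OR.homog
```

Whether rounding is statistically appropriate for survey-weighted counts is a separate question that would belong on R-sig-survey.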
[R] [newbie] scripting remote check for R package
summary: e.g., how to replace '' in the following: for RSERVER in 'foo' 'bar' 'baz' ; do ssh ${RSERVER} '' done or is there a better way to script checking for an R package? details: For my work I need a few non-standard R packages. I do that work on 2 clusters. One uses environment modules, so any given server on that cluster can provide (via `module add`) pretty much the same resources. The other cluster, on which I must also unfortunately work, has servers managed individually, so I have gotten into the habit of periodically doing, e.g., EXEC_NAME='whatever' for S in 'foo' 'bar' 'baz' ; do ssh ${S} "${EXEC_NAME} --version" done I'm wondering, what incantation to utter (bash preferred) via ssh to query a given server's R for a given package? TIA, Tom Roche __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
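One candidate incantation (a sketch only; the package name 'rgdal' and the host names are placeholders): have Rscript evaluate requireNamespace(), which reports TRUE/FALSE without attaching the package:

```shell
#!/bin/sh
# Build the R expression once, locally, so the quoting can be checked.
PKG='rgdal'
RCHECK="cat(requireNamespace('${PKG}', quietly = TRUE))"
echo "${RCHECK}"
# Then run it on each server (hosts 'foo' 'bar' 'baz' are placeholders):
# for S in 'foo' 'bar' 'baz' ; do
#   printf '%s: ' "${S}"
#   ssh "${S}" "Rscript -e \"${RCHECK}\""
# done
```

This prints TRUE or FALSE per host; a variant that exits non-zero when the package is missing, `Rscript -e 'quit(status = as.integer(!requireNamespace("rgdal", quietly = TRUE)))'`, composes with the shell's `&&`/`||`.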
Re: [R] Impact of cex changing as a function of mfrow
On 12-09-02 3:52 PM, Dennis Fisher wrote: Duncan Perhaps I did not explain sufficiently -- the code that I sent makes the correction in FINDCEX. But, that correction does not accomplish the intended goal -- as evidenced by the graphic that is created from the code that I sent. The option of "brute force" would require trial and error. I was hoping that invoking some other option in mtext, such as padj, would accomplish my goal of needing only to multiply the line= value by the value returned from my function. You're right, I didn't understand you. Here's what I see in your sample code display (in 2.15.0 on a Mac with OS X 10.6.8): The spacing between "line 0" and "line 1" (on the top and right) gets smaller from page 1 to page 4. The size of the margin text is the same on all pages. The size of "Index" within the subfigures is smaller in the later pages. There are some other small movements of text from page to page, but not much. What do you see? Duncan Murdoch Dennis Dennis Fisher MD P < (The "P Less Than" Company) Phone: 1-866-PLessThan (1-866-753-7784) Fax: 1-866-PLessThan (1-866-753-7784) www.PLessThan.com On Sep 2, 2012, at 11:43 AM, Duncan Murdoch wrote: On 12-09-02 1:40 PM, Dennis Fisher wrote: R 2.15.1 OS X (MLion) Colleagues, I am aware that changes in mfrow / mfcol in par() affect cex (from help: In a layout with exactly two rows and columns the base value of ‘"cex"’ is reduced by a factor of 0.83: if there are three or more of either rows or columns, the reduction factor is 0.66). I generate a multipage PDF in which mfrow varies such that cex is impacted. This affects mtext in the outer margin. Sample code is pasted at the bottom of this email. The impact is most obvious if one examines the text at the bottom of each page as one moves page-to-page. Does anyone have a suggestion for how to overcome this (other than using brute force)? I am not sure what's wrong with brute force here.
You know the formula for the reduction; just apply a corresponding increase first. Duncan Murdoch Dennis Dennis Fisher MD P < (The "P Less Than" Company) Phone: 1-866-PLessThan (1-866-753-7784) Fax: 1-866-PLessThan (1-866-753-7784) www.PLessThan.com # In a layout with exactly two rows and columns the base value of ‘"cex"’ is reduced by a factor of 0.83: if # there are three or more of either rows or columns, the reduction factor is 0.66. FINDCEX <- function() { CORRECT <- 1 MFROW <- par()$mfrow MFCOL <- par()$mfcol TEST<- all(MFROW == MFCOL) if (TEST && MFROW == c(2,2))CORRECT <- 1 / 0.83 if (TEST && (MFROW[1] >= 3 | MFROW[2] >= 3)) CORRECT <- 1 / 0.66 if (!TEST) cat("MFROW does not equal MFCOL\n") cat(MFROW, CORRECT, "\n") return(CORRECT) } pdf("TestCEX.pdf", 8, 6) par(mfrow=c(1,1), omi=c(1,1,1,1)) plot(1) mtext(outer=T, side=1, line=0, cex=1, "line 0") mtext(outer=T, side=1, line=FINDCEX(), cex=1, "line FINDCEX") mtext(outer=T, side=3, line=0, cex=1, "line 0") mtext(outer=T, side=3, line=1, cex=1, "line 1") mtext(outer=T, side=2, line=0, cex=1, "line 0") mtext(outer=T, side=2, line=FINDCEX(), cex=1, "line FINDCEX") mtext(outer=T, side=4, line=0, cex=1, "line 0") mtext(outer=T, side=4, line=1, cex=1, "line 1") par(mfrow=c(1,2), omi=c(1,1,1,1)) plot(1) mtext(outer=T, side=1, line=0, cex=1, "line 0") mtext(outer=T, side=1, line=FINDCEX(), cex=1, "line FINDCEX") mtext(outer=T, side=3, line=0, cex=1, "line 0") mtext(outer=T, side=3, line=1, cex=1, "line 1") mtext(outer=T, side=2, line=0, cex=1, "line 0") mtext(outer=T, side=2, line=FINDCEX(), cex=1, "line FINDCEX") mtext(outer=T, side=4, line=0, cex=1, "line 0") mtext(outer=T, side=4, line=1, cex=1, "line 1") par(mfrow=c(2,2), omi=c(1,1,1,1)) plot(1) mtext(outer=T, side=1, line=0, cex=1, "line 0") mtext(outer=T, side=1, line=FINDCEX(), cex=1, "line FINDCEX") mtext(outer=T, side=3, line=0, cex=1, "line 0") mtext(outer=T, side=3, line=1, cex=1, "line 1") mtext(outer=T, side=2, line=0, cex=1, "line 0") mtext(outer=T, 
side=2, line=FINDCEX(), cex=1, "line FINDCEX") mtext(outer=T, side=4, line=0, cex=1, "line 0") mtext(outer=T, side=4, line=1, cex=1, "line 1") par(mfrow=c(3,3), omi=c(1,1,1,1)) plot(1) mtext(outer=T, side=1, line=0, cex=1, "line 0") mtext(outer=T, side=1, line=FINDCEX(), cex=1, "line FINDCEX") mtext(outer=T, side=3, line=0, cex=1, "line 0") mtext(outer=T, side=3, line=1, cex=1, "line 1") mtext(outer=T, side=2, line=0, cex=1, "line 0") mtext(outer=T, side=2, line=FINDCEX(), cex=1, "line FINDCEX") mtext(outer=T, side=4, line=0, cex=1, "line 0") mtext(outer=T, side=4, line=1, cex=1, "line 1") graphics.off() system("open TestCEX.pdf") __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Impact of cex changing as a function of mfrow
Duncan Perhaps I did not explain sufficiently -- the code that I sent makes the correction in FINDCEX. But, that correction does not accomplish the intended goal -- as evidenced by the graphic that is created from the code that I sent. The option of "brute force" would require trial and error. I was hoping that invoking some other option in mtext, such as padj, would accomplish my goal of needing only to multiply the line= value by the value returned from my function. Dennis Dennis Fisher MD P < (The "P Less Than" Company) Phone: 1-866-PLessThan (1-866-753-7784) Fax: 1-866-PLessThan (1-866-753-7784) www.PLessThan.com On Sep 2, 2012, at 11:43 AM, Duncan Murdoch wrote: > On 12-09-02 1:40 PM, Dennis Fisher wrote: >> R 2.15.1 >> OS X (MLion) >> >> Colleagues, >> >> I am aware that changes in mfrow / mfcol in par() affect cex (from help: In >> a layout with exactly two rows and columns the base value of ‘"cex"’ is >> reduced by a factor of 0.83: if there are three or more of either rows or >> columns, the reduction factor is 0.66). >> >> I generate a multipage PDF in which mfrow varies such that cex is impacted. >> This affect mtext in the outer margin. Sample code is pasted at the bottom >> of this email. The impact is most obvious if one examines the text at the >> bottom of each page as one moves page-to-page. >> >> Does anyone have a suggestion for how to overcome this (other than using >> brute force). > > I am not sure what's wrong with brute force here. You know the formula for > the reduction; just apply a corresponding increase first. > > Duncan Murdoch > >> >> Dennis >> >> Dennis Fisher MD >> P < (The "P Less Than" Company) >> Phone: 1-866-PLessThan (1-866-753-7784) >> Fax: 1-866-PLessThan (1-866-753-7784) >> www.PLessThan.com >> >> # In a layout with exactly two rows and columns the base value of ‘"cex"’ is >> reduced by a factor of 0.83: if >> # there are three or more of either rows or columns, the reduction factor is >> 0.66. 
>> >> FINDCEX <- function() >> { >> CORRECT <- 1 >> MFROW <- par()$mfrow >> MFCOL <- par()$mfcol >> TEST<- all(MFROW == MFCOL) >> if (TEST && MFROW == c(2,2))CORRECT <- 1 / >> 0.83 >> if (TEST && (MFROW[1] >= 3 | MFROW[2] >= 3))CORRECT <- 1 / 0.66 >> if (!TEST) cat("MFROW does not equal MFCOL\n") >> cat(MFROW, CORRECT, "\n") >> return(CORRECT) >> } >> pdf("TestCEX.pdf", 8, 6) >> par(mfrow=c(1,1), omi=c(1,1,1,1)) >> plot(1) >> mtext(outer=T, side=1, line=0, cex=1, "line 0") >> mtext(outer=T, side=1, line=FINDCEX(), cex=1, "line FINDCEX") >> mtext(outer=T, side=3, line=0, cex=1, "line 0") >> mtext(outer=T, side=3, line=1, cex=1, "line 1") >> mtext(outer=T, side=2, line=0, cex=1, "line 0") >> mtext(outer=T, side=2, line=FINDCEX(), cex=1, "line FINDCEX") >> mtext(outer=T, side=4, line=0, cex=1, "line 0") >> mtext(outer=T, side=4, line=1, cex=1, "line 1") >> >> >> par(mfrow=c(1,2), omi=c(1,1,1,1)) >> plot(1) >> mtext(outer=T, side=1, line=0, cex=1, "line 0") >> mtext(outer=T, side=1, line=FINDCEX(), cex=1, "line FINDCEX") >> mtext(outer=T, side=3, line=0, cex=1, "line 0") >> mtext(outer=T, side=3, line=1, cex=1, "line 1") >> mtext(outer=T, side=2, line=0, cex=1, "line 0") >> mtext(outer=T, side=2, line=FINDCEX(), cex=1, "line FINDCEX") >> mtext(outer=T, side=4, line=0, cex=1, "line 0") >> mtext(outer=T, side=4, line=1, cex=1, "line 1") >> >> par(mfrow=c(2,2), omi=c(1,1,1,1)) >> plot(1) >> mtext(outer=T, side=1, line=0, cex=1, "line 0") >> mtext(outer=T, side=1, line=FINDCEX(), cex=1, "line FINDCEX") >> mtext(outer=T, side=3, line=0, cex=1, "line 0") >> mtext(outer=T, side=3, line=1, cex=1, "line 1") >> mtext(outer=T, side=2, line=0, cex=1, "line 0") >> mtext(outer=T, side=2, line=FINDCEX(), cex=1, "line FINDCEX") >> mtext(outer=T, side=4, line=0, cex=1, "line 0") >> mtext(outer=T, side=4, line=1, cex=1, "line 1") >> >> par(mfrow=c(3,3), omi=c(1,1,1,1)) >> plot(1) >> mtext(outer=T, side=1, line=0, cex=1, "line 0") >> mtext(outer=T, side=1, line=FINDCEX(), cex=1, "line 
FINDCEX") >> mtext(outer=T, side=3, line=0, cex=1, "line 0") >> mtext(outer=T, side=3, line=1, cex=1, "line 1") >> mtext(outer=T, side=2, line=0, cex=1, "line 0") >> mtext(outer=T, side=2, line=FINDCEX(), cex=1, "line FINDCEX") >> mtext(outer=T, side=4, line=0, cex=1, "line 0") >> mtext(outer=T, side=4, line=1, cex=1, "line 1") >> >> graphics.off() >> system("open TestCEX.pdf") >> >> __ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Loading Chess Data
Hello, Try the following library(XML) url <- "http://ratings.fide.com/top.phtml?list=men" chess <- readHTMLTable(url, header = TRUE, which = 5) str(chess) # See what we have # All variables are factors; convert these to integer # via character, so we get the values, not the level codes chess$Rating <- as.integer(as.character(chess$Rating)) chess$Games <- as.integer(as.character(chess$Games)) chess$`B-Year` <- as.integer(as.character(chess$`B-Year`)) head(chess, 20) # See first 20 rows Hope this helps, Rui Barradas On 02-09-2012 17:41, David Arnold wrote: All, What would be the most efficient way to load the data at the following address into a dataframe? http://ratings.fide.com/top.phtml?list=men Thanks, David -- View this message in context: http://r.789695.n4.nabble.com/Loading-Chess-Data-tp4642006.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
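A side note on the factor-to-integer step (a small illustration of my own, not from the original exchange): calling as.integer() directly on a factor returns the internal level codes rather than the printed numbers, so the safe idiom goes through as.character() first:

```r
f <- factor(c("2851", "2838"))   # e.g. two ratings read in as a factor
as.integer(f)                    # 2 1 -- the level codes, not the ratings
as.integer(as.character(f))      # 2851 2838 -- the actual values
```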
Re: [R] predict.lm(...,type="terms") question
Hello, In the meantime, I've "discovered" an R package that might do the job: chemCal. It only implements first-order (weighted) linear regression, and, as I said in my first reply and as was repeated several times, the standard errors are not reversed; the prediction bands follow Massart et al. (1988). Take a look at it, it's a very simple package. Rui Barradas On 02-09-2012 18:07, John Thaden wrote: Thank you all. My muddle about predict.lm(..., type = "terms") was evident even in the first sentence of my original posting How can I actually use the output of predict.lm(..., type="terms") to predict new term values from new response values? the answer being that I cannot; that new response values, if included in newdata, will simply be ignored by predict.lm, as well they should. As for the calibration issue, I am reviewing the literature now as suggested. Though predict.lm performed to spec (no bug), may I suggest a minor change to the ?predict.lm text? Existing: newdata An optional data frame in which to look for variables with which to predict. If omitted, the fitted values are used. Proposed: newdata An optional data frame in which to look for new values of terms with which to predict. If omitted, the fitted values are used. -John Thaden, Ph.D. College Station, TX --- On Sun, 9/2/12, peter dalgaard wrote: From: peter dalgaard Subject: Re: [R] predict.lm(...,type="terms") question To: "David Winsemius" Cc: "Rui Barradas" , r-help@r-project.org, "jjthaden" Date: Sunday, September 2, 2012, 1:35 AM On Sep 2, 2012, at 03:38 , David Winsemius wrote: Why should predict not complain when it is offered a newdata argument that does not contain a vector of values for "x"? The whole point of the terms method of prediction is to offer estimates for specific values of items on the RHS of the formula. The OP seems to have trouble understanding that point. Putting in a vector with the name of the LHS item makes no sense to me.
I certainly cannot see that any particular behavior for this pathological input is described for predict.lm in its help page, but throwing an error seems perfectly reasonable to me. Yes. Lots of confusion going on here. First, data= is _always_ used as the _first_ place to look for variables, if things are not in it, search continues into the formula's environment. To be slightly perverse, notice that even this works: y <- rnorm(10) x <- rnorm(10) d <- data.frame(z=rnorm(9)) lm(y ~ x, d) Call: lm(formula = y ~ x, data = d) Coefficients: (Intercept)x -0.2760 0.2328 Secondly, what is predict(..., type="terms") supposed to have to do with inverting a regression equation? That's just not what it does, it only splits the prediction formula into its constituent terms. Thirdly; no, you do not invert a regression equation by regressing y on x. That only works if you can be sure that your new (x, y) are sampled from the same population as the data, which is not going to be the case if you are fitting to data with, say, selected equispaced x values. There's a whole literature on how to do this properly, Google e.g. "inverse calibration" for enlightenment. -- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd@cbs.dk Priv: pda...@gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
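To make the "inverse calibration" point above concrete, here is a minimal sketch (my own, not the chemCal implementation): fit the calibration line with x as the regressor, then recover a new x by algebra rather than by feeding a y value to predict():

```r
set.seed(1)
x <- 1:10                                # controlled (equispaced) calibration levels
y <- 2 + 0.5 * x + rnorm(10, sd = 0.05)  # measured responses
fit <- lm(y ~ x)
b <- coef(fit)
y0 <- 5.2                                   # a new observed response
x0 <- (y0 - b[["(Intercept)"]]) / b[["x"]]  # invert the fitted line
x0                                          # point estimate, roughly 6.4
## Proper standard errors for x0 require the inverse-calibration formulas
## (e.g. Massart et al. 1988); predict(fit, newdata = ...) cannot supply them.
```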
Re: [R] why variations in accuracy between R to ARCGIS for the same point reprojection?
There is no 'reprojection' in R (which is upper case). Please attribute blame correctly. You seem to be talking about some contributed add-on package, not specified. But I think you should be asking this on R-sig-geo. On 02/09/2012 19:24, Camilo Mora wrote: Hi everyone, I wonder if anyone knows the reason why the outputs of the same reprojection in R and ArcGIS are different? The magnitude of the difference can be up to 40 km at the poles. Basically, I have a database of points equally separated by one degree over the globe. In ArcGIS, I am projecting the data in GCS-WGS-1984 and then reprojecting it to Behrmann to ensure equal-area distribution of the points. In R, I am using: spPoint <- SpatialPoints(coords=coordinates(Data),proj4string=CRS("+proj=longlat +datum=WGS84")) and then reprojecting it to Behrmann with: spPointReprj=spTransform(Data,CRS("+proj=cea +lon_0=0 +lat_ts=30 +x_0=0 +y_0=0 +ellps=WGS84")) If I put the two outputs of the reprojections on the same map, they are off by a few meters in the tropics and by up to 40 km at the poles. I decided to investigate how the reprojections are done, and my calculations are different from both R and ArcGIS: First, I calculated the radius of the Earth in two different ways: =Re * (1 - e^2) / (1 - e^2 * SIN(RADIANS(Latitude))^2)^(3/2) =Re * SQRT(1 - e^2) / (1 - e^2 * (SIN(RADIANS(Latitude)))^2) where Re is the equatorial radius of the Earth (6378 km) and e is the eccentricity of the ellipsoid (0.081082). According to several books I used, the Y-axis position of a point in the Behrmann projection can be estimated as: =Re*(SIN(RADIANS(Latitude))/COS(RADIANS(Spll))) where Spll is the standard parallel, which in the Behrmann projection is 30. Unfortunately, the resulting output, although similar in shape to the outputs of R and ArcGIS, is still not quite the same. Any thoughts on why these differences arise in supposedly the same calculations? Any input will be greatly appreciated, Thanks, Camilo Camilo Mora, Ph.D.
Department of Geography, University of Hawaii Currently available in Colombia Phone: Country code: 57 Provider code: 313 Phone 776 2282 From the USA or Canada you have to dial 011 57 313 776 2282 http://www.soc.hawaii.edu/mora/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Brian D. Ripley, rip...@stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Impact of cex changing as a function of mfrow
On 12-09-02 1:40 PM, Dennis Fisher wrote: R 2.15.1 OS X (MLion) Colleagues, I am aware that changes in mfrow / mfcol in par() affect cex (from help: In a layout with exactly two rows and columns the base value of ‘"cex"’ is reduced by a factor of 0.83: if there are three or more of either rows or columns, the reduction factor is 0.66). I generate a multipage PDF in which mfrow varies such that cex is impacted. This affects mtext in the outer margin. Sample code is pasted at the bottom of this email. The impact is most obvious if one examines the text at the bottom of each page as one moves page-to-page. Does anyone have a suggestion for how to overcome this (other than using brute force)? I am not sure what's wrong with brute force here. You know the formula for the reduction; just apply a corresponding increase first. Duncan Murdoch Dennis Dennis Fisher MD P < (The "P Less Than" Company) Phone: 1-866-PLessThan (1-866-753-7784) Fax: 1-866-PLessThan (1-866-753-7784) www.PLessThan.com # In a layout with exactly two rows and columns the base value of ‘"cex"’ is reduced by a factor of 0.83: if # there are three or more of either rows or columns, the reduction factor is 0.66.
FINDCEX <- function() { CORRECT <- 1 MFROW <- par()$mfrow MFCOL <- par()$mfcol TEST<- all(MFROW == MFCOL) if (TEST && MFROW == c(2,2))CORRECT <- 1 / 0.83 if (TEST && (MFROW[1] >= 3 | MFROW[2] >= 3)) CORRECT <- 1 / 0.66 if (!TEST) cat("MFROW does not equal MFCOL\n") cat(MFROW, CORRECT, "\n") return(CORRECT) } pdf("TestCEX.pdf", 8, 6) par(mfrow=c(1,1), omi=c(1,1,1,1)) plot(1) mtext(outer=T, side=1, line=0, cex=1, "line 0") mtext(outer=T, side=1, line=FINDCEX(), cex=1, "line FINDCEX") mtext(outer=T, side=3, line=0, cex=1, "line 0") mtext(outer=T, side=3, line=1, cex=1, "line 1") mtext(outer=T, side=2, line=0, cex=1, "line 0") mtext(outer=T, side=2, line=FINDCEX(), cex=1, "line FINDCEX") mtext(outer=T, side=4, line=0, cex=1, "line 0") mtext(outer=T, side=4, line=1, cex=1, "line 1") par(mfrow=c(1,2), omi=c(1,1,1,1)) plot(1) mtext(outer=T, side=1, line=0, cex=1, "line 0") mtext(outer=T, side=1, line=FINDCEX(), cex=1, "line FINDCEX") mtext(outer=T, side=3, line=0, cex=1, "line 0") mtext(outer=T, side=3, line=1, cex=1, "line 1") mtext(outer=T, side=2, line=0, cex=1, "line 0") mtext(outer=T, side=2, line=FINDCEX(), cex=1, "line FINDCEX") mtext(outer=T, side=4, line=0, cex=1, "line 0") mtext(outer=T, side=4, line=1, cex=1, "line 1") par(mfrow=c(2,2), omi=c(1,1,1,1)) plot(1) mtext(outer=T, side=1, line=0, cex=1, "line 0") mtext(outer=T, side=1, line=FINDCEX(), cex=1, "line FINDCEX") mtext(outer=T, side=3, line=0, cex=1, "line 0") mtext(outer=T, side=3, line=1, cex=1, "line 1") mtext(outer=T, side=2, line=0, cex=1, "line 0") mtext(outer=T, side=2, line=FINDCEX(), cex=1, "line FINDCEX") mtext(outer=T, side=4, line=0, cex=1, "line 0") mtext(outer=T, side=4, line=1, cex=1, "line 1") par(mfrow=c(3,3), omi=c(1,1,1,1)) plot(1) mtext(outer=T, side=1, line=0, cex=1, "line 0") mtext(outer=T, side=1, line=FINDCEX(), cex=1, "line FINDCEX") mtext(outer=T, side=3, line=0, cex=1, "line 0") mtext(outer=T, side=3, line=1, cex=1, "line 1") mtext(outer=T, side=2, line=0, cex=1, "line 0") 
mtext(outer=T, side=2, line=FINDCEX(), cex=1, "line FINDCEX") mtext(outer=T, side=4, line=0, cex=1, "line 0") mtext(outer=T, side=4, line=1, cex=1, "line 1") graphics.off() system("open TestCEX.pdf") __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] why variations in accuracy between R to ARCGIS for the same point reprojection?
Hi everyone, I wonder if anyone knows the reason why the outputs of the same reprojection in R and ArcGIS are different? The magnitude of the difference can be up to 40 km at the poles. Basically, I have a database of points equally separated by one degree over the globe. In ArcGIS, I am projecting the data in GCS-WGS-1984 and then reprojecting it to Behrmann to ensure equal-area distribution of the points. In R, I am using: spPoint <- SpatialPoints(coords=coordinates(Data),proj4string=CRS("+proj=longlat +datum=WGS84")) and then reprojecting it to Behrmann with: spPointReprj=spTransform(Data,CRS("+proj=cea +lon_0=0 +lat_ts=30 +x_0=0 +y_0=0 +ellps=WGS84")) If I put the two outputs of the reprojections on the same map, they are off by a few meters in the tropics and by up to 40 km at the poles. I decided to investigate how the reprojections are done, and my calculations are different from both R and ArcGIS: First, I calculated the radius of the Earth in two different ways: =Re * (1 - e^2) / (1 - e^2 * SIN(RADIANS(Latitude))^2)^(3/2) =Re * SQRT(1 - e^2) / (1 - e^2 * (SIN(RADIANS(Latitude)))^2) where Re is the equatorial radius of the Earth (6378 km) and e is the eccentricity of the ellipsoid (0.081082). According to several books I used, the Y-axis position of a point in the Behrmann projection can be estimated as: =Re*(SIN(RADIANS(Latitude))/COS(RADIANS(Spll))) where Spll is the standard parallel, which in the Behrmann projection is 30. Unfortunately, the resulting output, although similar in shape to the outputs of R and ArcGIS, is still not quite the same. Any thoughts on why these differences arise in supposedly the same calculations? Any input will be greatly appreciated, Thanks, Camilo Camilo Mora, Ph.D.
Department of Geography, University of Hawaii Currently available in Colombia Phone: Country code: 57 Provider code: 313 Phone 776 2282 From the USA or Canada you have to dial 011 57 313 776 2282 http://www.soc.hawaii.edu/mora/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Loading Chess Data
Mr Arnold, > What would be the most efficient way to load the data at the following >> address into a dataframe? >> > To what end? In other words, what are you trying to achieve with the ratings list? -- H -- Sent from my mobile device [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Loading Chess Data
You might take a look at the link titled "Download Rating List", in the blue box on the right side of that page... albyn On 2012-09-02 9:41, David Arnold wrote: All, What would be the most efficient way to load the data at the following address into a dataframe? http://ratings.fide.com/top.phtml?list=men Thanks, David -- View this message in context: http://r.789695.n4.nabble.com/Loading-Chess-Data-tp4642006.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Impact of cex changing as a function of mfrow
R 2.15.1 OS X (MLion) Colleagues, I am aware that changes in mfrow / mfcol in par() affect cex (from help: In a layout with exactly two rows and columns the base value of ‘"cex"’ is reduced by a factor of 0.83: if there are three or more of either rows or columns, the reduction factor is 0.66). I generate a multipage PDF in which mfrow varies such that cex is impacted. This affects mtext in the outer margin. Sample code is pasted at the bottom of this email. The impact is most obvious if one examines the text at the bottom of each page as one moves page-to-page. Does anyone have a suggestion for how to overcome this (other than using brute force)? Dennis Dennis Fisher MD P < (The "P Less Than" Company) Phone: 1-866-PLessThan (1-866-753-7784) Fax: 1-866-PLessThan (1-866-753-7784) www.PLessThan.com # In a layout with exactly two rows and columns the base value of ‘"cex"’ is reduced by a factor of 0.83: if # there are three or more of either rows or columns, the reduction factor is 0.66. FINDCEX <- function() { CORRECT <- 1 MFROW <- par()$mfrow MFCOL <- par()$mfcol TEST <- all(MFROW == MFCOL) if (TEST && all(MFROW == c(2,2))) CORRECT <- 1 / 0.83 if (TEST && (MFROW[1] >= 3 || MFROW[2] >= 3)) CORRECT <- 1 / 0.66 if (!TEST) cat("MFROW does not equal MFCOL\n") cat(MFROW, CORRECT, "\n") return(CORRECT) } pdf("TestCEX.pdf", 8, 6) par(mfrow=c(1,1), omi=c(1,1,1,1)) plot(1) mtext(outer=T, side=1, line=0, cex=1, "line 0") mtext(outer=T, side=1, line=FINDCEX(), cex=1, "line FINDCEX") mtext(outer=T, side=3, line=0, cex=1, "line 0") mtext(outer=T, side=3, line=1, cex=1, "line 1") mtext(outer=T, side=2, line=0, cex=1, "line 0") mtext(outer=T, side=2, line=FINDCEX(), cex=1, "line FINDCEX") mtext(outer=T, side=4, line=0, cex=1, "line 0") mtext(outer=T, side=4, line=1, cex=1, "line 1") par(mfrow=c(1,2), omi=c(1,1,1,1)) plot(1) mtext(outer=T, side=1, line=0, cex=1, "line 0") mtext(outer=T, side=1, line=FINDCEX(), cex=1, "line FINDCEX") mtext(outer=T, side=3, line=0, cex=1, "line 0") 
mtext(outer=T, side=3, line=1, cex=1, "line 1")
mtext(outer=T, side=2, line=0, cex=1, "line 0")
mtext(outer=T, side=2, line=FINDCEX(), cex=1, "line FINDCEX")
mtext(outer=T, side=4, line=0, cex=1, "line 0")
mtext(outer=T, side=4, line=1, cex=1, "line 1")
par(mfrow=c(2,2), omi=c(1,1,1,1))
plot(1)
mtext(outer=T, side=1, line=0, cex=1, "line 0")
mtext(outer=T, side=1, line=FINDCEX(), cex=1, "line FINDCEX")
mtext(outer=T, side=3, line=0, cex=1, "line 0")
mtext(outer=T, side=3, line=1, cex=1, "line 1")
mtext(outer=T, side=2, line=0, cex=1, "line 0")
mtext(outer=T, side=2, line=FINDCEX(), cex=1, "line FINDCEX")
mtext(outer=T, side=4, line=0, cex=1, "line 0")
mtext(outer=T, side=4, line=1, cex=1, "line 1")
par(mfrow=c(3,3), omi=c(1,1,1,1))
plot(1)
mtext(outer=T, side=1, line=0, cex=1, "line 0")
mtext(outer=T, side=1, line=FINDCEX(), cex=1, "line FINDCEX")
mtext(outer=T, side=3, line=0, cex=1, "line 0")
mtext(outer=T, side=3, line=1, cex=1, "line 1")
mtext(outer=T, side=2, line=0, cex=1, "line 0")
mtext(outer=T, side=2, line=FINDCEX(), cex=1, "line FINDCEX")
mtext(outer=T, side=4, line=0, cex=1, "line 0")
mtext(outer=T, side=4, line=1, cex=1, "line 1")
graphics.off()
system("open TestCEX.pdf")

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
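A possible alternative to hard-coding the 0.83 and 0.66 factors in FINDCEX: after an mfrow change, par("cex") itself reports the reduced base value, so the correction can be derived from it directly. A sketch (writing to a temporary file rather than TestCEX.pdf):

```r
# same idea as FINDCEX, but the reduction factor is queried, not tabulated
FINDCEX2 <- function() 1 / par("cex")

pdf(tempfile(fileext = ".pdf"), 8, 6)
par(mfrow = c(3, 3), omi = c(1, 1, 1, 1))
plot(1)
mtext(outer = TRUE, side = 1, line = 0, cex = 1, "line 0")
mtext(outer = TRUE, side = 1, line = FINDCEX2(), cex = 1, "line FINDCEX2")
dev.off()
```

Because the value is queried per page, the same two mtext calls can be reused for every mfrow setting without maintaining the lookup table by hand.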
[R] Loading Chess Data
All, What would be the most efficient way to load the data at the following address into a dataframe? http://ratings.fide.com/top.phtml?list=men Thanks, David -- View this message in context: http://r.789695.n4.nabble.com/Loading-Chess-Data-tp4642006.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
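One hedged possibility, assuming the FIDE page stores the ratings in an ordinary HTML <table>: the XML package's readHTMLTable() converts every table on a page into a data frame.

```r
library(XML)  # assumed installed; install.packages("XML") if not
tables <- readHTMLTable("http://ratings.fide.com/top.phtml?list=men",
                        stringsAsFactors = FALSE)
str(tables)          # inspect which list element actually holds the rating list
top <- tables[[1]]   # the index is a guess; pick the right table after inspection
```

This requires a network connection and depends on the page's markup staying table-based; if the layout changes, scraping with regular expressions on readLines() output is the fallback.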
Re: [R] predict.lm(...,type="terms") question
Thank you all. My muddle about predict.lm(..., type = "terms") was evident even in the first sentence of my original posting:

> How can I actually use the output of
> predict.lm(..., type="terms") to predict
> new term values from new response values?

the answer being that I cannot; that new response values, if included in newdata, will simply be ignored by predict.lm, as well they should be. As for the calibration issue, I am reviewing the literature now, as suggested. Though predict.lm performed to spec (no bug), may I suggest a minor change to the ?predict.lm text?

Existing: newdata An optional data frame in which to look for variables with which to predict. If omitted, the fitted values are used.

Proposed: newdata An optional data frame in which to look for new values of terms with which to predict. If omitted, the fitted values are used.

-John Thaden, Ph.D. College Station, TX

--- On Sun, 9/2/12, peter dalgaard wrote:
> From: peter dalgaard
> Subject: Re: [R] predict.lm(...,type="terms") question
> To: "David Winsemius"
> Cc: "Rui Barradas" , r-help@r-project.org, "jjthaden"
> Date: Sunday, September 2, 2012, 1:35 AM
>
> On Sep 2, 2012, at 03:38 , David Winsemius wrote:
>
> > Why should predict not complain when it is offered a
> newdata argument that does not contain a vector of values for
> "x"? The whole point of the terms method of prediction is to
> offer estimates for specific values of items on the RHS of
> the formula. The OP seems to have trouble understanding that
> point. Putting in a vector with the name of the LHS item
> makes no sense to me. I certainly cannot see that any
> particular behavior for this pathological input is described
> for predict.lm in its help page, but throwing an error seems
> perfectly reasonable to me.
>
> Yes. Lots of confusion going on here.
>
> First, data= is _always_ used as the _first_ place to look
> for variables, if things are not in it, search continues
> into the formula's environment.
To be slightly perverse, > notice that even this works: > > > y <- rnorm(10) > > x <- rnorm(10) > > d <- data.frame(z=rnorm(9)) > > lm(y ~ x, d) > > Call: > lm(formula = y ~ x, data = d) > > Coefficients: > (Intercept) x > > -0.2760 > 0.2328 > > Secondly, what is predict(..., type="terms") supposed to > have to do with inverting a regression equation? That's just > not what it does, it only splits the prediction formula into > its constituent terms. > > Thirdly; no, you do not invert a regression equation by > regressing y on x. That only works if you can be sure that > your new (x, y) are sampled from the same population as the > data, which is not going to be the case if you are fitting > to data with, say, selected equispaced x values. There's a > whole literature on how to do this properly, Google e.g. > "inverse calibration" for enlightenment. > > -- > Peter Dalgaard, Professor, > Center for Statistics, Copenhagen Business School > Solbjerg Plads 3, 2000 Frederiksberg, Denmark > Phone: (+45)38153501 > Email: pd@cbs.dk > Priv: pda...@gmail.com > > > > > > > > > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
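Peter's second point — that type="terms" "only splits the prediction formula into its constituent terms" — can be seen directly in a short sketch (the variable names here are invented for illustration):

```r
# Each column of the "terms" matrix is one centered term's contribution;
# adding the stored constant recovers the ordinary fitted values.
set.seed(1)
x <- rnorm(20)
z <- rnorm(20)
y <- 1 + 2 * x - z + rnorm(20)
fit <- lm(y ~ x + z)
tt <- predict(fit, type = "terms")
head(tt)   # one column per term: "x" and "z"
all.equal(as.vector(rowSums(tt) + attr(tt, "constant")),
          as.vector(fitted(fit)))
```

Nothing in this decomposition runs the model "backwards", which is why no newdata response values could ever be used.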
[R] Simulation of genetic data
Hi, I am looking for the best way, or the best package available, for simulating a genetic association between a specific SNP and a quantitative phenotype. All of the packages I saw in R seem to be specialised in pedigree data or in population data where coalescence and other evolutionary factors are specified, but I don't have any experience in population genetics and I only want to simulate the simple case of a European population, which is the most similar to my real data, having a normal distribution for the trait and an additive effect for the genotype. I am looking for something in R similar to the function in PLINK where one needs to specify a range of allele frequencies, a range for the phenotype, and a variant which should turn out to be associated with the phenotype. Can someone please help me? Thank you very much. James [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
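For the simple case described — no pedigree, no coalescent machinery — base R is arguably enough. A minimal sketch (not a PLINK equivalent; n, maf, and beta are illustrative choices, not values from the post): one biallelic SNP under Hardy–Weinberg, additive 0/1/2 coding, and a normal trait with an additive genotype effect.

```r
set.seed(42)
n    <- 1000
maf  <- 0.3     # minor allele frequency
beta <- 0.5     # additive effect per copy of the minor allele
g     <- rbinom(n, 2, maf)     # genotypes coded 0/1/2
pheno <- beta * g + rnorm(n)   # normally distributed quantitative trait
summary(lm(pheno ~ g))$coefficients
```

Ranges of allele frequencies can then be covered by looping maf over runif() draws; anything population-genetic beyond that (LD structure, ancestry) would need a dedicated simulator.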
Re: [R] R_closest date
Hi, No problem. If you use join() instead of merge(), the original order of columns may not get altered. dat3<-aggregate(DAYS_DIFF~PT_ID,data=dat1,min) library(plyr) join(dat1,dat3,type="inner") #Joining by: PT_ID, DAYS_DIFF # PT_ID IDX_DT OBS_DATE DAYS_DIFF OBS_VALUE CATEGORY #1 4549 2002-08-21 2002-08-20 -1 183 2 #2 4839 2006-11-28 2006-11-28 0 179 2 A.K. From: Weijia Wang To: arun Sent: Saturday, September 1, 2012 5:11 PM Subject: Re: [R] R_closest date Thank you Arun, for your help again. Best __ WANG WEIJIA Graudate Research and Teaching Assistant Department of Environmental Medicine New York University, School of Medicine wwang@gmail.com On Sep 1, 2012, at 5:04 PM, arun wrote: Hi, >Try this: >dat1 <- read.table(text=" > PT_ID IDX_DT OBS_DATE DAYS_DIFF OBS_VALUE CATEGORY >13 4549 2002-08-21 2002-08-20 -1 183 2 >14 4549 2002-08-21 2002-11-14 85 91 1 >15 4549 2002-08-21 2003-02-18 181 89 1 >16 4549 2002-08-21 2003-05-15 267 109 2 >17 4549 2002-08-21 2003-12-16 482 96 1 >128 4839 2006-11-28 2006-11-28 0 179 2 >", header=TRUE) >dat3<-aggregate(DAYS_DIFF~PT_ID,data=dat1,min) >merge(dat1,dat3) ># PT_ID DAYS_DIFF IDX_DT OBS_DATE OBS_VALUE CATEGORY >#1 4549 -1 2002-08-21 2002-08-20 183 2 >#2 4839 0 2006-11-28 2006-11-28 179 2 > >#or, >dat2<- tapply(dat1$DAYS_DIFF,dat1$PT_ID,min) >dat4<-data.frame(PT_ID=row.names(data.frame(dat2)),DAYS_DIFF=dat2) > row.names(dat4)<-1:nrow(dat4) >merge(dat1,dat4) ># PT_ID DAYS_DIFF IDX_DT OBS_DATE OBS_VALUE CATEGORY >#1 4549 -1 2002-08-21 2002-08-20 183 2 >#2 4839 0 2006-11-28 2006-11-28 179 2 >A.K. 
> > > > > >- Original Message - >From: WANG WEIJIA >To: "r-help@R-project.org" >Cc: >Sent: Saturday, September 1, 2012 1:10 PM >Subject: [R] R_closest date > >Hi, > >I have encountered an issue about finding a date closest to another date > >So this is how the data frame looks like: > > PT_ID IDX_DT OBS_DATE DAYS_DIFF OBS_VALUE CATEGORY >13 4549 2002-08-21 2002-08-20 -1 183 2 >14 4549 2002-08-21 2002-11-14 85 91 1 >15 4549 2002-08-21 2003-02-18 181 89 1 >16 4549 2002-08-21 2003-05-15 267 109 2 >17 4549 2002-08-21 2003-12-16 482 96 1 >128 4839 2006-11-28 2006-11-28 0 179 2 > >I need to find, the single observation, which has the closest date of >'OBS_DATE' to 'IDX_DT'. > >For example, for 'PT_ID' of 4549, I need row 13, of which the OBS_DATE is just >one day away from IDX_DT. > >I was thinking about using abs(), and I got this: > >baseline<- function(x){ >+ >+ #remove all uncessary variables >+ baseline<- x[,c("PT_ID","DAYS_DIFF")] >+ >+ #get a list of every unique ID >+ uniqueID <- unique(baseline$PT_ID) >+ >+ #make a vector that will contain the smallest DAYS_DIFF >+ first <- rep(-99,length(uniqueID)) >+ >+ i = 1 >+ #loop through each unique ID >+ for (PT_ID in uniqueID){ >+ >+ #for each iteration get the smallest DAYS_DIFF for that ID >+ first[i] <- >min(baseline[which(baseline$PT_ID==PT_ID),abs(baseline$DAYS_DIFF)]) >+ >+ #up the iteration counter >+ i = i + 1 >+ >+ } >+ #make a data frame with the lowest DAYS_DIFF and ID >+ newdata <- data.frame(uniqueID,first) >+ names(newdata) <- c("PT_ID","DAYS_DIFF") >+ >+ #return the data frame containing the lowest GPI for each ID >+ return(newdata) >+ } > >ldl.b<-baseline(ldl) #get all baseline ldl patient ID, total 11368 obs, all >unique# >>Error in `[.data.frame`(baseline, which(baseline$PT_ID == PT_ID), >>abs(baseline$DAYS_DIFF)) : > undefined columns selected > >Can anyone help me in figuring out how to get the minimum value of the >absolute value of DAYS_DIFF for unique ID? 
> >Thanks a lot > [[alternative HTML version deleted]] > >__ >R-help@r-project.org mailing list >https://stat.ethz.ch/mailman/listinfo/r-help >PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >and provide commented, minimal, self-contained, reproducible code. > > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
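One wrinkle worth noting in the thread above: the aggregate()/merge() solution minimises DAYS_DIFF itself, whereas the original question asked for the smallest *absolute* DAYS_DIFF (they coincide for the posted rows, where -1 happens to be both). A base-R sketch of the absolute-value variant, using a cut-down version of the posted data:

```r
# per PT_ID, keep the row whose |DAYS_DIFF| is smallest
dat1 <- data.frame(PT_ID     = c(4549, 4549, 4549, 4839),
                   DAYS_DIFF = c(-1, 85, 181, 0))
closest <- dat1[ave(abs(dat1$DAYS_DIFF), dat1$PT_ID, FUN = min) ==
                abs(dat1$DAYS_DIFF), ]
closest
```

ave() broadcasts the per-group minimum back onto every row, so the comparison selects one row per patient without any explicit loop.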
Re: [R] Help on finding specific columns in matrix
Hi,

Try this:
which.min(apply(a,2,mean))
#[1] 1
which.max(apply(a,2,mean))
#[1] 4

A.K.

- Original Message -
From: Andras Farkas
To: "r-help@r-project.org"
Cc:
Sent: Sunday, September 2, 2012 5:49 AM
Subject: [R] Help on finding specific columns in matrix

Dear All, I have a matrix with 33 columns and 5000 rows. I would like to find 2 specific columns in the set: the one that holds the highest values and the one that holds the lowest values. In this case the column's mean would be appropriate to use to try to find those specific columns, because each column's mean is different and they all change together based on the same "change of rate constants" as a function of time (ie: time is the most important determinant of the magnitude of that mean). The columns are not to be named, if possible. Any thoughts on that? Would appreciate the help.

example:

a <- matrix(c(runif(500,10,15),runif(500,15,20),runif(500,20,25),runif(500,25,30)),ncol=4)

I would need to find, and plot with a box percentile plot, column 1 (the column with the lowest mean) and column 4 (the column with the highest mean).

thanks, Andras

[[alternative HTML version deleted]]

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] glmulti runs indefinitely when using genetic algorithm with lme4
Right, I've worked this one out - the problem is that the example (above) I was using to test run this package only contains 3 variables. When you add in a fourth it works fine:

d = runif(30)

And run again, telling it to use the GA:

babs <- glmulti(y~a*b*c*d, level = 2, fitfunc = lmer.glmulti, random = "+(1|x)", method = "g")

Returns:

... After 190 generations:
Best model: y~1
Crit= 159.374382952181
Mean crit= 163.380382861026
Improvements in best and average IC have been below the specified goals. Algorithm is declared to have converged. Completed.

Using glmulti out-of-the-box with a GLM gives the same result if you try to use the GA with fewer than four variables. This is not really an issue, however, as with only three variables it is possible to do an exhaustive search. The problem was the example.

Thomas

--- Original message ---
From: Thomas
To: r-help@r-project.org
Cc:
Subject: glmulti runs indefinitely when using genetic algorithm with lme4
Date: Sun, 02 Sep 2012 11:23:23 +0100

Dear List, I'm using glmulti for model averaging in R. There are ~10 variables in my model, making exhaustive screening impractical - I therefore need to use the genetic algorithm (GA) (call: method = "g"). I need to include random effects so I'm using glmulti as a wrapper for lme4. Methods for doing this are available here http://www.inside-r.org/packages/cran/glmulti/docs/glmulti and there is also a pdf included with the glmulti package that goes into more detail. The problem is that when telling glmulti to use GA in this setting it runs indefinitely, even after the best model has been found. This is the example taken from the pdf included in the glmulti package:

library(lme4)
library(glmulti)

# create a function for glmulti to act as a wrapper for lmer:
lmer.glmulti <- function (formula, data, random = "", ...) {
    lmer(paste(deparse(formula), random), data = data, REML=F, ...)
} # set some random variables: y = runif(30,0,10) # mock dependent variable a = runif(30) # dummy covariate b = runif(30) # another dummy covariate c = runif(30) # an another one x = as.factor(round(runif(30),1))# dummy grouping factor # run exhaustive screening with lmer: bab <- glmulti(y~a*b*c, level = 2, fitfunc = lmer.glmulti, random = "+(1|x)") This works fine. The problem is when I tell it to use the genetic algorithm: babs <- glmulti(y~a*b*c, level = 2, fitfunc = lmer.glmulti, random = "+(1|x)", method = "g") It just keeps running indefinitely and the AIC does not change: After 19550 generations: Best model: y~1 Crit= 161.038899734164 Mean crit= 164.13629335762 Change in best IC: 0 / Change in mean IC: 0 After 19560 generations: Best model: y~1 Crit= 161.038899734164 Mean crit= 164.13629335762 Change in best IC: 0 / Change in mean IC: 0 After 19570 generations: Best model: y~1 Crit= 161.038899734164 Mean crit= 164.13629335762 etc. I have tried using calls that tell glmulti when to stop (deltaB = 0, deltaM = 0.01, conseq = 6) but nothing seems to work. I think the problem must lie with setting the function (?). It may be something really obvious however I'm new to R and I can't work it out. I am using R v.2.15.0, glmulti v.1.0.6 and lme4 v.0.99-0 on Windows. Any help with this would be much appreciated. Thomas __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] most efficient plyr solution
Dear list members, Any help on this efficiency issue would be greatly appreciated. I would like to find the most efficient way to run a non-vectorized function (here: the Fisher exact test p-value) iteratively over 4 matrices with identical dimensions, and as a result I aim for an array with identical dimensions containing the corresponding p-values. Please consider some code using a trivial example with 3x4 arrays below. Eventually I would like to run the code on 2e3 x 7e6 arrays, for which someone suggested Amazon EC2 already...

Q1: would you agree that fisher.test() is not vectorizable? e.g. fisher.test( matrix(c(Ax,Ay,Bx,By),ncol=2) ) does not work
Q2: direct use of Ax, Ay, Bx, By as input, instead of a (list) transform of the input, would seem beneficial for speed
Q3: parallelization of the iterative process seems to make sense
Q4: a progress bar gives peace of mind when the runtime is unknown
Q5: avoidance of an output transform to get an array from a vector
Q6: for Q2/3/4/5, plyr seems to be ideal (e.g. maply)

Please also find some solutions below.
solution 1: using mapply solution 2: using lapply solution 3: using mclapply attempt 4: stuck on plyr implementation --Philip ### CODE START ### Ax <- matrix(c(2,3,5,6, 3,7,8,9, 8,2,1,3), ncol = 4) Ay <- matrix(c(9,8,5,7, 4,9,9,9, 8,7,5,4), ncol = 4) Bx <- matrix(c(1,5,9,8, 4,7,8,9, 2,3,2,1), ncol = 4) By <- matrix(c(5,5,9,9, 9,8,8,9, 5,5,3,2), ncol = 4) ### solution 1 using mapply # proper answer, no input transform, output transform, no parallelization, no progress update sol1 <- function() { res1 <- mapply( function(i,j,k,l) { fisher.test( matrix(c(i,j,k,l), ncol=2), conf.int=F)$p.value }, i=Ax, j=Ay, k=Bx, l=By, SIMPLIFY=TRUE) ans1 <- matrix(res1,ncol=4) return(ans1) } s1 <- sol1() ### solution 2 using lapply # proper answer, input transform, output transform, no parallelization, no progress update sol2 <- function() { tmp.list <- as.data.frame(rbind(as.numeric(Ax), as.numeric(Ay), as.numeric(Bx), as.numeric(By))) # determine fisher.test p-values as list res2 <- lapply(tmp.list, function(x) { fisher.test( matrix(unlist(x), ncol=2), conf.int=F)$p.value }) ans2 <- matrix(unlist(res2),ncol=4) return(ans2) } s2 <- sol2() ### solution 3 using mclapply # proper answer using input transform, output transform, parallelization, no progress update library(multicore) sol3 <- function() { tmp.list <- as.data.frame(rbind(as.numeric(Ax), as.numeric(Ay), as.numeric(Bx), as.numeric(By))) # determine fisher.test p-values as list res3 <- mclapply(tmp.list, function(x) { fisher.test( matrix(unlist(x), ncol=2), conf.int=F)$p.value }, mc.cores=4) ans3 <- matrix(unlist(res3),ncol=4) } s3 <- sol3() ### solution 4 using plyr::maply # difficulty finding equivalent code # benefit could be: no input transform, no output transform, parallelization, and progress update library(plyr) library(abind) library(doMC) registerDoMC(cores=4) sol4 <- function() { ans4 <- maply( #.data = abind(i=Ax,j=Ay,k=Bx,l=By,along=0), #.data = abind(Ax,Ay,Bx,By,along=3), #.data = data.frame(i=Ax, j=Ay, k=Bx, 
l=By),
    #.data = cbind(i=as.vector(Ax), j=as.vector(Ay), k=as.vector(Bx), l=as.vector(By)),
    #.data = list(i=Ax, j=Ay, k=Bx, l=By),
    .data = abind(i=Ax, j=Ay, k=Bx, l=By, along=3),
    .fun = function(i,j,k,l) { fisher.test( matrix(c(i,j,k,l), ncol=2), conf.int=F)$p.value },
    .progress = "text",
    .parallel = TRUE   # note: the original ".parralel" is a typo; the argument is .parallel
  )
  return(ans4)
}

all.equal(s1,s2) # TRUE
all.equal(s1,s3) # TRUE

library(microbenchmark)
# note: pass calls, not bare function names, or nothing useful is timed
microbenchmark(sol1(), times=1000)
microbenchmark(sol2(), times=1000)
microbenchmark(sol3(), times=1000)
microbenchmark(sol4(), times=1000)

[[alternative HTML version deleted]]

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] for help post-hoc comparisons
On Sep 2, 2012, at 12:16 AM, Luo Yinling wrote:

Dear R community, In my experiment there are three treatments (F, R, and S), and the storage temperatures of these seeds were 4, 10 and 15. The number of seeds in each treatment was the same. The question is whether the treatments and storage temperatures changed seed germination. The data are attached as a ".csv" file named "g.csv".

The mail server checks to see what file type is attached. If it is not mime-text, it gets discarded, and I suspect that your mail client sent it as something else. You might get better success with a vanilla editor like Notepad on Windows or TextEdit.app on a Mac, and make sure to save with the extension ".txt". If you want everyone to "be on the same page" you should also include your code.

I performed a generalized linear mixed model using glm. Time and Temp are significant (p < 0.0001) and I would like to perform post-hoc comparisons to check where the differences among Time and Temp categories lie, but I did not find a solution in the R help list or elsewhere.

You might want to look at ?p.adjust to see if it will meet your post-hoc testing needs. If not, you can also look at the multcomp package.

Thank you very much in advance for your help.
--
Xishuangbanna Tropical Botanical Garden
Chinese Academy of Sciences
Menglun Town, Mengla County
Yunnan 666303
Tel.: +86-691-8716759
--
David Winsemius, MD
Alameda, CA, USA

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] splits with 0s in middle columns
Hello, You don't need a new function, what you need is to prepare your data in such a way that the function can process it. A <- c("percent (10/20/30)", "percent (40/20)", "percent (60/10/10/5)", "percent (80/10)") B <- gsub("\\(|\\)|percent| ", "", A) fun(B) Also, please use dput to post the data examples, dput(A) c("percent (10/20/30)", "percent (40/20)", "percent (60/10/10/5)", "percent (80/10)") Then copy&paste in your post. Rui Barradas Em 02-09-2012 04:22, Sapana Lohani escreveu: Dear Rui, The new code works fine for what I wanted. I have another similar column but it looks like A percent (10/20/30) percent (40/20) percent (60/10/10/5) percent (80/10) I want a similar split but delete the percent in the front. The output should look like A1 A2 A3 A4 10 20 0 30 40 0 0 20 60 10 10 5 80 0 0 10 Could you please make the small change in the code that you gave me. It must be a small edition but I could not figure that out. FYI the code that worked was fun <- function(X){ xname <- deparse(substitute(X)) s <- strsplit(X, "/") n <- max(sapply(s, length)) tmp <- numeric(n) f <- function(x){ x <- as.numeric(x) m <- length(x) if(m > 1){ tmp[n] <- x[m] tmp[seq_len(m - 1)] <- x[seq_len(m - 1)] }else tmp[1] <- x tmp } res <- do.call(rbind, lapply(s, f)) colnames(res) <- paste(xname, seq_along(s), sep = "") data.frame(res) } fun(A) Thank you so very much. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Computing 'exp(1e3)*0' correctly....
I very much appreciate the good comments and tips regarding my question. All the postings are excellent things to know when writing such expressions in R. Thank you so much; my question is completely resolved by your postings.

On Sun, 2012-09-02 at 00:50 +0100, Rui Barradas wrote:
> Em 02-09-2012 00:10, Jeff Newmiller escreveu:
> > I disagree that this answer is "wrong". If you want a mathematically
> > correct answer you are going to have to obtain it by applying intelligence
> > to the algorithm in which this calculation occurred.
>
> Logarithms are the product of intelligence.
> And the standard trick to make this sort of computation.
>
> x <- 1e3
> exp(x + log(0)) # zero
>
> x <- 1e300
> exp(x + log(0)) # zero
>
> Rui Barradas
>
> > This is not a mailing list about numerical methods in general, so it
> > probably isn't appropriate to pursue that conversation here.
> > ---
> > Jeff Newmiller  DCN  Research Engineer (Solar/Batteries/Software/Embedded Controllers)
> > ---
> > Sent from my phone. Please excuse my brevity.
> >
> > CHEL HEE LEE wrote:
> >
> >> I have some trouble dealing with the value of 'NaN'. For example,
> >>
> >>> exp(1e3)
> >> [1] Inf
> >>> exp(1e3)*0
> >> [1] NaN
> >>
> >> The correct answer should be 0 rather than NaN. I will very much appreciate
> >> it if anyone can share some technique to get a correct answer.
> >>
> >> __
> >> R-help@r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> >> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> > __
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
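Rui's log-space trick above generalises beyond multiplying by zero: whenever a product mixes very large and very small factors, summing logarithms and exponentiating once avoids intermediate overflow. A small sketch with illustrative magnitudes:

```r
# the true product 1e200 * 1e200 * 1e-300 = 1e100 is representable,
# but the left-to-right running product overflows at the second factor
vals <- c(1e200, 1e200, 1e-300)
prod(vals)           # Inf
exp(sum(log(vals)))  # ~1e100: every intermediate stays finite in log space
```

When even the final result would overflow, keeping and reporting sum(log(vals)) itself is the usual way out.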
Re: [R] Help on finding specific columns in matrix
If I understand your question correctly, you want to identify the one column that has the lowest mean of all columns, and the one column that has the highest mean of all columns. Using your provided sample data, this gives you the indices:

> colMeans(a)
[1] 12.48160 17.46868 22.51761 27.59880
> which.min( colMeans( a ) )
[1] 1
> which.max( colMeans( a ) )
[1] 4

Does that help?

Rgds, Rainer

On Sunday 02 September 2012 02:49:11 Andras Farkas wrote:
> Dear All,
>
> I have a matrix with 33 columns and 5000 rows. I would like to find 2 specific columns in the set: the one that holds the highest values and the one that holds the lowest values. In this case the column's mean would be appropriate to use to try to find those specific columns, because each column's mean is different and they all change together based on the same "change of rate constants" as a function of time (ie: time is the most important determinant of the magnitude of that mean). The columns are not to be named, if possible. Any thoughts on that? Would appreciate the help.
>
> example:
>
> a <- matrix(c(runif(500,10,15),runif(500,15,20),runif(500,20,25),runif(500,25,30)),ncol=4)
>
> I would need to find and plot with a box percentile plot column 1, the column with the lowest mean, and column 4, the column with the highest mean
>
> thanks,
>
> Andras
> [[alternative HTML version deleted]]

- - - - -
Boycott Apple!

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
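For the box-percentile plot step mentioned in the question above, one possibility (a sketch; it assumes the Hmisc package is installed and uses its bpplot(), with the name argument labelling the two groups) is:

```r
library(Hmisc)   # for bpplot(); install.packages("Hmisc") if needed
a <- matrix(c(runif(500,10,15), runif(500,15,20),
              runif(500,20,25), runif(500,25,30)), ncol = 4)
lo <- which.min(colMeans(a))   # column with the lowest mean
hi <- which.max(colMeans(a))   # column with the highest mean
bpplot(a[, lo], a[, hi], name = c("lowest mean", "highest mean"))
```

With 33 columns and 5000 rows the same two lines (which.min/which.max over colMeans) scale unchanged; only the plotting call needs the two selected columns.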
[R] aov(rt~(F1*F2*F3)+Error(s/(F1*F2*F3)), three_way) question + data
Dear list members, I am picking up some experimental data that was collected last year and analysed using a within-subject repeated-measures ANOVA in SPSS (glm). There are 3 factors (F1, F2, F3), each with 2 levels (foc - per; on - off; l - r), recorded within a balanced 2x2x2 design. I want to rerun the analysis using R, as I no longer have access to SPSS. Current attempts have produced different results from previous SPSS runs, so I need a bit of help to ensure I am using the right R code and re-ordering the data correctly. Attached is the original matrix data used for the SPSS analysis (mean RTs 0811 rnabble) http://r.789695.n4.nabble.com/file/n4641999/mean_RTs_0811_rnabble.csv mean_RTs_0811_rnabble.csv . I understand that the data needs to be formatted lengthwise, which is also attached (three_way rnabble) http://r.789695.n4.nabble.com/file/n4641999/three_way_rnabble.csv three_way_rnabble.csv . The data represent mean reaction times for each subject across conditions. I want (I think) to run a repeated-measures within-subject ANOVA in order to test the main effects of the 3 factors as well as any between-factor interactions. The R code that I have used so far (though I am not sure it is the correct way to get what I want) is:

three_way <- read.csv(file.choose(), header=TRUE)
three_way
aov.3way = aov(Reaction_time~(F1*F2*F3)+Error(Subject/(F1*F2*F3)), three_way)
summary(aov.3way)

This gives significant main effects of F1 (p=0.001) and F2 (p=0.026), with no main effect of F3. There are no interactions. These results are different from those of SPSS, which were: F1 = (p=0.001), F2 = (0.031), no main effect for F3. There was an interaction: F1*F3 (p=0.042). Question: Am I using the most appropriate R code for this? Many thanks in advance. Best wishes, David -- View this message in context: http://r.789695.n4.nabble.com/aov-rt-F1-F2-F3-Error-s-F1-F2-F3-three-way-question-data-tp4641999.html Sent from the R help mailing list archive at Nabble.com.
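One common source of aov()-versus-SPSS discrepancies is that the grouping columns arrive as numbers or plain strings rather than factors, which silently changes the error strata. A self-contained sketch (with invented stand-in data, since the real CSV isn't reproduced here) of the factor coercion plus the model call:

```r
# toy stand-in for three_way; column names follow the post, values are invented
set.seed(1)
three_way <- expand.grid(Subject = paste0("s", 1:8),
                         F1 = c("foc", "per"),
                         F2 = c("on", "off"),
                         F3 = c("l", "r"))
three_way$Reaction_time <- rnorm(nrow(three_way), mean = 500, sd = 50)
# make sure every grouping variable really is a factor before fitting
three_way[c("Subject", "F1", "F2", "F3")] <-
  lapply(three_way[c("Subject", "F1", "F2", "F3")], factor)
aov.3way <- aov(Reaction_time ~ F1 * F2 * F3 + Error(Subject/(F1 * F2 * F3)),
                data = three_way)
summary(aov.3way)
```

If the factors are already correct, the remaining usual suspect is the sums-of-squares type: SPSS's GLM defaults differ from aov()'s sequential decomposition in unbalanced data, though that should not matter in a fully balanced 2x2x2 design.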
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] for help post-hoc comparisons
Dear R community, In my experiment there are three treatments (F, R, and S), and the storage temperatures of these seeds were 4, 10 and 15. The number of seeds in each treatment was the same. The question is whether the treatments and storage temperatures changed seed germination. The data are attached as a ".csv" file named "g.csv". I performed a generalized linear mixed model using glm. Time and Temp are significant (p < 0.0001) and I would like to perform post-hoc comparisons to check where the differences among Time and Temp categories lie, but I did not find a solution in the R help list or elsewhere. Thank you very much in advance for your help. -- Xishuangbanna Tropical Botanical Garden Chinese Academy of Sciences Menglun Town, Mengla County Yunnan 666303 Tel.: +86-691-8716759 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Help on finding specific columns in matrix
Dear All, I have a matrix with 33 columns and 5000 rows. I would like to find 2 specific columns in the set: the one that holds the highest values and the one that holds the lowest values. In this case the column's mean would be appropriate to use to try to find those specific columns, because each column's mean is different and they all change together based on the same "change of rate constants" as a function of time (ie: time is the most important determinant of the magnitude of that mean). The columns are not to be named, if possible. Any thoughts on that? Would appreciate the help.

example:

a <- matrix(c(runif(500,10,15),runif(500,15,20),runif(500,20,25),runif(500,25,30)),ncol=4)

I would need to find, and plot with a box percentile plot, column 1 (the column with the lowest mean) and column 4 (the column with the highest mean).

thanks, Andras

[[alternative HTML version deleted]]

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] glmulti runs indefinitely when using genetic algorithm with lme4
Dear List, I'm using glmulti for model averaging in R. There are ~10 variables in my model, making exhaustive screening impractical - I therefore need to use the genetic algorithm (GA) (call: method = "g"). I need to include random effects so I'm using glmulti as a wrapper for lme4. Methods for doing this are available here http://www.inside-r.org/packages/cran/glmulti/docs/glmulti and there is also a pdf included with the glmulti package that goes into more detail. The problem is that when telling glmulti to use GA in this setting it runs indefinitely, even after the best model has been found. This is the example taken from the pdf included in the glmulti package: library(lme4) library(glmulti) # create a function for glmulti to act as a wrapper for lmer: lmer.glmulti <- function (formula, data, random = "", ...) { lmer(paste(deparse(formula), random), data = data, REML=F, ...) } # set some random variables: y = runif(30,0,10) # mock dependent variable a = runif(30) # dummy covariate b = runif(30) # another dummy covariate c = runif(30) # an another one x = as.factor(round(runif(30),1))# dummy grouping factor # run exhaustive screening with lmer: bab <- glmulti(y~a*b*c, level = 2, fitfunc = lmer.glmulti, random = "+(1|x)") This works fine. The problem is when I tell it to use the genetic algorithm: babs <- glmulti(y~a*b*c, level = 2, fitfunc = lmer.glmulti, random = "+(1|x)", method = "g") It just keeps running indefinitely and the AIC does not change: After 19550 generations: Best model: y~1 Crit= 161.038899734164 Mean crit= 164.13629335762 Change in best IC: 0 / Change in mean IC: 0 After 19560 generations: Best model: y~1 Crit= 161.038899734164 Mean crit= 164.13629335762 Change in best IC: 0 / Change in mean IC: 0 After 19570 generations: Best model: y~1 Crit= 161.038899734164 Mean crit= 164.13629335762 etc. I have tried using calls that tell glmulti when to stop (deltaB = 0, deltaM = 0.01, conseq = 6) but nothing seems to work. 
I think the problem must lie with how I have set up the wrapper function (?). It may be something really obvious, but I'm new to R and I can't work it out. I am using R v.2.15.0, glmulti v.1.0.6 and lme4 v.0.99-0 on Windows. Any help with this would be much appreciated.

Thomas
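One thing worth checking in the wrapper (an assumption on my part, not something confirmed in the thread): paste(deparse(formula), random) is fragile, because deparse() returns a character vector with one element per output line, so a formula longer than about 60 characters gets split and the wrapper then hands lmer a vector of partial formula strings. A minimal base-R illustration, using a made-up long formula with predictors x1..x15 and a hypothetical grouping factor g:

```r
# deparse() splits long formulas into several strings:
f_long <- reformulate(paste0("x", 1:15), response = "y")  # a long formula
length(deparse(f_long))   # > 1 for long formulas

# A safer way to build the formula string inside the wrapper is to
# collapse the deparse() output back into a single string first:
form_str <- paste(paste(deparse(f_long), collapse = " "), "+ (1|g)")
length(form_str)          # exactly 1
f_full <- as.formula(form_str)  # parses back to a single formula
```

If this is the culprit, replacing paste(deparse(formula), random) with paste(paste(deparse(formula), collapse = " "), random) in lmer.glmulti would be the corresponding change; with only a*b*c in the example above the formulas stay short, so this may not explain the GA behaviour by itself.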
Re: [R] (Newbie) help cannot find chrome
I should have added: ?help.start tells you what I did already. So you should make it your first order of business to learn to use R's Help system.

-- Bert

On Sat, Sep 1, 2012 at 2:39 PM, Carilda Thomas wrote:
> Following the beginning tutorial, I typed help.start() and was given a
> choice of browsers. I picked chrome but got back that chrome is not found.
> I cannot seem to change or get rid of it now.
> I looked at /etc/R/Renviron but don't see anywhere to set the browser
> (R_BROWSER = ${R_BROWSER ...etc) - nothing about chrome.
> I tried starting as "R_BROWSER= R" - same thing.
> How do I change the default browser, for instance, to firefox? (Using Linux Mint xfce)
> Thanks.

-- Bert Gunter
Genentech Nonclinical Biostatistics
Re: [R] (Newbie) help cannot find chrome
?options

options(browser = "firefox")  ## or whatever browser you want

If you don't know how to use the Help system, read the Intro to R tutorial, or type ?help at the command prompt in interactive mode for the help for help.

-- Bert
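Putting the pieces of this thread together, a short sketch of the two routes for changing the help browser (the browser name "firefox" here is just an example):

```r
# Session-level: set the browser option for the current R session.
# Put this line in ~/.Rprofile to make it permanent.
options(browser = "firefox")
getOption("browser")   # "firefox"
# help.start() will now launch firefox.

# Environment-variable route: R reads R_BROWSER at startup, so a line like
#   R_BROWSER=firefox
# in ~/.Renviron (or the site-wide etc/Renviron) works too. Note that
# starting R from the shell as
#   R_BROWSER= R
# sets the variable to the *empty string* rather than clearing the old
# choice, which is likely why that attempt did not help.
```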