[R] multiple imputation of longitudinal, time-unstructured data
Hello! I have a longitudinal dataset of radiation exposures of an occupational cohort. A percentage of the exposure values are missing and I would like to multiply impute the missing values (it is one option of several we are comparing). The data are recorded in long format (one row for each exposure entry) and there are multiple exposure measurements per worker. However, the data are time-unstructured (different data collection schedules for each worker) and unbalanced. I want to account for the correlation between repeated measurements on the same worker. However, because of the time-unstructured nature of the dataset, I am unable to convert my dataset into wide format and impute that way. I have begun reading about using multilevel imputation for such a scenario, but I am rather unfamiliar with this approach, including within R. Is this an appropriate method to investigate? Any advice on how to get started would be greatly appreciated! Thank you! Pam __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] is parallel computing possible for 'rollapplyr' job?
Hi,

The code below does exactly what I want in sequential mode, but it is slow and I want to run it in parallel. I examined some packages that work on Windows (parallel, snow, snowfall, ...) but could not solve my specific problem. As far as I understand, either I have to write a new function like sfRollapplyr, or I have to change my code so that it uses lapply or sapply instead of 'rollapplyr' and then use sfInit, sfExport, sfLapply, ... for parallel computing. I could not manage either, so please help me :)

    ##
    library(zoo)            # rollapplyr
    library(extremevalues)  # getOutliers (assumed to come from this package)
    nc <- 313
    rs <- 50
    ema <- 10
    h <- 4
    # lower outlier limit of one window; note that the rho argument is ignored
    gomin1sd <- function(x, rho) {
      getOutliers(as.vector(x), rho = c(1, 1))$limit[1]
    }
    dim(dt_l1_inp)
    [1] 50 312
    dt_l1_min1 <- matrix(nrow = rs, ncol = nc - 1 - (ema * h))
    for (i in 1:rs) {
      dt_l1_min1[i, ] <- rollapplyr(dt_l1_inp[i, ], FUN = gomin1sd, width = ema * h + 1)
    }
    ##
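One way to approach the question above is to treat each row as an independent task and hand the rows to a socket cluster from the parallel package, which also works on Windows. The following is only a minimal sketch under stated assumptions: `window_stat` is a hypothetical stand-in for `gomin1sd`, `roll_right` is a hypothetical base-R replacement for `rollapplyr` (full right-aligned windows only), and the input matrix is simulated.

```r
library(parallel)

# Hypothetical stand-in for gomin1sd(): any function that maps one numeric
# window to a single number can be substituted here.
window_stat <- function(x) mean(x) - sd(x)

# Hypothetical base-R replacement for rollapplyr(x, width, FUN) on one row:
# apply `stat` to every full, right-aligned window of length `width`.
roll_right <- function(x, width, stat) {
  ends <- width:length(x)
  vapply(ends, function(e) stat(x[(e - width + 1):e]), numeric(1))
}

rs <- 50; nc <- 313; ema <- 10; h <- 4
width <- ema * h + 1
set.seed(1)
dt_l1_inp <- matrix(rnorm(rs * (nc - 1)), nrow = rs)  # simulated 50 x 312 input

# One list element per row, so that each row becomes one task.
row_list <- lapply(seq_len(rs), function(i) dt_l1_inp[i, ])

cl <- makeCluster(2)  # PSOCK cluster; the worker count is an arbitrary choice
rows <- parLapply(cl, row_list, roll_right, width = width, stat = window_stat)
stopCluster(cl)

dt_l1_min1 <- do.call(rbind, rows)  # 50 x 272, same shape as the loop result
```

The window function is passed as an argument named `stat` rather than `FUN` because `parLapply` forwards its extra arguments to `lapply` on the workers, where a second `FUN` would clash. If the real window function needs a package, it would have to be loaded on each worker first, for example with `clusterEvalQ`.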
[R] Using influence plots and obtaining id numbers
I am a novice R user, and I am having difficulty understanding R's influence plots. I am trying to remove outliers from a particular variable, "sib." I am able to generate influence plots and further outlier information such as below (which is a shortened example). For my analyses, I end up excluding the points R refers to: 7, 18, 26, and 105. However, my question is, how can I understand which ID numbers these points (7, 18, 26, and 105) are referring to? These numbers are definitely not my study ID numbers.

    > Myoutput<-aov(sib~newgroup1, data=Study1)
    > influencePlot(Myoutput)
    [1]   7  18  26 105
    > influence.measures(Myoutput)
    Influence measures of
             aov(formula = sib ~ newgroup1, data = Study1) :

          dfb.1_  dfb.nw12  dfb.nw13  dfb.nw14  dfb.nw15     dffit cov.r   cook.d    hat inf
    33  1.70e-01 -1.33e-01 -1.53e-01 -1.56e-01 -1.52e-01  0.170405 1.124 5.83e-03 0.0909   *
    34  7.79e-02 -6.07e-02 -7.00e-02 -7.14e-02 -6.94e-02  0.077934 1.131 1.22e-03 0.0909   *
    35  1.47e-01 -1.15e-01 -1.32e-01 -1.35e-01 -1.31e-01  0.147268 1.126 4.36e-03 0.0909   *
    36  6.64e-02 -5.17e-02 -5.96e-02 -6.08e-02 -5.91e-02  0.066386 1.132 8.86e-04 0.0909   *
    37 -3.15e-01  2.46e-01  2.83e-01  2.89e-01  2.81e-01 -0.315448 1.100 1.99e-02 0.0909   *
    38  1.47e-01 -1.15e-01 -1.32e-01 -1.35e-01 -1.31e-01  0.147268 1.126 4.36e-03 0.0909   *
    39 -9.26e-01  7.22e-01  8.32e-01  8.48e-01  8.24e-01 -0.926059 0.882 1.64e-01 0.0909   *

--
View this message in context: http://r.789695.n4.nabble.com/Using-influence-plots-and-obtaining-id-numbers-tp4339144p4339144.html
Sent from the R help mailing list archive at Nabble.com.
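The numbers reported above are most likely the row names of Study1, which R uses as observation labels by default, rather than values of an ID column. A minimal base-R sketch of mapping them back, assuming a hypothetical `id` column, simulated data, and the common 4/n rule of thumb for Cook's distance (none of which come from the post):

```r
set.seed(42)
Study1 <- data.frame(
  id        = sprintf("S%03d", sample(100:999, 40)),  # hypothetical study IDs
  newgroup1 = factor(rep(1:4, each = 10)),
  sib       = rnorm(40)
)
Study1$sib[7] <- 8  # plant an obvious outlier in row 7 of the simulated data

Myoutput <- aov(sib ~ newgroup1, data = Study1)

# Cook's distances are named by the row names of the data used in the fit.
cd <- cooks.distance(Myoutput)
flagged <- names(cd)[cd > 4 / nrow(Study1)]  # 4/n is a rule of thumb, not a test

# Index the data frame by those row names to recover the study IDs.
Study1[flagged, c("id", "sib")]
```

If the study IDs are unique, setting `rownames(Study1) <- Study1$id` before fitting should make the diagnostics report the IDs directly.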
[R] Lattice wireframe or cloud plot with different colours by a group
I have a question about wireframe 3-D plots and how to apply colors. I have a large dataset of river flow (m^3/s) over time, and I have coded these flows based on their height. I would like to produce a wireframe plot that colors the graph based on the flow code, i.e. I would like high flows to be red, medium to be green and low flows to be blue. Here is some sample data with the basic wireframe plot:

    library(lattice)
    flow.dat=cbind.data.frame(flow=sin(2*pi/53*c(1:3000))+1,
      day=as.numeric(format(as.Date(c(1:3000)), format="%j")),
      year=as.numeric(format(as.Date(c(1:3000)), format="%Y")),
      grp=c(rep(c("1.high","2.med","3.low"),1000)))
    wireframe(flow~day*year, data=flow.dat, shade=T)

Is there any way to specify what colours are passed to the plot? I.e.

    wireframe(flow~day*year, data=flow.dat, shade=T, groups=grp,
      col.group=c("#FF3030","#551A8B","#43CD80"))

I would also be happy if I could do this with a cloud plot, but I can't get the colors to plot correctly.

    cloud(flow~day*year, data=flow.dat, shade=T, groups=grp,
      col.group=c("#FF3030","#43CD80","#1E90FF"), pch=20)

Any help is much appreciated! Thank you.

-Pam Allen
allen_...@hotmail.com
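For the cloud variant, one possibility is to supply the group colours through lattice's `superpose.symbol` settings instead of a `col.group` argument. This is a hedged sketch rather than a confirmed answer: it assumes the colours should follow the factor-level order of `grp`, adds an explicit `origin` to `as.Date`, and does not address per-facet colouring of a wireframe surface.

```r
library(lattice)

days <- as.Date(1:3000, origin = "1970-01-01")  # explicit origin is an assumption
flow.dat <- data.frame(
  flow = sin(2 * pi / 53 * (1:3000)) + 1,
  day  = as.numeric(format(days, "%j")),
  year = as.numeric(format(days, "%Y")),
  grp  = factor(rep(c("1.high", "2.med", "3.low"), 1000))
)

# One colour per factor level, in level order: high, med, low.
grp.cols <- c("#FF3030", "#43CD80", "#1E90FF")

p <- cloud(flow ~ day * year, data = flow.dat, groups = grp,
           par.settings = list(superpose.symbol = list(col = grp.cols, pch = 20)))
print(p)
```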
Re: [R] Help with plotting a line that is multicoloured based on levels of a factor
Thank you Jim and David for your help.

The 'levels' call is not a misdirection; in my actual dataset it is necessary because the flows aren't symmetrical. So while your solution is quite elegant, David, it doesn't apply to my actual data, just the example. Too bad, it's quite nice!

I do think that "color.scale.lines" can work, now I just need to figure out how! Unfortunately, when I tried your example, Jim:

    date=c(1:300)
    flow=sin(2*pi/53*c(1:300))
    levels=c(rep(c("high","med","low"),100))
    data=cbind.data.frame(date, flow, levels)
    plot(data$date,data$flow,type="n")
    library(plotrix)
    color.scale.lines(data$date,data$flow,
      col=color.scale(data$flow,extremes=c("blue","red")))

I got this error:

    "Error in length(redrange) : 'redrange' is missing"

But I do think the function is the way to go for my dataset.

Thank you!
-Pam
Re: [R] Help with plotting a line that is multicoloured based on levels of a factor
Hello again,

I wrote an example that better represents my data, since the coloured points are actually consecutive, but with variable lengths:

    date=as.Date(c(1:300))
    flow=sin(2*pi/53*c(1:300))
    levels=c(rep(c("high","med","low"),100))
    data=cbind.data.frame(date, flow, levels)
    library(zoo)
    z <- zoo(data$flow, data$date)
    zz=cbind.data.frame(
      date=as.Date(rownames(cbind.data.frame(rollapply(z, 2, align = "right", FUN="+")))),
      flow.change=(rollapply(z, 2, align = "right", FUN="+")))
    names(zz)=c("date","todays.flow","next.day.flow")
    zzz=cbind.data.frame(zz[,1], (zz[,3]-zz[,2]))
    names(zzz)=c("date","change.flow")
    data2=merge(data, zzz)
    rate=zoo(data2$change.flow,data2$date)
    x=cbind.data.frame(
      date=as.Date(rownames(cbind.data.frame(rollapply(rate, 2, align="left", FUN="+")))),
      sign=rollapply(rate, 2, align="left", FUN="+"))
    names(x)=c("date","todays.change","next.day.change")
    xx=cbind.data.frame(x[,1],(x[,3]*x[,2]))
    names(xx)=c("date","sign")
    data2=merge(data2, xx)
    data3=cbind(data2, pass1=
      ifelse(data2$flow<0, "extreme.low",
      ifelse(data2$flow>=0.9, "extreme.high","NA")))
    data4=cbind(data3, pass2=
      ifelse(data3$flow<0.8&data3$flow>0&data3$change.flow>=0&data3$change>=0&data3$pass1=="NA","medium",
      ifelse(data3$flow<0.7&data3$flow>0&data3$change.flow<0&data3$change<0&data3$pass1=="NA","medium","NA")))
    data4$pass1=paste(data4$pass1, data4$pass2)
    data4$pass1=replace(data4$pass1, data4$pass1=="NA NA", "low")
    dat=cbind(data4[,1:5], class=
      ifelse(data4$pass1=="extreme.high NA","1.Extreme.High",
      ifelse(data4$pass1=="NA medium","2.Medium",
      ifelse(data4$pass1=="low","3.Low",
      ifelse(data4$pass1=="extreme.low NA","4.Extreme.Low",NA)))))
    colour=ifelse(dat$class=="1.Extreme.High","red",
      ifelse(dat$class=="2.Medium","green",
      ifelse(dat$class=="3.Low","blue",
      ifelse(dat$class=="4.Extreme.Low","purple",""))))
    plot(dat$date, dat$flow, col=colour)

What I would like to do is to plot this using a line with the correct colours instead of points, i.e.:

    plot(dat$date, dat$flow, col=colour, type="l") ##Doesn't work, because the line is continuous

Any help would be much appreciated. Thank you!
-Pam
Re: [R] Help with plotting a line that is multicoloured based on levels of a factor
Hello Baptiste and others,

I tried your example with my dataset, and for a few days I thought it worked for me. But I realized yesterday that the result wasn't quite what I hoped for. In my actual data the flows aren't perfectly sinusoidal, and I used a series of ifelse queries to code the flows into their different categories (i.e., extremely high, high, low, extremely low). Your solution almost worked, except that some flows are coloured incorrectly. I think the issue lies in the use of the "transform" or "approx" functions. I tried to understand what they do, but I wasn't able to figure it out.

Is there a way to use the exact data set, i.e.:

    date=c(1:300)
    flow=sin(2*pi/53*c(1:300))
    levels=c(rep(c("high","med","low"),100))
    data=cbind.data.frame(date, flow, levels)

With the following colours:

    colour=ifelse(data$levels=="high","red",
      ifelse(data$levels=="med","green",
      ifelse(data$levels=="low","blue","")))

And plot a line without having to create new data, i.e. "d"?

Thank you.
-Pam
allen_...@hotmail.com
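One way to draw such a line without interpolating new data is to plot it as individual segments between consecutive points, each with its own colour. A minimal base-R sketch, assuming each segment should take the colour of its starting point (a convention chosen here, not something stated in the thread):

```r
date <- 1:300
flow <- sin(2 * pi / 53 * (1:300))
levels <- rep(c("high", "med", "low"), 100)
data <- data.frame(date, flow, levels)

# Named lookup instead of nested ifelse(); gives the same colour per row.
colour <- c(high = "red", med = "green", low = "blue")[as.character(data$levels)]

n <- nrow(data)
plot(data$date, data$flow, type = "n", xlab = "date", ylab = "flow")
# Segment i runs from point i to point i + 1 and is drawn in colour[i].
segments(data$date[-n], data$flow[-n], data$date[-1], data$flow[-1],
         col = colour[-n])
```

Because `segments` is vectorised over its coordinates and `col`, no loop or extra data frame is needed; runs of consecutive points with the same level simply appear as one continuous stretch of a single colour.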
Re: [R] interactive session
Thanks Niels, but it won't do. Please copy and paste the 2 lines below together into your console in order to see what I mean:

    cat("?"); a<-readLines(n=1)
    b<-paste("t",a,sep="")

Does anyone have any idea how to overcome this problem?

Best,
Fatih

Niels wrote:

Hi Fatih

I believe that readLines(n=1) will do the job. It works fine from the Windows RGui, but I noticed that it hangs my Aquamacs/ESS when R runs from there, and a C-g was needed (which may be completely irrelevant to you).

Best,
Niels

On 30/09/10 08.55, Pam wrote:
> Hi guys,
>
> My concern is to create an automated process from the beginning to the end. I
> want to copy all my code together in one piece and paste it to R console and sit
> back and relax :) except one moment in which the program should ask me to enter
> a number.. and only then (only after getting a valid number from me) it should
> continue to read and process the rest of the code. I tried lots of things
> (readline, readLines, scan, interactive, ask, switch,...) and read manuals and
> searched help forums.. I found several similar questions but never a satisfying
> answer.. so now it became a challenge.. any idea how to overcome this problem (R
> 2.11.1 for Windows)? (an example code is below)
>
> Best,
> Fatih
[R] interactive session
Hi guys,

My concern is to create an automated process from the beginning to the end. I want to copy all my code together in one piece and paste it to R console and sit back and relax :) except one moment in which the program should ask me to enter a number.. and only then (only after getting a valid number from me) it should continue to read and process the rest of the code. I tried lots of things (readline, readLines, scan, interactive, ask, switch,...) and read manuals and searched help forums.. I found several similar questions but never a satisfying answer.. so now it became a challenge.. any idea how to overcome this problem (R 2.11.1 for Windows)? (an example code is below)

Best,
Fatih

    library(gtools)
    library(YaleToolkit)
    library(xts)

    ### start of my wrong function!
    f<-function(w){
      w<-readline("which data? ")
      w<-as.numeric(w)
      ifelse(is.numeric(w)=="TRUE", w, f())
    }
    f()
    # end of my wrong function

    v<- ## and output of my function should be a "v" for example which I can use it in the next line (v<-w or something like that??)

    ##the rest works fine
    p<-paste("t", v, ".txt", sep = "")
    t<-read.table(p, header=FALSE, sep="\t", dec=",", blank.lines.skip=FALSE)
    rownames(t)<-as.Date(t[,1],"%d.%m.%Y")
    colnames(t)<-c("date","start","high","low","end","w.average","lot", "volume")
    x<-as.xts(t)
    whatis(x)
    .
    .
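Two things seem to go wrong in the function above: `is.numeric(w)` is always TRUE after `as.numeric(w)` (invalid input becomes NA instead), and when code is pasted line by line, the line following `readline` is consumed as the answer. A minimal sketch of one workaround, assuming it is acceptable to wrap the whole script in a function so that R parses it completely before anything runs; `ask_number`, its `read` argument and `main` are hypothetical names introduced for illustration:

```r
# Ask until the reply parses as a single number. `read` is a hypothetical
# hook so that the prompt can be exercised without a live console.
ask_number <- function(prompt = "which data? ", read = readline) {
  repeat {
    answer <- suppressWarnings(as.numeric(read(prompt)))
    if (length(answer) == 1 && !is.na(answer)) return(answer)
    cat("Please enter a number.\n")
  }
}

# Wrapping the remaining steps in a function means the console has read the
# complete definition before readline() is ever called.
main <- function() {
  v <- ask_number()
  p <- paste("t", v, ".txt", sep = "")
  p  # the read.table() and xts steps from the post would follow here
}

# In an interactive session, paste the definitions first and then run:
# main()
```

The call to `main()` is left commented out because `readline` returns an empty string immediately in non-interactive use, which would make the loop spin forever.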
[R] forestplot and x axis scale
Hello R users,

I would like to create several forestplots with the same X axis, so, if you were to look at the plots lined up all the X axes would be identical (and the different plots could be compared). Here is one version of code I've used:

    mytk10<-c(0.1, 0.5, 1, 2, 5, 10)
    pdf(file = "myfile.pdf", pointsize = 7, paper="letter", width=6, height=9)
    forestplot(newcite,or,lcl,ucl,zero=0, graphwidth = unit(1.2,"inches"),
      clip=c(log(0.1),log(10)), xlog=TRUE, xticks=mytk10, xlab="Odds Ratio",
      col=meta.colors(box="darkblue",line="darkblue",zero="grey50"))
    title(main = list("My title", col="darkblue", font=2))
    dev.off()

--> I have changed the width of the pdf output and/or the graphwidth specified in the forestplot function -- and depending on the length of the text/table descriptions (in the matrix "newcite"), the X axis will vary (when the text is long, the axis is shorter). I tried fixing the axis at a relatively small size (using graphwidth), but it would still be smaller when I was using data with long "newcite" text. Do I need to fix the amount of text for display within "newcite"?

--> A second and less essential question: I've used mytk10 for the axis tickmarks/labels - I'd prefer no decimal points for 1, 2, 5, and 10 - any way to adjust this?

Thanks in advance for any assistance!
Pam