HI GG, I thought you were referring about the solution that I sent today. Sorry, I didn't check it. TRy this:
dat1<- read.table(text=" patient_id number responsed_at delais scores1 scores2 scores3 1 1 2010-05-26 NA 2.6 0.5 0.7 1 2 2010-07-07 42 2.5 NA NA 1 3 2010-08-14 38 2.3 NA NA 1 4 2010-10-01 48 2.5 0.7 0.6 1 5 2010-12-01 61 2.5 NA NA 2 1 2011-07-19 NA 2.5 0.8 0.5 2 2 2011-09-22 65 2.6 NA NA 2 3 2011-10-26 34 2.7 NA NA 3 1 2011-07-17 NA 2.8 0.5 0.6 3 2 2011-10-30 103 2.6 NA NA 3 3 2011-12-23 54 2.5 NA NA ",sep="",header=TRUE,stringsAsFactors=FALSE) dat1$idx<-with(dat1,ifelse(is.na(delais)|delais<45 & delais>20, 1,ifelse(delais<60 & delais>=45,2,ifelse(delais<=90 & delais>=60,3,NA)))) #deleted lines dat1[as.logical(with(dat1,ave(idx,patient_id,FUN=function(x) cumsum(is.na(x))))),-8] # patient_id number responsed_at delais scores1 scores2 scores3 #10 3 2 2011-10-30 103 2.6 NA NA #11 3 3 2011-12-23 54 2.5 NA NA nrow(dat1[as.logical(with(dat1,ave(idx,patient_id,FUN=function(x) cumsum(is.na(x))))),-8]) #[1] 2 #or table(with(dat1,ave(idx,patient_id,FUN=function(x) cumsum(is.na(x)))))[2] #1 #2 #added lines datSub<-dat1[!as.logical(with(dat1,ave(idx,patient_id,FUN=function(x) cumsum(is.na(x))))),] #count sum(with(datSub,idx[idx>1 & !is.na(idx)])-1) #[1] 5 A.K. ________________________________ From: GUANGUAN LUO <guanguan...@gmail.com> To: arun <smartpink...@yahoo.com> Sent: Wednesday, May 22, 2013 11:30 AM Subject: Re: how to calculate the mean in a period of time? Hi Arun, I meant to say "number of observations you have deleted". In the table, you have deleted some lines which have a delais > 90 days, and added some lines copied from the precedent lines. Can I calculate the number of lines that you have deleted and the number of lines that you have added? GG 2013/5/22 arun <smartpink...@yahoo.com> Hi GG, >It is not clear what you meant by "number of observations you have detected". >Is it the number of observations in the initial dataset for each patient_id? >Number of observations added also means what? In the final result, we are >taking the means for each 3 observations keeping the t="0" along with the >first cluster for each Patient_id. > > > > > >________________________________ >From: GUANGUAN LUO <guanguan...@gmail.com> >To: arun <smartpink...@yahoo.com> >Sent: Wednesday, May 22, 2013 11:13 AM > >Subject: Re: how to calculate the mean in a period of time? > > > >Hello A.K, >for this reconstructed table, can i calculate the number of the observations >you have delected and the number of the observations that you have added? >Thanks > >GG > > >2013/5/17 arun <smartpink...@yahoo.com> > > >> >>Hi, >>Try this: >> >> dat1$idx<-with(dat1,ifelse(is.na(delais)|delais<45 & >>delais>20, 1,ifelse(delais<60 & >>delais>=45,2,ifelse(delais<=90 & delais>=60,3,NA)))) >>dat1$idx1<-c(dat1$idx[-head(dat1$idx,1)],1) >> >>library(zoo) >>res1<-do.call(rbind,lapply(split(dat1,dat1$patient_id),function(x) >>{x$idx[as.logical(cumsum(is.na(x$idx)))]<-NA; x1<-x[!is.na(x$idx),]; >>x1[,6:8]<-na.locf(x1[,6:8]);x1$idx1[is.na(x1$idx1)]<-1; >>x2<-x1[rep(seq_len(nrow(x1)),x1$idx1),]; >>x2$delais[duplicated(x2$delais,fromLast=FALSE)]<-0; >>x2$t<-seq(0,nrow(x2)-1,1);x2[,-c(8:9)]})) >> row.names(res1)<- 1:nrow(res1) >> res2<- res1[,c(1:3,8,4:7)] >> res2 >> >># patient_id number responsed_at t delais scores1 scores2 scores3 >>#1 1 1 2010-05-26 0 NA 2.6 0.5 0.7 >>#2 1 2 2010-07-07 1 42 2.5 0.5 0.7 >>#3 1 3 2010-08-14 2 38 2.3 0.5 0.7 >>#4 1 3 2010-08-14 3 0 2.3 0.5 0.7 >> >>#5 1 4 2010-10-01 4 48 2.5 0.7 0.6 >>#6 1 4 2010-10-01 5 0 2.5 0.7 0.6 >>#7 1 4 2010-10-01 6 0 2.5 0.7 0.6 >> >>#8 1 5 2010-12-01 7 61 2.5 0.7 0.6 >>#9 2 1 2011-07-19 0 NA 2.5 0.8 0.5 >>#10 2 1 2011-07-19 1 0 2.5 0.8 0.5 >>#11 2 1 2011-07-19 2 0 2.5 0.8 0.5 >> >>#12 2 2 2011-09-22 3 65 2.6 0.8 0.5 >>#13 2 3 2011-10-26 4 34 2.7 0.8 0.5 >>#14 3 1 2011-07-17 0 NA 2.8 0.5 0.6 >>A.K. >> >>________________________________ >>From: GUANGUAN LUO <guanguan...@gmail.com> >>To: arun <smartpink...@yahoo.com> >>Sent: Friday, May 17, 2013 9:33 AM >> >>Subject: Re: how to calculate the mean in a period of time? >> >> >> >>Hello, >>Thank you for your help >>the lines added to the tables are the precedent lines but not the followed >>lines, if i just change x2<-x1[rep(seq_len(nrow(x1)-1), is that ok? and so >>the delais should be changed too, isn't it? >> >> >>GG >> >> >>2013/5/17 arun <smartpink...@yahoo.com> >> >>Hi, >>>No problem. >>> >>>Arun >>> >>> >>> >>> >>> >>> >>>________________________________ >>>From: GUANGUAN LUO <guanguan...@gmail.com> >>>To: arun <smartpink...@yahoo.com> >>>Sent: Friday, May 17, 2013 4:19 AM >>> >>>Subject: Re: how to calculate the mean in a period of time? >>> >>> >>> >>>Ah, yes, that is the wrong thing i have written. Thank you so much. the >>>output which you have got is right. >>>Thanks a lot. >>>GG >>> >>> >>>2013/5/16 arun <smartpink...@yahoo.com> >>> >>>Hi, >>>>The output you showed is not clear especially the for the scores3, >>>> >>>> 2 2 2011-09-22 3 65 >>>>2.6 0.8 0.8 >>>>2 3 2011-10-26 4 34 >>>>2.7 0.8 0.8 >>>>3 1 2011-07-17 0 NA 2.8 >>>> 0.5 0.6 >>>>In the input data, the scores3 column didn't had 0.8. >>>> >>>> >>>>This is what I got: >>>>dat1<- read.table(text=" >>>> >>>>patient_id number responsed_at delais scores1 scores2 >>>> scores3 >>>> 1 1 2010-05-26 NA 2.6 >>>> 0.5 0.7 >>>> 1 2 2010-07-07 42 2.5 >>>> NA NA >>>> 1 3 2010-08-14 38 2.3 >>>> NA NA >>>> 1 4 2010-10-01 48 2.5 >>>> 0.7 0.6 >>>> 1 5 2010-12-01 61 2.5 >>>> NA NA >>>>2 1 2011-07-19 NA 2.5 >>>> 0.8 0.5 >>>>2 2 2011-09-22 65 2.6 >>>> NA NA >>>>2 3 2011-10-26 34 >>>>2.7 NA NA >>>>3 1 2011-07-17 NA 2.8 >>>> 0.5 0.6 >>>>3 2 2011-10-30 103 2.6 >>>> NA NA >>>>3 3 2011-12-23 54 >>>>2.5 NA NA >>>>",sep="",header=TRUE,stringsAsFactors=FALSE) >>>> >>>> dat1$idx<-with(dat1,ifelse(is.na(delais)|delais<45 & delais>20, >>>>1,ifelse(delais<60 & delais>=45,2,ifelse(delais<=90 & delais>=60,3,NA)))) >>>>library(zoo) >>>>res<-do.call(rbind,lapply(split(dat1,dat1$patient_id),function(x) >>>>{x$idx[as.logical(cumsum(is.na(x$idx)))]<-NA; x1<-x[!is.na(x$idx),]; >>>>x1[,6:8]<-na.locf(x1[,6:8]);x2<-x1[rep(seq_len(nrow(x1)),x1$idx),]; >>>>x2$delais[duplicated(x2$delais,fromLast=TRUE)]<-0; >>>>x2$t<-seq(0,nrow(x2)-1,1);x2})) >>>> >>>> >>>>row.names(res)<- 1:nrow(res) >>>> res1<- res[,c(1:3,9,4:7)] >>>>res1 >>>># patient_id number responsed_at t delais scores1 scores2 scores3 >>>>#1 1 1 2010-05-26 0 NA 2.6 0.5 0.7 >>>>#2 1 2 2010-07-07 1 42 2.5 0.5 0.7 >>>>#3 1 3 2010-08-14 2 38 2.3 0.5 0.7 >>>>#4 1 4 2010-10-01 3 0 2.5 0.7 0.6 >>>>#5 1 4 2010-10-01 4 48 2.5 0.7 0.6 >>>>#6 1 5 2010-12-01 5 0 2.5 0.7 0.6 >>>>#7 1 5 2010-12-01 6 0 2.5 0.7 0.6 >>>>#8 1 5 2010-12-01 7 61 2.5 0.7 0.6 >>>>#9 2 1 2011-07-19 0 NA 2.5 0.8 0.5 >>>>#10 2 2 2011-09-22 1 0 2.6 0.8 0.5 >>>>#11 2 2 2011-09-22 2 0 2.6 0.8 0.5 >>>>#12 2 2 2011-09-22 3 65 2.6 0.8 0.5 >>>>#13 2 3 2011-10-26 4 34 2.7 0.8 0.5 >>>>#14 3 1 2011-07-17 0 NA 2.8 0.5 0.6 >>>> >>>>A.K. >>>> >>>>________________________________ >>>>From: GUANGUAN LUO <guanguan...@gmail.com> >>>>To: arun <smartpink...@yahoo.com> >>>>Sent: Thursday, May 16, 2013 12:05 PM >>>> >>>>Subject: Re: how to calculate the mean in a period of time? >>>> >>>> >>>> >>>> >>>>Hello, AK, >>>>now i have a problem really complicated for me, >>>> >>>>Now my table is like this: >>>> >>>>patient_id number responsed_at delais scores1 scores2 >>>> scores3 >>>> 1 1 2010-05-26 NA 2.6 >>>> 0.5 0.7 >>>> 1 2 2010-07-07 42 2.5 >>>> NA NA >>>> 1 3 2010-08-14 38 2.3 >>>> NA NA >>>> 1 4 2010-10-01 48 2.5 >>>> 0.7 0.6 >>>> 1 5 2010-12-01 61 2.5 >>>> NA NA >>>>2 1 2011-07-19 NA 2.5 >>>> 0.8 0.5 >>>>2 2 2011-09-22 65 2.6 >>>> NA NA >>>>2 3 2011-10-26 34 >>>>2.7 NA NA >>>>3 1 2011-07-17 NA 2.8 >>>> 0.5 0.6 >>>>3 2 2011-10-30 103 2.6 >>>> NA NA >>>>3 3 2011-12-23 54 >>>>2.5 NA NA >>>> >>>>explications: delais = the date of "responsed_at" - the date of precedent >>>>"responsed_at" >>>>scores1 is measured every month >>>>scores2 and 3 are measured every three months >>>> >>>>first thing is : if the 20<delais <45, this count one month of delais >>>>if the 45<=delais <60, this count two month of delais,so add one line with >>>>the copy of the precedent line,and change the delais to 0 >>>> >>>>if the 60 <= delais <=90, this count three month of delais, so add two >>>>lines with the copy of the precedent line,and change these two delais to 0 >>>> >>>>if the delais >90, delete all the following lines >>>> >>>>and add a column "t", "t" means the month, "t" is in order >>>> >>>>second thing is : >>>>I want to replace NA of scores2 and scores3 with the precedent scores >>>> >>>>and finally get a table like this: >>>> >>>>patient_id number responsed_at t delais scores1 >>>>scores2 scores3 >>>> 1 1 2010-05-26 0 NA 2.6 >>>> 0.5 0.7 # scores2 and 3 are mesured every 3 >>>> 1 2 2010-07-07 1 42 2.5 >>>> 0.5 0.7 ## months,replace the following with >>>>precedent numbers >>>> 1 3 2010-08-14 2 38 2.3 >>>> 0.5 0.7 # copy this line >>>>1 3 2010-08-14 3 0 2.3 >>>> 0.5 0.7 # add one line here and change delais to 0 >>>> 1 4 2010-10-01 4 48 2.5 >>>> 0.7 0.6 >>>>1 4 2010-10-01 5 0 >>>>2.5 0.7 0.6 >>>>1 4 2010-10-01 6 0 >>>>2.5 0.7 0.6 >>>> 1 5 2010-12-01 7 61 >>>>2.5 0.7 0.6 >>>>2 1 2011-07-19 0 NA 2.5 >>>> 0.8 0.5 >>>>2 1 2011-07-19 1 0 2.5 >>>> 0.8 0.5 >>>>2 1 2011-07-19 2 0 2.5 >>>> 0.8 0.5 >>>>2 2 2011-09-22 3 65 2.6 >>>> 0.8 0.8 >>>>2 3 2011-10-26 4 34 >>>>2.7 0.8 0.8 >>>>3 1 2011-07-17 0 NA 2.8 >>>> 0.5 0.6 >>>>(3 2 2011-10-30 103 >>>>2.6 0.5 0.6)# delete these 2 line >>>> >>>> >>>>(3 3 2011-12-23 54 >>>>2.5 0.5 0.6) ## and delete the following lines of this >>>>patient because the delais is 103 which superior to 90 >>>> >>>> >>>> >>>>Do you know how can i get this? >>>>Thank you so much >>>> >>>>GG >>>> >>>> >>>>2013/5/9 GUANGUAN LUO <guanguan...@gmail.com> >>>> >>>>Hello, >>>>>Because according to the database, the first time of collecting the data >>>>>is just for an entry, an inclusion of the patients. What i should focus on >>>>>is the difference between two month. If i don't include the t0, i think i >>>>>will lose the information. I don't know if this is pertinent. >>>>>Thank you very much for your help, AK >>>>> >>>>>GG >>>>> >>>>> >>>>> >>>>>2013/5/9 arun <smartpink...@yahoo.com> >>>>> >>>>>HI GG, >>>>>>I did the code according the calculation shown by you. For example, in >>>>>>the first case, you had scores of (0+1+2+3)/4, which is not 3 months. >>>>>>It is 3+ initial month. After that it is only 3 months that is >>>>>>repeating. So, my code is doing this: >>>>>> 1st group (the scores) >>>>>>(0+1+2+3)/4 >>>>>>2nd group( scores) >>>>>>(5+6+7)/3 >>>>>>3rd group >>>>>>(8+9+10)/3 >>>>>> >>>>>>Here, the months are used only for illustration. I am taking the mean of >>>>>>the scores. >>>>>> >>>>>>So, if you are only doing every 3 months, why do you need to combine the >>>>>>initial scores with the first 3 months? Anyway, I did what you asked for. >>>>>> >>>>>> >>>>>>Regards, >>>>>>Arun >>>>>> >>>>>> >>>>>>________________________________ >>>>>>From: GUANGUAN LUO <guanguan...@gmail.com> >>>>>>To: arun <smartpink...@yahoo.com> >>>>>>Sent: Thursday, May 9, 2013 4:19 AM >>>>>> >>>>>>Subject: Re: how to calculate the mean in a period of time? >>>>>> >>>>>> >>>>>> >>>>>>Hi,Thank you so much for your help. I have seen your code. It's very >>>>>>complicated. The time 0 is the beginning of collecting all the >>>>>>information. >>>>>> >>>>>>The time 1 is the first month when follow the patients, time 2 is the >>>>>>second month ect. If i want to calculate the means of every 3 month do >>>>>>you think i should do like this? >>>>>> >>>>>>GG >>>>>> >>>>>>2013/5/7 arun <smartpink...@yahoo.com> >>>>>> >>>>>>Hi, >>>>>>>Your question is still not clear. >>>>>>>May be this helps: >>>>>>> >>>>>>>dat2<- read.table(text=" >>>>>>> >>>>>>>patient_id t scores >>>>>>>1 0 1.6 >>>>>>>1 1 2.6 >>>>>>>1 2 2.2 >>>>>>>1 3 1.8 >>>>>>>2 0 2.3 >>>>>>>2 2 2.5 >>>>>>>2 4 2.6 >>>>>>>2 5 1.5 >>>>>>>",sep="",header=TRUE) >>>>>>> >>>>>>>library(plyr) >>>>>>> dat2New<-ddply(dat2,.(patient_id),summarize,t=seq(min(t),max(t))) >>>>>>> res<-join(dat2New,dat2,type="full") >>>>>>>res1<-do.call(rbind,lapply(split(res,res$patient_id),function(x) >>>>>>>{x1<-x[x$t!=0,];do.call(rbind,lapply(split(x1,((x1$t-1)%/%3)+1),function(y) >>>>>>> {y1<-if(any(y$t==1)) rbind(x[x$t==0,],y) else y; >>>>>>>data.frame(patient_id=unique(y1$patient_id),scores=mean(y1$scores,na.rm=TRUE))}) >>>>>>> ) })) >>>>>>> row.names(res1)<-1:nrow(res1) >>>>>>>res1$period<-with(res1,ave(patient_id,patient_id,FUN=seq)) >>>>>>> res1 >>>>>>># patient_id scores period >>>>>>>#1 1 2.05 1 >>>>>>>#2 2 2.40 1 >>>>>>>#3 2 2.05 2 >>>>>>> >>>>>>> >>>>>>>A.K. >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>>________________________________ >>>>>>>From: GUANGUAN LUO <guanguan...@gmail.com> >>>>>>>To: arun <smartpink...@yahoo.com> >>>>>>>Sent: Tuesday, May 7, 2013 11:29 AM >>>>>>>Subject: Re: how to calculate the mean in a period of time? >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>>Yes , as you have said, probably , it's not continuous. >>>>>>> >>>>>>> >>>>>>>2013/5/7 arun <smartpink...@yahoo.com> >>>>>>> >>>>>>>Hi, >>>>>>>>Your question is not clear. You mentioned to calculate the mean of 3 >>>>>>>>months, but infact you added the scores for t=0,1,2,3 as first 3 >>>>>>>>months, then possibly 4,5,6 as the next. So, it is not exactly three >>>>>>>>months. Isn't it? >>>>>>>> >>>>>>>> >>>>>>>>Dear R experts, >>>>>>>>sorry to trouble you again. >>>>>>>>My data is like this now : >>>>>>>>patient_id t scores >>>>>>>>1 0 1.6 >>>>>>>>1 1 2.6 >>>>>>>>1 2 2.2 >>>>>>>>1 3 1.8 >>>>>>>>2 0 2.3 >>>>>>>>2 2 2.5 >>>>>>>>2 4 2.6 >>>>>>>>2 5 1.5 >>>>>>>> >>>>>>>>I want to calculate the mean of period of 3 months, just get a table >>>>>>>>like this >>>>>>>> >>>>>>>>patient_id period scores >>>>>>>>1 1 2.05 >>>>>>>>(1.6+2.6+2.2+1.8)/4 >>>>>>>>2 1 2.4 >>>>>>>>(2.3+2.5)/2 >>>>>>>>2 2 2.05 >>>>>>>>(2.6+1.5)/2 >>>>>>>> >>>>>>>>thank you in avance >>>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >> > ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.