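Hi,

Maybe this is what you wanted:

#using tt1
indx <- which(tt1$response == "buy")   # row numbers of the "buy" rows
tt1$newcolumn <- 0
tt1[unique(unlist(lapply(seq_along(indx), function(i) {
  # x1: the rows strictly between the i-th and (i+1)-th "buy" rows; if they are
  # adjacent, the i-th "buy" row itself; for the last "buy", that row to the end
  x1 <- if (i == length(indx)) {
    seq(indx[i], nrow(tt1))
  } else if ((indx[i + 1] - indx[i]) == 1) {
    indx[i]
  } else {
    seq(indx[i] + 1, indx[i + 1] - 1)
  }
  x2 <- tt1[unique(c(indx[1:i], x1)), ]          # every "buy" row so far, plus x1
  x3 <- subset(x2, response == "sample")
  x4 <- subset(x2, response == "buy")
  x5 <- row.names(x4)[duplicated(x4$product)]    # "buy" rows repeating an earlier "buy" product
  x6 <- if (nrow(x3) != 0) row.names(x3)[x3$product %in% x4$product]  # "sample" rows matching a "buy" product
  sort(c(x5, x6))
}))), "newcolumn"] <- 1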
tt1
   subj response product newcolumn
1     1   sample       1         0
2     1   sample       2         0
3     1      buy       3         0
4     2   sample       2         0
5     2      buy       2         0
6     3   sample       3         1
7     3   sample       2         1
8     3      buy       1         0
9     4   sample       1         1
10    4      buy       4         0
11    5      buy       4         1
12    5   sample       2         1
13    5      buy       2         1
14    6      buy       4         1
15    6   sample       5         0
16    6   sample       5         0
17    7   sample       4         1
18    7      buy       3         1
19    7      buy       4         1
20    8      buy       5         0
21    8   sample       4         1
22    8      buy       2         1
A.K.

________________________________
From: vanessa van der vaart <vanessa.va...@gmail.com>
To: arun <smartpink...@yahoo.com>
Cc: David Winsemius <dwinsem...@comcast.net>; R help <r-help@r-project.org>
Sent: Saturday, July 27, 2013 6:55 PM
Subject: Re: [R] Duplicated function with conditional statement

Dear all,

Thank you all for your help. It has been a great help, but it's not exactly what I am looking for. Apparently I haven't explained the condition very clearly. I hope this works: if the value in column product is duplicated from a previous row (this applies to both response == "buy" and response == "sample"), and it is duplicated from a row whose value in column 'response' is "buy", then the value = 1; otherwise it is 0.
So in that case, if the value is duplicated but only from a previous row where response == "sample", then it is not considered duplicated, and the new column is 0.

Thank you very much in advance, I really appreciate it.

On Sat, Jul 27, 2013 at 3:45 AM, arun <smartpink...@yahoo.com> wrote:
>
>On some slightly different datasets:
>tt1<-structure(list(subj = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5,
>6, 6, 6, 7, 7, 7, 8, 8, 8), response = structure(c(2L, 2L, 1L,
>2L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 1L,
>1L, 2L, 1L), .Label = c("buy", "sample"), class = "factor"),
>    product = c(1, 2, 3, 2, 2, 3, 2, 1, 1, 4, 4, 2, 2, 4, 5,
>    5, 4, 3, 4, 5, 4, 2)), .Names = c("subj", "response", "product"
>), class = "data.frame", row.names = c(NA, 22L))
>
>tt2<- structure(list(subj = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5,
>6, 6, 6, 7, 7, 7, 8, 8, 8), response = structure(c(2L, 2L, 1L,
>2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L,
>1L, 2L, 2L), .Label = c("buy", "sample"), class = "factor"),
>    product = c(1, 2, 3, 2, 2, 3, 2, 1, 1, 4, 1, 4, 5, 1, 4,
>    2, 3, 3, 2, 5, 3, 4)), .Names = c("subj", "response", "product"
>), class = "data.frame", row.names = c(NA, 22L))
>
>tt3<- structure(list(subj = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5,
>6, 6, 6, 7, 7, 7, 8, 8, 8), response = structure(c(2L, 2L, 1L,
>2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 1L,
>1L, 1L, 2L), .Label = c("buy", "sample"), class = "factor"),
>    product = c(1, 2, 3, 2, 2, 3, 2, 1, 1, 4, 1, 1, 3, 5, 2,
>    2, 2, 2, 4, 3, 2, 5)), .Names = c("subj", "response", "product"
>), class = "data.frame", row.names = c(NA, 22L))
>
>
>#Tried David's solution:
>tt1$rown <- rownames(tt1)
>as.numeric ( apply(tt1, 1, function(x) {
>   x['product'] %in% tt1[ rownames(tt1) < x['rown'] & tt1$response == "buy",
>"product"] } ) )
>  #gave inconsistent results especially since the first 10 rows were from `tt`
># [1] 0 1 1 1 1 1 1 0 1 0 1 0 0 1 0 0 1 0 1 0 1 1
>
>#similarly
>#for `tt2` and `tt3`.
>
>
>##Created this function. It seems to work in the tested cases, though it is
>not tested extensively.
>fun1<- function(dat,colName,newColumn){
>  indx<- which(dat[,colName]=="buy")
>  dat[,newColumn]<-0
>  dat[unlist(lapply(seq_along(indx),function(i){
>    x1<- if(i==length(indx)){
>      seq(indx[i],nrow(dat))
>    }
>    else if((indx[i+1]-indx[i])==1){
>      indx[i]
>    }
>    else {
>      seq(indx[i]+1,indx[i+1]-1)
>    }
>    x2<- dat[unique(c(indx[i:1],x1)),]
>    x3<- subset(x2,response=="sample")
>    x4<- subset(x2,response=="buy")
>    if(nrow(x3)!=0) {
>      row.names(x3)[x3$product%in% x4$product]
>    }
>
>  })),newColumn]<-1
>  dat
>
>  }
>fun1(tt,"response","newCol")
>#   subj response product rown newCol
>#1     1   sample       1    1      0
>#2     1   sample       2    2      0
>#3     1      buy       3    3      0
>#4     2   sample       2    4      0
>#5     2      buy       2    5      0
>#6     3   sample       3    6      1
>#7     3   sample       2    7      1
>#8     3      buy       1    8      0
>#9     4   sample       1    9      1
>#10    4      buy       4   10      0
>
>fun1(tt1,"response","newCol")
>#   subj response product newCol
>#1     1   sample       1      0
>#2     1   sample       2      0
>#3     1      buy       3      0
>#4     2   sample       2      0
>#5     2      buy       2      0
>#6     3   sample       3      1
>#7     3   sample       2      1
>#8     3      buy       1      0
>#9     4   sample       1      1
>#10    4      buy       4      0
>#11    5      buy       4      0
>#12    5   sample       2      1
>#13    5      buy       2      0
>#14    6      buy       4      0
>#15    6   sample       5      0
>#16    6   sample       5      0
>#17    7   sample       4      1
>#18    7      buy       3      0
>#19    7      buy       4      0
>#20    8      buy       5      0
>#21    8   sample       4      1
>#22    8      buy       2      0
>#Also
>  fun1(tt2,"response","newCol")
>  fun1(tt3,"response","newCol")
>A.K.
>
>P.S. Below is OP's clarification regarding the conditional statement in a
>private message:
>
>I am sorry I didn't ask the question very clearly; let me rephrase the
>conditional statement. I hope you can understand. I will explain by
>example.
>
>As you can see, almost every number is duplicated, but only in the 6th, 7th,
>and 9th rows is the value in the new column 1.
>
>In the 4th row, the value is duplicated (2 already occurred in the 2nd row),
>but since a value is considered duplicated only if it duplicates a row where
>the response is 'buy', the value in the new column in the 4th row is still zero.
>
>In the 6th row, the value in the product column is 3. 3 already occurred
>in the 3rd row, where the value of response is 'buy', so the value in the
>new column should be 1.
>
>I hope you can understand the conditional statement.
>
>----- Original Message -----
>From: David Winsemius <dwinsem...@comcast.net>
>To: David Winsemius <dwinsem...@comcast.net>
>Cc: R-help@r-project.org; Uwe Ligges <lig...@statistik.tu-dortmund.de>
>Sent: Friday, July 26, 2013 5:16 PM
>Subject: Re: [R] Duplicated function with conditional statement
>
>
>On Jul 26, 2013, at 2:06 PM, David Winsemius wrote:
>
>>
>> On Jul 26, 2013, at 11:51 AM, Uwe Ligges wrote:
>>
>>>
>>>
>>> On 25.07.2013 21:05, vanessa van der vaart wrote:
>>>> Hi everybody,
>>>> I have a question about the R function duplicated(). I have spent days trying
>>>> to figure this out, but I can't find any solution yet. I hope somebody can
>>>> help me.
>>>> This is my data:
>>>>
>>>> subj=c(1,1,1,2,2,3,3,3,4,4)
>>>> response=c('sample','sample','buy','sample','buy','sample',
>>>> 'sample','buy','sample','buy')
>>>> product=c(1,2,3,2,2,3,2,1,1,4)
>>>> tt=data.frame(subj, response, product)
>>>>
>>>> The data look like this:
>>>>
>>>>    subj response product
>>>> 1     1   sample       1
>>>> 2     1   sample       2
>>>> 3     1      buy       3
>>>> 4     2   sample       2
>>>> 5     2      buy       2
>>>> 6     3   sample       3
>>>> 7     3   sample       2
>>>> 8     3      buy       1
>>>> 9     4   sample       1
>>>> 10    4      buy       4
>>>>
>>>> I want to create a new column based on the values in the response and
>>>> product columns. If the value in product is duplicated, then the value in
>>>> the new column is 1, otherwise it is 0.
>>>
>>>
>>> According to your description:
>>>
>>
>> Agree that the description did not match the output.
>> I tried to match the output using a rule that could be expressed as:
>>
>> if( a "buy"- associated "product" value precedes the current "product"
>> value){1}else{0}
>>
>
>So this delivers the specified output:
>
>tt$rown <- rownames(tt)
>as.numeric ( apply(tt, 1, function(x) {
>   x['product'] %in% tt[ rownames(tt) < x['rown'] & tt$response == "buy",
>"product"] } ) )
>
># [1] 0 0 0 0 0 1 1 0 1 0
>
>> --
>> David.
>>
>>> tt$newcolumn <- as.integer(duplicated(tt$product) & tt$response=="buy")
>>>
>>> which is different from what you show us below, from which I cannot derive
>>> any systematic rule.
>>>
>>> Uwe Ligges
>>>
>>>> But I want to add a conditional statement: the value in the product column
>>>> will only be considered duplicated if the value in the response column is
>>>> 'buy'.
>>>> For illustration, the table should look like this:
>>>>
>>>>    subj response product newcolumn
>>>> 1     1   sample       1         0
>>>> 2     1   sample       2         0
>>>> 3     1      buy       3         0
>>>> 4     2   sample       2         0
>>>> 5     2      buy       2         0
>>>> 6     3   sample       3         1
>>>> 7     3   sample       2         1
>>>> 8     3      buy       1         0
>>>> 9     4   sample       1         1
>>>> 10    4      buy       4         0
>>>>
>>>>
>>>> Can somebody help me?
>>>> Any help will be appreciated.
>>>> I am new to this mailing list, so forgive me in advance if I did not ask
>>>> the question appropriately.
>>>>
>>>> ______________________________________________
>>>> R-help@r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>
>>
>> David Winsemius
>> Alameda, CA, USA
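For comparison, here is a minimal vectorized sketch of the rule that David's output for `tt` and the `tt1` output at the top of this reply both follow: a row gets 1 when its product already appeared in an earlier row whose response is "buy", and 0 otherwise. `flag_prior_buy` is a hypothetical helper name, and the sketch assumes the rows are already in time order. The last lines also show why David's apply() version gave inconsistent results on `tt1`: `rownames(tt1) < x['rown']` compares character strings, so for example "10" < "9" is TRUE; comparing integer row positions instead gives the same flags as the helper.

```r
# Hypothetical helper (not from the thread): 1 if this row's product already
# appeared in an EARLIER row whose response is "buy", otherwise 0.
# Assumes the rows are in time order.
flag_prior_buy <- function(dat) {
  buy_rows <- which(dat$response == "buy")  # row numbers of the "buy" rows
  # match() returns the first hit, i.e. the earliest "buy" row for each product
  # (NA when the product was never bought)
  first_buy <- buy_rows[match(dat$product, dat$product[buy_rows])]
  as.integer(!is.na(first_buy) & seq_len(nrow(dat)) > first_buy)
}

# Data from the original question
tt <- data.frame(
  subj     = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4),
  response = c("sample", "sample", "buy", "sample", "buy",
               "sample", "sample", "buy", "sample", "buy"),
  product  = c(1, 2, 3, 2, 2, 3, 2, 1, 1, 4)
)
tt$newcolumn <- flag_prior_buy(tt)
tt$newcolumn
# [1] 0 0 0 0 0 1 1 0 1 0

# David's rule redone with integer row positions; comparing character
# row names instead makes "10" < "9" TRUE
rown <- seq_len(nrow(tt))
david_fixed <- as.integer(sapply(rown, function(i)
  tt$product[i] %in% tt$product[rown < i & tt$response == "buy"]))
stopifnot(identical(david_fixed, tt$newcolumn))
```

Because match() returns the first matching position, `first_buy` holds the earliest "buy" row for each product, so a row is flagged exactly when that earliest "buy" comes strictly before it; products that were never bought get NA and end up as 0.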