Re: [R] Duplicated function with conditional statement

2013-07-28 Thread vanessa van der vaart
Dear Arun,

Thank you, it's perfect! Wow, thank you very much. And David, thank you
too; it's such a help. I am so sorry, it must have been confusing at the
beginning.
Really, I don't know how to thank you.

Do you mind if I ask how you became such an expert? What kind of book or
training did you have, and how long have you been working with R?
I am really interested in R.



Re: [R] Duplicated function with conditional statement

2013-07-27 Thread arun
Dear Vanessa,
Glad to know that it works.
Sorry, I misunderstood your question initially because there were no
duplicates for "product" among the response=="buy" rows in your initial
dataset (tt).
Regarding the code, here is what I did in brief:
1. Find the rows with response=="buy" and create the new column:
indx<- which(dat[,colName]=="buy")  #in fun1()
dat[,newColumn]<-0 #creates a new column of 0's
2. Loop over the elements of `indx` using lapply().
3. Check some conditions to build x1:
  a. if(i==length(indx)) #true for the last element of indx, i.e. the last
     #row with response=="buy"
    seq(indx[i], nrow(dat)) #the sequence from the last indx to the last
     #row of the data frame
   #for example:

indx<-which(tt1$response=="buy")
indx
# [1]  3  5  8 10 11 13 14 18 19 20 22
nrow(tt1)
#[1] 22
seq(indx[length(indx)],nrow(tt1))
#[1] 22
#this could change depending upon the two values.
seq(20,22) #if the last row with response=="buy" were the 20th row
#[1] 20 21 22

  b. The second condition applies when you have consecutive "buy" rows:
else if((indx[i+1]-indx[i])==1){
indx
# [1]  3  5  8 10 11 13 14 18 19 20 22
indx[5]-indx[4] # or
indx[7]-indx[6] # or
indx[9]-indx[8] # etc.
#each of these differences is 1; in that case I want the indx[i] value itself

  c. In all other cases, take the rows between two consecutive indx values,
     e.g. for indx[1] and indx[2]:
seq(indx[1]+1, indx[1+1]-1)
#[1] 4
4. x2<- dat[unique(c(indx[i:1],x1)),] #this was a bug in the function which
troubled me. It should be
x2<- dat[unique(c(indx[1:i],x1)),] #this is what I was looking for.
The bug created a problem, which I fixed using x4New (step 6).
x2 gives me all the response=="buy" rows from the first one up to the current
one (according to indx), plus the rows that are between two indx values.
For indx[1], that is row 4 because indx[2] is 5.
Likewise for indx[2], it is
seq(indx[2]+1, indx[2+1]-1)
#[1] 6 7

5. Subset `x2` into x3 and x4, which have response=="sample" and
response=="buy" respectively.
6. x4New<- #sorts x4 by row number. It was needed because of my previous
mistake in step 4; it is still kept as an additional check.
7. x5<- #the row names of the rows in x4New with a duplicated product
8. x6<- #a condition is used here because some list elements have 0 rows for
x3. I guess it occurs when you have consecutive "buy" rows.
9. sort(as.numeric(c(x5,x6))) #concatenate and sort these row numbers
10. unique(unlist( #unlist the list and keep only the unique elements
11. dat[unique(unlist(...)),newColumn]<-1 #assign 1 in newColumn to the rows
that fit the condition.
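
To see the three branches of step 3 in isolation, here is a minimal sketch. It rebuilds tt1 with data.frame() for brevity (response is a character column here, not a factor as in the dput() output), and x1_for() is a hypothetical helper name, not part of fun1().

```r
# tt1 as posted in this thread, rebuilt with data.frame() for brevity
tt1 <- data.frame(
  subj = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5, 6, 6, 6, 7, 7, 7, 8, 8, 8),
  response = c("sample", "sample", "buy", "sample", "buy", "sample", "sample",
               "buy", "sample", "buy", "buy", "sample", "buy", "buy", "sample",
               "sample", "sample", "buy", "buy", "buy", "sample", "buy"),
  product = c(1, 2, 3, 2, 2, 3, 2, 1, 1, 4, 4, 2, 2, 4, 5, 5, 4, 3, 4, 5, 4, 2)
)

indx <- which(tt1$response == "buy")

# Hypothetical helper: the x1 part of fun1(), pulled out on its own
x1_for <- function(i, indx, n) {
  if (i == length(indx)) {
    seq(indx[i], n)                     # 3a: last "buy" row to the last row
  } else if (indx[i + 1] - indx[i] == 1) {
    indx[i]                             # 3b: the next row is also "buy"
  } else {
    seq(indx[i] + 1, indx[i + 1] - 1)   # 3c: rows between two "buy" rows
  }
}

x1_list <- lapply(seq_along(indx), x1_for, indx = indx, n = nrow(tt1))
names(x1_list) <- indx   # label each element by its "buy" row number

x1_list[["3"]]    # 4
x1_list[["5"]]    # 6 7
x1_list[["10"]]   # 10
x1_list[["22"]]   # 22
```

Each element of x1_list is what fun1() combines with the earlier "buy" rows to form x2 in step 4.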

Hope it helps.
Regards,
A.K.

 














____
From: vanessa van der vaart 
To: arun  
Sent: Saturday, July 27, 2013 11:07 PM
Subject: Re: [R] Duplicated function with conditional statement



Dear Arun,

Thank you very much. The code really works.

I was wondering if you could explain how the code works.
I am really interested in R, and I really want to master it.

I would really appreciate it, but if you think this is too much to ask,
please just ignore it.

Thank you very much in advance,
Best regards,
Vanessa




Re: [R] Duplicated function with conditional statement

2013-07-27 Thread arun
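If you wanted to wrap it in a function:


fun1 <- function(dat, colName, newColumn) {
  indx <- which(dat[, colName] == "buy")   # rows with response == "buy"
  dat[, newColumn] <- 0
  dat[unique(unlist(lapply(seq_along(indx), function(i) {
    # x1: the rows to examine for the i-th "buy" row
    x1 <- if (i == length(indx)) {
      seq(indx[i], nrow(dat))
    } else if ((indx[i + 1] - indx[i]) == 1) {
      indx[i]
    } else {
      seq(indx[i] + 1, indx[i + 1] - 1)
    }
    # x2: all "buy" rows up to the i-th one, plus the rows in x1
    # (subset() below refers to the column `response` directly, not colName)
    x2 <- dat[unique(c(indx[i:1], x1)), ]
    x3 <- subset(x2, response == "sample")
    x4 <- subset(x2, response == "buy")
    # indx[i:1] lists the "buy" rows in reverse, so restore the row order
    x4New <- x4[order(as.numeric(row.names(x4))), ]
    # x5: "buy" rows whose product was already bought in an earlier row
    x5 <- row.names(x4New)[duplicated(x4New$product)]
    # x6: "sample" rows whose product appears among the "buy" rows in x2
    x6 <- if (nrow(x3) != 0) {
      row.names(x3)[x3$product %in% x4$product]
    }
    sort(as.numeric(c(x5, x6)))
  }))), newColumn] <- 1
  dat
}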


fun1(tt1,"response","newCol")
#   subj response product newCol
#1     1   sample       1      0
#2     1   sample       2      0
#3     1      buy       3      0
#4     2   sample       2      0
#5     2      buy       2      0
#6     3   sample       3      1
#7     3   sample       2      1
#8     3      buy       1      0
#9     4   sample       1      1
#10    4      buy       4      0
#11    5      buy       4      1
#12    5   sample       2      1
#13    5      buy       2      1
#14    6      buy       4      1
#15    6   sample       5      0
#16    6   sample       5      0
#17    7   sample       4      1
#18    7      buy       3      1
#19    7      buy       4      1
#20    8      buy       5      0
#21    8   sample       4      1
#22    8      buy       2      1
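
For comparison, the same 0/1 column for tt1 can be reproduced with a shorter vectorized rule: flag a row when its product was bought in an earlier row. This is only an alternative sketch, not the method used in fun1(). The name flag_prior_buy() is hypothetical, and it has been checked only against the tt and tt1 outputs posted in this thread.

```r
# tt1 as posted in this thread, rebuilt with data.frame() for brevity
tt1 <- data.frame(
  subj = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5, 6, 6, 6, 7, 7, 7, 8, 8, 8),
  response = c("sample", "sample", "buy", "sample", "buy", "sample", "sample",
               "buy", "sample", "buy", "buy", "sample", "buy", "buy", "sample",
               "sample", "sample", "buy", "buy", "buy", "sample", "buy"),
  product = c(1, 2, 3, 2, 2, 3, 2, 1, 1, 4, 4, 2, 2, 4, 5, 5, 4, 3, 4, 5, 4, 2)
)

# Hypothetical helper: 1 if the row's product was bought in an earlier row
flag_prior_buy <- function(dat) {
  buy_rows <- which(dat$response == "buy")
  # match() returns the first matching position, so first_buy is the earliest
  # "buy" row for each row's product (NA when the product is never bought)
  first_buy <- buy_rows[match(dat$product, dat$product[buy_rows])]
  as.numeric(!is.na(first_buy) & first_buy < seq_len(nrow(dat)))
}

flag_prior_buy(tt1)
# [1] 0 0 0 0 0 1 1 0 1 0 1 1 1 1 0 0 1 1 1 0 1 1
```

The comparison is strict, so a "buy" row does not count as a duplicate of itself.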

A.K.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Duplicated function with conditional statement

2013-07-27 Thread arun
Hi,
Maybe this is what you wanted.
#using tt1
indx <- which(tt1$response == "buy")
tt1$newcolumn <- 0
tt1[unique(unlist(lapply(seq_along(indx), function(i) {
  x1 <- if (i == length(indx)) {
    seq(indx[i], nrow(tt1))
  } else if ((indx[i + 1] - indx[i]) == 1) {
    indx[i]
  } else {
    seq(indx[i] + 1, indx[i + 1] - 1)
  }
  x2 <- tt1[unique(c(indx[1:i], x1)), ]
  x3 <- subset(x2, response == "sample")
  x4 <- subset(x2, response == "buy")
  x5 <- row.names(x4)[duplicated(x4$product)]
  x6 <- if (nrow(x3) != 0) row.names(x3)[x3$product %in% x4$product]
  sort(c(x5, x6))
}))), "newcolumn"] <- 1


 tt1
   subj response product newcolumn
1     1   sample       1         0
2     1   sample       2         0
3     1      buy       3         0
4     2   sample       2         0
5     2      buy       2         0
6     3   sample       3         1
7     3   sample       2         1
8     3      buy       1         0
9     4   sample       1         1
10    4      buy       4         0
11    5      buy       4         1
12    5   sample       2         1
13    5      buy       2         1
14    6      buy       4         1
15    6   sample       5         0
16    6   sample       5         0
17    7   sample       4         1
18    7      buy       3         1
19    7      buy       4         1
20    8      buy       5         0
21    8   sample       4         1
22    8      buy       2         1
A.K.






From: vanessa van der vaart 
To: arun  
Cc: David Winsemius ; R help  
Sent: Saturday, July 27, 2013 6:55 PM
Subject: Re: [R] Duplicated function with conditional statement



Dear all,
Thank you all for your help. It has been a great help, but it is not exactly
what I am looking for. Apparently I haven't explained the condition very
clearly. I hope this works.

If the value in column 'product' duplicates a value from an earlier row (this
applies to rows with response==buy as well as response==sample), and that
earlier row has response==buy, then the new value is 1; otherwise it is 0.
So if the value is duplicated, but only from an earlier row where
response==sample, then it is not considered duplicated, and the new column
is 0.

Thank you very much in advance,
I really appreciate it.



Re: [R] Duplicated function with conditional statement

2013-07-27 Thread vanessa van der vaart
> P.S.  Below is OP's clarification regarding the conditional statement in a
> private message:
>
> I am sorry i didnt question it very clearly, let me change the
> conditional statement, I hope you can understand. i will explain by
> example
>
> as you can see, almost every number is duplicated, but only in row
> 6th,7th,and 9th the value on column is 1.
>
> on row4th, the value is duplicated( 2 already occurred on 2nd row),but
> since the value is considered as duplicated only if the value is
> duplicated where the response is 'buy' than the value on column, on
> row4th still zero.
>
> On row 6th, where the value product column is 3. 3 is already occurred
> in 3rd row where the value on response is 'buy', so the value on column
> should be 1
>
> I hope it can understand the conditional statement.
Re: [R] Duplicated function with conditional statement

2013-07-26 Thread arun


On some slightly different datasets:
tt1<-structure(list(subj = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5, 
6, 6, 6, 7, 7, 7, 8, 8, 8), response = structure(c(2L, 2L, 1L, 
2L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 
1L, 2L, 1L), .Label = c("buy", "sample"), class = "factor"), 
    product = c(1, 2, 3, 2, 2, 3, 2, 1, 1, 4, 4, 2, 2, 4, 5, 
    5, 4, 3, 4, 5, 4, 2)), .Names = c("subj", "response", "product"
), class = "data.frame", row.names = c(NA, 22L))

tt2<- structure(list(subj = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5, 
6, 6, 6, 7, 7, 7, 8, 8, 8), response = structure(c(2L, 2L, 1L, 
2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 
1L, 2L, 2L), .Label = c("buy", "sample"), class = "factor"), 
    product = c(1, 2, 3, 2, 2, 3, 2, 1, 1, 4, 1, 4, 5, 1, 4, 
    2, 3, 3, 2, 5, 3, 4)), .Names = c("subj", "response", "product"
), class = "data.frame", row.names = c(NA, 22L))

tt3<- structure(list(subj = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5, 
6, 6, 6, 7, 7, 7, 8, 8, 8), response = structure(c(2L, 2L, 1L, 
2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 1L, 
1L, 1L, 2L), .Label = c("buy", "sample"), class = "factor"), 
    product = c(1, 2, 3, 2, 2, 3, 2, 1, 1, 4, 1, 1, 3, 5, 2, 
    2, 2, 2, 4, 3, 2, 5)), .Names = c("subj", "response", "product"
), class = "data.frame", row.names = c(NA, 22L))


#Tried David's solution:
tt1$rown <- rownames(tt1)
as.numeric ( apply(tt1, 1, function(x) {
    x['product'] %in% tt1[ rownames(tt1) < x['rown'] & tt1$response == "buy", 
"product"]  } ) )
  #gave inconsistent results especially since the first 10 rows were from `tt`
# [1] 0 1 1 1 1 1 1 0 1 0 1 0 0 1 0 0 1 0 1 0 1 1

#similarly for `tt2` and `tt3`.
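
One likely reason for the inconsistency (my reading of the code, not something David confirmed in this thread): rownames() returns character strings, so `rownames(tt1) < x['rown']` compares text rather than numbers. With more than 9 rows, "10" to "19" sort before "2". A minimal sketch, assuming the usual collation where digits compare character by character:

```r
rn <- as.character(1:22)   # the row names of a 22-row data frame are strings

"10" < "2"                 # TRUE: compared as text, "1" sorts before "2"

# Rows that count as "before row 2" under each kind of comparison
before_text <- rn[rn < "2"]              # "1" and "10" through "19"
before_num  <- rn[as.numeric(rn) < 2]    # only "1"

length(before_text)   # 11
before_num            # "1"
```

In tt1, rows 10 to 19 include "buy" rows for products 2, 3 and 4, which would explain why rows 2 and 3 are flagged in the output above. With only 10 rows, as in `tt`, the problem does not show up in the result.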


##Created this function.  It seems to work in the tested cases, though it is
##not tested extensively.
fun1 <- function(dat, colName, newColumn) {
  indx <- which(dat[, colName] == "buy")
  dat[, newColumn] <- 0
  dat[unlist(lapply(seq_along(indx), function(i) {
    x1 <- if (i == length(indx)) {
      seq(indx[i], nrow(dat))
    } else if ((indx[i + 1] - indx[i]) == 1) {
      indx[i]
    } else {
      seq(indx[i] + 1, indx[i + 1] - 1)
    }
    x2 <- dat[unique(c(indx[i:1], x1)), ]
    x3 <- subset(x2, response == "sample")
    x4 <- subset(x2, response == "buy")
    if (nrow(x3) != 0) {
      row.names(x3)[x3$product %in% x4$product]
    }
  })), newColumn] <- 1
  dat
}
fun1(tt,"response","newCol")
#   subj response product rown newCol
#1     1   sample       1    1      0
#2     1   sample       2    2      0
#3     1      buy       3    3      0
#4     2   sample       2    4      0
#5     2      buy       2    5      0
#6     3   sample       3    6      1
#7     3   sample       2    7      1
#8     3      buy       1    8      0
#9     4   sample       1    9      1
#10    4      buy       4   10      0

fun1(tt1,"response","newCol")
#   subj response product newCol
#1     1   sample       1      0
#2     1   sample       2      0
#3     1      buy       3      0
#4     2   sample       2      0
#5     2      buy       2      0
#6     3   sample       3      1
#7     3   sample       2      1
#8     3      buy       1      0
#9     4   sample       1      1
#10    4      buy       4      0
#11    5      buy       4      0
#12    5   sample       2      1
#13    5      buy       2      0
#14    6      buy       4      0
#15    6   sample       5      0
#16    6   sample       5      0
#17    7   sample       4      1
#18    7      buy       3      0
#19    7      buy       4      0
#20    8      buy       5      0
#21    8   sample       4      1
#22    8      buy       2      0
#Also
 fun1(tt2,"response","newCol")
 fun1(tt3,"response","newCol")
A.K.

P.S.  Below is the OP's clarification regarding the conditional statement,
from a private message:

I am sorry I didn't ask the question very clearly. Let me restate the
conditional statement; I hope you can understand. I will explain by
example.

As you can see, almost every number is duplicated, but only in the 6th, 7th
and 9th rows is the value in the new column 1.

In the 4th row, the value is duplicated (2 already occurred in the 2nd row),
but since a value is considered duplicated only if it duplicates a value
from a row where the response is 'buy', the value in the new column for the
4th row is still zero.

In the 6th row, the value in the product column is 3. 3 already occurred
in the 3rd row, where the response is 'buy', so the value in the new column
should be 1.

I hope the conditional statement is understandable.

Re: [R] Duplicated function with conditional statement

2013-07-26 Thread David Winsemius

On Jul 26, 2013, at 2:06 PM, David Winsemius wrote:

>
> On Jul 26, 2013, at 11:51 AM, Uwe Ligges wrote:
>
>>
>>
>> On 25.07.2013 21:05, vanessa van der vaart wrote:
>>> Hi everybody,,
>>> I have a question about R function duplicated(). I have spent days try to
>>> figure this out,but I cant find any solution yet. I hope somebody can help
>>> me..
>>> this is my data:
>>>
>>> subj=c(1,1,1,2,2,3,3,3,4,4)
>>> response=c('sample','sample','buy','sample','buy','sample',
>>>            'sample','buy','sample','buy')
>>> product=c(1,2,3,2,2,3,2,1,1,4)
>>> tt=data.frame(subj, response, product)
>>>
>>> the data look like this:
>>>
>>>    subj response product
>>> 1     1   sample       1
>>> 2     1   sample       2
>>> 3     1      buy       3
>>> 4     2   sample       2
>>> 5     2      buy       2
>>> 6     3   sample       3
>>> 7     3   sample       2
>>> 8     3      buy       1
>>> 9     4   sample       1
>>> 10    4      buy       4
>>>
>>> I want to create new  column based on the value on response and product
>>> column. if the value on product is duplicated, then  the value on new column
>>> is 1, otherwise is 0.
>>
>>
>> According to your description:
>>
>
> Agree that the description did not match the output. I tried to match the
> output using a rule that could be expressed as:
>
> if( a "buy"- associated "product" value precedes the current "product"
> value){1}else{0}
>

So this delivers the specified output:

tt$rown <- rownames(tt)
as.numeric ( apply(tt, 1, function(x) { 
 x['product'] %in% tt[ rownames(tt) < x['rown'] & tt$response == "buy", 
"product"]  } ) )

# [1] 0 0 0 0 0 1 1 0 1 0

> --
> David.
>
>> tt$newcolumn <- as.integer(duplicated(tt$product) & tt$response=="buy")
>>
>> which is different from what you show us below, where I cannot derive any
>> systematic rule from.
>>
>> Uwe Ligges
>>
>>> but I want to add conditional statement that the value on product column
>>> will only be considered as duplicated if the value on response column is
>>> 'buy'.
>>> for illustration, the table should look like this:
>>>
>>>    subj response product newcolumn
>>> 1     1   sample       1         0
>>> 2     1   sample       2         0
>>> 3     1      buy       3         0
>>> 4     2   sample       2         0
>>> 5     2      buy       2         0
>>> 6     3   sample       3         1
>>> 7     3   sample       2         1
>>> 8     3      buy       1         0
>>> 9     4   sample       1         1
>>> 10    4      buy       4         0
>>>
>>>
>>> can somebody help me?
>>> any help will be appreciated.
>>> I am new in this mailing list, so forgive me in advance, If I did not  ask
>>> the question appropriately.

David Winsemius
Alameda, CA, USA



Re: [R] Duplicated function with conditional statement

2013-07-26 Thread David Winsemius

On Jul 26, 2013, at 11:51 AM, Uwe Ligges wrote:

>
>
> On 25.07.2013 21:05, vanessa van der vaart wrote:
>> Hi everybody,,
>> I have a question about R function duplicated(). I have spent days try to
>> figure this out,but I cant find any solution yet. I hope somebody can help
>> me..
>> this is my data:
>>
>> subj=c(1,1,1,2,2,3,3,3,4,4)
>> response=c('sample','sample','buy','sample','buy','sample',
>>            'sample','buy','sample','buy')
>> product=c(1,2,3,2,2,3,2,1,1,4)
>> tt=data.frame(subj, response, product)
>>
>> the data look like this:
>>
>>    subj response product
>> 1     1   sample       1
>> 2     1   sample       2
>> 3     1      buy       3
>> 4     2   sample       2
>> 5     2      buy       2
>> 6     3   sample       3
>> 7     3   sample       2
>> 8     3      buy       1
>> 9     4   sample       1
>> 10    4      buy       4
>>
>> I want to create new  column based on the value on response and product
>> column. if the value on product is duplicated, then  the value on new column
>> is 1, otherwise is 0.
>
>
> According to your description:
>

Agree that the description did not match the output. I tried to match the 
output using a rule that could be expressed as: 

 if( a "buy"- associated "product" value precedes the current "product" 
value){1}else{0}
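
A minimal sketch of that rule written with numeric row positions, assuming "precedes" means "occurs in any earlier row". The function name prior_buy() is hypothetical, and `tt` is rebuilt here from the original post.

```r
# tt from the original post
tt <- data.frame(
  subj = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4),
  response = c("sample", "sample", "buy", "sample", "buy",
               "sample", "sample", "buy", "sample", "buy"),
  product = c(1, 2, 3, 2, 2, 3, 2, 1, 1, 4)
)

# Hypothetical helper: for each row, 1 if an earlier "buy" row has the
# same product value, otherwise 0
prior_buy <- function(dat) {
  is_buy <- dat$response == "buy"
  vapply(seq_len(nrow(dat)), function(i) {
    earlier <- seq_len(i - 1)   # rows strictly before row i (empty for i = 1)
    bought_before <- dat$product[earlier][is_buy[earlier]]
    if (dat$product[i] %in% bought_before) 1 else 0
  }, numeric(1))
}

prior_buy(tt)
# [1] 0 0 0 0 0 1 1 0 1 0
```

Working with row positions rather than row names keeps the "earlier row" test numeric, whatever the number of rows.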

-- 
David.

> tt$newcolumn <- as.integer(duplicated(tt$product) & tt$response=="buy")
>
> which is different from what you show us below, where I cannot derive any
> systematic rule from.
>
> Uwe Ligges
>
>> but I want to add conditional statement that the value on product column
>> will only be considered as duplicated if the value on response column is
>> 'buy'.
>> for illustration, the table should look like this:
>>
>>    subj response product newcolumn
>> 1     1   sample       1         0
>> 2     1   sample       2         0
>> 3     1      buy       3         0
>> 4     2   sample       2         0
>> 5     2      buy       2         0
>> 6     3   sample       3         1
>> 7     3   sample       2         1
>> 8     3      buy       1         0
>> 9     4   sample       1         1
>> 10    4      buy       4         0
>>
>>
>> can somebody help me?
>> any help will be appreciated.
>> I am new in this mailing list, so forgive me in advance, If I did not  ask
>> the question appropriately.

David Winsemius
Alameda, CA, USA



Re: [R] Duplicated function with conditional statement

2013-07-26 Thread Uwe Ligges



On 25.07.2013 21:05, vanessa van der vaart wrote:

> Hi everybody,,
> I have a question about R function duplicated(). I have spent days try to
> figure this out,but I cant find any solution yet. I hope somebody can help
> me..
> this is my data:
>
> subj=c(1,1,1,2,2,3,3,3,4,4)
> response=c('sample','sample','buy','sample','buy','sample',
>            'sample','buy','sample','buy')
> product=c(1,2,3,2,2,3,2,1,1,4)
> tt=data.frame(subj, response, product)
>
> the data look like this:
>
>    subj response product
> 1     1   sample       1
> 2     1   sample       2
> 3     1      buy       3
> 4     2   sample       2
> 5     2      buy       2
> 6     3   sample       3
> 7     3   sample       2
> 8     3      buy       1
> 9     4   sample       1
> 10    4      buy       4
>
> I want to create new  column based on the value on response and product
> column. if the value on product is duplicated, then  the value on new column
> is 1, otherwise is 0.



According to your description:

tt$newcolumn <- as.integer(duplicated(tt$product) & tt$response=="buy")

which is different from what you show us below, where I cannot derive 
any systematic rule from.
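
To make the difference concrete, here is a small sketch that runs the one-liner above on `tt` (rebuilt from the original post) and compares it with the newcolumn values the poster listed. The names `uwe` and `wanted` are only for illustration.

```r
# tt from the original post
tt <- data.frame(
  subj = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4),
  response = c("sample", "sample", "buy", "sample", "buy",
               "sample", "sample", "buy", "sample", "buy"),
  product = c(1, 2, 3, 2, 2, 3, 2, 1, 1, 4)
)

# A literal reading of the description: a duplicated product in a "buy" row
uwe <- as.integer(duplicated(tt$product) & tt$response == "buy")
uwe
# [1] 0 0 0 0 1 0 0 1 0 0

# The newcolumn values from the illustration table in the original post
wanted <- c(0, 0, 0, 0, 0, 1, 1, 0, 1, 0)

which(uwe != wanted)   # rows where the two disagree
# [1] 5 6 7 8 9
```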


Uwe Ligges


> but I want to add conditional statement that the value on product column
> will only be considered as duplicated if the value on response column is
> 'buy'.
> for illustration, the table should look like this:
>
>    subj response product newcolumn
> 1     1   sample       1         0
> 2     1   sample       2         0
> 3     1      buy       3         0
> 4     2   sample       2         0
> 5     2      buy       2         0
> 6     3   sample       3         1
> 7     3   sample       2         1
> 8     3      buy       1         0
> 9     4   sample       1         1
> 10    4      buy       4         0
>
>
> can somebody help me?
> any help will be appreciated.
> I am new in this mailing list, so forgive me in advance, If I did not  ask
> the question appropriately.


[R] Duplicated function with conditional statement

2013-07-26 Thread vanessa van der vaart
Hi everybody,
I have a question about the R function duplicated(). I have spent days trying
to figure this out, but I can't find a solution yet. I hope somebody can help
me.
This is my data:

subj=c(1,1,1,2,2,3,3,3,4,4)
response=c('sample','sample','buy','sample','buy','sample',
           'sample','buy','sample','buy')
product=c(1,2,3,2,2,3,2,1,1,4)
tt=data.frame(subj, response, product)

The data look like this:

   subj response product
1     1   sample       1
2     1   sample       2
3     1      buy       3
4     2   sample       2
5     2      buy       2
6     3   sample       3
7     3   sample       2
8     3      buy       1
9     4   sample       1
10    4      buy       4

I want to create a new column based on the values in the response and product
columns. If the value in product is duplicated, then the value in the new
column is 1, otherwise it is 0.
But I want to add a conditional statement: the value in the product column
will only be considered duplicated if the value in the response column is
'buy'.
For illustration, the table should look like this:

   subj response product newcolumn
1     1   sample       1         0
2     1   sample       2         0
3     1      buy       3         0
4     2   sample       2         0
5     2      buy       2         0
6     3   sample       3         1
7     3   sample       2         1
8     3      buy       1         0
9     4   sample       1         1
10    4      buy       4         0


Can somebody help me?
Any help will be appreciated.
I am new to this mailing list, so forgive me in advance if I did not ask
the question appropriately.
