Hi Matthew, thanks for your help. A few things are still going wrong. Consider this (slightly extended) example:

library(data.table)
DT = data.table(read.table(textConnection("   A B    C
1  1 a 1999
2  1 b 1999
3  1 c 1999
4  1 d 1999
5  2 c 2001
6  2 d 2001
7  3 a 2004
8  3 b 2004
9  3 d 2004
10 4 c 2001
11 4 d 2001"),head=TRUE,stringsAsFactors=FALSE))

firststep = DT[,cbind(A,expand.grid(B,B),v=1/length(B)),by=C][Var1!=Var2]
firststep
      C A Var1 Var2         v
1  1999 1    b    a 0.2500000
2  1999 1    c    a 0.2500000
3  1999 1    d    a 0.2500000
4  1999 1    a    b 0.2500000
5  1999 1    c    b 0.2500000
6  1999 1    d    b 0.2500000
7  1999 1    a    c 0.2500000
8  1999 1    b    c 0.2500000
9  1999 1    d    c 0.2500000
10 1999 1    a    d 0.2500000
11 1999 1    b    d 0.2500000
12 1999 1    c    d 0.2500000
13 2001 2    b    a 0.2500000
14 2001 4    b    a 0.2500000
15 2001 2    a    b 0.2500000
16 2001 4    a    b 0.2500000
17 2001 2    b    a 0.2500000
18 2001 4    b    a 0.2500000
19 2001 2    a    b 0.2500000
20 2001 4    a    b 0.2500000
21 2004 3    b    a 0.3333333
22 2004 3    c    a 0.3333333
23 2004 3    a    b 0.3333333
24 2004 3    c    b 0.3333333
25 2004 3    a    c 0.3333333
26 2004 3    b    c 0.3333333

According to "firststep", projects 2 and 4 involved individuals a and b, while actually c and d were involved. Something seems to be going wrong when the data are transformed. The final result is then a list of years and sums of v, rather than a list of projects and sums of v.

I probably haven't been clear enough: I want to produce a list of all projects with, for each project, the familiarity of all its members right before the start of the project.

Example:
project_id familiarity
4          0.25

Members c and d were jointly involved in 3 projects: 1, 2 and 4. Project 4 took place in 2001, so only project 1 (1999) took place before that; project 2 took place in the same year and is therefore not included. The average familiarity between the members in project 1 was 1/4, so:

project_id familiarity
4          0.25

Thanks!


Matthew Dowle wrote:
>
>
> Thanks for the attempt and required output. How about this?
>
> firststep = DT[,cbind(expand.grid(B,B),v=1/length(B)),by=C][Var1!=Var2]
> setkey(firststep,Var1,Var2,C)
> firststep = firststep[,transform(.SD,cv=cumsum(v)),by=list(Var1,Var2)]
> setkey(firststep,Var1,Var2,C)
> DT[, {x=data.table(expand.grid(B,B),C[1]-1L)
>       firststep[x,roll=TRUE,nomatch=0][,sum(cv)] # prior familiarity
>      },by=C]
>         C  V1
> [1,] 1999 0.0
> [2,] 2001 0.5
> [3,] 2004 2.5
>
> I think you may have said you have large data. If so, this
> method should be fast. Please let us know how you get on.
>
> HTH
> Matthew
>
>
>
> On Thu, 17 Feb 2011 23:07:19 -0800, mathijsdevaan wrote:
>
>> OK, for the last step I have tried this (among other things):
>> library(data.table)
>> DT = data.table(read.table(textConnection(" A B C
>> 1 1 a 1999
>> 2 1 b 1999
>> 3 1 c 1999
>> 4 1 d 1999
>> 5 2 c 2001
>> 6 2 d 2001
>> 7 3 a 2004
>> 8 3 b 2004
>> 9 3 d 2004"),head=TRUE,stringsAsFactors=FALSE))
>>
>> firststep = DT[,cbind(expand.grid(B,B),v=1/length(B)),by=C][Var1!=Var2]
>> setkey(firststep,Var1,Var2)
>> list1<-firststep[J(expand.grid(DT$B,DT$B),v=1/length(DT$B)),nomatch=0][,sum(v)]
>> list1
>> #27
>>
>> What I would like to get:
>> list
>> 1 0
>> 2 0.5
>> 3 2.5
>>
>> Thanks!
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>

--
View this message in context: http://r.789695.n4.nabble.com/Re-Transforming-relational-data-tp3307449p3318939.html