Thanks for helping me out so generously. After reading the vignettes and the other info I still have a question (sorry, I am an R novice):
I am not so much trying to construct a time series (although it comes very close). Rather, for each pair (Bi,Bj) in project (An), I am trying to sum the values of v for (Bi,Bj) where C<focal C. One remark here: some pairs (Bi,Bj) are involved in more than one project per year. Because I cannot see which of these projects was initiated first, I only want to sum the values of v for (Bi,Bj) where C<focal C (versus C=focal C).

So far, I've executed the first step and set the key. I don't think I have to permute the project-people data again, because that's already in firststep. Ideally, I would like to add a column to the firststep data.table containing the sum of v for (Bi,Bj) where C<focal C. Any suggestions?

Thanks in advance!

Best,
Mathijs

>Hello. One (of many) solution might be:
>
>require(data.table)
>DT = data.table(read.table(textConnection(" A B C
>1 1 a 1999
>2 1 b 1999
>3 1 c 1999
>4 1 d 1999
>5 2 c 2001
>6 2 d 2001"),head=TRUE,stringsAsFactors=FALSE))
>firststep = DT[,cbind(expand.grid(B,B),v=1/length(B)),by=C][Var1!=Var2]
>setkey(firststep,Var1,Var2)
>grp3 = c("a","c","d")
>firststep[J(expand.grid(grp3,grp3)),nomatch=0][,sum(v)]
># 2.5
>
>If I guess the bigger picture correctly, this can be extended
>to make a time series of prior familiarity by including
>the year in the key.
>
>If you decide to try this, please make sure to grab the latest
>(recent) version of data.table from CRAN (v1.5.3). Suggest that
>you run it first to confirm it does return 2.5, then break it
>down and run it step by step to see how each part works. You
>will need some time to read the vignettes and ?data.table
>(which has recently been improved) but I hope you think it is
>worth it. Support is available at maintainer("data.table").
>
>HTH
>Matthew
>
>>On Mon, 14 Feb 2011 09:22:12 -0800, mathijsdevaan wrote:
>> Hi,
>>
>> I have a large dataset with info on individuals (B) that have been
>> involved in projects (A) during multiple years (C). The dataset contains
>> three columns: A, B, C.
>> Example:
>>
>>   A B C
>> 1 1 a 1999
>> 2 1 b 1999
>> 3 1 c 1999
>> 4 1 d 1999
>> 5 2 c 2001
>> 6 2 d 2001
>> 7 3 a 2004
>> 8 3 c 2004
>> 9 3 d 2004
>>
>> I am interested in how well all the individuals in a project know each
>> other. To calculate this team familiarity measure I want to sum the
>> familiarity between all individual pairs in a team. The familiarity
>> between each individual pair in a team is calculated as the summation of
>> each pair's prior co-appearance in a project divided by the total number
>> of team members. So the team familiarity in project 3 = (1/4+1/4) +
>> (1/4+1/4+1/2) + (1/4+1/4+1/2) = 2.5. That is: a has been in project 1
>> (of size 4) with c and d, giving 1/4+1/4; c has been in project 1 (of
>> size 4) with a and d, giving 1/4+1/4; and c has been in project 2 (of
>> size 2) with d, giving 1/2.
>>
>> I think that the best way to do it is to transform the data into an
>> edgelist (each pair in one row/two columns) and then create two
>> additional columns for the strength of the familiarity and the year of
>> the project in which the pair was active. The problem is that I am stuck
>> already in the first step. So the question is: how do I go from the
>> current data structure to a list of projects and the familiarity of its
>> team members?
>>
>> Your help is very much appreciated. Thanks!

--
View this message in context: http://r.789695.n4.nabble.com/Re-Transforming-relational-data-tp3307449p3311101.html
Sent from the R help mailing list archive at Nabble.com.
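To make the target of the question at the top of this message concrete, here is a minimal base-R sketch of the extra column (the sum of v per pair over strictly earlier years), using the nine-row example from the original post. It is only an illustration under stated assumptions, not the data.table solution being asked for: it groups by project A rather than by year C (so that two projects in the same year are not merged), and the names pairs_for, prior_v, prior and fam are hypothetical.

```r
# Example data from the original post: projects (A), people (B), years (C).
DT <- data.frame(
  A = c(1, 1, 1, 1, 2, 2, 3, 3, 3),
  B = c("a", "b", "c", "d", "c", "d", "a", "c", "d"),
  C = c(1999, 1999, 1999, 1999, 2001, 2001, 2004, 2004, 2004),
  stringsAsFactors = FALSE
)

# Step 1: one row per ordered pair (Var1, Var2) per project, v = 1 / team size.
# Assumption: grouping is by project A, not by year C.
pairs_for <- function(d) {
  g <- expand.grid(Var1 = d$B, Var2 = d$B, stringsAsFactors = FALSE)
  g <- g[g$Var1 != g$Var2, ]  # drop self-pairs
  data.frame(A = d$A[1], C = d$C[1], g, v = 1 / nrow(d),
             stringsAsFactors = FALSE)
}
firststep <- do.call(rbind, lapply(split(DT, DT$A), pairs_for))
rownames(firststep) <- NULL

# Step 2: for row i, sum v over rows with the same pair and C strictly
# smaller than the focal C (rows with C equal to the focal C are excluded).
prior_v <- function(i) {
  same_pair <- firststep$Var1 == firststep$Var1[i] &
    firststep$Var2 == firststep$Var2[i]
  earlier <- firststep$C < firststep$C[i]
  sum(firststep$v[same_pair & earlier])
}
firststep$prior <- vapply(seq_len(nrow(firststep)), prior_v, numeric(1))

# Step 3: team familiarity per project = sum of prior over its pairs.
fam <- vapply(split(firststep$prior, firststep$A), sum, numeric(1))
print(fam)  # project 3 comes out as 2.5, matching the hand calculation
```

The loop in step 2 compares every row with every other row, so it is quadratic in the number of pair rows; it is meant to pin down the expected result on the small example, and a keyed approach would be needed for the large dataset.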