Yep - you're right - missing parents are indicated as zero in the M/PID field.
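The convention stated above, where a missing father or mother is coded 0 in PID/MID, can also be expressed without splitting the data by family. What follows is a minimal base-R sketch, not the code from the thread. It makes four assumptions. The columns are named Family.ID, Sample.ID and Relationship, as in the thread. Parents are labelled exactly "father" and "mother". Each family has at most one of each. Sample.ID is read with `stringsAsFactors = FALSE`. `parent_id` is a hypothetical helper name, and family 30 is made-up data with no father.

```r
# Example data in the thread's layout; family 30 is hypothetical and has no father.
x <- read.table(text = "Family.ID Sample.ID Relationship
14 62 sibling
14 94 father
14 63 sibling
14 59 mother
30 71 mother
30 72 sibling", header = TRUE, stringsAsFactors = FALSE)

# Hypothetical helper: for every row, look up the Sample.ID of the given
# parent role in the same family; 0 when that parent is missing.
parent_id <- function(d, role) {
  parents <- d[d$Relationship == role, ]
  # match() returns the FIRST parent row per family (assumes at most one).
  id <- parents$Sample.ID[match(d$Family.ID, parents$Family.ID)]
  id[is.na(id)] <- 0                    # no such parent in the family
  id[d$Relationship != "sibling"] <- 0  # parents themselves get 0
  id
}

x$PID <- parent_id(x, "father")
x$MID <- parent_id(x, "mother")
x
```

Because `match()` silently takes the first match, a family with two rows labelled "father" would not trigger a warning here. It is worth checking for such families before relying on the result.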
The code you suggested (quoted below) ran, but with a few warnings:

1: In l$PID[l$Relationship == "sibling"] <- l$Sample.ID[father] :
  number of items to replace is not a multiple of replacement length
2: In l$PID[l$Relationship == "sibling"] <- l$Sample.ID[father] :
  number of items to replace is not a multiple of replacement length
3: In l$PID[l$Relationship == "sibling"] <- l$Sample.ID[father] :
  number of items to replace is not a multiple of replacement length
4: In l$MID[l$Relationship == "sibling"] <- l$Sample.ID[mother] :
  number of items to replace is not a multiple of replacement length

Looking at the output, I also get numbers in the PID/MID fields that are not the father's/mother's Sample.ID. For example:

2702  349 mother    0   0
2702 3456 sibling   0 842
2702 9980 sibling   0 842
3064    3 father    0   0
3064    4 mother    0   0
3064    5 sibling 879 880
3064   86 sibling 879 880
3064   87 sibling 879 880

On Sat, Aug 16, 2014 at 9:31 PM, Jorge I Velez <jorgeivanve...@gmail.com> wrote:
> Dear Kate,
>
> Try this:
>
> res <- do.call(rbind, lapply(xs, function(l){
>   l$PID <- l$MID <- 0
>   father <- with(l, Relationship == 'father')
>   mother <- with(l, Relationship == 'mother')
>   if(sum(father) == 0)
>     l$PID[l$Relationship == 'sibling'] <- 0
>   else l$PID[l$Relationship == 'sibling'] <- l$Sample.ID[father]
>   if(sum(mother) == 0)
>     l$MID[l$Relationship == 'sibling'] <- 0
>   else l$MID[l$Relationship == 'sibling'] <- l$Sample.ID[mother]
>   l
> }))
>
> It is assumed that when either parent is not available the M/PID is 0.
>
> Best,
> Jorge.-
>
>
> On Sun, Aug 17, 2014 at 10:58 AM, Kate Ignatius <kate.ignat...@gmail.com>
> wrote:
>>
>> Actually - I didn't check this before, but these are not all nuclear
>> families (as I assumed they were). That is, some don't have a father
>> or don't have a mother. Usually, if this is the case, PID or MID will
>> become 0, respectively, for the child. How can the code be edited to
>> account for this?
>>
>> On Sat, Aug 16, 2014 at 8:02 PM, Kate Ignatius <kate.ignat...@gmail.com>
>> wrote:
>> > Thanks!
>> >
>> > I think I know what is being done here but I'm not sure how to fix the
>> > following error:
>> >
>> > Error in l$PID[l$Relationship == "sibling"] <- l$Sample.ID[father] :
>> >   replacement has length zero
>> >
>> >
>> > On Sat, Aug 16, 2014 at 6:48 PM, Jorge I Velez
>> > <jorgeivanve...@gmail.com> wrote:
>> >> Dear Kate,
>> >>
>> >> Assuming you have nuclear families, one option would be:
>> >>
>> >> x <- read.table(textConnection("Family.ID Sample.ID Relationship
>> >> 14 62 sibling
>> >> 14 94 father
>> >> 14 63 sibling
>> >> 14 59 mother
>> >> 17 6004 father
>> >> 17 6003 mother
>> >> 17 6005 sibling
>> >> 17 368 sibling
>> >> 130 202 mother
>> >> 130 203 father
>> >> 130 204 sibling
>> >> 130 205 sibling
>> >> 130 206 sibling
>> >> 222 9 mother
>> >> 222 45 sibling
>> >> 222 34 sibling
>> >> 222 10 sibling
>> >> 222 11 sibling
>> >> 222 18 father"), header = TRUE)
>> >> closeAllConnections()
>> >>
>> >> xs <- with(x, split(x, Family.ID))
>> >> res <- do.call(rbind, lapply(xs, function(l){
>> >>   l$PID <- l$MID <- 0
>> >>   father <- with(l, Relationship == 'father')
>> >>   mother <- with(l, Relationship == 'mother')
>> >>   l$PID[l$Relationship == 'sibling'] <- l$Sample.ID[father]
>> >>   l$MID[l$Relationship == 'sibling'] <- l$Sample.ID[mother]
>> >>   l
>> >> }))
>> >> res
>> >>
>> >> HTH,
>> >> Jorge.-
>> >>
>> >>
>> >> On Sun, Aug 17, 2014 at 5:42 AM, Kate Ignatius
>> >> <kate.ignat...@gmail.com>
>> >> wrote:
>> >>>
>> >>> Hi,
>> >>>
>> >>> I have a data.table question (as well as an ifelse statement query).
>> >>>
>> >>> I have a large list of families (the file has 935 individuals, sorted
>> >>> by family, and the families are of varying sizes). At the moment the
>> >>> file has the columns:
>> >>>
>> >>> SampleID FamilyID Relationship
>> >>>
>> >>> To avoid having to make a pedigree file by hand, i.e. adding a
>> >>> PaternalID and a MaternalID one by one, I want to try to write a
>> >>> script that will quickly do this for me (I eventually want to run this
>> >>> through a program such as plink). Is there a way to use data.table
>> >>> (maybe in conjunction with ifelse) to do this effectively?
>> >>>
>> >>> An example of the file is something like:
>> >>>
>> >>> Family.ID Sample.ID Relationship
>> >>> 14 62 sibling
>> >>> 14 94 father
>> >>> 14 63 sibling
>> >>> 14 59 mother
>> >>> 17 6004 father
>> >>> 17 6003 mother
>> >>> 17 6005 sibling
>> >>> 17 368 sibling
>> >>> 130 202 mother
>> >>> 130 203 father
>> >>> 130 204 sibling
>> >>> 130 205 sibling
>> >>> 130 206 sibling
>> >>> 222 9 mother
>> >>> 222 45 sibling
>> >>> 222 34 sibling
>> >>> 222 10 sibling
>> >>> 222 11 sibling
>> >>> 222 18 father
>> >>>
>> >>> But the goal is to have a file like this:
>> >>>
>> >>> Family.ID Sample.ID Relationship PID MID
>> >>> 14 62 sibling 94 59
>> >>> 14 94 father 0 0
>> >>> 14 63 sibling 94 59
>> >>> 14 59 mother 0 0
>> >>> 17 6004 father 0 0
>> >>> 17 6003 mother 0 0
>> >>> 17 6005 sibling 6004 6003
>> >>> 17 368 sibling 6004 6003
>> >>> 130 202 mother 0 0
>> >>> 130 203 father 0 0
>> >>> 130 204 sibling 203 202
>> >>> 130 205 sibling 203 202
>> >>> 130 206 sibling 203 202
>> >>> 222 9 mother 0 0
>> >>> 222 45 sibling 18 9
>> >>> 222 34 sibling 18 9
>> >>> 222 10 sibling 18 9
>> >>> 222 11 sibling 18 9
>> >>> 222 18 father 0 0
>> >>>
>> >>> I've tried searching for this but with no luck. I'd greatly appreciate
>> >>> any help, even if it's just a link to a great example/solution!
>> >>>
>> >>> Thanks!
>> >>>
>> >>> ______________________________________________
>> >>> R-help@r-project.org mailing list
>> >>> https://stat.ethz.ch/mailman/listinfo/r-help
>> >>> PLEASE do read the posting guide
>> >>> http://www.R-project.org/posting-guide.html
>> >>> and provide commented, minimal, self-contained, reproducible code.
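The two symptoms in the latest message, the "not a multiple of replacement length" warnings and PID/MID values such as 842, 879 and 880 that are not Sample.IDs, fit two data problems. This is an inference; the thread does not confirm it.

1. The warnings appear when a family has more than one row labelled "father" or "mother". In that case `l$Sample.ID[father]` returns several values for the sibling slots.
2. The odd numbers look like the internal integer codes of a factor. A factor Sample.ID can arise, for example, when the real file has any non-numeric entry in that column and `read.table` is called with its default `stringsAsFactors = TRUE` from R versions before 4.0.0. Assigning such a factor into the numeric PID/MID columns stores the codes, not the labels.

A minimal check on made-up data follows. `d`, `codes`, `pid`, `n_fathers`, `n_mothers` and `bad` are hypothetical names.

```r
# Hypothetical data: family 1 is well formed; family 2 has two "father" rows.
# Sample.ID is deliberately a factor to mimic stringsAsFactors = TRUE.
d <- data.frame(
  Family.ID    = c(1, 1, 1, 2, 2, 2, 2),
  Sample.ID    = factor(c("3", "4", "5", "10", "11", "12", "13")),
  Relationship = c("father", "mother", "sibling",
                   "father", "father", "mother", "sibling"),
  stringsAsFactors = FALSE
)

# Factor levels sort as strings ("10" < "3"), so the code of "3" is 5 ...
codes <- as.integer(d$Sample.ID)   # 5 6 7 1 2 3 4
# ... and assigning the factor into a numeric vector stores that code.
pid <- c(0, 0, 0)
pid[3] <- d$Sample.ID[1]           # stores 5, not 3

# Fix: convert the labels (not the codes) back to numbers first.
d$Sample.ID <- as.numeric(as.character(d$Sample.ID))

# List families with more than one father or mother.
n_fathers <- tapply(d$Relationship == "father", d$Family.ID, sum)
n_mothers <- tapply(d$Relationship == "mother", d$Family.ID, sum)
bad <- names(n_fathers)[n_fathers > 1 | n_mothers > 1]
print(bad)
```

Once Sample.ID holds the real IDs and `bad` is empty, the per-family code quoted above should put actual Sample.IDs into PID/MID. Families listed in `bad` need a manual look, since the data alone cannot say which of two "father" rows is correct.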