On Thu, 18 Mar 2010, Ben Bimber wrote:
I have a data frame containing the Id, Mother, Father and Sex from about
10,000 animals in our colony. I am interested in graphing simple family
trees for a given subject or small number of subjects. The basic idea is:
start with the data frame for the entire colony and a list of index animals. I need
to identify all immediate relatives of these index animals and plot the
pedigree for them. We're not trying to do any sort of real analysis, just
present a visualization of the family structure. I have used the kinship
and pedigree packages to plot the pedigree. My question relates to
efficiently identifying the animals to include in the pedigree:
Starting with the data frame of ~10,000 records, I want to use a set of
index animals to extract the immediate relatives and plot only a small
number in the pedigree. 'Immediate relatives' is a somewhat ambiguous
term; I am currently defining it as 3 generations forward and 3 backward.
Currently, I have a somewhat ugly approach where I recursively calculate
each generation forward or backward and build a new dataframe. Is there a
better approach or package that does this? I realize my code should be
written better to get rid of the loops, so if anyone has suggestions there I
would appreciate this as well. Thanks in advance.
Using an indicator matrix for parent/child relations, you can identify
future/past generations using matrix multiplication(s).
With 10000 animals, the matrix indicating parents/children will be
10000 x 10000, but each animal has at most two parents, so it will have
<20000 non-zero elements.
To me, this sounds like a good candidate for a sparse matrix
representation. Packages 'Matrix' and 'SparseM' provide these.
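
A minimal sketch of the idea, using 'Matrix'. The column names Id, Dam and
Sire follow the allPed data frame in the original post; the toy data and the
relatives() helper are made up for illustration, and missing parents are
assumed to be coded as NA. Note that this collects only direct ancestors and
direct descendants of the index animals, which is narrower than the loops in
the original post (those also pick up descendants of the ancestors).

```r
library(Matrix)

# Toy pedigree (hypothetical data): A x B -> C, C x D -> E and F, G unrelated.
allPed <- data.frame(
  Id   = c("A", "B", "C", "D", "E", "F", "G"),
  Dam  = c(NA,  NA,  "A", NA,  "C", "C", NA),
  Sire = c(NA,  NA,  "B", NA,  "D", "D", NA),
  stringsAsFactors = FALSE
)

ids <- allPed$Id
n   <- length(ids)

# P[i, j] = 1 when animal j is a parent (dam or sire) of animal i.
child  <- rep(seq_len(n), times = 2)
parent <- match(c(allPed$Dam, allPed$Sire), ids)
known  <- !is.na(parent)
P <- sparseMatrix(i = child[known], j = parent[known], x = 1,
                  dims = c(n, n), dimnames = list(ids, ids))

# Hypothetical helper: Ids reachable from `index` within `gens` generations,
# going up (ancestors) and down (descendants).
relatives <- function(index, gens = 3) {
  start <- as.numeric(ids %in% index)
  found <- start > 0
  up    <- start
  down  <- start
  for (g in seq_len(gens)) {
    up    <- as.vector(up %*% P)    # row vector times P: parents of current set
    down  <- as.vector(P %*% down)  # P times column vector: children of current set
    found <- found | up > 0 | down > 0
  }
  ids[found]
}

relatives("C", gens = 1)  # "A" "B" "C" "E" "F"
```

The result can then be used to subset the full data frame, e.g.
subset(allPed, Id %in% relatives(indexIds)), where indexIds is a placeholder
for your vector of index animals.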
HTH,
Chuck
Code to calculate generations forward and backward:
# Assumes `ped` already holds the rows for the index animals and `gens` is
# the number of generations to follow (3 in my case).
# allPed is the data frame with Id, Dam, Sire and Sex for animals in our colony.

# build backwards
# queryIds holds the unique Ids of the parents of the index animals
queryIds <- unique(c(ped$Sire, ped$Dam))
queryIds <- queryIds[!is.na(queryIds)]  # drop unknown parents so the empty check works
for (i in seq_len(gens)) {
  if (length(queryIds) == 0) break
  newRows <- subset(allPed, Id %in% queryIds)
  # the parents of this generation are queried next
  queryIds <- unique(c(newRows$Sire, newRows$Dam))
  queryIds <- queryIds[!is.na(queryIds)]
  ped <- unique(rbind(newRows, ped))
}

# build forwards
# when calculating children, queryIds holds the Ids of the previous generation
queryIds <- unique(ped$Id)
for (i in seq_len(gens)) {
  if (length(queryIds) == 0) break
  newRows <- subset(allPed, Sire %in% queryIds | Dam %in% queryIds)
  queryIds <- newRows$Id
  ped <- unique(rbind(newRows, ped))
}
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Charles C. Berry (858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cbe...@tajo.ucsd.edu UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901