I am trying to find the most efficient (fastest) way of manipulating a data.table object that contains genetic data. The format is in the following toy example, where rows represent alleles for individuals and columns are separate loci. Each individual will be represented in one or more rows depending on what the ploidy of the loci are. In this example, there are three tetraploid loci (4 alleles per locus) genotyped for three individuals (1:3). In my real data, all loci will always have the same ploidy.
> library(data.table) > dt <- data.table( + id = as.character(rep(1:3, each = 4)), + loc1 = factor(sample(c("C", "T"), 12, rep = T)), + loc2 = factor(sample(c("C", "T"), 12, rep = T)), + loc3 = factor(sample(c("C", "T"), 12, rep = T)), + key = "id" + ) > dt id loc1 loc2 loc3 1: 1 T T T 2: 1 C T C 3: 1 T C T 4: 1 C C T 5: 2 T C C 6: 2 T T T 7: 2 C T T 8: 2 C T C 9: 3 C T T 10: 3 T T T 11: 3 T C T 12: 3 T T C What I'm looking for is the fastest way to convert this data.table to a matrix where each row has the entire genotypes for one individual with the alleles for a locus in sequential columns. The code I currently have for this follows. > ids <- dt[, unique(id)] > .cbindColFunc <- function(x) { + do.call(cbind, as.list(as.character(x))) + } > mat <- do.call(rbind, lapply(ids, function(i) { + dt[i, do.call(cbind, lapply(.SD, .cbindColFunc)), .SDcols = !"id"] + })) > num.alleles <- ncol(mat) / (ncol(dt) - 1) > colnames(mat) <- paste(rep(colnames(dt)[-1], each = num.alleles), > 1:num.alleles, sep = ".") > mat <- cbind(id = ids, mat) > mat id loc1.1 loc1.2 loc1.3 loc1.4 loc2.1 loc2.2 loc2.3 loc2.4 loc3.1 loc3.2 loc3.3 loc3.4 [1,] "1" "T" "C" "T" "C" "T" "T" "C" "C" "T" "C" "T" "T" [2,] "2" "T" "T" "C" "C" "C" "T" "T" "T" "C" "T" "T" "C" [3,] "3" "C" "T" "T" "T" "T" "T" "C" "T" "T" "T" "T" "C" Is there a faster, more data.table friendly way to do it? Thanks in advance! Eric -- View this message in context: http://r.789695.n4.nabble.com/Create-matrix-of-columns-from-grouped-rows-tp4726777.html Sent from the datatable-help mailing list archive at Nabble.com. _______________________________________________ datatable-help mailing list datatable-help@lists.r-forge.r-project.org https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help