I am having trouble converting my dataset from long to wide format. Previous
attempts using the reshape package and the aggregate function were unsuccessful
because they took too long. My own simplified solution turns out to be just as
slow.
My complete code is given below. With sample.size = 10000, execution takes
about 20 seconds, but with sample.size = 100000 it seems to take an eternity.
My actual sample.size is 15000000, i.e. 15 million.
sample.size <- 10000
m <- data.frame(Name=sample(1:100000, sample.size, TRUE),
                Type=sample(1:1000, sample.size, TRUE),
                Predictor=sample(LETTERS[1:10], sample.size, TRUE))

res <- function(m) {
    # unique (Name, Type) pairs, sorted by Name and then by Type
    m.12.unique <- unique(m[,1:2])
    m.12.unique <- m.12.unique[order(m.12.unique[,1], m.12.unique[,2]),]
    v1 <- paste(m.12.unique[,1], m.12.unique[,2], sep=".")
    # predictor labels as character, so they stay usable as column names
    # even when Predictor is stored as a factor
    v2 <- sort(unique(as.character(m[,3])))
    # one row per (Name, Type) pair, one column per predictor value
    res <- matrix(0, nrow=length(v1), ncol=length(v2), dimnames=list(v1, v2))
    m.ids <- paste(m[,1], m[,2], sep=".")
    # count occurrences one input row at a time
    for(i in seq_len(nrow(m))) {
        x <- m.ids[i]
        y <- as.character(m[i,3])
        res[x, y] <- res[x, y] + 1
    }
    res <- data.frame(m.12.unique[,1], m.12.unique[,2], res, row.names=NULL)
    colnames(res) <- c("Name", "Type", v2)
    return(res)
}
res(m)
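
For comparison, here is a minimal sketch of the same counting done without the
row-by-row loop, using table() from base R. The function name wide_counts is
hypothetical and not part of the code above; the sketch assumes the columns are
named Name, Type and Predictor as in m. It has not been timed on the full 15
million rows, and table() builds a dense matrix with one row per unique
(Name, Type) pair, so memory may still be a constraint at that size.

```r
# Sketch only: wide_counts is a hypothetical helper, not from the original code.
wide_counts <- function(m) {
    # one key per input row, built the same way as m.ids above
    key <- paste(m$Name, m$Type, sep = ".")
    # TRUE at the first occurrence of each (Name, Type) pair
    first <- !duplicated(key)
    # factor levels in order of first appearance, so the rows of the table
    # line up with m[first, ]
    f <- factor(key, levels = key[first])
    # cross-tabulate all rows at once instead of looping over them
    tab <- table(f, as.character(m$Predictor))
    counts <- matrix(tab, nrow = nrow(tab),
                     dimnames = list(NULL, colnames(tab)))
    out <- data.frame(m[first, c("Name", "Type")], counts,
                      row.names = NULL, check.names = FALSE)
    # same ordering as in res(): by Name, then by Type
    out <- out[order(out$Name, out$Type), ]
    rownames(out) <- NULL
    out
}
```

Calling wide_counts(m) should give the same counts as res(m), with the counts
stored as integers rather than doubles.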
> sessionInfo()
R version 2.8.0 (2008-10-20)
i386-pc-mingw32
locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
States.1252;LC_MONETARY=English_United
States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base