>>>>> "UweL" == Uwe Ligges <[EMAIL PROTECTED]> >>>>> on Sun, 07 Jan 2007 09:42:08 +0100 writes:
    UweL> Zoltan Kmetty wrote:
    >> Hi!
    >>
    >> I have a memory problem with R - I hope somebody can
    >> tell me a solution.
    >>
    >> I work with very large datasets, but R cannot allocate
    >> enough memory to handle these datasets.
    >>
    >> I want to work with a matrix with rows = 100 000 000 and columns = 10.
    >>
    >> I know this is 1 milliard (10^9) cells, but I thought R could
    >> handle it (other commercial software like SPSS can),
    >> but R reports every time: not enough memory.
    >>
    >> Any good idea?

    UweL> Buy a machine that has at least 8 GB (better 16 GB) of
    UweL> RAM and proceed ...

Well, I doubt that Zoltan wants to *fill* his matrix with all
non-zeros. If he does, Uwe and Roger are right.

Otherwise, working with a *sparse* matrix, using the 'Matrix'
(my recommendation, but I am biased) or 'SparseM' package,
might well be feasible:

  install.packages("Matrix") # if needed; only once for your R
  library(Matrix)            # each time you need it

  TsparseMatrix <- function(nrow, ncol, i, j, x)
  {
    ## Purpose: User friendly construction of sparse "Matrix" from triple
    ## ----------------------------------------------------------------------
    ## Arguments: (i,j,x): 2 integer and 1 numeric vector of the same length:
    ##
    ##   The matrix M will have
    ##      M[i[k], j[k]] == x[k] , for k = 1,2,..., length(i)
    ##   and M[ i', j' ]  ==  0    for all other pairs (i',j')
    ## ----------------------------------------------------------------------
    ## Author: Martin Maechler, Date:  8 Jan 2007, 18:46
    nnz <- length(i)
    stopifnot(length(j) == nnz, length(x) == nnz,
              is.numeric(x), is.numeric(i), is.numeric(j))
    dim <- c(as.integer(nrow), as.integer(ncol))
    ## The conformability of (i,j) with 'dim' will be checked automatically
    ## by an internal "validObject()" that is part of new(.):
    new("dgTMatrix", x = x, Dim = dim,
        ## our "Tsparse" Matrices use 0-based indices :
        i = as.integer(i - 1L),
        j = as.integer(j - 1L))
  }

For example :

  > TsparseMatrix(10,20, c(1,3:8), c(2,9,6:10), 7 * (1:7))
  10 x 20 sparse Matrix of class "dgTMatrix"

   [1,] . 7 . . .  .  .  .  .  . . . . . . . . . . .
   [2,] . . . . .  .  .  .  .  . . . . . . . . . . .
   [3,] . . . . .  .  .  . 14  . . . . . . . . . . .
   [4,] . . . . . 21  .  .  .  . . . . . . . . . . .
   [5,] . . . . .  . 28  .  .  . . . . . . . . . . .
   [6,] . . . . .  .  . 35  .  . . . . . . . . . . .
   [7,] . . . . .  .  .  . 42  . . . . . . . . . . .
   [8,] . . . . .  .  .  .  . 49 . . . . . . . . . .
   [9,] . . . . .  .  .  .  .  . . . . . . . . . . .
  [10,] . . . . .  .  .  .  .  . . . . . . . . . . .

But

  nr <- 1e8
  nc <- 10
  set.seed(1)
  i <- sample(nr, 10000)
  ## only 10 columns, so the column indices must be drawn with replacement:
  j <- sample(nc, 10000, replace = TRUE)
  x <- round(rnorm(10000), 2)

  M <- TsparseMatrix(nr, nc, i=i, j=j, x=x)

works, e.g. you can

  x <- 1:10
  system.time(y <- M %*% x) # needs around 4 sec on one of our better machines
  y <- as.vector(y)

  ## but you can become even more efficient, translating from the
  ## so-called "triplet" to the (recommended) "Csparse"
  ## representation:

  M. <- as(M, "CsparseMatrix")

  object.size(M) / object.size(M.)
  ## 1.328921; i.e. the triplet form is about 33% larger,
  ##           so converting saved about 25%

  ## and
  system.time(y. <- M. %*% x) # much faster (1 sec)

  identical(as.vector(y.), y)

--- --- ---

I hope this is useful to you.
Martin Maechler, ETH Zurich
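
Addendum: a rough check of the memory figures discussed above. This is only a sketch under stated assumptions, not part of the original reply. It assumes a current version of the 'Matrix' package, whose sparseMatrix() constructor takes 1-based (i, j, x) triplets directly and returns a column-compressed matrix, and it assumes 8 bytes per stored double. The variable names are made up for illustration.

```r
library(Matrix)  # recommended package, shipped with standard R installations

nr <- 1e8
nc <- 10

## Dense storage: one 8-byte double per cell
dense_bytes <- nr * nc * 8
dense_gib   <- dense_bytes / 2^30   # about 7.45, hence "at least 8 GB of RAM"

## Sparse storage with 10000 stored entries, built with sparseMatrix()
## (assumption: a Matrix version that provides this constructor)
set.seed(1)
nnz <- 10000
i <- sample(nr, nnz)                  # distinct row indices
j <- sample(nc, nnz, replace = TRUE)  # only 10 columns, so with replacement
x <- round(rnorm(nnz), 2)
S <- sparseMatrix(i = i, j = j, x = x, dims = c(nr, nc))

## Size of the sparse object in bytes, and how much smaller it is
sparse_bytes <- as.numeric(object.size(S))
dense_bytes / sparse_bytes
```

The dense figure explains the hardware advice: the matrix alone needs about 7.45 GiB before any copies are made, while the sparse object only stores the entries that are actually present.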