I have a large file-backed big.matrix with millions of rows and 20 columns.

The columns contain data that I simply need to tabulate. There are a few dozen
unique values, and I just want a frequency count.

Test code with a small "big" matrix.

library(bigmemory)
library(bigtabulate)

  ## build a small in-memory big.matrix for testing; the sampled values
  ## recycle to fill each block of columns
  test <- big.matrix(nrow = 100, ncol = 10)
  test[, 1:3]  <- sample(150)
  test[, 4:6]  <- sample(100)
  test[, 7:10] <- sample(100)

##    So we have a sample bigmemory matrix. It's not file backed, but it will do for testing.
##    The result we want is the one you would get if you could run table() on the big.matrix;
##    that's emulated in this example by coercing the big.matrix to a regular matrix.
##    In the real application that is not possible, because of RAM limits.
  P <- table(as.matrix(test))

##  The package bigtabulate has a version of table() called bigtable().
##  You can run it on an individual column.
##  I want to run it on all the columns, basically combining the results of
##  running it on each column individually (a sketch of that idea follows below).
##  If you specify multiple columns you get a contingency table, and if you use
##  too many columns you will hang your system hard, so don't try the line below.
##  Well, at least I hung my system.

#  Ouch <- bigtable(test, ccols = seq(1, 10))
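
Something along these lines is what I have in mind: a minimal sketch, assuming
bigtable() (or plain table() on a single extracted column) returns a named
vector of counts for one column. I have not verified this on a file-backed
matrix.

  ## tabulate one column at a time, then merge the named counts
  per_col <- lapply(seq_len(ncol(test)),
                    function(j) bigtable(test, ccols = j))  # or table(test[, j])

  ## union of all values seen, then accumulate the per-column counts
  vals <- sort(as.numeric(unique(unlist(lapply(per_col, names)))))
  combined <- setNames(numeric(length(vals)), as.character(vals))
  for (ct in per_col) combined[names(ct)] <- combined[names(ct)] + ct

  ## combined should now agree with table(as.matrix(test)), without ever
  ## materialising the whole matrix in RAM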

So, is there a simple way to get the answer as emulated by

  P <- table(as.matrix(test))

without coercing the whole thing to a matrix?

TIA
