I'm consistently seeing R crash with a particular large data set. What's strange is that although the crash seems related to running out of memory, I'm unable to construct a pseudo-random data set of the same size that also causes the crash. Further adding to the strangeness is that the crash only happens if the dataset goes through a save()/load() cycle -- without that, the command in question just gives an out-of-memory error, but does not crash.
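The crash depends on the save()/load() cycle, so one thing worth ruling out is that the round trip changes the data. Below is a minimal sketch of that check. It uses a small hypothetical matrix `m` as a stand-in for X1, with the same generator as the pseudo-random data below but far fewer rows, and a temporary file in place of X1.RData. It is not part of the original runs.

```r
# Sketch: check whether a save()/load() round trip returns an object
# identical to the original. 'm' is a small hypothetical stand-in for X1.
set.seed(1)
m <- matrix(rexp(300 * 7, 5e-5) * rbinom(300 * 7, 1, 1/8), ncol = 7)
original <- m

f <- tempfile()                  # temporary file instead of "X1.RData"
save(list = "m", file = f)
rm(m)
load(f)                          # restores 'm' into the current environment

# TRUE only if values, dim and storage mode all survived unchanged
same <- identical(original, m)
unlink(f)
print(same)
```

Running `identical()` the same way on X1 before saving and after reloading would show whether the two versions of the data differ at all.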
To make this clear, three different versions of the same data consistently produce very different behavior:

  (1) original data read with read.table: memory error; failed to allocate 164062 Kb
  (2) original data after a save()/load() cycle: memory error; failed to allocate 82031 Kb, followed by a crash
  (3) pseudo-random data of the same size and similar characteristics: works without problems

This is with R-1.9.0 under Windows 2000. I'm not loading any optional packages. I get the same crash behavior with R-1.9.0 patched and R-2.0.0 alpha, but I didn't test whether the pseudo-random data also work under those versions. (In case it matters, I got R-1.9.0 patched and R-2.0.0 alpha as pre-compiled Windows binaries from http://cran.us.r-project.org/ at 9:30am MDT on Jun 7, 2004.)

Unfortunately, I don't know enough about debugging memory problems in R to get further than this, but maybe the following will provide some clues for someone else. All the following transcripts are from Rgui.exe; each comment beginning with "###" marks a new run.

### Read in the data and get an out-of-memory error (but no crash)
> # ClassifyTrain.txt is from http://mill.ucsd.edu/data/ClassifyTrain.zip
> X <- read.table("ClassifyTrain.txt", skip=2)
> X1 <- as.matrix(X)
> hist(log(X1[,-(1:2)]+1))
Error: cannot allocate vector of size 164062 Kb
In addition: Warning message:
Reached total allocation of 1024Mb: see help(memory.size)
>

### Read in the data and save it as a .RData file for faster runs (I initially did this for speed,
### but this seems to be essential to causing the crash)
> # ClassifyTrain.txt is from http://mill.ucsd.edu/data/ClassifyTrain.zip
> X <- read.table("ClassifyTrain.txt", skip=2)
> X1 <- as.matrix(X)
> c(class(X1), storage.mode(X1), dim(X1))
[1] "matrix" "double" "30000"  "702"
> save(list="X1", file="X1.RData")

### Produce the crash
> version
         _
platform i386-pc-mingw32
arch     i386
os       mingw32
system   i386, mingw32
status
major    1
minor    9.0
year     2004
month    04
day      12
language R
>
> load("X1.RData")
> c(class(X1), storage.mode(X1), dim(X1))
[1] "matrix" "double" "30000"  "702"
> # all of the following 3 commands consistently cause a crash
> hist(log(X1[,-(1:2)]+1))
> hist(log(X1[,-(1:2)]+1), breaks=seq(0,13,0.5))
> hist(log(X1[,-(1:2)]+1), breaks=seq(0,13,0.5), plot=F)
Error: cannot allocate vector of size 82031 Kb
In addition: Warning message:
Reached total allocation of 1024Mb: see help(memory.size)

[message that comes in a Windows dialog box after a wait of many seconds:]
R Console: Rgui.exe - Application Error
The exception unknown software exception (0xc00000fd) occured in the application at location 0x6b5b0a53

#### The following is a failed attempt to reproduce the crash with pseudo-random
#### data, i.e., R functions correctly (even when X1 is in memory)
>
> # Look at some characteristics of the original data in
> # order to produce a matrix of similar pseudo-random numbers.
> load("X1.RData")
> dim(X1)
[1] 30000   702
> class(X1)
[1] "matrix"
> storage.mode(X1)
[1] "double"
> table(is.na(X1))

   FALSE
21060000
> table(X1==0)

   FALSE     TRUE
 2284455 18775545
> exp(diff(log(table(X1==0))))
    TRUE
8.218829
> table(X1>=0)

    TRUE
21060000
> range(X1)
[1]      0 326022
> memory.limit()
[1] 1073741824
> memory.limit()/2^20
[1] 1024
> object.size(X1)/2^20
[1] 161.0267
>
> set.seed(1)
> X <- matrix(rexp(30000 * 702, 5e-5) * rbinom(30000 * 702, 1, 1/8), ncol=702)
> range(X)
[1] 3.615044e-04 3.249415e+05
>
> # Both of these commands seem to work without problems
> hist(log(X[,-(1:2)]+1))
> hist(log(X[,-(1:2)]+1), breaks=seq(0,13,0.5))
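The two sizes in the error messages match vectors with one element per entry of X1[,-(1:2)], which is 30000 rows x 700 columns. 164062 Kb is the size of a double vector of that length. 82031 Kb is exactly half of that, which fits an integer or logical vector of the same length (4 bytes per element instead of 8). The arithmetic is below as a quick check. It does not establish which step inside hist() requests the smaller block; that remains an open question.

```r
# Sizes of vectors with one element per entry of X1[,-(1:2)].
# R reports allocation sizes in Kb of 1024 bytes.
n_elem    <- 30000 * (702 - 2)   # 21,000,000 elements
kb_double <- n_elem * 8 / 1024   # doubles: 8 bytes each  -> 164062.5 Kb
kb_int    <- n_elem * 4 / 1024   # integers/logicals: 4 bytes each -> 82031.25 Kb
floor(c(kb_double, kb_int))
# [1] 164062  82031
```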
