Here's the result on R 3.0.0 64 bit under Windows 8:

> A<-matrix(1:365000*144,nrow=365000,ncol=144)
> dim(A)
[1] 365000 144
> d <- dist(mydata_nor, method = "euclidean")
Error in as.matrix(x) : object 'mydata_nor' not found
> d <- dist(A, method = "euclidean")
Error: cannot allocate vector of size 496.3 Gb
In addition: Warning messages:
1: In dist(A, method = "euclidean") :
  Reached total allocation of 8078Mb: see help(memory.size)
2: In dist(A, method = "euclidean") :
  Reached total allocation of 8078Mb: see help(memory.size)
3: In dist(A, method = "euclidean") :
  Reached total allocation of 8078Mb: see help(memory.size)
4: In dist(A, method = "euclidean") :
  Reached total allocation of 8078Mb: see help(memory.size)
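The 496.3 Gb figure in the transcript can be checked by hand. dist() stores only the lower triangle of the n x n distance matrix (diagonal excluded), i.e. n(n - 1)/2 doubles of 8 bytes each. The short sketch below does that arithmetic for n = 365000 rows. The final comparison against the 2^31 - 1 element limit of classic (pre-long-vector) R is offered as a plausible, unconfirmed explanation of why an older setup reported "negative length vectors" instead of the true size.

```r
# Number of rows (observations) in the data from the original question
n <- 365000

# dist() keeps the lower triangle of the n x n matrix, diagonal excluded:
# n * (n - 1) / 2 pairwise distances
n_pairs <- n * (n - 1) / 2

# Each distance is a double (8 bytes); convert to GiB as R reports it
bytes <- n_pairs * 8
gib <- bytes / 1024^3

n_pairs          # about 6.66e10 distances
round(gib, 1)    # 496.3, matching "cannot allocate vector of size 496.3 Gb"

# Classic R vectors (before long vectors arrived in R 3.0.0) were limited to
# 2^31 - 1 elements; n_pairs exceeds that by a factor of about 31.
n_pairs > .Machine$integer.max   # TRUE
```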
The error in your message ("negative length vectors are not allowed") suggests that your system could not accurately compute the memory requirements; R 3.0.0 reports them as 496.3 Gb. Unless you have access to a computer with 500 gigabytes of memory, you need to consider alternate approaches such as aggregating the data into longer time blocks or using kmeans.

-------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of HJ YAN
Sent: Thursday, May 2, 2013 6:02 PM
To: r-help@r-project.org
Subject: [R] Calculating distance matrix for large dataset

Dear R users

I wondered whether any of you have ever tried to calculate a distance matrix for a very large data set, and whether anyone can confirm that the error message I got actually means my data is too large for this task:

negative length vectors are not allowed

My data size and the code used:

> dim(mydata_nor)
[1] 365000 144
> d <- dist(mydata_nor, method = "euclidean")

Here my data has 1000 samples, each with a year of data observed daily at 10-minute intervals, so the size is (365 * 1000) * 144.

I checked the manual for the function 'dist' but cannot see the upper size limit allowed, and I bet there should be one, so any hints are appreciated. I would also be grateful if anyone could suggest another method for calculating a distance matrix for a large dataset.

I appreciate that reproducible code should be provided for your advice, so try the code below if needed:

> A<-matrix(1:365000*144,nrow=365000,ncol=144)
> dim(A)
[1] 365000 144
> d1<-dist(A,method="euclidean")
Error in dist(A, method = "euclidean") :
  negative length vectors are not allowed

Many thanks in advance!
HJ

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
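The two alternatives suggested in the reply (aggregating into longer time blocks, or kmeans) can be sketched as follows. This is a minimal illustration under stated assumptions, not the poster's data or method: `x`, `sample_id`, `n_samples` and `n_days` are hypothetical names; the data are random and scaled down to 50 samples so the sketch runs quickly; rows are assumed to be ordered sample by sample; and averaging each sample's year into one mean daily profile is only one possible reading of "longer time blocks". Note that aggregating columns (e.g. 10-minute to hourly) would not help, because the size of a dist object depends only on the number of rows. kmeans() works directly on all rows and never forms the n x n distance matrix; k = 8 is arbitrary.

```r
set.seed(42)

# Hypothetical stand-in for mydata_nor, scaled down so the sketch runs quickly.
# ASSUMPTION: rows are ordered sample by sample (all days of sample 1, then 2, ...),
# with 144 ten-minute intervals per day as columns.
n_samples <- 50
n_days <- 365
x <- matrix(rnorm(n_samples * n_days * 144), ncol = 144)
sample_id <- rep(seq_len(n_samples), each = n_days)

## Option 1: aggregate to fewer rows, so dist() stays small.
# rowsum() sums rows within each group; dividing by n_days gives each
# sample's mean daily profile (n_samples x 144).
profiles <- rowsum(x, sample_id) / n_days
d <- dist(profiles, method = "euclidean")
length(d)            # n_samples * (n_samples - 1) / 2 = 1225

## Option 2: kmeans() on every row; no pairwise distance matrix is built.
# k = 8 is an arbitrary illustrative choice.
fit <- kmeans(x, centers = 8, iter.max = 50, nstart = 3)
dim(fit$centers)     # 8 144
length(fit$cluster)  # one cluster label per row: 18250
```

With the full 365000 x 144 matrix, Option 2 needs roughly the memory of the data itself (about 400 MB of doubles) rather than the ~500 Gb a full distance matrix would require.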