On 12/12/2013 7:08 AM, Eik Vettorazzi wrote:
Thanks, Duncan, for this clarification. A double-precision matrix with 2e11 elements (as the OP wanted) would need about 1.5 TB of memory; that's more than a standard (Windows 64-bit) computer can handle.
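As a quick sanity check, that arithmetic is easy to reproduce in R; a minimal sketch (the 2e11 element count is the figure quoted above):

n_elements <- 2e11          # element count quoted above
bytes <- n_elements * 8     # a double takes 8 bytes, so 1.6e12 bytes
bytes / 2^40                # roughly 1.45 TiB, i.e. about 1.5 TB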
According to Microsoft's "Memory Limits" web page (currently at http://msdn.microsoft.com/en-us/library/windows/desktop/aa366778%28v=vs.85%29.aspx#memory_limits, but these things tend to move around), the limit is 8 TB for virtual memory. (The same page lists a variety of smaller physical-memory limits, depending on the Windows version, but R doesn't need physical memory; virtual is good enough.)
R would be very slow if it were working with objects bigger than physical memory, but it could conceivably work.
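For what it's worth, on a Windows build of R (3.x) the per-session allocation cap can be inspected and, within what the OS allows, raised so that R can spill over into virtual memory; a minimal sketch (the 16000 MB figure is just an example):

memory.limit()              # current per-session limit in MB (Windows only)
memory.limit(size = 16000)  # request ~16 GB; honoured only if the OS permits it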
Duncan Murdoch
Cheers.

On 12.12.2013 13:00, Duncan Murdoch wrote:
> On 13-12-12 6:51 AM, Eik Vettorazzi wrote:
>> I thought so (with all the limitations due to collinearity and so on),
>> but actually there is a limit for the maximum size of an array which is
>> independent of your memory size and is due to the way arrays are
>> indexed. You can't create an object with more than 2^31-1 = 2147483647
>> elements.
>>
>> https://stat.ethz.ch/pipermail/r-help/2007-June/133238.html
>
> That post is from 2007. The limits were raised considerably when R
> 3.0.0 was released, and it is now 2^48 for disk-based operations, 2^52
> for working in memory.
>
> Duncan Murdoch
>
>> cheers
>>
>> On 12.12.2013 12:34, Romeo Kienzler wrote:
>>> ok, so 200K predictors and 10M observations would work?
>>>
>>> On 12/12/2013 12:12 PM, Eik Vettorazzi wrote:
>>>> it is simply because you can't do a regression with more predictors
>>>> than observations.
>>>>
>>>> Cheers.
>>>>
>>>> On 12.12.2013 09:00, Romeo Kienzler wrote:
>>>>> Dear List,
>>>>>
>>>>> I'm quite new to R and want to do logistic regression with a 200K
>>>>> feature data set (around 150 training examples).
>>>>>
>>>>> I'm aware that I should use Naive Bayes, but I have a more general
>>>>> question about the capability of R to handle very high-dimensional
>>>>> data.
>>>>>
>>>>> Please consider the following R code, where "mygenestrain.tab" is a
>>>>> 150 by 200000 matrix:
>>>>>
>>>>> traindata <- read.table('mygenestrain.tab');
>>>>> mylogit <- glm(V1 ~ ., data = traindata, family = "binomial");
>>>>>
>>>>> When executing this code I get the following error:
>>>>>
>>>>> Error in terms.formula(formula, data = data) :
>>>>>   allocMatrix: too many elements specified
>>>>> Calls: glm ... model.frame -> model.frame.default -> terms ->
>>>>>   terms.formula
>>>>> Execution halted
>>>>>
>>>>> Is this because R can't handle 200K features, or am I doing something
>>>>> completely wrong here?
>>>>>
>>>>> Thanks a lot for your help!
>>>>>
>>>>> Best regards,
>>>>>
>>>>> Romeo
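To illustrate the object-size limits discussed in the quoted thread, a minimal sketch (the large allocation is left commented out because it needs roughly 24 GB of RAM):

.Machine$integer.max              # 2147483647, i.e. 2^31 - 1, the pre-3.0.0 limit
## On R >= 3.0.0 atomic vectors may be "long", so this works given enough memory:
## x <- numeric(3e9); length(x)   # 3e+09 elements, about 24 GB of RAM
## Each dimension of a matrix is still capped at 2^31 - 1, since dims are integers.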