[R] Memory Problems with a Simple Bootstrap

2008-08-01 Thread Tom La Bone


I have a data file called inputdata.csv that looks something like this:

     ID Year Result Month   Date
1  7174 1954     10     3 540301
2  7174 1954      4     3 540322
3 20924 1967      4     2 670223
4 20924 1967     -7     5 670518
5 20924 1967     -3     7 670706
...
67209 ...

i.e., it goes on for 67209 rows (~2 Mb file). When I run the following
bootstrap session I get the indicated error:

> 
> library(boot)
> setwd("C:/Documents and Settings/Tom/Desktop")   
> 
> data.in <- read.csv("inputdata.csv",header=T,as.is=T)
> 
> per95 <- function( annual.data, b.index) {
+   sample.data <- annual.data[b.index,]
+   return(quantile(sample.data$Result,probs=c(0.95))) }
> 
> m <- 10000
> for (i in 1:39) {
+   annual.data <- data.in[data.in$Year == (i+1949),]
+   B <- boot(data=annual.data,statistic=per95,R=m)
+   print(i)
+   print(memory.size())
+ }
[1] 1
[1] 20.26163
[1] 2
[1] 61.6352
[1] 3
[1] 134.4187
[1] 4
[1] 149.4704
[1] 5
[1] 290.3090
[1] 6
[1] 376.7017
[1] 7
[1] 435.7683
[1] 8
[1] 463.7404
[1] 9
[1] 497.7946
Error: cannot allocate vector of size 568.8 Mb
> 

I am running this on a Windows XP Pro machine with 4 Gb of memory. The same
problem occurs when the code is executed on the same box running Ubuntu
8.04. Does anyone see any obvious reason why this should run out of memory?
I would be happy to email the data file to anyone who cares to try it on
their computer.

Tom


 


-- 
View this message in context: 
http://www.nabble.com/Memory-Problems-with-a-Simple-Bootstrap-tp18777897p18777897.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Memory Problems with a Simple Bootstrap

2008-08-01 Thread jim holtman
Use gc() in the loop to possibly free up any fragmented memory.  You
might also print out the size of B (object.size(B)) since that appears
to be the only variable in your loop that might be growing.




-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?



Re: [R] Memory Problems with a Simple Bootstrap

2008-08-01 Thread Tom La Bone

Same problem. The Windows Task Manager indicated that Rgui.exe was using
1,249,722 K of memory when the error occurred. This is R 2.7.1 by the way.

> library(boot)
> setwd("C:/Documents and Settings/Tom/Desktop")   
> 
> data.in <- read.csv("inputdata.csv",header=T,as.is=T)
> 
> per95 <- function( annual.data, b.index) {
+   sample.data <- annual.data[b.index,]
+   return(quantile(sample.data$Result,probs=c(0.95))) }
> 
> m <- 10000
> for (i in 1:39) {
+   annual.data <- data.in[data.in$Year == (i+1949),]
+   B <- boot(data=annual.data,statistic=per95,R=m)
+   gc()
+   print(i)  
+   print(object.size(B))
+   print(memory.size())
+ }
[1] 1
[1] 90352
[1] 12.35335
[1] 2
[1] 111032
[1] 12.39024
[1] 3
[1] 155544
[1] 12.48451
[1] 4
[1] 159064
[1] 11.10526
[1] 5
[1] 243456
[1] 11.23505
[1] 6
[1] 280592
[1] 12.74642
[1] 7
[1] 302416
[1] 11.33087
[1] 8
[1] 319752
[1] 12.84377
[1] 9
[1] 351448
[1] 11.42264
Error: cannot allocate vector of size 284.4 Mb
> 
> 




-- 
View this message in context: 
http://www.nabble.com/Memory-Problems-with-a-Simple-Bootstrap-tp18777897p18779433.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Memory Problems with a Simple Bootstrap

2008-08-01 Thread jim holtman
It seems like the objects are of reasonable size, and the memory size
also seems reasonable.  That is what I usually go by to see if there are
large objects in my memory.  If Task Manager was showing that R had 1.2GB
of memory allocated to it, I wonder if there might be a memory leak
somewhere.
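The small objects and the large process size fit together if boot() allocates memory only while it runs. object.size(B) measures what boot() returns, not the temporary arrays it builds along the way. The sketch below is a portable probe for that peak (memory.size() is Windows-only). It resets gc()'s high-water mark, evaluates an expression, and reads back the Vcells "max used" count, assuming 8 bytes per Vcell. The helper name peak_vcells_mb is our own; it is not part of R or boot.

```r
# Hypothetical helper: peak vector-heap use (in Mb) while evaluating `expr`.
# gc(reset = TRUE) resets the "max used" high-water marks before we start.
peak_vcells_mb <- function(expr) {
  gc(reset = TRUE)               # fresh high-water mark
  force(expr)                    # evaluate the lazily passed expression now
  g <- gc()
  g["Vcells", "max used"] * 8 / 2^20   # each Vcell is 8 bytes
}

# A 1e7-element double vector alone needs about 76.3 Mb while it is alive,
# so the reported peak should be at least that large.
peak_vcells_mb(numeric(1e7))
```

You could wrap one call from the loop, e.g. peak_vcells_mb(boot(data = annual.data, statistic = per95, R = m)). If a transient allocation is the culprit, it should report a peak far larger than object.size(B).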






Re: [R] Memory Problems with a Simple Bootstrap

2008-08-01 Thread Tom La Bone

Here it is with the gc() in a print statement.

Tom

> 
> library(boot)
> setwd("C:/Documents and Settings/Tom/Desktop")   
> 
> data.in <- read.csv("inputdata.csv",header=T,as.is=T)
> 
> per95 <- function( annual.data, b.index) {
+   sample.data <- annual.data[b.index,]
+   return(quantile(sample.data$Result,probs=c(0.95))) }
> 
> m <- 10000
> for (i in 1:39) {
+   annual.data <- data.in[data.in$Year == (i+1949),]
+   B <- boot(data=annual.data,statistic=per95,R=m)
+   print(i)  
+   print(gc())  
+   print(object.size(B))
+   print(memory.size())
+ }
[1] 1
 used (Mb) gc trigger (Mb) max used (Mb)
Ncells 145517  3.9     350000  9.4   350000  9.4
Vcells 304805  2.4    2602013 19.9  2841664 21.7
[1] 90352
[1] 12.35812
[1] 2
 used (Mb) gc trigger (Mb) max used  (Mb)
Ncells 145540  3.9     350000  9.4   350000   9.4
Vcells 309041  2.4   12977679 99.1 15259760 116.5
[1] 111032
[1] 12.39814
[1] 3
 used (Mb) gc trigger  (Mb) max used  (Mb)
Ncells 145540  3.9     350000   9.4   350000   9.4
Vcells 318147  2.5   35277418 269.2 41833896 319.2
[1] 155544
[1] 12.49432
[1] 4
 used (Mb) gc trigger  (Mb) max used  (Mb)
Ncells 145540  3.9     350000   9.4   350000   9.4
Vcells 318867  2.5   37046714 282.7 43935337 335.3
[1] 159064
[1] 11.10362
[1] 5
 used (Mb) gc trigger  (Mb) max used  (Mb)
Ncells 145540  3.9     350000   9.4   350000   9.4
Vcells 336129  2.6   79348305 605.4 94192718 718.7
[1] 243456
[1] 11.2296
[1] 6
 used (Mb) gc trigger  (Mb)  max used  (Mb)
Ncells 145540  3.9     350000   9.4    350000   9.4
Vcells 343725  2.7   97971904 747.5 116530118 889.1
[1] 280592
[1] 12.75431
[1] 7
 used (Mb) gc trigger  (Mb)  max used  (Mb)
Ncells 145540  3.9     350000   9.4    350000   9.4
Vcells 348189  2.7  108915204 831.0 129493067 988.0
[1] 302416
[1] 11.32924
[1] 8
 used (Mb) gc trigger  (Mb)  max used   (Mb)
Ncells 145540  3.9     350000   9.4    350000    9.4
Vcells 351735  2.7  117607222 897.3 139706454 1065.9
[1] 319752
[1] 12.85929
[1] 9
 used (Mb) gc trigger   (Mb)  max used   (Mb)
Ncells 145540  3.9     350000    9.4    350000    9.4
Vcells 358217  2.8  133510676 1018.7 158462984 1209.0
[1] 351448
[1] 11.40765
Error: cannot allocate vector of size 284.4 Mb
> 
> 
> 





[R] Memory Problems with a Simple Bootstrap - Part II

2008-08-02 Thread Tom La Bone

I have distilled my bootstrap problem down to this bit of code, which
calculates an estimate of the 95th percentile of 7500 random numbers drawn
from a standard normal distribution: 

library(boot)
per95 <- function( annual.data, b.index) {
  sample.data <- annual.data[b.index]
  return(quantile(sample.data,probs=c(0.95))) }
m <- 10000
x <- rnorm(7500,0,1)
B <- boot(data=x,statistic=per95,R=m)

Error: cannot allocate vector of size 286.1 Mb

This result was observed with R 2.7.1 and R 2.7.1 Patched when run on a
Windows XP computer with 4Gb of memory.

This does not seem to be an excessively large and complicated calculation,
so is this an intentional limitation of the boot function, a result of bad
choices on my part, or a bug? 

Tom





-- 
View this message in context: 
http://www.nabble.com/Memory-Problems-with-a-Simple-Bootstrap---Part-II-tp18788083p18788083.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Memory Problems with a Simple Bootstrap - Part II

2008-08-02 Thread Prof Brian Ripley

On Sat, 2 Aug 2008, Tom La Bone wrote:

> This does not seem to be an excessively large and complicated calculation,
> so is this an intentional limitation of the boot function, a result of bad
> choices on my part, or a bug?


Use of a 32-bit OS was a bad choice on your part.  On 64-bit Linux it runs 
fine in

> gc()
          used (Mb) gc trigger   (Mb)  max used   (Mb)
Ncells  146670  7.9     350000   18.7    350000   18.7
Vcells 3189171 24.4  168442002 1285.2 193746905 1478.2

That's too much usage for a 2GB address space.
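Whether an R session hits this limit depends on whether the R build is 32-bit or 64-bit, not only on the installed RAM. The pointer size gives a quick check in base R: 4 bytes on a 32-bit build and 8 on a 64-bit one. This check is general R practice and does not come from the thread.

```r
# 4-byte pointers mean a 32-bit build; 8-byte pointers mean 64-bit.
bits <- 8L * .Machine$sizeof.pointer
cat("This R build is", bits, "bit, platform:", R.version$platform, "\n")
```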

boot() sets up an index array, in your case of size 7500x10000, or
600Mb.  That dominates a 2Gb address space.
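The 600Mb figure follows from simple arithmetic. This sketch assumes n = 7500 observations and R = 10000 replicates, the total implied by Ripley's suggested 10 runs of R=1000. It also assumes 8 bytes per double and 4 per integer. The two results match the 572.2 Mb in Jim's traceback below and the 286.1 Mb in Tom's error above. That suggests, though it is not certain, that the two failures hit a double and an integer copy of the same array.

```r
# boot()'s index array: one row per replicate, one column per observation.
n <- 7500       # length(x) in Tom's example
R <- 10000      # assumed total number of bootstrap replicates
cells <- n * R  # 75 million indices

to_mb <- function(bytes) bytes / 2^20   # "Mb" in R's error messages is 2^20 bytes

round(to_mb(cells * 8), 1)  # stored as doubles, e.g. matrix(0, R, n): 572.2
round(to_mb(cells * 4), 1)  # stored as integers: 286.1
```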


What you could do is

B <- replicate(10, boot(data=x,statistic=per95,R=1000), FALSE)
Ball <- B[[1]]
Ball$t <- do.call("rbind", lapply(B, "[[", "t"))

that is, combine 10 independent runs (and that runs in ca 200Mb).
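Ripley's workaround can be wrapped in a small helper. This is a sketch under our own assumptions. The name boot_in_chunks is hypothetical. Besides stacking $t, it also sets $R to the total number of rows, which Ripley's three lines leave at 1000. The merged object keeps the $seed of the first chunk only, so boot.array(), which regenerates indices from the stored seed, will not describe the later chunks.

```r
library(boot)

# Compact version of Tom's per95(): 95th percentile of the resampled data.
per95 <- function(d, idx) quantile(d[idx], probs = 0.95)

# Hypothetical helper: run boot() `chunks` times with `R_each` replicates
# each and merge the runs, so peak memory is set by one chunk's index array.
boot_in_chunks <- function(data, statistic, R_each, chunks) {
  runs <- replicate(chunks,
                    boot(data = data, statistic = statistic, R = R_each),
                    simplify = FALSE)
  combined <- runs[[1]]
  # Stack the replicate matrices: one row per bootstrap replicate.
  combined$t <- do.call(rbind, lapply(runs, `[[`, "t"))
  # Our addition: keep $R consistent with the number of stacked rows.
  combined$R <- nrow(combined$t)
  combined
}

set.seed(1)
x <- rnorm(7500)
B <- boot_in_chunks(x, per95, R_each = 1000, chunks = 10)
dim(B$t)   # 10000 rows, 1 column
```

Each boot() call now builds a 1000 x 7500 index array rather than the full 10000 x 7500 one.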

BTW to Jim Holtman: adding a gc() call is not very helpful.  R will run gc 
to get memory if it is running out, and whereas the pattern of gc calls 
can affect the fragmentation, it is pretty much random whether adding gc 
calls helps or hinders.



--
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



Re: [R] Memory Problems with a Simple Bootstrap - Part II

2008-08-02 Thread jim holtman
I was suggesting adding the gc() call to help provide some additional
information on the utilization of memory.  As you indicated, it
probably does not help in reducing the fragmentation of memory, but it
was worth a try to see if there was any additional information that
might be gleaned from the execution of the code.  A traceback() at the
point of the error does indicate that it was a problem with allocating
a matrix:

> per95 <- function( annual.data, b.index) {
+  sample.data <- annual.data[b.index]
+  return(quantile(sample.data,probs=c(0.95))) }
> m <- 10000
> x <- rnorm(7500,0,1)
> B <- boot(data=x,statistic=per95,R=m)
Error: cannot allocate vector of size 572.2 Mb
> traceback()
4: matrix(0, R, n)
3: ordinary.array(n, R, strata)
2: index.array(n, R, sim, strata, m, L, weights)
1: boot(data = x, statistic = per95, R = m)

You could then trace back through the 'boot' code (if you wanted) to
determine what 'n' was.
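You can also find 'n' without stepping through the source. One option (ours, not Jim's) is boot's exported boot.array(), which regenerates the index array of a finished run from its stored seed. The sketch below uses a deliberately small run (R = 20). It confirms the R x n layout and measures bytes per index, which you can then scale up to the full number of replicates. We assume 10000 here, a value consistent with the reported error sizes.

```r
library(boot)

per95 <- function(d, idx) quantile(d[idx], probs = 0.95)

set.seed(42)
x <- rnorm(7500)
small <- boot(data = x, statistic = per95, R = 20)

# Regenerate the index array boot() drew: one row per replicate,
# one column per observation (R x n).
idx <- boot.array(small, indices = TRUE)
dim(idx)   # expected: 20 rows, 7500 columns

# Bytes per stored index on this small run, extrapolated to R = 10000.
bytes_per_index <- as.numeric(object.size(idx)) / length(idx)
bytes_per_index * length(x) * 10000 / 2^20   # rough Mb for the full array
```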



Re: [R] Memory Problems with a Simple Bootstrap - Part II

2008-08-02 Thread Prof Brian Ripley
The following version of boot:::ordinary.array will enable this to run in 
300Mb:


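ordinary.array <- function(n, R, strata)
{
    ## Distinct stratum labels (a single label means no stratification).
    inds <- as.integer(names(table(strata)))
    if (length(inds) == 1) {
        ## Draw all n*R indices as one integer vector and reshape it to
        ## R x n, instead of first allocating a double matrix of zeros.
        output <- sample(n, n*R, replace=TRUE)
        dim(output) <- c(R, n)
    } else {
        ## Integer (not double) result matrix, filled stratum by stratum.
        output <- matrix(as.integer(0), R, n)
        for(is in inds) {
            gp <- (1:n)[strata == is]
            ## Caveat: if a stratum has only one member, sample(gp, ...)
            ## draws from 1:gp rather than returning gp itself.
            output[, gp] <- sample(gp, R*length(gp), replace=TRUE)
        }
    }
    output
}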

Note that you will have to replace the function in the 'boot' name space
(either re-install boot after editing the sources or use fixInNamespace).
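fixInNamespace() opens an editor, which is awkward in a script. The sketch below does the replacement non-interactively with utils::assignInNamespace(). It is lightly adapted from the function above. The loop variable is renamed, and each stratum resamples positions via sample.int(), so a one-member stratum is not misread as 1:gp. Recent versions of boot may already contain an equivalent change, so it is worth printing boot:::ordinary.array before patching. The block only patches if that internal binding exists.

```r
library(boot)

# Adapted from Ripley's ordinary.array: returns an R x n integer matrix of
# bootstrap indices, resampling within strata when there are several.
patched_ordinary_array <- function(n, R, strata) {
  inds <- as.integer(names(table(strata)))
  if (length(inds) == 1) {
    # One stratum: sample all n*R indices as integers and reshape in place.
    output <- sample.int(n, n * R, replace = TRUE)
    dim(output) <- c(R, n)
  } else {
    output <- matrix(0L, R, n)
    for (s in inds) {
      gp <- seq_len(n)[strata == s]
      # Index into gp so a single-member stratum still returns gp itself.
      output[, gp] <- gp[sample.int(length(gp), R * length(gp), replace = TRUE)]
    }
  }
  output
}

# Sanity check on hypothetical strata: each column resamples its own stratum.
set.seed(7)
strata <- rep(1:3, times = c(4, 3, 5))
a <- patched_ordinary_array(length(strata), 6, strata)
stopifnot(all(dim(a) == c(6, 12)), all(strata[a] == rep(strata, each = 6)))

# Install into the boot namespace, only if the internal function exists.
ns <- asNamespace("boot")
if (exists("ordinary.array", envir = ns, inherits = FALSE)) {
  environment(patched_ordinary_array) <- ns
  assignInNamespace("ordinary.array", patched_ordinary_array, ns = "boot")
}

# boot() should still run as before.
B <- boot(rnorm(200), function(d, i) mean(d[i]), R = 50)
```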


