Re: [R] Memory hungry routines

2014-12-30 Thread ALBERTO VIEIRA FERREIRA MONTEIRO
Thanks to Duncan, Hadley and Henrik.

Duncan, I used Rprof and was able to pinpoint the critical routine
that was causing the memory blow-up.

Henrik, you got it right: the culprit was a big matrix of integers,
some of whose entries are filled with -Inf and Inf. This matrix is
global, it is used only once, it does not consume too much memory,
and it should be harmless, but...

Hadley, your link on memory allocation and management helped me
identify the problem. I did a very stupid thing: I added some debug
code to the critical routine that duplicated that matrix at each
iteration of a loop... So the big matrix with its integers and Infs
and -Infs was being copied over and over, eating memory needlessly.
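
For reference, a minimal sketch of the pattern that bit me (the names
and sizes are made up, not the original routine); tracemem() makes the
repeated duplication visible:

big <- matrix(0, 2000, 2000)   # stand-in for the large global matrix (~32 MB)
big[1, ] <- -Inf               # some entries hold -Inf/Inf, as in the real matrix

debug_loop <- function(n) {
  for (i in seq_len(n)) {
    snapshot <- big            # the "debug" line: grab the global matrix...
    snapshot[1, 1] <- i        # ...and modify it, forcing a full copy every iteration
  }
}

tracemem(big)                  # report whenever this matrix gets duplicated
debug_loop(3)                  # prints one duplication message per iteration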

Thanks for all the help.

I got 99 problems but you won't be one

Alberto Monteiro


[R] Memory hungry routines

2014-12-29 Thread ALBERTO VIEIRA FERREIRA MONTEIRO
Is there any way to detect which calls are consuming memory?

I run a program whose global variables take up about 50 megabytes of
memory, but when I monitor the progress of the program it seems to be
allocating 150 megabytes, with peaks of up to 2 gigabytes.

I know that the global variables aren't copied many times by the
routines, but I suspect something weird must be happening.

Alberto Monteiro

PS: the lines below compute the memory allocated to all global
variables; they could probably be adapted to track local variables as well:

y <- ls(pat = "")   # get the names of all the variables
z <- rep(0, length(y))  # create a vector of sizes
for (i in 1:length(y)) z[i] <- object.size(get(y[i]))  # size (in bytes) of each variable
# BTW, is there any way to vectorise the above loop?
xix <- sort.int(z, index.return = TRUE)  # sort the sizes
y <- y[xix$ix]  # apply the sort order to the names
z <- z[xix$ix]  # apply the sort order to the sizes
y <- c(y, "total")  # add a grand-total row
z <- c(z, sum(z))  # sum them all
cbind(y, z)  # crude way to list them
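
(One way the loop above could be vectorised, as a rough sketch using
vapply; not tested on the original workspace:)

y <- ls(envir = .GlobalEnv)   # names of all the global variables
z <- vapply(y, function(nm) as.numeric(object.size(get(nm, envir = .GlobalEnv))),
            numeric(1))       # sizes in bytes, without an explicit loop
z <- sort(z)                  # smallest to largest
c(z, total = sum(z))          # named vector with a grand total appended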


Re: [R] Memory hungry routines

2014-12-29 Thread Duncan Murdoch
On 29/12/2014 1:52 PM, ALBERTO VIEIRA FERREIRA MONTEIRO wrote:
 Is there any way to detect which calls are consuming memory?

The Rprofmem() function can do this, but you need to build R with
memory profiling enabled to use it. Rprof() does a more limited
version of the same thing if run with memory.profiling = TRUE.
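
In practice that might look roughly like this (a minimal sketch;
my_program() stands in for the code being investigated):

Rprof("profile.out", memory.profiling = TRUE)   # start sampling, recording memory use too
my_program()                                    # run the code being investigated
Rprof(NULL)                                     # stop profiling
summaryRprof("profile.out", memory = "both")    # time and memory summarised by call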

Duncan Murdoch

 
 I run a program whose global variables take up about 50 megabytes of
 memory, but when I monitor the progress of the program it seems to be
 allocating 150 megabytes, with peaks of up to 2 gigabytes.

 I know that the global variables aren't copied many times by the
 routines, but I suspect something weird must be happening.

 Alberto Monteiro

 PS: the lines below compute the memory allocated to all global
 variables; they could probably be adapted to track local variables as well:

 y <- ls(pat = "")   # get the names of all the variables
 z <- rep(0, length(y))  # create a vector of sizes
 for (i in 1:length(y)) z[i] <- object.size(get(y[i]))  # size (in bytes) of each variable
 # BTW, is there any way to vectorise the above loop?
 xix <- sort.int(z, index.return = TRUE)  # sort the sizes
 y <- y[xix$ix]  # apply the sort order to the names
 z <- z[xix$ix]  # apply the sort order to the sizes
 y <- c(y, "total")  # add a grand-total row
 z <- c(z, sum(z))  # sum them all
 cbind(y, z)  # crude way to list them
 


Re: [R] Memory hungry routines

2014-12-29 Thread Hadley Wickham
You might find the advice at http://adv-r.had.co.nz/memory.html helpful.
Hadley

On Tue, Dec 30, 2014 at 7:52 AM, ALBERTO VIEIRA FERREIRA MONTEIRO
albm...@centroin.com.br wrote:
 Is there any way to detect which calls are consuming memory?

 I run a program whose global variables take up about 50 megabytes of
 memory, but when I monitor the progress of the program it seems to be
 allocating 150 megabytes, with peaks of up to 2 gigabytes.

 I know that the global variables aren't copied many times by the
 routines, but I suspect something weird must be happening.

 Alberto Monteiro

 PS: the lines below compute the memory allocated to all global
 variables; they could probably be adapted to track local variables as well:

 y <- ls(pat = "")   # get the names of all the variables
 z <- rep(0, length(y))  # create a vector of sizes
 for (i in 1:length(y)) z[i] <- object.size(get(y[i]))  # size (in bytes) of each variable
 # BTW, is there any way to vectorise the above loop?
 xix <- sort.int(z, index.return = TRUE)  # sort the sizes
 y <- y[xix$ix]  # apply the sort order to the names
 z <- z[xix$ix]  # apply the sort order to the sizes
 y <- c(y, "total")  # add a grand-total row
 z <- c(z, sum(z))  # sum them all
 cbind(y, z)  # crude way to list them


-- 
http://had.co.nz/


Re: [R] Memory hungry routines

2014-12-29 Thread Henrik Bengtsson
On Mon, Dec 29, 2014 at 10:52 AM, ALBERTO VIEIRA FERREIRA MONTEIRO
albm...@centroin.com.br wrote:
 Is there any way to detect which calls are consuming memory?

 I run a program whose global variables take up about 50 megabytes of
 memory, but when I monitor the progress of the program it seems to be
 allocating 150 megabytes, with peaks of up to 2 gigabytes.

 I know that the global variables aren't copied many times by the
 routines, but I suspect something weird must be happening.

 Alberto Monteiro

 PS: the lines below compute the memory allocated to all global
 variables; they could probably be adapted to track local variables as well:

 y <- ls(pat = "")   # get the names of all the variables
 z <- rep(0, length(y))  # create a vector of sizes
 for (i in 1:length(y)) z[i] <- object.size(get(y[i]))  # size (in bytes) of each variable
 # BTW, is there any way to vectorise the above loop?
 xix <- sort.int(z, index.return = TRUE)  # sort the sizes
 y <- y[xix$ix]  # apply the sort order to the names
 z <- z[xix$ix]  # apply the sort order to the sizes
 y <- c(y, "total")  # add a grand-total row
 z <- c(z, sum(z))  # sum them all
 cbind(y, z)  # crude way to list them

Duncan already suggested Rprofmem().  For a neat interface to that,
see also the lineprof package.
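
Roughly (a sketch assuming the development version of lineprof from
GitHub; run_analysis() and the file name are placeholders):

# devtools::install_github("hadley/lineprof")   # not on CRAN
library(lineprof)
source("analysis.R")                  # source() keeps the line references the profiler needs
prof <- lineprof(run_analysis())      # profile one call to the sourced function
prof                                  # per-line memory allocated/released and duplications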

Common memory hogs are cbind(), rbind() and other ways of
incrementally building up objects.  These can often be avoided by
pre-allocating the final object up front and populating it as you go.
Another source of unnecessary memory duplication is coercion of data
types, e.g. allocating an integer matrix but populating it with
doubles.  A related mistake is to use matrix(nrow, ncol) to allocate
matrices that will hold numeric values.  That is actually
matrix(NA, nrow, ncol), which creates a *logical* matrix, and it will
be coerced (involving a copy and a large memory allocation) as soon
as it is populated with a numeric value.  One should use
matrix(NA_real_, nrow, ncol) instead.
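
A small sketch contrasting the two patterns (made-up sizes, just to
illustrate the idea):

n <- 1e4

grow <- function() {                          # grows the result with rbind():
  out <- NULL                                 # every iteration copies everything
  for (i in 1:n) out <- rbind(out, c(i, sqrt(i)))   # accumulated so far
  out
}

prealloc <- function() {                      # allocates the final object once
  out <- matrix(NA_real_, nrow = n, ncol = 2) # NA_real_, not NA (logical)
  for (i in 1:n) out[i, ] <- c(i, sqrt(i))    # filled in place
  out
}

system.time(grow())        # markedly slower and far more memory hungry
system.time(prealloc())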

For listing objects, their sizes and more, you can use ll() in the
R.oo package, which returns a data.frame, e.g.

> example(iris)
> a <- 1:1e6
> R.oo::ll()
  member data.class dimension objectSize
1      a    numeric   1000000    4000040
2   dni3       list         3        600
3     ii data.frame  c(150,5)       7088
4   iris data.frame  c(150,5)       7088

> R.oo::ll(sortBy="objectSize")
  member data.class dimension objectSize
2   dni3       list         3        600
3     ii data.frame  c(150,5)       7088
4   iris data.frame  c(150,5)       7088
1      a    numeric   1000000    4000040

> tbl <- R.oo::ll()
> tbl <- tbl[order(tbl$objectSize, decreasing=TRUE),]
> tbl
  member data.class dimension objectSize
1      a    numeric   1000000    4000040
3     ii data.frame  c(150,5)       7088
4   iris data.frame  c(150,5)       7088
5   objs data.frame    c(4,4)       2760
2   dni3       list         3        600
> sum(tbl$objectSize)
[1] 4017576


/Henrik


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.