When I change the data set size, the "extra allocations" do not change in size. This supports Luke and Martin's diagnosis.
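
A minimal sketch, in R, of how allocation counts like the ones in this message could be tallied from raw Rprofmem output. The helper name count_alloc_sizes is hypothetical (it is not part of R or of the benchmark script), and the input lines are the R 3.0.0 output for y <- 1:10^4 + 0.0 quoted in this message. It assumes each allocation is logged as "<bytes> :", optionally followed by a quoted call stack.

```r
# Hypothetical helper: count allocations by size in raw Rprofmem output lines.
count_alloc_sizes <- function(lines) {
  # Each allocation is assumed to be logged as "<bytes> :"; pull out every
  # run of digits that is immediately followed by " :".
  hits <- regmatches(lines, gregexpr("[0-9]+(?= :)", lines, perl = TRUE))
  # Keep sizes as strings so the table names match the log text exactly.
  table(unlist(hits))
}

# Rprofmem output quoted in this message (R 3.0.0, y <- 1:10^4 + 0.0).
first <- c(
  "320040 :80040 :240048 :320040 :80040 :240048 :80040 :\"as.data.frame.numeric\" \"as.data.frame\" ",
  "320040 :80040 :240048 :320040 :80040 :240048 :"
)

counts <- count_alloc_sizes(first)
print(counts)
```

For these two lines the tally is 5 allocations of 80040 bytes and 4 each of 240048 and 320040 bytes. Only the single 80040 entry that carries a call stack is attributed to as.data.frame.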
The extra allocations are either 2 or 4 allocations, each of size
  80040
  240048
  320040

Details (you may skip):

(Fresh session of R 3.0.0)
> y <- 1:10^4 + 0.0
> Rprofmem("temp.out", threshold = 10^4)
> d <- as.data.frame(y)
> Rprofmem(NULL); system("cat temp.out")
320040 :80040 :240048 :320040 :80040 :240048 :80040 :"as.data.frame.numeric" "as.data.frame"
320040 :80040 :240048 :320040 :80040 :240048 :>
> # Try increasing size by a factor of 10
> y <- 1:10^5 + 0.0
> Rprofmem("temp.out", threshold = 10^4)
> d <- as.data.frame(y)
> Rprofmem(NULL); system("cat temp.out")
320040 :80040 :240048 :320040 :80040 :240048 :800040 :"as.data.frame.numeric" "as.data.frame"
320040 :80040 :240048 :320040 :80040 :240048 :>

The number of allocations shown, of different sizes:

          3.0.0   3.0.0   2.15.3  2.15.3
          first   second  first   second
240048    4       4       0       0
320040    4       4       0       0
80040     5       4       1       0
800040    0       1       0       1

So it looks like both R 2.15.3 and R 3.0.0 are making one copy of the data,
plus, in R 3.0.0, the extra allocations.

(Fresh session of R 2.15.3)
> y <- 1:10^4 + 0.0
> Rprofmem("temp.out", threshold = 10^4)
> d <- as.data.frame(y)
> Rprofmem(NULL); system("cat temp.out")
80040 :"as.data.frame.numeric" "as.data.frame"
> # Increase size by factor of 10
> y <- 1:10^5 + 0.0
> Rprofmem("temp.out", threshold = 10^4)
> d <- as.data.frame(y)
> Rprofmem(NULL); system("cat temp.out")
800040 :"as.data.frame.numeric" "as.data.frame"


On Sun, 14 Apr 2013 19:15:45 -0700
Martin Morgan <mtmor...@fhcrc.org> wrote:
>On 04/14/2013 07:11 PM, luke-tier...@uiowa.edu wrote:
>> There were a couple of bug fixes to somewhat obscure compound
>> assignment related bugs that required bumping up internal reference
>> counts. It's possible that one or more of these are responsible. If so
>> it is unavoidable for now, but it's worth finding out for sure. With
>> some stripped down test examples it should be possible to identify
>> when things changed.
>> I won't have time to look for some time, but if
>> someone else wanted to nail this down that would be useful.
>
>I can't quite tell from Tim's script what he's documenting. In R-2.15.3 I have
>
>   > Rprofmem(); Rprofmem(NULL); readLines("Rprofmem.out", warn=FALSE)
>character(0)
>
>(or sometimes [1] "new page:new page:\"Rprofmem\" ")
>
>whereas in R-3.0.0
>
>   > Rprofmem(); Rprofmem(NULL); readLines("Rprofmem.out", warn=FALSE)
>[1] "320040 :80040 :240048 :320040 :80040 :240048 :"
>
>I think these are the allocations Tim is seeing. They're from the parser (see
>below) rather than as.data.frame. For Tim's example
>
>   y <- 1:10^4 + 0.0
>   Rprofmem(); d <- as.data.frame(y); Rprofmem(NULL); readLines("Rprofmem.out")
>
>[1] "320040 :80040 :240048 :320040 :80040 :240048 :80040
>:\"as.data.frame.numeric\" \"as.data.frame\" "
>[2] "320040 :80040 :240048 :320040 :80040 :240048 :"
>
>only the allocation 80040 is from as.data.frame (from the call stack output).
>
>Under R -d gdb
>
>   (gdb) b R_OutputStackTrace
>   (gdb) r
>   > Rprofmem(); Rprofmem(NULL)
>
>   Breakpoint 1, R_OutputStackTrace (file=0xbd43f0) at
>/home/mtmorgan/src/R-3-0-branch/src/main/memory.c:3434
>   3434 {
>   (gdb) bt
>   #0 R_OutputStackTrace (file=0xbd43f0) at
>/home/mtmorgan/src/R-3-0-branch/src/main/memory.c:3434
>   #1 0x00007ffff792ff83 in R_ReportAllocation (size=320040) at
>/home/mtmorgan/src/R-3-0-branch/src/main/memory.c:3456
>   #2 Rf_allocVector (type=13, length=80000) at
>/home/mtmorgan/src/R-3-0-branch/src/main/memory.c:2478
>   #3 0x00007ffff790bedf in growData () at gram.y:3391
>
>and the memory allocations are from these lines in the parser gram.y
>
>   PROTECT( bigger = allocVector( INTSXP, data_size * DATA_ROWS ) ) ;
>   PROTECT( biggertext = allocVector( STRSXP, data_size ) );
>
>I'm not sure why these show up under R 3.0.0, though.
>
>$ R-2-15-branch/bin/R --version
>R version 2.15.3 Patched (2013-03-13 r62579) -- "Security Blanket"
>Copyright (C) 2013 The R Foundation for Statistical Computing
>ISBN 3-900051-07-0
>Platform: x86_64-unknown-linux-gnu (64-bit)
>
>R-3-0-branch$ bin/R --version
>R version 3.0.0 Patched (2013-04-14 r62579) -- "Masked Marvel"
>Copyright (C) 2013 The R Foundation for Statistical Computing
>Platform: x86_64-unknown-linux-gnu (64-bit)
>
>Martin
>
>
>
>>
>> Best,
>>
>> luke
>>
>> On Sun, 14 Apr 2013, Tim Hesterberg wrote:
>>
>>> I did some benchmarking of data frame code, and
>>> it appears that R 3.0.0 is far worse than earlier versions of R
>>> in terms of how many large objects it allocates space for,
>>> for data frame operations - creation, subscripting, subscript replacement.
>>> For a data frame with n rows, it makes either 2 or 4 extra copies of
>>> all of:
>>>   8n bytes (e.g. double precision)
>>>   24n bytes
>>>   32n bytes
>>> E.g., for as.data.frame(numeric vector), instead of allocations
>>> totalling ~8n bytes, it allocates 33 times that much.
>>>
>>> Here, compare columns 3 and 5
>>> (columns 2 and 4 are with the dataframe package).
>>>
>>> # Summary
>>> #                          R-2.14.2        R-2.15.3        R-3.0.0
>>> #                          w/o     with    w/o     with    w/o
>>> # as.data.frame(y)         3       1       1       1       5;4;4
>>> # data.frame(y)            7       3       4       2       6;2;2
>>> # data.frame(y, z)         7 each  3 each  4       2       8;4;4
>>> # as.data.frame(l)         8       3       5       2       9;4;4
>>> # data.frame(l)            13      5       8       3       12;4;4
>>> # d$z <- z                 3,2     1,1     3,1     2,1     7;4;4,1
>>> # d[["z"]] <- z            4,3     1,1     3,1     2,1     7;4;4,1
>>> # d[, "z"] <- z            6,4,2   2,2,1   4,2,2   3,2,1   8;4;4,2,2
>>> # d["z"] <- z              6,5,2   2,2,1   4,2,2   3,2,1   8;4;4,2,2
>>> # d["z"] <- list(z=z)      6,3,2   2,2,1   4,2,2   3,2,1   8;4;4,2,2
>>> # d["z"] <- Z #list(z=z)   6,2,2   2,1,1   4,1,2   3,1,1   8;4;4,1,2
>>> # a <- d["y"]              2       1       2       1       6;4;4
>>> # a <- d[, "y", drop=F]    2       1       2       1       6;4;4
>>>
>>> # Where two numbers are given, they refer to:
>>> #   (copies of the old data frame),
>>> #   (copies of the new column)
>>> # A third number refers to numbers of
>>> #   (copies made of an integer vector of row names)
>>>
>>> # For R 3.0.0, I'm getting astounding results - many more copies,
>>> # and also some copies of larger objects; in addition to the data
>>> # vectors of size 80K and 160K, also 240K and 320K.
>>> # Where three numbers are given in form a;c;d, they refer to
>>> #   (copies of 80K; 240K; 320K)
>>>
>>> The benchmarks are at
>>> http://www.timhesterberg.net/r-packages/memory.R
>>>
>>> I'm using versions of R I installed from source on a Linux box, using e.g.
>>> ./configure --prefix=(my path) --enable-memory-profiling --with-readline=no
>>> make
>>> make install
>>>
>>> ______________________________________________
>>> R-devel@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>
>>
>
>
>--
>Computational Biology / Fred Hutchinson Cancer Research Center
>1100 Fairview Ave. N.
>PO Box 19024 Seattle, WA 98109
>
>Location: Arnold Building M1 B861
>Phone: (206) 667-2793