Dear Jeff,

On 12/17/2014 01:46 AM, Jeff Newmiller wrote:
You are chasing ghosts of performance past, Denes.

In terms of memory efficiency, yes. In terms of CPU time, there can be a significant difference, see below.


The data.frame function causes no problems, and if it is used then the
OP would not need to presume they know the internal structure of the
data frame. See below. (I am using R 3.1.2.)

a1 <- list(x = rnorm(1e6), y = rnorm(1e6))
a2 <- list(x = rnorm(1e6), y = rnorm(1e6))
a3 <- list(x = rnorm(1e6), y = rnorm(1e6))

# get names of the objects
out_names <- ls(pattern="a[[:digit:]]$")

# amount of memory allocated
gc(reset=TRUE)

# Explicitly call data frame
out2 <- data.frame( a1=a1[["x"]], a2=a2[["x"]], a3=a3[["x"]] )

# No copying.
gc()

# Your suggested retrieval method
out3a <- lapply( lapply( out_names, get ), "[[", "x" )
names( out3a ) <- out_names
# The "obvious" way to finish the job works fine.
out3 <- do.call( data.frame, out3a )

BTW, the even more "obvious" as.data.frame() produces the same result with an even more intuitive interface.
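
A minimal sketch of that as.data.frame() route, using much smaller toy lists than the ones above so it runs instantly. The use of mget() in place of lapply(out_names, get) and the name out4 are my own choices for illustration, not part of the earlier code; mget() returns a named list, so the names do not have to be set separately.

```r
# small stand-ins for the a1, a2, a3 lists used earlier (5 values each)
a1 <- list(x = rnorm(5), y = rnorm(5))
a2 <- list(x = rnorm(5), y = rnorm(5))
a3 <- list(x = rnorm(5), y = rnorm(5))

out_names <- c("a1", "a2", "a3")

# mget() looks the objects up by name and returns a list named a1, a2, a3;
# lapply() keeps those names while extracting the "x" component
cols <- lapply(mget(out_names), "[[", "x")

# as.data.frame() turns the named list into a data frame, one column per object
out4 <- as.data.frame(cols)
str(out4)
```

Note that mget() searches the calling environment by default, so this assumes the objects live where the code is run.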

However, for lists with a larger number of elements the transformation to a data.frame can be pretty slow. In the toy example, we created only a three-element list. Let's increase it a little bit.

---

# this is not even that large
datlen <- 1e2
listlen <- 1e5

# create a toy list
mylist <- matrix(seq_len(datlen * listlen),
                 nrow = datlen, ncol = listlen)
mylist <- lapply(1:ncol(mylist), function(i) mylist[, i])
names(mylist) <- paste0("V", seq_len(listlen))


# define the more efficient function ---
# note that I put class(x) first so that setattr does not
# modify the attributes of the original input (see ?setattr,
# you have to be careful)
setAttrib <- function(x) {
    class(x) <- "data.frame"
    data.table::setattr(x, "row.names", seq_along(x[[1]]))
    x
}

# benchmarking
# (we do not need microbenchmark here, the differences are
# extremely large) - on my machine, 9.4 sec, 8.1 sec vs 0.15 sec
gc(reset=TRUE)
system.time(df1 <- do.call(data.frame, mylist))
gc()
system.time(df2 <- as.data.frame(mylist))
gc()
system.time(df3 <- setAttrib(mylist))
gc()

# check results
identical(df1, df2)
identical(df1, df3)

----

Of course for small datasets, one should use the built-in and safe functions (either do.call or as.data.frame). BTW, for the original three-element list, these are even faster than the workaround.
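
If one does go the attribute route, it may be worth adding back a few of the checks that data.frame() would otherwise perform. The following is only a sketch in base R: list_to_df is a hypothetical name, it is not the setAttrib function above, and I have not measured whether it avoids copying the way data.table::setattr does.

```r
# hypothetical helper: turn a named list of equal-length vectors into a
# data frame by setting attributes, with basic sanity checks first
list_to_df <- function(x) {
  stopifnot(is.list(x), length(x) > 0L, !is.null(names(x)))
  n <- unique(vapply(x, length, integer(1)))
  if (length(n) != 1L) stop("all elements must have the same length")
  # .set_row_names() builds the compact "automatic" row names for n rows
  structure(x, row.names = .set_row_names(n), class = "data.frame")
}

small <- list(a = 1:3, b = c(2.5, 3.5, 4.5))
df_small <- list_to_df(small)
str(df_small)
```

The checks cost a pass over the list lengths, which is cheap compared with what data.frame() does for each column.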

All the best,
  Denes





# No copying... well, you do end up with a new list in out3, but the
# data itself doesn't get copied.
gc()


On Tue, 16 Dec 2014, Dénes Tóth wrote:

On 12/16/2014 06:06 PM, SH wrote:
Dear List,

I hope this posting is not redundant.  I have several list outputs with the
same components.  I ran a function with three different scenarios below
(e.g., scen1, scen2, and scen3,...,scenN).  I would like to extract the
same components and group them as a data frame.  For example,
pop.inf.r1 <- scen1[['pop.inf.r']]
pop.inf.r2 <- scen2[['pop.inf.r']]
pop.inf.r3 <- scen3[['pop.inf.r']]
...
pop.inf.rN<-scenN[['pop.inf.r']]
new.df <- data.frame(pop.inf.r1, pop.inf.r2, pop.inf.r3,...,pop.inf.rN)

My final output would be 'new.df'.  Could you help me do that
efficiently?
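
For the plain base R version of this, a rough sketch follows. It assumes that each pop.inf.r component is a list holding one numeric value per simulation run (the summary output in the P.S. only shows that it is a list of length 2000), and the toy scenario objects and the make_scen helper are made up for illustration.

```r
# hypothetical stand-ins for the real scenario outputs
make_scen <- function(n) list(n.sim = n, pop.inf.r = as.list(runif(n)))
scens <- list(scen1 = make_scen(10),
              scen2 = make_scen(10),
              scen3 = make_scen(10))

# pull the same component out of every scenario and flatten it to a vector
cols <- lapply(scens, function(s) unlist(s[["pop.inf.r"]]))

# one column per scenario, named after the list elements
new.df <- as.data.frame(cols)
str(new.df)
```

Keeping the scenarios in one named list from the start avoids having to look up scen1, ..., scenN by name afterwards.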

If efficiency is of concern, do not use data.frame() but create a list
and add the required attributes with data.table::setattr (the setattr
function of the data.table package). (You can also consider creating a
data.table instead of a data.frame.)

# some largish lists
a1 <- list(x = rnorm(1e6), y = rnorm(1e6))
a2 <- list(x = rnorm(1e6), y = rnorm(1e6))
a3 <- list(x = rnorm(1e6), y = rnorm(1e6))

# amount of memory allocated
gc(reset=TRUE)

# get names of the objects
out_names <- ls(pattern="a[[:digit:]]$")

# create a list
out <- lapply(lapply(out_names, get), "[[", "x")

# note that no copying occurred
gc()

# decorate the list
data.table::setattr(out, "names", out_names)
data.table::setattr(out, "row.names", seq_along(out[[1]]))
class(out) <- "data.frame"

# still no copy
gc()

# output
head(out)


HTH,
 Denes



Thanks in advance,

Steve

P.S.:  Below are some examples of summary outputs.


summary(scen1)
                 Length Class  Mode
aql                1   -none- numeric
rql                1   -none- numeric
alpha              1   -none- numeric
beta               1   -none- numeric
n.sim              1   -none- numeric
N                  1   -none- numeric
n.sample           1   -none- numeric
n.acc              1   -none- numeric
lot.inf.r          1   -none- numeric
pop.inf.n       2000   -none- list
pop.inf.r       2000   -none- list
pop.decision.t1 2000   -none- list
pop.decision.t2 2000   -none- list
sp.inf.n        2000   -none- list
sp.inf.r        2000   -none- list
sp.decision     2000   -none- list
summary(scen2)
                 Length Class  Mode
aql                1   -none- numeric
rql                1   -none- numeric
alpha              1   -none- numeric
beta               1   -none- numeric
n.sim              1   -none- numeric
N                  1   -none- numeric
n.sample           1   -none- numeric
n.acc              1   -none- numeric
lot.inf.r          1   -none- numeric
pop.inf.n       2000   -none- list
pop.inf.r       2000   -none- list
pop.decision.t1 2000   -none- list
pop.decision.t2 2000   -none- list
sp.inf.n        2000   -none- list
sp.inf.r        2000   -none- list
sp.decision     2000   -none- list
summary(scen3)
                 Length Class  Mode
aql                1   -none- numeric
rql                1   -none- numeric
alpha              1   -none- numeric
beta               1   -none- numeric
n.sim              1   -none- numeric
N                  1   -none- numeric
n.sample           1   -none- numeric
n.acc              1   -none- numeric
lot.inf.r          1   -none- numeric
pop.inf.n       2000   -none- list
pop.inf.r       2000   -none- list
pop.decision.t1 2000   -none- list
pop.decision.t2 2000   -none- list
sp.inf.n        2000   -none- list
sp.inf.r        2000   -none- list
sp.decision     2000   -none- list


______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnew...@dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                       Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
---------------------------------------------------------------------------
