Re: [R] How to quickly convert a data.frame into a structure of lists
Here is code to transform the matrix that by() or array(split()) produces
into a list of lists, along with an example of the speed of the various
approaches. Using split(), either directly or via by() or tapply(), saves
a lot of time.

f0 <- function(df) {
  # original code with typos fixed.
  list_structure <- lapply(levels(df$A), function(levelA) {
    lapply(levels(df$B), function(levelB) {
      df$C[df$A==levelA & df$B==levelB]
    })
  })
  # Apply the names:
  names(list_structure) <- levels(df$A)
  for (i in 1:length(list_structure)) {
    names(list_structure[[i]]) <- levels(df$B)
  }
  list_structure
}

f0a <- function(df) {
  # slightly faster version of f0, taking repeated
  # calculations out of loops.
  A <- df$A
  B <- df$B
  C <- df$C
  levelsA <- structure(levels(A), names=levels(A))
  levelsB <- structure(levels(B), names=levels(B))
  lapply(levelsA, function(levelA) {
    tmpA <- A == levelA # this is responsible for most of the savings
    lapply(levelsB, function(levelB) {
      C[tmpA & B==levelB]
    })
  })
}

f1 <- function(df) {
  # DM's code
  by(df$C, df[,1:2], identity)
}

f2 <- function(df) {
  # WD's code
  AB <- df[c("A","B")]
  # lapply() for dimnames: sapply() would return a matrix, not a list,
  # if A and B had the same number of levels.
  array(split(df$C, AB), dim=sapply(AB, nlevels), dimnames=lapply(AB, levels))
}

matrix2ListOfRows <- function(mat) {
  # convert a matrix to a list of its rows, converting dimnames to names.
  retval <- structure(as.vector(mat), names=rep(colnames(mat), each=nrow(mat)))
  retval <- split(retval, row(mat))
  names(retval) <- rownames(mat)
  retval
}

The test involves 10^5 rows of data with 26 levels for A and 200 for B.

> r200 <- as.character(as.roman(1:200))
> set.seed(1)
> df <- data.frame(A=factor(sample(letters, size=1e5, replace=TRUE), levels=letters),
+                  B=factor(sample(r200, size=1e5, replace=TRUE), levels=r200),
+                  C=1:1e5)
> system.time(ls0 <- f0(df))
   user  system elapsed
  74.08    2.34   76.60
> system.time(ls0a <- f0a(df))
   user  system elapsed
  43.09    0.44   43.73
> all.equal(ls0, ls0a)
[1] TRUE
> system.time(ls2 <- matrix2ListOfRows(f2(df)))
   user  system elapsed
   0.09    0.02    0.11
> all.equal(ls0, ls2)
[1] TRUE
> system.time(ls1 <- matrix2ListOfRows(f1(df)))
   user  system elapsed
   0.69    0.00    0.69
> all.equal(ls0, ls1)
[1] TRUE

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of William Dunlap
> Sent: Wednesday, August 10, 2011 10:05 AM
> To: Duncan Murdoch; Frederic F
> Cc: r-help@r-project.org
> Subject: Re: [R] How to quickly convert a data.frame into a structure of lists
>
> I was going to suggest
>   > AB <- df[c("A","B")]
>   > ls2 <- array(split(df$C, AB), dim=sapply(AB, nlevels), dimnames=lapply(AB, levels))
> which produces a matrix very similar to what Duncan's by() call produces
>   > ls1 <- by(df$C, df[,1:2], identity)
> E.g.,
>   > ls2[["a","X"]]
>   [1] 1 2
>   > ls1[["a","X"]]
>   [1] 1 2
>   > ls1[["a","Y"]] # by assigns NULL to unoccupied slots
>   NULL
>   > ls2[["a","Y"]] # split gives the same type to all slots, copied from input
>   numeric(0)
>
> They both are quick because they use split() to avoid the repeated
> evaluations of
>    bigVector[ anotherBigVector == scalar ]
> that your nested (not imbricated) loops do. If you really need to convert
> the matrix to a list of lists that will probably be a quick transformation.
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
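For readers of the archive, here is a shorter way to turn such a list-matrix into a list of rows, by indexing it one row at a time. This is only a sketch and not code from the thread: `rows_as_lists` is a hypothetical helper name, the tiny data frame is the four-row example used elsewhere in this thread, and the helper has not been timed against matrix2ListOfRows.

```r
# Hypothetical helper (not from the thread): one named list per matrix row.
rows_as_lists <- function(mat) {
  row_names <- rownames(mat)
  names(row_names) <- row_names            # lapply() then names its result
  lapply(row_names, function(r) {
    # mat[r, ] is the r-th row as a list; setNames() guards the 1-column case
    setNames(mat[r, ], colnames(mat))
  })
}

# Build a small list-matrix with array(split()), as in the message above.
df <- data.frame(A = factor(c("a", "a", "b", "b")),
                 B = factor(c("X", "X", "Y", "Z")),
                 C = c(1, 2, 3, 4))
mat <- array(split(df$C, df[c("A", "B")]),
             dim = c(nlevels(df$A), nlevels(df$B)),
             dimnames = list(levels(df$A), levels(df$B)))

ls_rows <- rows_as_lists(mat)
ls_rows$a$X   # values of C where A == "a" and B == "X"
ls_rows$b$Z   # values of C where A == "b" and B == "Z"
```

The result supports the `$level_of_A$level_of_B` access pattern from the original question. Unoccupied cells come back as zero-length vectors because the matrix was built with split().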
Re: [R] How to quickly convert a data.frame into a structure of lists
I was going to suggest

> AB <- df[c("A","B")]
> ls2 <- array(split(df$C, AB), dim=sapply(AB, nlevels), dimnames=lapply(AB, levels))

which produces a matrix very similar to what Duncan's by() call produces

> ls1 <- by(df$C, df[,1:2], identity)

E.g.,

> ls2[["a","X"]]
[1] 1 2
> ls1[["a","X"]]
[1] 1 2
> ls1[["a","Y"]] # by assigns NULL to unoccupied slots
NULL
> ls2[["a","Y"]] # split gives the same type to all slots, copied from input
numeric(0)

They both are quick because they use split() to avoid the repeated
evaluations of
   bigVector[ anotherBigVector == scalar ]
that your nested (not imbricated) loops do. If you really need to convert
the matrix to a list of lists that will probably be a quick transformation.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Duncan Murdoch
> Sent: Wednesday, August 10, 2011 9:43 AM
> To: Frederic F
> Cc: r-help@r-project.org
> Subject: Re: [R] How to quickly convert a data.frame into a structure of lists
>
> On 10/08/2011 10:30 AM, Frederic F wrote:
> > I want to produce a structure like this one quickly (I aim at something
> > below 10 seconds) for a dataset containing between 1 and 2 millions of rows.
>
> I don't know what the timing would be like for your real data, but this
> does look like by() would work:
>
> ls1 <- by(df$C, df[,1:2], identity)
>
> When I repeat the rows of df a million times each, this finishes in a
> few seconds. It would definitely be slower if there were more levels of
> A or B.
>
> Now ls1 will be a matrix whose entries are the subsets of C that you
> want, so you can see your two results with slightly different syntax:
>
>  > ls1[["a", "X"]]
> [1] 1 2
>  > ls1[["b","Z"]]
> [1] 4
>
> Duncan Murdoch

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
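A note on why wrapping split() in array() lines the cells up correctly. When split() is given a list of factors it groups by their interaction, and the levels of that interaction vary the first factor fastest. That is the same column-major order in which array() fills a matrix. The sketch below checks this on the four-row example; the variable names `flat` and `expected` are mine, not from the thread.

```r
df <- data.frame(A = factor(c("a", "a", "b", "b")),
                 B = factor(c("X", "X", "Y", "Z")),
                 C = c(1, 2, 3, 4))

# split() on two factors returns a flat list, one element per (A, B) pair,
# including pairs that never occur in the data.
flat <- split(df$C, df[c("A", "B")])
names(flat)

# Column-major cell labels of a nlevels(A) x nlevels(B) matrix.
expected <- as.vector(outer(levels(df$A), levels(df$B), paste, sep = "."))

identical(names(flat), expected)   # the two orderings agree
lengths(flat)                      # unoccupied pairs have length 0
```

Because the orderings agree, adding dim and dimnames to the flat list is enough to make `[["a","X"]]` address the right subset of C.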
Re: [R] How to quickly convert a data.frame into a structure of lists
Hi Frederic,

shouldn't there be a result for the 3rd row as well, e.g. ls1$b$Y?
Maybe this will do what you want?

dtf<-within(dtf,index<-factor(A:B))
tapply(dtf$C,dtf$index,list)

Hth.

On 10.08.2011 16:30, Frederic F wrote:
> Hello Duncan,
>
> Here is a small example to illustrate what I am trying to do.
>
> # Example data.frame
> df=data.frame(A=c("a","a","b","b"), B=c("X","X","Y","Z"), C=c(1,2,3,4))
> #   A B C
> # 1 a X 1
> # 2 a X 2
> # 3 b Y 3
> # 4 b Z 4
>
> # Result:
> ls1$a$X
> # [1] 1 2
> ls1$b$Z
> # [1] 4

--
Eik Vettorazzi
Institut für Medizinische Biometrie und Epidemiologie
Universitätsklinikum Hamburg-Eppendorf
Martinistr. 52
20246 Hamburg
T ++49/40/7410-58243
F ++49/40/7410-57790
Re: [R] How to quickly convert a data.frame into a structure of lists
On 10/08/2011 10:30 AM, Frederic F wrote:
> The data.frame will always be 'complete', i.e., there will be a value in
> every row for every column.
> I want to produce a structure like this one quickly (I aim at something
> below 10 seconds) for a dataset containing between 1 and 2 millions of rows.

I don't know what the timing would be like for your real data, but this
does look like by() would work:

ls1 <- by(df$C, df[,1:2], identity)

When I repeat the rows of df a million times each, this finishes in a
few seconds. It would definitely be slower if there were more levels of
A or B.

Now ls1 will be a matrix whose entries are the subsets of C that you
want, so you can see your two results with slightly different syntax:

> ls1[["a", "X"]]
[1] 1 2
> ls1[["b","Z"]]
[1] 4

Duncan Murdoch
Re: [R] How to quickly convert a data.frame into a structure of lists
Hello Dennis,

> To borrow shamelessly from one of the prominent helpers on this list:
> "What is the problem you're trying to solve?" (attribution: Jim Holtman)

I'm trying to connect two sets of legacy R tools: the output of the first
one can be transformed into a data.frame without loss of information, and
the input of the second one takes the form of a structure of lists.

> it's entirely possible
> that there may be a nice 'R way' to do it. Read the posting guide and
> if at all possible, provide a small, reproducible example that
> demonstrates what you want to accomplish.

Here is the first way I attacked the problem, illustrated on a tiny dataset
(unfortunately, this way does not work quickly enough on a real dataset):

# stringsAsFactors=TRUE: levels() below needs factors, which are no longer
# the default for strings since R 4.0.0
df=data.frame(A=c("a","a","b","b"), B=c("X","X","Y","Z"), C=c(1,2,3,4),
              stringsAsFactors=TRUE)
# Get the structure and populate it:
ls1<-lapply(levels(df$A), function(levelA) {
  lapply(levels(df$B), function(levelB) {df$C[df$A==levelA & df$B==levelB]})
})
# Get the names:
names(ls1)<-levels(df$A)
for (i in 1:length(ls1))
  {names(ls1[[i]])<-levels(df$B)}

# Results:
ls1$a$X
# [1] 1 2
ls1$b$Z
# [1] 4

Thanks for your help,

Frederic

--
View this message in context: http://r.789695.n4.nabble.com/How-to-quickly-convert-a-data-frame-into-a-structure-of-lists-tp3731746p3733114.html
Sent from the R help mailing list archive at Nabble.com.
Re: [R] How to quickly convert a data.frame into a structure of lists
Hello Duncan,

Here is a small example to illustrate what I am trying to do.

# Example data.frame
# stringsAsFactors=TRUE: levels() below needs factors, which are no longer
# the default for strings since R 4.0.0
df=data.frame(A=c("a","a","b","b"), B=c("X","X","Y","Z"), C=c(1,2,3,4),
              stringsAsFactors=TRUE)
#   A B C
# 1 a X 1
# 2 a X 2
# 3 b Y 3
# 4 b Z 4

### First way of getting the list structure (ls1) using imbricated lapply loops:
# Get the structure and populate it:
ls1<-lapply(levels(df$A), function(levelA) {
  lapply(levels(df$B), function(levelB) {df$C[df$A==levelA & df$B==levelB]})
})
# Apply the names:
names(ls1)<-levels(df$A)
for (i in 1:length(ls1))
  {names(ls1[[i]])<-levels(df$B)}

# Result:
ls1$a$X
# [1] 1 2
ls1$b$Z
# [1] 4

The data.frame will always be 'complete', i.e., there will be a value in
every row for every column.
I want to produce a structure like this one quickly (I aim at something
below 10 seconds) for a dataset containing between 1 and 2 million rows.

I hope that this helps clarify things.

Thanks for your help,

Frederic

--
View this message in context: http://r.789695.n4.nabble.com/How-to-quickly-convert-a-data-frame-into-a-structure-of-lists-tp3731746p3733073.html
Sent from the R help mailing list archive at Nabble.com.
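The `ls1$a$X` structure asked for above can also be built directly with two levels of split(), without going through a matrix first. The following is a minimal sketch, not code posted in the thread: `nested_split` is a hypothetical name, and it assumes A and B are factors and that the column names are A, B and C as in the example.

```r
# Hypothetical helper (not from the thread): list of lists via nested split().
nested_split <- function(df) {
  # Row indices for each level of A (empty levels are kept).
  rows_by_A <- split(seq_len(nrow(df)), df$A)
  # Within each A group, split C by B. Subsetting a factor keeps all of
  # its levels, so every level of B appears in every inner list.
  lapply(rows_by_A, function(i) split(df$C[i], df$B[i]))
}

df <- data.frame(A = factor(c("a", "a", "b", "b")),
                 B = factor(c("X", "X", "Y", "Z")),
                 C = c(1, 2, 3, 4))

ls3 <- nested_split(df)
ls3$a$X
ls3$b$Z
ls3$a$Y   # unoccupied combination: zero-length vector
```

Each row is visited once per level of splitting instead of once per (A, B) pair, which is the saving the split()-based replies in this thread rely on.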
Re: [R] How to quickly convert a data.frame into a structure of lists
To borrow shamelessly from one of the prominent helpers on this list:
"What is the problem you're trying to solve?" (attribution: Jim Holtman)

I have the sense you want to do something over many subsets of your
data frame. If so, breaking things up into lists of lists of lists is
not necessarily productive, nor may it be necessary to use loops
explicitly, depending on the nature of what you want to do. If you're
more explicit about the nature of your task, it's entirely possible
that there may be a nice 'R way' to do it. Read the posting guide and
if at all possible, provide a small, reproducible example that
demonstrates what you want to accomplish. (See ?dput to learn how to
transmit data by e-mail.)

HTH,
Dennis

On Tue, Aug 9, 2011 at 5:58 PM, Frederic F wrote:
> The problem that I'm facing right now is that I need to convert a data.frame
> into a structure of lists. The data.frame has columns in the order of tens
> (I need to focus on only three of them) and rows in the order of millions.
> So it's quite a big dataset.
> Let say that the columns of interest are A, B and C. I need to take the
> data.frame and construct a structure of list where I have a list for every
> level of A, those list all contain lists for every levels of B, and the
> 'b-lists' contains all the values of C that match the corresponding levels
> of A and B.
Re: [R] How to quickly convert a data.frame into a structure of lists
I would use the tapply function (which is designed for the case in which
data exists for most pairs of the levels of A and B) or the
reshape::sparseby function, or something else in the reshape package.
These won't give you exactly the structure you were asking for, but they
will separate the data properly.

By the way, it's a good idea when posting a question to post a simple
example; then other solutions can be illustrated on the same example.
It doesn't need to contain millions of rows.

Duncan Murdoch

On 11-08-09 8:58 PM, Frederic F wrote:
> The problem that I'm facing right now is that I need to convert a data.frame
> into a structure of lists. The data.frame has columns in the order of tens
> (I need to focus on only three of them) and rows in the order of millions.
> So it's quite a big dataset.
> Let say that the columns of interest are A, B and C. I need to take the
> data.frame and construct a structure of list where I have a list for every
> level of A, those list all contain lists for every levels of B, and the
> 'b-lists' contains all the values of C that match the corresponding levels
> of A and B.
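To make the tapply() suggestion concrete, here is a minimal sketch on the four-row example used elsewhere in this thread. It is my illustration rather than code from the message above, and the name `cells` is arbitrary. When the function passed to tapply() returns vectors of differing lengths, the result is a matrix of mode list with one cell per (A, B) pair.

```r
df <- data.frame(A = factor(c("a", "a", "b", "b")),
                 B = factor(c("X", "X", "Y", "Z")),
                 C = c(1, 2, 3, 4))

# Group C by the two factors; returning each group unchanged yields a
# list-matrix rather than a numeric matrix.
cells <- tapply(df$C, df[c("A", "B")], function(x) x)

dim(cells)                   # one row per level of A, one column per level of B
cells[["a", "X"]]            # values of C where A == "a" and B == "X"
is.null(cells[["a", "Y"]])   # combinations with no data are left as NULL
```

As the message says, this is not the nested list from the question, but the data are separated the same way and are reachable with `[[level_of_A, level_of_B]]`.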
Re: [R] How to quickly convert a data.frame into a structure of lists
Hi

Something to get you started:

?as.list

A data.frame can be regarded as a 2 dimensional array of list vectors

df = data.frame(a=1:2,b=2:1,c=4:5,d=9:10)
as.list(df[,1:3])
$a
[1] 1 2

$b
[1] 2 1

$c
[1] 4 5

see also
http://cran.ms.unimelb.edu.au/doc/contrib/Burns-unwilling_S.pdf

Regards

Duncan

Duncan Mackay
Department of Agronomy and Soil Science
University of New England
ARMIDALE NSW 2351
Email: home mac...@northnet.com.au

At 10:58 10/08/2011, you wrote:
> The problem that I'm facing right now is that I need to convert a data.frame
> into a structure of lists. The data.frame has columns in the order of tens
> (I need to focus on only three of them) and rows in the order of millions.
> So it's quite a big dataset.
> Let say that the columns of interest are A, B and C. I need to take the
> data.frame and construct a structure of list where I have a list for every
> level of A, those list all contain lists for every levels of B, and the
> 'b-lists' contains all the values of C that match the corresponding levels
> of A and B.
[R] How to quickly convert a data.frame into a structure of lists
Hello,

This is my first project in R, so I'm trying to work 'the R way', but it
still feels awkward sometimes.

The problem that I'm facing right now is that I need to convert a data.frame
into a structure of lists. The data.frame has columns in the order of tens
(I need to focus on only three of them) and rows in the order of millions.
So it's quite a big dataset.

Let's say that the columns of interest are A, B and C. I need to take the
data.frame and construct a structure of lists where I have a list for every
level of A, each of those lists contains a list for every level of B, and
the 'b-lists' contain all the values of C that match the corresponding
levels of A and B.

So, I should be able to write something like this:
> MyData@list_structure$x_level_of_A$y_level_of_B
and get a vector of the values of C that were on rows where A=x_level_of_A
and B=y_level_of_B.

My first attempt was to use two imbricated "lapply" functions running
something like this:

# (the elements are then named after the levels of A and B)
list_structure<-lapply(levels(A), function(x) {
  lapply(levels(B), function(y) {
    C[A==x & B==y]
  })
})

The real code was not quite as simple, but I managed to have it work, and it
worked well on my first dataset (where A and B had only a few levels). I was
quite happy... but the imbricated loops killed me on a second dataset where
A had several thousand levels. So I tried something else.

My second attempt was to go through every row of the data.frame and append
the value to the appropriate vector.

I first initialized a structure of lists ending with NULL vectors, then I did
something like this:

for (i in 1:nrow(DF)) {
  eval(
    substitute(
      # append() returns a new vector, so the result has to be assigned back
      MyData@list_structure$a_value$b_value <-
        append(MyData@list_structure$a_value$b_value, c_value),
      list(a_value=as.character(DF$A[i]), b_value=as.character(DF$B[i]),
           c_value=as.character(DF$C[i]))
    )
  )
}

This works... but way too slowly for my purpose.

I would like to know if there is a better road to take to do this
transformation. Or, if there is a way of speeding up one of the two solutions
that I have tried.

Thank you very much for your help!

(And in your replies, please remember that this is my first project in R, so
don't hesitate to state the obvious if it seems like I am missing it!)

Frederic

--
View this message in context: http://r.789695.n4.nabble.com/How-to-quickly-convert-a-data-frame-into-a-structure-of-lists-tp3731746p3731746.html
Sent from the R help mailing list archive at Nabble.com.