Re: [R] How to quickly convert a data.frame into a structure of lists

2011-08-10 Thread William Dunlap
Here is code to transform the matrix that by() or array(split())
produces, along with an example of the speed of the various
approaches.  Using split(), either directly or via by() or tapply(),
saves a lot of time.

f0 <- function(df) {
  # original code with typos fixed.
  list_structure <- lapply(levels(df$A), function(levelA) {
    lapply(levels(df$B), function(levelB) {
      df$C[df$A == levelA & df$B == levelB]
    })
  })
  # Apply the names:
  names(list_structure) <- levels(df$A)
  for (i in 1:length(list_structure)) {
    names(list_structure[[i]]) <- levels(df$B)
  }
  list_structure
}

f0a <- function(df) {
  # slightly faster version of f0, taking repeated
  # calculations out of loops.
  A <- df$A
  B <- df$B
  C <- df$C
  levelsA <- structure(levels(A), names = levels(A))
  levelsB <- structure(levels(B), names = levels(B))
  lapply(levelsA, function(levelA) {
    tmpA <- A == levelA # this is responsible for most of the savings
    lapply(levelsB, function(levelB) { C[tmpA & B == levelB] })
  })
}

f1 <- function(df) {
  # DM's code
  by(df$C, df[, 1:2], identity)
}

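f2 <- function(df) {
  # WD's code
  AB <- df[c("A", "B")]
  # lapply(), not sapply(), so that dimnames is always a list (sapply()
  # would return a matrix if A and B had the same number of levels)
  array(split(df$C, AB), dim = sapply(AB, nlevels), dimnames = lapply(AB, levels))
}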

matrix2ListOfRows <- function(mat) {
  # convert a matrix to a list of its rows, converting dimnames to names.
  retval <- structure(as.vector(mat), names = rep(colnames(mat), each = nrow(mat)))
  retval <- split(retval, row(mat))
  names(retval) <- rownames(mat)
  retval
}

The test involves 10^5 rows of data with 26 levels for A and 200 for B.
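A rough count of why the nested versions are slow, counting only the element-wise == and & operations (each assumed to touch all N = 10^5 rows once) and ignoring subsetting and function-call overhead, with n_A = 26 and n_B = 200. f0 does two comparisons and one & per (A, B) pair; f0a hoists A == levelA out of the inner loop, leaving one comparison and one & per pair:

```latex
\begin{aligned}
\texttt{f0}:\quad      & n_A\, n_B \cdot 3N = 26 \cdot 200 \cdot 3 \cdot 10^5 = 1.56 \times 10^9 \\
\texttt{f0a}:\quad     & n_A N + n_A\, n_B \cdot 2N = 2.6 \times 10^6 + 1.04 \times 10^9 \approx 1.04 \times 10^9 \\
\texttt{split()}:\quad & O(N) = O(10^5)
\end{aligned}
```

Both nested versions grow with n_A n_B, while split() makes a fixed number of passes over the rows regardless of how many level pairs there are.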

> r200 <- as.character(as.roman(1:200))
> set.seed(1)
> df <- data.frame(A=factor(sample(letters, size=1e5, replace=TRUE),
+  levels=letters),
+  B=factor(sample(r200, size=1e5, replace=TRUE), levels=r200),
+  C=1:1e5)
> system.time(ls0 <- f0(df))
   user  system elapsed 
  74.08    2.34   76.60 
> system.time(ls0a <- f0a(df))
   user  system elapsed 
  43.09    0.44   43.73 
> all.equal(ls0, ls0a)
[1] TRUE
> system.time(ls2 <- matrix2ListOfRows(f2(df)))
   user  system elapsed 
   0.09    0.02    0.11 
> all.equal(ls0, ls2)
[1] TRUE
> system.time(ls1 <- matrix2ListOfRows(f1(df)))
   user  system elapsed 
   0.69    0.00    0.69 
> all.equal(ls0, ls1)
[1] TRUE
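For comparison, here is a sketch (not part of the benchmark above) that builds the named list of lists directly with split() at both levels, so no matrix-to-list conversion step is needed. f3 is a hypothetical name; it assumes A and B are factors, as in the test data, so that unused levels still get a slot.

```r
# Hypothetical f3: build the nested list with split() at both levels.
# Assumes df$A and df$B are factors, so every level gets a slot.
f3 <- function(df) {
  # one data.frame per level of A; each keeps the full set of B levels
  byA <- split(df[c("B", "C")], df$A)
  # within each A-group, split C by B; empty groups give zero-length vectors
  lapply(byA, function(d) split(d$C, d$B))
}
```

With the test data above, all.equal(ls0, f3(df)) is expected to return TRUE, but f3 has not been timed here.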


Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com 

> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
> Behalf Of William Dunlap
> Sent: Wednesday, August 10, 2011 10:05 AM
> To: Duncan Murdoch; Frederic F
> Cc: r-help@r-project.org
> Subject: Re: [R] How to quickly convert a data.frame into a structure of lists

Re: [R] How to quickly convert a data.frame into a structure of lists

2011-08-10 Thread William Dunlap
I was going to suggest
  > AB <- df[c("A","B")]
  > ls2 <- array(split(df$C, AB), dim=sapply(AB, nlevels), dimnames=lapply(AB, levels))
which produces a matrix very similar to what Duncan's by() call produces
  > ls1 <- by(df$C, df[,1:2], identity)
E.g.,
  > ls2[["a","X"]]
  [1] 1 2
  > ls1[["a","X"]]
  [1] 1 2
  > ls1[["a","Y"]] # by assigns NULL to unoccupied slots
  NULL
  > ls2[["a","Y"]] # split gives the same type to all slots, copied from input
  numeric(0)

They both are quick because they use split() to avoid the repeated
evaluations of
  bigVector[ anotherBigVector == scalar ]
that your nested (not imbricated) loops do.  If you really need to convert
the matrix to a list of lists, that will probably be a quick transformation.
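A minimal sketch of that transformation, assuming the list-mode matrix has row and column dimnames, as the array(split()) call produces. matrixToNestedList is a hypothetical helper name, not an existing function; it relies on m[r, ] of a list-mode matrix returning a list named by the column names.

```r
# Hypothetical helper: turn a list-mode matrix (rows = levels of A,
# columns = levels of B) into a named list of named lists.
matrixToNestedList <- function(m) {
  rows <- setNames(rownames(m), rownames(m))
  # m[r, ] on a list matrix returns a list whose names are colnames(m)
  lapply(rows, function(r) m[r, ])
}

# Frederic's example data, with A and B as factors
df <- data.frame(A = factor(c("a", "a", "b", "b")),
                 B = factor(c("X", "X", "Y", "Z")),
                 C = c(1, 2, 3, 4))
AB <- df[c("A", "B")]
ls2 <- array(split(df$C, AB), dim = sapply(AB, nlevels),
             dimnames = lapply(AB, levels))
nested <- matrixToNestedList(ls2)
nested$a$X   # [1] 1 2
nested$b$Z   # [1] 4
```

Empty combinations such as nested$a$Y come through as numeric(0), as in the split() matrix.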

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com 
> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
> Behalf Of Duncan Murdoch
> Sent: Wednesday, August 10, 2011 9:43 AM
> To: Frederic F
> Cc: r-help@r-project.org
> Subject: Re: [R] How to quickly convert a data.frame into a structure of lists
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



Re: [R] How to quickly convert a data.frame into a structure of lists

2011-08-10 Thread Eik Vettorazzi
Hi Frederic,
shouldn't there be a result for the 3rd row as well, e.g. ls1$b$Y?

Maybe this will do what you want?

dtf<-within(df,index<-factor(A:B))
tapply(dtf$C,dtf$index,list)
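A closely related sketch (a swapped-in variant, not Eik's code): split() accepts the same A:B interaction factor directly, and drop = TRUE keeps only the combinations that actually occur. It assumes A and B are factors.

```r
df <- data.frame(A = factor(c("a", "a", "b", "b")),
                 B = factor(c("X", "X", "Y", "Z")),
                 C = c(1, 2, 3, 4))
# A:B is a factor whose levels are "A-level:B-level" combinations;
# drop = TRUE removes the combinations with no rows
flat <- split(df$C, df$A:df$B, drop = TRUE)
names(flat)        # "a:X" "b:Y" "b:Z"
flat[["b:Y"]]      # [1] 3
```

The result is a flat list keyed by "a:X"-style names rather than a list of lists.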

Hth.

On 10.08.2011 16:30, Frederic F wrote:

-- 
Eik Vettorazzi
Institut für Medizinische Biometrie und Epidemiologie
Universitätsklinikum Hamburg-Eppendorf

Martinistr. 52
20246 Hamburg

T ++49/40/7410-58243
F ++49/40/7410-57790



Re: [R] How to quickly convert a data.frame into a structure of lists

2011-08-10 Thread Duncan Murdoch

On 10/08/2011 10:30 AM, Frederic F wrote:

> Hello Duncan,
>
> Here is a small example to illustrate what I am trying to do.
>
> # Example data.frame
> df=data.frame(A=c("a","a","b","b"), B=c("X","X","Y","Z"), C=c(1,2,3,4))
> #   A B C
> # 1 a X 1
> # 2 a X 2
> # 3 b Y 3
> # 4 b Z 4
>
> ### First way of getting the list structure (ls1) using imbricated lapply loops:
> # Get the structure and populate it:
> ls1<-lapply(levels(df$A), function(levelA) {
>    lapply(levels(df$B), function(levelB) {df$C[df$A==levelA & df$B==levelB]})
> })
> # Apply the names:
> names(list_structure)<-levels(df$A)
> for (i in 1:length(list_structure))
> {names(list_structure[[i]])<-levels(df$B)}
>
> # Result:
> ls1$a$X
> # [1] 1 2
> ls1$b$Z
> # [1] 4
>
> The data.frame will always be 'complete', i.e., there will be a value in
> every row for every column.
> I want to produce a structure like this one quickly (I aim at something
> below 10 seconds) for a dataset containing between 1 and 2 millions of rows.

I don't know what the timing would be like for your real data, but this 
does look like by() would work:


ls1 <- by(df$C, df[,1:2], identity)

When I repeat the rows of df a million times each, this finishes in a 
few seconds.  It would definitely be slower if there were more levels of 
A or B.
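A sketch of how that kind of scaling check could be set up, assuming the four-row example with factor columns; reps is a hypothetical parameter name and no timing results are claimed here. The rows are replicated column by column with rep(), which keeps the factor levels.

```r
df <- data.frame(A = factor(c("a", "a", "b", "b")),
                 B = factor(c("X", "X", "Y", "Z")),
                 C = c(1, 2, 3, 4))
reps <- 1e6  # each row repeated reps times, 4 * reps rows in total
# rep() on each column; factors stay factors with the same levels
big <- as.data.frame(lapply(df, rep, each = reps))
print(system.time(res <- by(big$C, big[, 1:2], identity)))
```

Raising the number of levels of A or B in df is the natural next experiment, since that is where the slowdown is expected.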


Now ls1 will be a matrix whose entries are the subsets of C that you 
want, so you can see your two results with slightly different syntax:


> ls1[["a", "X"]]
[1] 1 2
> ls1[["b","Z"]]
[1] 4

Duncan Murdoch



Re: [R] How to quickly convert a data.frame into a structure of lists

2011-08-10 Thread Frederic F
Hello Dennis,

> To borrow shamelessly from one of the prominent helpers on this list:
> "What is the problem you're trying to solve?"    (attribution: Jim Holtman)

I'm trying to connect two sets of legacy R tools: the output of the
first one can be transformed into a data.frame without loss of
information, and the input of the second one takes the form of a
structure of lists.

>  it's entirely possible
> that there may be a nice 'R way' to do it. Read the posting guide and
> if at all possible, provide a small, reproducible example that
> demonstrates what you want to accomplish.

Here is the first way I attacked the problem, illustrated on a tiny
dataset (unfortunately, this way does not work quickly enough on a real
dataset):

df=data.frame(A=c("a","a","b","b"), B=c("X","X","Y","Z"), C=c(1,2,3,4),
              stringsAsFactors=TRUE)

# Get the structure and populate it:
ls1<-lapply(levels(df$A), function(levelA) {
  lapply(levels(df$B), function(levelB) {df$C[df$A==levelA & df$B==levelB]})
})
# Get the names:
names(ls1)<-levels(df$A)
for (i in seq_along(ls1)) {names(ls1[[i]])<-levels(df$B)}

# Results:
ls1$a$X
# [1] 1 2
ls1$b$Z
# [1] 4

Thanks for your help,

Frederic


--
View this message in context: 
http://r.789695.n4.nabble.com/How-to-quickly-convert-a-data-frame-into-a-structure-of-lists-tp3731746p3733114.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] How to quickly convert a data.frame into a structure of lists

2011-08-10 Thread Frederic F
Hello Duncan,  

Here is a small example to illustrate what I am trying to do.

# Example data.frame (stringsAsFactors=TRUE so that levels() also works in R >= 4.0)
df=data.frame(A=c("a","a","b","b"), B=c("X","X","Y","Z"), C=c(1,2,3,4),
              stringsAsFactors=TRUE)
#   A B C
# 1 a X 1
# 2 a X 2
# 3 b Y 3
# 4 b Z 4

### First way of getting the list structure (ls1) using imbricated lapply loops:
# Get the structure and populate it:
ls1<-lapply(levels(df$A), function(levelA) {
  lapply(levels(df$B), function(levelB) {df$C[df$A==levelA & df$B==levelB]})
})
# Apply the names:
names(ls1)<-levels(df$A)
for (i in seq_along(ls1)) {names(ls1[[i]])<-levels(df$B)}

# Result:
ls1$a$X
# [1] 1 2
ls1$b$Z
# [1] 4

The data.frame will always be 'complete', i.e., there will be a value in
every row for every column.
I want to produce a structure like this one quickly (I aim for something
under 10 seconds) for a dataset containing between 1 and 2 million rows.

I hope that this helps clarify things.

Thanks for your help,

Frederic 

--
View this message in context: 
http://r.789695.n4.nabble.com/How-to-quickly-convert-a-data-frame-into-a-structure-of-lists-tp3731746p3733073.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] How to quickly convert a data.frame into a structure of lists

2011-08-10 Thread Dennis Murphy
To borrow shamelessly from one of the prominent helpers on this list:

"What is the problem you're trying to solve?"(attribution: Jim Holtman)

I have the sense you want to do something over many subsets of your
data frame. If so, breaking things up into lists of lists of lists is
not necessarily productive, nor may it be necessary to use loops
explicitly, depending on the nature of what you want to do. If you're
more explicit about the nature of your task, it's entirely possible
that there may be a nice 'R way' to do it. Read the posting guide and
if at all possible, provide a small, reproducible example that
demonstrates what you want to accomplish.
(See ?dput to learn how to transmit data by e-mail.)
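As an illustration of that point only, suppose (an assumption; the thread does not say what the second tool computes) that the goal were a per-(A, B) total of C. On Frederic's four-row example, one tapply() call gives that directly as a matrix, with no lists of lists and no explicit loops:

```r
df <- data.frame(A = factor(c("a", "a", "b", "b")),
                 B = factor(c("X", "X", "Y", "Z")),
                 C = c(1, 2, 3, 4))
# one call computes sum(C) for every (A, B) pair;
# pairs that never occur are NA rather than an empty slot
sums <- tapply(df$C, list(df$A, df$B), sum)
sums["a", "X"]   # [1] 3
```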

HTH,
Dennis

On Tue, Aug 9, 2011 at 5:58 PM, Frederic F  wrote:


Re: [R] How to quickly convert a data.frame into a structure of lists

2011-08-10 Thread Duncan Murdoch
I would use the tapply function (which is designed for the case in which 
data exists for most pairs of the levels of A and B) or the 
reshape::sparseby function, or something else in the reshape package. 
These won't give you exactly the structure you were asking for, but they 
will separate the data properly.
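A minimal sketch of the tapply() route on Frederic's four-row example, assuming A and B are factors. The simplify = FALSE argument is an addition here: it keeps the result a list-mode matrix even in the case where every occupied cell holds exactly one value.

```r
df <- data.frame(A = factor(c("a", "a", "b", "b")),
                 B = factor(c("X", "X", "Y", "Z")),
                 C = c(1, 2, 3, 4))
# identity() returns each group's C values unchanged; simplify = FALSE
# forces a list-mode matrix with one cell per (A, B) pair
cells <- tapply(df$C, df[c("A", "B")], identity, simplify = FALSE)
cells[["a", "X"]]   # [1] 1 2
cells[["a", "Y"]]   # NULL (no rows for this pair)
```

Cells are read with the same [["a", "X"]] syntax as a by() result.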


By the way, it's a good idea when posting a question to post a simple 
example; then other solutions can be illustrated on the same example. 
It doesn't need to contain millions of rows.


Duncan Murdoch

On 11-08-09 8:58 PM, Frederic F wrote:


Re: [R] How to quickly convert a data.frame into a structure of lists

2011-08-09 Thread Duncan Mackay

Hi

Something to get you started:
?as.list
A data.frame is a list of equal-length column vectors, so as.list() returns its columns:

df = data.frame(a=1:2,b=2:1,c=4:5,d=9:10)
as.list(df[,1:3])
$a
[1] 1 2

$b
[1] 2 1

$c
[1] 4 5

see also
http://cran.ms.unimelb.edu.au/doc/contrib/Burns-unwilling_S.pdf

Regards

Duncan


Duncan Mackay
Department of Agronomy and Soil Science
University of New England
ARMIDALE NSW 2351
Email: home mac...@northnet.com.au

At 10:58 10/08/2011, you wrote:



[R] How to quickly convert a data.frame into a structure of lists

2011-08-09 Thread Frederic F
Hello,

This is my first project in R, so I'm trying to work 'the R way', but it
still feels awkward sometimes.

The problem that I'm facing right now is that I need to convert a data.frame
into a structure of lists. The data.frame has on the order of tens of columns
(I need to focus on only three of them) and millions of rows.
So it's quite a big dataset.
Let's say that the columns of interest are A, B and C. I need to take the
data.frame and construct a structure of lists where I have a list for every
level of A; each of those lists contains a list for every level of B, and these
'b-lists' contain all the values of C that match the corresponding levels
of A and B.
So, I should be able to write something like this:
> MyData@list_structure$x_level_of_A$y_level_of_B
and get a vector of the values of C that were on rows where A=x_level_of_A
and B=y_level_of_B.

My first attempt was to use two imbricated "lapply" calls, something like
this:

# setNames() names each sub-list after its level
list_structure <- lapply(setNames(levels(A), levels(A)), function(x) {
  lapply(setNames(levels(B), levels(B)), function(y) {
    C[A == x & B == y]
  })
})

The real code was not quite as simple, but I managed to get it working, and it
worked well on my first dataset (where A and B had only a few levels). I was
quite happy... but the imbricated loops killed me on a second dataset where
A had several thousand levels. So I tried something else.

My second attempt was to go through every row of the data.frame and append
the value to the appropriate vector. 

I first initialized a structure of lists ending with NULL vectors, then I did
something like this:

for (i in seq_len(nrow(DF))) {
  a <- as.character(DF$A[i])
  b <- as.character(DF$B[i])
  # append row i's C value to the vector stored under [[a]][[b]]
  MyData@list_structure[[a]][[b]] <- c(MyData@list_structure[[a]][[b]], DF$C[i])
}

This works... but way too slowly for my purpose. 

I would like to know if there is a better road to take to do this
transformation, or if there is a way of speeding up one of the two solutions
that I have tried.

Thank you very much for your help!

(And in your replies, please remember that this is my first project in R, so
don't hesitate to state the obvious if it seems like I am missing it!)

Frederic

--
View this message in context: 
http://r.789695.n4.nabble.com/How-to-quickly-convert-a-data-frame-into-a-structure-of-lists-tp3731746p3731746.html
Sent from the R help mailing list archive at Nabble.com.
