[R] Efficient cbind of elements from two lists

2009-11-19 Thread Stephan Dlugosz

Hi!

I have a data.frame called data and have split it:

data <- split(data, data[,1])

This is quite a slow procedure, and I do not want to do it again. So
unsplitting and resplitting is not an option for me.
But I have to cbind variables to the split data from another list, l,
which consists of vectors with matching sizes, so


for (i in 1:length(data)) {
  data[[i]] <- cbind(data[[i]], l[[i]])
}

works, but very, very slowly.
The lapply solution:

data <- lapply(1:k, function(i) cbind(data[[i]], l[[i]]))

does not improve the situation, but allows for mclapply from the 
multicore package...

Is there a more efficient way to combine elements from two lists?

Thank you very much!

Greetings,
Stephan

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Efficient cbind of elements from two lists

2009-11-19 Thread Jorge Ivan Velez
Dear Stephan,

Here is a suggestion using do.call():

res <- do.call(cbind, yourlist)
res
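
Note that do.call(cbind, yourlist) binds all elements of a single list side by side into one object, so every element needs the same number of rows. The sketch below, on made-up data, contrasts it with Map(), which pairs up the i-th elements of two lists. The names parts, vecs, wide and paired are hypothetical and not from the thread, and Map() is shown only as a tidier way to write the loop, with no claim that it is faster.

```r
# Hypothetical toy data: a data.frame split into groups, plus a list of
# per-group vectors whose lengths match the group sizes.
df <- data.frame(group = c(1, 1, 2, 2, 2), score = 1:5)
parts <- split(df, df$group)
vecs <- list(`1` = c(10, 20), `2` = c(30, 40, 50))

# do.call(cbind, x) is cbind(x[[1]], x[[2]], ...): one wide object from one list.
wide <- do.call(cbind, list(a = 1:3, b = 4:6))

# Map(f, x, y) calls f(x[[i]], y[[i]]) for each i and returns a list,
# so the result keeps the split structure.
paired <- Map(function(d, v) cbind(d, extra = v), parts, vecs)
paired[["2"]]
```

Each element of paired is still a data.frame for one group, now with the additional column extra.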

HTH,
Jorge






Re: [R] Efficient cbind of elements from two lists

2009-11-19 Thread William Dunlap

Can you restructure your analysis so you don't need
to split the data.frame itself?  I'm assuming the split
was slow because there are a lot of groups.  Splitting
a data.frame into lots of pieces is considerably slower
than splitting a few numeric or character columns in it.

   > df <- data.frame(group=rep(1:1e5, each=2), score=1:2e5)
   > system.time(split(df, df$group)) # split entire data.frame into 1e5 parts
      user  system elapsed 
    117.32   38.42  154.34 
   > system.time(split(df$score, df$group)) # split 2nd column into 1e5 parts
      user  system elapsed 
      0.43    0.03    0.46 

If R does things the way S+ does, this is because splitting
simple vectors is done in C code, but splitting data.frames
invokes the S-language [.data.frame function, which is
relatively slow when selecting rows from a data.frame.
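
One way to act on this, sketched under the assumption that only a few columns are needed per group, is to split a plain vector of row numbers rather than the data.frame and to index into the columns only when needed. The names idx and groupMean are illustrative, and the data are a small toy version of the example above.

```r
# Toy data with the same layout as above, but small.
df <- data.frame(group = rep(1:3, each = 2), score = 1:6)

# Splitting an integer vector of row numbers avoids creating one
# data.frame per group.
idx <- split(seq_len(nrow(df)), df$group)

# Work on plain vectors per group, e.g. the mean score in each group.
groupMean <- vapply(idx, function(i) mean(df$score[i]), numeric(1))
groupMean
```

The list idx can be reused for any column of df, so the grouping only has to be computed once.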

I'd suggest using ave() (or a function from the plyr package),
working on columns from your data.frame and adding ave's
output as a column in your big data.frame.  E.g., to compute
the average score in each group:
   > system.time(df$meanScore <- ave(df$score, df$group, FUN=mean))
      user  system elapsed 
      3.37    0.00    3.50 
   > df[1:6,]
     group score meanScore
   1     1     1       1.5
   2     1     2       1.5
   3     2     3       3.5
   4     2     4       3.5
   5     3     5       5.5
   6     3     6       5.5
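
If the per-group vectors already exist as a list (like l in the original question), base R's unsplit() can reassemble them into a single column in the original row order, so the unsplit data.frame can carry them as an ordinary column. This is only a sketch on toy data: it assumes the list was produced by splitting on the same grouping variable, and the column name newVar is made up for illustration.

```r
df <- data.frame(group = c(2, 1, 2, 1, 3), score = 1:5)

# A list of per-group vectors, here built with split() and a per-group
# transformation (score centered within each group).
l <- lapply(split(df$score, df$group), function(x) x - mean(x))

# unsplit() reverses split(): each value goes back to the row it came from.
df$newVar <- unsplit(l, df$group)
df
```

Because unsplit() restores the original row order, the rows of df do not need to be sorted by group first.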

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com 
 