-----Original Message-----
From: r-help-boun...@r-project.org
[mailto:r-help-boun...@r-project.org] On Behalf Of Stephan Dlugosz
Sent: Thursday, November 19, 2009 7:03 AM
To: r-help@r-project.org
Subject: [R] Efficient cbind of elements from two lists
Hi!
I have a data.frame called data and have split it:
data <- split(data, data[,1])
This is quite a slow procedure, and I do not want to do it again, so
unsplitting and resplitting is not an option for me.
However, I have to cbind variables to the split data from another list,
which consists of vectors with matching sizes, so
for (i in 1:length(data)) {
  data[[i]] <- cbind(data[[i]], l[[i]])
}
works well; but very, very slowly.
The lapply solution:
data <- lapply(1:k, function(i) cbind(data[[i]], l[[i]]))
does not improve the situation, but allows for mclapply from the
multicore package...
Is there a more efficient way to combine elements from two lists?
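
For reference, the loop and the lapply call above can also be written with Map(), which walks both lists in parallel. This is only a minimal sketch on toy data: the objects pieces and l and the column name extra are stand-ins assumed for illustration, and nothing here is claimed to be faster than the loop.

```r
# Toy stand-ins for the poster's objects (names and sizes are assumptions).
data <- data.frame(group = rep(1:3, each = 2), score = 1:6)
pieces <- split(data, data[, 1])

# A list of vectors, one per group, with sizes matching the pieces.
l <- split(seq(10, 60, by = 10), data[, 1])

# Map() pairs pieces[[i]] with l[[i]] and keeps the group names.
combined <- Map(function(d, v) cbind(d, extra = v), pieces, l)

combined[["2"]]
```

Map() keeps the names of the first list, so the pieces can still be looked up by group label afterwards.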
Can you restructure your analysis so you don't need
to split the data.frame itself? I'm assuming the split
was slow because there are a lot of groups. Splitting
a data.frame into lots of pieces is considerably slower
than splitting a few numeric or character columns in it.
df <- data.frame(group=rep(1:1e5, each=2), score=1:2e5)
system.time(split(df, df$group)) # split entire data.frame into 1e5 parts
   user  system elapsed
 117.32   38.42  154.34
system.time(split(df$score, df$group)) # split 2nd column into 1e5 parts
   user  system elapsed
   0.43    0.03    0.46
If R does things the way S+ does, this is because splitting
simple vectors is done in C code but splitting data.frames
invokes the S-language [.data.frame function, which is
relatively slow when selecting rows from a data.frame.
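
One way to stay on the fast vector path while still being able to reach each group's rows is to split the row numbers rather than the data.frame. A minimal sketch, assuming a small toy data.frame; the name rows is made up for illustration:

```r
# Toy data; the real data.frame would be much larger.
df <- data.frame(group = rep(1:1000, each = 2), score = 1:2000)

# Split only the row numbers, a plain integer vector.
rows <- split(seq_len(nrow(df)), df$group)

# Extract the rows of one group only when they are actually needed.
df[rows[["3"]], ]
```

The data.frame itself is never cut into pieces here, so [.data.frame runs only for the groups that are actually extracted.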
I'd suggest using ave() (or a function from the plyr package),
working on columns from your data.frame and adding ave's
output as a column in your big data.frame. E.g., to compute
the average score in each group
system.time(df$meanScore <- ave(df$score, df$group, FUN=mean))
   user  system elapsed
   3.37    0.00    3.50
df[1:6,]
  group score meanScore
1     1     1       1.5
2     1     2       1.5
3     2     3       3.5
4     2     4       3.5
5     3     5       5.5
6     3     6       5.5
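
If the per-group results already exist as a list of vectors (like the list l in the original question), unsplit() can put them back in the row order of the big data.frame without splitting the data.frame at all. A minimal sketch on toy data, assuming the list was built with the same grouping factor; the column name extra is hypothetical:

```r
# Toy data with interleaved groups, to show that row order is restored.
df <- data.frame(group = c(1, 2, 1, 3, 2, 3), score = 1:6)

# A per-group list of vectors, standing in for the poster's `l`.
l <- split(df$score * 10, df$group)

# unsplit() reverses split() for a list of vectors.
df$extra <- unsplit(l, df$group)

df
```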
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
Thank you very much!
Greetings,
Stephan
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.