I would like to clarify my previous email about using data.table.
Imagine the following data.frame called "data":
a b c d e
1 12 15 65 6
1 65 85 36 5
2 69 84 35 8
2 45 78 65 8
I want to aggregate columns b:d within each group defined by column a.
For each column col in b:d, the aggregate is sum(col)/sum(e), computed per value of a.
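To make the example reproducible, here is a minimal sketch that rebuilds the table above and computes the result I expect using base R only. The names "sums" and "expected" are just illustrative; rowsum() adds up the rows within each group of a.

```r
# Rebuild the example table (values copied from the table above)
data <- data.frame(
  a = c(1, 1, 2, 2),
  b = c(12, 65, 69, 45),
  c = c(15, 85, 84, 78),
  d = c(65, 36, 35, 65),
  e = c(6, 5, 8, 8)
)

# Group sums of every column except a; one row per value of a
sums <- rowsum(data[, c("b", "c", "d", "e")], group = data$a)

# Divide the group sums of b, c and d by the group sum of e
expected <- sums[, c("b", "c", "d")] / sums$e
print(expected)
```

For a = 1, for instance, column b should come out as (12 + 65)/(6 + 5) = 7, and for a = 2 as (69 + 45)/(8 + 8) = 7.125.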
For this I am using a data.table with a loop of the form:
##########################################
library(data.table)
ColNames <- colnames(data)  # names of the columns
x <- ncol(data) - 1         # index of the last column to process (d); excludes e
data <- data.table(data)    # convert to data.table
for (z in 2:x)              # loop from the second column (b) through column d
{
  # note: outputdata is overwritten on each pass, so only the last column's result is kept
  outputdata <- data[, sum(get(ColNames[z])) / sum(e), by = "a"]
}
############################################
This works fine, but the function "get" slows down the aggregation of the
rows by about 20 times. I wonder if there is an alternative function
to "get" or an alternative way to aggregate all columns at once. I have been
reading about .SD but have not yet figured out how to apply
more than one operation to it.
Right now I have:
###############
outputdata=data[, lapply(.SD, sum), by="a", .SDcols=2:x]
##############
This latter code aggregates all columns at once, but only by summing.
Ultimately I also need to divide the sum of each column by the sum of
column e.
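A sketch of the combined operation might look like the following. It assumes that a column left out of .SDcols (here e) can still be referred to by name inside j, so that each column in .SD is summed and divided by the group's sum(e) in one pass. It has not been timed against the get() loop.

```r
library(data.table)

# Example table from above, built directly as a data.table
data <- data.table(
  a = c(1, 1, 2, 2),
  b = c(12, 65, 69, 45),
  c = c(15, 85, 84, 78),
  d = c(65, 36, 35, 65),
  e = c(6, 5, 8, 8)
)
x <- ncol(data) - 1  # index of column d, the last column to aggregate

# Within each group of a: sum every column in .SD (b, c, d) and divide by sum(e).
# Assumption: e stays visible in j even though it is not listed in .SDcols.
outputdata <- data[, lapply(.SD, function(col) sum(col) / sum(e)),
                   by = "a", .SDcols = 2:x]
print(outputdata)
```

The result should have one row per value of a, with columns b, c and d holding the ratios.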
Any help will be greatly appreciated.
Thanks,
Camilo
Camilo Mora, Ph.D.
Department of Geography, University of Hawaii
Currently available in Colombia
Phone: Country code: 57
Provider code: 313
Phone 776 2282
From the USA or Canada you have to dial 011 57 313 776 2282
http://www.soc.hawaii.edu/mora/