Hey everyone! I'm a fairly new R user. I've run into a problem that I'd like to share, in the hope that you can help me find a solution. I have a large .txt file that I opened with the read.table command, and what I understood from many R manuals is that the result of read.table is a kind of matrix. I've used order() to sort my data, and now my problem is this: I have a variable with many repeated values, and I would like to operate on the row indexes of "these repeated values". For example, suppose I have:

var1  var2  varN
122   nnn1  1
213   nnn2  2
422   nnn4  2
432         3
441         4
500         4
550         4

I want to obtain a new column in which the elements of var1 are added up wherever varN is repeated. So for varN=2 the corresponding element of the new column will be 213+422, and for varN=4 it will be 441+500+550. Where a value of varN is not repeated there is obviously nothing to add, and the result is just the single var1 value for that varN.

I wrote a function to do this, but it is not very good (I have a database with around 1 million rows and 5 columns). Actually, this function works for data that is not so large:

    suma.rep=function(X,Y){
      resp=numeric(0)
      Z=unique(Y)
      for (i in (1:length(Z)))
        resp=c(resp,sum(X[which(Y==Z[i])]))
      return(resp)}

When I run this function on my large data, R keeps calculating, and I think it would take far too long to produce the new column I need (maybe 4 days).

Question 1: I "feel" that there may be a command that could do this "simple" operation more elegantly. I googled it but couldn't find one. Is there such a command?

Question 2: Is it a good idea to handle large database files with R, as in my example?

Thank you so much for your help.

Christian Paúl
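
For reference, the per-group sum described in the post can be sketched in base R without an explicit loop. This is only a minimal sketch under stated assumptions: the data frame name `dat` and the new column name `sum_var1` are made up for illustration, and the toy data simply mirrors the example table (var2 is left out). `tapply()`, `ave()` and `rowsum()` are standard base R functions.

```r
# Toy data mirroring the example table in the post (var2 omitted).
# `dat` and `sum_var1` are hypothetical names, not from the original post.
dat <- data.frame(
  var1 = c(122, 213, 422, 432, 441, 500, 550),
  varN = c(1, 2, 2, 3, 4, 4, 4)
)

# One sum per distinct value of varN (results are ordered by the sorted
# group values, not by first appearance as in suma.rep).
group_sums <- tapply(dat$var1, dat$varN, sum)
# group_sums: 1 -> 122, 2 -> 635, 3 -> 432, 4 -> 1491

# The same sums repeated on every row of the group, as a new column.
dat$sum_var1 <- ave(dat$var1, dat$varN, FUN = sum)
# dat$sum_var1: 122 635 635 432 1491 1491 1491

# rowsum() is another base R option; it returns a one-column matrix
# with one row per group.
rs <- rowsum(dat$var1, dat$varN)
```

`tapply()` or `rowsum()` fits when one value per group is wanted, as `suma.rep` returns; `ave()` fits when the result has to line up with the original rows as a new column. All three avoid growing a vector with `c()` inside a loop, which is likely a large part of why the original function slows down on a million rows.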

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.