Hey everyone!
I'm quite new to R. I have run into a problem that I'd like to share with you,
and I hope you can help me find a solution.
I have a large .txt file which I read in with the read.table command. What I
understood from the R manuals is that read.table gives me a kind of matrix
(a data frame).
I've used order() to sort my data, and now my problem is this: I have a variable
that has many repeated values, and I would like to operate on the row
indexes of these repeated values. For example, suppose I have:

  var1    var2    …    varN
  122     nnn1    …    1
  213     nnn2    …    2
  422     nnn4    …    2
  432     …       …    3
  441     …       …    4
  500     …       …    4
  550     …       …    4

So I want to obtain a new column in which the elements of var1 are added up over
the rows where varN is repeated. For varN=2 the corresponding entry of the new
column will be 213+422, and for varN=4 it will be 441+500+550. Where a varN
value is not repeated there is obviously nothing to add, since that value is
unique.
I wrote a function to do this, but it is not very good (I have a database with
around 1 million rows and 5 columns). It only works for data that is not
so large:

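suma.rep <- function(X, Y) {
  Z <- unique(Y)                  # distinct values of the grouping variable
  resp <- numeric(length(Z))      # preallocate instead of growing with c()
  for (i in seq_along(Z)) {
    resp[i] <- sum(X[Y == Z[i]])  # sum of X over the rows where Y equals Z[i]
  }
  resp
}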
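
To make this reproducible, here is a small sketch on the toy data above. The var1 and varN numbers come from the table; the data frame name `dat`, the column name `suma`, and the match() step that spreads the sums back onto the rows are only my assumptions about how to build the column I am after, not a recommended solution.

```r
# Toy data from the table above; only var1 and varN are used here.
# `dat` and the new column name `suma` are hypothetical names for this sketch.
dat <- data.frame(
  var1 = c(122, 213, 422, 432, 441, 500, 550),
  varN = c(1, 2, 2, 3, 4, 4, 4)
)

# Same logic as suma.rep above: one sum of X per distinct value of Y.
suma.rep <- function(X, Y) {
  Z <- unique(Y)
  resp <- numeric(length(Z))
  for (i in seq_along(Z)) {
    resp[i] <- sum(X[Y == Z[i]])
  }
  resp
}

# One value per distinct varN, in order of first appearance.
sums <- suma.rep(dat$var1, dat$varN)

# Spread the group sums back onto the rows to get the desired column.
dat$suma <- sums[match(dat$varN, unique(dat$varN))]
print(dat)
```

On this toy data the rows with varN=2 both get 635 (213+422) and the rows with varN=4 all get 1491 (441+500+550); the loop over the unique values is the part that becomes slow on my real data.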

When I run this function on my large data, R just keeps calculating, and I
think it would take far too long to build the new column (maybe 4 days).
Question 1: I "feel" that there may be a command that does this "simple"
operation more elegantly. I googled it but couldn't find one. Is there such a
command?
Question 2: Is it a good idea to handle large database files with R, as in my
example?

Thank you so much for your help.
Christian Paúl

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
