Consider a data frame named rwrdatafile. It stores several variables in columns, each with 1000 observations, so it has 1000 rows. My interest lies in the second column of this data frame, that is, rwrdatafile[,2]. What I am trying to accomplish is to delete each row that holds the first instance of a unique value in rwrdatafile[,2]. For example, suppose the values stored in rwrdatafile[,2] look like
1 4 4 4 4 4 4 6 6 and the routine should delete the row containing 1, the row containing the first 4, and the row containing the first 6. I searched online and found similar examples, but none of them helped with what I am trying to achieve.

What is specific to my case is that the routine should use a for loop. I have written a routine that does not use a for loop, and it works fine; I paste it below (Vector-oriented coding in R). I need to write a for loop that accomplishes the same task. In fact, I have written such a for loop, but it has a problem (Scalar-oriented coding in R, pasted below).

Note that the data stored in rwrdatafile[,2] has three unique values, 1, 4, and 6 (there are more, but that does not matter for this example). The for loop first determines the number of unique values in rwrdatafile[,2] with length(unique(rwrdatafile[,2])) and uses that number as the upper end of the loop sequence. The length is 3, so the sequence is 1:3. But there is a catch! When the row containing 1 is deleted, the length decreases to 2, but the for loop still attempts index 3 and therefore returns NULL at the end of the loop. To work around this I subtract 1 from the length. But this is not good coding. I puzzled over the NULL result, it took me a while to figure out the problem, and worse, I might never have found it. So the for loop is not reliable, because it requires the user to know whether there are multiple instances of the unique values (for example, multiple instances of 1).

How can I fix the problem? The restriction is that I need to keep the for loop, and it should resemble the for loop I have written for MATLAB (pasted below). The aim is to translate the MATLAB routine into R as closely as possible. I do not want to deviate (much) from the MATLAB version of the code, because otherwise I cannot compare the routines while I am teaching this.
That is, I need a function to use inside the R for loop that is as close as possible to MATLAB's find function (with the 'first' option).

# Scalar-oriented coding in R
length(unique(rwrdatafile[,2]))
for (i in 1:(.Last.value-1)){
  rwrdatafile = rwrdatafile[-(which(rwrdatafile[,2] == unique(rwrdatafile[,2])[i])[1]),]
}

# Vector-oriented coding in R
unique(rwrdatafile[,2])
tag = match(.Last.value,rwrdatafile[,2])
rwrdatafile = rwrdatafile[!row.names(rwrdatafile) %in% tag,]

# Scalar-oriented coding in MATLAB
unique(mwmatfile.data(:,2));
for i = ans'
  mwmatfile.data(find(mwmatfile.data(:,2) == i,1,'first'),:) = [];
end
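
One possible reading of the problem is that, in the scalar-oriented R loop, unique(rwrdatafile[,2]) is re-evaluated on every iteration, after rows have already been removed, so the index i may no longer point at the value originally intended. The MATLAB version avoids this because the unique values are computed once and the loop runs over the values themselves. A minimal sketch of the same idea in R follows. The toy data frame, its column names, and the variable names vals, v and first are made up for illustration, and the sketch assumes the second column is numeric and contains no NA values.

```r
# Toy data (hypothetical): the second column mimics 1 4 4 4 4 4 4 6 6
rwrdatafile <- data.frame(id = 1:9, x = c(1, 4, 4, 4, 4, 4, 4, 6, 6))

# Compute the unique values ONCE, before any row is deleted
vals <- unique(rwrdatafile[, 2])

# Loop over the values themselves, similar to `for i = ans'` in MATLAB
for (v in vals) {
  # which(...)[1] plays the role of find(..., 1, 'first')
  first <- which(rwrdatafile[, 2] == v)[1]
  rwrdatafile <- rwrdatafile[-first, ]
}

rwrdatafile[, 2]
# 4 4 4 4 4 6
```

Because every value in vals was taken from the column before the loop started, which() always finds at least one match, so no adjustment of the loop length (such as subtracting 1) is needed in this sketch.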