Suppose I have something like the following dataframe:
samp1 <- c(60,50,20,90)
samp2 <- c(60,60,90,58)
samp3 <- c(25,65,65,90)
test <- data.frame(samp1,samp2,samp3)
I want to calculate column means. Easy enough, if I want to use all the
data within each column:
print(colMeans(test, na.rm = TRUE))
However, I'm danged if I can figure out how to do the same thing after
dropping the minimum value for each column. For example, column 1 in the
dataframe test consists of 60, 50, 20, 90. I want to calculate the mean
over (60,50,90), dropping the minimum value (20). Figuring out what the
minimum value is in a single column is easy, but I can't figure out how
to arm-twist colMeans into 'applying itself' to the elements of a column
greater than the minimum, for each column in turn. I've tried
permutations of select, subset etc., to no avail. Only thing I can think
of is to (i) find the minimum in a column, (ii) change it to NA, and
then (iii) call colMeans with na.rm = TRUE:
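test2 <- test
# set the (first) minimum of each column to NA; note <- rather than ==
for (i in seq_len(ncol(test))) { test2[which.min(test[,i]),i] <- NA }
print(test2)
# na.rm must be passed to colMeans, not to print
print(colMeans(test2, na.rm = TRUE))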
While this works, it seems awfully 'clunky' -- is there a better way?
Thanks in advance...
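
One possible shortcut, offered only as a sketch: apply a small helper to each column with sapply instead of routing through NA and colMeans. The helper name mean_drop_min is made up for illustration and is not part of base R. It assumes every column is numeric, has at least two values, and contains at least one non-NA value.

```r
samp1 <- c(60, 50, 20, 90)
samp2 <- c(60, 60, 90, 58)
samp3 <- c(25, 65, 65, 90)
test <- data.frame(samp1, samp2, samp3)

# Hypothetical helper: mean of a vector after dropping one occurrence
# of its minimum. which.min() returns the index of the first minimum,
# and negative indexing removes that single element.
mean_drop_min <- function(x) mean(x[-which.min(x)])

# sapply() walks over the columns of the data frame and returns a
# named numeric vector, one entry per column.
trimmed <- sapply(test, mean_drop_min)
print(trimmed)
```

If a column has tied minima, only the first one is dropped, which matches what the loop with which.min does. Dropping every tied minimum would need something like x[x > min(x)] instead, and that gives a different answer when ties exist.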