This is exactly what I needed -- thanks for your help, Greg and Gabor. I'm looking forward to replacing a dozen stored procedures, temp tables, and database calls with a one-page R script.
Josh

On Wed, 12 Jul 2006, Greg Snow wrote:

> Gabor, your solution does not take into account the groups. How about
> something like:
>
> iris2 <- iris
> iris2$m <- ave(iris2$Sepal.Length, iris2$Species)
> iris2$s <- ave(iris2$Sepal.Length, iris2$Species, FUN=sd)
>
> iris2 <- transform(iris2, z= (Sepal.Length-m)/s)
>
> iris2.2 <- subset(iris2, abs(z) < 2)
>
> aggregate(iris2.2, list(iris2.2$Species), FUN=mean)
>
> --
> Gregory (Greg) L. Snow Ph.D.
> Statistical Data Center
> Intermountain Healthcare
> [EMAIL PROTECTED]
> (801) 408-8111
>
> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Gabor Grothendieck
> Sent: Tuesday, July 11, 2006 1:06 PM
> To: Joshua Tokle
> Cc: r-help@stat.math.ethz.ch
> Subject: Re: [R] R newbie: logical subsets
>
> Try this, using the built-in anscombe data set:
>
> anscombe[!rowSums(abs(scale(anscombe)) > 2),]
>
> On 7/11/06, Joshua Tokle <[EMAIL PROTECTED]> wrote:
>> Hello! I'm a newcomer to R hoping to replace some convoluted database
>> code with an R script. Unfortunately, I haven't been able to figure
>> out how to implement the following logic.
>>
>> Essentially, we have a database of transactions that are coded with a
>> geographic locale and a type. These are being loaded into a
>> data.frame with named variables city, type, and price. E.g.,
>> trans$city and all that.
>>
>> We want to calculate mean prices by city and type, AFTER excluding
>> outliers. That is, we want to calculate the mean price in 3 steps:
>>
>> 1. calculate a mean and standard deviation by city and type over all
>>    transactions
>> 2. create a subset of the original data frame, excluding
>>    transactions that differ from the relevant mean by more than 2
>>    standard deviations
>> 3. calculate a final mean by city and type based on this subset.
>>
>> I'm stuck on step 2.
>> I would like to do something like the following:
>>
>> fs <- list(factor(trans$city), factor(trans$type))
>> means <- tapply(trans$price, fs, mean)
>> stdevs <- tapply(trans$price, fs, sd)
>>
>> filter <- abs(trans$price - means[trans$city, trans$type]) <
>>     2*stdevs[trans$city, trans$type]
>>
>> sub <- subset(trans, filter)
>>
>> The above code doesn't work. What's the correct way to do this?
>>
>> Thanks,
>> Josh
>>
>> ______________________________________________
>> R-help@stat.math.ethz.ch mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
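The tapply() attempt in the original question can also be made to work. The problem is that `means[trans$city, trans$type]` builds an n-by-n matrix, with every row's city crossed with every row's type, instead of returning one mean per transaction. The following is a minimal sketch and is not code from the thread. It assumes a current R, which can index an array that has dimnames using a two-column character matrix. The `trans` data frame is a small hypothetical stand-in generated with rnorm(), not real data.

```r
# Hypothetical stand-in for the `trans` data frame (city, type, price).
set.seed(1)
trans <- data.frame(
  city  = rep(c("Boston", "Denver"), each = 20),
  type  = rep(c("A", "B"), times = 20),
  price = round(rnorm(40, mean = 100, sd = 15), 2)
)

# Step 1: group means and sds as a city x type matrix (dimnames = levels)
fs     <- list(factor(trans$city), factor(trans$type))
means  <- tapply(trans$price, fs, mean)
stdevs <- tapply(trans$price, fs, sd)

# Step 2: a two-column character matrix picks ONE cell per transaction,
# matched by row name (city) and column name (type).
idx  <- cbind(as.character(trans$city), as.character(trans$type))
keep <- abs(trans$price - means[idx]) < 2 * stdevs[idx]  # one TRUE/FALSE per row
sub  <- subset(trans, keep)  # `keep` rather than `filter`, to avoid masking stats::filter

# Step 3: final mean by city and type on the trimmed data
final <- tapply(sub$price, list(sub$city, sub$type), mean)
print(final)
```

Here `means[idx]` gives the same per-row values that ave() computes in Greg's reply. Both routes therefore select the same rows.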
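Greg's ave() approach extends to city and type together, because ave() accepts several grouping factors and groups by their interaction. The following is a minimal sketch that uses a hypothetical `trans` data frame generated with rlnorm(), not real data. It aggregates only the price column. In current R, calling aggregate() with FUN=mean on the whole data frame, as in the iris example, also averages the factor column, which yields NA and a warning for it. The formula interface to aggregate() used here assumes R 2.11 or later.

```r
# Hypothetical stand-in for the `trans` data frame (city, type, price).
set.seed(42)
trans <- data.frame(
  city  = rep(c("Boston", "Denver", "Austin"), each = 30),
  type  = rep(c("condo", "house", "lot"), times = 30),
  price = round(rlnorm(90, meanlog = 12, sdlog = 0.5))
)

# Step 1: each row gets its own city/type group mean and sd
trans$m <- ave(trans$price, trans$city, trans$type)
trans$s <- ave(trans$price, trans$city, trans$type, FUN = sd)

# Step 2: keep rows within 2 sd of their group mean (same as abs(z) < 2)
trimmed <- subset(trans, abs(price - m) < 2 * s)

# Step 3: final mean of price only, by city and type
final <- aggregate(price ~ city + type, data = trimmed, FUN = mean)
print(final)
```

A city/type group with a single transaction has sd NA. The comparison then evaluates to NA, and subset() drops those rows, so such groups disappear from the final table.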