Hi Noah,

I am not sure whether the 0s should be standardized. Since you want them excluded from the calculation of the mean and SD, I am assuming you do not want (0 - M) / sigma applied to them. If that is the case, here is an example:

## read in your data
## FYI: providing via dput() would be easier next time
d <- read.table(textConnection("
code v1 v2
G1 1.2 2.3
G1 0 2.4
G1 1.4 3.4
G2 2.9 2.3
G2 4.3 4.4"), header = TRUE)
closeAllConnections()

## temporary data as a matrix
tmp <- as.matrix(d[-1])

## index 0s and set to missing
tmp[index.0 <- which(tmp == 0, arr.ind = TRUE)] <- NA

## scale by column and d$code and pull back to matrix
## (rows come back grouped by code, so this assumes d is sorted by code)
tmp <- do.call("rbind", by(tmp, d$code, scale))

## NAs back to 0s
tmp[index.0] <- 0

d[, 2:3] <- tmp

If you want the zeros standardized, it will take a bit of a different approach. The other issue that could come up here is speed, but that can be very dataset dependent (e.g., what is most efficient for a few levels of code may not be what is most efficient for many columns, etc.). That said, it would not take much work to create a parallelized version of what by() is doing, and scale() is already vectorized, so it works pretty darn fast assuming you pass it a matrix.

Cheers,

Josh

On Sat, Dec 10, 2011 at 1:44 PM, Noah Silverman <noahsilver...@ucla.edu> wrote:
> Hi,
>
> I'm having difficulty coming up with a good way to subset some data to
> generate statistics.
>
> My data frame has multiple observations by group.
>
> Here is an overly-simplified toy example of the data
> ==========================
> code    v1      v2
> G1      1.2     2.3
> G1      0       2.4
> G1      1.4     3.4
> G2      2.9     2.3
> G2      4.3     4.4
> etc..
> =========================
>
> I want to normalize the data *by group* for certain variable. But, I want
> to ignore 0 values when calculating the mean and standard deviation.
>
> What I *want* to do is something like this:
> =======================
> for (code in unique (d$code) ){
>     mu <- mean( d[which(d[d$code==code,v1] !=0 ), v1] )
>     sig <- sd( d[which(d[d$code==code,v1] !=0 ), v1] )
>     d[which(d[d$code==code,v1] !=0 ), cname] <-
>         (d[which(d[d$code==code,v1] !=0 ), v1] - mu) / sig
> }
> =======================
>
> My goal, if it isn't apparent, is to replace values with their normalized
> value. (But, the statistics used for normalization are calculated skipping
> zero values.)
>
> This doesn't work as the indexing from the which command is relative (1,2,3,
> etc.)
>
>
> Suggestions?
>
>
>
> --
> Noah Silverman
> UCLA Department of Statistics
> 8208 Math Sciences Building
> Los Angeles, CA 90095
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/
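
P.S. For the other case mentioned in the reply, where the zeros themselves should also be standardized, one possible sketch is below. It is only an illustration under assumptions: the helper name scale_nonzero and the vars vector are made up here and are not from the original thread. It relies on ave() rather than by(), which keeps the original row order, so the data would not need to be sorted by code. If a group has fewer than two nonzero values, sd() returns NA and that group comes back as NA.

```r
## same toy data as in the thread
d <- read.table(text = "
code v1 v2
G1 1.2 2.3
G1 0 2.4
G1 1.4 3.4
G2 2.9 2.3
G2 4.3 4.4", header = TRUE)

## hypothetical helper (not from the original post):
## compute the mean and SD from the nonzero, non-missing values only,
## then apply (x - M) / sigma to every value, zeros included
scale_nonzero <- function(x) {
  nz <- x[!is.na(x) & x != 0]
  (x - mean(nz)) / sd(nz)
}

## columns to standardize (assumed names from the toy data)
vars <- c("v1", "v2")

## ave() applies the function within each level of d$code and
## returns the results in the original row order
d[vars] <- lapply(d[vars], function(x) ave(x, d$code, FUN = scale_nonzero))
```

With this version the 0 in v1 is transformed like any other value; it just does not contribute to the group mean or SD.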