Thanks, guys.
On Sat, Jun 8, 2013 at 2:17 PM, Neal Fultz <nfu...@gmail.com> wrote: > rowSums and Reduce will have the same problems with bad data you alluded > to earlier, eg > cg = 1, hs = 0 > > But that's something to check for with crosstabs anyway. > > This "wrong data" thing is a distraction here. I guess I'd have to craft 2 solutions, depending on what the researcher says. (We can't assume es = 0 or es = NA and cg = 1 is bad data. There are some people who finish college without doing elementary school (wasn't Albert Einstein one of those?) or high school. I once went to an eye doctor who didn't finish high school, but nonetheless was admitted to optometrist school.) I did not know about the Reduce function before this. If we enforce the ordering and clean up the data in the way you imagine, it would work. I think the pmax is the most teachable and dependably not-getting-wrongable approach if the data is not wrong. > Side note: you should check out the microbenchmark pkg, it's quite handy. > > Perhaps the working example of microbenchmark is the best thing in this thread! I understand the idea behind it, but it seems like I can never get it to work right. It helps to see how you do it. > > R>require(microbenchmark) > R>microbenchmark( > + f1(cg,hs,es), > + f2(cg,hs,es), > + f3(cg,hs,es), > + f4(cg,hs,es) > + ) > Unit: microseconds > expr min lq median uq max neval > f1(cg, hs, es) 23029.848 25279.9660 27024.9640 29996.6810 55444.112 100 > f2(cg, hs, es) 730.665 755.5750 811.7445 934.3320 6179.798 100 > f3(cg, hs, es) 85.029 101.6785 129.8605 196.2835 2820.187 100 > f4(cg, hs, es) 762.232 804.4850 843.7170 1079.0800 24869.548 100 > > On Fri, Jun 07, 2013 at 08:03:26PM -0700, Joshua Wiley wrote: > > I still argue for na.rm=FALSE, but that is cute, also substantially > faster > > > > f1 <- function(x1, x2, x3) do.call(paste0, list(x1, x2, x3)) > > f2 <- function(x1, x2, x3) pmax(3*x3, 2*x2, es, 0, na.rm=FALSE) > > f3 <- function(x1, x2, x3) Reduce(`+`, list(x1, x2, x3)) > > f4 <- function(x1, x2, x3) rowSums(cbind(x1, x2, x3)) > > > > es <- rep(c(0, 0, 1, 0, 1, 0, 1, 1, NA, NA), 1000) > > hs <- rep(c(0, 0, 1, 0, 1, 0, 1, 0, 1, NA), 1000) > > cg <- rep(c(0, 0, 0, 0, 1, 0, 1, 0, NA, NA), 1000) > > > > system.time(replicate(1000, f1(cg, hs, es))) > > system.time(replicate(1000, f2(cg, hs, es))) > > system.time(replicate(1000, f3(cg, hs, es))) > > system.time(replicate(1000, f4(cg, hs, es))) > > > > > system.time(replicate(1000, f1(cg, hs, es))) > > user system elapsed > > 22.73 0.03 22.76 > > > system.time(replicate(1000, f2(cg, hs, es))) > > user system elapsed > > 0.92 0.04 0.95 > > > system.time(replicate(1000, f3(cg, hs, es))) > > user system elapsed > > 0.19 0.02 0.20 > > > system.time(replicate(1000, f4(cg, hs, es))) > > user system elapsed > > 0.95 0.03 0.98 > > > > > > R version 3.0.0 (2013-04-03) > > Platform: x86_64-w64-mingw32/x64 (64-bit) > > > > > > > > > > On Fri, Jun 7, 2013 at 7:25 PM, Neal Fultz <nfu...@gmail.com> wrote: > > > I would do this to get the highest non-missing level: > > > > > > x <- pmax(3*cg, 2*hs, es, 0, na.rm=TRUE) > > > > > > rock chalk... > > > > > > -nfultz > > > > > > On Fri, Jun 07, 2013 at 06:24:50PM -0700, Joshua Wiley wrote: > > >> Hi Paul, > > >> > > >> Unless you have truly offended the data generating oracle*, the > > >> pattern: NA, 1, NA, should be a data entry error --- graduating HS > > >> implies graduating ES, no? I would argue fringe cases like that > > >> should be corrected in the data, not through coding work arounds. > > >> Then you can just do: > > >> > > >> x <- do.call(paste0, list(es, hs, cg)) > > >> > > >> > table(factor(x, levels = c("000", "100", "110", "111"), labels = > c("none", "es","hs", "cg"))) > > >> none es hs cg > > >> 4 1 1 2 > > >> > > >> Cheers, > > >> > > >> Josh > > >> > > >> *Drawn from comments by Judea Pearl one lively session. > > >> > > >> > > >> On Fri, Jun 7, 2013 at 6:13 PM, Paul Johnson <pauljoh...@gmail.com> > wrote: > > >> > In our Summer Stats Institute, I was asked a question that amounts > to > > >> > reversing the effect of the contrasts function (reconstruct an > ordinal > > >> > predictor from a set of binary columns). The best I could think of > was to > > >> > link together several ifelse functions, and I don't think I want to > do this > > >> > if the example became any more complicated. > > >> > > > >> > I'm unable to remember a less error prone method :). But I expect > you might. > > >> > > > >> > Here's my working example code > > >> > > > >> > ## Paul Johnson <pauljohn at ku.edu> > > >> > ## 2013-06-07 > > >> > > > >> > ## We need to create an ordinal factor from these indicators > > >> > ## completed elementary school > > >> > es <- c(0, 0, 1, 0, 1, 0, 1, 1) > > >> > ## completed high school > > >> > hs <- c(0, 0, 1, 0, 1, 0, 1, 0) > > >> > ## completed college graduate > > >> > cg <- c(0, 0, 0, 0, 1, 0, 1, 0) > > >> > > > >> > ed <- ifelse(cg == 1, 3, > > >> > ifelse(hs == 1, 2, > > >> > ifelse(es == 1, 1, 0))) > > >> > > > >> > edf <- factor(ed, levels = 0:3, labels = c("none", "es", "hs", > "cg")) > > >> > data.frame(es, hs, cg, ed, edf) > > >> > > > >> > ## Looks OK, but what if there are missings? > > >> > es <- c(0, 0, 1, 0, 1, 0, 1, 1, NA, NA) > > >> > hs <- c(0, 0, 1, 0, 1, 0, 1, 0, 1, NA) > > >> > cg <- c(0, 0, 0, 0, 1, 0, 1, 0, NA, NA) > > >> > ed <- ifelse(cg == 1, 3, > > >> > ifelse(hs == 1, 2, > > >> > ifelse(es == 1, 1, 0))) > > >> > cbind(es, hs, cg, ed) > > >> > > > >> > ## That's bad, ifelse returns NA too frequently. > > >> > ## Revise (becoming tedious!) > > >> > > > >> > ed <- ifelse(!is.na(cg) & cg == 1, 3, > > >> > ifelse(!is.na(hs) & hs == 1, 2, > > >> > ifelse(!is.na(es) & es == 1, 1, > > >> > ifelse(is.na(es), NA, 0)))) > > >> > cbind(es, hs, cg, ed) > > >> > > > >> > > > >> > ## Does the project director want us to worry about > > >> > ## logical inconsistencies, such as es = 0 but cg = 1? > > >> > ## I hope not. > > >> > > > >> > Thanks in advance, I hope you are having a nice summer. > > >> > > > >> > pj > > >> > > > >> > -- > > >> > Paul E. Johnson > > >> > Professor, Political Science Assoc. Director > > >> > 1541 Lilac Lane, Room 504 Center for Research Methods > > >> > University of Kansas University of Kansas > > >> > http://pj.freefaculty.org http://quant.ku.edu > > >> > > > >> > [[alternative HTML version deleted]] > > >> > > > >> > ______________________________________________ > > >> > R-help@r-project.org mailing list > > >> > https://stat.ethz.ch/mailman/listinfo/r-help > > >> > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > > >> > and provide commented, minimal, self-contained, reproducible code. > > >> > > >> > > >> > > >> -- > > >> Joshua Wiley > > >> Ph.D. Student, Health Psychology > > >> University of California, Los Angeles > > >> http://joshuawiley.com/ > > >> Senior Analyst - Elkhart Group Ltd. > > >> http://elkhartgroup.com > > >> > > >> ______________________________________________ > > >> R-help@r-project.org mailing list > > >> https://stat.ethz.ch/mailman/listinfo/r-help > > >> PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > > >> and provide commented, minimal, self-contained, reproducible code. > > > > > > > > -- > > Joshua Wiley > > Ph.D. Student, Health Psychology > > University of California, Los Angeles > > http://joshuawiley.com/ > > Senior Analyst - Elkhart Group Ltd. > > http://elkhartgroup.com > -- Paul E. Johnson Professor, Political Science Assoc. Director 1541 Lilac Lane, Room 504 Center for Research Methods University of Kansas University of Kansas http://pj.freefaculty.org http://quant.ku.edu [[alternative HTML version deleted]] ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.