Re: [R] recode: how to avoid nested ifelse

Paul Johnson Mon, 10 Jun 2013 08:12:29 -0700

Thanks, guys.


On Sat, Jun 8, 2013 at 2:17 PM, Neal Fultz <nfu...@gmail.com> wrote:

> rowSums and Reduce will have the same problems with bad data you alluded
> to earlier, eg
> cg = 1, hs = 0
>
> But that's something to check for with crosstabs anyway.
>
>
This "wrong data" thing is a distraction here.  I guess I'd have to craft 2
solutions, depending on what the researcher says. (We can't assume that
cg = 1 together with es = 0 or es = NA is bad data. There are some people
who finish college without doing elementary school (wasn't Albert Einstein
one of those?) or high school. I once went to an eye doctor who didn't
finish high school, but nonetheless was admitted to optometry school.)
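
Before deciding which of the 2 solutions applies, it helps to see which
patterns actually occur. Here is a rough sketch of the crosstab check Neal
mentions, using the example vectors from my original post. The "skipped"
rule is only my own illustration of what a researcher might call
inconsistent; it is not something anybody in this thread has specified.

```r
## Example data from the thread (with missings)
es <- c(0, 0, 1, 0, 1, 0, 1, 1, NA, NA)
hs <- c(0, 0, 1, 0, 1, 0, 1, 0, 1, NA)
cg <- c(0, 0, 0, 0, 1, 0, 1, 0, NA, NA)

## Count every observed (es, hs, cg) pattern, keeping NA as its own level
patterns <- as.data.frame(table(es, hs, cg, useNA = "ifany"))
patterns <- patterns[patterns$Freq > 0, ]
print(patterns)

## Flag rows that claim a higher level without the lower ones.
## (Hypothetical rule: whether such rows are "bad data" is the
## researcher's call, not something the code can decide.)
skipped <- (!is.na(hs) & hs == 1 & (is.na(es) | es != 1)) |
           (!is.na(cg) & cg == 1 &
              (is.na(hs) | hs != 1 | is.na(es) | es != 1))
which(skipped)   # row 9 here: es = NA, hs = 1, cg = NA
```

If which(skipped) comes back empty, the simple recodes are safe; if not,
somebody has to decide what those rows mean before any recode is trusted.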

I did not know about the Reduce function before this. If we enforce the
ordering and clean up the data in the way you imagine, it would work.

I think the pmax is the most teachable and dependably not-getting-wrongable
approach if the data is not wrong.
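
For the record, this is roughly how I would write up the pmax idea next to
the sum-of-indicators idea. Treat it as a sketch: the variable names
ed_pmax, ed_sum and all_missing are mine, and resetting all-missing rows
to NA is my assumption about what a researcher would want.

```r
es <- c(0, 0, 1, 0, 1, 0, 1, 1, NA, NA)
hs <- c(0, 0, 1, 0, 1, 0, 1, 0, 1, NA)
cg <- c(0, 0, 0, 0, 1, 0, 1, 0, NA, NA)

## Highest level that has a 1; na.rm = TRUE skips NAs among the arguments
ed_pmax <- pmax(3 * cg, 2 * hs, es, 0, na.rm = TRUE)

## Because of the constant 0, a row with all three indicators missing
## comes out as 0 ("none") rather than NA. Put NA back for those rows
## (an assumption about the desired coding).
all_missing <- is.na(es) & is.na(hs) & is.na(cg)
ed_pmax[all_missing] <- NA

## Summing the indicators gives the same level only when a row is
## complete and ordered (es >= hs >= cg); any NA in a row yields NA
ed_sum <- Reduce(`+`, list(es, hs, cg))

edf <- factor(ed_pmax, levels = 0:3, labels = c("none", "es", "hs", "cg"))
data.frame(es, hs, cg, ed_pmax, ed_sum, edf)
```

The two columns agree on the complete rows and differ on row 9
(es = NA, hs = 1, cg = NA), where pmax reports hs and the sum reports NA.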


> Side note: you should check out the microbenchmark pkg, it's quite handy.
>
>
Perhaps the working example of microbenchmark is the best thing in this
thread! I understand the idea behind it, but it seems like I can never get
it to work right. It helps to see how you do it.
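
So that I can find it again, here is a self-contained version of that kind
of comparison. It is a sketch only: the function names f_pmax, f_reduce
and f_rowsum are my own labels (not the f1 to f4 from the thread), it
assumes the microbenchmark package is installed, and it falls back to
system.time if it is not. Timings will differ from machine to machine.

```r
## Candidate recodes; called as f(cg, hs, es), so x1 = cg, x2 = hs, x3 = es
f_pmax   <- function(x1, x2, x3) pmax(3 * x1, 2 * x2, x3, 0, na.rm = FALSE)
f_reduce <- function(x1, x2, x3) Reduce(`+`, list(x1, x2, x3))
f_rowsum <- function(x1, x2, x3) rowSums(cbind(x1, x2, x3))

## Same test vectors as in the thread, repeated to length 10000
es <- rep(c(0, 0, 1, 0, 1, 0, 1, 1, NA, NA), 1000)
hs <- rep(c(0, 0, 1, 0, 1, 0, 1, 0, 1, NA), 1000)
cg <- rep(c(0, 0, 0, 0, 1, 0, 1, 0, NA, NA), 1000)

if (requireNamespace("microbenchmark", quietly = TRUE)) {
  ## Each expression is evaluated 'times' times; the summary reports
  ## min, quartiles, median and max per expression
  bm <- microbenchmark::microbenchmark(
    f_pmax(cg, hs, es),
    f_reduce(cg, hs, es),
    f_rowsum(cg, hs, es),
    times = 100L
  )
  print(bm)
} else {
  ## Base R fallback when the package is not available
  print(system.time(replicate(100, f_pmax(cg, hs, es))))
  print(system.time(replicate(100, f_reduce(cg, hs, es))))
  print(system.time(replicate(100, f_rowsum(cg, hs, es))))
}
```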

>
> R>require(microbenchmark)
> R>microbenchmark(
> +   f1(cg,hs,es),
> +   f2(cg,hs,es),
> +   f3(cg,hs,es),
> +   f4(cg,hs,es)
> + )
> Unit: microseconds
>            expr       min         lq     median         uq       max neval
>  f1(cg, hs, es) 23029.848 25279.9660 27024.9640 29996.6810 55444.112   100
>  f2(cg, hs, es)   730.665   755.5750   811.7445   934.3320  6179.798   100
>  f3(cg, hs, es)    85.029   101.6785   129.8605   196.2835  2820.187   100
>  f4(cg, hs, es)   762.232   804.4850   843.7170  1079.0800 24869.548   100
>
> On Fri, Jun 07, 2013 at 08:03:26PM -0700, Joshua Wiley wrote:
> > I still argue for na.rm=FALSE, but that is cute, also substantially faster
> >
> > f1 <- function(x1, x2, x3) do.call(paste0, list(x1, x2, x3))
> > f2 <- function(x1, x2, x3) pmax(3*x1, 2*x2, x3, 0, na.rm=FALSE)
> > f3 <- function(x1, x2, x3) Reduce(`+`, list(x1, x2, x3))
> > f4 <- function(x1, x2, x3) rowSums(cbind(x1, x2, x3))
> >
> > es <- rep(c(0, 0, 1, 0, 1, 0, 1, 1, NA, NA), 1000)
> > hs <- rep(c(0, 0, 1, 0, 1, 0, 1, 0, 1, NA), 1000)
> > cg <- rep(c(0, 0, 0, 0, 1, 0, 1, 0, NA, NA), 1000)
> >
> > system.time(replicate(1000, f1(cg, hs, es)))
> > system.time(replicate(1000, f2(cg, hs, es)))
> > system.time(replicate(1000, f3(cg, hs, es)))
> > system.time(replicate(1000, f4(cg, hs, es)))
> >
> > > system.time(replicate(1000, f1(cg, hs, es)))
> >    user  system elapsed
> >   22.73    0.03   22.76
> > > system.time(replicate(1000, f2(cg, hs, es)))
> >    user  system elapsed
> >    0.92    0.04    0.95
> > > system.time(replicate(1000, f3(cg, hs, es)))
> >    user  system elapsed
> >    0.19    0.02    0.20
> > > system.time(replicate(1000, f4(cg, hs, es)))
> >    user  system elapsed
> >    0.95    0.03    0.98
> >
> >
> > R version 3.0.0 (2013-04-03)
> > Platform: x86_64-w64-mingw32/x64 (64-bit)
> >
> >
> >
> >
> > On Fri, Jun 7, 2013 at 7:25 PM, Neal Fultz <nfu...@gmail.com> wrote:
> > > I would do this to get the highest non-missing level:
> > >
> > > x <- pmax(3*cg, 2*hs, es, 0, na.rm=TRUE)
> > >
> > > rock chalk...
> > >
> > > -nfultz
> > >
> > > On Fri, Jun 07, 2013 at 06:24:50PM -0700, Joshua Wiley wrote:
> > >> Hi Paul,
> > >>
> > >> Unless you have truly offended the data generating oracle*, the
> > >> pattern: NA, 1, NA, should be a data entry error --- graduating HS
> > >> implies graduating ES, no?  I would argue fringe cases like that
> > >> should be corrected in the data, not through coding work arounds.
> > >> Then you can just do:
> > >>
> > >> x <- do.call(paste0, list(es, hs, cg))
> > >>
> > >> > table(factor(x, levels = c("000", "100", "110", "111"),
> > >>                labels = c("none", "es", "hs", "cg")))
> > >> none   es   hs   cg
> > >>    4    1    1    2
> > >>
> > >> Cheers,
> > >>
> > >> Josh
> > >>
> > >> *Drawn from comments by Judea Pearl one lively session.
> > >>
> > >>
> > >> On Fri, Jun 7, 2013 at 6:13 PM, Paul Johnson <pauljoh...@gmail.com> wrote:
> > >> > In our Summer Stats Institute, I was asked a question that amounts to
> > >> > reversing the effect of the contrasts function (reconstruct an ordinal
> > >> > predictor from a set of binary columns). The best I could think of was
> > >> > to link together several ifelse functions, and I don't think I would
> > >> > want to do this if the example became any more complicated.
> > >> >
> > >> > I'm unable to remember a less error-prone method :). But I expect you
> > >> > might.
> > >> >
> > >> > Here's my working example code
> > >> >
> > >> > ## Paul Johnson <pauljohn at ku.edu>
> > >> > ## 2013-06-07
> > >> >
> > >> > ## We need to create an ordinal factor from these indicators
> > >> > ## completed elementary school
> > >> > es <- c(0, 0, 1, 0, 1, 0, 1, 1)
> > >> > ## completed high school
> > >> > hs <- c(0, 0, 1, 0, 1, 0, 1, 0)
> > >> > ## completed college
> > >> > cg <- c(0, 0, 0, 0, 1, 0, 1, 0)
> > >> >
> > >> > ed <- ifelse(cg == 1, 3,
> > >> >              ifelse(hs == 1, 2,
> > >> >                     ifelse(es == 1, 1, 0)))
> > >> >
> > >> > edf <- factor(ed, levels = 0:3,
> > >> >               labels = c("none", "es", "hs", "cg"))
> > >> > data.frame(es, hs, cg, ed, edf)
> > >> >
> > >> > ## Looks OK, but what if there are missings?
> > >> > es <- c(0, 0, 1, 0, 1, 0, 1, 1, NA, NA)
> > >> > hs <- c(0, 0, 1, 0, 1, 0, 1, 0, 1, NA)
> > >> > cg <- c(0, 0, 0, 0, 1, 0, 1, 0, NA, NA)
> > >> > ed <- ifelse(cg == 1, 3,
> > >> >              ifelse(hs == 1, 2,
> > >> >                     ifelse(es == 1, 1, 0)))
> > >> > cbind(es, hs, cg, ed)
> > >> >
> > >> > ## That's bad, ifelse returns NA too frequently.
> > >> > ## Revise (becoming tedious!)
> > >> >
> > >> > ed <- ifelse(!is.na(cg) & cg == 1, 3,
> > >> >              ifelse(!is.na(hs) & hs == 1, 2,
> > >> >                     ifelse(!is.na(es) & es == 1, 1,
> > >> >                            ifelse(is.na(es), NA, 0))))
> > >> > cbind(es, hs, cg, ed)
> > >> >
> > >> >
> > >> > ## Does the project director want us to worry about
> > >> > ## logical inconsistencies, such as es = 0 but cg = 1?
> > >> > ## I hope not.
> > >> >
> > >> > Thanks in advance, I hope you are having a nice summer.
> > >> >
> > >> > pj
> > >> >
> > >> > --
> > >> > Paul E. Johnson
> > >> > Professor, Political Science      Assoc. Director
> > >> > 1541 Lilac Lane, Room 504      Center for Research Methods
> > >> > University of Kansas                 University of Kansas
> > >> > http://pj.freefaculty.org               http://quant.ku.edu
> > >> >
> > >> >         [[alternative HTML version deleted]]
> > >> >
> > >> > ______________________________________________
> > >> > R-help@r-project.org mailing list
> > >> > https://stat.ethz.ch/mailman/listinfo/r-help
> > >> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > >> > and provide commented, minimal, self-contained, reproducible code.
> > >>
> > >>
> > >>
> > >> --
> > >> Joshua Wiley
> > >> Ph.D. Student, Health Psychology
> > >> University of California, Los Angeles
> > >> http://joshuawiley.com/
> > >> Senior Analyst - Elkhart Group Ltd.
> > >> http://elkhartgroup.com
> > >>
> >
> >
> >
> > --
> > Joshua Wiley
> > Ph.D. Student, Health Psychology
> > University of California, Los Angeles
> > http://joshuawiley.com/
> > Senior Analyst - Elkhart Group Ltd.
> > http://elkhartgroup.com
>



-- 
Paul E. Johnson
Professor, Political Science      Assoc. Director
1541 Lilac Lane, Room 504      Center for Research Methods
University of Kansas                 University of Kansas
http://pj.freefaculty.org               http://quant.ku.edu
