Re: [R] recode: how to avoid nested ifelse
Thanks, guys. On Sat, Jun 8, 2013 at 2:17 PM, Neal Fultz nfu...@gmail.com wrote: rowSums and Reduce will have the same problems with bad data you alluded to earlier, eg cg = 1, hs = 0 But that's something to check for with crosstabs anyway. This wrong data thing is a distraction here. I guess I'd have to craft 2 solutions, depending on what the researcher says. (We can't assume es = 0 or es = NA and cg = 1 is bad data. There are some people who finish college without doing elementary school (wasn't Albert Einstein one of those?) or high school. I once went to an eye doctor who didn't finish high school, but nonetheless was admitted to optometrist school.) I did not know about the Reduce function before this. If we enforce the ordering and clean up the data in the way you imagine, it would work. I think the pmax is the most teachable and dependably not-getting-wrongable approach if the data is not wrong. Side note: you should check out the microbenchmark pkg, it's quite handy. Perhaps the working example of microbenchmark is the best thing in this thread! I understand the idea behind it, but it seems like I can never get it to work right. It helps to see how you do it. Rrequire(microbenchmark) Rmicrobenchmark( + f1(cg,hs,es), + f2(cg,hs,es), + f3(cg,hs,es), + f4(cg,hs,es) + ) Unit: microseconds expr min lq median uq max neval f1(cg, hs, es) 23029.848 25279.9660 27024.9640 29996.6810 55444.112 100 f2(cg, hs, es) 730.665 755.5750 811.7445 934.3320 6179.798 100 f3(cg, hs, es)85.029 101.6785 129.8605 196.2835 2820.187 100 f4(cg, hs, es) 762.232 804.4850 843.7170 1079.0800 24869.548 100 On Fri, Jun 07, 2013 at 08:03:26PM -0700, Joshua Wiley wrote: I still argue for na.rm=FALSE, but that is cute, also substantially faster f1 - function(x1, x2, x3) do.call(paste0, list(x1, x2, x3)) f2 - function(x1, x2, x3) pmax(3*x3, 2*x2, es, 0, na.rm=FALSE) f3 - function(x1, x2, x3) Reduce(`+`, list(x1, x2, x3)) f4 - function(x1, x2, x3) rowSums(cbind(x1, x2, x3)) es - rep(c(0, 0, 1, 0, 1, 0, 1, 1, NA, NA), 1000) hs - rep(c(0, 0, 1, 0, 1, 0, 1, 0, 1, NA), 1000) cg - rep(c(0, 0, 0, 0, 1, 0, 1, 0, NA, NA), 1000) system.time(replicate(1000, f1(cg, hs, es))) system.time(replicate(1000, f2(cg, hs, es))) system.time(replicate(1000, f3(cg, hs, es))) system.time(replicate(1000, f4(cg, hs, es))) system.time(replicate(1000, f1(cg, hs, es))) user system elapsed 22.730.03 22.76 system.time(replicate(1000, f2(cg, hs, es))) user system elapsed 0.920.040.95 system.time(replicate(1000, f3(cg, hs, es))) user system elapsed 0.190.020.20 system.time(replicate(1000, f4(cg, hs, es))) user system elapsed 0.950.030.98 R version 3.0.0 (2013-04-03) Platform: x86_64-w64-mingw32/x64 (64-bit) On Fri, Jun 7, 2013 at 7:25 PM, Neal Fultz nfu...@gmail.com wrote: I would do this to get the highest non-missing level: x - pmax(3*cg, 2*hs, es, 0, na.rm=TRUE) rock chalk... -nfultz On Fri, Jun 07, 2013 at 06:24:50PM -0700, Joshua Wiley wrote: Hi Paul, Unless you have truly offended the data generating oracle*, the pattern: NA, 1, NA, should be a data entry error --- graduating HS implies graduating ES, no? I would argue fringe cases like that should be corrected in the data, not through coding work arounds. Then you can just do: x - do.call(paste0, list(es, hs, cg)) table(factor(x, levels = c(000, 100, 110, 111), labels = c(none, es,hs, cg))) none es hs cg 4112 Cheers, Josh *Drawn from comments by Judea Pearl one lively session. On Fri, Jun 7, 2013 at 6:13 PM, Paul Johnson pauljoh...@gmail.com wrote: In our Summer Stats Institute, I was asked a question that amounts to reversing the effect of the contrasts function (reconstruct an ordinal predictor from a set of binary columns). The best I could think of was to link together several ifelse functions, and I don't think I want to do this if the example became any more complicated. I'm unable to remember a less error prone method :). But I expect you might. Here's my working example code ## Paul Johnson pauljohn at ku.edu ## 2013-06-07 ## We need to create an ordinal factor from these indicators ## completed elementary school es - c(0, 0, 1, 0, 1, 0, 1, 1) ## completed high school hs - c(0, 0, 1, 0, 1, 0, 1, 0) ## completed college graduate cg - c(0, 0, 0, 0, 1, 0, 1, 0) ed - ifelse(cg == 1, 3, ifelse(hs == 1, 2, ifelse(es == 1, 1, 0))) edf - factor(ed, levels = 0:3, labels = c(none, es, hs, cg)) data.frame(es, hs, cg, ed, edf) ## Looks OK, but what if there are missings? es - c(0, 0, 1,
Re: [R] recode: how to avoid nested ifelse
rowSums and Reduce will have the same problems with bad data you alluded to earlier, eg cg = 1, hs = 0 But that's something to check for with crosstabs anyway. Side note: you should check out the microbenchmark pkg, it's quite handy. Rrequire(microbenchmark) Rmicrobenchmark( + f1(cg,hs,es), + f2(cg,hs,es), + f3(cg,hs,es), + f4(cg,hs,es) + ) Unit: microseconds expr min lq median uq max neval f1(cg, hs, es) 23029.848 25279.9660 27024.9640 29996.6810 55444.112 100 f2(cg, hs, es) 730.665 755.5750 811.7445 934.3320 6179.798 100 f3(cg, hs, es)85.029 101.6785 129.8605 196.2835 2820.187 100 f4(cg, hs, es) 762.232 804.4850 843.7170 1079.0800 24869.548 100 On Fri, Jun 07, 2013 at 08:03:26PM -0700, Joshua Wiley wrote: I still argue for na.rm=FALSE, but that is cute, also substantially faster f1 - function(x1, x2, x3) do.call(paste0, list(x1, x2, x3)) f2 - function(x1, x2, x3) pmax(3*x3, 2*x2, es, 0, na.rm=FALSE) f3 - function(x1, x2, x3) Reduce(`+`, list(x1, x2, x3)) f4 - function(x1, x2, x3) rowSums(cbind(x1, x2, x3)) es - rep(c(0, 0, 1, 0, 1, 0, 1, 1, NA, NA), 1000) hs - rep(c(0, 0, 1, 0, 1, 0, 1, 0, 1, NA), 1000) cg - rep(c(0, 0, 0, 0, 1, 0, 1, 0, NA, NA), 1000) system.time(replicate(1000, f1(cg, hs, es))) system.time(replicate(1000, f2(cg, hs, es))) system.time(replicate(1000, f3(cg, hs, es))) system.time(replicate(1000, f4(cg, hs, es))) system.time(replicate(1000, f1(cg, hs, es))) user system elapsed 22.730.03 22.76 system.time(replicate(1000, f2(cg, hs, es))) user system elapsed 0.920.040.95 system.time(replicate(1000, f3(cg, hs, es))) user system elapsed 0.190.020.20 system.time(replicate(1000, f4(cg, hs, es))) user system elapsed 0.950.030.98 R version 3.0.0 (2013-04-03) Platform: x86_64-w64-mingw32/x64 (64-bit) On Fri, Jun 7, 2013 at 7:25 PM, Neal Fultz nfu...@gmail.com wrote: I would do this to get the highest non-missing level: x - pmax(3*cg, 2*hs, es, 0, na.rm=TRUE) rock chalk... -nfultz On Fri, Jun 07, 2013 at 06:24:50PM -0700, Joshua Wiley wrote: Hi Paul, Unless you have truly offended the data generating oracle*, the pattern: NA, 1, NA, should be a data entry error --- graduating HS implies graduating ES, no? I would argue fringe cases like that should be corrected in the data, not through coding work arounds. Then you can just do: x - do.call(paste0, list(es, hs, cg)) table(factor(x, levels = c(000, 100, 110, 111), labels = c(none, es,hs, cg))) none es hs cg 4112 Cheers, Josh *Drawn from comments by Judea Pearl one lively session. On Fri, Jun 7, 2013 at 6:13 PM, Paul Johnson pauljoh...@gmail.com wrote: In our Summer Stats Institute, I was asked a question that amounts to reversing the effect of the contrasts function (reconstruct an ordinal predictor from a set of binary columns). The best I could think of was to link together several ifelse functions, and I don't think I want to do this if the example became any more complicated. I'm unable to remember a less error prone method :). But I expect you might. Here's my working example code ## Paul Johnson pauljohn at ku.edu ## 2013-06-07 ## We need to create an ordinal factor from these indicators ## completed elementary school es - c(0, 0, 1, 0, 1, 0, 1, 1) ## completed high school hs - c(0, 0, 1, 0, 1, 0, 1, 0) ## completed college graduate cg - c(0, 0, 0, 0, 1, 0, 1, 0) ed - ifelse(cg == 1, 3, ifelse(hs == 1, 2, ifelse(es == 1, 1, 0))) edf - factor(ed, levels = 0:3, labels = c(none, es, hs, cg)) data.frame(es, hs, cg, ed, edf) ## Looks OK, but what if there are missings? es - c(0, 0, 1, 0, 1, 0, 1, 1, NA, NA) hs - c(0, 0, 1, 0, 1, 0, 1, 0, 1, NA) cg - c(0, 0, 0, 0, 1, 0, 1, 0, NA, NA) ed - ifelse(cg == 1, 3, ifelse(hs == 1, 2, ifelse(es == 1, 1, 0))) cbind(es, hs, cg, ed) ## That's bad, ifelse returns NA too frequently. ## Revise (becoming tedious!) ed - ifelse(!is.na(cg) cg == 1, 3, ifelse(!is.na(hs) hs == 1, 2, ifelse(!is.na(es) es == 1, 1, ifelse(is.na(es), NA, 0 cbind(es, hs, cg, ed) ## Does the project director want us to worry about ## logical inconsistencies, such as es = 0 but cg = 1? ## I hope not. Thanks in advance, I hope you are having a nice summer. pj -- Paul E. Johnson Professor, Political Science Assoc. Director 1541 Lilac Lane, Room 504 Center for Research Methods University of Kansas University of Kansas http://pj.freefaculty.org http://quant.ku.edu [[alternative HTML
Re: [R] recode: how to avoid nested ifelse
Hi Paul, Unless you have truly offended the data generating oracle*, the pattern: NA, 1, NA, should be a data entry error --- graduating HS implies graduating ES, no? I would argue fringe cases like that should be corrected in the data, not through coding work arounds. Then you can just do: x - do.call(paste0, list(es, hs, cg)) table(factor(x, levels = c(000, 100, 110, 111), labels = c(none, es,hs, cg))) none es hs cg 4112 Cheers, Josh *Drawn from comments by Judea Pearl one lively session. On Fri, Jun 7, 2013 at 6:13 PM, Paul Johnson pauljoh...@gmail.com wrote: In our Summer Stats Institute, I was asked a question that amounts to reversing the effect of the contrasts function (reconstruct an ordinal predictor from a set of binary columns). The best I could think of was to link together several ifelse functions, and I don't think I want to do this if the example became any more complicated. I'm unable to remember a less error prone method :). But I expect you might. Here's my working example code ## Paul Johnson pauljohn at ku.edu ## 2013-06-07 ## We need to create an ordinal factor from these indicators ## completed elementary school es - c(0, 0, 1, 0, 1, 0, 1, 1) ## completed high school hs - c(0, 0, 1, 0, 1, 0, 1, 0) ## completed college graduate cg - c(0, 0, 0, 0, 1, 0, 1, 0) ed - ifelse(cg == 1, 3, ifelse(hs == 1, 2, ifelse(es == 1, 1, 0))) edf - factor(ed, levels = 0:3, labels = c(none, es, hs, cg)) data.frame(es, hs, cg, ed, edf) ## Looks OK, but what if there are missings? es - c(0, 0, 1, 0, 1, 0, 1, 1, NA, NA) hs - c(0, 0, 1, 0, 1, 0, 1, 0, 1, NA) cg - c(0, 0, 0, 0, 1, 0, 1, 0, NA, NA) ed - ifelse(cg == 1, 3, ifelse(hs == 1, 2, ifelse(es == 1, 1, 0))) cbind(es, hs, cg, ed) ## That's bad, ifelse returns NA too frequently. ## Revise (becoming tedious!) ed - ifelse(!is.na(cg) cg == 1, 3, ifelse(!is.na(hs) hs == 1, 2, ifelse(!is.na(es) es == 1, 1, ifelse(is.na(es), NA, 0 cbind(es, hs, cg, ed) ## Does the project director want us to worry about ## logical inconsistencies, such as es = 0 but cg = 1? ## I hope not. Thanks in advance, I hope you are having a nice summer. pj -- Paul E. Johnson Professor, Political Science Assoc. Director 1541 Lilac Lane, Room 504 Center for Research Methods University of Kansas University of Kansas http://pj.freefaculty.org http://quant.ku.edu [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles http://joshuawiley.com/ Senior Analyst - Elkhart Group Ltd. http://elkhartgroup.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] recode: how to avoid nested ifelse
I would do this to get the highest non-missing level: x - pmax(3*cg, 2*hs, es, 0, na.rm=TRUE) rock chalk... -nfultz On Fri, Jun 07, 2013 at 06:24:50PM -0700, Joshua Wiley wrote: Hi Paul, Unless you have truly offended the data generating oracle*, the pattern: NA, 1, NA, should be a data entry error --- graduating HS implies graduating ES, no? I would argue fringe cases like that should be corrected in the data, not through coding work arounds. Then you can just do: x - do.call(paste0, list(es, hs, cg)) table(factor(x, levels = c(000, 100, 110, 111), labels = c(none, es,hs, cg))) none es hs cg 4112 Cheers, Josh *Drawn from comments by Judea Pearl one lively session. On Fri, Jun 7, 2013 at 6:13 PM, Paul Johnson pauljoh...@gmail.com wrote: In our Summer Stats Institute, I was asked a question that amounts to reversing the effect of the contrasts function (reconstruct an ordinal predictor from a set of binary columns). The best I could think of was to link together several ifelse functions, and I don't think I want to do this if the example became any more complicated. I'm unable to remember a less error prone method :). But I expect you might. Here's my working example code ## Paul Johnson pauljohn at ku.edu ## 2013-06-07 ## We need to create an ordinal factor from these indicators ## completed elementary school es - c(0, 0, 1, 0, 1, 0, 1, 1) ## completed high school hs - c(0, 0, 1, 0, 1, 0, 1, 0) ## completed college graduate cg - c(0, 0, 0, 0, 1, 0, 1, 0) ed - ifelse(cg == 1, 3, ifelse(hs == 1, 2, ifelse(es == 1, 1, 0))) edf - factor(ed, levels = 0:3, labels = c(none, es, hs, cg)) data.frame(es, hs, cg, ed, edf) ## Looks OK, but what if there are missings? es - c(0, 0, 1, 0, 1, 0, 1, 1, NA, NA) hs - c(0, 0, 1, 0, 1, 0, 1, 0, 1, NA) cg - c(0, 0, 0, 0, 1, 0, 1, 0, NA, NA) ed - ifelse(cg == 1, 3, ifelse(hs == 1, 2, ifelse(es == 1, 1, 0))) cbind(es, hs, cg, ed) ## That's bad, ifelse returns NA too frequently. ## Revise (becoming tedious!) ed - ifelse(!is.na(cg) cg == 1, 3, ifelse(!is.na(hs) hs == 1, 2, ifelse(!is.na(es) es == 1, 1, ifelse(is.na(es), NA, 0 cbind(es, hs, cg, ed) ## Does the project director want us to worry about ## logical inconsistencies, such as es = 0 but cg = 1? ## I hope not. Thanks in advance, I hope you are having a nice summer. pj -- Paul E. Johnson Professor, Political Science Assoc. Director 1541 Lilac Lane, Room 504 Center for Research Methods University of Kansas University of Kansas http://pj.freefaculty.org http://quant.ku.edu [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles http://joshuawiley.com/ Senior Analyst - Elkhart Group Ltd. http://elkhartgroup.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] recode: how to avoid nested ifelse
I still argue for na.rm=FALSE, but that is cute, also substantially faster f1 - function(x1, x2, x3) do.call(paste0, list(x1, x2, x3)) f2 - function(x1, x2, x3) pmax(3*x3, 2*x2, es, 0, na.rm=FALSE) f3 - function(x1, x2, x3) Reduce(`+`, list(x1, x2, x3)) f4 - function(x1, x2, x3) rowSums(cbind(x1, x2, x3)) es - rep(c(0, 0, 1, 0, 1, 0, 1, 1, NA, NA), 1000) hs - rep(c(0, 0, 1, 0, 1, 0, 1, 0, 1, NA), 1000) cg - rep(c(0, 0, 0, 0, 1, 0, 1, 0, NA, NA), 1000) system.time(replicate(1000, f1(cg, hs, es))) system.time(replicate(1000, f2(cg, hs, es))) system.time(replicate(1000, f3(cg, hs, es))) system.time(replicate(1000, f4(cg, hs, es))) system.time(replicate(1000, f1(cg, hs, es))) user system elapsed 22.730.03 22.76 system.time(replicate(1000, f2(cg, hs, es))) user system elapsed 0.920.040.95 system.time(replicate(1000, f3(cg, hs, es))) user system elapsed 0.190.020.20 system.time(replicate(1000, f4(cg, hs, es))) user system elapsed 0.950.030.98 R version 3.0.0 (2013-04-03) Platform: x86_64-w64-mingw32/x64 (64-bit) On Fri, Jun 7, 2013 at 7:25 PM, Neal Fultz nfu...@gmail.com wrote: I would do this to get the highest non-missing level: x - pmax(3*cg, 2*hs, es, 0, na.rm=TRUE) rock chalk... -nfultz On Fri, Jun 07, 2013 at 06:24:50PM -0700, Joshua Wiley wrote: Hi Paul, Unless you have truly offended the data generating oracle*, the pattern: NA, 1, NA, should be a data entry error --- graduating HS implies graduating ES, no? I would argue fringe cases like that should be corrected in the data, not through coding work arounds. Then you can just do: x - do.call(paste0, list(es, hs, cg)) table(factor(x, levels = c(000, 100, 110, 111), labels = c(none, es,hs, cg))) none es hs cg 4112 Cheers, Josh *Drawn from comments by Judea Pearl one lively session. On Fri, Jun 7, 2013 at 6:13 PM, Paul Johnson pauljoh...@gmail.com wrote: In our Summer Stats Institute, I was asked a question that amounts to reversing the effect of the contrasts function (reconstruct an ordinal predictor from a set of binary columns). The best I could think of was to link together several ifelse functions, and I don't think I want to do this if the example became any more complicated. I'm unable to remember a less error prone method :). But I expect you might. Here's my working example code ## Paul Johnson pauljohn at ku.edu ## 2013-06-07 ## We need to create an ordinal factor from these indicators ## completed elementary school es - c(0, 0, 1, 0, 1, 0, 1, 1) ## completed high school hs - c(0, 0, 1, 0, 1, 0, 1, 0) ## completed college graduate cg - c(0, 0, 0, 0, 1, 0, 1, 0) ed - ifelse(cg == 1, 3, ifelse(hs == 1, 2, ifelse(es == 1, 1, 0))) edf - factor(ed, levels = 0:3, labels = c(none, es, hs, cg)) data.frame(es, hs, cg, ed, edf) ## Looks OK, but what if there are missings? es - c(0, 0, 1, 0, 1, 0, 1, 1, NA, NA) hs - c(0, 0, 1, 0, 1, 0, 1, 0, 1, NA) cg - c(0, 0, 0, 0, 1, 0, 1, 0, NA, NA) ed - ifelse(cg == 1, 3, ifelse(hs == 1, 2, ifelse(es == 1, 1, 0))) cbind(es, hs, cg, ed) ## That's bad, ifelse returns NA too frequently. ## Revise (becoming tedious!) ed - ifelse(!is.na(cg) cg == 1, 3, ifelse(!is.na(hs) hs == 1, 2, ifelse(!is.na(es) es == 1, 1, ifelse(is.na(es), NA, 0 cbind(es, hs, cg, ed) ## Does the project director want us to worry about ## logical inconsistencies, such as es = 0 but cg = 1? ## I hope not. Thanks in advance, I hope you are having a nice summer. pj -- Paul E. Johnson Professor, Political Science Assoc. Director 1541 Lilac Lane, Room 504 Center for Research Methods University of Kansas University of Kansas http://pj.freefaculty.org http://quant.ku.edu [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles http://joshuawiley.com/ Senior Analyst - Elkhart Group Ltd. http://elkhartgroup.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles http://joshuawiley.com/ Senior Analyst - Elkhart Group Ltd. http://elkhartgroup.com