Re: [R] Condition to factor (easy to remember)
Douglas Bates-2 wrote:
> On Wed, Sep 30, 2009 at 2:42 PM, Douglas Bates <ba...@stat.wisc.edu> wrote:
> And besides, Frank Harrell will soon be weighing in to tell you why you
> shouldn't dichotomize in the first place.

Subjects in this study received a 20 ml infusion of Kirsch (40%, Swiss Brand) at t=10 minutes; therefore the second interval should read Prost instead of Post. Even Frank would admit this is a valid dichotomization.

Dieter

--
View this message in context: http://www.nabble.com/Condition-to-factor-%28easy-to-remember%29-tp25676411p25696647.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
[R] Condition to factor (easy to remember)
Dear List,

creating factors in a given non-default order is notoriously difficult to explain in a course. Students love the ifelse construct given below the most, but I remember a comment from Martin Mächler (?) that ifelse should be banned from courses.

Any better ideas? It doesn't need to be short, but it should be easy to remember.

Dieter

data = c(1,7,10,50,70)
levs = c("Pre","Post")
# Typical C-Programmer style
factor(levs[as.integer(data > 10)+1], levels=levs)
# Easiest to understand
factor(ifelse(data <= 10, levs[1], levs[2]), levels=levs)
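As a quick sanity check (an editorial sketch, not part of the original post), both constructions in the question can be compared directly. The names data and levs follow the post; f_index and f_ifelse are hypothetical names introduced here. Both should yield the same factor, with the levels in the requested Pre/Post order rather than alphabetical order.

```r
data <- c(1, 7, 10, 50, 70)
levs <- c("Pre", "Post")

# "C-programmer style": FALSE/TRUE -> 0/1, shifted to 1/2 to index levs
f_index <- factor(levs[as.integer(data > 10) + 1], levels = levs)

# ifelse() version: pick a label per element, then fix the level order
f_ifelse <- factor(ifelse(data <= 10, levs[1], levs[2]), levels = levs)

print(f_index)
identical(f_index, f_ifelse)
```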
Re: [R] Condition to factor (easy to remember)
On Sep 30, 2009, at 3:43 AM, Dieter Menne wrote:

> data = c(1,7,10,50,70)
> levs = c("Pre","Post")
> # Typical C-Programmer style
> factor(levs[as.integer(data > 10)+1], levels=levs)

I agree with your observation that many people express a preference for the ifelse version. I had the same sort of comment on some of my Excel code (not in a statistical application) a couple of days ago.

In your code the as.integer function is superfluous, and you could argue that it might even be easier to understand for the Boolean-challenged masses if you substituted as.logical(). It would also be superfluous, but it might convey the message that the programmer _knew_ that the + operation is capable of doing the necessary coercion.

> # Easiest to understand
> factor(ifelse(data <= 10, levs[1], levs[2]), levels=levs)

--
Boole Rules

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
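The coercion David describes can be checked directly. A minimal sketch, reusing the thread's data and levs (idx, f_plus and f_int are hypothetical names): adding 1 to a logical vector turns FALSE/TRUE into 1/2, so no as.integer() call is needed to build the index.

```r
data <- c(1, 7, 10, 50, 70)
levs <- c("Pre", "Post")

# (data > 10) is logical; "+ 1" coerces it to 1 (FALSE) or 2 (TRUE)
idx <- (data > 10) + 1
f_plus <- factor(levs[idx], levels = levs)

# the same factor as with the explicit as.integer() call
f_int <- factor(levs[as.integer(data > 10) + 1], levels = levs)
identical(f_plus, f_int)
```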
Re: [R] Condition to factor (easy to remember)
David Winsemius wrote:
> # Typical C-Programmer style
> factor(levs[as.integer(data > 10)+1], levels=levs)
>
> In your code the as.integer function is superfluous

Oops... done too much C# lately, getting invalid-cast challenged.

Dieter
Re: [R] Condition to factor (easy to remember)
1. A common way of doing this is cut:

cut(data, c(-Inf, 10, Inf), lab = levs, right = TRUE)
[1] Pre  Pre  Pre  Post Post
Levels: Pre Post

We don't actually need right=TRUE as it's the default, but if you omit it then it can be hard to remember whether the right ends of the intervals are included or excluded, so I would recommend including it as a matter of course. Slightly less safe, but if you knew the values were integers, another approach that would allow dropping the right= argument would be to use 10.5 as the breakpoint, in which case the setting of right= does not matter anyway.

2. Similar to cut is findInterval, so the subscripting of your first solution could be done via findInterval:

levs[ findInterval(data, c(-Inf, 10), right = TRUE) ]
[1] "Pre"  "Pre"  "Pre"  "Post" "Post"

The same comment regarding 10.5 applies. I've omitted the factor(...) part to focus on the difference, and I've done the same in the remaining examples.

3. Either of these could replace the ifelse. Both work by vectorizing an ordinary if, but sapply is the more common way to do it, so it is likely preferable from the viewpoint of clarity.

# 3a
sapply(data, function(x) if (x <= 10) levs[1] else levs[2])
[1] "Pre"  "Pre"  "Pre"  "Post" "Post"

# 3b
Vectorize(function(x) if (x <= 10) levs[1] else levs[2])(data)
[1] "Pre"  "Pre"  "Pre"  "Post" "Post"

4. The subscripting in your first solution could be done like this, which is a bit longer but arguably easier to understand:

levs[ 1 * (data <= 10) + 2 * (data > 10) ]
[1] "Pre"  "Pre"  "Pre"  "Post" "Post"

On Wed, Sep 30, 2009 at 3:43 AM, Dieter Menne <dieter.me...@menne-biomed.de> wrote:
> Any better idea? Not necessarily short, easy to remember is important.
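A small sketch of the boundary issue discussed in points 1 and 2 (an editorial addition, not part of the original reply). The value 10 lies exactly on the breakpoint, so it is the one element whose group depends on right=. One caveat on point 2: findInterval() has no argument literally named right. In that call, right = TRUE is partially matched to rightmost.closed, which puts 10 in the first interval only because 10 happens to be the last breakpoint. The sketch below uses the left.open argument instead, which assumes a more recent R than the one current when this thread was written. The object names are hypothetical.

```r
data <- c(1, 7, 10, 50, 70)
levs <- c("Pre", "Post")

# right = TRUE: intervals (-Inf, 10] and (10, Inf], so 10 is "Pre"
cut_right <- cut(data, c(-Inf, 10, Inf), labels = levs, right = TRUE)

# right = FALSE: intervals [-Inf, 10) and [10, Inf), so 10 becomes "Post"
cut_left <- cut(data, c(-Inf, 10, Inf), labels = levs, right = FALSE)

# a breakpoint of 10.5 cannot coincide with integer data, so right= is moot
cut_half <- cut(data, c(-Inf, 10.5, Inf), labels = levs, right = FALSE)

# findInterval() with left-open intervals (-Inf, 10], (10, ...) via left.open
fi <- levs[findInterval(data, c(-Inf, 10), left.open = TRUE)]

data.frame(data, cut_right, cut_left, cut_half, fi)
```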
Re: [R] Condition to factor (easy to remember)
An extremely verbose, but (in my view) easy to understand approach is:

data.f <- data
data.f[which(data <= 10)] <- levs[1]
data.f[which(data > 10)] <- levs[2]
data.f <- factor(data.f)

-Ista

--
Ista Zahn
Graduate student
University of Rochester
http://yourpsyche.org
Re: [R] Condition to factor (easy to remember)
On Wed, Sep 30, 2009 at 2:42 PM, Douglas Bates <ba...@stat.wisc.edu> wrote:
> On Wed, Sep 30, 2009 at 2:43 AM, Dieter Menne <dieter.me...@menne-biomed.de> wrote:
>> data = c(1,7,10,50,70)
>> levs = c("Pre","Post")
>> # Typical C-Programmer style
>> factor(levs[as.integer(data > 10)+1], levels=levs)
>> # Easiest to understand
>> factor(ifelse(data <= 10, levs[1], levs[2]), levels=levs)
>
> Why not
>
> factor(data > 10, labels = c("Pre", "Post"))
> [1] Pre  Pre  Pre  Post Post
> Levels: Pre Post
>
> All you have to remember is that FALSE comes before TRUE.
>
> And besides, Frank Harrell will soon be weighing in to tell you why you
> shouldn't dichotomize in the first place.
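Why "FALSE comes before TRUE" is all there is to remember: without a levels= argument, factor() takes the sorted unique values of the logical vector, c(FALSE, TRUE), and assigns the labels to them in that order. A minimal sketch with the thread's data (the name f is hypothetical) that also shows the resulting integer codes:

```r
data <- c(1, 7, 10, 50, 70)

# levels are sort(unique(data > 10)) = c(FALSE, TRUE); labels map onto them in order
f <- factor(data > 10, labels = c("Pre", "Post"))
print(f)
as.integer(f)  # FALSE -> 1 ("Pre"), TRUE -> 2 ("Post")
```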
Re: [R] Condition to factor (easy to remember)
On Wed, Sep 30, 2009 at 2:32 PM, Ista Zahn <istaz...@gmail.com> wrote:
> An extremely verbose, but (in my view) easy to understand approach is:
>
> data.f <- data
> data.f[which(data <= 10)] <- levs[1]
> data.f[which(data > 10)] <- levs[2]
> data.f <- factor(data.f)

All those which()s are unnecessary. And if you're going to use this approach, I'd recommend initialising data.f with NAs so you can tell if you missed any cases.

Hadley

--
http://had.co.nz/
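Hadley's two suggestions applied to Ista's approach, as a sketch: logical subscripts work without which(), and starting from NA makes any unassigned case visible. The stopifnot() check and the levels = levs argument are editorial additions; without levels =, factor() would sort the labels alphabetically (Post before Pre), which is the ordering problem the thread started with.

```r
data <- c(1, 7, 10, 50, 70)
levs <- c("Pre", "Post")

# start from NA so that any element not covered by a condition stays NA
data.f <- rep(NA_character_, length(data))
data.f[data <= 10] <- levs[1]   # logical subscripts, no which() needed
data.f[data > 10]  <- levs[2]
stopifnot(!anyNA(data.f))       # stops with an error if a case was missed
data.f <- factor(data.f, levels = levs)
data.f
```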
Re: [R] Condition to factor (easy to remember)
Douglas Bates wrote:
> On Wed, Sep 30, 2009 at 2:42 PM, Douglas Bates <ba...@stat.wisc.edu> wrote:
>> Why not
>>
>> factor(data > 10, labels = c("Pre", "Post"))
>> [1] Pre  Pre  Pre  Post Post
>> Levels: Pre Post
>>
>> All you have to remember is that FALSE comes before TRUE.
>>
>> And besides, Frank Harrell will soon be weighing in to tell you why you
>> shouldn't dichotomize in the first place.

And someone might also remind you that it is safest to include levels=c(FALSE,TRUE), just in case the condition is always TRUE. (Terry Therneau has the scars from the implementation of Surv()...)

--
Peter Dalgaard
Dept. of Biostatistics, University of Copenhagen
Øster Farimagsgade 5, Entr.B, PO Box 2099, 1014 Cph. K, Denmark
Ph: (+45) 35327918  FAX: (+45) 35327907
(p.dalga...@biostat.ku.dk)
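To see the failure Peter warns about, consider a hypothetical subset in which every value exceeds 10 (the vector c(50, 70) below is made up for illustration). Without levels =, factor() finds only one level, TRUE, so two labels no longer fit and it signals an error; with levels = c(FALSE, TRUE) both levels exist even though one is unused.

```r
data <- c(50, 70)   # hypothetical subset in which every value exceeds 10

# Without levels=, only one level (TRUE) is present, so two labels do not fit
res <- tryCatch(factor(data > 10, labels = c("Pre", "Post")),
                error = function(e) conditionMessage(e))
print(res)

# With levels = c(FALSE, TRUE) both levels exist, even if one is unused
f <- factor(data > 10, levels = c(FALSE, TRUE), labels = c("Pre", "Post"))
print(f)
```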
Re: [R] Condition to factor (easy to remember)
> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Douglas Bates
> Sent: Wednesday, September 30, 2009 12:42 PM
> To: Dieter Menne
> Cc: r-help@r-project.org
> Subject: Re: [R] Condition to factor (easy to remember)
>
> Why not
>
> factor(data > 10, labels = c("Pre", "Post"))
> [1] Pre  Pre  Pre  Post Post
> Levels: Pre Post
>
> All you have to remember is that FALSE comes before TRUE.

And if you don't want to remember that order, or if you want TRUE to come before FALSE, use the levels argument to factor. E.g.,

factor(data > 10, levels=c(TRUE,FALSE), labels=c("Post","Pre"))
[1] Pre  Pre  Pre  Post Post
Levels: Post Pre

Bill Dunlap
TIBCO Software Inc - Spotfire Division
wdunlap tibco.com
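One consequence of Bill's suggestion worth seeing directly: the order given in levels = also fixes the integer codes and the order in which results are tabulated. A minimal sketch with the thread's data (the name f is hypothetical):

```r
data <- c(1, 7, 10, 50, 70)

# levels = c(TRUE, FALSE) puts TRUE first, so "Post" becomes the first level
f <- factor(data > 10, levels = c(TRUE, FALSE), labels = c("Post", "Pre"))
levels(f)       # "Post" "Pre"
as.integer(f)   # codes follow the levels: Post = 1, Pre = 2
table(f)        # tabulated in level order: Post first
```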