Re: [Rd] Regression stars
On Feb 12, 2013, at 20:19, Duncan Murdoch wrote:

> I think you are misreading what Peter wrote. He wasn't defending that point of view, he was describing it.

Yes. However, that being said, there is the point that the whole thing has been designed to work within the paradigm that I described, and, for better or worse, things are reasonably coherent and consistent within that framework.

The thing that always worries me, when people get bothered by some aspect of software design, is that, if you change only that aspect, you may find yourself with something that is incoherent and inconsistent. I have quite a few times found myself realizing that Uncle John was right after all.

For instance, if you change the paradigm to say that character variables are character, unless explicitly turned into factors, and then ameliorate the inconvenience by changing code that relies on factors to convert character variables on the fly, then you will lose the otherwise automatic consistency of level sets between subsets of data. (So, the math department not only has zero female professors, the entire female gender ceases to exist for that subgroup.)

--
Peter Dalgaard, Professor
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
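A minimal sketch of the level-set point made above, assuming a made-up staff table (the names staff, dept and sex are hypothetical, not from the thread). Converting once up front keeps the full level set in every subset; converting on the fly after subsetting only sees the values that happen to be present.

```r
# Hypothetical data: the "math" rows contain no women.
staff <- data.frame(
  dept = c("math", "math", "stats", "stats"),
  sex  = c("M", "M", "F", "M"),
  stringsAsFactors = FALSE
)

# Convert once, up front: every subset inherits both levels.
staff_f <- transform(staff, sex = factor(sex))
math_f  <- staff_f[staff_f$dept == "math", ]
levels(math_f$sex)   # "F" "M"
table(math_f$sex)    # F: 0, M: 2

# Convert on the fly, after subsetting: the "F" level is never seen.
math_c <- staff[staff$dept == "math", ]
levels(factor(math_c$sex))   # "M"
```

In the first case the zero count for F is still reported; in the second the category is simply absent.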
Re: [Rd] Regression stars
On 13-02-13 7:25 AM, peter dalgaard wrote:

> On Feb 12, 2013, at 20:19, Duncan Murdoch wrote:
>
>> I think you are misreading what Peter wrote. He wasn't defending that point of view, he was describing it.
>
> Yes. However, that being said, there is the point that the whole thing has been designed to work within the paradigm that I described, and, for better or worse, things are reasonably coherent and consistent within that framework.
>
> The thing that always worries me, when people get bothered by some aspect of software design, is that, if you change only that aspect, you may find yourself with something that is incoherent and inconsistent. I have quite a few times found myself realizing that Uncle John was right after all.
>
> For instance, if you change the paradigm to say that character variables are character, unless explicitly turned into factors, and then ameliorate the inconvenience by changing code that relies on factors to convert character variables on the fly, then you will lose the otherwise automatic consistency of level sets between subsets of data. (So, the math department not only has zero female professors, the entire female gender ceases to exist for that subgroup.)

Sure, if I have a file that contains a column named Sex and it is all M, I can't expect R to automatically know that there is another possibility. That's always been a problem. If we automatically convert the data to factors when we read, then maybe we'll be lucky and some other part of that file that we're planning to throw away will contain an F, and we'll automatically construct the right factor. (Except we don't: lm and glm will throw away the F level if there are none in the subset we pass to them, factor or not, because they use drop.unused.levels=TRUE in their call to model.frame().) There's also the possibility that there will be m and f in there, and we'll get it wrong.

In R 2.15.2, we do the automatic conversion with a warning, but we do it wrong, which leads to the inconsistency that Bill Dunlap reported. R-devel drops the warning and comes closer to getting it right, but it's really an impossible problem: if we never see an F, we'll never set the levels of the factor properly. If we see a typo like m or f and don't realize it's a typo, we'll have more than two Sex values.

The current R-devel implementation delays the conversion as much as it can, and maybe it delays it too far. It allows model.frame() to continue to return character columns, as it does in 2.15.2. This was to support xtabs(), which treats character columns differently from factors, and other unforeseen uses.

Another possibility would be to add an argument (stringsAsFactors?) to model.frame() to let modelling functions choose whether they want factors or not. xtabs() would say no, lm() and glm() would say yes. I think the current implementation is preferable because it won't require changes to well written existing functions.

With the current R-devel implementation, it is easier than in 2.15.2 to get errors thrown when the auto-conversion goes wrong. I don't know of any examples where you get incorrect results. I think this is an improvement. I'd appreciate hearing of any bugs in it.

Duncan Murdoch
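The two failure modes described above can be sketched as follows. This is only an illustration on invented data (d, y and sex are hypothetical names); it calls model.frame() directly with the drop.unused.levels argument rather than going through lm() or glm().

```r
# Hypothetical data with a two-level factor.
d <- data.frame(
  y   = c(1.2, 0.8, 1.9, 2.4, 1.1, 2.0),
  sex = factor(c("M", "M", "F", "F", "M", "F"))
)

# A subset with no F rows still carries both levels ...
men <- d[d$sex == "M", ]
levels(men$sex)                 # "F" "M"

# ... until model.frame() is asked to drop unused levels.
mf_keep <- model.frame(y ~ sex, data = men)
mf_drop <- model.frame(y ~ sex, data = men, drop.unused.levels = TRUE)
levels(mf_keep$sex)             # "F" "M"
levels(mf_drop$sex)             # "M"

# Unnoticed typos give more than two Sex values.
nlevels(factor(c("M", "m", "F", "f")))   # 4
```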
Re: [Rd] Regression stars
Please do not change the defaults for the show.signif.stars option or for the default.stringsAsFactors option. Backward compatibility is more important than your convenience.

The same sort of argument could be made for changing the default of the [ function from drop = TRUE to drop = FALSE. It would lead to fewer gotchas when coding and make R a saner programming language (less infernoish), but would annoy and confuse ordinary users and is not the R way.

In any case your philosophical arguments about signif stars are bogus. Non-simultaneous confidence intervals have exactly the same problem as these regression stars. As I once said in a paper, they are something users think they can interpret, with the unstated implication that they really cannot. Charlie's law of users says ordinary users of statistics actually ignore confidence levels and treat all confidence intervals as if they cover (i.e., take the true confidence level to be 100%). You cannot fix lack of user understanding of statistics by any such simplistic idea.

Yes, R is a prime example of "worse is better", but it is the way it is. Don't try to turn it into C++. Thank you.

--
Charles Geyer
Professor, School of Statistics
University of Minnesota
char...@stat.umn.edu
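For readers who have not met the drop = TRUE default mentioned above, a short illustration on made-up objects (m and df are hypothetical): selecting a single row or column silently changes the type of the result unless drop = FALSE is given.

```r
m <- matrix(1:6, nrow = 2)
dim(m[1, ])                  # NULL: one row collapses to a plain vector
dim(m[1, , drop = FALSE])    # 1 3: still a matrix

df <- data.frame(a = 1:3, b = c("x", "y", "z"))
class(df[, "a"])                 # "integer"
class(df[, "a", drop = FALSE])   # "data.frame"
```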
Re: [Rd] Regression stars
On 13/02/2013 8:40 AM, Charles Geyer wrote:

> Please do not change the defaults for the show.signif.stars option or for the default.stringsAsFactors option. Backward compatibility is more important than your convenience.
>
> The same sort of argument could be made for changing the default of the [ function from drop = TRUE to drop = FALSE. It would lead to less gotchas when coding and make R a saner programming language (less infernoish), but would annoy and confuse ordinary users and is not the R way.

That is something that might improve the language, but it would be far more disruptive than either of the other two changes. It's a matter of balance. In my judgment its cost would greatly exceed its benefit. In the case of stringsAsFactors, I think the benefits would exceed the costs. In the case of the stars, I think both costs and benefits are negligible.

I think the R way is this kind of balance, with a fairly strong conservative tilt. Due to the conservatism, I'm not planning to make the stringsAsFactors change for everybody, but I have made an effort to make it easier to make the change individually via the options() setting.

Duncan Murdoch
Re: [Rd] Regression stars
Duncan Murdoch murdoch.duncan at gmail.com writes:

[snip]

> Regarding stringsAsFactors: I'm not going to defend keeping it as is, I'll let the people who like it defend it.

Would someone (anyone) like to come forward and give us a defense of stringsAsFactors=TRUE -- even someone who doesn't personally like it but would like to play devil's advocate?

> What I will likely do is make a few changes so that character vectors are automatically changed to factors in modelling functions, so that operating with stringsAsFactors=FALSE doesn't trigger silly warnings.
>
> Duncan Murdoch

[apologies for snipping context: gmane made me do it]
Re: [Rd] Regression stars
On 12.02.2013 14:54, Ben Bolker wrote:

> Duncan Murdoch murdoch.duncan at gmail.com writes:
>
> [snip]
>
>> Regarding stringsAsFactors: I'm not going to defend keeping it as is, I'll let the people who like it defend it.
>
> Would someone (anyone) like to come forward and give us a defense of stringsAsFactors=TRUE -- even someone who doesn't personally like it but would like to play devil's advocate?

Sure: I will have to change all my scripts, my teaching examples, my book, and lots of code examples for research and particularly consulting jobs.

Personally, I think having stringsAsFactors=TRUE is not too bad for read.table() but less useful for data.frame().

And since you ask for the devil's advocate already, related to the subject line: Removing stars is horrible for consulting. All those people from biology, medicine and other fields even ask us questions in terms of significance stars, which are obviously very common for them. Many of them will certainly ask us for the stars, and ask us to switch to another software product once they do not get them from R. They may not be interested in being taught about the advantages or disadvantages of p-values or stars.

There are different use cases of R, and I want to keep stars for consulting tasks where things have to be delivered within minutes. I am happy with or without for teaching, where I have the time and can easily talk about the sense and nonsense of p-values.

Best,
Uwe
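Whatever the default, the stars can already be switched per call or per session, which covers both the consulting and the teaching use case. A small sketch using the built-in cars data set (the object names fit, with_stars, no_stars and quiet are just for illustration):

```r
fit <- lm(dist ~ speed, data = cars)

# Per call: the print method for summary.lm takes a signif.stars argument.
with_stars <- capture.output(print(summary(fit), signif.stars = TRUE))
no_stars   <- capture.output(print(summary(fit), signif.stars = FALSE))

# Per session: set the option, print, then restore the old value.
old   <- options(show.signif.stars = FALSE)
quiet <- capture.output(print(summary(fit)))
options(old)

# The "Signif. codes" legend only appears when stars are printed.
any(grepl("Signif. codes", with_stars, fixed = TRUE))
any(grepl("Signif. codes", no_stars, fixed = TRUE))
```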
Re: [Rd] Regression stars
Uwe,

I've been consulting for decades and have never once been asked for such stars. And when a clinical researcher puts a sentence in a study protocol that P<0.05 will be considered significant, I get them to take it out.

Frank

-
Frank Harrell
Department of Biostatistics, Vanderbilt University
Re: [Rd] Regression stars
On 12/02/2013 9:20 AM, Uwe Ligges wrote:

> On 12.02.2013 14:54, Ben Bolker wrote:
>
>> Would someone (anyone) like to come forward and give us a defense of stringsAsFactors=TRUE -- even someone who doesn't personally like it but would like to play devil's advocate?
>
> Sure: I will have to change all my scripts, my teaching examples, my book, and lots of code examples for research and particularly consulting jobs.

Could you post an example of a non-trivial one? (By trivial, I mean one that says "data.frame() converts character vectors to factors". Obviously that would need to change. I mean one that just assumes current behaviour, and would be broken by the change.)

Duncan Murdoch
Re: [Rd] Regression stars
I think that we should use P < .03 (which approximates the probability of 5 consecutive heads) for assigning significance!

Ravi

-----Original Message-----
From: r-devel-boun...@r-project.org [mailto:r-devel-boun...@r-project.org] On Behalf Of Frank Harrell
Sent: Tuesday, February 12, 2013 9:43 AM
To: r-devel@r-project.org
Subject: Re: [Rd] Regression stars

Uwe, I've been consulting for decades and have never once been asked for such stars. And when a clinical researcher puts a sentence in a study protocol that P<0.05 will be considered significant, I get them to take it out.

Frank
Re: [Rd] Regression stars
On 12.02.2013 15:42, Frank Harrell wrote:

> Uwe, I've been consulting for decades and have never once been asked for such stars.

Honestly: the last time I was asked was last week. And when I answered (in another case a few months ago) "OK, I can add you another 5 stars for p-values smaller than 0.5", they did not find it too funny.

Best,
Uwe

> And when a clinical researcher puts a sentence in a study protocol that P<0.05 will be considered significant, I get them to take it out.
>
> Frank
Re: [Rd] Regression stars
On 13-02-12 09:20 AM, Uwe Ligges wrote:

> Sure: I will have to change all my scripts, my teaching examples, my book, and lots of code examples for research and particularly consulting jobs.
>
> Personally, I think having stringsAsFactors=TRUE is not too bad for read.table() but less useful for data.frame().

Thanks, Uwe. Now let me go one step farther. Can you (or anyone) give a good argument **other than backward compatibility** for keeping the stringsAsFactors=TRUE argument on data.frame()?

I appreciate your distinction between data.frame() and read.table()'s use of stringsAsFactors, and I can see that there is some point for quick-and-dirty interactive use in setting all non-numeric variables to factors (arguing that wanting non-numerics as factors is somewhat more common than wanting them as strings).

It might be nice to add an optional stringsAsFactors (and check.names) argument to transform(): I've had to write my own Transform() function to allow the defaults to be overridden, since transform() calls data.frame() with the defaults. (Setting the stringsAsFactors option globally would work, although not for check.names.)

Ben Bolker
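The Transform() function mentioned above is not shown in the thread. As a rough idea of what such a wrapper could look like, here is a sketch under stated assumptions: transform2() is a hypothetical name, its two extra arguments are simply passed through to data.frame(), and it is not claimed to match Ben's own implementation.

```r
# transform2() is a hypothetical helper, not a base R function.
transform2 <- function(data, ..., stringsAsFactors = FALSE,
                       check.names = FALSE) {
  # Evaluate the name = expression pairs inside the data frame.
  new <- eval(substitute(list(...)), data, parent.frame())
  existing <- names(new) %in% names(data)
  # Overwrite columns that already exist.
  if (any(existing)) data[names(new)[existing]] <- new[existing]
  # Append the rest, passing both arguments through to data.frame().
  if (!all(existing)) {
    data <- do.call(data.frame,
                    c(list(data), new[!existing],
                      list(stringsAsFactors = stringsAsFactors,
                           check.names = check.names)))
  }
  data
}

d   <- data.frame(g = c("a", "b"), stringsAsFactors = FALSE)
out <- transform2(d, `new col` = toupper(g), n = nchar(g))
names(out)              # "g" "new col" "n": the name is not mangled
class(out$`new col`)    # "character"
```

With check.names = TRUE the new column would instead be renamed by make.names(), as base transform() does.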
Re: [Rd] Regression stars
On 12.02.2013 16:40, Ben Bolker wrote:

> Thanks, Uwe. Now let me go one step farther. Can you (or anyone) give a good argument **other than backward compatibility** for keeping the stringsAsFactors=TRUE argument on data.frame()?

No, I cannot,
Uwe
Re: [Rd] Regression stars
On 12/02/2013 10:40 AM, Ben Bolker wrote:

> Thanks, Uwe. Now let me go one step farther. Can you (or anyone) give a good argument **other than backward compatibility** for keeping the stringsAsFactors=TRUE argument on data.frame()?

I can, under two assumptions:

1. We keep stringsAsFactors=TRUE on read.table().
2. We keep the stringsAsFactors argument in data.frame().

Under those assumptions, it would just be confusing to have opposite defaults.

(Just in case someone hasn't read all of this thread: I'd be happier to have the default be FALSE in both cases, but not until 3.1.x. For 3.0.x I think I'd just change the default value of default.stringsAsFactors() to FALSE, so people could easily get the old behaviour.)

Duncan Murdoch
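Scripts that should not depend on whichever default wins can pass the argument explicitly to both functions. A small sketch on made-up input (csv and the object names are hypothetical); read.table() here reads from a string via its text argument:

```r
csv <- "id,grp\n1,a\n2,b\n3,a"

# read.table(): explicit choice, independent of the default.
file_f <- read.table(text = csv, header = TRUE, sep = ",",
                     stringsAsFactors = TRUE)
file_c <- read.table(text = csv, header = TRUE, sep = ",",
                     stringsAsFactors = FALSE)

# data.frame(): the same argument, the same two outcomes.
built_f <- data.frame(grp = c("a", "b", "a"), stringsAsFactors = TRUE)
built_c <- data.frame(grp = c("a", "b", "a"), stringsAsFactors = FALSE)

sapply(list(file_f, file_c, built_f, built_c),
       function(d) class(d$grp))
# "factor" "character" "factor" "character"
```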
Re: [Rd] Regression stars
I thought that the default was the way it was for performance reasons. For large data.frames or repeated applications, using factors should be faster for non-trivial strings.

    fs <- c('apple','peach','watermelon','spinach','persimmon','potato','kale')
    n <- 100
    a1 <- data.frame(f=sample(fs,n,replace=TRUE), x1=rnorm(n), x2=rnorm(n), stringsAsFactors=TRUE)
    a2 <- data.frame(f=sample(fs,n,replace=TRUE), x1=rnorm(n), x2=rnorm(n), stringsAsFactors=FALSE)
    fn <- function(i,x) x[x$f %in% c('kale','spinach'),]

    system.time(z <- sapply(1:100, fn, a1))
       user  system elapsed
     19.614   4.037  24.649

    system.time(z <- sapply(1:100, fn, a2))
       user  system elapsed
     19.726   7.715  36.761

On Feb 12, 2013, at 10:40 AM, Ben Bolker bbol...@gmail.com wrote:

> Thanks, Uwe. Now let me go one step farther. Can you (or anyone) give a good argument **other than backward compatibility** for keeping the stringsAsFactors=TRUE argument on data.frame()?
Re: [Rd] Regression stars
On Feb 12, 2013, at 17:05 , Brian Lee Yung Rowe wrote:

> I thought that the default was the way it was for performance reasons. For large data.frames or repeated applications, using factors should be faster for non-trivial strings.

I think not. Historically, it's more like "In statistics we have two kinds of variables, numerical and categorical. OK, so we have the occasional truly character-type variables like name and address, let's handle those as a special case."

-- Peter Dalgaard
Re: [Rd] Regression stars
They are reaching for the stars. Pardon my jest, but I couldn't resist.

Ravi

-----Original Message-----
From: r-devel-boun...@r-project.org [mailto:r-devel-boun...@r-project.org] On Behalf Of Uwe Ligges
Sent: Tuesday, February 12, 2013 10:01 AM
To: Frank Harrell
Cc: r-devel@r-project.org
Subject: Re: [Rd] Regression stars

On 12.02.2013 15:42, Frank Harrell wrote:

> Uwe I've been consulting for decades and have never once been asked for such stars.

Honestly: last time I have been asked last week. And when I answered (in another case few months ago) "OK, I can add you another 5 stars for p values smaller than 0.5" they did not find it too funny.

Best, Uwe

> And when a clinical researcher puts a sentence in a study protocol that P<0.05 will be considered significant I get them to take it out.
>
> Frank
>
> Uwe Ligges-3 wrote
>> On 12.02.2013 14:54, Ben Bolker wrote:
>>> Duncan Murdoch murdoch.duncan at gmail.com writes: [snip]
>>>> Regarding stringsAsFactors: I'm not going to defend keeping it as is, I'll let the people who like it defend it.
>>> Would someone (anyone) like to come forward and give us a defense of stringsAsFactors=TRUE -- even someone who doesn't personally like it but would like to play devil's advocate?
>>
>> Sure: I will have to change all my scripts, my teaching examples, my book, and lots of code examples for research and particularly consulting jobs.
>>
>> Personally, I think having stringsAsFactors=TRUE is not too bad for read.table() but less useful for data.frame().
>>
>> And since you ask for the devil's advocate already, related to the subject line: Removing stars is horrible for consulting: With all those people from biology, medicine and other fields who even ask us questions in terms of significance stars that are obviously very common for them. Many of them will certainly ask us for the stars, and ask us to switch to another software product once they do not get it from R. They may not be interested in being taught about the advantages or disadvantages of p-values or stars.
>>
>> There are different use cases of R, and I want to keep stars for consulting tasks where things have to be delivered within minutes. I am happy with or without for teaching, where I have the time and can easily talk about the sense and nonsense of p-values.
>>
>> Best, Uwe
>>
>>>> What I will likely do is make a few changes so that character vectors are automatically changed to factors in modelling functions, so that operating with stringsAsFactors=FALSE doesn't trigger silly warnings.
>>>>
>>>> Duncan Murdoch
>>> [apologies for snipping context: gmane made me do it]
Re: [Rd] Regression stars
I think it may have been John D. Cook who first observed that p-values are linearly correlated with the amount of time remaining on a grant. Perhaps a suitable transform would reveal an ordinal relationship with stars.

On Tue, Feb 12, 2013 at 7:03 AM, Ravi Varadhan ravi.varad...@jhu.edu wrote:

> They are reaching for the stars. Pardon my jest, but I couldn't resist.
>
> Ravi

-- 
*A model is a lie that helps you see the truth.*
Howard Skipper http://cancerres.aacrjournals.org/content/31/9/1173.full.pdf
Re: [Rd] Regression stars
On Feb 12, 2013, at 11:05 AM, Brian Lee Yung Rowe wrote:

> I thought that the default was the way it was for performance reasons. For large data.frames or repeated applications, using factors should be faster for non-trivial strings.
>
>     fs <- c('apple','peach','watermelon','spinach','persimmon','potato','kale')
>     n <- 100
>     a1 <- data.frame(f=sample(fs,n,replace=TRUE), x1=rnorm(n), x2=rnorm(n), stringsAsFactors=TRUE)
>     a2 <- data.frame(f=sample(fs,n,replace=TRUE), x1=rnorm(n), x2=rnorm(n), stringsAsFactors=FALSE)
>     fn <- function(i,x) x[x$f %in% c('kale','spinach'),]
>
>     system.time(z <- sapply(1:100, fn, a1))
>        user  system elapsed
>      19.614   4.037  24.649
>     system.time(z <- sapply(1:100, fn, a2))
>        user  system elapsed
>      19.726   7.715  36.761

Not really:

    system.time(z <- sapply(1:100, fn, a1))
       user  system elapsed
     13.780   0.444  14.229
    rm(z)
    gc()
              used (Mb) gc trigger   (Mb)  max used   (Mb)
    Ncells  182113  9.8     407500   21.8    337655   18.1
    Vcells 5789638 44.2  133982285 1022.3 163019778 1243.8
    system.time(z <- sapply(1:100, fn, a2))
       user  system elapsed
     13.201   0.668  13.873

But your test is bogus, because %in% uses match(), which converts factors to character vectors anyway. So in your case you're just measuring noise in your system; character vectors are always faster in your example. The reason is that in R strings are hashed, so character vectors are technically very similar to factors, just with faster access (because they don't need to go through the integer indirection). On 32-bit, strings are in theory always faster than factors; on 64-bit they use double the size, so they may or may not be faster depending on how you hit the cache etc. Anyway, in modern R versions you're much better off using character vectors than factors for any processing, so stringsAsFactors=FALSE is what I use exclusively.

Cheers, Simon
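Simon's point about %in% can be checked on a toy vector. The following is only a minimal sketch with made-up data (the names `as_chr`, `as_fct` and `keep` are illustrative and do not come from the thread); it shows that the selection is the same whichever storage is used, and that a factor is stored as integer codes, which is the indirection he mentions. It says nothing about timing.

```r
# Illustrative data only; these names are not from the thread.
fruits <- c("apple", "kale", "spinach", "potato", "kale")
as_chr <- fruits            # plain character vector
as_fct <- factor(fruits)    # same values stored as a factor

keep <- c("kale", "spinach")

# %in% is defined via match(), so both calls compare the values as strings
# and select the same elements.
sel_chr <- as_chr %in% keep
sel_fct <- as_fct %in% keep
print(identical(sel_chr, sel_fct))

# A factor is a vector of integer codes plus a levels attribute;
# a character vector holds the strings directly.
print(typeof(as_fct))
print(typeof(as_chr))
```

Any benchmark built on `%in%` therefore mostly measures the string comparison itself, not the cost of one representation against the other.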
Re: [Rd] Regression stars
On 02/12/2013 08:20 AM, peter dalgaard wrote:

> On Feb 12, 2013, at 17:05 , Brian Lee Yung Rowe wrote:
>> I thought that the default was the way it was for performance reasons. For large data.frames or repeated applications, using factors should be faster for non-trivial strings.
>
> I think not. Historically, it's more like "In statistics we have two kinds of variables, numerical and categorical. OK, so we have the occasional truly character-type variables like name and address, let's handle those as a special case."

<sarcasm> Since character vectors are so bad and people use them where they should instead use a factor, I propose to go all the way by adding the stringsAsFactors arg to character() too. That way people are put on the right track from the very start. </sarcasm>

No seriously, if my variable is categorical, it's already in a factor and that's how I pass it to data.frame(). But if I have it in a character vector, it's because that's how I want it. It's my choice. How could anybody ever think that having data.frame() alter his/her data is a good thing?

Please *remove* the stringsAsFactors arg of data.frame() in R 3.0. You'll do a big favor to your user base.

Thanks, H.

-- 
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpa...@fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319
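The behaviour under discussion can be seen on a three-element vector. This is a minimal sketch with invented data (`x`, `df_chr`, `df_fct` and `df_explicit` are illustrative names); the argument is passed explicitly in every call so the result does not depend on whichever default the running R version uses.

```r
x <- c("a", "b", "a")   # a character vector, as the caller created it

# Passing the argument explicitly makes the outcome independent of the default.
df_chr <- data.frame(id = x, stringsAsFactors = FALSE)
df_fct <- data.frame(id = x, stringsAsFactors = TRUE)

print(class(df_chr$id))   # "character": the column is stored as given
print(class(df_fct$id))   # "factor": data.frame() converted the column

# Converting explicitly keeps the choice in the caller's hands.
df_explicit <- data.frame(id = factor(x), stringsAsFactors = FALSE)
print(class(df_explicit$id))   # "factor"

# A factor keeps its full level set when subsetted, which is the
# consistency argument made elsewhere in the thread.
print(levels(df_fct$id[df_fct$id == "a"]))   # "a" "b"
```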
Re: [Rd] Regression stars
On 12/02/2013 1:47 PM, Hervé Pagès wrote:

> On 02/12/2013 08:20 AM, peter dalgaard wrote:
>> On Feb 12, 2013, at 17:05 , Brian Lee Yung Rowe wrote:
>>> I thought that the default was the way it was for performance reasons. For large data.frames or repeated applications, using factors should be faster for non-trivial strings.
>>
>> I think not. Historically, it's more like "In statistics we have two kinds of variables, numerical and categorical. OK, so we have the occasional truly character-type variables like name and address, let's handle those as a special case."
>
> <sarcasm> Since character vectors are so bad and people use them where they should instead use a factor, I propose to go all the way by adding the stringsAsFactors arg to character() too. That way people are put on the right track from the very start. </sarcasm>

I think you are misreading what Peter wrote. He wasn't defending that point of view, he was describing it.

> No seriously, if my variable is categorical, it's already in a factor and that's how I pass it to data.frame(). But if I have it in a character vector, it's because that's how I want it. It's my choice. How could anybody ever think that having data.frame() alter his/her data is a good thing?
>
> Please *remove* the stringsAsFactors arg of data.frame() in R 3.0. You'll do a big favor to your user base.

That's a really bad suggestion -- it would break code for people who set stringsAsFactors=FALSE as well as those who rely on the current default behaviour. We certainly won't do that.

Duncan Murdoch
Re: [Rd] Regression stars
Hi Duncan,

On 02/12/2013 11:19 AM, Duncan Murdoch wrote:

> On 12/02/2013 1:47 PM, Hervé Pagès wrote:
>> <sarcasm> Since character vectors are so bad and people use them where they should instead use a factor, I propose to go all the way by adding the stringsAsFactors arg to character() too. That way people are put on the right track from the very start. </sarcasm>
>
> I think you are misreading what Peter wrote. He wasn't defending that point of view, he was describing it.

I was replying to the thread, not to Peter in particular. Sorry if it sounded otherwise.

>> No seriously, if my variable is categorical, it's already in a factor and that's how I pass it to data.frame(). But if I have it in a character vector, it's because that's how I want it. It's my choice. How could anybody ever think that having data.frame() alter his/her data is a good thing?
>>
>> Please *remove* the stringsAsFactors arg of data.frame() in R 3.0. You'll do a big favor to your user base.
>
> That's a really bad suggestion -- it would break code for people who set stringsAsFactors=FALSE as well as those who rely on the current default behaviour. We certainly won't do that.

But since there seems to be a discussion about doing some changes to the stringsAsFactors feature, I was hoping you would consider that one too. Doing the right thing sometimes requires breaking people's code, sadly!

Cheers, H.

> Duncan Murdoch

-- Hervé Pagès
Re: [Rd] Regression stars
Great discussion. Tim's Sinclair quote is priceless and relates to the non-reproducible research done in some quarters. Norm's wish to remove stars altogether is entirely consistent with good statistical practice and would make a statement that R base adheres to good practice. I don't think it will work to add confidence intervals because models can have nonlinear or interaction terms, and the reference cell for a factor variable may not be what the analyst chooses for a comparison group. I would like for us to find a way to, over time, implement Norm's wish to de-emphasize P-values in general. The harm done by P-values is immeasurable.

Frank

- Frank Harrell
Department of Biostatistics, Vanderbilt University
-- View this message in context: http://r.789695.n4.nabble.com/Regression-stars-tp4657795p4658084.html
Re: [Rd] Regression stars
On 13-02-09 3:49 PM, Tim Triche, Jr. wrote:

> To clarify, I favor changing the defaults for stringsAsFactors and show.signif.stars to FALSE in R-3.0.0, and view any attempt to remove either functionality as a seemingly simple but fundamentally misguided idea.

Both of these were discussed by R Core. I think it's unlikely the default for stringsAsFactors will be changed (some R Core members like the current behaviour), but it's fairly likely the show.signif.stars default will change. (That's if someone gets around to it: I personally don't care about that one. P-values are commonly used statistics, and the stars are just a simple graphical display of them. I find some p-values to be useful, and the display to be harmless.)

I think it's really unlikely the more extreme changes (i.e. dropping show.signif.stars completely, or dropping p-values) will happen.

Regarding stringsAsFactors: I'm not going to defend keeping it as is, I'll let the people who like it defend it. What I will likely do is make a few changes so that character vectors are automatically changed to factors in modelling functions, so that operating with stringsAsFactors=FALSE doesn't trigger silly warnings.

Duncan Murdoch

> This is just my opinion, of course. The change could easily be accompanied by a startup notice or release notes indicating that the changes have been made, and can be reverted to past behavior if the user so desires. Perhaps more users will investigate the various settings, as a happy side effect. My thanks to everyone who spends time supporting and working on R-core.
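As a rough illustration of what it means for a modelling function to treat a character column as categorical, the sketch below fits a model to invented data (`d`, `y` and `g` are hypothetical names, not from the thread). It reflects how lm() behaves in current R releases and is not a description of the specific changes Duncan had in mind.

```r
set.seed(1)

# Toy data: the grouping variable is deliberately left as character.
d <- data.frame(
  y = rnorm(6),
  g = rep(c("ctl", "trt", "ctl"), 2),
  stringsAsFactors = FALSE
)
print(class(d$g))   # "character"

# lm() builds treatment contrasts for the character column,
# exactly as it would for factor(d$g).
fit <- lm(y ~ g, data = d)
print(names(coef(fit)))   # "(Intercept)" "gtrt"

# The fitted object records the levels it derived for the predictor.
print(fit$xlevels)
```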
Re: [Rd] Regression stars
Thanks for bringing this up, Frank.

Since many of us are educators, I'd like to suggest a bolder approach. Discontinue even offering the stars as an option. Sadly, we can't stop reporting p-values, as the world expects them, but does R need to cater to that attitude by offering star display? For that matter, why not have R report confidence intervals as a default?

Many years ago, I wrote a short textbook on stat, and included a substantial section on the dangers of significance testing. All three internal reviewers liked it, but the funny part is that all three said, "I agree with this, but no one else will." :-)

Norm
Re: [Rd] Regression stars
Changing the default for show.signif.stars should be sufficient to ensure that, if people are going to get themselves into trouble, they will have to do it on purpose. It's just a visual cue; removing it will not remove the underlying issue, namely blind acceptance of unlikely null models and distributions. For any complex problem, there is a solution that is simple, elegant, and wrong.

As grants and careers can depend on these magic numbers, Upton Sinclair might save everyone some trouble... "It is difficult to get a man to understand something, when his salary depends upon his not understanding."

stringsAsFactors, however, is responsible for an endless stream of mildly irritating misunderstandings, and defaulting that to FALSE would be very nice.

Just my $0.02. Defaults are one of the most powerful forces in the universe. Also, I liked your book.

-- 
*A model is a lie that helps you see the truth.*
Howard Skipper http://cancerres.aacrjournals.org/content/31/9/1173.full.pdf
Re: [Rd] Regression stars
To clarify, I favor changing the defaults for stringsAsFactors and show.signif.stars to FALSE in R-3.0.0, and view any attempt to remove either functionality as a seemingly simple but fundamentally misguided idea.

This is just my opinion, of course. The change could easily be accompanied by a startup notice or release notes indicating that the changes have been made, and can be reverted to past behavior if the user so desires. Perhaps more users will investigate the various settings, as a happy side effect.

My thanks to everyone who spends time supporting and working on R-core.

-- 
*A model is a lie that helps you see the truth.*
Howard Skipper http://cancerres.aacrjournals.org/content/31/9/1173.full.pdf
Re: [Rd] Regression stars
I appreciate Tim's comments. I myself have a social science paper coming out soon in which I felt forced to use p-values, given their ubiquity. However, I also told readers of the paper that confidence intervals are much more informative and I do provide them.

As I said earlier, there is no avoiding that, and R needs to report p-values for that reason. Instead, the question is what to do about the stars; I proposed eliminating them altogether. Star-crazed users know how to determine them themselves from the p-values, but deleting them from R would send a message.

I did say my proposal was bold, which really meant I was suggesting that R do SOMETHING to send that message, not necessarily star elimination. One such something would be the proposal I made, which would be to add confidence intervals to the output. This too could be just an option, but again offering that option would send a message. Indeed, I would suggest that the help page explain that confidence intervals are more informative. (The help page could make a similar statement regarding the stars.)

When I pitch R to people, I say that in addition to the large function and library base and the nice graphics capabilities, R is above all Statistically Correct--it's written by statisticians who know what they are doing, rather than some programmer simply implementing a formula from a textbook. I know that a lot of people feel this is one of R's biggest strengths. Given that, one might argue that R should do what it can to help users engage in good statistical practice. I think this was Frank's point.

Norm
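For readers who want the intervals today, they are one call away with base R. The sketch below uses the built-in `cars` data set purely as a stand-in (an assumption for illustration, not something from the thread) and puts the estimates next to their 95% confidence limits; it is not a proposal for how the default summary output should look.

```r
# Simple linear model on a data set that ships with R.
fit <- lm(dist ~ speed, data = cars)

# confint() returns one row per coefficient with lower and upper limits.
ci <- confint(fit, level = 0.95)

# Place point estimates and interval limits side by side.
est <- cbind(Estimate = coef(fit), ci)
print(est)
```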
Re: [Rd] regression stars
There are only a few things in R where we override the global defaults on a departmental level -- we really don't like to do so. But show.signif.stars is one of the 3. The other 2 if you are curious: set stringsAsFactors=FALSE and make NA included by default in the output of table. We've been overriding both of these for 10+ years.

Terry Therneau
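The third override Terry mentions concerns how table() treats missing values. How his department changes the default is not stated, so the sketch below only shows the per-call equivalent using the documented `useNA` argument, on an invented vector `x`.

```r
x <- c("a", "b", NA, "a")   # toy data with one missing value

# Default behaviour: the NA is left out of the counts.
tab_default <- table(x)
print(tab_default)

# Asking for NA explicitly adds a count for the missing value.
tab_with_na <- table(x, useNA = "ifany")
print(tab_with_na)

# The totals differ by exactly the number of missing values.
print(sum(tab_with_na) - sum(tab_default))
```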
[Rd] Regression stars
Today's GNU R tutorial in http://how-to.linuxcareer.com/a-quick-gnu-r-tutorial-to-statistical-models-and-graphics points out how bad statistical practice is being further perpetuated, by virtue of significance stars still being the default in printed output from lm models.

- Frank Harrell
Department of Biostatistics, Vanderbilt University
-- View this message in context: http://r.789695.n4.nabble.com/Regression-stars-tp4657795.html
Re: [Rd] Regression stars
Dear Frank,

I'd like to second your implicit motion to make options(show.signif.stars=FALSE) the default. Thanks for raising this point.

John

On Thu, 7 Feb 2013 05:32:04 -0800 (PST) Frank Harrell f.harr...@vanderbilt.edu wrote:

> Today's GNU R tutorial in http://how-to.linuxcareer.com/a-quick-gnu-r-tutorial-to-statistical-models-and-graphics points out how bad statistical practice is being further perpetuated, by virtue of significance stars still being the default in printed output from lm models.
Re: [Rd] Regression stars
FWIW, that has been my default setting for years in my .Rprofile. If there is some agreement on this from R Core, it would seem that version 3.0.0 would be a reasonable breakpoint for this change in default behavior.

Regards,

Marc Schwartz

On Feb 7, 2013, at 8:27 AM, John Fox j...@mcmaster.ca wrote:

> Dear Frank,
>
> I'd like to second your implicit motion to make options(show.signif.stars=FALSE) the default. Thanks for raising this point.
>
> John
>
> On Thu, 7 Feb 2013 05:32:04 -0800 (PST) Frank Harrell f.harr...@vanderbilt.edu wrote:
>> Today's GNU R tutorial in http://how-to.linuxcareer.com/a-quick-gnu-r-tutorial-to-statistical-models-and-graphics points out how bad statistical practice is being further perpetuated, by virtue of significance stars still being the default in printed output from lm models.
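The setting John and Marc refer to is a single options() call, which is the line one would place in a .Rprofile. The sketch below uses the built-in `cars` data as a stand-in (an assumption for illustration) and captures the printed summary both ways, so the effect can be seen without changing the session permanently.

```r
fit <- lm(dist ~ speed, data = cars)

# Turn the stars off, remembering the previous setting.
# In a .Rprofile this would be just: options(show.signif.stars = FALSE)
old <- options(show.signif.stars = FALSE)
out_plain <- capture.output(print(summary(fit)))
options(old)   # restore whatever the session had before

# The print method also accepts the choice per call.
out_stars <- capture.output(print(summary(fit), signif.stars = TRUE))

# The "Signif. codes" legend appears only when stars are printed.
print(any(grepl("Signif. codes", out_plain, fixed = TRUE)))
print(any(grepl("Signif. codes", out_stars, fixed = TRUE)))
```

Restoring the old value with `options(old)` keeps the example from leaking a changed default into the rest of a session.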