Re: [Rd] Regression stars

2013-02-13 Thread peter dalgaard

On Feb 12, 2013, at 20:19 , Duncan Murdoch wrote:

> I think you are misreading what Peter wrote.  He wasn't defending that point 
> of view, he was describing it.
 

Yes. However, that being said, there is the point that the whole thing has been 
designed to work within the paradigm that I described, and, for better or 
worse, things are reasonably coherent and consistent within that framework.

The thing that always worries me, when people get bothered by some aspect of 
software design, is that, if you change only that aspect, you may find yourself 
with something that is incoherent and inconsistent. I have quite a few times 
found myself realizing that Uncle John was right after all.  

For instance, if you change the paradigm to say that character variables are 
character, unless explicitly turned into factors, and then ameliorate the 
inconvenience by changing code that relies on factors to convert character 
variables on the fly, then you will lose the otherwise automatic consistency of 
level sets between subsets of data. (So, the math department not only has zero 
female professors, the entire female gender ceases to exist for that subgroup.)
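Peter's point about level sets can be seen in a few lines of R. This is a minimal sketch that assumes a made-up staff table (the departments and values are invented for illustration). A factor created once on the full data keeps its whole level set in every subset. A character column that is converted to a factor only after subsetting loses any level that is absent from that subset.

```r
# Made-up staff data: the math department has no female professors
staff <- data.frame(
  dept = c("math", "math", "stats", "stats"),
  sex  = c("M", "M", "F", "M"),
  stringsAsFactors = FALSE
)
in_math <- staff$dept == "math"

# Factor created once on the full data: subsetting keeps both levels
sex_fac  <- factor(staff$sex)
math_fac <- sex_fac[in_math]
print(levels(math_fac))   # "F" "M"
print(table(math_fac))    # F has a count of 0

# Character column converted on the fly, after subsetting: "F" is gone
math_chr <- factor(staff$sex[in_math])
print(levels(math_chr))   # "M"
print(table(math_chr))    # only M appears
```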

-- 
Peter Dalgaard, Professor
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Regression stars

2013-02-13 Thread Duncan Murdoch

On 13-02-13 7:25 AM, peter dalgaard wrote:


> For instance, if you change the paradigm to say that character
> variables are character, unless explicitly turned into factors, and
> then ameliorate the inconvenience by changing code that relies on
> factors to convert character variables on the fly, then you will lose
> the otherwise automatic consistency of level sets between subsets of
> data. (So, the math department not only has zero female professors,
> the entire female gender ceases to exist for that subgroup.)



Sure, if I have a file that contains a column named Sex and it is all M,
I can't expect R to automatically know that there is another
possibility.  That's always been a problem.  If we automatically convert
the data to factors when we read, then maybe we'll be lucky and some
other part of that file that we're planning to throw away will contain
an F, and we'll automatically construct the right factor.
(Except we don't:  lm and glm will throw away the F level if there are
none in the subset we pass to them, factor or not, because they use
drop.unused.levels=TRUE in their call to model.frame().)
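The level dropping Duncan describes can be checked directly. A small sketch with made-up numbers: the data frame passed to lm() still carries level "c" in its factor, and model.frame() keeps that level under its default settings, but the fitted model's xlevels show that lm() dropped it.

```r
# Made-up data; the subset passed to lm() contains no rows with g == "c"
d <- data.frame(
  y = c(1.2, 2.3, 3.1, 4.8, 5.2, 6.9, 7.4),
  g = factor(c("a", "b", "a", "b", "a", "b", "c"))
)
sub <- d[d$g != "c", ]
print(levels(sub$g))  # "a" "b" "c": subsetting a factor keeps its levels

# model.frame() defaults to drop.unused.levels = FALSE
mf <- model.frame(y ~ g, data = sub)
print(levels(mf$g))   # "a" "b" "c"

# lm() asks model.frame() to drop unused levels
fit <- lm(y ~ g, data = sub)
print(fit$xlevels$g)  # "a" "b"
```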

There's also the possibility that there will be m and f in there, and
we'll get it wrong.

In R 2.15.2, we do the automatic conversion with a warning, but we do it
wrong, which leads to the inconsistency that Bill Dunlap reported.
R-devel drops the warning and comes closer to getting it right, but it's
really an impossible problem:  if we never see an F, we'll never set the
levels of the factor properly.  If we see a typo like m or f and don't
realize it's a typo, we'll have more than two Sex values.

The current R-devel implementation delays the conversion as much as it
can, and maybe it delays it too far.  It allows model.frame() to
continue to return character columns, as it does in 2.15.2.  This was to
support xtabs(), which treats character columns differently from
factors, and other unforeseen uses.  Another possibility would be to add
an argument (stringsAsFactors?) to model.frame() to let modelling
functions choose whether they want factors or not.  xtabs() would say
no, lm() and glm() would say yes.  I think the current implementation is
preferable because it won't require changes to well written existing
functions.

With the current R-devel implementation, it is easier than in 2.15.2 to
get errors thrown when the auto-conversion goes wrong.  I don't know of
any examples where you get incorrect results.  I think this is an
improvement.

I'd appreciate hearing of any bugs in it.

Duncan Murdoch



Re: [Rd] Regression stars

2013-02-13 Thread Charles Geyer
Please do not change the defaults for the show.signif.stars option or for
the default.stringsAsFactors option.  Backward compatibility is more
important than your convenience.  The same sort of argument could be made
for changing the default of the [ function from drop = TRUE to drop =
FALSE.  It would lead to fewer gotchas when coding and make R a saner
programming language (less infernoish), but would annoy and confuse
ordinary users and is not the R way.  In any case your philosophical
arguments about signif stars are bogus.  Non-simultaneous confidence
intervals have exactly the same problem as these regression stars.  As I
once said in a paper, they are something users think they can interpret,
with the unstated implication that they really cannot.  Charlie's law of
users says ordinary users of statistics actually ignore confidence levels
and treat all confidence intervals as if they cover (i.e., take the true
confidence level to be 100%).  You cannot fix lack of user understanding
of statistics by any such simplistic idea.  Yes, R is a prime example of
"worse is better", but it is the way it is.  Don't try to turn it into
C++.  Thank you.
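For readers unfamiliar with the drop argument Charles mentions, a short illustration: with the default drop = TRUE, selecting a single column of a matrix or data frame returns a plain vector, while drop = FALSE keeps the original shape.

```r
m <- matrix(1:6, nrow = 2)
col_default <- m[, 1]                # drop = TRUE: a plain vector
col_kept    <- m[, 1, drop = FALSE]  # stays a 2 x 1 matrix
print(is.matrix(col_default))  # FALSE
print(dim(col_kept))           # 2 1

# The same behaviour for data frames: one selected column becomes a vector
dd <- data.frame(a = 1:2, b = 3:4)
print(is.data.frame(dd[, "a"]))                # FALSE
print(is.data.frame(dd[, "a", drop = FALSE]))  # TRUE
```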
-- 
Charles Geyer
Professor, School of Statistics
University of Minnesota
char...@stat.umn.edu




Re: [Rd] Regression stars

2013-02-13 Thread Duncan Murdoch

On 13/02/2013 8:40 AM, Charles Geyer wrote:

> Please do not change the defaults for the show.signif.stars option or for
> the default.stringsAsFactors option.  Backward compatibility is more
> important than your convenience.  The same sort of argument could be made
> for changing the default of the [ function from drop = TRUE to drop =
> FALSE.  It would lead to less gotchas when coding and make R a saner
> programming language (less infernoish), but would annoy and confuse
> ordinary users and is not the R way.


That is something that might improve the language, but it would be far 
more disruptive than either of the other two changes.  It's a matter of 
balance.  In my judgment its cost would greatly exceed its benefit.  In 
the case of stringsAsFactors, I think the benefits would exceed the 
costs.  In the case of the stars, I think both costs and benefits are 
negligible.  I think the R way is this kind of balance, with a fairly 
strong conservative tilt.  Due to the conservatism, I'm not planning to 
make the stringsAsFactors change for everybody, but I have made an 
effort to make it easier to make the change individually via the 
options() setting.


Duncan Murdoch




Re: [Rd] Regression stars

2013-02-12 Thread Ben Bolker
Duncan Murdoch <murdoch.duncan at gmail.com> writes:

  [snip]

> Regarding stringsAsFactors:  I'm not going to defend keeping it as is,
> I'll let the people who like it defend it.

  Would someone (anyone) like to come forward and give us a defense
of stringsAsFactors=TRUE -- even someone who doesn't personally like
it but would like to play devil's advocate?

> What I will likely do is
> make a few changes so that character vectors are automatically changed
> to factors in modelling functions, so that operating with
> stringsAsFactors=FALSE doesn't trigger silly warnings.
>
> Duncan Murdoch
 

 [apologies for snipping context: gmane made me do it]



Re: [Rd] Regression stars

2013-02-12 Thread Uwe Ligges



On 12.02.2013 14:54, Ben Bolker wrote:
> Duncan Murdoch <murdoch.duncan at gmail.com> writes:
>
>    [snip]
>
>> Regarding stringsAsFactors:  I'm not going to defend keeping it as is,
>> I'll let the people who like it defend it.
>
>    Would someone (anyone) like to come forward and give us a defense
> of stringsAsFactors=TRUE -- even someone who doesn't personally like
> it but would like to play devil's advocate?


Sure:
I will have to change all my scripts, my teaching examples, my book, and 
lots of code examples for research and particularly consulting jobs.


Personally, I think having stringsAsFactors=TRUE is not too bad for 
read.table() but less useful for data.frame().
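Uwe's concern is scripts that rely on the default. A hedged sketch of one defensive habit: passing stringsAsFactors explicitly to read.table() and data.frame() pins the column type no matter what the default is. The survey data here are invented, and read.table(text = ...) is used only to keep the example self-contained.

```r
# Invented data, read from inline text instead of a file
txt <- "sex score
M 1.5
F 2.0
M 3.1"

d_fac <- read.table(text = txt, header = TRUE, stringsAsFactors = TRUE)
d_chr <- read.table(text = txt, header = TRUE, stringsAsFactors = FALSE)
print(class(d_fac$sex))  # "factor"
print(class(d_chr$sex))  # "character"

# The same argument works for data.frame()
df_fac <- data.frame(sex = c("M", "F"), stringsAsFactors = TRUE)
print(class(df_fac$sex))  # "factor"
```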


And since you ask for the devil's advocate already, related to the 
subject line: Removing stars is horrible for consulting. Many people 
from biology, medicine and other fields even ask us questions in terms 
of significance stars, which are obviously very common for them. 
Many of them will certainly ask us for the stars, and ask us to switch 
to another software product if they do not get them from R. They may not 
be interested in being taught about the advantages or disadvantages of 
p-values or stars.


There are different use cases of R, and I want to keep stars for 
consulting tasks where things have to be delivered within minutes. I am 
happy with or without them for teaching, where I have the time and can 
easily talk about the sense and nonsense of p-values.
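For what it's worth, the stars are a printing choice rather than part of the fitted model, so both use cases can be served per call or per session. A small sketch using the built-in mtcars data: printCoefmat() takes a signif.stars argument whose default comes from the show.signif.stars option, which the print method for summary(lm) also consults.

```r
fit <- lm(mpg ~ wt, data = mtcars)
cf <- coef(summary(fit))  # estimate, std. error, t value, Pr(>|t|)

# Per call: the "Signif. codes" legend only appears when stars are printed
with_stars    <- capture.output(printCoefmat(cf, signif.stars = TRUE))
without_stars <- capture.output(printCoefmat(cf, signif.stars = FALSE))
print(any(grepl("Signif. codes", with_stars, fixed = TRUE)))     # TRUE
print(any(grepl("Signif. codes", without_stars, fixed = TRUE)))  # FALSE

# Per session: change the option, print, then restore the old setting
old <- options(show.signif.stars = FALSE)
print(summary(fit))
options(old)
```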



Best,
Uwe


Re: [Rd] Regression stars

2013-02-12 Thread Frank Harrell
Uwe, I've been consulting for decades and have never once been asked for such
stars.  And when a clinical researcher puts a sentence in a study protocol
that P < 0.05 will be considered significant, I get them to take it out.
Frank

-
Frank Harrell
Department of Biostatistics, Vanderbilt University


Re: [Rd] Regression stars

2013-02-12 Thread Duncan Murdoch

On 12/02/2013 9:20 AM, Uwe Ligges wrote:


> On 12.02.2013 14:54, Ben Bolker wrote:
>> Duncan Murdoch <murdoch.duncan at gmail.com> writes:
>>
>>    [snip]
>>
>>> Regarding stringsAsFactors:  I'm not going to defend keeping it as is,
>>> I'll let the people who like it defend it.
>>
>>    Would someone (anyone) like to come forward and give us a defense
>> of stringsAsFactors=TRUE -- even someone who doesn't personally like
>> it but would like to play devil's advocate?
>
> Sure:
> I will have to change all my scripts, my teaching examples, my book, and
> lots of code examples for research and particularly consulting jobs.


Could you post an example of a non-trivial one?  (By trivial, I mean one 
that says data.frame() converts character vectors to factors. 
Obviously that would need to change.  I mean one that just assumes 
current behaviour, and would be broken by the change.)
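To make Duncan's distinction concrete, here is a hypothetical fragment of the kind he is asking about: code that never mentions data.frame()'s conversion but silently depends on getting a factor back. The column and values are invented for illustration.

```r
# Hypothetical script fragment: nothing here mentions stringsAsFactors
# downstream, yet levels() and as.integer() behave differently
d_old <- data.frame(sex = c("M", "F", "M"), stringsAsFactors = TRUE)
d_new <- data.frame(sex = c("M", "F", "M"), stringsAsFactors = FALSE)

print(levels(d_old$sex))      # "F" "M"
print(levels(d_new$sex))      # NULL: a character vector has no levels
print(as.integer(d_old$sex))  # 2 1 2: the underlying factor codes
print(suppressWarnings(as.integer(d_new$sex)))  # NA NA NA
```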


Duncan Murdoch


Re: [Rd] Regression stars

2013-02-12 Thread Ravi Varadhan
I think that we should use P < .03 (which approximates the probability of 5 
consecutive heads) for assigning significance!
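The arithmetic behind Ravi's threshold, for the record: five heads in five fair tosses has probability 0.5^5, which the binomial probability function reproduces.

```r
p_five_heads <- 0.5^5
print(p_five_heads)  # 0.03125
# The same value from the binomial probability of 5 heads in 5 tosses
print(dbinom(5, size = 5, prob = 0.5))
```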

Ravi



Re: [Rd] Regression stars

2013-02-12 Thread Uwe Ligges



On 12.02.2013 15:42, Frank Harrell wrote:

> Uwe I've been consulting for decades and have never once been asked for such
> stars.


Honestly: the last time I was asked was last week.

And when I answered (in another case, a few months ago) "OK, I can add
another 5 stars for you for p-values smaller than 0.5", they did not
find it too funny.


Best,
Uwe




Re: [Rd] Regression stars

2013-02-12 Thread Ben Bolker
On 13-02-12 09:20 AM, Uwe Ligges wrote:
 
 
> Personally, I think having stringsAsFactors=TRUE is not too bad for
> read.table() but less useful for data.frame().
 
  Thanks, Uwe.
  Now let me go one step farther.

  Can you (or anyone) give a good argument **other than backward
compatibility** for keeping the stringsAsFactors=TRUE argument on
data.frame()?

  I appreciate your distinction between data.frame() and read.table()'s
use of stringsAsFactors, and I can see that there is some point for
quick-and-dirty interactive use in setting all non-numeric variables to
factors (arguing that wanting non-numerics as factors is somewhat more
common than wanting them as strings).

  It might be nice to add an optional stringsAsFactors (and check.names)
argument to transform(): I've had to write my own Transform() function
to allow the defaults to be overridden, since transform() calls
data.frame() with the defaults.  (Setting the stringsAsFactors option
globally would work, although not for check.names.)
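Ben doesn't show his Transform(). A minimal sketch of what such a wrapper might look like, assuming it simply mirrors the body of base R's transform.data.frame() and forwards two extra arguments to each data.frame() call. The name Transform and its defaults are illustrative, not Ben's actual code.

```r
# Hypothetical wrapper modelled on transform.data.frame(); the extra
# arguments are passed through to every data.frame() call
Transform <- function(`_data`, ..., stringsAsFactors = FALSE,
                      check.names = FALSE) {
  # Evaluate the name = value pairs with the data frame's columns in scope
  e <- eval(substitute(list(...)), `_data`, parent.frame())
  tags <- names(e)
  inx <- match(tags, names(`_data`))
  matched <- !is.na(inx)
  if (any(matched)) {
    # Overwrite existing columns in place
    `_data`[inx[matched]] <- e[matched]
    `_data` <- data.frame(`_data`, stringsAsFactors = stringsAsFactors,
                          check.names = check.names)
  }
  if (!all(matched)) {
    # Append new columns, honouring the caller's settings
    do.call("data.frame",
            c(list(`_data`), e[!matched],
              list(stringsAsFactors = stringsAsFactors,
                   check.names = check.names)))
  } else {
    `_data`
  }
}

d <- data.frame(x = 1:3)
d2 <- Transform(d, lab = c("a", "b", "c"), `my col` = x * 2)
print(names(d2))      # "x" "lab" "my col"
print(class(d2$lab))  # "character"
```

Base transform() would name the new column my.col, because its data.frame() call uses the default check.names = TRUE; in R versions before 4.0.0 it would also have turned lab into a factor.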

  Ben Bolker

 



Re: [Rd] Regression stars

2013-02-12 Thread Uwe Ligges



On 12.02.2013 16:40, Ben Bolker wrote:

>    Can you (or anyone) give a good argument **other than backward
> compatibility** for keeping the stringsAsFactors=TRUE argument on
> data.frame()?


No, I cannot.
Uwe






Re: [Rd] Regression stars

2013-02-12 Thread Duncan Murdoch

On 12/02/2013 10:40 AM, Ben Bolker wrote:

>    Can you (or anyone) give a good argument **other than backward
> compatibility** for keeping the stringsAsFactors=TRUE argument on
> data.frame()?


I can, under two assumptions:

  1.  We keep stringsAsFactors=TRUE on read.table().
  2.  We keep the stringsAsFactors argument in data.frame().

Under those assumptions, it would just be confusing to have opposite 
defaults.  (Just in case someone hasn't read all of this thread: I'd be 
happier to have the default be FALSE in both cases, but not until 
3.1.x.  For 3.0.x I think I'd just change the default value of 
default.stringsAsFactors() to FALSE, so people could easily get the old 
behaviour.)


Duncan Murdoch





Re: [Rd] Regression stars

2013-02-12 Thread Brian Lee Yung Rowe

I thought that the default was the way it was for performance reasons. For 
large data.frames or repeated applications, using factors should be faster for 
non-trivial strings.

 fs <- c('apple','peach','watermelon','spinach','persimmon','potato','kale')
 n <- 100

 # Same data shape; only the type of column f differs
 a1 <- data.frame(f=sample(fs,n,replace=TRUE), x1=rnorm(n), x2=rnorm(n),
                  stringsAsFactors=TRUE)
 a2 <- data.frame(f=sample(fs,n,replace=TRUE), x1=rnorm(n), x2=rnorm(n),
                  stringsAsFactors=FALSE)

 # Row filter; i is the (unused) index supplied by sapply()
 fn <- function(i,x) x[x$f %in% c('kale','spinach'),]
 system.time(z <- sapply(1:100, fn, a1))
 #   user  system elapsed
 # 19.614   4.037  24.649
 system.time(z <- sapply(1:100, fn, a2))
 #   user  system elapsed
 # 19.726   7.715  36.761


On Feb 12, 2013, at 10:40 AM, Ben Bolker bbol...@gmail.com wrote:
 
  Thanks, Uwe.
  Now let me go one step farther.
 
  Can you (or anyone) give a good argument **other than backward
 compatibility** for keeping the stringsAsFactors=TRUE argument on
 data.frame()?
 
  I appreciate your distinction between data.frame() and read.table()'s
 use of stringsAsFactors, and I can see that there is some point for
 quick-and-dirty interactive use in setting all non-numeric variables to
 factors (arguing that wanting non-numerics as factors is somewhat more
 common than wanting them as strings).
 
  It might be nice to add an optional stringsAsFactors (and check.names)
 argument to transform(): I've had to write my own Transform() function
 to allow the defaults to be overridden, since transform() calls
 data.frame() with the defaults.  (Setting the stringsAsFactors option
 globally would work, although not for check.names.)
 
  Ben Bolker
 
 
 
 What I will likely do is
 make a few changes so that character vectors are automatically changed
 to factors in modelling functions, so that operating with
 stringsAsFactors=FALSE doesn't trigger silly warnings.
 
 Duncan Murdoch
 
 
  [apologies for snipping context: gmane made me do it]
 

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Regression stars

2013-02-12 Thread peter dalgaard

On Feb 12, 2013, at 17:05 , Brian Lee Yung Rowe wrote:

 
 I thought that the default was the way it was for performance reasons. For 
 large data.frames or repeated applications, using factors should be faster 
 for non-trivial strings.

I think not. Historically, it's more like "In statistics we have two kinds of 
variables, numerical and categorical. OK, so we have the occasional truly 
character-type variables like name and address, let's handle those as a special 
case."


-- 
Peter Dalgaard, Professor
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Regression stars

2013-02-12 Thread Ravi Varadhan
They are reaching for the stars.  Pardon my jest, but I couldn't resist. 

Ravi

-Original Message-
From: r-devel-boun...@r-project.org [mailto:r-devel-boun...@r-project.org] On 
Behalf Of Uwe Ligges
Sent: Tuesday, February 12, 2013 10:01 AM
To: Frank Harrell
Cc: r-devel@r-project.org
Subject: Re: [Rd] Regression stars



On 12.02.2013 15:42, Frank Harrell wrote:
 Uwe, I've been consulting for decades and have never once been asked 
 for such stars.

Honestly: the last time I was asked was last week.

And when I answered (in another case, a few months ago) "OK, I can add another 
5 stars for you for p-values smaller than 0.5", they did not find it too funny.

Best,
Uwe

 And when a clinical researcher puts a sentence in a study protocol 
 that P<0.05 will be considered significant, I get them to take it out.

 Frank

 Uwe Ligges-3 wrote
 On 12.02.2013 14:54, Ben Bolker wrote:
 Duncan Murdoch murdoch.duncan at gmail.com writes:

 [snip]

 Regarding stringsAsFactors:  I'm not going to defend keeping it as 
 is, I'll let the people who like it defend it.

 Would someone (anyone) like to come forward and give us a 
 defense of stringsAsFactors=TRUE -- even someone who doesn't 
 personally like it but would like to play devil's advocate?

 Sure:
 I will have to change all my scripts, my teaching examples, my book, 
 and lots of code examples for research and particularly consulting jobs.

 Personally, I think having stringsAsFactors=TRUE is not too bad for
 read.table() but less useful for data.frame().

 And since you ask for the devil's advocate already, related to the 
 subject line: Removing stars is horrible for consulting: With all 
 those people from biology, medicine and other fields who even ask us 
 questions in terms of significance stars, which are obviously very common for 
 them.
 Many of them will certainly ask us for the stars, and ask us to 
 switch to another software product once they do not get it from R. 
 They may not be interested in being taught about the advantages or 
 disadvantages of p-values or stars.

 There are different use cases of R, and I want to keep stars for 
 consulting tasks where things have to be delivered within minutes. I 
 am happy with or without for teaching, where I have the time and can 
 easily talk about the sense and nonsense of p-values.


 Best,
 Uwe














 What I will likely do is
 make a few changes so that character vectors are automatically 
 changed to factors in modelling functions, so that operating with 
 stringsAsFactors=FALSE doesn't trigger silly warnings.

 Duncan Murdoch


[apologies for snipping context: gmane made me do it]






 -
 Frank Harrell
 Department of Biostatistics, Vanderbilt University
 --
 View this message in context: 
 http://r.789695.n4.nabble.com/Regression-stars-tp4657795p4658268.html
 Sent from the R devel mailing list archive at Nabble.com.


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Regression stars

2013-02-12 Thread Tim Triche, Jr.
I think it may have been John D. Cook who first observed that p-values are
linearly correlated with the amount of time remaining on a grant.

Perhaps a suitable transform would reveal an ordinal relationship with
stars.



On Tue, Feb 12, 2013 at 7:03 AM, Ravi Varadhan ravi.varad...@jhu.edu wrote:

 They are reaching for the stars.  Pardon my jest, but I couldn't resist.

 Ravi

 -Original Message-
 From: r-devel-boun...@r-project.org [mailto:r-devel-boun...@r-project.org]
 On Behalf Of Uwe Ligges
 Sent: Tuesday, February 12, 2013 10:01 AM
 To: Frank Harrell
 Cc: r-devel@r-project.org
 Subject: Re: [Rd] Regression stars



 On 12.02.2013 15:42, Frank Harrell wrote:
  Uwe, I've been consulting for decades and have never once been asked
  for such stars.

 Honestly: the last time I was asked was last week.

 And when I answered (in another case, a few months ago) "OK, I can add
 another 5 stars for you for p-values smaller than 0.5", they did not find
 it too funny.

 Best,
 Uwe

  And when a clinical researcher puts a sentence in a study protocol
  that P<0.05 will be considered significant, I get them to take it out.
 
  Frank
 
  Uwe Ligges-3 wrote
  On 12.02.2013 14:54, Ben Bolker wrote:
  Duncan Murdoch murdoch.duncan at gmail.com writes:
 
  [snip]
 
  Regarding stringsAsFactors:  I'm not going to defend keeping it as
  is, I'll let the people who like it defend it.
 
  Would someone (anyone) like to come forward and give us a
  defense of stringsAsFactors=TRUE -- even someone who doesn't
  personally like it but would like to play devil's advocate?
 
  Sure:
  I will have to change all my scripts, my teaching examples, my book,
  and lots of code examples for research and particularly consulting jobs.
 
  Personally, I think having stringsAsFactors=TRUE is not too bad for
  read.table() but less useful for data.frame().
 
  And since you ask for the devil's advocate already, related to the
  subject line: Removing stars is horrible for consulting: With all
  those people from biology, medicine and other fields who even ask us
  questions in terms of significance stars, which are obviously very common
 for them.
  Many of them will certainly ask us for the stars, and ask us to
  switch to another software product once they do not get it from R.
  They may not be interested in being taught about the advantages or
  disadvantages of p-values or stars.
 
  There are different use cases of R, and I want to keep stars for
  consulting tasks where things have to be delivered within minutes. I
  am happy with or without for teaching, where I have the time and can
  easily talk about the sense and nonsense of p-values.
 
 
  Best,
  Uwe
 
 
 
 
 
 
 
 
 
 
 
 
 
 
  What I will likely do is
  make a few changes so that character vectors are automatically
  changed to factors in modelling functions, so that operating with
  stringsAsFactors=FALSE doesn't trigger silly warnings.
 
  Duncan Murdoch
 
 
 [apologies for snipping context: gmane made me do it]
 
  -
  Frank Harrell
  Department of Biostatistics, Vanderbilt University

-- 
*A model is a lie that helps you see the truth.*
Howard Skipper http://cancerres.aacrjournals.org/content/31/9/1173.full.pdf


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Regression stars

2013-02-12 Thread Simon Urbanek

On Feb 12, 2013, at 11:05 AM, Brian Lee Yung Rowe wrote:

 
 I thought that the default was the way it was for performance reasons. For 
 large data.frames or repeated applications, using factors should be faster 
 for non-trivial strings.
 
 fs <- c('apple','peach','watermelon','spinach','persimmon','potato','kale')
 n <- 100
 
 a1 <- data.frame(f=sample(fs,n,replace=TRUE), x1=rnorm(n), x2=rnorm(n), 
 stringsAsFactors=TRUE)
 a2 <- data.frame(f=sample(fs,n,replace=TRUE), x1=rnorm(n), x2=rnorm(n), 
 stringsAsFactors=FALSE)
 
 fn <- function(i,x) x[x$f %in% c('kale','spinach'),]
 system.time(z <- sapply(1:100, fn, a1))
   user  system elapsed 
 19.614   4.037  24.649 
 system.time(z <- sapply(1:100, fn, a2))
   user  system elapsed 
 19.726   7.715  36.761 
 

Not really:

 system.time(z <- sapply(1:100, fn, a1))
   user  system elapsed 
 13.780   0.444  14.229 
 rm(z)
 gc()
          used (Mb) gc trigger   (Mb)  max used   (Mb)
Ncells  182113  9.8     407500   21.8    337655   18.1
Vcells 5789638 44.2  133982285 1022.3 163019778 1243.8
 system.time(z <- sapply(1:100, fn, a2))
   user  system elapsed 
 13.201   0.668  13.873 


But your test is bogus: %in% uses match(), which converts factors to 
character vectors anyway, so in your case you're just measuring noise in your 
system; character vectors are always faster in your example.

The reason is that in R strings are hashed, so character vectors are technically 
very similar to factors, just with faster access (because they don't need to go 
through the integer indirection). On 32-bit platforms strings are in theory always 
faster than factors; on 64-bit they use double the size, so they may or may not be 
faster depending on how you hit the cache etc. Anyway, in modern R versions 
you're much better off using character vectors than factors for any processing, 
so stringsAsFactors=FALSE is what I use exclusively.

Cheers,
Simon
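Simon's point that %in% goes through match(), which converts a factor to character before matching, can be checked directly. The sketch below uses simulated data in the spirit of Brian's benchmark (the vector size, seed and repetition count are arbitrary assumptions). Comparing integer codes against the positions of the wanted levels is shown as one factor-native alternative; it was not proposed in the thread. Timings depend on the machine and R version, so no expected numbers are given.

```r
# Simulated data; sizes and seed are arbitrary
fs <- c("apple", "peach", "watermelon", "spinach", "persimmon", "potato", "kale")
set.seed(1)
chr <- sample(fs, 1e5, replace = TRUE)
fac <- factor(chr)
wanted <- c("kale", "spinach")

# %in% is defined via match(), which converts factors to character first,
# so the factor and the character vector give the same logical result
hit_chr <- chr %in% wanted
hit_fac <- fac %in% wanted
stopifnot(identical(hit_chr, hit_fac))

# A factor-native alternative: compare the integer codes with the
# positions of the wanted labels in levels(fac)
hit_codes <- as.integer(fac) %in% match(wanted, levels(fac))
stopifnot(identical(hit_chr, hit_codes))

# Rough timings only; results vary with hardware, cache and R version
print(system.time(for (i in 1:50) chr %in% wanted))
print(system.time(for (i in 1:50) fac %in% wanted))
print(system.time(for (i in 1:50) as.integer(fac) %in% match(wanted, levels(fac))))
```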

 
 On Feb 12, 2013, at 10:40 AM, Ben Bolker bbol...@gmail.com wrote:
 
 Thanks, Uwe.
 Now let me go one step farther.
 
 Can you (or anyone) give a good argument **other than backward
 compatibility** for keeping the stringsAsFactors=TRUE argument on
 data.frame()?
 
 I appreciate your distinction between data.frame() and read.table()'s
 use of stringsAsFactors, and I can see that there is some point for
 quick-and-dirty interactive use in setting all non-numeric variables to
 factors (arguing that wanting non-numerics as factors is somewhat more
 common than wanting them as strings).
 
 It might be nice to add an optional stringsAsFactors (and check.names)
 argument to transform(): I've had to write my own Transform() function
 to allow the defaults to be overridden, since transform() calls
 data.frame() with the defaults.  (Setting the stringsAsFactors option
 globally would work, although not for check.names.)
 
 Ben Bolker
 
 
 
 What I will likely do is
 make a few changes so that character vectors are automatically changed
 to factors in modelling functions, so that operating with
 stringsAsFactors=FALSE doesn't trigger silly warnings.
 
 Duncan Murdoch
 
 
 [apologies for snipping context: gmane made me do it]
 

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Regression stars

2013-02-12 Thread Hervé Pagès

On 02/12/2013 08:20 AM, peter dalgaard wrote:


On Feb 12, 2013, at 17:05 , Brian Lee Yung Rowe wrote:



I thought that the default was the way it was for performance reasons. For 
large data.frames or repeated applications, using factors should be faster for 
non-trivial strings.


I think not. Historically, it's more like "In statistics we have two kinds of 
variables, numerical and categorical. OK, so we have the occasional truly character-type 
variables like name and address, let's handle those as a special case."


<sarcasm>

Since character vectors are so bad and people use them where
they should instead use a factor, I propose to go all the way
by adding the stringsAsFactors arg to character() too. That way
people are put on the right track from the very start.

</sarcasm>

No seriously, if my variable is categorical, it's already in a factor
and that's how I pass it to data.frame(). But if I have it in a
character vector, it's because that's how I want it. It's my choice.
How could anybody ever think that having data.frame() alter his/her
data is a good thing?

Please *remove* the stringsAsFactors arg of data.frame() in R 3.0.
You'll do a big favor to your user base.

Thanks,
H.
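The explicit control Hervé describes can already be requested per call. In the sketch below (the data are made up), the categorical column is passed as a factor with deliberately chosen levels and the free-text column as a character vector, with stringsAsFactors = FALSE spelled out. A factor argument stays a factor either way, so the flag only affects the character column. The last lines also show that subsetting keeps the full level set of an explicit factor. For reference, R 4.0.0 later made FALSE the default, so the explicit argument mainly matters on older versions.

```r
# Made-up example data
grp <- factor(c("treated", "control", "treated"),
              levels = c("control", "treated"))  # categorical: already a factor
id  <- c("mouse 1", "mouse 2", "mouse 3")        # free text: kept as character

df <- data.frame(group = grp, id = id, stringsAsFactors = FALSE)
str(df)

# The factor keeps the levels (and their order) chosen explicitly,
# and a subset still carries the full level set
sub <- df[df$group == "treated", ]
print(levels(sub$group))
print(table(sub$group))  # "control" shows up with a count of 0
```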






--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fhcrc.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Regression stars

2013-02-12 Thread Duncan Murdoch

On 12/02/2013 1:47 PM, Hervé Pagès wrote:

On 02/12/2013 08:20 AM, peter dalgaard wrote:

 On Feb 12, 2013, at 17:05 , Brian Lee Yung Rowe wrote:


 I thought that the default was the way it was for performance reasons. For 
large data.frames or repeated applications, using factors should be faster for 
non-trivial strings.

 I think not. Historically, it's more like "In statistics we have two kinds of 
variables, numerical and categorical. OK, so we have the occasional truly character-type 
variables like name and address, let's handle those as a special case."

<sarcasm>

Since character vectors are so bad and people use them where
they should instead use a factor, I propose to go all the way
by adding the stringsAsFactors arg to character() too. That way
people are put on the right track from the very start.

</sarcasm>


I think you are misreading what Peter wrote.  He wasn't defending that 
point of view, he was describing it.


No seriously, if my variable is categorical, it's already in a factor
and that's how I pass it to data.frame(). But if I have it in a
character vector, it's because that's how I want it. It's my choice.
How could anybody ever think that having data.frame() alter his/her
data is a good thing?

Please *remove* the stringsAsFactors arg of data.frame() in R 3.0.
You'll do a big favor to your user base.


That's a really bad suggestion -- it would break code for people who set 
stringsAsFactors=FALSE as well as those who rely on the current default 
behaviour.   We certainly won't do that.


Duncan Murdoch

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Regression stars

2013-02-12 Thread Hervé Pagès

Hi Duncan,

On 02/12/2013 11:19 AM, Duncan Murdoch wrote:

On 12/02/2013 1:47 PM, Hervé Pagès wrote:

On 02/12/2013 08:20 AM, peter dalgaard wrote:

 On Feb 12, 2013, at 17:05 , Brian Lee Yung Rowe wrote:


 I thought that the default was the way it was for performance
reasons. For large data.frames or repeated applications, using factors
should be faster for non-trivial strings.

 I think not. Historically, it's more like "In statistics we have two
kinds of variables, numerical and categorical. OK, so we have the
occasional truly character-type variables like name and address, let's
handle those as a special case."

<sarcasm>

Since character vectors are so bad and people use them where
they should instead use a factor, I propose to go all the way
by adding the stringsAsFactors arg to character() too. That way
people are put on the right track from the very start.

</sarcasm>


I think you are misreading what Peter wrote.  He wasn't defending that
point of view, he was describing it.


I was replying to the thread, not to Peter in particular. Sorry if it
sounded otherwise.



No seriously, if my variable is categorical, it's already in a factor
and that's how I pass it to data.frame(). But if I have it in a
character vector, it's because that's how I want it. It's my choice.
How could anybody ever think that having data.frame() alter his/her
data is a good thing?

Please *remove* the stringsAsFactors arg of data.frame() in R 3.0.
You'll do a big favor to your user base.


That's a really bad suggestion -- it would break code for people who set
stringsAsFactors=FALSE as well as those who rely on the current default
behaviour.   We certainly won't do that.


But since there seems to be a discussion about doing some changes to
the stringsAsFactors feature, I was hoping you would consider that
one too.  Doing the right thing sometimes requires breaking people's
code, sadly!

Cheers,
H.



Duncan Murdoch



--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fhcrc.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Regression stars

2013-02-10 Thread Frank Harrell
Great discussion.   Tim's Sinclair quote is priceless and relates to the
non-reproducible research done in some quarters.   Norm's wish to remove
stars altogether is entirely consistent with good statistical practice and
would make a statement that R base adheres to good practice.  I don't think
it will work to add confidence intervals because models can have nonlinear
or interaction terms, and the reference cell for a factor variable may not
be what the analyst chooses for a comparison group.

I would like for us to find a way to, over time, implement Norm's wish to
de-emphasize P-values in general.  The harm done by P-values is
immeasurable.

Frank
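Frank's remark that the reference cell for a factor may not be the comparison group the analyst wants can be made concrete. The sketch below uses simulated data (group names, effects and seed are made up). relevel() moves a chosen level to the front, so lm() reports contrasts, and hence coefficient-level confidence intervals, against a different baseline while the fitted model itself is unchanged.

```r
# Simulated data; group labels and effect sizes are made up
set.seed(42)
d <- data.frame(
  g = factor(rep(c("a", "b", "c"), each = 10)),
  y = rnorm(30, mean = rep(c(0, 1, 2), each = 10))
)

fit_a <- lm(y ~ g, data = d)       # default reference cell: first level, "a"
d$g2 <- relevel(d$g, ref = "c")    # make "c" the reference cell instead
fit_c <- lm(y ~ g2, data = d)

print(coef(fit_a))    # contrasts b - a and c - a
print(coef(fit_c))    # contrasts a - c and b - c
print(confint(fit_c)) # intervals now refer to comparisons with "c"

# Same fitted model, only the reported parameterisation differs
stopifnot(isTRUE(all.equal(fitted(fit_a), fitted(fit_c))))
```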

Norm Matloff wrote
 I appreciate Tim's comments.
 
 I myself have a social science paper coming out soon in which I felt
 forced to use p-values, given their ubiquity.  However, I also told
 readers of the paper that confidence intervals are much more informative
 and I do provide them.  As I said earlier, there is no avoiding that,
 and R needs to report p-values for that reason.  
 
 Instead, the question is what to do about the stars; I proposed
 eliminating them altogether.  Star-crazed users know how to determine
 them themselves from the p-values, but deleting them from R would send a
 message.
 
 I did say my proposal was bold, which really meant I was suggesting
 that R do SOMETHING to send that message, not necessarily star
 elimination.
 
 One such something would be the proposal I made, which would be to add
 confidence intervals to the output.  This too could be just an option,
 but again offering that option would send a message.  Indeed, I would
 suggest that the help page explain that confidence intervals are more
 informative.  (The help page could make a similar statement regarding
 the stars.)
 
 When I pitch R to people, I say that in addition to the large function
 and library base and the nice graphics capabilities, R is above all
 Statistically Correct--it's written by statisticians who know what they
 are doing, rather than some programmer simply implementing a formula
 from a textbook.  I know that a lot of people feel this is one of R's
 biggest strengths.  Given that, one might argue that R should do what it
 can to help users engage in good statistical practice.  I think this was
 Frank's point.
 
 Norm
 

-
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: 
http://r.789695.n4.nabble.com/Regression-stars-tp4657795p4658084.html
Sent from the R devel mailing list archive at Nabble.com.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Regression stars

2013-02-10 Thread Duncan Murdoch

On 13-02-09 3:49 PM, Tim Triche, Jr. wrote:

To clarify, I favor changing the defaults for stringsAsFactors and
show.signif.stars to FALSE in R-3.0.0, and view any attempt to remove
either functionality as a seemingly simple but fundamentally misguided idea.


Both of these were discussed by R Core.  I think it's unlikely the 
default for stringsAsFactors will be changed (some R Core members like 
the current behaviour), but it's fairly likely the show.signif.stars 
default will change.  (That's if someone gets around to it:  I 
personally don't care about that one.  P-values are commonly used 
statistics, and the stars are just a simple graphical display of them. 
I find some p-values to be useful, and the display to be harmless.)


I think it's really unlikely the more extreme changes (i.e. dropping 
show.signif.stars completely, or dropping p-values) will happen.


Regarding stringsAsFactors:  I'm not going to defend keeping it as is, 
I'll let the people who like it defend it.  What I will likely do is 
make a few changes so that character vectors are automatically changed 
to factors in modelling functions, so that operating with 
stringsAsFactors=FALSE doesn't trigger silly warnings.


Duncan Murdoch
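Both points in Duncan's message can be tried on current R. The sketch below uses simulated data (variable names, effects and seed are assumptions). It fits lm() once with a character predictor and once with an explicit factor, which in recent R versions runs without a conversion warning and gives the same fit. It then shows the two existing ways to turn off the stars: the signif.stars argument of print() for summary.lm objects, and the show.signif.stars option, restored afterwards.

```r
# Simulated data; names and effects are made up
set.seed(1)
d <- data.frame(dose = rep(c("low", "mid", "high"), each = 8),
                stringsAsFactors = FALSE)
d$y <- rnorm(24, mean = rep(c(1, 2, 3), each = 8))

fit_chr <- lm(y ~ dose, data = d)          # character predictor, converted internally
fit_fac <- lm(y ~ factor(dose), data = d)  # explicit factor with the same sorted levels
stopifnot(isTRUE(all.equal(fitted(fit_chr), fitted(fit_fac))))

# Turn the stars off for one printout only
print(summary(fit_chr), signif.stars = FALSE)

# Or change the session-wide default, restoring the old value afterwards
old <- options(show.signif.stars = FALSE)
print(summary(fit_chr))
options(old)
```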



This is just my opinion, of course.  The change could easily be accompanied
by a startup notice or release notes indicating that the changes have been
made, and can be reverted to past behavior if the user so desires.  Perhaps
more users will investigate the various settings, as a happy side effect.

My thanks to everyone who spends time supporting and working on R-core.



On Sat, Feb 9, 2013 at 12:44 PM, Tim Triche, Jr. tim.tri...@gmail.com wrote:


Changing the default for show.signif.stars should be sufficient to ensure
that, if people are going to get themselves into trouble, they will have to
do it on purpose.  It's just a visual cue; removing it will not remove the
underlying issue, namely blind acceptance of unlikely null models and
distributions.

For any complex problem, there is a solution that is simple, elegant, and
wrong.  As grants and careers can depend on these magic numbers, Upton
Sinclair might save everyone some trouble... "It is difficult to get a man
to understand something, when his salary depends upon his not
understanding."

stringsAsFactors, however, is responsible for an endless stream of mildly
irritating misunderstandings, and defaulting that to FALSE would be very
nice.

Just my $0.02.  Defaults are one of the most powerful forces in the
universe.

Also, I liked your book.



On Sat, Feb 9, 2013 at 10:48 AM, Norm Matloff matl...@cs.ucdavis.edu wrote:


Thanks for bringing this up, Frank.

Since many of us are educators, I'd like to suggest a bolder approach.
Discontinue even offering the stars as an option.  Sadly, we can't stop
reporting p-values, as the world expects them, but does R need to cater
to that attitude by offering star display?  For that matter, why not
have R report confidence intervals as a default?

Many years ago, I wrote a short textbook on stat, and included a
substantial section on the dangers of significance testing.  All three
internal reviewers liked it, but the funny part is that all three said,
"I agree with this, but no one else will." :-)

Norm


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Regression stars

2013-02-09 Thread Norm Matloff
Thanks for bringing this up, Frank.

Since many of us are educators, I'd like to suggest a bolder approach.
Discontinue even offering the stars as an option.  Sadly, we can't stop
reporting p-values, as the world expects them, but does R need to cater
to that attitude by offering star display?  For that matter, why not
have R report confidence intervals as a default?

Many years ago, I wrote a short textbook on stat, and included a
substantial section on the dangers of significance testing.  All three
internal reviewers liked it, but the funny part is that all three said,
"I agree with this, but no one else will." :-)

Norm

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Regression stars

2013-02-09 Thread Tim Triche, Jr.
Changing the default for show.signif.stars should be sufficient to ensure
that, if people are going to get themselves into trouble, they will have to
do it on purpose.  It's just a visual cue; removing it will not remove the
underlying issue, namely blind acceptance of unlikely null models and
distributions.

For any complex problem, there is a solution that is simple, elegant, and
wrong.  As grants and careers can depend on these magic numbers, Upton
Sinclair might save everyone some trouble... "It is difficult to get a man
to understand something, when his salary depends upon his not
understanding."

stringsAsFactors, however, is responsible for an endless stream of mildly
irritating misunderstandings, and defaulting that to FALSE would be very
nice.

Just my $0.02.  Defaults are one of the most powerful forces in the
universe.

Also, I liked your book.



On Sat, Feb 9, 2013 at 10:48 AM, Norm Matloff matl...@cs.ucdavis.edu wrote:

 Thanks for bringing this up, Frank.

 Since many of us are educators, I'd like to suggest a bolder approach.
 Discontinue even offering the stars as an option.  Sadly, we can't stop
 reporting p-values, as the world expects them, but does R need to cater
 to that attitude by offering star display?  For that matter, why not
 have R report confidence intervals as a default?

 Many years ago, I wrote a short textbook on stat, and included a
 substantial section on the dangers of significance testing.  All three
 internal reviewers liked it, but the funny part is that all three said,
 "I agree with this, but no one else will." :-)

 Norm


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Regression stars

2013-02-09 Thread Tim Triche, Jr.
To clarify, I favor changing the defaults for stringsAsFactors and
show.signif.stars to FALSE in R-3.0.0, and view any attempt to remove
either functionality as a seemingly simple but fundamentally misguided idea.

This is just my opinion, of course.  The change could easily be accompanied
by a startup notice or release notes indicating that the changes have been
made, and can be reverted to past behavior if the user so desires.  Perhaps
more users will investigate the various settings, as a happy side effect.

My thanks to everyone who spends time supporting and working on R-core.



On Sat, Feb 9, 2013 at 12:44 PM, Tim Triche, Jr. tim.tri...@gmail.com wrote:

 Changing the default for show.signif.stars should be sufficient to ensure
 that, if people are going to get themselves into trouble, they will have to
 do it on purpose.  It's just a visual cue; removing it will not remove the
 underlying issue, namely blind acceptance of unlikely null models and
 distributions.

 For any complex problem, there is a solution that is simple, elegant, and
 wrong.  As grants and careers can depend on these magic numbers, Upton
 Sinclair might save everyone some trouble... "It is difficult to get a man
 to understand something, when his salary depends upon his not
 understanding."

 stringsAsFactors, however, is responsible for an endless stream of mildly
 irritating misunderstandings, and defaulting that to FALSE would be very
 nice.

 Just my $0.02.  Defaults are one of the most powerful forces in the
 universe.

 Also, I liked your book.



 On Sat, Feb 9, 2013 at 10:48 AM, Norm Matloff matl...@cs.ucdavis.edu wrote:

 Thanks for bringing this up, Frank.

 Since many of us are educators, I'd like to suggest a bolder approach.
 Discontinue even offering the stars as an option.  Sadly, we can't stop
 reporting p-values, as the world expects them, but does R need to cater
 to that attitude by offering star display?  For that matter, why not
 have R report confidence intervals as a default?

 Many years ago, I wrote a short textbook on stat, and included a
 substantial section on the dangers of significance testing.  All three
 internal reviewers liked it, but the funny part is that all three said,
 "I agree with this, but no one else will." :-)

 Norm


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Regression stars

2013-02-09 Thread Norm Matloff
I appreciate Tim's comments.

I myself have a social science paper coming out soon in which I felt
forced to use p-values, given their ubiquity.  However, I also told
readers of the paper that confidence intervals are much more informative
and I do provide them.  As I said earlier, there is no avoiding that,
and R needs to report p-values for that reason.  

Instead, the question is what to do about the stars; I proposed
eliminating them altogether.  Star-crazed users know how to determine
them themselves from the p-values, but deleting them from R would send a
message.

I did say my proposal was bold, which really meant I was suggesting
that R do SOMETHING to send that message, not necessarily star
elimination.

One such something would be the proposal I made, which would be to add
confidence intervals to the output.  This too could be just an option,
but again offering that option would send a message.  Indeed, I would
suggest that the help page explain that confidence intervals are more
informative.  (The help page could make a similar statement regarding
the stars.)

When I pitch R to people, I say that in addition to the large base of
functions and libraries and the nice graphics capabilities, R is above all
Statistically Correct--it's written by statisticians who know what they
are doing, rather than some programmer simply implementing a formula
from a textbook.  I know that a lot of people feel this is one of R's
biggest strengths.  Given that, one might argue that R should do what it
can to help users engage in good statistical practice.  I think this was
Frank's point.

Norm
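For a single fit, neither dropping the stars nor reporting confidence intervals requires changing R's defaults. The following is a minimal sketch, assuming an ordinary lm() fit on the built-in mtcars data (chosen only for illustration): print.summary.lm() accepts a signif.stars argument that overrides getOption("show.signif.stars") for one printout, and confint() returns the intervals Norm recommends.

```r
## Sketch only: mtcars and the model mpg ~ wt are illustrative choices.
fit <- lm(mpg ~ wt, data = mtcars)

## print.summary.lm() has signif.stars = getOption("show.signif.stars")
## as its default; passing FALSE suppresses the stars for this call only.
print(summary(fit), signif.stars = FALSE)

## confint() returns a matrix with one row per coefficient and
## columns for the lower and upper limits (95% by default).
ci <- confint(fit, level = 0.95)
print(ci)

## Estimates side by side with their confidence limits.
print(cbind(estimate = coef(fit), ci))
```

The global default stays untouched here; the per-call argument is useful when only some output should omit the stars.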



Re: [Rd] regression stars

2013-02-08 Thread Terry Therneau
 There are only a few things in R where we override the global defaults on a departmental
level -- we really don't like to do so.  But show.signif.stars is one of the three.

  The other two, if you are curious: setting stringsAsFactors=FALSE and including NA by
default in the output of table. We've been overriding both of these for 10+ years.


Terry Therneau
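Here is a sketch of what overrides like these might look like in a user's ~/.Rprofile. It is an assumption for illustration, not Terry's actual configuration. table() has no global option for NA handling, so the wrapper below is a hypothetical approach: it masks base::table() in the user workspace with a version whose useNA argument defaults to "ifany". Note that stringsAsFactors = FALSE has been R's own default since R 4.0.0.

```r
## Hypothetical ~/.Rprofile sketch; not any department's actual file.
options(show.signif.stars = FALSE,  # no stars in printed coefficient tables
        stringsAsFactors = FALSE)   # already the default from R 4.0.0 on

## Assumed approach: mask table() so NA counts appear when present.
## Code that calls base::table() directly (e.g. inside packages) is unaffected.
table <- function(..., useNA = "ifany") base::table(..., useNA = useNA)
```

Masking a base function this way only changes calls made from the workspace, which keeps the override from silently altering package behavior.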



[Rd] Regression stars

2013-02-07 Thread Frank Harrell
Today's GNU R tutorial at
http://how-to.linuxcareer.com/a-quick-gnu-r-tutorial-to-statistical-models-and-graphics
shows how bad statistical practice is being further perpetuated by
virtue of significance stars still being the default in printed output
from lm models.




Frank Harrell
Department of Biostatistics, Vanderbilt University



Re: [Rd] Regression stars

2013-02-07 Thread John Fox
Dear Frank,

I'd like to second your implicit motion to make 
options(show.signif.stars=FALSE) the default.

Thanks for raising this point.

John



Re: [Rd] Regression stars

2013-02-07 Thread Marc Schwartz
FWIW, that has been my default setting for years in my .Rprofile.

If there is some agreement on this from R Core, it would seem that version 
3.0.0 would be a reasonable breakpoint for this change in default behavior.

Regards,

Marc Schwartz
