Re: [R] Re: Thanks Frank, setting graph parameters, and why social scientists don't use R

2004-08-18 Thread John Maindonald
There are answers that could and should be applied in specific 
situations.  At least in academia and in substantial research teams, 
statisticians ought to have a prominent part in many of the research 
teams.  Senior statisticians should have a prominent role in deciding 
the teams to which this applies.  Why should it be OK to combine 
high levels of chemical expertise with truly appalling statistical 
misunderstandings, to the extent that the supposed chemical insights are 
not what they appear to be?

There should be a major focus on training application area students 
to understand important ideas, to recognize when they are 
out of their depth, and to work with statisticians.

There should be much more use of statisticians in the refereeing of 
published papers.  Editors need to seek advice from experienced 
statisticians (some do) on what sorts of papers are candidates for 
statistical refereeing.

Publication in an archive of the data that have been used for a paper 
could be a huge help, so that others can check whether the data really 
do support the conclusion.  Even better, as Robert Gentleman has 
argued, would/will be papers that can be processed through Sweave or 
its equivalent.

Really enlightened people (in the statistical sense) in the applied 
communities will latch onto R, as some are doing, because the 
limitations inherent in much other software so often lead to crippled 
and/or misleading analyses.  Increasingly, we can hope that it will 
become difficult for statistics in various applied area communities 
to proceed on its merry way, ignorant of or ignoring most of what has 
happened in the mainstream statistical community in the past 20 years.

The statistical community needs to be a lot more aggressive in 
demanding adequate standards of data analysis in applied areas, at the 
same time suggesting ways in which it can work with application area 
people to improve standards.

It is also fair to comment that the situation is very uneven.  There 
are some areas where the standards are pretty reasonable, at least for 
the types of problems that typically come up in those areas.
John Maindonald.

John Maindonald email: [EMAIL PROTECTED]
phone : +61 2 (6125)3473
fax   : +61 2 (6125)5549
Centre for Bioinformation Science, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
On 18 Aug 2004, Bert Gunter wrote:
So we see fairly frequently indications of misunderstanding and 
confusion in using R. But the problem isn't R -- it's that users 
don't know enough statistics.

. . . .
I wish I could say I had an answer for this, but I don't have a clue. I 
do not think it's fair to expect a mechanical engineer or psychologist 
or biologist to have the numerous math and statistical courses and 
experience in their training that would provide the base they need. For 
one thing, they don't have the time in their studies for this; for 
another, they may not have the background or interest -- they are, 
after all, mechanical engineers or biologists, not statisticians. 
Unfortunately, they could do their jobs as engineers and scientists a 
lot better if they did know more statistics.  To me, it's a fundamental 
conundrum, and no one is to blame. It's just the reality, but it is the 
source for all kinds of frustrations on both sides of the statistical 
divide, which both you and Roger expressed in your own ways.
. . . .

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] Re: Thanks Frank, setting graph parameters, and why social scientists don't use R

2004-08-17 Thread david_foreman
First, many thanks to Frank Harrell for once again helping me out.  This actually 
relates to the next point, which is my contribution to the 'why don't social 
scientists use R' discussion.  I am a hybrid social scientist (child psychiatrist) who 
trained on SPSS.  Many of my difficulties in coming to terms with R have been to do 
with trying to apply the logic underlying SPSS, with dire results.  You do not want to 
know how long I spent looking for a 'recode' command in R, to change factor names and 
classes.
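
For readers hunting for the same thing, here is a minimal sketch of how base R renames factor levels and changes class without a dedicated 'recode' command. The factor `sex` and its level names are made up for illustration, and this is only one of several ways to do it.

```r
# Hypothetical data: a factor with terse level names.
sex <- factor(c("m", "f", "f", "m"))

# Rename levels by assigning to levels(); the order follows levels(sex),
# which is alphabetical here: "f", then "m".
levels(sex) <- c("female", "male")

# Change class: factor -> character labels, or factor -> integer codes.
sex_chr <- as.character(sex)
sex_int <- as.integer(sex)
```

Note that as.integer() on a factor returns the internal level codes, not any numbers that might appear in the labels.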

I think the solution is to combine a graphical interface that encourages command line 
use (such as Rcommander) with the analyse(this) paradigm suggested, but also 
explaining how one can a) display the code on a separate window ('page' is only an 
obvious command once you know it), and b) how one can then save one's modification, 
make it generally available, and not overwrite the unmodified version (again, thanks, 
Frank).  Finally, one would need to change the emphasis in basic statistical teaching 
from 'the right test' to 'the right model'.  That should get people used to R's logic.

If a rabbit starts to use R, s/he is likely to head for the help files associated with 
each function, which can assume that the reader can make sense of gnomic utterances 
like "Omit 'var' to impute all variables, creating new variables in 'search' position 
'where'".  I still don't know what that one means (as I don't understand search 
positions, or why they're important).  This can be very offputting, and could lead the 
rabbit to return to familiar SPSS territory.
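
As a rough illustration of what a "search position" is, assuming the help text refers to R's search path as returned by search(): R looks up names by walking a list of environments in order, and attach() places a data frame at a chosen position in that list. The data frame `d` and the label "d_demo" below are hypothetical.

```r
# search() lists the environments R looks through, in order, to find a name.
before <- search()

# Hypothetical data frame attached at position 2, i.e. just after the
# global workspace (position 1).
d <- data.frame(age = c(5, 7, 9))
attach(d, pos = 2, name = "d_demo")

# "d_demo" now sits at search position 2, so 'age' can be found by name.
pos_found <- match("d_demo", search())
mean_age <- mean(age)

# Remove it from the search path again.
detach("d_demo", character.only = TRUE)
```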

Finally, friendlier error messages would also help. It took me 3 days, and opening 
every function I could, to work out that '...cannot find function xxx.data.frame...' 
meant that MICE was unable to make a polychotomous logistic imputation model converge 
for the variable immediately preceding it.


I am now off to the help files and FAQs to find out how to change graph parameters, as 
the plot.mids function in MICE a) doesn't allow one to select a subset of variables, 
and b) tells me that the graph it wants to produce on the whole of my 26 variable 
dataset is too big to fit on the (windows) plotting device.  Unless anyone wants to 
tell me how/where? (which of course is why, in the end, R is EASIER to use than SPSS)
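
One generic way around a "too big for the device" problem, sketched with base graphics rather than plot.mids itself (whose options are not shown in this thread): plot a subset of variables on a larger device with a panel grid and slimmer margins. The dataset, the choice of six variables, and the device size are all assumptions for illustration.

```r
# Hypothetical 26-variable dataset standing in for the real one.
set.seed(1)
dat <- as.data.frame(matrix(rnorm(26 * 50), ncol = 26))

# Plot only a chosen subset, on a device large enough to hold the panels.
vars <- names(dat)[1:6]
out <- tempfile(fileext = ".pdf")
pdf(out, width = 9, height = 6)

# 2 x 3 grid of panels with slimmer margins; keep the old settings to restore.
old <- par(mfrow = c(2, 3), mar = c(4, 4, 2, 1))
for (v in vars) hist(dat[[v]], main = v, xlab = v)
par(old)
dev.off()
```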


Re: [R] Re: Thanks Frank, setting graph parameters, and why social scientists don't use R

2004-08-17 Thread Roger D. Peng
I'm just curious, but how do social scientists, or anyone else for 
that matter, learn SPSS, besides taking a class?

-roger


Re: [R] Re: Thanks Frank, setting graph parameters, and why social scientists don't use R

2004-08-17 Thread Berton Gunter
A few comments:

First, your remarks are interesting and, I would say, mainly well founded. However, I 
think they are in many respects irrelevant, although they do point to the much bigger 
underlying issue, which Roger Peng also hinted at in his reply.

I think they are sensible because R IS difficult; the documentation is often 
challenging, which is not surprising given (a) the inherent complexity of R; (b) the 
difficulty in writing good documentation, especially when many of the functions being 
documented are inherently technical, so subject matter knowledge (CS, statistics, 
numerical analysis, ...) must be assumed; (c) the documentation has been written by a 
variety of mostly statistical types as a sidelight of their main professional 
activities -- none of these writers are ** professional documenters ** (whatever that 
may mean) and some of them even speak English as a second or third language. My own 
take is that the documentation for Core R and many of the packages is remarkably well 
done given these realities, and my hat is off to those who have produced it. 
Nevertheless, I agree, it is challenging -- it MUST be.

But they are irrelevant because the fundamental issue **is** that there is an inherent 
tension between ease of use and power/flexibility. Writing good GUIs for anything is 
hard, very hard. For a project such as R, it doesn't make sense, although it may make 
sense to write GUIs for small subsets of R targeted at specific audiences (as in 
BioConductor, RCommander, etc.). But even this is hard to do well and takes a lot of 
time and effort. So, IMHO, there never will be nor ever should/could be an overall GUI 
for R: it is too complex and needs to be too extensible and flexible to constrain it 
in that way.

However, I believe the larger question that both you and Roger Peng hint at is more 
important: not "How does a social scientist learn to use R?" but how does any 
scientist/technologist for whom experimental design and data analysis form a large 
component of their work gain the necessary technical background in statistics and 
related disciplines (linear algebra, numerical analysis, ...) to ** know how to use 
the statistical tools they need that R provides.**  Software like SPSS must assume a 
limited collection of methods to present to their customers in an effective GUI. Their 
strategy **must** be (this is NOT a criticism) to dumb it down so that they can provide 
coherent albeit limited data analysis strategies. As you have explicitly stated, users 
who wish to venture outside those narrow paradigms are simply out of luck. R was 
designed from the outset not to be so constrained, but the cost is that you must know 
a good deal to use it effectively. It is obvious from the questions posted to this 
list that even something as simple as lm() often demands from users technical 
statistical understanding far beyond what they have. So we see fairly frequently 
indications of misunderstanding and confusion in using R. But the problem isn't R -- 
it's that users don't know enough statistics.

I wish I could say I had an answer for this, but I don't have a clue. I do not think 
it's fair to expect a mechanical engineer or psychologist or biologist to have the 
numerous math and statistical courses and experience in their training that would 
provide the base they need. For one thing, they don't have the time in their studies 
for this; for another, they may not have the background or interest -- they are, after 
all, mechanical engineers or biologists, not statisticians. Unfortunately, they could 
do their jobs as engineers and scientists a lot better if they did know more 
statistics.  To me, it's a fundamental conundrum, and no one is to blame. It's just 
the reality, but it is the source for all kinds of frustrations on both sides of the 
statistical divide, which both you and Roger expressed in your own ways.

Obviously, all of this is just personal ranting, so I would love to hear alternative 
views. And thanks again for your clear and interesting comments.

Cheers,
Bert


Re: [R] Re: Thanks Frank, setting graph parameters, and why social scientists don't use R

2004-08-17 Thread John
On Tuesday 17 August 2004 06:14, Roger D. Peng wrote:
 I'm just curious, but how do social scientists, or anyone else for
 that matter, learn SPSS, besides taking a class?

They sit down with a book, a computer, and data they desperately need to 
analyze and start working.  SPSS documentation and some of the third party 
works are fairly thorough, and pretty gentle, and the writing fits the 
expectations of someone who has had only the introductory stats courses.  Your 
teacher emphasizes checking the normality of the data, so you look for the 
means of measuring it and the tests that tell you whether it is significant 
or not, after very carefully considering the nature of your data in the light 
of the assumptions that the SPSS tests make.  You are far less concerned 
with the real mathematical mechanics than you are about meeting the 
expectations of the professor.  SPSS, SYSTAT, NCSS and similar programs all 
support this kind of work.  Many social science professors don't really know 
enough to judge your work beyond similar expectations THEY learned from their 
own professors.  It's sad, but that's the way it works in many schools.

J



Re: [R] Re: Thanks Frank, setting graph parameters, and why social scientists don't use R

2004-08-17 Thread John
On Tuesday 17 August 2004 09:20, Berton Gunter wrote:
 A few comments:

It has been decades since I used SPSS.  At that time, to really work with it 
you edited a text file program that identified the data file and variable 
columns you wanted to work with.  You assembled the flow of work commands 
after carefully going through the SPSS documentation.  After you were ready, 
you ran the program and crossed your fingers.  R IS complex, yet usability 
at a basic level is readily achievable.  What it lacks is 
simply the Stat 1 and Stat 101 packages that lead users from the very basics 
covered in introductory statistics texts into the more profound analyses that 
so many R users are interested in.  There are some texts, such as Peter 
Dalgaard's Introductory Statistics with R, which is a very useful book.  
However, from a student's viewpoint Chapter 1 focuses on R, everything from 
the R Language to R programming.  The statistics chapters that follow almost 
seem to be used as an adjunct to teaching R rather than vice versa.  For some 
social science students, a package that leads more gradually into R would 
probably be a big help in learning the language while getting their 
feet wet in statistics.

John
