A few comments:

First, your remarks are interesting and, I would say, mainly well founded. However, I 
think they are in many respects irrelevant, although they do point to the much bigger 
underlying issue, which Roger Peng also hinted at in his reply.

I think they are sensible because R IS difficult; the documentation is often 
challenging, which is not surprising given (a) the inherent complexity of R; (b) the 
difficulty in writing good documentation, especially when many of the functions being 
documented are inherently technical, so subject matter knowledge (CS, statistics, 
numerical analysis, ...) must be assumed; (c) the documentation has been written by a 
variety of mostly statistical types as a sidelight of their main professional 
activities -- none of these writers are ** professional documenters ** (whatever that 
may mean), and some of them even speak English as a second or third language. My own 
take is that 
the documentation for Core R and many of the packages is remarkably well done given 
these realities, and my hat is off to those who have produced it. Nevertheless, I 
agree, it is challenging -- it MUST be.

But they are irrelevant because the fundamental issue **is** that there is an inherent 
tension between ease of use and power/flexibility. Writing good GUIs for anything is 
hard, very hard. For a project such as R, it doesn't make sense, although it may make 
sense to write GUIs for small subsets of R targeted at specific audiences (as in 
BioConductor, RCommander, etc.). But even this is hard to do well and takes a lot of 
time and effort. So, IMHO, there never will be, nor ever should or could be, an overall 
GUI for R: it is too complex and needs to be too extensible and flexible to constrain 
it in that way.

However, I believe the larger question that both you and Roger Peng hint at is more 
important: not "How does a social scientist learn to use R?" but how does any 
scientist/technologist for whom experimental design and data analysis form a large 
component of their work gain the necessary technical background in statistics and 
related disciplines (linear algebra, numerical analysis, ...) to ** know how to use 
the statistical tools they need that R provides? **  Software like SPSS must assume a 
limited collection of methods to present to its customers in an effective GUI. Its 
strategy **must** be (this is NOT a criticism) to "dumb it down" so that it can provide 
coherent albeit limited data analysis strategies. As you have explicitly stated, users 
who wish to venture outside those narrow paradigms are simply out of luck. R was 
designed from the outset not to be so constrained, but the cost is that you must know 
a good deal to use it effectively. It is obvious from the questions posted to this 
list that even something as "simple" as lm() often demands from users technical 
statistical understanding far beyond what they have. So we see fairly frequent 
indications of misunderstanding and confusion in using R. But the problem isn't R -- 
it's that users don't know enough statistics.
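
To make the lm() point concrete, here is a small sketch on made-up data (the data 
frame d and its columns are invented purely for illustration). With R's default 
treatment contrasts, the coefficients for a factor are differences from the baseline 
level, not group means, which is the kind of thing that trips up users who have never 
met contrast codings:

```r
# Hypothetical toy data: three groups of four observations each
d <- data.frame(
  group = factor(rep(c("a", "b", "c"), each = 4)),
  y     = c(1, 2, 3, 4,  5, 6, 7, 8,  9, 10, 11, 12)
)

# One-way model using the default treatment contrasts
fit <- lm(y ~ group, data = d)

# Coefficients: intercept = mean of baseline group "a";
# groupb and groupc = differences from that baseline
coefs <- coef(fit)
print(coefs)

# The raw group means, for comparison
means <- tapply(d$y, d$group, mean)
print(means)

# Recover the group means from the coefficients
recovered <- c(a = coefs[["(Intercept)"]],
               b = coefs[["(Intercept)"]] + coefs[["groupb"]],
               c = coefs[["(Intercept)"]] + coefs[["groupc"]])
print(recovered)
```

Nothing in that output is wrong, but reading it correctly requires knowing what a 
contrast coding is, and that is statistics, not R.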

I wish I could say I had an answer for this, but I don't have a clue. I do not think 
it's fair to expect a mechanical engineer or psychologist or biologist to have the 
numerous math and statistical courses and experience in their training that would 
provide the base they need. For one thing, they don't have the time in their studies 
for this; for another, they may not have the background or interest -- they are, after 
all, mechanical engineers or biologists, not statisticians. Unfortunately, they could 
do their jobs as engineers and scientists a lot better if they did know more 
statistics.  To me, it's a fundamental conundrum, and no one is to blame. It's just 
the reality, but it is the source of all kinds of frustrations on both sides of the 
statistical divide, which both you and Roger expressed in your own ways.

Obviously, all of this is just personal ranting, so I would love to hear alternative 
views. And thanks again for your clear and interesting comments.

Cheers,
Bert

[EMAIL PROTECTED] wrote:

> First, many thanks to Frank Harrell for once again helping me out.  This actually 
> relates to the next point, which is my contribution to the 'why don't social 
> scientists use R' discussion.  I am a hybrid social scientist(child psychiatrist) 
> who trained on SPSS.  Many of my difficulties in coming to terms with R have been to 
> do with trying to apply the logic underlying SPSS, with dire results.  You do not 
> want to know how long I spent looking for a 'recode' command in R, to change factor 
> names and classes.....
>
> I think the solution is to combine a graphical interface that encourages command 
> line use (such as Rcommander) with the analyse(this) paradigm suggested, but also 
> explaining how one can a) display the code on a separate window ('page' is only an 
> obvious command once you know it), and b) how one can then save one's modification, 
> make it generally available, and not overwrite the unmodified version (again, 
> thanks, Frank).  Finally, one would need to change the emphasis in basic statistical 
> teaching from 'the right test' to 'the right model'.  That should get people used to 
> R's logic.
>
> If a rabbit starts to use R, s/he is likely to head for the help files associated 
> with each function, which can assume that the reader can make sense of gnomic 
> utterances like "Omit 'var' to impute all variables, creating new variables in 
> 'search' position 'where'".  I still don't know what that one means (as I don't 
> understand search positions, or why they're important).  This can be very 
> offputting, and could lead the rabbit to return to familiar SPSS territory.
>
> Finally, friendlier error messages would also help. It took me 3 days, and opening 
> every function I could, to work out that '...cannot find function xxx.data.frame...' 
> meant that MICE was unable to make a polychotomous logistic imputation model 
> converge for the variable immediately preceding it.
>
> I am now off to the help files and FAQs to find out how to change graph parameters, 
> as the plot.mids function in MICE a) doesn't allow one to select a subset of 
> variables, and b) tells me that the graph it wants to produce on the whole of my 26 
> variable dataset is too big to fit on the (windows) plotting device.  Unless anyone 
> wants to tell me how/where? (which of course is why, in the end, R is EASIER to use 
> than SPSS)

--

Bert Gunter

Non-Clinical Biostatistics
Genentech
MS: 240B
Phone: 650-467-7374


"The business of the statistician is to catalyze the scientific learning process."

 -- George E.P. Box

______________________________________________
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
