Re: SPlus balked at Linux; R: statistician's dream

Jameson Burt 26 Jan 1998 08:12:19 -0000

This message addresses several private questions about R from my earlier mail 
to the Debian users mail-list.
Of course, the real experts gave us R, knowing answers better than I.
Indeed, I answer these questions from the perspective of a novice.  
I include here a little about the R developers, where to get R,
some documentation sources, and at the end some R entries 
to get a quick feel for R.


The primary site of the Comprehensive R Archive Network (CRAN) resides at
        http://www.ci.tuwien.ac.at/R/contents.html
because of communication costs at the primary development site in New Zealand
[sorry, I originally credited Australia].
There, 
        Ross Ihaka              [EMAIL PROTECTED]
and
        Robert Gentleman        [EMAIL PROTECTED]
primarily develop R at the University of Auckland.
The core group of developers now extends to 
        Peter Dalgaard          [EMAIL PROTECTED]
        Kurt Hornik             [EMAIL PROTECTED]
        Friedrich Leisch        [EMAIL PROTECTED]
        Thomas Lumley           [EMAIL PROTECTED]
        Martin Maechler         [EMAIL PROTECTED]
        Paul Murrell, 
        Heiner Schwarte, 
and
        Luke Tierney            [EMAIL PROTECTED]

I repeatedly see the names of others who actively help.
The number of people contributing to R probably equals the number actually 
developing SPlus  (R is so much like SPlus that our SPlus code has run under R 
with nary a change).

I understand SPlus has 90 people, many of whom sell and work on authorization 
schemes.  I understand perhaps only 10 people actually work on the SPlus code. 
They must put much effort into what the SPlus Installation Manual 
reflects: its dozens of pages cover almost solely various schemes 
to keep you from going over your licensed limit.  
Even then, their manual did not give a hint how to handle serving but
two licenses to what could be either SunOS or Solaris users.  
Only a Unix-like guess and our writing a shell script resolved that problem.  

As a statistician, I judge the viability of software by the breadth 
(across countries) and count of its developers. 
I suspect R development efforts match those of SPlus.
Indeed, it was the mere existence of R that indicated to me that
SPlus must be useful (rather like the existence of Octave indicates 
the usefulness of Matlab).
There never appeared similar software for the often used "sas"
(Robert Morrison, Oklahoma State University, still has the 76,000
computer cards from just before commercial-sas took the publicly created 
sas code). 

I see in this "R" development group a level of organization and public access
analogous to that of Debian Linux.  Of course, the "R" group solves
statistical/mathematical problems as much as it solves computer problems.
They will be converting to GNU coding standards.

The computer community often funds projects like Debian Linux, and the 
community will probably support word-processor and spreadsheet development.  
Something like R represents more of a niche market, though every graduate 
student must take at least one statistics course.  As a niche market, R should 
be supported by us statisticians more than other computer software.  Such 
support would further the ideas behind GNU and not leave GNU software barren 
in the niche field of statistics.

R and SPlus run like a Maserati, while other statistical packages like sas 
run like elephants, lacking flexibility and cutting-edge procedures. I heard 
of one engineering college that abandoned sas for SPlus.  I am thankful for 
statistical packages; I am ecstatic about R.

#########################
The following answers some questions about getting and using "R".
Since I sent my original message to the Debian Users' Mailing List,
I presume you use Debian Linux.
It's difficult searching for software having but one letter, "r", since 
you can't reasonably search for "r" or "r-", though you might search 
for "r-base".
You can get "R" from a math directory of a Debian mirror site in the 
hamm distribution,
sometimes in .../hamm/hamm, sometimes in .../hamm/contrib.
In particular, I installed from the site
         ftp://ftp.debian.org/pub/linux/distributions/debian/hamm/
the following four "R" packages
        .../hamm/binary-i386/math/r-base_0.61.1-3.deb
        .../hamm/binary-i386/math/r-cran_0.61-1.deb
        .../hamm/binary-i386/math/r-mlbench_0.61-1.deb
        .../non-free/binary-i386/math/r-cran-non-free_0.61-2.deb

You can also get the latest debianized packages from the primary site
         ftp://franz.stat.wisc.edu/pub/R/bin/i386-linux/Debian-2.0/*
or from CRAN archive sites (which have more than just Debian packages)
        http://www.ci.tuwien.ac.at/R  #master site
        http://lib.stat.cmu.edu/R/CRAN
        ftp://franz.stat.wisc.edu/pub/R
which have the directories
        src/base
        src/contrib
        doc
        bin     #has binaries for *.deb and *.rpm.
On installation of the Debian packages, most of these "R" packages install
libraries in /usr/lib/R/library, so you have a non-"R" way to see 
available libraries.
In R, you load these installed libraries with, e.g.,
        library(stepfun)
and get information about a library with
        library(help=stepfun)
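
If you would rather check from inside R which libraries are installed before 
loading one, something like the following sketch may help.  It assumes a 
reasonably current R, where .libPaths() and .packages() are available; the 
package name "stats" is only a stand-in chosen because it ships with R.

```r
# Directories that R searches for installed libraries.
lib.dirs <- .libPaths()
print(lib.dirs)

# Names of every package found in those directories.
installed <- .packages(all.available = TRUE)
print(head(sort(installed)))

# Load a package only if it is actually installed.
# "stats" is a stand-in name for illustration.
pkg <- "stats"
if (pkg %in% installed) {
    library(pkg, character.only = TRUE)
}
```

The character.only=TRUE argument tells library() to treat pkg as a variable 
holding the name rather than as the name itself.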
You can access the very good online manual by entering, e.g.,
         help(read.table)
or see commands having the word "print",
         apropos("print")
or an html version of help
         help.start()
or the latest html version via internet from    
         http://www.stat.math.ethz.ch/R/manual
When starting, you really need a brief introduction to R.

Many people prefer the 47 page 
        Introductory Guide to S-Plus   by B.D.Ripley
        ftp://markov.stats.ox.ac.uk/SGuide.ps1.z
This is dated 1994 and asks the reader to work with the practice 
library "ripley", whose parts appear to come with the r-base debian package.
You won't need the "library(ripley)" call mentioned in the B.D.Ripley 
documentation.  To get data like "trees" (from the "ripley" library)
mentioned by Ripley, use instead, e.g., 
        data(trees)
You can then use 
        attach(trees) 
where B.D.Ripley mentions it.
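
As a small sketch of what data() and attach() do, assuming the "trees" data 
set that comes with R (the variable names mean.girth and mean.girth.attached 
are mine, for illustration only):

```r
# Load the "trees" data set into the workspace.
data(trees)

# Without attach(), columns are reached through the data frame.
mean.girth <- mean(trees$Girth)

# attach() puts the columns on the search path, so Girth works directly.
attach(trees)
mean.girth.attached <- mean(Girth)

# detach() undoes the attach, which avoids name clashes later on.
detach(trees)

print(c(mean.girth, mean.girth.attached))
```

Both numbers should agree, since they come from the same column.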

While the online manual itself is probably the official documentation,
a better beginner's guide is the 85 page document,
        Notes on R:  A Programming Environment for Data Analysis and Graphics 
        by Bill Venables & Dave Smith
        ftp://franz.stat.wisc.edu/pub/R/doc/Rnotes.ps.gz
They are still converting this document, dated 1997, from S-Plus to R, 
so a few comments pertain only to S-Plus.

Note: Ripley and Venables, authors of the above two documents, produced the 
Springer-Verlag "Modern Applied Statistics with S-Plus".  All S-Plus and R 
authors refer to the standard texts "The New S Language" by Becker, R.A., 
Chambers, J.M., and Wilks, A.R., 1988; and "Statistical Models in S" by 
Chambers, J.M. and Hastie, T.J. eds, 1992.





Answers to Frequently Asked Questions can be found at
        http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html

You can subscribe to the mailing-list  r-help@stat.math.ethz.ch  
by sending the word "subscribe" in the "body" (NOT the "subject") to
        [EMAIL PROTECTED]
This mail-list runs about 10 messages a day.
An archive of them can be found at
        http://www.ci.tuwien.ac.at/R/doc/mail-archives/



#########################
R is Unix-like, so vi users can enter <esc><k> to go up to previous commands
and then do vi-like editing, just as they do in the "bash" shell.
When you become familiar with R, you will probably prefer to store data in what
R calls a "data frame", often using the function "read.table()".
For example, if you have the file /tmp/zz
        alpha,beta,gamma
        5,4,3
        2,1,10
        9,8,7
Then in R you can read in this data, with the first row as labels,
         aa <- read.table("/tmp/zz",header=TRUE,sep=",")
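
To try this without creating /tmp/zz by hand, here is a self-contained sketch.  
It writes the same four lines to a temporary file (tempfile() stands in for 
/tmp/zz, and the name zz is mine) and then looks at the resulting data frame.

```r
# Write the sample data to a temporary file, a stand-in for /tmp/zz.
zz <- tempfile()
writeLines(c("alpha,beta,gamma", "5,4,3", "2,1,10", "9,8,7"), zz)

# Read it back; header = TRUE takes column names from the first line.
aa <- read.table(zz, header = TRUE, sep = ",")

print(names(aa))   # column names taken from the header row
print(dim(aa))     # number of rows and columns
print(aa$gamma)    # one column, selected by name
print(aa[2, ])     # the second data row

unlink(zz)         # remove the temporary file
```

The header row becomes the column names, so the data frame holds three rows 
and three columns.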
You can change the decimals printed (though internal precision remains high) 
with
        options(digits=10)
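
A small sketch of the difference between printed and stored precision; the 
variable names x, short and long are only for illustration.

```r
x <- 1/3

# options() returns the previous settings, so they can be restored later.
old <- options(digits = 10)

short <- format(x, digits = 3)   # formatted with 3 significant digits
long  <- format(x)               # formatted with the current "digits" option

print(short)
print(long)

# The stored value is untouched by how it is displayed.
unchanged <- isTRUE(all.equal(3 * x, 1))
print(unchanged)

options(old)                     # put the earlier setting back
```

Saving the return value of options() is a convenient way to undo a temporary 
change.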
You can edit an existing R object "trees" with
        trees.new <- vi(trees)
or
        options(editor="vim")
        trees.new <- edit(trees)
On exiting, "R" optionally stores your data in 
        .RData
In this file resides the function .First, if you have defined one, which runs 
on "R" startup.
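
As a sketch of such a startup hook: R looks for a function named .First in 
the restored workspace and calls it.  The body below (setting digits and 
printing a greeting) is only my illustration of what one might put there.

```r
# Define a startup hook.  If it is saved in .RData on exit,
# R calls it the next time it starts in that directory.
.First <- function() {
    options(digits = 5)                  # a personal default, for illustration
    cat("Workspace restored\n")          # a greeting, for illustration
}

# Calling it by hand shows what would happen at startup.
.First()
```

Remove it again with rm(.First) if you no longer want it to run.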




So that you might get an immediate sense of usability, once installed,
start R with
        R
then at the R prompt,
        1/3
        sqrt(2*pi)
        aa <- c(1,3,9)  #"c" concatenate to create the vector (1,3,9)
        aa              # display the aa vector
        x <- rnorm(50)  #generates 50 pseudo-random numbers
        y <- rnorm(x)   #generates 50 more, since x has length 50
        plot(x,y)       #plots on a separate window
        help(plot)      #help on "plot"
        identify(x,y)   #then click points on the graph to identify them 
        data()          #lists available data sets with which to muck about 
        data(trees)     #includes a trees dataset like in B.D.Ripley 
        trees           #prints a data-frame for trees data
        attach(trees)   #make trees variables accessible without "trees$Volume"
        hist(Girth)     #histogram of variable Girth in trees data
        pairs(trees)    #plots all possible pairwise plots 
        plot(Girth,Volume)      #plot of Girth by Volume 
        dummy.results <- lm(Volume ~ Girth)     #linear regression
        summary(dummy.results)  #print results stored in dummy.results list
        q()             #quit
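
Separately from the session above, here is a sketch of pulling individual 
numbers out of the regression rather than reading them off the summary.  
It assumes the "trees" data that comes with R; the names fit, coefs, r2, 
new.tree and pred are mine, and the girth of 12 is an arbitrary value.

```r
data(trees)

# Passing data = trees avoids the need for attach().
fit <- lm(Volume ~ Girth, data = trees)

coefs <- coef(fit)               # intercept and slope
r2 <- summary(fit)$r.squared     # proportion of variance explained

# Predict the volume of a hypothetical tree with girth 12.
new.tree <- data.frame(Girth = 12)
pred <- predict(fit, newdata = new.tree)

print(coefs)
print(r2)
print(pred)
```

Using the data= argument keeps the workspace clean, which matters once you 
work with several data frames sharing column names.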





-- 
Jim Burt, NJ9L,         Fairfax, Virginia, USA
[EMAIL PROTECTED]       http://www.mnsinc.com/jameson
[EMAIL PROTECTED]

"It is not the shortcomings of others, nor what others have done or not
 done that one should think about, but what one has done or not done oneself."
--Dhammapada   ["dp" command for quotes from the Dhammapada, in Linux]



--
TO UNSUBSCRIBE FROM THIS MAILING LIST: e-mail the word "unsubscribe" to
[EMAIL PROTECTED] . 
Trouble?  e-mail to [EMAIL PROTECTED] .
