RE: [R] Comparison of SAS & R/Splus
Sorry - couldn't resist chipping in. Firstly, this sort of conversation has come up over and over again on the S-News list, so I'd look in the archives for more info.

My background: I was a SAS "statistical programmer" in the pharma industry before I joined Insightful (the S-PLUS guys). I now work at Mango Solutions (an independent consulting firm), so I don't feel I'm particularly biased towards S, R or SAS.

IN MY OPINION, the issue of "comparing" SAS to S/R is a strange one, because the technologies had very different upbringings. The tools are also aimed at different jobs, and the differences can really be summed up in one sentence: SAS is a data language; S/R are analysis/visualization languages.

On your point about when the different packages were "developed", I believe both languages were developed in the 70s. The real difference is when the systems were "commercialized": SAS in the 70s, S-PLUS in the 90s. That's where the difference in "time lines" comes from.

As far as comparisons are concerned, I would go as far as saying that you "CAN" do everything in S/R that you can do in SAS and vice-versa. The important factor is often "how easy" it is to do those things. For example, you can create extremely complex graphics in SAS using things like the SAS/GRAPH Annotate facility, but I wouldn't recommend it. This again comes back to the different aims of SAS/S/R: in my experience, data manipulation and basic reporting can be "easier" in SAS, while fitting stats models/creating graphics is far better in S/R.

So, why is SAS so heavily used in (say) the pharma industry nowadays? Firstly, ask yourself how long it would take to rewrite every SAS macro/application your company uses today, or how much has been invested in SAS training over the last 10-20 years. Very often, issues such as these dominate discussions about transferring to S/R/Whatever.

Having said this, I do know a number of pharma companies who are looking to move away from SAS. The key here is to take it slowly. There's no way you can just decide to switch from SAS to S/R overnight. The best way of going from SAS to S/R is to replace SAS modules gradually. For example, how about replacing SAS/GRAPH, SAS/STAT, SAS/IML, SAS/ASSIST, SAS/ACCESS and SAS/INSIGHT with S-PLUS or R? In most countries, this will save a good deal of money, which can be spent on training/consulting to ensure everyone can get up to speed with S/R.

As far as the FDA is concerned, they do not support any commercial software, and I know that SAS and S-PLUS are used there. As far as providing transport files is concerned, that doesn't particularly restrict the choice of package. Plus, I can't see this standard lasting - personally I believe that CDISC (or a version of it) will catch on. See www.cdisc.org for more info.

At the end of the day, SAS, S and R are all good technologies, and the "best one" to use depends completely on what needs to be achieved.

Sorry for the long rambling email ... I'd happily chat more about this with people "offline" ...

Cheers,
Rich.
Mango Solutions

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Thomas Lumley
Sent: 05 September 2003 14:49
To: Paul, David A
Cc: [EMAIL PROTECTED]
Subject: RE: [R] Comparison of SAS & R/Splus

On Fri, 5 Sep 2003, Paul, David A wrote:

>
> d. Because SAS is commercial software, a posteriori errors found in
> clinical trials analyses (and due to software issues) can be
> attributed by the NDA applicants to the SAS Institute.
> Lawyers really like this. Of course, Splus is also commercial
> and therefore does not suffer from criticism on these
> grounds.
>

Note, however, that one of the few specific pieces of guidance the FDA gives on software is that it isn't sufficient to place your trust in commercial off-the-shelf software (and of course this is reinforced by the software licensing terms).

-thomas

__ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
RE: [R] Comparison of SAS & R/Splus
On Fri, 5 Sep 2003, Paul, David A wrote:

>
> d. Because SAS is commercial software, a posteriori errors found in
> clinical trials analyses (and due to software issues) can be
> attributed by the NDA applicants to the SAS Institute.
> Lawyers really like this. Of course, Splus is also commercial
> and therefore does not suffer from criticism on these
> grounds.
>

Note, however, that one of the few specific pieces of guidance the FDA gives on software is that it isn't sufficient to place your trust in commercial off-the-shelf software (and of course this is reinforced by the software licensing terms).

-thomas
RE: [R] Comparison of SAS & R/Splus
On Fri, 5 Sep 2003, Brian D. Ripley wrote:

> In general I find such discussions irrelevant.
> I bet those users make far, far more errors than any
> of these packages do.

However, without having the discussions with my colleagues, nothing will ever change. The perception of SAS' "bestness" flows, in my experience, from several things:

a. It was developed long before Splus and R, so more people are familiar with it, especially managers and other decision-makers.

b. The FDA requires SAS transport version 5 datasets, and it is somewhat easier to use SAS throughout a clinical trial than to perform analyses in one package and convert data to another at the end.

c. Because SAS costs so much $$, it _must_ be good (dumb, but people do think that).

d. Because SAS is commercial software, a posteriori errors found in clinical trials analyses (and due to software issues) can be attributed by the NDA applicants to the SAS Institute. Lawyers really like this. Of course, Splus is also commercial and therefore does not suffer from criticism on these grounds.

It is a fact of life that building a better mousetrap does not guarantee that the "world will beat a path to your door". Marketing and perception are very important!

Part of my job involves defending choice of software, and since I'm swimming upstream by choosing to learn R, I need to have intelligent arguments to use when this choice is questioned. Given the responses to my original post, I now do have those arguments in hand. This merely confirms what is already obvious: this is an amazing listserv!

Respectfully,
david paul
Re: [R] Comparison of SAS & R/Splus
> On Thu, 4 Sep 2003, Paul, David A wrote:
>
>> I am one of only 5 or 6 people in my organization making the
>> effort to include R/Splus as an analysis tool in everyday work - the
>> rest of my colleagues use SAS exclusively.
>>
>> Today, one of them made the assertion that he believes the
>> numerical algorithms in SAS are superior to those in Splus
>> and R -- ie, optimization routines are faster in SAS, the SAS
>> Institute has teams of excellent numerical analysts that
>> ensure its superiority to anything freely available, PROC
>> NLMIXED is more flexible than nlme( ) in the sense that it
>> allows a much wider array of error structures than can be used
>> in R/Splus, &etc.
>
>
> While I don't subscribe to the general theory, they have a point about
> PROC NLMIXED. It does more accurate calculations for generalised linear
> mixed models than are currently available in R/S-PLUS, and for
> logistic random effects models the difference can sometimes be large
> enough to matter.

Yes. Except that I have access to several other codes for GLMM, and not infrequently the answers from NLMIXED are out of line with all of the others, and are sometimes just not credible. So 'more accurate', as far as I am concerned, remains to be proved for SAS.

In general I find such discussions irrelevant. I bet those users make far, far more errors than any of these packages do.

--
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
SUMMARY: [R] Comparison of SAS & R/Splus
My thanks to Drs. Armstrong, Bates, Harrell, Liaw, Lumley, Prager, Schwartz, and Mr. Wang for their replies. I have pasted my original message and their replies below.

After viewing http://www.itl.nist.gov/div898/strd/ as suggested by Dr. Schwartz, it occurred to me that it might be educational to search for some data repositories on Google. I was able to find some, though I'm sure many of the R listserv readers are already aware of them:

http://kdd.ics.uci.edu/
http://www.ics.uci.edu/~mlearn/MLOther.html
http://www.ldeo.columbia.edu/datarep/
http://data.geocomm.com/
http://libraries.mit.edu/gis/data/repository.html
http://nssdc.gsfc.nasa.gov/

-david paul

-Original Message-

I am one of only 5 or 6 people in my organization making the effort to include R/Splus as an analysis tool in everyday work - the rest of my colleagues use SAS exclusively.

Today, one of them made the assertion that he believes the numerical algorithms in SAS are superior to those in Splus and R -- ie, optimization routines are faster in SAS, the SAS Institute has teams of excellent numerical analysts that ensure its superiority to anything freely available, PROC NLMIXED is more flexible than nlme( ) in the sense that it allows a much wider array of error structures than can be used in R/Splus, &etc.

I obviously do not subscribe to these views and would like to refute them, but I am not a numerical analyst and am still a novice at R/Splus. Do there exist refereed papers comparing the numerical capabilities of these platforms? If not, are there other resources I might look up and pass along to my colleagues?

---

This link might give you some insight, but SAS is not one of the packages benchmarked here.

http://www.sciviews.org/other/benchmark.htm

[Whit Armstrong]

---

I don't have papers comparing the numerical capabilities but I say bunk to your colleagues. The last time I looked, SAS still relies on the out of date Gauss-Jordan sweep operator in many key places, in place of the QR decomposition that R and S-Plus use in regression. And SAS being closed source makes it impossible to see how it really does calculations in some cases.

See http://hesweb1.med.virginia.edu/biostat/s/doc/splus.pdf Section 1.6 for a comparison of S and SAS (though this doesn't address numerical reliability). Overall, SAS is about 11 years behind R and S-Plus in statistical capabilities (last year it was about 10 years behind) in my estimation.

Frank Harrell
SAS User, 1969-1991

---
Frank E Harrell Jr   Professor and Chair           School of Medicine
                     Department of Biostatistics   Vanderbilt University

---

Too bad your colleagues weren't at the "State of Statistical Software" session at JSM. I was there. It was so packed that people ran out of standing room. The three speakers are all R advocates (Jan De Leeuw, Luke Tierney and Duncan Temple Lang). The most interesting thing (to me) about the session is that the discussant is a person from SAS (first name Wolfgang). I just had to hear what he'd say.

The SAS person essentially said that the numerical accuracy of R (probability functions, especially) is unmatched because the routines were written by authority figures in the area. (That's one advantage he said R has, but he also said that because the code is open, even SAS is looking at the R source, and that, to him, is a disadvantage. He obviously missed the point of open source.)

One of the criticisms he had for R, compared to SAS, is that R may not have undergone extensive QA tests. He said that SAS now probably has only a handful of PROC developers (not exactly the "team" your colleague imagined), but 5-6 times more software testers.

I think hearing it from the horse's mouth beats reading articles in a journal for this sort of thing.

There was a recent article in The American Statistician bashing the numerical instability and bad quality of the RNG in JMP (a SAS product). SAS posted a "white paper" on their web site refuting some of those claims (but they did change the RNG to Mersenne Twister in JMP5), comparing JMP with Excel and SAS. I must say that comparison isn't convincing, as neither Excel nor SAS can really be trusted as a gold standard.

Andy [Liaw]

---

In follow up to Frank's reply, allow me to point you to some additional papers and articles on numerical accuracy issues. I have not reviewed these in some time and they may be a bit dated relative to current versions. These do not cover R specifically, but do address S-Plus and SAS. This is not an exhaustive list by any means, but many of the papers do have other references that may be of value.

1. http://www.stat.uni-muenchen.de/~knuesel/elv/accuracy.html
2. http://www.amstat.org/publications/tas/mccull-1.pdf
3. http://www.amstat.org/publications/tas/mccull.pdf
4. http://www.npl.
Re: [R] Comparison of SAS & R/Splus
On Thu, 4 Sep 2003, Paul, David A wrote:

> I am one of only 5 or 6 people in my organization making the
> effort to include R/Splus as an analysis tool in everyday work -
> the rest of my colleagues use SAS exclusively.
>
> Today, one of them made the assertion that he believes the
> numerical algorithms in SAS are superior to those in Splus
> and R -- ie, optimization routines are faster in SAS, the SAS
> Institute has teams of excellent numerical analysts that
> ensure its superiority to anything freely available, PROC
> NLMIXED is more flexible than nlme( ) in the sense that it
> allows a much wider array of error structures than can be used
> in R/Splus, &etc.

While I don't subscribe to the general theory, they have a point about PROC NLMIXED. It does more accurate calculations for generalised linear mixed models than are currently available in R/S-PLUS, and for logistic random effects models the difference can sometimes be large enough to matter.

-thomas
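[Editorial sketch] The accuracy question for logistic random-effects models comes down to how the marginal likelihood, an integral over the random effect, is approximated. Elsewhere in this thread PROC NLMIXED is described as using adaptive Gaussian quadrature for that integral. The Python sketch below illustrates the general idea only: it uses a fixed (non-adaptive) 3-point Gauss-Hermite rule for one cluster of a random-intercept logistic model, whereas an adaptive rule would recentre and rescale the nodes for each cluster. The function names and the toy data are assumptions made for illustration, and nothing here reproduces what PROC NLMIXED or nlme actually do internally.

```python
import math

# 3-point Gauss-Hermite rule for integrals of exp(-x^2) * f(x).
# These are the standard closed-form nodes and weights.
NODES = (-math.sqrt(1.5), 0.0, math.sqrt(1.5))
WEIGHTS = (
    math.sqrt(math.pi) / 6,
    2 * math.sqrt(math.pi) / 3,
    math.sqrt(math.pi) / 6,
)


def inv_logit(eta):
    """Logistic function mapping a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def conditional_likelihood(ys, xs, beta0, beta1, b):
    """Bernoulli likelihood of one cluster given its random intercept b."""
    lik = 1.0
    for y, x in zip(ys, xs):
        p = inv_logit(beta0 + beta1 * x + b)
        lik *= p if y == 1 else 1.0 - p
    return lik


def marginal_likelihood(ys, xs, beta0, beta1, sigma):
    """Integrate out b ~ N(0, sigma^2) with Gauss-Hermite quadrature.

    The substitution b = sqrt(2) * sigma * x turns the normal density
    into exp(-x^2) / sqrt(pi), which is the form the rule expects.
    """
    total = 0.0
    for node, weight in zip(NODES, WEIGHTS):
        b = math.sqrt(2.0) * sigma * node
        total += weight * conditional_likelihood(ys, xs, beta0, beta1, b)
    return total / math.sqrt(math.pi)


# Hypothetical toy cluster: three binary responses and one covariate.
ys = [1, 0, 1]
xs = [0.0, 1.0, 2.0]
print(marginal_likelihood(ys, xs, beta0=0.2, beta1=-0.1, sigma=1.0))
```

With only three fixed nodes the approximation is crude when sigma is large or clusters are big, which is the kind of situation where the choice of integration method can change the estimates noticeably.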
Re: [R] Comparison of SAS & R/Splus
Douglas Bates <[EMAIL PROTECTED]> writes:

> McCullough, B. D. (1998), "Assessing the reliability of statistical
> software: Part I", The American Statistician, 52, 149-159.
>
> McCullough, B. D. (1999), "Assessing the reliability of statistical
> software: Part II", The American Statistician, 53, 358-366.

In my cutting-and-pasting I got those page numbers backwards. The 1998 article is on pages 358-366 and the 1999 one is on pages 149-159.
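[Editorial sketch] Accuracy comparisons of this kind, including checks against the NIST reference datasets mentioned elsewhere in the thread, are commonly scored with the log relative error (LRE), which is roughly the number of significant digits a package gets right. The short Python sketch below shows the calculation. The function name is an assumption, and the numbers in the usage lines are made up for illustration; they are not certified NIST values.

```python
import math


def log_relative_error(computed, certified):
    """Approximate number of correct significant digits in `computed`."""
    if computed == certified:
        return float("inf")  # exact agreement
    if certified == 0:
        # Relative error is undefined here, so fall back to the
        # log absolute error instead.
        return -math.log10(abs(computed))
    return -math.log10(abs(computed - certified) / abs(certified))


# Hypothetical values, for illustration only.
certified_value = 1.2346
package_value = 1.2345
print(log_relative_error(package_value, certified_value))
```

Running the same calculation for each package's estimate against the certified value gives a like-for-like accuracy score, which is one way to run your own comparison rather than relying on reputation.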
Re: [R] Comparison of SAS & R/Splus
"Paul, David A" <[EMAIL PROTECTED]> writes:

> I am one of only 5 or 6 people in my organization making the
> effort to include R/Splus as an analysis tool in everyday work -
> the rest of my colleagues use SAS exclusively.
>
> Today, one of them made the assertion that he believes the
> numerical algorithms in SAS are superior to those in Splus
> and R -- ie, optimization routines are faster in SAS, the SAS
> Institute has teams of excellent numerical analysts that
> ensure its superiority to anything freely available, PROC
> NLMIXED is more flexible than nlme( ) in the sense that it
> allows a much wider array of error structures than can be used
> in R/Splus, &etc.
>
> I obviously do not subscribe to these views and would like
> to refute them, but I am not a numerical analyst and am still
> a novice at R/Splus. Do there exist refereed papers comparing the
> numerical capabilities of these platforms? If not, are there
> other resources I might look up and pass along to my colleagues?

Although they are out of date, there are some comparisons of accuracy in

McCullough, B. D. (1998), "Assessing the reliability of statistical software: Part I", The American Statistician, 52, 149-159.

McCullough, B. D. (1999), "Assessing the reliability of statistical software: Part II", The American Statistician, 53, 358-366.

Regarding PROC NLMIXED versus nlme, there are a lot of differences between them. I don't think that PROC NLMIXED will handle nested random effects while nlme does. However, nlme assumes the underlying noise is Gaussian while PROC NLMIXED allows Gaussian or binomial or Poisson. PROC NLMIXED uses adaptive Gaussian quadrature to evaluate the marginal log-likelihood whereas nlme uses a less accurate evaluation but better parameterizations of the variance of the random effects. I think it would be difficult to declare one to be superior to the other.
Re: [R] Comparison of SAS & R/Splus
Paul, David A wrote:

> I am one of only 5 or 6 people in my organization making the effort to include R/Splus as an analysis tool in everyday work - the rest of my colleagues use SAS exclusively.
>
> Today, one of them made the assertion that he believes the numerical algorithms in SAS are superior to those in Splus and R -- ie, optimization routines are faster in SAS, the SAS Institute has teams of excellent numerical analysts that ensure its superiority to anything freely available, PROC NLMIXED is more flexible than nlme( ) in the sense that it allows a much wider array of error structures than can be used in R/Splus, &etc.
>
> I obviously do not subscribe to these views and would like to refute them, but I am not a numerical analyst and am still a novice at R/Splus. Do there exist refereed papers comparing the numerical capabilities of these platforms? If not, are there other resources I might look up and pass along to my colleagues?

I suspect it will be difficult to find the answer to your colleagues' assertions without doing your own studies. How important is it to you to settle this disagreement? One could always name the many leading statisticians who contribute to R, but I don't think that name dropping settles anything.

Nonetheless, even if SAS were faster, that would be only part of the issue. As you know, R offers vastly better exploratory graphics, better graphics overall, far more flexible programming, user extensibility, and more natural programming access to the results of previous computations. So even if your colleagues were right in their assertions, they would be overlooking many capabilities of the S language that are not readily available in SAS.

IMO, SAS shines in its ability to read files in almost any format, to handle gigantic data sets without burping, and to produce formatted cross-tabulations and other highly structured text reports. However, if your colleagues work at all in data exploration, they are ignoring important tools by not exploring R or S-Plus.

--
Michael Prager, Ph.D.
NOAA Center for Coastal Fisheries and Habitat Research
Beaufort, North Carolina 28516
http://shrimp.ccfhrb.noaa.gov/~mprager/
DISCLAIMER: Opinions expressed are personal, not official. N...{{dropped}}
Re: [R] Comparison of SAS & R/Splus
On Thu, 4 Sep 2003, Paul, David A wrote:

> I am one of only 5 or 6 people in my organization making the
> effort to include R/Splus as an analysis tool in everyday work -
> the rest of my colleagues use SAS exclusively.
>
> Today, one of them made the assertion that he believes the
> numerical algorithms in SAS are superior to those in Splus
> and R -- ie, optimization routines are faster in SAS, the SAS

I can't speak for the optimisation routines, but I have found this...

When I was doing my MSc thesis, using tree-based models and neural networks for classification, I discovered something interesting. In SAS Enterprise Miner (SAS EM), the Tree Node is far more efficient than the rpart package. Using the same (or at least very similar) parameter settings, SAS EM can produce a tree in about 1 minute while it would take rpart 5 ~ 6 minutes (same data, same machine). Having said that, I still prefer rpart as it can draw a beautiful tree, whereas it is very difficult to fit the graphical tree produced by SAS EM onto one A4 page -- in the end I had to use the text tree.

However, the Neural Network node in SAS EM is less efficient than nnet. Fitting a neural network in R using nnet takes much less time.

--
Cheers,

Kevin

--
"On two occasions, I have been asked [by members of Parliament], 'Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?' I am not able to rightly apprehend the kind of confusion of ideas that could provoke such a question."
-- Charles Babbage (1791-1871)
From Computer Stupidities: http://rinkworks.com/stupid/

--
Ko-Kang Kevin Wang
Master of Science (MSc) Student
SLC Tutor and Lab Demonstrator
Department of Statistics
University of Auckland
New Zealand
Homepage: http://www.stat.auckland.ac.nz/~kwan022
Ph: 373-7599 x88475 (City) x88480 (Tamaki)
Re: [R] Comparison of SAS & R/Splus
On Thu, 2003-09-04 at 08:34, Frank E Harrell Jr wrote:

> On Thu, 04 Sep 2003 14:50:25 -0400
> "Paul, David A" <[EMAIL PROTECTED]> wrote:
>
> > I am one of only 5 or 6 people in my organization making the
> > effort to include R/Splus as an analysis tool in everyday work -
> > the rest of my colleagues use SAS exclusively.
> >
> > Today, one of them made the assertion that he believes the
> > numerical algorithms in SAS are superior to those in Splus
> > and R -- ie, optimization routines are faster in SAS, the SAS
> > Institute has teams of excellent numerical analysts that
> > ensure its superiority to anything freely available, PROC
> > NLMIXED is more flexible than nlme( ) in the sense that it
> > allows a much wider array of error structures than can be used
> > in R/Splus, &etc.
> >
> > I obviously do not subscribe to these views and would like
> > to refute them, but I am not a numerical analyst and am still
> > a novice at R/Splus. Do there exist refereed papers comparing the
> > numerical capabilities of these platforms? If not, are there
> > other resources I might look up and pass along to my colleagues?
> >
> > Much thanks in advance,
> >
> > david paul
>
> I don't have papers comparing the numerical capabilities but I say
> bunk to your colleagues. The last time I looked, SAS still relies on
> the out of date Gauss-Jordan sweep operator in many key places, in
> place of the QR decomposition that R and S-Plus use in regression.
> And SAS being closed source makes it impossible to see how it really
> does calculations in some cases.
>
> See http://hesweb1.med.virginia.edu/biostat/s/doc/splus.pdf Section
> 1.6 for a comparison of S and SAS (though this doesn't address
> numerical reliability). Overall, SAS is about 11 years behind R and
> S-Plus in statistical capabilities (last year it was about 10 years
> behind) in my estimation.
>
> Frank Harrell
> SAS User, 1969-1991

In follow up to Frank's reply, allow me to point you to some additional papers and articles on numerical accuracy issues. I have not reviewed these in some time and they may be a bit dated relative to current versions. These do not cover R specifically, but do address S-Plus and SAS. This is not an exhaustive list by any means, but many of the papers do have other references that may be of value.

1. http://www.stat.uni-muenchen.de/~knuesel/elv/accuracy.html
2. http://www.amstat.org/publications/tas/mccull-1.pdf
3. http://www.amstat.org/publications/tas/mccull.pdf
4. http://www.npl.co.uk/ssfm/download/documents/cmsc06_00.pdf

Another option is that NIST has reference datasets available for comparison at:

http://www.itl.nist.gov/div898/strd/

These would allow you to conduct your own comparisons if you desire.

HTH,

Marc Schwartz
(Also a former SAS user)
Re: [R] Comparison of SAS & R/Splus
On Thu, 04 Sep 2003 14:50:25 -0400 "Paul, David A" <[EMAIL PROTECTED]> wrote:

> I am one of only 5 or 6 people in my organization making the
> effort to include R/Splus as an analysis tool in everyday work -
> the rest of my colleagues use SAS exclusively.
>
> Today, one of them made the assertion that he believes the
> numerical algorithms in SAS are superior to those in Splus
> and R -- ie, optimization routines are faster in SAS, the SAS
> Institute has teams of excellent numerical analysts that
> ensure its superiority to anything freely available, PROC
> NLMIXED is more flexible than nlme( ) in the sense that it
> allows a much wider array of error structures than can be used
> in R/Splus, &etc.
>
> I obviously do not subscribe to these views and would like
> to refute them, but I am not a numerical analyst and am still
> a novice at R/Splus. Do there exist refereed papers comparing the
> numerical capabilities of these platforms? If not, are there
> other resources I might look up and pass along to my colleagues?
>
> Much thanks in advance,
>
> david paul

I don't have papers comparing the numerical capabilities but I say bunk to your colleagues. The last time I looked, SAS still relies on the out of date Gauss-Jordan sweep operator in many key places, in place of the QR decomposition that R and S-Plus use in regression. And SAS being closed source makes it impossible to see how it really does calculations in some cases.

See http://hesweb1.med.virginia.edu/biostat/s/doc/splus.pdf Section 1.6 for a comparison of S and SAS (though this doesn't address numerical reliability). Overall, SAS is about 11 years behind R and S-Plus in statistical capabilities (last year it was about 10 years behind) in my estimation.

Frank Harrell
SAS User, 1969-1991

---
Frank E Harrell Jr   Professor and Chair           School of Medicine
                     Department of Biostatistics   Vanderbilt University
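[Editorial sketch] For readers unfamiliar with the two regression approaches contrasted above, the Python sketch below fits the same least-squares problem both ways. The sweep operator works on the cross-product matrix X'X, whose condition number is the square of that of X, while QR works on X directly; that is the usual numerical argument for preferring QR. This is an illustration under stated assumptions, not the code of SAS, R, or S-Plus: the function names are made up, the sweep uses one common sign convention, and modified Gram-Schmidt is swapped in for the Householder-based QR that production routines typically use, purely to keep the example short.

```python
import math


def cross_products(X, y):
    """Augmented cross-product matrix [[X'X, X'y], [y'X, y'y]]."""
    Z = [list(row) + [yi] for row, yi in zip(X, y)]
    m = len(Z[0])
    return [[sum(r[i] * r[j] for r in Z) for j in range(m)] for i in range(m)]


def sweep(A, k):
    """Sweep the symmetric matrix A on pivot k and return a new matrix."""
    d = A[k][k]
    n = len(A)
    B = [[A[i][j] - A[i][k] * A[k][j] / d for j in range(n)] for i in range(n)]
    for i in range(n):
        B[i][k] = A[i][k] / d
        B[k][i] = A[k][i] / d
    B[k][k] = -1.0 / d
    return B


def fit_by_sweep(X, y):
    """Least squares via sweeping the first p pivots of the cross-products."""
    p = len(X[0])
    A = cross_products(X, y)
    for k in range(p):
        A = sweep(A, k)
    # After the sweeps, the last column holds the coefficients and the
    # bottom-right corner holds the residual sum of squares.
    return [A[i][p] for i in range(p)], A[p][p]


def fit_by_qr(X, y):
    """Least squares via modified Gram-Schmidt QR applied to X itself."""
    n, p = len(X), len(X[0])
    Q = [[X[i][j] for i in range(n)] for j in range(p)]  # columns of X
    R = [[0.0] * p for _ in range(p)]
    for j in range(p):
        R[j][j] = math.sqrt(sum(v * v for v in Q[j]))
        Q[j] = [v / R[j][j] for v in Q[j]]
        for k in range(j + 1, p):
            R[j][k] = sum(a * b for a, b in zip(Q[j], Q[k]))
            Q[k] = [b - R[j][k] * a for a, b in zip(Q[j], Q[k])]
    qty = [sum(q * yi for q, yi in zip(Q[j], y)) for j in range(p)]
    beta = [0.0] * p
    for j in reversed(range(p)):  # back-substitution on R * beta = Q'y
        s = qty[j] - sum(R[j][k] * beta[k] for k in range(j + 1, p))
        beta[j] = s / R[j][j]
    return beta


# Toy data lying exactly on y = 1 + 2x (intercept column, then x).
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]
print(fit_by_sweep(X, y))
print(fit_by_qr(X, y))
```

On well-conditioned data like this both routes should recover an intercept of 1 and a slope of 2 up to rounding. Any difference between them would be expected to show up only when the columns of X are nearly collinear.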
[R] Comparison of SAS & R/Splus
I am one of only 5 or 6 people in my organization making the effort to include R/Splus as an analysis tool in everyday work - the rest of my colleagues use SAS exclusively.

Today, one of them made the assertion that he believes the numerical algorithms in SAS are superior to those in Splus and R -- ie, optimization routines are faster in SAS, the SAS Institute has teams of excellent numerical analysts that ensure its superiority to anything freely available, PROC NLMIXED is more flexible than nlme( ) in the sense that it allows a much wider array of error structures than can be used in R/Splus, &etc.

I obviously do not subscribe to these views and would like to refute them, but I am not a numerical analyst and am still a novice at R/Splus. Do there exist refereed papers comparing the numerical capabilities of these platforms? If not, are there other resources I might look up and pass along to my colleagues?

Much thanks in advance,

david paul