Re: [R] How to make R running faster

Robert A LaBudde Wed, 28 May 2008 09:19:31 -0700

At 10:25 AM 5/28/2008, Esmail Bonakdarian wrote:

Erin Hodgess wrote:

I remember reading that colSums and colMeans were better, when you need
sums and means


Well .. I'm waiting for the experts to jump in and give us the
straight story on this :-)

All of the algorithms are represented internally by sequential program logic using C or Fortran, for example. So the issue isn't the algorithm itself. Instead, it's where the algorithm is implemented.

However, R is an interpreter, not a compiler. This means that it parses and evaluates each line of R code at run time, working out what is wanted and checking for errors in syntax and data classes as it goes. Interpreters are very slow compared to compiled code, where the lines have been translated ahead of time into machine code with the error checking already resolved.

For example, a simple "for" loop iteration might take only 0.1 microsecond in a compiled program, but 20-100 microseconds in an interpreted program.
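If you want to see the per-iteration cost on your own machine, here is a rough sketch. The loop body and the count n are my own choices for illustration. Recent versions of R byte-compile loops, so expect a figure well below the estimates quoted above; the exact number depends on hardware and R version.

```r
# Rough per-iteration cost of an R-level for loop (illustrative only).
n <- 1e6
total <- 0
elapsed <- system.time(
  for (i in seq_len(n)) {
    total <- total + i
  }
)[["elapsed"]]

# Convert elapsed seconds to microseconds per iteration.
per_iter_us <- 1e6 * elapsed / n
cat("approx. microseconds per iteration:", per_iter_us, "\n")
```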

This per-line overhead can be amortized by the function calls inside each line. If a compiled function processes a large number of cases in one call, then the roughly 50 microsecond overhead per call is spread over all of them and becomes negligible.
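A small sketch of that dilution effect, assuming sqrt() as the compiled function and a vector length I picked arbitrarily. The first version pays the call overhead once per element; the second pays it once for the whole vector.

```r
x <- runif(1e5)

# Many calls, one element each: the per-call overhead is paid 1e5 times.
t_many <- system.time({
  out_many <- numeric(length(x))
  for (i in seq_along(x)) {
    out_many[i] <- sqrt(x[i])
  }
})[["elapsed"]]

# One call on the whole vector: the overhead is paid once.
t_one <- system.time(out_one <- sqrt(x))[["elapsed"]]

# Both approaches give the same answer; only the run time differs.
print(c(many_calls = t_many, one_call = t_one))
```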

R is a vectorized (array-processing) language. If you use vectors and arrays and the built-in (i.e., compiled) function calls, you get maximum use of the compiled programs and minimum use of the interpreted program.

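This is why functions such as colMeans() are faster than writing direct loops in R. (apply() is more compact than an explicit loop, but it still loops at the R level, so it gains much less.) For large arrays the speedup can be very large, on the order of 200-1000x.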
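To check this yourself, a sketch along these lines compares an explicit double loop, apply(), and colMeans() on the same matrix. The matrix size and the helper name loop_col_means are made up for the example, and I make no promise about the ratios you will see; they vary with machine and R version.

```r
set.seed(1)
m <- matrix(rnorm(2000 * 500), nrow = 2000, ncol = 500)

# Hypothetical helper: column means with explicit R-level loops.
loop_col_means <- function(m) {
  out <- numeric(ncol(m))
  for (j in seq_len(ncol(m))) {
    s <- 0
    for (i in seq_len(nrow(m))) {
      s <- s + m[i, j]
    }
    out[j] <- s / nrow(m)
  }
  out
}

t_loop  <- system.time(r_loop  <- loop_col_means(m))[["elapsed"]]
t_apply <- system.time(r_apply <- apply(m, 2, mean))[["elapsed"]]
t_fast  <- system.time(r_fast  <- colMeans(m))[["elapsed"]]

# Elapsed seconds for each approach.
print(c(loop = t_loop, apply = t_apply, colMeans = t_fast))
```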

Interpreted languages are very convenient to use, as they do instant error checking and are very interactive. No overhead of learning and using compilers and linkers. But they are very slow on complex calculations. This is why the array processing is stuffed into compiled functions. The best of both worlds then.

Interpreted languages include R, MatLab, Gauss and others. (Java is a hybrid: it is compiled to bytecode, which a virtual machine then runs.) Compiled languages include C and Fortran. Some, like variants of BASIC, can be interpreted, line-compiled or compiled, depending upon implementation. Some compiled languages (such as Fortran) can allow parallel processing via multiprocessing on multiple CPUs, which speeds things up even more. Compilers also typically optimize code for the target machine, which can speed things up by a factor of 2 or so.

So the general rule for R is: If you are annoyed at processing time, alter your program to maximize calculations within compiled functions (i.e., "vectorize" your program to process an entire array at one time) and minimize the number of lines of R.
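As one illustration of what "vectorizing" looks like in practice, here is a sketch that clips values to a range, first element by element and then with whole-vector calls. The task and the names clip_loop and clip_vec are hypothetical, chosen only to show the rewrite.

```r
set.seed(42)
x <- rnorm(1e5)

# Loop version: one interpreted iteration per element.
clip_loop <- function(x, lo, hi) {
  out <- numeric(length(x))
  for (i in seq_along(x)) {
    if (x[i] < lo) {
      out[i] <- lo
    } else if (x[i] > hi) {
      out[i] <- hi
    } else {
      out[i] <- x[i]
    }
  }
  out
}

# Vectorized version: two calls to compiled functions handle the whole vector.
clip_vec <- function(x, lo, hi) {
  pmin(pmax(x, lo), hi)
}

r_loop <- clip_loop(x, -1, 1)
r_vec  <- clip_vec(x, -1, 1)
```

The vectorized form is also shorter, which fits the rule above: fewer lines of R executed, more work done per call.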


================================================================
Robert A. LaBudde, PhD, PAS, Dpl. ACAFS  e-mail: [EMAIL PROTECTED]
Least Cost Formulations, Ltd.            URL: http://lcfltd.com/
824 Timberlake Drive                     Tel: 757-467-0954
Virginia Beach, VA 23464-3239            Fax: 757-467-2947

"Vere scire est per causas scire"

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
