At 10:25 AM 5/28/2008, Esmail Bonakdarian wrote:
> Erin Hodgess wrote:
>> I remember reading that colSums and colMeans were better when you need
>> sums and means
>
> Well .. I'm waiting for the experts to jump in and give us the
> straight story on this :-)

All of these algorithms are ultimately implemented as sequential program logic, in C or Fortran for example. So the issue isn't the algorithm itself. It's where the loop runs: inside compiled code or in interpreted R code.

However, R is an interpreter, not a compiler. This means that your R code is parsed and then evaluated step by step at run time, with the interpreter working out what is wanted and checking syntax, arguments and data classes as it goes. Interpreters are very slow compared to compiled code, where the source has already been translated to machine code and most of the error checking has been resolved ahead of time.

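For example, a simple "for" loop iteration might take only about 0.1 microsecond in a compiled program, but 20-100 microseconds in an interpreted program. (These are rough orders of magnitude; actual figures depend on the machine and on what the loop body does.)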

This overhead of interpreting each line can be amortized by the function calls inside each line. If a compiled function processes a large number of cases in one call, then the overhead per call (say 50 microseconds) is spread over all of those cases and becomes negligible.
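A minimal sketch of that dilution effect, using only base R: the same square roots are computed once with one sqrt() call per element and once with a single call on the whole vector. The function names per_element() and whole_vector() are made up for illustration, and the timings will vary with the machine and the R version.

```r
# Compare many small calls against one call on a whole vector.
n <- 1e5
x <- runif(n)

# One interpreted call to sqrt() per element:
# the per-call overhead is paid n times.
per_element <- function(x) {
  out <- numeric(length(x))
  for (i in seq_along(x)) {
    out[i] <- sqrt(x[i])
  }
  out
}

# One call on the entire vector: the overhead is paid once and the
# loop over the elements runs inside compiled code.
whole_vector <- function(x) sqrt(x)

t_loop <- system.time(r1 <- per_element(x))
t_vec  <- system.time(r2 <- whole_vector(x))

# Both approaches give the same numbers; only the elapsed time differs.
stopifnot(isTRUE(all.equal(r1, r2)))
cat("loop:      ", t_loop[["elapsed"]], "s\n")
cat("vectorized:", t_vec[["elapsed"]], "s\n")
```

Increasing n should widen the gap, since the loop version pays the interpreter overhead once per element.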

R is a vector-oriented (array) language. If you use vectors and arrays and the built-in (i.e., compiled) function calls, you get maximum use of the compiled programs and minimum use of the interpreted program.

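This is why functions such as colMeans() are faster than writing explicit loops in R. (apply() is mostly a convenience: it loops in R code internally, so it is usually not much faster than a well-written "for" loop.) Moving an inner loop into a compiled function such as colMeans() can speed things up by a couple of orders of magnitude for large arrays.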
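To see this on your own machine, here is a rough comparison in base R of three ways to get column means: an explicit double loop, apply() with mean, and colMeans(). The matrix size and the helper name loop_col_means() are arbitrary choices for the sketch, and the actual ratios depend on the R version and hardware.

```r
set.seed(1)
m <- matrix(rnorm(2000 * 500), nrow = 2000, ncol = 500)

# Explicit R loops over columns and rows: every addition is interpreted.
loop_col_means <- function(m) {
  out <- numeric(ncol(m))
  for (j in seq_len(ncol(m))) {
    total <- 0
    for (i in seq_len(nrow(m))) {
      total <- total + m[i, j]
    }
    out[j] <- total / nrow(m)
  }
  out
}

t_loop  <- system.time(by_loop  <- loop_col_means(m))
t_apply <- system.time(by_apply <- apply(m, 2, mean))
t_cm    <- system.time(by_cm    <- colMeans(m))

# All three agree up to floating-point rounding.
stopifnot(isTRUE(all.equal(by_loop, by_apply)),
          isTRUE(all.equal(by_apply, by_cm)))

cat("double loop:", t_loop[["elapsed"]], "s\n")
cat("apply():    ", t_apply[["elapsed"]], "s\n")
cat("colMeans(): ", t_cm[["elapsed"]], "s\n")
```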

Interpreted languages are very convenient to use, as they do instant error checking and are very interactive. No overhead of learning and using compilers and linkers. But they are very slow on complex calculations. This is why the array processing is stuffed into compiled functions. The best of both worlds then.

Interpreted languages include R, MATLAB, Gauss and others. Compiled languages include C and Fortran. Some sit in between: Java is compiled to bytecode that a virtual machine then interprets or compiles on the fly, and variants of BASIC can be interpreted, line-compiled or compiled, depending upon implementation. Some compiled languages (such as Fortran) can allow parallel processing via multiprocessing on multiple CPUs, which speeds things up even more. Compilers also typically optimize code for the target machine, which can speed things up by a factor of 2 or so.

So the general rule for R is: If you are annoyed at processing time, alter your program to maximize calculations within compiled functions (i.e., "vectorize" your program to process an entire array at one time) and minimize the number of R statements that get executed.
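As a small illustration of what "vectorizing" means in practice, here is a made-up task (sum the values above a threshold) written both ways. The function names and the data are hypothetical; the point is only that the second version replaces the R-level loop with a few calls that each work on the whole vector.

```r
# Loop version: one interpreted comparison and addition per element.
sum_above_loop <- function(x, threshold) {
  total <- 0
  for (v in x) {
    if (v > threshold) {
      total <- total + v
    }
  }
  total
}

# Vectorized version: the comparison, the subsetting and the sum
# each run over the whole vector inside compiled code.
sum_above_vec <- function(x, threshold) {
  sum(x[x > threshold])
}

x <- c(3, 8, 1, 9, 4)
res_loop <- sum_above_loop(x, 3.5)
res_vec  <- sum_above_vec(x, 3.5)

# Both give 8 + 9 + 4 = 21.
stopifnot(res_loop == 21, res_vec == 21)
```

On a five-element vector the difference in run time is invisible; it only starts to matter once x has many thousands of elements.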

================================================================
Robert A. LaBudde, PhD, PAS, Dpl. ACAFS  e-mail: [EMAIL PROTECTED]
Least Cost Formulations, Ltd.            URL: http://lcfltd.com/
824 Timberlake Drive                     Tel: 757-467-0954
Virginia Beach, VA 23464-3239            Fax: 757-467-2947

"Vere scire est per causas scire"
