At 10:25 AM 5/28/2008, Esmail Bonakdarian wrote:
>Erin Hodgess wrote:
>> I remember reading that colSums and colMeans were better, when you
>> need sums and means
>Well .. I'm waiting for the experts to jump in and give us the
>straight story on this :-)
Internally, all of these algorithms come down to sequential program
logic, written in C or Fortran, for example. So the issue isn't the
algorithm itself but where it is implemented: in compiled code, or in
R code that must be interpreted.
However, R is interpreted, not compiled. R first parses each line of
code into an internal expression, checking the syntax, and then
evaluates that expression at run time, looking up functions, matching
arguments and checking data classes every time the line is executed.
Interpreted code is very slow compared to compiled code, which has
already been translated into machine code, with much of that checking
resolved before the program runs.
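R exposes the two steps an interpreter goes through as ordinary functions, parse() and eval(), so they can be watched directly. The sketch below is only an illustration; the expression text and the environment `env` are made up for the example. It shows that source text becomes an unevaluated call tree first, and that names such as `sum`, `*` and `x` are only resolved when the tree is evaluated.

```r
# Step 1: parse() turns source text into an unevaluated expression (a call tree).
expr <- parse(text = "y <- sum(x * 2)")[[1]]

# The expression is a call; its parts can be inspected without running it.
print(is.call(expr))   # TRUE
print(as.list(expr))   # the function `<-`, the target `y`, and the call sum(x * 2)

# Step 2: eval() walks the tree, looking up `sum`, `*` and `x` at run time.
env <- new.env()
assign("x", 1:5, envir = env)
eval(expr, envir = env)
print(get("y", envir = env))  # 30
```

Inside a loop, the parse step happens once, but the evaluation step is repeated on every iteration, and that repeated work is where the interpretive cost accumulates.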
For example, a simple "for" loop iteration might take only 0.1
microsecond in a compiled program, but 20-100 microseconds in an
interpreted program.
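A rough way to estimate the per-iteration cost on your own machine is to time an explicit loop against the built-in sum(). This is a sketch, not a benchmark; `loop_sum` is a hypothetical helper. The numbers will vary by machine and R version, and since R 3.4.0 functions are byte-compiled just in time by default, so the gap is typically smaller today than the figures quoted above.

```r
set.seed(123)
n <- 1e6
x <- runif(n)

# Explicit loop: the interpreter evaluates the loop body once per element.
loop_sum <- function(v) {
  total <- 0
  for (i in seq_along(v)) {
    total <- total + v[i]
  }
  total
}

t_loop <- system.time(s_loop <- loop_sum(x))[["elapsed"]]
# Built-in sum(): a single call into compiled C code.
t_vec <- system.time(s_vec <- sum(x))[["elapsed"]]

cat("loop:", t_loop, "s   sum():", t_vec, "s\n")
cat("approx. microseconds per loop iteration:", 1e6 * t_loop / n, "\n")
```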
This interpretive overhead is paid once per line executed (roughly,
once per function call), regardless of how much work that call does.
If a compiled function processes a large number of cases in one call,
the roughly 50-microsecond overhead for that call is spread across all
of them and becomes negligible.
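The dilution effect can be seen by doing the same work with one call per element versus one call for the whole vector. In this sketch (`per_element` is a hypothetical helper), both versions compute square roots with the same compiled sqrt(); the only difference is how many times the interpreter has to set up a call.

```r
set.seed(1)
x <- runif(1e5)

# One call per element: the interpreter's per-call overhead is paid 1e5 times.
per_element <- function(v) {
  out <- numeric(length(v))
  for (i in seq_along(v)) out[i] <- sqrt(v[i])
  out
}

t_many <- system.time(y_many <- per_element(x))[["elapsed"]]
# One call for all elements: the overhead is paid once; compiled code does the rest.
t_one <- system.time(y_one <- sqrt(x))[["elapsed"]]

cat("1e5 calls:", t_many, "s   1 call:", t_one, "s\n")
```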
R is a vectorized language: many operations act on whole vectors and
arrays at once. If you use vectors and arrays with the built-in (i.e.,
compiled) functions, you get maximum use of the compiled programs and
minimum use of the interpreter. This is why functions such as
colMeans() and colSums() are much faster than writing explicit loops
in R. (apply() is less of a win: it is itself written in R and loops
over the rows or columns, calling your function once for each.) You
can speed things up by 200-1000x for large arrays.
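To check this on the question that started the thread, the sketch below computes column means three ways: an explicit double loop (`loop_col_means`, a hypothetical helper), apply() with mean(), and colMeans(). The matrix size is arbitrary, and the timings will depend on the machine; the results agree up to floating-point rounding.

```r
set.seed(42)
m <- matrix(rnorm(1000 * 500), nrow = 1000, ncol = 500)

# Explicit R loops over every element.
loop_col_means <- function(a) {
  out <- numeric(ncol(a))
  for (j in seq_len(ncol(a))) {
    s <- 0
    for (i in seq_len(nrow(a))) s <- s + a[i, j]
    out[j] <- s / nrow(a)
  }
  out
}

t_loop  <- system.time(r_loop  <- loop_col_means(m))[["elapsed"]]
# apply() loops over columns in R, calling mean() once per column.
t_apply <- system.time(r_apply <- apply(m, 2, mean))[["elapsed"]]
# colMeans() does the whole job in one compiled call.
t_cm    <- system.time(r_cm    <- colMeans(m))[["elapsed"]]

print(c(loop = t_loop, apply = t_apply, colMeans = t_cm))
```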
Interpreted languages are very convenient to use, as they check for
errors immediately and are highly interactive, with no overhead of
learning and using compilers and linkers. But they are very slow on
complex calculations. This is why the array processing is pushed into
compiled functions: the best of both worlds.
Examples of interpreted languages are R, MATLAB and GAUSS. (Java is
compiled to bytecode, which its virtual machine then interprets or
compiles just in time.) Compiled languages include C and Fortran.
Some, like variants of BASIC, can be interpreted, line-compiled or
compiled, depending upon the implementation. Some compiled languages
(such as Fortran) can allow parallel processing via multiprocessing on
multiple CPUs, which speeds things up even more. Compilers also
typically optimize code for the target machine, which can speed things
up by a factor of 2 or so.
So the general rule for R is: if processing time is a problem, alter
your program to do as much of the calculation as possible inside
compiled functions (i.e., "vectorize" your program to process an
entire array at one time) and minimize the number of lines of R that
are executed.
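As an illustration of the rule, take a hypothetical task: set negative values to zero and square the rest. The loop version below handles one element per interpreted iteration; the vectorized version expresses the same logic as two whole-vector operations, pmax() and `^`. Both function names are made up for this sketch.

```r
set.seed(7)
x <- rnorm(1e6)

# Loop version: the interpreter handles one element per iteration.
clip_sq_loop <- function(v) {
  out <- numeric(length(v))
  for (i in seq_along(v)) {
    out[i] <- if (v[i] < 0) 0 else v[i]^2
  }
  out
}

# Vectorized version: pmax() and ^ each make one compiled pass over the whole vector.
clip_sq_vec <- function(v) pmax(v, 0)^2

t_loop <- system.time(r_loop <- clip_sq_loop(x))[["elapsed"]]
t_vec  <- system.time(r_vec  <- clip_sq_vec(x))[["elapsed"]]
cat("loop:", t_loop, "s   vectorized:", t_vec, "s\n")
```

The usual pattern is the same: an element-by-element `if` becomes a whole-vector function such as pmax(), ifelse() or logical indexing.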
================================================================
Robert A. LaBudde, PhD, PAS, Dpl. ACAFS e-mail: [EMAIL PROTECTED]
Least Cost Formulations, Ltd. URL: http://lcfltd.com/
824 Timberlake Drive Tel: 757-467-0954
Virginia Beach, VA 23464-3239 Fax: 757-467-2947
"Vere scire est per causas scire"
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help