Re: [R] curiosity: next-gen x86 processors and FP32?

2013-05-26 Thread Jeff Newmiller
I am no HPC expert, but I have been computing for a while.

There are already many CPU-specific optimizations built into most compilers 
used to compile the R source code. Anyone sincerely interested in getting work 
done today should get on with their work and hope that most of the power of new 
processors gets delivered the same way.

The reason single precision is so uncommon in many computing environments is 
that rounding errors accumulate much faster with single precision (roughly 7 
significant decimal digits, versus roughly 16 for double). I don't expect the 
typical R user to want to perform a detailed uncertainty analysis every time 
they set up a computation, just to decide whether it can be computed with 
sufficient accuracy using SP.
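
A minimal sketch of that accumulation effect, written in Python rather than R because base R has no single-precision arithmetic to demonstrate with. Everything here is an illustrative assumption, not anything from the thread: the helper names (`to_f32`, `sum_f32`, `sum_f64`), the choice of adding 0.1 one million times, and the trick of emulating FP32 by rounding the accumulator through the `struct` module after every addition.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (FP64) to the nearest FP32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sum_f32(values):
    """Naive running sum, with the accumulator rounded to FP32 after each add."""
    acc = 0.0
    for v in values:
        acc = to_f32(acc + to_f32(v))
    return acc


def sum_f64(values):
    """The same naive running sum, kept in FP64 throughout."""
    acc = 0.0
    for v in values:
        acc += v
    return acc


n = 1_000_000
values = [0.1] * n
exact = n / 10  # the mathematically exact sum of n copies of 0.1

err32 = abs(sum_f32(values) - exact)
err64 = abs(sum_f64(values) - exact)

print("FP32 absolute error:", err32)
print("FP64 absolute error:", err64)
```

The point is only the gap between the two errors for the same naive algorithm; a compensated or pairwise summation would narrow it in either precision.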

Most speed problems I have encountered have been related to memory (swapping, 
fragmentation) and algorithm inefficiency, not CPU speed.
---
Jeff Newmiller
DCN: jdnew...@dcn.davis.ca.us
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
--- 
Sent from my phone. Please excuse my brevity.

ivo welch ivo.we...@anderson.ucla.edu wrote:

dear R experts:

although my question may be better asked on the HPC R mailing list, it
is really about something that average R users who don't plan to write
clever HPC-optimized code would care about: is there a quantum
performance leap on the horizon with CPUs?

like most average non-HPC R users, I want to stick mostly to
mainstream R, often with library parallel but that's it.  I like R to
be fast and effortless.  I don't want to have to rewrite my code
greatly to take advantage of my CPU.  the CUDA back-and-forth on the
memory, which requires code rewrites, makes CUDA not too useful for me.
in fact, I don't even like setting up computer clusters.  I run code
only on my single personal machine.

now, I am looking at the two upcoming processors---intel haswell (next
month) and amd kaveri (end of year).  does either of them have the
potential to be a quantum leap for R without complex code rewrites?
I presume that any quantum leaps would have to come from R using a
different numerical vector engine.   (I tried different compiler
optimizations when compiling R (such as AVX) on the 1-year-old i7-27*,
but it did not really make a difference in basic R benchmarks, such as
simple OLS calculations.  I thought AVX would provide a faster vector
engine, but something didn't really compute here.  pun intended.)

I would guess that haswell will be a nice small evolutionary step
forward.  5-20%, perhaps.  but nothing like a factor 2.

[tomshardware details how intel FP32 math is 4 times as fast as double
math on the i7 architecture.  for most of my applications, a 4 times
speedup at a sacrifice in precision would be worth it.  R seems to use
only doubles---as.single does not even convert to single, much
less induce calculations to be single-precision.  so I guess this is
a no-go.  correct?? ]

kaveri's hUMA on the other hand could be a quantum leap.  kaveri could
have the GPU transparently offer common standard built-in vector
operations that we use in R, i.e., improve the speed of many programs
without the need for a rewrite, by a factor of 5?  hard to believe,
but it would seem that AMD could actually beat Intel for R users.  a big
turnaround, given their recent deemphasis of FP on the CPU.
(interestingly, the amd-built Xbox One and PS4 processors were also
reported to have hUMA.)

worth waiting for kaveri?   anything I can do to drastically speed up
R on intel i7 by going to FP32?

regards,

/iaw

Ivo Welch (ivo.we...@gmail.com)

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] curiosity: next-gen x86 processors and FP32?

2013-05-26 Thread ivo welch
I think this is mostly but not fully correct.

most users are better off with double precision most of the time...but
not all of the time if the speedup and memory savings are factors of 4
and 2, respectively.
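
The memory factor of 2 follows directly from the storage widths, and a rough sketch in Python can confirm it. The `array` type codes `"f"` and `"d"` map to C float and C double; the variable names and the vector length are illustrative assumptions. The speed factor of 4 is a separate claim that depends on the hardware and is not tested here.

```python
from array import array

n = 1_000_000  # hypothetical vector length

singles = array("f", [0.0]) * n  # C float storage
doubles = array("d", [0.0]) * n  # C double storage

bytes_single = singles.itemsize * len(singles)
bytes_double = doubles.itemsize * len(doubles)
ratio = bytes_double / bytes_single

print("bytes per element:", singles.itemsize, "vs", doubles.itemsize)
print("total bytes:", bytes_single, "vs", bytes_double)
print("double/single memory ratio:", ratio)
```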

algorithm inefficiency may well be a factor, too---but if I spend one week
of my time (or even 3 days) tuning my program for a one-time job, and that
tuning then saves me one week of run time, it's a net loss.  let's put a
value on the time to tune algorithms...$100/hour?  often, it is worth more
to max out memory and CPU instead.   my question is thus whether the
tradeoffs are becoming even more stark.  if a future vector-GPU can speed
up my FP by a factor of 5, I really shouldn't spend much time tuning
algorithms, and should write my programs in a simple, straightforward way
instead.  YMMV.

memory swapping is death, speedwise.  anyone who uses R and doesn't max
out RAM is myopic IMHO.  unfortunately, standard i7 haswells are
limited to 32GB.  this makes R suitable for the analysis of data sets
that are about 4-6GB in size.  R is prolific in making copies of
structures in memory, even when a little bit of cleverness could avoid
it, for example in x$bigstruct[bignum] <- 1.  R often errs on the side of
not tuning its algorithms, too.  that's why data.table exists (though
I don't like it for some of its semantic oddities).  if it makes sense
to tune algorithms, it should be done at as low a level as possible, on
behalf of software that is used by as many people as possible.  then
again, I am grateful that we have volunteers who develop R for free.

/iaw


On Sun, May 26, 2013 at 1:01 AM, Jeff Newmiller
jdnew...@dcn.davis.ca.us wrote: