The loop kernels in our application call math functions such as sqrt(),
exp(), or log(). Some CPUs provide some of these as machine instructions
(e.g. sqrt() via SSE2); where no such instruction exists, the function
must either be scalarised or be implemented manually. Currently, pocl's
run-time library scalarises these, i.e. double4 sqrt(double4) is
implemented via 4 calls to libc's double sqrt(double).

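
To make the difference concrete, here is a minimal sketch of the two
approaches for double4 sqrt(double4). The double4 struct and both function
names are hypothetical stand-ins for illustration, not pocl's actual types
or code. The packed version assumes an x86 compiler that defines __SSE2__
and falls back to the scalar loop otherwise.

```cpp
#include <cmath>
#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2 intrinsics
#endif

// Hypothetical stand-in for OpenCL's double4; not pocl's actual type.
struct double4 {
  double s[4];
};

// Scalarised version: one call to libc's sqrt per lane.
inline double4 sqrt_scalarised(const double4& x) {
  double4 r;
  for (int i = 0; i < 4; ++i) {
    r.s[i] = std::sqrt(x.s[i]);
  }
  return r;
}

// Packed version: with SSE2, two packed square roots cover all four lanes.
inline double4 sqrt_packed(const double4& x) {
#if defined(__SSE2__)
  double4 r;
  _mm_storeu_pd(&r.s[0], _mm_sqrt_pd(_mm_loadu_pd(&x.s[0])));
  _mm_storeu_pd(&r.s[2], _mm_sqrt_pd(_mm_loadu_pd(&x.s[2])));
  return r;
#else
  return sqrt_scalarised(x);  // no SSE2 available: use the scalar loop
#endif
}
```

sqrt() is the easy case because the hardware instruction exists. For
functions such as exp() or log() there is no such instruction, so the
choice is between the scalar loop and a hand-written vector implementation.
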
I have written vecmathlib <https://bitbucket.org/eschnett/vecmathlib>,
a generic library that implements these math functions directly. Benchmarks
show that this can speed up these intrinsics significantly, in some cases
by a factor of several. I would like to use vecmathlib with pocl.

vecmathlib is implemented in C++, since using templates greatly simplifies
the code. It builds with both gcc and clang. vecmathlib is still incomplete
in several respects: some intrinsics should have a higher precision, some
could be further optimised, and inf and nan are not yet handled correctly.

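
As a rough illustration of why templates help, the sketch below writes an
elementwise square root once for any element type and vector width, and
treats inf and nan explicitly. It is not vecmathlib's actual interface or
algorithm: the vec type and the name sqrt_newton are hypothetical, and the
per-lane calls to std::frexp and std::ldexp are used only to keep the
example short.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>

// Hypothetical fixed-width vector type; not vecmathlib's actual interface.
template <typename T, std::size_t N>
struct vec {
  T s[N];
};

// Elementwise square root, written once for any element type and width.
template <typename T, std::size_t N>
vec<T, N> sqrt_newton(const vec<T, N>& x) {
  vec<T, N> r;
  for (std::size_t i = 0; i < N; ++i) {
    const T xi = x.s[i];
    // Special cases: zero, +inf and nan pass through; negatives give nan.
    if (!(xi > T(0)) || std::isinf(xi)) {
      r.s[i] = (xi < T(0)) ? std::numeric_limits<T>::quiet_NaN() : xi;
      continue;
    }
    // Range reduction: xi = m * 2^e with m in [0.5, 1).
    int e;
    T m = std::frexp(xi, &e);
    if (e % 2 != 0) {  // make e even so that e / 2 is exact
      m *= T(2);
      e -= 1;
    }
    // Newton iteration y <- (y + m / y) / 2 for m in [0.5, 2),
    // starting from the tangent of sqrt at 1.
    T y = (T(1) + m) / T(2);
    for (int k = 0; k < 5; ++k) {
      y = (y + m / y) / T(2);
    }
    // Undo the range reduction: sqrt(xi) = sqrt(m) * 2^(e/2).
    r.s[i] = std::ldexp(y, e / 2);
  }
  return r;
}
```

The same function body serves vec<float, 4>, vec<double, 2>, and so on,
which is the kind of reuse that templates make cheap. The result is close
to, but not guaranteed to be, the correctly rounded square root.
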
-erik
--
Erik Schnetter <[email protected]>
http://www.perimeterinstitute.ca/personal/eschnetter/
AIM: eschnett247, Skype: eschnett, Google Talk: [email protected]