Ralf

I will compare performance. I'd be happy to use more efficient algorithms
if they are available!

I also have to think about a licence for vecmathlib. I'd prefer something
that would finally be applicable to all of pocl, llvm, gcc, ...

-erik


On Tue, Feb 5, 2013 at 9:34 AM, Ralf Karrenberg <[email protected]> wrote:

> Hi Erik,
>
> have you done any measurements, e.g. how does your implementation compare
> against the code of Julien Pommier (google "SSE math fun")?
> This is what I am currently using, but unfortunately the list of
> implemented functions is a lot shorter even than what Pekka posted...
>
> Best,
> Ralf
>
>
> On 2/5/13 2:55 PM, Erik Schnetter wrote:
>
>> Ralf
>>
>> Much of vecmathlib comes from another project where I needed this
>> functionality. In particular, I am using finite differences on
>> multi-dimensional arrays that can benefit greatly from vectorisation.
>>
>> I have now extracted this code and added intrinsics to vecmathlib to
>> load and store numbers from/to memory, i.e. arrays. These functions are
>> mostly equivalent to vload* and vstore* in OpenCL. This provides two
>> important capabilities:
>>
>> (1) The load/store functions accept a mask parameter, allowing
>> vectorising loops that are not an even multiple of the vector length.
>> (2) The load/store functions distinguish between aligned and unaligned
>> memory accesses, where aligned accesses are faster. This may require
>> adjusting the lower loop bound to start on an aligned memory location.
>>
>> The number of loop iterations is in general not a multiple of the vector
>> size. Handling the left-over iterations with a scalar loop is not a good
>> option either, since it is much slower and increases the code size.
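A minimal sketch of the masked tail handling described above, using a plain scalar emulation (the types and names here are invented for illustration; they are not vecmathlib's actual API):

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t W = 4;                 // assumed vector width
using vec = std::array<double, W>;
using mask_t = std::array<bool, W>;

// Store only the lanes whose mask bit is set (emulates a masked vstore).
void store_masked(double* dst, const vec& v, const mask_t& mask) {
    for (std::size_t l = 0; l < W; ++l)
        if (mask[l]) dst[l] = v[l];
}

// y[i] = 2*x[i] for all i, even when n is not a multiple of W:
// whole vectors first, then one masked store for the left-over tail.
void scale2(double* y, const double* x, std::size_t n) {
    const mask_t all_on{true, true, true, true};
    std::size_t i = 0;
    for (; i + W <= n; i += W) {             // full vectors
        vec v;
        for (std::size_t l = 0; l < W; ++l) v[l] = 2.0 * x[i + l];
        store_masked(y + i, v, all_on);
    }
    if (i < n) {                             // tail: mask off lanes past n
        vec v{};
        mask_t mask{};
        for (std::size_t l = 0; l < W; ++l) {
            mask[l] = i + l < n;
            if (mask[l]) v[l] = 2.0 * x[i + l];
        }
        store_masked(y + i, v, mask);
    }
}
```

The same masking avoids both a scalar remainder loop and out-of-bounds accesses; a real implementation would use hardware masked stores instead of the per-lane branch.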
>>
>> -erik
>>
>>
>>
>> On Tue, Feb 5, 2013 at 6:55 AM, Ralf Karrenberg <[email protected]> wrote:
>>
>>     Hi,
>>
>>     I haven't had a look at the code, but from what you are writing,
>>     this sounds like exactly what I would need to integrate into libWFV.
>>     The vectorizer has an API to specify mappings of functions to SIMD
>>     equivalents, which is all that you need if all the implementations
>>     are there already.
>>     So, WFV should be able to work with your library within a few hours
>>     of integration work. I'll look into that later.
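Such a scalar-to-SIMD function mapping can be pictured as a simple table (all names below are invented for illustration; libWFV's actual API is different):

```cpp
#include <map>
#include <string>

// Hypothetical mapping from a scalar function name to its SIMD
// equivalent and vector width; libWFV's real interface differs.
struct SimdMapping {
    std::string simd_name;
    int width;
};

std::map<std::string, SimdMapping> default_mappings() {
    return {
        {"sqrt", {"vml_sqrt_double4", 4}},  // invented names
        {"sin",  {"vml_sin_double4",  4}},
        {"cos",  {"vml_cos_double4",  4}},
    };
}
```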
>>
>>     By the way, I recall a discussion on integrating such a library
>>     (possibly as a .bc file) into LLVM. You may want to have a look at
>>     the thread and respond:
>>     http://llvm.1065342.n5.nabble.com/SIMD-trigonometry-logarithms-tt54215.html#none
>>
>>     Cheers,
>>     Ralf
>>
>>
>>     On 2/3/13 7:02 PM, Erik Schnetter wrote:
>>
>>         On Sun, Feb 3, 2013 at 12:25 PM, Pekka Jääskeläinen
>>         <[email protected]> wrote:
>>
>>              On 02/03/2013 03:56 PM, Erik Schnetter wrote:
>>               > In my mind, the vectorizer would never look into sqrt()
>>         or any
>>              other functions
>>               > defined in the language standard, but would simply expect
>>              efficient vector
>>               > implementations of these. Instead of looking into the
>>         language
>>              standard we could
>>               > also add a respective attribute to the function
>>         definitions. This
>>              attribute
>>               > would then confirm that e.g. double2 sqrt(double2) is
>>         equivalent
>>              to double
>>               > sqrt(double). __attribute__((__vector_equivalence__))
>>              could be a name.
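A sketch of how such an attribute might be used (the attribute is hypothetical and no compiler implements it, so it is stubbed out with a macro here; the function names are invented, and the vector type uses the GCC/Clang vector_size extension):

```cpp
#include <cmath>

// The __vector_equivalence__ attribute discussed above is only a
// proposal; it is stubbed out here to keep the sketch compilable.
#define VECTOR_EQUIVALENCE /* __attribute__((__vector_equivalence__)) */

// 2 x double, using the GCC/Clang vector_size extension.
typedef double double2 __attribute__((vector_size(2 * sizeof(double))));

// Scalar version, as specified by the language standard.
double my_sqrt(double x) { return std::sqrt(x); }

// Vector version; the proposed attribute would assert that this
// function is lane-wise equivalent to the scalar my_sqrt above.
VECTOR_EQUIVALENCE double2 my_sqrt(double2 x) {
    double2 r;
    r[0] = std::sqrt(x[0]);
    r[1] = std::sqrt(x[1]);
    return r;
}
```

With such an annotation, a vectorizer could replace a widened call to the scalar function by a call to the vector one without inspecting either body.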
>>
>>              OK. The "known" functions should not be inlined but the
>>         vectorizer
>>              should
>>              recognize them (if we do not go towards the intrinsics
>>         approach). In
>>              the end,
>>              the autovectorized work group function and an explicitly
>>         vectorized
>>              kernel will
>>              call the same vector-optimized function in this scheme.
>>
>>              For starters we might just use a "white list" for the known
>>         vectorizable
>>              functions, and assume a trivial scalar to vector mapping
>>         for the
>>              arguments
>>              and the return value. Or use intrinsics for the known ones.
>>
>>              Looking at the code of LLVM's LoopVectorize, it seems to be
>>         able to
>>              vectorize some intrinsics already:
>>
>>                  case Intrinsic::sqrt:
>>                  case Intrinsic::sin:
>>                  case Intrinsic::cos:
>>                  case Intrinsic::exp:
>>                  case Intrinsic::exp2:
>>                  case Intrinsic::log:
>>                  case Intrinsic::log10:
>>                  case Intrinsic::log2:
>>                  case Intrinsic::fabs:
>>                  case Intrinsic::floor:
>>                  case Intrinsic::ceil:
>>                  case Intrinsic::trunc:
>>                  case Intrinsic::rint:
>>                  case Intrinsic::nearbyint:
>>                  case Intrinsic::pow:
>>                  case Intrinsic::fma:
>>                  case Intrinsic::fmuladd:
>>
>>              Are there any important ones missing? If not, then we could
>>         think of
>>              going
>>              the intrinsics route for these calls. I.e., call the
>>         intrinsics from
>>              the kernel lib and expand them to calls to your
>>         functions+inline after
>>              autovectorization.
>>
>>
>>         "Important" probably depends on how frequently they are used in
>>         real-world code, or in benchmarks. The actual list of intrinsics
>> (as
>>         listed e.g. in the OpenCL or C standard) is probably three or
>>         four times
>>         as long. I would also add the various convert* and as* (i.e. cast)
>>         functions to the list.
>>
>>         I could create a longer list if that would be helpful.
>>
>>         These functions should still be inlined, but only after
>>         vectorization.
>>
>>         -erik
>>
>>         --
>>         Erik Schnetter <[email protected]>
>>         http://www.perimeterinstitute.ca/personal/eschnetter/
>>         AIM: eschnett247, Skype: eschnett, Google Talk:
>>         [email protected]
>>
>>
>>
>>
>>
>>
>>         _________________________________________________
>>         pocl-devel mailing list
>>         [email protected]
>>         https://lists.sourceforge.net/lists/listinfo/pocl-devel
>>
>>
>>
>>
>> --
>> Erik Schnetter <[email protected]>
>> http://www.perimeterinstitute.ca/personal/eschnetter/
>> AIM: eschnett247, Skype: eschnett, Google Talk: [email protected]
>>
>>


-- 
Erik Schnetter <[email protected]>
http://www.perimeterinstitute.ca/personal/eschnetter/
AIM: eschnett247, Skype: eschnett, Google Talk: [email protected]
