Hi Erik,

have you done any measurements, e.g. how does your implementation compare against Julien Pommier's code (google "SSE math fun")? That is what I am currently using, but unfortunately its list of implemented functions is a lot shorter even than what Pekka posted...
Best,
Ralf

On 2/5/13 2:55 PM, Erik Schnetter wrote:
> Ralf
>
> Much of vecmathlib comes from another project where I needed this
> functionality. In particular, I am using finite differences on
> multi-dimensional arrays that can benefit greatly from vectorisation.
>
> I now extracted from there and added to vecmathlib intrinsics to load
> and store numbers from/to memory, i.e. arrays. These functions are
> mostly equivalent to vload* and vstore* in OpenCL. This provides two
> important capabilities:
>
> (1) The load/store functions accept a mask parameter, allowing
> vectorising loops that are not an even multiple of the vector length.
> (2) The load/store functions distinguish between aligned and unaligned
> memory accesses, where aligned accesses are faster. This may require
> adjusting the lower loop bound to start on an aligned memory location.
>
> The number of loop iterations is in general not a multiple of the
> vector size. Also, using scalar loop iterations for the left-over
> iterations does not work since this is much slower and increases the
> code size.
>
> -erik
>
> On Tue, Feb 5, 2013 at 6:55 AM, Ralf Karrenberg <[email protected]> wrote:
>
> > Hi,
> >
> > I haven't had a look at the code, but from what you are writing,
> > this sounds like exactly what I would need to integrate into libWFV.
> > The vectorizer has an API to specify mappings of functions to SIMD
> > equivalents, which is all that you need if all the implementations
> > are there already. So, WFV should be able to work with your library
> > within a few hours of integration work. I'll look into that later.
> >
> > By the way, I recall a discussion on integrating such a library
> > (possibly as a .bc file) into LLVM.
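Erik's masked, alignment-aware load/store scheme can be sketched roughly as below. All names here (`double4`, `loadu_masked`, `storeu_masked`) are invented for illustration and are not vecmathlib's actual API; the point is only how a mask lets the tail of a loop be vectorized without a scalar clean-up loop.

```cpp
#include <array>
#include <cstddef>

// Hypothetical 4-wide double vector with masked loads and stores, in the
// spirit of (but not identical to) vecmathlib's vload*/vstore*-style
// functions.
struct double4 {
    std::array<double, 4> v;
};
using mask4 = std::array<bool, 4>;

// Masked unaligned load: inactive lanes are never read and stay 0.0.
inline double4 loadu_masked(const double* p, const mask4& m) {
    double4 r{};  // value-initialized: all lanes 0.0
    for (int l = 0; l < 4; ++l)
        if (m[l]) r.v[l] = p[l];
    return r;
}

// Masked unaligned store: inactive lanes leave memory untouched.
inline void storeu_masked(double* p, const double4& x, const mask4& m) {
    for (int l = 0; l < 4; ++l)
        if (m[l]) p[l] = x.v[l];
}

// y[i] = 2 * x[i] for i < n. When n is not a multiple of 4, the last
// iteration masks off the excess lanes instead of falling back to a
// scalar remainder loop.
void scale2(const double* x, double* y, std::size_t n) {
    for (std::size_t i = 0; i < n; i += 4) {
        mask4 m;
        for (int l = 0; l < 4; ++l)
            m[l] = i + l < n;
        double4 t = loadu_masked(x + i, m);
        for (int l = 0; l < 4; ++l)
            t.v[l] *= 2.0;
        storeu_masked(y + i, t, m);
    }
}
```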
> > You may want to have a look at the thread and respond:
> >
> > http://llvm.1065342.n5.nabble.com/SIMD-trigonometry-logarithms-tt54215.html#none
> >
> > Cheers,
> > Ralf
> >
> > On 2/3/13 7:02 PM, Erik Schnetter wrote:
> > > On Sun, Feb 3, 2013 at 12:25 PM, Pekka Jääskeläinen <[email protected]> wrote:
> > >
> > > > On 02/03/2013 03:56 PM, Erik Schnetter wrote:
> > > > > In my mind, the vectorizer would never look into sqrt() or any
> > > > > other functions defined in the language standard, but would
> > > > > simply expect efficient vector implementations of these.
> > > > > Instead of looking into the language standard we could also
> > > > > add a respective attribute to the function definitions. This
> > > > > attribute would then confirm that e.g. double2 sqrt(double2)
> > > > > is equivalent to double sqrt(double).
> > > > > __attribute__((__vector_equivalence__)) could be a name.
> > > >
> > > > OK. The "known" functions should not be inlined but the
> > > > vectorizer should recognize them (if we do not go towards the
> > > > intrinsics approach). In the end, the autovectorized work-group
> > > > function and an explicitly vectorized kernel will call the same
> > > > vector-optimized function in this scheme.
> > > >
> > > > For starters we might just use a "white list" for the known
> > > > vectorizable functions, and assume a trivial scalar-to-vector
> > > > mapping for the arguments and the return value. Or use
> > > > intrinsics for the known ones.
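The "white list" with a trivial scalar-to-vector mapping that Pekka describes could look like the sketch below. The table contents and vector function names are invented for illustration; a real vectorizer would resolve them to whatever symbols the vector math library actually exports.

```cpp
#include <map>
#include <string>

// Hypothetical white list mapping scalar math functions to 4-wide SIMD
// equivalents, assuming a trivial scalar-to-vector mapping of arguments
// and return value. All names on the right are made up.
static const std::map<std::string, std::string> vector_equivalents = {
    {"sqrt", "v4_sqrt"},  // double4 v4_sqrt(double4)
    {"sin",  "v4_sin"},
    {"exp",  "v4_exp"},
    {"pow",  "v4_pow"},   // double4 v4_pow(double4, double4)
};

// A vectorizer pass would consult the table when it encounters a call
// and rewrite the call only if a vector equivalent is known.
inline bool has_vector_equivalent(const std::string& scalar_name) {
    return vector_equivalents.count(scalar_name) != 0;
}

inline std::string vector_equivalent(const std::string& scalar_name) {
    return vector_equivalents.at(scalar_name);
}
```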
> > > > Looking at the code of LLVM's LoopVectorize, it seems to be
> > > > able to vectorize some intrinsics already:
> > > >
> > > >     case Intrinsic::sqrt:
> > > >     case Intrinsic::sin:
> > > >     case Intrinsic::cos:
> > > >     case Intrinsic::exp:
> > > >     case Intrinsic::exp2:
> > > >     case Intrinsic::log:
> > > >     case Intrinsic::log10:
> > > >     case Intrinsic::log2:
> > > >     case Intrinsic::fabs:
> > > >     case Intrinsic::floor:
> > > >     case Intrinsic::ceil:
> > > >     case Intrinsic::trunc:
> > > >     case Intrinsic::rint:
> > > >     case Intrinsic::nearbyint:
> > > >     case Intrinsic::pow:
> > > >     case Intrinsic::fma:
> > > >     case Intrinsic::fmuladd:
> > > >
> > > > Are there any important ones missing? If not, then we could
> > > > think of going the intrinsics route for these calls. I.e., call
> > > > the intrinsics from the kernel lib and expand them to calls to
> > > > your functions + inline after autovectorization.
> > >
> > > "Important" probably depends on how frequently they are used in
> > > real-world code, or in benchmarks. The actual list of intrinsics
> > > (as listed e.g. in the OpenCL or C standard) is probably three or
> > > four times as long. I would also add the various convert* and as*
> > > (i.e. cast) functions to the list.
> > >
> > > I could create a longer list if that would be helpful.
> > >
> > > These functions should still be inlined, but only after
> > > vectorization.
> > >
> > > -erik
> > >
> > > --
> > > Erik Schnetter <[email protected]>
> > > http://www.perimeterinstitute.ca/personal/eschnetter/
> > > AIM: eschnett247, Skype: eschnett, Google Talk: [email protected]
>
> --
> Erik Schnetter <[email protected]>
> http://www.perimeterinstitute.ca/personal/eschnetter/
> AIM: eschnett247, Skype: eschnett, Google Talk: [email protected]

_______________________________________________
pocl-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/pocl-devel
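The intrinsics route the thread converges on can be illustrated from the source side. In a scalar loop like the one below, the call to std::sqrt is lowered to the llvm.sqrt.* intrinsic, which is one of the cases LoopVectorize already handles, so the compiler can vectorize the loop without ever looking inside sqrt. This is a generic example, not code from pocl or vecmathlib, and the function name is made up.

```cpp
#include <cmath>
#include <cstddef>

// A loop of exactly the shape discussed in the thread: a recognized
// math intrinsic in a simple counted loop, vectorizable as a whole.
void sqrt_array(const double* in, double* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        out[i] = std::sqrt(in[i]);
}
```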
