I've not been following this thread very closely, but it seems like what you're trying to do may be related to Geoffrey Mainland's work on SIMD support in GHC. See [1] for his "SIMD-enabled version of the vector library". He's also written some blog posts about this [2].
Reiner

[1] https://github.com/mainland/vector
[2] http://ghc-simd.blogspot.com.au/

On 8 July 2012 05:13, Nicolas Trangez <nico...@incubaid.com> wrote:
> All,
>
> After my message of yesterday [1] I got down to it and implemented
> something along those lines. I created a playground repository
> containing the code at [2]. Initial benchmark results are at [3]. More
> about the benchmark at the end of this email.
>
> First some questions and requests for help:
>
> - I'm stuck with a typing issue related to the 'sizeOf' calculation at
>   [4]. I tried a couple of things, but wasn't able to figure out how to
>   fix it.
> - I'm using unsafePerformIO at [5], yet I'm not certain it's OK to do
>   so. Are there better (safer/performant/...) ways to get this working?
> - Currently the Alignment phantom types (e.g. A8 and A16) are not
>   related to each other: a function (like
>   Data.Vector.SIMD.Algorithms.unsafeXorSSE42) can have this signature:
>
>     unsafeXorSSE42 :: Storable a => SV.Vector SV.A16 a -> SV.Vector SV.A16 a
>                    -> SV.Vector SV.A16 a
>
>   Yet, imagine I have an "SV.Vector SV.A32 Word8" vector at hand: the
>   function should accept it as well (a 32-byte aligned vector is also
>   16-byte aligned). Is there any way to encode this at the type level?
>
> That's about it :-)
>
> As of now, I only implemented a couple of the vector API functions (the
> ones required to execute my benchmark). Adding the others should be
> trivial.
>
> The benchmark works with Data.Vector.{Unboxed|Storable}.Vector (UV and
> SV) vectors of Word8 values, as well as my custom
> Data.Vector.SIMD.Vector type (MV) using 16-byte alignment (MV.Vector
> MV.A16 Word8).
>
> benchUV, benchSV and benchMV all take 2 pre-calculated Word8 vectors of
> a given size (1024 and 4096) and xor them pairwise into the result using
> "zipWith xor". benchMVA takes 2 suitable MV vectors and xors them into
> a third using a rather simple and unoptimized C implementation based on
> SSE4.2 intrinsics [6].
> This could be enhanced quite a bit (I guess FFI overhead could also be
> reduced by using the prim calling convention). Currently, only vectors
> whose length is a multiple of 32 bytes are supported (mostly because of
> laziness on my part).
>
> As you can see, the zipWith Data.Vector.SIMD implementation is slightly
> slower than the Data.Vector.Storable based one. I haven't done much
> profiling yet, but I suspect allocation and ForeignPtr creation are to
> blame; these seem to be highly optimized in
> GHC.ForeignPtr.mallocPlainForeignPtrBytes as used by
> Data.Vector.Storable.
>
> Thanks for any input,
>
> Nicolas
>
> [1] http://www.haskell.org/pipermail/haskell-cafe/2012-July/102167.html
> [2] https://github.com/NicolasT/vector-simd/
> [3] http://linode2.nicolast.be/files/vector-simd-xor1.html
> [4] https://github.com/NicolasT/vector-simd/blob/master/src/Data/Vector/SIMD/Algorithms.hs#L46
> [5] https://github.com/NicolasT/vector-simd/blob/master/src/Data/Vector/SIMD/Algorithms.hs#L43
> [6] https://github.com/NicolasT/vector-simd/blob/master/cbits/vector-simd.c#L47
>
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe@haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe
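On the quoted question about relating the alignment phantom types: one possible encoding is a two-parameter type class whose instances state which alignment implies which. The sketch below is only an illustration under assumptions. `A8`/`A16`/`A32` mirror the tags named in the mail, but `Vector` here is a hypothetical list wrapper (not vector-simd's type), and `AlignedTo`, `xor16` and `relax` are hypothetical names; `xor16` uses a plain `zipWith xor` in place of the SSE4.2 kernel.

```haskell
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE FlexibleInstances #-}
module Main where

import Data.Bits (xor)
import Data.Word (Word8)

-- Alignment tags as empty phantom types (mirroring A8/A16 from the mail).
data A8
data A16
data A32

-- Hypothetical stand-in for the real aligned vector: a list wrapper used
-- only to demonstrate the typing. A real library would keep the constructor
-- abstract so that alignment tags cannot be forged.
newtype Vector align a = Vector [a] deriving (Show, Eq)

-- 'AlignedTo have need' holds when a buffer aligned to 'have' bytes is
-- also aligned to 'need' bytes. One instance per (stronger, weaker) pair.
class AlignedTo have need

instance AlignedTo A8  A8
instance AlignedTo A16 A8
instance AlignedTo A16 A16
instance AlignedTo A32 A8
instance AlignedTo A32 A16
instance AlignedTo A32 A32

-- Accepts any inputs aligned to at least 16 bytes. The result tag is just
-- what this hypothetical allocator is assumed to guarantee.
xor16 :: (AlignedTo a A16, AlignedTo b A16)
      => Vector a Word8 -> Vector b Word8 -> Vector A16 Word8
xor16 (Vector xs) (Vector ys) = Vector (zipWith xor xs ys)

-- Weaken the static alignment guarantee; no data is copied.
relax :: AlignedTo have need => Vector have e -> Vector need e
relax (Vector xs) = Vector xs

main :: IO ()
main = do
  let v32 = Vector [0x0F, 0xF0, 0xFF] :: Vector A32 Word8
      v16 = Vector [0xFF, 0xFF, 0x00] :: Vector A16 Word8
  print (xor16 v32 v16)                 -- a 32-byte aligned input is accepted
  print (relax v32 :: Vector A8 Word8)  -- explicit weakening also type-checks
  -- xor16 (Vector [1] :: Vector A8 Word8) v16
  --   would be rejected: there is no instance AlignedTo A8 A16
```

The enumerated instances keep the idea obvious but grow quadratically with the number of tags; with only a handful of power-of-two alignments that is probably acceptable.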