Re: [Haskell-cafe] vector-simd: some code available, and some questions

2012-07-08 Thread Gábor Lehel
On Sun, Jul 8, 2012 at 3:05 AM, Nicolas Trangez nico...@incubaid.com wrote:
 On Sun, 2012-07-08 at 01:40 +0200, Gábor Lehel wrote:
  unsafeXorSSE42 :: (Storable a,
  SV.AlignedToAtLeast SV.A16 o1, SV.Alignment o1,
  SV.AlignedToAtLeast SV.A16 o2, SV.Alignment o2,
  SV.AlignedToAtLeast SV.A16 o3, SV.Alignment o3) =
  SV.Vector o1 a - SV.Vector o2 a - SV.Vector o3 a

 I wonder if you could get that a bit shorter... I suppose you could write:

 instance (AlignedToAtLeast n a, AlignedToAtLeast n b) =
 AlignedToAtLeast n (a, b)
 instance (AlignedToAtLeast n a, AlignedToAtLeast n b, AlignedToAtLeast
 n c) = AlignedToAtLeast n (a, b, c)
 ...and so on...

 Once again, nifty! And implemented in [1].

 though it feels a little strange semantically (relating a tuple to a
 scalar), but I don't see what harm can come of it. And then you can
 just write SV.AlignedToAtLeast SV.A16 (o1, o2, o3) in signatures.

 You
 can also make (Alignment n, Alignment a) a superclass constraint of
 AlignedToAtLeast, and write instances for Alignment inductively on One
 and Twice, and then you don't have to write Alignment o1 etc.
 separately either. So the signature would be just:

 unsafeXorSSE42 :: (Storable a, SV.AlignedToAtLeast SV.A16 (o1, o2,
 o3)) =  SV.Vector o1 a - SV.Vector o2 a - SV.Vector o3 a

 which is friendlier.

 I implemented the inductive alignment calculation over One and Twice
 (good idea, and easy to do), but I don't get the thing about
 superclasses. I've been trying several approaches (including definitions
 based on forall and other trickery I never used before), but didn't get
 things to work, at least: the compiler always said I'd need
 UndecidableInstances, and that sounds scary... Care to elaborate?

All I meant was

class (Alignment n, Alignment a) = AlignedToAtLeast n a

but I got a bit ahead of myself, because that rules out the instance
on tuples. (I suppose you *could* write some kind of Alignment
instance for them, taking their minimum or something, but that's
getting a bit too subversive for me). The alternative, if you want
both Alignment as a superclass and the ability to constrain multiple
types at once, is to have the above, remove the instance on tuples,
and instead something like:

class (AlignedToAtLeast n a, AlignedToAtLeast n b) = AlignedToAtLeast2 n a b
instance (AlignedToAtLeast n a, AlignedToAtLeast n b) = AlignedToAtLeast2 n a b
class (AlignedToAtLeast n a, AlignedToAtLeast n b, AlignedToAtLeast n
c) = AlignedToAtLeast3 n a b c
instance (AlignedToAtLeast n a, AlignedToAtLeast n b, AlignedToAtLeast
n c) = AlignedToAtLeast3 n a b c
(feel free to think of better names!)

unsafeXorSSE42 :: (Storable a, SV.AlignedToAtLeast3 SV.A16 o1 o2 o3)
= SV.Vector o1 a - SV.Vector o2 a - SV.Vector o3 a

That will require UndecidableInstances, but all that means is that GHC
can't prove to itself that instance checking will terminate. So you
could end up getting the compiler into an infinite loop (or in
practice, to exceed its recursion limit). But it doesn't allow
anything unsafe to happen at runtime, and there's plenty of perfectly
good instances which terminate even if GHC can't prove it.

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] vector-simd: some code available, and some questions

2012-07-08 Thread Reiner Pope
I've not been following this thread very closely, but it seems like what
you're trying to do may be related to Geoffrey Mainland's work on SIMD
support in GHC. See [1] for his SIMD-enabled version of the vector
library. He's also written some blog posts about this [2].

Reiner

[1] https://github.com/mainland/vector
[2] http://ghc-simd.blogspot.com.au/

On 8 July 2012 05:13, Nicolas Trangez nico...@incubaid.com wrote:

 All,

 After my message of yesterday [1] I got down to it and implemented
 something along those lines. I created a playground repository
 containing the code at [2]. Initial benchmark results at [3]. More about
 the benchmark at the end of this email.

 First some questions and requests for help:

 - I'm stuck with a typing issue related to 'sizeOf' calculation at [4].
 I tried a couple of things, but wasn't able to figure out how to fix it.
 - I'm using unsafePerformIO at [5], yet I'm not certain it's OK to do
 so. Are there better (safer/performant/...) ways to get this working?
 - Currently Alignment phantom types (e.g. A8 and A16) are not related to
 each other: a function (like Data.Vector.SIMD.Algorithms.unsafeXorSSE42)
 can have this signature:

 unsafeXorSSE42 :: Storable a = SV.Vector SV.A16 a - SV.Vector SV.A16 a
 - SV.Vector SV.A16 a

 Yet, imaging I'd have an SV.Vector SV.A32 Word8 vector at hand, the
 function should accept it as well (a 32-byte aligned vector is also
 16-byte aligned). Is there any way to encode this at the type level?

 That's about it :-)

 As of now, I only implemented a couple of the vector API functions (the
 ones required to execute my benchmark). Adding the others should be
 trivial.

 The benchmark works with Data.Vector.{Unboxed|Storable}.Vector (UV and
 SV) vectors of Word8 values, as well as my custom
 Data.Vector.SIMD.Vector type (MV) using 16-byte alignment (MV.Vector
 MV.A16 Word8).

 benchUV, benchSV and benchMV all take 2 pre-calculated Word8 vectors of
 given size (1024 and 4096) and xor them pairwise into the result using
 zipWith xor. benchMVA takes 2 suitable MV vectors and xor's them into
 a third using a rather simple and unoptimized C implementation using
 SSE4.2 intrinsics [6]. This could be enhanced quite a bit (I guess using
 the prim calling convention, FFI overhead can be reduced as well).
 Currently, only vectors of a multiple of 32 bytes are supported (mostly
 because of laziness on my part).

 As you can see, the zipWith Data.Vector.SIMD implementation is slightly
 slower than the Data.Vector.Storable based one. I didn't perform much
 profiling yet, but I suspect allocation and ForeignPtr creation is to
 blame, this seems to be highly optimized in
 GHC.ForeignPtr.mallocPlainForeignPtrBytes as used by
 Data.Vector.Storable.

 Thanks for any input,

 Nicolas

 [1] http://www.haskell.org/pipermail/haskell-cafe/2012-July/102167.html
 [2] https://github.com/NicolasT/vector-simd/
 [3] http://linode2.nicolast.be/files/vector-simd-xor1.html
 [4]

 https://github.com/NicolasT/vector-simd/blob/master/src/Data/Vector/SIMD/Algorithms.hs#L46
 [5]

 https://github.com/NicolasT/vector-simd/blob/master/src/Data/Vector/SIMD/Algorithms.hs#L43
 [6]
 https://github.com/NicolasT/vector-simd/blob/master/cbits/vector-simd.c#L47


 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] vector-simd: some code available, and some questions

2012-07-08 Thread Nicolas Trangez
On Sun, 2012-07-08 at 20:49 +1000, Reiner Pope wrote:
 I've not been following this thread very closely, but it seems like
 what you're trying to do may be related to Geoffrey Mainland's work on
 SIMD support in GHC. See [1] for his SIMD-enabled version of the
 vector library. He's also written some blog posts about this [2].

Thanks. I knew about this before (did at least some research ;-)), yet I
think the scope is somewhat different (assuming my playground work has
any scope at all...).

I think Geoffreys work goes much further than what I'm trying to
accomplish: it's not my intention to write SIMD-based code at a high
level (which is, obviously, a great thing, but beyond my capabilities as
of now I'm afraid). What I'd like to achieve is to be able to write SIMD
code at the C level (or have SIMD code generated somehow through
LLVM/Clang(/Poly?), ISPC or others), and call into this from Haskell,
providing some guarantees w.r.t. alignment (and maybe length-in-bytes as
well?) of the passed vector pointers (which you don't get when using
plain Storable vectors as far as I could find/see).

Consider it a very short-term stop-gap solution until the tools are in
place to write high-level code which yields the same performance as a
more low-level (assembly-level) implementation.

Nicolas


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] vector-simd: some code available, and some questions

2012-07-08 Thread Nicolas Trangez
On Sun, 2012-07-08 at 10:27 +0200, Gábor Lehel wrote:
 On Sun, Jul 8, 2012 at 3:05 AM, Nicolas Trangez nico...@incubaid.com wrote:
  I implemented the inductive alignment calculation over One and Twice
  (good idea, and easy to do), but I don't get the thing about
  superclasses. I've been trying several approaches (including definitions
  based on forall and other trickery I never used before), but didn't get
  things to work, at least: the compiler always said I'd need
  UndecidableInstances, and that sounds scary... Care to elaborate?
 
 All I meant was
 
 class (Alignment n, Alignment a) = AlignedToAtLeast n a
 
 but I got a bit ahead of myself, because that rules out the instance
 on tuples.

Maybe that's one of the things I ran into, can't remember OTOH.

  (I suppose you *could* write some kind of Alignment
 instance for them, taking their minimum or something, but that's
 getting a bit too subversive for me).

Heh, it already feels evil-but-in-a-good-and-powerful-way to me right
now ;-) Thanks for all your input!

  The alternative, if you want
 both Alignment as a superclass and the ability to constrain multiple
 types at once, is to have the above, remove the instance on tuples,
 and instead something like:
 
 class (AlignedToAtLeast n a, AlignedToAtLeast n b) = AlignedToAtLeast2 n a b
 instance (AlignedToAtLeast n a, AlignedToAtLeast n b) = AlignedToAtLeast2 n 
 a b
 class (AlignedToAtLeast n a, AlignedToAtLeast n b, AlignedToAtLeast n
 c) = AlignedToAtLeast3 n a b c
 instance (AlignedToAtLeast n a, AlignedToAtLeast n b, AlignedToAtLeast
 n c) = AlignedToAtLeast3 n a b c
 (feel free to think of better names!)
 
 unsafeXorSSE42 :: (Storable a, SV.AlignedToAtLeast3 SV.A16 o1 o2 o3)
 = SV.Vector o1 a - SV.Vector o2 a - SV.Vector o3 a

Implemented in [1]. Code says

unsafeXorSSE42 :: (Storable a,
SV.AlignedToAtLeast3 SV.A16 o1 o2 o3) =
SV.Vector o1 a - SV.Vector o2 a - SV.Vector o3 a

ghci says

unsafeXorSSE42
  :: (AlignedToAtLeast (Data.Vector.SIMD.Mutable.Twice A8) o2,
  AlignedToAtLeast (Data.Vector.SIMD.Mutable.Twice A8) o1,
  AlignedToAtLeast (Data.Vector.SIMD.Mutable.Twice A8) o3,
  Foreign.Storable.Storable a) =
 Vector o1 a - Vector o2 a - Vector o3 a

which is the same thing, so should do.

 That will require UndecidableInstances, but all that means is that GHC
 can't prove to itself that instance checking will terminate. So you
 could end up getting the compiler into an infinite loop (or in
 practice, to exceed its recursion limit). But it doesn't allow
 anything unsafe to happen at runtime, and there's plenty of perfectly
 good instances which terminate even if GHC can't prove it.

I see. I had some rather unsafe/undecidable-by-a-developer things in
mind, but I guess I should read up on some of the language extensions
GHC provides.

Thanks,

Nicolas

[1]
https://github.com/NicolasT/vector-simd/commit/8f934891c9630a96ce009fafa7f6ba70df306d4f





___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] vector-simd: some code available, and some questions

2012-07-07 Thread Nicolas Trangez
All, 

After my message of yesterday [1] I got down to it and implemented
something along those lines. I created a playground repository
containing the code at [2]. Initial benchmark results at [3]. More about
the benchmark at the end of this email.

First some questions and requests for help:

- I'm stuck with a typing issue related to 'sizeOf' calculation at [4].
I tried a couple of things, but wasn't able to figure out how to fix it.
- I'm using unsafePerformIO at [5], yet I'm not certain it's OK to do
so. Are there better (safer/performant/...) ways to get this working?
- Currently Alignment phantom types (e.g. A8 and A16) are not related to
each other: a function (like Data.Vector.SIMD.Algorithms.unsafeXorSSE42)
can have this signature:

unsafeXorSSE42 :: Storable a = SV.Vector SV.A16 a - SV.Vector SV.A16 a
- SV.Vector SV.A16 a

Yet, imaging I'd have an SV.Vector SV.A32 Word8 vector at hand, the
function should accept it as well (a 32-byte aligned vector is also
16-byte aligned). Is there any way to encode this at the type level?

That's about it :-)

As of now, I only implemented a couple of the vector API functions (the
ones required to execute my benchmark). Adding the others should be
trivial.

The benchmark works with Data.Vector.{Unboxed|Storable}.Vector (UV and
SV) vectors of Word8 values, as well as my custom
Data.Vector.SIMD.Vector type (MV) using 16-byte alignment (MV.Vector
MV.A16 Word8).

benchUV, benchSV and benchMV all take 2 pre-calculated Word8 vectors of
given size (1024 and 4096) and xor them pairwise into the result using
zipWith xor. benchMVA takes 2 suitable MV vectors and xor's them into
a third using a rather simple and unoptimized C implementation using
SSE4.2 intrinsics [6]. This could be enhanced quite a bit (I guess using
the prim calling convention, FFI overhead can be reduced as well).
Currently, only vectors of a multiple of 32 bytes are supported (mostly
because of laziness on my part).

As you can see, the zipWith Data.Vector.SIMD implementation is slightly
slower than the Data.Vector.Storable based one. I didn't perform much
profiling yet, but I suspect allocation and ForeignPtr creation is to
blame, this seems to be highly optimized in
GHC.ForeignPtr.mallocPlainForeignPtrBytes as used by
Data.Vector.Storable.

Thanks for any input,

Nicolas

[1] http://www.haskell.org/pipermail/haskell-cafe/2012-July/102167.html
[2] https://github.com/NicolasT/vector-simd/
[3] http://linode2.nicolast.be/files/vector-simd-xor1.html
[4]
https://github.com/NicolasT/vector-simd/blob/master/src/Data/Vector/SIMD/Algorithms.hs#L46
[5]
https://github.com/NicolasT/vector-simd/blob/master/src/Data/Vector/SIMD/Algorithms.hs#L43
[6]
https://github.com/NicolasT/vector-simd/blob/master/cbits/vector-simd.c#L47


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] vector-simd: some code available, and some questions

2012-07-07 Thread Gábor Lehel
On Sat, Jul 7, 2012 at 9:13 PM, Nicolas Trangez nico...@incubaid.com wrote:
 - Currently Alignment phantom types (e.g. A8 and A16) are not related to
 each other: a function (like Data.Vector.SIMD.Algorithms.unsafeXorSSE42)
 can have this signature:

 unsafeXorSSE42 :: Storable a = SV.Vector SV.A16 a - SV.Vector SV.A16 a
 - SV.Vector SV.A16 a

 Yet, imaging I'd have an SV.Vector SV.A32 Word8 vector at hand, the
 function should accept it as well (a 32-byte aligned vector is also
 16-byte aligned). Is there any way to encode this at the type level?

If the set of alignments you care about is finite, you could do:

class AlignedToAtLeast n a
instance AlignedToAtLeast A1 A1
instance AlignedToAtLeast A4 A1
instance AlignedToAtLeast A4 A4
instance AlignedToAtLeast A8 A1
instance AlignedToAtLeast A8 A4
instance AlignedToAtLeast A8 A8
instance AlignedToAtLeast A16 A1
instance AlignedToAtLeast A16 A4
instance AlignedToAtLeast A16 A8
instance AlignedToAtLeast A16 A16
instance AlignedToAtLeast A32 A1
instance AlignedToAtLeast A32 A4
instance AlignedToAtLeast A32 A8
instance AlignedToAtLeast A32 A16
instance AlignedToAtLeast A32 A32

in which as you can see the numbers of instances grows super-linearly
with the number of alignments, but if there's only a handful it's not
that bad. Then:

unsafeXorSSE42 :: (Storable a, AlignedToAtLeast A16 align) =
SV.Vector align a - SV.Vector align a - SV.Vector align a

A problem with the above is that third parties can add more instances,
breaking the safety. If this is a concern you could do:

class AlignedToAtLeastImpl n a -- do not export this class!
...same instances as above...
class AlignedToAtLeastImpl n a = AlignedToAtLeast n a -- export this class
instance AlignedToAtLeastImpl n a = AlignedToAtLeast n a

The drawback is that it requires UndecidableInstances, and (what
bothers me more) the list of instances won't be present in the
haddocks. So instead of the single instance above you could write all
of them again manually for the exported class, which has the drawback
that you have to write all of them again manually, but not the other
two.

An alternative solution is to encode all of the alignments in unary,
which is more general; if they're all going to be a power of two you
can store just the logarithm:

data One
data Twice n -- not practical to call it Double :)

class AlignedToAtLeast n a
instance AlignedToAtLeast One One
instance AlignedToAtLeast One (Twice a)
instance AlignedToAtLeast n a = AlignedToAtLeast (Twice n) (Twice a)

type A1 = One
type A4 = Twice (Twice A1)
type A8 = Twice A4
type A16 = Twice A8
type A32 = Twice A16

and you can apply the same private class thing from above if you want.

-- 
Your ship was caught in a monadic eruption.

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] vector-simd: some code available, and some questions

2012-07-07 Thread Gábor Lehel
On Sat, Jul 7, 2012 at 9:59 PM, Gábor Lehel illiss...@gmail.com wrote:
 class AlignedToAtLeast n a
 instance AlignedToAtLeast A1 A1
 instance AlignedToAtLeast A4 A1
 instance AlignedToAtLeast A4 A4
 instance AlignedToAtLeast A8 A1
 instance AlignedToAtLeast A8 A4
 instance AlignedToAtLeast A8 A8
 instance AlignedToAtLeast A16 A1
 instance AlignedToAtLeast A16 A4
 instance AlignedToAtLeast A16 A8
 instance AlignedToAtLeast A16 A16
 instance AlignedToAtLeast A32 A1
 instance AlignedToAtLeast A32 A4
 instance AlignedToAtLeast A32 A8
 instance AlignedToAtLeast A32 A16
 instance AlignedToAtLeast A32 A32

Oh dang it. Sorry. All of these should be the other way around.
Proofreading doesn't help if it's your brain that's fried.


-- 
Your ship was caught in a monadic eruption.

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] vector-simd: some code available, and some questions

2012-07-07 Thread Nicolas Trangez
On Sat, 2012-07-07 at 21:59 +0200, Gábor Lehel wrote:
 An alternative solution is to encode all of the alignments in unary,
 which is more general; if they're all going to be a power of two you
 can store just the logarithm:
 
 data One
 data Twice n -- not practical to call it Double :)
 
 class AlignedToAtLeast n a
 instance AlignedToAtLeast One One
 instance AlignedToAtLeast One (Twice a)
 instance AlignedToAtLeast n a = AlignedToAtLeast (Twice n) (Twice a)
 
 type A1 = One
 type A4 = Twice (Twice A1)
 type A8 = Twice A4
 type A16 = Twice A8
 type A32 = Twice A16
 
 and you can apply the same private class thing from above if you
 want. 

Very ingenious, thanks! I pushed this into [1], although export lists of
all modules most likely will need some love once things get into shape.

This also allows functions to become more general:

unsafeXorSSE42 :: (Storable a,
SV.AlignedToAtLeast SV.A16 o1, SV.Alignment o1,
SV.AlignedToAtLeast SV.A16 o2, SV.Alignment o2,
SV.AlignedToAtLeast SV.A16 o3, SV.Alignment o3) =
SV.Vector o1 a - SV.Vector o2 a - SV.Vector o3 a

I wonder whether GHC's upcoming type-level numerals could be useful in
this situation as well.

Nicolas

[1]
https://github.com/NicolasT/vector-simd/commit/a4f13745eb24d87a3628af13109f3e1d8232c925


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] vector-simd: some code available, and some questions

2012-07-07 Thread Gábor Lehel
On Sun, Jul 8, 2012 at 12:21 AM, Nicolas Trangez nico...@incubaid.com wrote:
 On Sat, 2012-07-07 at 21:59 +0200, Gábor Lehel wrote:
 An alternative solution is to encode all of the alignments in unary,
 which is more general; if they're all going to be a power of two you
 can store just the logarithm:

 data One
 data Twice n -- not practical to call it Double :)

 class AlignedToAtLeast n a
 instance AlignedToAtLeast One One
 instance AlignedToAtLeast One (Twice a)
 instance AlignedToAtLeast n a = AlignedToAtLeast (Twice n) (Twice a)

 type A1 = One
 type A4 = Twice (Twice A1)
 type A8 = Twice A4
 type A16 = Twice A8
 type A32 = Twice A16

 and you can apply the same private class thing from above if you
 want.

 Very ingenious, thanks! I pushed this into [1], although export lists of
 all modules most likely will need some love once things get into shape.

 This also allows functions to become more general:

 unsafeXorSSE42 :: (Storable a,
 SV.AlignedToAtLeast SV.A16 o1, SV.Alignment o1,
 SV.AlignedToAtLeast SV.A16 o2, SV.Alignment o2,
 SV.AlignedToAtLeast SV.A16 o3, SV.Alignment o3) =
 SV.Vector o1 a - SV.Vector o2 a - SV.Vector o3 a

I wonder if you could get that a bit shorter... I suppose you could write:

instance (AlignedToAtLeast n a, AlignedToAtLeast n b) =
AlignedToAtLeast n (a, b)
instance (AlignedToAtLeast n a, AlignedToAtLeast n b, AlignedToAtLeast
n c) = AlignedToAtLeast n (a, b, c)
...and so on...

though it feels a little strange semantically (relating a tuple to a
scalar), but I don't see what harm can come of it. And then you can
just write SV.AlignedToAtLeast SV.A16 (o1, o2, o3) in signatures. You
can also make (Alignment n, Alignment a) a superclass constraint of
AlignedToAtLeast, and write instances for Alignment inductively on One
and Twice, and then you don't have to write Alignment o1 etc.
separately either. So the signature would be just:

unsafeXorSSE42 :: (Storable a, SV.AlignedToAtLeast SV.A16 (o1, o2,
o3)) =  SV.Vector o1 a - SV.Vector o2 a - SV.Vector o3 a

which is friendlier.


 I wonder whether GHC's upcoming type-level numerals could be useful in
 this situation as well.

I'd guess that this is what they're made for, but I haven't tried them.

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] vector-simd: some code available, and some questions

2012-07-07 Thread Nicolas Trangez
On Sat, 2012-07-07 at 21:13 +0200, Nicolas Trangez wrote:
 As you can see, the zipWith Data.Vector.SIMD implementation is slightly
 slower than the Data.Vector.Storable based one. I didn't perform much
 profiling yet, but I suspect allocation and ForeignPtr creation is to
 blame, this seems to be highly optimized in
 GHC.ForeignPtr.mallocPlainForeignPtrBytes as used by
 Data.Vector.Storable.

I got the MV benchmark on-par with SV by reworking the allocation
mechanism: no more FFI involved, but based on
GHC.Exts.newAlignedPinnedByteArray# and some other trickery, see [1].
This could still be improved a little by using PlainPtr, but this is not
exported by GHC.ForeignPtr.

This did have a pretty big performance-impact on the SIMD-based
benchmark, compare [2] to the old one [3]. I have no clue why the 4096
case now only uses twice the time of the 1024 one, unlike the expected
4x (+- as before).

Nicolas

[1]
https://github.com/NicolasT/vector-simd/commit/5ec539167254435ef4e7d308706dcafae09504d2
[2] http://linode2.nicolast.be/files/vector-simd-xor2.html
[3] http://linode2.nicolast.be/files/vector-simd-xor1.html


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] vector-simd: some code available, and some questions

2012-07-07 Thread Nicolas Trangez
On Sun, 2012-07-08 at 01:40 +0200, Gábor Lehel wrote:
  unsafeXorSSE42 :: (Storable a,
  SV.AlignedToAtLeast SV.A16 o1, SV.Alignment o1,
  SV.AlignedToAtLeast SV.A16 o2, SV.Alignment o2,
  SV.AlignedToAtLeast SV.A16 o3, SV.Alignment o3) =
  SV.Vector o1 a - SV.Vector o2 a - SV.Vector o3 a
 
 I wonder if you could get that a bit shorter... I suppose you could write:
 
 instance (AlignedToAtLeast n a, AlignedToAtLeast n b) =
 AlignedToAtLeast n (a, b)
 instance (AlignedToAtLeast n a, AlignedToAtLeast n b, AlignedToAtLeast
 n c) = AlignedToAtLeast n (a, b, c)
 ...and so on...

Once again, nifty! And implemented in [1].

 though it feels a little strange semantically (relating a tuple to a
 scalar), but I don't see what harm can come of it. And then you can
 just write SV.AlignedToAtLeast SV.A16 (o1, o2, o3) in signatures.

 You
 can also make (Alignment n, Alignment a) a superclass constraint of
 AlignedToAtLeast, and write instances for Alignment inductively on One
 and Twice, and then you don't have to write Alignment o1 etc.
 separately either. So the signature would be just:
 
 unsafeXorSSE42 :: (Storable a, SV.AlignedToAtLeast SV.A16 (o1, o2,
 o3)) =  SV.Vector o1 a - SV.Vector o2 a - SV.Vector o3 a
 
 which is friendlier.

I implemented the inductive alignment calculation over One and Twice
(good idea, and easy to do), but I don't get the thing about
superclasses. I've been trying several approaches (including definitions
based on forall and other trickery I never used before), but didn't get
things to work, at least: the compiler always said I'd need
UndecidableInstances, and that sounds scary... Care to elaborate?

Thanks!

Nicolas

[1]
https://github.com/NicolasT/vector-simd/commit/aedf25460b410e04a3d103befea59ebcb3903fdc


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe