Re: [Haskell-cafe] DPH matrix product

2012-07-10 Thread Mauro Blanco
Thanks for both answers

I have used Repa with the newer interface for the same example, but I
wanted to have another example using DPH. I know Repa is better suited for
regular representations, but I wanted to express the same program in DPH,
where I don't have to worry about nested parallel computations.
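
For reference, here is a minimal sketch of what such a Repa version can look
like. This is only a sketch, not the code I actually ran, and it assumes the
repa and repa-algorithms packages; mmultP is the parallel matrix product from
Data.Array.Repa.Algorithms.Matrix:
---------------------------------------------------------------------
import Data.Array.Repa                   as R
import Data.Array.Repa.Algorithms.Matrix (mmultP)

-- Build an n*n matrix and multiply it by itself in parallel.
main :: IO ()
main = do
  let n  = 400
      xs = [fromIntegral (i * n + j) | i <- [0 .. n - 1], j <- [0 .. n - 1]]
      mA = fromListUnboxed (Z :. n :. n) xs :: Array U DIM2 Double
  mC <- mmultP mA mA
  print (mC ! (Z :. 0 :. 0))
---------------------------------------------------------------------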

The call to transposeP did not seem to be the problem, as executing only
transposeP on big matrices caused no memory issues. But something like this
(on square matrices, with the transposition removed) still has memory
problems:

matMult :: Matrix -> Matrix -> Matrix
matMult mA mB = mapP (\row -> mapP (\col -> dotp row col) mB) mA

dotp :: MVector -> MVector -> MMultType
dotp row col = D.sumP (zipWithP (D.*) row col)

Later today (or tomorrow) I will post the exact OS, GHC and library
versions, as well as the command line options and execution information for
the simplified example.

Thanks again

On Tue, Jul 10, 2012 at 8:43 AM, Manuel M T Chakravarty 
c...@cse.unsw.edu.au wrote:

 Firstly, especially when you are talking about performance, please
 provide detailed information on (a) the versions of the compiler and
 libraries that you used and (b) the command line options that you used
 for compilation.

 Secondly, your function 'transposeP' doesn't make for a good nested
 data-parallel program. I haven't looked at the generated code, but I
 wouldn't be surprised if it doesn't optimise very well.

 The core benefit of nested data parallelism is that the sub-arrays in a
 nested array of arrays can be of varying size leading to irregular
 parallelism. However, that flexibility comes at a price, namely that it is
 a fairly inefficient representation for *rectangular arrays*, such as
 regular two-dimensional matrices (as in your example). It shouldn't be
 quite as inefficient as what you report, but it will never be competitive
 with a dedicated regular representation.
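
 For contrast, here is a minimal sketch (not code from this thread) of the
 kind of irregular problem that nested data parallelism is good at: a sparse
 matrix-vector product in which every row may hold a different number of
 non-zero elements. It assumes the same vectorised setting and D-qualified
 Double operations as the code below:

 type SparseRow    = [:(Int, Double):]  -- (column index, value) pairs
 type SparseMatrix = [:SparseRow:]      -- rows of varying length

 smvm :: SparseMatrix -> [:Double:] -> [:Double:]
 smvm m v = mapP (\row -> D.sumP (mapP (\(i, x) -> x D.* (v !: i)) row)) m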

 Hence, for code such as yours, we recommend using our Repa library:
 http://hackage.haskell.org/package/repa

 It generates very fast code for regular array problems, see also
 http://www.cse.unsw.edu.au/~chak/papers/LCKP12.html

 Manuel


 mblanco blanco...@gmail.com:

 Hi, I'm trying to implement a matrix product example using DPH. This is
 the code:
 ---------------------------------------------------------------------
 type MMultType = Double
 type Matrix = [:[:MMultType:]:]
 type MVector = [:MMultType:]
 type Matrix_wrapper = PArray (PArray MMultType)

 {-# NOINLINE matMult_wrapper #-}
 matMult_wrapper :: Matrix_wrapper -> Matrix_wrapper -> Matrix_wrapper
 matMult_wrapper mA mB = toPArrayP (mapP toPArrayP (matMult
 (fromNestedPArrayP mA) (fromNestedPArrayP mB)))

 matMult :: Matrix -> Matrix -> Matrix
 matMult mA mB = mapP (\row -> mapP (\col -> dotp row col) (transposeP mB)) mA

 dotp :: MVector -> MVector -> MMultType
 dotp row col = D.sumP (zipWithP (D.*) row col)

 transposeP :: Matrix -> Matrix
 transposeP m =
 let
 h = lengthP m
 w = lengthP (m !: 0)
 rh = I.enumFromToP 0 (h I.- 1)
 rw = I.enumFromToP 0 (w I.- 1)
 in
 if h I.== 0 then [: :]
 else mapP (\y -> mapP (\x -> m !: x !: y) rh) rw
 ---------------------------------------------------------------------

 My problem is at execution time: on 300*300 matrices the program does
 finish (although it is very slow), but on 700*700 matrices it consumes GBs
 of RAM until the process is aborted.

 In the paper "Work Efficient Higher-Order Vectorisation" it is explained
 that a work-complexity problem (which involved unnecessary array
 replication) was recently addressed. So at first I thought the code related
 to the paper had not been uploaded to Hackage yet, but as I understand it,
 it must have been, as that seems to be the motivation for the
 dph-lifted-vseg package.

 Does anybody notice a problem with the example, or know whether the
 problem is related to the issue addressed in the paper?

 Thanks in advance!





-- 
Mauro Blanco


Re: [Haskell-cafe] DPH matrix product

2012-07-10 Thread Mauro Blanco
Hi again.

This is the system information:
- Ubuntu 12.04 32-bit
- Intel® Core™2 Duo CPU T5270 @ 1.40GHz × 2
- 2.9 GiB RAM

GHC version:
- GHC 7.4.1

DPH libraries:
- dph-base-0.6.1.1
- (dph-lifted-base-0.6.1.1)
- (dph-lifted-vseg-0.6.1.2)
- (dph-prim-interface-0.6.1.1)
- (dph-prim-par-0.6.1.1)
- (dph-prim-seq-0.6.1.1)

Compilation flags:
I'm using two combinations of flags, taken from different sources. In both
cases results are identical:
- From https://github.com/ghc/packages-dph: -rtsopts -threaded -fllvm
-optlo-O3 -Odph -fcpr-off -fno-liberate-case -package dph-lifted-vseg
- From dph-examples: -rtsopts -threaded -fllvm -Odph -package
dph-lifted-vseg -fcpr-off -fno-liberate-case -fsimpl-tick-factor=1000

Execution flags:
+RTS -N

Tests:
Computing the product of two 400*400 matrices takes 6.037993 seconds.
Computing the product of two 600*600 matrices yields an out-of-memory error
(requested 1728053248 bytes).

DPH code:
---------------------------------------------------------------------
{-# NOINLINE matMult_wrapper #-}
matMult_wrapper :: Matrix_wrapper -> Matrix_wrapper -> Matrix_wrapper
matMult_wrapper mA mB = toPArrayP (mapP toPArrayP (matMult
(fromNestedPArrayP mA) (fromNestedPArrayP mB)))

matMult :: Matrix -> Matrix -> Matrix
matMult mA mB = mapP (\row -> mapP (\col -> dotp row col) mB) mA
-- I removed the call to transposeP, so it's no longer an actual matrix
-- product.

dotp :: MVector -> MVector -> MMultType
dotp row col = D.sumP (zipWithP (D.*) row col)
---------------------------------------------------------------------
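
For completeness, here is a minimal sketch of the complete vectorised module
this code usually lives in (this is an assumption about the general setup,
not a copy of the attached files, and the module name is hypothetical; a
separate, non-vectorised Main module then imports matMult_wrapper):
---------------------------------------------------------------------
{-# LANGUAGE ParallelArrays #-}
{-# OPTIONS_GHC -fvectorise #-}
module MatMultVect (matMult_wrapper) where

import Data.Array.Parallel
import Data.Array.Parallel.Prelude.Double as D

-- The ordinary Prelude must not be used unqualified in vectorised code.
import qualified Prelude

type MMultType      = Double
type Matrix         = [:[:MMultType:]:]
type MVector        = [:MMultType:]
type Matrix_wrapper = PArray (PArray MMultType)

{-# NOINLINE matMult_wrapper #-}
matMult_wrapper :: Matrix_wrapper -> Matrix_wrapper -> Matrix_wrapper
matMult_wrapper mA mB
  = toPArrayP (mapP toPArrayP (matMult (fromNestedPArrayP mA) (fromNestedPArrayP mB)))

-- Simplified version without transposeP, as above.
matMult :: Matrix -> Matrix -> Matrix
matMult mA mB = mapP (\row -> mapP (\col -> dotp row col) mB) mA

dotp :: MVector -> MVector -> MMultType
dotp row col = D.sumP (zipWithP (D.*) row col)
---------------------------------------------------------------------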
I'm attaching the files with the full example code. If there is any other
information needed, please let me know.

Any help is much appreciated.

On Tue, Jul 10, 2012 at 9:51 AM, Mauro Blanco blanco...@gmail.com wrote:

 Thanks for both answers

 I have used Repa with the newer interface for the same example, but I
 wanted to have another example using DPH. I know Repa is better suited for
 regular representations, but I wanted to express the same program in DPH,
 where I don't have to worry about nested parallel computations.

 The call to transposeP did not seem to be the problem, as executing only
 transposeP on big matrices caused no memory issues. But something like this
 (on square matrices, with the transposition removed) still has memory
 problems:

 matMult :: Matrix -> Matrix -> Matrix
 matMult mA mB = mapP (\row -> mapP (\col -> dotp row col) mB) mA

 dotp :: MVector -> MVector -> MMultType
 dotp row col = D.sumP (zipWithP (D.*) row col)

 Later today (or tomorrow) I will post the exact OS, GHC and library
 versions, as well as the command line options and execution information for
 the simplified example.

 Thanks again


 On Tue, Jul 10, 2012 at 8:43 AM, Manuel M T Chakravarty 
 c...@cse.unsw.edu.au wrote:

 Firstly, especially when you are talking about performance, please
 provide detailed information on (a) the versions of the compiler and
 libraries that you used and (b) the command line options that you used
 for compilation.

 Secondly, your function 'transposeP' doesn't make for a good nested
 data-parallel program. I haven't looked at the generated code, but I
 wouldn't be surprised if it doesn't optimise very well.

 The core benefit of nested data parallelism is that the sub-arrays in a
 nested array of arrays can be of varying size leading to irregular
 parallelism. However, that flexibility comes at a price, namely that it is
 a fairly inefficient representation for *rectangular arrays*, such as
 regular two-dimensional matrices (as in your example). It shouldn't be
 quite as inefficient as what you report, but it will never be competitive
 with a dedicated regular representation.

 Hence, for code such as yours, we recommend using our Repa library:
 http://hackage.haskell.org/package/repa

  It generates very fast code for regular array problems, see also
 http://www.cse.unsw.edu.au/~chak/papers/LCKP12.html

 Manuel


 mblanco blanco...@gmail.com:

 Hi, I'm trying to implement a matrix product example using DPH. This is
 the code:
 ---------------------------------------------------------------------
 type MMultType = Double
 type Matrix = [:[:MMultType:]:]
 type MVector = [:MMultType:]
 type Matrix_wrapper = PArray (PArray MMultType)

 {-# NOINLINE matMult_wrapper #-}
 matMult_wrapper :: Matrix_wrapper -> Matrix_wrapper -> Matrix_wrapper
 matMult_wrapper mA mB = toPArrayP (mapP toPArrayP (matMult
 (fromNestedPArrayP mA) (fromNestedPArrayP mB)))

 matMult :: Matrix -> Matrix -> Matrix
 matMult mA mB = mapP (\row -> mapP (\col -> dotp row col) (transposeP mB)) mA

 dotp :: MVector -> MVector -> MMultType
 dotp row col = D.sumP (zipWithP (D.*) row col)

 transposeP :: Matrix -> Matrix
 transposeP m =
 let
 h = lengthP m
 w = lengthP (m !: 0)
 rh = I.enumFromToP 0 (h I.- 1)
 rw = I.enumFromToP 0 (w I.- 1)
 in
 if h I.== 0 then [: :]
 else mapP (\y -> mapP (\x -> m !: x !: y) rh) rw