Firstly, especially when you are talking about performance, please provided 
detailed information on (a) the versions of the compiler and libraries that you 
used and (b) of the command line options that you used for compilation.

Secondly, your function 'transposeP' doesn't make for a good nested 
data-parallel program. I haven't looked at the generated code, but I wouldn't 
be surprised if it doesn't optimise very well.

The core benefit of nested data parallelism is that the sub-arrays in a nested 
array of arrays can be of varying size leading to irregular parallelism. 
However, that flexibility comes at a price, namely that it is a fairly 
inefficient representation for *rectangular arrays*, such as regular 
two-dimensional matrices (as in your example). It shouldn't be quite as 
inefficient as what you report, but it will never be competitive with a 
dedicated regular representation.

Hence, for code, such as yours, we recommend to use our Repa library: 
http://hackage.haskell.org/package/repa

It generates very fast code for regular array problems, see also 
http://www.cse.unsw.edu.au/~chak/papers/LCKP12.html

Manuel


mblanco <blanco...@gmail.com>:
> Hi, I'm trying to implement a matrix product example using DPH. This is the 
> code:
> -------------------------------------------------------------------------------------------------------------------
> type MMultType = Double
> type Matrix = [:[:MMultType:]:]
> type MVector = [:MMultType:]
> type Matrix_wrapper = PArray (PArray MMultType)
> 
> {-# NOINLINE matMult_wrapper #-}
> matMult_wrapper :: Matrix_wrapper -> Matrix_wrapper -> Matrix_wrapper
> matMult_wrapper mA mB = toPArrayP (mapP toPArrayP (matMult (fromNestedPArrayP 
> mA) (fromNestedPArrayP mB)))
> 
> matMult :: Matrix -> Matrix -> Matrix
> matMult mA mB = mapP (\row -> mapP (\col -> dotp row col) (transposeP mB)) mA
> 
> dotp :: MVector -> MVector -> MMultType
> dotp row col = D.sumP (zipWithP (D.*) row col)
> 
> transposeP :: Matrix -> Matrix
> transposeP m = 
>     let
>         h = lengthP m
>         w = lengthP (m !: 0)
>         rh = I.enumFromToP 0 (h I.- 1)
>         rw = I.enumFromToP 0 (w I.- 1)
>     in
>         if h I.== 0 then [: :]
>         else mapP (\y -> mapP (\x -> m !: x !: y) rh) rw
> -------------------------------------------------------------------------------------------------------------------
> 
> My problem is at execution time, on matrices of size 300*300 the program does 
> finish (although it is very slow), but on 700*700 it consumes GBs of RAM 
> until the process is aborted.
> 
> In the paper "Work Efficient Higher-Order Vectorisation" it is explained that 
> a work complexity problem (wich involved unnecesary array replication) was 
> recently treated. So at first I thought the code implementation related to 
> the paper had not been uploaded to hackage. But as I understand it must have 
> been, as that seems to be the motive of the "dph-lifted-vseg" package.
> 
> Does anybody notice the problem with the example or if the problem is related 
> to the subject treated in the paper?
> 
> Thanks in advance!
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe@haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Reply via email to