droundy:
> On Wed, Apr 23, 2008 at 05:24:20PM +0100, Sebastian Sylvan wrote:
> > On Wed, Apr 23, 2008 at 5:01 PM, Tillmann Vogt
> > <Tillmann.Vogt <at> rwth-aachen.de> wrote:
> >
> > > Hi,
> > >
> > > I am currently experimenting with parallelizing C-programs. I have
> > > therefore written a matrix vector multiplication example that needs 13
> > > seconds to run (5 seconds with OpenMP). Because I like Haskell I did the
> > > same in this language, but it takes about 134 seconds. Why is it so slow?
> > > Does someone have an idea?
> >
> > Yes, in the C version you use unboxed arrays, in the Haskell version you use
> > a linked list of linked lists. Naturally the latter will take up more space,
> > require more work to index, and will thrash the cache quite a bit.
>
> In fact, I'm impressed that Haskell can come within a factor of 10, given
> that it's using such a challenging data type! (I wonder if this may be due
> to inefficiencies in the C code, although I haven't looked at it.)
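(Tillmann's Haskell source was not quoted in the thread; purely as an
illustration of the list-of-lists style being criticised above, a minimal
sketch might look like the following, with the names m, n, b and c simply
mirroring the C program shown further down.)

    import Text.Printf

    m, n :: Int
    m = 4000
    n = 4000

    -- c[N]: a plain Haskell list, every element a boxed Double in a cons cell
    c :: [Double]
    c = replicate n 2.0

    -- b[M][N]: a list of lists; traversing it chases pointers rather than
    -- scanning contiguous memory, which is why it thrashes the cache
    b :: [[Double]]
    b = replicate m [fromIntegral j | j <- [0 .. n - 1]]

    -- sum of the matrix-vector product, as in the benchmarks below
    main :: IO ()
    main = printf "result: %f\n" (sum [sum (zipWith (*) row c) | row <- b])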
I had a look at this code, using the (unboxed, strict, fused) data parallel
arrays library (http://darcs.haskell.org/packages/ndp). We get rather nice
code.

The original C program (modified to sum the result, rather than just filling
the array and throwing it away):

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define M  4000
    #define N  4000
    #define IT 100

    double a[M], b[M][N], c[N];

    int main(int argc, char *argv[])
    {
        double d;
        double sum;
        int i, j, l;
        time_t start, end;

        printf("Initializing matrix B and vector C\n");

        for (j = 0; j < N; j++)
            c[j] = 2.0;

        for (i = 0; i < M; i++)
            for (j = 0; j < N; j++)
                b[i][j] = j;

        printf("Executing %d matrix mult. for M = %d N = %d\n", IT, M, N);

        time(&start);

        sum = 0;
        for (i = 0; i < M; i++)
            for (j = 0; j < N; j++)
                sum += b[i][j] * c[j];

        time(&end);

        printf("result: %.2lf\n", sum);
        d = difftime(end, start);
        printf("calculation time: %.2lf seconds\n", d);

        return 0;
    }

And then rewritten in a data parallel style (but note, no parallelism here,
just using the nice interface to STUArrays, with stream fusion):

    import Data.Array.Parallel.Unlifted
    import Text.Printf

    m  = 4000
    n  = 4000
    it = 100

    main = printf "result: %2f\n"
         . sumU
         . zipWithU (*) (concatSU b)
         $ repeatU m c
      where
        -- c[N]
        c = replicateU n (2.0 :: Double)

        -- b[M][N]
        b = replicateCU m t
          where
            t = mapU fromIntegral (enumFromToU 0 (n-1)) :: UArr Double

GHC finds 4 fusion sites, and the resulting program runs in:

    $ time ./a
    result: 63984000000.0
    ./a  0.10s user 0.00s system 112% cpu 0.089 total

While the C program:

    $ time ./a.out
    Initializing matrix B and vector C
    Executing 100 matrix mult. for M = 4000 N = 4000
    result: 63984000000.00
    calculation time: 0.00 seconds
    ./a.out  0.14s user 0.09s system 92% cpu 0.248 total

GHC rocks :)

NDP programming encourages a nice combinator style too, which is interesting
(and Haskell-ish). The compiler is able to fuse away most of the arrays (and
Roman reports this should do rather better with GHC HEAD).

Anyway, we've a few new (fast!) array options on the horizon, looking
promising. For more info, check out the data parallel arrays page:

    http://haskell.org/haskellwiki/GHC/Data_Parallel_Haskell

-- Don
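(A quick sanity check on that 63,984,000,000 figure: each of the m rows of b
is 0, 1, ..., n-1 and every entry of c is 2.0, so the sum is
m * 2 * (0 + 1 + ... + (n-1)) = m * n * (n-1) = 4000 * 4000 * 3999 =
63,984,000,000, which is exactly what both programs print. The same closed
form as a tiny Haskell sketch, with names simply mirroring the programs
above:)

    import Text.Printf

    -- Closed form for the benchmark's answer: m rows, each contributing
    -- 2 * (0 + 1 + ... + (n-1)) = n * (n-1), so the total is m * n * (n-1).
    main :: IO ()
    main = printf "expected: %.1f\n" (fromIntegral (m * n * (n - 1)) :: Double)
      where
        m, n :: Integer
        m = 4000
        n = 4000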