Re: [Haskell-cafe] Data.Array.Accelerate initialization timings

2012-02-22 Thread Manuel M T Chakravarty
Martin Dybdal:
> On 20 February 2012 16:39, Paul Sujkov wrote:
>> Ah, it seems that I now see what's going wrong. I'm not using the 'run'
>> function from the CUDA backend, and so by default I guess the code is
>> interpreted (the test backend used for semantics checks). However, it's not
>> perfectly clear how to use the CUDA backend explicitly.
> 
> Neither the interpreter nor the CUDA code is used in your example.
> Everything in Data.Array.Accelerate is front-end stuff; your arrays
> are allocated on the host, so that is where the inefficiency lies.
> 
> The "use" method inserts a statement in the syntax tree generated by
> the front-end, which the back-end can use as a hint to transfer that
> array to the GPU, while compiling the rest of the program into CUDA
> code. The Data.Array.Accelerate.CUDA.run function is the one that
> actually moves the arrays to the GPU.
> 
> I haven't tried executing your code and I'm not sure why the front-end
> is that slow.

The 'fromList' function is mostly meant for testing or to initialise small 
arrays.  It is not particularly optimised.  (Going via a vanilla list is just a 
bad idea if you want performance.)

For efficient data marshalling have a look at the modules under 
Data.Array.Accelerate.IO.
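
To illustrate the general point about lists (not Accelerate itself), here is a
rough sketch using only the 'array' package that ships with GHC. The names
viaList and direct are made up for this example, and nothing here is the
Data.Array.Accelerate.IO API. It builds the same unboxed array once from a
list and once by writing each element into a preallocated buffer by index:

```haskell
module Main where

import Control.Monad (forM_)
import Data.Array.ST (newArray_, runSTUArray, writeArray)
import Data.Array.Unboxed (UArray, listArray)

-- Build the array from a list of values, similar in spirit to 'fromList'.
viaList :: Int -> UArray Int Int
viaList n = listArray (0, n - 1) [0 .. n - 1]

-- Allocate an uninitialised unboxed buffer and write every slot by index.
direct :: Int -> UArray Int Int
direct n = runSTUArray (do
  arr <- newArray_ (0, n - 1)
  forM_ [0 .. n - 1] $ \i -> writeArray arr i i
  return arr)

main :: IO ()
main = print (viaList 8 == direct 8)
```

Both functions produce the same contents; whether one is measurably faster
depends on the compiler and optimisation flags, so time it on your own
machine before drawing conclusions.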

Manuel


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Data.Array.Accelerate initialization timings

2012-02-22 Thread Martin Dybdal
On 20 February 2012 16:39, Paul Sujkov wrote:
> Ah, it seems that I now see what's going wrong. I'm not using the 'run'
> function from the CUDA backend, and so by default I guess the code is
> interpreted (the test backend used for semantics checks). However, it's not
> perfectly clear how to use the CUDA backend explicitly.

Neither the interpreter nor the CUDA code is used in your example.
Everything in Data.Array.Accelerate is front-end stuff; your arrays
are allocated on the host, so that is where the inefficiency lies.

The "use" method inserts a statement in the syntax tree generated by
the front-end, which the back-end can use as a hint to transfer that
array to the GPU, while compiling the rest of the program into CUDA
code. The Data.Array.Accelerate.CUDA.run function is the one that
actually moves the arrays to the GPU.
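
As a rough illustration of that split, here is a toy model in plain Haskell.
It is only a sketch: the Acc type, its Use and Map constructors, and the run
function below are made up for this example and are not Accelerate's real
AST or back-end. The front-end part only builds a tree; no data is touched
until run walks it.

```haskell
{-# LANGUAGE GADTs #-}
module Main where

-- Toy front end: a syntax tree describing an array computation.
-- Lists stand in for host arrays in this sketch.
data Acc a where
  Use :: [a] -> Acc a                 -- node that embeds a host-side array
  Map :: (a -> b) -> Acc a -> Acc b   -- elementwise operation on a subtree

-- Count the nodes in the tree without looking at any array data.
nodes :: Acc a -> Int
nodes (Use _)   = 1
nodes (Map _ e) = 1 + nodes e

-- Toy back end: only here is the embedded data actually consumed.
-- A real back end would transfer the array at the Use node.
run :: Acc a -> [a]
run (Use xs)  = xs
run (Map f e) = map f (run e)

program :: Acc Int
program = Map (+ 1) (Use [0 .. 9])

main :: IO ()
main = do
  print (nodes program)  -- inspects the tree only
  print (run program)    -- evaluates the tree
```

In this model, wrapping an array in Use costs nothing by itself, which is
consistent with adding 'use' alone not changing the timings.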

I haven't tried executing your code and I'm not sure why the front-end
is that slow.

-- 
Martin Dybdal



Re: [Haskell-cafe] Data.Array.Accelerate initialization timings

2012-02-20 Thread Paul Sujkov
Ah, it seems that I now see what's going wrong. I'm not using the 'run'
function from the CUDA backend, and so by default I guess the code is
interpreted (the test backend used for semantics checks). However, it's not
perfectly clear how to use the CUDA backend explicitly.

If you have any suggestions, it would be a great help!

On 20 February 2012 16:06, Alex Gremm wrote:

> Hi Paul,
>
> even though I just started reading about Accelerate, it seems to me that
> you didn't use the "use" method which according to [1] initiates
> asynchronous data transfer from host to GPU.
>
>
> Cheers,
> Alex
>
> [1]: http://www.cse.unsw.edu.au/%7Echak/papers/acc-cuda.pdf
> On 20/02/12 14:46, Paul Sujkov wrote:
> > Hi everyone,
> >
> > since the accelerate mailing list seems to be defunct, I'm asking
> > specific questions here. The problem is: array initialization in
> > Data.Array.Accelerate takes about 10x as long as both Data.Array
> > and bare C++ CUDA array initialization. This may be due to
> > Data.Array.Accelerate having two backends (however, its own tests show
> > that my nVidia card is CUDA-capable), but I don't know how to
> > profile the GPU to check whether it is used or not. Anyway, here's the code:
> >
> > http://hpaste.org/64036
> >
> > both generateArray (DIM3) and generateArray1 (DIM1) take the same amount
> > of time to initialize the array. I'd say the problem is in GPU memory
> > copying time, but here's the bare C++ code:
> >
> > http://hpaste.org/64037
> >
> > which does exactly the same, but 10 times faster. I'm wondering what
> > I am doing wrong and how to check whether I really am. Thanks in advance
> > if anyone can point out my mistakes!
> >
> > --
> > Regards, Paul Sujkov
> >
> >
>
>


-- 
Regards, Paul Sujkov


Re: [Haskell-cafe] Data.Array.Accelerate initialization timings

2012-02-20 Thread Paul Sujkov
Yep. It doesn't help:

generateArray1 n = Acc.use $ Acc.fromList (Acc.Z Acc.:. n*n*n) [0..n*n*n]

still takes the same amount of time. I guess something's wrong elsewhere.

On 20 February 2012 16:06, Alex Gremm wrote:

> Hi Paul,
>
> even though I just started reading about Accelerate, it seems to me that
> you didn't use the "use" method which according to [1] initiates
> asynchronous data transfer from host to GPU.
>
>
> Cheers,
> Alex
>
> [1]: http://www.cse.unsw.edu.au/%7Echak/papers/acc-cuda.pdf
>
>


-- 
Regards, Paul Sujkov


Re: [Haskell-cafe] Data.Array.Accelerate initialization timings

2012-02-20 Thread Paul Sujkov
Hi Alex,

I've seen that method, but I don't see how I can use it for initialization
purposes. It creates an Acc (Array sh e) from an Array sh e, but what should
I do while I don't have an Array yet? And when the Array is already
initialized, 'use' will transfer it to GPU memory, which will add some extra
time but won't speed up what I have at the moment.

However, I'll try to use it somehow. Maybe I misunderstand the mechanics.
Thanks a lot!

On 20 February 2012 16:06, Alex Gremm wrote:

> Hi Paul,
>
> even though I just started reading about Accelerate, it seems to me that
> you didn't use the "use" method which according to [1] initiates
> asynchronous data transfer from host to GPU.
>
>
> Cheers,
> Alex
>
> [1]: http://www.cse.unsw.edu.au/%7Echak/papers/acc-cuda.pdf
>
>


-- 
Regards, Paul Sujkov


[Haskell-cafe] Data.Array.Accelerate initialization timings

2012-02-20 Thread Paul Sujkov
Hi everyone,

since the accelerate mailing list seems to be defunct, I'm asking specific
questions here. The problem is: array initialization in
Data.Array.Accelerate takes about 10x as long as both Data.Array and
bare C++ CUDA array initialization. This may be due to
Data.Array.Accelerate having two backends (however, its own tests show
that my nVidia card is CUDA-capable), but I don't know how to
profile the GPU to check whether it is used or not. Anyway, here's the code:

http://hpaste.org/64036

both generateArray (DIM3) and generateArray1 (DIM1) take the same amount of
time to initialize the array. I'd say the problem is in GPU memory copying
time, but here's the bare C++ code:

http://hpaste.org/64037

which does exactly the same, but 10 times faster. I'm wondering what I am
doing wrong and how to check whether I really am. Thanks in advance if anyone
can point out my mistakes!

-- 
Regards, Paul Sujkov