Re: [Haskell-cafe] Data.Array.Accelerate initialization timings
Martin Dybdal:
> On 20 February 2012 16:39, Paul Sujkov wrote:
>> Ah, I think I now see what's going wrong. I'm not using the 'run'
>> function from the CUDA backend, so by default I guess the code is
>> interpreted (by the test backend used for semantics checks). However, it's not
>> entirely clear how to use the CUDA backend explicitly.
>
> Neither the interpreter nor the CUDA code is used in your example.
> Everything in Data.Array.Accelerate is front-end stuff; your arrays
> are allocated on the host, so that is where the inefficiency lies.
>
> The "use" method inserts a statement in the syntax tree generated by
> the front-end, which the back-end can use as a hint to transfer that
> array to the GPU while compiling the rest of the program into CUDA
> code. The Data.Array.Accelerate.CUDA.run function is the one that
> actually moves the arrays to the GPU.
>
> I haven't tried executing your code and I'm not sure why the front-end
> is that slow.

The 'fromList' function is mostly meant for testing or for initialising small arrays. It is not particularly optimised. (Going via a vanilla list is just a bad idea if you want performance.) For efficient data marshalling, have a look at the modules under Data.Array.Accelerate.IO.

Manuel

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
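For comparison, a host-side initialisation that skips the intermediate list could look like the sketch below. It assumes a current accelerate 1.x release and uses 'fromFunction', which fills an array from an index function. This is a different route from the Data.Array.Accelerate.IO modules Manuel points to, and 'initHost' is a hypothetical name.

```haskell
import qualified Data.Array.Accelerate as A

-- Hypothetical helper: a DIM3 cube of side n where each element holds its
-- row-major linear index. 'fromFunction' calls the index function for every
-- position, so no Haskell list is built along the way.
initHost :: Int -> A.Array A.DIM3 Float
initHost n =
  A.fromFunction (A.Z A.:. n A.:. n A.:. n)
                 (\(A.Z A.:. i A.:. j A.:. k) -> fromIntegral ((i * n + j) * n + k))

main :: IO ()
main = print (A.arrayShape (initHost 100))
```

The array still lives on the host, so a later 'use' and 'run' are needed to get it to a GPU; the sketch only removes the list conversion from the picture.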
Re: [Haskell-cafe] Data.Array.Accelerate initialization timings
On 20 February 2012 16:39, Paul Sujkov wrote:
> Ah, I think I now see what's going wrong. I'm not using the 'run'
> function from the CUDA backend, so by default I guess the code is
> interpreted (by the test backend used for semantics checks). However, it's not
> entirely clear how to use the CUDA backend explicitly.

Neither the interpreter nor the CUDA code is used in your example. Everything in Data.Array.Accelerate is front-end stuff; your arrays are allocated on the host, so that is where the inefficiency lies.

The "use" method inserts a statement in the syntax tree generated by the front-end, which the back-end can use as a hint to transfer that array to the GPU while compiling the rest of the program into CUDA code. The Data.Array.Accelerate.CUDA.run function is the one that actually moves the arrays to the GPU.

I haven't tried executing your code and I'm not sure why the front-end is that slow.

--
Martin Dybdal
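To make the front-end/back-end split concrete, here is a small sketch, assuming accelerate 1.x, where an embedded Acc program has a Show instance that pretty-prints its syntax tree. 'use' merely embeds a host array into the program; printing the program runs no backend and copies nothing to a GPU. The names 'hostArr' and 'program' are illustrative.

```haskell
import qualified Data.Array.Accelerate as A

-- Host-side array: allocated and filled on the CPU by the front-end.
hostArr :: A.Vector Float
hostArr = A.fromList (A.Z A.:. 8) [0 ..]

-- 'use' only adds a node referring to hostArr to the embedded program;
-- building this value executes nothing and transfers nothing.
program :: A.Acc (A.Vector Float)
program = A.map (* 2) (A.use hostArr)

main :: IO ()
main = print program  -- pretty-prints the syntax tree; no backend is involved
```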
Re: [Haskell-cafe] Data.Array.Accelerate initialization timings
Ah, I think I now see what's going wrong. I'm not using the 'run' function from the CUDA backend, so by default I guess the code is interpreted (by the test backend used for semantics checks). However, it's not entirely clear how to use the CUDA backend explicitly. If you have any suggestions, it would be a great help!

On 20 February 2012 16:06, Alex Gremm wrote:
> Hi Paul,
>
> even though I just started reading about Accelerate, it seems to me that
> you didn't use the "use" method, which according to [1] initiates an
> asynchronous data transfer from host to GPU.
>
> Cheers,
> Alex
>
> [1]: http://www.cse.unsw.edu.au/%7Echak/papers/acc-cuda.pdf
>
> On 20/02/12 14:46, Paul Sujkov wrote:
>> Hi everyone,
>>
>> since the accelerate mailing list seems to be defunct, I'm trying to ask
>> specific questions here. The problem is: array initialization in
>> Data.Array.Accelerate takes about 10 times as long as both
>> Data.Array and bare C++ CUDA array initialization. This could be due to
>> Data.Array.Accelerate having two backends (however, its own tests show
>> that my nVidia card is CUDA-capable), but I don't know how to
>> profile the GPU to check whether it is used or not. Anyway, here's the code:
>>
>> http://hpaste.org/64036
>>
>> Both generateArray (DIM3) and generateArray1 (DIM1) take the same amount
>> of time to initialize the array. I'd say the problem is GPU memory
>> copying time, but here's the bare C++ code:
>>
>> http://hpaste.org/64037
>>
>> which does exactly the same, but 10 times faster. I'm wondering what I am
>> doing wrong and how to check whether I really am. Thanks in advance if
>> anyone can point out my mistakes!
>>
>> --
>> Regards, Paul Sujkov

--
Regards, Paul Sujkov
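Choosing a backend explicitly comes down to which 'run' you call. The sketch below assumes a current setup: accelerate 1.x with its reference interpreter (Data.Array.Accelerate.Interpreter) and, for GPUs, the accelerate-llvm-ptx package, whose Data.Array.Accelerate.LLVM.PTX module has taken over the role of the old Data.Array.Accelerate.CUDA backend. The GPU lines are commented out so the example runs without CUDA installed; 'total' is an illustrative name.

```haskell
import qualified Data.Array.Accelerate             as A
import qualified Data.Array.Accelerate.Interpreter as Interp
-- import qualified Data.Array.Accelerate.LLVM.PTX as PTX  -- GPU backend (assumed installed)

-- The program is backend-agnostic: it is only a description of the work.
total :: A.Acc (A.Scalar Float)
total = A.sum (A.use (A.fromList (A.Z A.:. 8) [0 ..]))

main :: IO ()
main = do
  -- The reference interpreter runs on the CPU; slow, but always available.
  print (Interp.run total)
  -- Swapping in the GPU backend is a matter of calling its 'run' instead:
  -- print (PTX.run total)
```

Nothing in Data.Array.Accelerate itself executes the program, so whichever 'run' appears in the code is the backend that is used.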
Re: [Haskell-cafe] Data.Array.Accelerate initialization timings
Yep. It doesn't help:

    generateArray1 n = Acc.use $ Acc.fromList (Acc.Z Acc.:. n*n*n) [0..n*n*n]

still takes the same amount of time. I guess something's wrong elsewhere.

--
Regards, Paul Sujkov
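One way to see where the time goes is to time the host-side 'fromList' on its own, before 'use' or any backend is involved. A minimal sketch, assuming accelerate 1.x; 'timed' is a hypothetical helper, and the measurement relies on 'evaluate' forcing the array to weak head normal form, which in current accelerate versions appears to fill the whole buffer.

```haskell
import Control.Exception (evaluate)
import System.CPUTime (getCPUTime)
import qualified Data.Array.Accelerate as A

-- Hypothetical helper: run an action and report the CPU time it took.
timed :: String -> IO a -> IO a
timed label act = do
  t0 <- getCPUTime
  x  <- act
  t1 <- getCPUTime
  let ms = fromIntegral (t1 - t0) / 1e9 :: Double  -- getCPUTime counts picoseconds
  putStrLn (label ++ ": " ++ show ms ++ " ms")
  return x

main :: IO ()
main = do
  let n = 100 :: Int
  -- Only the front-end list conversion is measured; no 'use', no 'run'.
  arr <- timed "fromList" (evaluate (A.fromList (A.Z A.:. n * n * n) [0 ..] :: A.Vector Float))
  print (A.arrayShape arr)
```

If this number already accounts for most of the total, wrapping the result in 'use' cannot help, which matches what Paul observed.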
Re: [Haskell-cafe] Data.Array.Accelerate initialization timings
Hi Alex,

I've seen that method, but I don't see how I can use it for initialization purposes. It creates Acc (Array sh e) from Array sh e, but what should I do when I don't have an Array yet? And once the Array is already initialized, 'use' will transfer it to GPU memory, which adds some extra time but doesn't optimize what I have at the moment. However, I'll try to use it somehow. Maybe I misunderstand the mechanics.

Thanks a lot!

--
Regards, Paul Sujkov
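On the question of what to do before any Array exists: the array does not have to be built on the host at all. A sketch, assuming accelerate 1.x, that describes the initialisation as an Accelerate computation with 'generate', so a backend's 'run' fills it directly (on the device, when a GPU backend is used). The name 'iota' is hypothetical, and the interpreter stands in for a GPU backend here.

```haskell
import qualified Data.Array.Accelerate             as A
import qualified Data.Array.Accelerate.Interpreter as Interp

-- Describe [0, 1 .. n-1] as an Accelerate computation: there is no host
-- array to 'use', so nothing has to be copied in before the program runs.
iota :: Int -> A.Acc (A.Vector Float)
iota n = A.generate (A.index1 (A.constant n)) (\ix -> A.fromIntegral (A.unindex1 ix))

main :: IO ()
main = print (Interp.run (iota 10))
```

The same idea would cover generateArray1 by calling 'iota (n*n*n)'; with a GPU backend only the result, if requested, travels back to the host.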
[Haskell-cafe] Data.Array.Accelerate initialization timings
Hi everyone,

since the accelerate mailing list seems to be defunct, I'm trying to ask specific questions here. The problem is: array initialization in Data.Array.Accelerate takes about 10 times as long as both Data.Array and bare C++ CUDA array initialization. This could be due to Data.Array.Accelerate having two backends (however, its own tests show that my nVidia card is CUDA-capable), but I don't know how to profile the GPU to check whether it is used or not. Anyway, here's the code:

http://hpaste.org/64036

Both generateArray (DIM3) and generateArray1 (DIM1) take the same amount of time to initialize the array. I'd say the problem is GPU memory copying time, but here's the bare C++ code:

http://hpaste.org/64037

which does exactly the same, but 10 times faster. I'm wondering what I am doing wrong and how to check whether I really am. Thanks in advance if anyone can point out my mistakes!

--
Regards, Paul Sujkov