---------- Forwarded message ----------
From: Leandro Demarco Vedelago <[email protected]>
Date: Mon, Jul 30, 2012 at 2:57 PM
Subject: Re: [PyCUDA] Performance Issues
To: Brendan Wood <[email protected]>, [email protected]
Brendan: Basically, all the examples compute the dot product of two large
vectors, but each one introduces a new concept (pinned memory, streams,
etc.). The last example is the one that incorporates multiple GPUs. As for
the work done, I generate the data randomly and run some checks on the host
side at the end, which considerably increases execution time; but since
these are "learning examples", I was not especially worried about that.
Still, I would have expected that, given the server's far more powerful
hardware (three Tesla C2075s, four Intel Xeons with 6 cores each, and 48 GB
of RAM), the programs would run faster, in particular this last example,
which is designed to work with multiple GPUs. I compiled and ran the
bandwidthTest and deviceQuery samples from the SDK, and they both passed,
if that is what you meant.

Now answering Andreas: yes, I'm using one thread per GPU (the way it's done
in the wiki example), and yes, the server has far more than 3 CPUs. As for
the SCHED_BLOCKING_SYNC flag, should I pass it as an argument when creating
each device's context? What does this flag do?

Thank you both for your answers.

On Mon, Jul 30, 2012 at 12:47 AM, Brendan Wood <[email protected]> wrote:
> Hi Leandro,
>
> Without knowing exactly what examples you're running, it may be hard to
> say what the problem is. In fact, you may not really have a problem.
>
> How much work is being done in each example program? Is it enough to
> really work the GPU, or is communication and other overhead dominating
> runtime? Note that laptops may have lower communication latency over
> the PCI bus than desktops/servers, which can make small programs run
> much faster on laptops regardless of how much processing power the GPU
> has.
>
> Have you tried running the sample code from the SDK, so that you can
> verify that it's not a code problem?
>
> Regards,
>
> Brendan Wood
>
>
> On Sun, 2012-07-29 at 23:59 -0300, Leandro Demarco Vedelago wrote:
>> Hello: I've been reading and learning CUDA over the last few weeks, and
>> last week I started writing (translating from CUDA C to PyCUDA) some
>> examples taken from the book "CUDA by Example".
>> I started coding on a laptop with just one NVIDIA GPU (a GTX 560M, if
>> my memory is right) running Windows 7.
>>
>> But in the project I'm currently working on, we intend to run (Py)CUDA
>> on a multi-GPU server that has three Tesla C2075 cards.
>>
>> So I installed Ubuntu Server 10.10 (with no GUI) and managed to
>> install and run the very same examples I ran on the single-GPU
>> laptop. However, they run really slowly; in some cases they take 3
>> times longer than on the laptop. And this happens with most, if not
>> all, of the examples I wrote.
>>
>> I thought it could be a driver issue, but I double-checked and I've
>> installed the correct drivers, meaning those listed in the CUDA Zone
>> section of nvidia.com for 64-bit Linux. So I'm kind of lost right now
>> and was wondering if anyone has had this or a similar problem
>> running on a server.
>>
>> Sorry for the English; it's not my native language.
>>
>> Thanks in advance,
>> Leandro Demarco
>>
>> _______________________________________________
>> PyCUDA mailing list
>> [email protected]
>> http://lists.tiker.net/listinfo/pycuda
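
P.S. For reference, the one-thread-per-GPU layout I described above can be
sketched roughly as follows. This is a CPU-only sketch: the GPU work is
stubbed out with NumPy so it runs anywhere, and the commented lines show
where each thread would create its own context with the
SCHED_BLOCKING_SYNC flag (the pycuda.driver names in the comments are my
assumption of the relevant PyCUDA API, not tested code):

```python
# Sketch of splitting a dot product across one worker thread per device.
# GPU work is stubbed with NumPy; the comments mark where PyCUDA calls
# would go in the real version.
import threading

import numpy as np

# import pycuda.driver as cuda
# cuda.init()


def worker(dev_id, chunk_a, chunk_b, results):
    # ctx = cuda.Device(dev_id).make_context(
    #     flags=cuda.ctx_flags.SCHED_BLOCKING_SYNC)
    # ... upload chunk_a / chunk_b, launch the dot-product kernel ...
    results[dev_id] = float(np.dot(chunk_a, chunk_b))  # stand-in for GPU result
    # ctx.pop()


def multi_gpu_dot(a, b, num_devices=3):
    """Split a dot product across one thread per (simulated) device."""
    chunks_a = np.array_split(a, num_devices)
    chunks_b = np.array_split(b, num_devices)
    results = [0.0] * num_devices
    threads = [
        threading.Thread(target=worker, args=(i, chunks_a[i], chunks_b[i], results))
        for i in range(num_devices)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # The partial dot products of the chunks sum to the full dot product.
    return sum(results)


print(multi_gpu_dot(np.arange(6.0), np.ones(6)))  # 0+1+2+3+4+5 = 15.0
```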
