---------- Forwarded message ----------
From: Leandro Demarco Vedelago <[email protected]>
Date: Mon, Jul 30, 2012 at 2:57 PM
Subject: Re: [PyCUDA] Performance Issues
To: Brendan Wood <[email protected]>, [email protected]
Brendan: Basically, all the examples compute the dot product of two large
vectors, but each one introduces a new concept (pinned memory, streams,
etc.). The last example is the one that incorporates multiple GPUs. As for
the work done, I generate the data randomly and run some checks on the host
side at the end, which considerably increases execution time; but since
these are "learning examples", I was not especially worried about that.
Still, I would have expected that, given the server's far more powerful
hardware (three Tesla C2075s, four Intel Xeons with 6 cores each, and 48 GB
of RAM), the programs would run faster, in particular this last example,
which is designed to work with multiple GPUs. I compiled and ran the
bandwidthTest and deviceQuery samples from the SDK, and they both passed,
if that is what you meant.

Now answering Andreas: yes, I'm using one thread per GPU (the way it's done
in the wiki example), and yes, the server has far more than 3 CPUs. As for
the SCHED_BLOCKING_SYNC flag, should I pass it as an argument when creating
each device's context? What does this flag do?

Thank you both for your answers.

On Mon, Jul 30, 2012 at 12:47 AM, Brendan Wood <[email protected]> wrote:
> Hi Leandro,
>
> Without knowing exactly what examples you're running, it may be hard to
> say what the problem is. In fact, you may not really have a problem.
>
> How much work is being done in each example program? Is it enough to
> really work the GPU, or is communication and other overhead dominating
> runtime? Note that laptops may have lower communication latency over
> the PCI bus than desktops/servers, which can make small programs run
> much faster on laptops regardless of how much processing power the GPU
> has.
>
> Have you tried running the sample code from the SDK, so that you can
> verify that it's not a code problem?
>
> Regards,
>
> Brendan Wood
>
>
> On Sun, 2012-07-29 at 23:59 -0300, Leandro Demarco Vedelago wrote:
>> Hello: I've been reading and learning CUDA over the last few weeks, and
>> last week I started writing (translating from CUDA C to PyCUDA) some
>> examples taken from the book "CUDA by Example".
>> I started coding on a laptop with just one NVIDIA GPU (a GTX 560M, if
>> my memory is right) running Windows 7.
>>
>> But in the project I'm currently working on, we intend to run (Py)CUDA
>> on a multi-GPU server that has three Tesla C2075 cards.
>>
>> So I installed Ubuntu Server 10.10 (with no GUI) and managed to
>> install and run the very same examples I ran on the single-GPU
>> laptop. However, they run really slowly; in some cases they take 3
>> times longer than on the laptop. And this happens with most, if not
>> all, of the examples I wrote.
>>
>> I thought it could be a driver issue, but I double-checked and I've
>> installed the correct drivers, meaning those listed in the CUDA Zone
>> section of nvidia.com for 64-bit Linux. So I'm kind of lost right now
>> and was wondering if anyone has had this or a similar problem
>> running on a server.
>>
>> Sorry for the English; it's not my native language.
>>
>> Thanks in advance,
>> Leandro Demarco
>>
>> _______________________________________________
>> PyCUDA mailing list
>> [email protected]
>> http://lists.tiker.net/listinfo/pycuda
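
P.S. For reference, the one-thread-per-GPU layout I described above can be
sketched roughly as follows. This is a CPU-only sketch: the GPU work is
stubbed out with NumPy so it runs anywhere, and the commented lines show
where each thread would create its own context with the
SCHED_BLOCKING_SYNC flag (the pycuda.driver names in the comments are my
assumption of the relevant PyCUDA API, not tested code):

```python
# Sketch of splitting a dot product across one worker thread per device.
# GPU work is stubbed with NumPy; the comments mark where PyCUDA calls
# would go in the real version.
import threading

import numpy as np

# import pycuda.driver as cuda
# cuda.init()


def worker(dev_id, chunk_a, chunk_b, results):
    # ctx = cuda.Device(dev_id).make_context(
    #     flags=cuda.ctx_flags.SCHED_BLOCKING_SYNC)
    # ... upload chunk_a / chunk_b, launch the dot-product kernel ...
    results[dev_id] = float(np.dot(chunk_a, chunk_b))  # stand-in for GPU result
    # ctx.pop()


def multi_gpu_dot(a, b, num_devices=3):
    """Split a dot product across one thread per (simulated) device."""
    chunks_a = np.array_split(a, num_devices)
    chunks_b = np.array_split(b, num_devices)
    results = [0.0] * num_devices
    threads = [
        threading.Thread(target=worker, args=(i, chunks_a[i], chunks_b[i], results))
        for i in range(num_devices)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # The partial dot products of the chunks sum to the full dot product.
    return sum(results)


print(multi_gpu_dot(np.arange(6.0), np.ones(6)))  # 0+1+2+3+4+5 = 15.0
```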
