Our testbox has 2 gpus:

balay at bb30:~> lspci | grep -i nvidia
0b:00.0 3D controller: nVidia Corporation GT200 [Tesla C1060] (rev a1)
0c:00.0 3D controller: nVidia Corporation GT200 [Tesla C1060] (rev a1)
balay at bb30:~>

Is there some test I can run on this? [it has cuda 4.0]

satish

On Sat, 1 Oct 2011, Matthew Knepley wrote:

> This diagnosis is total crap (I think), as I tried to explain. We would
> never get the same result (or the right result), and partitioning makes
> no sense. Something else is going on. Can't we run on a 2 GPU system at
> ANL?
>
>    Matt
>
> On Sat, Oct 1, 2011 at 9:30 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>
> > On Oct 1, 2011, at 9:22 PM, Dave Nystrom wrote:
> >
> > > Hi Barry,
> > >
> > > I've sent a couple more emails on this topic. What I am trying to do
> > > at the moment is to figure out how to have a problem run on only one
> > > gpu if it will fit in the memory of that gpu. Back in April when I
> > > had built petsc-dev with Cuda 3.2, petsc would only use one gpu if
> > > you had multiple gpus on your machine. In order to use multiple gpus
> > > for a problem, one had to use multiple threads with a separate
> > > thread assigned to control each gpu. But Cuda 4.0 has, I believe,
> > > made that transparent and under the hood. So now when I run a small
> > > example problem such as src/ksp/ksp/examples/tutorials/ex2f.F with
> > > an 800x800 problem, it gets partitioned to run on both of the gpus
> > > in my machine. The result is a very large performance hit because of
> > > communication back and forth from one gpu to the other via the cpu.
> >
> > How do you know there is lots of communication from the GPU to the
> > CPU? In the -log_summary? Nope, because PETSc does not manage anything
> > like that (that is, one CPU process using both GPUs).
> >
> > > So this problem with a 3200x3200 grid runs 5x slower now than it did
> > > with Cuda 3.2. I believe if one is programming down at the cuda
> > > level, it is possible to have a smaller problem run on only one gpu
> > > so that there is communication only between the cpu and gpu and only
> > > at the start and end of the calculation.
> > >
> > > To me, it seems like what is needed is a petsc option to specify the
> > > number of gpus to run on that can somehow get passed down to the
> > > cuda level through cusp and thrust. I fear that the short term
> > > solution is going to have to be for me to pull one of the gpus out
> > > of my desktop system, but it would be nice if there was a way to
> > > tell petsc and friends to just use one gpu when I want it to.
> > >
> > > If necessary, I can send a couple of log files to demonstrate what I
> > > am trying to describe regarding the performance hit.
> >
> > I am not convinced that the poor performance you are getting now has
> > anything to do with using both GPUs. Please run a PETSc program with
> > the command-line option -cuda_show_devices
> >
> > What are the choices? You can then pick one of them and run with
> > -cuda_set_device integer
> >
> > Does this change things?
> >
> >    Barry
> >
> > > Thanks,
> > >
> > > Dave
> > >
> > > Barry Smith writes:
> > >> Dave,
> > >>
> > >> We have no mechanism in the PETSc code for a PETSc single CPU
> > >> process to use two GPUs at the same time. However, you could have
> > >> two MPI processes, each using their own GPU.
> > >>
> > >> The one tricky part is you need to make sure each MPI process uses
> > >> a different GPU. We currently do not have a mechanism to do this
> > >> assignment automatically. I think it can be done with
> > >> cudaSetDevice(), but I don't know the details; sending this to
> > >> petsc-dev at mcs.anl.gov where more people may know.
> > >>
> > >> PETSc folks,
> > >>
> > >> We need a way to have this setup automatically.
> > >>
> > >> Barry
> > >>
> > >> On Oct 1, 2011, at 5:43 PM, Dave Nystrom wrote:
> > >>
> > >>> I'm running petsc on a machine with Cuda 4.0 and 2 gpus. This is
> > >>> a desktop machine with a single processor. I know that Cuda 4.0
> > >>> has support for running on multiple gpus but don't know if petsc
> > >>> uses that. But suppose I have a problem that will fit in the
> > >>> memory of a single gpu. Will petsc run the problem on a single
> > >>> gpu, or does it split it between the 2 gpus and incur the
> > >>> communication overhead of copying data between the two gpus?
> > >>>
> > >>> Thanks,
> > >>>
> > >>> Dave