"Failed"? That by itself means nothing; send a link or cut and paste the error.

 It could be that since we have multiple separate tests running at the same 
time they overload the GPU or cause some inconsistent behavior that doesn't 
appear every time the tests are run.

   Barry

Maybe we need to serialize all the tests that use the GPUs. We just trust 
GNU make for the parallelism; maybe you could somehow add dependencies to get 
GNU make to achieve this?
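
A minimal sketch of why tests running at the same time could pile onto one GPU, assuming each test is a separate MPI job that picks its device with the "device = rank % devCount" rule quoted below from PetscCUDAInitialize. The helper names device_for_rank and count_device_load are hypothetical, for illustration only; the code makes no CUDA or MPI calls and only does the arithmetic.

```c
#include <assert.h>

/* Hypothetical helper mirroring the quoted line "device = rank % devCount". */
static int device_for_rank(int rank, int dev_count)
{
  assert(dev_count > 0);
  return rank % dev_count;
}

/* Count how many ranks land on each device when n_jobs independent test
   jobs, each with ranks_per_job MPI ranks, run at the same time.
   load must have room for dev_count entries. */
static void count_device_load(int n_jobs, int ranks_per_job, int dev_count, int *load)
{
  for (int d = 0; d < dev_count; d++) load[d] = 0;
  for (int job = 0; job < n_jobs; job++) {
    /* Every job numbers its own ranks from 0, so rank 0 of each job maps to device 0. */
    for (int rank = 0; rank < ranks_per_job; rank++) {
      load[device_for_rank(rank, dev_count)]++;
    }
  }
}
```

Under these assumptions, count_device_load(8, 1, 2, load) leaves load[0] == 8 and load[1] == 0: eight single-rank tests on a two-GPU machine would all select device 0. Whether that is what actually happens on the CI machine is not something this sketch can confirm.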


 

> On Sep 19, 2019, at 3:53 PM, Zhang, Junchao <jczh...@mcs.anl.gov> wrote:
> 
> On Thu, Sep 19, 2019 at 3:24 PM Smith, Barry F. <bsm...@mcs.anl.gov> wrote:
> 
> 
> > On Sep 19, 2019, at 2:50 PM, Zhang, Junchao <jczh...@mcs.anl.gov> wrote:
> > 
> > I saw your update. In PetscCUDAInitialize we have
> > 
> >       /* First get the device count */
> >       err   = cudaGetDeviceCount(&devCount);
> > 
> >       /* next determine the rank and then set the device via a mod */
> >       ierr   = MPI_Comm_rank(comm,&rank);CHKERRQ(ierr);
> >       device = rank % devCount;
> >     }
> >     err = cudaSetDevice(device);
> > 
> > If we rely on the first CUDA call to do initialization, how could CUDA know 
> > about this MPI stuff?
> 
>   It doesn't, so it does whatever it does (which may be dumb).
> 
>   Are you proposing something?
> 
> No. My test failed in CI with -cuda_initialize 0 on frog but I could not 
> reproduce it. I'm investigating.
> 
>   Barry
> 
> > 
> > --Junchao Zhang
> > 
> > 
> > 
> > On Wed, Sep 18, 2019 at 11:42 PM Smith, Barry F. <bsm...@mcs.anl.gov> wrote:
> > 
> >   Fixed the docs. Thanks for pointing out the lack of clarity
> > 
> > 
> > > On Sep 18, 2019, at 11:25 PM, Zhang, Junchao via petsc-dev 
> > > <petsc-dev@mcs.anl.gov> wrote:
> > > 
> > > Barry,
> > > 
> > > I saw you added these in init.c
> > > 
> > > 
> > > +  -cuda_initialize - do the initialization in PetscInitialize()
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > Notes:
> > > 
> > > >    Initializing cuBLAS takes about 1/2 second, therefore it is done by 
> > > > default in PetscInitialize() before logging begins
> > > 
> > > 
> > > 
> > > > But I did not understand what happens otherwise: with -cuda_initialize 0, 
> > > > when will CUDA be initialized?
> > > --Junchao Zhang
> > 
