All of the failed tests just said "application called MPI_Abort" and had no 
stack trace. They are not CUDA tests. I updated SF to avoid CUDA-related 
initialization when it is not needed. Let's see the new test results.

not ok dm_impls_stag_tests-ex13_none_none_none_3d_par_stag_stencil_width-1
#       application called MPI_Abort(MPI_COMM_WORLD, 1) - process 1
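
For reference, the rank-to-device mapping from PetscCUDAInitialize that is 
quoted further down in this thread can be sketched in plain C. This is only 
an illustration under assumptions: select_device is a hypothetical helper, 
not a PETSc or CUDA function, and the guard for a zero device count is an 
addition made here, not something taken from the PETSc source.

```c
/* Hypothetical helper, not part of PETSc or CUDA: map an MPI rank to a
   device index round-robin, mirroring "device = rank % devCount".
   Returns -1 when the inputs are unusable; this guard is an assumption
   added for illustration, since rank % 0 is undefined in C. */
static int select_device(int rank, int dev_count)
{
  if (rank < 0 || dev_count <= 0) return -1;
  return rank % dev_count;
}
```

With two devices, ranks 0 through 3 map to devices 0, 1, 0, 1, so rank 0 of 
every concurrently running test would land on device 0.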

--Junchao Zhang


On Thu, Sep 19, 2019 at 3:57 PM Smith, Barry F. 
<bsm...@mcs.anl.gov> wrote:

 "Failed" by itself means nothing; send a link or cut and paste the error.

 It could be that since we have multiple separate tests running at the same 
time they overload the GPU or cause some inconsistent behavior that doesn't 
appear every time the tests are run.

   Barry

Maybe we need to run all the tests that use the GPUs sequentially. We just 
trust GNU make for the parallelism; maybe you could somehow add dependencies 
to get GNU make to achieve this?




> On Sep 19, 2019, at 3:53 PM, Zhang, Junchao 
> <jczh...@mcs.anl.gov> wrote:
>
> On Thu, Sep 19, 2019 at 3:24 PM Smith, Barry F. 
> <bsm...@mcs.anl.gov> wrote:
>
>
> > On Sep 19, 2019, at 2:50 PM, Zhang, Junchao 
> > <jczh...@mcs.anl.gov> wrote:
> >
> > I saw your update. In PetscCUDAInitialize we have
> >
> >       /* First get the device count */
> >       err   = cudaGetDeviceCount(&devCount);
> >
> >       /* next determine the rank and then set the device via a mod */
> >       ierr   = MPI_Comm_rank(comm,&rank);CHKERRQ(ierr);
> >       device = rank % devCount;
> >     }
> >     err = cudaSetDevice(device);
> >
> > If we rely on the first CUDA call to do the initialization, how could CUDA 
> > know about this MPI information?
>
>   It doesn't, so it does whatever it does (which may be dumb).
>
>   Are you proposing something?
>
> No. My test failed in CI with -cuda_initialize 0 on frog, but I could not 
> reproduce it. I am investigating.
>
>   Barry
>
> >
> > --Junchao Zhang
> >
> >
> >
> > On Wed, Sep 18, 2019 at 11:42 PM Smith, Barry F. 
> > <bsm...@mcs.anl.gov> wrote:
> >
> >   Fixed the docs. Thanks for pointing out the lack of clarity.
> >
> >
> > > On Sep 18, 2019, at 11:25 PM, Zhang, Junchao via petsc-dev 
> > > <petsc-dev@mcs.anl.gov> wrote:
> > >
> > > Barry,
> > >
> > > I saw you added these in init.c
> > >
> > >
> > > +  -cuda_initialize - do the initialization in PetscInitialize()
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > Notes:
> > >
> > > >    Initializing cuBLAS takes about 1/2 second, therefore it is done by 
> > > > default in PetscInitialize() before logging begins
> > >
> > >
> > >
> > > > But I did not understand the other case: with -cuda_initialize 0, when 
> > > > will CUDA be initialized?
> > > --Junchao Zhang
> >
