I saw your update. In PetscCUDAInitialize we have

    /* First get the device count */
    err = cudaGetDeviceCount(&devCount);

    /* next determine the rank and then set the device via a mod */
      ierr = MPI_Comm_rank(comm,&rank);CHKERRQ(ierr);
      device = rank % devCount;
    }
    err = cudaSetDevice(device);

If we rely on the first CUDA call to do the initialization, how could CUDA know about this MPI information (the rank used to pick the device)?

--Junchao Zhang


On Wed, Sep 18, 2019 at 11:42 PM Smith, Barry F. <bsm...@mcs.anl.gov> wrote:

  Fixed the docs. Thanks for pointing out the lack of clarity.

> On Sep 18, 2019, at 11:25 PM, Zhang, Junchao via petsc-dev
> <petsc-dev@mcs.anl.gov> wrote:
>
> Barry,
>
> I saw you added these in init.c
>
>   + -cuda_initialize - do the initialization in PetscInitialize()
>
>   Notes:
>   Initializing cuBLAS takes about 1/2 second, therefore it is done by default in
>   PetscInitialize() before logging begins
>
> But I did not understand the other case: with -cuda_initialize 0, when will CUDA be
> initialized?
> --Junchao Zhang