On Mon, Jan 31, 2022 at 10:50 AM Fande Kong <[email protected]> wrote:
> Sorry for the confusion. I thought I explained it pretty well :-)
>
> Good: PETSc was linked to /usr/lib64/libcuda for libcuda
>
> Bad: PETSc was linked to /apps/local/spack/software/gcc-7.5.0/cuda-10.1.243-v4ymjqcrr7f72qfiuzsstuy5jiajbuey/lib64/stubs for libcuda
>
> My question would be: where should I look for libcuda?
>
> Our HPC admin told me that I should use the one from /usr/lib64/libcuda

Your admin was correct.

> I am trying to understand why we need to link to "stubs"?

Kokkos needs libcuda.so, so we added this requirement.

> Just to be clear, I am fine with PETSc-main as is, since I can use a compute node to compile PETSc. However, here I am trying really hard to understand where I should look for the right libcuda.

I need your help to find out why, on compute nodes, the PETSc test executable found libcuda.so at /apps/local/spack/software/gcc-7.5.0/cuda-10.1.243-v4ymjqcrr7f72qfiuzsstuy5jiajbuey/lib64/stubs. Note this path is not in the executable's rpath. Maybe you need to log in to a compute node and run 'env' to list all variables for us to have a look.

> Thanks for your help
>
> Fande
>
> On Mon, Jan 31, 2022 at 9:19 AM Junchao Zhang <[email protected]> wrote:
>
>> Fande,
>>   From your configure_main.log:
>>
>>     cuda:
>>       Version:  10.1
>>       Includes: -I/apps/local/spack/software/gcc-7.5.0/cuda-10.1.243-v4ymjqcrr7f72qfiuzsstuy5jiajbuey/include
>>       Library:  -Wl,-rpath,/apps/local/spack/software/gcc-7.5.0/cuda-10.1.243-v4ymjqcrr7f72qfiuzsstuy5jiajbuey/lib64 -L/apps/local/spack/software/gcc-7.5.0/cuda-10.1.243-v4ymjqcrr7f72qfiuzsstuy5jiajbuey/lib64 -L/apps/local/spack/software/gcc-7.5.0/cuda-10.1.243-v4ymjqcrr7f72qfiuzsstuy5jiajbuey/lib64/stubs -lcudart -lcufft -lcublas -lcusparse -lcusolver -lcurand -lcuda
>>
>> You can see the `stubs` directory is not in the rpath. We took a lot of effort to achieve that. You need to double-check the reason.
>>
>> --Junchao Zhang
>>
>> On Mon, Jan 31, 2022 at 9:40 AM Fande Kong <[email protected]> wrote:
>>
>>> OK,
>>>
>>> Finally we resolved the issue. The issue was that there were two libcuda libs on a GPU compute node: /usr/lib64/libcuda and /apps/local/spack/software/gcc-7.5.0/cuda-10.1.243-v4ymjqcrr7f72qfiuzsstuy5jiajbuey/lib64/stubs/libcuda. But on a login node there is only one libcuda lib: /apps/local/spack/software/gcc-7.5.0/cuda-10.1.243-v4ymjqcrr7f72qfiuzsstuy5jiajbuey/lib64/stubs/libcuda. We cannot see /usr/lib64/libcuda from a login node, which is where I was compiling the code.
>>>
>>> Before Junchao's commit, we did not have "-Wl,-rpath" forcing PETSc to take /apps/local/spack/software/gcc-7.5.0/cuda-10.1.243-v4ymjqcrr7f72qfiuzsstuy5jiajbuey/lib64/stubs/libcuda, so a code compiled on a login node could correctly pick up the cuda lib from /usr/lib64/libcuda at runtime. With "-Wl,-rpath", the code always takes the cuda lib from /apps/local/spack/software/gcc-7.5.0/cuda-10.1.243-v4ymjqcrr7f72qfiuzsstuy5jiajbuey/lib64/stubs/libcuda, which was a bad lib.
>>>
>>> Right now I just compiled the code on a compute node instead of a login node; PETSc was able to pick up the correct lib from /usr/lib64/libcuda, and everything ran fine.
>>>
>>> I am not sure whether it is a good idea to search for "stubs", since the system might have the correct libs in other places. Should I just do a batch compile instead?
>>>
>>> Thanks,
>>>
>>> Fande
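A quick way to confirm which libcuda an executable actually gets on a given node (login versus compute) is to ask the dynamic loader directly. The sketch below is not from this thread; it assumes the driver's soname is libcuda.so.1 and simply reports whichever copy the loader resolves (e.g. /usr/lib64 versus the spack stubs directory):

```
/* which_libcuda.c -- hypothetical helper, not part of PETSc.
 * Prints the file the dynamic loader resolves for libcuda.so.1.
 * Build with:  cc which_libcuda.c -o which_libcuda -ldl            */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>

int main(void)
{
  void   *handle = dlopen("libcuda.so.1", RTLD_NOW);
  Dl_info info;

  if (!handle) {
    fprintf(stderr, "dlopen(libcuda.so.1) failed: %s\n", dlerror());
    return 1;
  }
  /* Look up any driver-API symbol and ask where it came from. */
  void *sym = dlsym(handle, "cuInit");
  if (sym && dladdr(sym, &info) && info.dli_fname) {
    printf("libcuda resolved to: %s\n", info.dli_fname);
  } else {
    printf("could not locate cuInit in the loaded libcuda\n");
  }
  dlclose(handle);
  return 0;
}
```

Running ldd on the PETSc test executable on both a login node and a compute node should show the same difference without writing any code.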
>>>
>>> On Wed, Jan 26, 2022 at 1:49 PM Fande Kong <[email protected]> wrote:
>>>
>>>> Yes, please see the attached file.
>>>>
>>>> Fande
>>>>
>>>> On Wed, Jan 26, 2022 at 11:49 AM Junchao Zhang <[email protected]> wrote:
>>>>
>>>>> Do you have the configure.log with main?
>>>>>
>>>>> --Junchao Zhang
>>>>>
>>>>> On Wed, Jan 26, 2022 at 12:26 PM Fande Kong <[email protected]> wrote:
>>>>>
>>>>>> I am on petsc-main:
>>>>>>
>>>>>>   commit 1390d3a27d88add7d79c9b38bf1a895ae5e67af6
>>>>>>   Merge: 96c919c d5f3255
>>>>>>   Author: Satish Balay <[email protected]>
>>>>>>   Date:   Wed Jan 26 10:28:32 2022 -0600
>>>>>>
>>>>>>       Merge remote-tracking branch 'origin/release'
>>>>>>
>>>>>> It is still broken.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Fande
>>>>>>
>>>>>> On Wed, Jan 26, 2022 at 7:40 AM Junchao Zhang <[email protected]> wrote:
>>>>>>
>>>>>>> The good one uses the compiler's default library/header path. The bad one searches the cuda toolkit path and uses rpath linking. Though the paths look the same on the login node, they could behave differently on a compute node depending on its environment. I think we fixed the issue in cuda.py (i.e., first try the compiler's default, then the toolkit). That's why I wanted Fande to use petsc/main.
>>>>>>>
>>>>>>> --Junchao Zhang
>>>>>>>
>>>>>>> On Tue, Jan 25, 2022 at 11:59 PM Barry Smith <[email protected]> wrote:
>>>>>>>
>>>>>>>> bad has the extra
>>>>>>>>
>>>>>>>>   -L/apps/local/spack/software/gcc-7.5.0/cuda-10.1.243-v4ymjqcrr7f72qfiuzsstuy5jiajbuey/lib64/stubs -lcuda
>>>>>>>>
>>>>>>>> good does not.
>>>>>>>>
>>>>>>>> Try removing the stubs directory and -lcuda from the bad $PETSC_ARCH/lib/petsc/conf/variables and likely the bad will start working.
>>>>>>>>
>>>>>>>> Barry
>>>>>>>>
>>>>>>>> I never liked the stubs stuff.
>>>>>>>>
>>>>>>>> On Jan 25, 2022, at 11:29 PM, Fande Kong <[email protected]> wrote:
>>>>>>>>
>>>>>>>> Hi Junchao,
>>>>>>>>
>>>>>>>> I attached a "bad" configure log and a "good" configure log.
>>>>>>>>
>>>>>>>> The "bad" one was produced at 246ba74192519a5f34fb6e227d1c64364e19ce2c and the "good" one at 384645a00975869a1aacbd3169de62ba40cad683. The good hash is the last good one, immediately before the bad commit.
>>>>>>>>
>>>>>>>> I think you could compare these two logs and check what the differences are.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Fande
>>>>>>>>
>>>>>>>> On Tue, Jan 25, 2022 at 8:21 PM Junchao Zhang <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Fande, could you send the configure.log that works (i.e., from before this offending commit)?
>>>>>>>>>
>>>>>>>>> --Junchao Zhang
>>>>>>>>>
>>>>>>>>> On Tue, Jan 25, 2022 at 8:21 PM Fande Kong <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Not sure if this is helpful.
I did "git bisect", and here was the >>>>>>>>>> result: >>>>>>>>>> >>>>>>>>>> [kongf@sawtooth2 petsc]$ git bisect bad >>>>>>>>>> 246ba74192519a5f34fb6e227d1c64364e19ce2c is the first bad commit >>>>>>>>>> commit 246ba74192519a5f34fb6e227d1c64364e19ce2c >>>>>>>>>> Author: Junchao Zhang <[email protected]> >>>>>>>>>> Date: Wed Oct 13 05:32:43 2021 +0000 >>>>>>>>>> >>>>>>>>>> Config: fix CUDA library and header dirs >>>>>>>>>> >>>>>>>>>> :040000 040000 187c86055adb80f53c1d0565a8888704fec43a96 >>>>>>>>>> ea1efd7f594fd5e8df54170bc1bc7b00f35e4d5f M config >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Started from this commit, and GPU did not work for me on our HPC >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> Fande >>>>>>>>>> >>>>>>>>>> On Tue, Jan 25, 2022 at 7:18 PM Fande Kong <[email protected]> >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Tue, Jan 25, 2022 at 9:04 AM Jacob Faibussowitsch < >>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>> >>>>>>>>>>>> Configure should not have an impact here I think. The reason I >>>>>>>>>>>> had you run `cudaGetDeviceCount()` is because this is the CUDA >>>>>>>>>>>> call (and in >>>>>>>>>>>> fact the only CUDA call) in the initialization sequence that >>>>>>>>>>>> returns the >>>>>>>>>>>> error code. There should be no prior CUDA calls. Maybe this is a >>>>>>>>>>>> problem >>>>>>>>>>>> with oversubscribing GPU’s? In the runs that crash, how many ranks >>>>>>>>>>>> are >>>>>>>>>>>> using any given GPU at once? Maybe MPS is required. >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> I used one MPI rank. >>>>>>>>>>> >>>>>>>>>>> Fande >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Best regards, >>>>>>>>>>>> >>>>>>>>>>>> Jacob Faibussowitsch >>>>>>>>>>>> (Jacob Fai - booss - oh - vitch) >>>>>>>>>>>> >>>>>>>>>>>> On Jan 21, 2022, at 12:01, Fande Kong <[email protected]> >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>> Thanks Jacob, >>>>>>>>>>>> >>>>>>>>>>>> On Thu, Jan 20, 2022 at 6:25 PM Jacob Faibussowitsch < >>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Segfault is caused by the following check at >>>>>>>>>>>>> src/sys/objects/device/impls/cupm/cupmdevice.cxx:349 being a >>>>>>>>>>>>> PetscUnlikelyDebug() rather than just PetscUnlikely(): >>>>>>>>>>>>> >>>>>>>>>>>>> ``` >>>>>>>>>>>>> if (PetscUnlikelyDebug(_defaultDevice < 0)) { // >>>>>>>>>>>>> _defaultDevice is in fact < 0 here and uncaught >>>>>>>>>>>>> ``` >>>>>>>>>>>>> >>>>>>>>>>>>> To clarify: >>>>>>>>>>>>> >>>>>>>>>>>>> “lazy” initialization is not that lazy after all, it still >>>>>>>>>>>>> does some 50% of the initialization that “eager” initialization >>>>>>>>>>>>> does. It >>>>>>>>>>>>> stops short initializing the CUDA runtime, checking CUDA aware >>>>>>>>>>>>> MPI, >>>>>>>>>>>>> gathering device data, and initializing cublas and friends. Lazy >>>>>>>>>>>>> also >>>>>>>>>>>>> importantly swallows any errors that crop up during >>>>>>>>>>>>> initialization, storing >>>>>>>>>>>>> the resulting error code for later (specifically _defaultDevice = >>>>>>>>>>>>> -init_error_value;). >>>>>>>>>>>>> >>>>>>>>>>>>> So whether you initialize lazily or eagerly makes no >>>>>>>>>>>>> difference here, as _defaultDevice will always contain -35. >>>>>>>>>>>>> >>>>>>>>>>>>> The bigger question is why cudaGetDeviceCount() is returning >>>>>>>>>>>>> cudaErrorInsufficientDriver. 
>>>>>>>>>>>>
>>>>>>>>>>>> I modified your code a little to get more information:
>>>>>>>>>>>>
>>>>>>>>>>>> #include <cuda_runtime.h>
>>>>>>>>>>>> #include <cstdio>
>>>>>>>>>>>>
>>>>>>>>>>>> int main()
>>>>>>>>>>>> {
>>>>>>>>>>>>   int ndev;
>>>>>>>>>>>>   int error = cudaGetDeviceCount(&ndev);
>>>>>>>>>>>>   printf("ndev %d \n", ndev);
>>>>>>>>>>>>   printf("error %d \n", error);
>>>>>>>>>>>>   return 0;
>>>>>>>>>>>> }
>>>>>>>>>>>>
>>>>>>>>>>>> Results:
>>>>>>>>>>>>
>>>>>>>>>>>> $ ./a.out
>>>>>>>>>>>> ndev 4
>>>>>>>>>>>> error 0
>>>>>>>>>>>>
>>>>>>>>>>>> I have not read the PETSc cuda initialization code yet. If I had to guess at what was happening, I would naively think that PETSc did not get the correct GPU information during configuration, because the compile node does not have GPUs and there was no way to get any GPU device information.
>>>>>>>>>>>>
>>>>>>>>>>>> During runtime on the GPU nodes, PETSc might be using the incorrect information grabbed during configuration and producing this kind of false error message.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>
>>>>>>>>>>>> Fande
>>>>>>>>>>>>
>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Jacob Faibussowitsch
>>>>>>>>>>>>> (Jacob Fai - booss - oh - vitch)
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Jan 20, 2022, at 17:47, Matthew Knepley <[email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Jan 20, 2022 at 6:44 PM Fande Kong <[email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks, Jed
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, Jan 20, 2022 at 4:34 PM Jed Brown <[email protected]> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> You can't create CUDA or Kokkos Vecs if you're running on a node without a GPU.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I am running the code on compute nodes that do have GPUs.
>>>>>>>>>>>>>
>>>>>>>>>>>>> If you are actually running on GPUs, why would you need lazy initialization? It would not break with GPUs present.
>>>>>>>>>>>>>
>>>>>>>>>>>>>    Matt
>>>>>>>>>>>>>
>>>>>>>>>>>>>> With PETSc-3.16.1, I got good speedup by running GAMG on GPUs. That might be a bug in PETSc-main.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Fande
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> KSPSetUp             13 1.0 6.4400e-01 1.0 2.02e+09 1.0 0.0e+00 0.0e+00 0.0e+00  0  5  0  0  0   0  5  0  0  0  3140   64630  15 1.05e+02  5 3.49e+01 100
>>>>>>>>>>>>>> KSPSolve              1 1.0 1.0109e+00 1.0 3.49e+10 1.0 0.0e+00 0.0e+00 0.0e+00  0 87  0  0  0   0 87  0  0  0 34522   69556   4 4.35e-03  1 2.38e-03 100
>>>>>>>>>>>>>> KSPGMRESOrthog      142 1.0 1.2674e-01 1.0 1.06e+10 1.0 0.0e+00 0.0e+00 0.0e+00  0 27  0  0  0   0 27  0  0  0 83755   87801   0 0.00e+00  0 0.00e+00 100
>>>>>>>>>>>>>> SNESSolve             1 1.0 4.4402e+01 1.0 4.00e+10 1.0 0.0e+00 0.0e+00 0.0e+00 21100  0  0  0  21100  0  0  0   901   51365  57 1.10e+03 52 8.78e+02 100
>>>>>>>>>>>>>> SNESSetUp             1 1.0 3.9101e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0   0 0.00e+00  0 0.00e+00  0
>>>>>>>>>>>>>> SNESFunctionEval      2 1.0 1.7097e+01 1.0 1.60e+07 1.0 0.0e+00 0.0e+00 0.0e+00  8  0  0  0  0   8  0  0  0  0     1       0   0 0.00e+00  6 1.92e+02  0
>>>>>>>>>>>>>> SNESJacobianEval      1 1.0 1.6213e+01 1.0 2.80e+07 1.0 0.0e+00 0.0e+00 0.0e+00  8  0  0  0  0   8  0  0  0  0     2       0   0 0.00e+00  1 3.20e+01  0
>>>>>>>>>>>>>> SNESLineSearch        1 1.0 8.5582e+00 1.0 1.24e+08 1.0 0.0e+00 0.0e+00 0.0e+00  4  0  0  0  0   4  0  0  0  0    14   64153   1 3.20e+01  3 9.61e+01 94
>>>>>>>>>>>>>> PCGAMGGraph_AGG       5 1.0 3.0509e+00 1.0 8.19e+07 1.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0    27       0   5 3.49e+01  9 7.43e+01  0
>>>>>>>>>>>>>> PCGAMGCoarse_AGG      5 1.0 3.8711e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   2  0  0  0  0     0       0   0 0.00e+00  0 0.00e+00  0
>>>>>>>>>>>>>> PCGAMGProl_AGG        5 1.0 7.0748e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0   0 0.00e+00  0 0.00e+00  0
>>>>>>>>>>>>>> PCGAMGPOpt_AGG        5 1.0 1.2904e+00 1.0 2.14e+09 1.0 0.0e+00 0.0e+00 0.0e+00  1  5  0  0  0   1  5  0  0  0  1661   29807  26 7.15e+02 20 2.90e+02 99
>>>>>>>>>>>>>> GAMG: createProl      5 1.0 8.9489e+00 1.0 2.22e+09 1.0 0.0e+00 0.0e+00 0.0e+00  4  6  0  0  0   4  6  0  0  0   249   29666  31 7.50e+02 29 3.64e+02 96
>>>>>>>>>>>>>>   Graph              10 1.0 3.0478e+00 1.0 8.19e+07 1.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0    27       0   5 3.49e+01  9 7.43e+01  0
>>>>>>>>>>>>>>   MIS/Agg             5 1.0 4.1290e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0   0 0.00e+00  0 0.00e+00  0
>>>>>>>>>>>>>>   SA: col data        5 1.0 1.9127e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0   0 0.00e+00  0 0.00e+00  0
>>>>>>>>>>>>>>   SA: frmProl0        5 1.0 6.2662e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0   0 0.00e+00  0 0.00e+00  0
>>>>>>>>>>>>>>   SA: smooth          5 1.0 4.9595e-01 1.0 1.21e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   244    2709  15 1.97e+02 15 2.55e+02 90
>>>>>>>>>>>>>> GAMG: partLevel       5 1.0 4.7330e-01 1.0 6.98e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   0  2  0  0  0  1475    4120   5 1.78e+02 10 2.55e+02 100
>>>>>>>>>>>>>> PCGAMG Squ l00        1 1.0 2.6027e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0       0   0 0.00e+00  0 0.00e+00  0
>>>>>>>>>>>>>> PCGAMG Gal l00        1 1.0 3.8406e-01 1.0 5.48e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0  1426    4270   1 1.48e+02  2 2.11e+02 100
>>>>>>>>>>>>>> PCGAMG Opt l00        1 1.0 2.4932e-01 1.0 7.20e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   289    2653   1 6.41e+01  1 1.13e+02 100
>>>>>>>>>>>>>> PCGAMG Gal l01        1 1.0 6.6279e-02 1.0 1.09e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  1645    3851   1 2.40e+01  2 3.64e+01 100
>>>>>>>>>>>>>> PCGAMG Opt l01        1 1.0 2.9544e-02 1.0 7.15e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   242    1671   1 4.84e+00  1 1.23e+01 100
>>>>>>>>>>>>>> PCGAMG Gal l02        1 1.0 1.8874e-02 1.0 3.72e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  1974    3636   1 5.04e+00  2 6.58e+00 100
>>>>>>>>>>>>>> PCGAMG Opt l02        1 1.0 7.4353e-03 1.0 2.40e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   323    1457   1 7.71e-01  1 2.30e+00 100
>>>>>>>>>>>>>> PCGAMG Gal l03        1 1.0 2.8479e-03 1.0 4.10e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  1440    2266   1 4.44e-01  2 5.51e-01 100
>>>>>>>>>>>>>> PCGAMG Opt l03        1 1.0 8.2684e-04 1.0 2.80e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   339    1667   1 6.72e-02  1 2.03e-01 100
>>>>>>>>>>>>>> PCGAMG Gal l04        1 1.0 1.2238e-03 1.0 2.09e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   170     244   1 2.05e-02  2 2.53e-02 100
>>>>>>>>>>>>>> PCGAMG Opt l04        1 1.0 4.1008e-04 1.0 1.77e+04 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0    43     165   1 4.49e-03  1 1.19e-02 100
>>>>>>>>>>>>>> PCSetUp               2 1.0 9.9632e+00 1.0 4.95e+09 1.0 0.0e+00 0.0e+00 0.0e+00  5 12  0  0  0   5 12  0  0  0   496   17826  55 1.03e+03 45 6.54e+02 98
>>>>>>>>>>>>>> PCSetUpOnBlocks      44 1.0 9.9087e-04 1.0 2.88e+03 1.0
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The point of lazy initialization is to make it possible to run a solve that doesn't use a GPU in PETSC_ARCH that supports GPUs, regardless of whether a GPU is actually present.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Fande Kong <[email protected]> writes:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> > I spoke too soon. It seems that we have trouble creating cuda/kokkos vecs now. Got Segmentation fault.
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > Thanks,
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > Fande
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > Program received signal SIGSEGV, Segmentation fault.
>>>>>>>>>>>>>>> > 0x00002aaab5558b11 in Petsc::CUPMDevice<(Petsc::CUPMDeviceType)0>::CUPMDeviceInternal::initialize (this=0x1) at /home/kongf/workhome/sawtooth/moosegpu/petsc/src/sys/objects/device/impls/cupm/cupmdevice.cxx:54
>>>>>>>>>>>>>>> > 54        PetscErrorCode CUPMDevice<T>::CUPMDeviceInternal::initialize() noexcept
>>>>>>>>>>>>>>> > Missing separate debuginfos, use: debuginfo-install bzip2-libs-1.0.6-13.el7.x86_64 elfutils-libelf-0.176-5.el7.x86_64 elfutils-libs-0.176-5.el7.x86_64 glibc-2.17-325.el7_9.x86_64 libX11-1.6.7-4.el7_9.x86_64 libXau-1.0.8-2.1.el7.x86_64 libattr-2.4.46-13.el7.x86_64 libcap-2.22-11.el7.x86_64 libibmad-5.4.0.MLNX20190423.1d917ae-0.1.49224.x86_64 libibumad-43.1.1.MLNX20200211.078947f-0.1.49224.x86_64 libibverbs-41mlnx1-OFED.4.9.0.0.7.49224.x86_64 libmlx4-41mlnx1-OFED.4.7.3.0.3.49224.x86_64 libmlx5-41mlnx1-OFED.4.9.0.1.2.49224.x86_64 libnl3-3.2.28-4.el7.x86_64 librdmacm-41mlnx1-OFED.4.7.3.0.6.49224.x86_64 librxe-41mlnx1-OFED.4.4.2.4.6.49224.x86_64 libxcb-1.13-1.el7.x86_64 libxml2-2.9.1-6.el7_9.6.x86_64 numactl-libs-2.0.12-5.el7.x86_64 systemd-libs-219-78.el7_9.3.x86_64 xz-libs-5.2.2-1.el7.x86_64 zlib-1.2.7-19.el7_9.x86_64
>>>>>>>>>>>>>>> > (gdb) bt
>>>>>>>>>>>>>>> > #0  0x00002aaab5558b11 in Petsc::CUPMDevice<(Petsc::CUPMDeviceType)0>::CUPMDeviceInternal::initialize (this=0x1) at /home/kongf/workhome/sawtooth/moosegpu/petsc/src/sys/objects/device/impls/cupm/cupmdevice.cxx:54
>>>>>>>>>>>>>>> > #1  0x00002aaab5558db7 in Petsc::CUPMDevice<(Petsc::CUPMDeviceType)0>::getDevice (this=this@entry=0x2aaab7f37b70 <CUDADevice>, device=0x115da00, id=-35, id@entry=-1) at /home/kongf/workhome/sawtooth/moosegpu/petsc/src/sys/objects/device/impls/cupm/cupmdevice.cxx:344
>>>>>>>>>>>>>>> > #2  0x00002aaab55577de in PetscDeviceCreate (type=type@entry=PETSC_DEVICE_CUDA, devid=devid@entry=-1, device=device@entry=0x2aaab7f37b48 <defaultDevices+8>) at /home/kongf/workhome/sawtooth/moosegpu/petsc/src/sys/objects/device/interface/device.cxx:107
>>>>>>>>>>>>>>> > #3  0x00002aaab5557b3a in PetscDeviceInitializeDefaultDevice_Internal (type=type@entry=PETSC_DEVICE_CUDA, defaultDeviceId=defaultDeviceId@entry=-1) at /home/kongf/workhome/sawtooth/moosegpu/petsc/src/sys/objects/device/interface/device.cxx:273
>>>>>>>>>>>>>>> > #4  0x00002aaab5557bf6 in PetscDeviceInitialize (type=type@entry=PETSC_DEVICE_CUDA) at /home/kongf/workhome/sawtooth/moosegpu/petsc/src/sys/objects/device/interface/device.cxx:234
>>>>>>>>>>>>>>> > #5  0x00002aaab5661fcd in VecCreate_SeqCUDA (V=0x115d150) at /home/kongf/workhome/sawtooth/moosegpu/petsc/src/vec/vec/impls/seq/seqcuda/veccuda.c:244
>>>>>>>>>>>>>>> > #6  0x00002aaab5649b40 in VecSetType (vec=vec@entry=0x115d150, method=method@entry=0x2aaab70b45b8 "seqcuda") at /home/kongf/workhome/sawtooth/moosegpu/petsc/src/vec/vec/interface/vecreg.c:93
>>>>>>>>>>>>>>> > #7  0x00002aaab579c33f in VecCreate_CUDA (v=0x115d150) at /home/kongf/workhome/sawtooth/moosegpu/petsc/src/vec/vec/impls/mpi/mpicuda/mpicuda.cu:214
>>>>>>>>>>>>>>> > #8  0x00002aaab5649b40 in VecSetType (vec=vec@entry=0x115d150, method=method@entry=0x7fffffff9260 "cuda") at /home/kongf/workhome/sawtooth/moosegpu/petsc/src/vec/vec/interface/vecreg.c:93
>>>>>>>>>>>>>>> > #9  0x00002aaab5648bf1 in VecSetTypeFromOptions_Private (vec=0x115d150, PetscOptionsObject=0x7fffffff9210) at /home/kongf/workhome/sawtooth/moosegpu/petsc/src/vec/vec/interface/vector.c:1263
>>>>>>>>>>>>>>> > #10 VecSetFromOptions (vec=0x115d150) at /home/kongf/workhome/sawtooth/moosegpu/petsc/src/vec/vec/interface/vector.c:1297
>>>>>>>>>>>>>>> > #11 0x00002aaab02ef227 in libMesh::PetscVector<double>::init (this=0x11cd1a0, n=441, n_local=441, fast=false, ptype=libMesh::PARALLEL) at /home/kongf/workhome/sawtooth/moosegpu/scripts/../libmesh/installed/include/libmesh/petsc_vector.h:693
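The backtrace runs from libMesh's PetscVector::init through VecSetFromOptions and VecSetType("cuda") into PetscDeviceInitialize. A minimal standalone sketch along the following lines (not code from this thread) exercises the same path, and is handy for trying -vec_type cuda with -device_enable lazy or eager on a login node versus a compute node:

```
/* Hypothetical reproducer, not taken from this thread.
 * Run as:  ./a.out -vec_type cuda [-device_enable lazy|eager]
 * With -vec_type cuda, VecSetFromOptions() calls VecSetType("cuda"),
 * which is where PetscDeviceInitialize() runs in the backtrace above. */
#include <petscvec.h>

int main(int argc, char **argv)
{
  PetscErrorCode ierr;
  Vec            x;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
  ierr = VecCreate(PETSC_COMM_WORLD, &x);CHKERRQ(ierr);
  ierr = VecSetSizes(x, PETSC_DECIDE, 441);CHKERRQ(ierr);  /* same size as the libMesh vector above */
  ierr = VecSetFromOptions(x);CHKERRQ(ierr);               /* device initialization is triggered here */
  ierr = VecDestroy(&x);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}
```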
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > On Thu, Jan 20, 2022 at 1:09 PM Fande Kong <[email protected]> wrote:
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> >> Thanks, Jed,
>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>> >> This worked!
>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>> >> Fande
>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>> >> On Wed, Jan 19, 2022 at 11:03 PM Jed Brown <[email protected]> wrote:
>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>> >>> Fande Kong <[email protected]> writes:
>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>> >>> > On Wed, Jan 19, 2022 at 11:39 AM Jacob Faibussowitsch <[email protected]> wrote:
>>>>>>>>>>>>>>> >>> >
>>>>>>>>>>>>>>> >>> >> Are you running on login nodes or compute nodes (I can't seem to tell from the configure.log)?
>>>>>>>>>>>>>>> >>> >
>>>>>>>>>>>>>>> >>> > I was compiling the code on login nodes and running it on compute nodes. Login nodes do not have GPUs, but compute nodes do.
>>>>>>>>>>>>>>> >>> >
>>>>>>>>>>>>>>> >>> > Just to be clear, the same thing (code, machine) worked perfectly with PETSc-3.16.1. I have this trouble with PETSc-main.
>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>> >>> I assume you can
>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>> >>>   export PETSC_OPTIONS='-device_enable lazy'
>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>> >>> and it'll work.
>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>> >>> I think this should be the default.
>>>>>>>>>>>>>>> >>> The main complaint is that timing the first GPU-using event isn't accurate if it includes initialization, but I think this is mostly hypothetical: you can't trust any timing that doesn't preload in some form, and the first GPU-using event will almost always be something uninteresting, so it will rarely lead to confusion. Meanwhile, eager initialization is viscerally disruptive for lots of people.
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
>>>>>>>>>>>>> -- Norbert Wiener
>>>>>>>>>>>>>
>>>>>>>>>>>>> https://www.cse.buffalo.edu/~knepley/
>>>>>>>>>>>>
>>>>>>>>>>>> <configure_bad.log><configure_good.log>
