gprof or some similar tool?
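
gprof may not attribute time spent in the dynamic loader, though. Another
option is to time loading the PETSc shared library directly. A rough sketch
(the library name/path and the build line are assumptions and need to match
the actual install, e.g. cc time_dlopen.c -ldl):

#include <dlfcn.h>
#include <stdio.h>
#include <time.h>

int main(void)
{
  struct timespec t0,t1;

  clock_gettime(CLOCK_MONOTONIC,&t0);
  /* RTLD_NOW forces the loader to resolve every symbol up front,
     so the full load cost is paid inside this one call */
  void *h = dlopen("libpetsc.so",RTLD_NOW | RTLD_GLOBAL);
  clock_gettime(CLOCK_MONOTONIC,&t1);
  if (!h) { fprintf(stderr,"dlopen failed: %s\n",dlerror()); return 1; }
  printf("dlopen(libpetsc.so) took %f s\n",
         (t1.tv_sec - t0.tv_sec) + 1e-9*(t1.tv_nsec - t0.tv_nsec));
  dlclose(h);
  return 0;
}

If that call alone takes several seconds, the time is going into mapping and
resolving the shared library rather than into CUDA itself.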


> On Feb 10, 2020, at 11:18 AM, Zhang, Hong via petsc-dev 
> <petsc-dev@mcs.anl.gov> wrote:
> 
> -cuda_initialize 0 does not make any difference. Actually this issue has 
> nothing to do with PetscInitialize(). I tried to call cudaFree(0) before 
> PetscInitialize(), and it still took 7.5 seconds.
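> 
> (For reference, a minimal sketch of that ordering test, with the timed 
> cudaFree(0) moved ahead of PetscInitialize(); same timing style as the 
> example later in the thread:)
> 
> #include <time.h>
> #include <cuda_runtime.h>
> #include <stdio.h>
> #include <petscmat.h>
> 
> int main(int argc,char **args)
> {
>   clock_t        start,s1;
>   PetscErrorCode ierr = 0;
> 
>   start = clock();
>   cudaFree(0);   /* first CUDA call, issued before PETSc is initialized */
>   s1 = clock();
>   printf("free time =%lf\n",((double)(s1 - start)) / CLOCKS_PER_SEC);
>   ierr = PetscInitialize(&argc,&args,(char*)0,NULL);if (ierr) return ierr;
>   ierr = PetscFinalize();
>   return ierr;
> }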
> 
> Hong
> 
>> On Feb 10, 2020, at 10:44 AM, Zhang, Junchao <jczh...@mcs.anl.gov> wrote:
>> 
>> As I mentioned, have you tried -cuda_initialize 0? Also, PetscCUDAInitialize 
>> contains
>> ierr = PetscCUBLASInitializeHandle();CHKERRQ(ierr);
>> ierr = PetscCUSOLVERDnInitializeHandle();CHKERRQ(ierr);
>> Have you tried commenting these out and testing again?
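>> 
>> If it helps to isolate those two calls, here is a minimal standalone sketch 
>> that times the equivalent handle creations using the CUDA libraries directly 
>> (it bypasses the PETSc wrappers above; the build line, e.g. 
>> nvcc handles.c -lcublas -lcusolver, is a guess for this system):
>> 
>> #include <stdio.h>
>> #include <time.h>
>> #include <cublas_v2.h>
>> #include <cusolverDn.h>
>> 
>> static double now(void)
>> {
>>   struct timespec t;
>>   clock_gettime(CLOCK_MONOTONIC,&t);
>>   return t.tv_sec + 1e-9*t.tv_nsec;
>> }
>> 
>> int main(void)
>> {
>>   cublasHandle_t     cb;
>>   cusolverDnHandle_t cs;
>>   double             t0,t1,t2;
>> 
>>   t0 = now();
>>   cublasCreate(&cb);      /* first CUDA call: also pays the context-creation cost */
>>   t1 = now();
>>   cusolverDnCreate(&cs);
>>   t2 = now();
>>   printf("cublasCreate %f s  cusolverDnCreate %f s\n",t1-t0,t2-t1);
>>   cusolverDnDestroy(cs);
>>   cublasDestroy(cb);
>>   return 0;
>> }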
>> --Junchao Zhang
>> 
>> 
>> On Sat, Feb 8, 2020 at 5:22 PM Zhang, Hong via petsc-dev 
>> <petsc-dev@mcs.anl.gov> wrote:
>> 
>> 
>>> On Feb 8, 2020, at 5:03 PM, Matthew Knepley <knep...@gmail.com> wrote:
>>> 
>>> On Sat, Feb 8, 2020 at 4:34 PM Zhang, Hong via petsc-dev 
>>> <petsc-dev@mcs.anl.gov> wrote:
>>> I did some further investigation. The overhead persists for both the PETSc 
>>> shared library and the static library. The previous example does not call 
>>> any PETSc function, yet its first CUDA call becomes very slow when the 
>>> executable is linked against the PETSc shared library. This indicates that 
>>> the slowdown occurs when the symbol (cudaFree) is searched through the PETSc 
>>> shared library, but does not occur when the symbol is found directly in the 
>>> CUDA runtime library. 
>>> 
>>> So the issue has nothing to do with the dynamic linker. The following 
>>> example can be used to easily reproduce the problem (cudaFree(0) always 
>>> takes ~7.5 seconds).  
>>> 
>>> 1) This should go to OLCF admin as Jeff suggests
>> 
>> I had sent this to the OLCF admins before the discussion started here. Thomas 
>> Papatheodore has followed up. I am trying to help him reproduce the problem 
>> on Summit. 
>> 
>>> 
>>> 2) Just to make sure I understand, a static executable with this code is 
>>> still slow on the cudaFree(), since CUDA is a shared library by default.
>> 
>> I prepared the code as a minimal example to reproduce the problem. It would 
>> be fair to say that any code using PETSc (with CUDA enabled, built statically 
>> or dynamically) on Summit suffers a 7.5-second overhead on the first CUDA 
>> function call (either in the user code or inside PETSc).
>> 
>> Thanks,
>> Hong
>> 
>>> 
>>> I think we should try:
>>> 
>>>   a) Forcing a full static link, if possible
>>> 
>>>   b) Asking OLCF about link resolution order
>>> 
>>> It sounds like something similar to what I have seen in the past, where link 
>>> resolution order can exponentially increase load time.
>>> 
>>>   Thanks,
>>> 
>>>      Matt
>>>  
>>> bash-4.2$ cat ex_simple_petsc.c
>>> #include <time.h>
>>> #include <cuda_runtime.h>
>>> #include <stdio.h>
>>> #include <petscmat.h>
>>> 
>>> int main(int argc,char **args)
>>> {
>>>   clock_t start,s1,s2,s3;
>>>   double  cputime;
>>>   double  *init,tmp[100] = {0};
>>>   PetscErrorCode ierr=0;
>>> 
>>>   ierr = PetscInitialize(&argc,&args,(char*)0,NULL);if (ierr) return ierr;
>>>   start = clock();
>>>   cudaFree(0);   /* first CUDA call: triggers CUDA runtime and context initialization */
>>>   s1 = clock();
>>>   cudaMalloc((void **)&init,100*sizeof(double));
>>>   s2 = clock();
>>>   cudaMemcpy(init,tmp,100*sizeof(double),cudaMemcpyHostToDevice);
>>>   s3 = clock();
>>>   printf("free time =%lf malloc time =%lf copy time =%lf\n",((double) (s1 - 
>>> start)) / CLOCKS_PER_SEC,((double) (s2 - s1)) / CLOCKS_PER_SEC,((double) 
>>> (s3 - s2)) / CLOCKS_PER_SEC);
>>>   ierr = PetscFinalize();
>>>   return ierr;
>>> }
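>>> 
>>> (In case clock(), which measures CPU time, and wall-clock time differ for 
>>> that first call, here is a sketch of the same test using clock_gettime for 
>>> wall-clock timing:)
>>> 
>>> #include <time.h>
>>> #include <cuda_runtime.h>
>>> #include <stdio.h>
>>> #include <petscmat.h>
>>> 
>>> static double wtime(void)
>>> {
>>>   struct timespec t;
>>>   clock_gettime(CLOCK_MONOTONIC,&t);
>>>   return t.tv_sec + 1e-9*t.tv_nsec;
>>> }
>>> 
>>> int main(int argc,char **args)
>>> {
>>>   double         t0,t1,*init,tmp[100] = {0};
>>>   PetscErrorCode ierr = 0;
>>> 
>>>   ierr = PetscInitialize(&argc,&args,(char*)0,NULL);if (ierr) return ierr;
>>>   t0 = wtime();
>>>   cudaFree(0);   /* first CUDA call */
>>>   t1 = wtime();
>>>   printf("wall-clock free time =%lf\n",t1 - t0);
>>>   cudaMalloc((void **)&init,100*sizeof(double));
>>>   cudaMemcpy(init,tmp,100*sizeof(double),cudaMemcpyHostToDevice);
>>>   ierr = PetscFinalize();
>>>   return ierr;
>>> }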
>>> 
>>> Hong
>>> 
>>>> On Feb 7, 2020, at 3:09 PM, Zhang, Hong <hongzh...@anl.gov> wrote:
>>>> 
>>>> Note that the overhead was triggered by the first call to a CUDA function. 
>>>> So it seems that the first CUDA call triggered loading of the PETSc shared 
>>>> library (when it is linked), which is slow on the Summit file system.
>>>> 
>>>> Hong
>>>> 
>>>>> On Feb 7, 2020, at 2:54 PM, Zhang, Hong via petsc-dev 
>>>>> <petsc-dev@mcs.anl.gov> wrote:
>>>>> 
>>>>> Linking any other shared library does not slow down the execution. The 
>>>>> PETSc shared library is the only one causing trouble.
>>>>> 
>>>>> Here is the ldd output for the two versions. For the first version, I 
>>>>> removed -lpetsc and it ran very fast. The second (slow) version was linked 
>>>>> against the PETSc shared library. 
>>>>> 
>>>>> bash-4.2$ ldd ex_simple
>>>>>         linux-vdso64.so.1 =>  (0x0000200000050000)
>>>>>         liblapack.so.0 => 
>>>>> /autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/liblapack.so.0
>>>>>  (0x0000200000070000)
>>>>>         libblas.so.0 => 
>>>>> /autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libblas.so.0
>>>>>  (0x00002000009b0000)
>>>>>         libhdf5hl_fortran.so.100 => 
>>>>> /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/hdf5-1.10.3-pgiul2yf4auv7krecd72t6vupd7e3qgn/lib/libhdf5hl_fortran.so.100
>>>>>  (0x0000200000e80000)
>>>>>         libhdf5_fortran.so.100 => 
>>>>> /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/hdf5-1.10.3-pgiul2yf4auv7krecd72t6vupd7e3qgn/lib/libhdf5_fortran.so.100
>>>>>  (0x0000200000ed0000)
>>>>>         libhdf5_hl.so.100 => 
>>>>> /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/hdf5-1.10.3-pgiul2yf4auv7krecd72t6vupd7e3qgn/lib/libhdf5_hl.so.100
>>>>>  (0x0000200000f50000)
>>>>>         libhdf5.so.103 => 
>>>>> /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/hdf5-1.10.3-pgiul2yf4auv7krecd72t6vupd7e3qgn/lib/libhdf5.so.103
>>>>>  (0x0000200000fb0000)
>>>>>         libX11.so.6 => /usr/lib64/libX11.so.6 (0x00002000015e0000)
>>>>>         libcufft.so.10 => /sw/summit/cuda/10.1.168/lib64/libcufft.so.10 
>>>>> (0x0000200001770000)
>>>>>         libcublas.so.10 => /sw/summit/cuda/10.1.168/lib64/libcublas.so.10 
>>>>> (0x0000200009b00000)
>>>>>         libcudart.so.10.1 => 
>>>>> /sw/summit/cuda/10.1.168/lib64/libcudart.so.10.1 (0x000020000d950000)
>>>>>         libcusparse.so.10 => 
>>>>> /sw/summit/cuda/10.1.168/lib64/libcusparse.so.10 (0x000020000d9f0000)
>>>>>         libcusolver.so.10 => 
>>>>> /sw/summit/cuda/10.1.168/lib64/libcusolver.so.10 (0x0000200012f50000)
>>>>>         libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x000020001dc40000)
>>>>>         libdl.so.2 => /usr/lib64/libdl.so.2 (0x000020001ddd0000)
>>>>>         libpthread.so.0 => /usr/lib64/libpthread.so.0 (0x000020001de00000)
>>>>>         libmpiprofilesupport.so.3 => 
>>>>> /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libmpiprofilesupport.so.3
>>>>>  (0x000020001de40000)
>>>>>         libmpi_ibm_usempi.so => 
>>>>> /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libmpi_ibm_usempi.so
>>>>>  (0x000020001de70000)
>>>>>         libmpi_ibm_mpifh.so.3 => 
>>>>> /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libmpi_ibm_mpifh.so.3
>>>>>  (0x000020001dea0000)
>>>>>         libmpi_ibm.so.3 => 
>>>>> /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libmpi_ibm.so.3
>>>>>  (0x000020001df40000)
>>>>>         libpgf90rtl.so => 
>>>>> /autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libpgf90rtl.so
>>>>>  (0x000020001e0b0000)
>>>>>         libpgf90.so => 
>>>>> /autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libpgf90.so
>>>>>  (0x000020001e0f0000)
>>>>>         libpgf90_rpm1.so => 
>>>>> /autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libpgf90_rpm1.so
>>>>>  (0x000020001e6a0000)
>>>>>         libpgf902.so => 
>>>>> /autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libpgf902.so
>>>>>  (0x000020001e6d0000)
>>>>>         libpgftnrtl.so => 
>>>>> /autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libpgftnrtl.so
>>>>>  (0x000020001e700000)
>>>>>         libatomic.so.1 => /usr/lib64/libatomic.so.1 (0x000020001e730000)
>>>>>         libpgkomp.so => 
>>>>> /autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libpgkomp.so
>>>>>  (0x000020001e760000)
>>>>>         libomp.so => 
>>>>> /autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libomp.so
>>>>>  (0x000020001e790000)
>>>>>         libomptarget.so => 
>>>>> /autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libomptarget.so
>>>>>  (0x000020001e880000)
>>>>>         libpgmath.so => 
>>>>> /autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libpgmath.so
>>>>>  (0x000020001e8b0000)
>>>>>         libpgc.so => 
>>>>> /autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libpgc.so
>>>>>  (0x000020001e9d0000)
>>>>>         librt.so.1 => /usr/lib64/librt.so.1 (0x000020001eb40000)
>>>>>         libm.so.6 => /usr/lib64/libm.so.6 (0x000020001eb70000)
>>>>>         libgcc_s.so.1 => /usr/lib64/libgcc_s.so.1 (0x000020001ec60000)
>>>>>         libc.so.6 => /usr/lib64/libc.so.6 (0x000020001eca0000)
>>>>>         libz.so.1 => 
>>>>> /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/zlib-1.2.11-2htm7ws4hgrthi5tyjnqxtjxgpfklxsc/lib/libz.so.1
>>>>>  (0x000020001ee90000)
>>>>>         libxcb.so.1 => /usr/lib64/libxcb.so.1 (0x000020001eef0000)
>>>>>         /lib64/ld64.so.2 (0x0000200000000000)
>>>>>         libcublasLt.so.10 => 
>>>>> /sw/summit/cuda/10.1.168/lib64/libcublasLt.so.10 (0x000020001ef40000)
>>>>>         libutil.so.1 => /usr/lib64/libutil.so.1 (0x0000200020e50000)
>>>>>         libhwloc_ompi.so.15 => 
>>>>> /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libhwloc_ompi.so.15
>>>>>  (0x0000200020e80000)
>>>>>         libevent-2.1.so.6 => 
>>>>> /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libevent-2.1.so.6
>>>>>  (0x0000200020ef0000)
>>>>>         libevent_pthreads-2.1.so.6 => 
>>>>> /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libevent_pthreads-2.1.so.6
>>>>>  (0x0000200020f70000)
>>>>>         libopen-rte.so.3 => 
>>>>> /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libopen-rte.so.3
>>>>>  (0x0000200020fa0000)
>>>>>         libopen-pal.so.3 => 
>>>>> /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libopen-pal.so.3
>>>>>  (0x00002000210b0000)
>>>>>         libXau.so.6 => /usr/lib64/libXau.so.6 (0x00002000211a0000)
>>>>> 
>>>>> 
>>>>> bash-4.2$ ldd ex_simple_slow
>>>>>         linux-vdso64.so.1 =>  (0x0000200000050000)
>>>>>         libpetsc.so.3.012 => 
>>>>> /autofs/nccs-svm1_home1/hongzh/Projects/petsc/arch-olcf-summit-sell-opt/lib/libpetsc.so.3.012
>>>>>  (0x0000200000070000)
>>>>>         liblapack.so.0 => 
>>>>> /autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/liblapack.so.0
>>>>>  (0x0000200002be0000)
>>>>>         libblas.so.0 => 
>>>>> /autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libblas.so.0
>>>>>  (0x0000200003520000)
>>>>>         libhdf5hl_fortran.so.100 => 
>>>>> /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/hdf5-1.10.3-pgiul2yf4auv7krecd72t6vupd7e3qgn/lib/libhdf5hl_fortran.so.100
>>>>>  (0x00002000039f0000)
>>>>>         libhdf5_fortran.so.100 => 
>>>>> /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/hdf5-1.10.3-pgiul2yf4auv7krecd72t6vupd7e3qgn/lib/libhdf5_fortran.so.100
>>>>>  (0x0000200003a40000)
>>>>>         libhdf5_hl.so.100 => 
>>>>> /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/hdf5-1.10.3-pgiul2yf4auv7krecd72t6vupd7e3qgn/lib/libhdf5_hl.so.100
>>>>>  (0x0000200003ac0000)
>>>>>         libhdf5.so.103 => 
>>>>> /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/hdf5-1.10.3-pgiul2yf4auv7krecd72t6vupd7e3qgn/lib/libhdf5.so.103
>>>>>  (0x0000200003b20000)
>>>>>         libX11.so.6 => /usr/lib64/libX11.so.6 (0x0000200004150000)
>>>>>         libcufft.so.10 => /sw/summit/cuda/10.1.168/lib64/libcufft.so.10 
>>>>> (0x00002000042e0000)
>>>>>         libcublas.so.10 => /sw/summit/cuda/10.1.168/lib64/libcublas.so.10 
>>>>> (0x000020000c670000)
>>>>>         libcudart.so.10.1 => 
>>>>> /sw/summit/cuda/10.1.168/lib64/libcudart.so.10.1 (0x00002000104c0000)
>>>>>         libcusparse.so.10 => 
>>>>> /sw/summit/cuda/10.1.168/lib64/libcusparse.so.10 (0x0000200010560000)
>>>>>         libcusolver.so.10 => 
>>>>> /sw/summit/cuda/10.1.168/lib64/libcusolver.so.10 (0x0000200015ac0000)
>>>>>         libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00002000207b0000)
>>>>>         libdl.so.2 => /usr/lib64/libdl.so.2 (0x0000200020940000)
>>>>>         libpthread.so.0 => /usr/lib64/libpthread.so.0 (0x0000200020970000)
>>>>>         libmpiprofilesupport.so.3 => 
>>>>> /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libmpiprofilesupport.so.3
>>>>>  (0x00002000209b0000)
>>>>>         libmpi_ibm_usempi.so => 
>>>>> /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libmpi_ibm_usempi.so
>>>>>  (0x00002000209e0000)
>>>>>         libmpi_ibm_mpifh.so.3 => 
>>>>> /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libmpi_ibm_mpifh.so.3
>>>>>  (0x0000200020a10000)
>>>>>         libmpi_ibm.so.3 => 
>>>>> /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libmpi_ibm.so.3
>>>>>  (0x0000200020ab0000)
>>>>>         libpgf90rtl.so => 
>>>>> /autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libpgf90rtl.so
>>>>>  (0x0000200020c20000)
>>>>>         libpgf90.so => 
>>>>> /autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libpgf90.so
>>>>>  (0x0000200020c60000)
>>>>>         libpgf90_rpm1.so => 
>>>>> /autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libpgf90_rpm1.so
>>>>>  (0x0000200021210000)
>>>>>         libpgf902.so => 
>>>>> /autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libpgf902.so
>>>>>  (0x0000200021240000)
>>>>>         libpgftnrtl.so => 
>>>>> /autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libpgftnrtl.so
>>>>>  (0x0000200021270000)
>>>>>         libatomic.so.1 => /usr/lib64/libatomic.so.1 (0x00002000212a0000)
>>>>>         libpgkomp.so => 
>>>>> /autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libpgkomp.so
>>>>>  (0x00002000212d0000)
>>>>>         libomp.so => 
>>>>> /autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libomp.so
>>>>>  (0x0000200021300000)
>>>>>         libomptarget.so => 
>>>>> /autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libomptarget.so
>>>>>  (0x00002000213f0000)
>>>>>         libpgmath.so => 
>>>>> /autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libpgmath.so
>>>>>  (0x0000200021420000)
>>>>>         libpgc.so => 
>>>>> /autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libpgc.so
>>>>>  (0x0000200021540000)
>>>>>         librt.so.1 => /usr/lib64/librt.so.1 (0x00002000216b0000)
>>>>>         libm.so.6 => /usr/lib64/libm.so.6 (0x00002000216e0000)
>>>>>         libgcc_s.so.1 => /usr/lib64/libgcc_s.so.1 (0x00002000217d0000)
>>>>>         libc.so.6 => /usr/lib64/libc.so.6 (0x0000200021810000)
>>>>>         libz.so.1 => 
>>>>> /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/zlib-1.2.11-2htm7ws4hgrthi5tyjnqxtjxgpfklxsc/lib/libz.so.1
>>>>>  (0x0000200021a10000)
>>>>>         libxcb.so.1 => /usr/lib64/libxcb.so.1 (0x0000200021a60000)
>>>>>         /lib64/ld64.so.2 (0x0000200000000000)
>>>>>         libcublasLt.so.10 => 
>>>>> /sw/summit/cuda/10.1.168/lib64/libcublasLt.so.10 (0x0000200021ab0000)
>>>>>         libutil.so.1 => /usr/lib64/libutil.so.1 (0x00002000239c0000)
>>>>>         libhwloc_ompi.so.15 => 
>>>>> /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libhwloc_ompi.so.15
>>>>>  (0x00002000239f0000)
>>>>>         libevent-2.1.so.6 => 
>>>>> /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libevent-2.1.so.6
>>>>>  (0x0000200023a60000)
>>>>>         libevent_pthreads-2.1.so.6 => 
>>>>> /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libevent_pthreads-2.1.so.6
>>>>>  (0x0000200023ae0000)
>>>>>         libopen-rte.so.3 => 
>>>>> /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libopen-rte.so.3
>>>>>  (0x0000200023b10000)
>>>>>         libopen-pal.so.3 => 
>>>>> /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libopen-pal.so.3
>>>>>  (0x0000200023c20000)
>>>>>         libXau.so.6 => /usr/lib64/libXau.so.6 (0x0000200023d10000)
>>>>> 
>>>>> 
>>>>>> On Feb 7, 2020, at 2:31 PM, Smith, Barry F. <bsm...@mcs.anl.gov> wrote:
>>>>>> 
>>>>>> 
>>>>>>  Run ldd on the executable for both linkings of your code.
>>>>>> 
>>>>>>  My guess is that without PETSc it is linking the static version of the 
>>>>>> needed libraries, and with PETSc the shared version. And, in typical fashion, 
>>>>>> the shared libraries are off on some super slow file system, so they take a 
>>>>>> long time to be loaded and linked in on demand.
>>>>>> 
>>>>>>   Still a performance bug on Summit. 
>>>>>> 
>>>>>>   Barry
>>>>>> 
>>>>>> 
>>>>>>> On Feb 7, 2020, at 12:23 PM, Zhang, Hong via petsc-dev 
>>>>>>> <petsc-dev@mcs.anl.gov> wrote:
>>>>>>> 
>>>>>>> Hi all,
>>>>>>> 
>>>>>>> Previously I have noticed that the first call to a CUDA function such as 
>>>>>>> cudaMalloc or cudaFree in PETSc takes a long time (7.5 seconds) on Summit. 
>>>>>>> I then prepared the attached simple example to help OLCF reproduce the 
>>>>>>> problem. It turned out that the problem was caused by PETSc: the 7.5-second 
>>>>>>> overhead can be observed only when the PETSc library is linked. If I do not 
>>>>>>> link PETSc, it runs normally. Does anyone have any idea why this happens 
>>>>>>> and how to fix it?
>>>>>>> 
>>>>>>> Hong (Mr.)
>>>>>>> 
>>>>>>> bash-4.2$ cat ex_simple.c
>>>>>>> #include <time.h>
>>>>>>> #include <cuda_runtime.h>
>>>>>>> #include <stdio.h>
>>>>>>> 
>>>>>>> int main(int argc,char **args)
>>>>>>> {
>>>>>>>   clock_t start,s1,s2,s3;
>>>>>>>   double  cputime;
>>>>>>>   double  *init,tmp[100] = {0};
>>>>>>> 
>>>>>>>   start = clock();
>>>>>>>   cudaFree(0);   /* first CUDA call: triggers CUDA runtime and context initialization */
>>>>>>>   s1 = clock();
>>>>>>>   cudaMalloc((void **)&init,100*sizeof(double));
>>>>>>>   s2 = clock();
>>>>>>>   cudaMemcpy(init,tmp,100*sizeof(double),cudaMemcpyHostToDevice);
>>>>>>>   s3 = clock();
>>>>>>>   printf("free time =%lf malloc time =%lf copy time =%lf\n",((double) (s1 
>>>>>>> - start)) / CLOCKS_PER_SEC,((double) (s2 - s1)) / 
>>>>>>> CLOCKS_PER_SEC,((double) (s3 - s2)) / CLOCKS_PER_SEC);
>>>>>>> 
>>>>>>>   return 0;
>>>>>>> }
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>>> 
>>> 
>>> -- 
>>> What most experimenters take for granted before they begin their 
>>> experiments is infinitely more interesting than any results to which their 
>>> experiments lead.
>>> -- Norbert Wiener
>>> 
>>> https://www.cse.buffalo.edu/~knepley/
>> 
> 
