Have you tried passing -cuda_initialize 0 to PETSc?

--Junchao Zhang
On Fri, Feb 7, 2020 at 5:16 PM Zhang, Hong via petsc-dev <petsc-dev@mcs.anl.gov> wrote:

> I tried installing the PETSc shared library in /gpfs/alpine/scratch, which
> should be faster than the home directory. But the same overhead still
> persists.
>
> Hong
>
> > On Feb 7, 2020, at 4:32 PM, Smith, Barry F. <bsm...@mcs.anl.gov> wrote:
> >
> > Perhaps the intent is that you build or install (--prefix) your libraries in a different place than /autofs/nccs-svm1_home1
> >
> >> On Feb 7, 2020, at 3:09 PM, Zhang, Hong <hongzh...@anl.gov> wrote:
> >>
> >> Note that the overhead was triggered by the first call to a CUDA function. So it seems that the first CUDA function triggered loading the PETSc shared library (if it is linked), which is slow on the Summit file system.
> >>
> >> Hong
> >>
> >>> On Feb 7, 2020, at 2:54 PM, Zhang, Hong via petsc-dev <petsc-dev@mcs.anl.gov> wrote:
> >>>
> >>> Linking any other shared library does not slow down the execution. The PETSc shared library is the only one causing trouble.
> >>>
> >>> Here is the ldd output for two different versions. For the first version, I removed -lpetsc and it ran very fast. The second (slow) version was linked to the PETSc shared library.
> >>> > >>> bash-4.2$ ldd ex_simple > >>> linux-vdso64.so.1 => (0x0000200000050000) > >>> liblapack.so.0 => > /autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/liblapack.so.0 > (0x0000200000070000) > >>> libblas.so.0 => > /autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libblas.so.0 > (0x00002000009b0000) > >>> libhdf5hl_fortran.so.100 => > /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/hdf5-1.10.3-pgiul2yf4auv7krecd72t6vupd7e3qgn/lib/libhdf5hl_fortran.so.100 > (0x0000200000e80000) > >>> libhdf5_fortran.so.100 => > /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/hdf5-1.10.3-pgiul2yf4auv7krecd72t6vupd7e3qgn/lib/libhdf5_fortran.so.100 > (0x0000200000ed0000) > >>> libhdf5_hl.so.100 => > /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/hdf5-1.10.3-pgiul2yf4auv7krecd72t6vupd7e3qgn/lib/libhdf5_hl.so.100 > (0x0000200000f50000) > >>> libhdf5.so.103 => > /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/hdf5-1.10.3-pgiul2yf4auv7krecd72t6vupd7e3qgn/lib/libhdf5.so.103 > (0x0000200000fb0000) > >>> libX11.so.6 => /usr/lib64/libX11.so.6 (0x00002000015e0000) > >>> libcufft.so.10 => /sw/summit/cuda/10.1.168/lib64/libcufft.so.10 > (0x0000200001770000) > >>> libcublas.so.10 => > /sw/summit/cuda/10.1.168/lib64/libcublas.so.10 (0x0000200009b00000) > >>> libcudart.so.10.1 => > /sw/summit/cuda/10.1.168/lib64/libcudart.so.10.1 (0x000020000d950000) > >>> libcusparse.so.10 => > /sw/summit/cuda/10.1.168/lib64/libcusparse.so.10 (0x000020000d9f0000) > >>> libcusolver.so.10 => > /sw/summit/cuda/10.1.168/lib64/libcusolver.so.10 (0x0000200012f50000) > >>> libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x000020001dc40000) > 
>>> libdl.so.2 => /usr/lib64/libdl.so.2 (0x000020001ddd0000) > >>> libpthread.so.0 => /usr/lib64/libpthread.so.0 > (0x000020001de00000) > >>> libmpiprofilesupport.so.3 => > /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libmpiprofilesupport.so.3 > (0x000020001de40000) > >>> libmpi_ibm_usempi.so => > /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libmpi_ibm_usempi.so > (0x000020001de70000) > >>> libmpi_ibm_mpifh.so.3 => > /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libmpi_ibm_mpifh.so.3 > (0x000020001dea0000) > >>> libmpi_ibm.so.3 => > /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libmpi_ibm.so.3 > (0x000020001df40000) > >>> libpgf90rtl.so => > /autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libpgf90rtl.so > (0x000020001e0b0000) > >>> libpgf90.so => > /autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libpgf90.so > (0x000020001e0f0000) > >>> libpgf90_rpm1.so => > /autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libpgf90_rpm1.so > (0x000020001e6a0000) > >>> libpgf902.so => > /autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libpgf902.so > (0x000020001e6d0000) > >>> libpgftnrtl.so => > 
/autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libpgftnrtl.so > (0x000020001e700000) > >>> libatomic.so.1 => /usr/lib64/libatomic.so.1 (0x000020001e730000) > >>> libpgkomp.so => > /autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libpgkomp.so > (0x000020001e760000) > >>> libomp.so => > /autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libomp.so > (0x000020001e790000) > >>> libomptarget.so => > /autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libomptarget.so > (0x000020001e880000) > >>> libpgmath.so => > /autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libpgmath.so > (0x000020001e8b0000) > >>> libpgc.so => > /autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libpgc.so > (0x000020001e9d0000) > >>> librt.so.1 => /usr/lib64/librt.so.1 (0x000020001eb40000) > >>> libm.so.6 => /usr/lib64/libm.so.6 (0x000020001eb70000) > >>> libgcc_s.so.1 => /usr/lib64/libgcc_s.so.1 (0x000020001ec60000) > >>> libc.so.6 => /usr/lib64/libc.so.6 (0x000020001eca0000) > >>> libz.so.1 => > /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/zlib-1.2.11-2htm7ws4hgrthi5tyjnqxtjxgpfklxsc/lib/libz.so.1 > (0x000020001ee90000) > >>> libxcb.so.1 => /usr/lib64/libxcb.so.1 (0x000020001eef0000) > >>> /lib64/ld64.so.2 (0x0000200000000000) > >>> libcublasLt.so.10 => > /sw/summit/cuda/10.1.168/lib64/libcublasLt.so.10 (0x000020001ef40000) > >>> libutil.so.1 => 
/usr/lib64/libutil.so.1 (0x0000200020e50000) > >>> libhwloc_ompi.so.15 => > /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libhwloc_ompi.so.15 > (0x0000200020e80000) > >>> libevent-2.1.so.6 => > /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libevent-2.1.so.6 > (0x0000200020ef0000) > >>> libevent_pthreads-2.1.so.6 => > /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libevent_pthreads-2.1.so.6 > (0x0000200020f70000) > >>> libopen-rte.so.3 => > /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libopen-rte.so.3 > (0x0000200020fa0000) > >>> libopen-pal.so.3 => > /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libopen-pal.so.3 > (0x00002000210b0000) > >>> libXau.so.6 => /usr/lib64/libXau.so.6 (0x00002000211a0000) > >>> > >>> > >>> bash-4.2$ ldd ex_simple_slow > >>> linux-vdso64.so.1 => (0x0000200000050000) > >>> libpetsc.so.3.012 => > /autofs/nccs-svm1_home1/hongzh/Projects/petsc/arch-olcf-summit-sell-opt/lib/libpetsc.so.3.012 > (0x0000200000070000) > >>> liblapack.so.0 => > /autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/liblapack.so.0 > (0x0000200002be0000) > >>> libblas.so.0 => > /autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libblas.so.0 > (0x0000200003520000) > >>> libhdf5hl_fortran.so.100 => > 
/autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/hdf5-1.10.3-pgiul2yf4auv7krecd72t6vupd7e3qgn/lib/libhdf5hl_fortran.so.100 > (0x00002000039f0000) > >>> libhdf5_fortran.so.100 => > /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/hdf5-1.10.3-pgiul2yf4auv7krecd72t6vupd7e3qgn/lib/libhdf5_fortran.so.100 > (0x0000200003a40000) > >>> libhdf5_hl.so.100 => > /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/hdf5-1.10.3-pgiul2yf4auv7krecd72t6vupd7e3qgn/lib/libhdf5_hl.so.100 > (0x0000200003ac0000) > >>> libhdf5.so.103 => > /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/hdf5-1.10.3-pgiul2yf4auv7krecd72t6vupd7e3qgn/lib/libhdf5.so.103 > (0x0000200003b20000) > >>> libX11.so.6 => /usr/lib64/libX11.so.6 (0x0000200004150000) > >>> libcufft.so.10 => /sw/summit/cuda/10.1.168/lib64/libcufft.so.10 > (0x00002000042e0000) > >>> libcublas.so.10 => > /sw/summit/cuda/10.1.168/lib64/libcublas.so.10 (0x000020000c670000) > >>> libcudart.so.10.1 => > /sw/summit/cuda/10.1.168/lib64/libcudart.so.10.1 (0x00002000104c0000) > >>> libcusparse.so.10 => > /sw/summit/cuda/10.1.168/lib64/libcusparse.so.10 (0x0000200010560000) > >>> libcusolver.so.10 => > /sw/summit/cuda/10.1.168/lib64/libcusolver.so.10 (0x0000200015ac0000) > >>> libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00002000207b0000) > >>> libdl.so.2 => /usr/lib64/libdl.so.2 (0x0000200020940000) > >>> libpthread.so.0 => /usr/lib64/libpthread.so.0 > (0x0000200020970000) > >>> libmpiprofilesupport.so.3 => > /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libmpiprofilesupport.so.3 > (0x00002000209b0000) > >>> libmpi_ibm_usempi.so => > 
/autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libmpi_ibm_usempi.so > (0x00002000209e0000) > >>> libmpi_ibm_mpifh.so.3 => > /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libmpi_ibm_mpifh.so.3 > (0x0000200020a10000) > >>> libmpi_ibm.so.3 => > /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libmpi_ibm.so.3 > (0x0000200020ab0000) > >>> libpgf90rtl.so => > /autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libpgf90rtl.so > (0x0000200020c20000) > >>> libpgf90.so => > /autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libpgf90.so > (0x0000200020c60000) > >>> libpgf90_rpm1.so => > /autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libpgf90_rpm1.so > (0x0000200021210000) > >>> libpgf902.so => > /autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libpgf902.so > (0x0000200021240000) > >>> libpgftnrtl.so => > /autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libpgftnrtl.so > (0x0000200021270000) > >>> libatomic.so.1 => /usr/lib64/libatomic.so.1 (0x00002000212a0000) > >>> libpgkomp.so => > 
/autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libpgkomp.so > (0x00002000212d0000) > >>> libomp.so => > /autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libomp.so > (0x0000200021300000) > >>> libomptarget.so => > /autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libomptarget.so > (0x00002000213f0000) > >>> libpgmath.so => > /autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libpgmath.so > (0x0000200021420000) > >>> libpgc.so => > /autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libpgc.so > (0x0000200021540000) > >>> librt.so.1 => /usr/lib64/librt.so.1 (0x00002000216b0000) > >>> libm.so.6 => /usr/lib64/libm.so.6 (0x00002000216e0000) > >>> libgcc_s.so.1 => /usr/lib64/libgcc_s.so.1 (0x00002000217d0000) > >>> libc.so.6 => /usr/lib64/libc.so.6 (0x0000200021810000) > >>> libz.so.1 => > /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/zlib-1.2.11-2htm7ws4hgrthi5tyjnqxtjxgpfklxsc/lib/libz.so.1 > (0x0000200021a10000) > >>> libxcb.so.1 => /usr/lib64/libxcb.so.1 (0x0000200021a60000) > >>> /lib64/ld64.so.2 (0x0000200000000000) > >>> libcublasLt.so.10 => > /sw/summit/cuda/10.1.168/lib64/libcublasLt.so.10 (0x0000200021ab0000) > >>> libutil.so.1 => /usr/lib64/libutil.so.1 (0x00002000239c0000) > >>> libhwloc_ompi.so.15 => > /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libhwloc_ompi.so.15 > (0x00002000239f0000) > >>> 
libevent-2.1.so.6 => /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libevent-2.1.so.6 (0x0000200023a60000)
> >>> libevent_pthreads-2.1.so.6 => /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libevent_pthreads-2.1.so.6 (0x0000200023ae0000)
> >>> libopen-rte.so.3 => /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libopen-rte.so.3 (0x0000200023b10000)
> >>> libopen-pal.so.3 => /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libopen-pal.so.3 (0x0000200023c20000)
> >>> libXau.so.6 => /usr/lib64/libXau.so.6 (0x0000200023d10000)
> >>>
> >>>> On Feb 7, 2020, at 2:31 PM, Smith, Barry F. <bsm...@mcs.anl.gov> wrote:
> >>>>
> >>>> ldd -o on the executable of both linkings of your code.
> >>>>
> >>>> My guess is that without PETSc it is linking the static version of the needed libraries and with PETSc the shared. And, in typical fashion, the shared libraries are off on some super slow file system so take a long time to be loaded and linked in on demand.
> >>>>
> >>>> Still a performance bug in Summit.
> >>>>
> >>>> Barry
> >>>>
> >>>>> On Feb 7, 2020, at 12:23 PM, Zhang, Hong via petsc-dev <petsc-dev@mcs.anl.gov> wrote:
> >>>>>
> >>>>> Hi all,
> >>>>>
> >>>>> Previously I noticed that the first call to a CUDA function such as cudaMalloc or cudaFree in PETSc takes a long time (7.5 seconds) on Summit. I then prepared a simple example (attached) to help OLCF reproduce the problem. It turned out that the problem was caused by PETSc. The 7.5-second overhead can be observed only when the PETSc library is linked. If I do not link PETSc, it runs normally. Does anyone have any idea why this happens and how to fix it?
> >>>>>
> >>>>> Hong (Mr.)
> >>>>>
> >>>>> bash-4.2$ cat ex_simple.c
> >>>>> #include <time.h>
> >>>>> #include <cuda_runtime.h>
> >>>>> #include <stdio.h>
> >>>>>
> >>>>> int main(int argc,char **args)
> >>>>> {
> >>>>>   clock_t start,s1,s2,s3;
> >>>>>   double cputime;
> >>>>>   double *init,tmp[100] = {0};
> >>>>>
> >>>>>   start = clock();
> >>>>>   cudaFree(0);
> >>>>>   s1 = clock();
> >>>>>   cudaMalloc((void **)&init,100*sizeof(double));
> >>>>>   s2 = clock();
> >>>>>   cudaMemcpy(init,tmp,100*sizeof(double),cudaMemcpyHostToDevice);
> >>>>>   s3 = clock();
> >>>>>   printf("free time =%lf malloc time =%lf copy time =%lf\n",((double) (s1 - start)) / CLOCKS_PER_SEC,((double) (s2 - s1)) / CLOCKS_PER_SEC,((double) (s3 - s2)) / CLOCKS_PER_SEC);
> >>>>>
> >>>>>   return 0;
> >>>>> }
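
One caveat when reading the numbers from ex_simple.c: clock() reports processor time consumed by the process, not elapsed wall-clock time, so time spent blocked on file-system I/O while a shared library loads may be under-reported. Below is a minimal sketch, assuming a POSIX system, that records both clocks around an arbitrary call. The names time_call and sleep_50ms are hypothetical and introduced here only for illustration; they are not part of PETSc or CUDA, and the 50 ms sleep is just a stand-in workload.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <time.h>

/* Hypothetical helper (not part of PETSc or CUDA): run fn(arg) once and
 * report both elapsed wall-clock time and processor time, in seconds. */
void time_call(void (*fn)(void *), void *arg, double *wall, double *cpu)
{
  struct timespec w0, w1;
  clock_t c0, c1;

  clock_gettime(CLOCK_MONOTONIC, &w0); /* wall clock, unaffected by date changes */
  c0 = clock();                        /* processor time used by this process */
  fn(arg);
  c1 = clock();
  clock_gettime(CLOCK_MONOTONIC, &w1);

  *wall = (double)(w1.tv_sec - w0.tv_sec) +
          (double)(w1.tv_nsec - w0.tv_nsec) / 1e9;
  *cpu = (double)(c1 - c0) / CLOCKS_PER_SEC;
}

/* Stand-in workload (assumption for illustration): sleep for 50 ms, so wall
 * time advances while processor time barely moves. */
void sleep_50ms(void *unused)
{
  struct timespec d = {0, 50L * 1000L * 1000L};
  (void)unused;
  nanosleep(&d, NULL);
}
```

To apply this to the example in this thread, one could wrap cudaFree(0) in a callback with the same void (*)(void *) signature and print both values. A wall time much larger than the processor time would point toward waiting (for example on the file system) rather than computation; similar values would suggest the time is actually being spent on the CPU.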