Note that the overhead was triggered by the first call to a CUDA function. So 
it seems that the first CUDA function triggered loading of the PETSc shared 
library (if it is linked), which is slow on the Summit file system.
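
One way to test the file-system part of this guess separately from CUDA would be to time a plain read of the library file. The following is only a minimal sketch under that assumption: time_file_read is a hypothetical helper, not part of PETSc or CUDA, and it measures a sequential read, which is not exactly what the dynamic loader does.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L /* needed for clock_gettime */
#endif
#include <stdio.h>
#include <time.h>

/* Hypothetical helper (not a PETSc or CUDA function): read the file at
   `path` from start to end and return the elapsed wall-clock seconds,
   or -1.0 if the file cannot be opened or read. If `nbytes` is not NULL,
   the number of bytes read is stored there. */
double time_file_read(const char *path, size_t *nbytes)
{
  struct timespec t0, t1;
  char            buf[1 << 16];
  size_t          n, total = 0;
  FILE           *fp;

  clock_gettime(CLOCK_MONOTONIC, &t0);
  fp = fopen(path, "rb");
  if (!fp) return -1.0;
  while ((n = fread(buf, 1, sizeof(buf), fp)) > 0) total += n;
  if (ferror(fp)) {
    fclose(fp);
    return -1.0;
  }
  fclose(fp);
  clock_gettime(CLOCK_MONOTONIC, &t1);

  if (nbytes) *nbytes = total;
  return (double)(t1.tv_sec - t0.tv_sec) + 1e-9 * (double)(t1.tv_nsec - t0.tv_nsec);
}
```

Calling it from a small main with the libpetsc.so path reported by ldd, and again with one of the CUDA libraries, would give a rough comparison of the two file systems. A second run may be much faster because of the operating system's file cache, so only the first measurement on a fresh node is likely to be meaningful.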

Hong

On Feb 7, 2020, at 2:54 PM, Zhang, Hong via petsc-dev 
<petsc-dev@mcs.anl.gov> wrote:

Linking any other shared library does not slow down the execution. The PETSc 
shared library is the only one causing trouble.

Here is the ldd output for two different versions. For the first version, I 
removed -lpetsc and it ran very fast. The second (slow) version was linked to 
the PETSc shared library.

bash-4.2$ ldd ex_simple
        linux-vdso64.so.1 =>  (0x0000200000050000)
        liblapack.so.0 => 
/autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/liblapack.so.0
 (0x0000200000070000)
        libblas.so.0 => 
/autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libblas.so.0
 (0x00002000009b0000)
        libhdf5hl_fortran.so.100 => 
/autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/hdf5-1.10.3-pgiul2yf4auv7krecd72t6vupd7e3qgn/lib/libhdf5hl_fortran.so.100
 (0x0000200000e80000)
        libhdf5_fortran.so.100 => 
/autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/hdf5-1.10.3-pgiul2yf4auv7krecd72t6vupd7e3qgn/lib/libhdf5_fortran.so.100
 (0x0000200000ed0000)
        libhdf5_hl.so.100 => 
/autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/hdf5-1.10.3-pgiul2yf4auv7krecd72t6vupd7e3qgn/lib/libhdf5_hl.so.100
 (0x0000200000f50000)
        libhdf5.so.103 => 
/autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/hdf5-1.10.3-pgiul2yf4auv7krecd72t6vupd7e3qgn/lib/libhdf5.so.103
 (0x0000200000fb0000)
        libX11.so.6 => /usr/lib64/libX11.so.6 (0x00002000015e0000)
        libcufft.so.10 => /sw/summit/cuda/10.1.168/lib64/libcufft.so.10 
(0x0000200001770000)
        libcublas.so.10 => /sw/summit/cuda/10.1.168/lib64/libcublas.so.10 
(0x0000200009b00000)
        libcudart.so.10.1 => /sw/summit/cuda/10.1.168/lib64/libcudart.so.10.1 
(0x000020000d950000)
        libcusparse.so.10 => /sw/summit/cuda/10.1.168/lib64/libcusparse.so.10 
(0x000020000d9f0000)
        libcusolver.so.10 => /sw/summit/cuda/10.1.168/lib64/libcusolver.so.10 
(0x0000200012f50000)
        libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x000020001dc40000)
        libdl.so.2 => /usr/lib64/libdl.so.2 (0x000020001ddd0000)
        libpthread.so.0 => /usr/lib64/libpthread.so.0 (0x000020001de00000)
        libmpiprofilesupport.so.3 => 
/autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libmpiprofilesupport.so.3
 (0x000020001de40000)
        libmpi_ibm_usempi.so => 
/autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libmpi_ibm_usempi.so
 (0x000020001de70000)
        libmpi_ibm_mpifh.so.3 => 
/autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libmpi_ibm_mpifh.so.3
 (0x000020001dea0000)
        libmpi_ibm.so.3 => 
/autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libmpi_ibm.so.3
 (0x000020001df40000)
        libpgf90rtl.so => 
/autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libpgf90rtl.so
 (0x000020001e0b0000)
        libpgf90.so => 
/autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libpgf90.so
 (0x000020001e0f0000)
        libpgf90_rpm1.so => 
/autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libpgf90_rpm1.so
 (0x000020001e6a0000)
        libpgf902.so => 
/autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libpgf902.so
 (0x000020001e6d0000)
        libpgftnrtl.so => 
/autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libpgftnrtl.so
 (0x000020001e700000)
        libatomic.so.1 => /usr/lib64/libatomic.so.1 (0x000020001e730000)
        libpgkomp.so => 
/autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libpgkomp.so
 (0x000020001e760000)
        libomp.so => 
/autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libomp.so
 (0x000020001e790000)
        libomptarget.so => 
/autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libomptarget.so
 (0x000020001e880000)
        libpgmath.so => 
/autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libpgmath.so
 (0x000020001e8b0000)
        libpgc.so => 
/autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libpgc.so
 (0x000020001e9d0000)
        librt.so.1 => /usr/lib64/librt.so.1 (0x000020001eb40000)
        libm.so.6 => /usr/lib64/libm.so.6 (0x000020001eb70000)
        libgcc_s.so.1 => /usr/lib64/libgcc_s.so.1 (0x000020001ec60000)
        libc.so.6 => /usr/lib64/libc.so.6 (0x000020001eca0000)
        libz.so.1 => 
/autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/zlib-1.2.11-2htm7ws4hgrthi5tyjnqxtjxgpfklxsc/lib/libz.so.1
 (0x000020001ee90000)
        libxcb.so.1 => /usr/lib64/libxcb.so.1 (0x000020001eef0000)
        /lib64/ld64.so.2 (0x0000200000000000)
        libcublasLt.so.10 => /sw/summit/cuda/10.1.168/lib64/libcublasLt.so.10 
(0x000020001ef40000)
        libutil.so.1 => /usr/lib64/libutil.so.1 (0x0000200020e50000)
        libhwloc_ompi.so.15 => 
/autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libhwloc_ompi.so.15
 (0x0000200020e80000)
        libevent-2.1.so.6 => 
/autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libevent-2.1.so.6
 (0x0000200020ef0000)
        libevent_pthreads-2.1.so.6 => 
/autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libevent_pthreads-2.1.so.6
 (0x0000200020f70000)
        libopen-rte.so.3 => 
/autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libopen-rte.so.3
 (0x0000200020fa0000)
        libopen-pal.so.3 => 
/autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libopen-pal.so.3
 (0x00002000210b0000)
        libXau.so.6 => /usr/lib64/libXau.so.6 (0x00002000211a0000)


bash-4.2$ ldd ex_simple_slow
        linux-vdso64.so.1 =>  (0x0000200000050000)
        libpetsc.so.3.012 => 
/autofs/nccs-svm1_home1/hongzh/Projects/petsc/arch-olcf-summit-sell-opt/lib/libpetsc.so.3.012
 (0x0000200000070000)
        liblapack.so.0 => 
/autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/liblapack.so.0
 (0x0000200002be0000)
        libblas.so.0 => 
/autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libblas.so.0
 (0x0000200003520000)
        libhdf5hl_fortran.so.100 => 
/autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/hdf5-1.10.3-pgiul2yf4auv7krecd72t6vupd7e3qgn/lib/libhdf5hl_fortran.so.100
 (0x00002000039f0000)
        libhdf5_fortran.so.100 => 
/autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/hdf5-1.10.3-pgiul2yf4auv7krecd72t6vupd7e3qgn/lib/libhdf5_fortran.so.100
 (0x0000200003a40000)
        libhdf5_hl.so.100 => 
/autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/hdf5-1.10.3-pgiul2yf4auv7krecd72t6vupd7e3qgn/lib/libhdf5_hl.so.100
 (0x0000200003ac0000)
        libhdf5.so.103 => 
/autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/hdf5-1.10.3-pgiul2yf4auv7krecd72t6vupd7e3qgn/lib/libhdf5.so.103
 (0x0000200003b20000)
        libX11.so.6 => /usr/lib64/libX11.so.6 (0x0000200004150000)
        libcufft.so.10 => /sw/summit/cuda/10.1.168/lib64/libcufft.so.10 
(0x00002000042e0000)
        libcublas.so.10 => /sw/summit/cuda/10.1.168/lib64/libcublas.so.10 
(0x000020000c670000)
        libcudart.so.10.1 => /sw/summit/cuda/10.1.168/lib64/libcudart.so.10.1 
(0x00002000104c0000)
        libcusparse.so.10 => /sw/summit/cuda/10.1.168/lib64/libcusparse.so.10 
(0x0000200010560000)
        libcusolver.so.10 => /sw/summit/cuda/10.1.168/lib64/libcusolver.so.10 
(0x0000200015ac0000)
        libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00002000207b0000)
        libdl.so.2 => /usr/lib64/libdl.so.2 (0x0000200020940000)
        libpthread.so.0 => /usr/lib64/libpthread.so.0 (0x0000200020970000)
        libmpiprofilesupport.so.3 => 
/autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libmpiprofilesupport.so.3
 (0x00002000209b0000)
        libmpi_ibm_usempi.so => 
/autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libmpi_ibm_usempi.so
 (0x00002000209e0000)
        libmpi_ibm_mpifh.so.3 => 
/autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libmpi_ibm_mpifh.so.3
 (0x0000200020a10000)
        libmpi_ibm.so.3 => 
/autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libmpi_ibm.so.3
 (0x0000200020ab0000)
        libpgf90rtl.so => 
/autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libpgf90rtl.so
 (0x0000200020c20000)
        libpgf90.so => 
/autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libpgf90.so
 (0x0000200020c60000)
        libpgf90_rpm1.so => 
/autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libpgf90_rpm1.so
 (0x0000200021210000)
        libpgf902.so => 
/autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libpgf902.so
 (0x0000200021240000)
        libpgftnrtl.so => 
/autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libpgftnrtl.so
 (0x0000200021270000)
        libatomic.so.1 => /usr/lib64/libatomic.so.1 (0x00002000212a0000)
        libpgkomp.so => 
/autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libpgkomp.so
 (0x00002000212d0000)
        libomp.so => 
/autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libomp.so
 (0x0000200021300000)
        libomptarget.so => 
/autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libomptarget.so
 (0x00002000213f0000)
        libpgmath.so => 
/autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libpgmath.so
 (0x0000200021420000)
        libpgc.so => 
/autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libpgc.so
 (0x0000200021540000)
        librt.so.1 => /usr/lib64/librt.so.1 (0x00002000216b0000)
        libm.so.6 => /usr/lib64/libm.so.6 (0x00002000216e0000)
        libgcc_s.so.1 => /usr/lib64/libgcc_s.so.1 (0x00002000217d0000)
        libc.so.6 => /usr/lib64/libc.so.6 (0x0000200021810000)
        libz.so.1 => 
/autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/zlib-1.2.11-2htm7ws4hgrthi5tyjnqxtjxgpfklxsc/lib/libz.so.1
 (0x0000200021a10000)
        libxcb.so.1 => /usr/lib64/libxcb.so.1 (0x0000200021a60000)
        /lib64/ld64.so.2 (0x0000200000000000)
        libcublasLt.so.10 => /sw/summit/cuda/10.1.168/lib64/libcublasLt.so.10 
(0x0000200021ab0000)
        libutil.so.1 => /usr/lib64/libutil.so.1 (0x00002000239c0000)
        libhwloc_ompi.so.15 => 
/autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libhwloc_ompi.so.15
 (0x00002000239f0000)
        libevent-2.1.so.6 => 
/autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libevent-2.1.so.6
 (0x0000200023a60000)
        libevent_pthreads-2.1.so.6 => 
/autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libevent_pthreads-2.1.so.6
 (0x0000200023ae0000)
        libopen-rte.so.3 => 
/autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libopen-rte.so.3
 (0x0000200023b10000)
        libopen-pal.so.3 => 
/autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libopen-pal.so.3
 (0x0000200023c20000)
        libXau.so.6 => /usr/lib64/libXau.so.6 (0x0000200023d10000)


On Feb 7, 2020, at 2:31 PM, Smith, Barry F. 
<bsm...@mcs.anl.gov> wrote:


 Run ldd on the executables from both linkings of your code.

 My guess is that without PETSc it is linking the static version of the needed 
libraries and with PETSc the shared. And, in typical fashion, the shared 
libraries are off on some super slow file system so take a long time to be 
loaded and linked in on demand.

  Still a performance bug in Summit.

  Barry


On Feb 7, 2020, at 12:23 PM, Zhang, Hong via petsc-dev 
<petsc-dev@mcs.anl.gov> wrote:

Hi all,

Previously I noticed that the first call to a CUDA function such as 
cudaMalloc or cudaFree in PETSc takes a long time (7.5 seconds) on Summit. 
I then prepared a simple example, attached, to help OLCF reproduce the 
problem. It turned out that the problem was caused by PETSc. The 7.5-second 
overhead can be observed only when the PETSc lib is linked. If I do not link 
PETSc, it runs normally. Does anyone have any idea why this happens and how to 
fix it?

Hong (Mr.)

bash-4.2$ cat ex_simple.c
#include <time.h>
#include <cuda_runtime.h>
#include <stdio.h>

/* Time the first three CUDA runtime calls.
   Note: clock() reports processor time, not wall-clock time. */
int main(int argc, char **args)
{
  clock_t start, s1, s2, s3;
  double *init, tmp[100] = {0};

  start = clock();
  cudaFree(0); /* first CUDA call; forces initialization of the CUDA runtime */
  s1 = clock();
  cudaMalloc((void **)&init, 100 * sizeof(double));
  s2 = clock();
  cudaMemcpy(init, tmp, 100 * sizeof(double), cudaMemcpyHostToDevice);
  s3 = clock();
  printf("free time =%lf malloc time =%lf copy time =%lf\n",
         ((double)(s1 - start)) / CLOCKS_PER_SEC,
         ((double)(s2 - s1)) / CLOCKS_PER_SEC,
         ((double)(s3 - s2)) / CLOCKS_PER_SEC);

  return 0;
}
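
The example above times each call with clock(), which reports processor time used by the program rather than elapsed time. If part of the delay were spent waiting (for instance on the file system), a wall-clock timer could show a different number. A minimal sketch of two timers that could be swapped into the example follows; wall_seconds and cpu_seconds are hypothetical names introduced here, not part of the attached code.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L /* needed for clock_gettime */
#endif
#include <time.h>

/* Hypothetical helper: wall-clock seconds from a monotonic clock.
   Differences between two calls include time the process spent blocked. */
double wall_seconds(void)
{
  struct timespec ts;
  clock_gettime(CLOCK_MONOTONIC, &ts);
  return (double)ts.tv_sec + 1e-9 * (double)ts.tv_nsec;
}

/* Hypothetical helper: processor time consumed by this process so far,
   in seconds, as reported by clock(). */
double cpu_seconds(void)
{
  return (double)clock() / CLOCKS_PER_SEC;
}
```

Recording both values before and after cudaFree(0) and comparing the two differences would indicate whether the 7.5 seconds is mostly computation or mostly waiting.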




