1. Commenting out ierr = __initialize(dctx->device->deviceId,dci);CHKERRQ(ierr); in device/impls/cupm/cupmcontext.hpp:L199
CUDA memory: 1.575GB CUDA memory without importing torch: 0.370GB This has the same effect as commenting out L437-L440 in interface/device.cxx 2. Comment out these two: . src/sys/objects/device/impls/cupm/cupmdevice.cxx:327 [ierr = _devices[_defaultDevice]->configure();CHKERRQ(ierr);] . src/sys/objects/device/impls/cupm/cupmdevice.cxx:326 [ierr = _devices[_defaultDevice]->initialize();CHKERRQ(ierr);] CUDA memory: 1.936GB CUDA memory without importing torch: 0.730GB On Jan 7, 2022, at 11:21 AM, Jacob Faibussowitsch <jacob....@gmail.com<mailto:jacob....@gmail.com>> wrote: They had no influence to the memory usage. ??????????????????????????????????????????????????????????????????????? Comment out the ierr = _devices[id]->initialize();CHKERRQ(ierr); on line 360 in cupmdevice.cxx as well. Best regards, Jacob Faibussowitsch (Jacob Fai - booss - oh - vitch) On Jan 7, 2022, at 12:18, Zhang, Hong <hongzh...@anl.gov<mailto:hongzh...@anl.gov>> wrote: I have tried all of these. They had no influence to the memory usage. On Jan 7, 2022, at 11:15 AM, Jacob Faibussowitsch <jacob....@gmail.com<mailto:jacob....@gmail.com>> wrote: Initializing cutlass and cusolver does not affect the memory usage. I did the following to turn them off: Ok next things to try out in order: 1. src/sys/objects/device/impls/cupm/cupmcontext.hpp:178 [PetscFunctionBegin;] Put a PetscFunctionReturn(0); right after this 2. src/sys/objects/device/impls/cupm/cupmdevice.cxx:327 [ierr = _devices[_defaultDevice]->configure();CHKERRQ(ierr);] Comment this out 3. src/sys/objects/device/impls/cupm/cupmdevice.cxx:326 [ierr = _devices[_defaultDevice]->initialize();CHKERRQ(ierr);] Comment this out Best regards, Jacob Faibussowitsch (Jacob Fai - booss - oh - vitch) On Jan 7, 2022, at 12:02, Zhang, Hong <hongzh...@anl.gov<mailto:hongzh...@anl.gov>> wrote: Initializing cutlass and cusolver does not affect the memory usage. I did the following to turn them off: diff --git a/src/sys/objects/device/impls/cupm/cupmcontext.hpp b/src/sys/objects/device/impls/cupm/cupmcontext.hpp index 51fed809e4d..9a5f068323a 100644 --- a/src/sys/objects/device/impls/cupm/cupmcontext.hpp +++ b/src/sys/objects/device/impls/cupm/cupmcontext.hpp @@ -199,7 +199,7 @@ inline PetscErrorCode CUPMContext<T>::setUp(PetscDeviceContext dctx) noexcept #if PetscDefined(USE_DEBUG) dci->timerInUse = PETSC_FALSE; #endif - ierr = __initialize(dctx->device->deviceId,dci);CHKERRQ(ierr); + //ierr = __initialize(dctx->device->deviceId,dci);CHKERRQ(ierr); PetscFunctionReturn(0); } On Jan 7, 2022, at 10:53 AM, Barry Smith <bsm...@petsc.dev<mailto:bsm...@petsc.dev>> wrote: I don't think this is right. We want the device initialized by PETSc , we just don't want the cublas and cusolve stuff initialized. In order to see how much memory initializing the blas and solvers takes. So I think you need to comment things in cupminterface.hpp like cublasCreate and cusolverDnCreate. Urgh, I hate C++ where huge chunks of real code are in header files. On Jan 7, 2022, at 11:34 AM, Jacob Faibussowitsch <jacob....@gmail.com<mailto:jacob....@gmail.com>> wrote: Hit send too early… If you don’t want to comment out, you can also run with "-device_enable lazy" option. Normally this is the default behavior but if -log_view or -log_summary is provided this defaults to “-device_enable eager”. See src/sys/objects/device/interface/device.cxx:398 Best regards, Jacob Faibussowitsch (Jacob Fai - booss - oh - vitch) On Jan 7, 2022, at 11:29, Jacob Faibussowitsch <jacob....@gmail.com<mailto:jacob....@gmail.com>> wrote: You need to go into the PetscInitialize() routine find where it loads the cublas and cusolve and comment out those lines then run with -log_view Comment out #if (PetscDefined(HAVE_CUDA) || PetscDefined(HAVE_HIP) || PetscDefined(HAVE_SYCL)) ierr = PetscDeviceInitializeFromOptions_Internal(PETSC_COMM_WORLD);CHKERRQ(ierr); #endif At src/sys/objects/pinit.c:956 Best regards, Jacob Faibussowitsch (Jacob Fai - booss - oh - vitch) On Jan 7, 2022, at 11:24, Barry Smith <bsm...@petsc.dev<mailto:bsm...@petsc.dev>> wrote: Without log_view it does not load any cuBLAS/cuSolve immediately with -log_view it loads all that stuff at startup. You need to go into the PetscInitialize() routine find where it loads the cublas and cusolve and comment out those lines then run with -log_view On Jan 7, 2022, at 11:14 AM, Zhang, Hong via petsc-dev <petsc-dev@mcs.anl.gov<mailto:petsc-dev@mcs.anl.gov>> wrote: When PETSc is initialized, it takes about 2GB CUDA memory. This is way too much for doing nothing. A test script is attached to reproduce the issue. If I remove the first line "import torch", PETSc consumes about 0.73GB, which is still significant. Does anyone have any idea about this behavior? Thanks, Hong hongzhang@gpu02:/gpfs/jlse-fs0/users/hongzhang/Projects/pnode/examples (caidao22/update-examples)$ python3 test.py CUDA memory before PETSc 0.000GB CUDA memory after PETSc 0.004GB hongzhang@gpu02:/gpfs/jlse-fs0/users/hongzhang/Projects/pnode/examples (caidao22/update-examples)$ python3 test.py -log_view :0.txt CUDA memory before PETSc 0.000GB CUDA memory after PETSc 1.936GB import torch import sys import os import nvidia_smi nvidia_smi.nvmlInit() handle = nvidia_smi.nvmlDeviceGetHandleByIndex(0) info = nvidia_smi.nvmlDeviceGetMemoryInfo(handle) print('CUDA memory before PETSc %.3fGB' % (info.used/1e9)) petsc4py_path = os.path.join(os.environ['PETSC_DIR'],os.environ['PETSC_ARCH'],'lib') sys.path.append(petsc4py_path) import petsc4py petsc4py.init(sys.argv) handle = nvidia_smi.nvmlDeviceGetHandleByIndex(0) info = nvidia_smi.nvmlDeviceGetMemoryInfo(handle) print('CUDA memory after PETSc %.3fGB' % (info.used/1e9))