[ https://issues.apache.org/jira/browse/SYSTEMML-1396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Niketan Pansare resolved SYSTEMML-1396.
---------------------------------------
    Resolution: Fixed

> Enable lazily freeing cuda allocated memory chunks
> --------------------------------------------------
>
>                 Key: SYSTEMML-1396
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-1396
>             Project: SystemML
>          Issue Type: Improvement
>          Components: Runtime
>            Reporter: Nakul Jindal
>            Assignee: Nakul Jindal
>             Fix For: SystemML 1.0
>
>
> Currently, CUDA memory chunks are deallocated asynchronously. This approach
> was adopted because {{cudaFree}} operations are expensive; the idea was that
> the cudaFree could happen while the CPU was busy with other work. In tight
> loops where most operations are done on the GPU, the asynchronous cudaFree
> calls weren't really asynchronous: operations waiting to use the GPU paid
> the penalty for the cudaFree operation.
> After adding extra instrumentation, it was determined that {{cudaAlloc}}
> operations were fairly expensive as well.
> Most GPU operations run in loops that repeatedly allocate and deallocate
> memory chunks of the same size on each iteration. What would be more
> efficient is to "clear out" the memory, i.e. set it to 0, instead of freeing
> and reallocating it.
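
The idea described in the issue can be illustrated with a minimal sketch. This is not SystemML's implementation: FakeDevice, LazyFreePool and run_loop are hypothetical names, and the device allocator is simulated with Python bytearrays so that the expensive calls can simply be counted. In real CUDA code the allocation and release would be cudaMalloc and cudaFree from the CUDA runtime API (the issue text writes cudaAlloc), and the zeroing step would be something like cudaMemset.

```python
from collections import defaultdict


class FakeDevice:
    """Hypothetical stand-in for a GPU allocator; it only counts calls."""

    def __init__(self):
        self.malloc_calls = 0
        self.free_calls = 0

    def malloc(self, nbytes):
        # Stands in for an expensive device allocation.
        self.malloc_calls += 1
        return bytearray(nbytes)

    def free(self, buf):
        # Stands in for an expensive device deallocation.
        self.free_calls += 1


class LazyFreePool:
    """Keeps released chunks, keyed by size, instead of freeing them."""

    def __init__(self, device):
        self.device = device
        self.free_chunks = defaultdict(list)  # size in bytes -> idle buffers

    def allocate(self, nbytes):
        idle = self.free_chunks[nbytes]
        if idle:
            buf = idle.pop()
            buf[:] = bytes(nbytes)  # "clear out": set the chunk to 0
            return buf
        return self.device.malloc(nbytes)

    def release(self, buf):
        # Lazy free: park the chunk for reuse rather than freeing it now.
        self.free_chunks[len(buf)].append(buf)

    def clear(self):
        # Actually free everything, e.g. under memory pressure or at exit.
        for chunks in self.free_chunks.values():
            while chunks:
                self.device.free(chunks.pop())


def run_loop(iterations, nbytes):
    """Simulate a tight loop that needs one same-sized chunk per iteration."""
    device = FakeDevice()
    pool = LazyFreePool(device)
    for _ in range(iterations):
        buf = pool.allocate(nbytes)
        buf[0] = 1  # pretend a GPU operation wrote to the chunk
        pool.release(buf)
    before_clear = (device.malloc_calls, device.free_calls)
    pool.clear()
    return before_clear, (device.malloc_calls, device.free_calls)
```

In this sketch a loop of any length performs one simulated allocation and, until clear() is called, no deallocation at all; every later iteration only pays for zeroing the reused chunk. The trade-off is that parked chunks keep occupying device memory, so a real pool would need a policy for when to call something like clear().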