[ 
https://issues.apache.org/jira/browse/SYSTEMML-1396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Niketan Pansare resolved SYSTEMML-1396.
---------------------------------------
    Resolution: Fixed

> Enable lazily freeing cuda allocated memory chunks
> --------------------------------------------------
>
>                 Key: SYSTEMML-1396
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-1396
>             Project: SystemML
>          Issue Type: Improvement
>          Components: Runtime
>            Reporter: Nakul Jindal
>            Assignee: Nakul Jindal
>             Fix For: SystemML 1.0
>
>
> Currently, cuda memory chunks are deallocated asynchronously. This approach 
> was adopted because {{cudaFree}} operations are expensive, and the idea was 
> that a cudaFree could run while the CPU was busy with other work. In tight 
> loops where most operations are done on the GPU, however, the asynchronous 
> cudaFree calls weren't really asynchronous: operations waiting to use the 
> GPU would pay the penalty for the cudaFree operation.
> After adding extra instrumentation, it was determined that {{cudaMalloc}} 
> operations were fairly expensive as well. 
> Most GPU operations are done in loops that repeatedly allocate and 
> deallocate memory chunks of the same size in every iteration. It would be 
> more efficient to reuse a chunk and "clear it out" (set its memory to 0) 
> instead of freeing and reallocating it.
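
The idea in the description above can be sketched as a size-keyed pool that defers the real free and hands back a zeroed chunk of the same size on the next request. This is only an illustration, not the SystemML implementation: the class name LazyFreePool and its methods are hypothetical, and plain Java byte arrays stand in for device memory so the sketch runs without a GPU. In a real runtime, the marked lines would presumably call cudaMalloc, cudaMemset and cudaFree instead.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: a pool that frees lazily and reuses chunks by size. */
public class LazyFreePool {
    // Chunks handed back by the caller, kept for reuse and keyed by size in bytes.
    private final Map<Integer, Deque<byte[]>> freeLists = new HashMap<>();
    private int realAllocations = 0; // how often the expensive allocation ran
    private int realFrees = 0;       // how often the expensive free ran

    /** Returns a zeroed chunk, reusing a cached chunk of the same size if one exists. */
    public byte[] allocate(int size) {
        Deque<byte[]> cached = freeLists.get(size);
        if (cached != null && !cached.isEmpty()) {
            byte[] chunk = cached.pop();
            Arrays.fill(chunk, (byte) 0); // stand-in for cudaMemset(ptr, 0, size)
            return chunk;
        }
        realAllocations++;
        return new byte[size];            // stand-in for cudaMalloc
    }

    /** Lazy free: the chunk is only parked in the pool, nothing is released yet. */
    public void free(byte[] chunk) {
        freeLists.computeIfAbsent(chunk.length, k -> new ArrayDeque<>()).push(chunk);
    }

    /** Really releases every cached chunk, e.g. when memory runs short. */
    public int evictAll() {
        int released = 0;
        for (Deque<byte[]> chunks : freeLists.values()) {
            released += chunks.size();    // stand-in for one cudaFree per chunk
        }
        freeLists.clear();
        realFrees += released;
        return released;
    }

    public int getRealAllocations() { return realAllocations; }

    public int getRealFrees() { return realFrees; }

    public static void main(String[] args) {
        LazyFreePool pool = new LazyFreePool();
        // A loop that allocates and frees a same-sized chunk on every iteration.
        for (int i = 0; i < 1000; i++) {
            byte[] chunk = pool.allocate(4096);
            chunk[0] = 1;                 // pretend a GPU operation wrote to it
            pool.free(chunk);
        }
        System.out.println("real allocations: " + pool.getRealAllocations());
    }
}
```

In this sketch the loop performs a single real allocation, and the remaining iterations only pay for clearing the reused chunk. The trade-off is that cached chunks keep occupying memory until something like evictAll() is called, so a real pool would need an eviction policy.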



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
