Thanks for the quick response.  Threading is not completely essential for what 
I'm doing right now, but it would certainly be a nice-to-have.  It seems that 
disabling the thread check in this case removes the leak (I'm only using one 
context), but it does introduce other odd behavior that I haven't figured out yet.  I 
removed the other threads I had for now, but would be interested in looking at 
this some more -- perhaps I will stop by sometime next week.

David


On Feb 2, 2012, at 4:56 PM, Andreas Kloeckner wrote:

> On Thu, 2 Feb 2012 14:49:32 -0500, David Eigen <dei...@cs.nyu.edu> wrote:
>> Hi,
>> 
>> I ran into a gpu memory leak that appears to happen when gc frees GPUArrays 
>> that were created in a thread other than the one gc is running in.  I did 
>> not see a github issue tracking this.  Is this a known issue that others 
>> have run into?  I'm using pycuda 2011.2.2 with python 2.6.7.
>> 
>> This happens when gc frees a DeviceAllocation that had been created in 
>> another thread.  Since there are no refs to it, it is indeed freed, and the 
>> destructor tries to free the corresponding CUdeviceptr.  However, the 
>> following lines cause mem_free not to be called: the 
>> scoped_context_activation checks that the running thread matches the 
>> context's thread; since it doesn't, it throws an exception, which is 
>> silently caught in CUDAPP_CATCH_CLEANUP_ON_DEAD_CONTEXT:
>> 
>> class device_allocation ...
>>      void free()
>>      {
>>        if (m_valid)
>>        {
>>          try
>>          {
>>            scoped_context_activation ca(get_context());
>>            mem_free(m_devptr);
>>          }
>>          CUDAPP_CATCH_CLEANUP_ON_DEAD_CONTEXT(device_allocation);
>> 
>> I was wondering how to go about fixing or working with this, or if
>> anyone has any advice?
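[The failure mode described in the quoted report can be modeled in plain Python, without a GPU. This is only an illustrative sketch -- FakeContext and FakeDeviceAllocation are made-up names, not pycuda's actual classes -- showing how a silently swallowed wrong-thread exception turns into a leak:]

```python
import threading

class FakeContext:
    def __init__(self):
        # The thread that creates the context "owns" it.
        self.owner = threading.get_ident()
        self.live_bytes = 0  # stand-in for allocated device memory

class FakeDeviceAllocation:
    def __init__(self, ctx, nbytes):
        self.ctx, self.nbytes, self.valid = ctx, nbytes, True
        ctx.live_bytes += nbytes

    def free(self):
        if not self.valid:
            return
        try:
            # Stand-in for scoped_context_activation's thread check:
            if threading.get_ident() != self.ctx.owner:
                raise RuntimeError("cannot activate context from this thread")
            self.ctx.live_bytes -= self.nbytes  # stand-in for mem_free
            self.valid = False
        except RuntimeError:
            # Mirrors CUDAPP_CATCH_CLEANUP_ON_DEAD_CONTEXT: the exception
            # is swallowed and the memory is never released -- a leak.
            pass
```

[If the allocation is created in a worker thread and free() later runs on the main thread (as when gc triggers the destructor there), live_bytes never drops back to zero.]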
> 
> OpenCL is a better API if you absolutely need threading, so using
> PyOpenCL is one possible workaround. Using processes rather than threads
> is another workaround, possibly with some explicitly shared memory. [1]
> (Note you need to fork before pycuda.init(), it seems.)
> 
> [1]
> http://docs.python.org/dev/library/multiprocessing.html#module-multiprocessing
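[For reference, the process-based workaround might look roughly like this. A sketch only: the pycuda calls are commented out so the skeleton runs anywhere, and the key point is that CUDA is touched only in the child process, after the fork:]

```python
import multiprocessing as mp

def gpu_worker(q):
    # Initialize CUDA only here, in the child process:
    # import pycuda.autoinit
    # import pycuda.gpuarray as gpuarray
    # a = gpuarray.zeros(1024, dtype="float32")
    # q.put(float(a.get().sum()))
    q.put(0.0)  # placeholder so the skeleton runs without a GPU

def run():
    # The parent must not have initialized pycuda before this point,
    # or the forked child inherits a half-initialized driver state.
    q = mp.Queue()
    p = mp.Process(target=gpu_worker, args=(q,))
    p.start()
    result = q.get()
    p.join()
    return result
```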
> 
> If you absolutely want this fixed, you might introduce a per-context
> queue of things to be freed. I'll warn you that context management in
> CUDA is a mess with poorly documented semantics. This got partially
> fixed with a new, less broken API in CUDA 4.0, and I'm happy that the
> current code doesn't seem to be too horribly broken. If we were to
> switch to the new API now, we'd ditch backward compatibility with CUDA
> 3.x and below.
> 
> If you like, you can also just come by to discuss this. (I'm in 1105A WWH)
> 
> Andreas


_______________________________________________
PyCUDA mailing list
PyCUDA@tiker.net
http://lists.tiker.net/listinfo/pycuda
