Michael Boulton <[email protected]> writes:
> I'm in Simon McIntosh-Smith's group at the University of Bristol and
> I've been using PyOpenCL for a couple of weeks to convert some old
> Fortran code, but I'm having an issue with it and Simon suggested that
> I talk to you directly.

Sure. I've cc'd the list--hope you don't mind.

> The problem is that whenever I get the OpenCL platforms (whether
> indirectly by doing create_some_context() or by directly calling
> get_platforms()), it allocates either 32 or 64 gigabytes of memory
> (seemingly at random, depending on the system and type of devices). If
> I try to delete the platform objects, the memory stays there, so
> whenever I start a run I'm allocating a huge chunk of memory that I
> can never deallocate.

I don't think those are "real" memory allocations in the sense that they
are backed by physical system memory. You're probably seeing them in
"top" (or similar). I'm guessing they might be some sort of aperture
into which the driver maps GPU memory and various other stuff. Looking
at /proc/self/maps from within the process should give you a better idea
of what exactly is being mapped.
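
For instance, here's a hedged sketch of how you might total up the mapped address ranges before and after getting the platforms, to see that the big numbers are reserved address space rather than resident memory. Linux-only; the pyopencl import is an assumption and is guarded:

```python
# Sum the virtual address ranges listed in /proc/self/maps. A large
# jump here after get_platforms() that doesn't show up as resident
# memory suggests an address-space reservation, not a real allocation.
import os

def mapping_size(line):
    # Each maps line starts with "start-end perms offset dev inode path",
    # where start and end are hex addresses.
    start, end = line.split()[0].split("-")
    return int(end, 16) - int(start, 16)

def total_mapped_bytes(path="/proc/self/maps"):
    with open(path) as f:
        return sum(mapping_size(line) for line in f)

if os.path.exists("/proc/self/maps"):
    try:
        import pyopencl as cl  # assumption: pyopencl is installed
        before = total_mapped_bytes()
        cl.get_platforms()
        print(f"mapped before: {before:,} B, "
              f"after: {total_mapped_bytes():,} B")
    except Exception:
        print(f"currently mapped: {total_mapped_bytes():,} B")
```

Reading the file itself (rather than just the totals) will show you which regions are device nodes or anonymous reservations.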

> I'm trying to do something with multiple threads at the moment where I
> am looking at what devices are available in the main thread and
> spawning one more thread for each device. If there are two GPUs and a
> CPU on the system, this results in it allocating over 200 GB of memory
> instantly, which is obviously not intended. Whenever I try to create a
> context after this happens, it throws a "RuntimeError: Context failed:
> out of resources"

Are you sure you're putting the contexts onto different devices?
Contexts are quite memory-hungry on the device side (on Nvidia).
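
A hedged sketch of what I mean: one context per device, created inside the thread that uses it, so no context or queue is ever shared between threads. This assumes pyopencl is installed and a platform is present; both are guarded so the sketch degrades gracefully:

```python
# One worker thread per device; each thread owns its own context and
# queue. Sharing a single context across threads is what tends to
# exhaust device-side resources.
import threading

try:
    import pyopencl as cl
    HAVE_CL = True
except ImportError:
    HAVE_CL = False

def worker(device, results, idx):
    # Context and queue are created (and used) only in this thread.
    ctx = cl.Context(devices=[device])
    queue = cl.CommandQueue(ctx)
    results[idx] = queue.device.name
    queue.finish()

def one_thread_per_device():
    devices = [d for p in cl.get_platforms() for d in p.get_devices()]
    results = [None] * len(devices)
    threads = [threading.Thread(target=worker, args=(d, results, i))
               for i, d in enumerate(devices)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

if HAVE_CL:
    try:
        print(one_thread_per_device())
    except Exception as exc:  # e.g. no OpenCL platforms on this machine
        print("OpenCL unavailable:", exc)
```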

> (which I'm also guessing should actually show up as a
> pyopencl.RuntimeError?).

Could you check the type of the exception? I don't see how the current
code would throw a non-pyopencl exception.
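
One quick way to check: print the exception's fully qualified type, which distinguishes the built-in RuntimeError from pyopencl.RuntimeError. (The raise here is just a stand-in for your failing context creation.)

```python
# Report an exception's module-qualified type name.
def describe_exception(exc):
    t = type(exc)
    return f"{t.__module__}.{t.__qualname__}"

try:
    raise RuntimeError("Context failed: out of resources")  # stand-in
except RuntimeError as e:
    print(describe_exception(e))  # builtins.RuntimeError
```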

> Before I was getting the devices like this, I was trying to do it
> another way, but I was running into another problem which I think may
> be related to some weird internal Python thing. I was initially trying
> to create a context/command queue for each device in the main thread
> and then send it to each spawned thread (I assume it pickles it to do
> this -- I'm not that well versed in the internals of Python)

No, threads share data and address space directly. No pickles.
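
You can convince yourself of this with a tiny demo -- the worker thread sees the very same object the main thread created, not a pickled copy (the dict here is just a stand-in for a real queue):

```python
# Threads in one process share the address space: object identity is
# preserved across threads, so nothing is serialized.
import threading

payload = {"queue": object()}  # stand-in for a real CommandQueue
seen = {}

def worker():
    seen["obj"] = payload["queue"]

t = threading.Thread(target=worker)
t.start()
t.join()
print(seen["obj"] is payload["queue"])  # True
```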

> then the command queue will 'become' invalid at some point. Calling
> queue.finish() would throw an 'invalid queue' exception, but trying to
> launch a kernel using the queue would cause it to just hang silently
> and I'd have to kill the process in linux.

That's also how (Nvidia) OpenCL "reports" segmentation faults (for
instance), i.e. bugs in your code. Are you sure there aren't any bugs in
your code that might cause the device to crash?

Alternatively, have you looked at the output of 'dmesg' to see if
there's anything incriminating? (The messages may look like gibberish,
but they might say something important.)

Hope this helps,
Andreas

_______________________________________________
PyOpenCL mailing list
[email protected]
http://lists.tiker.net/listinfo/pyopencl
