Now in your situation there's a failure when reactivating the context in order to detach from it, probably because the runtime is meddling about. The only reason cuCtxPushCurrent would return "invalid value" is, in my opinion, that the context is already somewhere on the context stack. So it's likely that the runtime reactivated the context. In current git, a failing PushCurrent call no longer raises an error--it prints a warning instead.
I believe different contexts can't share variables that reside on the GPU. I can pass GPUArray objects as arguments to my FFT functions, and these objects still exist after the FFT completes, so I think the FFT is using the same context as PyCUDA.

I made the change shown in the attached diff, such that context.synchronize() and context.detach() print out the context stack size, and detach() also prints whether the current context is active. With this I verified that the stack size is 1 before and after running the FFT code, and that the context does not change.

My guess is that CUFFT makes some change to the current context such that once this context is popped, it is automatically destroyed. If my guess is correct, then calling context.detach() would destroy the context, since its usage count drops to 0, and that would circumvent the warning message when the context destructor is called. I don't want users of my package to see a warning message when Python exits, so instead of using autoinit, my package creates a context explicitly.

The following is somewhat unrelated. I have done some experiments with contexts. I think context.pop() always pops the top context off the stack, regardless of whether that particular context is actually at the top. E.g., I can create two contexts c1 and c2 and then call c1.pop() twice without getting an error.
Regarding the complex number support, I discovered cuComplex.h in /usr/local/cuda/include. It includes complex.h from the C library if CU_USE_NATIVE_COMPLEX is defined. Would cuComplex.h or complex.h be a better way to implement complex numbers?

Do you know to what extent this is documented? I wouldn't want Nvidia changing this stuff out from under us. Also, since what CUDA version has it been around?
If it pans out, this does sound like a pretty good plan.
cuComplex.h has existed since CUDA 2.1 and hasn't changed in subsequent versions. It is used by cufft.h and cublas.h. I can't find any documentation for it. A quick Google search shows that JCuda seems to be using it: http://www.jcuda.org/jcuda/jcublas/doc/jcuda/jcublas/cuComplex.html

Maybe we can simply use complex.h from the GNU C library. A quick search on my Ubuntu machine locates the following files:
/usr/include/complex.h, which includes
/usr/include/c++/4.4/complex.h, which then includes
/usr/include/c++/4.4/ccomplex, which in turn includes
/usr/include/c++/4.4/complex, which contains the operator overloads for complex numbers.

GNU C is available on most platforms, and I think nvcc requires GCC (at least on Linux).
By the way, will you attend PyCon 2010 next month? I'll be there presenting a poster. If you'll be there, you might consider organizing a sprint on PyCUDA.

I'd love to, but my number one priority for the next few months is finishing my PhD. This is an excellent idea, though--I'd be happy to hang out on IRC and support such a sprint remotely if there is interest.
I am finishing my PhD too, and I am lucky to have PyCon coming to my city. Good luck with your PhD.

Daniel
diff --git a/src/cpp/cuda.hpp b/src/cpp/cuda.hpp
index ee3036b..bae552c 100644
--- a/src/cpp/cuda.hpp
+++ b/src/cpp/cuda.hpp
@@ -423,10 +423,14 @@ namespace cuda
           bool active_before_destruction = current_context().get() == this;
           if (active_before_destruction)
           {
+              std::cerr << "context active before destruction." << std::endl;
+	      std::cerr << get_context_stack().size() << std::endl;
             CUDAPP_CALL_GUARDED_CLEANUP(cuCtxDetach, (m_context));
           }
           else
           {
+              std::cerr << "context inactive before destruction." << std::endl;
+	      std::cerr << get_context_stack().size() << std::endl;
             if (m_thread == boost::this_thread::get_id())
             {
               CUDAPP_CALL_GUARDED_CLEANUP(cuCtxPushCurrent, (m_context));
@@ -499,7 +503,9 @@ namespace cuda
 #endif
 
       static void synchronize()
-      { CUDAPP_CALL_GUARDED_THREADED(cuCtxSynchronize, ()); }
+      {
+	std::cerr << get_context_stack().size() << std::endl;
+	CUDAPP_CALL_GUARDED_THREADED(cuCtxSynchronize, ()); }
 
       static boost::shared_ptr<context> current_context(context *except=0)
       {
_______________________________________________
PyCUDA mailing list
[email protected]
http://tiker.net/mailman/listinfo/pycuda_tiker.net
