OK, I'm testing PyCUDA now. I think I'll eventually want to use PyOpenCL, but until I've tested that I'll use this.
So, it is working now. How would I get an MD5 sum to run? Would that be put into the kernel, and would the kernel then be run once per 'thread'?

Also, on an Nvidia GeForce 9500 GT, how many threads can run at one time?
http://www.nvidia.com/object/product_geforce_9500gt_us.html
It is compute capability 1.1, so it has 32 multiprocessors, so isn't it 32*768 = 24,576 threads?

Can PyCUDA automatically max out all the cards in the machine? Or how would I tell it to use both cards?

Also, here are the devices I'm using:

r...@quentusrex-desktop:~/NVIDIA_GPU_Computing_SDK/C/bin/linux/release# ./deviceQuery
CUDA Device Query (Runtime API) version (CUDART static linking)
There are 2 devices supporting CUDA

Device 0: "GeForce 9500 GT"
  CUDA Driver Version:                           2.30
  CUDA Runtime Version:                          2.30
  CUDA Capability Major revision number:         1
  CUDA Capability Minor revision number:         1
  Total amount of global memory:                 1073020928 bytes
  Number of multiprocessors:                     4
  Number of cores:                               32
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       16384 bytes
  Total number of registers available per block: 8192
  Warp size:                                     32
  Maximum number of threads per block:           512
  Maximum sizes of each dimension of a block:    512 x 512 x 64
  Maximum sizes of each dimension of a grid:     65535 x 65535 x 1
  Maximum memory pitch:                          262144 bytes
  Texture alignment:                             256 bytes
  Clock rate:                                    1.38 GHz
  Concurrent copy and execution:                 Yes
  Run time limit on kernels:                     Yes
  Integrated:                                    No
  Support host page-locked memory mapping:       No
  Compute mode:                                  Default (multiple host threads can use this device simultaneously)

Device 1: "GeForce 9500 GT"
  CUDA Driver Version:                           2.30
  CUDA Runtime Version:                          2.30
  CUDA Capability Major revision number:         1
  CUDA Capability Minor revision number:         1
  Total amount of global memory:                 1073479680 bytes
  Number of multiprocessors:                     4
  Number of cores:                               32
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       16384 bytes
  Total number of registers available per block: 8192
  Warp size:                                     32
  Maximum number of threads per block:           512
  Maximum sizes of each dimension of a block:    512 x 512 x 64
  Maximum sizes of each dimension of a grid:     65535 x 65535 x 1
  Maximum memory pitch:                          262144 bytes
  Texture alignment:                             256 bytes
  Clock rate:                                    1.38 GHz
  Concurrent copy and execution:                 Yes
  Run time limit on kernels:                     Yes
  Integrated:                                    No
  Support host page-locked memory mapping:       No
  Compute mode:                                  Default (multiple host threads can use this device simultaneously)

Test PASSED

Press ENTER to exit...
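
A rough sanity check of the thread-count arithmetic in the question, as a plain Python sketch (no GPU needed). The deviceQuery output lists 4 multiprocessors per card; 32 is the core count. The figure of 768 resident threads per multiprocessor is taken from the question and is assumed here to be the right limit for compute capability 1.1. The function name and constants are illustrative only, not part of PyCUDA.

```python
# Figures copied from the deviceQuery output for one GeForce 9500 GT.
MULTIPROCESSORS = 4   # "Number of multiprocessors"
CORES = 32            # "Number of cores" (not the multiprocessor count)
DEVICE_COUNT = 2      # "There are 2 devices supporting CUDA"

# Assumption: 768 resident threads per multiprocessor, the figure used in
# the question for compute capability 1.1.
RESIDENT_THREADS_PER_MP = 768


def max_resident_threads(multiprocessors, threads_per_mp=RESIDENT_THREADS_PER_MP):
    """Upper bound on threads resident on one device at the same time."""
    return multiprocessors * threads_per_mp


per_card = max_resident_threads(MULTIPROCESSORS)
both_cards = per_card * DEVICE_COUNT

print("resident threads per card:", per_card)
print("resident threads across both cards:", both_cards)
print("estimate if 32 were the multiprocessor count:", max_resident_threads(CORES))
```

Under these assumptions the per-card bound comes from the multiprocessor count rather than the core count, so the 24,576 estimate would only hold for a device with 32 multiprocessors. The number of threads actually resident for a given kernel can be lower, since it also depends on the kernel's register and shared-memory use.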
