OK, I'm testing PyCUDA now. I think I'll eventually want to use PyOpenCL,
but until I've tested that I'll use this.

So, it is working now. How would I get an MD5 sum to run? Would that be
put into the kernel, so that the kernel is then run once per 'thread'?
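
To make the "one kernel invocation per thread" idea concrete, here is a
CPU-only sketch in plain Python. It is not PyCUDA code: the names
global_thread_id and run_grid are hypothetical, and hashlib.md5 stands in
for an MD5 routine that would have to be written in CUDA C inside the
kernel. It only shows how each simulated thread derives its own index and
hashes exactly one input.

```python
import hashlib


def global_thread_id(block_idx, block_dim, thread_idx):
    # Same formula a 1-D CUDA kernel uses:
    # blockIdx.x * blockDim.x + threadIdx.x
    return block_idx * block_dim + thread_idx


def run_grid(candidates, block_dim=4):
    """CPU stand-in for a kernel launch: one simulated thread per candidate."""
    # Round up so the grid covers every candidate.
    grid_dim = (len(candidates) + block_dim - 1) // block_dim
    digests = [None] * len(candidates)
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            i = global_thread_id(block_idx, block_dim, thread_idx)
            # Guard: the last block may have more threads than inputs.
            if i < len(candidates):
                digests[i] = hashlib.md5(candidates[i]).hexdigest()
    return digests


candidates = [b"a", b"b", b"abc", b"hello", b""]
digests = run_grid(candidates)
```

On the GPU the two loops disappear: the hardware runs the body once per
thread, and the block size and grid size are chosen at launch time.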

Also, on an NVIDIA GeForce 9500 GT, how many threads can run at one
time?  http://www.nvidia.com/object/product_geforce_9500gt_us.html

It is compute capability 1.1, and deviceQuery reports 4 multiprocessors
(32 cores). With 768 resident threads per multiprocessor, isn't it
4*768 = 3,072 threads?
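
As a sanity check on that arithmetic, here is a small sketch in plain
Python. It assumes the limits for compute capability 1.x as I understand
them (768 resident threads per multiprocessor for 1.0 and 1.1, 1024 for
1.2 and 1.3, and 8 cores per multiprocessor) and takes the multiprocessor
count from the deviceQuery output below. The table and function names are
my own.

```python
# Assumed per-multiprocessor limits for compute capability 1.x devices.
MAX_RESIDENT_THREADS_PER_SM = {
    (1, 0): 768,
    (1, 1): 768,
    (1, 2): 1024,
    (1, 3): 1024,
}
CORES_PER_SM_CC1X = 8  # scalar cores per multiprocessor on 1.x hardware


def resident_threads(multiprocessors, capability):
    """Upper bound on threads resident on the device at one time."""
    return multiprocessors * MAX_RESIDENT_THREADS_PER_SM[capability]


sm_count = 4          # "Number of multiprocessors" from deviceQuery
capability = (1, 1)   # major and minor revision from deviceQuery

total_threads = resident_threads(sm_count, capability)
total_cores = sm_count * CORES_PER_SM_CC1X

print("resident threads per card:", total_threads)
print("cores per card:", total_cores)
```

The 32 in the deviceQuery output is the core count (4 multiprocessors
times 8 cores), not the multiprocessor count, so it should not be
multiplied by 768.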


Can PyCUDA automatically max out all the cards in the machine? If not,
how would I tell it to use both cards?
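
As far as I can tell, a single kernel launch runs on one device only, so
using both cards would mean splitting the work by hand and giving each
card its own slice. Below is a sketch of that split in plain Python.
split_work and cuda_device_names are hypothetical helper names. The
second one uses pycuda.driver (init, Device.count, Device.name) and is
only called if PyCUDA and a CUDA driver are present.

```python
def split_work(total_items, device_count):
    """Return (start, stop) index pairs, one per device, covering all items."""
    base, extra = divmod(total_items, device_count)
    chunks = []
    start = 0
    for dev in range(device_count):
        # The first `extra` devices take one additional item each.
        stop = start + base + (1 if dev < extra else 0)
        chunks.append((start, stop))
        start = stop
    return chunks


def cuda_device_names():
    """List CUDA device names via PyCUDA (needs PyCUDA and a CUDA driver)."""
    import pycuda.driver as cuda  # imported lazily so the rest runs without a GPU
    cuda.init()
    return [cuda.Device(i).name() for i in range(cuda.Device.count())]


# Two cards, as in the deviceQuery output; the item count is made up.
chunks = split_work(1000001, 2)
```

Each chunk would then be processed in a context created on its own
device, for example with cuda.Device(i).make_context(), most likely from
one host thread or process per card.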
Also, here are the devices I'm using:

r...@quentusrex-desktop:~/NVIDIA_GPU_Computing_SDK/C/bin/linux/release#
./deviceQuery
CUDA Device Query (Runtime API) version (CUDART static linking)
There are 2 devices supporting CUDA

Device 0: "GeForce 9500 GT"
  CUDA Driver Version:                           2.30
  CUDA Runtime Version:                          2.30
  CUDA Capability Major revision number:         1
  CUDA Capability Minor revision number:         1
  Total amount of global memory:                 1073020928 bytes
  Number of multiprocessors:                     4
  Number of cores:                               32
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       16384 bytes
  Total number of registers available per block: 8192
  Warp size:                                     32
  Maximum number of threads per block:           512
  Maximum sizes of each dimension of a block:    512 x 512 x 64
  Maximum sizes of each dimension of a grid:     65535 x 65535 x 1
  Maximum memory pitch:                          262144 bytes
  Texture alignment:                             256 bytes
  Clock rate:                                    1.38 GHz
  Concurrent copy and execution:                 Yes
  Run time limit on kernels:                     Yes
  Integrated:                                    No
  Support host page-locked memory mapping:       No
  Compute mode:                                  Default (multiple host threads can use this device simultaneously)

Device 1: "GeForce 9500 GT"
  CUDA Driver Version:                           2.30
  CUDA Runtime Version:                          2.30
  CUDA Capability Major revision number:         1
  CUDA Capability Minor revision number:         1
  Total amount of global memory:                 1073479680 bytes
  Number of multiprocessors:                     4
  Number of cores:                               32
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       16384 bytes
  Total number of registers available per block: 8192
  Warp size:                                     32
  Maximum number of threads per block:           512
  Maximum sizes of each dimension of a block:    512 x 512 x 64
  Maximum sizes of each dimension of a grid:     65535 x 65535 x 1
  Maximum memory pitch:                          262144 bytes
  Texture alignment:                             256 bytes
  Clock rate:                                    1.38 GHz
  Concurrent copy and execution:                 Yes
  Run time limit on kernels:                     Yes
  Integrated:                                    No
  Support host page-locked memory mapping:       No
  Compute mode:                                  Default (multiple host threads can use this device simultaneously)

Test PASSED

Press ENTER to exit...



