Many thanks for your support, Lev!

It works and looks good!

I have already asked about parallelization; your answer was that it happens automatically.
To be more specific:

   1. Could you please explain this mechanism?
   2. How many blocks/threads were used in my program?
   3. How can I obtain these numbers in my program, and how can I manipulate them?
   4. Could you recommend any literature where I can read about this?

I ask because I am new to this topic. When using SourceModule, I specify the number of blocks and threads myself, so it is not clear to me how this can work automatically. Does it depend on the matrix size, and so on?
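For example, the block/grid arithmetic I mean when I say I "give the number of blocks and threads" with SourceModule looks roughly like this (a minimal sketch; the 256 threads per block and the helper name are just my illustrative choices, not anything from a library):

```python
# Sketch of the usual launch-size arithmetic for a 1-D kernel launch.
# 256 threads per block is an illustrative, not mandatory, choice.
def launch_config(n_elements, threads_per_block=256):
    # Ceil-divide so that every element is covered by at least one thread;
    # the kernel itself must then bounds-check its global index.
    blocks = (n_elements + threads_per_block - 1) // threads_per_block
    return blocks, threads_per_block

# e.g. a 1000 x 1000 matrix, flattened to one million elements:
blocks, threads = launch_config(1000 * 1000)
# blocks * threads >= 1_000_000, with less than one block of slack
```

My question is whether the library does something like this internally when I never specify these numbers myself.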

I think this information will be useful for all new users! I hope you (or someone else :-) ) can help me understand!

Best regards,
Evgeny
On 25.02.2014 15:47, Lev Givon wrote:
Received from Evgeny Lazutkin on Tue, Feb 25, 2014 at 03:18:18AM EST:
Dear Lev, dear all,

I have solved the problem with the DataTypeError. I had not noticed that
I was passing float64 instead of float32.

Attached you will find the code. It works... but the GPU gives the
wrong solution. I print the results in the program, and they don't match.
Could you explain why?
This is because arrays in numpy are row-major by default while CULA assumes the
data is column-major [1]. If you transpose the two input matrices, you should
obtain the correct result (transposed). Corrected code attached.

[1] http://www.culatools.com/cula_dense_programmers_guide/#column-major-ordering
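If the index gymnastics are unclear, the effect can be illustrated with plain numpy (a sketch; the `@` products below merely stand in for the CULA gemm call, and the array contents are arbitrary):

```python
import numpy as np

# A row-major (C-order) buffer, reinterpreted as column-major
# (Fortran-order) with swapped dimensions, is the transpose of
# the original matrix:
A = np.arange(12, dtype=np.float32).reshape(3, 4)            # C order
A_colmajor = A.ravel(order='C').reshape(4, 3, order='F')
assert np.allclose(A_colmajor, A.T)

# So a column-major GEMM handed row-major buffers effectively sees the
# transposed operands. Feeding it the transposed inputs therefore
# produces (A @ B).T, i.e. the correct product, transposed:
B = np.arange(20, dtype=np.float32).reshape(4, 5)
C_transposed = B.T @ A.T      # what a column-major gemm yields here
assert np.allclose(C_transposed.T, A @ B)
```

This is the same reasoning behind transposing the two inputs in the attached correction.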

_______________________________________________
PyCUDA mailing list
PyCUDA@tiker.net
http://lists.tiker.net/listinfo/pycuda
