Many thanks for your support, Lev!
It works and looks good!
I have already asked about parallelization; your answer was that it
happens automatically.
To be more specific:
0. Could you please explain this mechanism?
1. How many blocks/threads have been used in my program?
2. How can I obtain these numbers in the program? How can I manipulate them?
3. Could you recommend any literature where I can read about this?
I ask because I am new to this topic. When using SourceModule, I
specify the number of blocks and threads myself, so it is not clear to
me how this works automatically. Does it depend on the matrix size,
and so on?
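For comparison, when launching a SourceModule kernel by hand, the grid is typically sized from the matrix dimensions so that the blocks cover every element. A minimal CPU-only sketch of that calculation (the `launch_dims` helper and the (16, 16) block shape are just illustrative choices, not anything PyCUDA mandates):

```python
import math

def launch_dims(rows, cols, block=(16, 16)):
    """Return (block, grid) tuples that tile a rows x cols matrix.

    block: threads per block (x, y) -- a common but arbitrary choice.
    grid:  enough blocks in each dimension to cover all elements;
           kernels then guard against the out-of-range remainder.
    """
    grid = (math.ceil(cols / block[0]), math.ceil(rows / block[1]))
    return block, grid

# For a 1000 x 1000 matrix with 16 x 16 blocks we need 63 blocks per axis,
# since 62 * 16 = 992 < 1000.
block, grid = launch_dims(1000, 1000)
print(block, grid)  # (16, 16) (63, 63)
```

These are the `block=` and `grid=` arguments one would pass to the compiled kernel; higher-level wrappers compute equivalent values from the array shape so the caller never sees them.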
I think such information will be useful for all new users! I hope you
(or someone else :-) ) can help me understand!
Best regards,
Evgeny
Am 25.02.2014 15:47, schrieb Lev Givon:
Received from Evgeny Lazutkin on Tue, Feb 25, 2014 at 03:18:18AM EST:
Dear Lev, dear all,
I have solved the problem with the DataTypeError: I had not noticed
that I was passing float64 instead of float32.
Attached you will find the code. It works, but the GPU produces the
wrong solution. I print the results in the program, and they don't match.
Could you explain why?
This is because arrays in numpy are row-major by default while CULA assumes the
data is column-major [1]. If you transpose the two input matrices, you should
obtain the correct result (transposed). Corrected code attached.
[1] http://www.culatools.com/cula_dense_programmers_guide/#column-major-ordering
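The effect can be reproduced with NumPy alone (no GPU needed): a row-major buffer reinterpreted in column-major order is the transpose of the original matrix, and the identity (AB)^T = B^T A^T is what makes the transposed-input fix work. A minimal sketch:

```python
import numpy as np

# A row-major (C-order) buffer, read back in column-major (Fortran) order,
# is the transpose of the original matrix -- this is what a column-major
# library "sees" when handed numpy's default layout.
A = np.arange(12, dtype=np.float32).reshape(3, 4)
reinterpreted = A.ravel(order="C").reshape(4, 3, order="F")
assert np.allclose(reinterpreted, A.T)

# The fix: multiply the transposed inputs, then transpose the result back,
# using the identity (A B)^T = B^T A^T.
B = np.arange(8, dtype=np.float32).reshape(4, 2)
C = (B.T @ A.T).T
assert np.allclose(C, A @ B)
```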
_______________________________________________
PyCUDA mailing list
PyCUDA@tiker.net
http://lists.tiker.net/listinfo/pycuda