James Bergstra wrote:

> Suppose you want to evaluate "dot(a*b+c*sqrt(d), e)". The GPU is
> great for doing dot(),

The CPU is equally great (or better?) for doing dot(). In both cases:
- memory access scales as O(n) for dot products
- computation scales as O(n) for dot products
- memory is slow
- computation is fast (faster still on the GPU)

In both cases, the floating point unit is starved: it could do a lot
more work if memory were faster.

For the GPU to be "faster than CPU", you need a situation where
computation dominates over memory access. Matrix-matrix multiplication
is one such example: O(n^3) flops against only O(n^2) memory traffic.
This is what GPUs are designed to do, as it is the major bottleneck
in 3D graphics.

The proper way to speed up "dot(a*b+c*sqrt(d), e)" is to get rid of
the temporary intermediates. That is, in Python pseudo-code:

    result = 0
    for i in range(n):
        result += (a[i]*b[i] + c[i]*sqrt(d[i])) * e[i]

instead of:

    tmp0 = empty(n)
    for i in range(n):
        tmp0[i] = a[i] * b[i]

    tmp1 = empty(n)
    for i in range(n):
        tmp1[i] = sqrt(d[i])

    tmp2 = empty(n)
    for i in range(n):
        tmp2[i] = c[i] * tmp1[i]

    tmp3 = empty(n)
    for i in range(n):
        tmp3[i] = tmp0[i] + tmp2[i]

    result = 0
    for i in range(n):
        result += tmp3[i] * e[i]

It is this creation of temporaries that makes NumPy an order of
magnitude slower than hand-crafted C (but still much faster than pure
Python!). Adding in GPUs will not change this: the amount of
computation (flop count) is the same in both versions, so it cannot
be the source of the slowness.

Sturla Molden

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
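[Editor's note: a small, self-contained NumPy sketch of the point above, not from the original post. The array names a, b, c, d, e follow the expression in the thread; the array contents and size n are made up for illustration. It checks that the temporary-laden evaluation and the fused single-pass loop compute the same number; the fused loop is only fast when compiled (e.g. a C loop, Cython, or numba), not in pure Python.]

```python
import numpy as np

# Illustrative data (hypothetical sizes/values, not from the post).
rng = np.random.default_rng(0)
n = 1000
a, b, c, e = (rng.random(n) for _ in range(4))
d = rng.random(n)  # non-negative, so sqrt stays real

# What NumPy actually does for dot(a*b + c*sqrt(d), e):
# each operator allocates and fills a full temporary array.
tmp0 = a * b            # first temporary
tmp1 = np.sqrt(d)       # second temporary
tmp2 = c * tmp1         # third temporary
tmp3 = tmp0 + tmp2      # fourth temporary
with_temps = np.dot(tmp3, e)

# Fused version: one pass over the data, no temporaries.
fused = 0.0
for i in range(n):
    fused += (a[i] * b[i] + c[i] * np.sqrt(d[i])) * e[i]

# Same flop count, same answer; only the memory traffic differs.
assert np.isclose(with_temps, fused)
```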