Hi,

For the example with small NumPy arrays, the speed problem is probably because NumPy has not been optimized for low per-call overhead. For example, each C function should first check whether the input is a NumPy array and, only if it is not, jump to a function that makes one. Currently, in the C function called by dot() (PyArray_Multiply?), a separate C function call is made to check whether the input is a NumPy array. That is extra overhead in what should be the most frequent case, where the input already is a NumPy array. I'm pretty sure this happens in many places. In this particular function, there are many other function calls before BLAS is called, even for the simple cases of a vector x vector, vector x matrix, or matrix x matrix dot product.
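A rough way to see this fixed cost is to time dot() on vectors of increasing length and compare the time per call with the time per element. This is only a sketch, not a careful benchmark: the helper name per_call_seconds, the sizes, and the repeat counts are my own choices for illustration, and the numbers will depend on the machine, the NumPy version, and the BLAS it is linked against.

```python
import timeit

import numpy as np


def per_call_seconds(n, repeats=5, number=20000):
    """Best-of-`repeats` average time of one np.dot call on two length-n vectors.

    Hypothetical helper for illustration only; it is not part of NumPy.
    """
    a = np.ones(n)
    b = np.ones(n)
    # Pass the names through `globals` so the timed statement is just the call
    # itself, without an extra Python lambda around it.
    timings = timeit.repeat(
        "dot(a, b)",
        globals={"dot": np.dot, "a": a, "b": b},
        repeat=repeats,
        number=number,
    )
    return min(timings) / number


if __name__ == "__main__":
    for n in (1, 4, 16, 64, 256):
        t = per_call_seconds(n)
        # If the per-call time barely changes while n grows, the cost is
        # dominated by fixed call overhead rather than by arithmetic.
        print(f"n={n:4d}  {t * 1e9:9.1f} ns/call  {t * 1e9 / n:9.1f} ns/element")
```

If the time per call stays nearly flat for the smallest sizes, most of it is being spent before the actual multiply-add work starts, which is the kind of overhead I mean.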
But this is probably for another thread if people want to discuss it more. Also, I didn't verify how frequently we could lower the overhead, as we don't need it. So it could be that only a few functions need this type of optimization.

For the comparison with the multiple types of arrays on the GPU, I think the first reason is that people worked in isolation and only implemented the subset of the numpy ndarray that they needed. As different projects/groups need different parts, reusing other people's work was not trivial.

Otherwise, I see the problem, but I don't know what to say about it as I haven't experienced it.

Fred