George Dahl wrote:
> I know that for my work, I can get around an order of a 50-fold speedup over
> numpy using a python wrapper for a simple GPU matrix class. So I might be
> dealing with a lot of matrix products where I multiply a fixed 512 by 784 matrix
> by a 784 by 256 matrix that changes between each matrix product, although to
> really see the largest gains I use a 4096 by 2048 matrix times a bunch of 2048
> by 256 matrices.
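
To put rough numbers on the shapes quoted above, here is a minimal sketch that counts floating point operations and matrix elements touched for a dense product. The helper name `gemm_cost` is made up for illustration, and the model is idealized: it assumes each operand is read once and the result written once, ignoring caches and host-to-GPU transfer.

```python
def gemm_cost(m, k, n):
    """Idealized cost of an (m x k) times (k x n) dense matrix product.

    Returns (flops, elements, flops_per_element). Assumes every input
    element is read once and every output element is written once.
    """
    flops = 2 * m * k * n                # one multiply and one add per inner-product term
    elements = m * k + k * n + m * n     # A and B read, C written
    return flops, elements, flops / elements


# The two shapes mentioned in the quoted message.
for m, k, n in [(512, 784, 256), (4096, 2048, 256)]:
    flops, elements, ratio = gemm_cost(m, k, n)
    print(f"{m}x{k} @ {k}x{n}: {flops} flops, {elements} elements, "
          f"{ratio:.1f} flops per element")
```

Under these assumptions both products do a few hundred floating point operations per element moved, which is the regime where offloading the arithmetic can pay for the data transfer.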
Matrix multiplication is at the core of 3D graphics, and the raison d'être for GPUs. That is specifically what they are designed to do. Matrix multiplication scales as O(n**3) in floating point operations and O(n**2) in memory access. That is, GPUs give fast 3D graphics (matrix multiplications) by speeding up floating point operations.

GPUs make sense for certain level-3 BLAS calls, but that really belongs in BLAS, not in NumPy's core. One could, e.g., consider linking with a BLAS wrapper that directs these special cases to the GPU and the rest to ATLAS / MKL / netlib BLAS.

Sturla Molden
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion