Mathieu Blondel wrote:
> Hello,
>
> About one year ago, a high-level, object-oriented SIMD API was added
> to Mono. For example, there is a class Vector4f for vectors of 4
> floats and this class implements methods such as basic operators,
> bitwise operators, comparison operators, min, max, sqrt, shuffle
> directly using SIMD operations.

I think you are confusing SIMD with Intel's MMX/SSE instruction sets.
SIMD means "single instruction, multiple data". NumPy is inherently an
object-oriented SIMD API:

    array1[:] = array2 + array3

is a SIMD instruction by definition.

SIMD instructions in hardware for length-4 vectors are mostly useful for
3D graphics. But they are not used a lot for that purpose, because GPUs
are getting common. SSE is mostly for rendering 3D graphics without a GPU.

There is nothing that prevents NumPy from having a Vector4f dtype that
internally stores four float32 values and is aligned on 16-byte
boundaries. But it would not be faster than the current float32 dtype.
Do you know why?

The reason is that memory access is slow and computation is fast. Modern
CPUs are starved for data. The speed of NumPy is not limited by failing
to use MMX/SSE whenever possible. It is limited by having to create and
delete temporary arrays all the time. You are suggesting to optimize in
the wrong place.

There is a lot that can be done to speed up computation. There are
optimized BLAS libraries like ATLAS and MKL; NumPy uses BLAS for things
like matrix multiplication. There is OpenMP for better performance on
multicore CPUs. There are OpenCL and CUDA for moving computation from
the CPU to the GPU. But the main boost you get from going from NumPy to
hand-written C or Fortran comes from reduced memory use.

> existing discussion here. Memory-alignment is an important related issue
> since non-aligned movs can tank the performance.
>

You can align an ndarray on a 16-byte boundary like this:

    import numpy

    def aligned_array(N, dtype):
        # Over-allocate a byte buffer by 16 bytes so an aligned start always fits.
        itemsize = numpy.dtype(dtype).itemsize
        tmp = numpy.zeros(N * itemsize + 16, dtype=numpy.uint8)
        address = tmp.__array_interface__['data'][0]
        # Number of bytes to skip to reach the next 16-byte boundary.
        offset = (16 - address % 16) % 16
        # Take N items' worth of bytes, then reinterpret them as dtype.
        return tmp[offset:offset + N * itemsize].view(dtype=dtype)

Sturla Molden

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
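To illustrate the point about temporary arrays made above, here is a minimal sketch, not a benchmark. The function names add_scaled_naive and add_scaled_inplace and the array sizes are made up for illustration; the out= argument of np.multiply and np.add is standard NumPy ufunc behaviour. The first version lets NumPy allocate a temporary for b * c plus a fresh result array, while the second writes everything into one caller-supplied buffer.

```python
import numpy as np

def add_scaled_naive(a, b, c):
    # b * c allocates a temporary array; the sum allocates the result array.
    return a + b * c

def add_scaled_inplace(a, b, c, out):
    # Reuse one preallocated buffer so no new arrays are allocated here.
    np.multiply(b, c, out=out)   # out = b * c
    np.add(a, out, out=out)      # out = a + b * c
    return out

n = 1_000_000
a = np.ones(n, dtype=np.float32)
b = np.full(n, 2.0, dtype=np.float32)
c = np.full(n, 3.0, dtype=np.float32)
out = np.empty_like(a)           # allocated once, reusable across calls

result = add_scaled_inplace(a, b, c, out)
```

Both versions compute the same values; whether the second is measurably faster depends on array size and on the machine, so it is worth timing on your own data before restructuring code this way.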