Re: [Numpy-discussion] Objected-oriented SIMD API for Numpy

2009-10-21 Thread Charles R Harris
On Tue, Oct 20, 2009 at 11:44 PM, Mathieu Blondel wrote:

> Hello,
>
> About one year ago, a high-level, object-oriented SIMD API was added
> to Mono. For example, there is a class Vector4f for vectors of 4
> floats and this class implements methods such as basic operators,
> bitwise operators, comparison operators, min, max, sqrt, shuffle
> directly using SIMD operations. You can have a look at the following
> pages for further details:
>
> http://tirania.org/blog/archive/2008/Nov-03.html (blog post)
> http://go-mono.com/docs/index.aspx?tlin...@n%3amono.simd (API reference)
>
> It seems to me that such an API would possibly be a great fit in Numpy
> too. It would also be possible to add classes that don't directly map
> to SIMD types. For example, Vector8f can easily be implemented in
> terms of 2 Vector4f. In addition to vectors, additional API may be
> added to support operations on matrices of fixed width or height.
>
> I searched the archives for similar discussions but I only found a
> discussion about memory-alignment so I hope I am not restarting an
> existing discussion here. Memory-alignment is an important related issue
> since non-aligned movs can tank the performance.
>
> Any thoughts? I don't know the Numpy code base yet but I'm willing to
> help if such an effort is started.
>
>
The licenses look all hodge-podge:


   - The C# compiler is dual-licensed under the MIT/X11 license and the GNU
     General Public License (GPL)
     (http://www.opensource.org/licenses/gpl-license.html).

   - The tools are released under the terms of the GNU General Public
     License (GPL) (http://www.opensource.org/licenses/gpl-license.html).

   - The runtime libraries are under the GNU Library GPL 2.0 (LGPL 2.0)
     (http://www.gnu.org/copyleft/library.html#TOC1).

   - The class libraries are released under the terms of the MIT X11
     license (http://www.opensource.org/licenses/mit-license.html).

   - ASP.NET MVC and ASP.NET AJAX client software are released by Microsoft
     under the open source Microsoft Permissive License
     (http://www.opensource.org/licenses/ms-pl.html).

However, if the good stuff is in the class libraries, that looks OK. But
that still leaves it in C#, no? You could have a looksie to see how it would
fit into, say, Cython. I don't know where it would go in numpy, maybe some
of the vector bits would be suitable for some generalized ufuncs. Apart from
that, I believe ATLAS can already make use of SIMD, but I have no idea how
far it goes in using the full feature set.

Chuck
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Objected-oriented SIMD API for Numpy

2009-10-21 Thread David Cournapeau
Hi Mathieu,

Mathieu Blondel wrote:
> Hello,
>
> About one year ago, a high-level, object-oriented SIMD API was added
> to Mono. For example, there is a class Vector4f for vectors of 4
> floats and this class implements methods such as basic operators,
> bitwise operators, comparison operators, min, max, sqrt, shuffle
> directly using SIMD operations. You can have a look at the following
> pages for further details:
>
> http://tirania.org/blog/archive/2008/Nov-03.html (blog post)
>   

I am not sure how this could be applied to the numpy case. From what I
understand, this cannot be directly applied to python: the described
changes are VM changes, and we cannot do anything at the python VM level (I
would guess the python VM is too primitive to implement this kind of
thing anyway).

I don't see how the high level API at the assembly level (Mono.Simd)
would work either: the overhead of python and numpy to deal with 4 or 8
items in python would make this API useless from a speed POV.

Implementing some numpy internal code in SIMD, and having an 'object
oriented' C API for SIMD, would indeed be nice - gcc provides SSE
intrinsics, as does Visual Studio (although the latter seems quite
buggy if I believe this link:
http://www.virtualdub.org/blog/pivot/entry.php?id=162), which would make
this in principle relatively easy.

This is only my opinion (read: other numpy devs may disagree), but I think
that the numpy C code should be cleaned up before adding this kind of
feature: there is still too much coupling between the pure C core and
the python machinery. Also, any SIMD code should be selected at
runtime IMHO (so that one binary can be used on multiple architectures),
which has some issues of its own from a cross-platform POV.

David


Re: [Numpy-discussion] Objected-oriented SIMD API for Numpy

2009-10-21 Thread Mathieu Blondel
> The licenses look all hodge-podge:

[...]

> However, if the good stuff is in the class libraries, that looks OK. But
> that still leaves it in C#, no?


I was mentioning Mono just to show that "this has been done" and also
their API reference can serve as inspiration to design Numpy's own
API.

Mathieu


Re: [Numpy-discussion] Objected-oriented SIMD API for Numpy

2009-10-21 Thread Mathieu Blondel
Hi David,

On Wed, Oct 21, 2009 at 3:56 PM, David Cournapeau
 wrote:
> I am not sure how this could be applied to numpy case ? From what I can
> understand, this cannot be directly applied to python: the described
> changes are vm changes, and we cannot do anything at python vm level (I
> would guess the python vm to be too primitive to implement this kind of
> things anyway).

Yes in Mono this is realized with Just-In-Time compilation, so at the VM level.

The reason I thought of Numpy rather than Cython is that Python's
support for vectors/matrices is limited and Numpy has kind of become
the standard for that in the Python world.

I saw the video of Peter Norvig at the last Scipy conference who was
suggesting to merge Numpy into Cython. The SIMD API would be an
argument in favor of this too because of the possible interactions
between such a SIMD API and an array API.

> I don't see how the high level API at the assembly level (Mono.Simd)
> would work either: the overhead of python and numpy to deal with 4 or 8
> items in python would make this API useless from a speed POV.

My original idea was to write the code in C with Intel/AltiVec/Neon
intrinsics and have this code bound so that it can be called from
Python. So the SIMD code would be compiled already, ready to be called
from Python. Like you said, there's a risk that the overhead of
calling Python is bigger than the benefit of using SIMD instructions.
If it's worth trying out, an experiment can be made with Vector4f to
see if it's even worth continuing with other types.

> This is only my opinion (read other numpy dev may disagree), but I think
> that the numpy C code should be cleaned up before adding this kind of
> features: there is still too much coupling between the pure C core and
> the python machinery. Also, any use of SIMD code should be done at
> runtime IMHO (so that one binary can be used on multiple architectures),
> which has some issues on its own from a cross platform POV.

I recently used SIMD instructions for a project and I realized that
they cannot be activated in a standard Debian package, because the
package has to remain general-purpose. So people who want to benefit
from the speedup have to compile my project from source... I also see that
sometimes packages are available in different flavors (-msse,
-msse2...).

Mathieu


Re: [Numpy-discussion] Objected-oriented SIMD API for Numpy

2009-10-21 Thread Pauli Virtanen
Wed, 21 Oct 2009 16:48:22 +0900, Mathieu Blondel wrote:
[clip]
> My original idea was to write the code in C with Intel/AltiVec/Neon
> intrinsics and have this code bound so that it can be called from Python.
> So the SIMD code would be compiled already, ready to be called from
> Python. Like you said, there's a risk that the overhead of calling
> Python is bigger than the benefit of using SIMD instructions. If it's
> worth trying out, an experiment can be made with Vector4f to see if it's
> even worth continuing with other types.

The overhead is quickly checked for multiplication with numpy arrays of 
varying size, without SSE:

Overhead per iteration (ms): 1.6264549101
Time per array element (ms): 0.000936947636565
Cross-over point:    1735.90801303

#--
import numpy as np
from scipy import optimize
import time
import matplotlib.pyplot as plt

def main():
    data = []

    for n in np.unique(np.logspace(0, 5, 20).astype(int)):
        print n
        m = 100
        reps = 5
        times = []
        for rep in xrange(reps):
            x = np.zeros((n,), dtype=np.float_)
            start = time.time()
            #--
            for k in xrange(m):
                x *= 1.1
            #--
            end = time.time()
            times.append(end - start)
        t = min(times)
        data.append((n, t))

    data = np.array(data)

    def model(z):
        n, t = data.T
        overhead, per_elem = z
        return np.log10(t) - np.log10(overhead + per_elem * n)

    z, ier = optimize.leastsq(model, [1., 1.])
    overhead, per_elem = z

    print ""
    print "Overhead per iteration (ms):", overhead*1e3
    print "Time per array element (ms):", per_elem*1e3
    print "Cross-over point:   ", overhead/per_elem

    n = np.logspace(0, 5, 500)
    plt.loglog(data[:,0], data[:,0]/data[:,1], 'x',
               label=r'measured')
    plt.loglog(n, n/(overhead + per_elem*n), 'k-',
               label=r'fit to $t = a + b n$')
    plt.xlabel(r'$n$')
    plt.ylabel(r'ops/second')
    plt.grid(1)
    plt.legend()
    plt.show()

if __name__ == "__main__":
    main()
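
The fit above models the measured time as t = a + b*n, so the cross-over
point is simply a/b. As a rough sketch (the function names below are made up
for illustration, and the two constants are just the values printed above),
the same model shows how much of the time is fixed overhead for a vector of
4 or 8 elements:

```python
# Fitted values reported above, in milliseconds.
overhead_ms = 1.6264549101
per_elem_ms = 0.000936947636565


def crossover(overhead, per_elem):
    """Array length at which fixed overhead equals the per-element work."""
    return overhead / per_elem


def overhead_fraction(n, overhead, per_elem):
    """Share of the total time t = overhead + per_elem * n spent in overhead."""
    return overhead / (overhead + per_elem * n)


if __name__ == "__main__":
    print("cross-over point:", crossover(overhead_ms, per_elem_ms))
    for n in (4, 8, 1000, 100000):
        frac = overhead_fraction(n, overhead_ms, per_elem_ms)
        print("n = %6d: %.1f%% of the time is overhead" % (n, 100 * frac))
```

Under this model, an operation on a Vector4f-sized array spends nearly all of
its time in overhead, whatever the speed of the inner loop.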



Re: [Numpy-discussion] Objected-oriented SIMD API for Numpy

2009-10-21 Thread David Cournapeau
Mathieu Blondel wrote:
> I saw the video of Peter Norvig at the last Scipy conference who was
> suggesting to merge Numpy into Cython. The SIMD API would be an
> argument in favor of this too because of the possible interactions
> between such a SIMD API and an array API.
>   

Hm, I don't remember this - I guess I would have to look at the video.
Do you know at which point of the presentation he discussed SIMD?

> My original idea was to write the code in C with Intel/AltiVec/Neon
> intrinsics and have this code bound so that it can be called from
> Python. So the SIMD code would be compiled already, ready to be called
> from Python. Like you said, there's a risk that the overhead of
> calling Python is bigger than the benefit of using SIMD instructions.
> If it's worth trying out, an experiment can be made with Vector4f to
> see if it's even worth continuing with other types.
>   

I am quite confident that the overhead will be way too significant for
this approach to be useful. If you have two python objects, using + on
them will induce at least one function call, and most likely several
function calls at the python level. Python function calls are painfully
slow (several thousand cycles per call in the most optimistic case).

Python overhead is several orders of magnitude bigger than what you can
gain by going from straightforward C to SIMD. The only way I can see to make
this work is to generate SIMD code from python (which would be a poor
man's replacement for a JIT in a way); there was a presentation
following this direction at the scipy 09 conference.
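
The per-call cost can be measured with a small experiment. The sketch below
uses a pure-Python Vector4f class, which is a hypothetical stand-in and not a
real SIMD type; a C-implemented type would speed up the arithmetic, but each
+ would still go through the interpreter's dispatch. The timing it reports
depends on the machine and the Python version, so none is quoted here.

```python
import timeit


class Vector4f(object):
    """Pure-Python stand-in for a 4-float vector (hypothetical class)."""

    __slots__ = ("x", "y", "z", "w")

    def __init__(self, x, y, z, w):
        self.x, self.y, self.z, self.w = x, y, z, w

    def __add__(self, other):
        # One Python-level call per '+', plus a constructor call for the result.
        return Vector4f(self.x + other.x, self.y + other.y,
                        self.z + other.z, self.w + other.w)

    def as_tuple(self):
        return (self.x, self.y, self.z, self.w)


def seconds_per_add(reps=100000):
    """Average wall-clock time of one 'a + b', including the lambda call."""
    a = Vector4f(1.0, 2.0, 3.0, 4.0)
    b = Vector4f(0.5, 0.5, 0.5, 0.5)
    total = timeit.timeit(lambda: a + b, number=reps)
    return total / reps


if __name__ == "__main__":
    print("time per Vector4f addition: %.3g s" % seconds_per_add())
```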


> I recently used SIMD instructions for a project and I realized that
> they cannot be activated in a standard Debian package, because the
> package has to remain general-purpose. So people who want to benefit
> the speed up have to compile my project from source... 
>   

Yes - that's unacceptable IMHO. The real solution is to include all the
code at build time, detect at *runtime* which ISA is supported, and
select the functions accordingly. The problem is that loading shared
code at runtime in a cross platform way is complicated - python already
does it, but unfortunately does not provide a C API for it AFAIK, so we
would have to re-implement it in python.
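
The "build everything, choose at runtime" idea can be sketched in a few lines
of Python. This is only an illustration under stated assumptions: it reads
the "flags" line of /proc/cpuinfo, which exists on Linux/x86 only, and the
two kernels are hypothetical placeholders rather than real SSE2 code.

```python
def parse_cpu_flags(cpuinfo_text):
    """Return the feature flags found in /proc/cpuinfo-style text.

    Assumes the Linux x86 layout, where one line starts with 'flags'.
    """
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def read_cpu_flags(path="/proc/cpuinfo"):
    """Read the flags of the running machine; empty set if unavailable."""
    try:
        with open(path) as f:
            return parse_cpu_flags(f.read())
    except (IOError, OSError):
        return set()


# Hypothetical kernels, listed from most to least specialised.
def add_sse2(a, b):
    return [x + y for x, y in zip(a, b)]  # placeholder for an SSE2 build


def add_generic(a, b):
    return [x + y for x, y in zip(a, b)]  # portable fallback


KERNELS = [("sse2", add_sse2), (None, add_generic)]


def select_kernel(flags, kernels=KERNELS):
    """Pick the first kernel whose required flag is present (None = always)."""
    for required, func in kernels:
        if required is None or required in flags:
            return func
    raise RuntimeError("no usable kernel")


if __name__ == "__main__":
    add = select_kernel(read_cpu_flags())
    print(add.__name__, add([1, 2], [3, 4]))
```

The selection happens once, at import or first use, so its cost is not paid
per call. The cross-platform part (Windows, Mac OS X, non-x86) is exactly
what this sketch leaves out.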

cheers,

David



Re: [Numpy-discussion] Objected-oriented SIMD API for Numpy

2009-10-21 Thread Mathieu Blondel
On Wed, Oct 21, 2009 at 5:05 PM, David Cournapeau
 wrote:
> Mathieu Blondel wrote:
>> I saw the video of Peter Norvig at the last Scipy conference who was
>> suggesting to merge Numpy into Cython. The SIMD API would be an
>> argument in favor of this too because of the possible interactions
>> between such a SIMD API and an array API.
>>
>
> Hm, I don't remember this - I guess I would have to look at the video.
> Do you know at which point of the presentation he discussed about SIMD ?

Peter Norvig suggested merging Numpy into Cython but he didn't
mention SIMD as the reason (that one is from me). Sorry if I wasn't
clear. IIRC, the reason was to help democratize Numpy and make it
easier for users to install. He went on to say that he talked about
it with Guido and apparently the main barrier was the release cycle.
Please check the video as I'm telling you this from memory.

Mathieu


Re: [Numpy-discussion] Objected-oriented SIMD API for Numpy

2009-10-21 Thread David Cournapeau
Mathieu Blondel wrote:
> He went on to say that he talked about
> it with Guido and apparently the main barrier was the release cycle.
> Please check the video as I'm telling you that from memory.
>   

Ah, I think you are mistaken, then - he referred to merging numpy and
scipy into python during his talk, not cython.

For the reason you gave, including numpy into python is not really on
the radar. It was tried unsuccessfully some time ago, and the buffer PEP
(3118 IIRC) is a much more low-level API to share "typed" buffers at the
C level. Hopefully, numpy will be built on top of this at some point.
Scipy is very unlikely IMHO - I doubt depending on fortran code would be
acceptable for python.
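
For readers who have not met the PEP 3118 buffer interface, it is visible from
Python 3 through memoryview. The sketch below uses only the standard library
(array.array as the exporter) to show what a "typed" buffer carries: an
element format, an item size and a shape, all without copying the data.

```python
from array import array

# array.array exports the PEP 3118 buffer interface, so a memoryview can
# describe its element type and layout without copying the data.
values = array("f", [1.0, 2.0, 3.0, 4.0])
view = memoryview(values)

print(view.format)    # type code of each element
print(view.itemsize)  # bytes per element
print(view.shape)     # one dimension, four items

# Writes through the view land in the original array: the memory is shared.
view[0] = 10.0
print(values[0])
```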

David


Re: [Numpy-discussion] Objected-oriented SIMD API for Numpy

2009-10-21 Thread Mathieu Blondel
On Wed, Oct 21, 2009 at 5:23 PM, David Cournapeau
 wrote:

> Ah, I think you are mistaken, then - he referred to merging numpy and
> scipy into python during his talk, not cython.

Oh, I meant to say CPython (the default implementation of Python), not
Cython. I didn't realize that they were different projects.

So the method dispatch seems to be a great obstacle to an
object-oriented SIMD API. That would seem more feasible in C++ with
non-virtual methods. Java has final methods, which can be useful
information to the JIT. C# seems to have "sealed" methods.
Interestingly, the Mono.SIMD API uses static methods, which I guess is
to avoid the dispatch problem. But it makes the code look uglier. For
example, instead of a + b, you have to do Vector4f.Addition(a, b).

Mathieu


Re: [Numpy-discussion] Objected-oriented SIMD API for Numpy

2009-10-21 Thread Francesc Alted
On Wednesday 21 October 2009 07:44:39 Mathieu Blondel wrote:
> Hello,
>
> About one year ago, a high-level, object-oriented SIMD API was added
> to Mono. For example, there is a class Vector4f for vectors of 4
> floats and this class implements methods such as basic operators,
> bitwise operators, comparison operators, min, max, sqrt, shuffle
> directly using SIMD operations.
[clip]

It is important to stress that all the above operations, except probably 
sqrt, are memory-bound, and that implementing them for numpy 
would not represent a significant improvement at all.

This is because numpy is a package that works mainly with arrays in an 
element-wise way, and in this scenario, the time to transmit data to the CPU 
dominates, by and large, over the time to perform operations.

Among other places, you can find a detailed explanation of this fact in my 
presentation at the latest EuroSciPy:

http://www.pytables.org/docs/StarvingCPUs.pdf

Cheers,

-- 
Francesc Alted


Re: [Numpy-discussion] Objected-oriented SIMD API for Numpy

2009-10-21 Thread David Cournapeau
On Wed, Oct 21, 2009 at 6:12 PM, Francesc Alted  wrote:
> On Wednesday 21 October 2009 07:44:39 Mathieu Blondel wrote:
>> Hello,
>>
>> About one year ago, a high-level, object-oriented SIMD API was added
>> to Mono. For example, there is a class Vector4f for vectors of 4
>> floats and this class implements methods such as basic operators,
>> bitwise operators, comparison operators, min, max, sqrt, shuffle
>> directly using SIMD operations.
> [clip]
>
> It is important to stress out that all the above operations, except probably
> sqrt, are all memory-bound operations, and that implementing them for numpy
> would not represent a significant improvement at all.


> This is because numpy is a package that works mainly with arrays in an
> element-wise way, and in this scenario, the time to transmit data to CPU
> dominates, by and large, over the time to perform operations.

Is it general, or just for simple operations in numpy and ufuncs? I
remember that for music software, SIMD used to matter a lot, even for
simple bus mixing (which is basically ax+by, with a, b scalars and x,
y the input arrays).

Do you have any interest in adding SIMD to some core numpy functions
(e.g. transcendental functions)? If so, I would try to go back to the
problem of runtime SSE detection and loading of optimized shared
libraries in a cross-platform way - that's something which should be
done at some point in numpy, and people requiring it would be a good
incentive.

David


Re: [Numpy-discussion] Objected-oriented SIMD API for Numpy

2009-10-21 Thread Matthieu Brucher
> Is it general, or just for simple operations in numpy and ufunc ? I
> remember that for music softwares, SIMD used to matter a lot, even for
> simple bus mixing (which is basically a ax+by with a, b scalars and x
> y the input arrays).

Indeed, it shouldn't :| I think the main reason might not be SIMD, but
the additional hypothesis you put on the arrays (aliasing). This way,
today's compilers may not even need the actual SIMD instructions.
I have the same opinion as Francesc: it would only be useful for
operations that need more computation than load/store.

Matthieu
-- 
Information System Engineer, Ph.D.
Website: http://matthieu-brucher.developpez.com/
Blogs: http://matt.eifelle.com and http://blog.developpez.com/?blog=92
LinkedIn: http://www.linkedin.com/in/matthieubrucher


Re: [Numpy-discussion] Objected-oriented SIMD API for Numpy

2009-10-21 Thread Francesc Alted
On Wednesday 21 October 2009 14:27:46 David Cournapeau wrote:
> > This is because numpy is a package that works mainly with arrays in an
> > element-wise way, and in this scenario, the time to transmit data to CPU
> > dominates, by and large, over the time to perform operations.
>
> Is it general, or just for simple operations in numpy and ufunc ? I
> remember that for music softwares, SIMD used to matter a lot, even for
> simple bus mixing (which is basically a ax+by with a, b scalars and x
> y the input arrays).

This is general, as long as the dataset has to be brought from memory to the CPU, 
and the operations to be done are element-wise and simple (i.e. not 
transcendental).  SIMD does matter in general when:

1) the dataset is already in cache
2) you have to perform costly operations (mainly transcendental)
3) a combination of the above

I don't know the case for music software, but if you say that ax+by is 
accelerated by SIMD, I'd say that case 1) is happening.
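
A back-of-the-envelope calculation makes the argument concrete. The sketch
below compares the time needed to move the data with the time needed to
compute on it. Every machine figure in it (10 GB/s memory bandwidth, 4 and
16 Gflop/s, 100 flops for a transcendental function) is a made-up assumption
for illustration, not a measurement.

```python
def bound_times(n, bytes_per_elem, flops_per_elem, bandwidth, flop_rate):
    """Return (memory_time, compute_time) in seconds for n elements.

    bandwidth is in bytes/s and flop_rate in flops/s; both must be
    supplied by the caller for the machine in question.
    """
    memory_time = n * bytes_per_elem / float(bandwidth)
    compute_time = n * flops_per_elem / float(flop_rate)
    return memory_time, compute_time


def is_memory_bound(n, bytes_per_elem, flops_per_elem, bandwidth, flop_rate):
    memory_time, compute_time = bound_times(
        n, bytes_per_elem, flops_per_elem, bandwidth, flop_rate)
    return memory_time > compute_time


# Hypothetical machine: 10 GB/s to main memory, 4 Gflop/s scalar code,
# 16 Gflop/s with SIMD.
BANDWIDTH = 10e9
SCALAR = 4e9
SIMD = 16e9

# x *= 1.1 on float64: read 8 bytes, write 8 bytes, one multiply.
mem, cpu_scalar = bound_times(10**7, 16, 1, BANDWIDTH, SCALAR)
_, cpu_simd = bound_times(10**7, 16, 1, BANDWIDTH, SIMD)

if __name__ == "__main__":
    print("memory: %.4f s, scalar: %.4f s, SIMD: %.4f s"
          % (mem, cpu_scalar, cpu_simd))
    # A costly function (assumed 100 flops per element) flips the balance.
    print("tanh-like op memory bound?",
          is_memory_bound(10**7, 16, 100, BANDWIDTH, SCALAR))
```

With these assumed figures the multiply is limited by memory traffic in both
the scalar and the SIMD case, so SIMD only shrinks the term that was already
the smaller one. This corresponds to the out-of-cache situation; when the
data is already in cache, the effective bandwidth is much higher and the
comparison changes.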

> Do you have any interest in adding SIMD to some core numpy
> (transcendental functions). If so, I would try to go back to the
> problem of runtime SSE detection and loading of optimized shared
> library in a cross-platform way - that's something which should be
> done at some point in numpy, and people requiring it would be a good
> incentive.

I don't personally have a lot of interest in implementing this for numpy.  But in 
case anyone does, I find the following library:

http://gruntthepeon.free.fr/ssemath/

very interesting.  Perhaps there could be other (free) implementations...

-- 
Francesc Alted


Re: [Numpy-discussion] Objected-oriented SIMD API for Numpy

2009-10-21 Thread Pauli Virtanen
Wed, 21 Oct 2009 14:47:02 +0200, Francesc Alted wrote:
[clip]
>> Do you have any interest in adding SIMD to some core numpy
>> (transcendental functions). If so, I would try to go back to the
>> problem of runtime SSE detection and loading of optimized shared
>> library in a cross-platform way - that's something which should be done
>> at some point in numpy, and people requiring it would be a good
>> incentive.
> 
> I don't personally have a lot of interest implementing this for numpy. 
> But in case anyone does, I find the next library:
> 
> http://gruntthepeon.free.fr/ssemath/
> 
> very interesting.  Perhaps there could be other (free)
> implementations...

Optimized transcendental functions could be interesting. For example for 
tanh, call overhead is overcome already for ~30-element arrays.

Since these are ufuncs, I suppose the SSE implementations could just be 
put in a separate module, which is always compiled. Before importing the 
module, we could simply check from Python side that the CPU supports the 
necessary instructions. If everything is OK, the accelerated 
implementations would then just replace the Numpy routines.

This type of project could probably also be started outside Numpy, and 
just monkey-patch the Numpy routines on import.
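
A minimal sketch of the "check the CPU, then replace the routines" idea
follows. To stay self-contained it patches a stand-in namespace instead of
numpy itself, and fast_tanh is a hypothetical placeholder for an
SSE-compiled routine; install_accelerated and restore are names invented for
this example.

```python
import math
import types


def install_accelerated(target, replacements, cpu_flags, required_flags):
    """Replace attributes of `target` if the CPU has every required flag.

    Returns a dict of the original functions so the patch can be undone;
    the dict is empty when nothing was replaced.
    """
    if not set(required_flags) <= set(cpu_flags):
        return {}
    originals = {}
    for name, func in replacements.items():
        originals[name] = getattr(target, name)
        setattr(target, name, func)
    return originals


def restore(target, originals):
    """Undo a previous install_accelerated call."""
    for name, func in originals.items():
        setattr(target, name, func)


def fast_tanh(x):
    # Placeholder: a real version would call an SSE-compiled routine.
    return math.tanh(x)


# Stand-in for the module being patched (numpy in the proposal above).
fake_numpy = types.SimpleNamespace(tanh=math.tanh)

saved = install_accelerated(fake_numpy, {"tanh": fast_tanh},
                            cpu_flags={"sse", "sse2"},
                            required_flags={"sse2"})
```

One caveat of any monkey-patching scheme: code that already did
"from numpy import tanh" before the patch keeps its reference to the
original function.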

-- 
Pauli Virtanen



Re: [Numpy-discussion] Objected-oriented SIMD API for Numpy

2009-10-21 Thread René Dudfield
On Wed, Oct 21, 2009 at 2:14 PM, Pauli Virtanen wrote:

> Wed, 21 Oct 2009 14:47:02 +0200, Francesc Alted wrote:
> [clip]
> >> Do you have any interest in adding SIMD to some core numpy
> >> (transcendental functions). If so, I would try to go back to the
> >> problem of runtime SSE detection and loading of optimized shared
> >> library in a cross-platform way - that's something which should be done
> >> at some point in numpy, and people requiring it would be a good
> >> incentive.
> >
> > I don't personally have a lot of interest implementing this for numpy.
> > But in case anyone does, I find the next library:
> >
> > http://gruntthepeon.free.fr/ssemath/
> >
> > very interesting.  Perhaps there could be other (free)
> > implementations...
>
> Optimized transcendental functions could be interesting. For example for
> tanh, call overhead is overcome already for ~30-element arrays.
>
> Since these are ufuncs, I suppose the SSE implementations could just be
> put in a separate module, which is always compiled. Before importing the
> module, we could simply check from Python side that the CPU supports the
> necessary instructions. If everything is OK, the accelerated
> implementations would then just replace the Numpy routines.
>
> This type of project could probably also be started outside Numpy, and
> just monkey-patch the Numpy routines on import.
>
> --
> Pauli Virtanen
>
>


Anyone seen the corepy numpy gsoc project?
http://numcorepy.blogspot.com/

It implements a number of functions with the corepy runtime assembler.  The
project showed nice simd speedups for numpy.


I've been following the liborc project... which is a runtime assembler that
uses a generic assembly language and supports many different simd assembly
languages (eg SSE, MMX, ARM, Altivec).  It's the replacement for the liboil
library (used in gstreamer etc).
http://code.entropywave.com/projects/orc/


cu!


Re: [Numpy-discussion] Objected-oriented SIMD API for Numpy

2009-10-21 Thread Gregor Thalhammer
Pauli Virtanen wrote:
> Wed, 21 Oct 2009 14:47:02 +0200, Francesc Alted wrote:
> [clip]
>
>>> Do you have any interest in adding SIMD to some core numpy
>>> (transcendental functions). If so, I would try to go back to the
>>> problem of runtime SSE detection and loading of optimized shared
>>> library in a cross-platform way - that's something which should be done
>>> at some point in numpy, and people requiring it would be a good
>>> incentive.
>>>
>> I don't personally have a lot of interest implementing this for numpy.
>> But in case anyone does, I find the next library:
>>
>> http://gruntthepeon.free.fr/ssemath/
>>
>> very interesting.  Perhaps there could be other (free)
>> implementations...
>>
>
> Optimized transcendental functions could be interesting. For example for
> tanh, call overhead is overcome already for ~30-element arrays.
>
> Since these are ufuncs, I suppose the SSE implementations could just be
> put in a separate module, which is always compiled. Before importing the
> module, we could simply check from Python side that the CPU supports the
> necessary instructions. If everything is OK, the accelerated
> implementations would then just replace the Numpy routines.
>
I once wrote a module that replaces the built-in transcendental
functions of numpy by optimized versions from Intel's vector math
library. If someone is interested, I can publish it. In my experience it
was of little use since real-world problems are limited by memory
bandwidth. Therefore extending numexpr with optimized transcendental
functions was the better solution. Afterwards I discovered that I could
have saved the effort of the first approach since gcc is able to use
optimized functions from Intel's vector math library or AMD's math core
library; see the docs of -mveclibabi. You just need to recompile numpy
with the proper compiler arguments.

Gregor
> This type of project could probably also be started outside Numpy, and
> just monkey-patch the Numpy routines on import.
>
>


Re: [Numpy-discussion] Objected-oriented SIMD API for Numpy

2009-10-21 Thread Ryan May
On Wed, Oct 21, 2009 at 11:38 AM, Gregor Thalhammer
 wrote:
> I once wrote a module that replaces the built in transcendental
> functions of numpy by optimized versions from Intels vector math
> library. If someone is interested, I can publish it. In my experience it
> was of little use since real world problems are limited by memory
> bandwidth. Therefore extending numexpr with optimized transcendental
> functions was the better solution. Afterwards I discovered that I could
> have saved the effort of the first approach since gcc is able to use
> optimized functions from Intels vector math library or AMD's math core
> library, see the doc's of -mveclibabi. You just need to recompile numpy
> with proper compiler arguments.

Do you have a link to the documentation for -mveclibabi?  I can't find
this anywhere and I'm *very* interested.

Ryan

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma


Re: [Numpy-discussion] Objected-oriented SIMD API for Numpy

2009-10-21 Thread Ryan May
On Wed, Oct 21, 2009 at 1:23 PM, Ryan May  wrote:
> On Wed, Oct 21, 2009 at 11:38 AM, Gregor Thalhammer
>  wrote:
>> I once wrote a module that replaces the built in transcendental
>> functions of numpy by optimized versions from Intels vector math
>> library. If someone is interested, I can publish it. In my experience it
>> was of little use since real world problems are limited by memory
>> bandwidth. Therefore extending numexpr with optimized transcendental
>> functions was the better solution. Afterwards I discovered that I could
>> have saved the effort of the first approach since gcc is able to use
>> optimized functions from Intels vector math library or AMD's math core
>> library, see the doc's of -mveclibabi. You just need to recompile numpy
>> with proper compiler arguments.
>
> Do you have a link to the documentation for -mveclibabi?  I can't find
> this anywhere and I'm *very* interested.

Ah, there it is.  Google doesn't come up with much, but the PDF manual
does have it:
http://gcc.gnu.org/onlinedocs/gcc-4.4.2/gcc.pdf

(It helps when you don't mis-type your search in the PDF).

Ryan

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma
Sent from Norman, Oklahoma, United States


Re: [Numpy-discussion] Objected-oriented SIMD API for Numpy

2009-10-21 Thread Neal Becker
...
> I once wrote a module that replaces the built in transcendental
> functions of numpy by optimized versions from Intels vector math
> library. If someone is interested, I can publish it. In my experience it
> was of little use since real world problems are limited by memory
> bandwidth. Therefore extending numexpr with optimized transcendental
> functions was the better solution. Afterwards I discovered that I could
> have saved the effort of the first approach since gcc is able to use
> optimized functions from Intels vector math library or AMD's math core
> library, see the doc's of -mveclibabi. You just need to recompile numpy
> with proper compiler arguments.
> 

I'm interested.  I'd like to try AMD rather than Intel, because AMD is 
easier to obtain.  I'm running on an Intel machine; I hope that doesn't matter 
too much.

What exactly do I need to do?

I see that numpy/site.cfg has an MKL section.  I'm assuming I should not 
touch that, but just mess with gcc flags?



Re: [Numpy-discussion] Objected-oriented SIMD API for Numpy

2009-10-21 Thread David Warde-Farley

On 21-Oct-09, at 9:14 AM, Pauli Virtanen wrote:

> Since these are ufuncs, I suppose the SSE implementations could just  
> be
> put in a separate module, which is always compiled. Before importing  
> the
> module, we could simply check from Python side that the CPU supports  
> the
> necessary instructions. If everything is OK, the accelerated
> implementations would then just replace the Numpy routines.

Am I mistaken or wasn't that sort of the goal of Andrew Friedley's  
CorePy work this summer?

Looking at his slides again, the speedups are rather impressive. I  
wonder if these could be usefully integrated into numpy itself?

David


Re: [Numpy-discussion] Objected-oriented SIMD API for Numpy

2009-10-21 Thread Andrew Friedley
sigh; yet another email dropped by the list.

David Warde-Farley wrote:
> On 21-Oct-09, at 9:14 AM, Pauli Virtanen wrote:
> 
>> Since these are ufuncs, I suppose the SSE implementations could just  
>> be
>> put in a separate module, which is always compiled. Before importing  
>> the
>> module, we could simply check from Python side that the CPU supports  
>> the
>> necessary instructions. If everything is OK, the accelerated
>> implementations would then just replace the Numpy routines.
> 
> Am I mistaken or wasn't that sort of the goal of Andrew Friedley's  
> CorePy work this summer?
> 
> Looking at his slides again, the speedups are rather impressive. I  
> wonder if these could be usefully integrated into numpy itself?

Yes, my GSoC project is closely related, though I didn't do the CPU 
detection part; that would be easy to add.  Also, I wrote my code specifically 
for 64-bit x86.

I didn't focus so much on the transcendental functions, though they 
wouldn't be too hard to implement.  There's also the possibility to 
provide implementations with differing tradeoffs between accuracy and 
performance.

I think the blog link got posted already, but here's relevant info:

http://numcorepy.blogspot.com
http://www.corepy.org/wiki/index.php?title=CoreFunc

I talked about this in my SciPy talk and up-coming paper, as well.

Also people have just been talking about x86 in this thread -- other 
architectures could be supported too, e.g. PPC/Altivec or even Cell SPU 
and other accelerators.  I actually wrote a quick/dirty implementation 
of addition and vector normalization ufuncs for Cell SPU recently. Basic 
result is that overall performance is very roughly comparable to a 
similar speed x86 chip, but this is a huge win over just running on the 
extremely slow Cell PPC cores.

Andrew


Re: [Numpy-discussion] Objected-oriented SIMD API for Numpy

2009-10-21 Thread David Cournapeau
On Wed, Oct 21, 2009 at 10:14 PM, Pauli Virtanen  wrote:

>
> This type of project could probably also be started outside Numpy, and
> just monkey-patch the Numpy routines on import.

I think I would prefer this approach as a first shot. I will look into
adding a small C library + wrapper in python to know which SIMD
instructions are available to numpy. Then people can reuse this for
whatever approach they prefer.
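
A minimal sketch of the Python-side check discussed above, assuming a Linux
system that exposes CPU feature flags in /proc/cpuinfo. The function names
parse_cpu_flags and detect_simd are hypothetical and are not part of numpy or
of the C library proposed here; a portable implementation would need to query
CPUID from C instead.

```python
def parse_cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags found in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # x86 Linux kernels label the line "flags"; some ARM kernels use "Features".
        if sep and key.strip().lower() in ("flags", "features"):
            return set(value.split())
    return set()


def detect_simd(path="/proc/cpuinfo", wanted=("sse", "sse2", "pni", "ssse3")):
    """Map each wanted flag to True/False; all False if the file cannot be read.

    Note: Linux reports SSE3 under the name "pni".
    """
    try:
        with open(path) as f:
            flags = parse_cpu_flags(f.read())
    except OSError:
        # Not Linux, or /proc is unavailable: report nothing as supported.
        flags = set()
    return {name: name in flags for name in wanted}
```

Calling detect_simd() before importing an accelerated module would let the
caller fall back to the stock numpy routines when a flag is missing.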

David


Re: [Numpy-discussion] Objected-oriented SIMD API for Numpy

2009-10-21 Thread Sturla Molden
Mathieu Blondel wrote:
> Hello,
>
> About one year ago, a high-level, objected-oriented SIMD API was added
> to Mono. For example, there is a class Vector4f for vectors of 4
> floats and this class implements methods such as basic operators,
> bitwise operators, comparison operators, min, max, sqrt, shuffle
> directly using SIMD operations.
I think you are confusing SIMD with Intel's MMX/SSE instruction set.

SIMD means "single instruction - multiple data". NumPy is inherently 
an object-oriented SIMD API:

  array1[:] = array2 + array3

is a SIMD instruction by definition.

SIMD instructions in hardware for length-4 vectors are mostly useful for 
3D graphics. But they are not used a lot for that purpose, because GPUs 
are getting common. SSE is mostly for rendering 3D graphics without a 
GPU. There is nothing that prevents NumPy from having a Vector4f dtype, 
that internally stores four float32 and is aligned at 16 byte 
boundaries. But it would not be faster than the current float32 dtype. 
Do you know why?

The reason is that memory access is slow, and computation is fast. 
Modern CPUs are starved. The speed of NumPy is not limited by not using 
MMX/SSE whenever possible. It is limited by having to create and 
delete temporary arrays all the time. You are suggesting optimizing in 
the wrong place. There is a lot that can be done to speed up 
computation: There are optimized BLAS libraries like ATLAS and MKL. 
NumPy uses BLAS for things like matrix multiplication. There is OpenMP 
for better performance on multicores. There are OpenCL and CUDA for 
moving computation from CPUs to GPUs. But the main boost you get from 
going from NumPy to hand-written C or Fortran comes from reduced memory use.
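
To illustrate the point about temporaries, here is a small sketch comparing
a plain expression with one that reuses an output buffer through the ufunc
out= argument. The function names fused_naive and fused_inplace are made up
for this example; no timing claims are attached.

```python
import numpy as np


def fused_naive(a, b, c):
    # a * b is written to a temporary array, then "+ c" allocates the result.
    return a * b + c


def fused_inplace(a, b, c, out=None):
    # Reuse one output buffer so no intermediate array is allocated.
    if out is None:
        out = np.empty_like(a)
    np.multiply(a, b, out=out)
    np.add(out, c, out=out)
    return out
```

Passing a preallocated out array across repeated calls avoids the repeated
allocation and deallocation described above.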

> existing discussion here. Memory-alignment is an import related issue
> since non-aligned movs can tank the performance.
>
>   

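You can align an ndarray on a 16-byte boundary like this:

import numpy

def aligned_array(N, dtype):
    itemsize = numpy.dtype(dtype).itemsize
    # Over-allocate by 16 bytes so that an aligned start always fits.
    tmp = numpy.zeros(N * itemsize + 16, dtype=numpy.uint8)
    address = tmp.__array_interface__['data'][0]
    # Number of bytes to skip to reach the next 16-byte boundary.
    offset = (16 - address % 16) % 16
    # Take N elements' worth of bytes, then reinterpret them as dtype.
    return tmp[offset:offset + N * itemsize].view(dtype=dtype)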


Sturla Molden












Re: [Numpy-discussion] Objected-oriented SIMD API for Numpy

2009-10-21 Thread Mathieu Blondel
On Thu, Oct 22, 2009 at 11:31 AM, Sturla Molden  wrote:
> Mathieu Blondel wrote:
>> Hello,
>>
>> About one year ago, a high-level, objected-oriented SIMD API was added
>> to Mono. For example, there is a class Vector4f for vectors of 4
>> floats and this class implements methods such as basic operators,
>> bitwise operators, comparison operators, min, max, sqrt, shuffle
>> directly using SIMD operations.
> I think you are confusing SIMD with Intel's MMX/SSE instruction set.

OK, I should have said "Object-oriented SIMD API that is implemented
using hardware SIMD instructions".

And when an ISA doesn't allow a specific operation to be performed in a
single instruction (say, the absolute value of the differences), the
operation can be implemented in terms of other instructions.
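
As a NumPy-level illustration of composing an operation from simpler ones,
the absolute difference can be written using only subtraction and an
element-wise maximum. This is a sketch for floating-point or signed inputs
(unsigned integers would wrap around), and abs_diff is a name chosen for the
example only.

```python
import numpy as np


def abs_diff(a, b):
    # |a - b| expressed as max(a - b, b - a): only subtract and max are needed.
    return np.maximum(a - b, b - a)
```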

> SIMD instructions in hardware for length-4 vectors are mostly useful for
> 3D graphics. But they are not used a lot for that purpose, because GPUs
> are getting common. SSE is mostly for rendering 3D graphics without a
> GPU. There is nothing that prevents NumPy from having a Vector4f dtype,
> that internally stores four float32 and is aligned at 16 byte
> boundaries. But it would not be faster than the current float32 dtype.
> Do you know why?

Yes I know because this has already been explained in this very thread
by someone before you!


Mathieu


Re: [Numpy-discussion] Objected-oriented SIMD API for Numpy

2009-10-21 Thread Robert Kern
On Wed, Oct 21, 2009 at 22:32, Mathieu Blondel  wrote:
> On Thu, Oct 22, 2009 at 11:31 AM, Sturla Molden  wrote:
>> Mathieu Blondel wrote:
>>> Hello,
>>>
>>> About one year ago, a high-level, objected-oriented SIMD API was added
>>> to Mono. For example, there is a class Vector4f for vectors of 4
>>> floats and this class implements methods such as basic operators,
>>> bitwise operators, comparison operators, min, max, sqrt, shuffle
>>> directly using SIMD operations.
>> I think you are confusing SIMD with Intel's MMX/SSE instruction set.
>
> OK, I should have said "Object-oriented SIMD API that is implemented
> using hardware SIMD instructions".

No, I think you're right. Using "SIMD" to refer to numpy-like
operations is an abuse of the term not supported by any outside
community that I am aware of. Everyone else uses "SIMD" to describe
hardware instructions, not the application of a single syntactical
element of a high level language to a non-trivial data structure
containing lots of atomic data elements.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco


Re: [Numpy-discussion] Objected-oriented SIMD API for Numpy

2009-10-22 Thread Sturla Molden
Robert Kern wrote:
> No, I think you're right. Using "SIMD" to refer to numpy-like
> operations is an abuse of the term not supported by any outside
> community that I am aware of. Everyone else uses "SIMD" to describe
> hardware instructions, not the application of a single syntactical
> element of a high level language to a non-trivial data structure
> containing lots of atomic data elements.
>   
Then you should pick up a book on parallel computing.

It is common to differentiate between four classes of computers: SISD, 
MISD, SIMD, and MIMD machines.

A SISD system is the classical von Neumann machine. A MISD system is a 
pipelined von Neumann machine, for example the x86 processor.

A SIMD system is one that has one CPU dedicated to control, and a large 
collection of subordinate ALUs for computation. Each ALU has a small 
amount of private memory. The IBM Cell processor is the typical SIMD 
machine.

A special class of SIMD machines are the so-called "vector machines", of 
which the most famous is the Cray C90. The MMX and SSE instructions in 
Intel Pentium processors are an example of vector instructions. Some 
computer scientists regard vector machines as a subtype of MISD systems, 
orthogonal to pipelines, because there are no subordinate ALUs with 
private memory.

MIMD systems have multiple independent CPUs. MIMD systems come in two 
categories: shared-memory processors (SMP) and distributed-memory 
machines (also called cluster computers). The dual- and quad-core x86 
processors are shared-memory MIMD machines.

Many people associate the word SIMD with SSE due to Intel marketing. But 
to the extent that vector machines are MISD orthogonal to pipelined von 
Neumann machines, SSE cannot be called SIMD.

NumPy is a software simulated vector machine, usually executed on MISD 
hardware. To the extent that vector machines (such as SSE and C90) are 
SIMD, we must call NumPy an object-oriented SIMD library.


S.M.



Re: [Numpy-discussion] Objected-oriented SIMD API for Numpy

2009-10-22 Thread Matthieu Brucher
>> OK, I should have said "Object-oriented SIMD API that is implemented
>> using hardware SIMD instructions".
>
> No, I think you're right. Using "SIMD" to refer to numpy-like
> operations is an abuse of the term not supported by any outside
> community that I am aware of. Everyone else uses "SIMD" to describe
> hardware instructions, not the application of a single syntactical
> element of a high level language to a non-trivial data structure
> containing lots of atomic data elements.

I agree with Sturla. For instance, nVidia GPUs do SIMD computations
with blocks of 16 values at a time, but the underlying hardware can't
compute on that much data at once. It's SIMD from our point of view,
just like Numpy is ;)

Matthieu
-- 
Information System Engineer, Ph.D.
Website: http://matthieu-brucher.developpez.com/
Blogs: http://matt.eifelle.com and http://blog.developpez.com/?blog=92
LinkedIn: http://www.linkedin.com/in/matthieubrucher


Re: [Numpy-discussion] Objected-oriented SIMD API for Numpy

2009-10-22 Thread Sturla Molden
Matthieu Brucher wrote:
> I agree with Sturla. For instance, nVidia GPUs do SIMD computations
> with blocks of 16 values at a time, but the underlying hardware can't
> compute on that much data at once. It's SIMD from our point of view,
> just like Numpy is ;)
>
>   
A computer with a CPU and a GPU is a SIMD machine by definition, due to 
the single CPU and the multiple ALUs in the GPU, which are subordinate 
to the CPU. But with modern computers, these classifications become a 
bit unclear.

S.M.






Re: [Numpy-discussion] Objected-oriented SIMD API for Numpy

2009-10-22 Thread Sturla Molden
Mathieu Blondel wrote:
> Peter Norvig suggested to merge Numpy into Cython but he didn't
> mention SIMD as the reason (this one is from me). 

I don't know what Norvig said or meant.

However:

There is NumPy support in Cython. Cython has a general syntax applicable 
to any PEP 3118 buffer. (As NumPy is not yet PEP 3118 compliant, NumPy 
arrays are converted to Py_buffer structs behind the scenes.)

Support for optimized vector expressions might be added later. 
Currently, slicing works as with NumPy in Python, producing slice 
objects and invoking NumPy's own code, instead of being converted to 
fast inlined C.

The PEP 3118 buffer syntax in Cython can be used to port NumPy to Py3k, 
replacing the current C source. That might be what Norvig meant if he 
suggested merging NumPy into Cython.


S.M.


Re: [Numpy-discussion] Objected-oriented SIMD API for Numpy

2009-10-22 Thread Mathieu Blondel
On Thu, Oct 22, 2009 at 5:05 PM, Sturla Molden  wrote:
> Mathieu Blondel wrote:

> The PEP 3118 buffer syntax in Cython can be used to port NumPy to Py3k,
> replacing the current C source. That might be what Norvig meant if he
> suggested merging NumPy into Cython.

As I wrote earlier in this thread, I confused Cython and CPython. PN
was suggesting to include Numpy in the CPython  distribution (not
Cython). The reason why was also given earlier.

Mathieu


Re: [Numpy-discussion] Objected-oriented SIMD API for Numpy

2009-10-22 Thread Sturla Molden
Mathieu Blondel wrote:
> As I wrote earlier in this thread, I confused Cython and CPython. PN
> was suggesting to include Numpy in the CPython  distribution (not
> Cython). The reason why was also given earlier.
>
>   
First, that would currently not be possible, as NumPy does not support 
Py3k. Second, the easiest way to port NumPy to Py3k is Cython, which 
would prevent adoption in the Python standard library unless the core 
developers change their current policy. Also, with NumPy in the standard 
library, any modification to NumPy would require a PEP.

But Python should have a PEP 3118 compliant buffer object in the 
standard library, which NumPy could subclass.

S.M.







Re: [Numpy-discussion] Objected-oriented SIMD API for Numpy

2009-10-22 Thread Gregor Thalhammer
2009/10/21 Neal Becker 

> ...
> > I once wrote a module that replaces the built in transcendental
> > functions of numpy by optimized versions from Intels vector math
> > library. If someone is interested, I can publish it. In my experience it
> > was of little use since real world problems are limited by memory
> > bandwidth. Therefore extending numexpr with optimized transcendental
> > functions was the better solution. Afterwards I discovered that I could
> > have saved the effort of the first approach since gcc is able to use
> > optimized functions from Intels vector math library or AMD's math core
> > library, see the doc's of -mveclibabi. You just need to recompile numpy
> > with proper compiler arguments.
> >
>
> I'm interested.  I'd like to try AMD rather than Intel, because AMD is
> easier to obtain.  I'm running on an Intel machine; I hope that doesn't matter
> too much.
>
> What exactly do I need to do?
>
I once tried to recompile numpy with AMD's ACML. Unfortunately I lost the
settings after an upgrade. What I remember: install ACML (and read the docs
;-) ), mess with the compiler args (-mveclibabi and related), link with
ACML. Then you get faster pow/sin/cos/exp. The transcendental functions of
ACML also work with Intel processors with the same performance. I did not
try the Intel SVML, which belongs to the Intel compilers.
This is different from the first approach, which is a small wrapper for Intel's
VML, put into a Python module, which can inject its ufuncs (via
numpy.set_numeric_ops) into numpy. If you want I can send the package by
private email.


> I see that numpy/site.cfg has an MKL section.  I'm assuming I should not
> touch that, but just mess with gcc flags?
>
This is for using the LAPACK provided by Intel's MKL. These settings are not
related to the above-mentioned compiler options.



Re: [Numpy-discussion] Objected-oriented SIMD API for Numpy

2009-10-22 Thread Dag Sverre Seljebotn
Robert Kern wrote:
> On Wed, Oct 21, 2009 at 22:32, Mathieu Blondel  wrote:
>   
>> On Thu, Oct 22, 2009 at 11:31 AM, Sturla Molden  wrote:
>> 
>>> Mathieu Blondel wrote:
>>>   
>>>> Hello,
>>>>
>>>> About one year ago, a high-level, objected-oriented SIMD API was added
>>>> to Mono. For example, there is a class Vector4f for vectors of 4
>>>> floats and this class implements methods such as basic operators,
>>>> bitwise operators, comparison operators, min, max, sqrt, shuffle
>>>> directly using SIMD operations.
>>>>
>>> I think you are confusing SIMD with Intel's MMX/SSE instruction set.
>>>   
>> OK, I should have said "Object-oriented SIMD API that is implemented
>> using hardware SIMD instructions".
>> 
>
> No, I think you're right. Using "SIMD" to refer to numpy-like
> operations is an abuse of the term not supported by any outside
> community that I am aware of. Everyone else uses "SIMD" to describe
> hardware instructions, not the application of a single syntactical
> element of a high level language to a non-trivial data structure
> containing lots of atomic data elements.
>   
BTW, is there any term for this latter concept that's not SIMD or 
"vector operation"? It would be good to have a word to distinguish this 
concept from both CPU instructions and linear algebra.

(Personally I think describing NumPy as SIMD and using "SSE/MMX" for CPU 
instructions makes the most sense, but I'm happy to yield to convention...)

Dag Sverre



Re: [Numpy-discussion] Objected-oriented SIMD API for Numpy

2009-10-22 Thread Robert Ferrell

On Oct 22, 2009, at 1:35 AM, Sturla Molden wrote:

> Robert Kern wrote:
>> No, I think you're right. Using "SIMD" to refer to numpy-like
>> operations is an abuse of the term not supported by any outside
>> community that I am aware of. Everyone else uses "SIMD" to describe
>> hardware instructions, not the application of a single syntactical
>> element of a high level language to a non-trivial data structure
>> containing lots of atomic data elements.
>>
> Then you should pick up a book on parallel computing.
>
> It is common to differentiate between four classes of computers: SISD,
> MISD, SIMD, and MIMD machines.
>
> A SISD system is the classical von Neumann machine. A MISD system is a
> pipelined von Neumann machine, for example the x86 processor.
>
> A SIMD system is one that has one CPU dedicated to control, and a large
> collection of subordinate ALUs for computation. Each ALU has a small
> amount of private memory. The IBM Cell processor is the typical SIMD
> machine.
>
> A special class of SIMD machines are the so-called "vector machines", of
> which the most famous is the Cray C90. The MMX and SSE instructions in
> Intel Pentium processors are an example of vector instructions. Some
> computer scientists regard vector machines as a subtype of MISD systems,
> orthogonal to pipelines, because there are no subordinate ALUs with
> private memory.
>
> MIMD systems have multiple independent CPUs. MIMD systems come in two
> categories: shared-memory processors (SMP) and distributed-memory
> machines (also called cluster computers). The dual- and quad-core x86
> processors are shared-memory MIMD machines.
>
> Many people associate the word SIMD with SSE due to Intel marketing. But
> to the extent that vector machines are MISD orthogonal to pipelined von
> Neumann machines, SSE cannot be called SIMD.
>
> NumPy is a software simulated vector machine, usually executed on MISD
> hardware. To the extent that vector machines (such as SSE and C90) are
> SIMD, we must call NumPy an object-oriented SIMD library.

This is not the terminology I am familiar with.  Calling NumPy an
"object-oriented SIMD library" is very confusing for me.  I worked in
the parallel computer world for a while (back in the dark ages) and
this terminology would have been confusing to everyone I dealt with.
I've also read many parallel computing books.  In my experience SIMD
refers to hardware, not software.  There is no reason that NumPy can't
be written to run great (get good speed-ups) on an 8-core shared
memory system.  That would be a MIMD system, and there's nothing about
it that doesn't fit with the NumPy abstraction.  And, although SIMD
can be a subset of MIMD, there are things that can be done in NumPy
that can be parallelized on MIMD machines but not on SIMD machines (e.g.
the NumPy vector type is flexible enough that it can store a list of
tasks, and the operations on that vector can be parallelized easily on a
shared memory MIMD machine - task parallelism - but not on a SIMD
machine).

If we say that "NumPy is a software simulated vector machine" or an
"object-oriented SIMD library" we are pigeonholing NumPy in a way which
is too limiting and isn't accurate.  As a user it feels to me that
NumPy is built around various algebra abstractions, many of which map
well onto vector machine operations.  That means that many of the
operations are amenable to efficient implementation on SIMD hardware.
But, IMO, one of the nice features of NumPy is it is built around
high-level operations, and I would hate to see the project go down a path
which insists that everything in NumPy be efficient on all SIMD
hardware.

Of course, I would also love to see implementations which take as much  
advantage of available HW as possible (e.g. exploit SIMD HW if  
available).

That's my $0.02, worth only a couple cents less than that.

-robert



Re: [Numpy-discussion] Objected-oriented SIMD API for Numpy

2009-10-22 Thread Robert Kern
On Thu, Oct 22, 2009 at 02:35, Sturla Molden  wrote:
> Robert Kern wrote:
>> No, I think you're right. Using "SIMD" to refer to numpy-like
>> operations is an abuse of the term not supported by any outside
>> community that I am aware of. Everyone else uses "SIMD" to describe
>> hardware instructions, not the application of a single syntactical
>> element of a high level language to a non-trivial data structure
>> containing lots of atomic data elements.
>>
> Then you should pick up a book on parallel computing.

I would be delighted to see a reference to one that refers to a high
level language's API as SIMD. Please point one out to me. It's
certainly not any of the ones I have available to me.

> It is common to differentiate between four classes of computers: SISD,
> MISD, SIMD, and MIMD machines.
>
> A SISD system is the classical von Neumann machine. A MISD system is a
> pipelined von Neumann machine, for example the x86 processor.
>
> A SIMD system is one that has one CPU dedicated to control, and a large
> collection of subordinate ALUs for computation. Each ALU has a small
> amount of private memory. The IBM Cell processor is the typical SIMD
> machine.
>
> A special class of SIMD machines are the so-called "vector machines", of
> which the most famous is the Cray C90. The MMX and SSE instructions in
> Intel Pentium processors are an example of vector instructions. Some
> computer scientists regard vector machines as a subtype of MISD systems,
> orthogonal to pipelines, because there are no subordinate ALUs with
> private memory.
>
> MIMD systems have multiple independent CPUs. MIMD systems come in two
> categories: shared-memory processors (SMP) and distributed-memory
> machines (also called cluster computers). The dual- and quad-core x86
> processors are shared-memory MIMD machines.
>
> Many people associate the word SIMD with SSE due to Intel marketing. But
> to the extent that vector machines are MISD orthogonal to pipelined von
> Neumann machines, SSE cannot be called SIMD.

That's a fair point, but unrelated to whether or not numpy can be
labeled SIMD. These all refer to hardware.

> NumPy is a software simulated vector machine, usually executed on MISD
> hardware. To the extent that vector machines (such as SSE and C90) are
> SIMD, we must call NumPy an object-oriented SIMD library.

numpy does not "simulate" anything. It is an object-oriented library.
If numpy could be said to "simulate" a vector machine, then just about
any object-oriented library that overloads operators could. It creates
a false equivalence between numpy and software that actually does
simulate hardware.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco


Re: [Numpy-discussion] Objected-oriented SIMD API for Numpy

2009-10-22 Thread Robert Kern
On Thu, Oct 22, 2009 at 06:20, Dag Sverre Seljebotn
 wrote:
> Robert Kern wrote:
>> On Wed, Oct 21, 2009 at 22:32, Mathieu Blondel  wrote:
>>
>>> On Thu, Oct 22, 2009 at 11:31 AM, Sturla Molden  wrote:
>>>
>>>> Mathieu Blondel wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> About one year ago, a high-level, objected-oriented SIMD API was added
>>>>> to Mono. For example, there is a class Vector4f for vectors of 4
>>>>> floats and this class implements methods such as basic operators,
>>>>> bitwise operators, comparison operators, min, max, sqrt, shuffle
>>>>> directly using SIMD operations.
>>>>>
>>>> I think you are confusing SIMD with Intel's MMX/SSE instruction set.
>>>>
>>> OK, I should have said "Object-oriented SIMD API that is implemented
>>> using hardware SIMD instructions".
>>>
>>
>> No, I think you're right. Using "SIMD" to refer to numpy-like
>> operations is an abuse of the term not supported by any outside
>> community that I am aware of. Everyone else uses "SIMD" to describe
>> hardware instructions, not the application of a single syntactical
>> element of a high level language to a non-trivial data structure
>> containing lots of atomic data elements.
>>
> BTW, is there any term for this latter concept that's not SIMD or
> "vector operation"? It would be good to have a word to distinguish this
> concept from both CPU instructions and linear algebra.

Of course, "vector instruction" and "vectorized operation" sometimes
also refer to the CPU instructions. :-)

I don't think you will get much better than "vectorized operation",
though. While it's ambiguous, it has a long history in the high level
language world thanks to Matlab.

> (Personally I think describing NumPy as SIMD and using "SSE/MMX" for CPU
> instructions makes the most sense, but I'm happy to yield to convention...)

Well, "SSE/MMX" is also too limiting. Altivec instructions are also in
the same class, and we should be able to use them on PPC platforms.
Regardless of the origin of the term, "SIMD" is used to refer to all
of these instructions in common practice. Sturla may be right in some
prescriptive sense, but descriptively, he's quite wrong.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco


Re: [Numpy-discussion] Objected-oriented SIMD API for Numpy

2009-10-22 Thread Sturla Molden
Robert Kern wrote:
> I would be delighted to see a reference to one that refers to a high
> level language's API as SIMD. Please point one out to me. It's
> certainly not any of the ones I have available to me.
>
>   
Numerical Recipes in Fortran 90, pages 964 and 985-986, describes the 
syntax of Fortran 90 and 95 as SIMD.

Peter Pacheco's book on MPI describes the difference between von Neumann 
machines and vector machines as analogous to the difference between 
Fortran 77 and Fortran 90 (with an example from Fortran 90 array slicing). 
He is ambiguous as to whether vector machines really are SIMD, or more 
closely related to pipelined von Neumann machines.

Grama et al., "Introduction to Parallel Computing", describes SIMD as an 
"architecture", but it is more or less clear that they mean hardware. 
They do say the Fortran 90 "where statement" is a primitive used to 
support selective execution on SIMD processors, as conditional execution 
(if statements) is detrimental to performance.

So here we have at least three books claiming that Fortran is a language 
with special primitives for SIMD processors.

>
> That's a fair point, but unrelated to whether or not numpy can be
> labeled SIMD. These all refer to hardware.
>   
Actually I don't think the distinction is that important, as we are 
talking about Turing machines. Also, a lot of what we call "hardware" is 
actually implemented as software on the chip: the most extreme example 
would be Transmeta, which emulated x86 processors entirely in software. 
The vague distinction between hardware and software is why we get 
patents on software in Europe, although pure software patents are 
prohibited. One can always argue that the program and the computer 
together constitute a physical device, and that circumventing patents by 
moving hardware into software should not be allowed. The distinction 
between hardware and software is not as clear as programmers tend to 
believe.

Another thing is that performance issues for vector machines and "vector 
languages" (Fortran 90, Matlab, NumPy) are similar. Precisely the same 
situations that make NumPy and Matlab code slow are detrimental on 
SIMD/vector hardware. That would for example be long for loops with 
conditional if statements. On the other hand, vectorized operations over 
arrays, possibly using where/find masks, are fast. So although NumPy is 
not executed on a vector machine like the Cray C90, it certainly behaves 
like one performance-wise.
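
The contrast between a branching loop and a masked vectorized operation can
be sketched as follows. Both functions replace negative entries with zero;
the names are invented for this example, and the claim that the masked form
is faster is the one made in the text above, not something measured here.

```python
import numpy as np


def clip_negative_loop(x):
    # Element-by-element loop with a branch per element.
    out = np.empty_like(x)
    for i in range(x.shape[0]):
        if x[i] < 0:
            out[i] = 0
        else:
            out[i] = x[i]
    return out


def clip_negative_masked(x):
    # One vectorized pass driven by a boolean mask, in the spirit of
    # the Fortran 90 WHERE statement.
    return np.where(x < 0, 0, x)
```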

I'd say that a MIMD machine running NumPy is a Turing machine emulating 
a SIMD/vector machine.

And now I am done with this stupid discussion...


Sturla Molden