On 05/01/2013 16:23, Volker Braun wrote:
Fundamentally, the Xeon Phi programming model is not really that much
different from OpenCL/Cuda. You send data to the coprocessor card, run
some code there, and pull back the result to the host CPU. It doesn't
speed up anything that is not specifically targeted at the coprocessor
card.

If you want to use it, you first of all need a problem that is
sufficiently parallelizable. Write Xeon Phi code in C/C++, compile it
with the special compiler, wrap it into a shared library, load it into
Cython/Python.
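
(A minimal, hedged sketch of that workflow, using the Intel compiler's offload pragmas; the function names scale_on_mic and phi_scale are made up for illustration, and the exact pragma spelling may vary between compiler versions.)

    // Sketch only: a trivial kernel marked for the coprocessor.
    #include <cstddef>

    __attribute__((target(mic)))                     // compile this function for the card too
    void scale_on_mic(double *x, std::size_t n, double alpha)
    {
        #pragma omp parallel for                     // keep the many hardware threads busy
        for (std::size_t i = 0; i < n; ++i)
            x[i] *= alpha;                           // stride-1 loop the compiler can vectorize
    }

    // extern "C" entry point so the shared library can be loaded from
    // Cython or ctypes on the host.
    extern "C" void phi_scale(double *x, std::size_t n, double alpha)
    {
        // Ship x to the card, run the kernel there, pull the result back.
        #pragma offload target(mic) inout(x : length(n))
        scale_on_mic(x, n, alpha);
    }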

The Intel MKL basically does that, so if we get around to implementing
the proposal that I wrote earlier, then at least linear algebra would be
sped up on Stampede.
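
(A hedged aside, not a definitive recipe: MKL's "automatic offload" mode can push sufficiently large BLAS calls to the card without code changes; if memory serves it is toggled by the MKL_MIC_ENABLE environment variable. An ordinary DGEMM call like the following then becomes a candidate for offload; the matrix size is purely illustrative.)

    // Plain host-side DGEMM; with automatic offload enabled in the
    // environment (an assumption about the setup, check the MKL docs),
    // large enough calls may run on the coprocessor transparently.
    #include <mkl.h>
    #include <vector>

    int main()
    {
        const int n = 4096;                          // illustrative size only
        std::vector<double> A(n * n, 1.0), B(n * n, 2.0), C(n * n, 0.0);
        cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                    n, n, n, 1.0, A.data(), n, B.data(), n, 0.0, C.data(), n);
        return 0;
    }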

I have a little experience with the Xeon Phi (a.k.a. MIC): a pure C++ code of about 25,000 statements was ported to the Xeon Phi in 10 minutes: once the code works on a classical Intel machine with the Intel compiler, it works on the Xeon Phi (we have a joint project with people from Intel to port numerical codes to this platform). This is very nice and impressive. Also, complicated data structures can be used, so at first it seems very nice and easy.
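
(For what it is worth, a hedged sketch of the build step, assuming the Intel compiler; the -mmic flag produces a native binary for the card, and the exact flags may differ between compiler versions.)

    icpc -O3 -mmic -o mycode.mic mycode.cpp    # cross-compile the unchanged sources for the card
    # copy mycode.mic to the coprocessor and run it there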

But the devil is waiting for you: getting good performance is much more difficult, as everyone can imagine. My code is built with the TBB library, which seems to be a (the?) good choice for this architecture; on the first execution, the code ran 2 times slower than on a classical Intel machine (Sandy Bridge). The problem with performance is that 1) you must be sure to permanently have more than 60 threads available for running, and 2) you absolutely must use the 512-bit vector unit, and this is not so easy: what can be vectorized in Sage's libraries? OK, the sources will remain in C and C++, but vectorizing often means rewriting a large part of the code (see the sketch below).
Remember also that the Xeon Phi has only 8 GB of RAM.
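
To make points 1) and 2) concrete, here is a hedged sketch (not taken from my code; the names are mine) of the shape a TBB kernel should have on the card: an outer tbb::parallel_for to keep the 60+ cores (roughly 240 hardware threads) busy, and a plain stride-1 inner loop the Intel compiler can map onto the 512-bit unit.

    #include <tbb/parallel_for.h>
    #include <tbb/blocked_range.h>
    #include <cstddef>
    #include <vector>

    // y += a*x: TBB handles the threading, and the inner loop stays
    // trivially vectorizable (8 doubles per 512-bit register).
    void axpy(std::vector<double>& y, const std::vector<double>& x, double a)
    {
        tbb::parallel_for(tbb::blocked_range<std::size_t>(0, y.size(), 1024),
            [&](const tbb::blocked_range<std::size_t>& r) {
                for (std::size_t i = r.begin(); i != r.end(); ++i)
                    y[i] += a * x[i];
            });
    }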

One of the ports we tried was a classical ODE solver (Radau5) recoded in C++: the code evaluates the Jacobian matrix of some f: R^n -> R^n by finite differences. This can be vectorized and is not too difficult, but it works better if n is a multiple of 8 (because 512 = 8 x 64, i.e. eight 64-bit doubles per vector register).
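
As an illustration, a hedged sketch of such a finite-difference Jacobian, built column by column; the names are mine, not those of the actual Radau5 port.

    #include <cstddef>
    #include <vector>

    // J(:,j) ~= (f(x + h*e_j) - f(x)) / h, J stored column-major (n*n entries).
    // The inner loop over i is stride-1, so it is the natural target for the
    // 512-bit unit; with n a multiple of 8 each column fills whole registers.
    template <class F>   // F: void f(const std::vector<double>& x, std::vector<double>& fx)
    void jacobian_fd(F f, std::vector<double>& x, std::vector<double>& J, double h)
    {
        const std::size_t n = x.size();
        std::vector<double> f0(n), f1(n);
        f(x, f0);                                   // unperturbed evaluation
        for (std::size_t j = 0; j < n; ++j) {
            const double xj = x[j];
            x[j] = xj + h;
            f(x, f1);                               // perturbed evaluation
            x[j] = xj;
            for (std::size_t i = 0; i < n; ++i)     // one column per perturbation
                J[j * n + i] = (f1[i] - f0[i]) / h;
        }
    }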

But altogether, developing on this architecture is much more classical than developing with CUDA: for the old guys like me, it is a bit like programming on a Cray machine in 1990... and this is quite nice.

If there is some project to do something around Sage and the Xeon Phi, I am interested (there will be two of us this year). But which project?

t.d.
