On 11-nov-10, at 20:41, Russel Winder wrote:
On Thu, 2010-11-11 at 15:16 +0100, Fawzi Mohamed wrote:
[ . . . ]
On this I am not so sure: heterogeneous clusters are more difficult to program, and GPUs & co. are slowly becoming more and more general purpose.
Being able to take advantage of them is useful, but I am not convinced they are necessarily the future.
The Intel roadmap is for processor chips that have a number of cores with different architectures. Heterogeneity is not going to be a choice, it is going to be an imposition. And this is at bus level, not at cluster level.
Vector co-processors, yes, I see that, and in the short term the effect of things like AMD Fusion (CPU/GPU merging).
Is this necessarily the future? I don't know, and neither does Intel, I think, as they are still evaluating Larrabee.
But CPU/GPU combinations will stay around for some time more, for sure.
[ . . . ]
Yes, many-core is the future, I agree on this, and also that a distributed approach is the only way to scale to a really large number of processors.
But distributed systems *are* more complex, so I think that for the foreseeable future one will have a hybrid approach.
Hybrid is what I am saying is the future, whether we like it or not. SMP as the whole system is the past.
I disagree that distributed systems are more complex per se. I suspect comments are getting so general here that anything anyone writes can be seen as both true and false simultaneously. My perception is that shared memory multithreading is less and less a tool that applications programmers should be thinking in terms of. Multiple processes with a hierarchy of communications costs is the overarching architecture, with each process potentially being SMP or CSP or . . .
I agree that on not-too-large shared memory machines a hierarchy of tasks is the correct approach.
This is what I did in blip.parallel.smp. Using it one can have fairly efficient automatic scheduling, and so forget most of the complexities and the actual hardware configuration.
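Something like the sketch below is the kind of hierarchy I mean (this is not blip's API, just the same idea written in C with OpenMP tasks; the cutoff and the workload are placeholders): work is split recursively into tasks and the runtime scheduler decides where each task runs, so the programmer never touches threads or the machine layout directly.

/* Hierarchical task parallelism: recursive splitting into tasks,
   scheduling left to the runtime. Illustrative sketch only. */
#include <stdio.h>

/* Recursively sum arr[lo..hi), spawning subtasks above a cutoff. */
static long sum_range(const long *arr, int lo, int hi)
{
    if (hi - lo < 1024) {                 /* small enough: do it serially */
        long s = 0;
        for (int i = lo; i < hi; ++i) s += arr[i];
        return s;
    }
    int mid = lo + (hi - lo) / 2;
    long left, right;
    #pragma omp task shared(left)         /* left half becomes a child task */
    left = sum_range(arr, lo, mid);
    right = sum_range(arr, mid, hi);      /* right half stays in this task */
    #pragma omp taskwait                  /* wait for the child task */
    return left + right;
}

int main(void)
{
    enum { N = 1 << 20 };
    static long arr[N];
    for (int i = 0; i < N; ++i) arr[i] = i;

    long total = 0;
    #pragma omp parallel                  /* team of worker threads */
    #pragma omp single                    /* one thread seeds the task tree */
    total = sum_range(arr, 0, N);

    printf("sum = %ld\n", total);
    return 0;
}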
Again, I am not sure the situation is as dire as you paint it; Linux does quite well in the HPC field... but I agree that to be the ideal OS for these architectures it will need more changes.
The Linux driver architecture is already creaking at the seams; it implies a central, monolithic approach to the operating system. This falls down in a multiprocessor shared memory context. The fact that the Top 500 generally use Linux is because it is the least worst option. M$, despite throwing large amounts of money at the problem, and indeed buying some very high profile names to try and do something about the lack of traction, have failed to make any headway in the HPC operating system stakes. Do you want to have to run a virus checker on your HPC system?
My gut reaction is that we are going to see a rise of hypervisors as per Tilera chips, at least in the short to medium term, simply as a bridge from today's OSes to the future. My guess is that L4 microkernels and/or nanokernels, exokernels, etc. will find a central place in future systems. The problem to be solved is ensuring that the appropriate ABI is available on the appropriate core at the appropriate time. Mobility of ABI is the critical factor here.
Yes, microkernels & co. will be more and more important (but I wonder how much this will be the case for the desktop).
ABI mobility? Not so sure; for HPC I can imagine having to compile to different ABIs (but maybe that is what you mean by ABI mobility).
[ . . . ]
Whole array operations are useful, and when possible one gains much by using them; unfortunately not all problems can be reduced to a few large array operations, and data parallel languages are not the main type of language for this reason.
Agreed. My point was that in 1960s code people explicitly handled array operations using do loops because they had to. Nowadays such code is anathema to efficient execution. My complaint here is that people have put effort into compiler technology instead of rewriting the codes in a better language and/or idiom. Clearly whole array operations only apply to algorithms that involve arrays!
[ . . . ]
Well, whole array operations are a generalization of the SPMD approach, so in this sense you said that that kind of approach will have a future (but with more difficult optimization, as the hardware is more complex).
I guess this is where the PGAS people are challenging things. Applications can be couched in terms of array algorithms which can be scattered across distributed memory systems. Inappropriate operations lead to huge inefficiencies, but handled correctly, the code runs very fast.
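In plain SPMD terms (a PGAS language would express the distribution implicitly, but the shape of the computation is the same), a whole-array operation scattered across distributed memory looks roughly like the sketch below; the operation and the sizes are arbitrary.

/* Each rank owns a slice of the array; collectives move the data,
   the "array operation" itself is purely local. Illustrative only. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    enum { N_LOCAL = 4 };                     /* elements per rank */
    double *x = NULL, *y = NULL;
    if (rank == 0) {                          /* root holds the full arrays */
        x = malloc(N_LOCAL * size * sizeof *x);
        y = malloc(N_LOCAL * size * sizeof *y);
        for (int i = 0; i < N_LOCAL * size; ++i) x[i] = i;
    }

    double x_local[N_LOCAL], y_local[N_LOCAL];
    MPI_Scatter(x, N_LOCAL, MPI_DOUBLE,       /* distribute the slices */
                x_local, N_LOCAL, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    for (int i = 0; i < N_LOCAL; ++i)         /* whole-array op: y = 2*x */
        y_local[i] = 2.0 * x_local[i];

    MPI_Gather(y_local, N_LOCAL, MPI_DOUBLE,  /* collect the result */
               y, N_LOCAL, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("y[0]=%g y[last]=%g\n", y[0], y[N_LOCAL * size - 1]);
    free(x); free(y);
    MPI_Finalize();
    return 0;
}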
About MPI, I think that many don't see what MPI really does: MPI offers a simplified parallel model.
The main weakness of this model is that it assumes some kind of reliability, but in return it offers a clear computational model, with processors ordered in a linear or higher-dimensional structure, and efficient collective communication primitives.
Yes, MPI is not the right choice for all problems, but when usable it is very powerful, often superior to the alternatives, and programming with it is *simpler* than thinking about a generic distributed system.
So I think that for problems that are not trivially parallel or easily parallelizable, MPI will remain the best choice.
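For concreteness, here is a small sketch of what I mean by that model: processes ordered on a (here 2-D, periodic) Cartesian grid, point-to-point exchange with grid neighbours, and an efficient collective. The grid shape and the payload are arbitrary.

/* Processes on a 2-D periodic grid, neighbour exchange plus a
   collective sum. Illustrative sketch only. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int dims[2] = {0, 0}, periods[2] = {1, 1};
    MPI_Dims_create(size, 2, dims);              /* factor size into a 2-D grid */
    MPI_Comm grid;
    MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 1, &grid);

    int up, down;
    MPI_Cart_shift(grid, 0, 1, &up, &down);      /* neighbours along dimension 0 */

    double mine = rank, from_up = -1.0;
    MPI_Sendrecv(&mine, 1, MPI_DOUBLE, down, 0,  /* point-to-point along the grid */
                 &from_up, 1, MPI_DOUBLE, up, 0,
                 grid, MPI_STATUS_IGNORE);

    double total = 0.0;
    MPI_Allreduce(&mine, &total, 1, MPI_DOUBLE,  /* collective primitive */
                  MPI_SUM, grid);

    if (rank == 0)
        printf("grid %dx%d, sum of ranks = %g\n", dims[0], dims[1], total);

    MPI_Comm_free(&grid);
    MPI_Finalize();
    return 0;
}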
I guess my main irritant with MPI is that I have to run the same executable on every node and, perhaps more importantly, the message passing structure is founded on Fortran primitive data types. OK, so you can hack up some element of abstraction so as to send complex messages, but it would be far better if the MPI standard provided better abstractions.
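(The kind of hack meant here is, for example, an MPI derived datatype, so that a C struct travels as one message rather than as a flat buffer of primitive elements; the struct and the two-rank setup below are invented for the illustration.)

/* Describe a struct layout to MPI once, then send it as one message.
   Run with at least two ranks. Illustrative sketch only. */
#include <mpi.h>
#include <stddef.h>
#include <stdio.h>

typedef struct {
    int    id;
    double pos[3];
} Particle;

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int          blocklens[2] = {1, 3};
    MPI_Aint     displs[2]    = {offsetof(Particle, id), offsetof(Particle, pos)};
    MPI_Datatype types[2]     = {MPI_INT, MPI_DOUBLE};
    MPI_Datatype particle_t;
    MPI_Type_create_struct(2, blocklens, displs, types, &particle_t);
    MPI_Type_commit(&particle_t);

    if (size >= 2 && rank == 0) {
        Particle p = {42, {1.0, 2.0, 3.0}};
        MPI_Send(&p, 1, particle_t, 1, 0, MPI_COMM_WORLD);
    } else if (size >= 2 && rank == 1) {
        Particle p;
        MPI_Recv(&p, 1, particle_t, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("got particle %d at (%g, %g, %g)\n",
               p.id, p.pos[0], p.pos[1], p.pos[2]);
    }

    MPI_Type_free(&particle_t);
    MPI_Finalize();
    return 0;
}

The other route is MPI_Pack/MPI_Unpack, which is the "more generic packing/unpacking" mentioned in the reply below; either way the abstraction has to be built by hand.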
PGAS and MPI both have the same executable everywhere, but MPI is more flexible with respect to making different parts execute different things, and MPI does provide more generic packing/unpacking, but I guess I see your problems with it.
Having the same executable is a big constraint, but it is also a simplification.
[ . . . ]
It might be a personal thing, but I am kind of "suspicious" toward PGAS; I find a generalized MPI model better than PGAS when you want to have separate address spaces.
Using MPI one can define a PGAS-like object wrapping local storage with an object that sends remote requests to access remote memory pieces.
This means having a local server where these wrapped objects can be "published" and that can respond at any moment to external requests. I call this rpc (remote procedure call), and it can be realized easily on top of MPI.
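A minimal sketch of such a serving loop is below (this is not blip's actual code; the tags, the message layout and the single-client structure are invented for the illustration, and a real version would run the server in its own thread so that each rank can compute and serve at the same time).

/* Rank 0 is the client; every other rank "publishes" a local array
   and answers remote get requests in a serving loop. */
#include <mpi.h>
#include <stdio.h>

enum { TAG_GET = 1, TAG_REPLY = 2, TAG_STOP = 3, N_LOCAL = 8 };

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double local[N_LOCAL];                     /* this rank's published piece */
    for (int i = 0; i < N_LOCAL; ++i) local[i] = rank * 100.0 + i;

    if (rank == 0) {                           /* client side */
        for (int owner = 1; owner < size; ++owner) {
            int idx = 3;                       /* ask for element 3 of each owner */
            double value;
            MPI_Send(&idx, 1, MPI_INT, owner, TAG_GET, MPI_COMM_WORLD);
            MPI_Recv(&value, 1, MPI_DOUBLE, owner, TAG_REPLY,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("rank %d, element %d = %g\n", owner, idx, value);
        }
        for (int owner = 1; owner < size; ++owner)   /* shut the servers down */
            MPI_Send(&rank, 1, MPI_INT, owner, TAG_STOP, MPI_COMM_WORLD);
    } else {                                   /* serving loop */
        for (;;) {
            MPI_Status st;
            int idx;
            MPI_Recv(&idx, 1, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &st);
            if (st.MPI_TAG == TAG_STOP) break;
            MPI_Send(&local[idx], 1, MPI_DOUBLE, 0, TAG_REPLY, MPI_COMM_WORLD);
        }
    }
    MPI_Finalize();
    return 0;
}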
As not all objects are distributed, and in a complex program it does not always make sense to distribute these objects on all processors or none, I find the robust partitioning and collective communication primitives of MPI superior to PGAS.
With enough effort you can probably get everything from PGAS too, but then you lose all its simplicity.
I think we are going to have to take this one off the list. My summary is that MPI and PGAS solve different problems differently. There are some problems that one can code up neatly in MPI and that are ugly in PGAS, but the converse is also true.
Yes, I guess that is true.
[ . . . ]
The situation is not so dire: some problems are trivially parallel or can be solved with simple parallel patterns, and others don't need to be solved in parallel, as the sequential solution is fast enough; but I do agree that being able to develop parallel systems is increasingly important.
In fact it is something that I like to do, and that I have thought about a lot.
I have programmed parallel systems, and out of that experience I tried to build something to write parallel programs "the way it should be", or at least the way I would like it to be ;)
The real question is whether future computers will run Word,
OpenOffice.org, Excel, Powerpoint fast enough so that people don't
complain. Everything else is an HPC ghetto :-)
The result is what I did with blip, http://dsource.org/projects/blip .
I don't think that (excluding some simple examples) fully automatic (transparent) parallelization is really feasible.
At some point being parallel is more complex, and it puts an extra burden on the programmer.
Still, it is possible to have several levels of parallelization, and if you write a fully parallel program it should still be possible to use it relatively efficiently locally, but a local program will not automatically become fully parallel.
At the heart of all this is that programmers are taught that an algorithm is a sequence of actions to achieve a goal. Programmers are trained to think sequentially, and this affects their coding. This means that parallelism has to be expressed at a sufficiently high level that programmers can still reason about algorithms as sequential things.
When you have a network of things communicating (and I think that once you have a distributed system you are at that level), then it is no longer sufficient to think about each piece in isolation; you have to think about the interactions too.
There are some patterns that might help reduce the complexity (client/server, map/reduce, ...), but in general it is more complex.
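For instance, the map/reduce pattern in its simplest MPI form: every rank "maps" over the data it owns and a single collective "reduces" the partial results; the workload here is a placeholder.

/* map: each rank sums the squares of the numbers it owns;
   reduce: combine the partial sums on rank 0. Illustrative only. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double partial = 0.0;                      /* map over this rank's share */
    for (int i = rank; i < 1000; i += size)
        partial += (double)i * i;

    double total = 0.0;
    MPI_Reduce(&partial, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum of squares below 1000 = %g\n", total);

    MPI_Finalize();
    return 0;
}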