Any thoughts on parallel programming? I was looking at the Chapel and X10 languages, among others, for parallelism, and they look interesting. I know that this is still an area of active research, and that it is not yet (far from?) done,
but does anyone have thoughts on this as a future direction?  Thank you.

Any programming language that cannot be used to program applications
running on a heterogeneous collection of processors, including CPUs and
GPUs as computational devices, on a single chip, with there being many
such chips on a board, possibly clustered, doesn't have much of a
future.  Timescale 5--10 years.

On this I am not so sure: heterogeneous clusters are more difficult to program, and GPUs and similar devices are slowly becoming more and more general purpose. Being able to take advantage of them is useful, but I am not convinced they are necessarily the future.

Intel's 80-core, 48-core and 50-core devices show the way server,
workstation and laptop architectures are going.  There may be a large
central memory unit as now, but it will be secondary storage not primary storage. All the chip architectures are shifting to distributed memory -- basically cache coherence is too hard a problem to solve, so instead
of solving it, they are getting rid of it.  Also the memory bus stops
being the bottleneck for computations, which is actually the biggest
problem with current architectures.

Yes, many-core is the future, I agree on this, and also that a distributed approach is the only way to scale to a really large number of processors. But distributed systems *are* more complex, so I think that for the foreseeable future one will have a hybrid approach.

Windows, Linux and Mac OS X have a serious problem and will either die
or be revolutionized.  Apple at least recognize the issue, hence they
pushed OpenCL.

Again, I am not sure the situation is as dire as you paint it; Linux does quite well in the HPC field... but I agree that to be the ideal OS for these architectures it will need more changes.

Actor model, CSP, dataflow, and similar distributed memory/process-based
architectures will become increasingly important for software.  There
will be an increasing move to declarative expression, but I doubt
functional languages will ever make the mainstream. The issue here is that parallelism generally requires programmers not to try to tell the computer every detail of how to do something, but instead to specify the start
and end conditions and allow the runtime system to handle the
realization of the transformation. Hence the move in Fortran from lots
of "do" loops to "whole array" operations.

Whole array operations are useful, and when possible one gains much by using them. Unfortunately not all problems can be reduced to a few large array operations, which is why data-parallel languages are not the main type of language.
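
To make the contrast between "do" loops and whole-array operations concrete, here is a minimal sketch in Python rather than Fortran. The function names and data are hypothetical and chosen only for illustration. The loop version fixes the order in which elements are computed, while the whole-array version only states what each element of the result is, which is what leaves a runtime free to evaluate the elements in any order or in parallel.

```python
def axpy_loop(a, x, y):
    """Explicit "do loop" style: the code dictates the iteration order."""
    result = [0.0] * len(x)
    for i in range(len(x)):
        result[i] = a * x[i] + y[i]
    return result


def axpy_whole(a, x, y):
    """Whole-array style: one expression over the arrays, no index order given."""
    return list(map(lambda xi, yi: a * xi + yi, x, y))


x = [1.0, 2.0, 3.0]
y = [10.0, 20.0, 30.0]
print(axpy_loop(2.0, x, y) == axpy_whole(2.0, x, y))
```

A computation where each step needs the result of the previous one, such as a time-stepping recurrence, does not fit this single-expression form so directly, which is the limitation mentioned above.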

MPI and all the SPMD approaches have a severely limited future, but I
bet the HPC codes are still using Fortran and MPI in 50 years time.

Well, whole array operations are a generalization of the SPMD approach, so in this sense you said that that kind of approach will have a future (but with more difficult optimization, as the hardware is more complex).

About MPI, I think that many don't see what MPI really does: MPI offers a simplified parallel model. The main weakness of this model is that it assumes some kind of reliability, but in return it offers a clear computational model, with processors ordered in a linear or higher-dimensional structure, and efficient collective communication primitives. Yes, MPI is not the right choice for all problems, but when usable it is very powerful, often superior to the alternatives, and programming with it is *simpler* than thinking about a generic distributed system. So I think that for problems that are neither trivially parallel nor easily parallelizable, MPI will remain the best choice.
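
The computational model described here (numbered processes running the same program, plus collective operations) can be sketched without MPI itself. The following is a toy in Python where threads stand in for ranks; `ToyComm`, `allreduce_sum` and `spmd_sum_of_squares` are hypothetical names, not MPI calls, and the sketch assumes every rank stays alive, which mirrors the reliability assumption mentioned above.

```python
import threading


class ToyComm:
    """Toy communicator: `size` ranks, all living as threads in one process."""

    def __init__(self, size):
        self.size = size
        self._slots = [None] * size
        self._barrier = threading.Barrier(size)

    def allreduce_sum(self, rank, value):
        """Collective: every rank contributes a value and gets the global sum."""
        self._slots[rank] = value
        self._barrier.wait()   # all ranks have deposited their value
        total = sum(self._slots)
        self._barrier.wait()   # all ranks have read before slots are reused
        return total


def spmd_sum_of_squares(data, size):
    comm = ToyComm(size)
    results = [None] * size

    def program(rank):
        # Same program on every rank; the rank only selects the data slice.
        local = sum(v * v for v in data[rank::size])
        results[rank] = comm.allreduce_sum(rank, local)

    threads = [threading.Thread(target=program, args=(r,)) for r in range(size)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


print(spmd_sum_of_squares(list(range(10)), 4))  # [285, 285, 285, 285]
```

The point of the sketch is how little the per-rank program has to know: its rank, the number of ranks, and one collective call.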

You mentioned Chapel and X10, but don't forget the other one of the
original three HPCS projects, Fortress.  Whilst all three are PGAS
(partitioned global address space) languages, Fortress takes a very
different viewpoint compared to Chapel and X10.

It might be a personal thing, but I am kind of "suspicious" toward PGAS; I find a generalized MPI model better than PGAS when you want to have separate address spaces. Using MPI one can define a PGAS-like object by wrapping local storage with an object that sends remote requests to access remote pieces of memory. This means having a local server where these wrapped objects can be "published" and that can respond at any moment to external requests. I call this rpc (remote procedure call), and it can be realized easily on top of MPI. As not all objects are distributed, and in a complex program it does not always make sense to distribute these objects on all processors or none, I find the robust partitioning and collective communication primitives of MPI superior to PGAS. With enough effort you can probably get everything from PGAS too, but then you lose all its simplicity.
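
As an illustration of the wrapping idea, here is a minimal Python sketch in which queues and threads stand in for message passing between ranks. `Owner` and `GlobalArray` are hypothetical names and this is not blip's or MPI's API. Each owner holds one block and runs a small server loop; the global view turns an access to a non-local index into a request to the owning rank.

```python
import queue
import threading


class Owner:
    """One "rank": holds a local block and serves remote get/put requests."""

    def __init__(self, block):
        self.block = block
        self.inbox = queue.Queue()
        threading.Thread(target=self._serve, daemon=True).start()

    def _serve(self):
        while True:
            op, index, value, reply = self.inbox.get()
            if op == "stop":
                reply.put(None)
                return
            if op == "put":
                self.block[index] = value
            reply.put(self.block[index])

    def request(self, op, index=0, value=None):
        """Send one request message and wait for the owner's reply."""
        reply = queue.Queue(maxsize=1)
        self.inbox.put((op, index, value, reply))
        return reply.get()


class GlobalArray:
    """View from one rank: local indices hit memory, others become messages."""

    def __init__(self, owners, my_rank, block_size):
        self.owners = owners
        self.my_rank = my_rank
        self.block_size = block_size

    def __getitem__(self, i):
        rank, local = divmod(i, self.block_size)
        if rank == self.my_rank:
            return self.owners[rank].block[local]
        return self.owners[rank].request("get", local)

    def __setitem__(self, i, value):
        rank, local = divmod(i, self.block_size)
        if rank == self.my_rank:
            self.owners[rank].block[local] = value
        else:
            self.owners[rank].request("put", local, value)


owners = [Owner([0, 1, 2]), Owner([3, 4, 5])]
view = GlobalArray(owners, my_rank=0, block_size=3)
view[4] = 40                 # index 4 lives on rank 1: sent as a request
print(view[1], view[4])      # one local read, one remote read
for owner in owners:
    owner.request("stop")
```

The sketch ignores races between direct local access and the server loop; a real implementation would have to decide how the two are synchronized.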

The summary of the summary is:  programmers will either be developing
parallelism systems or they will be unemployed.

The situation is not so dire: some problems are trivially parallel, or can be solved with simple parallel patterns, and others don't need to be solved in parallel, as the sequential solution is fast enough. But I do agree that being able to develop parallel systems is increasingly important.
In fact it is something that I like to do, and that I have thought about a lot.
I have programmed parallel systems, and out of my experience I tried to build something to do parallel programs "the way it should be", or at least the way I would like it to be ;)

The result is what I did with blip, http://dsource.org/projects/blip .
I don't think that (excluding some simple examples) fully automatic (transparent) parallelization is really feasible. At some point being parallel is more complex, and it puts an extra burden on the programmer. Still, it is possible to have several levels of parallelization: if you write a fully parallel program it should still be possible to use it relatively efficiently locally, but a local program will not automatically become fully parallel.

What I did is a basic SMP parallelization for programs with shared memory. This level tries to schedule independent recursive tasks using all processors as efficiently as possible (using the topology detected by libhwloc). It leverages an event-based framework (libev) to avoid blocking while waiting for external tasks. The ability to describe complex asynchronous processes can also be very useful for working with GPUs.
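
To show what "independent recursive tasks" means in practice, here is a minimal fork/join sketch in Python. It is not blip's scheduler: `parallel_sum` is a hypothetical function, it spawns a new thread per split instead of reusing a fixed pool of workers, it knows nothing about hardware topology, and in CPython the threads will not actually run the arithmetic on several cores. It only shows the task structure such a scheduler works with.

```python
import threading


def parallel_sum(values, depth=2, cutoff=4):
    """Recursive fork/join: split the work, run one half as a separate task."""
    if depth == 0 or len(values) <= cutoff:
        return sum(values)          # small enough: do it sequentially
    mid = len(values) // 2
    box = []                        # receives the result of the forked half
    worker = threading.Thread(
        target=lambda: box.append(parallel_sum(values[:mid], depth - 1, cutoff))
    )
    worker.start()                  # fork: left half runs as its own task
    right = parallel_sum(values[mid:], depth - 1, cutoff)
    worker.join()                   # join: wait for the forked task
    return box[0] + right


print(parallel_sum(list(range(100))))  # 4950
```

The two halves share nothing until the join, which is what makes them independent and lets a scheduler place them on any free processor.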

MPI parallelization is part of the hierarchy of parallelization, for the reasons I described before; it is wrapped so that on a single processor one can use a "pseudo" MPI.
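
A "pseudo" MPI of this kind can be sketched as a communicator with exactly one rank, where every collective degenerates to an identity. The Python below is only an illustration under that assumption; `PseudoComm` and its method names are hypothetical and are neither blip's wrapper nor the API of any MPI binding.

```python
class PseudoComm:
    """Single-process stand-in for a communicator: one rank, trivial collectives."""

    rank = 0
    size = 1

    def bcast(self, value, root=0):
        return value        # the only rank is also the root

    def allreduce_sum(self, value):
        return value        # the sum over a single rank is the value itself

    def gather(self, value, root=0):
        return [value]      # a list with one entry per rank

    def barrier(self):
        pass                # nothing to wait for


def global_mean(comm, local_values):
    """Code written against the communicator interface, not against MPI."""
    total = comm.allreduce_sum(sum(local_values))
    count = comm.allreduce_sum(len(local_values))
    return total / count


print(global_mean(PseudoComm(), [1, 2, 3]))  # 2.0
```

Code like `global_mean` can then run unchanged on one processor or, given a real communicator with the same interface, on many.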

rpc (remote procedure call), which might be better described as distributed objects, offers a server that can respond to external requests at any moment, and the possibility to publish objects that will then be identified by URLs. These URLs can be used to create local proxies that call the remote object and get results from it.
This can be done using MPI, or directly with sockets.
If one uses sockets, one has the whole flexibility (but also the whole complexity) of a fully distributed system. The basic building blocks of this can also be used in a distributed protocol, such as distributed hash tables.
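
The publish/proxy pattern can be sketched in a few lines of Python. Everything here is hypothetical: the `toy://` URL scheme, the `Server` and `Proxy` classes, and the direct method call standing in for the transport, which in a real system would be MPI messages or a socket. It is not blip's rpc interface.

```python
class Server:
    """Toy in-process "server": publishes objects under URL-like names."""

    def __init__(self, base_url):
        self.base_url = base_url
        self._objects = {}

    def publish(self, name, obj):
        url = f"{self.base_url}/{name}"
        self._objects[url] = obj
        return url

    def handle(self, url, method, args):
        # Stand-in for a request arriving over MPI or a socket.
        return getattr(self._objects[url], method)(*args)


class Proxy:
    """Local proxy: forwards any method call to the object behind the URL."""

    def __init__(self, server, url):
        self._server = server
        self._url = url

    def __getattr__(self, method):
        def call(*args):
            return self._server.handle(self._url, method, args)
        return call


class Counter:
    def __init__(self):
        self.value = 0

    def add(self, n):
        self.value += n
        return self.value


server = Server("toy://host0")
url = server.publish("counter", Counter())
remote = Proxy(server, url)
print(url, remote.add(2), remote.add(3))
```

The caller only ever holds the URL and the proxy, so the published object can live in another process or on another machine without the calling code changing.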

blip is available now, and works on OS X and Linux. It should be possible to port it to Windows (both libhwloc and libev work on Windows), but I didn't do it. It needs D1 and Tango; Tango trunk can be compiled using the scripts in blip/buildTango, and then programs using blip can be compiled more easily with the dbuild script (which uses xfbuild behind the scenes).

I planned to make an official release this weekend, but you can already have a look now, the code is all there...


