On Fri, Jan 15, 2010 at 6:59 AM, Philipp K. Janert <[email protected]> wrote:

>
> Computational Linear Algebra is an incredibly
> well established field.
>

Indeed.


> Now we have M/R as a real option for day-to-day
> work.
>

Yes.


> But I could imagine that the best way to use M/R
> to do computational Linear Algebra might not
> consist of "naive" adaptions of classical algorithms
> to M/R, but in the development of different algos,
> that take advantage of M/R, or in the rediscovery
> of existing algos that are of little importance in
> classical applications, but that all of a sudden
> make a lot of sense in a M/R context.
>

Absolutely.

Another thing that is happening is that the important problems are different
at the scale of map-reduce.  Once upon a time, solving 300x300 dense systems
was very important.  That makes no sense on a large cluster because the
overhead dominates the computation.  Better to use a single core of a single
modern machine.

Right now, the major driver of numerical computation by users of map-reduce
clusters is machine learning on sparse data.  For that work, approximate SVD
is a very important problem, with implications in graph algorithms, search
and recommendations at the least.  The traditional approach is Lanczos'
algorithm or another Krylov subspace technique.  These algorithms are very
difficult to parallelize at the level of map-reduce on shared-nothing
processors.

On the other hand, the stochastic decomposition algorithms appear ideally
suited to map-reduce.

This is exactly an example of what you hypothesize above ... poorly known
algorithms or facts suddenly gain new prominence in a new computational
paradigm.
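To make that concrete, here is a toy single-machine sketch of the random
projection idea behind these stochastic decompositions (pure numpy; the
names and the oversampling parameter are illustrative only, not anybody's
production code).  The two multiplications against A are simple passes over
its rows, which is what makes this style of algorithm friendly to a
shared-nothing cluster:

import numpy as np

def stochastic_svd(A, k, p=10):
    # Approximate rank-k SVD of A via a random projection.
    m, n = A.shape
    omega = np.random.randn(n, k + p)   # random test matrix
    Y = A @ omega                        # one pass over the rows of A
    Q, _ = np.linalg.qr(Y)               # small orthonormal basis
    B = Q.T @ A                          # second pass over the rows of A
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Ub)[:, :k], s[:k], Vt[:k, :]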

A counter example of well-known algorithms that port very well to map-reduce
with few changes is the family of E-M algorithms.  K-means, for instance, is
very easy to port and runs very well in a distributed environment.
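The port is easy because one k-means iteration is literally a map step
(assign each point to its nearest centroid and emit a partial sum) followed
by a reduce step (average the partial sums into new centroids).  A toy
sketch of those two functions, purely for illustration:

import numpy as np

def kmeans_map(point, centroids):
    # Map: find the nearest centroid and emit (centroid id, (point, 1)) so
    # that partial sums can be combined before the reduce.
    cid = int(np.argmin([np.linalg.norm(point - c) for c in centroids]))
    return cid, (point, 1)

def kmeans_reduce(cid, values):
    # Reduce: average all points assigned to this centroid.
    total = sum(v for v, _ in values)
    count = sum(n for _, n in values)
    return cid, total / count

Each iteration is one such job, and only the small centroid list has to be
passed from one iteration to the next.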

Other algorithms for which no efficient map-reduce parallelization is known
include stochastic gradient descent, the Pegasos fast SVM algorithm,
Rao-Blackwellized Gibbs sampling for Latent Dirichlet Allocation, many
Metropolis algorithms and many others.  The problem is often not the obvious
compute/communicate ratio, but rather that because these algorithms are
highly iterative, it is difficult to get different processors to do
sufficiently distinct work so that their combined effort adds up to
significantly more than what one highly threaded machine could do.
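To see the difficulty concretely, here is plain stochastic gradient descent
for logistic regression (again just an illustrative sketch): every update
reads the weight vector written by the update before it, so the inner loop
is a chain of dependent steps rather than a bag of independent map tasks.

import numpy as np

def sgd_logistic(X, y, lr=0.1, epochs=5):
    # Plain SGD: the loop over examples is inherently sequential because
    # each step uses the weights produced by the previous step.
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = 1.0 / (1.0 + np.exp(-xi @ w))
            w -= lr * (p - yi) * xi
    return w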


So, I am wondering whether people in the numerical
> analysis community have started to think in this
> direction and what they have come up with so far
> (if anything).
>

I don't think that the numerical analysis community has turned their
attention much to map-reduce yet.


> This is not a question about Mahout, and I admit that
> it is therefore a little OT for this list, on the other hand,
> I would assume that the members of this list would be
> the first people to know about progress in this area...
>

It is entirely on-topic.  The biggest news I know of is the stochastic
decomposition stuff that I mentioned before.
