On 2 November 2011 10:24, Sebastian Schelter <s...@apache.org> wrote:
> As you might know I recently started an experimental graph mining
> module. I was already concerned at the beginning of this whether
> MapReduce is really a suitable platform for (most) graph algorithms.
>
> I'm not content with the performance of the algorithms after some
> testing and I'm pretty sure the future of large scale graph processing
> is not on MapReduce (but hopefully on a Pregel like platform such as
> Giraph).
>
> As we're currently removing clutter and trying to concentrate on the
> core algorithms, I suggest to remove all graph algorithms with the
> exception of PageRank.
>
> If no one objects with this, I'll start the cleanup in a few days.

It all depends on what you mean by 'graph algorithms', as Jake more or
less says. I take your point re shortest paths etc. However, I think it
would be a mistake to send out a message that Mahout isn't good for
consuming graph data, even though Hadoop certainly has issues with some
kinds of graph processing.

All this can be something of a matter of perspective and descriptive
gloss. Much of the work of the recommender / Taste component of Mahout
can be thought of (and marketed as?) consuming a specialist flavour of
graph data: something like an 'interest graph' (a bipartite graph, see
http://en.wikipedia.org/wiki/Bipartite_graph) where the nodes are
items or users, and the edges are affinities/associations that
indicate interest (possibly date-stamped, possibly weighted).
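To make the 'interest graph' reading concrete, here is a minimal
sketch in Python. The edge list and the function names are made up for
illustration. The comma-separated userID,itemID,preference,timestamp
layout is an assumption about what a file-based recommender input
looks like, so check it against the Taste documentation before relying
on it.

```python
import csv
import io

# Hypothetical interest-graph edges: (user node, item node, weight, timestamp).
edges = [
    (1, 101, 5.0, 1320229440),
    (1, 102, 3.0, 1320229500),
    (2, 101, 4.0, 1320230000),
]


def edges_to_preference_csv(edges):
    """Serialise weighted, date-stamped edges as one CSV row per edge."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for user, item, weight, timestamp in edges:
        writer.writerow([user, item, weight, timestamp])
    return buf.getvalue()


def is_bipartite_by_construction(edges):
    """True if no node ID is used both as a user and as an item."""
    users = {user for user, _, _, _ in edges}
    items = {item for _, item, _, _ in edges}
    return users.isdisjoint(items)


if __name__ == "__main__":
    print(edges_to_preference_csv(edges), end="")
    print(is_bipartite_by_construction(edges))
```

Each row is one edge of the bipartite graph, which is the sense in
which recommender input is already 'graph-shaped' data.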

I work a lot with factual graph data expressed in W3C RDF form; in
this case our 'graph' has nodes that are entities or atomic values,
and typed links that represent relationship types,
attributes/properties etc. Depending on the task in hand this can be
consumed in Mahout by munging it into recommendations format input,
or, as with CSV input, into vectors, etc. So again it's 'graph' data
processing even if the processing paradigm isn't from graph theory.
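As a rough illustration of that munging step, the sketch below keeps
only the triples with one chosen predicate and turns them into numeric
(user, item) pairs. The example.org URIs, the 'likes' predicate and the
function name are all hypothetical, and mapping URIs to small integers
is an assumption about what the recommender input needs, not a
description of any Mahout tool.

```python
# Hypothetical RDF triples as plain (subject, predicate, object) strings.
LIKES = "http://example.org/likes"
NAME = "http://example.org/name"

triples = [
    ("http://example.org/alice", LIKES, "http://example.org/film/1"),
    ("http://example.org/alice", NAME, "Alice"),
    ("http://example.org/bob", LIKES, "http://example.org/film/1"),
    ("http://example.org/bob", LIKES, "http://example.org/film/2"),
]


def triples_to_pairs(triples, predicate):
    """Project triples with the given predicate onto (user_id, item_id) pairs.

    Subjects and objects are assigned integer IDs in order of first
    appearance; the two lookup tables are returned so IDs can be mapped
    back to URIs afterwards.
    """
    user_ids = {}
    item_ids = {}
    pairs = []
    for subject, pred, obj in triples:
        if pred != predicate:
            continue  # other link types are dropped by this projection
        user = user_ids.setdefault(subject, len(user_ids) + 1)
        item = item_ids.setdefault(obj, len(item_ids) + 1)
        pairs.append((user, item))
    return pairs, user_ids, item_ids


if __name__ == "__main__":
    pairs, user_ids, item_ids = triples_to_pairs(triples, LIKES)
    for user, item in pairs:
        print(f"{user},{item}")
```

Choosing which predicate to project on is where the typed-link
information gets thrown away, so the choice depends on the task.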

Finally, the spectral clustering piece of Mahout also takes graph input
(affinities), and there are decades of research papers that account for
this in terms of the eigenvectors and eigenvalues of Laplacian
representations of the graph affinity matrix; so I'd also count that as
a Mahout tool for (I guess 'lossy' in Jake's terminology) graph
processing.
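For readers who have not met the Laplacian before, here is a small
sketch of the simplest variant, L = D - A, where A is the symmetric
affinity matrix and D is the diagonal matrix of row sums. The matrix
values and the function name are made up, and which Laplacian variant
Mahout's implementation uses is not stated in this thread, so treat
the choice of the unnormalized form as an assumption.

```python
# Hypothetical symmetric affinity matrix for a 3-node graph.
affinity = [
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 2.0],
    [0.0, 2.0, 0.0],
]


def unnormalized_laplacian(a):
    """Return L = D - A, with D the diagonal matrix of row sums of A."""
    n = len(a)
    degree = [sum(row) for row in a]
    return [
        [(degree[i] if i == j else 0.0) - a[i][j] for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    for row in unnormalized_laplacian(affinity):
        print(row)
```

Every row of L sums to zero, so the all-ones vector is an eigenvector
with eigenvalue 0; spectral methods cluster using the eigenvectors
that belong to the next-smallest eigenvalues.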

Or am I being too marketing-minded here? Is it fair to say "Mahout is
a toolkit that can do specific useful things with various forms of
graph-shaped data, but isn't a general-purpose graph processing
environment"?

Dan
