The problem isn't too many duplicate edges but the fact that, as the algorithm progresses, the number of edges adjacent to the minimum vertexes increases. In the worst case the graph is completely connected and has only a single component. So far I haven't found an efficient way to partition the edges of a single vertex while still passing the minimum vertex on to each partition.
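To make the idea concrete, here is a minimal Python sketch (not the actual Hadoop job) of one iteration of min-vertex propagation plus sort-based duplicate removal; the helper names map_phase/reduce_phase/dedupe are illustrative, and edges are assumed to be undirected (u, v) pairs:

```python
from collections import defaultdict

def map_phase(edges):
    """Emit each edge under both endpoints so every vertex sees its neighbours."""
    for u, v in edges:
        yield u, v
        yield v, u

def reduce_phase(grouped):
    """Label each vertex with the minimum vertex in its neighbourhood (itself included)."""
    return {vertex: min([vertex] + neighbours)
            for vertex, neighbours in grouped.items()}

def iterate(edges):
    """One map/shuffle/reduce round of min-vertex propagation."""
    grouped = defaultdict(list)
    for key, value in map_phase(edges):
        grouped[key].append(value)
    return reduce_phase(grouped)

def dedupe(edges):
    """Canonicalize and sort the edge list; duplicates collapse trivially."""
    return sorted(set(tuple(sorted(e)) for e in edges))

edges = [(4, 2), (2, 4), (2, 1), (3, 4)]
print(dedupe(edges))           # -> [(1, 2), (2, 4), (3, 4)]
print(iterate(dedupe(edges)))  # each vertex mapped to the min vertex it has seen
```

The sketch also shows the difficulty described above: a vertex's whole neighbour list lands on one reducer key, so a very high-degree minimum vertex cannot be split across partitions without extra machinery.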
The twitter dataset has approximately 44 million vertexes. Even with a graph this large, the time needed to process 44 million edges isn't too long. One advantage of the sorting approach is that duplicate edges become trivial to detect. Once we are able to process the twitter dataset we can work on further optimizations.

Thanks,

Neal

On Thu, Jun 10, 2010 at 5:31 PM, Ted Dunning <[email protected]> wrote:

> Would a combiner help?
>
> On Thu, Jun 10, 2010 at 5:27 PM, Neal Clark <[email protected]> wrote:
>
> > Using this approach it should be possible to do the bulk of the
> > algorithm in the Mapper and then use the SecondarySort Reducer to sort the
> > output.
> >
