Guillermo,
I think you're after an associative algorithm where A is ultimately
associated with D, correct? Jakob would be correct if that is a typo--a sort
would be all that is necessary in that case.
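For example, a minimal (untested) sketch, assuming the pairs live in an RDD
of string tuples and sc is your SparkContext:

    val pairs = sc.parallelize(Seq(("a", "b"), ("x", "y"), ("b", "c"), ("y", "y"), ("c", "d")))
    val sorted = pairs.sortBy(_._1)   // order the tuples by their first element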

I believe you're looking for something else though, if I understand
correctly.

This seems like a similar algorithm to PageRank, no?
https://github.com/amplab/graphx/blob/master/python/examples/pagerank.py
Except return the "neighbor" itself, not necessarily the rank of the
page.
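
Along those lines, here is a rough, untested sketch with plain RDDs (I'm
assuming sc is your SparkContext and that each key has at most one outgoing
pair): keep rewriting every (src, dst) pair to point at dst's own destination
until nothing moves, so a->b->c->d collapses to a->d.

    var pairs = sc.parallelize(Seq(("a", "b"), ("x", "y"), ("b", "c"), ("y", "y"), ("c", "d")))
    var changed = true
    while (changed) {
      // key each pair by its destination and look up that destination's own successor
      val jumped = pairs.map { case (src, dst) => (dst, src) }
        .leftOuterJoin(pairs)                       // (dst, (src, Option(next)))
        .map {
          case (dst, (src, Some(next))) if next != dst => ((src, next), true)  // took a step
          case (dst, (src, _))                         => ((src, dst), false)  // already terminal
        }
      changed = jumped.values.filter(identity).count() > 0
      pairs = jumped.keys.cache()
    }
    // pairs now holds: (a,d), (b,d), (c,d), (x,y), (y,y)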

If you wanted to, you could use Scala and GraphX for this problem. It might
be a bit of overhead though: construct a node for each member of each tuple
with an edge between them, then traverse the graph for all sets of nodes
that are connected. That result set would quickly explode in size, but you
could restrict results to a minimum of N connections. I'm not super familiar
with GraphX myself, however. My intuition is saying 'graph problem', though.
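
For example, a rough sketch of the GraphX route (again untested; it assumes
sc is a SparkContext and that hashing the string labels to Long vertex ids
doesn't collide):

    import org.apache.spark.graphx.{Edge, Graph}

    val pairs = sc.parallelize(Seq(("a", "b"), ("x", "y"), ("b", "c"), ("y", "y"), ("c", "d")))
    // GraphX needs Long vertex ids, so hash the labels (assumes no collisions)
    def vid(s: String): Long = s.hashCode.toLong
    val vertices = pairs.flatMap { case (s, d) => Seq((vid(s), s), (vid(d), d)) }.distinct()
    val edges = pairs.map { case (s, d) => Edge(vid(s), vid(d), 1) }
    val graph = Graph(vertices, edges)

    // connectedComponents labels every vertex with the lowest vertex id in its component
    val components = graph.connectedComponents().vertices    // (vertexId, componentId)
    val grouped = components.join(vertices)                  // (vertexId, (componentId, label))
      .map { case (_, (component, label)) => (component, label) }
      .groupByKey()                                          // componentId -> all connected labels

That only gets you the groups of connected labels; you'd still have to pick
the terminal member of each group (d and y here), e.g. the one label in the
group that never shows up on the left-hand side.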

Thoughts?


On Wed, Feb 24, 2016 at 6:43 PM, Jakob Odersky <ja...@odersky.com> wrote:

> Hi Guillermo,
> assuming that the first "a,b" is a typo and you actually meant "a,d",
> this is a sorting problem.
>
> You could easily model your data as an RDD of tuples (or as a
> dataframe/set) and use the sortBy (or orderBy for dataframe/sets)
> methods.
>
> best,
> --Jakob
>
> On Wed, Feb 24, 2016 at 2:26 PM, Guillermo Ortiz <konstt2...@gmail.com>
> wrote:
> > I want to implement an algorithm in Spark. I know how to do it on a
> > single machine where all the data is together, but I don't know a good
> > way to do it in Spark.
> >
> > If someone has an idea..
> > I have some data like this
> > a , b
> > x , y
> > b , c
> > y , y
> > c , d
> >
> > I want something like:
> > a , d
> > b , d
> > c , d
> > x , y
> > y , y
> >
> > I need to know that a->b->c->d, so a->d, b->d and c->d.
> > I don't want the code, just an idea how I could deal with it.
> >
> > Any idea?
>
