I don't see how sorting the data helps.
The answer has to contain all the associations. In this case it has to
be:
a , d --> the "a , b" in my question was an error, sorry.
b , d
c , d
x , y
y , y

I feel like all the data that is associated should end up in the same
executor. In this case, if I sort the inputs:
a , b
x , y
b , c
y , y
c , d
--> sorted:
a , b
b , c
c , d
x , y
y , y

Now "a,b" and "b,c" could go to one partition, for example, "c,d" and
"x,y" to another one, and so on.
I could get the relation between "a", "b" and "c", but not the relation
of "d" with "a", "b" and "c". Am I wrong? I hope I am!
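
Thinking out loud, here is a rough sketch of how the group that contains
"d" could still be recovered without relying on co-partitioning, by
propagating the smallest label with plain RDD joins. It is only my own
illustration (the object name, app name and inline sample data are made
up for the example), not tested code:

import org.apache.spark.{SparkConf, SparkContext}

// Sketch: min-label propagation over the pairs using plain RDD joins.
object LabelPropagationSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("label-propagation-sketch"))

    val pairs = sc.parallelize(Seq(("a", "b"), ("x", "y"), ("b", "c"), ("y", "y"), ("c", "d")))

    // Treat the pairs as undirected edges so labels can travel both ways.
    val edges = pairs.flatMap { case (a, b) => Seq((a, b), (b, a)) }.distinct().cache()

    // Every element starts labelled with itself.
    var labels = edges.keys.distinct().map(v => (v, v))

    // Push the smallest label seen so far to every neighbour until nothing changes.
    var changed = true
    while (changed) {
      val pushed = edges.join(labels) // (v, (neighbour, labelOfV))
        .map { case (_, (neighbour, label)) => (neighbour, label) }
      val updated = pushed.union(labels).reduceByKey((x, y) => if (x < y) x else y).cache()
      changed = updated.join(labels).filter { case (_, (newL, oldL)) => newL != oldL }.count() > 0
      labels = updated
    }

    // "a", "b", "c" and "d" all end up with label "a", and "x", "y" with "x",
    // no matter how the input pairs were partitioned.
    labels.collect().foreach(println)
  }
}

From there, producing the exact pairs I listed above (everything pointing
to the last element of its chain) would still need one more pass per
group, but at least the relation of "d" with "a", "b" and "c" is not lost.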

It seems that it could be done with GraphX, but as you said, it seems
like a bit of overhead.
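
For completeness, this is roughly what I have in mind for the GraphX
route. It is only a sketch under my own assumptions (string labels hashed
to Long vertex ids, data small enough that hash collisions don't matter),
and it only groups the connected labels, it does not yet produce the
a,d / b,d / c,d pairs:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph}
import org.apache.spark.rdd.RDD

object ConnectedPairsSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("connected-pairs-sketch"))

    // Sample pairs from the thread; in practice they would come from the real input.
    val pairs: RDD[(String, String)] =
      sc.parallelize(Seq(("a", "b"), ("x", "y"), ("b", "c"), ("y", "y"), ("c", "d")))

    // GraphX needs Long vertex ids, so hash the labels (fine for this toy
    // example; collisions would have to be handled for real data).
    def id(s: String): Long = s.hashCode.toLong

    val vertices: RDD[(Long, String)] =
      pairs.flatMap { case (a, b) => Seq((id(a), a), (id(b), b)) }.distinct()
    val edges: RDD[Edge[Int]] =
      pairs.map { case (src, dst) => Edge(id(src), id(dst), 1) }

    // connectedComponents tags every vertex with the smallest vertex id of its
    // component, so "a", "b", "c", "d" share one tag and "x", "y" another.
    val components = Graph(vertices, edges).connectedComponents().vertices

    components.join(vertices)
      .map { case (_, (componentId, label)) => (componentId, label) }
      .groupByKey()
      .collect()
      .foreach { case (component, labels) => println(s"$component -> ${labels.mkString(", ")}") }
  }
}

The open question for me is whether building the graph is worth it
compared to the iterative join above.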


2016-02-25 5:43 GMT+01:00 James Barney <jamesbarne...@gmail.com>:

> Guillermo,
> I think you're after an associative algorithm where A is ultimately
> associated with D, correct? Jakob would be correct if that is a typo; a sort
> would be all that is necessary in that case.
>
> I believe you're looking for something else though, if I understand
> correctly.
>
> This seems like a similar algorithm to PageRank, no?
> https://github.com/amplab/graphx/blob/master/python/examples/pagerank.py
> Except return the "neighbor" itself, not necessarily the rank of the
> page.
>
> If you wanted to, use Scala and Graphx for this problem. Might be a bit of
> overhead though: Construct a node for each member of each tuple with an
> edge between them. Then traverse the graph for all sets of nodes that are
> connected. That result set would quickly explode in size, but you could
> restrict results to a minimum N connections. I'm not super familiar with
> Graphx myself, however. My intuition is saying 'graph problem' though.
>
> Thoughts?
>
>
> On Wed, Feb 24, 2016 at 6:43 PM, Jakob Odersky <ja...@odersky.com> wrote:
>
>> Hi Guillermo,
>> assuming that the first "a,b" is a typo and you actually meant "a,d",
>> this is a sorting problem.
>>
>> You could easily model your data as an RDD of tuples (or as a
>> DataFrame/Dataset) and use the sortBy (or orderBy for DataFrames/Datasets)
>> methods.
>>
>> best,
>> --Jakob
>>
>> On Wed, Feb 24, 2016 at 2:26 PM, Guillermo Ortiz <konstt2...@gmail.com>
>> wrote:
>> > I want to implement an algorithm in Spark. I know how to do it on a single
>> > machine where all the data is together, but I don't know a good way to do
>> > it in Spark.
>> >
>> > If someone has an idea..
>> > I have some data like this
>> > a , b
>> > x , y
>> > b , c
>> > y , y
>> > c , d
>> >
>> > I want something like:
>> > a , d
>> > b , d
>> > c , d
>> > x , y
>> > y , y
>> >
>> > I need to know that a->b->c->d, so a->d, b->d and c->d.
>> > I don't want the code, just an idea how I could deal with it.
>> >
>> > Any idea?
>>
>>
>>
>
