Oh, the letters were just an example, it could be:
a , t
b, o
t, k
k, c

So a -> t -> k -> c, and the result is: a,c; t,c; k,c and b,o.
I don't know if you were thinking about sortBy because of the other example,
where the letters were consecutive.
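
To make the expected result concrete, here is a minimal single-machine sketch in
plain Python, not Spark code. It uses pointer jumping: in every round each key
skips ahead to its target's own target, until nothing changes. The function name
resolve_chains and the max_rounds guard are made up for illustration, and the
sketch assumes each key has at most one successor and that the only cycles are
self-loops such as "y , y".

```python
def resolve_chains(pairs, max_rounds=64):
    """Map every left-hand key to the last element of its chain.

    Assumption: each key has at most one successor, and the only
    cycles are self-loops such as ("y", "y").
    """
    successor = dict(pairs)
    for _ in range(max_rounds):
        # One round of pointer jumping: replace each target by that
        # target's own successor, if it has one. In a distributed
        # setting this round would correspond to joining the pairs
        # with themselves on target == key.
        jumped = {key: successor.get(target, target)
                  for key, target in successor.items()}
        if jumped == successor:
            # Fixed point: every key now points at the end of its chain.
            return jumped
        successor = jumped
    # Guard against inputs that never settle (for example a longer cycle).
    raise ValueError("no fixed point reached within max_rounds")


pairs = [("a", "t"), ("b", "o"), ("t", "k"), ("k", "c")]
print(resolve_chains(pairs))
```

Because every round doubles the distance each key has jumped, the number of
rounds grows with the logarithm of the longest chain, not with its length.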


2016-02-25 9:42 GMT+01:00 Guillermo Ortiz <konstt2...@gmail.com>:

> I don't see that sorting the data helps.
> The answer has to be all the associations. In this case the answer has to
> be:
> a , b --> it was an error in the question, sorry.
> b , d
> c , d
> x , y
> y , y
>
> I feel like all the data that is associated should be on the same executor.
> In this case, if I sort the inputs:
> a , b
> x , y
> b , c
> y , y
> c , d
> --> to
> a , b
> b , c
> c , d
> x , y
> y , y
>
> Now "a,b" and "b,c" go to one partition, for example, "c,d" and "x,y" to
> another one, and so on.
> I could get the relation between "a,b,c", but not the relation of "d" to
> "a,b,c". Am I wrong? I hope I am!
>
> It seems that it could be done with GraphX, but as you said, it seems like
> a bit of overhead.
>
>
> 2016-02-25 5:43 GMT+01:00 James Barney <jamesbarne...@gmail.com>:
>
>> Guillermo,
>> I think you're after an associative algorithm where A is ultimately
>> associated with D, correct? Jakob would be correct if that is a typo; a
>> sort would be all that is necessary in that case.
>>
>> I believe you're looking for something else though, if I understand
>> correctly.
>>
>> This seems like a similar algorithm to PageRank, no?
>> https://github.com/amplab/graphx/blob/master/python/examples/pagerank.py
>> Except return the "neighbor" itself, not necessarily the rank of the
>> page.
>>
>> If you wanted to, you could use Scala and GraphX for this problem. It might
>> be a bit of overhead, though: construct a node for each member of each tuple,
>> with an edge between them. Then traverse the graph for all sets of nodes that
>> are connected. That result set would quickly explode in size, but you could
>> restrict results to a minimum of N connections. I'm not super familiar with
>> GraphX myself, however. My intuition is saying 'graph problem' though.
>>
>> Thoughts?
>>
>>
>> On Wed, Feb 24, 2016 at 6:43 PM, Jakob Odersky <ja...@odersky.com> wrote:
>>
>>> Hi Guillermo,
>>> assuming that the first "a,b" is a typo and you actually meant "a,d",
>>> this is a sorting problem.
>>>
>>> You could easily model your data as an RDD of tuples (or as a
>>> dataframe/set) and use the sortBy (or orderBy for dataframe/sets)
>>> methods.
>>>
>>> best,
>>> --Jakob
>>>
>>> On Wed, Feb 24, 2016 at 2:26 PM, Guillermo Ortiz <konstt2...@gmail.com>
>>> wrote:
>>> > I want to implement an algorithm in Spark. I know how to do it on a single
>>> > machine where all the data is together, but I don't know a good way to do
>>> > it in Spark.
>>> >
>>> > If someone has an idea..
>>> > I have some data like this
>>> > a , b
>>> > x , y
>>> > b , c
>>> > y , y
>>> > c , d
>>> >
>>> > I want something like:
>>> > a , d
>>> > b , d
>>> > c , d
>>> > x , y
>>> > y , y
>>> >
>>> > I need to know that a->b->c->d, so a->d, b->d and c->d.
>>> > I don't want the code, just an idea how I could deal with it.
>>> >
>>> > Any idea?
>>>
>>>
>>>
>>
>
