Oh, the letters were just an example, it could be: a , t b, o t, k k, c So.. a -> t -> k -> c and the result is: a,c; t,c; k,c and b,o I don't know if you were thinking about sortBy because the another example where letter were consecutive.
2016-02-25 9:42 GMT+01:00 Guillermo Ortiz <konstt2...@gmail.com>: > I don't see that sorting the data helps. > The answer has to be all the associations. In this case the answer has to > be: > a , b --> it was a error in the question, sorry. > b , d > c , d > x , y > y , y > > I feel like all the data which is associate should be in the same executor. > On this case if I order the inputs. > a , b > x , y > b , c > y , y > c , d > --> to > a , b > b , c > c , d > x , y > y , y > > Now, a,b ; b,c; one partitions for example, "c,d" and "x,y" another one > and so on. > I could get the relation between "a,b,c", but not about "d" with "a,b,c", > am I wrong? I hope to be wrong!. > > It seems that it could be done with GraphX, but as you said, it seems a > little bit overhead. > > > 2016-02-25 5:43 GMT+01:00 James Barney <jamesbarne...@gmail.com>: > >> Guillermo, >> I think you're after an associative algorithm where A is ultimately >> associated with D, correct? Jakob would correct if that is a typo--a sort >> would be all that is necessary in that case. >> >> I believe you're looking for something else though, if I understand >> correctly. >> >> This seems like a similar algorithm to PageRank, no? >> https://github.com/amplab/graphx/blob/master/python/examples/pagerank.py >> Except return the "neighbor" itself, not the necessarily the rank of the >> page. >> >> If you wanted to, use Scala and Graphx for this problem. Might be a bit >> of overhead though: Construct a node for each member of each tuple with an >> edge between. Then traverse the graph for all sets of nodes that are >> connected. That result set would quickly explode in size, but you could >> restrict results to a minimum N connections. I'm not super familiar with >> Graphx myself, however. My intuition is saying 'graph problem' though. >> >> Thoughts? >> >> >> On Wed, Feb 24, 2016 at 6:43 PM, Jakob Odersky <ja...@odersky.com> wrote: >> >>> Hi Guillermo, >>> assuming that the first "a,b" is a typo and you actually meant "a,d", >>> this is a sorting problem. >>> >>> You could easily model your data as an RDD or tuples (or as a >>> dataframe/set) and use the sortBy (or orderBy for dataframe/sets) >>> methods. >>> >>> best, >>> --Jakob >>> >>> On Wed, Feb 24, 2016 at 2:26 PM, Guillermo Ortiz <konstt2...@gmail.com> >>> wrote: >>> > I want to do some algorithm in Spark.. I know how to do it in a single >>> > machine where all data are together, but I don't know a good way to do >>> it in >>> > Spark. >>> > >>> > If someone has an idea.. >>> > I have some data like this >>> > a , b >>> > x , y >>> > b , c >>> > y , y >>> > c , d >>> > >>> > I want something like: >>> > a , d >>> > b , d >>> > c , d >>> > x , y >>> > y , y >>> > >>> > I need to know that a->b->c->d, so a->d, b->d and c->d. >>> > I don't want the code, just an idea how I could deal with it. >>> > >>> > Any idea? >>> >>> --------------------------------------------------------------------- >>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org >>> For additional commands, e-mail: user-h...@spark.apache.org >>> >>> >> >