Thank you! I'm trying to do it with Pregel; it's been hard because I have never used GraphX or Pregel before.
2016-02-25 14:00 GMT+01:00 Sabarish Sasidharan <sabarish....@gmail.com>:

> Like Robin said, please explore Pregel. You could do it without Pregel,
> but it might be laborious. I have a simple outline below. You will need
> more iterations if the number of levels is higher.
>
> a-b
> b-c
> c-d
> b-e
> e-f
> f-c
>
> flatMapToPair
>
> a -> (a-b)
> b -> (a-b)
> b -> (b-c)
> c -> (b-c)
> c -> (c-d)
> d -> (c-d)
> b -> (b-e)
> e -> (b-e)
> e -> (e-f)
> f -> (e-f)
> f -> (f-c)
> c -> (f-c)
>
> aggregateByKey
>
> a -> (a-b)
> b -> (a-b, b-c, b-e)
> c -> (b-c, c-d, f-c)
> d -> (c-d)
> e -> (b-e, e-f)
> f -> (e-f, f-c)
>
> filter to remove keys with fewer than 2 values
>
> b -> (a-b, b-c, b-e)
> c -> (b-c, c-d, f-c)
> e -> (b-e, e-f)
> f -> (e-f, f-c)
>
> flatMap
>
> a-b-c
> a-b-e
> b-c-d
> b-e-f
> e-f-c
>
> flatMapToPair followed by aggregateByKey
>
> (a-b) -> (a-b-c, a-b-e)
> (b-c) -> (a-b-c, b-c-d)
> (c-d) -> (b-c-d)
> (b-e) -> (b-e-f)
> (e-f) -> (b-e-f, e-f-c)
> (f-c) -> (e-f-c)
>
> filter out keys with fewer than 2 values
>
> (b-c) -> (a-b-c, b-c-d)
> (e-f) -> (b-e-f, e-f-c)
>
> mapValues
>
> a-b-c-d
> b-e-f-c
>
> flatMap
>
> a,d
> b,d
> c,d
> b,c
> e,c
> f,c
>
>
> On Thu, Feb 25, 2016 at 6:19 PM, Guillermo Ortiz <konstt2...@gmail.com>
> wrote:
>
>> I'm taking a look at Pregel. It seems like a good way to do it. The only
>> downside I see is that it's not really a complex graph with a lot of
>> edges between the vertices; they are more like a lot of isolated small
>> graphs.
>>
>> 2016-02-25 12:32 GMT+01:00 Robin East <robin.e...@xense.co.uk>:
>>
>>> The structures you are describing look like edges of a graph, and you
>>> want to follow the graph to a terminal vertex and then propagate that
>>> value back up the path. On this assumption it would be simple to create
>>> the structures as graphs in GraphX and use Pregel for the algorithm
>>> implementation.
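[Editor's note: Sabarish's step-by-step outline above maps onto ordinary collection operations. Below is a minimal plain-Python emulation of the first join round (flatMapToPair, aggregateByKey, filter, flatMap) using the same sample edges; it is a sketch of the idea, not Spark code, and names like `paths3` are purely illustrative.]

```python
from collections import defaultdict

# Sample edges from the outline: a-b, b-c, c-d, b-e, e-f, f-c
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("b", "e"), ("e", "f"), ("f", "c")]

# flatMapToPair: emit each edge under both of its endpoint vertices
pairs = [(v, e) for e in edges for v in e]

# aggregateByKey: group the edges that share a vertex
by_vertex = defaultdict(list)
for v, e in pairs:
    by_vertex[v].append(e)

# filter: keep only vertices touching at least 2 edges (joinable middles)
joinable = {v: es for v, es in by_vertex.items() if len(es) >= 2}

# flatMap: join an incoming edge (x, v) with an outgoing edge (v, y)
# through their shared vertex v to form the 3-vertex path x-v-y
paths3 = set()
for v, es in joinable.items():
    ins = [e for e in es if e[1] == v]
    outs = [e for e in es if e[0] == v]
    for (x, _) in ins:
        for (_, y) in outs:
            paths3.add((x, v, y))

print(sorted(paths3))
```

Note that this join also yields f-c-d (from f-c combined with c-d), which the outline's example list happens to omit. Repeating the round joins 3-paths sharing an edge into 4-paths, and so on — one extra round per additional level of depth, as Sabarish says.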
>>>
>>> -------------------------------------------------------------------------------
>>> Robin East
>>> *Spark GraphX in Action* Michael Malak and Robin East
>>> Manning Publications Co.
>>> http://www.manning.com/books/spark-graphx-in-action
>>>
>>> On 25 Feb 2016, at 10:52, Guillermo Ortiz <konstt2...@gmail.com> wrote:
>>>
>>> Oh, the letters were just an example; it could be:
>>> a , t
>>> b , o
>>> t , k
>>> k , c
>>>
>>> So a -> t -> k -> c, and the result is: a,c; t,c; k,c and b,o.
>>> I don't know if you were thinking about sortBy because in the other
>>> example the letters were consecutive.
>>>
>>> 2016-02-25 9:42 GMT+01:00 Guillermo Ortiz <konstt2...@gmail.com>:
>>>
>>>> I don't see how sorting the data helps.
>>>> The answer has to be all the associations. In this case the answer has
>>>> to be:
>>>> a , d --> it was an error in the question, sorry.
>>>> b , d
>>>> c , d
>>>> x , y
>>>> y , y
>>>>
>>>> I feel like all the data which is associated should be in the same
>>>> executor.
>>>> In this case, if I sort the inputs:
>>>> a , b
>>>> x , y
>>>> b , c
>>>> y , y
>>>> c , d
>>>> --> to
>>>> a , b
>>>> b , c
>>>> c , d
>>>> x , y
>>>> y , y
>>>>
>>>> Now "a,b" and "b,c" go to one partition, for example, "c,d" and "x,y"
>>>> to another one, and so on.
>>>> I could get the relation between "a,b,c", but not between "d" and
>>>> "a,b,c" — am I wrong? I hope to be wrong!
>>>>
>>>> It seems that it could be done with GraphX, but as you said, it seems
>>>> like a little bit of overhead.
>>>>
>>>> 2016-02-25 5:43 GMT+01:00 James Barney <jamesbarne...@gmail.com>:
>>>>
>>>>> Guillermo,
>>>>> I think you're after an associative algorithm where A is ultimately
>>>>> associated with D, correct? Jakob would be correct if that were a
>>>>> typo--a sort would be all that is necessary in that case.
>>>>>
>>>>> I believe you're looking for something else though, if I understand
>>>>> correctly.
>>>>>
>>>>> This seems like a similar algorithm to PageRank, no?
>>>>> https://github.com/amplab/graphx/blob/master/python/examples/pagerank.py
>>>>> Except return the "neighbor" itself, not necessarily the rank of the
>>>>> page.
>>>>>
>>>>> If you wanted to, you could use Scala and GraphX for this problem. It
>>>>> might be a bit of overhead though: construct a node for each member of
>>>>> each tuple with an edge between them. Then traverse the graph for all
>>>>> sets of nodes that are connected. That result set would quickly
>>>>> explode in size, but you could restrict the results to a minimum of N
>>>>> connections. I'm not super familiar with GraphX myself, however. My
>>>>> intuition is saying 'graph problem' though.
>>>>>
>>>>> Thoughts?
>>>>>
>>>>> On Wed, Feb 24, 2016 at 6:43 PM, Jakob Odersky <ja...@odersky.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Guillermo,
>>>>>> assuming that the first "a,b" is a typo and you actually meant "a,d",
>>>>>> this is a sorting problem.
>>>>>>
>>>>>> You could easily model your data as an RDD of tuples (or as a
>>>>>> dataframe/set) and use the sortBy (or orderBy for dataframes/sets)
>>>>>> methods.
>>>>>>
>>>>>> best,
>>>>>> --Jakob
>>>>>>
>>>>>> On Wed, Feb 24, 2016 at 2:26 PM, Guillermo Ortiz <
>>>>>> konstt2...@gmail.com> wrote:
>>>>>> > I want to do some algorithm in Spark. I know how to do it on a
>>>>>> > single machine where all the data is together, but I don't know a
>>>>>> > good way to do it in Spark.
>>>>>> >
>>>>>> > If someone has an idea...
>>>>>> > I have some data like this:
>>>>>> > a , b
>>>>>> > x , y
>>>>>> > b , c
>>>>>> > y , y
>>>>>> > c , d
>>>>>> >
>>>>>> > I want something like:
>>>>>> > a , d
>>>>>> > b , d
>>>>>> > c , d
>>>>>> > x , y
>>>>>> > y , y
>>>>>> >
>>>>>> > I need to know that a->b->c->d, so a->d, b->d and c->d.
>>>>>> > I don't want the code, just an idea of how I could deal with it.
>>>>>> >
>>>>>> > Any idea?
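[Editor's note: the single-machine logic Guillermo alludes to — walk each chain to its terminal vertex — can be sketched in a few lines of plain Python. The `resolve` helper below is purely illustrative; it assumes each vertex has at most one outgoing edge, as in the isolated-chain data above, and the `seen` set stops it on self-loops like "y , y".]

```python
def resolve(pairs):
    """Map every vertex to the terminal vertex of its chain."""
    succ = dict(pairs)  # assumes at most one outgoing edge per vertex
    result = {}
    for v in succ:
        seen = {v}
        cur = succ[v]
        # Follow successors until we reach a vertex with no outgoing
        # edge (a terminal) or loop back to one we've already visited
        while cur in succ and cur not in seen:
            seen.add(cur)
            cur = succ[cur]
        result[v] = cur
    return result

data = [("a", "b"), ("x", "y"), ("b", "c"), ("y", "y"), ("c", "d")]
print(resolve(data))  # a->d, b->d, c->d, x->y, y->y
```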
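[Editor's note: Robin's Pregel suggestion — follow the graph to a terminal vertex and propagate that value back — essentially amounts to iterative pointer jumping: every vertex repeatedly replaces its successor with its successor's successor, converging in O(log chain-length) rounds, which is the kind of per-vertex update Pregel supersteps express well. A minimal non-Spark sketch of the idea, with the illustrative name `pointer_jump`:]

```python
def pointer_jump(succ):
    """Iteratively shortcut each vertex to its successor's successor."""
    succ = dict(succ)
    changed = True
    while changed:
        changed = False
        for v, nxt in succ.items():
            # Jump over nxt if it has a successor of its own; the
            # succ[nxt] != nxt check skips self-loops (like y,y) so
            # the loop terminates
            if nxt in succ and succ[nxt] != nxt:
                succ[v] = succ[nxt]
                changed = True
    return succ

# Guillermo's second example: a->t->k->c, plus the isolated pair b->o
chains = {"a": "t", "b": "o", "t": "k", "k": "c"}
print(pointer_jump(chains))  # a->c, t->c, k->c, b->o
```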