Thank you! I'm trying to do it with Pregel; it's been hard because I have never used GraphX or Pregel before.
2016-02-25 14:00 GMT+01:00 Sabarish Sasidharan <sabarish....@gmail.com>:

> Like Robin said, please explore Pregel. You could do it without Pregel,
> but it might be laborious. I have a simple outline below. You will need
> more iterations if the number of levels is higher.
>
> a-b
> b-c
> c-d
> b-e
> e-f
> f-c
>
> flatMapToPair
>
> a -> (a-b)
> b -> (a-b)
> b -> (b-c)
> c -> (b-c)
> c -> (c-d)
> d -> (c-d)
> b -> (b-e)
> e -> (b-e)
> e -> (e-f)
> f -> (e-f)
> f -> (f-c)
> c -> (f-c)
>
> aggregateByKey
>
> a -> (a-b)
> b -> (a-b, b-c, b-e)
> c -> (b-c, c-d, f-c)
> d -> (c-d)
> e -> (b-e, e-f)
> f -> (e-f, f-c)
>
> filter to remove keys with fewer than 2 values
>
> b -> (a-b, b-c, b-e)
> c -> (b-c, c-d, f-c)
> e -> (b-e, e-f)
> f -> (e-f, f-c)
>
> flatMap
>
> a-b-c
> a-b-e
> b-c-d
> b-e-f
> e-f-c
>
> flatMapToPair followed by aggregateByKey
>
> (a-b) -> (a-b-c, a-b-e)
> (b-c) -> (a-b-c, b-c-d)
> (c-d) -> (b-c-d)
> (b-e) -> (b-e-f)
> (e-f) -> (b-e-f, e-f-c)
> (f-c) -> (e-f-c)
>
> filter out keys with fewer than 2 values
>
> (b-c) -> (a-b-c, b-c-d)
> (e-f) -> (b-e-f, e-f-c)
>
> mapValues
>
> a-b-c-d
> b-e-f-c
>
> flatMap
>
> a,d
> b,d
> c,d
> b,c
> e,c
> f,c
>
>
> On Thu, Feb 25, 2016 at 6:19 PM, Guillermo Ortiz <konstt2...@gmail.com>
> wrote:
>
>> I'm taking a look at Pregel. It seems like a good way to do it. The only
>> downside I see is that it's not really a complex graph with a lot of
>> edges between the vertices; they are more like a lot of isolated small
>> graphs.
>>
>> 2016-02-25 12:32 GMT+01:00 Robin East <robin.e...@xense.co.uk>:
>>
>>> The structures you are describing look like edges of a graph, and you
>>> want to follow the graph to a terminal vertex and then propagate that
>>> value back up the path. On this assumption it would be simple to create
>>> the structures as graphs in GraphX and use Pregel for the algorithm
>>> implementation.
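[Editor's note: Sabarish's step-by-step outline above maps onto ordinary collection operations. Below is a minimal plain-Python emulation of the first join round (flatMapToPair, aggregateByKey, filter, flatMap) using the same sample edges; it is a sketch of the idea, not Spark code, and names like `paths3` are purely illustrative.]

```python
from collections import defaultdict

# Sample edges from the outline: a-b, b-c, c-d, b-e, e-f, f-c
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("b", "e"), ("e", "f"), ("f", "c")]

# flatMapToPair: emit each edge under both of its endpoint vertices
pairs = [(v, e) for e in edges for v in e]

# aggregateByKey: group the edges that share a vertex
by_vertex = defaultdict(list)
for v, e in pairs:
    by_vertex[v].append(e)

# filter: keep only vertices touching at least 2 edges (joinable middles)
joinable = {v: es for v, es in by_vertex.items() if len(es) >= 2}

# flatMap: join an incoming edge (x, v) with an outgoing edge (v, y)
# through their shared vertex v to form the 3-vertex path x-v-y
paths3 = set()
for v, es in joinable.items():
    ins = [e for e in es if e[1] == v]
    outs = [e for e in es if e[0] == v]
    for (x, _) in ins:
        for (_, y) in outs:
            paths3.add((x, v, y))

print(sorted(paths3))
```

Note that this join also yields f-c-d (from f-c combined with c-d), which the outline's example list happens to omit. Repeating the round joins 3-paths sharing an edge into 4-paths, and so on — one extra round per additional level of depth, as Sabarish says.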
>>>
>>> -------------------------------------------------------------------------------
>>> Robin East
>>> *Spark GraphX in Action* Michael Malak and Robin East
>>> Manning Publications Co.
>>> http://www.manning.com/books/spark-graphx-in-action
>>>
>>> On 25 Feb 2016, at 10:52, Guillermo Ortiz <konstt2...@gmail.com> wrote:
>>>
>>> Oh, the letters were just an example; it could be:
>>> a , t
>>> b , o
>>> t , k
>>> k , c
>>>
>>> So a -> t -> k -> c, and the result is: a,c; t,c; k,c and b,o.
>>> I don't know if you were thinking about sortBy because in the other
>>> example the letters were consecutive.
>>>
>>> 2016-02-25 9:42 GMT+01:00 Guillermo Ortiz <konstt2...@gmail.com>:
>>>
>>>> I don't see how sorting the data helps.
>>>> The answer has to be all the associations. In this case the answer has
>>>> to be:
>>>> a , d --> it was an error in the question, sorry.
>>>> b , d
>>>> c , d
>>>> x , y
>>>> y , y
>>>>
>>>> I feel like all the data which is associated should be in the same
>>>> executor.
>>>> In this case, if I sort the inputs:
>>>> a , b
>>>> x , y
>>>> b , c
>>>> y , y
>>>> c , d
>>>> --> to
>>>> a , b
>>>> b , c
>>>> c , d
>>>> x , y
>>>> y , y
>>>>
>>>> Now "a,b" and "b,c" go to one partition, for example, "c,d" and "x,y"
>>>> to another one, and so on.
>>>> I could get the relation between "a,b,c", but not between "d" and
>>>> "a,b,c" — am I wrong? I hope to be wrong!
>>>>
>>>> It seems that it could be done with GraphX, but as you said, it seems
>>>> like a little bit of overhead.
>>>>
>>>> 2016-02-25 5:43 GMT+01:00 James Barney <jamesbarne...@gmail.com>:
>>>>
>>>>> Guillermo,
>>>>> I think you're after an associative algorithm where A is ultimately
>>>>> associated with D, correct? Jakob would be correct if that were a
>>>>> typo--a sort would be all that is necessary in that case.
>>>>>
>>>>> I believe you're looking for something else though, if I understand
>>>>> correctly.
>>>>>
>>>>> This seems like a similar algorithm to PageRank, no?
>>>>> https://github.com/amplab/graphx/blob/master/python/examples/pagerank.py
>>>>> Except return the "neighbor" itself, not necessarily the rank of the
>>>>> page.
>>>>>
>>>>> If you wanted to, you could use Scala and GraphX for this problem. It
>>>>> might be a bit of overhead though: construct a node for each member of
>>>>> each tuple with an edge between them. Then traverse the graph for all
>>>>> sets of nodes that are connected. That result set would quickly
>>>>> explode in size, but you could restrict the results to a minimum of N
>>>>> connections. I'm not super familiar with GraphX myself, however. My
>>>>> intuition is saying 'graph problem' though.
>>>>>
>>>>> Thoughts?
>>>>>
>>>>> On Wed, Feb 24, 2016 at 6:43 PM, Jakob Odersky <ja...@odersky.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Guillermo,
>>>>>> assuming that the first "a,b" is a typo and you actually meant "a,d",
>>>>>> this is a sorting problem.
>>>>>>
>>>>>> You could easily model your data as an RDD of tuples (or as a
>>>>>> dataframe/set) and use the sortBy (or orderBy for dataframes/sets)
>>>>>> methods.
>>>>>>
>>>>>> best,
>>>>>> --Jakob
>>>>>>
>>>>>> On Wed, Feb 24, 2016 at 2:26 PM, Guillermo Ortiz <
>>>>>> konstt2...@gmail.com> wrote:
>>>>>> > I want to do some algorithm in Spark. I know how to do it on a
>>>>>> > single machine where all the data is together, but I don't know a
>>>>>> > good way to do it in Spark.
>>>>>> >
>>>>>> > If someone has an idea...
>>>>>> > I have some data like this:
>>>>>> > a , b
>>>>>> > x , y
>>>>>> > b , c
>>>>>> > y , y
>>>>>> > c , d
>>>>>> >
>>>>>> > I want something like:
>>>>>> > a , d
>>>>>> > b , d
>>>>>> > c , d
>>>>>> > x , y
>>>>>> > y , y
>>>>>> >
>>>>>> > I need to know that a->b->c->d, so a->d, b->d and c->d.
>>>>>> > I don't want the code, just an idea of how I could deal with it.
>>>>>> >
>>>>>> > Any idea?
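[Editor's note: the single-machine logic Guillermo alludes to — walk each chain to its terminal vertex — can be sketched in a few lines of plain Python. The `resolve` helper below is purely illustrative; it assumes each vertex has at most one outgoing edge, as in the isolated-chain data above, and the `seen` set stops it on self-loops like "y , y".]

```python
def resolve(pairs):
    """Map every vertex to the terminal vertex of its chain."""
    succ = dict(pairs)  # assumes at most one outgoing edge per vertex
    result = {}
    for v in succ:
        seen = {v}
        cur = succ[v]
        # Follow successors until we reach a vertex with no outgoing
        # edge (a terminal) or loop back to one we've already visited
        while cur in succ and cur not in seen:
            seen.add(cur)
            cur = succ[cur]
        result[v] = cur
    return result

data = [("a", "b"), ("x", "y"), ("b", "c"), ("y", "y"), ("c", "d")]
print(resolve(data))  # a->d, b->d, c->d, x->y, y->y
```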
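[Editor's note: Robin's Pregel suggestion — follow the graph to a terminal vertex and propagate that value back — essentially amounts to iterative pointer jumping: every vertex repeatedly replaces its successor with its successor's successor, converging in O(log chain-length) rounds, which is the kind of per-vertex update Pregel supersteps express well. A minimal non-Spark sketch of the idea, with the illustrative name `pointer_jump`:]

```python
def pointer_jump(succ):
    """Iteratively shortcut each vertex to its successor's successor."""
    succ = dict(succ)
    changed = True
    while changed:
        changed = False
        for v, nxt in succ.items():
            # Jump over nxt if it has a successor of its own; the
            # succ[nxt] != nxt check skips self-loops (like y,y) so
            # the loop terminates
            if nxt in succ and succ[nxt] != nxt:
                succ[v] = succ[nxt]
                changed = True
    return succ

# Guillermo's second example: a->t->k->c, plus the isolated pair b->o
chains = {"a": "t", "b": "o", "t": "k", "k": "c"}
print(pointer_jump(chains))  # a->c, t->c, k->c, b->o
```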