Re: GSoC 2013

Gianmarco De Francisci Morales Tue, 02 Apr 2013 09:21:40 -0700

FYI, Giraph has a Random Walk implementation.

Pig does not support iteration natively, so any iterative algorithm is not
a very good fit for it. Just my 2c.
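
To illustrate the point about iteration, here is a minimal Python sketch of a
random walk with restart, computed by repeated passes over the small example
graph quoted further down. It is not Giraph's or Pig's implementation. The
function name random_walk_with_restart, the restart probability alpha, the
iteration count, and the reading of each row as a directed edge are all
assumptions made for the example. Each pass needs the scores from the previous
pass, so in Pig the loop would have to be driven from outside the script.

```python
from collections import defaultdict

# Edges copied from the example graph file in this thread; treating each
# row as a directed edge (first column -> second column) is an assumption.
EDGES = [(1, 2), (1, 3), (1, 10), (2, 3), (3, 4), (3, 5)]


def random_walk_with_restart(edges, start, alpha=0.15, iterations=20):
    """Approximate visit probabilities of a walk that restarts at `start`.

    alpha is the (assumed) probability of jumping back to `start` at each
    step; walkers at vertices with no outgoing edges also return to `start`.
    """
    out = defaultdict(list)
    vertices = {start}
    for src, dst in edges:
        out[src].append(dst)
        vertices.update((src, dst))

    # All probability mass begins on the start vertex.
    scores = {v: 0.0 for v in vertices}
    scores[start] = 1.0

    # Every pass depends on the scores produced by the previous pass.
    for _ in range(iterations):
        nxt = {v: 0.0 for v in vertices}
        for v, mass in scores.items():
            neighbours = out.get(v)
            if neighbours:
                nxt[start] += alpha * mass
                share = (1.0 - alpha) * mass / len(neighbours)
                for n in neighbours:
                    nxt[n] += share
            else:
                nxt[start] += mass  # dead end: restart the walk
        scores = nxt
    return scores


if __name__ == "__main__":
    result = random_walk_with_restart(EDGES, start=1)
    for vertex, score in sorted(result.items()):
        print(vertex, round(score, 4))
```

The total score stays at 1 after every pass, so the values can be read as
approximate visit probabilities relative to the start vertex.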


Cheers,

--
Gianmarco


On Tue, Apr 2, 2013 at 10:04 AM, burakkk <[email protected]> wrote:

> So what do you suggest? Is it clear?
>
>
> On Mon, Apr 1, 2013 at 9:35 PM, burakkk <[email protected]> wrote:
>
> > I'm using only the WTF graph representation so that the graph fits in
> > memory. By the way, I haven't seen any mention of WTF or graph models on
> > the Pig 0.11 release page.
> > I don't want to use Cassovary. I believe it can be done with Pig. I will
> > implement the graph representation from the WTF paper in Pig and then use
> > it to implement a random walk algorithm. To do that I may need to improve
> > some features such as joins (e.g. fuzzy joins) or implement a new
> > operator. I can implement it using either existing operators or new
> > operators. That's up to us and it doesn't really matter. If there is
> > already an implementation of a random walk algorithm, please feel free to
> > tell me, because I haven't found one.
> > Are you proposing to create an open-source implementation of those
> > algorithms?
> > Yes, I'm proposing to implement a random walk algorithm and a new data
> > model that represents a graph. After that, people can use it when coding
> > in Pig.
> >
> > Do you suggest they should be Pig scripts added to the Pig project, or do
> > you want to create some new operators?
> > Maybe; it can be a UDF or a new operator.
> >
> > I made a quick example. It may not be completely accurate; I've just
> > tried to explain the idea.
> > Suppose you have a graph file like this:
> > user_id follower
> > 1 2
> > 1 3
> > 1 10
> > 2 3
> > 3 4
> > 3 5
> > ...
> >
> > The vertex list is an array containing the sorted vertex ids.
> > The node list is a matrix containing each vertex id and its starting
> > position.
> >
> >
> > graph = LOAD 'graph' USING PigStorage() AS (vertex:int, follower:int);
> > -- load the graph file
> > vertex = GROUP graph BY vertex;
> > list = FOREACH vertex GENERATE org.apache.pig.generateVertex(vertex) AS
> > vertexList; -- load all the vertices from HDFS into memory
> > list = FOREACH graph GENERATE org.apache.pig.generateNode(list) AS
> > nodeList; -- load all the vertices from HDFS into memory
> > randomWalk = FOREACH vertex GENERATE
> > flatten(org.apache.pig.RandomWalk(list, endVertex)) AS score;
> > -- generate a score; using the node list you can traverse the graph to
> > -- your finishing position
> > store...
> >
> >
> > Thanks
> > Best Regards...
> >
> >
> > On Mon, Apr 1, 2013 at 7:20 PM, Dmitriy Ryaboy <[email protected]> wrote:
> >
> >> I'm somewhat familiar with WTF code (my day job is managing the analytics
> >> infrastructure team at Twitter). WTF is implemented using Pig 0.11 (in
> >> fact some of the Pig 11 features/improvements are directly due to this
> >> project...), and mostly has to do with clever algorithms implemented in
> >> Pig (an earlier version of WTF loaded the graph into main memory on
> >> large-mem machines -- that system is open sourced, too, under
> >> github.com/twitter/cassovary). Are you proposing to create an open-source
> >> implementation of those algorithms? Do you suggest they should be Pig
> >> scripts added to the Pig project, or do you want to create some new
> >> operators? I'm not totally sure where you are going here.
> >>
> >> GSoC proposals for Pig are usually made by students who want to work on
> >> issues labeled as GSoC candidates on the Apache JIRA. The students spend
> >> some time to understand the problem stated in the JIRA, familiarize
> >> themselves with the existing codebase, and put a basic technical
> >> implementation plan and schedule into their proposal. Since in this case
> >> you are proposing something we haven't scoped or defined well for
> >> ourselves, we need you to be very clear and specific about what you are
> >> trying to do, and how you plan to go about it. I think that graph
> >> processing in Pig (or other Hadoop-based systems) is a really interesting
> >> topic and there is a lot of work to be done, but we really need you to be
> >> far more detailed to be able to give you good guidance with regard to
> >> GSoC.
> >>
> >> Best,
> >> Dmitriy
> >>
> >>
> >> On Sat, Mar 30, 2013 at 10:12 AM, burakkk <[email protected]> wrote:
> >>
> >> > Sure. We can implement a graph model based on the "WTF: The Who to
> >> > Follow Service at Twitter" article. The article says that this way the
> >> > graph can be stored in a single machine's memory, so every node will
> >> > read the graph from HDFS and cache it in memory. Every node is
> >> > responsible for processing its own bucket of edges; I mean the work can
> >> > be split. Every node can process its bucket using, for instance, a
> >> > random walk algorithm. Finally, the partial results can be reduced to
> >> > get the final results. I hope it's clear :)
> >> >
> >> > Thanks
> >> > Best Regards...
> >> >
> >> >
> >> > On Fri, Mar 29, 2013 at 6:10 PM, Dmitriy Ryaboy <[email protected]> wrote:
> >> >
> >> > > Hi Burakk,
> >> > > The general idea of making graph processing easier is a good one. I'm
> >> > > not sure what exactly you are proposing to do, though. Could you be
> >> > > more detailed about what you are thinking?
> >> > >
> >> > >
> >> > > On Thu, Mar 28, 2013 at 1:28 PM, burakkk <[email protected]> wrote:
> >> > >
> >> > > > Hi,
> >> > > > I might be a little late; I came up with a new idea at the last
> >> > > > minute. I'm currently working on social graph processing, and I
> >> > > > think we can implement a solution for it in Pig. I'm thinking of
> >> > > > applying to GSoC 2013 with this idea so that I can work on it. Is
> >> > > > there a mentor who would do it with me? Are there any
> >> > > > suggestions? :)
> >> > > >
> >> > > > Details:
> >> > > > Of course I can also improve some join operations. I'm not sure
> >> > > > whether there is any implementation of fuzzy joins, for instance.
> >> > > > These are the papers that I found:
> >> > > >
> >> > > > Fuzzy Joins Using MapReduce
> >> > > > http://ilpubs.stanford.edu:8090/1006/
> >> > > >
> >> > > > Dimension independent similarity computation
> >> > > > http://arxiv.org/abs/1206.2082
> >> > > >
> >> > > > MapReduce is Good Enough? If All You Have is a Hammer, Throw Away
> >> > > > Everything That’s Not a Nail!
> >> > > > http://arxiv.org/pdf/1209.2191.pdf
> >> > > >
> >> > > > Large Graph Processing in the Cloud
> >> > > > http://www.ntu.edu.sg/home/bshe/sigmod10_demo.pdf
> >> > > >
> >> > > > ..etc
> >> > > >
> >> > > > Thanks
> >> > > > Best regards..
> >> > > >
> >> > > >
> >> > > > --
> >> > > >
> >> > > > BURAK ISIKLI | http://burakisikli.wordpress.com
> >> > > >
> >> > >
> >> >
> >> >
> >> >
> >> > --
> >> >
> >> > BURAK ISIKLI | http://burakisikli.wordpress.com
> >> >
> >>
> >
> >
> >
> > --
> >
> > BURAK ISIKLI | http://burakisikli.wordpress.com
> >
>
>
>
> --
>
> BURAK ISIKLI | http://burakisikli.wordpress.com
>
