So what do you suggest? Is it clear?
On Mon, Apr 1, 2013 at 9:35 PM, burakkk <burak.isi...@gmail.com> wrote:
> I'm using only the WTF graph representation so that the graph fits in
> memory. By the way, I haven't seen any explanation of WTF or graph
> models on the Pig 0.11 release page.
> I don't want to use Cassovary. I believe it can be done with Pig. I'll
> implement the graph representation from the WTF paper in Pig and then use
> it to implement a random walk algorithm. To do that, I may need to improve
> some features such as joins (fuzzy join), or implement a new operator. I
> can implement it using either existing operators or new operators. That's
> up to us and it doesn't really matter. If there is already an
> implementation of a random walk algorithm, please feel free to tell me,
> because I haven't found one.
>
> Are you proposing to create an open-source implementation of those
> algorithms?
> Yes, I'm proposing to implement a random walk algorithm and a new data
> model that represents a graph. After that, people can use them when
> writing Pig scripts.
>
> Do you suggest they should be Pig scripts added to the Pig project, or do
> you want to create some new operators?
> It could be a UDF or a new operator.
>
> I made a quick example. It may not be completely accurate; I've just tried
> to explain the idea.
> Suppose you have a graph file like this:
> user_id follower
> 1 2
> 1 3
> 1 10
> 2 3
> 3 4
> 3 5
> ...
>
> Vertex List is an array containing the sorted vertex ids.
> Node List is a matrix containing each vertex id and its starting position.
>
>
> -- load the graph file
> graph = LOAD 'graph' USING PigStorage() AS (vertex:int, follower:int);
> vertex = COGROUP graph BY (vertex);
> -- load all the vertices from HDFS into memory
> list = FOREACH vertex GENERATE org.apache.pig.generateVertex(vertex) AS vertexList;
> -- load all the vertices from HDFS into memory
> list = FOREACH graph GENERATE org.apache.pig.generateNode(list) AS nodeList;
> -- generate a score: using the node list you can traverse the graph to
> -- your finishing position
> randomWalk = FOREACH vertex GENERATE FLATTEN(org.apache.pig.RandomWalk(list, endVertex)) AS score;
> store...
>
>
> Thanks
> Best Regards...
>
>
> On Mon, Apr 1, 2013 at 7:20 PM, Dmitriy Ryaboy <dvrya...@gmail.com> wrote:
>
>> I'm somewhat familiar with WTF code (my day job is managing the analytics
>> infrastructure team at Twitter). WTF is implemented using Pig 0.11 (in
>> fact some of the Pig 11 features/improvements are directly due to this
>> project...), and mostly has to do with clever algorithms implemented in
>> Pig (an earlier version of WTF loaded the graph into main memory on
>> large-mem machines -- that system is open sourced, too, under
>> github.com/twitter/cassovary). Are you proposing to create an open-source
>> implementation of those algorithms? Do you suggest they should be Pig
>> scripts added to the Pig project, or do you want to create some new
>> operators? I'm not totally sure where you are going here.
>>
>> GSoC proposals for Pig are usually made by students who want to work on
>> issues labeled as GSoC candidates on the Apache JIRA. The students spend
>> some time to understand the problem stated in the JIRA, familiarize
>> themselves with the existing codebase, and put a basic technical
>> implementation plan and schedule into their proposal.
>> Since in this case you are proposing something we haven't scoped or
>> defined well for ourselves, we need you to be very clear and specific
>> about what you are trying to do, and how you plan to go about it. I think
>> that graph processing in Pig (or other Hadoop-based systems) is a really
>> interesting topic and there is a lot of work to be done, but we really
>> need you to be far more detailed to be able to give you good guidance
>> with regards to GSoC.
>>
>> Best,
>> Dmitriy
>>
>>
>> On Sat, Mar 30, 2013 at 10:12 AM, burakkk <burak.isi...@gmail.com> wrote:
>>
>> > Sure. We can implement a graph model based on the article "WTF: The Who
>> > to Follow Service at Twitter". The article says that this way the
>> > graph can be stored in one machine's memory, so every node will read it
>> > from HDFS and cache the graph in memory. Every node is responsible for
>> > processing its own bucket of edges; I mean the work can be split. Every
>> > node can process its bucket using a random walk algorithm, for
>> > instance. Finally the outputs can be reduced to get the final results.
>> > I hope it's clear :)
>> >
>> > Thanks
>> > Best Regards...
>> >
>> >
>> > On Fri, Mar 29, 2013 at 6:10 PM, Dmitriy Ryaboy <dvrya...@gmail.com>
>> > wrote:
>> >
>> > > Hi Burakk,
>> > > The general idea of making graph processing easier is a good one. I'm
>> > > not sure what exactly you are proposing to do, though. Could you be
>> > > more detailed about what you are thinking?
>> > >
>> > >
>> > > On Thu, Mar 28, 2013 at 1:28 PM, burakkk <burak.isi...@gmail.com> wrote:
>> > >
>> > > > Hi,
>> > > > I might be a little bit late. I came up with a new idea at the last
>> > > > minute. Currently I'm working on social graph processing. I think we
>> > > > can implement a solution for Pig. With this idea I'm thinking of
>> > > > applying to GSoC 2013 so that I can work on some tasks around it. Is
>> > > > there a mentor who could do it with me? Is there any suggestion?
>> > > > :)
>> > > >
>> > > > Details:
>> > > > Of course I can improve some join operations. I'm not sure whether
>> > > > there is any implementation of fuzzy joins, for instance. These are
>> > > > the papers that I found:
>> > > >
>> > > > Fuzzy Joins Using MapReduce
>> > > > http://ilpubs.stanford.edu:8090/1006/
>> > > >
>> > > > Dimension independent similarity computation
>> > > > http://arxiv.org/abs/1206.2082
>> > > >
>> > > > MapReduce is Good Enough? If All You Have is a Hammer, Throw Away
>> > > > Everything That’s Not a Nail!
>> > > > http://arxiv.org/pdf/1209.2191.pdf
>> > > >
>> > > > Large Graph Processing in the Cloud
>> > > > http://www.ntu.edu.sg/home/bshe/sigmod10_demo.pdf
>> > > >
>> > > > ..etc
>> > > >
>> > > > Thanks
>> > > > Best regards..
>> > > >
>> > > >
>> > > > --
>> > > > BURAK ISIKLI | http://burakisikli.wordpress.com
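
For readers following the thread, the vertex list / node list layout from the quoted example, and a walk over it, can be sketched outside Pig. This is only an illustration in plain Python under several assumptions, not the WTF implementation and not an existing Pig feature: the node list is read as rows of (vertex id, starting offset), the vertex list as the followers of each vertex stored contiguously and sorted, a walk moves from a user to one of its followers, and the score is the visit frequency of a random walk with restart. The function names, the `restart` parameter, and the `seed` are hypothetical, and only the six sample rows shown in the thread are used.

```python
import random


def build_graph(edges):
    """Build (vertex_list, node_list) from (vertex, follower) pairs."""
    adjacency = {}
    for vertex, follower in edges:
        adjacency.setdefault(vertex, []).append(follower)
        adjacency.setdefault(follower, [])  # keep vertices with no followers
    vertex_list = []  # follower ids, contiguous per vertex (assumed layout)
    node_list = []    # rows of (vertex_id, start offset into vertex_list)
    for vertex in sorted(adjacency):
        node_list.append((vertex, len(vertex_list)))
        vertex_list.extend(sorted(adjacency[vertex]))
    return vertex_list, node_list


def neighbours(vertex_list, node_list, index):
    """Return the followers of the vertex stored in row `index` of node_list."""
    start = node_list[index][1]
    if index + 1 < len(node_list):
        end = node_list[index + 1][1]
    else:
        end = len(vertex_list)
    return vertex_list[start:end]


def random_walk_scores(edges, start_vertex, steps, restart=0.15, seed=0):
    """Score each vertex by how often a walk with restart visits it."""
    vertex_list, node_list = build_graph(edges)
    row_of = {vertex: i for i, (vertex, _) in enumerate(node_list)}
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    visits = {vertex: 0 for vertex in row_of}
    current = start_vertex
    for _ in range(steps):
        visits[current] += 1
        out = neighbours(vertex_list, node_list, row_of[current])
        # Return to the start on a dead end or with probability `restart`.
        if not out or rng.random() < restart:
            current = start_vertex
        else:
            current = rng.choice(out)
    return {vertex: count / steps for vertex, count in visits.items()}


# The six sample rows from the thread (the rest of the file is elided there).
SAMPLE_EDGES = [(1, 2), (1, 3), (1, 10), (2, 3), (3, 4), (3, 5)]

if __name__ == "__main__":
    print(build_graph(SAMPLE_EDGES))
    print(random_walk_scores(SAMPLE_EDGES, start_vertex=1, steps=1000))
```

Keeping the followers in one flat array with per-vertex offsets avoids one list object per vertex, which is the kind of compactness the "fit in memory" argument in the thread relies on. How this would map onto Pig UDFs or a new operator is exactly the open question in the discussion above.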