So what do you suggest? Is it clear?

On Mon, Apr 1, 2013 at 9:35 PM, burakkk <burak.isi...@gmail.com> wrote:

> I'm using only the WTF graph representation so that the graph fits in memory.
> By the way, I haven't seen any explanation of WTF or graph models on the
> Pig 0.11 release page.
> I don't want to use Cassovary. I believe it can be done with Pig. I will
> implement a graph representation in Pig based on the WTF paper and then use
> it to implement a random walk algorithm. To do that I may need to improve
> some features such as joins (fuzzy join, etc.) or implement a new operator. I
> can implement it using either existing operators or new operators; that's
> up to us and it doesn't really matter. If there is already an implementation
> of a random walk algorithm, please feel free to tell me, because I haven't
> found one.
>> Are you proposing to create an open-source implementation of those
>> algorithms?
> Yes, I'm proposing to implement a random walk algorithm and a new data model
> that represents a graph. After that, people can use them when writing Pig
> scripts.
>
>> Do you suggest they should be Pig scripts added to the Pig project, or do
>> you want to create some new operators?
> Maybe; it can be a UDF or a new operator.
>
> I made a quick example. It may not be completely accurate; I've just tried
> to explain the idea.
> Suppose you have a graph file like this:
> user_id follower
> 1 2
> 1 3
> 1 10
> 2 3
> 3 4
> 3 5
> ...
>
> The vertex list is an array of sorted vertex ids.
> The node list is a matrix of vertex ids and their starting positions.
>
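The vertex list and node list described above can be read as a compressed adjacency layout. Here is a minimal Python sketch under that assumption; the thread does not pin down the exact layout, so treat this as one plausible reading. It stores all follower ids in one flat array, grouped by source vertex, plus a table of (vertex id, starting position) rows. The names `build_graph` and `followers` are hypothetical and are not part of Pig or of the WTF paper.

```python
from bisect import bisect_left
from collections import defaultdict


def build_graph(edges):
    """Build (vertex_list, node_list) from (vertex, follower) pairs.

    vertex_list: flat list of follower ids, grouped by source vertex in
                 ascending order; each group is sorted.
    node_list:   list of (vertex_id, start) rows, where start is the offset
                 of that vertex's first follower in vertex_list.
    This layout is an assumption made for illustration.
    """
    adjacency = defaultdict(list)
    for vertex, follower in edges:
        adjacency[vertex].append(follower)

    vertex_list = []
    node_list = []
    for vertex in sorted(adjacency):
        node_list.append((vertex, len(vertex_list)))
        vertex_list.extend(sorted(adjacency[vertex]))
    return vertex_list, node_list


def followers(vertex_list, node_list, vertex):
    """Return the followers of `vertex`, or [] if it has no outgoing edges."""
    # (vertex,) sorts just before (vertex, start), so this finds the row.
    i = bisect_left(node_list, (vertex,))
    if i == len(node_list) or node_list[i][0] != vertex:
        return []
    start = node_list[i][1]
    # The group ends where the next vertex's group starts.
    end = node_list[i + 1][1] if i + 1 < len(node_list) else len(vertex_list)
    return vertex_list[start:end]


if __name__ == "__main__":
    sample = [(1, 2), (1, 3), (1, 10), (2, 3), (3, 4), (3, 5)]
    vl, nl = build_graph(sample)
    print(vl, nl, followers(vl, nl, 3))
```

Two flat arrays avoid per-vertex object overhead, which matters when the goal is to fit the whole graph in one machine's memory.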
>
> -- load the graph file
> graph = LOAD 'graph' USING PigStorage() AS (vertex:int, follower:int);
> vertex = COGROUP graph BY (vertex);
> -- load all the vertices from HDFS into memory
> vertexList = FOREACH vertex GENERATE org.apache.pig.generateVertex(vertex) AS
> vertexList;
> -- build the node list (vertex id and its starting position)
> nodeList = FOREACH graph GENERATE org.apache.pig.generateNode(vertexList) AS
> nodeList;
> -- generate a score: using the node list you can traverse the graph to
> -- your finishing position (endVertex)
> randomWalk = FOREACH vertex GENERATE
> flatten(org.apache.pig.RandomWalk(nodeList, endVertex)) AS score;
> store...
>
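The scoring step in the example above is left open, so here is a minimal Python sketch of one common choice: a random walk with restart, where each vertex's score is the fraction of steps the walk spends on it. This is an assumption for illustration, not the algorithm the thread or the WTF paper specifies. The function name `random_walk_scores` and the parameters `steps`, `restart`, and `seed` are hypothetical.

```python
import random
from collections import Counter


def random_walk_scores(adjacency, start, steps=10_000, restart=0.15, seed=0):
    """Estimate visit frequencies of a random walk with restart.

    adjacency: dict mapping a vertex id to a list of neighbor ids, following
               the edge direction stored in the graph file.
    start:     vertex the walk begins at and jumps back to on restart.
    Returns a dict {vertex: fraction of steps spent on that vertex}.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs give the same result
    visits = Counter()
    current = start
    for _ in range(steps):
        neighbors = adjacency.get(current, [])
        # Jump back to the start at dead ends, or with probability `restart`.
        if not neighbors or rng.random() < restart:
            current = start
        else:
            current = rng.choice(neighbors)
        visits[current] += 1
    return {vertex: count / steps for vertex, count in visits.items()}


if __name__ == "__main__":
    graph = {1: [2, 3, 10], 2: [3], 3: [4, 5]}
    scores = random_walk_scores(graph, start=1)
    for vertex, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(vertex, round(score, 3))
```

In a distributed setting, each worker could run such walks for its own bucket of start vertices and the per-vertex counts could be summed in a reduce step.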
>
> Thanks
> Best Regards...
>
>
> On Mon, Apr 1, 2013 at 7:20 PM, Dmitriy Ryaboy <dvrya...@gmail.com> wrote:
>
>> I'm somewhat familiar with WTF code (my day job is managing the analytics
>> infrastructure team at Twitter). WTF is implemented using Pig 0.11 (in fact
>> some of the Pig 0.11 features/improvements are directly due to this
>> project...), and mostly has to do with clever algorithms implemented in Pig
>> (an earlier version of WTF loaded the graph into main memory on large-mem
>> machines -- that system is open sourced, too, under
>> github.com/twitter/cassovary). Are you proposing to create an open-source
>> implementation of those algorithms? Do you suggest they should be Pig
>> scripts added to the Pig project, or do you want to create some new
>> operators? I'm not totally sure where you are going here.
>>
>> GSoC proposals for Pig are usually made by students who want to work on
>> issues labeled as GSoC candidates on the apache jira. The students spend
>> some time to understand the problem stated in the jira, familiarize
>> themselves with the existing codebase, and put a basic technical
>> implementation plan and schedule into their proposal. Since in this case
>> you are proposing something we haven't scoped or defined well for
>> ourselves, we need you to be very clear and specific about what you are
>> trying to do, and how you plan to go about it. I think that Graph
>> processing in Pig (or other Hadoop-based systems) is a really interesting
>> topic and there is a lot of work to be done, but we really need you to be
>> far more detailed to be able to give you good guidance with regards to
>> GSoC.
>>
>> Best,
>> Dmitriy
>>
>>
>> On Sat, Mar 30, 2013 at 10:12 AM, burakkk <burak.isi...@gmail.com> wrote:
>>
>> > Sure. We can implement a graph model based on the "WTF: The Who to Follow
>> > Service at Twitter" article. The article says that in this way the
>> > graph can be stored in one machine's memory, so every node will read it
>> > from HDFS and cache the graph in memory. Every node is responsible for
>> > processing its own bucket of edges; I mean the work can be split. Every
>> > node can process its bucket using a random walk algorithm, for instance.
>> > Finally, the partial results can be reduced to get the final results. I
>> > hope it's clear :)
>> >
>> > Thanks
>> > Best Regards...
>> >
>> >
>> > On Fri, Mar 29, 2013 at 6:10 PM, Dmitriy Ryaboy <dvrya...@gmail.com>
>> > wrote:
>> >
>> > > Hi Burakk,
>> > > The general idea of making graph processing easier is a good one. I'm not
>> > > sure what exactly you are proposing to do, though. Could you be more
>> > > detailed about what you are thinking?
>> > >
>> > >
>> > > On Thu, Mar 28, 2013 at 1:28 PM, burakkk <burak.isi...@gmail.com>
>> wrote:
>> > >
>> > > > Hi,
>> > > > I might be a little late; I came up with a new idea at the last
>> > > > minute. Currently I'm working on social graph processing. I think we
>> > > > can implement a solution for Pig. With this idea I'm thinking of
>> > > > applying to GSoC 2013 so that I can work on some tasks related to it.
>> > > > Is there a mentor who could work on it with me? Are there any
>> > > > suggestions? :)
>> > > >
>> > > > Details:
>> > > > Of course I can improve some join operations. I'm not sure whether
>> > > > there is any implementation of fuzzy joins, for instance. These are
>> > > > the papers that I found:
>> > > >
>> > > > Fuzzy Joins Using MapReduce
>> > > > http://ilpubs.stanford.edu:8090/1006/
>> > > >
>> > > > Dimension independent similarity computation
>> > > > http://arxiv.org/abs/1206.2082
>> > > >
>> > > > MapReduce is Good Enough? If All You Have is a Hammer, Throw Away
>> > > > Everything That’s Not a Nail!
>> > > > http://arxiv.org/pdf/1209.2191.pdf
>> > > >
>> > > > Large Graph Processing in the Cloud
>> > > > http://www.ntu.edu.sg/home/bshe/sigmod10_demo.pdf
>> > > >
>> > > > ..etc
>> > > >
>> > > > Thanks
>> > > > Best regards..
>> > > >
>> > > >
>> > > > --
>> > > >
>> > > > BURAK ISIKLI | http://burakisikli.wordpress.com
>> > > >
>> > >
>> >
>> >
>> >
>> >
>>
>
>
>
>



-- 

BURAK ISIKLI | http://burakisikli.wordpress.com
