Re: evaluating Hama

Raghava Mutharaju Sun, 12 Sep 2010 21:38:39 -0700

Hi Edward,

Thank you for the reply. I will look into BSPLib. I am also looking into the
Parallel Boost Graph Library. The discussion at
http://stackoverflow.com/questions/3010805/scalable-parallel-large-graph-analysis-library
is useful to me. I am also looking at the libraries mentioned there as
alternatives :).


Regards,
Raghava.

On Sun, Sep 12, 2010 at 9:25 PM, Edward J. Yoon <[email protected]> wrote:

> I estimate one to two months for the 0.2.0 release.
>
> If an input/output system and a fault-tolerance mechanism are added to the
> BSP package in the future, a graph-specific programming model and
> framework will be easy to implement. I guess we can implement the
> input/output system and the FT mechanism within this year.
>
> > Do you know of any alternate parallel graph processing frameworks
> > similar to Pregel & Hama?
>
> Nope, but I have used BSPLib to simulate the Pregel concept. It might
> be a good solution for you.
>
> > About the data split -- if, for example, there are 10 nodes in the cluster
> > and the data is divided into 10 splits (split-1 to split-10), then can we
> > control which split goes to which node as local data? In the case of MR
> > splits, this cannot be controlled, can it? Can we do that here?
>
> I understand your question.
>
> It depends on how the data structure is designed and on how the data is
> stored, organized, and re-used "somewhere". We don't have a plan for a
> graph data store yet.
>
> Thanks. :)
>
> On Fri, Sep 10, 2010 at 1:38 PM, Raghava Mutharaju
> <[email protected]> wrote:
> > Hello Edward,
> >
> > Thank you for the reply. Please correct me if I am wrong about what I am
> > going to say.
> >
> > If the BSP computing framework is in place, how much more work would it
> > be to place a graph processing framework on top of it? I guess some parts
> > of the graph processing framework (Angrapa) are in place?
> >
> > While I was searching for parallel graph processing frameworks, I came
> > across Pregel and also Hama :). Pregel's development must have taken a lot
> > of time, and Hama is just starting out, so it would be unrealistic to expect
> > it to be as robust, with as many features, as Pregel. Still, it would be
> > great to have something in place to test out my ideas.
> >
> > When is the release of 0.2.0 scheduled?
> >
> > Do you know of any alternate parallel graph processing frameworks
> > similar to Pregel & Hama?
> >
> > About the data split -- if, for example, there are 10 nodes in the cluster
> > and the data is divided into 10 splits (split-1 to split-10), then can we
> > control which split goes to which node as local data? In the case of MR
> > splits, this cannot be controlled, can it? Can we do that here?
> >
> > Thank you.
> >
> > Regards,
> > Raghava.
> >
> > On Thu, Sep 9, 2010 at 10:47 PM, Edward J. Yoon <[email protected]> wrote:
> >
> >> Hello,
> >>
> >> > 1) What is the status of the project, specifically the graph processing
> >> > part (Angrapa?). Is it sufficiently stable to be used? Although this is an
> >> > academic research project, it would be better to work on a stable one.
> >>
> >> At present, we're focusing on a framework for more general-purpose
> >> BSP computing, so we are still far from a graph processing framework
> >> such as Google Pregel.
> >>
> >> We have a release plan for version 0.2.0 and we're working on it. The
> >> 0.2.0 release will include:
> >>
> >>  * BSP computing framework (no fault-tolerance mechanism, no data
> >> input-output API)
> >>  * and its examples
> >>
> >> > 2) I haven't come across any installation/building steps for Hama.
> >> > How to integrate with HDFS/HBase?
> >>
> >> We'll create an input-output system that can be used to process data.
> >> You can think of it as an M/R computing framework on HDFS/HBase.
> >>
> >> > 3) Are there more extensive performance tests, say w.r.t. the latest
> >> > branch of development? Do they have better performance?
> >>
> >> Not yet.
> >>
> >> > 4) Can the data assigned to each partition (cluster) be split according
> >> > to some condition, i.e. can it be controlled unlike an MR split?
> >>
> >> Do you mean whether it can assign tasks to slaves according to some
> >> other condition (not based on locality)? If so, no.
> >>
> >> All splits should be loaded and computed locally. Otherwise, it
> >> will cause a huge, pointless data-copy overhead among servers.
> >>
> >> Thanks :)
> >>
> >> On Fri, Sep 10, 2010 at 7:09 AM, Raghava Mutharaju
> >> <[email protected]> wrote:
> >> > Hi all,
> >> >
> >> > I am working on a research project where I faced the issues that formed
> >> > the motivation for Hama (Hamburg) -- the splits in the data depend on
> >> > each other, and data locality is a problem across multiple MR iterations.
> >> > I was thinking of checking other alternatives to MR when I came across
> >> > Hama. I am in the process of checking whether Hama would fit our project
> >> > needs and I need your help in that regard.
> >> >
> >> > I am interested in the graph processing part of Hama.
> >> >
> >> > I have the following questions:
> >> >
> >> > 1) What is the status of the project, specifically the graph processing
> >> > part (Angrapa?). Is it sufficiently stable to be used? Although this is an
> >> > academic research project, it would be better to work on a stable one.
> >> > 2) I haven't come across any installation/building steps for Hama.
> >> > How to integrate with HDFS/HBase?
> >> > 3) Are there more extensive performance tests, say w.r.t. the latest
> >> > branch of development? Do they have better performance?
> >> > 4) Can the data assigned to each partition (cluster) be split according
> >> > to some condition, i.e. can it be controlled unlike an MR split?
> >> >
> >> > Thank you.
> >> >
> >> > Regards,
> >> > Raghava.
> >> >
> >>
> >>
> >>
> >> --
> >> Best Regards, Edward J. Yoon
> >> [email protected]
> >> http://blog.udanax.org
> >>
> >
>
>
>
> --
> Best Regards, Edward J. Yoon
> [email protected]
> http://blog.udanax.org
>

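For readers following the split-placement question in this thread, here is a
minimal sketch of the idea being discussed: each split is pinned to one node,
every node computes only on its local splits, and results are exchanged in a
single BSP superstep (local compute, send, barrier). This is a single-process
simulation, not Hama's API; the function names, the `placement` mapping, and
the node/split labels are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def load_local_splits(splits, placement):
    """Group split data by the node each split is pinned to (hypothetical helper)."""
    local = defaultdict(list)
    for split_id, node in placement.items():
        local[node].extend(splits[split_id])
    return dict(local)


def bsp_global_sum(splits, placement):
    """Simulate one BSP superstep: local compute, message exchange, barrier."""
    local = load_local_splits(splits, placement)
    nodes = sorted(local)

    # 1. Local computation: each node touches only its own splits.
    partial = {node: sum(local[node]) for node in nodes}

    # 2. Communication: each node sends its partial sum to every peer.
    inbox = {node: [] for node in nodes}
    for sender in nodes:
        for receiver in nodes:
            if receiver != sender:
                inbox[receiver].append(partial[sender])

    # 3. Barrier: inboxes are read only after all sends have completed.
    return {node: partial[node] + sum(inbox[node]) for node in nodes}


# Assumed example data: three splits explicitly placed on two nodes.
splits = {"split-1": [1, 2], "split-2": [3], "split-3": [4, 5]}
placement = {"split-1": "node-a", "split-2": "node-b", "split-3": "node-a"}
totals = bsp_global_sum(splits, placement)
```

Only the small partial sums cross node boundaries here; the split data itself
never moves, which is the locality point made in the replies above.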