Re: Re: Loading binary file in Hama (with graph API)

Suraj Menon Fri, 03 May 2013 17:44:13 -0700

Hi Jiwon,

In 0.6.0, Hama supported all input formats in loadVertices. In 0.6.1, the new
partitioner restricted GraphJobRunner to reading only the sequence file
format. The record converter was (painfully) implemented to bridge this gap,
and it should go away once we have a better partitioning design. Today,
VertexInputReader#parseVertex is called inside
VertexInputReader#convertRecord; VertexInputReader is a RecordConverter. So
it reads the vertices in your format and feeds GraphJobRunner the vertices
written in sequence file format. This does duplicate the data on the first
run, but you can reuse the partitions it created in the next job submission
without any converter/VertexInputReader.
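
To make the flow above concrete, here is a rough, self-contained sketch in
plain Java. It does not use Hama's classes: SimpleVertex, Converter,
InputReader, TabSeparatedReader, partition() and loadVertices() are all
hypothetical stand-ins, and in-memory lists take the place of the partition
files. It only illustrates the idea that parsing happens once, inside the
converter during partitioning, and that the loading step afterwards reads
already-converted records.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ConverterSketch {

    /** Simplified stand-in for a vertex record (hypothetical, not Hama's Vertex). */
    static final class SimpleVertex {
        final String id;
        final List<String> edges;

        SimpleVertex(String id, List<String> edges) {
            this.id = id;
            this.edges = edges;
        }
    }

    /** Stand-in for the record converter role: raw record in, vertex out. */
    interface Converter {
        SimpleVertex convertRecord(String rawRecord);
    }

    /** Stand-in for the input reader role: convertRecord delegates to parseVertex. */
    abstract static class InputReader implements Converter {
        abstract SimpleVertex parseVertex(String rawRecord);

        @Override
        public SimpleVertex convertRecord(String rawRecord) {
            return parseVertex(rawRecord);
        }
    }

    /** Example user format (assumed): "vertexId<TAB>neighbour<TAB>neighbour...". */
    static final class TabSeparatedReader extends InputReader {
        @Override
        SimpleVertex parseVertex(String rawRecord) {
            String[] fields = rawRecord.split("\t");
            List<String> edges =
                new ArrayList<>(Arrays.asList(fields).subList(1, fields.length));
            return new SimpleVertex(fields[0], edges);
        }
    }

    /** Partitioning step: the only place where the converter is invoked. */
    static List<List<SimpleVertex>> partition(List<String> rawInput,
                                              Converter converter,
                                              int numPartitions) {
        List<List<SimpleVertex>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
        for (String raw : rawInput) {
            SimpleVertex v = converter.convertRecord(raw);
            // Hash partitioning on the vertex id; floorMod keeps the index non-negative.
            int target = Math.floorMod(v.id.hashCode(), numPartitions);
            partitions.get(target).add(v);
        }
        return partitions;
    }

    /** Loading step: reads converted records only, no parser involved. */
    static List<SimpleVertex> loadVertices(List<SimpleVertex> partition) {
        return new ArrayList<>(partition);
    }

    public static void main(String[] args) {
        List<String> raw = Arrays.asList("a\tb\tc", "b\tc", "c");
        List<List<SimpleVertex>> parts = partition(raw, new TabSeparatedReader(), 2);
        for (int i = 0; i < parts.size(); i++) {
            System.out.println("partition " + i + ": "
                + loadVertices(parts.get(i)).size() + " vertices");
        }
    }
}
```

In this sketch the lists returned by partition() can be passed to
loadVertices() again without a reader, which mirrors reusing the partitions
in a later job submission.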


Hi Ikhtiyor,

Sounds like a good plan. Please create a JIRA issue and suggest/implement
some more code refactoring for the purpose.

Hi Edward,
No one likes this stop-gap solution. :)

Regards,
Suraj





On Fri, May 3, 2013 at 7:59 PM, Ikhtiyor Ahmedov <[email protected]> wrote:

> As a user, I would like to add that Apache Gora may be a good solution
> for integrating with NoSQL stores.
> On May 4, 2013 8:47 AM, "Edward J. Yoon" <[email protected]> wrote:
>
> > PartitioningRunner rewrites the records (converted to VertexWritable) to
> > particular partition files, and then GraphJobRunner reads just
> > VertexWritable.
> >
> > To Hama devs,
> >
> > BTW, I hadn't really thought about 'Range Partitioning' and
> > 'integration with NoSQLs' until just now. And I just found my old
> > opinion[1] on record converter. I didn't like 'Record converter'.
> >
> > 1. http://markmail.org/message/ol32pp2ixfazcxfc
> >
> > On Sat, May 4, 2013 at 7:36 AM, Jiwon Seo <[email protected]> wrote:
> > > Edward, thanks for your reply.
> > >
> > > Right, I checked that PartitioningRunner is the only place that calls the
> > > convertRecord method.
> > >
> > > However, it is not clear how that class is related to the GraphJobRunner
> > > class.
> > > The loadVertices() method in the GraphJobRunner does not call the
> > > convertRecord method as in PartitioningRunner::bsp().
> > >
> > > Is the GraphJobRunner::loadVertices() not used for loading vertices?
> > > If it is used, how is it related to PartitioningRunner::bsp()? It would be
> > > helpful to know the (rough) call stack from PartitioningRunner to
> > > GraphJobRunner (or vice versa).
> > >
> > > Thanks,
> > >
> > > -Jiwon
> > >
> > >> Hi Mr.Seo,
> > >>
> > >> Please look at the VertexInputReader.convertRecord() method. See also the
> > >> PartitioningRunner and RecordConverter classes[1].
> > >>
> > >> 1. http://svn.apache.org/repos/asf/hama/trunk/core/src/main/java/org/apache/hama/bsp/PartitioningRunner
> > >>
> > >>On Fri, May 3, 2013 at 5:49 PM, Jiwon Seo <[email protected]> wrote:
> > >>> Hi,
> > >>>
> > >>> I'm trying to understand how vertex loading is done in Hama.
> > >>>
> > >>> The part that I don't understand is the relation between VertexInputReader
> > >>> and InputFormat.
> > >>>
> > >>> As far as I understand, VertexInputReader.parseVertex is the method to
> > >>> initialize each vertex, but it is not clear where the function is called in
> > >>> Hama 0.6.1.
> > >>> In Hama 0.6.0, the parseVertex function is explicitly called inside
> > >>> GraphJobRunner::loadVertices, but in Hama 0.6.1, it is replaced with
> > >>> peer.readNext(vertex, NullWritable.get()), and parseVertex does not seem to
> > >>> get called. Where is the function called?
> > >>>
> > >>> Thanks,
> > >>>
> > >>> -Jiwon
> >
> >
> >
> > --
> > Best Regards, Edward J. Yoon
> > @eddieyoon
> >
>
