Hi Jiwon,

In 0.6.0, Hama supported all input formats in loadVertices. In 0.6.1, the new partitioner restricted GraphJobRunner to reading only the sequential file format. The record converter was (painfully) implemented to bridge this gap, and it should be eliminated once we have a better partitioning design.

Today, VertexInputReader#parseVertex is called inside VertexInputReader#convertRecord, and VertexInputReader is a RecordConverter. So it reads vertices in your format and feeds GraphJobRunner the vertices written in sequential file format. This does duplicate the data the first time. In the next job submission you can reuse the partitions already created, without any converter/VertexInputReader.
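The flow described above can be sketched roughly as follows. This is only an illustration, not Hama's actual code: SimpleVertex, Converter, Reader, TabSeparatedReader, partition and loadVertexIds are hypothetical simplified stand-ins, and the in-memory lists stand in for the partition files the real partitioner writes. The point it tries to show is that parsing happens only in the partitioning step (through convertRecord), while the loading step just reads vertices that were already converted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: none of these types are Hama's real classes.
class VertexLoadingSketch {

    /** Hypothetical stand-in for a converted vertex: an id plus neighbor ids. */
    static final class SimpleVertex {
        final String id;
        final List<String> neighbors;

        SimpleVertex(String id, List<String> neighbors) {
            this.id = id;
            this.neighbors = neighbors;
        }
    }

    /** Hypothetical stand-in for the record converter role. */
    interface Converter {
        SimpleVertex convertRecord(String rawLine);
    }

    /** Plays the vertex input reader role: convertRecord delegates to parseVertex. */
    static abstract class Reader implements Converter {
        abstract SimpleVertex parseVertex(String rawLine);

        @Override
        public SimpleVertex convertRecord(String rawLine) {
            return parseVertex(rawLine);
        }
    }

    /** Assumed example user format: "id<TAB>n1,n2,...". */
    static final class TabSeparatedReader extends Reader {
        @Override
        SimpleVertex parseVertex(String rawLine) {
            String[] parts = rawLine.split("\t", 2);
            List<String> neighbors = new ArrayList<>();
            if (parts.length == 2 && !parts[1].isEmpty()) {
                neighbors.addAll(Arrays.asList(parts[1].split(",")));
            }
            return new SimpleVertex(parts[0], neighbors);
        }
    }

    /** Partitioning step: the only place where convertRecord is invoked. */
    static List<List<SimpleVertex>> partition(List<String> rawLines,
                                              Converter converter,
                                              int numPartitions) {
        List<List<SimpleVertex>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
        for (String line : rawLines) {
            SimpleVertex vertex = converter.convertRecord(line);
            // Simple hash partitioning on the vertex id.
            int target = Math.floorMod(vertex.id.hashCode(), numPartitions);
            partitions.get(target).add(vertex);
        }
        return partitions;
    }

    /** Loading step: reads already converted vertices, no parsing here. */
    static List<String> loadVertexIds(List<SimpleVertex> partition) {
        List<String> ids = new ArrayList<>();
        for (SimpleVertex vertex : partition) {
            ids.add(vertex.id);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<String> raw = Arrays.asList("a\tb,c", "b\ta", "c\t");
        List<List<SimpleVertex>> partitions =
            partition(raw, new TabSeparatedReader(), 2);
        for (int i = 0; i < partitions.size(); i++) {
            System.out.println("partition " + i + ": "
                + loadVertexIds(partitions.get(i)));
        }
    }
}
```

In this sketch a second run could call loadVertexIds on the stored partitions directly, which mirrors the point about reusing partitions without a converter.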
Hi Ikhtiyor,

Sounds like a good plan. Please create a JIRA issue and suggest/implement some more code refactoring for the purpose.

Hi Edward,

No one likes this stop-gap solution. :)

Regards,
Suraj

On Fri, May 3, 2013 at 7:59 PM, Ikhtiyor Ahmedov <[email protected]> wrote:

> As a user would like to add, maybe considering Apache Gora is good solution
> for integrating with NoSQLs
> On May 4, 2013 8:47 AM, "Edward J. Yoon" <[email protected]> wrote:
>
> > PartitioningRunner rewrites (converted to VertexWritable) records to
> > particular partition files. and then, GraphJobRunner reads just
> > VertexWritable.
> >
> > To Hama devs,
> >
> > BTW, I hadn't really thought about 'Range Partitioning' and
> > 'integration with NoSQLs' until just now. And I just found my old
> > opinion[1] on record converter. I didn't like 'Record converter'.
> >
> > 1. http://markmail.org/message/ol32pp2ixfazcxfc
> >
> > On Sat, May 4, 2013 at 7:36 AM, Jiwon Seo <[email protected]> wrote:
> > > Edward, thanks for your reply.
> > >
> > > Right, I checked that PartitioningRunner is the only place that calls the
> > > convertRecord method.
> > >
> > > However, it is not clear how that class is related to the GraphJobRunner
> > > class.
> > > The loadVertices() method in the GraphJobRunner does not call the
> > > convertRecord method as in PartitioningRunner::bsp().
> > >
> > > Is the GraphJobRunner::loadVertices() not used for loading vertices?
> > > If it is used, how is it related to PartitioningRunner::bsp()? It would be
> > > helpful to know the (rough) call stack from PartitioningRunner to
> > > GraphJobRunner (or vice versa).
> > >
> > > Thanks,
> > >
> > > -Jiwon
> > >
> > >> Hi Mr.Seo,
> > >>
> > >> Please look at VertexInputReader.convertRecord() method. see also
> > >> PartitioningRunner and RecordConverter classes[1].
> > >>
> > >> 1. http://svn.apache.org/repos/asf/hama/trunk/core/src/main/java/org/apache/hama/bsp/PartitioningRunner
> > >>
> > >> On Fri, May 3, 2013 at 5:49 PM, Jiwon Seo <[email protected]> wrote:
> > >>> Hi,
> > >>>
> > >>> I'm trying to understand how vertex loading is done in hama.
> > >>>
> > >>> The part that I don't understand is, the relation between VertexInputReader
> > >>> and InputFormat.
> > >>>
> > >>> As far as I understand, VertexInputReader.parseVertex is the method to
> > >>> initialize each vertex, but it is not clear where the function is called in
> > >>> Hama 0.6.1.
> > >>> In Hama 0.6.0, the parseVertex function is explicitly called inside
> > >>> GraphJobRunner::loadVertices, but in Hama 0.6.1, it is replaced with
> > >>> peer.readNext(vertex, NullWritable.get()), and parseVertex does not seem to
> > >>> get called. Where is the function called?
> > >>>
> > >>> Thanks,
> > >>>
> > >>> -Jiwon
> >
> > --
> > Best Regards, Edward J. Yoon
> > @eddieyoon
