[
https://issues.apache.org/jira/browse/GIRAPH-277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427782#comment-13427782
]
Brian Femiano commented on GIRAPH-277:
--------------------------------------
I'm in favor of anything that makes subclassing I/O formats easier on the
users. I've implemented my own Giraph jobs about 10-12 times already and have
only been able to recycle I/O formats once or twice. The combination of
id/value/message and the variation I require across jobs usually forces a new
format from scratch. This is just my own observation based on my data. The work
going on to separate messages from vertices will help quite a bit with regard
to this.
In the spirit of this thread, I could definitely refactor
HBaseVertexInputFormat to be more friendly towards subclassing, as this patch
shows. I particularly like the generic propagation down to the nested record
reader. Having to retype all the generic parameters gets old. I should also
make some subclasses for HBaseVertexInputFormat that let people define
column-families for vertex values and other handy behavior. It might help with
adoption and code reuse.
> Text Vertex Input/Output Format base classes overhaul
> -----------------------------------------------------
>
> Key: GIRAPH-277
> URL: https://issues.apache.org/jira/browse/GIRAPH-277
> Project: Giraph
> Issue Type: Improvement
> Components: examples, lib
> Reporter: Jaeho Shin
> Attachments: GIRAPH-277.patch
>
>
> The current way of implementing {{VertexInputFormat}} and {{VertexReader}}
> had a bad smell. It required users to understand how these two classes are
> glued together, and forced similar code to be duplicated in every new input
> format. (Similarly for the VertexOutputFormat and VertexWriter.) Anyone who
> wants to create a new format has to create an underlying record reader or
> writer at the right moment and delegate some calls to it, which exposes
> unnecessary detail. Besides, type parameters had to appear all
> over the code of every new format, which was extremely annoying both when
> reading existing code and when writing new code. I was very frustrated writing
> my first format, especially when I compared it to writing a new vertex. I
> thought writing a new input/output format should be as simple as writing a vertex.
> So, I have refactored {{TextVertexInputFormat}} and {{OutputFormat}} into new
> forms that keep the same interfaces but remove a lot of the burden
> of subclassing. Instead of providing static VertexReader base classes, I
> made the reader a non-static inner class of its format class, which helps
> eliminate the repeated code for gluing together these two already tightly
> coupled classes. This has the additional advantage of eliminating all the
> generic type variables on the VertexReader side, which makes the overall code
> much more concise. I added several useful TextVertexReader base classes that
> can save effort when implementing line-oriented formats.
> Please comment if you see my proposed change having any impact on other
> aspects. I'm unsure of how these additional layers of abstraction could
> affect performance.
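
The inner-class arrangement described in the issue can be sketched in a few
lines of Java. This is only an illustration under stated assumptions:
SketchVertex, SketchTextFormat, LineReader, and TabSeparatedFormat are
hypothetical names invented here, not the classes in GIRAPH-277.patch, and the
Hadoop record reader plumbing is left out entirely. The point is just that a
non-static inner reader picks up the type parameters of its enclosing format.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for a vertex; not the Giraph vertex class. */
final class SketchVertex<I, V> {
    final I id;
    final V value;

    SketchVertex(I id, V value) {
        this.id = id;
        this.value = value;
    }
}

/**
 * Hypothetical format base class. The reader is a non-static inner class,
 * so it sees I and V from the enclosing format and declares no type
 * parameters of its own.
 */
abstract class SketchTextFormat<I, V> {

    /** Subclasses only hand back their reader; no extra glue code. */
    abstract LineReader createReader();

    /** Inner (non-static) base reader for line-oriented input. */
    abstract class LineReader {
        /** Parse one line of text into a vertex. */
        abstract SketchVertex<I, V> parse(String line);

        /** Shared logic is written once here: skip blank lines, parse the rest. */
        List<SketchVertex<I, V>> readAll(List<String> lines) {
            List<SketchVertex<I, V>> out = new ArrayList<>();
            for (String line : lines) {
                if (!line.trim().isEmpty()) {
                    out.add(parse(line));
                }
            }
            return out;
        }
    }
}

/** A concrete format for "id<TAB>value" lines; the types are stated once. */
final class TabSeparatedFormat extends SketchTextFormat<Long, Double> {
    @Override
    LineReader createReader() {
        // LineReader here is the inherited inner class, already bound to
        // <Long, Double> through the enclosing format.
        return new LineReader() {
            @Override
            SketchVertex<Long, Double> parse(String line) {
                String[] parts = line.split("\t");
                return new SketchVertex<>(Long.parseLong(parts[0]),
                                          Double.parseDouble(parts[1]));
            }
        };
    }
}

/** Small driver showing the sketch in use. */
final class InnerReaderSketchDemo {
    public static void main(String[] args) {
        TabSeparatedFormat format = new TabSeparatedFormat();
        List<SketchVertex<Long, Double>> vertices =
            format.createReader().readAll(Arrays.asList("1\t0.5", "", "2\t1.5"));
        for (SketchVertex<Long, Double> v : vertices) {
            System.out.println(v.id + " -> " + v.value);
        }
    }
}
```

With a static nested reader, LineReader would need its own <I, V> declaration
and every subclass would have to repeat the type arguments, which is the
duplication the issue complains about.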