[
https://issues.apache.org/jira/browse/GIRAPH-469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Eli Reisman updated GIRAPH-469:
-------------------------------
Attachment: GIRAPH-469-2.patch
This is the cleaned-up patch, ready for review. It passes mvn verify under all
the profiles. Sorry, I know it's a big one, but it only does a couple of things:
1. Refactors all the long methods in GraphMapper into easier to read, better
documented methods.
2. Moves all the GraphMapper code that has to do with Giraph/BSP processes into
a new GraphTaskManager. In this way GraphMapper becomes a simple wrapper to set
up Hadoop-specific boilerplate while delegating all the work to our more
platform-neutral GraphTaskManager. This also allows GraphMapper to continue to
inherit from Mapper.
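A minimal sketch of the split described in item 2, assuming hypothetical names throughout: TaskContext stands in for Mapper#Context, and the setup/execute/cleanup methods are illustrative rather than the signatures in the patch. It only shows the shape of the delegation, with the wrapper holding no Giraph/BSP logic of its own.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Hadoop's Mapper#Context (not the real Hadoop type).
interface TaskContext {
    void setStatus(String status);
}

// Platform-neutral owner of the Giraph/BSP lifecycle.
// Method names here are assumptions for illustration only.
class GraphTaskManagerSketch {
    private final TaskContext context;
    private final List<String> phases = new ArrayList<>();

    GraphTaskManagerSketch(TaskContext context) {
        this.context = context;
    }

    void setup() { record("setup"); }
    void execute() { record("execute"); }
    void cleanup() { record("cleanup"); }

    // Remember the phase and report it through the context.
    private void record(String phase) {
        phases.add(phase);
        context.setStatus(phase);
    }

    List<String> phases() {
        return phases;
    }
}

// Thin wrapper: in the real code this class would extend Hadoop's Mapper
// and override run(); here it is a plain class so the sketch compiles alone.
class GraphMapperSketch {
    List<String> run(TaskContext context) {
        GraphTaskManagerSketch manager = new GraphTaskManagerSketch(context);
        manager.setup();
        try {
            manager.execute();
        } finally {
            // Cleanup runs even if execute() throws.
            manager.cleanup();
        }
        return manager.phases();
    }
}
```

The wrapper only creates the manager and forwards the context, so everything that knows about BSP lives in one class that does not depend on the wrapper.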
I am trying to set this part of the code up in steps to be processing-platform
independent so I can implement a "pure YARN" mode a la GIRAPH-13. I tried not
to do it all in one patch. As of now, the only direct pipeline from Hadoop into
our Giraph workings is the Mapper#Context, which is still passed into the
GraphTaskManager from the GraphMapper.
Future JIRAs on this will include:
1. Breaking out ZookeeperManager into an interface, and setting up a parallel
impl that will spawn a YARN app container-hosted ZK instance.
2. Determining the extent of the things a replacement interface for
Mapper#Context would have to do for Giraph, and replacing the Mapper#Context we
get from GraphMapper (and Hadoop) with this interface so we can implement
alternate implementations that let Giraph get what it needs from the underlying
cluster without being Hadoop specific. You get the idea...
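To make item 2 concrete, here is a rough sketch of what such a replacement interface could look like. The three methods are only a guess at what Giraph might need from the context (configuration lookup, progress reporting, status updates); the actual extent is exactly what that JIRA would have to determine. ClusterContext, InMemoryClusterContext, WorkerLoopSketch and the "sketch.maxSupersteps" key are all hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical platform-neutral replacement for Mapper#Context.
// The method set is an assumption, not a finished design.
interface ClusterContext {
    String getConf(String key, String defaultValue);
    void progress();
    void setStatus(String status);
}

// One possible non-Hadoop implementation, backed by plain in-memory state.
class InMemoryClusterContext implements ClusterContext {
    private final Map<String, String> conf = new HashMap<>();
    private int progressCalls = 0;
    private String status = "";

    InMemoryClusterContext set(String key, String value) {
        conf.put(key, value);
        return this;
    }

    @Override
    public String getConf(String key, String defaultValue) {
        return conf.getOrDefault(key, defaultValue);
    }

    @Override
    public void progress() {
        progressCalls++;
    }

    @Override
    public void setStatus(String status) {
        this.status = status;
    }

    int progressCalls() {
        return progressCalls;
    }

    String status() {
        return status;
    }
}

// Worker-side code that depends only on the interface, never on Hadoop types.
class WorkerLoopSketch {
    static int runSupersteps(ClusterContext context) {
        int max = Integer.parseInt(context.getConf("sketch.maxSupersteps", "3"));
        for (int step = 0; step < max; step++) {
            context.setStatus("superstep " + step);
            context.progress(); // tell the platform the task is still alive
        }
        return max;
    }
}
```

A Hadoop-backed implementation would forward each call to the real Mapper#Context, while a YARN-backed one could answer from its own container state, and the worker loop would not change in either case.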
Thanks, I'll try to throw this up on ReviewBoard as well.
> Cleanup GraphMapper
> -------------------
>
> Key: GIRAPH-469
> URL: https://issues.apache.org/jira/browse/GIRAPH-469
> Project: Giraph
> Issue Type: Improvement
> Reporter: Nitay Joffe
> Assignee: Eli Reisman
> Attachments: GIRAPH-469-1-eli-idea.patch, GIRAPH-469-2.patch
>
>
> I don't see why we even call a map() method seeing as we are overriding
> run(). We are clearly not particularly "mapreduce-y" so we should make our
> entry point clearer than a map(). Also I think we should have something
> like a WorkerThread similar to MasterThread and clean up all of this to just
> create whichever threads the node is assigned roles for.