[ https://issues.apache.org/jira/browse/GIRAPH-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237959#comment-13237959 ]
Jakob Homan commented on GIRAPH-153:
------------------------------------

bq. I'm concerned with how fat the jar becomes once the HBase core files are coalesced into the Giraph jar.

This is a great effort, but it will have to be done some other way than by adding a direct dependency on HBase to Giraph. Lots of sites already have a different HBase version installed, and this will just cause headaches for them. On the other hand, for sites that don't use HBase (and may not want it on their clusters), bundling these jars with Giraph isn't a viable option. Basically, making Giraph depend on HBase is a non-starter.

Can Maven modules help us out here? Could we publish a separate artifact, giraph-hbase-formats.jar or something similar, that those who want this functionality can pull in? That jar could depend on both HBase and Giraph without imposing any extra requirement on either project.

> HBase/Accumulo Input and Output formats
> ---------------------------------------
>
>                 Key: GIRAPH-153
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-153
>             Project: Giraph
>          Issue Type: New Feature
>          Components: bsp
>    Affects Versions: 0.1.0
>         Environment: Single host OS X 10.6.8, 2.2 GHz Intel i7, 8 GB
>            Reporter: Brian Femiano
>
> Four abstract classes that wrap their respective delegate input/output formats,
> providing easy hooks for vertex input format subclasses. I've included some sample
> programs that show two very simple graph algorithms. I have a graph generator that
> builds out a very simple directed structure, starting with a few 'root' nodes.
> Root nodes are defined as nodes which are not listed as a child anywhere in the graph.
> Algorithm 1) AccumuloRootMarker.java --> Accumulo as read/write source.
> Every vertex starts out assuming it is a root. At superstep 0, each vertex sends a
> non-root notification down to each of its children. After superstep 1, only the root
> nodes will never have received a message.
> Algorithm 2) TableRootMarker --> HBase as read/write source.
> Expands on Algorithm 1 by combining the root notification logic with root node
> propagation. Once we've marked the appropriate nodes as roots, tell every child which
> roots it can be traced back to via one or more spanning trees. This takes N + 2
> supersteps: N, the maximum number of hops from any root to any leaf, plus 2 supersteps
> for the initial root flagging.
> I've included all relevant code plus DistributedCacheHelper.java for recursive cache
> file and archive searches. It is more Hadoop-centric than Giraph-specific, but these
> jobs use it, so I figured it might as well be committed here.
> These have been tested with the local JobRunner, in pseudo-distributed mode on the
> aforementioned hardware, and fully distributed on EC2. More details are in the comments.
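The two quoted algorithms can be sketched as a plain-Python simulation of synchronous supersteps. This is not the Giraph vertex API and not the code attached to the issue; the function name mark_roots_and_propagate, the dict-of-children graph representation, and the choice to forward only newly learned root ids (which also keeps the propagation finite if the graph happens to contain a cycle) are assumptions made for illustration. Supersteps 0 and 1 correspond to Algorithm 1's root flagging; the later supersteps correspond to Algorithm 2's propagation.

```python
from collections import defaultdict


def mark_roots_and_propagate(children):
    """Hypothetical simulation of the root-marking and root-propagation supersteps.

    children maps each vertex id to the list of its child vertex ids.
    Returns (is_root, roots_of, supersteps): the root flag per vertex, the set of
    roots each vertex can be traced back to, and the number of supersteps run.
    """
    # Collect every vertex, whether it appears as a key or only as a child.
    vertices = set(children)
    for kids in children.values():
        vertices.update(kids)

    is_root = {v: True for v in vertices}  # every vertex starts out assuming it is a root
    roots_of = {v: set() for v in vertices}
    inbox = defaultdict(list)
    superstep = 0

    while True:
        outbox = defaultdict(list)
        for v in vertices:
            if superstep == 0:
                # Superstep 0: send a non-root notification to each child.
                for c in children.get(v, []):
                    outbox[c].append("not-root")
            elif superstep == 1:
                # Superstep 1: any vertex that was messaged is not a root.
                if inbox[v]:
                    is_root[v] = False
                else:
                    # Roots start the propagation with their own id.
                    roots_of[v].add(v)
                    for c in children.get(v, []):
                        outbox[c].append(v)
            else:
                # Later supersteps: record newly learned roots and forward only those.
                new = set(inbox[v]) - roots_of[v]
                roots_of[v] |= new
                for c in children.get(v, []):
                    outbox[c].extend(new)
        superstep += 1
        inbox = outbox
        # Halt once no messages are in flight; superstep 1 always runs so that
        # roots are flagged even when superstep 0 sent nothing.
        if superstep >= 2 and not any(inbox.values()):
            break
    return is_root, roots_of, superstep


if __name__ == "__main__":
    graph = {"a": ["b", "c"], "b": ["d"], "x": ["d"]}
    flags, roots, steps = mark_roots_and_propagate(graph)
    print(sorted(v for v, r in flags.items() if r))  # ['a', 'x']
    print(sorted(roots["d"]))                         # ['a', 'x']
    print(steps)                                      # 4
```

For the small example graph the longest root-to-leaf path (a -> b -> d) is 2 hops, and the simulation runs 4 supersteps, consistent with the N + 2 estimate above.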