[ https://issues.apache.org/jira/browse/GIRAPH-64?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jakob Homan updated GIRAPH-64:
------------------------------

    Attachment: GIRAPH-64.patch

Here's a patch that introduces that old bin folder we all know and lo{ve|athe}. This also gives us the start of the package we'll need once we think about making releases. Users no longer have to merge their code into the Giraph source to get it to run.

With the new bin/giraph, given an implementation of Vertex such as this one (taken from PageRankBenchmark, obviously):

{code}
package giraph1;

import java.util.Iterator;

import org.apache.giraph.graph.Vertex;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;

public class FirstVertex extends
    Vertex<LongWritable, DoubleWritable, DoubleWritable, DoubleWritable> {
  /** Configuration from Configurable */
  private Configuration conf;
  /** How many supersteps to run */
  public static final String SUPERSTEP_COUNT =
      "PageRankBenchmark.superstepCount";

  @Override
  public void preApplication()
      throws InstantiationException, IllegalAccessException {
  }

  @Override
  public void postApplication() {
  }

  @Override
  public void preSuperstep() {
  }

  @Override
  public void compute(Iterator<DoubleWritable> msgIterator) {
    if (getSuperstep() >= 1) {
      // Sum the rank contributions sent by in-neighbors in the previous superstep
      double sum = 0;
      while (msgIterator.hasNext()) {
        sum += msgIterator.next().get();
      }
      DoubleWritable vertexValue =
          new DoubleWritable((0.15f / getNumVertices()) + 0.85f * sum);
      setVertexValue(vertexValue);
    }
    if (getSuperstep() < getConf().getInt(SUPERSTEP_COUNT, -1)) {
      // Split this vertex's current rank evenly across its out-edges
      long edges = getNumOutEdges();
      sendMsgToAllEdges(new DoubleWritable(getVertexValue().get() / edges));
    } else {
      voteToHalt();
    }
  }

  @Override
  public Configuration getConf() {
    return conf;
  }

  @Override
  public void setConf(Configuration conf) {
    this.conf = conf;
  }
}
{code}

one can run it via:

{noformat}
bin/giraph \
  -DPageRankBenchmark.superstepCount=30 \
  -DpseduoRandomVertexReader.aggregateVertices=220 \
  -DpseduoRandomVertexReader.edgesPerVertex=37 \
  ~/kick-ass-vertex-1.0.jar giraph1.FirstVertex \
  -w 10 \
  -if org.apache.giraph.benchmark.PseudoRandomVertexInputFormat \
  -of org.apache.giraph.lib.JsonBase64VertexOutputFormat \
  -op output_path
{noformat}

bin/giraph is heavily cribbed
from Mahout and Pig, btw. Is there any reason the fatjar approach was taken other than expediency? This patch uses the fatjar for testing but a standard lib folder for the actual package. I'd like to remove the fatjar entirely, eventually. This is a rough script and will need lots of enhancements as we go, but I think it's a good start.

> Create VertexRunner to make it easier to run users' computations
> ----------------------------------------------------------------
>
>                 Key: GIRAPH-64
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-64
>             Project: Giraph
>          Issue Type: New Feature
>            Reporter: Jakob Homan
>            Assignee: Jakob Homan
>         Attachments: GIRAPH-64.patch
>
>
> Currently, if a user wants to implement a Giraph algorithm by extending {{Vertex}}, they must also write all the boilerplate around the {{Tool}} interface and bundle it with the Giraph jar (or get Giraph onto the classpath and playing nicely with the implementation). Examples are what is included in PageRankBenchmark and what Kohei has done: https://github.com/smly/java-Giraph-LabelPropagation
> It would be better if we had a Vertex implementation to subclass that already included all the standard {{Tool}} boilerplate, so that all one had to run would be (assuming the Giraph jar was already on the classpath):
> {noformat}hadoop jar my-awesome-vertex.jar my.awesome.vertex \
>   -i jazz_input -o jazz_output \
>   -if org.apache.giraph.lib.in.text.adjacency-list.LongDoubleDouble \
>   -of org.apache.giraph.lib.out.text.adjacency-list.LongDoubleDouble{noformat}
> This wouldn't work with every algorithm, but would be useful in a large number of cases.
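The issue asks for a runner that takes the user's jar and Vertex class plus a handful of generic options. As a rough illustration only, here is a minimal Java sketch of how such a runner might split a bin/giraph-style command line into Hadoop configuration overrides (-Dkey=value), the jar and Vertex class, and job options such as -w, -if, -of and -op. The RunnerArgs class and its fields are hypothetical and not taken from the patch; a real runner would more likely let Hadoop's ToolRunner/GenericOptionsParser consume the -D options.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of splitting a bin/giraph-style command line.
 * Not taken from the GIRAPH-64 patch.
 */
final class RunnerArgs {
  /** Hadoop configuration overrides given as -Dkey=value. */
  final Map<String, String> confOverrides = new LinkedHashMap<>();
  /** Job options such as -w, -if, -of, -op (keys stored without the dash). */
  final Map<String, String> jobOptions = new LinkedHashMap<>();
  /** Path to the user's jar. */
  String jar;
  /** Fully qualified name of the user's Vertex subclass. */
  String vertexClass;

  static RunnerArgs parse(String[] args) {
    RunnerArgs parsed = new RunnerArgs();
    List<String> positional = new ArrayList<>();
    for (int i = 0; i < args.length; i++) {
      String arg = args[i];
      if (arg.startsWith("-D")) {
        // Only the attached form -Dkey=value is handled in this sketch.
        int eq = arg.indexOf('=');
        if (eq <= 2) {
          throw new IllegalArgumentException("Expected -Dkey=value: " + arg);
        }
        parsed.confOverrides.put(arg.substring(2, eq), arg.substring(eq + 1));
      } else if (arg.startsWith("-")) {
        // Assumption: every other option takes exactly one value.
        if (i + 1 >= args.length) {
          throw new IllegalArgumentException("Missing value for " + arg);
        }
        parsed.jobOptions.put(arg.substring(1), args[++i]);
      } else {
        positional.add(arg);
      }
    }
    if (positional.size() != 2) {
      throw new IllegalArgumentException(
          "Expected <jar> <vertex class>, got " + positional);
    }
    parsed.jar = positional.get(0);
    parsed.vertexClass = positional.get(1);
    return parsed;
  }

  public static void main(String[] args) {
    RunnerArgs parsed = parse(args);
    System.out.println("conf overrides: " + parsed.confOverrides);
    System.out.println("jar: " + parsed.jar + ", vertex: " + parsed.vertexClass);
    System.out.println("job options: " + parsed.jobOptions);
  }
}
```

Keeping the -D overrides in their own map mirrors the split in the bin/giraph example between Hadoop-level configuration (read by the Vertex via getConf()) and the options that describe the job itself.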
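To make the superstep logic of FirstVertex.compute() in the comment above concrete, here is a single-process Java sketch that replays the same rule over a small in-memory graph: from superstep 1 on, each vertex sets its value to 0.15/N + 0.85 * (sum of incoming messages); while the superstep is below the configured count it sends value / out-degree along every out-edge, otherwise it votes to halt. The PageRankSupersteps class, the adjacency-array input and the uniform 1/N starting value are assumptions for illustration; in Giraph the initial values come from the vertex input format, and none of this is Giraph API.

```java
import java.util.Arrays;

/**
 * Hypothetical single-process replay of the rule in FirstVertex.compute().
 * Not Giraph code: no partitioning, messaging layer or input format.
 */
final class PageRankSupersteps {

  /**
   * Runs supersteps until every vertex has voted to halt.
   *
   * @param outEdges outEdges[v] lists the target vertex ids of v's out-edges
   * @param superstepCount plays the role of PageRankBenchmark.superstepCount
   * @return final vertex values, indexed by vertex id
   */
  static double[] run(int[][] outEdges, int superstepCount) {
    int n = outEdges.length;
    double[] value = new double[n];
    Arrays.fill(value, 1.0 / n); // Assumption: uniform starting rank
    double[] inbox = new double[n]; // Summed messages delivered this superstep

    for (long superstep = 0; ; superstep++) {
      double[] outbox = new double[n]; // Messages for the next superstep
      boolean allHalted = true;
      for (int v = 0; v < n; v++) {
        if (superstep >= 1) {
          value[v] = 0.15 / n + 0.85 * inbox[v];
        }
        if (superstep < superstepCount) {
          // Equivalent of sendMsgToAllEdges(value / number of out-edges)
          for (int target : outEdges[v]) {
            outbox[target] += value[v] / outEdges[v].length;
          }
          allHalted = false;
        }
        // Otherwise the vertex votes to halt.
      }
      // Every vertex halted and no messages are in flight: the job is done.
      if (allHalted) {
        return value;
      }
      inbox = outbox;
    }
  }

  public static void main(String[] args) {
    // Edges: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0
    int[][] graph = {{1, 2}, {2}, {0}};
    System.out.println(Arrays.toString(run(graph, 30)));
  }
}
```

When no vertex is dangling, every vertex passes on all of its rank, so the values keep summing to 1 under this rule (0.15 + 0.85 * 1), which makes a handy sanity check. With the default of -1 for the superstep count, every vertex halts in superstep 0 and keeps its starting value.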