How to output into multiple files through a GiraphJob

2014-06-19 Thread Ferenc Béres
Hi Everyone,

Currently I'm working on an ALS implementation in Giraph 1.1.0 and I would
like to output the values of the vertices into multiple output files, but I
could not figure out how to do it.

I found that in Hadoop it can be done by using
org.apache.hadoop.mapreduce.lib.output.MultipleOutputs<KEYOUT,VALUEOUT>,
but it didn't work with the GiraphJob.

Is it possible to output into multiple files by configuring the GiraphJob,
or is there another way?

I would appreciate any ideas on this.

Thank you,
Ferenc Béres


Giraph rdf input format and problem with BspUtils

2014-06-19 Thread Carmen Manzulli
Hi, my name is Carmen.
I have some problems with my code and I think they are due to the BspUtils
class (maybe I should use ReflectionUtils or something else, but I don't know
how). Can someone help me?

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

import org.apache.giraph.graph.Vertex;
import org.apache.giraph.graph.BspUtils;
import org.apache.giraph.io.VertexReader;
import org.apache.giraph.io.formats.TextVertexInputFormat;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;

import com.google.common.collect.Lists;

public class SimpleRDFVertexInputFormat extends
    TextVertexInputFormat<Text, Text, Text, NullWritable> {

  @Override
  public VertexReader<Text, Text, Text, NullWritable> createVertexReader(
      InputSplit split, TaskAttemptContext context) throws IOException {
    return new SimpleRDFVertexReader(
        TextInputFormat.createRecordReader(split, context));
  }

  public class SimpleRDFVertexReader extends
      TextVertexInputFormat.TextVertexReader<Text, Text, Text, NullWritable> {

    /** Separator of the vertex and neighbors */
    private final Pattern SEPARATOR = Pattern.compile("[\t ]");

    public SimpleRDFVertexReader(
        RecordReader<LongWritable, Text> lineRecordReader) {
      super(lineRecordReader);
    }

    @Override
    public boolean nextVertex() throws IOException, InterruptedException {
      return getRecordReader().nextKeyValue();
    }

    @Override
    public BasicVertex<Text, Text, Text, NullWritable> getCurrentVertex()
        throws IOException, InterruptedException {

      BasicVertex<Text, Text, Text, NullWritable> vertex =
          BspUtils.<Text, Text, Text, NullWritable>createVertex(
              getContext().getConfiguration());

      String[] tokens =
          SEPARATOR.split(getRecordReader().getCurrentValue().toString());

      // we load the following format (VERSION 2)
      // field separator: \t
      // subjectURI edgeCount isStartNode predicate1 object1 ... predicateN objectN

      Text vertexURI = new Text(tokens[0]);

      Text vertexValue = new Text();

      // don't do anything with the edgeCount right now (tokens[1])
      // don't do anything with isStartNode right now (tokens[2])

      // edges maps objectURI to predicateURI
      Map<Text, Text> edges = new HashMap<Text, Text>();
      if (tokens.length > 2) {
        // "n + 1 < tokens.length" avoids reading past the end of the line
        // when a trailing predicate has no object
        for (int n = 3; n + 1 < tokens.length; n = n + 2) {
          // tokens[n] is the predicate, tokens[n + 1] is the object (target)
          edges.put(new Text(tokens[n + 1]), new Text(tokens[n]));
        }
      } else {
        // pass an empty map of edges to vertex.initialize()
      }

      vertex.initialize(vertexURI, vertexValue, edges,
          Lists.<NullWritable>newArrayList());

      return vertex;
    }
  }
}
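The line-parsing part of the reader can be checked on its own, independently of which Giraph reader API ends up being used. Below is a minimal plain-Java sketch (no Giraph or Hadoop classes) of the "VERSION 2" format described in the comments above. The names `RdfLineParser`, `ParsedVertex` and `parse` are hypothetical. It maps each object URI to its predicate URI and ignores a trailing predicate that has no object. One assumption differs from the code above: it splits on runs of tabs or spaces (`[\t ]+`) so that repeated separators do not produce empty tokens.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical helper: parses one line of the VERSION 2 RDF text format. */
public class RdfLineParser {
  // Assumption: one or more tabs/spaces separate fields.
  private static final Pattern SEPARATOR = Pattern.compile("[\t ]+");

  /** Parsed result: the subject URI and a map objectURI -> predicateURI. */
  public static final class ParsedVertex {
    public final String id;
    public final Map<String, String> edges;

    ParsedVertex(String id, Map<String, String> edges) {
      this.id = id;
      this.edges = edges;
    }
  }

  /** Format: subjectURI edgeCount isStartNode predicate1 object1 ... */
  public static ParsedVertex parse(String line) {
    String[] tokens = SEPARATOR.split(line.trim());
    if (tokens.length < 3) {
      throw new IllegalArgumentException("expected at least 3 fields: " + line);
    }
    // tokens[1] (edgeCount) and tokens[2] (isStartNode) are ignored here.
    Map<String, String> edges = new LinkedHashMap<>();
    // Pairs start at index 3; stop if a predicate has no following object.
    for (int n = 3; n + 1 < tokens.length; n += 2) {
      edges.put(tokens[n + 1], tokens[n]); // object (target) -> predicate
    }
    return new ParsedVertex(tokens[0], edges);
  }

  public static void main(String[] args) {
    ParsedVertex v =
        parse("ex:alice\t2\t1\tex:knows\tex:bob\tex:likes\tex:carol");
    System.out.println(v.id + " " + v.edges);
  }
}
```

Keeping the parsing in a small pure function like this makes it easy to reuse whichever reader base class the Giraph version in use provides, and to test it without a cluster.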


Re: How to output into multiple files through a GiraphJob

2014-06-19 Thread John Yost
Hi Ferenc,

I have a Giraph job that writes its output from the Computation class rather
than from the MasterCompute, because I need to maintain a lot of state within
the vertex values rather than in Aggregators. This is one way of writing
results to multiple files. I am assuming, of course, that you want to scope
the output files per sub-graph grouping of vertices. :)

--John
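The idea of routing vertex output into one file per group can be illustrated with a minimal sketch, assuming each vertex can compute a group key (for example, users vs. items in ALS). `GroupedOutputWriter` is a hypothetical helper, not part of Giraph or Hadoop, and it writes to the local file system with java.nio purely for illustration. In a real job one would more likely write to HDFS through the Hadoop FileSystem API, open the writer once per worker and close it when the worker finishes, and put a worker or task id into each file name (the `partId` parameter here) so that parallel workers do not overwrite each other's files.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper (not a Giraph API): one output file per group key. */
public class GroupedOutputWriter implements AutoCloseable {
  private final Path dir;
  // Assumed to be unique per parallel writer, e.g. a worker/task id.
  private final String partId;
  private final Map<String, BufferedWriter> writers = new HashMap<>();

  public GroupedOutputWriter(Path dir, String partId) throws IOException {
    this.dir = Files.createDirectories(dir);
    this.partId = partId;
  }

  /** Appends one line to "<group>-<partId>.txt", opening it on first use. */
  public void write(String group, String line) throws IOException {
    // Assumes group keys are safe file-name fragments (no path separators).
    BufferedWriter w = writers.get(group);
    if (w == null) {
      w = Files.newBufferedWriter(
          dir.resolve(group + "-" + partId + ".txt"), StandardCharsets.UTF_8);
      writers.put(group, w);
    }
    w.write(line);
    w.newLine();
  }

  /** Flushes and closes every file opened so far. */
  @Override
  public void close() throws IOException {
    for (BufferedWriter w : writers.values()) {
      w.close();
    }
    writers.clear();
  }

  public static void main(String[] args) throws IOException {
    Path out = Files.createTempDirectory("grouped-out");
    try (GroupedOutputWriter writer = new GroupedOutputWriter(out, "worker-0")) {
      // Hypothetical vertex records: id, tab, latent factor vector.
      writer.write("users", "1\t0.5 0.25");
      writer.write("items", "100\t0.1 0.9");
      writer.write("users", "2\t0.3 0.7");
    }
    System.out.println("wrote grouped files to " + out);
  }
}
```

Lazily opening one writer per key keeps the number of open files equal to the number of groups actually seen by that worker, which is usually small for a per-sub-graph split.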


On Thu, Jun 19, 2014 at 4:02 AM, Ferenc Béres ferdzs...@gmail.com wrote:
