Re: How to format Giraph input dataset

2015-03-13 Thread Steven Harenberg
Hi Ralph,

I also wanted to use an edge-list input format, since I am running
examples from SNAP. I ran into a lot of issues, and at this point, if I could
go back in time, I would probably just write a script to convert the graphs
into Giraph's standard format.
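
Such a conversion script could look roughly like the sketch below. It assumes
that "standard format" means the JSON adjacency lines from the Quick Start
([vertex_id, vertex_value, [[target_id, edge_value], ...]]). The function name
and the default vertex and edge values are made up for illustration and are
not part of Giraph.

```python
import json
from collections import defaultdict


def edge_list_to_json_lines(lines, vertex_value=0, edge_value=1):
    """Convert 'source target' edge lines into one JSON array per vertex.

    Hypothetical helper: vertex_value and edge_value are placeholder defaults.
    """
    adjacency = defaultdict(list)
    for line in lines:
        line = line.strip()
        # SNAP files start with '#' comment headers; skip them and blank lines.
        if not line or line.startswith("#"):
            continue
        source, target = (int(token) for token in line.split()[:2])
        adjacency[source].append([target, edge_value])
        # Touch the target so vertices with only incoming edges get a line too.
        adjacency[target]
    return [
        json.dumps([vertex, vertex_value, adjacency[vertex]], separators=(",", ":"))
        for vertex in sorted(adjacency)
    ]


if __name__ == "__main__":
    sample = ["# FromNodeId ToNodeId", "0\t1", "2\t1", "2\t21"]
    for json_line in edge_list_to_json_lines(sample):
        print(json_line)
```

Because every target is also written as a vertex (with an empty edge list if
needed), each vertex exists in the input even when it has no outgoing edges.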

To deal with the kind of type-mismatch error you reported, I created my own class files:

   - LongFloatTextEdgeInputFormat.java (for pagerank)
   - LongNullTextEdgeInputFormat.java
   - LongNullReverseTextEdgeInputFormat.java (for undirected)
   - LongPair (used inside the above classes)

Basically, these were just the same as their corresponding Int class files,
with long in place of int.

However, there is a fundamental issue with SSSP (and, I believe, PageRank)
when using an edge-list input format. If a vertex is never listed first in
an edge (e.g., it only has incoming edges), it will not be "active" in
superstep 0. This means it will not be initialized with the correct value (
http://mail-archives.apache.org/mod_mbox/giraph-user/201502.mbox/%3CCAHv2Baw7zFJ-s7dtNMv5dkNxz_zE436krE%2B6G4r3tp-HVgjW2g%40mail.gmail.com%3E
).
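
To check how many vertices in a given dataset are affected, one can list the
vertices that never appear as the source of an edge. This is only a minimal
Python sketch; the function name is hypothetical, and it assumes the edges
have already been parsed into (source, target) pairs.

```python
def vertices_without_outgoing_edges(edges):
    """Return the sorted IDs of vertices that only ever appear as a target."""
    sources = set()
    targets = set()
    for source, target in edges:
        sources.add(source)
        targets.add(target)
    # These vertices are never "listed first", so an edge-list input format
    # would not create them as active vertices in superstep 0.
    return sorted(targets - sources)


if __name__ == "__main__":
    edges = [(0, 1), (2, 1), (2, 21)]
    print(vertices_without_outgoing_edges(edges))  # prints [1, 21]
```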



Re: How to format Giraph input dataset

2015-03-12 Thread MengXiaodong
Hi Martin,

Thank you for your kind reply. I followed your suggestion and entered the
command below:

hadoop jar \
  giraph-examples/target/giraph-examples-1.2.0-SNAPSHOT-for-hadoop-0.20.203.0-jar-with-dependencies.jar \
  org.apache.giraph.GiraphRunner \
  org.apache.giraph.examples.SimpleShortestPathsComputation \
  -eif org.apache.giraph.io.formats.IntNullTextEdgeInputFormat \
  -eip /WikiTalk.txt \
  -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
  -op /outputTran \
  -w 1

However, I got an error when I tried this command:
Exception in thread "main" java.lang.IllegalArgumentException: checkClassTypes: vertex index types not assignable, computation - class org.apache.hadoop.io.LongWritable, EdgeInputFormat - class org.apache.hadoop.io.NullWritable
    at org.apache.giraph.job.GiraphConfigurationValidator.checkAssignable(GiraphConfigurationValidator.java:384)
    at org.apache.giraph.job.GiraphConfigurationValidator.verifyEdgeInputFormatGenericTypes(GiraphConfigurationValidator.java:242)
    at org.apache.giraph.job.GiraphConfigurationValidator.validateConfiguration(GiraphConfigurationValidator.java:142)
    at org.apache.giraph.utils.ConfigurationUtils.parseArgs(ConfigurationUtils.java:222)
    at org.apache.giraph.GiraphRunner.run(GiraphRunner.java:74)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at org.apache.giraph.GiraphRunner.main(GiraphRunner.java:124)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:483)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)



I assume the error happens because the input format uses IntWritable while
the example uses LongWritable as the vertex ID type. If so, may I ask how to
convert IntWritable to LongWritable?

Kind regards,
Ralph




Re: How to format Giraph input dataset

2015-03-11 Thread Martin Junghanns
Hi Ralph,

You can set a vertex or edge input format when running a Giraph job.
In the example, you used the vertex input format (vif):

"-vif
org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat"

Your WikiTalk input is an edge list, and Giraph offers, e.g.,

"org.apache.giraph.io.formats.IntNullTextEdgeInputFormat"

which reads a graph where "Each line consists of: source_vertex,
target_vertex" (separated by a tab, \t).
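
A SNAP download such as WikiTalk starts with "#" comment lines, so it may need
a small clean-up pass before it matches that description. The Python sketch
below is one possible way to do it. The function name is hypothetical, and
whether the input format would tolerate comment lines or space separators on
its own is not something this sketch relies on.

```python
def normalize_edge_lines(lines):
    """Yield 'source<TAB>target' lines, dropping comments and blank lines."""
    for line in lines:
        tokens = line.split()
        # Skip blank lines and '#' header lines such as '# FromNodeId ToNodeId'.
        if not tokens or tokens[0].startswith("#"):
            continue
        # int() rejects malformed IDs early instead of passing them to the job.
        yield f"{int(tokens[0])}\t{int(tokens[1])}"


if __name__ == "__main__":
    raw = ["# FromNodeId ToNodeId", "0 1", "2\t1", "", "2  21"]
    for edge_line in normalize_edge_lines(raw):
        print(edge_line)
```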

You can set the edge input format via the -eif parameter.

Cheers,
Martin

The package "org.apache.giraph.io.formats" in giraph-core contains a lot
more formats.

On 11.03.2015 06:37, MengXiaodong wrote:
> Hi all,
> 
> I'm new to Giraph. I successfully ran my first example by
> following the instructions in the Giraph Quick Start. However, I ran
> into a question when writing my own Giraph code.
> 
> In the Quick Start, the format of the input graph is as follows:
> 
> [0,0,[[1,1],[3,3]]]
> [1,0,[[0,1],[2,2],[3,1]]]
> [2,0,[[1,2],[4,4]]]
> [3,0,[[0,3],[1,1],[4,4]]]
> [4,0,[[3,4],[2,4]]]
> 
> But the graph datasets (like the Facebook or Twitter social networks)
> downloaded from public websites come in various formats. How can I
> transform such a graph into the standard Giraph format shown above?
> 
> For example, the WikiTalk graph below is a directed graph.
> A directed edge A->B means user A edited the talk page of B.
> 
> # FromNodeId	ToNodeId
> 0	1
> 2	1
> 2	21
> 2	46
> 2	63
> 2	88
> 2	93
> 2	94
> 2	101
> 2	102
> 2	103
> 2	116
> 2	119
> 2	125
> 
> Regards, Ralph
>