Re: How to format Giraph input dataset
Hi Ralph,

I also wanted to use an edge-list input format, since I am running examples from SNAP. I ran into a lot of issues, and if I could go back in time I would probably just write a script to convert the graphs into Giraph's standard format.

To deal with the type of errors you had above, I created my own class files:

- LongFloatTextEdgeInputFormat.java (for PageRank)
- LongNullTextEdgeInputFormat.java
- LongNullReverseTextEdgeInputFormat.java (for undirected graphs)
- LongPair (used inside the above classes)

Basically, these are the same as their corresponding Int class files.

However, there is a more fundamental issue with SSSP (and, I believe, PageRank) when using an edge-list input format. If a vertex is never listed first in an edge (e.g., it only has incoming edges), it will not be "active" in superstep 0. This means it will not be initialized with the correct value ( http://mail-archives.apache.org/mod_mbox/giraph-user/201502.mbox/%3CCAHv2Baw7zFJ-s7dtNMv5dkNxz_zE436krE%2B6G4r3tp-HVgjW2g%40mail.gmail.com%3E ).

On Thu, Mar 12, 2015 at 11:04 AM, MengXiaodong wrote:
> Hi Martin,
>
> Thank you for your kind reply. I followed your suggestion and entered the
> command below:
>
>     hadoop jar giraph-examples/target/giraph-examples-1.2.0-SNAPSHOT-for-hadoop-0.20.203.0-jar-with-dependencies.jar \
>       org.apache.giraph.GiraphRunner \
>       org.apache.giraph.examples.SimpleShortestPathsComputation \
>       -eif org.apache.giraph.io.formats.IntNullTextEdgeInputFormat \
>       -eip /WikiTalk.txt \
>       -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
>       -op /outputTran -w 1
>
> However, I got an error when I ran this command:
>
>     Exception in thread "main" java.lang.IllegalArgumentException: checkClassTypes: vertex index types not assignable, computation - class org.apache.hadoop.io.LongWritable, EdgeInputFormat - class org.apache.hadoop.io.NullWritable
>         at org.apache.giraph.job.GiraphConfigurationValidator.checkAssignable(GiraphConfigurationValidator.java:384)
>         at org.apache.giraph.job.GiraphConfigurationValidator.verifyEdgeInputFormatGenericTypes(GiraphConfigurationValidator.java:242)
>         at org.apache.giraph.job.GiraphConfigurationValidator.validateConfiguration(GiraphConfigurationValidator.java:142)
>         at org.apache.giraph.utils.ConfigurationUtils.parseArgs(ConfigurationUtils.java:222)
>         at org.apache.giraph.GiraphRunner.run(GiraphRunner.java:74)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>         at org.apache.giraph.GiraphRunner.main(GiraphRunner.java:124)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:483)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>
> I assume the error occurs because the input format uses IntWritable
> while the example uses LongWritable as the vertex ID. If so, how can I
> convert IntWritable to LongWritable?
>
> Kind regards,
> Ralph
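The conversion script suggested in the reply above could look roughly like the sketch below. It is an illustration, not something posted in this thread: the function names (edge_list_to_json_lines, convert_file), the default vertex value 0 and the default edge weight 1 are assumptions, and the output follows the [id, value, [[target, weight], ...]] layout of the Quick Start example. It also emits a line for vertices that only have incoming edges, so that they exist as vertices from superstep 0; whether that alone gives them the right initial value depends on the computation.

```python
import json


def edge_list_to_json_lines(lines, vertex_value=0, edge_weight=1):
    """Turn 'src dst' edge-list lines into one JSON adjacency line per vertex.

    vertex_value and edge_weight are placeholder defaults (assumptions),
    since a plain edge list carries neither.
    """
    adjacency = {}
    for raw in lines:
        line = raw.strip()
        # Skip blank lines and SNAP-style '#' header/comment lines.
        if not line or line.startswith("#"):
            continue
        src, dst = (int(token) for token in line.split()[:2])
        adjacency.setdefault(src, []).append(dst)
        # Register the target too, so vertices with only incoming
        # edges still get their own (edge-less) output line.
        adjacency.setdefault(dst, [])
    return [
        json.dumps(
            [vertex, vertex_value, [[t, edge_weight] for t in targets]],
            separators=(",", ":"),
        )
        for vertex, targets in sorted(adjacency.items())
    ]


def convert_file(in_path, out_path):
    """Read an edge-list file and write the JSON adjacency file."""
    with open(in_path) as src:
        json_lines = edge_list_to_json_lines(src)
    with open(out_path, "w") as dst:
        dst.write("\n".join(json_lines) + "\n")
```

For example, convert_file("WikiTalk.txt", "WikiTalk.json") would write one line per vertex, which could then be loaded with -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat and -vip instead of -eif and -eip. The whole graph is held in memory, which is likely fine for SNAP-sized files on a single machine but worth checking for larger inputs.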
Re: How to format Giraph input dataset
Hi Martin,

Thank you for your kind reply. I followed your suggestion and entered the command below:

    hadoop jar giraph-examples/target/giraph-examples-1.2.0-SNAPSHOT-for-hadoop-0.20.203.0-jar-with-dependencies.jar \
      org.apache.giraph.GiraphRunner \
      org.apache.giraph.examples.SimpleShortestPathsComputation \
      -eif org.apache.giraph.io.formats.IntNullTextEdgeInputFormat \
      -eip /WikiTalk.txt \
      -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
      -op /outputTran -w 1

However, I got an error when I ran this command:

    Exception in thread "main" java.lang.IllegalArgumentException: checkClassTypes: vertex index types not assignable, computation - class org.apache.hadoop.io.LongWritable, EdgeInputFormat - class org.apache.hadoop.io.NullWritable
        at org.apache.giraph.job.GiraphConfigurationValidator.checkAssignable(GiraphConfigurationValidator.java:384)
        at org.apache.giraph.job.GiraphConfigurationValidator.verifyEdgeInputFormatGenericTypes(GiraphConfigurationValidator.java:242)
        at org.apache.giraph.job.GiraphConfigurationValidator.validateConfiguration(GiraphConfigurationValidator.java:142)
        at org.apache.giraph.utils.ConfigurationUtils.parseArgs(ConfigurationUtils.java:222)
        at org.apache.giraph.GiraphRunner.run(GiraphRunner.java:74)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.giraph.GiraphRunner.main(GiraphRunner.java:124)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:483)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

I assume the error occurs because the input format uses IntWritable while the example uses LongWritable as the vertex ID. If so, how can I convert IntWritable to LongWritable?

Kind regards,
Ralph

> On Mar 11, 2015, at 4:02 PM, Martin Junghanns wrote:
>
> Hi Ralph,
>
> you can set a vertex or edge input format when running a Giraph job.
> In the example, you used the vertex input format (vif)
>
>     -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat
>
> Your WikiTalk input is an edge list, and Giraph offers, e.g.,
>
>     org.apache.giraph.io.formats.IntNullTextEdgeInputFormat
>
> which reads a graph where "Each line consists of: source_vertex,
> target_vertex" (separated by a \t).
>
> You can set the edge input format via the -eif parameter.
>
> Cheers,
> Martin
>
> The package "org.apache.giraph.io.formats" in giraph-core contains a lot
> more formats.
Re: How to format Giraph input dataset
Hi Ralph,

you can set a vertex or edge input format when running a Giraph job. In the example, you used the vertex input format (vif)

    -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat

Your WikiTalk input is an edge list, and Giraph offers, e.g.,

    org.apache.giraph.io.formats.IntNullTextEdgeInputFormat

which reads a graph where "Each line consists of: source_vertex, target_vertex" (separated by a \t).

You can set the edge input format via the -eif parameter.

Cheers,
Martin

The package "org.apache.giraph.io.formats" in giraph-core contains a lot more formats.

On 11.03.2015 06:37, MengXiaodong wrote:
> Hi all,
>
> I'm new to Giraph and have successfully run my first example by
> following the instructions in the Giraph Quick Start. However, I ran
> into a question when writing my own Giraph code.
>
> In the Quick Start, the input graph has the following format:
>
>     [0,0,[[1,1],[3,3]]]
>     [1,0,[[0,1],[2,2],[3,1]]]
>     [2,0,[[1,2],[4,4]]]
>     [3,0,[[0,3],[1,1],[4,4]]]
>     [4,0,[[3,4],[2,4]]]
>
> But graph datasets downloaded from public websites (such as the
> Facebook and Twitter social networks) come in various formats. How can
> I transform such a graph into the standard Giraph format shown above?
>
> For example, the WikiTalk graph below is a directed graph. A directed
> edge A->B means user A edited the talk page of B.
>
>     # FromNodeId ToNodeId
>     0 1
>     2 1
>     2 21
>     2 46
>     2 63
>     2 88
>     2 93
>     2 94
>     2 101
>     2 102
>     2 103
>     2 116
>     2 119
>     2 125
>
> Regards,
> Ralph
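The edge-list route described in Martin's reply expects plain source/target pairs separated by a tab, while the SNAP sample in the quoted question starts with a "#" header line. A small preprocessing step like the sketch below could strip such lines and normalize the separator before the file is uploaded to HDFS. This is only an illustration under assumptions: the function name clean_edge_list is made up here, and how IntNullTextEdgeInputFormat itself treats comment lines is not stated in this thread.

```python
def clean_edge_list(lines):
    """Yield 'src<TAB>dst' lines, skipping blank lines and '#' comments."""
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        src, dst = line.split()[:2]
        # int() rejects malformed IDs early instead of at job run time.
        yield f"{int(src)}\t{int(dst)}"
```

Writing "\n".join(clean_edge_list(open("WikiTalk.txt"))) to a new file gives an input with one tab-separated edge per line and no header.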