[SOLVED] Re: Could not find or load main class org.apache.giraph.yarn.GiraphApplicationMaster

2015-03-11 Thread Phillip Rhodes
And finally, success is at hand!  This is a bit quirky, but here's what
fixed it:

My command line originally looked like this:

$> hadoop jar 
/home/prhodes/giraph/giraph-examples/target/giraph-examples-1.2.0-SNAPSHOT-for-hadoop-2.5.2-jar-with-dependencies.jar
org.apache.giraph.GiraphRunner
org.apache.giraph.examples.SimpleShortestPathsComputation -vif
org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat
-vip /user/prhodes/input/tiny_graph.txt -vof
org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op
/user/prhodes/giraph_output/shortestpaths -w 4 -yj
/home/prhodes/giraph/giraph-examples/target/giraph-examples-1.2.0-SNAPSHOT-for-hadoop-2.5.2-jar-with-dependencies.jar

I vaguely recalled seeing a mention of another user creating a symlink
to the Giraph jar files, and originally I thought he did that as a convenience.
But some intuition suggested that the fully qualified paths might actually
be causing a problem, so I did

$> ln -s 
/home/prhodes/giraph/giraph-examples/target/giraph-examples-1.2.0-SNAPSHOT-for-hadoop-2.5.2-jar-with-dependencies.jar

to create a link to that jar in the current directory, then re-ran the
command as:

$> hadoop jar 
giraph-examples-1.2.0-SNAPSHOT-for-hadoop-2.5.2-jar-with-dependencies.jar
org.apache.giraph.GiraphRunner
org.apache.giraph.examples.SimpleShortestPathsComputation -vif
org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat
-vip /user/prhodes/input/tiny_graph.txt -vof
org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op
/user/prhodes/giraph_output/shortestpaths -w 4 -yj
giraph-examples-1.2.0-SNAPSHOT-for-hadoop-2.5.2-jar-with-dependencies.jar

and voilà! It works...
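For anyone hitting the same thing: the workaround boils down to passing bare jar filenames (resolvable from the working directory) to -yj instead of absolute paths. A hedged sketch of that normalization as plain string handling — the jar name is the one from this thread, and "relative paths are required" is my inference from the symptom, not something I've confirmed in the Giraph source:

```python
import os

def normalize_yarn_jars(yj_arg):
    """Reduce a comma-separated -yj argument of absolute jar paths to
    bare filenames, mirroring the symlink-in-the-working-directory
    workaround described above."""
    return ",".join(os.path.basename(p) for p in yj_arg.split(","))

yj = ("/home/prhodes/giraph/giraph-examples/target/"
      "giraph-examples-1.2.0-SNAPSHOT-for-hadoop-2.5.2-jar-with-dependencies.jar")
print(normalize_yarn_jars(yj))
```

You would still need to symlink (or copy) each jar into the directory you launch `hadoop jar` from, as shown above.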


Cheers,


Phil

On Wed, Mar 11, 2015 at 11:37 PM, Phillip Rhodes
 wrote:
> Gang:
>
> I am getting further with my attempt to get Giraph running on a YARN cluster,
> but now I'm stuck at this error:
>
> Could not find or load main class 
> org.apache.giraph.yarn.GiraphApplicationMaster
>
> I've tried everything I can find in previous messages on this topic,
> to no avail. My command line is like this:
>
> hadoop jar 
> /home/prhodes/giraph/giraph-examples/target/giraph-examples-1.2.0-SNAPSHOT-for-hadoop-2.5.2-jar-with-dependencies.jar
> org.apache.giraph.GiraphRunner
> org.apache.giraph.examples.SimpleShortestPathsComputation -vif
> org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat
> -vip /user/prhodes/input/tiny_graph.txt -vof
> org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op
> /user/prhodes/giraph_output/shortestpaths -w 4 -yj
> /home/prhodes/giraph/giraph-examples/target/giraph-examples-1.2.0-SNAPSHOT-for-hadoop-2.5.2-jar-with-dependencies.jar,/home/prhodes/giraph/giraph-core/target/giraph-1.2.0-SNAPSHOT-for-hadoop-2.5.2-jar-with-dependencies.jar
>
> and those jars are in those locations on each node in the cluster.
> And yet it still complains about not finding the
> GiraphApplicationMaster. Also, I've verified that the jar file does
> contain the GiraphApplicationMaster.class.
>
> Any ideas what else might be causing the problem, or any workarounds?
> I thought about distributing the Giraph jars to each node and
> physically putting them on the Hadoop Classpath, but the Maven build
> builds fat jars by default, which would probably cause issues doing
> that.  Any other suggestions or ideas?
>
>
> Phil
> ---
> This message optimized for indexing by NSA PRISM


[SOLVED] Re: Giraph job never ends

2015-03-11 Thread Phillip Rhodes
OK, this was easy enough to fix, once I understood what
was actually happening.  Since I'm running on EC2 nodes on
AWS, it is not the case that any given node can talk to any other
node on any port (at least not by default).  I had tried to
cherry-pick which ports to whitelist in the security group,
but I missed one or more that YARN needed for internal
communication.   I discovered this when examining the
resourcemanager logs.


For now, instead of trying to enumerate exactly which ports
to allow, I added a rule to allow "all traffic" for address 10.0.0.0/24
and that solved this.
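For what it's worth, 10.0.0.0/24 covers only the 256 addresses 10.0.0.0–10.0.0.255, so the "all traffic" rule only helps if every node's private IP sits inside that subnet. A quick, hedged sanity check with the Python standard library — the node IPs below are made up for illustration:

```python
import ipaddress

# The subnet from the security-group rule described above.
subnet = ipaddress.ip_network("10.0.0.0/24")
print(subnet.num_addresses)  # 256 addresses: 10.0.0.0 - 10.0.0.255

# Hypothetical private IPs -- substitute your instances' actual addresses.
for ip in ["10.0.0.12", "10.0.0.200", "10.0.1.5"]:
    covered = ipaddress.ip_address(ip) in subnet
    print(ip, "covered" if covered else "NOT covered by the rule")
```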


Cheers,


Phil


On Wed, Mar 11, 2015 at 1:39 PM, Phillip Rhodes
 wrote:
> Interesting... It totally did not work for me when built using the
> hadoop_2 profile, but with the hadoop_yarn profile everything at least
> starts up.  I'm pretty baffled right now... my cluster is essentially
> working, and I can run, for example, the WordCount example just fine.
> And the Giraph job starts and shows no apparent errors, but I get no
> output and it seems to run forever.
>
> It's probably some really small detail of my Hadoop configuration, or
> some environmental issue.  The problem is, I don't even know where to
> start looking right now.  :-(
>
>
> Phil
> This message optimized for indexing by NSA PRISM
>
>
> On Wed, Mar 11, 2015 at 3:16 AM, Martin Junghanns
>  wrote:
>> Hi Phillip,
>>
>> I am using Hadoop 2.5.2 with Giraph 1.1.0 and it runs fine with
>> -Phadoop2 (from scratch) and -Phadoop_yarn (after removing
>> STATIC_SASL_SYMBOL from munge.symbols in pom.xml).
>>
>> Maybe you can also try the stable Giraph
>> version and report your problem as an issue?
>>
>> Cheers,
>> Martin
>>
>> On 11.03.2015 04:03, Phillip Rhodes wrote:
>>> Giraph crew:
>>>
>>> I'm trying to run the SimpleShortestPathsComputation example using
>>> the latest Giraph code and Hadoop 2.5.2.  My command line looks
>>> like this:
>>>
>>> hadoop jar
>>> /home/prhodes/giraph/giraph-examples/target/giraph-examples-1.2.0-SNAPSHOT-for-hadoop-2.5.2-jar-with-dependencies.jar
>>> org.apache.giraph.GiraphRunner
>>> org.apache.giraph.examples.SimpleShortestPathsComputation -vif
>>> org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat
>>> -vip /user/prhodes/input/tiny_graph.txt -vof
>>> org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op
>>> /user/prhodes/giraph_output/shortestpaths -w 4
>>>
>>>
>>> and the job appears to start OK.  But then it starts outputting
>>> these kinds of messages, and this just continues (seemingly)
>>> forever until you ctrl+c it.
>>>
>>> 15/03/11 02:54:31 INFO yarn.GiraphYarnClient: Giraph: org.apache.giraph.examples.SimpleShortestPathsComputation, Elapsed: 305.43 secs
>>> 15/03/11 02:54:31 INFO yarn.GiraphYarnClient: appattempt_1426041786848_0002_01, State: ACCEPTED, Containers used: 1
>>> 15/03/11 02:54:35 INFO yarn.GiraphYarnClient: Giraph: org.apache.giraph.examples.SimpleShortestPathsComputation, Elapsed: 309.44 secs
>>> 15/03/11 02:54:35 INFO yarn.GiraphYarnClient: appattempt_1426041786848_0002_01, State: ACCEPTED, Containers used: 1
>>> 15/03/11 02:54:39 INFO yarn.GiraphYarnClient: Giraph: org.apache.giraph.examples.SimpleShortestPathsComputation, Elapsed: 313.45 secs
>>> 15/03/11 02:54:39 INFO yarn.GiraphYarnClient: appattempt_1426041786848_0002_01, State: ACCEPTED, Containers used: 1
>>> 15/03/11 02:54:43 INFO yarn.GiraphYarnClient: Giraph: org.apache.giraph.examples.SimpleShortestPathsComputation, Elapsed: 317.45 secs
>>> 15/03/11 02:54:43 INFO yarn.GiraphYarnClient: appattempt_1426041786848_0002_01, State: ACCEPTED, Containers used: 1
>>> ^C15/03/11 02:54:47 INFO yarn.GiraphYarnClient: Giraph: org.apache.giraph.examples.SimpleShortestPathsComputation, Elapsed: 321.46 secs
>>> 15/03/11 02:54:47 INFO yarn.GiraphYarnClient: appattempt_1426041786848_0002_01, State: ACCEPTED, Containers used: 1
>>>
>>> Any idea what is going on here?
>>>
>>>
>>> Thanks,
>>>
>>>
>>> Phil
>>> ---
>>>
>>>
>>> This message optimized for indexing by NSA PRISM
>>>


Could not find or load main class org.apache.giraph.yarn.GiraphApplicationMaster

2015-03-11 Thread Phillip Rhodes
Gang:

I am getting further with my attempt to get Giraph running on a YARN cluster,
but now I'm stuck at this error:

Could not find or load main class org.apache.giraph.yarn.GiraphApplicationMaster

I've tried everything I can find in previous messages on this topic,
to no avail. My command line is like this:

hadoop jar 
/home/prhodes/giraph/giraph-examples/target/giraph-examples-1.2.0-SNAPSHOT-for-hadoop-2.5.2-jar-with-dependencies.jar
org.apache.giraph.GiraphRunner
org.apache.giraph.examples.SimpleShortestPathsComputation -vif
org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat
-vip /user/prhodes/input/tiny_graph.txt -vof
org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op
/user/prhodes/giraph_output/shortestpaths -w 4 -yj
/home/prhodes/giraph/giraph-examples/target/giraph-examples-1.2.0-SNAPSHOT-for-hadoop-2.5.2-jar-with-dependencies.jar,/home/prhodes/giraph/giraph-core/target/giraph-1.2.0-SNAPSHOT-for-hadoop-2.5.2-jar-with-dependencies.jar

and those jars are in those locations on each node in the cluster.
And yet it still complains about not finding the
GiraphApplicationMaster. Also, I've verified that the jar file does
contain the GiraphApplicationMaster.class.

Any ideas what else might be causing the problem, or any workarounds?
I thought about distributing the Giraph jars to each node and
physically putting them on the Hadoop Classpath, but the Maven build
builds fat jars by default, which would probably cause issues doing
that.  Any other suggestions or ideas?


Phil
---
This message optimized for indexing by NSA PRISM


Re: SccComputationTestInMemory - LimitExceededException

2015-03-11 Thread Young Han
This seems like the known problem with MapReduce counters. Try adding the
following to your hadoop-*/conf/mapred-site.xml:

  <property>
    <name>mapreduce.job.counters.max</name>
    <value>100</value>
  </property>
  <property>
    <name>mapreduce.job.counters.limit</name>
    <value>100</value>
  </property>

This does the trick for me on Hadoop 1.0.4, and should work for 0.20 as
well. Not sure about YARN.

Young

On Wed, Mar 11, 2015 at 6:28 AM, Michał Szynkiewicz 
wrote:

> I was able to increase the counters limit with: Counters.MAX_COUNTER_LIMIT
> = 2024 (works for hadoop_1 and hadoop 1.2.1).
>
> Then it turned out that whatever limit I set, it is always exceeded.
>
> It turned out that, for some reason, the IntOverwriteAggregator that
> SccPhaseMasterCompute uses to propagate the algorithm phase didn't work as
> expected. When read from computations it had the correct value, while read
> from the master computation it returned the old value.
>
> I am writing a similar test, where the value to be passed only increases
> and I was able to work around this issue by using Max instead of Overwrite
> aggregator.
>
> Note that I didn't try to run it yet; these are just results from unit
> tests.
>
> btw, I'm using release-1.1.0
>
>
>
>
> 2015-03-01 23:42 GMT+01:00 Michał Szynkiewicz :
>
>> Hi,
>>
>> I'm trying to run SccComputationTestInMemory and I'm
>> hitting org.apache.hadoop.mapreduce.counters.LimitExceededException: Too
>> many counters: 121 max=120
>>
>> I tried adding both
>> conf.set("mapreduce.job.counters.max", Integer.toString(1024));
>> and
>> conf.set("mapreduce.job.counters.limit", Integer.toString(1024));
>> at the beginning of the test, but neither of them changed the limit of
>> counters.
>>
>> I tried -Phadoop_2 with hadoop.version=2.6.0 and 2.5.1, -Phadoop_1 with
>> 1.2.1, -Phadoop_0.20.203.
>>
>> How can I run this test successfully?
>>
>> Thanks
>>
>> Michał
>>
>
>


Re: SccComputationTestInMemory - LimitExceededException

2015-03-11 Thread Michał Szynkiewicz
I was able to increase the counters limit with: Counters.MAX_COUNTER_LIMIT
= 2024 (works for hadoop_1 and hadoop 1.2.1).

Then it turned out that whatever limit I set, it is always exceeded.

It turned out that, for some reason, the IntOverwriteAggregator that
SccPhaseMasterCompute uses to propagate the algorithm phase didn't work as
expected. When read from computations it had the correct value, while read
from the master computation it returned the old value.

I am writing a similar test, where the value to be passed only increases
and I was able to work around this issue by using Max instead of Overwrite
aggregator.

Note that I didn't try to run it yet; these are just results from unit
tests.

btw, I'm using release-1.1.0




2015-03-01 23:42 GMT+01:00 Michał Szynkiewicz :

> Hi,
>
> I'm trying to run SccComputationTestInMemory and I'm
> hitting org.apache.hadoop.mapreduce.counters.LimitExceededException: Too
> many counters: 121 max=120
>
> I tried adding both
> conf.set("mapreduce.job.counters.max", Integer.toString(1024));
> and
> conf.set("mapreduce.job.counters.limit", Integer.toString(1024));
> at the beginning of the test, but neither of them changed the limit of
> counters.
>
> I tried -Phadoop_2 with hadoop.version=2.6.0 and 2.5.1, -Phadoop_1 with
> 1.2.1, -Phadoop_0.20.203.
>
> How can I run this test successfully?
>
> Thanks
>
> Michał
>


Re: Undirected Vertex Definition and Reflexivity

2015-03-11 Thread Matthew Saltz
Hi,

I believe the answer to your question is yes, though I've never done it
myself. If you use only the edge reader, then only the vertices in your
graph that have at least one edge attached to them will be present. So, if
you have entirely disconnected vertices that you want included, you'd need
to use both a VertexReader and an EdgeReader. If you don't have
disconnected vertices, you don't need the VertexReader, because Giraph
will automatically add all of the vertices in your edge file to the graph
(I think this can be disabled). Use the -eip flag to specify the edge
file.
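To make the edge-reader-only behavior concrete, here's a small, hedged sketch (plain Python, no Giraph; the toy graph is made up): a graph built purely from an edge list contains exactly the edge endpoints, so a disconnected vertex never appears unless a vertex file supplies it.

```python
def vertices_from_edges(edges):
    """The vertex set inferable from an edge list alone: just the
    endpoints of the edges."""
    seen = set()
    for src, dst in edges:
        seen.update((src, dst))
    return seen

edges = [(1, 2), (2, 3)]          # toy edge file
all_vertices = {1, 2, 3, 4}       # vertex 4 is entirely disconnected
print(vertices_from_edges(edges))                 # 4 never shows up
print(all_vertices - vertices_from_edges(edges))  # what an edge reader alone misses
```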

Best,
Matthew Saltz

On Wed, Mar 11, 2015 at 1:54 AM, G.W.  wrote:

> Thanks for that!
>
> This is the right idea; however, I was only using a VertexReader until now
> – IntNullReverseTextEdgeInputFormat calls for an EdgeReader.
>
> I am not sure this is the way it works but I like the idea of segregating
> edge and vertex definitions.
>
> *That leads to the following question: can Giraph support the use of a
> VertexReader and an EdgeReader at the same time, that is, through the -vif
> and -eif arguments? *
>
> If that works, the idea would be to refactor my input files with:
>
> Vertices:
> vertex_id, vertex_type, ...
>
> Edges
> source_id, target_id
>
> with the edge reader working in "reverse" mode.
>
> Thanks!
>
>
>
>
> On 10 March 2015 at 20:02, Matthew Saltz  wrote:
>
>> Have a look at IntNullReverseTextEdgeInputFormat.
>> It automatically creates reverse edges, but it expects the file format
>>
>> source_vertex<tab>target_vertex
>>
>> on each line. If you need to convert it to use longs you can change the
>> code pretty easily.
>>
>> Best,
>> Matthew
>>
>> On Tue, Mar 10, 2015 at 5:37 AM, Young Han 
>> wrote:
>>
>>> The input is assumed to be the vertex followed by a set of *directed*
>>> edges. So, in your example, leaving out E2 means that the final graph will
>>> not have the directed edge from V2 to V1. To get an undirected edge, you
>>> need a pair of directed edges.
>>>
>>> Internally, Giraph stores the out-edges of each vertex as an adjacency
>>> list at that vertex. So, for example, your undirected graph becomes a
>>> vertex object V1 with an adjacency list {V2} and a vertex object V2 with an
>>> adjacency list {V1}. The directed graph would be a vertex V1 with adjacency
>>> list {V2} and a vertex V2 with an empty adjacency list {}.
>>>
>>> There's no easy way for Giraph to infer that V2's adjacency list should
>>> contain V1, because V2 does not track its in-edges. To get around this, you
>>> can either (1) use an undirected input file with both pairs of edges
>>> present; (2) have, in your algorithms, all vertices broadcast their ids to
>>> their out-edge neighbours and then perform mutations to add the missing
>>> edges in the first two supersteps; or (3) modify the code in
>>> org.apache.giraph.io.* (in giraph-core) to cache and add missing edges
>>> (i.e., add a new "type" of input format). I'm fairly certain that there
>>> doesn't already exist an "assume undirected graph" input reader, but I'm
>>> not too familiar with the code paths and options there so I could be wrong.
>>>
>>> Young
>>>
>>> On Mon, Mar 9, 2015 at 11:54 PM, G.W.  wrote:
>>>
 Hi Giraph Mailing List,

 I am writing about an undirected graph I am trying to move to Giraph. I
 have a question about the assumption Giraph makes when processing an input.

 Let V1 and V2, two vertices connected with a common edge. E1 defines an
 edge from V1 to V2. E2 defines an edge from V2 to V1.

 Simply put, these are defined in an input file as:
 V1, E1
 V2, E2

 This is working fine, I can process the graph accordingly.

 I was trying to see what would happen if I was to simplify the input
 file:
 V1, E1
 V2

 When the time came for V2 to be processed in a superstep, Giraph
 would not suggest E1 as an outgoing edge. My question is why, knowing
 that E1 defines an edge from V1 to V2. The graph being undirected (although
 there is no provision for that in my Giraph computation), shouldn't Giraph
 assume that V2 is connected to V1?

 Down the road, the idea would be to streamline the input format, hence
 my question.

 Thanks!



>>>
>>
>


Re: Giraph job never ends

2015-03-11 Thread Martin Junghanns
Hi Phillip,

I am using Hadoop 2.5.2 with Giraph 1.1.0 and it runs fine with
-Phadoop2 (from scratch) and -Phadoop_yarn (after removing
STATIC_SASL_SYMBOL from munge.symbols in pom.xml).

Maybe you can also try the stable Giraph
version and report your problem as an issue?

Cheers,
Martin

On 11.03.2015 04:03, Phillip Rhodes wrote:
> Giraph crew:
> 
> I'm trying to run the SimpleShortestPathsComputation example using
> the latest Giraph code and Hadoop 2.5.2.  My command line looks
> like this:
> 
> hadoop jar
> /home/prhodes/giraph/giraph-examples/target/giraph-examples-1.2.0-SNAPSHOT-for-hadoop-2.5.2-jar-with-dependencies.jar
> org.apache.giraph.GiraphRunner
> org.apache.giraph.examples.SimpleShortestPathsComputation -vif
> org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat
> -vip /user/prhodes/input/tiny_graph.txt -vof
> org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op
> /user/prhodes/giraph_output/shortestpaths -w 4
> 
> 
> and the job appears to start OK.  But then it starts outputting
> these kinds of messages, and this just continues (seemingly)
> forever until you ctrl+c it.
> 
> 15/03/11 02:54:31 INFO yarn.GiraphYarnClient: Giraph: org.apache.giraph.examples.SimpleShortestPathsComputation, Elapsed: 305.43 secs
> 15/03/11 02:54:31 INFO yarn.GiraphYarnClient: appattempt_1426041786848_0002_01, State: ACCEPTED, Containers used: 1
> 15/03/11 02:54:35 INFO yarn.GiraphYarnClient: Giraph: org.apache.giraph.examples.SimpleShortestPathsComputation, Elapsed: 309.44 secs
> 15/03/11 02:54:35 INFO yarn.GiraphYarnClient: appattempt_1426041786848_0002_01, State: ACCEPTED, Containers used: 1
> 15/03/11 02:54:39 INFO yarn.GiraphYarnClient: Giraph: org.apache.giraph.examples.SimpleShortestPathsComputation, Elapsed: 313.45 secs
> 15/03/11 02:54:39 INFO yarn.GiraphYarnClient: appattempt_1426041786848_0002_01, State: ACCEPTED, Containers used: 1
> 15/03/11 02:54:43 INFO yarn.GiraphYarnClient: Giraph: org.apache.giraph.examples.SimpleShortestPathsComputation, Elapsed: 317.45 secs
> 15/03/11 02:54:43 INFO yarn.GiraphYarnClient: appattempt_1426041786848_0002_01, State: ACCEPTED, Containers used: 1
> ^C15/03/11 02:54:47 INFO yarn.GiraphYarnClient: Giraph: org.apache.giraph.examples.SimpleShortestPathsComputation, Elapsed: 321.46 secs
> 15/03/11 02:54:47 INFO yarn.GiraphYarnClient: appattempt_1426041786848_0002_01, State: ACCEPTED, Containers used: 1
> 
> Any idea what is going on here?
> 
> 
> Thanks,
> 
> 
> Phil
> ---
> 
> 
> This message optimized for indexing by NSA PRISM
> 


Re: How to format Giraph input dataset

2015-03-11 Thread Martin Junghanns
Hi Ralph,

you can set a vertex or edge input format when running a Giraph job.
In the example, you used the vertex input format (vif)

"-vif
org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat"

Your wikitalk input format is an edge list and Giraph offers, e.g.,

"org.apache.giraph.io.formats.IntNullTextEdgeInputFormat"

which reads a graph where "Each line consists of: source_vertex,
target_vertex" (separated by a \t)

You can set the edge input format via the -eif parameter.

Cheers,
Martin

The package "org.apache.giraph.io.formats" in giraph-core contains a lot
more formats.
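If you do want to go the other way and rewrite an edge list into the quick-start JSON vertex format ([id,value,[[target,weight],...]]), here's a hedged converter sketch. The default vertex value 0 and edge weight 1 are my assumptions, and the JSON spacing differs cosmetically from the quick-start sample:

```python
import json
from collections import defaultdict

def edge_list_to_json_vertices(lines, default_value=0, default_weight=1):
    """Convert 'source<TAB>target' lines (comments start with '#') into
    one JSON vertex line per vertex: [id, value, [[target, weight], ...]]."""
    adj = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        src, dst = map(int, line.split())
        adj[src].append(dst)
        adj.setdefault(dst, [])  # targets become vertices too, even with no out-edges
    return [json.dumps([v, default_value, [[t, default_weight] for t in adj[v]]])
            for v in sorted(adj)]

sample = ["# FromNodeId\tToNodeId", "0\t1", "2\t1", "2\t21"]
for row in edge_list_to_json_vertices(sample):
    print(row)
```

Using -eif with an edge input format, as described above, avoids the conversion entirely.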

On 11.03.2015 06:37, MengXiaodong wrote:
> Hi all,
> 
> I'm new to Giraph; I successfully ran my first example by
> following the instructions in the Giraph Quick Start. However, I ran
> into a question when writing my own Giraph code.
> 
> In the "quick start", The format of input graph is as following:
> 
> [0,0,[[1,1],[3,3]]]
> [1,0,[[0,1],[2,2],[3,1]]]
> [2,0,[[1,2],[4,4]]]
> [3,0,[[0,3],[1,1],[4,4]]]
> [4,0,[[3,4],[2,4]]]
> 
> But graph datasets (like the Facebook or Twitter social networks)
> downloaded from public websites come in various formats. How can I
> transform a graph into the standard Giraph format like the one
> above?
> 
> For example, the WikiTalk graph below, which is a directed graph.
> A directed edge A->B means user A edited the talk page of user B.
> 
> # FromNodeId  ToNodeId
> 0  1
> 2  1
> 2  21
> 2  46
> 2  63
> 2  88
> 2  93
> 2  94
> 2  101
> 2  102
> 2  103
> 2  116
> 2  119
> 2  125
> 
> Regards, Ralph
>