Re: [SOLVED] Re: Giraph job never ends
Figured out the issue via the container log file: container_1426433168188_0001_01_01/gam-stdout.log.

The container was trying to use too much virtual memory (I am using a micro instance on EC2, so there is not much to work with), which caused an exitCode: 143. Apparently, there is a limit on virtual memory based on the physical memory, but you can disable this check by adding the following to yarn-site.xml:

<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
  <description>Whether virtual memory limits will be enforced for containers.</description>
</property>

source: http://stackoverflow.com/questions/14110428/am-container-is-running-beyond-virtual-memory-limits

Everything seems to be working for me now.
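To make the virtual memory limit from the fix at the top of this message concrete, here is a minimal Python sketch of the arithmetic. It assumes the limit is the container's physical memory allocation multiplied by yarn.nodemanager.vmem-pmem-ratio, and that the ratio is left at the stock default of 2.1; check your own yarn-site.xml before relying on that number. The helper names are made up for illustration. It also decodes exitCode 143 under the usual "128 + signal number" shell convention, which would point at SIGTERM, i.e. the container was most likely killed rather than crashing on its own.

```python
import signal

# Assumption: 2.1 is the stock default for yarn.nodemanager.vmem-pmem-ratio.
# Your cluster may override it in yarn-site.xml.
DEFAULT_VMEM_PMEM_RATIO = 2.1


def vmem_limit_mb(container_pmem_mb, ratio=DEFAULT_VMEM_PMEM_RATIO):
    """Virtual memory ceiling (in MB) for a container with this physical allocation."""
    return container_pmem_mb * ratio


def exceeds_vmem_limit(used_vmem_mb, container_pmem_mb, ratio=DEFAULT_VMEM_PMEM_RATIO):
    """True if the container's virtual memory use is above the computed ceiling."""
    return used_vmem_mb > vmem_limit_mb(container_pmem_mb, ratio)


def signal_for_exit_code(exit_code):
    """Name of the signal for exit codes that follow the 128 + N convention, else None."""
    if exit_code <= 128:
        return None
    try:
        return signal.Signals(exit_code - 128).name
    except ValueError:
        # Not a signal number known on this platform.
        return None


if __name__ == "__main__":
    # Hypothetical numbers: a 1024 MB container using 2400 MB of virtual memory.
    print(vmem_limit_mb(1024))
    print(exceeds_vmem_limit(2400, 1024))
    print(signal_for_exit_code(143))
```

On a small instance, raising the ratio instead of turning the check off entirely may be a gentler alternative, though which one is appropriate depends on the cluster.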
Re: [SOLVED] Re: Giraph job never ends
Hey Phil,

I have been having the exact same problems as you (I am also setting up Giraph on EC2), but this solution did not work for me. Do you recall what error you saw in resourcemanager logs?

I am also looking at these logs, but nothing is standing out to me. In fact, it almost seems like the application should have successfully finished. The log stops updating and I see a lot of COMPLETED, RESULT=SUCCESS, FINISHED at the end of the log. Though, it does look like one of the containers is not transitioning to these states.

Thanks,
Steve
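When comparing notes on a hang like this, it can help to check mechanically whether the client output from the original post ever leaves the ACCEPTED state. The Python sketch below parses GiraphYarnClient lines in the format quoted in this thread and flags a run where every reported state is ACCEPTED past some elapsed time. The regular expressions are written against those quoted lines only, and the 120 second threshold is an arbitrary value picked for illustration, not a Giraph or YARN setting.

```python
import re

# Patterns written against the client log lines quoted in this thread.
STATE_LINE = re.compile(
    r"INFO yarn\.GiraphYarnClient: (?P<attempt>appattempt_\w+), "
    r"State: (?P<state>\w+), Containers used: (?P<containers>\d+)"
)
ELAPSED_LINE = re.compile(r"Elapsed: (?P<secs>\d+(?:\.\d+)?) secs")


def summarize(lines):
    """Return (list of reported states, largest elapsed time seen in seconds)."""
    states = []
    elapsed = 0.0
    for line in lines:
        state_match = STATE_LINE.search(line)
        if state_match:
            states.append(state_match.group("state"))
        elapsed_match = ELAPSED_LINE.search(line)
        if elapsed_match:
            elapsed = max(elapsed, float(elapsed_match.group("secs")))
    return states, elapsed


def looks_stuck(lines, threshold_secs=120.0):
    """True if every reported state is ACCEPTED and the job has run past the threshold.

    threshold_secs is a hypothetical cutoff chosen for this example.
    """
    states, elapsed = summarize(lines)
    return bool(states) and set(states) == {"ACCEPTED"} and elapsed >= threshold_secs


SAMPLE = [
    "15/03/11 02:54:31 INFO yarn.GiraphYarnClient: Giraph: "
    "org.apache.giraph.examples.SimpleShortestPathsComputation, Elapsed: 305.43 secs",
    "15/03/11 02:54:31 INFO yarn.GiraphYarnClient: "
    "appattempt_1426041786848_0002_01, State: ACCEPTED, Containers used: 1",
    "15/03/11 02:54:35 INFO yarn.GiraphYarnClient: Giraph: "
    "org.apache.giraph.examples.SimpleShortestPathsComputation, Elapsed: 309.44 secs",
    "15/03/11 02:54:35 INFO yarn.GiraphYarnClient: "
    "appattempt_1426041786848_0002_01, State: ACCEPTED, Containers used: 1",
]

if __name__ == "__main__":
    print(summarize(SAMPLE))
    print(looks_stuck(SAMPLE))
```

A job that sits in ACCEPTED for minutes has been accepted by the ResourceManager, but its attempt has not been reported as running, so the resourcemanager and container logs are probably the next place to look.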
Re: [SOLVED] Re: Giraph job never ends
Thanks Phil, I appreciate the help. Your posts over the past couple days have already been quite helpful. There were a few things I was going to play with as well; perhaps it is some configuration issue, as you mentioned earlier. I had some issues with EC2 today and I will look at it again tomorrow.

Thanks for letting me know about your talk, it sounds interesting. I will try and go as long as I can get there in time.

--Steve

On Fri, Mar 13, 2015 at 3:37 PM, Phillip Rhodes motley.crue@gmail.com wrote:

Steve:

I'm not 100% sure what to tell you, and I don't have access to my cluster right this minute. But later this evening I can log in and see if I can find anything that might be useful to you.

Also, as an FYI, I'll be doing a presentation on Giraph at the Triangle Java User's Group meeting this coming Monday... if you're in the area (I see you have an @ncsu.edu address), and you can come by, I might be able to help you then. Part of my presentation will be walking through how to set up a Giraph / YARN cluster, based on my experiences over the past few days...

Phil

This message optimized for indexing by NSA PRISM
[SOLVED] Re: Giraph job never ends
OK, this was easy enough to fix, once I understood what was actually happening. Since I'm running on EC2 nodes on AWS, it is not the case that any given node can talk to any other node on any port (at least not by default). I had tried to cherry-pick which ports to whitelist in the security group, but I missed one or more that YARN needed for internal communication. I discovered this when examining the resourcemanager logs.

For now, instead of trying to enumerate exactly which ports to allow, I added a rule to allow all traffic for address 10.0.0.0/24 and that solved this.

Cheers,
Phil

On Wed, Mar 11, 2015 at 1:39 PM, Phillip Rhodes motley.crue@gmail.com wrote:

Interesting... It totally did not work for me when built using the hadoop_2 profile, but with the hadoop_yarn profile everything at least starts up.

I'm pretty baffled right now... my cluster is essentially working, and I can run, for example, the WordCount example just fine. And the Giraph job starts and shows no apparent errors, but I get no output and it seems to run forever. It's probably some really small detail of my Hadoop configuration, or some environmental issue. The problem is, I don't even know where to start looking right now. :-(

Phil

This message optimized for indexing by NSA PRISM

On Wed, Mar 11, 2015 at 3:16 AM, Martin Junghanns martin.jungha...@gmx.net wrote:

Hi Phillip,

I am using Hadoop 2.5.2 with Giraph 1.1.0 and it runs fine with -Phadoop2 (from scratch) and -Phadoop_yarn (after removing STATIC_SASL_SYMBOL from munge.symbols in pom.xml).

Maybe you can also try the stable Giraph version and report your problem as an issue?

Cheers,
Martin

On 11.03.2015 04:03, Phillip Rhodes wrote:

Giraph crew:

I'm trying to run the SimpleShortestPathsComputation example using the latest Giraph code and Hadoop 2.5.2. My command line looks like this:

hadoop jar /home/prhodes/giraph/giraph-examples/target/giraph-examples-1.2.0-SNAPSHOT-for-hadoop-2.5.2-jar-with-dependencies.jar org.apache.giraph.GiraphRunner org.apache.giraph.examples.SimpleShortestPathsComputation -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat -vip /user/prhodes/input/tiny_graph.txt -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op /user/prhodes/giraph_output/shortestpaths -w 4

and the job appears to start OK. But then it starts outputting these kinds of messages, and this just continues (seemingly) forever until you ctrl+c it.

15/03/11 02:54:31 INFO yarn.GiraphYarnClient: Giraph: org.apache.giraph.examples.SimpleShortestPathsComputation, Elapsed: 305.43 secs
15/03/11 02:54:31 INFO yarn.GiraphYarnClient: appattempt_1426041786848_0002_01, State: ACCEPTED, Containers used: 1
15/03/11 02:54:35 INFO yarn.GiraphYarnClient: Giraph: org.apache.giraph.examples.SimpleShortestPathsComputation, Elapsed: 309.44 secs
15/03/11 02:54:35 INFO yarn.GiraphYarnClient: appattempt_1426041786848_0002_01, State: ACCEPTED, Containers used: 1
15/03/11 02:54:39 INFO yarn.GiraphYarnClient: Giraph: org.apache.giraph.examples.SimpleShortestPathsComputation, Elapsed: 313.45 secs
15/03/11 02:54:39 INFO yarn.GiraphYarnClient: appattempt_1426041786848_0002_01, State: ACCEPTED, Containers used: 1
15/03/11 02:54:43 INFO yarn.GiraphYarnClient: Giraph: org.apache.giraph.examples.SimpleShortestPathsComputation, Elapsed: 317.45 secs
15/03/11 02:54:43 INFO yarn.GiraphYarnClient: appattempt_1426041786848_0002_01, State: ACCEPTED, Containers used: 1
^C15/03/11 02:54:47 INFO yarn.GiraphYarnClient: Giraph: org.apache.giraph.examples.SimpleShortestPathsComputation, Elapsed: 321.46 secs
15/03/11 02:54:47 INFO yarn.GiraphYarnClient: appattempt_1426041786848_0002_01, State: ACCEPTED, Containers used: 1

Any idea what is going on here?

Thanks,
Phil
---
This message optimized for indexing by NSA PRISM
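One thing worth double-checking with the 10.0.0.0/24 rule from the fix at the top of this message: that range only spans 10.0.0.0 through 10.0.0.255, so a node whose private IP falls outside it would still be blocked. Here is a minimal Python sketch of that check using the standard ipaddress module. The node addresses are made up for illustration; substitute the private IPs of your own instances.

```python
import ipaddress

# The range allowed by the security group rule described in this thread.
CLUSTER_CIDR = ipaddress.ip_network("10.0.0.0/24")


def covered(ip, network=CLUSTER_CIDR):
    """True if the address falls inside the allowed range."""
    return ipaddress.ip_address(ip) in network


def uncovered_nodes(node_ips, network=CLUSTER_CIDR):
    """Addresses that the rule would NOT let through."""
    return [ip for ip in node_ips if not covered(ip, network)]


if __name__ == "__main__":
    # Hypothetical private IPs for a small cluster.
    nodes = ["10.0.0.17", "10.0.0.201", "10.0.1.5"]
    print(CLUSTER_CIDR.num_addresses)
    print(uncovered_nodes(nodes))
```

If the second line of output is not an empty list, the listed nodes sit outside the range and the rule would need a wider prefix or an extra entry.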