Re: Giraph Use Case
I recommend downloading a Twitter data set from SNAP and trying out PageRank, Jaccard, Lin, etc. to define and compare communities. That's kind of where I started. :) --John On Mon, Aug 11, 2014 at 8:46 AM, Vineet Mishra clearmido...@gmail.com wrote: Hi All, I have installed and run the Giraph example on my Hadoop Cluster, following the quick start at https://giraph.apache.org/quick_start.html, and it's working great, but I wanted to know what other possible use case scenarios/implementations of Giraph there are. Expert advice would be highly appreciated! Thanks!
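For anyone starting down this road, the core PageRank update is small enough to sketch in plain Java before porting it into a Giraph BasicComputation. Everything below (class name, method names, the toy graph) is illustrative scaffolding, not Giraph API:

```java
import java.util.Arrays;

/** Plain-Java PageRank power iteration on a toy adjacency list (not Giraph API). */
public class PageRankSketch {
    public static double[] pageRank(int[][] outEdges, int iterations, double d) {
        int n = outEdges.length;
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n);
        for (int it = 0; it < iterations; it++) {
            double[] next = new double[n];
            Arrays.fill(next, (1.0 - d) / n); // teleport term
            for (int v = 0; v < n; v++) {
                if (outEdges[v].length == 0) {
                    // dangling vertex: spread its rank uniformly
                    for (int u = 0; u < n; u++) next[u] += d * rank[v] / n;
                } else {
                    // in Giraph this would be sendMessageToAllEdges(...)
                    for (int u : outEdges[v]) next[u] += d * rank[v] / outEdges[v].length;
                }
            }
            rank = next;
        }
        return rank;
    }

    public static void main(String[] args) {
        // 0 -> 1, 1 -> 2, 2 -> 0, plus 3 -> 0 (vertex 3 has no in-links)
        int[][] g = { {1}, {2}, {0}, {0} };
        System.out.println(Arrays.toString(pageRank(g, 50, 0.85)));
    }
}
```

In a Giraph port, each iteration of the outer loop becomes a superstep and the inner rank distribution becomes message passing between vertices.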
Re: Couldn't instantiate
@465962c4 2014-07-02 15:49:17,509 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server carmen-HP-Pavilion-Sleekbook-15/127.0.1.1:22181. Will not attempt to authenticate using SASL (unknown error) 2014-07-02 15:49:17,509 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to carmen-HP-Pavilion-Sleekbook-15/127.0.1.1:22181, initiating session 2014-07-02 15:49:17,515 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server carmen-HP-Pavilion-Sleekbook-15/127.0.1.1:22181, sessionid = 0x146f756106b0001, negotiated timeout = 60 2014-07-02 15:49:17,516 INFO org.apache.giraph.bsp.BspService: process: Asynchronous connection complete. 2014-07-02 15:49:17,530 INFO org.apache.giraph.graph.GraphTaskManager: map: No need to do anything when not a worker 2014-07-02 15:49:17,530 INFO org.apache.giraph.graph.GraphTaskManager: cleanup: Starting for MASTER_ZOOKEEPER_ONLY 2014-07-02 15:49:17,561 INFO org.apache.giraph.bsp.BspService: getJobState: Job state already exists (/_hadoopBsp/job_201407021315_0003/_masterJobState) 2014-07-02 15:49:17,568 INFO org.apache.giraph.master.BspServiceMaster: becomeMaster: First child is '/_hadoopBsp/job_201407021315_0003/_masterElectionDir/carmen-HP-Pavilion-Sleekbook-15_000' and my bid is '/_hadoopBsp/job_201407021315_0003/_masterElectionDir/carmen-HP-Pavilion-Sleekbook-15_000' 2014-07-02 15:49:17,570 INFO org.apache.giraph.bsp.BspService: getApplicationAttempt: Node /_hadoopBsp/job_201407021315_0003/_applicationAttemptsDir already exists! 2014-07-02 15:49:17,625 INFO org.apache.giraph.comm.netty.NettyServer: NettyServer: Using execution group with 8 threads for requestFrameDecoder. 
2014-07-02 15:49:17,674 INFO org.apache.giraph.comm.netty.NettyServer: start: Started server communication server: carmen-HP-Pavilion-Sleekbook-15/127.0.1.1:3 with up to 16 threads on bind attempt 0 with sendBufferSize = 32768 receiveBufferSize = 524288 2014-07-02 15:49:17,679 INFO org.apache.giraph.comm.netty.NettyClient: NettyClient: Using execution handler with 8 threads after request-encoder. 2014-07-02 15:49:17,682 INFO org.apache.giraph.master.BspServiceMaster: becomeMaster: I am now the master! 2014-07-02 15:49:17,684 INFO org.apache.giraph.bsp.BspService: getApplicationAttempt: Node /_hadoopBsp/job_201407021315_0003/_applicationAttemptsDir already exists! 2014-07-02 15:49:17,717 ERROR org.apache.giraph.master.MasterThread: masterThread: Master algorithm failed with NullPointerException java.lang.NullPointerException at org.apache.giraph.master.BspServiceMaster.generateInputSplits(BspServiceMaster.java:330) at org.apache.giraph.master.BspServiceMaster.createInputSplits(BspServiceMaster.java:619) at org.apache.giraph.master.BspServiceMaster.createVertexInputSplits(BspServiceMaster.java:686) at org.apache.giraph.master.MasterThread.run(MasterThread.java:108) 2014-07-02 15:49:17,718 FATAL org.apache.giraph.graph.GraphMapper: uncaughtException: OverrideExceptionHandler on thread org.apache.giraph.master.MasterThread, msg = java.lang.NullPointerException, exiting... 
java.lang.IllegalStateException: java.lang.NullPointerException at org.apache.giraph.master.MasterThread.run(MasterThread.java:193) Caused by: java.lang.NullPointerException at org.apache.giraph.master.BspServiceMaster.generateInputSplits(BspServiceMaster.java:330) at org.apache.giraph.master.BspServiceMaster.createInputSplits(BspServiceMaster.java:619) at org.apache.giraph.master.BspServiceMaster.createVertexInputSplits(BspServiceMaster.java:686) at org.apache.giraph.master.MasterThread.run(MasterThread.java:108) 2014-07-02 15:49:17,722 INFO org.apache.giraph.zk.ZooKeeperManager: run: Shutdown hook started. 2014-07-02 15:49:17,727 WARN org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Forced a shutdown hook kill of the ZooKeeper process. 2014-07-02 15:49:18,049 INFO org.apache.zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x146f756106b0001, likely server has closed socket, closing socket connection and attempting reconnect 2014-07-02 15:49:18,050 INFO org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: ZooKeeper process exited with 143 (note that 143 typically means killed). 2014-07-02 13:52 GMT+02:00 John Yost soozandjohny...@gmail.com: Hi Carmen, Please post more of the exception stack trace, not enough here for me to figure anything out. :) Thanks --John On Wed, Jul 2, 2014 at 7:33 AM, soozandjohny...@gmail.com wrote: Hi Carmen, Glad that one problem is fixed, and I can take a look at this one as well. --John Sent from my iPhone On Jul 2, 2014, at 6:50 AM, Carmen Manzulli carmenmanzu...@gmail.com wrote: OK, I've done what you told me, but now I've got this problem: java.lang.Throwable: Child Error
Re: Couldn't instantiate
Hi Carmen, Please post more of the exception stack trace, not enough here for me to figure anything out. :) Thanks --John On Wed, Jul 2, 2014 at 7:33 AM, soozandjohny...@gmail.com wrote: Hi Carmen, Glad that one problem is fixed, and I can take a look at this one as well. --John Sent from my iPhone On Jul 2, 2014, at 6:50 AM, Carmen Manzulli carmenmanzu...@gmail.com wrote: OK, I've done what you told me, but now I've got this problem: java.lang.Throwable: Child Error at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:271) Caused by: java.io.IOException: Task process exit with nonzero status of 1. at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:258) This is my Computation code:

import org.apache.giraph.GiraphRunner;
import org.apache.giraph.graph.BasicComputation;
import org.apache.giraph.graph.Vertex;
import org.apache.giraph.edge.Edge;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.util.ToolRunner;

public class SimpleSelectionComputation extends BasicComputation<Text, NullWritable, Text, NullWritable> {
  @Override
  public void compute(Vertex<Text, NullWritable, Text> vertex, Iterable<NullWritable> messages) {
    Text source = new Text("http://dbpedia.org/resource/1040s");
    if (getSuperstep() == 0) {
      if (vertex.getId().equals(source)) {
        System.out.println("the subject " + vertex.getId() + " has the following predicates and objects:");
        for (Edge<Text, Text> e : vertex.getEdges()) {
          System.out.println(e.getValue() + "\t" + e.getTargetVertexId());
        }
      }
      vertex.voteToHalt();
    }
  }

  public static void main(String[] args) throws Exception {
    System.exit(ToolRunner.run(new GiraphRunner(), args));
  }
}
Re: Couldn't instantiate
Hi Carmen, Question--did you only define a constructor that takes arguments? If so, I think you are getting this because you did not define a no-arguments constructor with public visibility. If this is not the case, I recommend posting your source code and I will be happy to help. --John On Mon, Jun 30, 2014 at 9:38 AM, Carmen Manzulli carmenmanzu...@gmail.com wrote: Hi, I'm trying to run a selectionComputation with my own code for a VertexInputFormat, but the Giraph job starts to work and then fails with: java.lang.IllegalStateException: run: Caught an unrecoverable exception newInstance: Couldn't instantiate sisinflab.SimpleRDFVertexInputFormat at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:101) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364) at org.apache.hadoop.mapred.Child$4.run(Child.java:255) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190) at org.apache.hadoop.mapred.Child.main(Child.java:249) Caused by: java.lang.IllegalStateException: newInstance: Couldn't instantiate sisinflab.SimpleRDFVertexInputFormat at org.apache.giraph.utils.ReflectionUtils.newInstance(ReflectionUtils.java:105) at org.apache.giraph.conf.ImmutableClassesGiraphConfiguration.createVertexInputFormat(ImmutableClassesGiraphConfiguration.java:235) at org.apache.giraph.conf.ImmutableClassesGiraphConfiguration.createWrappedVertexInputFormat(ImmutableClassesGiraphConfiguration.java:246) at org.apache.giraph.graph.GraphTaskManager.checkInput(GraphTaskManager.java:171) at org.apache.giraph.graph.GraphTaskManager.setup(GraphTaskManager.java:207) at org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:59) at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:89) ... 
7 more Caused by: java.lang.InstantiationException at sun.reflect.InstantiationExceptionConstructorAccessorImpl.newInstance(InstantiationExceptionConstructorAccessorImpl.java:48) at java.lang.reflect.Constructor.newInstance(Constructor.java:526) at java.lang.Class.newInstance(Class.java:374) at org.apache.giraph.utils.ReflectionUtils.newInstance(ReflectionUtils.java:103) ... 13 more what does it mean? where is the problem? Who can help me? Carmen
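John's diagnosis fits the trace: Giraph's ReflectionUtils instantiates the input format reflectively, and that requires a public no-argument constructor. The same InstantiationException can be reproduced with a self-contained sketch (toy classes below, not the actual Giraph code path):

```java
/** Demonstrates why reflective instantiation fails without a public no-arg constructor. */
public class NoArgCtorDemo {
    public static class WithArgsOnly {
        private final String name;
        public WithArgsOnly(String name) { this.name = name; } // only an args constructor
    }

    public static class WithNoArg {
        public WithNoArg() { } // public no-arg constructor: reflection works
    }

    public static void main(String[] args) throws Exception {
        try {
            // same kind of call Giraph makes when it instantiates the input format
            WithArgsOnly.class.newInstance();
        } catch (InstantiationException e) {
            System.out.println("InstantiationException, as in the stack trace above");
        }
        System.out.println(WithNoArg.class.newInstance().getClass().getSimpleName());
    }
}
```

Adding `public SimpleRDFVertexInputFormat() { }` to the failing class should make the reflective instantiation succeed, assuming that class currently has only an args constructor.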
Re: Giraph insists on LocalJobRunner with custom Computation
Hi Yorn, I figured this out and detailed the solution in my post earlier this morning (6/23 2:46). The key is the following: -ca mapred.job.tracker=localhost:5431. Without this, you'll see the exception you detailed above. --John On Thu, Jun 5, 2014 at 5:36 AM, Yørn de Jong y...@uninett.no wrote: Hi group, I have set up Giraph on a YARN cluster. I have no trouble running the shortest paths example as described in [1], but when I try to run my own algorithm, the program stops with: Exception in thread "main" java.lang.IllegalArgumentException: checkLocalJobRunnerConfiguration: When using LocalJobRunner, must have only one worker since only 1 task at a time! When I change the command to -w 1, it stops with: Exception in thread "main" java.lang.IllegalArgumentException: checkLocalJobRunnerConfiguration: When using LocalJobRunner, you cannot run in split master / worker mode since there is only 1 task at a time! The command I try to run is:

hadoop \
  jar giraph-rank-1.1.0-SNAPSHOT-for-hadoop-2.3.0-cdh5.0.1-jar-with-dependencies.jar \
  org.apache.giraph.GiraphRunner no.uninett.yorn.giraph.computation.DOSRank \
  -eif no.uninett.yorn.giraph.format.io.NetflowCSVEdgeInputFormat \
  -eip /user/hdfs/trd_gw1_12_01_normalized.csv \
  -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
  -op /user/yarn/output \
  -wc org.apache.giraph.worker.DefaultWorkerContext \
  -w 5 \
  -yj giraph-rank-1.1.0-SNAPSHOT-for-hadoop-2.3.0-cdh5.0.1-jar-with-dependencies.jar

where, naturally, no.uninett.yorn.giraph.computation.DOSRank is my own algorithm, which is contained in giraph-rank-1.1.0-SNAPSHOT-for-hadoop-2.3.0-cdh5.0.1-jar-with-dependencies.jar. giraph-rank is built using the same command I used to build the giraph-examples project, and the pom.xml for giraph-rank is made by copying from giraph-examples and replacing «examples» with «rank». 
The command used to build both projects is: mvn -Phadoop_yarn -Dhadoop.version=2.3.0-cdh5.0.1 clean package -DskipTests Interestingly enough, when I change the input format to something else, I get error messages having to do with type mismatches. This seems to suggest that my EdgeInputFormat does start, but that the problem occurs while it runs or after it runs. I don’t know how to debug this. Am I missing something? How can I get my algorithm to run? The source code is available on [2]. [1] https://giraph.apache.org/quick_start.html#qs_section_5 [2] https://scm.uninett.no/yorn/giraph
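One way to sanity-check the type-mismatch theory is to exercise the parsing logic on its own, outside the cluster. The sketch below is plain Java with a hypothetical "src,dst,weight" column layout (the real NetflowCSVEdgeInputFormat's columns may differ); the key point is that the parsed types must line up with the I and E type parameters the Computation declares:

```java
import java.util.Arrays;

/** Plain-Java sketch of the per-line parsing an edge input format must do
 *  for a CSV edge list (hypothetical column layout: src,dst,weight). */
public class CsvEdgeParseSketch {
    public static long[] parseEdge(String line) {
        String[] cols = line.split(",");
        // Type mismatches like the ones described above usually surface here:
        // the source/target id types produced must match the Computation's vertex id type.
        return new long[]{ Long.parseLong(cols[0].trim()), Long.parseLong(cols[1].trim()) };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parseEdge("42, 7, 0.5"))); // [42, 7]
    }
}
```

Running the parser against a few real lines of trd_gw1_12_01_normalized.csv in a unit test is a cheap way to separate "input format bug" from "Giraph configuration bug".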
Got Giraph 1.1.0 examples running on YARN
Hi Everyone, I just got the Giraph examples to run on YARN and I thought I would share the details, since it looks like a few people have struggled with this. This is what I did: 1. Downloaded the latest snapshot (giraph-b218d2) 2. Built with mvn install -Phadoop_2 -DskipTests=true 3. Executed with the following CLI entry against a Hadoop 2.2.0 pseudo-cluster with mapreduce.framework.name=yarn: hadoop jar /usr/local/java/giraph/giraph-1.1/giraph-examples/target/giraph-examples-1.1.0-SNAPSHOT-for-hadoop-2.2.0-jar-with-dependencies.jar org.apache.giraph.GiraphRunner org.apache.giraph.examples.SimpleShortestPathsComputation -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat -vip /user/hadoop/tiny.txt -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op /user/hadoop/shortestpaths -w 1 -ca giraph.zkList=localhost:2181 -ca giraph.SplitMasterWorker=true -ca mapred.job.tracker=localhost:54311 -ca mapreduce.job.tracker=localhost:54311 Note: the parameter -ca mapred.job.tracker=localhost:54311 is crucial to this working in cluster/pseudo-cluster mode. Otherwise, you'll get the following error that Satyajit recently posted: java.lang.IllegalArgumentException: checkLocalJobRunnerConfiguration: When using LocalJobRunner, you cannot run in split master / worker mode since there is only 1 task at a time! If you actually want to run in local mode, I have not figured that out, as I want to run on my cluster instead. --John
Shortest Path Still Won't Work--Any Ideas?
Here are more details regarding my attempts at running Shortest Path. Any help would be greatly appreciated, as the root cause for the Giraph job failing is not obvious to me. Thanks --John Command Line: $ hadoop jar giraph-examples-1.1.0-SNAPSHOT-for-hadoop-2.2.0-jar-with-dependencies.jar org.apache.giraph.GiraphRunner org.apache.giraph.examples.SimpleShortestPathsComputation -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat -vip /user/hadoop/tiny.txt -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op /user/hadoop/shortestpaths -w 1 -yj giraph-examples-1.1.0-SNAPSHOT-for-hadoop-2.2.0-jar-with-dependencies.jar Console Output: 14/06/20 21:49:02 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 14/06/20 21:49:03 INFO utils.ConfigurationUtils: No edge input format specified. Ensure your InputFormat does not require one. 14/06/20 21:49:03 INFO utils.ConfigurationUtils: No edge output format specified. Ensure your OutputFormat does not require one. 14/06/20 21:49:03 INFO yarn.GiraphYarnClient: Final output path is: hdfs://localhost.localdomain:8020/user/hadoop/shortestpaths 14/06/20 21:49:03 INFO yarn.GiraphYarnClient: Running Client 14/06/20 21:49:03 INFO client.RMProxy: Connecting to ResourceManager at / 0.0.0.0:8032 14/06/20 21:49:03 INFO yarn.GiraphYarnClient: Got node report from ASM for, nodeId=localhost.localdomain:36056, nodeAddress localhost.localdomain:8042, nodeRackName /default-rack, nodeNumContainers 0 14/06/20 21:49:03 INFO yarn.GiraphYarnClient: Defaulting per-task heap size to 1024MB. 14/06/20 21:49:03 INFO yarn.GiraphYarnClient: Obtained new Application ID: application_1402926902901_0001 14/06/20 21:49:03 INFO Configuration.deprecation: mapred.job.id is deprecated. 
Instead, use mapreduce.job.id 14/06/20 21:49:03 INFO yarn.GiraphYarnClient: Set the environment for the application master 14/06/20 21:49:03 INFO yarn.GiraphYarnClient: Environment for AM :{CLASSPATH=${CLASSPATH}:./*:$HADOOP_CONF_DIR:$HADOOP_COMMON_HOME/share/hadoop/common/*:$HADOOP_COMMON_HOME/share/hadoop/common/lib/*:$HADOOP_HDFS_HOME/share/hadoop/hdfs/*:$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*:$HADOOP_YARN_HOME/share/hadoop/yarn/*:$HADOOP_YARN_HOME/share/hadoop/yarn/lib/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*} 14/06/20 21:49:03 INFO yarn.GiraphYarnClient: buildLocalResourceMap 14/06/20 21:49:03 INFO Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir 14/06/20 21:49:04 INFO yarn.YarnUtils: Registered file in LocalResources :: hdfs://localhost.localdomain:8020/user/hadoop/giraph_yarn_jar_cache/application_1402926902901_0001/giraph-conf.xml 14/06/20 21:49:04 INFO yarn.GiraphYarnClient: LIB JARS :giraph-examples-1.1.0-SNAPSHOT-for-hadoop-2.2.0-jar-with-dependencies.jar 14/06/20 21:49:04 INFO yarn.YarnUtils: Class path name . 14/06/20 21:49:04 INFO yarn.YarnUtils: base path checking . 14/06/20 21:49:04 INFO yarn.GiraphYarnClient: Made local resource for :/home/hadoop/Downloads/giraph/giraph-1200915/giraph-examples/target/giraph-examples-1.1.0-SNAPSHOT-for-hadoop-2.2.0-jar-with-dependencies.jar to hdfs://localhost.localdomain:8020/user/hadoop/giraph_yarn_jar_cache/application_1402926902901_0001/giraph-examples-1.1.0-SNAPSHOT-for-hadoop-2.2.0-jar-with-dependencies.jar 14/06/20 21:49:04 INFO yarn.YarnUtils: Registered file in LocalResources :: hdfs://localhost.localdomain:8020/user/hadoop/giraph_yarn_jar_cache/application_1402926902901_0001/giraph-examples-1.1.0-SNAPSHOT-for-hadoop-2.2.0-jar-with-dependencies.jar 14/06/20 21:49:04 INFO yarn.GiraphYarnClient: ApplicationSumbissionContext for GiraphApplicationMaster launch container is populated. 
14/06/20 21:49:04 INFO yarn.GiraphYarnClient: Submitting application to ASM 14/06/20 21:49:04 INFO impl.YarnClientImpl: Submitted application application_1402926902901_0001 to ResourceManager at /0.0.0.0:8032 14/06/20 21:49:04 INFO yarn.GiraphYarnClient: Got new appId after submission :application_1402926902901_0001 14/06/20 21:49:04 INFO yarn.GiraphYarnClient: GiraphApplicationMaster container request was submitted to ResourceManager for job: Giraph: org.apache.giraph.examples.SimpleShortestPathsComputation 14/06/20 21:49:05 INFO yarn.GiraphYarnClient: Giraph: org.apache.giraph.examples.SimpleShortestPathsComputation, Elapsed: 0.88 secs 14/06/20 21:49:05 INFO yarn.GiraphYarnClient: appattempt_1402926902901_0001_01, State: ACCEPTED, Containers used: 1 14/06/20 21:49:09 INFO yarn.GiraphYarnClient: Giraph: org.apache.giraph.examples.SimpleShortestPathsComputation, Elapsed: 4.90 secs 14/06/20 21:49:09 INFO yarn.GiraphYarnClient: appattempt_1402926902901_0001_01, State: RUNNING, Containers used: 1 14/06/20 21:49:13 INFO yarn.GiraphYarnClient: Cleaning up HDFS distributed cache directory for Giraph job. 14/06/20 21:49:13 INFO
Re: How to output into multiple files through a GiraphJob
Hi Ferenc, I have a Giraph job that outputs from the Computation class as opposed to the MasterCompute, because I need to maintain a lot of state within VertexValues as opposed to Aggregators. This is one way of outputting results as multiple files. I am assuming that you want to scope output files per sub-graph groupings of vertices, of course. :) --John On Thu, Jun 19, 2014 at 4:02 AM, Ferenc Béres ferdzs...@gmail.com wrote: Hi Everyone, Currently I'm working on an ALS implementation in Giraph 1.1.0 and I would like to output the values of the vertices into multiple output files, but I could not figure out how to do it. I found that in Hadoop it can be done by using org.apache.hadoop.mapreduce.lib.output.MultipleOutputs<KEYOUT,VALUEOUT>, but it didn't work with the GiraphJob. Is it possible to output into multiple files by configuring the GiraphJob, or is there another way? I would appreciate any idea in this matter. Thank you, Ferenc Béres
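The grouping-then-writing idea John describes can be sketched outside Giraph in plain Java (hypothetical class names and record layout; in a real job the writes would go through an output format or direct HDFS calls from the Computation/worker):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Sketch of per-group output files: bucket vertex results by a key,
 *  then write one file per key (plain Java, not the Giraph output-format API). */
public class MultiFileOutputSketch {
    /** Each row is {groupKey, outputLine}; hypothetical record layout. */
    public static Map<String, List<String>> groupByKey(List<String[]> rows) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (String[] row : rows) {
            groups.computeIfAbsent(row[0], k -> new ArrayList<>()).add(row[1]);
        }
        return groups;
    }

    public static void writeGroups(Map<String, List<String>> groups, Path dir) throws IOException {
        Files.createDirectories(dir);
        for (Map.Entry<String, List<String>> e : groups.entrySet()) {
            // one file per sub-graph grouping, e.g. part-users.txt, part-items.txt
            Files.write(dir.resolve("part-" + e.getKey() + ".txt"), e.getValue());
        }
    }

    public static void main(String[] args) throws IOException {
        List<String[]> rows = Arrays.asList(
            new String[]{"users", "u1\t0.5"},
            new String[]{"items", "i1\t0.9"},
            new String[]{"users", "u2\t0.7"});
        writeGroups(groupByKey(rows), Files.createTempDirectory("als-out"));
        System.out.println(groupByKey(rows).keySet()); // [items, users]
    }
}
```

For an ALS job, "users" and "items" factor vectors are a natural pair of groups; the same bucketing could key on any vertex-derived label.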
Cannot run shortest path on Hadoop 2.2
Hi Everyone, The shortest path example fails on my Hadoop 2.2.0 single node cluster, and I don't see an identifiable root exception. I am able to execute my Map/Reduce jobs, including ones that use Accumulo for a source and/or sink, but cannot get the Giraph example jobs nor my custom Giraph jobs to run. I followed the build and job launch instructions from the following URL: http://mail-archives.apache.org/mod_mbox/giraph-user/201312.mbox/%3C1647021.5fbjhLDxPK@chronos7%3E Here's the Hadoop console output I get when I attempt to run shortest path: 2014-06-16 07:10:09,631 INFO [main] yarn.GiraphApplicationMaster (GiraphApplicationMaster.java:main(421)) - Starting GitaphAM 2014-06-16 07:10:10,277 WARN [main] util.NativeCodeLoader (NativeCodeLoader.java:clinit(62)) - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 2014-06-16 07:10:11,063 INFO [main] yarn.GiraphApplicationMaster (GiraphApplicationMaster.java:init(168)) - GiraphAM for ContainerId container_1402830191668_0017_01_01 ApplicationAttemptId appattempt_1402830191668_0017_01 2014-06-16 07:10:11,130 INFO [main] client.RMProxy (RMProxy.java:createRMProxy(56)) - Connecting to ResourceManager at / 0.0.0.0:8030 2014-06-16 07:10:11,136 INFO [main] impl.NMClientAsyncImpl (NMClientAsyncImpl.java:serviceInit(107)) - Upper bound of the thread pool size is 500 2014-06-16 07:10:11,136 INFO [main] impl.ContainerManagementProtocolProxy (ContainerManagementProtocolProxy.java:init(71)) - yarn.client.max-nodemanagers-proxies : 500 2014-06-16 07:10:11,299 INFO [main] yarn.GiraphApplicationMaster (GiraphApplicationMaster.java:setupContainerAskForRM(279)) - Requested container ask: Capability[memory:1024, vCores:0]Priority[10] 2014-06-16 07:10:11,304 INFO [main] yarn.GiraphApplicationMaster (GiraphApplicationMaster.java:setupContainerAskForRM(279)) - Requested container ask: Capability[memory:1024, vCores:0]Priority[10] 2014-06-16 07:10:11,305 INFO [main] 
yarn.GiraphApplicationMaster (GiraphApplicationMaster.java:run(185)) - Wait to finish .. 2014-06-16 07:10:13,331 INFO [AMRM Callback Handler Thread] yarn.GiraphApplicationMaster (GiraphApplicationMaster.java:onContainersAllocated(605)) - Got response from RM for container ask, allocatedCnt=1 2014-06-16 07:10:13,331 INFO [AMRM Callback Handler Thread] yarn.GiraphApplicationMaster (GiraphApplicationMaster.java:onContainersAllocated(608)) - Total allocated # of container so far : 1 allocated out of 2 required. 2014-06-16 07:10:13,332 INFO [AMRM Callback Handler Thread] yarn.GiraphApplicationMaster (GiraphApplicationMaster.java:startContainerLaunchingThreads(359)) - Launching command on a new container., containerId=container_1402830191668_0017_01_02, containerNode=localhost.localdomain:38256, containerNodeURI=localhost.localdomain:8042, containerResourceMemory=1024 2014-06-16 07:10:13,333 INFO [pool-2-thread-1] yarn.GiraphApplicationMaster (GiraphApplicationMaster.java:buildContainerLaunchContext(492)) - Setting up container launch container for containerid=container_1402830191668_0017_01_02 2014-06-16 07:10:13,348 INFO [pool-2-thread-1] yarn.GiraphApplicationMaster (GiraphApplicationMaster.java:buildContainerLaunchContext(498)) - Conatain launch Commands :java -Xmx1024M -Xms1024M -cp .:${CLASSPATH} org.apache.giraph.yarn.GiraphYarnTask 1402830191668 17 2 1 1LOG_DIR/task-2-stdout.log 2LOG_DIR/task-2-stderr.log 2014-06-16 07:10:13,349 INFO [pool-2-thread-1] yarn.GiraphApplicationMaster (GiraphApplicationMaster.java:buildContainerLaunchContext(518)) - Setting username in ContainerLaunchContext to: hadoop 2014-06-16 07:10:13,744 INFO [pool-2-thread-1] yarn.YarnUtils (YarnUtils.java:addFsResourcesToMap(72)) - Adding giraph-examples-1.1.0-SNAPSHOT-for-hadoop-2.2.0-jar-with-dependencies.jar to LocalResources for export.to 
hdfs://localhost.localdomain:8020/user/hadoop/giraph_yarn_jar_cache/application_1402830191668_0017/giraph-examples-1.1.0-SNAPSHOT-for-hadoop-2.2.0-jar-with-dependencies.jar 2014-06-16 07:10:13,774 INFO [pool-2-thread-1] yarn.YarnUtils (YarnUtils.java:addFileToResourceMap(160)) - Registered file in LocalResources :: hdfs://localhost.localdomain:8020/user/hadoop/giraph_yarn_jar_cache/application_1402830191668_0017/giraph-examples-1.1.0-SNAPSHOT-for-hadoop-2.2.0-jar-with-dependencies.jar 2014-06-16 07:10:13,774 WARN [pool-2-thread-1] yarn.YarnUtils (YarnUtils.java:addFsResourcesToMap(81)) - Job jars (-yj option) didn't include giraph-core. 2014-06-16 07:10:13,776 INFO [pool-2-thread-1] yarn.YarnUtils (YarnUtils.java:addFileToResourceMap(160)) - Registered file in LocalResources :: hdfs://localhost.localdomain:8020/user/hadoop/giraph_yarn_jar_cache/application_1402830191668_0017/giraph-conf.xml 2014-06-16 07:10:13,786 INFO [org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl #0] impl.NMClientAsyncImpl (NMClientAsyncImpl.java:run(531)) - Processing Event EventType: START_CONTAINER for Container
Re: Giraph keeps trying to connect to 9000 on Hadoop 2.2.0/YARN
Hey Avery, Thanks a bunch for responding so quickly to my post! Looks like the problem is with my client class. When I attempt to run one of the Giraph examples, which use GiraphRunner, GiraphRunner connects to the correct port and launches the Giraph job. So I just need to take a closer look at GiraphRunner. Thanks again for your quick response--much appreciated. --John On Sun, Jun 1, 2014 at 11:12 AM, Avery Ching ach...@apache.org wrote: Giraph should just pick up your cluster's HDFS configuration. Can you check your hadoop *.xml files? On 6/1/14, 3:34 AM, John Yost wrote: Hi Everyone, Not sure why, but Giraph tries to connect to port 9000: java.net.ConnectException: Call From localhost.localdomain/127.0.0.1 to localhost:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) I set the following in the Giraph configuration:

GiraphConstants.IS_PURE_YARN_JOB.set(conf, true);
conf.set("giraph.useNetty", "true");
conf.set("giraph.zkList", "localhost.localdomain");
conf.set("fs.defaultFS", "hdfs://localhost.localdomain:8020");
conf.set("mapreduce.job.tracker", "localhost.localdomain:54311");
conf.set("mapreduce.framework.name", "yarn");
conf.set("yarn.resourcemanager.address", "localhost.localdomain:8032");

I built Giraph as follows: mvn -DskipTests=true -Dhadoop.version=2.2.0 -Phadoop_yarn clean install Any ideas as to why Giraph attempts to connect to 9000 instead of 8020? --John
Giraph keeps trying to connect to 9000 on Hadoop 2.2.0/YARN
Hi Everyone, Not sure why, but Giraph tries to connect to port 9000: java.net.ConnectException: Call From localhost.localdomain/127.0.0.1 to localhost:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) I set the following in the Giraph configuration:

GiraphConstants.IS_PURE_YARN_JOB.set(conf, true);
conf.set("giraph.useNetty", "true");
conf.set("giraph.zkList", "localhost.localdomain");
conf.set("fs.defaultFS", "hdfs://localhost.localdomain:8020");
conf.set("mapreduce.job.tracker", "localhost.localdomain:54311");
conf.set("mapreduce.framework.name", "yarn");
conf.set("yarn.resourcemanager.address", "localhost.localdomain:8032");

I built Giraph as follows: mvn -DskipTests=true -Dhadoop.version=2.2.0 -Phadoop_yarn clean install Any ideas as to why Giraph attempts to connect to 9000 instead of 8020? --John
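For what it's worth, hdfs://host:9000 is the fs.defaultFS value many Hadoop tutorials use, so a stale or missing core-site.xml on the client classpath is a plausible culprit, which fits Avery's suggestion to check the hadoop *.xml files. A minimal core-site.xml matching the fs.defaultFS from the post might look like the following (the host and port come from the thread; treating core-site.xml as the source of the stray 9000 is an assumption):

```xml
<!-- core-site.xml: must be on the client classpath so the Giraph client
     resolves the same fs.defaultFS the cluster uses (8020 here, not 9000). -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost.localdomain:8020</value>
  </property>
</configuration>
```

Setting the value in code, as above, only helps if nothing on the classpath overrides it; checking which core-site.xml the client actually loads is the more reliable test.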
Giraph job hangs and is eventually killed
Hi Everyone, I have a shortest path implementation that completes and outputs the correct results to a counter, but then hangs after the last superstep and is eventually killed by Hadoop. Here's the output from the console: main-SendThread(localhost.localdomain:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server localhost.localdomain/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error) [main-SendThread(localhost.localdomain:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to localhost.localdomain/127.0.0.1:2181, initiating session [main-SendThread(localhost.localdomain:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server localhost.localdomain/127.0.0.1:2181, sessionid = 0x1451fc674a30007, negotiated timeout = 4 14/04/04 22:19:44 INFO job.JobProgressTracker: Data from 1 workers - Storing data: 0 out of 11 vertices stored; 0 out of 1 partitions stored; min free memory on worker 1 - 119.73MB, average 119.73MB 14/04/04 22:19:45 INFO mapred.JobClient: map 100% reduce 0% 14/04/04 22:19:49 INFO job.JobProgressTracker: Data from 1 workers - Storing data: 0 out of 11 vertices stored; 0 out of 1 partitions stored; min free memory on worker 1 - 119.73MB, average 119.73MB 14/04/04 22:19:54 INFO job.JobProgressTracker: Data from 1 workers - Storing data: 0 out of 11 vertices stored; 0 out of 1 partitions stored; min free memory on worker 1 - 119.44MB, average 119.44MB 1 This is the stack trace I see in Hadoop after the job is killed: Caused by: java.lang.IllegalStateException: waitFor: ExecutionException occurred while waiting for org.apache.giraph.utils.ProgressableUtils$FutureWaitable@43349eef at org.apache.giraph.utils.ProgressableUtils.waitFor(ProgressableUtils.java:193) at org.apache.giraph.utils.ProgressableUtils.waitForever(ProgressableUtils.java:151) at org.apache.giraph.utils.ProgressableUtils.waitForever(ProgressableUtils.java:136) at 
org.apache.giraph.utils.ProgressableUtils.getFutureResult(ProgressableUtils.java:99) at org.apache.giraph.utils.ProgressableUtils.getResultsWithNCallables(ProgressableUtils.java:233) at org.apache.giraph.worker.BspServiceWorker.saveVertices(BspServiceWorker.java:1033) at org.apache.giraph.worker.BspServiceWorker.cleanup(BspServiceWorker.java:1179) at org.apache.giraph.graph.GraphTaskManager.cleanup(GraphTaskManager.java:843) at org.apache.giraph.graph.GraphMapper.cleanup(GraphMapper.java:81) at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:93) ... 7 more Caused by: java.util.concurrent.ExecutionException: java.lang.IllegalStateException: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: failed to create file /user/prototype/giraph/twitter-path-result/_temporary/_attempt_201404012018_0003_m_01_0/part-m-1 for DFSClient_attempt_201404012018_0003_m_01_0_-1149212770_1 on client 127.0.0.1 because current leaseholder is trying to recreate file. 
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:1452) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1324) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:1266) at org.apache.hadoop.hdfs.server.namenode.NameNode.create(NameNode.java:668) at org.apache.hadoop.hdfs.server.namenode.NameNode.create(NameNode.java:647) at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:578) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1393) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1389) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1387) I realize that the root cause appears to be within Hadoop and not Giraph, but I am wondering if there is a Giraph configuration parameter I am missing. In researching the HDFS exception (not many posts on this, BTW), one responder opined that this exception is due to speculative execution being enabled. Also, I tested a standard Map/Reduce job writing to the same datablock and it worked fine, so I don't think HDFS is the problem (corrupt datablock, etc.). Any ideas? --John
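To rule out the speculative-execution theory, the classic (MRv1) property names can be set in mapred-site.xml or passed with -D; the job IDs in the trace suggest an MRv1 job tracker, and this is a guess at the cause, not a confirmed fix:

```xml
<!-- mapred-site.xml (or -D on the command line): disable speculative task
     attempts, which can race to create the same output file (MRv1 names). -->
<property>
  <name>mapred.map.tasks.speculative.execution</name>
  <value>false</value>
</property>
<property>
  <name>mapred.reduce.tasks.speculative.execution</name>
  <value>false</value>
</property>
```

If the job still hangs in saveVertices with speculation off, that would point back toward a lease held by a dead or retried task attempt rather than a speculative duplicate.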