Re: Giraph Use Case

2014-08-12 Thread John Yost
I recommend downloading a Twitter data set from SNAP and trying out
PageRank, Jaccard, Lin, etc., to define and compare communities. That's
kind of where I started. :)
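
For illustration, a minimal PageRank-style computation along those lines
could look like the sketch below (an editor's sketch in the spirit of the
bundled giraph-examples code, assuming Long vertex ids, Double values and
messages, Float edge weights, and a fixed 30 supersteps; it is not tuned for
the Twitter graph):

import org.apache.giraph.graph.BasicComputation;
import org.apache.giraph.graph.Vertex;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.FloatWritable;
import org.apache.hadoop.io.LongWritable;

public class TwitterPageRankComputation extends
    BasicComputation<LongWritable, DoubleWritable, FloatWritable, DoubleWritable> {

  private static final int MAX_SUPERSTEPS = 30;

  @Override
  public void compute(Vertex<LongWritable, DoubleWritable, FloatWritable> vertex,
      Iterable<DoubleWritable> messages) {
    if (getSuperstep() >= 1) {
      // Sum the rank shares received from in-neighbors and apply the damping factor.
      double sum = 0;
      for (DoubleWritable message : messages) {
        sum += message.get();
      }
      vertex.setValue(new DoubleWritable(0.15 / getTotalNumVertices() + 0.85 * sum));
    }
    if (getSuperstep() < MAX_SUPERSTEPS) {
      // Spread this vertex's current rank evenly over its out-edges.
      sendMessageToAllEdges(vertex,
          new DoubleWritable(vertex.getValue().get() / vertex.getNumEdges()));
    } else {
      vertex.voteToHalt();
    }
  }
}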

--John


On Mon, Aug 11, 2014 at 8:46 AM, Vineet Mishra clearmido...@gmail.com
wrote:

 Hi All,

 I have installed and run the Giraph example on my Hadoop cluster,
 following the example below:

 https://giraph.apache.org/quick_start.html

 It works great, but I wanted to know what other possible use case
 scenarios/implementations of Giraph there are.

 Expert advice would be highly appreciated!

 Thanks!



Re: Couldn't instantiate

2014-07-05 Thread John Yost
@465962c4
 2014-07-02 15:49:17,509 INFO org.apache.zookeeper.ClientCnxn: Opening socket 
 connection to server carmen-HP-Pavilion-Sleekbook-15/127.0.1.1:22181. Will 
 not attempt to authenticate using SASL (unknown error)
 2014-07-02 15:49:17,509 INFO org.apache.zookeeper.ClientCnxn: Socket 
 connection established to carmen-HP-Pavilion-Sleekbook-15/127.0.1.1:22181, 
 initiating session
 2014-07-02 15:49:17,515 INFO org.apache.zookeeper.ClientCnxn: Session 
 establishment complete on server 
 carmen-HP-Pavilion-Sleekbook-15/127.0.1.1:22181, sessionid = 
 0x146f756106b0001, negotiated timeout = 60
 2014-07-02 15:49:17,516 INFO org.apache.giraph.bsp.BspService: process: 
 Asynchronous connection complete.
 2014-07-02 15:49:17,530 INFO org.apache.giraph.graph.GraphTaskManager: map: 
 No need to do anything when not a worker
 2014-07-02 15:49:17,530 INFO org.apache.giraph.graph.GraphTaskManager: 
 cleanup: Starting for MASTER_ZOOKEEPER_ONLY
 2014-07-02 15:49:17,561 INFO org.apache.giraph.bsp.BspService: getJobState: 
 Job state already exists (/_hadoopBsp/job_201407021315_0003/_masterJobState)
 2014-07-02 15:49:17,568 INFO org.apache.giraph.master.BspServiceMaster: 
 becomeMaster: First child is 
 '/_hadoopBsp/job_201407021315_0003/_masterElectionDir/carmen-HP-Pavilion-Sleekbook-15_000'
  and my bid is 
 '/_hadoopBsp/job_201407021315_0003/_masterElectionDir/carmen-HP-Pavilion-Sleekbook-15_000'
 2014-07-02 15:49:17,570 INFO org.apache.giraph.bsp.BspService: 
 getApplicationAttempt: Node 
 /_hadoopBsp/job_201407021315_0003/_applicationAttemptsDir already exists!
 2014-07-02 15:49:17,625 INFO org.apache.giraph.comm.netty.NettyServer: 
 NettyServer: Using execution group with 8 threads for requestFrameDecoder.
 2014-07-02 15:49:17,674 INFO org.apache.giraph.comm.netty.NettyServer: 
 start: Started server communication server: 
 carmen-HP-Pavilion-Sleekbook-15/127.0.1.1:3 with up to 16 threads on 
 bind attempt 0 with sendBufferSize = 32768 receiveBufferSize = 524288
 2014-07-02 15:49:17,679 INFO org.apache.giraph.comm.netty.NettyClient: 
 NettyClient: Using execution handler with 8 threads after request-encoder.
 2014-07-02 15:49:17,682 INFO org.apache.giraph.master.BspServiceMaster: 
 becomeMaster: I am now the master!
 2014-07-02 15:49:17,684 INFO org.apache.giraph.bsp.BspService: 
 getApplicationAttempt: Node 
 /_hadoopBsp/job_201407021315_0003/_applicationAttemptsDir already exists!
 2014-07-02 15:49:17,717 ERROR org.apache.giraph.master.MasterThread: 
 masterThread: Master algorithm failed with NullPointerException
 java.lang.NullPointerException
  at 
 org.apache.giraph.master.BspServiceMaster.generateInputSplits(BspServiceMaster.java:330)
  at 
 org.apache.giraph.master.BspServiceMaster.createInputSplits(BspServiceMaster.java:619)
  at 
 org.apache.giraph.master.BspServiceMaster.createVertexInputSplits(BspServiceMaster.java:686)
  at org.apache.giraph.master.MasterThread.run(MasterThread.java:108)
 2014-07-02 15:49:17,718 FATAL org.apache.giraph.graph.GraphMapper: 
 uncaughtException: OverrideExceptionHandler on thread 
 org.apache.giraph.master.MasterThread, msg = java.lang.NullPointerException, 
 exiting...
 java.lang.IllegalStateException: java.lang.NullPointerException
  at org.apache.giraph.master.MasterThread.run(MasterThread.java:193)
 Caused by: java.lang.NullPointerException
  at 
 org.apache.giraph.master.BspServiceMaster.generateInputSplits(BspServiceMaster.java:330)
  at 
 org.apache.giraph.master.BspServiceMaster.createInputSplits(BspServiceMaster.java:619)
  at 
 org.apache.giraph.master.BspServiceMaster.createVertexInputSplits(BspServiceMaster.java:686)
  at org.apache.giraph.master.MasterThread.run(MasterThread.java:108)
 2014-07-02 15:49:17,722 INFO org.apache.giraph.zk.ZooKeeperManager: run: 
 Shutdown hook started.
 2014-07-02 15:49:17,727 WARN org.apache.giraph.zk.ZooKeeperManager: 
 onlineZooKeeperServers: Forced a shutdown hook kill of the ZooKeeper process.
 2014-07-02 15:49:18,049 INFO org.apache.zookeeper.ClientCnxn: Unable to read 
 additional data from server sessionid 0x146f756106b0001, likely server has 
 closed socket, closing socket connection and attempting reconnect
 2014-07-02 15:49:18,050 INFO org.apache.giraph.zk.ZooKeeperManager: 
 onlineZooKeeperServers: ZooKeeper process exited with 143 (note that 143 
 typically means killed).




 2014-07-02 13:52 GMT+02:00 John Yost soozandjohny...@gmail.com:

 Hi Carmen,

 Please post more of the exception stack trace; there's not enough here for
 me to figure anything out. :)

 Thanks

 --John


 On Wed, Jul 2, 2014 at 7:33 AM, soozandjohny...@gmail.com wrote:

 Hi Carmen,

 Glad that one problem is fixed, and I can take a look at this one as
 well.

 --John

 Sent from my iPhone

 On Jul 2, 2014, at 6:50 AM, Carmen Manzulli carmenmanzu...@gmail.com
 wrote:


 OK, I've done what you told me, but now I've got this problem:

 java.lang.Throwable: Child Error

Re: Couldn't instantiate

2014-07-02 Thread John Yost
Hi Carmen,

Please post more of the exception stack trace; there's not enough here for
me to figure anything out. :)

Thanks

--John


On Wed, Jul 2, 2014 at 7:33 AM, soozandjohny...@gmail.com wrote:

 Hi Carmen,

 Glad that one problem is fixed, and I can take a look at this one as well.

 --John

 Sent from my iPhone

 On Jul 2, 2014, at 6:50 AM, Carmen Manzulli carmenmanzu...@gmail.com
 wrote:


 OK, I've done what you told me, but now I've got this problem:

 java.lang.Throwable: Child Error
   at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:271)
 Caused by: java.io.IOException: Task process exit with nonzero status of 1.
   at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:258)

 This is my Computation code:

 import org.apache.giraph.GiraphRunner;
 import org.apache.giraph.graph.BasicComputation;
 import org.apache.giraph.graph.Vertex;
 import org.apache.giraph.edge.Edge;

 import org.apache.hadoop.io.Text;
 import org.apache.hadoop.io.NullWritable;
 import org.apache.hadoop.util.ToolRunner;

 public class SimpleSelectionComputation extends
     BasicComputation<Text, NullWritable, Text, NullWritable> {

   @Override
   public void compute(Vertex<Text, NullWritable, Text> vertex,
       Iterable<NullWritable> messages) {

     Text source = new Text("http://dbpedia.org/resource/1040s");

     if (getSuperstep() == 0) {
       // Compare Text ids by value; == only compares object references.
       if (vertex.getId().equals(source)) {
         System.out.println("the subject " + vertex.getId()
             + " has the following predicates and objects:");
         for (Edge<Text, Text> e : vertex.getEdges()) {
           System.out.println(e.getValue() + "\t" + e.getTargetVertexId());
         }
       }
       vertex.voteToHalt();
     }
   }

   public static void main(String[] args) throws Exception {
     System.exit(ToolRunner.run(new GiraphRunner(), args));
   }
 }




Re: Couldn't instantiate

2014-06-30 Thread John Yost
Hi Carmen,

Quick question: did you define only a constructor that takes arguments? If
so, I think you are getting this because there is no public no-arguments
constructor for Giraph to instantiate the class reflectively. If that is not
the case, I recommend posting your source code and I will be happy to help.
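
To illustrate the point (an editor's sketch; the field and constructor
argument are hypothetical and not taken from Carmen's code), reflective
instantiation such as Class.newInstance() needs a public no-arguments
constructor:

public class SimpleRDFVertexInputFormat /* extends your chosen Giraph VertexInputFormat */ {

  private String defaultGraph;

  // Declaring only this constructor removes the implicit no-args constructor,
  // so ReflectionUtils.newInstance(...) fails with InstantiationException.
  public SimpleRDFVertexInputFormat(String defaultGraph) {
    this.defaultGraph = defaultGraph;
  }

  // Adding this back lets Giraph create the class; read any settings from the
  // job configuration later instead of passing them through the constructor.
  public SimpleRDFVertexInputFormat() {
  }
}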

--John


On Mon, Jun 30, 2014 at 9:38 AM, Carmen Manzulli carmenmanzu...@gmail.com
wrote:

 Hi,

 I'm trying to run a selection computation with my own VertexInputFormat
 code, but the Giraph job starts to work and then fails with:




 java.lang.IllegalStateException: run: Caught an unrecoverable exception 
 newInstance: Couldn't instantiate sisinflab.SimpleRDFVertexInputFormat
   at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:101)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.lang.IllegalStateException: newInstance: Couldn't instantiate 
 sisinflab.SimpleRDFVertexInputFormat
   at 
 org.apache.giraph.utils.ReflectionUtils.newInstance(ReflectionUtils.java:105)
   at 
 org.apache.giraph.conf.ImmutableClassesGiraphConfiguration.createVertexInputFormat(ImmutableClassesGiraphConfiguration.java:235)
   at 
 org.apache.giraph.conf.ImmutableClassesGiraphConfiguration.createWrappedVertexInputFormat(ImmutableClassesGiraphConfiguration.java:246)
   at 
 org.apache.giraph.graph.GraphTaskManager.checkInput(GraphTaskManager.java:171)
   at 
 org.apache.giraph.graph.GraphTaskManager.setup(GraphTaskManager.java:207)
   at org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:59)
   at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:89)
   ... 7 more
 Caused by: java.lang.InstantiationException
   at 
 sun.reflect.InstantiationExceptionConstructorAccessorImpl.newInstance(InstantiationExceptionConstructorAccessorImpl.java:48)
   at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
   at java.lang.Class.newInstance(Class.java:374)
   at 
 org.apache.giraph.utils.ReflectionUtils.newInstance(ReflectionUtils.java:103)
   ... 13 more


 What does it mean? Where is the problem?

 Who can help me?

 Carmen




Re: Giraph insists on LocalJobRunner with custom Computation

2014-06-23 Thread John Yost
Hi Yorn,

I figured this out and detailed the solution in my post earlier this
morning (6/23 2:46). The key is the following: -ca
mapred.job.tracker=localhost:5431.  Without this, you'll see the exception
you detailed above.

--John


On Thu, Jun 5, 2014 at 5:36 AM, Yørn de Jong y...@uninett.no wrote:

 Hi group

 I have set up Giraph on a YARN cluster. I have no trouble running the
 shortest paths example as described in [1], but when I try to run my own
 algorithm, the program stops with:

 Exception in thread "main" java.lang.IllegalArgumentException:
 checkLocalJobRunnerConfiguration: When using LocalJobRunner, must have only
 one worker since only 1 task at a time!

 When I change the command to -w 1, it stops with

 Exception in thread "main" java.lang.IllegalArgumentException:
 checkLocalJobRunnerConfiguration: When using LocalJobRunner, you cannot run
 in split master / worker mode since there is only 1 task at a time!

 The command I try to run is

 hadoop \
 jar
 giraph-rank-1.1.0-SNAPSHOT-for-hadoop-2.3.0-cdh5.0.1-jar-with-dependencies.jar
 \
 org.apache.giraph.GiraphRunner
 no.uninett.yorn.giraph.computation.DOSRank \
 -eif no.uninett.yorn.giraph.format.io.NetflowCSVEdgeInputFormat \
 -eip /user/hdfs/trd_gw1_12_01_normalized.csv \
 -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
 -op /user/yarn/output \
 -wc org.apache.giraph.worker.DefaultWorkerContext \
 -w 5 \
 -yj
 giraph-rank-1.1.0-SNAPSHOT-for-hadoop-2.3.0-cdh5.0.1-jar-with-dependencies.jar

 Where, naturally, no.uninett.yorn.giraph.computation.DOSRank is my own
 algorithm, which is contained in
 giraph-rank-1.1.0-SNAPSHOT-for-hadoop-2.3.0-cdh5.0.1-jar-with-dependencies.jar.

 giraph-rank is built using the same command I used to build the
 giraph-examples project; its pom.xml was made by copying the one from
 giraph-examples and replacing «examples» with «rank». The command
 used to build both projects is:

 mvn -Phadoop_yarn -Dhadoop.version=2.3.0-cdh5.0.1 clean package
 -DskipTests

 Interestingly enough, when I change the input format to something else, I
 get error messages having to do with type mismatches. This seems to suggest
 that my EdgeInputFormat does start, but that the problem occurs while it
 runs or after it runs. I don’t know how to debug this.

 Am I missing something? How can I get my algorithm to run?
 The source code is available on [2].

 [1] https://giraph.apache.org/quick_start.html#qs_section_5
 [2] https://scm.uninett.no/yorn/giraph



Got Giraph 1.1.0 examples running on YARN

2014-06-22 Thread John Yost
Hi Everyone,

I just got the Giraph examples running on YARN and I thought I would
share the details since it looks like a few people have struggled with
this. This is what I did:

1. Downloaded the latest snapshot (giraph-b218d2)
2. Built with mvn install -P=hadoop_2 -DskipTests=true
3. Executed with the following CLI entry against a Hadoop 2.2.0
pseudo-cluster with mapreduce.framework.name=yarn:

hadoop jar
/usr/local/java/giraph/giraph-1.1/giraph-examples/target/giraph-examples-1.1.0-SNAPSHOT-for-hadoop-2.2.0-jar-with-dependencies.jar
org.apache.giraph.GiraphRunner
org.apache.giraph.examples.SimpleShortestPathsComputation -vif
org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat
-vip /user/hadoop/tiny.txt -vof
org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op
/user/hadoop/shortestpaths -w 1 -ca giraph.zkList=localhost:2181 -ca
giraph.SplitMasterWorker=true -ca mapred.job.tracker=localhost:54311 -ca
mapreduce.job.tracker=localhost:54311

Note: the parameter -ca mapred.job.tracker=localhost:54311 is crucial to
making this work in cluster/pseudo-cluster mode. Otherwise, you'll get the
following error that Satyajit recently posted:

java.lang.IllegalArgumentException: checkLocalJobRunnerConfiguration: When
using LocalJobRunner, you cannot run in split master / worker mode since
there is only 1 task at a time!

If you actually want to run in local mode, I have not figured that out yet,
as I want to be able to run on my cluster instead.
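
For reference, here is a sketch of supplying the same custom arguments from
a driver class instead of -ca flags (an editor's sketch, not part of the
original post; GiraphConfiguration extends Hadoop's Configuration, so plain
set() calls carry the same keys, and the localhost values are just the ones
used above):

import org.apache.giraph.conf.GiraphConfiguration;

public class PseudoClusterSettings {
  public static GiraphConfiguration build() {
    GiraphConfiguration conf = new GiraphConfiguration();
    conf.set("giraph.zkList", "localhost:2181");
    conf.set("giraph.SplitMasterWorker", "true");
    // The crucial part: without a job tracker address Giraph assumes
    // LocalJobRunner and enforces its one-worker, no-split-master checks.
    conf.set("mapred.job.tracker", "localhost:54311");
    conf.set("mapreduce.job.tracker", "localhost:54311");
    return conf;
  }
}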

--John.


Shortest Path Still Won't Work--Any Ideas?

2014-06-20 Thread John Yost
Here are more details regarding my attempts at running Shortest Path. Any
help would be greatly appreciated, as the root cause of the Giraph job
failure is not obvious to me.

Thanks

--John

Command Line:

$ hadoop jar
giraph-examples-1.1.0-SNAPSHOT-for-hadoop-2.2.0-jar-with-dependencies.jar
org.apache.giraph.GiraphRunner
org.apache.giraph.examples.SimpleShortestPathsComputation -vif
org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat
-vip /user/hadoop/tiny.txt -vof
org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op
/user/hadoop/shortestpaths -w 1 -yj
giraph-examples-1.1.0-SNAPSHOT-for-hadoop-2.2.0-jar-with-dependencies.jar

Console Output:

14/06/20 21:49:02 WARN util.NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
14/06/20 21:49:03 INFO utils.ConfigurationUtils: No edge input format
specified. Ensure your InputFormat does not require one.
14/06/20 21:49:03 INFO utils.ConfigurationUtils: No edge output format
specified. Ensure your OutputFormat does not require one.
14/06/20 21:49:03 INFO yarn.GiraphYarnClient: Final output path is:
hdfs://localhost.localdomain:8020/user/hadoop/shortestpaths
14/06/20 21:49:03 INFO yarn.GiraphYarnClient: Running Client
14/06/20 21:49:03 INFO client.RMProxy: Connecting to ResourceManager at /
0.0.0.0:8032
14/06/20 21:49:03 INFO yarn.GiraphYarnClient: Got node report from ASM for,
nodeId=localhost.localdomain:36056, nodeAddress localhost.localdomain:8042,
nodeRackName /default-rack, nodeNumContainers 0
14/06/20 21:49:03 INFO yarn.GiraphYarnClient: Defaulting per-task heap size
to 1024MB.
14/06/20 21:49:03 INFO yarn.GiraphYarnClient: Obtained new Application ID:
application_1402926902901_0001
14/06/20 21:49:03 INFO Configuration.deprecation: mapred.job.id is
deprecated. Instead, use mapreduce.job.id
14/06/20 21:49:03 INFO yarn.GiraphYarnClient: Set the environment for the
application master
14/06/20 21:49:03 INFO yarn.GiraphYarnClient: Environment for AM
:{CLASSPATH=${CLASSPATH}:./*:$HADOOP_CONF_DIR:$HADOOP_COMMON_HOME/share/hadoop/common/*:$HADOOP_COMMON_HOME/share/hadoop/common/lib/*:$HADOOP_HDFS_HOME/share/hadoop/hdfs/*:$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*:$HADOOP_YARN_HOME/share/hadoop/yarn/*:$HADOOP_YARN_HOME/share/hadoop/yarn/lib/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*}
14/06/20 21:49:03 INFO yarn.GiraphYarnClient: buildLocalResourceMap 
14/06/20 21:49:03 INFO Configuration.deprecation: mapred.output.dir is
deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
14/06/20 21:49:04 INFO yarn.YarnUtils: Registered file in LocalResources ::
hdfs://localhost.localdomain:8020/user/hadoop/giraph_yarn_jar_cache/application_1402926902901_0001/giraph-conf.xml
14/06/20 21:49:04 INFO yarn.GiraphYarnClient: LIB JARS
:giraph-examples-1.1.0-SNAPSHOT-for-hadoop-2.2.0-jar-with-dependencies.jar
14/06/20 21:49:04 INFO yarn.YarnUtils: Class path name .
14/06/20 21:49:04 INFO yarn.YarnUtils: base path checking .
14/06/20 21:49:04 INFO yarn.GiraphYarnClient: Made local resource for
:/home/hadoop/Downloads/giraph/giraph-1200915/giraph-examples/target/giraph-examples-1.1.0-SNAPSHOT-for-hadoop-2.2.0-jar-with-dependencies.jar
to
hdfs://localhost.localdomain:8020/user/hadoop/giraph_yarn_jar_cache/application_1402926902901_0001/giraph-examples-1.1.0-SNAPSHOT-for-hadoop-2.2.0-jar-with-dependencies.jar
14/06/20 21:49:04 INFO yarn.YarnUtils: Registered file in LocalResources ::
hdfs://localhost.localdomain:8020/user/hadoop/giraph_yarn_jar_cache/application_1402926902901_0001/giraph-examples-1.1.0-SNAPSHOT-for-hadoop-2.2.0-jar-with-dependencies.jar
14/06/20 21:49:04 INFO yarn.GiraphYarnClient: ApplicationSumbissionContext
for GiraphApplicationMaster launch container is populated.
14/06/20 21:49:04 INFO yarn.GiraphYarnClient: Submitting application to ASM
14/06/20 21:49:04 INFO impl.YarnClientImpl: Submitted application
application_1402926902901_0001 to ResourceManager at /0.0.0.0:8032
14/06/20 21:49:04 INFO yarn.GiraphYarnClient: Got new appId after
submission :application_1402926902901_0001
14/06/20 21:49:04 INFO yarn.GiraphYarnClient: GiraphApplicationMaster
container request was submitted to ResourceManager for job: Giraph:
org.apache.giraph.examples.SimpleShortestPathsComputation
14/06/20 21:49:05 INFO yarn.GiraphYarnClient: Giraph:
org.apache.giraph.examples.SimpleShortestPathsComputation, Elapsed: 0.88
secs
14/06/20 21:49:05 INFO yarn.GiraphYarnClient:
appattempt_1402926902901_0001_01, State: ACCEPTED, Containers used: 1
14/06/20 21:49:09 INFO yarn.GiraphYarnClient: Giraph:
org.apache.giraph.examples.SimpleShortestPathsComputation, Elapsed: 4.90
secs
14/06/20 21:49:09 INFO yarn.GiraphYarnClient:
appattempt_1402926902901_0001_01, State: RUNNING, Containers used: 1
14/06/20 21:49:13 INFO yarn.GiraphYarnClient: Cleaning up HDFS distributed
cache directory for Giraph job.
14/06/20 21:49:13 INFO 

Re: How to output into multiple files through a GiraphJob

2014-06-19 Thread John Yost
Hi Ferenc,

I have a Giraph job that outputs from the Computation class, as opposed to
the MasterCompute, because I need to maintain a lot of state within
VertexValues rather than Aggregators. This is one way of outputting results
as multiple files. I am assuming that you want to scope output files per
sub-graph groupings of vertices, of course. :)
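
For what it's worth, a rough sketch of that approach follows (an editor's
sketch, not John's actual job; the bucket rule, output path, superstep
number, and value types are all hypothetical, and it assumes the computation
can reach the Hadoop Configuration via getConf()):

import java.io.IOException;
import org.apache.giraph.graph.BasicComputation;
import org.apache.giraph.graph.Vertex;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;

public class GroupedOutputComputation extends
    BasicComputation<LongWritable, DoubleWritable, NullWritable, DoubleWritable> {

  @Override
  public void compute(Vertex<LongWritable, DoubleWritable, NullWritable> vertex,
      Iterable<DoubleWritable> messages) throws IOException {
    // Hypothetical grouping rule: bucket vertices by id and, on the final
    // superstep of interest, write each vertex's value under its bucket directory.
    if (getSuperstep() == 10) {
      long bucket = vertex.getId().get() % 4;
      Path out = new Path("/user/hadoop/als-output/bucket-" + bucket
          + "/vertex-" + vertex.getId().get());
      FileSystem fs = FileSystem.get(getConf());
      try (FSDataOutputStream stream = fs.create(out, true)) {
        stream.writeBytes(vertex.getId() + "\t" + vertex.getValue() + "\n");
      }
      vertex.voteToHalt();
    }
  }
}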

--John


On Thu, Jun 19, 2014 at 4:02 AM, Ferenc Béres ferdzs...@gmail.com wrote:

 Hi Everyone,

 Currently I'm working on an ALS implementation in giraph 1.1.0 and I would
 like to output the values of the vertices into multiple output files, but I
 could not figure it out how to do it.

 I found that in Hadoop it can be done by using
 org.apache.hadoop.mapreduce.lib.output.MultipleOutputs<KEYOUT, VALUEOUT>,
 but it didn't work with the GiraphJob.

 Is it possible to output into multiple files by configuring the GiraphJob,
 or is there another way?

 I would appreciate any idea in this matter.

 Thank you,
 Ferenc Béres



Cannot run shortest path on Hadoop 2.2

2014-06-16 Thread John Yost
Hi Everyone,

The shortest path example fails on my Hadoop 2.2.0 single-node cluster, and
I don't see an identifiable root exception. I am able to execute my
Map/Reduce jobs, including ones that use Accumulo as a source and/or sink,
but cannot get the Giraph example jobs or my custom Giraph jobs to run.

I followed the build and job launch instructions from the following URL:
http://mail-archives.apache.org/mod_mbox/giraph-user/201312.mbox/%3C1647021.5fbjhLDxPK@chronos7%3E

Here's the Hadoop console output I get when I attempt to run shortest path:

2014-06-16 07:10:09,631 INFO  [main] yarn.GiraphApplicationMaster
(GiraphApplicationMaster.java:main(421)) - Starting GitaphAM
2014-06-16 07:10:10,277 WARN  [main] util.NativeCodeLoader
(NativeCodeLoader.java:clinit(62)) - Unable to load native-hadoop library
for your platform... using builtin-java classes where applicable
2014-06-16 07:10:11,063 INFO  [main] yarn.GiraphApplicationMaster
(GiraphApplicationMaster.java:init(168)) - GiraphAM  for ContainerId
container_1402830191668_0017_01_01 ApplicationAttemptId
appattempt_1402830191668_0017_01
2014-06-16 07:10:11,130 INFO  [main] client.RMProxy
(RMProxy.java:createRMProxy(56)) - Connecting to ResourceManager at /
0.0.0.0:8030
2014-06-16 07:10:11,136 INFO  [main] impl.NMClientAsyncImpl
(NMClientAsyncImpl.java:serviceInit(107)) - Upper bound of the thread pool
size is 500
2014-06-16 07:10:11,136 INFO  [main] impl.ContainerManagementProtocolProxy
(ContainerManagementProtocolProxy.java:init(71)) -
yarn.client.max-nodemanagers-proxies : 500
2014-06-16 07:10:11,299 INFO  [main] yarn.GiraphApplicationMaster
(GiraphApplicationMaster.java:setupContainerAskForRM(279)) - Requested
container ask: Capability[memory:1024, vCores:0]Priority[10]
2014-06-16 07:10:11,304 INFO  [main] yarn.GiraphApplicationMaster
(GiraphApplicationMaster.java:setupContainerAskForRM(279)) - Requested
container ask: Capability[memory:1024, vCores:0]Priority[10]
2014-06-16 07:10:11,305 INFO  [main] yarn.GiraphApplicationMaster
(GiraphApplicationMaster.java:run(185)) - Wait to finish ..
2014-06-16 07:10:13,331 INFO  [AMRM Callback Handler Thread]
yarn.GiraphApplicationMaster
(GiraphApplicationMaster.java:onContainersAllocated(605)) - Got response
from RM for container ask, allocatedCnt=1
2014-06-16 07:10:13,331 INFO  [AMRM Callback Handler Thread]
yarn.GiraphApplicationMaster
(GiraphApplicationMaster.java:onContainersAllocated(608)) - Total allocated
# of container so far : 1 allocated out of 2 required.
2014-06-16 07:10:13,332 INFO  [AMRM Callback Handler Thread]
yarn.GiraphApplicationMaster
(GiraphApplicationMaster.java:startContainerLaunchingThreads(359)) -
Launching command on a new container.,
containerId=container_1402830191668_0017_01_02,
containerNode=localhost.localdomain:38256,
containerNodeURI=localhost.localdomain:8042, containerResourceMemory=1024
2014-06-16 07:10:13,333 INFO  [pool-2-thread-1]
yarn.GiraphApplicationMaster
(GiraphApplicationMaster.java:buildContainerLaunchContext(492)) - Setting
up container launch container for
containerid=container_1402830191668_0017_01_02
2014-06-16 07:10:13,348 INFO  [pool-2-thread-1]
yarn.GiraphApplicationMaster
(GiraphApplicationMaster.java:buildContainerLaunchContext(498)) - Conatain
launch Commands :java -Xmx1024M -Xms1024M -cp .:${CLASSPATH}
org.apache.giraph.yarn.GiraphYarnTask 1402830191668 17 2 1
1LOG_DIR/task-2-stdout.log 2LOG_DIR/task-2-stderr.log
2014-06-16 07:10:13,349 INFO  [pool-2-thread-1]
yarn.GiraphApplicationMaster
(GiraphApplicationMaster.java:buildContainerLaunchContext(518)) - Setting
username in ContainerLaunchContext to: hadoop
2014-06-16 07:10:13,744 INFO  [pool-2-thread-1] yarn.YarnUtils
(YarnUtils.java:addFsResourcesToMap(72)) - Adding
giraph-examples-1.1.0-SNAPSHOT-for-hadoop-2.2.0-jar-with-dependencies.jar
to LocalResources for export.to
hdfs://localhost.localdomain:8020/user/hadoop/giraph_yarn_jar_cache/application_1402830191668_0017/giraph-examples-1.1.0-SNAPSHOT-for-hadoop-2.2.0-jar-with-dependencies.jar
2014-06-16 07:10:13,774 INFO  [pool-2-thread-1] yarn.YarnUtils
(YarnUtils.java:addFileToResourceMap(160)) - Registered file in
LocalResources ::
hdfs://localhost.localdomain:8020/user/hadoop/giraph_yarn_jar_cache/application_1402830191668_0017/giraph-examples-1.1.0-SNAPSHOT-for-hadoop-2.2.0-jar-with-dependencies.jar
2014-06-16 07:10:13,774 WARN  [pool-2-thread-1] yarn.YarnUtils
(YarnUtils.java:addFsResourcesToMap(81)) - Job jars (-yj option) didn't
include giraph-core.
2014-06-16 07:10:13,776 INFO  [pool-2-thread-1] yarn.YarnUtils
(YarnUtils.java:addFileToResourceMap(160)) - Registered file in
LocalResources ::
hdfs://localhost.localdomain:8020/user/hadoop/giraph_yarn_jar_cache/application_1402830191668_0017/giraph-conf.xml
2014-06-16 07:10:13,786 INFO
 [org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl #0]
impl.NMClientAsyncImpl (NMClientAsyncImpl.java:run(531)) - Processing Event
EventType: START_CONTAINER for Container

Re: Giraph keeps trying to connect to 9000 on Hadoop 2.2.0/YARN

2014-06-02 Thread John Yost
Hey Avery,

Thanks a bunch for responding so quickly to my post!  Looks like the
problem is with my client class.  When I attempt to run one of the Giraph
examples, which use GiraphRunner, GiraphRunner connects to the correct port
and launches the Giraph job. So I just need to take a closer look at
GiraphRunner.

Thanks again for your quick response--much appreciated.

--John


On Sun, Jun 1, 2014 at 11:12 AM, Avery Ching ach...@apache.org wrote:

  Giraph should just pick up your cluster's HDFS configuration.  Can you
 check your hadoop *.xml files?


 On 6/1/14, 3:34 AM, John Yost wrote:

 Hi Everyone,

  Not sure why, but Giraph tries to connect to port 9000:

  java.net.ConnectException: Call From localhost.localdomain/127.0.0.1 to
 localhost:9000 failed on connection exception: java.net.ConnectException:
 Connection refused; For more details see:
 http://wiki.apache.org/hadoop/ConnectionRefused
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)

  I set the following in the Giraph configuration:

   GiraphConstants.IS_PURE_YARN_JOB.set(conf, true);
   conf.set("giraph.useNetty", "true");
   conf.set("giraph.zkList", "localhost.localdomain");
   conf.set("fs.defaultFS", "hdfs://localhost.localdomain:8020");
   conf.set("mapreduce.job.tracker", "localhost.localdomain:54311");
   conf.set("mapreduce.framework.name", "yarn");
   conf.set("yarn.resourcemanager.address", "localhost.localdomain:8032");

  I built Giraph as follows:

  mvn -DskipTests=true -Dhadoop.version=2.2.0 -Phadoop_yarn clean install

  Any ideas as to why Giraph attempts to connect to 9000 instead of 8020?

  --John






Giraph keeps trying to connect to 9000 on Hadoop 2.2.0/YARN

2014-06-01 Thread John Yost
Hi Everyone,

Not sure why, but Giraph tries to connect to port 9000:

java.net.ConnectException: Call From localhost.localdomain/127.0.0.1 to
localhost:9000 failed on connection exception: java.net.ConnectException:
Connection refused; For more details see:
http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)

I set the following in the Giraph configuration:

  GiraphConstants.IS_PURE_YARN_JOB.set(conf, true);
  conf.set("giraph.useNetty", "true");
  conf.set("giraph.zkList", "localhost.localdomain");
  conf.set("fs.defaultFS", "hdfs://localhost.localdomain:8020");
  conf.set("mapreduce.job.tracker", "localhost.localdomain:54311");
  conf.set("mapreduce.framework.name", "yarn");
  conf.set("yarn.resourcemanager.address", "localhost.localdomain:8032");

I built Giraph as follows:

mvn -DskipTests=true -Dhadoop.version=2.2.0 -Phadoop_yarn clean install

Any ideas as to why Giraph attempts to connect to 9000 instead of 8020?
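
One quick way to narrow this down (an editor's sketch, not from the thread)
is to print which default filesystem the client-side Hadoop Configuration
actually resolves; if the expected core-site.xml is not on the client
classpath, or an older one is, the printed value will not be the
hdfs://...:8020 URI you expect:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class PrintDefaultFs {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // The value picked up from core-site.xml, if any.
    System.out.println("fs.defaultFS = " + conf.get("fs.defaultFS"));
    // The filesystem URI the client will actually use.
    System.out.println("resolved URI = " + FileSystem.get(conf).getUri());
  }
}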

--John


Giraph job hangs and is eventually killed

2014-04-05 Thread John Yost
Hi Everyone,

I have a shortest path implementation that completes and outputs the
correct results to a counter, but then hangs after the last superstep and
is eventually killed by Hadoop.

Here's the output from the console:

main-SendThread(localhost.localdomain:2181)] INFO
org.apache.zookeeper.ClientCnxn - Opening socket connection to server
localhost.localdomain/127.0.0.1:2181. Will not attempt to authenticate
using SASL (unknown error)
[main-SendThread(localhost.localdomain:2181)] INFO
org.apache.zookeeper.ClientCnxn - Socket connection established to
localhost.localdomain/127.0.0.1:2181, initiating session
[main-SendThread(localhost.localdomain:2181)] INFO
org.apache.zookeeper.ClientCnxn - Session establishment complete on server
localhost.localdomain/127.0.0.1:2181, sessionid = 0x1451fc674a30007,
negotiated timeout = 4
14/04/04 22:19:44 INFO job.JobProgressTracker: Data from 1 workers -
Storing data: 0 out of 11 vertices stored; 0 out of 1 partitions stored;
min free memory on worker 1 - 119.73MB, average 119.73MB
14/04/04 22:19:45 INFO mapred.JobClient:  map 100% reduce 0%
14/04/04 22:19:49 INFO job.JobProgressTracker: Data from 1 workers -
Storing data: 0 out of 11 vertices stored; 0 out of 1 partitions stored;
min free memory on worker 1 - 119.73MB, average 119.73MB
14/04/04 22:19:54 INFO job.JobProgressTracker: Data from 1 workers -
Storing data: 0 out of 11 vertices stored; 0 out of 1 partitions stored;
min free memory on worker 1 - 119.44MB, average 119.44MB
1

This is the stack trace I see in Hadoop after the job is killed:

Caused by: java.lang.IllegalStateException: waitFor:
ExecutionException occurred while waiting for
org.apache.giraph.utils.ProgressableUtils$FutureWaitable@43349eef
at 
org.apache.giraph.utils.ProgressableUtils.waitFor(ProgressableUtils.java:193)
at 
org.apache.giraph.utils.ProgressableUtils.waitForever(ProgressableUtils.java:151)
at 
org.apache.giraph.utils.ProgressableUtils.waitForever(ProgressableUtils.java:136)
at 
org.apache.giraph.utils.ProgressableUtils.getFutureResult(ProgressableUtils.java:99)
at 
org.apache.giraph.utils.ProgressableUtils.getResultsWithNCallables(ProgressableUtils.java:233)
at 
org.apache.giraph.worker.BspServiceWorker.saveVertices(BspServiceWorker.java:1033)
at 
org.apache.giraph.worker.BspServiceWorker.cleanup(BspServiceWorker.java:1179)
at 
org.apache.giraph.graph.GraphTaskManager.cleanup(GraphTaskManager.java:843)
at org.apache.giraph.graph.GraphMapper.cleanup(GraphMapper.java:81)
at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:93)
... 7 more
Caused by: java.util.concurrent.ExecutionException:
java.lang.IllegalStateException:
org.apache.hadoop.ipc.RemoteException:
org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: failed
to create file 
/user/prototype/giraph/twitter-path-result/_temporary/_attempt_201404012018_0003_m_01_0/part-m-1
for DFSClient_attempt_201404012018_0003_m_01_0_-1149212770_1 on
client 127.0.0.1 because current leaseholder is trying to recreate
file.
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:1452)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1324)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:1266)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.create(NameNode.java:668)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.create(NameNode.java:647)
at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:578)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1393)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1389)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1387)

I realize that the root cause appears to be within Hadoop and not Giraph,
but I am wondering if there is a Giraph configuration parameter I am missing.
In researching the HDFS exception (not many posts on this, BTW), one
responder opined that this exception is due to speculative execution being
enabled.
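
If it helps, this is a sketch of what that responder's suggestion would look
like (an editor's sketch, not a confirmed fix; these are the classic Hadoop
1.x property names matching the job ids in the log above):

import org.apache.hadoop.conf.Configuration;

public class DisableSpeculation {
  public static void apply(Configuration conf) {
    // Turn off speculative task attempts so only one attempt writes each output file.
    conf.setBoolean("mapred.map.tasks.speculative.execution", false);
    conf.setBoolean("mapred.reduce.tasks.speculative.execution", false);
  }
}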

Also, I tested a standard Map/Reduce job writing to the same data block and
it worked fine, so I don't think HDFS is the problem (corrupt data block,
etc.).

Any ideas?

--John