Re: [RESULT] [VOTE] Apache Giraph 1.1.0 RC2
Thanks for pushing this through, Roman. Looks great!

On 11/18/14, 4:30 AM, Roman Shaposhnik wrote:
Hi! With 3 binding +1s, one non-binding +1, and no 0s or -1s, the vote to publish Apache Giraph 1.1.0 RC2 as the 1.1.0 release of Apache Giraph passes. Thanks to everybody who spent time on validating the bits!

The vote tally is:
+1s: Claudio Martella (binding), Maja Kabiljo (binding), Eli Reisman (binding), Roman Shaposhnik (non-binding)

I'll do the publishing tonight and will send an announcement! Thanks, Roman (AKA 1.1.0 RM)

On Thu, Nov 13, 2014 at 5:28 AM, Roman Shaposhnik ro...@shaposhnik.org wrote:
This vote is for Apache Giraph, version 1.1.0 release. It fixes the following issues: http://s.apache.org/a8X

*** Please download, test and vote by Mon 11/17 noon PST

Note that we are voting upon the source (tag): release-1.1.0-RC2

Source and binary files are available at: http://people.apache.org/~rvs/giraph-1.1.0-RC2/
Staged website is available at: http://people.apache.org/~rvs/giraph-1.1.0-RC2/site/
Maven staging repo is available at: https://repository.apache.org/content/repositories/orgapachegiraph-1003

Please note that, as per earlier agreement, two sets of artifacts are published, differentiated by the version ID:
* version ID 1.1.0 corresponds to the artifacts built for the hadoop_1 profile
* version ID 1.1.0-hadoop2 corresponds to the artifacts built for the hadoop_2 profile

The tag to be voted upon (release-1.1.0-RC2): https://git-wip-us.apache.org/repos/asf?p=giraph.git;a=log;h=refs/tags/release-1.1.0-RC2
The KEYS file containing PGP keys we use to sign the release: http://svn.apache.org/repos/asf/bigtop/dist/KEYS

Thanks, Roman.
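For consumers of the staged release, the two artifact sets above are selected purely by version ID. A sketch of the corresponding Maven dependencies (the org.apache.giraph:giraph-core coordinates are my assumption of the published artifact, not stated in the vote email):

```xml
<!-- Artifacts built for the hadoop_1 profile -->
<dependency>
  <groupId>org.apache.giraph</groupId>
  <artifactId>giraph-core</artifactId>
  <version>1.1.0</version>
</dependency>

<!-- Artifacts built for the hadoop_2 profile: same coordinates, different version ID -->
<dependency>
  <groupId>org.apache.giraph</groupId>
  <artifactId>giraph-core</artifactId>
  <version>1.1.0-hadoop2</version>
</dependency>
```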
Re: YARN vs. MR1: is YARN a good idea?
Theoretically, Giraph on YARN would be much better (actual resource request rather than mapper hack). That being said, Eli is the best person to talk about that. We haven't tried YARN. Avery On 10/6/14, 8:51 AM, Matthew Cornell wrote: Hi Folks. I don't think I paid enough attention to YARN vs. MR1 when I built Giraph 1.0.0 for our system. How much better is Giraph on YARN? Thank you.
Re: How local worker knows destination worker?
Take a look at the interfaces for MasterGraphPartitioner and WorkerGraphPartitioner and their implementations for hash partitioning (HashRangePartitionerFactory). You can implement any kind of partitioning you like. Avery

On 8/8/14, 7:51 AM, Robert McCune wrote: For a non-hash partitioning, how does a worker know which destination worker to send a remote message to? In the Pregel paper, with hash partitioning, a worker can know the destination worker just by hashing the destination vertex ID. But for any non-trivial partitioning, how does a worker know where to send a remote message? I welcome any references to the literature. Thank you
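To make the routing concrete, here is a small framework-free sketch (class and method names are illustrative, not Giraph's API) of the two-level scheme those partitioner interfaces implement: vertex id -> partition is a fixed function of the id (hash in the default case), while partition -> worker is an explicit table the master computes and distributes to every worker, so even a non-hash worker assignment reduces to a table lookup when sending a remote message:

```java
import java.util.Map;

/** Toy illustration of Giraph-style message routing (not Giraph code). */
public class PartitionRouting {

    /** Vertex -> partition is a pure function of the id (hash case). */
    public static int hashPartition(long vertexId, int numPartitions) {
        return (int) Math.abs(vertexId % numPartitions);
    }

    /**
     * Partition -> worker is an explicit map the master computes (this is
     * where arbitrary, non-hash placement lives) and ships to all workers.
     * Routing a message is then: hash the id, look up the owning worker.
     */
    public static int ownerWorker(long vertexId, int numPartitions,
                                  Map<Integer, Integer> partitionToWorker) {
        return partitionToWorker.get(hashPartition(vertexId, numPartitions));
    }
}
```

The design point: because the vertex->partition function is known to everyone, only the (small) partition->worker table needs to be shared, no matter how clever the placement policy is.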
Re: Introducing Graft: A debugging and testing tool for Giraph algorithms
I've seen this work demoed. It's awesome, especially for applications that are not very predictable. Avery

On 6/4/14, 11:00 AM, Semih Salihoglu wrote: Hi Giraph Users, I wanted to introduce to you Graft, a project that some of us at Stanford have built over the last quarter. If you are a Giraph user who ran into an annoying bug in which the code was throwing an exception or producing incorrect-looking messages or vertex values (e.g. NaNs or NullPointerExceptions), and you had to put println statements into your compute() functions and then inspect the logs of Hadoop workers for debugging, you should read on. You might find Graft very useful. In a nutshell, Graft is based on the idea of capturing, programmatically, the contexts under which a bug becomes noticeable (an exception is thrown, an incorrect message is sent, or a vertex is assigned an incorrect value). The captured contexts can then be visualized through a GUI. The contexts that a user thinks could be helpful for catching the bug can then be reproduced in a compilable program, and the user can use his/her favorite IDE's debugger to do step-by-step debugging into the context. For example, when a vertex v throws an exception, the user can reproduce the context under which v throws the exception and then use (say) Eclipse to do step-by-step debugging to see exactly which lines were executed that resulted in the exception being thrown. On the testing side, Graft makes it easier to generate unit and end-to-end tests by letting users curate small graphs through its GUI's testing mode; it then generates code snippets which can be copied and pasted into a JUnit test. The project is still under development, but interested users can start using it. We have a wiki with documentation and instructions on how to install and use Graft: https://github.com/semihsalihoglu/graft/wiki.
Since the project is under development, we'd greatly appreciate users starting to use it and giving us direction on how to make it more useful. Our emails are on the documentation page. We also encourage interested developers to contribute if there are requested features that we don't get to very quickly. Just a small note: Graft works with Giraph at trunk: https://github.com/apache/giraph/tree/trunk. We do not support earlier versions. In particular, your programs need to be written by extending Computation and optionally the Master class, instead of the older Vertex class. Best, semih
Re: Giraph keeps trying to connect to 9000 on Hadoop 2.2.0/YARN
Giraph should just pick up your cluster's HDFS configuration. Can you check your Hadoop *.xml files?

On 6/1/14, 3:34 AM, John Yost wrote: Hi Everyone, Not sure why, but Giraph tries to connect to port 9000: java.net.ConnectException: Call From localhost.localdomain/127.0.0.1 to localhost:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)

I set the following in the Giraph configuration:

GiraphConstants.IS_PURE_YARN_JOB.set(conf, true);
conf.set("giraph.useNetty", "true");
conf.set("giraph.zkList", "localhost.localdomain");
conf.set("fs.defaultFS", "hdfs://localhost.localdomain:8020");
conf.set("mapreduce.job.tracker", "localhost.localdomain:54311");
conf.set("mapreduce.framework.name", "yarn");
conf.set("yarn.resourcemanager.address", "localhost.localdomain:8032");

I built Giraph as follows: mvn -DskipTests=true -Dhadoop.version=2.2.0 -Phadoop_yarn clean install

Any ideas as to why Giraph attempts to connect to 9000 instead of 8020? --John
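A connection attempt to localhost:9000 usually means the client fell back to a default filesystem URI instead of the cluster's configured one. A sketch of the relevant core-site.xml entry to check on the client machine, using the 8020 address from the message above:

```xml
<!-- core-site.xml on the submitting client: this is where the
     fs.defaultFS value is normally picked up from. -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://localhost.localdomain:8020</value>
</property>
```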
Re: Errors while running large graph
You might also want to check the ZooKeeper memory options. Some of our production jobs use parameters such as:

-Xmx5g -XX:ParallelGCThreads=4 -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:MaxGCPauseMillis=100

Since the master doesn't use much memory, letting ZooKeeper have more is reasonable.

On 5/27/14, 9:25 AM, Praveen kumar s.k wrote: Hi All, I am getting several errors consistently while processing a large graph. The code works when the size of the graph is in the GB range. We have implemented compression and removal of the dead-end nodes in the de Bruijn graph. My cluster settings are:

Cores: 252 | Workers: 250 | RAM/Core: 10.5 GB | Graph size: 2.3 TB | Aggregate RAM: 2.6 TB

Below are the types of errors I am getting.

1. I believe this error occurred because the ZooKeeper session expired. To address this I changed the minSessionTimeout configuration parameter to a large value. However, some workers still throw this error:

2014-05-27 00:19:55,187 FATAL org.apache.giraph.graph.GraphMapper: uncaughtException: OverrideExceptionHandler on thread org.apache.giraph.master.MasterThread, msg = java.lang.Il$ java.lang.IllegalStateException: java.lang.IllegalStateException: Failed to create job state path due to KeeperException at org.apache.giraph.master.MasterThread.run(MasterThread.java:185) Caused by: java.lang.IllegalStateException: Failed to create job state path due to KeeperException at org.apache.giraph.bsp.BspService.getJobState(BspService.java:679) at org.apache.giraph.master.BspServiceMaster.becomeMaster(BspServiceMaster.java:843) at org.apache.giraph.master.MasterThread.run(MasterThread.java:98) Caused by: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /_hadoopBsp/job_201405262302_0003/_masterJobState at org.apache.zookeeper.KeeperException.create(KeeperException.java:118) at org.apache.zookeeper.KeeperException.create(KeeperException.java:42) at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:637) at
org.apache.giraph.zk.ZooKeeperExt.createExt(ZooKeeperExt.java:152) at org.apache.giraph.bsp.BspService.getJobState(BspService.java:670) ... 2 more

2. I don't know why the error below is thrown. My guess is that the master worker is failing for some reason:

2014-05-27 00:19:55,184 ERROR org.apache.giraph.master.MasterThread: masterThread: Master algorithm failed with IllegalStateException java.lang.IllegalStateException: Failed to create job state path due to KeeperException at org.apache.giraph.bsp.BspService.getJobState(BspService.java:679) at org.apache.giraph.master.BspServiceMaster.becomeMaster(BspServiceMaster.java:843) at org.apache.giraph.master.MasterThread.run(MasterThread.java:98) Caused by: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /_hadoopBsp/job_201405262302_0003/_masterJobState at org.apache.zookeeper.KeeperException.create(KeeperException.java:118) at org.apache.zookeeper.KeeperException.create(KeeperException.java:42) at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:637) at org.apache.giraph.zk.ZooKeeperExt.createExt(ZooKeeperExt.java:152) at org.apache.giraph.bsp.BspService.getJobState(BspService.java:670) ... 2 more

3.
Below is one more type of error java.lang.IllegalStateException: Failed to create job state path due to KeeperException at org.apache.giraph.bsp.BspService.getJobState(BspService.java:679) at org.apache.giraph.master.BspServiceMaster.becomeMaster(BspServiceMaster.java:843) at org.apache.giraph.master.MasterThread.run(MasterThread.java:98) Caused by: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /_hadoopBsp/job_201405261249_0008/_masterJobState at org.apache.zookeeper.KeeperException.create(KeeperException.java:118) at org.apache.zookeeper.KeeperException.create(KeeperException.java:42) at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:637) at org.apache.giraph.zk.ZooKeeperExt.createExt(ZooKeeperExt.java:152) at org.apache.giraph.bsp.BspService.getJobState(BspService.java:670) ... 2 more 2014-05-26 18:19:54,269 FATAL org.apache.giraph.graph.GraphMapper: uncaughtException: OverrideExceptionHandler on thread org.apache.giraph.master.MasterThread, msg = java.lang.Il$ java.lang.IllegalStateException: java.lang.IllegalStateException: Failed to create job state path due to KeeperException at org.apache.giraph.master.MasterThread.run(MasterThread.java:185) Caused by: java.lang.IllegalStateException: Failed to create job state path due to KeeperException at org.apache.giraph.bsp.BspService.getJobState(BspService.java:679) at
Re: Errors while running large graph
giraph.zkJavaOpts

On 5/27/14, 10:27 AM, Praveen kumar s.k wrote: Do I need to put this in the ZooKeeper configuration file or the Giraph job configuration?

On Tue, May 27, 2014 at 12:14 PM, Avery Ching ach...@apache.org wrote: You might also want to check the ZooKeeper memory options. Some of our production jobs use parameters such as -Xmx5g -XX:ParallelGCThreads=4 -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:MaxGCPauseMillis=100. Since the master doesn't use much memory, letting ZooKeeper have more is reasonable.

On 5/27/14, 9:25 AM, Praveen kumar s.k wrote: Hi All, I am getting several errors consistently while processing a large graph. The code works when the size of the graph is in the GB range. We have implemented compression and removal of the dead-end nodes in the de Bruijn graph. My cluster settings are: Cores: 252 | Workers: 250 | RAM/Core: 10.5 GB | Graph size: 2.3 TB | Aggregate RAM: 2.6 TB. Below are the types of errors I am getting. 1. I believe this error occurred because the ZooKeeper session expired. To address this I changed the minSessionTimeout configuration parameter to a large value. However, some workers still throw this error.
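Tying the answer to the question: the options go in the Giraph job configuration (giraph.zkJavaOpts), not in ZooKeeper's own config files, because Giraph launches its own ZooKeeper when none is provided. A hypothetical invocation (the jar name and trailing arguments are placeholders, not from the thread):

```sh
hadoop jar giraph-examples-with-dependencies.jar org.apache.giraph.GiraphRunner \
    -Dgiraph.zkJavaOpts="-Xmx5g -XX:ParallelGCThreads=4 -XX:+UseConcMarkSweepGC \
-XX:CMSInitiatingOccupancyFraction=70 -XX:MaxGCPauseMillis=100" \
    ...
```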
Re: Error while executing large graph
I think this is the key message:

0 out of 196 partitions computed; min free memory on worker 6 - 0.81MB, average 11.56MB

Having less than 1 MB free won't work. Your workers are likely OOMing, killing the job. Can you get more memory for your job?

On 5/14/14, 3:13 AM, Arun Kumar wrote: Hi, when I run a Giraph job against 1 GB of data, I get the exception below after some time. Can somebody tell me what the issue is?

14/05/14 01:54:01 INFO job.JobProgressTracker: Data from 14 workers - Compute superstep 2: 0 out of 4847571 vertices computed; 0 out of 196 partitions computed; min free memory on worker 6 - 0.81MB, average 11.56MB 14/05/14 01:54:03 INFO zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x145f9cff031000f, likely server has closed socket, closing socket connection and attempting reconnect 14/05/14 01:54:04 INFO zookeeper.ClientCnxn: Opening socket connection to server mercado-12.hpl.hp.com/15.25.119.147:22181. Will not attempt to authenticate using SASL (unknown error) 14/05/14 01:54:04 WARN zookeeper.ClientCnxn: Session 0x145f9cff031000f for server null, unexpected error, closing socket connection and attempting reconnect java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739) at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068) 14/05/14 01:54:06 INFO zookeeper.ClientCnxn: Opening socket connection to server mercado-12.hpl.hp.com/15.25.119.147:22181.
Will not attempt to authenticate using SASL (unknown error) 14/05/14 01:54:06 WARN zookeeper.ClientCnxn: Session 0x145f9cff031000f for server null, unexpected error, closing socket connection and attempting reconnect java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739) at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068) 14/05/14 01:54:06 WARN zk.ZooKeeperExt: exists: Connection loss on attempt 0, waiting 5000 msecs before retrying. org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /_hadoopBsp/job_201405140108_0003/_workerProgresses at org.apache.zookeeper.KeeperException.create(KeeperException.java:99) at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041) at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069) at org.apache.giraph.zk.ZooKeeperExt.exists(ZooKeeperExt.java:360) at org.apache.giraph.job.JobProgressTracker$2.run(JobProgressTracker.java:87) at java.lang.Thread.run(Thread.java:745) 14/05/14 01:54:08 INFO zookeeper.ClientCnxn: Opening socket connection to server mercado-12.hpl.hp.com/15.25.119.147:22181.
Will not attempt to authenticate using SASL (unknown error) 14/05/14 01:54:08 WARN zookeeper.ClientCnxn: Session 0x145f9cff031000f for server null, unexpected error, closing socket connection and attempting reconnect java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739) at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068) 14/05/14 01:54:09 INFO mapred.JobClient: map 93% reduce 0% 14/05/14 01:54:10 INFO zookeeper.ClientCnxn: Opening socket connection to server mercado-12.hpl.hp.com/15.25.119.147:22181. Will not attempt to authenticate using SASL (unknown error) 14/05/14 01:54:10 WARN zookeeper.ClientCnxn: Session 0x145f9cff031000f for server null, unexpected error, closing socket connection and attempting reconnect java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739) at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068) 14/05/14 01:54:12 INFO zookeeper.ClientCnxn: Opening socket connection to server mercado-12.hpl.hp.com/15.25.119.147:22181. Will not attempt to authenticate using SASL (unknown error) 14/05/14 01:54:12 WARN zookeeper.ClientCnxn: Session 0x145f9cff031000f for server
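Following up on "can you get more memory": on an MR1-style cluster (an assumption consistent with the mapred.JobClient output above), one way to give each worker JVM more heap is via the per-task JVM options in the job configuration. A hypothetical invocation (jar name, heap size, and trailing arguments are placeholders):

```sh
# Raise the per-mapper (i.e. per-Giraph-worker) heap for this job only.
hadoop jar giraph-examples-with-dependencies.jar org.apache.giraph.GiraphRunner \
    -Dmapred.child.java.opts=-Xmx4g \
    ...
```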
Re: How to schedule a Giraph job.
You can schedule a Giraph job with any MapReduce job scheduler (it is just a map-only job).

On 4/26/14, 4:30 AM, yeshwanth kumar wrote: Hi, I am looking for a Giraph job scheduler, just like Oozie. Can we schedule a Giraph job using Oozie? -yeshwanth.
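Since a Giraph job is just a map-only Hadoop job launched from a main class, one workable Oozie pattern is a java action invoking GiraphRunner. A hedged sketch (the action name, computation class, and arguments are illustrative, not from the thread):

```xml
<action name="giraph-job">
  <java>
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <!-- GiraphRunner submits the map-only job; pass your computation
         class and I/O options as arguments. -->
    <main-class>org.apache.giraph.GiraphRunner</main-class>
    <arg>org.example.MyComputation</arg>
  </java>
  <ok to="end"/>
  <error to="fail"/>
</action>
```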
Please welcome our newest PMC member, Maja!
Maja has been working on Giraph for over a year and is one of our biggest contributors. Adding her to the Giraph PMC in recognition of her impressive work is long overdue. Some of her major contributions include composable computation, sharded aggregators, Hive I/O, support for massive messages, as well as lots of bugfixes and code reviews. She has also been responsible for several major performance improvements (e.g. message store specialization, message cache improvements). We are very lucky to have her working with us on this project. Avery
Blogpost: Large-scale graph partitioning with Apache Giraph
Hi Giraphers, Recently, a few internal Giraph users at Facebook published a really cool blog post on how we partition huge graphs (1.15 billion people and 150 billion friendships - 300B directed edges). https://code.facebook.com/posts/274771932683700/large-scale-graph-partitioning-with-apache-giraph/ Avery
New committer: Pavan Kumar
The Project Management Committee (PMC) for Apache Giraph has asked Pavan Kumar to become a committer, and we are pleased to announce that he has accepted. Here are some of Pavan's contributions:

GIRAPH-858: tests fail for hadoop_facebook because of dependency issues (pavanka via aching)
GIRAPH-854: fix for test fail due to GIRAPH-840 (pavanka via majakabiljo)
GIRAPH-840: Upgrade to netty 4 (pavanka via majakabiljo)
GIRAPH-843: remove rexter from hadoop_facebook profile (pavanka via aching)
GIRAPH-838: setup time total time counter also includes time spent waiting for machines (pavanka via majakabiljo)
GIRAPH-839: NettyWorkerAggregatorRequestProcessor tries to reuse request objects (pavanka via majakabiljo)
GIRAPH-830: directMemory used in netty message (pavanka via aching)
GIRAPH-823: upgrade hiveio to version 0.21 from older version 0.20 (pavanka via majakabiljo)
GIRAPH-821: proper handling of NegativeArraySizeException for all ByteArray backed message stores (pavanka via majakabiljo)
GIRAPH-820: add a configuration option to skip creating source vertices present only in edge input (pavanka via majakabiljo)

Pavan has been actively writing and reviewing code. His Netty 4 upgrade brought a HUGE performance improvement to Giraph trunk (everyone should try it out!). We are very excited to have Pavan take a larger role in the Giraph community! Thanks Pavan, Avery
Re: how to change graph
Yes, this is one of the great things about Giraph (not many other graph computation frameworks allow graph mutation). See the Computation class, e.g.:

/**
 * Sends a request to create a vertex that will be available during the
 * next superstep.
 *
 * @param id Vertex id
 * @param value Vertex value
 * @param edges Initial edges
 */
void addVertexRequest(I id, V value, OutEdges<I, E> edges) throws IOException;

/**
 * Sends a request to create a vertex that will be available during the
 * next superstep.
 *
 * @param id Vertex id
 * @param value Vertex value
 */
void addVertexRequest(I id, V value) throws IOException;

/**
 * Request to remove a vertex from the graph
 * (applied just prior to the next superstep).
 *
 * @param vertexId Id of the vertex to be removed.
 */
void removeVertexRequest(I vertexId) throws IOException;

/**
 * Request to add an edge of a vertex in the graph
 * (processed just prior to the next superstep).
 *
 * @param sourceVertexId Source vertex id of edge
 * @param edge Edge to add
 */
void addEdgeRequest(I sourceVertexId, Edge<I, E> edge) throws IOException;

/**
 * Request to remove all edges from a given source vertex to a given target
 * vertex (processed just prior to the next superstep).
 *
 * @param sourceVertexId Source vertex id
 * @param targetVertexId Target vertex id
 */
void removeEdgesRequest(I sourceVertexId, I targetVertexId) throws IOException;

On 4/16/14, 7:23 AM, Akshay Trivedi wrote: Hi, I wanted to do some computation on a graph and delete some edges between supersteps. Can this be done using Giraph? I have heard of the MutableVertex class but I don't know whether it can be used to delete edges. Also, is MutableVertex an abstract class that has to be implemented? Regards, Akshay
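The "applied just prior to the next superstep" contract of those request methods can be illustrated with a toy, framework-free model (this is not Giraph code; the class and method names only mimic the request-then-apply timing):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Toy model of Giraph's mutation contract: requests made during superstep S
 * are only buffered; they take effect just before superstep S+1 begins.
 */
public class MutationBuffer {
    private final Set<Long> vertices = new HashSet<>();
    private final Map<Long, Set<Long>> edges = new HashMap<>();
    private final List<long[]> pendingEdgeAdds = new ArrayList<>();
    private final Set<Long> pendingVertexRemoves = new HashSet<>();

    public MutationBuffer(Set<Long> initialVertices) {
        vertices.addAll(initialVertices);
    }

    /** Analogue of addEdgeRequest: buffered, not applied immediately. */
    public void addEdgeRequest(long src, long dst) {
        pendingEdgeAdds.add(new long[] {src, dst});
    }

    /** Analogue of removeVertexRequest: buffered, not applied immediately. */
    public void removeVertexRequest(long id) {
        pendingVertexRemoves.add(id);
    }

    /** Runs between supersteps: apply all buffered mutations at once. */
    public void applyMutations() {
        for (long[] e : pendingEdgeAdds) {
            edges.computeIfAbsent(e[0], k -> new HashSet<>()).add(e[1]);
        }
        pendingEdgeAdds.clear();
        for (long id : pendingVertexRemoves) {
            vertices.remove(id);
            edges.remove(id);
        }
        pendingVertexRemoves.clear();
    }

    public boolean hasVertex(long id) {
        return vertices.contains(id);
    }

    public boolean hasEdge(long src, long dst) {
        return edges.getOrDefault(src, new HashSet<>()).contains(dst);
    }
}
```

The design point for Akshay's question: a vertex's compute() never sees its own (or anyone else's) mutations mid-superstep, which keeps the BSP semantics deterministic.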
Re: how to change graph
They should all be implemented. =)

On 4/16/14, 9:32 PM, Akshay Trivedi wrote: Does removeVertexRequest(I vertexId) have to be implemented? Is there any pre-defined class for this?

On Wed, Apr 16, 2014 at 8:33 PM, Avery Ching ach...@apache.org wrote: Yes, this is one of the great things about Giraph (not many other graph computation frameworks allow graph mutation). See the Computation class.
Re: Child processes still running after successful job
Corona - https://www.facebook.com/notes/facebook-engineering/under-the-hood-scheduling-mapreduce-jobs-more-efficiently-with-corona/10151142560538920

On 4/11/14, 8:14 AM, chadi jaber wrote: Hi Avery, What do you mean by your version of Hadoop? Best regards, Chadi

Date: Fri, 11 Apr 2014 07:35:44 -0700 From: ach...@apache.org To: user@giraph.apache.org Subject: Re: Child processes still running after successful job

Unfortunately we don't face this issue since our version of Hadoop kills processes after a job is complete. If you can do a jstack, you can probably figure out where this is hanging and submit a patch to fix it.

On 4/11/14, 4:24 AM, Yi Lu wrote: Hi Chadi, I also have this problem. My solution is to write a Python script to kill the process on each slave machine which consumes lots of memory. :) I hope there is a better solution.
Re: PageRank on custom input
Hi Vikesh, You just need to write an input format or use an existing one. You can specify any number and combination of VertexInputFormat and EdgeInputFormat formats as per your needs. Please see giraph-core/src/main/java/org/apache/giraph/io/formats for some examples. Avery On 4/7/14, 9:57 PM, Vikesh Khanna wrote: Hi, We want to run a PageRank job (similar to PageRankBenchmark) for custom input graph. Is there an example for this? Giraph's website has a page for this but it is incomplete - http://giraph.apache.org/pagerank.html Thanks, Vikesh Khanna, Masters, Computer Science (Class of 2015) Stanford University
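As a sketch of what the per-record parsing inside a custom VertexInputFormat might look like, here is a small framework-free parser for a hypothetical tab-separated adjacency layout (id<TAB>value<TAB>dst:weight,dst:weight). The layout and class name are made up for illustration; they are not one of the bundled formats in giraph-core's io/formats:

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical line parser a custom VertexInputFormat might delegate to. */
public class AdjacencyLineParser {

    /** First column: vertex id. */
    public static long parseId(String line) {
        return Long.parseLong(line.split("\t")[0]);
    }

    /** Second column: initial vertex value (e.g. a PageRank seed). */
    public static double parseValue(String line) {
        return Double.parseDouble(line.split("\t")[1]);
    }

    /** Third column: comma-separated "targetId:weight" out-edges. */
    public static Map<Long, Float> parseEdges(String line) {
        Map<Long, Float> out = new LinkedHashMap<>();
        String[] cols = line.split("\t");
        if (cols.length < 3 || cols[2].isEmpty()) {
            return out; // vertex with no out-edges
        }
        for (String edge : cols[2].split(",")) {
            String[] kv = edge.split(":");
            out.put(Long.parseLong(kv[0]), Float.parseFloat(kv[1]));
        }
        return out;
    }
}
```

In an actual format class, this parsing would live in the reader's per-line method, with the parsed pieces wrapped in the job's Writable id/value/edge types.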
Re: voteToHalt vs removeVertexRequest
Pretty much. But when you remove the vertex, you won't be able to dump its output (not that all applications need to). Avery

On 4/7/14, 9:38 AM, Liannet Reyes wrote: Hi, Because of my algorithm, I am able to detect when a vertex won't be used anymore. What would be more appropriate: voteToHalt or removeVertex? I imagine that by removing the vertices I can free some memory, and although it has some cost in execution time, that is not a big deal as the graph is smaller each time. Am I right? Regards, Liannet
Re: Giraph job hangs indefinitely and is eventually killed by JobTracker
My guess is that you don't get your resources. It would be very helpful to print the master log. You can find it while the job is running by looking at the Hadoop counters on the job UI page. Avery

On 4/3/14, 12:49 PM, Vikesh Khanna wrote: Hi, I am running the PageRank benchmark under giraph-examples from the giraph-1.0.0 release. I am using the following command to run the job (as mentioned here: https://cwiki.apache.org/confluence/display/GIRAPH/Quick+Start+Guide):

vikesh@madmax /lfs/madmax/0/vikesh/usr/local/giraph/giraph-examples/src/main/java/org/apache/giraph/examples $ $HADOOP_HOME/bin/hadoop jar $GIRAPH_HOME/giraph-core/target/giraph-1.0.0-for-hadoop-0.20.203.0-jar-with-dependencies.jar org.apache.giraph.benchmark.PageRankBenchmark -e 1 -s 3 -v -V 5000 -w 30

However, the job gets stuck at map 9% and is eventually killed by the JobTracker on reaching mapred.task.timeout (default 10 minutes). I tried increasing the timeout to a very large value, and the job went on for over 8 hours without completion. I also tried the ShortestPathsBenchmark, which fails the same way. Any help is appreciated.

Machine details:
Linux version 2.6.32-279.14.1.el6.x86_64 (mockbu...@c6b8.bsys.dev.centos.org) (gcc version 4.4.6 20120305 (Red Hat 4.4.6-4) (GCC) ) #1 SMP Tue Nov 6 23:43:09 UTC 2012
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 64
On-line CPU(s) list: 0-63
Thread(s) per core: 1
Core(s) per socket: 8
CPU socket(s): 8
NUMA node(s): 8
Vendor ID: GenuineIntel
CPU family: 6
Model: 47
Stepping: 2
CPU MHz: 1064.000
BogoMIPS: 5333.20
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 24576K
NUMA node0 CPU(s): 1-8
NUMA node1 CPU(s): 9-16
NUMA node2 CPU(s): 17-24
NUMA node3 CPU(s): 25-32
NUMA node4 CPU(s): 0,33-39
NUMA node5 CPU(s): 40-47
NUMA node6 CPU(s): 48-55
NUMA node7 CPU(s): 56-63

I am using a pseudo-distributed Hadoop cluster on a single machine with 64 cores.
Thanks, Vikesh Khanna, Masters, Computer Science (Class of 2015) Stanford University
Re: GSoC 2014 - Strongly Connected Components
I think this would be great. Thanks Mirko. Avery

On 3/16/14, 12:26 PM, Gianluca Righetto wrote: Hi, Thank you both for your comments and support. Mirko, I'm glad you'd like to be the mentor of this project; we just need to confirm this is OK with GSoC and Apache, just to avoid any issues down the road. Avery, what do you think about this? Thanks, Gianluca Righetto

On 15.03.2014 at 09:34, Mirko Kämpf wrote: Hi Gianluca, thanks for sharing your ideas and sending your proposal. Your approach sounds promising and I am very interested in supporting your work. I am not an official member of the Apache Giraph project at the moment, so my question goes to Avery: would it be possible for me to become a mentor for Gianluca's project? Best wishes, Mirko

On Fri, Mar 14, 2014 at 10:19 PM, Avery Ching ach...@apache.org wrote: This is a great idea. Unfortunately, I'm a little bandwidth limited, but I hope someone can help mentor you!

On 3/14/14, 1:26 PM, Gianluca Righetto wrote: Hello everyone, I've been working with Giraph for some time now and I'd like to make some contributions back to the project through Google Summer of Code. I wrote a project proposal to implement an algorithm for finding Strongly Connected Components in a graph, based on recently published research papers. The main idea of the algorithm is to find clusters (or groups) in the graph, and it's arguably more insightful than the currently available Connected Components algorithm. So, if there's any Apache member interested in mentoring this project, please feel free to contact me. Any kind of feedback will be greatly appreciated. You can find the document in Google Drive here: http://goo.gl/1fqqui Thanks, Gianluca Righetto

--
Mirko Kämpf
Trainer @ Cloudera
tel: +49 176 20 63 51 99
skype: kamir1604
mi...@cloudera.com
Re: Java Process Memory Leak
Hi Young, Our Hadoop instance (Corona) kills processes after they finish executing, so we don't see this. You might want to do a jstack to see where it's hung up and figure out the issue. Thanks, Avery On 3/17/14, 7:56 AM, Young Han wrote: Hi all, With Giraph 1.0.0, I've noticed an issue where the Java process corresponding to the job loiters around indefinitely even after the job completes (successfully). The process consumes memory but not CPU time. This happens on both a single machine and clusters of machines (in which case every worker has the issue). The only way I know of fixing this is killing the Java process manually---restarting or stopping Hadoop does not help. Is this some known bug or a configuration issue on my end? Thanks, Young
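To see what jstack would report without attaching an external tool, here is a minimal, hypothetical sketch (not Giraph code; class and method names invented) that dumps all live thread stacks from inside the JVM via the standard ThreadMXBean API - the same kind of information jstack prints, which is what you would inspect to find where the lingering process is hung:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Sketch: dump all live thread stacks from inside the JVM, roughly the
// information `jstack <pid>` reports from the outside.
public class ThreadDump {
    public static String dump() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        StringBuilder sb = new StringBuilder();
        // false, false: skip locked-monitor/synchronizer details for brevity
        for (ThreadInfo info : bean.dumpAllThreads(false, false)) {
            sb.append(info.toString());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(dump());
    }
}
```

In practice, running jstack against the lingering process id (or sending it SIGQUIT) gives the same stack dump without modifying any code.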
Re: GSoC 2014 - Strongly Connected Components
This is a great idea. Unfortunately, I'm a little bandwidth limited, but I hope someone can help mentor you! On 3/14/14, 1:26 PM, Gianluca Righetto wrote: Hello everyone, I've been working with Giraph for some time now and I'd like to make some contributions back to the project through Google Summer of Code. I wrote a project proposal to implement an algorithm for finding Strongly Connected Components in a graph, based on recently published research papers. The main idea of the algorithm is to find clusters (or groups) in the graph and it's arguably more insightful than the currently available Connected Components algorithm. So, if there's any Apache member interested in mentoring this project, please, feel free to contact me. And any kind of feedback will be greatly appreciated. You can find the document in Google Drive here: http://goo.gl/1fqqui Thanks, Gianluca Righetto
Re: DataStreamer Exception - LeaseExpiredException
This looks more like the Zookeeper/YARN issues mentioned in the past. Unfortunately, I do not have a YARN instance to test this with. Does anyone else have any insights here? On 1/10/14 1:48 PM, Kristen Hardwick wrote: Hi all, I'm requesting help again! I'm trying to get this SimpleShortestPathsComputation example working, but I'm stuck again. Now the job begins to run and seems to work until the final step (it performs 3 supersteps), but the overall job is failing. In the master, among other things, I see: ... 14/01/10 15:04:17 INFO master.MasterThread: setup: Took 0.87 seconds. 14/01/10 15:04:17 INFO master.MasterThread: input superstep: Took 0.708 seconds. 14/01/10 15:04:17 INFO master.MasterThread: superstep 0: Took 0.158 seconds. 14/01/10 15:04:17 INFO master.MasterThread: superstep 1: Took 0.344 seconds. 14/01/10 15:04:17 INFO master.MasterThread: superstep 2: Took 0.064 seconds. 14/01/10 15:04:17 INFO master.MasterThread: shutdown: Took 0.162 seconds. 14/01/10 15:04:17 INFO master.MasterThread: total: Took 2.31 seconds. 14/01/10 15:04:17 INFO yarn.GiraphYarnTask: Master is ready to commit final job output data. 14/01/10 15:04:18 INFO yarn.GiraphYarnTask: Master has committed the final job output data. ... To me, that looks promising - like the job was successful. However, in the WORKER_ONLY containers, I see these things: ... 
14/01/10 15:04:17 INFO graph.GraphTaskManager: cleanup: Starting for WORKER_ONLY 14/01/10 15:04:17 WARN bsp.BspService: process: Unknown and unprocessed event (path=/_hadoopBsp/giraph_yarn_application_1389300168420_0024/_applicationAttemptsDir/0/_superstepDir/1/_addressesAndPartitions, type=NodeDeleted, state=SyncConnected) 14/01/10 15:04:17 INFO worker.BspServiceWorker: processEvent : partitionExchangeChildrenChanged (at least one worker is done sending partitions) 14/01/10 15:04:17 WARN bsp.BspService: process: Unknown and unprocessed event (path=/_hadoopBsp/giraph_yarn_application_1389300168420_0024/_applicationAttemptsDir/0/_superstepDir/1/_superstepFinished, type=NodeDeleted, state=SyncConnected) 14/01/10 15:04:17 INFO netty.NettyClient: stop: reached wait threshold, 1 connections closed, releasing NettyClient.bootstrap resources now. 14/01/10 15:04:17 INFO worker.BspServiceWorker: processEvent: Job state changed, checking to see if it needs to restart 14/01/10 15:04:17 INFO bsp.BspService: getJobState: Job state already exists (/_hadoopBsp/giraph_yarn_application_1389300168420_0024/_masterJobState) 14/01/10 15:04:17 INFO yarn.GiraphYarnTask: [STATUS: task-1] saveVertices: Starting to save 2 vertices using 1 threads 14/01/10 15:04:17 INFO worker.BspServiceWorker: saveVertices: Starting to save 2 vertices using 1 threads 14/01/10 15:04:17 INFO worker.BspServiceWorker: processEvent: Job state changed, checking to see if it needs to restart 14/01/10 15:04:17 INFO bsp.BspService: getJobState: Job state already exists (/_hadoopBsp/giraph_yarn_application_1389300168420_0024/_masterJobState) 14/01/10 15:04:17 INFO bsp.BspService: getJobState: Job state path is empty! 
- /_hadoopBsp/giraph_yarn_application_1389300168420_0024/_masterJobState 14/01/10 15:04:17 ERROR zookeeper.ClientCnxn: Error while calling watcher java.lang.NullPointerException at java.io.StringReader.init(StringReader.java:50) at org.json.JSONTokener.init(JSONTokener.java:66) at org.json.JSONObject.init(JSONObject.java:402) at org.apache.giraph.bsp.BspService.getJobState(BspService.java:716) at org.apache.giraph.worker.BspServiceWorker.processEvent(BspServiceWorker.java:1563) at org.apache.giraph.bsp.BspService.process(BspService.java:1095) at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495) 14/01/10 15:04:17 WARN bsp.BspService: process: Unknown and unprocessed event (path=/_hadoopBsp/giraph_yarn_application_1389300168420_0024/_vertexInputSplitsAllReady, type=NodeDeleted, state=SyncConnected) 14/01/10 15:04:17 WARN bsp.BspService: process: Unknown and unprocessed event (path=/_hadoopBsp/giraph_yarn_application_1389300168420_0024/_applicationAttemptsDir/0/_superstepDir/2/_addressesAndPartitions, type=NodeDeleted, state=SyncConnected) 14/01/10 15:04:17 INFO worker.BspServiceWorker: processEvent : partitionExchangeChildrenChanged (at least one worker is done sending partitions) 14/01/10 15:04:17 WARN bsp.BspService: process: Unknown and unprocessed event (path=/_hadoopBsp/giraph_yarn_application_1389300168420_0024/_applicationAttemptsDir/0/_superstepDir/2/_superstepFinished, type=NodeDeleted, state=SyncConnected) ... 14/01/10 15:04:17 WARN hdfs.DFSClient: DataStreamer Exception org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /user/spry/Shortest/_temporary/1/_temporary/attempt_1389300168420_0024_m_01_1/part-m-1: File
Re: Giraph 1.0.0 - Netty port allocation
The port logic is a bit complex, but it is all encapsulated in NettyServer.java (see below). If nothing else is running on those ports and you really only have one Giraph worker per port, you should be good to go. Can you look at the logs for the worker that is trying to start on a port other than base port + taskId?

int taskId = conf.getTaskPartition();
int numTasks = conf.getInt("mapred.map.tasks", 1);
// Number of workers + 1 for master
int numServers = conf.getInt(GiraphConstants.MAX_WORKERS, numTasks) + 1;
int portIncrementConstant = (int) Math.pow(10, Math.ceil(Math.log10(numServers)));
int bindPort = GiraphConstants.IPC_INITIAL_PORT.get(conf) + taskId;
int bindAttempts = 0;
final int maxIpcPortBindAttempts = MAX_IPC_PORT_BIND_ATTEMPTS.get(conf);
final boolean failFirstPortBindingAttempt = GiraphConstants.FAIL_FIRST_IPC_PORT_BIND_ATTEMPT.get(conf);
// Simple handling of port collisions on the same machine while
// preserving debuggability from the port number alone.
// Round up the max number of workers to the next power of 10 and use
// it as a constant to increase the port number with.
while (bindAttempts < maxIpcPortBindAttempts) {
  this.myAddress = new InetSocketAddress(localHostname, bindPort);
  if (failFirstPortBindingAttempt && bindAttempts == 0) {
    if (LOG.isInfoEnabled()) {
      LOG.info("start: Intentionally fail first " +
          "binding attempt as giraph.failFirstIpcPortBindAttempt " +
          "is true, port " + bindPort);
    }
    ++bindAttempts;
    bindPort += portIncrementConstant;
    continue;
  }
  try {
    Channel ch = bootstrap.bind(myAddress);
    accepted.add(ch);
    break;
  } catch (ChannelException e) {
    LOG.warn("start: Likely failed to bind on attempt " + bindAttempts + " to port " + bindPort, e);
    ++bindAttempts;
    bindPort += portIncrementConstant;
  }
}
if (bindAttempts == maxIpcPortBindAttempts || myAddress == null) {
  throw new IllegalStateException("start: Failed to start NettyServer with " + bindAttempts + " attempts");
}

On 11/22/13 9:15 AM, Larry Compton wrote: My teammates and I are running Giraph on a cluster where a firewall is configured on each compute node. We had 100 ports opened on the compute nodes, which we thought would be more than enough to accommodate a large number of workers. However, we're unable to go beyond about 90 workers with our Giraph jobs, due to Netty ports being allocated outside of the range (30000-30100). We're not sure why this is happening. We shouldn't be running more than one worker per compute node, so we were assuming that only port 30000 would be used, but we're routinely seeing Giraph try to use ports greater than 30100 when we request close to 100 workers. This leads us to believe that a simple one-up numbering scheme is being used that doesn't take the host into consideration, although this is only speculation. Is there a way around this problem? Our system admins understandably balked at opening 1000 ports. Larry
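The arithmetic behind the quoted NettyServer logic can be isolated into a small stand-alone sketch to show why ~100 workers produce ports past 30100 (class and method names here are invented for illustration; 30000 is the GiraphConstants default initial IPC port):

```java
// Sketch of the port arithmetic from the NettyServer code quoted above.
// With ~100 workers the increment constant rounds up to 1000, so a single
// failed bind jumps a worker's port far past the 30000-30100 range.
public class PortMath {
    static final int BASE_PORT = 30000;  // default giraph.ipcInitialPort

    // Round numServers up to the next power of 10, as NettyServer does.
    static int portIncrement(int numServers) {
        return (int) Math.pow(10, Math.ceil(Math.log10(numServers)));
    }

    static int bindPort(int taskId, int numServers, int failedAttempts) {
        return BASE_PORT + taskId + failedAttempts * portIncrement(numServers);
    }

    public static void main(String[] args) {
        int numServers = 100 + 1;  // 100 workers + 1 master
        System.out.println(portIncrement(numServers));    // 1000
        System.out.println(bindPort(99, numServers, 0));  // 30099: in range
        System.out.println(bindPort(99, numServers, 1));  // 31099: outside 30000-30100
    }
}
```

So even with one worker per host, the task IDs alone spread ports across base port + 0 .. base port + numTasks (which already touches 30100 with ~100 tasks), and any bind retry adds another 1000 on top.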
Re: workload used to measure Giraph performance number
Hi Wei, For best performance, please be sure to tune the GC settings, use Java 7, tune the number of cores used for computation, communication, etc., and use a combiner. We also have some numbers in our recent Facebook blog post. https://www.facebook.com/notes/facebook-engineering/scaling-apache-giraph-to-a-trillion-edges/10151617006153920 Avery On 10/8/13 7:43 PM, Wei Zhang wrote: Hi Sebastian, Thanks a lot for the help! Sorry for the late response! At this point, I would only need a random graph that complies with JsonLongDoubleFloatDoubleVertexInputFormat of Giraph to measure the PageRank example (of Giraph) performance. I am wondering how to convert the data from Koblenz to such a graph? Is there any pointer for doing this? (This is the same kind of question that I raised to Alok on SNAP.) Thanks! Wei P.S.: I forgot to mention in all my previous emails that I just got started with distributed graph engines, so please forgive me if my questions are too naive. From: Sebastian Schelter s...@apache.org To: user@giraph.apache.org Date: 10/02/2013 12:41 PM Subject: Re: workload used to measure Giraph performance number Another option is to use the Koblenz network collection [1], which offers even more (and larger) datasets than SNAP. Best, Sebastian [1] http://konect.uni-koblenz.de/ On 02.10.2013 17:41, Alok Kumbhare wrote: There are a number of real (medium sized) graphs at http://snap.stanford.edu/data/index.html which we use for similar benchmarks. It has a good mix of graph types, sparse/dense, ground truth graphs (e.g. social networks that follow power law distribution etc.). So far we have observed that the type of graph has a high impact on the performance of the algorithms that Claudio mentioned.
On Wed, Oct 2, 2013 at 8:22 AM, Claudio Martella claudio.marte...@gmail.com wrote: Hi Wei, it depends on what you mean by workload for a batch processing system. I believe we can split the problem in two: generating a realistic graph, and using "representative" algorithms. To generate graphs we have two options in Giraph: 1) random graph: you specify the number of vertices and the number of edges for each vertex, and the edges will connect two random vertices. This creates a graph with (i) low clustering coefficient, (ii) low average path length, (iii) a uniform degree distribution. 2) Watts-Strogatz: you specify the number of vertices, the number of edges, and a rewire probability beta. Giraph will generate a ring lattice (each vertex is connected to the k preceding vertices and k following vertices) and rewire some of the edges randomly. This will create a graph with (i) high clustering coefficient, (ii) low average path length, (iii) Poisson-like degree distribution (depends on beta). This graph will resemble a small-world graph such as a social network, except for the degree distribution, which will not be a power law. For representative algorithms you can choose: 1) PageRank: it's a ranking algorithm where all the vertices are active and send messages along the edges at each superstep (hence you'll have O(V) active vertices and O(E) messages). 2) Shortest Paths: starting from a random vertex you'll visit all the vertices in the graph (some multiple times). This will have an aggregate O(V) active vertices and O(E) messages, but this is only a lower bound. In general you'll have different areas of the graph explored at each superstep, and hence a potentially varying workload across different supersteps. 3) Connected Components: this will behave opposite to (2), as it will have many active vertices at the
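As an illustration of the ring-lattice-plus-rewiring idea Claudio describes, here is a small stand-alone sketch (not Giraph's actual Watts-Strogatz generator, whose API may differ): each vertex links to its k following neighbours on the ring, and each such edge is rewired to a random target with probability beta:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Random;
import java.util.Set;

// Illustrative Watts-Strogatz-style generator: ring lattice + rewiring.
public class WattsStrogatz {
    static Map<Integer, Set<Integer>> generate(int n, int k, double beta, long seed) {
        Random rnd = new Random(seed);
        Map<Integer, Set<Integer>> adj = new HashMap<>();
        for (int v = 0; v < n; v++) {
            adj.put(v, new HashSet<>());
        }
        for (int v = 0; v < n; v++) {
            for (int j = 1; j <= k; j++) {
                int target = (v + j) % n;  // ring-lattice neighbour
                if (rnd.nextDouble() < beta) {
                    // rewire to a uniformly random vertex (no self-loops)
                    do {
                        target = rnd.nextInt(n);
                    } while (target == v);
                }
                adj.get(v).add(target);
            }
        }
        return adj;
    }

    public static void main(String[] args) {
        Map<Integer, Set<Integer>> g = generate(1000, 4, 0.1, 42L);
        System.out.println(g.size());  // 1000 vertices
    }
}
```

With beta = 0 this is a pure ring lattice (high clustering); as beta grows, the random shortcuts shrink the average path length, which is the small-world effect Claudio refers to.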
Re: zookeeper connection issue while running for second time
We did have this error a few times. This can happen due to GC pauses, so I would check the worker for long GC issues. Also, you can increase the ZooKeeper timeouts, see:

/** ZooKeeper session millisecond timeout */
IntConfOption ZOOKEEPER_SESSION_TIMEOUT =
    new IntConfOption("giraph.zkSessionMsecTimeout", MINUTES.toMillis(1),
        "ZooKeeper session millisecond timeout");

Currently, the default is one minute, but in production we set that number much, much higher (even greater than a day sometimes) to avoid the disconnection. Hope that helps, Avery On 10/1/13 6:27 PM, Jyotirmoy Sundi wrote: Hi, I am able to run Apache Giraph successfully with around 500M pairs to find connected components. It works great, but not always; the issue seems to be a ZooKeeper timeout. Some of the clients (around 5-10 out of 100) produce this error and the master fails due to this. Do you have any suggestions for this error? Any suggestions will be appreciated. 2013-10-02 01:20:43,651 WARN org.apache.giraph.bsp.BspService: process: Disconnected from ZooKeeper (will automatically try to recover) WatchedEvent state:Disconnected type:None path:null 2013-10-02 01:20:44,035 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server had22.rsk.admobius.com/10.240.51.32:2181.
Will not attempt to authenticate using SASL (Unable to locate a login configuration) 2013-10-02 01:20:44,035 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to had22.rsk.admobius.com/10.240.51.32:2181, initiating session 2013-10-02 01:20:44,037 INFO org.apache.zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session 0x441604c97412331 has expired, closing socket connection 2013-10-02 01:20:44,037 WARN org.apache.giraph.bsp.BspService: process: Got unknown null path event WatchedEvent state:Expired type:None path:null 2013-10-02 01:20:44,038 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down 2013-10-02 01:21:20,046 INFO org.apache.giraph.worker.VertexInputSplitsCallable: readVertexInputSplit: Loaded 25 vertices at 1827.2925619484213 vertices/sec 1728790 edges at 12636.730317550928 edges/sec Memory (free/total/max) = 1745.60M / 2262.19M / 2730.69M 2013-10-02 01:21:24,788 INFO org.apache.giraph.worker.InputSplitsCallable: loadFromInputSplit: Finished loading /_hadoopBsp/job_201309260044_1132/_vertexInputSplitDir/601 (v=261131, e=1808572) 2013-10-02 01:21:24,789 ERROR org.apache.giraph.utils.LogStacktraceCallable: Execution of callable failed java.lang.IllegalStateException: markInputSplitPathFinished: KeeperException on /_hadoopBsp/job_201309260044_1132/_vertexInputSplitDir/601/_vertexInputSplitFinished at org.apache.giraph.worker.InputSplitsHandler.markInputSplitPathFinished(InputSplitsHandler.java:168) at org.apache.giraph.worker.InputSplitsCallable.loadInputSplit(InputSplitsCallable.java:226) at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:161) at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:58) at org.apache.giraph.utils.LogStacktraceCallable.call(LogStacktraceCallable.java:51) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) at java.lang.Thread.run(Thread.java:662) Caused by: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /_hadoopBsp/job_201309260044_1132/_vertexInputSplitDir/601/_vertexInputSplitFinished at org.apache.zookeeper.KeeperException.create(KeeperException.java:127) at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783) at org.apache.giraph.zk.ZooKeeperExt.createExt(ZooKeeperExt.java:152) at org.apache.giraph.worker.InputSplitsHandler.markInputSplitPathFinished(InputSplitsHandler.java:159) ... 9 more -- Best Regards, Jyotirmoy Sundi Admobius San Francisco, CA 94158 On Thu, Sep 26, 2013 at 6:08 PM, Jyotirmoy Sundi sundi...@gmail.com wrote: Hi, I got the connected components working for 1B nodes, but when I run the job again, it fails with the below error. Apart from this, the data in the ZooKeeper data directory is not cleared. For successful jobs, the data in ZooKeeper from Giraph is cleared. The following errors seem to occur because the node tries to connect to ZooKeeper with a session id which has been cleared, as seen in: Client session timed out, have not heard from server in 68845ms for sessionid
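For reference, the timeout values discussed in this thread can be computed with TimeUnit, as in this small sketch (class and method names invented for illustration):

```java
import java.util.concurrent.TimeUnit;

// Sketch: the default giraph.zkSessionMsecTimeout value quoted above
// (one minute) versus a "greater than a day" production-style setting.
public class ZkTimeout {
    static long defaultTimeoutMs() {
        return TimeUnit.MINUTES.toMillis(1);  // the IntConfOption default
    }

    static long productionTimeoutMs() {
        return TimeUnit.DAYS.toMillis(1);     // "even greater than a day"
    }

    public static void main(String[] args) {
        System.out.println(defaultTimeoutMs());
        System.out.println(productionTimeoutMs());
    }
}
```

If jobs are submitted through GiraphRunner, such a value would typically be passed as a custom argument (e.g. -ca giraph.zkSessionMsecTimeout=86400000); check the exact flag name against your Giraph version.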
Re: Exception Already has missing vertex on this worker
I think you may have added the same vertex 2x? That being said, I don't see why the code is this way. It should be fine. We should file a JIRA. On 9/26/13 11:02 AM, Yingyi Bu wrote: Thanks, Lukas! I think the reason for this exception is that I ran the job over part of the graph, where some target ids do not exist. Yingyi On Thu, Sep 26, 2013 at 1:13 AM, Lukas Nalezenec lukas.naleze...@firma.seznam.cz wrote: Hi, Do you use partition balancing? Lukas On 09/26/13 05:16, Yingyi Bu wrote: Hi, I ran a Giraph-1.0.0 PageRank job over a 60-machine cluster with 28GB input data, but I got this exception: java.lang.IllegalStateException: run: Caught an unrecoverable exception resolveMutations: Already has missing vertex on this worker for 20464109 at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:102) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369) at org.apache.hadoop.mapred.Child$4.run(Child.java:259) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059) at org.apache.hadoop.mapred.Child.main(Child.java:253) Caused by: java.lang.IllegalStateException: resolveMutations: Already has missing vertex on this worker for 20464109 at org.apache.giraph.comm.netty.NettyWorkerServer.resolveMutations(NettyWorkerServer.java:184) at org.apache.giraph.comm.netty.NettyWorkerServer.prepareSuperstep(NettyWorkerServer.java:152) at org.apache.giraph.worker.BspServiceWorker.startSuperstep(BspServiceWorker.java:677) at org.apache.giraph.graph.GraphTaskManager.execute(GraphTaskManager.java:249) at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:92) ... 7 more Does anyone know the possible cause of this exception? Thanks! Yingyi
Re: Exception Already has missing vertex on this worker
Hopefully you are using combiners and also re-using objects. This can keep memory usage much lower. Also, implementing your own OutEdges can make it much more efficient. How much memory do you have? Avery On 9/26/13 12:51 PM, Yingyi Bu wrote: "I think you may have added the same vertex 2x?" I ran the job over roughly half of the graph and saw this. However, the input is not a connected component, so there might be target vertex ids which do not exist. When I ran the job over the entire graph, I did not see this, but the job fails by exceeding the GC limit (trying out-of-core now). Yingyi On Thu, Sep 26, 2013 at 12:05 PM, Avery Ching ach...@apache.org wrote: I think you may have added the same vertex 2x? That being said, I don't see why the code is this way. It should be fine. We should file a JIRA. On 9/26/13 11:02 AM, Yingyi Bu wrote: Thanks, Lukas! I think the reason for this exception is that I ran the job over part of the graph, where some target ids do not exist. Yingyi On Thu, Sep 26, 2013 at 1:13 AM, Lukas Nalezenec lukas.naleze...@firma.seznam.cz wrote: Hi, Do you use partition balancing? Lukas On 09/26/13 05:16, Yingyi Bu wrote: Hi, I got this exception when I ran a Giraph-1.0.0 PageRank job over a 60-machine cluster with 28GB input data.
But I got this exception: java.lang.IllegalStateException: run: Caught an unrecoverable exception resolveMutations: Already has missing vertex on this worker for 20464109 at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:102) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369) at org.apache.hadoop.mapred.Child$4.run(Child.java:259) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059) at org.apache.hadoop.mapred.Child.main(Child.java:253) Caused by: java.lang.IllegalStateException: resolveMutations: Already has missing vertex on this worker for 20464109 at org.apache.giraph.comm.netty.NettyWorkerServer.resolveMutations(NettyWorkerServer.java:184) at org.apache.giraph.comm.netty.NettyWorkerServer.prepareSuperstep(NettyWorkerServer.java:152) at org.apache.giraph.worker.BspServiceWorker.startSuperstep(BspServiceWorker.java:677) at org.apache.giraph.graph.GraphTaskManager.execute(GraphTaskManager.java:249) at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:92) ... 7 more Does anyone know what is the possible cause of this exception? Thanks! Yingyi
Re: Counter limit
If you are running out of counters, you can turn off the superstep counters:

/** Use superstep counters? (boolean) */
BooleanConfOption USE_SUPERSTEP_COUNTERS =
    new BooleanConfOption("giraph.useSuperstepCounters", true,
        "Use superstep counters? (boolean)");

On 9/9/13 6:43 AM, Claudio Martella wrote: No, I used a different counters limit on that Hadoop version. Setting mapreduce.job.counters.limit to a higher number and restarting the JT and TT worked for me. Maybe 64000 might be too high? Try setting it to 512. Does not look like the case, but who knows. On Mon, Sep 9, 2013 at 2:57 PM, Christian Krause m...@ckrause.org wrote: Sorry, it still doesn't work (I ran into a different problem before I reached the limit). I am using Hadoop 0.20.203.0. Is the limit of 120 counters maybe hardcoded? Cheers, Christian On 09.09.2013 08:29, Christian Krause wrote: I changed the property name to mapred.job.counters.limit and restarted it again. Now it works. Thanks, Christian 2013/9/7 Claudio Martella claudio.marte...@gmail.com: did you restart TT and JT? On Sat, Sep 7, 2013 at 7:09 AM, Christian Krause m...@ckrause.org wrote: Hi, I've increased the counter limit in mapred-site.xml, but I still get the error: Exceeded counter limits - Counters=121 Limit=120. Groups=6 Limit=50. This is my config: cat conf/mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
...
  <property>
    <name>mapreduce.job.counters.limit</name>
    <value>64000</value>
  </property>
  <property>
    <name>mapred.task.timeout</name>
    <value>240</value>
  </property>
...
</configuration>

Any ideas? Cheers, Christian -- Claudio Martella claudio.marte...@gmail.com
Re: Out of memory with giraph-release-1.0.0-RC3, used to work on old Giraph
We have caches for every compute thread. Then we have W worker caches per compute thread, so the total amount of memory consumed by message caches per worker = compute threads * workers * size of cache. The best thing is to tune down the size of the cache from MAX_MSG_REQUEST_SIZE to a size that works for your configuration. Hope that helps, Avery On 9/4/13 3:33 AM, Lukas Nalezenec wrote: Thanks, I was not sure if it really works as I described. "Facebook can't be using it like this if, as described, they have billions of vertices and a trillion edges." Yes, it's strange. I guess configuration does not help so much on a large cluster. What might help are the properties of the input data. "So do you, or Avery, have any idea how you might initialize this in a more reasonable way, and how???" A fast workaround is to set the number of partitions from W^2 to W or 2*W. It will help if you don't have a very large number of workers. I would not change MAX_*_REQUEST_SIZE much, since it may hurt performance. You can do some preprocessing before loading data into Giraph. How to change Giraph: The caches could be flushed if the total sum of vertices/edges in all caches exceeds some number. Ideally, it should prevent not only OutOfMemory errors but also raising the high-water mark. Not sure if it (preventing raising the HWM) is easy to do. I am going to use almost-prebuilt partitions. For my use case it would be ideal to detect if some cache is abandoned and would not be used anymore. It would cut memory usage in caches from ~O(n^3) to ~O(n). It could be done by counting the number of cache flushes or cache insertions, and if some cache was not touched for a long time it would be flushed. There could be a separate configuration MAX_*_REQUEST_SIZE for per-partition caches during data loading. I guess there should be a simple but efficient way to trace the memory high-water mark.
It could look like: Loading data: Memory high-water mark: start: 100 GB, end: 300 GB. Iteration 1 Computation: Memory high-water mark: start: 300 GB, end: 300 GB. Iteration 1 XYZ. Iteration 2 Computation: Memory high-water mark: start: 300 GB, end: 300 GB. . . . Lukas On 09/04/13 01:12, Jeff Peters wrote: Thank you Lukas!!! That's EXACTLY the kind of model I was building in my head over the weekend about why this might be happening, and why increasing the number of AWS instances (and workers) does not solve the problem without increasing each worker's VM. Surely Facebook can't be using it like this if, as described, they have billions of vertices and a trillion edges. So do you, or Avery, have any idea how you might initialize this in a more reasonable way, and how??? On Mon, Sep 2, 2013 at 6:08 AM, Lukas Nalezenec lukas.naleze...@firma.seznam.cz wrote: Hi, I wasted a few days on a similar problem. I guess the problem was that during loading - if you have W workers and W^2 partitions, there are W^2 partition caches in each worker. Each cache can hold 10,000 vertices by default. I had 26,000,000 vertices and 60 workers - 3600 partitions. It means that there can be up to 36,000,000 vertices in caches in each worker if the input files are random. Workers were assigned 450,000 vertices but failed when they had 900,000 vertices in memory. Btw: Why is the default number of partitions W^2? (I could be wrong.) Lukas On 08/31/13 01:54, Avery Ching wrote: Ah, the new caches. =) These make things a lot faster (bulk data sending), but do take up some additional memory. If you look at GiraphConstants, you can find ways to change the cache sizes (this will reduce that memory usage). For example, MAX_EDGE_REQUEST_SIZE will affect the size of the edge cache. MAX_MSG_REQUEST_SIZE will affect the size of the message cache. The caches are per worker, so 100 workers would require 50 MB per worker by default. Feel free to trim it if you like.
The byte arrays for the edges are the most efficient storage possible (although not as performant as the native edge stores). Hope that helps, Avery On 8/29/13 4:53 PM, Jeff Peters wrote: Avery, it would seem that optimizations to Giraph have, unfortunately, turned the majority of the heap into dark matter. The two snapshots are at unknown points in a superstep, but I waited for several supersteps so that the activity had more or less stabilized. About the only thing comparable between the two snapshots are the vertices: 192561 X RecsVertex in the new version and 191995 X Coloring in the old system. But with the new Giraph, 672710176 out of 824886184 bytes are stored as primitive byte arrays. That's probably indicative of some very fine performance optimization work, but it makes it extremely difficult to know what's really out there, and why. I did notice that a number of caches have appeared
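Lukas's numbers from this thread can be reproduced with a back-of-the-envelope sketch (class and method names invented for illustration; the 10,000 vertices-per-cache default is the figure he cites):

```java
// Sketch of the input-loading cache arithmetic from this thread: with W
// workers the default partition count is W^2, and each per-partition
// cache can hold up to 10,000 vertices, so a single worker can buffer
// far more vertices than it will eventually own.
public class CacheMath {
    static long maxCachedVertices(int partitions, int verticesPerCache) {
        return (long) partitions * verticesPerCache;
    }

    public static void main(String[] args) {
        int workers = 60;
        int partitions = workers * workers;  // 3600, the W^2 default
        // 36,000,000 vertices potentially buffered per worker, versus the
        // ~450,000 vertices each worker was actually assigned.
        System.out.println(maxCachedVertices(partitions, 10_000));
    }
}
```

This is why dropping the partition count from W^2 toward W (or shrinking the per-cache size) reduces the loading-time memory spike roughly linearly.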
Re: Exception with Large Graphs
That error is from the master dying (likely due to the results of another worker dying). Can you do a rough calculation of the size of the data that you expect to be loaded and check whether the memory is enough? On 8/30/13 11:19 AM, Yasser Altowim wrote: Guys, can someone please help me with this issue? Thanks. Best, Yasser From: Yasser Altowim Sent: Thursday, August 29, 2013 11:16 AM To: user@giraph.apache.org Subject: Exception with Large Graphs Hi, I am implementing an algorithm using Giraph, and I was able to run my algorithm on relatively small datasets (64,000,000 vertices and 128,000,000 edges). However, when I increase the size of the dataset to 128,000,000 vertices and 256,000,000 edges, the job takes a very long time to load the vertices, and then it gives me the following exception. I have tried to increase the heap size and the task timeout value in the mapred-site.xml configuration file, and even varied the number of workers from 1 to 10, but I am still getting the same exceptions. I have a cluster of 10 nodes, and each node has 4GB of RAM. Thanks in advance.
2013-08-29 10:22:53,150 INFO org.apache.giraph.utils.ProgressableUtils: waitFor: Future result not ready yet java.util.concurrent.FutureTask@1a129460
2013-08-29 10:22:53,151 INFO org.apache.giraph.utils.ProgressableUtils: waitFor: Waiting for org.apache.giraph.utils.ProgressableUtils$FutureWaitable@30d320e4
2013-08-29 10:23:07,938 INFO org.apache.giraph.worker.VertexInputSplitsCallable: readVertexInputSplit: Loaded 7769685 vertices at 14250.953615591572 vertices/sec 15539370 edges at 28500.77593053654 edges/sec Memory (free/total/max) = 680.21M / 3207.44M / 3555.56M
2013-08-29 10:23:14,538 INFO org.apache.giraph.worker.VertexInputSplitsCallable: readVertexInputSplit: Loaded 8019685 vertices at 14533.557468366102 vertices/sec 16039370 edges at 29065.97491865343 edges/sec Memory (free/total/max) = 906.80M / 3242.75M / 3555.56M
2013-08-29 10:23:21,888 INFO org.apache.giraph.worker.InputSplitsCallable: loadFromInputSplit: Finished loading /_hadoopBsp/job_201308290837_0003/_vertexInputSplitDir/9 (v=1212852, e=2425704)
2013-08-29 10:23:37,911 INFO org.apache.giraph.worker.InputSplitsHandler: reserveInputSplit: Reserved input split path /_hadoopBsp/job_201308290837_0003/_vertexInputSplitDir/19, overall roughly 7.518797% input splits reserved
2013-08-29 10:23:37,923 INFO org.apache.giraph.worker.InputSplitsCallable: getInputSplit: Reserved /_hadoopBsp/job_201308290837_0003/_vertexInputSplitDir/19 from ZooKeeper and got input split 'org.apache.giraph.io.formats.multi.InputSplitWithInputFormatIndex@24004559'
2013-08-29 10:23:44,313 INFO org.apache.giraph.worker.VertexInputSplitsCallable: readVertexInputSplit: Loaded 8482537 vertices at 14585.340134636266 vertices/sec 16965074 edges at 29169.59449002283 edges/sec Memory (free/total/max) = 538.93M / 3186.13M / 3555.56M
2013-08-29 10:23:49,963 INFO org.apache.giraph.worker.VertexInputSplitsCallable: readVertexInputSplit: Loaded 8732537 vertices at 14870.726503632277 vertices/sec 17465074 edges at 29740.356341344923 edges/sec Memory (free/total/max) = 489.84M / 3222.56M / 3555.56M
2013-08-29 10:34:28,371 INFO org.apache.giraph.utils.ProgressableUtils: waitFor: Future result not ready yet java.util.concurrent.FutureTask@1a129460
2013-08-29 10:34:34,847 INFO org.apache.giraph.utils.ProgressableUtils: waitFor: Waiting for org.apache.giraph.utils.ProgressableUtils$FutureWaitable@30d320e4
2013-08-29 10:34:34,850 INFO org.apache.giraph.comm.netty.handler.RequestDecoder: decode: Server window metrics MBytes/sec sent = 0, MBytes/sec received = 0.0161, MBytesSent = 0.0002, MBytesReceived = 12.3175, ave sent req MBytes = 0, ave received req MBytes = 0.0587, secs waited = 765.881
2013-08-29 10:34:35,698 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 649805ms for sessionid 0x140cb1140540006, closing socket connection and attempting reconnect
2013-08-29 10:34:42,471 WARN org.apache.giraph.bsp.BspService: process: Disconnected from ZooKeeper (will automatically try to recover) WatchedEvent state:Disconnected type:None path:null
2013-08-29 10:34:42,472 WARN org.apache.giraph.worker.InputSplitsHandler: process: Problem with zookeeper, got event with path null, state Disconnected, event type None
2013-08-29 10:34:43,819 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server slave5.ericsson-magic.net/10.126.72.165:22181
2013-08-29 10:34:44,077 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to slave5.ericsson-magic.net/10.126.72.165:22181, initiating session
2013-08-29
Re: Out of memory with giraph-release-1.0.0-RC3, used to work on old Giraph
Ah, the new caches. =) These make things a lot faster (bulk data sending), but do take up some additional memory. If you look at GiraphConstants, you can find ways to change the cache sizes (this will reduce that memory usage). For example, MAX_EDGE_REQUEST_SIZE will affect the size of the edge cache and MAX_MSG_REQUEST_SIZE will affect the size of the message cache. The caches are kept per destination worker, so with 100 workers each worker would hold roughly 50 MB of cache by default. Feel free to trim them if you like. The byte arrays for the edges are the most efficient storage possible (although not as performant as the native edge stores). Hope that helps, Avery On 8/29/13 4:53 PM, Jeff Peters wrote: Avery, it would seem that optimizations to Giraph have, unfortunately, turned the majority of the heap into dark matter. The two snapshots are at unknown points in a superstep but I waited for several supersteps so that the activity had more or less stabilized. About the only thing comparable between the two snapshots are the vertices, 192561 X RecsVertex in the new version and 191995 X Coloring in the old system. But with the new Giraph 672710176 out of 824886184 bytes are stored as primitive byte arrays. That's probably indicative of some very fine performance optimization work, but it makes it extremely difficult to know what's really out there, and why. I did notice that a number of caches have appeared that did not exist before, namely SendEdgeCache, SendPartitionCache, SendMessageCache and SendMutationsCache. Could any of those account for a larger per-worker footprint in a modern Giraph? Should I simply assume that I need to force AWS to configure its EMR Hadoop so that each instance has fewer map tasks but with a somewhat larger VM max, say 3GB instead of 2GB? On Wed, Aug 28, 2013 at 4:57 PM, Avery Ching ach...@apache.org wrote: Try dumping a histogram of memory usage from a running JVM and see where the memory is going.
I can't think of anything in particular that changed... On 8/28/13 4:39 PM, Jeff Peters wrote: I am tasked with updating our ancient (circa 7/10/2012) Giraph to giraph-release-1.0.0-RC3. Most jobs run fine but our largest job now runs out of memory using the same AWS elastic-mapreduce configuration we have always used. I have never tried to configure either Giraph or the AWS Hadoop. We build for Hadoop 1.0.2 because that's closest to the 1.0.3 AWS provides us. The 8 X m2.4xlarge cluster we use seems to provide 8*14=112 map tasks fitted out with 2GB heap each. Our code is completely unchanged except as required to adapt to the new Giraph APIs. Our vertex, edge, and message data are completely unchanged. On smaller jobs, that work, the aggregate heap usage high-water mark seems about the same as before, but the committed heap seems to run higher. I can't even make it work on a cluster of 12. In that case I get one map task that seems to end up with nearly twice as many messages as most of the others so it runs out of memory anyway. It only takes one to fail the job. Am I missing something here? Should I be configuring my new Giraph in some way I didn't used to need to with the old one?
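The two options named in the reply can be set programmatically before submitting the job. A minimal sketch: the option names come from GiraphConstants as cited above, but the setter style and the 256 KB value are illustrative and may differ between Giraph releases.

```java
// Shrink the per-worker send caches to trade some bulk-send throughput
// for memory. MAX_EDGE_REQUEST_SIZE / MAX_MSG_REQUEST_SIZE are the
// GiraphConstants options named in the reply; values are illustrative.
GiraphConfiguration conf = new GiraphConfiguration();
GiraphConstants.MAX_EDGE_REQUEST_SIZE.set(conf, 256 * 1024);
GiraphConstants.MAX_MSG_REQUEST_SIZE.set(conf, 256 * 1024);
```

Halving the request size roughly halves the per-destination cache, at the cost of more (smaller) network requests during the input and compute phases.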
Re: Workers input splits and MasterCompute communication
That makes sense, since the Context doesn't have a real InputSplit (it's a Giraph one - see BspInputSplit). What information are you trying to get out of the input splits? Giraph workers can process an arbitrary number of input splits (0 or more), so I don't think this will be useful. You can use Configuration if you need to set some information at runtime. Avery On 8/19/13 9:14 AM, Marco Aurelio Barbosa Fagnani Lotz wrote: Hello all :) I am having problems calling getContext().getInputSplit(); inside the compute() method in the workers. It always returns as if it didn't get any split at all, since inputSplit.getLocations() returns without the hosts that should have that split as local and inputSplit.getLength() returns 0. Should there be any initialization of the workers' context so that I can get this information? Is there any way to access the jobContext from the workers or the Master? Best Regards, Marco Lotz *From:* Marco Aurelio Barbosa Fagnani Lotz m.a.b.l...@stu12.qmul.ac.uk *Sent:* 17 August 2013 20:20 *To:* user@giraph.apache.org *Subject:* Workers input splits and MasterCompute communication Hello all :) In what class do the workers actually get the input file splits from the file system? Is it possible for a MasterCompute class object to have access/communication with the workers in that job? I thought about using aggregators, but then I assumed that aggregators actually work with the vertices' compute() (and related methods) and not with the worker itself. By workers I don't mean the vertices in each worker, but the object that runs the compute for all the vertices in that worker. Best Regards, Marco Lotz
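The Configuration route suggested in the reply could look like the fragment below; the key name and values are made up for illustration, not a Giraph convention.

```java
// At submit time, stash whatever per-job information the workers need
// in the Configuration instead of inspecting input splits:
GiraphConfiguration conf = new GiraphConfiguration();
conf.set("myapp.run.mode", "incremental");   // illustrative key/value

// Worker side, e.g. inside a Computation subclass, the same value is
// visible everywhere via the job's Configuration:
//   String mode = getConf().get("myapp.run.mode", "full");
```

Anything set this way is serialized with the job, so every worker (and the master) sees an identical copy.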
Re: New vertex allocation and messages
Yes, you can control this behavior with the VertexResolver. It handles all mutations to the graph and resolves them in a user-defined way. Avery On 8/19/13 9:21 AM, Marco Aurelio Barbosa Fagnani Lotz wrote: Hello all :) I am programming an application that has to create and destroy a few vertices. I was wondering if there is any protection in Giraph to prevent a vertex from sending a message to another vertex that does not exist (i.e. providing a vertex id that is not associated with a vertex yet). Is there a way to test if the destination vertex exists before sending the message to it? Also, when a vertex is created, is there any sort of load balancing, or is it always kept in the worker that created it? Best Regards, Marco Lotz
Re: MultiVertexInputFormat
This is doable in Giraph; you can use as many vertex or edge input formats as you like (via GIRAPH-639). You just need to choose MultiVertexInputFormat and/or MultiEdgeInputFormat. See VertexInputFormatDescription for vertex input formats:

/**
 * VertexInputFormats description - JSON array containing a JSON array for
 * each vertex input. Vertex input JSON arrays contain one or two elements -
 * first one is the name of vertex input class, and second one is JSON object
 * with all specific parameters for this vertex input. For example:
 * [["VIF1",{"p":"v1"}],["VIF2",{"p":"v2","q":"v"}]]
 */
public static final StrConfOption VERTEX_INPUT_FORMAT_DESCRIPTIONS =
    new StrConfOption("giraph.multiVertexInput.descriptions", null,
        "VertexInputFormats description - JSON array containing a JSON "
        + "array for each vertex input. Vertex input JSON arrays contain "
        + "one or two elements - first one is the name of vertex input "
        + "class, and second one is JSON object with all specific parameters "
        + "for this vertex input. For example: [[\"VIF1\",{\"p\":\"v1\"}], "
        + "[\"VIF2\",{\"p\":\"v2\",\"q\":\"v\"}]]");

See EdgeInputFormatDescription for edge input formats:

/**
 * EdgeInputFormats description - JSON array containing a JSON array for
 * each edge input. Edge input JSON arrays contain one or two elements -
 * first one is the name of edge input class, and second one is JSON object
 * with all specific parameters for this edge input. For example:
 * [["EIF1",{"p":"v1"}],["EIF2",{"p":"v2","q":"v"}]]
 */
public static final StrConfOption EDGE_INPUT_FORMAT_DESCRIPTIONS =
    new StrConfOption("giraph.multiEdgeInput.descriptions", null,
        "EdgeInputFormats description - JSON array containing a JSON array "
        + "for each edge input. Edge input JSON arrays contain one or two "
        + "elements - first one is the name of edge input class, and second "
        + "one is JSON object with all specific parameters for this edge "
        + "input. For example: [[\"EIF1\",{\"p\":\"v1\"}], "
        + "[\"EIF2\",{\"p\":\"v2\",\"q\":\"v\"}]]");

Hope that helps, Avery On 8/16/13 8:45 AM, Yasser Altowim wrote: Guys, any help with this will be appreciated. Thanks. *From:* Yasser Altowim [mailto:yasser.alto...@ericsson.com] *Sent:* Thursday, August 15, 2013 2:07 PM *To:* user@giraph.apache.org *Subject:* MultiVertexInputFormat Hi, I am implementing an algorithm using Giraph. My algorithm needs to read input data from two files, each with its own format. My questions are: 1. How can I use the MultiVertexInputFormat class? Is there any example that shows how this class can be used? 2. How can I specify this class when running my job using the GiraphRunner or a driver class? Thanks in advance. *Best,* *Yasser*
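Wiring the two inputs up from a driver could then look like the sketch below. MultiVertexInputFormat and the giraph.multiVertexInput.descriptions key come from the reply above; the two format class names are placeholders for your own input format classes.

```java
// Driver-side configuration for two vertex input formats (sketch).
GiraphConfiguration conf = new GiraphConfiguration();
conf.setVertexInputFormatClass(MultiVertexInputFormat.class);
// JSON array: one entry per input format, each optionally followed by a
// JSON object of parameters for that format.
conf.set("giraph.multiVertexInput.descriptions",
    "[[\"org.example.FirstFileVertexInputFormat\"],"
    + "[\"org.example.SecondFileVertexInputFormat\"]]");
```

Each listed format then reads its own files in its own layout, and Giraph merges the resulting vertices into one graph.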
Using Giraph at Facebook
Hi Giraphers, We recently released an article on how we use Giraph at the scale of a trillion edges at Facebook. If you're interested, please take a look! https://www.facebook.com/notes/facebook-engineering/scaling-apache-giraph-to-a-trillion-edges/10151617006153920 Avery
Re: Giraph vs good-old PVM/MPI ?
The Giraph/Pregel model is based on bulk synchronous parallel (BSP) computing, where the programmer is abstracted from the details of how the parallelization occurs (the infrastructure does this for you). Additionally, the APIs are built for graph processing. Since the computing model is well defined (BSP), the infrastructure can checkpoint the state of the application at the appropriate time and also handle failures without user interaction. MPI is a much lower-level, generic API, where messages are sent to processes. Users must pack/unpack their own messages and deliver messages to the appropriate data structures. Users must partition their own data. As of MPI 2, a failed process leaves the application in an undefined state (usually dead). Hope that helps, Avery On 8/6/13 10:19 AM, Yang wrote: it seems that the paradigm offered by Giraph/Pregel is very similar to the programming paradigm of PVM, and to a lesser degree, MPI. Using PVM, we often engage in such iterative cycles where all the nodes sync on a barrier and then enter the next cycle. So what are the extra features offered by Giraph/Pregel? I can see persistence/restarting of tasks, and maybe abstraction of the user-code-specific part into the API so that users are not concerned with the actual message passing (message passing is done by the framework). Thanks Yang
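The BSP loop the reply describes can be sketched in plain Java: each superstep, every vertex processes its incoming messages, updates its value, and sends messages along out-edges, with a barrier between supersteps. This toy max-value propagation uses ordinary collections standing in for Giraph's runtime; all names are illustrative, not Giraph APIs.

```java
import java.util.*;

// Toy BSP loop: vertices repeatedly exchange their current maximum until
// no vertex changes, which plays the role of "all vertices halted".
public class BspMaxDemo {
    // edges: vertex id -> out-neighbors; values: vertex id -> current max seen
    static Map<Integer, Integer> run(Map<Integer, List<Integer>> edges,
                                     Map<Integer, Integer> values) {
        Map<Integer, List<Integer>> inbox = new HashMap<>();
        // superstep 0: every vertex broadcasts its own value
        for (Map.Entry<Integer, Integer> e : values.entrySet())
            for (int nbr : edges.getOrDefault(e.getKey(), List.of()))
                inbox.computeIfAbsent(nbr, k -> new ArrayList<>()).add(e.getValue());
        boolean changed = true;
        while (changed) {                       // barrier between supersteps
            changed = false;
            Map<Integer, List<Integer>> next = new HashMap<>();
            for (int v : values.keySet()) {
                int max = values.get(v);
                for (int m : inbox.getOrDefault(v, List.of())) max = Math.max(max, m);
                if (max != values.get(v)) {     // value improved: propagate it
                    values.put(v, max);
                    changed = true;
                    for (int nbr : edges.getOrDefault(v, List.of()))
                        next.computeIfAbsent(nbr, k -> new ArrayList<>()).add(max);
                }
            }
            inbox = next;
        }
        return values;
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> edges = Map.of(1, List.of(2), 2, List.of(3), 3, List.of(1));
        Map<Integer, Integer> values = new HashMap<>(Map.of(1, 5, 2, 1, 3, 9));
        System.out.println(run(edges, values)); // every vertex converges to the global max, 9
    }
}
```

In Giraph the framework supplies the barrier, message delivery, partitioning, and checkpointing; in raw MPI each of those lines would be the user's responsibility.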
Re: Missing classes SendMessageToAllCache / SendWorkerOneToAllMessagesRequest
This should be fixed now. On 7/20/13 12:20 PM, Avery Ching wrote: My bad. I am out but will fix in a few hours. On Jul 20, 2013 11:02 AM, Christian Krause m...@ckrause.org wrote: Hi, I get these compile errors. Could it be that some classes are missing? Cheers, Christian

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.0:compile (default-compile) on project giraph-core: Compilation failure: Compilation failure:
[ERROR] /home/christian/giraph-git/giraph-core/target/munged/main/org/apache/giraph/comm/netty/NettyWorkerClientRequestProcessor.java:[24,30] cannot find symbol
[ERROR] symbol: class SendMessageToAllCache
[ERROR] location: package org.apache.giraph.comm
[ERROR] /home/christian/giraph-git/giraph-core/target/munged/main/org/apache/giraph/comm/requests/RequestType.java:[41,5] cannot find symbol
[ERROR] symbol: class SendWorkerOneToAllMessagesRequest
[ERROR] location: class org.apache.giraph.comm.requests.RequestType
[ERROR] /home/christian/giraph-git/giraph-core/target/munged/main/org/apache/giraph/comm/netty/NettyWorkerClientRequestProcessor.java:[132,13] cannot find symbol
[ERROR] symbol: class SendMessageToAllCache
[ERROR] location: class org.apache.giraph.comm.netty.NettyWorkerClientRequestProcessor<I,V,E>
Re: HBase EdgeInputFormat
I don't think it will be hard to implement. Just start with the HBaseVertexInputFormat and have it extend EdgeInputFormat. You can look at TableEdgeInputFormat for an example. It sounds like a good contribution to Giraph. On 7/18/13 1:57 PM, Puneet Jain wrote: I also need this feature. It will be really helpful. On Thu, Jul 18, 2013 at 10:49 AM, Ahmet Emre Aladağ emre.ala...@agmlab.com wrote: Hi, Question: Will there be an HBaseEdgeInputFormat class, or is there a restriction of HBase such that we can't implement it? HBaseVertexInputFormat is fine for vertex-centric reading, i.e. each row in HBase corresponds to one vertex. But it does not allow me to create duplicate vertices with the same ID. Now I have the case where many rows in HBase can correspond to one vertex, each representing a set of edges. Example:

a1 -> x y z
a2 -> t p
a3 -> k

will be vertex a with edges to x y z t p k. It gives me the intuition that if there existed an HBaseEdgeInputFormat, I could solve this case. But it doesn't exist yet. -- --Puneet
Re: Avro input format available on Giraph?
Not that I know of. Since it is similar to JSON, you might want to take a look at JsonBase64VertexInputFormat as an example for Avro. It should be fairly similar in structure. Of course, it would be great if you can contribute it back to Giraph when you're done. =) Avery On 7/18/13 4:36 PM, Chuan Lei wrote: Hello, I was wondering whether Giraph has an Avro input format reader which can read Avro input files. If not, could someone let me know where I can get started? For example, which input format class should I extend? Thanks in advance for your help. Regards, Chuan L.
Re: MapWritable messages in Giraph
Looks like the serialization/deserialization has a problem. If you want to see an example of a Trove primitive map, see LongDoubleArrayEdges. On 7/4/13 7:06 AM, Pasupathy Mahalingam wrote: Hi, Thanks Avery Ching. I get the following exception:

java.lang.IllegalStateException: run: Caught an unrecoverable exception waitFor: ExecutionException occurred while waiting for org.apache.giraph.utils.ProgressableUtils$FutureWaitable@381eb0c6
 at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:102)
 at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
 at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:396)
 at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
 at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.lang.IllegalStateException: waitFor: ExecutionException occurred while waiting for org.apache.giraph.utils.ProgressableUtils$FutureWaitable@381eb0c6
 at org.apache.giraph.utils.ProgressableUtils.waitFor(ProgressableUtils.java:151)
 at org.apache.giraph.utils.ProgressableUtils.waitForever(ProgressableUtils.java:111)
 at org.apache.giraph.utils.ProgressableUtils.getFutureResult(ProgressableUtils.java:73)
 at org.apache.giraph.utils.ProgressableUtils.getResultsWithNCallables(ProgressableUtils.java:192)
 at org.apache.giraph.graph.GraphTaskManager.processGraphPartitions(GraphTaskManager.java:753)
 at org.apache.giraph.graph.GraphTaskManager.execute(GraphTaskManager.java:273)
 at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:92)
 ... 7 more
Caused by: java.util.concurrent.ExecutionException: java.lang.IllegalStateException: next: IOException
 at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:232)
 at java.util.concurrent.FutureTask.get(FutureTask.java:91)
 at org.apache.giraph.utils.ProgressableUtils$FutureWaitable.waitFor(ProgressableUtils.java:271)
 at org.apache.giraph.utils.ProgressableUtils.waitFor(ProgressableUtils.java:143)
 ... 13 more
Caused by: java.lang.IllegalStateException: next: IOException
 at org.apache.giraph.utils.ByteArrayVertexIdData$VertexIdDataIterator.next(ByteArrayVertexIdData.java:211)
 at org.apache.giraph.comm.messages.ByteArrayMessagesPerVertexStore.addPartitionMessages(ByteArrayMessagesPerVertexStore.java:116)
 at org.apache.giraph.comm.requests.SendWorkerMessagesRequest.doRequest(SendWorkerMessagesRequest.java:72)
 at org.apache.giraph.comm.netty.NettyWorkerClientRequestProcessor.doRequest(NettyWorkerClientRequestProcessor.java:470)
 at org.apache.giraph.comm.netty.NettyWorkerClientRequestProcessor.flush(NettyWorkerClientRequestProcessor.java:419)
 at org.apache.giraph.graph.ComputeCallable.call(ComputeCallable.java:193)
 at org.apache.giraph.graph.ComputeCallable.call(ComputeCallable.java:70)
 at org.apache.giraph.utils.LogStacktraceCallable.call(LogStacktraceCallable.java:51)
 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
 at java.util.concurrent.FutureTask.run(FutureTask.java:138)
 at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: ensureRemaining: Only 393 bytes remaining, trying to read 8960
 at org.apache.giraph.utils.UnsafeByteArrayInputStream.ensureRemaining(UnsafeByteArrayInputStream.java:114)
 at org.apache.giraph.utils.UnsafeByteArrayInputStream.readFully(UnsafeByteArrayInputStream.java:128)
 at org.apache.giraph.utils.UnsafeByteArrayInputStream.readUTF(UnsafeByteArrayInputStream.java:275)
 at org.apache.hadoop.io.AbstractMapWritable.readFields(AbstractMapWritable.java:199)
 at org.apache.hadoop.io.MapWritable.readFields(MapWritable.java:146)
 at org.apache.hadoop.io.MapWritable.readFields(MapWritable.java:167)
 at org.apache.giraph.utils.ByteArrayVertexIdMessages.readData(ByteArrayVertexIdMessages.java:76)
 at org.apache.giraph.utils.ByteArrayVertexIdMessages.readData(ByteArrayVertexIdMessages.java:34)
 at org.apache.giraph.utils.ByteArrayVertexIdData$VertexIdDataIterator.next(ByteArrayVertexIdData.java:209)
 ... 12 more

It would be great to see how you use writable maps based on Trove/FastUtil. A sample usage, if you can share one, would be great. Rgds Pasupathy On Wed, Jul 3, 2013 at 10:19 PM, Avery Ching ach...@apache.org wrote: We don't use MapWritable. Internally we have a bunch of writable maps based on Trove or FastUtil for speed. What's your full exception stack trace? On 7/2/13 1:24 AM, Pasupathy Mahalingam wrote: Hi, I'm trying to send
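The "Only 393 bytes remaining, trying to read 8960" failure above is what a write/readFields asymmetry looks like: the reader consumes a different number of bytes than the writer produced, so the next value starts at the wrong offset. A minimal round trip using only java.io (no Giraph or Hadoop classes; the length-prefixed layout here is illustrative) shows the symmetry that has to hold:

```java
import java.io.*;
import java.util.*;

// Writable-style round trip: read() must consume exactly the bytes that
// write() produced, field for field, or deserialization drifts and fails
// partway through the buffer with an "N bytes remaining" style error.
public class MapRoundTrip {
    static void write(DataOutput out, Map<String, Long> map) throws IOException {
        out.writeInt(map.size());                 // length prefix first
        for (Map.Entry<String, Long> e : map.entrySet()) {
            out.writeUTF(e.getKey());
            out.writeLong(e.getValue());
        }
    }
    static Map<String, Long> read(DataInput in) throws IOException {
        int n = in.readInt();                     // must mirror write() exactly
        Map<String, Long> map = new LinkedHashMap<>();
        for (int i = 0; i < n; i++) map.put(in.readUTF(), in.readLong());
        return map;
    }
    static Map<String, Long> roundTrip(Map<String, Long> map) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            write(new DataOutputStream(bytes), map);
            return read(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
    public static void main(String[] args) {
        Map<String, Long> m = new LinkedHashMap<>();
        m.put("pagerank", 42L);
        System.out.println(roundTrip(m).equals(m)); // true
    }
}
```

A primitive-keyed map (as in Giraph's Trove/FastUtil-backed stores) follows the same pattern, just with writeLong/readLong in place of writeUTF/readUTF for the keys.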
Re: Bi-directional and multigraphs
You can easily add bi-directional edges. When you load an edge, simply also load the reciprocal edge, i.e. if you add a->b, also add b->a. On 7/2/13 1:11 AM, Pascal Jäger wrote: Hi everyone, I am currently getting my hands on Giraph, which is why I am trying to implement a maximum flow algorithm originally designed for MapReduce. The algorithm requires bi-directional edges. * Are bi-directional edges supported in Giraph? * Where would I find them? Thanks Pascal
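The reciprocal-edge trick in the reply can be sketched with plain Java collections standing in for Giraph's edge store (the class and method names here are illustrative, not Giraph APIs):

```java
import java.util.*;

// Make a directed edge list effectively bi-directional at load time by
// recording each edge in both directions, as the reply suggests.
public class UndirectedLoader {
    static Map<Integer, Set<Integer>> load(int[][] directedEdges) {
        Map<Integer, Set<Integer>> adj = new TreeMap<>();
        for (int[] e : directedEdges) {
            adj.computeIfAbsent(e[0], k -> new TreeSet<>()).add(e[1]); // a -> b
            adj.computeIfAbsent(e[1], k -> new TreeSet<>()).add(e[0]); // b -> a (reciprocal)
        }
        return adj;
    }
    public static void main(String[] args) {
        System.out.println(load(new int[][]{{1, 2}, {2, 3}})); // {1=[2], 2=[1, 3], 3=[2]}
    }
}
```

In a real job the same doubling would happen in your EdgeInputFormat's reader (or in a preprocessing pass over the input files), so every vertex ends up holding both directions of each edge.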
Re: Failed to compile Giraph for Hadoop YARN
Eli, any thoughts? On 7/3/13 9:27 AM, Chui-Hui Chiu wrote: Hello, I tried to compile Giraph-1.1.0-SNAPSHOT with the hadoop_2.0.3 and hadoop_yarn profiles, but both failed. The error message when the compile command is "mvn -Phadoop_yarn compile" is:

[INFO] Reactor Summary:
[INFO] Apache Giraph Parent .. SUCCESS [1.320s]
[INFO] Apache Giraph Core FAILURE [12.508s]
[INFO] Apache Giraph Examples SKIPPED
[INFO] BUILD FAILURE
[INFO] Total time: 14.473s
[INFO] Finished at: Wed Jul 03 11:05:44 CDT 2013
[INFO] Final Memory: 14M/216M
[ERROR] Failed to execute goal on project giraph-core: Could not resolve dependencies for project org.apache.giraph:giraph-core:jar:1.1.0-SNAPSHOT: The following artifacts could not be resolved: org.apache.hadoop:hadoop-common:jar:SET_HADOOP_VERSION_USING_MVN_DASH_D_OPTION, org.apache.hadoop:hadoop-mapreduce-client-common:jar:SET_HADOOP_VERSION_USING_MVN_DASH_D_OPTION, org.apache.hadoop:hadoop-mapreduce-client-core:jar:SET_HADOOP_VERSION_USING_MVN_DASH_D_OPTION, org.apache.hadoop:hadoop-yarn-common:jar:SET_HADOOP_VERSION_USING_MVN_DASH_D_OPTION, org.apache.hadoop:hadoop-yarn-server-common:jar:SET_HADOOP_VERSION_USING_MVN_DASH_D_OPTION, org.apache.hadoop:hadoop-yarn-server-resourcemanager:jar:SET_HADOOP_VERSION_USING_MVN_DASH_D_OPTION, org.apache.hadoop:hadoop-yarn-server-nodemanager:jar:SET_HADOOP_VERSION_USING_MVN_DASH_D_OPTION, org.apache.hadoop:hadoop-yarn-server-tests:jar:tests:SET_HADOOP_VERSION_USING_MVN_DASH_D_OPTION: Could not find artifact org.apache.hadoop:hadoop-common:jar:SET_HADOOP_VERSION_USING_MVN_DASH_D_OPTION in central (http://repo1.maven.org/maven2) - [Help 1]

The error message when the compile command is "mvn -Phadoop_2.0.3 compile" is:

[INFO] Reactor Summary:
[INFO] Apache Giraph Parent .. SUCCESS [12.695s]
[INFO] Apache Giraph Core SUCCESS [2:10.916s]
[INFO] Apache Giraph Examples FAILURE [2.286s]
[INFO] BUILD FAILURE
[INFO] Total time: 2:26.530s
[INFO] Finished at: Wed Jul 03 11:03:25 CDT 2013
[INFO] Final Memory: 34M/348M
[ERROR] Failed to execute goal on project giraph-examples: Could not resolve dependencies for project org.apache.giraph:giraph-examples:jar:1.1.0-SNAPSHOT: Could not find artifact org.apache.giraph:giraph-core:jar:tests:1.1.0-SNAPSHOT in central (http://repo1.maven.org/maven2) - [Help 1]

Am I missing anything? I also noticed that my Maven 3 downloads many files from the maven2 folder on a remote server with the following prompt: Downloading: http://repo1.maven.org/maven2/org/apache/hadoop/... Is this a problem? Thanks, Chui-hui
Re: Running Giraph job inside Java code
Take a look at PageRankBenchmark; it is a stand-alone Java program that runs Giraph jobs. On 7/2/13 4:08 AM, Ahmet Emre Aladağ wrote: By the way, I have set the corresponding classes in the Giraph configuration:

GiraphConfiguration giraphConf = new GiraphConfiguration(config);
giraphConf.setZooKeeperConfiguration(zooKeeperWatcher.getQuorum());
giraphConf.setComputationClass(LinkRankComputation.class);
giraphConf.setMasterComputeClass(LinkRankVertexMasterCompute.class);
giraphConf.setOutEdgesClass(ByteArrayEdges.class);
giraphConf.setVertexInputFormatClass(NutchTableEdgeInputFormat.class);
giraphConf.setVertexOutputFormatClass(NutchTableEdgeOutputFormat.class);
giraphConf.setInt("giraph.pageRank.superstepCount", 40);
giraphConf.setWorkerConfiguration(1, 1, 100.0f);
giraphConf.set(TableInputFormat.INPUT_TABLE, TABLE_NAME);
giraphConf.set(TableOutputFormat.OUTPUT_TABLE, TABLE_NAME);
Re: Array exception when using out-of-core graph
Claudio, any thoughts? On 7/3/13 3:52 AM, Han JU wrote: Hi, I've been testing an algorithm using the out-of-core feature, and I get a strange ArrayIndexOutOfBoundsException. In my computation class, the vertex value is a custom writable class which contains a long[]. During the computation, when the code accesses this array (say at index 0), the exception is thrown:

Caused by: java.lang.ArrayIndexOutOfBoundsException: 0
 at some.package.ProjectionComputation.compute(ProjectionComputation.java:87)
 at org.apache.giraph.graph.ComputeCallable.computePartition(ComputeCallable.java:226)
 at org.apache.giraph.graph.ComputeCallable.call(ComputeCallable.java:161)
 at org.apache.giraph.graph.ComputeCallable.call(ComputeCallable.java:70)

This happens only if the out-of-core graph is enabled and maxPartitionsInMemory is lower than the actual number of partitions. The vertex value class is solid in terms of serialization (proven by unit tests). The strange thing is that when the exception is thrown, the array index is perfectly legal, and I can even print the long value retrieved from the array... So it seems to me that maybe it's not a problem within my code. Any suggestions? My program is based on trunk. -- *JU Han* Software Engineer Intern @ KXEN Inc. UTC - Université de Technologie de Compiègne GI06 - Fouille de Données et Décisionnel +33 061960
Re: Is Zookeeper a must for Giraph?
ZooKeeper is required. That being said, you can use an external ZooKeeper or Giraph can start one for you. It's your choice. Eli is the one to contact regarding Giraph on Hadoop 2.0.5. Any thoughts, Eli? Avery On 6/24/13 5:22 PM, Chuan Lei wrote: It is not clear to me whether ZooKeeper is required or optional for Giraph. I wonder if it is possible to run Giraph without ZooKeeper. If not, would the default ZooKeeper work with Giraph? Does anything have to be changed on the ZooKeeper side? Another question: I got the following error message when I ran the PageRankBenchmark program with Giraph on Hadoop-2.0.5. I saw similar posts on the mailing list, but there seems to be no clear answer yet. I would be grateful if someone could answer my question and help resolve the issue.

Error: java.lang.IllegalStateException: run: Caught an unrecoverable exception java.io.FileNotFoundException: File _bsp/_defaultZkManagerDir/job_1372108933881_0002/_zkServer does not exist.
 at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:102)
 at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:757)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339)
 at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:158)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:396)
 at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1478)
 at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:153)
Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: File _bsp/_defaultZkManagerDir/job_1372108933881_0002/_zkServer does not exist.
 at org.apache.giraph.zk.ZooKeeperManager.onlineZooKeeperServers(ZooKeeperManager.java:790)
 at org.apache.giraph.graph.GraphTaskManager.startZooKeeperManager(GraphTaskManager.java:357)
 at org.apache.giraph.graph.GraphTaskManager.setup(GraphTaskManager.java:188)
 at org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:60)
 at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:90)
 ... 7 more
Caused by: java.io.FileNotFoundException: File _bsp/_defaultZkManagerDir/job_1372108933881_0002/_zkServer does not exist.
 at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:405)
 at org.apache.giraph.zk.ZooKeeperManager.onlineZooKeeperServers(ZooKeeperManager.java:749)
 ... 11 more

Regards, Chuan
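For reference, pointing a job at an external ZooKeeper quorum (rather than the Giraph-managed one whose _zkServer file is missing in the trace above) is a one-liner; setZooKeeperConfiguration appears elsewhere on this list, while the host names here are placeholders.

```java
// Use an already-running ZooKeeper ensemble instead of letting Giraph
// spawn its own inside a map task.
GiraphConfiguration conf = new GiraphConfiguration();
conf.setZooKeeperConfiguration("zk1.example.com:2181,zk2.example.com:2181");
```

With no such setting, Giraph falls back to starting its own ZooKeeper under _bsp/_defaultZkManagerDir, which is the code path failing in the stack trace.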
Re: Restarting Algorithm Pattern
Rather than using voteToHalt, you could add an Aggregator that keeps track of the alive vertices, and another Aggregator to store/set the configuration value that the master computation can modify. Do the logic in the master computation and all should be well. Avery On 6/3/13 10:04 PM, David Gainer wrote: I have an algorithm where I'd like to iterate over the vertices with a configuration variable set to some value. Then, when all the vertices vote to halt, I'd like to reduce the configuration variable and repeat the inner iteration until some threshold of the configuration variable is reached. I was wondering what the natural way of programming that would be. It seems like a MasterCompute situation -- but I didn't see any method for un-halting vertices. I also wasn't sure when a vertex would ever be able to call its own wakeup function. Thanks, David
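The control flow Avery describes can be simulated in plain Java: the master holds the "configuration" value, runs the inner iteration until a converged aggregator reports done, then shrinks the value and restarts, stopping once the outer threshold is reached. Everything below is an illustrative stand-in for MasterCompute plus aggregators, not Giraph API.

```java
// Outer-loop pattern: a master-held value is lowered each time the inner
// iteration converges, until it drops below a stopping threshold. The
// inner supersteps are elided; only the master's control flow is shown.
public class RestartPattern {
    static int outerRounds(double threshold, double minThreshold, double decay) {
        int rounds = 0;
        while (threshold >= minThreshold) {
            // ...inner supersteps run here until an "all converged"
            // aggregator reports true; the master then shrinks the value
            threshold *= decay;
            rounds++;
        }
        return rounds;
    }
    public static void main(String[] args) {
        // 1.0 -> 0.5 -> 0.25 -> 0.125 -> 0.0625 (stop): 4 outer rounds
        System.out.println(outerRounds(1.0, 0.1, 0.5)); // prints 4
    }
}
```

In Giraph, the threshold would live in an aggregator that the MasterCompute writes each superstep and every vertex reads, which sidesteps the need to un-halt vertices: vertices never permanently halt, they just do no work while the converged flag is set.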
Re: External Documentation about Giraph
Improving our documentation is always very nice. Thanks for doing this, you two! On 5/31/13 7:32 PM, Yazan Boshmaf wrote: Maria, I can help you with this if you are interested and have the time. If you are busy, please let me know and I will update the site docs with a variant of your tutorial. Thanks! On Thu, May 30, 2013 at 4:13 PM, Roman Shaposhnik r...@apache.org wrote: On Wed, May 29, 2013 at 2:25 PM, Maria Stylianou mars...@gmail.com wrote: Hello guys, This semester I'm doing my master thesis using Giraph on a daily basis. In my blog (marsty5.wordpress.com) I wrote some posts about Giraph; some of the new users may find them useful! And maybe some of the experienced ones can give me feedback and correct any mistakes :D So far, I described: 1. How to set up Giraph 2. What to do next - after setting up Giraph 3. How to run ShortestPaths 4. How to run PageRank Good stuff! As a shameless plug, one more way to install Giraph is via Apache Bigtop. All it takes is hooking one of these files: http://bigtop01.cloudera.org:8080/view/Bigtop-trunk/job/Bigtop-trunk-Repository/label=fedora18/lastSuccessfulBuild/artifact/repo/bigtop.repo http://bigtop01.cloudera.org:8080/view/Bigtop-trunk/job/Bigtop-trunk-Repository/label=opensuse12/lastSuccessfulBuild/artifact/repo/bigtop.repo to your yum/apt system and typing: $ sudo yum install hadoop-conf-pseudo giraph In fact we're about to release Bigtop 0.6.0 with Hadoop 2.0.4.1 and Giraph 1.0 -- so if anybody's interested in helping us to test this stuff -- that would be really appreciated. Thanks, Roman. P.S. There are quite a few other platforms available as well: http://bigtop01.cloudera.org:8080/view/Bigtop-trunk/job/Bigtop-trunk-Repository/
Re: Extra data on vertex
Best way is to add it to the vertex value. The vertex value is meant to store any data associated with a particular vertex. Hope that helps, Avery On 5/7/13 7:47 AM, Ahmet Emre Aladağ wrote: Hi, 1) What's the best way for storing extra data (such as a URL) on a vertex? I thought this would be through a class variable but I could not find the way to access that variable from the neighbor. For example I'd like to remove the duplicate edges going towards the nodes with the same URL (Duplicate Removal phase of LinkRank). How can I learn my neighbor's url variable, targetUrl? 2) Is removing edges like this a valid approach?

public class LinkRankVertex extends Vertex<IntWritable, FloatWritable, NullWritable, FloatWritable> {
    public String url;

    public void removeDuplicateLinks() {
        int targetId;
        String targetUrl;
        Set<String> urls = new HashSet<String>();
        ArrayList<Edge<IntWritable, NullWritable>> edges =
            new ArrayList<Edge<IntWritable, NullWritable>>();
        for (Edge<IntWritable, NullWritable> edge : getEdges()) {
            targetId = edge.getTargetVertexId().get();
            targetUrl = ...??
            if (!urls.contains(targetUrl)) {
                urls.add(targetUrl);
                edges.add(edge);
            }
        }
        setEdges(edges);
    }
}

Thanks, Emre.
Re: TestJsonBase64Format failure on 1.0.0
) at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.init(ChecksumFileSystem.java:126) at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:283) at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:427) at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.initialize(LineRecordReader.java:67) at org.apache.giraph.io.formats.TextVertexInputFormat$TextVertexReader.initialize(TextVertexInputFormat.java:96) at org.apache.giraph.io.formats.JsonBase64VertexInputFormat$JsonBase64VertexReader.initialize(JsonBase64VertexInputFormat.java:71) at org.apache.giraph.worker.VertexInputSplitsCallable.readInputSplit(VertexInputSplitsCallable.java:120) at org.apache.giraph.worker.InputSplitsCallable.loadInputSplit(InputSplitsCallable.java:220) at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:161) ... 7 more 2013-05-06 09:22:44,485 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201305052325_0013_m_02_0: Task attempt_201305052325_0013_m_02_0 failed to report status for 602 seconds. Killing! 2013-05-06 09:22:44,485 INFO org.apache.hadoop.mapred.TaskInProgress: TaskInProgress task_201305052325_0013_m_02 has failed 1 times. 2013-05-06 09:22:44,485 INFO org.apache.hadoop.mapred.JobInProgress: Aborting job job_201305052325_0013 2013-05-06 09:22:44,485 INFO org.apache.hadoop.mapred.JobInProgress: Killing job 'job_201305052325_0013' Regards, - kiru Kiru Pakkirisamy | webcloudtech.wordpress.com --- On *Mon, 5/6/13, Kiru Pakkirisamy /kirupakkiris...@yahoo.com/*wrote: From: Kiru Pakkirisamy kirupakkiris...@yahoo.com Subject: Re: Compiling 1.0.0 distribution To: user@giraph.apache.org, Avery Ching ach...@apache.org Date: Monday, May 6, 2013, 12:02 AM Yes, I am trying to run on my Ubuntu laptop. Let me look at the log files. Thanks for the help. Much appreciated. 
Regards, - kiru Kiru Pakkirisamy | webcloudtech.wordpress.com --- On *Sun, 5/5/13, Avery Ching /ach...@apache.org/* wrote: From: Avery Ching ach...@apache.org Subject: Re: Compiling 1.0.0 distribution To: user@giraph.apache.org Cc: Kiru Pakkirisamy kirupakkiris...@yahoo.com Date: Sunday, May 5, 2013, 11:51 PM My guess is that you don't have enough workers to run the job and the master kills the job (i.e. are you running on a single machine setup?). You can try to run first with one worker (this will take 2 map slots - one for the master and one for the worker). You can also look at the logs from map task 0 to see more clearly what the error was. Avery On 5/5/13 11:16 PM, Kiru Pakkirisamy wrote: Yup, I did a mvn3 install and then a mvn3 compile to get around that already. Right now, I am trying to run the PageRank, even after a few runs I have not had one successful run . The maps progress decreases in percentage (second time around) !! I have never seen this before (?) Regards, - kiru Kiru Pakkirisamy | webcloudtech.wordpress.com --- On *Sun, 5/5/13, Roman Shaposhnik /r...@apache.org/* wrote: From: Roman Shaposhnik r...@apache.org Subject: Re: Compiling 1.0.0 distribution To: user@giraph.apache.org Date: Sunday, May 5, 2013, 10:50 PM To pile on top of that -- you can also run mvn -pl module-name from the top level to short-circuit the build to that module (and yet still honor the dependencies). Thanks, Roman. On Sun, May 5, 2013 at 10:44 PM, Avery Ching ach...@apache.org wrote: The easiest way is to compile from the base directory, which will build everything. You can build individual directories, but you have to install the core jars first (i.e. go to giraph-core and do 'mvn clean install'). Then you can build the directory of your choice. Hope that helps, Avery On 5/5/13 11:11 AM, Kiru Pakkirisamy wrote: Hi, I am unable to compile giraph-examples because it is not able to reach the core jar files on the repo. Why doesn't it pick it up from the root build dir ? 
Regards, - kiru Kiru Pakkirisamy | webcloudtech.wordpress.com
Re: Compiling 1.0.0 distribution
My guess is that you don't have enough workers to run the job and the master kills the job (i.e. are you running on a single machine setup?). You can try to run first with one worker (this will take 2 map slots - one for the master and one for the worker). You can also look at the logs from map task 0 to see more clearly what the error was. Avery On 5/5/13 11:16 PM, Kiru Pakkirisamy wrote: Yup, I did a mvn3 install and then a mvn3 compile to get around that already. Right now, I am trying to run the PageRank, even after a few runs I have not had one successful run . The maps progress decreases in percentage (second time around) !! I have never seen this before (?) Regards, - kiru Kiru Pakkirisamy | webcloudtech.wordpress.com --- On *Sun, 5/5/13, Roman Shaposhnik /r...@apache.org/* wrote: From: Roman Shaposhnik r...@apache.org Subject: Re: Compiling 1.0.0 distribution To: user@giraph.apache.org Date: Sunday, May 5, 2013, 10:50 PM To pile on top of that -- you can also run mvn -pl module-name from the top level to short-circuit the build to that module (and yet still honor the dependencies). Thanks, Roman. On Sun, May 5, 2013 at 10:44 PM, Avery Ching ach...@apache.org wrote: The easiest way is to compile from the base directory, which will build everything. You can build individual directories, but you have to install the core jars first (i.e. go to giraph-core and do 'mvn clean install'). Then you can build the directory of your choice. Hope that helps, Avery On 5/5/13 11:11 AM, Kiru Pakkirisamy wrote: Hi, I am unable to compile giraph-examples because it is not able to reach the core jar files on the repo. Why doesn't it pick it up from the root build dir ? Regards, - kiru Kiru Pakkirisamy | webcloudtech.wordpress.com
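[Editor's note: the build advice in this thread condenses to a short shell session. This is a sketch; the module name `giraph-examples` matches the 1.0 source tree, and `-am` (also-make) is the standard Maven flag for building a selected module's dependencies along with it.]

```shell
# One-time: build and install every module from the top of the source tree
mvn clean install

# After giraph-core is in the local repo, one module can be built on its own
cd giraph-examples && mvn clean package && cd ..

# Or, from the top level, short-circuit to a single module while still
# honoring inter-module dependencies
mvn -pl giraph-examples -am package
```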
[VOTE] Release Giraph 1.0 (rc0)
Fellow Giraphers, We have our first release candidate since graduating from incubation. This is a source release, primarily due to the different versions of Hadoop we support with munge (similar to the 0.1 release). Since 0.1, we've made A TON of progress on overall performance, optimizing memory use, split vertex/edge inputs, easy interoperability with Apache Hive, and a bunch of other areas. In many ways, this is an almost totally different codebase. Thanks everyone for your hard work! Apache Giraph has been running in production at Facebook (against Facebook's Corona implementation of Hadoop - https://github.com/facebook/hadoop-20/tree/master/src/contrib/corona) since around last December. It has proven to be very scalable and performant, and it enables a bunch of new applications. Based on the drastic improvements and the use of Giraph in production, it seems appropriate to bump up our version to 1.0. While anyone can vote, the ASF requires majority approval from the PMC -- i.e., at least three PMC members must vote affirmatively for release, and there must be more positive than negative votes. Releases may not be vetoed. Before voting +1, PMC members are required to download the signed source code package, compile it as provided, and test the resulting executable on their own platform, along with also verifying that the package meets the requirements of the ASF policy on releases. Please test this against many other Hadoop versions and let us know how this goes! Release notes: http://people.apache.org/~aching/giraph-1.0-RC0/RELEASE_NOTES.html Release artifacts: http://people.apache.org/~aching/giraph-1.0-RC0/ Corresponding git tag: https://git-wip-us.apache.org/repos/asf?p=giraph.git;a=shortlog;h=refs/tags/release-1.0-RC0 Signing keys: http://people.apache.org/keys/group/giraph.asc The vote runs for 72 hours, until Monday 4pm PST. Thanks everyone for your patience with this release! Avery
Re: about fault tolerance in Giraph
Hi Yuanyuan, We haven't tested this feature in a while, but it should work. What did the job report about why it failed? Avery On 3/18/13 10:22 AM, Yuanyuan Tian wrote: Can anyone help me answer the question? Yuanyuan From: Yuanyuan Tian/Almaden/IBM@IBMUS To: user@giraph.apache.org Date: 03/15/2013 02:05 PM Subject: about fault tolerance in Giraph Hi, I was testing the fault tolerance of Giraph on a long-running job. I noticed that when one of the workers threw an exception, the whole job failed without retrying the task, even though I had turned on checkpointing and there were available map slots in my cluster. Why wasn't the fault-tolerance mechanism working? I was running a version of Giraph downloaded sometime in June 2012, and I used Netty for the communication layer. Thanks, Yuanyuan
Re: Congrats to our newest PMC member, Eli Reisman
Congrats Eli! On 3/15/13 9:03 PM, Eli Reisman wrote: Thanks! I look forward to many more enjoyable toils in the future! Send the decoder ring. I'm already wearing the robe ;) On Fri, Mar 15, 2013 at 2:07 PM, Alessandro Presta alessan...@fb.com wrote: Well done, Eli! Sent from my iPhone On Mar 15, 2013, at 2:04 PM, Jakob Homan jghoman@gmail.com wrote: I'm happy to announce that the Apache Giraph PMC has elected Eli Reisman to the PMC in recognition of his sustained and substantial contributions over the past year. Most recently, he's been toiling away at getting Giraph onto YARN, which is a huge win. Congrats, Eli. Your robe and secret decoder ring are in the mail. -Jakob on behalf of the Giraph PMC
Re: Zookeeper exception while running SimpleShortestPathsVertexTest
I think those are info level logs rather than actual issues. If your job completes successfully, I wouldn't worry about it. On 3/8/13 12:31 PM, Ameet Kini wrote: Hi folks, I am trying to run the SimpleShortestPathsVertexTest example introduced by the unit testing tool as part of (https://issues.apache.org/jira/browse/GIRAPH-51) and see the below zookeeper exception while running the testToyData method. I can run giraph applications from the command-line and have confirmed that the worker node can bring up zookeeper ok. Is there a configuration step I am missing while running the unit test tool? Thanks, Ameet [14:56:25] INFO: [ZooKeeperServerMain] Starting server [14:56:25] INFO: [GiraphJob] run: Since checkpointing is disabled (default), do not allow any task retries (setting mapred.map.max.attempts = 0, old value = 4) [14:56:25] INFO: [ZooKeeperServer] Server environment:zookeeper.version=3.3.3-1073969, built on 02/23/2011 22:27 GMT [14:56:25] INFO: [ZooKeeperServer] Server environment:host.name=dodo [14:56:25] INFO: [ZooKeeperServer] Server environment:java.version=1.6.0_24 [14:56:25] INFO: [ZooKeeperServer] Server environment:java.vendor=Sun Microsystems Inc. 
[14:56:25] INFO: [ZooKeeperServer] Server environment:java.home=/usr/lib/jvm/java-6-openjdk-amd64/jre [14:56:25] INFO: [ZooKeeperServer] Server environment:java.library.path=/usr/lib/jvm/java-6-openjdk-amd64/jre/lib/amd64/server:/usr/lib/jvm/java-6-openjdk-amd64/jre/lib/a\ md64:/usr/lib/jvm/java-6-openjdk-amd64/jre/../lib/amd64:/usr/java/packages/lib/amd64:/usr/lib/x86_64-linux-gnu/jni:/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu:/usr/lib\ /jni:/lib:/usr/lib [14:56:25] INFO: [ZooKeeperServer] Server environment:java.io.tmpdir=/tmp [14:56:25] INFO: [ZooKeeperServer] Server environment:java.compiler=NA [14:56:25] INFO: [ZooKeeperServer] Server environment:os.name=Linux [14:56:25] INFO: [ZooKeeperServer] Server environment:os.arch=amd64 [14:56:25] INFO: [ZooKeeperServer] Server environment:os.version=3.2.0-29-generic [14:56:25] INFO: [ZooKeeperServer] Server environment:user.name=akini [14:56:25] INFO: [ZooKeeperServer] Server environment:user.home=/home/akini [14:56:25] INFO: [ZooKeeperServer] Server environment:user.dir=/home/jakini/workspace/giraph_test [14:56:25] INFO: [ZooKeeperServer] tickTime set to 2000 [14:56:25] INFO: [ZooKeeperServer] minSessionTimeout set to 1 [14:56:25] INFO: [ZooKeeperServer] maxSessionTimeout set to 10 [14:56:25] WARN: [JobClient] Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same. 
[14:56:25] INFO: [NIOServerCnxn] binding to port 0.0.0.0/0.0.0.0:22182 [14:56:25] INFO: [FileTxnSnapLog] Snapshotting: 0 [14:56:25] INFO: [JobClient] Running job: job_201303070954_0007 [14:56:26] INFO: [JobClient] map 0% reduce 0% [14:56:30] INFO: [NIOServerCnxn] Accepted socket connection from /127.0.0.1:43076 [14:56:30] INFO: [NIOServerCnxn] Client attempting to establish new session at /127.0.0.1:43076 [14:56:30] INFO: [FileTxnLog] Creating new log file: log.1 [14:56:31] INFO: [NIOServerCnxn] Established session 0x13d4b936bc3 with negotiated timeout 6 for client /127.0.0.1:43076 [14:56:31] INFO: [NIOServerCnxn] Accepted socket connection from /127.0.0.1:43077 [14:56:31] INFO: [NIOServerCnxn] Client attempting to establish new session at /127.0.0.1:43077 [14:56:31] INFO: [PrepRequestProcessor] Got user-level KeeperException when processing sessionid:0x13d4b936bc3 type:create cxid:0x1 zxid:0xfffe txntype:un\ known reqpath:n/a Error Path:/_hadoopBsp/job_201303070954_0007/_masterElectionDir Error:KeeperErrorCode = NoNode for /_hadoopBsp/job_201303070954_0007/_masterElectionDir [14:56:31] INFO: [NIOServerCnxn] Established session 0x13d4b936bc30001 with negotiated timeout 6 for client /127.0.0.1:43077 [14:56:31] INFO: [PrepRequestProcessor] Got user-level KeeperException when processing sessionid:0x13d4b936bc30001 type:create cxid:0x1 zxid:0xfffe txntype:un\ known reqpath:n/a Error Path:/_hadoopBsp/job_201303070954_0007/_masterJobState Error:KeeperErrorCode = NodeExists for /_hadoopBsp/job_201303070954_0007/_masterJobState [14:56:31] INFO: [PrepRequestProcessor] Got user-level KeeperException when processing sessionid:0x13d4b936bc3 type:create cxid:0xc zxid:0xfffe txntype:un\ known reqpath:n/a Error Path:/_hadoopBsp/job_201303070954_0007/_applicationAttemptsDir/0 Error:KeeperErrorCode = NoNode for /_hadoopBsp/job_201303070954_0007/_applicationA\ ttemptsDir/0 [14:56:31] INFO: [PrepRequestProcessor] Got user-level KeeperException when processing 
sessionid:0x13d4b936bc30001 type:create cxid:0x3 zxid:0xfffe txntype:un\ known reqpath:n/a Error Path:/_hadoopBsp/job_201303070954_0007/_applicationAttemptsDir Error:KeeperErrorCode = NodeExists for /_hadoopBsp/job_201303070954_0007/_applicatio\ nAttemptsDir [14:56:31] INFO: [PrepRequestProcessor] Got user-level KeeperException when
Re: Using HiveGiraphRunner with dependencies
Yeah, this is where things get a bit tricky. You'll have to experiment with what works for you, but we are using Hive to launch the job with the jar.sh script. This gets the environment straight from the Hive side.

jar_help () {
  echo "Used for applications that require Hadoop and Hive classpath and environment."
  echo "./hive --service jar <yourjar> <yourclass> HIVE_OPTS <your_args>"
}

Avery On 1/17/13 4:49 PM, pradeep kumar wrote: Hi, Actually we are trying to use Giraph in our project for graph analysis with Hive. So far it was good: the build was successful and the shortest-paths example ran fine, but working with Hive has been a real issue. We started with the command line: hadoop jar giraph-hcatalog-0.2-SNAPSHOT-jar-with-dependencies.jar org.apache.giraph.io.hcatalog.HiveGiraphRunner -db default -vertexClass org.apache.giraph.vertex.Vertex -vertexInputFormatClass org.apache.giraph.io.hcatalog.HCatalogVertexInputFormat -vertexOutputFormatClass org.apache.giraph.io.hcatalog.HCatalogVertexOutputFormat -w 1 -vi testinput -o testoutput -hiveconf javax.jdo.option.ConnectionURL=jdbc:mysql://localhost/metastore -hiveconf javax.jdo.option.ConnectionDriverName=com.mysql.jdbc.Driver -hiveconf javax.jdo.option.ConnectionUserName=root -hiveconf javax.jdo.option.ConnectionPassword=root -hiveconf datanucleus.autoCreateSchema=false -hiveconf datanucleus.fixedDatastore=true Is this the wrong way of doing it? We are running into an exception while doing so, and if it is wrong, any suggestion on how we can proceed would be a great help. Regards, Pradeep
Re: Code failing for the large data
This looks like 0.1 (still using Hadoop RPC). Please try trunk instead. Avery On 1/10/13 1:09 AM, pankaj Gulhane wrote: Hi, My code is working on a smaller (very, very small) dataset, but if I use the same code on a large dataset it fails. The following code is a basic implementation of naive PageRank (just for testing). When I run it with 4-5 vertices it works properly, but when run on thousands of vertices it fails with the following error: java.lang.IllegalStateException: run: Caught an unrecoverable exception setup: Offlining servers due to exception... at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:641) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:668) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:334) at org.apache.hadoop.mapred.Child$4.run(Child.java:270) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1109) at org.apache.hadoop.mapred.Child.main(Child.java:264) Caused by: java.lang.RuntimeException: setup: Offlining servers due to exception... at org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:466) at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:630) ... 7 more Caused by: java.lang.IllegalStateException: setup: loadVertices failed at org.apache.giraph.graph.BspServiceWorker.setup(BspServiceWorker.java:582) at org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:458) ... 8 more Caused by: java.lang.NullPointerException at org.apache.giraph.comm.BasicRPCCommunications.sendPartitionReq(BasicRPCCommunications.java:817) at org.apache.giraph.graph.BspServiceWorker.loadVertices(BspServiceWorker.java:304) at org.apache.giraph.graph.BspServiceWorker.setup(BspServiceWorker.java:575) ... 
9 more

public class PageRank implements Tool {
  /** Configuration from Configurable */
  private Configuration conf;
  public static String SUPERSTEP_COUNT = "PageRankBenchmark.superstepCount";

  public static class PageRankHashMapVertex extends
      HashMapVertex<LongWritable, DoubleWritable, DoubleWritable, DoubleWritable> {
    @Override
    public void compute(Iterator<DoubleWritable> msgIterator) {
      if (getSuperstep() >= 1) {
        double sum = 0;
        while (msgIterator.hasNext()) {
          sum += msgIterator.next().get();
        }
        DoubleWritable vertexValue =
            new DoubleWritable((0.15f / getNumVertices()) + 0.85f * sum);
        setVertexValue(vertexValue);
      }
      if (getSuperstep() < getConf().getInt(SUPERSTEP_COUNT, 4)) {
        long edges = getNumOutEdges();
        sendMsgToAllEdges(new DoubleWritable(getVertexValue().get() / edges));
      }
      voteToHalt();
    }
  }

  @Override
  public Configuration getConf() { return conf; }

  @Override
  public void setConf(Configuration conf) { this.conf = conf; }

  @Override
  public int run(String[] args) throws Exception {
    GiraphJob job = new GiraphJob(getConf(), getClass().getName());
    // job.setJarByClass(getClass());
    job.setVertexClass(PageRankHashMapVertex.class);
    job.setVertexInputFormatClass(LongDoubleDoubleAdjacencyListVertexInputFormat.class);
    job.setVertexOutputFormatClass(IdWithValueTextOutputFormat.class);
    job.setWorkerConfiguration(200, 200, 100.0f);
    job.setJobName("Testing PG");
    job.getConfiguration().setInt(SUPERSTEP_COUNT, 2);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    return (job.run(true) == true ? 0 : 1);
  }

  public static void main(String[] args) throws Exception {
    System.exit(ToolRunner.run(new PageRank(), args));
  }
}

Any pointers/help on the mistake I may be making would be great. Thanks, Pankaj PS: I am running on a cluster with more than 400 mapper slots.
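[Editor's note: the update rule in the compute() method above is rank = 0.15/N + 0.85 * (sum of incoming rank/out-degree contributions). A self-contained sketch of a few synchronous supersteps of that rule on a toy graph follows; plain Java, no Giraph types, and the graph and step count are made up for illustration.]

```java
import java.util.Arrays;

public class PageRankDemo {
    // rank = 0.15/n + 0.85 * (sum over in-neighbours of rank / out-degree)
    static double[] ranks(int[][] out, int steps) {
        int n = out.length;
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n); // uniform initial ranks
        for (int s = 0; s < steps; s++) {
            double[] next = new double[n];
            Arrays.fill(next, 0.15 / n); // teleport term
            for (int v = 0; v < n; v++) {
                // each vertex "sends" rank/out-degree along its out-edges,
                // mirroring sendMsgToAllEdges() in the Giraph code
                for (int t : out[v]) {
                    next[t] += 0.85 * rank[v] / out[v].length;
                }
            }
            rank = next;
        }
        return rank;
    }

    public static void main(String[] args) {
        // Tiny 3-vertex graph: 0 -> {1,2}, 1 -> {2}, 2 -> {0}
        double[] r = ranks(new int[][] { {1, 2}, {2}, {0} }, 4);
        double sum = r[0] + r[1] + r[2];
        // With no dangling vertices, the ranks stay a probability distribution
        System.out.printf("sum=%.4f%n", sum); // prints sum=1.0000
    }
}
```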
Re: Breadth-first search
We are running several Giraph applications in production using our version of Hadoop (Corona) at Facebook. The part you have to be careful about is ensuring you have enough resources for your job to run. But otherwise, we are able to run at FB scale (i.e. 1 billion+ nodes, many more edges). Avery On 12/11/12 5:58 AM, Gustavo Enrique Salazar Torres wrote: Hi: I implemented a graph algorithm to recommend content to our users. Although it is working (the implementation uses Mahout), it is very inefficient because I have to run many iterations in order to perform a breadth-first search on my graph. I would like to use Giraph for that task. I would like to know if it is production-ready. I'm running jobs on Amazon EMR. Thanks in advance. Gustavo
Re: What a worker really is and other interesting runtime information
Oh, forgot one thing. You need to set the number of partitions to use, since each thread works on a single partition at a time. Try -Dhash.userPartitionCount=<number of threads> On 11/28/12 5:29 AM, Alexandros Daglis wrote: Dear Avery, I followed your advice, but the application seems to be totally thread-count-insensitive: I literally observe zero scaling of performance while I increase the thread count. Maybe you can point out if I am doing something wrong. - Using only 4 cores on a single node at the moment - Input graph: 14 million vertices, file size is 470 MB - Running SSSP as follows: hadoop jar target/giraph-0.1-jar-with-dependencies.jar org.apache.giraph.examples.SimpleShortestPathsVertex -Dgiraph.SplitMasterWorker=false -Dgiraph.numComputeThreads=X input output 12 1 where X=1,2,3,12,30 - I notice a total insensitivity to the number of threads I specify. Aggregate core utilization is always approximately the same (usually around 25-30%, i.e. only one of the cores running) and overall execution time is always the same (~8 mins). Why is Giraph's performance not scaling? Is the input size / number of workers inappropriate? It's not an IO issue either, because even during really low core utilization, time is wasted on idle, not on IO. Cheers, Alexandros On 28 November 2012 11:13, Alexandros Daglis alexandros.dag...@epfl.ch wrote: Thank you Avery, that helped a lot! Regards, Alexandros On 27 November 2012 20:57, Avery Ching ach...@apache.org wrote: Hi Alexandros, The extra task is for the master process (a coordination task). In your case, since you are using a single machine, you can use a single task: -Dgiraph.SplitMasterWorker=false and you can try multithreading instead of multiple workers: -Dgiraph.numComputeThreads=12 The reason why CPU usage increases is due to netty threads handling network requests. By using multithreading instead, you should bypass this.
Avery On 11/27/12 9:40 AM, Alexandros Daglis wrote: Hello everybody, I went through most of the documentation I could find for Giraph and also most of the messages in this email list, but still I have not figured out precisely what a worker really is. I would really appreciate it if you could help me understand how the framework works. At first I thought that a worker has a one-to-one correspondence to a map task. Apparently this is not exactly the case, since I have noticed that if I ask for x workers, the job finishes after having used x+1 map tasks. What is this extra task for? I have been trying out the example SSSP application on a single node with 12 cores. Giving an input graph of ~400MB and using 1 worker, around 10 GBs of memory are used during execution. What intrigues me is that if I use 2 workers for the same input (and without limiting memory per map task), double the memory will be used. Furthermore, there will be no improvement in performance. I rather notice a slowdown. Are these observations normal? Might it be the case that 1 and 2 workers are very few and I should go to the 30-100 range that is the proposed number of mappers for a conventional MapReduce job? Finally, a last observation. Even though I use only 1 worker, I see that there are significant periods during execution where up to 90% of the 12 cores computing power is consumed, that is, almost 10 cores are used in parallel. Does each worker spawn multiple threads and dynamically balances the load to utilize the available hardware? Thanks a lot in advance! Best, Alexandros
Re: What a worker really is and other interesting runtime information
Hi Alexandros, The extra task is for the master process (a coordination task). In your case, since you are using a single machine, you can use a single task. -Dgiraph.SplitMasterWorker=false and you can try multithreading instead of multiple workers. -Dgiraph.numComputeThreads=12 The reason why cpu usage increases is due to netty threads to handle network requests. By using multithreading instead, you should bypass this. Avery On 11/27/12 9:40 AM, Alexandros Daglis wrote: Hello everybody, I went through most of the documentation I could find for Giraph and also most of the messages in this email list, but still I have not figured out precisely what a worker really is. I would really appreciate it if you could help me understand how the framework works. At first I thought that a worker has a one-to-one correspondence to a map task. Apparently this is not exactly the case, since I have noticed that if I ask for x workers, the job finishes after having used x+1 map tasks. What is this extra task for? I have been trying out the example SSSP application on a single node with 12 cores. Giving an input graph of ~400MB and using 1 worker, around 10 GBs of memory are used during execution. What intrigues me is that if I use 2 workers for the same input (and without limiting memory per map task), double the memory will be used. Furthermore, there will be no improvement in performance. I rather notice a slowdown. Are these observations normal? Might it be the case that 1 and 2 workers are very few and I should go to the 30-100 range that is the proposed number of mappers for a conventional MapReduce job? Finally, a last observation. Even though I use only 1 worker, I see that there are significant periods during execution where up to 90% of the 12 cores computing power is consumed, that is, almost 10 cores are used in parallel. Does each worker spawn multiple threads and dynamically balances the load to utilize the available hardware? Thanks a lot in advance! Best, Alexandros
Re: java.net.ConnectException: Connection refused
The connect exception is fine; it usually takes more than one connect attempt to zk. The reason your job failed is due to not having enough simultaneous map tasks on your Hadoop instance. See http://svn.apache.org/repos/asf/giraph/trunk/README for details on running in pseudo-distributed mode. Avery On 10/17/12 11:09 AM, rodrigo zerbini wrote: Hello, everybody. I'm trying to run the shortest-paths example with the command below: hadoop jar giraph-0.2-SNAPSHOT-for-hadoop-0.20.203.0-jar-with-dependencies.jar org.apache.giraph.GiraphRunner org.apache.giraph.examples.SimpleShortestPathsVertex -if org.apache.giraph.io.JsonLongDoubleFloatDoubleVertexInputFormat -ip shortestPathsInputGraph -of org.apache.giraph.io.JsonLongDoubleFloatDoubleVertexOutputFormat -op shortestPathsOutputGraph -w 3 However, it didn't work. In the jobtracker I found that some jobs failed. I had 4 killed tasks. Below you can see the log of the first task. I got a ConnectException. Does anyone have some idea why this connection was refused? Thanks in advance. 2012-10-16 17:40:40,788 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 2012-10-16 17:40:42,331 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name ugi already exists! 2012-10-16 17:40:44,019 INFO org.apache.hadoop.mapred.Task: Using ResourceCalculatorPlugin : null 2012-10-16 17:40:44,729 INFO org.apache.giraph.graph.GraphMapper: setup: Set log level to info 2012-10-16 17:40:44,729 INFO org.apache.giraph.graph.GraphMapper: Distributed cache is empty. Assuming fatjar. 
2012-10-16 17:40:44,729 INFO org.apache.giraph.graph.GraphMapper: setup: classpath @ /tmp/hadoop-ro/mapred/local/taskTracker/ro/jobcache/job_201210161739_0001/jars/job.jar 2012-10-16 17:40:45,514 INFO org.apache.giraph.zk.ZooKeeperManager: createCandidateStamp: Made the directory _bsp/_defaultZkManagerDir/job_201210161739_0001 2012-10-16 17:40:45,531 INFO org.apache.giraph.zk.ZooKeeperManager: createCandidateStamp: Creating my filestamp _bsp/_defaultZkManagerDir/job_201210161739_0001/_task/practivate.adobe.com 0 2012-10-16 17:40:47,160 INFO org.apache.giraph.zk.ZooKeeperManager: getZooKeeperServerList: Got [practivate.adobe.com] 1 hosts from 1 candidates when 1 required (polling period is 3000) on attempt 0 2012-10-16 17:40:47,233 INFO org.apache.giraph.zk.ZooKeeperManager: createZooKeeperServerList: Creating the final ZooKeeper file '_bsp/_defaultZkManagerDir/job_201210161739_0001/zkServerList_practivate.adobe.com 0 ' 2012-10-16 17:40:48,029 INFO org.apache.giraph.zk.ZooKeeperManager: getZooKeeperServerList: For task 0, got file 'zkServerList_practivate.adobe.com 0 ' (polling period is 3000) 2012-10-16 17:40:48,030 INFO org.apache.giraph.zk.ZooKeeperManager: getZooKeeperServerList: Found [practivate.adobe.com, 0] 2 hosts in filename 'zkServerList_practivate.adobe.com 0 ' 2012-10-16 17:40:48,142 INFO org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Trying to delete old directory /tmp/hadoop-ro/mapred/local/taskTracker/ro/jobcache/job_201210161739_0001/work/_bspZooKeeper 2012-10-16 17:40:48,300 INFO org.apache.giraph.zk.ZooKeeperManager: generateZooKeeperConfigFile: Creating file /tmp/hadoop-ro/mapred/local/taskTracker/ro/jobcache/job_201210161739_0001/work/_bspZooKeeper/zoo.cfg in 
/tmp/hadoop-ro/mapred/local/taskTracker/ro/jobcache/job_201210161739_0001/work/_bspZooKeeper with base port 22181 2012-10-16 17:40:48,300 INFO org.apache.giraph.zk.ZooKeeperManager: generateZooKeeperConfigFile: Make directory of _bspZooKeeper = true 2012-10-16 17:40:48,300 INFO org.apache.giraph.zk.ZooKeeperManager: generateZooKeeperConfigFile: Delete of zoo.cfg = false 2012-10-16 17:40:48,643 INFO org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Attempting to start ZooKeeper server with command [/System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home/bin/java, -Xmx512m, -XX:ParallelGCThreads=4, -XX:+UseConcMarkSweepGC, -XX:CMSInitiatingOccupancyFraction=70, -XX:MaxGCPauseMillis=100, -cp, /tmp/hadoop-ro/mapred/local/taskTracker/ro/jobcache/job_201210161739_0001/jars/job.jar, org.apache.zookeeper.server.quorum.QuorumPeerMain, /tmp/hadoop-ro/mapred/local/taskTracker/ro/jobcache/job_201210161739_0001/work/_bspZooKeeper/zoo.cfg] in directory /tmp/hadoop-ro/mapred/local/taskTracker/ro/jobcache/job_201210161739_0001/work/_bspZooKeeper 2012-10-16 17:40:48,803 INFO org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Connect attempt 0 of 10 max trying to connect to practivate.adobe.com:22181 with poll msecs = 3000 2012-10-16 17:40:48,946 WARN org.apache.giraph.zk.ZooKeeperManager:
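[Editor's note: Avery's diagnosis above comes down to the tasktracker not offering enough simultaneous map slots: a Giraph job needs one slot per worker plus one for the master task. On Hadoop 1.x the relevant setting is the tasktracker's map-slot limit; a minimal mapred-site.xml fragment is sketched below. The value 4 is only an example; size it to at least workers + 1, and check the README referenced above for your Hadoop version.]

```xml
<!-- mapred-site.xml (Hadoop 1.x property name) -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <!-- e.g. 3 workers + 1 master task -->
  <value>4</value>
</property>
```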
Re: Giraph with DB system
Answers inline. On 10/5/12 1:58 AM, Gergely Svigruha wrote: Hi, I have a few questions regarding Giraph. 1) Is it possible to use Giraph for local traversals in the graph? For example, if I want to do some computing on the neighbours of the node with id xy, is it possible to get the reference of the xy vertex (or just send a message to it), then send some messages to its neighbours etc., but not do any computation on any other vertices? In my opinion, this is what graph DBs are for, not a large-scale batch processing system like Giraph. 2) Is it possible to combine Giraph with HBase or any other DBMS? Yes, Giraph can use HBase or another DBMS as a backend storage system (see giraph-formats-contrib/src/main/java/org/apache/giraph/format/hbase/) 3) Is it possible to run Giraph on a server continuously after the graph has been built, and then process several jobs on request? Or can Giraph only be interpreted in the context of a (one) Hadoop job? Again, think of Giraph as a batch processing system. Thanks, and please set me straight if I completely misunderstand something! Hope this helps! Greg
Re: Getting SimpleTriangleClosingVertex to run
I don't think the types are compatible. public class SimpleTriangleClosingVertex extends EdgeListVertex<IntWritable, SimpleTriangleClosingVertex.IntArrayListWritable, NullWritable, IntWritable> You'll need to use an input format and output format that fit these types. Otherwise the issue is likely to be serialization/deserialization here. On 9/23/12 10:44 PM, Vernon Thommeret wrote: I'm trying to get the SimpleTriangleClosingVertex to run, but getting this error: java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException: IPC server unable to read call parameters: null at org.apache.giraph.comm.BasicRPCCommunications.sendPartitionRequest(BasicRPCCommunications.java:923) at org.apache.giraph.graph.BspServiceWorker.loadVertices(BspServiceWorker.java:327) at org.apache.giraph.graph.BspServiceWorker.setup(BspServiceWorker.java:604) at org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:377) at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:578) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:647) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323) at org.apache.hadoop.mapred.Child$4.run(Child.java:270) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157) at org.apache.hadoop.mapred.Child.main(Child.java:264) Caused by: org.apache.hadoop.ipc.RemoteException: IPC server This is the diff that causes the issue: @@ -33,7 +33,7 @@ import org.apache.hadoop.fs.Path; import org.apache.hadoop.io.IntWritable; import org.apache.giraph.graph.GiraphJob; -import org.apache.giraph.graph.IntIntNullIntVertex; +import org.apache.giraph.examples.SimpleTriangleClosingVertex; import org.apache.giraph.io.IntIntNullIntTextInputFormat; import org.apache.giraph.io.AdjacencyListTextVertexOutputFormat; @@ -44,16 +44,12 @@ import org.apache.log4j.Logger; /** * Simple function to return the in degree for
each vertex. */ -public class SharedConnectionsVertex extends IntIntNullIntVertex implements Tool { +public class SharedConnections implements Tool { private Configuration conf; private static final Logger LOG = Logger.getLogger(SharedConnections.class); - public void compute(Iterable<IntWritable> messages) { -voteToHalt(); - } - @Override public final int run(final String[] args) throws Exception { Options options = new Options(); @@ -71,7 +67,7 @@ public class SharedConnections extends IntIntNullIntVertex implements Tool { GiraphJob job = new GiraphJob(getConf(), getClass().getName()); -job.setVertexClass(SharedConnections.class); +job.setVertexClass(SimpleTriangleClosingVertex.class); job.setVertexInputFormatClass(IntIntNullIntTextInputFormat.class); job.setVertexOutputFormatClass(AdjacencyListTextVertexOutputFormat.class); job.setWorkerConfiguration(10, 10, 100.0f); -- I.e. I have a dummy job that just outputs the vertices which works, but trying to switch the vertex class doesn't seem to work. I'm running the latest version of Giraph (rev 1388628). Should this work or should I try something different? Thanks! Vernon
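Avery's diagnosis above is that the vertex class's type parameters must match the input and output formats. The failure mode can be illustrated without any Giraph dependency: a minimal plain-Java sketch (hypothetical, using only java.io) of how a writer/reader disagreement about a value's serialized form surfaces as the kind of "unable to read call parameters" / EOF-style error in the stack trace.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;

// Sketch: the "writer" serializes a single int (think IntWritable), but the
// "reader" expects a larger value (think a mismatched format type). The reader
// runs off the end of the stream and fails with EOFException.
public class TypeMismatchDemo {
    static String demo() {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bos);
            out.writeInt(42);           // writer side: one 4-byte int
            out.flush();
            DataInputStream in = new DataInputStream(
                    new ByteArrayInputStream(bos.toByteArray()));
            in.readInt();               // matches what was written
            in.readLong();              // reader expects 8 more bytes: type mismatch
            return "no error";
        } catch (EOFException e) {
            return "EOFException";      // deserialization failed partway through
        } catch (IOException e) {
            return "unexpected IOException";
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());     // prints EOFException
    }
}
```

The fix in the thread is the same idea in reverse: pick an input/output format whose type parameters line up with the vertex class.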
Please welcome our newest committer, Maja!
The Giraph PMC has voted to extend Maja Kabiljo an offer to be a Giraph committer and she has graciously accepted! Maja has been doing some amazing work in out-of-core messaging and improving aggregators. Here is a list of some of her contributions. GIRAPH-327: Timesout values in BspServiceMaster.barrierOnWorkerList (majakabiljo via ereisman) GIRAPH-323: Check if requests are done before calling wait (majakabiljo via ereisman) GIRAPH-298: Reduce timeout for TestAutoCheckpoint. (majakabiljo via aching) GIRAPH-317: Add subpackages to comm (Maja Kabiljo via ereisman) GIRAPH-313: Open Netty client and server on master. (majakabiljo via aching) GIRAPH-303: Regression: cleanup phase happens earlier than it should. (majakabiljo via apresta) GIRAPH-296: TotalNumVertices and TotalNumEdges are not saved in checkpoint. (majakabiljo via apresta) GIRAPH-297: Checkpointing on master is done one superstep later (majakabiljo via aching). GIRAPH-259: TestBspBasic.testBspPageRank is broken (majakabiljo via apresta) GIRAPH-287: Add option to limit the number of open requests. (Maja Kabiljo via jghoman) GIRAPH-45: Improve the way to keep outgoing messages (majakabiljo via aching). GIRAPH-266: Average aggregators don't calculate real average (majakabiljo via aching). GIRAPH-257: TestBspBasic.testBspMasterCompute is broken (majakabiljo via aching). GIRAPH-81: Create annotations on provided algorithms for cli (majakabiljo via aching). In the spirit of your first commit, Maja, please take a look at https://issues.apache.org/jira/browse/GIRAPH-335 . Welcome Maja and happy Giraphing! Avery Ching
Re: reason behind a java.io.EOFException
) at java.lang.Thread.run(Thread.java:680) On Tue, Sep 11, 2012 at 7:53 AM, Avery Ching ach...@apache.org wrote: These days we are focusing more on the netty IPC. Can you try -Dgiraph.useNetty=true? Avery On 9/10/12 2:08 PM, Franco Maria Nardini wrote: Dear all, I am working with Giraph 0.2/Hadoop 1.0.3. In particular, I am trying to execute the following code: hadoop jar giraph-0.2-SNAPSHOT-for-hadoop-0.20.203.0-jar-with-dependencies.jar org.apache.giraph.GiraphRunner \ org.apache.giraph.examples.SimplePageRankVertex \ -w 2 \ -if org.apache.giraph.examples.SimplePageRankVertex\$SimplePageRankVertexInputFormat -ip bigGraph.txt \ -of org.apache.giraph.io.IdWithValueTextOutputFormat -op output \ -mc org.apache.giraph.examples.SimplePageRankVertex\$HDFSBasedPageRankVertexMasterCompute If I set the number of workers equal to two, one of the mappers produces: java.lang.RuntimeException: java.io.IOException: Call to zipottero.local/172.20.10.3:30001 failed on local exception: java.io.EOFException at org.apache.giraph.comm.BasicRPCCommunications.sendPartitionRequest(BasicRPCCommunications.java:923) at org.apache.giraph.graph.BspServiceWorker.loadVertices(BspServiceWorker.java:327) at org.apache.giraph.graph.BspServiceWorker.setup(BspServiceWorker.java:604) at org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:377) at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:578) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370) at org.apache.hadoop.mapred.Child$4.run(Child.java:255) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121) at org.apache.hadoop.mapred.Child.main(Child.java:249) Caused by: java.io.IOException: Call to zipottero.local/172.20.10.3:30001 failed on local exception: java.io.EOFException at
org.apache.hadoop.ipc.Client.wrapException(Client.java:1107) at org.apache.hadoop.ipc.Client.call(Client.java:1075) at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225) at $Proxy3.putVertexList(Unknown Source) at org.apache.giraph.comm.BasicRPCCommunications.sendPartitionRequest(BasicRPCCommunications.java:920) ... 11 more Caused by: java.io.EOFException at java.io.DataInputStream.readInt(DataInputStream.java:375) at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:804) at org.apache.hadoop.ipc.Client$Connection.run(Client.java:749) while it perfectly works if the number of workers is set to 1. I am experiencing the problem both on small and big graphs. Any idea of the reasons behind this behavior? Thanks a lot in advance. Best, FM -- Franco Maria Nardini High Performance Computing Laboratory Istituto di Scienza e Tecnologie dell’Informazione (ISTI) Consiglio Nazionale delle Ricerche (CNR) Via G. Moruzzi, 1 56124, Pisa, Italy Phone: +39 050 315 3496 Fax: +39 050 315 2040 Mail: francomaria.nard...@isti.cnr.it Skype: francomaria.nardini Web: http://hpc.isti.cnr.it/~nardini/
Re: reason behind a java.io.EOFException
These days we are focusing more on the netty IPC. Can you try -Dgiraph.useNetty=true? Avery On 9/10/12 2:08 PM, Franco Maria Nardini wrote: Dear all, I am working with Giraph 0.2/Hadoop 1.0.3. In particular, I am trying to execute the following code: hadoop jar giraph-0.2-SNAPSHOT-for-hadoop-0.20.203.0-jar-with-dependencies.jar org.apache.giraph.GiraphRunner \ org.apache.giraph.examples.SimplePageRankVertex \ -w 2 \ -if org.apache.giraph.examples.SimplePageRankVertex\$SimplePageRankVertexInputFormat -ip bigGraph.txt \ -of org.apache.giraph.io.IdWithValueTextOutputFormat -op output \ -mc org.apache.giraph.examples.SimplePageRankVertex\$HDFSBasedPageRankVertexMasterCompute If I set the number of workers equal to two, one of the mappers produces: java.lang.RuntimeException: java.io.IOException: Call to zipottero.local/172.20.10.3:30001 failed on local exception: java.io.EOFException at org.apache.giraph.comm.BasicRPCCommunications.sendPartitionRequest(BasicRPCCommunications.java:923) at org.apache.giraph.graph.BspServiceWorker.loadVertices(BspServiceWorker.java:327) at org.apache.giraph.graph.BspServiceWorker.setup(BspServiceWorker.java:604) at org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:377) at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:578) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370) at org.apache.hadoop.mapred.Child$4.run(Child.java:255) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121) at org.apache.hadoop.mapred.Child.main(Child.java:249) Caused by: java.io.IOException: Call to zipottero.local/172.20.10.3:30001 failed on local exception: java.io.EOFException at org.apache.hadoop.ipc.Client.wrapException(Client.java:1107) at org.apache.hadoop.ipc.Client.call(Client.java:1075) at
org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225) at $Proxy3.putVertexList(Unknown Source) at org.apache.giraph.comm.BasicRPCCommunications.sendPartitionRequest(BasicRPCCommunications.java:920) ... 11 more Caused by: java.io.EOFException at java.io.DataInputStream.readInt(DataInputStream.java:375) at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:804) at org.apache.hadoop.ipc.Client$Connection.run(Client.java:749) while it perfectly works if the number of workers is set to 1. I am experiencing the problem both on small and big graphs. Any idea of the reasons behind this behavior? Thanks a lot in advance. Best, FM -- Franco Maria Nardini High Performance Computing Laboratory Istituto di Scienza e Tecnologie dell’Informazione (ISTI) Consiglio Nazionale delle Ricerche (CNR) Via G. Moruzzi, 1 56124, Pisa, Italy Phone: +39 050 315 3496 Fax: +39 050 315 2040 Mail: francomaria.nard...@isti.cnr.it Skype: francomaria.nardini Web: http://hpc.isti.cnr.it/~nardini/
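For readers trying Avery's suggestion in the thread above: a hedged sketch of where -Dgiraph.useNetty=true would go on the command line from the original post, assuming the usual Hadoop convention that -D generic options come before the tool's own arguments (jar name and classes as in the email; remaining arguments elided).

```shell
hadoop jar giraph-0.2-SNAPSHOT-for-hadoop-0.20.203.0-jar-with-dependencies.jar \
  org.apache.giraph.GiraphRunner \
  -Dgiraph.useNetty=true \
  org.apache.giraph.examples.SimplePageRankVertex \
  -w 2 \
  ...
```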
Re: question about Pagerank example
PageRankBenchmark doesn't use an output format. If you'd like to see the output, just add a VertexOutputFormat (that matches the types). You could start with JsonBase64VertexOutputFormat. E.g. in PageRankBenchmark.java add job.setVertexOutputFormatClass(JsonBase64VertexOutputFormat.class); On 7/27/12 5:10 PM, Amir R Abdolrashidi wrote: Hi everyone, I am not sure whether this is the right question or not, but does anyone know if we can see the output of the PageRankBenchmark example that is provided in the tutorial? Thanks -Amir
Re: Adding rb to approved email addresses?
I tried adding the from emails to the d...@giraph.apache.org mailing list. Shouldn't that work? On 7/16/12 12:17 PM, Jakob Homan wrote: I don't believe so. The from list seems reasonable on each one: -- Forwarded message -- From: Avery Ching avery.ch...@gmail.com To: Avery Ching avery.ch...@gmail.com Cc: giraph giraph-...@incubator.apache.org, Alessandro Presta alessan...@fb.com On Mon, Jul 16, 2012 at 12:15 PM, Owen O'Malley omal...@apache.org wrote: On Mon, Jul 16, 2012 at 12:02 PM, Jakob Homan jgho...@gmail.com wrote: Anyone know what needs to be done to get the automated messages reviewboard is sending out whitelisted on the dev list? We're getting moderation requests for every one... Usually, if you use reply-all, it will bless that sender. Is each user showing up as a different sender? -- Owen
Re: Suggestions on problem sizes for giraph performance benchmarking
You should try using the appropriate memory settings (i.e. -Dmapred.child.java.opts=-Xms30g -Xmx30g -Xss128k) for a 30 GB heap. This depends on how much memory you can get. Avery On 7/9/12 5:57 AM, Amani Alonazi wrote: Actually, I had the same problem of running out of memory with Giraph when trying to implement strongly connected components algorithm on Giraph. My input graph is 1 million nodes and 7 million edges. I'm using cluster of 21 computers. On Mon, Jul 9, 2012 at 3:44 PM, Benjamin Heitmann benjamin.heitm...@deri.org mailto:benjamin.heitm...@deri.org wrote: Hello Stephen, sorry for the very late reply. On 28 Jun 2012, at 02:50, Fleischman, Stephen (ISS SCI - Plano TX) wrote: Hello Avery and all: I have a cluster of 10 two-processor/48 GB RAM servers, upon which we are conducting Hadoop performance characterization tests. I plan to use the Giraph pagerank and simple shortest path example tests as part of this exercise and would appreciate guidance on problem sizes for both tests. I’m looking at paring down an obfuscated Twitter dataset and it would save a lot of time if someone has some knowledge on roughly how the time and memory scales with number of nodes in a graph. I can provide some suggestions for the kind of algorithm and data which does currently surpass the scalability of giraph. While the limits to my knowledge of Giraph and Hadoop are probably also to blame for this, please see the recent discussions on this list, and on JIRA for other indications that the scalability of Giraph needs improvement: * post by Yuanyuan Tian in the thread wierd communication errors on user@giraph.apache.org mailto:user@giraph.apache.org * GIRAPH-234 about GC overhead https://issues.apache.org/jira/browse/GIRAPH-234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel If you want to stretch the limits of Giraph, then you need to try an algorithm which is conceptually different from PageRank, and you need a big data set. 
If you use an algorithm which has complex application logic (maybe even domain specific logic), which needs to be embedded in the algorithm, then the nodes need to have a lot of state. In addition, such algorithms probably send around a lot of messages, and each of the messages might have a payload which is more complex than one floating point number. In addition, it helps to have a graph format which requires strings on the edges and vertices. The strings are required for the domain specific business logic which the graph algorithm needs to follow. Finally, imagine a data set which has a big loading time, and where one run of the algorithm only provides results for one user. The standard Hadoop paradigm is to throw away the graph after loading it. So if you have 100s or 1000s of users, then you need a way to execute the algorithm multiple times in parallel. Again, this will add a lot of state, as each of the vertices will need to hold one state object for each user who has visited the vertex. In my specific case, I had the following data and algorithm: Data: * an RDF graph with 10 million vertices and 40 million edges. I used my own import code to map the RDF graph to an undirected graph with a limit of one edge between any two nodes (so it was not a multi-graph) * each vertex and each edge uses a string as an identity to represent a URI in the RDF graph (required for the business logic in the algorithm) Algorithm: * spreading activation. You can think of it as depth-first search guided by domain specific logic. A short introduction here: https://en.wikipedia.org/wiki/Spreading_activation The Wikipedia article only mentions using spreading activation on weighted graphs; however, I used it on graphs which have additional types on the edges. The whole area of using the semantics of the edges to guide the algorithm is an active research topic, so that's why I can't point you to a good article on that.
* parallel execution: I need to run the algorithm once for every user in the system, however loading the data set takes around 15 minutes alone. So each node has an array of states, one for each user for which the algorithm has visited a node. I experimented with user numbers between 30 and 1000, anything more did not work for concurrent execution of the algorithm. Infrastructure: * a single server with 24 Intel Xeon 2.4 GHz cpus and 96 GB of RAM * Hadoop 1.0, pseudo-distributed setup * between 10 and 20 Giraph workers A few weeks ago I stopped work on my Giraph based implementation, as Giraph ran out of memory almost immediately after loading and initialising the data. I made sure that the Giraph workers do not run out of
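Avery's suggestion at the top of this thread translates to passing the JVM options through the MR1 property mapred.child.java.opts when launching the job. A hedged command-line sketch (jar and benchmark class names, heap sizes, and remaining arguments are illustrative; size the heap to what your task slots can actually get):

```shell
hadoop jar giraph-jar-with-dependencies.jar \
  org.apache.giraph.benchmark.PageRankBenchmark \
  -Dmapred.child.java.opts="-Xms30g -Xmx30g -Xss128k" \
  ...
```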
Apache Giraph BOARD report for 7/25 meeting
Status report for the Apache Giraph project - July 2012 Giraph is a Bulk Synchronous Parallel framework for writing programs that analyze large graphs on a Hadoop cluster. Giraph is similar to Google's Pregel system. Project Status -- Releases: 0.2.0 - expected 7/31 * Reduce memory consumption * Improve support for the Green-Marl project. The transition to being a full Apache project is nearly complete (still a few references to incubator on the website). Community - Activity has picked up on Apache Giraph, more contributors seem to be gaining interest, and we had 24 commits in June. We should try to convert some contributors to committers soon. Mailing lists: 116 subscribers on dev 155 subscribers on user
Re: Problem with zookeeper setup
If you're running without a real Hadoop instance, you'll need to blow away the zk directories after running the first time. Hope that helps, Avery On 6/19/12 5:39 PM, Jonathan Bishop wrote: Hi, I am exploring Giraph 0.1 and was able to download, build, and run all the tests - all 58 passed. I can also run the SimpleShortestPathsVertex test using the supplied giraph jar. However, when I copy the java src file into eclipse and build my own jar I get the following error which leads me to believe that something is going wrong with the ZK setup. 12/06/19 17:31:31 INFO mapred.JobClient: Running job: job_201206191708_0003 12/06/19 17:31:32 INFO mapred.JobClient: map 0% reduce 0% 12/06/19 17:32:14 INFO mapred.JobClient: Task Id : attempt_201206191708_0003_m_00_0, Status : FAILED java.lang.IllegalStateException: run: Caught an unrecoverable exception onlineZooKeeperServers: Failed to connect in 10 tries! at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:641) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370) at org.apache.hadoop.mapred.Child$4.run(Child.java:255) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:416) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059) at org.apache.hadoop.mapred.Child.main(Child.java:249) Caused by: java.lang.IllegalStateException: onlineZooKeeperServers: Failed to connect in 10 tries! at org.apache.giraph.zk.ZooKeeperManager.onlineZooKeeperServers(ZooKeeperManager.java:658) at org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:409) at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:630) ... 7 more attempt_201206191708_0003_m_00_0: log4j:WARN No appenders could be found for logger (org.apache.giraph.zk.ZooKeeperManager). attempt_201206191708_0003_m_00_0: log4j:WARN Please initialize the log4j system properly. 
BTW, I needed to add the following line to get this to run from my own jar file... job.setJarByClass(SimpleShortestPathsVertex.class) Not sure if that is related, but it seems that it will not run without this (it cannot find SimpleShortestPathsVertex). Thanks, Jon Bishop
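Avery's "blow away the zk directories" step can be sketched as below. The path here is an assumption for illustration only -- in a local or pseudo-distributed run, take the real location from the _bspZooKeeper directory that ZooKeeperManager prints in the task logs (under the task tracker's job cache in /tmp by default).

```shell
# Hypothetical location; copy the real one from the ZooKeeperManager log line.
ZK_DIR="/tmp/giraph-demo/_bspZooKeeper"

mkdir -p "$ZK_DIR"            # stand-in for a previous run's leftover state
rm -rf "$ZK_DIR"              # clear stale ZooKeeper config/data before re-running
[ ! -d "$ZK_DIR" ] && echo "zookeeper state cleared"
```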
Re: SimplePageRankVertex implementation, dangling nodes and sending messages to all nodes...
We did have a related issue (https://issues.apache.org/jira/browse/GIRAPH-155). On 5/29/12 6:54 AM, Claudio Martella wrote: I'm not sure they will be needed to send them on the first superstep. They'll be created and used in the second superstep if necessary. If they need it in the first superstep, then I guess they'll put them as a line in the input file. I agree with you that this is kind of messed up :) On Tue, May 29, 2012 at 3:23 PM, Sebastian Schelter s...@apache.org wrote: Oh sorry, I didn't know that discussion. The problem I see is that in every implementation, a user might run into this issue, and I don't think it's ideal to force users to always run a round of sending empty messages at the beginning. Maybe the system should (somehow) automagically do that for the users? Really seems to be an awkward situation though... --sebastian On 29.05.2012 15:03, Claudio Martella wrote: About the mapreduce job to prepare the input set, I did advocate for this solution instead of supporting automatic creation of non-existent vertices implicitly (which I believe adds a logical path in vertex resolution which has some drawbacks, e.g. you have to check in the hashmap for the existence of the destination vertex for each message, which is fine now that it's a hashmap, but it's going to be less fine when/if we turn to TreeMap for out-of-core). Unfortunately the other committers preferred going for the path that helps userland's life, so I guess this solution is not to be considered here either. On Tue, May 29, 2012 at 1:48 PM, Sebastian Schelter s...@apache.org wrote: On 29.05.2012 13:13, Paolo Castagna wrote: Hi Sebastian Sebastian Schelter wrote: Why do you only recompute the pageRank in each second superstep? Can we not use the aggregated value of the dangling nodes from the last superstep? I removed the computing of PageRank values every second superstep.
However, I needed to use a couple of aggregators for the dangling nodes contribution instead of just one: dangling-current and dangling-previous. Each superstep, I need to reset the dangling-current aggregator; at the same time, I need to know the value of the aggregator at the previous superstep. You can save the value from the previous step in a static variable in the WorkerContext before resetting the aggregator. I hope it makes sense, let me know if you have a better idea. Overall I think we're well on our way to a robust, real-world PageRank implementation. I managed to implement the convergence check with an aggregator, will post an updated patch soon. I think I've just done it, have a look [1] and let me know if you would have done it differently. Paolo [1] https://github.com/castagna/jena-grande/blob/11f07dd897562f7a4bf8d6e4845128d7f2cdd2ff/src/main/java/org/apache/jena/grande/giraph/pagerank/PageRankVertex.java#L90
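The two-aggregator ("dangling-current" / "dangling-previous") bookkeeping discussed above can be illustrated without Giraph. A plain-Java sketch of the double-buffering idea (names hypothetical; in real Giraph code the swap would go through aggregators and the WorkerContext rather than local variables):

```java
// Sketch of double-buffered aggregation across supersteps: "previous" is what
// vertices read during a superstep; "current" is reset at the start of each
// superstep and refilled with that step's dangling-node mass, then becomes
// "previous" at the superstep barrier.
public class DanglingMassDemo {
    static double[] simulate(double danglingMassPerStep, int supersteps) {
        double previous = 0.0;                  // aggregated in the last superstep
        double[] seen = new double[supersteps]; // what vertices observe each step
        for (int s = 0; s < supersteps; s++) {
            seen[s] = previous;                 // vertices use last step's total
            double current = 0.0;               // reset "dangling-current"
            current += danglingMassPerStep;     // dangling vertices aggregate their mass
            previous = current;                 // swap at the superstep barrier
        }
        return seen;
    }

    public static void main(String[] args) {
        double[] seen = simulate(0.15, 3);
        // Superstep 0 sees no dangling mass yet; later supersteps see 0.15.
        System.out.println(seen[0] + " " + seen[1] + " " + seen[2]);
    }
}
```

This is why the first superstep needs special handling (or one warm-up round): there is no "previous" value to read yet.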
Re: Giraph on Hadoop 2.0.0-alpha
Did you compile with the appropriate flags? From the README: - Apache Hadoop 0.23.1 You may tell maven to use this version with mvn -Phadoop_0.23 goals. On 5/25/12 9:24 AM, Roman Shaposhnik wrote: Hi! I'm trying to run Giraph trunk on top of Hadoop 2.0.0 and I'm getting the following error while submitting an example job: $ hadoop jar /usr/lib/giraph/giraph-jar-with-dependencies.jar org.apache.giraph.benchmark.PageRankBenchmark -e 1 -s 3 -V 10 -w 3 Now, if I look at the state of HDFS right after the job fails I see that the job has created file structure all the way up to _bsp/_defaultZkManagerDir/job_1337959594450_0002/ I even see _bsp/_defaultZkManagerDir/job_1337959594450_0002/zkServerList_ahmed-laptop 0 so it is unlikely to be file permission problems or anything like that. Could you, please, suggest a way to debug it from here? Oh, and here's the exception I'm getting: 2012-05-25 08:31:34,335 INFO [IPC Server handler 16 on 33249] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Diagnostics report from attempt_1337959594450_0002_m_01_3: Error: java.lang.RuntimeException: java.io.FileNotFoundException: File _bsp/_defaultZkManagerDir/job_1337959594450_0002/_zkServer does not exist. 
at org.apache.giraph.zk.ZooKeeperManager.onlineZooKeeperServers(ZooKeeperManager.java:748) at org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:424) at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:645) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:725) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:152) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:147) Caused by: java.io.FileNotFoundException: File _bsp/_defaultZkManagerDir/job_1337959594450_0002/_zkServer does not exist. at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:365) at org.apache.giraph.zk.ZooKeeperManager.onlineZooKeeperServers(ZooKeeperManager.java:707) ... 9 more Thanks, Roman.
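Following Avery's pointer to the README in this thread, a hedged sketch of the corresponding build command (profile name as quoted from the README; the exact Maven goals are an example):

```shell
mvn -Phadoop_0.23 clean package -DskipTests
```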