[ANNOUNCE] Apache Giraph 1.0.0 released
The Apache Giraph team is proud to announce our first release out of incubation. The release version is 1.0.0 to reflect the hard work that went into making the code stable enough for production use, memory efficient, and performant.

Apache Giraph is a scalable, distributed, iterative graph processing system inspired by BSP (bulk synchronous parallel) and Google's Pregel. Giraph distinguishes itself from those projects by being open source, running on Hadoop infrastructure, and going beyond the Pregel model with features such as master computation, sharded aggregators, out-of-core support, a design with no single point of failure, and more.

Here are some highlights of the release:

* Scales out easily to hundreds of machines and hundreds of billions of edges (memory permitting)
* Efficient use of memory via fast byte-based serialization by default, with primitive-specific types available when better performance is required
* Multithreaded input and computation that take advantage of multicore machines efficiently
* Simplified vertex API
* Vertex-based and/or edge-based input supported
* Master compute API for handling application-wide logic
* Sharded aggregators for handling large (memory) aggregators
* Easy access to/from Hive tables to integrate with your data warehouse
* Out-of-core graph and messaging support
* YARN support

For release details and download access, please visit: http://giraph.apache.org/releases.html

Thanks so much to everyone for all their contributions. It is you who made this release possible! We've also been investing in updating our website as part of this release (http://giraph.apache.org); more documentation and updates will be coming in the near future. We expect that releases will happen more frequently now that we are more familiar with the process.

Regards,
The Apache Giraph team
Giraph and Fair Scheduler
Hi,

I am running the Fair Scheduler with many applications from the Hadoop stack on my cluster (Pig, Hive, HBase, etc.). I have dedicated a pool to Giraph and want to run Giraph alongside those other applications. I have configured pre-emption and set "minsharepreemptiontimeout=5" (the number of seconds that jobs submitted to this pool wait to get the min share).

I am trying to run Giraph in this mode. I see that jobs from other pools are being pre-empted to give the Giraph job's pool its configured min share, but my job fails with an "Unable to create native thread" error. The same job passes if the slots are available immediately, without having to wait for tasks from other queues to be pre-empted. I also tried setting "giraph.minPercentResponded=50.0f". My Giraph job still fails.

Please help with this scenario. Basically, I want to know how to configure Giraph to wait, up to some threshold, for slots to become available to it through pre-emption.

Thanks
Arun Ramani
TestJsonBase64Format failure on 1.0.0
I got over my compilation issues (thanks, @Avery and @Roman). Now I am trying to run the tests, and one particular test is failing. I want to get to the bottom of this because I am unable to run the PageRank example. Maybe it is because I have only one tasktracker (?) (Apache pseudo-cluster on my Ubuntu laptop).

Regards,
- kiru

2013-05-06 09:12:32,197 INFO org.apache.hadoop.mapred.JobTracker: Adding task (MAP) 'attempt_201305052325_0013_m_03_0' to tip task_201305052325_0013_m_03, for tracker 'tracker_kiru-N53SV:localhost/127.0.0.1:42265'
2013-05-06 09:12:35,198 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201305052325_0013_m_02_0: java.lang.IllegalStateException: run: Caught an unrecoverable exception waitFor: ExecutionException occurred while waiting for org.apache.giraph.utils.ProgressableUtils$FutureWaitable@482d59a3
    at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:102)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.lang.IllegalStateException: waitFor: ExecutionException occurred while waiting for org.apache.giraph.utils.ProgressableUtils$FutureWaitable@482d59a3
    at org.apache.giraph.utils.ProgressableUtils.waitFor(ProgressableUtils.java:151)
    at org.apache.giraph.utils.ProgressableUtils.waitForever(ProgressableUtils.java:111)
    at org.apache.giraph.utils.ProgressableUtils.getFutureResult(ProgressableUtils.java:73)
    at org.apache.giraph.utils.ProgressableUtils.getResultsWithNCallables(ProgressableUtils.java:192)
    at org.apache.giraph.worker.BspServiceWorker.loadInputSplits(BspServiceWorker.java:276)
    at org.apache.giraph.worker.BspServiceWorker.loadVertices(BspServiceWorker.java:323)
    at org.apache.giraph.worker.BspServiceWorker.setup(BspServiceWorker.java:506)
    at org.apache.giraph.graph.GraphTaskManager.execute(GraphTaskManager.java:230)
    at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:92)
    ... 7 more
Caused by: java.util.concurrent.ExecutionException: java.lang.IllegalStateException: call: IOException
    at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:232)
    at java.util.concurrent.FutureTask.get(FutureTask.java:91)
    at org.apache.giraph.utils.ProgressableUtils$FutureWaitable.waitFor(ProgressableUtils.java:271)
    at org.apache.giraph.utils.ProgressableUtils.waitFor(ProgressableUtils.java:143)
    ... 15 more
Caused by: java.lang.IllegalStateException: call: IOException
    at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:172)
    at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:58)
    at org.apache.giraph.utils.LogStacktraceCallable.call(LogStacktraceCallable.java:51)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.FileNotFoundException: /tmp/_giraphTests/testContinue/_logs (Is a directory)
    at java.io.FileInputStream.open(Native Method)
    at java.io.FileInputStream.<init>(FileInputStream.java:120)
    at org.apache.hadoop.fs.RawLocalFileSystem$TrackingFileInputStream.<init>(RawLocalFileSystem.java:71)
    at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileInputStream.<init>(RawLocalFileSystem.java:107)
    at org.apache.hadoop.fs.RawLocalFileSystem.open(RawLocalFileSystem.java:177)
    at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:126)
    at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:283)
    at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:427)
    at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.initialize(LineRecordReader.java:67)
    at org.apache.giraph.io.formats.TextVertexInputFormat$TextVertexReader.initialize(TextVertexInputFormat.java:96)
    at org.apache.giraph.io.formats.JsonBase64VertexInputFormat$JsonBase64VertexReader.initialize(JsonBase64VertexInputFormat.java:71)
    at org.apache.giraph.worker.VertexInputSplitsCallable.readInputSplit(VertexInputSplitsCallable.java:120)
    at org.apache.giraph.worker.InputSplitsCallable.loadInputSplit(InputSplitsCallable.java:220)
    at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:161)
    ... 7 more
2013-05-06 09:22:44,485 INFO org.apache.hadoop.mapred.TaskInProgress:
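
The innermost cause in the trace above is a java.io.FileNotFoundException on /tmp/_giraphTests/testContinue/_logs with the note "(Is a directory)", which suggests the record reader was handed a directory where it expected a file. The following is a minimal sketch, using only the JDK, that reproduces this class of failure and shows one possible way to skip such entries. The class and method names (DirectoryAsInputDemo, isHiddenName, failsToOpen) are hypothetical and are not part of Giraph or Hadoop. The name filter mirrors the Hadoop convention of treating names that start with "_" or "." as hidden; whether the failing test should apply such a filter is an assumption, not something the log establishes.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.file.Files;

public class DirectoryAsInputDemo {

    /** Hypothetical helper: true for names that start with "_" or ".". */
    public static boolean isHiddenName(String name) {
        return name.startsWith("_") || name.startsWith(".");
    }

    /** Returns true if opening the path as a plain file fails with FileNotFoundException. */
    public static boolean failsToOpen(File path) throws IOException {
        try (FileInputStream in = new FileInputStream(path)) {
            return false; // opened successfully, so it is a readable file
        } catch (FileNotFoundException e) {
            return true;  // the JDK reports a directory (or missing path) this way
        }
    }

    public static void main(String[] args) throws IOException {
        // Build a throwaway input directory containing a log directory and a data file.
        File input = Files.createTempDirectory("giraphDemo").toFile();
        File logs = new File(input, "_logs");
        File part = new File(input, "part-m-00000");
        logs.mkdir();
        part.createNewFile();

        File[] entries = input.listFiles();
        if (entries == null) {
            return; // listing failed; nothing to show
        }
        for (File entry : entries) {
            System.out.println(entry.getName()
                + " hidden=" + isHiddenName(entry.getName())
                + " failsToOpen=" + failsToOpen(entry));
        }
    }
}
```

If the filter is applied before opening, the _logs directory is never passed to the reader; the sketch only illustrates the mechanism and does not claim this is where the fix belongs in Giraph.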
Re: Compiling 1.0.0 distribution
Yes, I am trying to run on my Ubuntu laptop. Let me look at the log files. Thanks for the help. Much appreciated.

Regards,
- kiru
Kiru Pakkirisamy | webcloudtech.wordpress.com

--- On Sun, 5/5/13, Avery Ching wrote:
From: Avery Ching
Subject: Re: Compiling 1.0.0 distribution
To: user@giraph.apache.org
Cc: "Kiru Pakkirisamy"
Date: Sunday, May 5, 2013, 11:51 PM

My guess is that you don't have enough workers to run the job and the master kills the job (i.e. are you running on a single machine setup?). You can try to run first with one worker (this will take 2 map slots - one for the master and one for the worker). You can also look at the logs from map task 0 to see more clearly what the error was.

Avery

On 5/5/13 11:16 PM, Kiru Pakkirisamy wrote:

Yup, I did a mvn3 install and then a mvn3 compile to get around that already. Right now, I am trying to run the PageRank, even after a few runs I have not had one successful run. The maps progress decreases in percentage (second time around)!! I have never seen this before (?)

Regards,
- kiru
Kiru Pakkirisamy | webcloudtech.wordpress.com

--- On Sun, 5/5/13, Roman Shaposhnik wrote:
From: Roman Shaposhnik
Subject: Re: Compiling 1.0.0 distribution
To: user@giraph.apache.org
Date: Sunday, May 5, 2013, 10:50 PM

To pile on top of that -- you can also run mvn -pl module-name from the top level to short-circuit the build to that module (and yet still honor the dependencies).

Thanks,
Roman.

On Sun, May 5, 2013 at 10:44 PM, Avery Ching wrote:

The easiest way is to compile from the base directory, which will build everything. You can build individual directories, but you have to install the core jars first (i.e. go to giraph-core and do 'mvn clean install'). Then you can build the directory of your choice.

Hope that helps,
Avery

On 5/5/13 11:11 AM, Kiru Pakkirisamy wrote:

Hi,
I am unable to compile giraph-examples because it is not able to reach the core jar files on the repo. Why doesn't it pick it up from the root build dir?

Regards,
- kiru
Kiru Pakkirisamy | webcloudtech.wordpress.com
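
Avery's note in this thread that one worker takes two map slots (one for the master and one for the worker) can be turned into a quick check before submitting a job. The sketch below is only an illustration of that arithmetic, assuming the master always occupies its own slot. The class and method names (MapSlotEstimate, requiredMapSlots, fits) and the slot count used in main are hypothetical and are not part of Giraph or Hadoop.

```java
public class MapSlotEstimate {

    /** Map slots needed, assuming one slot for the master plus one per worker. */
    public static int requiredMapSlots(int workers) {
        if (workers < 1) {
            throw new IllegalArgumentException("need at least one worker");
        }
        return workers + 1;
    }

    /** True if the job's master and all workers can be scheduled at the same time. */
    public static boolean fits(int workers, int availableMapSlots) {
        return requiredMapSlots(workers) <= availableMapSlots;
    }

    public static void main(String[] args) {
        int availableMapSlots = 2; // hypothetical slot count for a single-machine setup
        for (int workers = 1; workers <= 3; workers++) {
            System.out.println(workers + " worker(s) need "
                + requiredMapSlots(workers) + " map slots; fits="
                + fits(workers, availableMapSlots));
        }
    }
}
```

On a single-machine setup with few map slots, a check like this makes it easy to see why asking for more than one worker may leave the job without enough slots to start all of its tasks.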