[ANNOUNCE] Apache Giraph 1.0.0 released

2013-05-06 Thread Avery Ching
The Apache Giraph team is proud to announce our first release out of 
incubation.  The release version is 1.0.0 to reflect a lot of hard work 
that went into making the code stable enough for production use, memory 
efficient, and performant.


Apache Giraph is a scalable and distributed iterative graph processing 
system that is inspired by BSP (bulk synchronous parallel) and Google's 
Pregel.  Giraph distinguishes itself from those projects by being 
open-source, running on Hadoop infrastructure, and going beyond the 
Pregel model with features such as master computation, sharded 
aggregators, out-of-core support, no single point of failure design, and 
more.
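
For readers new to the BSP model mentioned above, here is a minimal sketch of
superstep-style computation in plain Java. It does not use the Giraph API; the
class name, the method name, and the max-value example are all illustrative
assumptions. Each vertex reads the messages sent to it in the previous
superstep, updates its value, and sends messages only when its value changed.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a single-process simulation of BSP supersteps,
// not Giraph code. All names here are hypothetical.
final class BspMaxValueSketch {

    /**
     * Propagates the largest vertex value to every reachable vertex.
     * neighbors[v] lists the vertices that v sends messages to.
     * Updates values in place and returns the number of supersteps run.
     */
    static int run(int[][] neighbors, long[] values) {
        int n = values.length;
        List<List<Long>> inbox = emptyBoxes(n);
        int superstep = 0;
        while (true) {
            List<List<Long>> outbox = emptyBoxes(n);
            boolean sent = false;
            for (int v = 0; v < n; v++) {
                // Compute phase: fold incoming messages into the vertex value.
                boolean changed = false;
                for (long message : inbox.get(v)) {
                    if (message > values[v]) {
                        values[v] = message;
                        changed = true;
                    }
                }
                // Communication phase: messages are delivered next superstep.
                if (superstep == 0 || changed) {
                    for (int u : neighbors[v]) {
                        outbox.get(u).add(values[v]);
                        sent = true;
                    }
                }
            }
            superstep++;
            // Barrier: stop once a superstep produces no messages at all.
            if (!sent) {
                return superstep;
            }
            inbox = outbox;
        }
    }

    private static List<List<Long>> emptyBoxes(int n) {
        List<List<Long>> boxes = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            boxes.add(new ArrayList<>());
        }
        return boxes;
    }

    public static void main(String[] args) {
        int[][] path = {{1}, {0, 2}, {1}};   // path graph 0 - 1 - 2
        long[] values = {3, 6, 2};
        int steps = run(path, values);
        System.out.println(java.util.Arrays.toString(values) + " after " + steps + " supersteps");
    }
}
```

In a real deployment the vertices are partitioned across workers and the
barrier is coordinated by the master; the loop above only mimics that
structure on one machine.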


Here are some highlights of the release:

* Scales out to hundreds of machines easily and hundreds of billions of 
edges (memory permitting)
* Efficient use of memory via fast byte-based serialization by default, 
with primitive-specific types available when better performance is required
* Multithreaded input and computation can take advantage of multicore 
machines efficiently

* Simplified vertex API
* Vertex-based and/or edge-based input supported
* Master compute API for handling application-wide logic
* Sharded aggregators for handling large (memory) aggregators
* Easy access to/from Hive tables to integrate with your data warehouse
* Out-of-core graph and messaging support
* YARN support

For release details and download access, please visit:
http://giraph.apache.org/releases.html

Thanks so much to everyone for all their contributions.  It is you who 
made this release possible!  We've also been investing in updating our 
website as part of this release (http://giraph.apache.org); more 
documentation/updates will be coming in the near future.  We expect that 
releases will happen more frequently in the future now that we are more 
familiar with the process.


Regards,

The Apache Giraph team


Giraph and Fair Scheduler

2013-05-06 Thread Ramani, Arun
Hi,

I am running the Fair Scheduler with many applications from the Hadoop stack in 
my cluster (Pig, Hive, HBase, etc.). I have dedicated a pool to Giraph and 
want to run Giraph alongside those other applications. I have configured 
pre-emption and set "minsharepreemptiontimeout=5" (in seconds: how long jobs 
submitted to this pool wait to get the min share).

I am trying to run Giraph in this mode. I see that jobs from other pools are 
getting pre-empted to give the Giraph job's pool its configured min share, but 
my job fails with an "Unable to create native thread" error. The same job 
passes if the slots are available immediately, without having to wait for 
tasks from other queues to be pre-empted. I also tried setting 
"giraph.minPercentResponded=50.0f", but my Giraph job still fails. Please help 
with this scenario.

Basically, I wanted to know how to configure giraph to wait for a threshold for 
the slots to be available for it through pre-emption.

Thanks
Arun Ramani


TestJsonBase64Format failure on 1.0.0

2013-05-06 Thread Kiru Pakkirisamy
I got over my compilation issues (thanks - @Avery, @Roman). Now, I am trying to 
run the tests and one particular test is failing. I want to get to the bottom 
of this, because I am unable to run the PageRank example. Maybe it is because I 
have only one tasktracker (?) (Apache pseudo-cluster on my Ubuntu laptop).

Regards,
- kiru

2013-05-06 09:12:32,197 INFO org.apache.hadoop.mapred.JobTracker: Adding task (MAP) 'attempt_201305052325_0013_m_03_0' to tip task_201305052325_0013_m_03, for tracker 'tracker_kiru-N53SV:localhost/127.0.0.1:42265'
2013-05-06 09:12:35,198 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201305052325_0013_m_02_0: java.lang.IllegalStateException: run: Caught an unrecoverable exception waitFor: ExecutionException occurred while waiting for org.apache.giraph.utils.ProgressableUtils$FutureWaitable@482d59a3
    at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:102)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.lang.IllegalStateException: waitFor: ExecutionException occurred while waiting for org.apache.giraph.utils.ProgressableUtils$FutureWaitable@482d59a3
    at org.apache.giraph.utils.ProgressableUtils.waitFor(ProgressableUtils.java:151)
    at org.apache.giraph.utils.ProgressableUtils.waitForever(ProgressableUtils.java:111)
    at org.apache.giraph.utils.ProgressableUtils.getFutureResult(ProgressableUtils.java:73)
    at org.apache.giraph.utils.ProgressableUtils.getResultsWithNCallables(ProgressableUtils.java:192)
    at org.apache.giraph.worker.BspServiceWorker.loadInputSplits(BspServiceWorker.java:276)
    at org.apache.giraph.worker.BspServiceWorker.loadVertices(BspServiceWorker.java:323)
    at org.apache.giraph.worker.BspServiceWorker.setup(BspServiceWorker.java:506)
    at org.apache.giraph.graph.GraphTaskManager.execute(GraphTaskManager.java:230)
    at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:92)
    ... 7 more
Caused by: java.util.concurrent.ExecutionException: java.lang.IllegalStateException: call: IOException
    at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:232)
    at java.util.concurrent.FutureTask.get(FutureTask.java:91)
    at org.apache.giraph.utils.ProgressableUtils$FutureWaitable.waitFor(ProgressableUtils.java:271)
    at org.apache.giraph.utils.ProgressableUtils.waitFor(ProgressableUtils.java:143)
    ... 15 more
Caused by: java.lang.IllegalStateException: call: IOException
    at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:172)
    at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:58)
    at org.apache.giraph.utils.LogStacktraceCallable.call(LogStacktraceCallable.java:51)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.FileNotFoundException: /tmp/_giraphTests/testContinue/_logs (Is a directory)
    at java.io.FileInputStream.open(Native Method)
    at java.io.FileInputStream.<init>(FileInputStream.java:120)
    at org.apache.hadoop.fs.RawLocalFileSystem$TrackingFileInputStream.<init>(RawLocalFileSystem.java:71)
    at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileInputStream.<init>(RawLocalFileSystem.java:107)
    at org.apache.hadoop.fs.RawLocalFileSystem.open(RawLocalFileSystem.java:177)
    at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:126)
    at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:283)
    at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:427)
    at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.initialize(LineRecordReader.java:67)
    at org.apache.giraph.io.formats.TextVertexInputFormat$TextVertexReader.initialize(TextVertexInputFormat.java:96)
    at org.apache.giraph.io.formats.JsonBase64VertexInputFormat$JsonBase64VertexReader.initialize(JsonBase64VertexInputFormat.java:71)
    at org.apache.giraph.worker.VertexInputSplitsCallable.readInputSplit(VertexInputSplitsCallable.java:120)
    at org.apache.giraph.worker.InputSplitsCallable.loadInputSplit(InputSplitsCallable.java:220)
    at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:161)
    ... 7 more
2013-05-06 09:22:44,485 INFO org.apache.hadoop.mapred.TaskInProgress:
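
The innermost cause in the trace is a FileNotFoundException with the message
"(Is a directory)": the reader tried to open the _logs directory as if it were
an input file. The sketch below reproduces that behaviour with plain
java.io and shows one possible guard, assuming the usual Hadoop convention
that names starting with "_" or "." are treated as hidden. The class and
method names are hypothetical, and this is not a confirmed fix for the failing
test.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.UncheckedIOException;

// Illustrative only; none of these names come from Giraph or Hadoop.
final class DirectoryOpenDemo {

    /** Returns true if the path can be opened as a regular file stream. */
    static boolean opensAsFile(File path) {
        try (FileInputStream in = new FileInputStream(path)) {
            return true;
        } catch (FileNotFoundException e) {
            // Thrown for missing files and also for directories.
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Assumed convention: names beginning with "_" or "." are not input data. */
    static boolean isHiddenName(String name) {
        return name.startsWith("_") || name.startsWith(".");
    }

    public static void main(String[] args) {
        File tmp = new File(System.getProperty("java.io.tmpdir"));
        System.out.println("temp dir opens as file: " + opensAsFile(tmp));
        System.out.println("_logs is hidden: " + isHiddenName("_logs"));
    }
}
```

A guard of this kind would skip entries such as _logs when listing an output
directory that is later reused as input; whether that is the right place to
intervene here depends on how the test builds its input path.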

Re: Compiling 1.0.0 distribution

2013-05-06 Thread Kiru Pakkirisamy
Yes, I am trying to run on my Ubuntu laptop. Let me look at the log files. 
Thanks for the help. Much appreciated.

Regards,

- kiru



Kiru Pakkirisamy | webcloudtech.wordpress.com

--- On Sun, 5/5/13, Avery Ching wrote:

From: Avery Ching
Subject: Re: Compiling 1.0.0 distribution
To: user@giraph.apache.org
Cc: "Kiru Pakkirisamy"
Date: Sunday, May 5, 2013, 11:51 PM

My guess is that you don't have enough workers to run the job and the master
kills the job (i.e. are you running on a single machine setup?).  You can try
to run first with one worker (this will take 2 map slots - one for the master
and one for the worker).  You can also look at the logs from map task 0 to see
more clearly what the error was.

  Avery
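
The slot arithmetic in the reply above (one map slot for the master plus one
per worker) can be sketched as follows. The class and method names are
hypothetical, and the sketch assumes the master runs as its own map task, as
described in the message.

```java
// Illustrative only: a tiny helper for the "workers + 1" slot count.
final class SlotMath {

    /** Map slots needed when the master occupies its own map task. */
    static int mapSlotsNeeded(int workers) {
        if (workers < 1) {
            throw new IllegalArgumentException("need at least one worker");
        }
        return workers + 1;
    }

    /** True if a job with the given worker count fits in the available slots. */
    static boolean fits(int workers, int availableMapSlots) {
        return mapSlotsNeeded(workers) <= availableMapSlots;
    }

    public static void main(String[] args) {
        // Assumption: a single tasktracker offering 2 map slots.
        int availableMapSlots = 2;
        for (int workers = 1; workers <= 3; workers++) {
            System.out.println(workers + " worker(s) need "
                + mapSlotsNeeded(workers) + " slots, fits="
                + fits(workers, availableMapSlots));
        }
    }
}
```

On a single-machine setup with 2 map slots, only the one-worker configuration
fits, which matches the suggestion to start with one worker.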

On 5/5/13 11:16 PM, Kiru Pakkirisamy wrote:

Yup, I did a mvn3 install and then a mvn3 compile to get around that already.
Right now, I am trying to run the PageRank example; even after a few runs I
have not had one successful run. The map progress percentage decreases the
second time around! I have never seen this before (?)

Regards,
- kiru

Kiru Pakkirisamy | webcloudtech.wordpress.com

--- On Sun, 5/5/13, Roman Shaposhnik wrote:

From: Roman Shaposhnik
Subject: Re: Compiling 1.0.0 distribution
To: user@giraph.apache.org
Date: Sunday, May 5, 2013, 10:50 PM

To pile on top of that -- you can also run mvn -pl module-name from the top
level to short-circuit the build to that module (and yet still honor the
dependencies).

Thanks,
Roman.

On Sun, May 5, 2013 at 10:44 PM, Avery Ching wrote:

The easiest way is to compile from the base directory, which will build
everything.

You can build individual directories, but you have to install the core jars
first (i.e. go to giraph-core and do 'mvn clean install').  Then you can build
the directory of your choice.

Hope that helps,

Avery

On 5/5/13 11:11 AM, Kiru Pakkirisamy wrote:

Hi,
I am unable to compile giraph-examples because it is not able to reach the core
jar files on the repo. Why doesn't it pick it up from the root build dir?

Regards,
- kiru

Kiru Pakkirisamy | webcloudtech.wordpress.com