Re: [RESULT] [VOTE] Apache Giraph 1.1.0 RC2

2014-11-18 Thread Avery Ching

Thanks for pushing this through, Roman. Looks great!

On 11/18/14, 4:30 AM, Roman Shaposhnik wrote:

Hi!

With 3 binding +1s, one non-binding +1, and
no 0s or -1s, the vote to publish
Apache Giraph 1.1.0 RC2 as the 1.1.0 release of
Apache Giraph passes. Thanks to everybody who
spent time validating the bits!

The vote tally is
   +1s:
   Claudio Martella (binding)
   Maja Kabiljo (binding)
   Eli Reisman (binding)
   Roman Shaposhnik  (non-binding)

I'll do the publishing tonight and will send an announcement!

Thanks,
Roman (AKA 1.1.0 RM)

On Thu, Nov 13, 2014 at 5:28 AM, Roman Shaposhnik ro...@shaposhnik.org wrote:

This vote is for Apache Giraph, version 1.1.0 release

It fixes the following issues:
   http://s.apache.org/a8X

*** Please download, test and vote by Mon 11/17 noon PST

Note that we are voting upon the source (tag):
release-1.1.0-RC2

Source and binary files are available at:
http://people.apache.org/~rvs/giraph-1.1.0-RC2/

Staged website is available at:
http://people.apache.org/~rvs/giraph-1.1.0-RC2/site/

Maven staging repo is available at:
https://repository.apache.org/content/repositories/orgapachegiraph-1003

Please note that, as per an earlier agreement, two sets
of artifacts are published, differentiated by version ID:
   * version ID 1.1.0 corresponds to the artifacts built for
  the hadoop_1 profile
   * version ID 1.1.0-hadoop2 corresponds to the artifacts
  built for the hadoop_2 profile.

The tag to be voted upon (release-1.1.0-RC2):
   
https://git-wip-us.apache.org/repos/asf?p=giraph.git;a=log;h=refs/tags/release-1.1.0-RC2

The KEYS file containing PGP keys we use to sign the release:
http://svn.apache.org/repos/asf/bigtop/dist/KEYS

Thanks,
Roman.




Re: YARN vs. MR1: is YARN a good idea?

2014-10-06 Thread Avery Ching
Theoretically, Giraph on YARN would be much better (actual resource 
request rather than mapper hack). That being said, Eli is the best 
person to talk about that.  We haven't tried YARN.


Avery

On 10/6/14, 8:51 AM, Matthew Cornell wrote:

Hi Folks. I don't think I paid enough attention to YARN vs. MR1 when I
built Giraph 1.0.0 for our system. How much better is Giraph on YARN?
Thank you.





Re: How local worker knows destination worker?

2014-08-08 Thread Avery Ching
Take a look at the interfaces for MasterGraphPartitioner and 
WorkerGraphPartitioner and their implementations for hash partitioning 
(HashRangePartitionerFactory).  You can implement any kind of 
partitioning you like.


Avery
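To make the distinction concrete, here is a toy sketch (not the Giraph API; class and method names are made up for illustration) of the two lookup schemes discussed in this thread: hash partitioning needs no table, while a custom partitioning needs an ownership map that the master distributes to every worker.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy sketch of how a worker can locate the owner of a destination vertex. */
public class PartitionDemo {

    /** Hash partitioning: the owner is derivable from the id alone,
     *  so no lookup table is needed (the Pregel-paper scheme). */
    public static int hashPartition(long vertexId, int numPartitions) {
        return (int) (Math.abs(vertexId) % numPartitions);
    }

    /** Custom partitioning: the master runs the partitioner once and
     *  distributes a vertex-to-partition map to every worker; remote
     *  sends are then routed through this table. */
    public static int tablePartition(long vertexId, Map<Long, Integer> table) {
        return table.get(vertexId);
    }

    public static void main(String[] args) {
        System.out.println(hashPartition(42L, 4)); // prints 2
        Map<Long, Integer> table = new HashMap<>();
        table.put(42L, 3); // master assigned vertex 42 to partition 3
        System.out.println(tablePartition(42L, table)); // prints 3
    }
}
```

In Giraph itself the mapping is maintained per partition range rather than per vertex, which keeps the table small even for huge graphs.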

On 8/8/14, 7:51 AM, Robert McCune wrote:
For a non-hash partitioning, how does a worker know which destination 
worker to send a remote message to?


In the Pregel paper, with hash partitioning, a worker can know the 
destination worker just by hashing the destination vertex ID


But for any non-trivial partitioning, how does a worker know where to 
send a remote message?


Any references to the literature are welcome.

Thank you




Re: Introducing Graft: A debugging and testing tool for Giraph algorithms

2014-06-04 Thread Avery Ching
I've seen this work demoed.  It's awesome, especially for applications 
that are not very predictable.


Avery

On 6/4/14, 11:00 AM, Semih Salihoglu wrote:


Hi Giraph Users,

I wanted to introduce you to Graft, a project that some of us at 
Stanford have built over the last quarter. If you are a Giraph user 
who has run into an annoying bug, where the code was throwing an 
exception or producing incorrect-looking messages or vertex values 
(e.g. NaNs or NullPointerExceptions), and you have resorted to putting 
println statements into your compute() functions and then inspecting 
Hadoop worker logs to debug, you should read on. You might find Graft 
very useful.


In a nutshell, Graft is based on the idea of capturing the contexts 
under which a bug becomes noticeable (an exception is thrown, an 
incorrect message is sent, or a vertex is assigned an incorrect value) 
programmatically. The captured contexts can then be visualized 
through a GUI. The contexts that a user thinks could be helpful for 
catching the bug can then be reproduced in a compilable program, and 
the user can then use his/her favorite IDE's debugger to do 
step-by-step debugging into the context. For example, when a vertex 
v throws an exception, the user can reproduce the context under 
which v throws the exception and then use (say) Eclipse to do 
step-by-step debugging to see exactly which lines were executed that 
resulted in the exception being thrown.


On the testing side, Graft makes it easier to generate unit and 
end-to-end tests by letting users curate small graphs through its 
GUI's testing mode, and then generates code snippets which can be 
copied and pasted into a JUnit test.


The project is still under development, but interested users can start 
using it. We have a wiki with documentation and instructions on how to 
install and use Graft: https://github.com/semihsalihoglu/graft/wiki. 
Since the project is under development, we'd greatly appreciate users 
trying it out and giving us direction on how to make it more useful. 
Our emails are on the documentation page. We also encourage interested 
developers to contribute if there are requested features that we don't 
get to very quickly.


Just a small note: Graft works with Giraph trunk: 
https://github.com/apache/giraph/tree/trunk. We do not support earlier 
versions. In particular, your programs need to be written by extending 
Computation (and optionally the Master class) instead of the older 
Vertex class.


Best,

semih





Re: Giraph keeps trying to connect to 9000 on Hadoop 2.2.0/YARN

2014-06-01 Thread Avery Ching
Giraph should just pick up your cluster's HDFS configuration.  Can you 
check your hadoop *.xml files?
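For reference, the setting Giraph inherits is the default filesystem URI in core-site.xml (named fs.defaultFS on Hadoop 2, fs.default.name on older Hadoop 1 releases); a sketch, with the host and port as examples only:

```xml
<!-- core-site.xml (Hadoop 2): default filesystem URI that jobs pick up -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://localhost.localdomain:8020</value>
</property>
```

If this property is absent or shadowed by a stale config on the classpath, clients fall back to defaults, which can explain a connection attempt to an unexpected port.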


On 6/1/14, 3:34 AM, John Yost wrote:

Hi Everyone,

Not sure why, but Giraph tries to connect to port 9000:

java.net.ConnectException: Call From localhost.localdomain/127.0.0.1 
to localhost:9000 failed on connection exception: 
java.net.ConnectException: Connection refused; For more details see: 
http://wiki.apache.org/hadoop/ConnectionRefused

at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)

I set the following in the Giraph configuration:

 GiraphConstants.IS_PURE_YARN_JOB.set(conf, true);
 conf.set("giraph.useNetty", "true");
 conf.set("giraph.zkList", "localhost.localdomain");
 conf.set("fs.defaultFS", "hdfs://localhost.localdomain:8020");
 conf.set("mapreduce.job.tracker", "localhost.localdomain:54311");
 conf.set("mapreduce.framework.name", "yarn");
 conf.set("yarn.resourcemanager.address", "localhost.localdomain:8032");

I built Giraph as follows:

mvn -DskipTests=true -Dhadoop.version=2.2.0 -Phadoop_yarn clean install

Any ideas as to why Giraph attempts to connect to 9000 instead of 8020?

--John






Re: Errors while running large graph

2014-05-27 Thread Avery Ching

You might also want to check the zookeeper memory options.

Some of our production jobs use parameters such as

-Xmx5g -XX:ParallelGCThreads=4 -XX:+UseConcMarkSweepGC 
-XX:CMSInitiatingOccupancyFraction=70 -XX:MaxGCPauseMillis=100


Since the master doesn't use much memory, letting ZooKeeper have more is reasonable.

On 5/27/14, 9:25 AM, Praveen kumar s.k wrote:

Hi All,
I am consistently getting several errors while processing a large graph.
The code works when the graph size is on the order of GBs.
We have implemented compression and removal of dead-end nodes in a de
Bruijn graph.
My cluster settings are:

Cores  Workers  RAM/Core  Graph size  Aggregate RAM
252    250      10.5 GB   2.3 TB      2.6 TB

Below are the type of errors I am getting.

1. I believe this error occurred because the ZooKeeper session
expired. To address this I changed the minSessionTimeout parameter in
the configuration to a large value. However some workers still throw this
error.

2014-05-27 00:19:55,187 FATAL org.apache.giraph.graph.GraphMapper:
uncaughtException: OverrideExceptionHandler on thread
org.apache.giraph.master.MasterThread, msg = java.lang.Il$
java.lang.IllegalStateException: java.lang.IllegalStateException:
Failed to create job state path due to KeeperException
 at org.apache.giraph.master.MasterThread.run(MasterThread.java:185)
Caused by: java.lang.IllegalStateException: Failed to create job state
path due to KeeperException
 at org.apache.giraph.bsp.BspService.getJobState(BspService.java:679)
 at 
org.apache.giraph.master.BspServiceMaster.becomeMaster(BspServiceMaster.java:843)
 at org.apache.giraph.master.MasterThread.run(MasterThread.java:98)
Caused by: org.apache.zookeeper.KeeperException$SessionExpiredException:
KeeperErrorCode = Session expired for
/_hadoopBsp/job_201405262302_0003/_masterJobState
 at 
org.apache.zookeeper.KeeperException.create(KeeperException.java:118)
 at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
 at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:637)
 at org.apache.giraph.zk.ZooKeeperExt.createExt(ZooKeeperExt.java:152)
 at org.apache.giraph.bsp.BspService.getJobState(BspService.java:670)
 ... 2 more

2. I don't know why the error below is thrown. My guess is that the
master worker is failing for some reason.

2014-05-27 00:19:55,184 ERROR org.apache.giraph.master.MasterThread:
masterThread: Master algorithm failed with IllegalStateException
java.lang.IllegalStateException: Failed to create job state path due
to KeeperException
 at org.apache.giraph.bsp.BspService.getJobState(BspService.java:679)
 at 
org.apache.giraph.master.BspServiceMaster.becomeMaster(BspServiceMaster.java:843)
 at org.apache.giraph.master.MasterThread.run(MasterThread.java:98)
Caused by: org.apache.zookeeper.KeeperException$SessionExpiredException:
KeeperErrorCode = Session expired for
/_hadoopBsp/job_201405262302_0003/_masterJobState
 at 
org.apache.zookeeper.KeeperException.create(KeeperException.java:118)
 at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
 at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:637)
 at org.apache.giraph.zk.ZooKeeperExt.createExt(ZooKeeperExt.java:152)
 at org.apache.giraph.bsp.BspService.getJobState(BspService.java:670)
 ... 2 more

3. Below is one more type of error
java.lang.IllegalStateException: Failed to create job state path due
to KeeperException
 at org.apache.giraph.bsp.BspService.getJobState(BspService.java:679)
 at 
org.apache.giraph.master.BspServiceMaster.becomeMaster(BspServiceMaster.java:843)
 at org.apache.giraph.master.MasterThread.run(MasterThread.java:98)
Caused by: org.apache.zookeeper.KeeperException$SessionExpiredException:
KeeperErrorCode = Session expired for
/_hadoopBsp/job_201405261249_0008/_masterJobState
 at 
org.apache.zookeeper.KeeperException.create(KeeperException.java:118)
 at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
 at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:637)
 at org.apache.giraph.zk.ZooKeeperExt.createExt(ZooKeeperExt.java:152)
 at org.apache.giraph.bsp.BspService.getJobState(BspService.java:670)
 ... 2 more
2014-05-26 18:19:54,269 FATAL org.apache.giraph.graph.GraphMapper:
uncaughtException: OverrideExceptionHandler on thread
org.apache.giraph.master.MasterThread, msg = java.lang.Il$
java.lang.IllegalStateException: java.lang.IllegalStateException:
Failed to create job state path due to KeeperException
 at org.apache.giraph.master.MasterThread.run(MasterThread.java:185)
Caused by: java.lang.IllegalStateException: Failed to create job state
path due to KeeperException
 at org.apache.giraph.bsp.BspService.getJobState(BspService.java:679)
 at 

Re: Errors while running large graph

2014-05-27 Thread Avery Ching

giraph.zkJavaOpts
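It goes in the Giraph job configuration, not the ZooKeeper config file. A hedged example of passing it as a custom argument on the command line (jar and class names here are placeholders, not from this thread):

```shell
hadoop jar giraph-examples-jar-with-dependencies.jar \
  org.apache.giraph.GiraphRunner my.app.MyComputation \
  -ca giraph.zkJavaOpts="-Xmx5g -XX:ParallelGCThreads=4 -XX:+UseConcMarkSweepGC \
      -XX:CMSInitiatingOccupancyFraction=70 -XX:MaxGCPauseMillis=100" \
  ...
```

These options only affect the ZooKeeper process that Giraph launches itself; an externally managed ZooKeeper keeps its own JVM settings.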

On 5/27/14, 10:27 AM, Praveen kumar s.k wrote:

Do I need to put this in the ZooKeeper configuration file or the Giraph
job configuration?

On Tue, May 27, 2014 at 12:14 PM, Avery Ching ach...@apache.org wrote:

You might also want to check the zookeeper memory options.

Some of our production jobs use parameters such as

-Xmx5g -XX:ParallelGCThreads=4 -XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=70 -XX:MaxGCPauseMillis=100

Since the master doesn't use much memory, letting ZooKeeper have more is reasonable.


On 5/27/14, 9:25 AM, Praveen kumar s.k wrote:

Hi All,
I am getting several errors consistently while processing large graph.
The code works when the size of the graph is in terms of GB's.
we have implemented compression and removing the dead end nodes in de
Bruijn graph
My cluster settings are

Cores  Workers  RAM/Core  Graph size  Aggregate RAM
252    250      10.5 GB   2.3 TB      2.6 TB

Below are the type of errors I am getting.

1.  I believe that this error occurred because of zookeeper session
expired. To address this I changed the parameter minSessionTimeout in
configuration to large value. However some workers still throw this
error.

2014-05-27 00:19:55,187 FATAL org.apache.giraph.graph.GraphMapper:
uncaughtException: OverrideExceptionHandler on thread
org.apache.giraph.master.MasterThread, msg = java.lang.Il$
java.lang.IllegalStateException: java.lang.IllegalStateException:
Failed to create job state path due to KeeperException
 at org.apache.giraph.master.MasterThread.run(MasterThread.java:185)
Caused by: java.lang.IllegalStateException: Failed to create job state
path due to KeeperException
 at org.apache.giraph.bsp.BspService.getJobState(BspService.java:679)
 at
org.apache.giraph.master.BspServiceMaster.becomeMaster(BspServiceMaster.java:843)
 at org.apache.giraph.master.MasterThread.run(MasterThread.java:98)
Caused by: org.apache.zookeeper.KeeperException$SessionExpiredException:
KeeperErrorCode = Session expired for
/_hadoopBsp/job_201405262302_0003/_masterJobState
 at
org.apache.zookeeper.KeeperException.create(KeeperException.java:118)
 at
org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
 at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:637)
 at
org.apache.giraph.zk.ZooKeeperExt.createExt(ZooKeeperExt.java:152)
 at org.apache.giraph.bsp.BspService.getJobState(BspService.java:670)
 ... 2 more

2. I dont know why this below error is thrown. My guess is that,
master worker is failing for some reason

2014-05-27 00:19:55,184 ERROR org.apache.giraph.master.MasterThread:
masterThread: Master algorithm failed with IllegalStateException
java.lang.IllegalStateException: Failed to create job state path due
to KeeperException
 at org.apache.giraph.bsp.BspService.getJobState(BspService.java:679)
 at
org.apache.giraph.master.BspServiceMaster.becomeMaster(BspServiceMaster.java:843)
 at org.apache.giraph.master.MasterThread.run(MasterThread.java:98)
Caused by: org.apache.zookeeper.KeeperException$SessionExpiredException:
KeeperErrorCode = Session expired for
/_hadoopBsp/job_201405262302_0003/_masterJobState
 at
org.apache.zookeeper.KeeperException.create(KeeperException.java:118)
 at
org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
 at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:637)
 at
org.apache.giraph.zk.ZooKeeperExt.createExt(ZooKeeperExt.java:152)
 at org.apache.giraph.bsp.BspService.getJobState(BspService.java:670)
 ... 2 more

3. Below is one more type of error
java.lang.IllegalStateException: Failed to create job state path due
to KeeperException
 at org.apache.giraph.bsp.BspService.getJobState(BspService.java:679)
 at
org.apache.giraph.master.BspServiceMaster.becomeMaster(BspServiceMaster.java:843)
 at org.apache.giraph.master.MasterThread.run(MasterThread.java:98)
Caused by: org.apache.zookeeper.KeeperException$SessionExpiredException:
KeeperErrorCode = Session expired for
/_hadoopBsp/job_201405261249_0008/_masterJobState
 at
org.apache.zookeeper.KeeperException.create(KeeperException.java:118)
 at
org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
 at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:637)
 at
org.apache.giraph.zk.ZooKeeperExt.createExt(ZooKeeperExt.java:152)
 at org.apache.giraph.bsp.BspService.getJobState(BspService.java:670)
 ... 2 more
2014-05-26 18:19:54,269 FATAL org.apache.giraph.graph.GraphMapper:
uncaughtException: OverrideExceptionHandler on thread
org.apache.giraph.master.MasterThread, msg = java.lang.Il$
java.lang.IllegalStateException: java.lang.IllegalStateException:
Failed to create job state path due to KeeperException
 at 

Re: Error while executing large graph

2014-05-15 Thread Avery Ching

I think this is the key message.

0 out of 196 partitions computed; min free memory on worker 6 - 0.81MB, 
average 11.56MB


Having less than 1 MB free won't work.  Your workers are likely OOM, 
killing the job.  Can you get more memory for your job?
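On an MR1 deployment like this one, worker heap is mapper heap, so one hedged knob (the property name is the classic MR1 one; jar and class are placeholders) is to raise the per-mapper heap, or run fewer workers so each one gets a larger share:

```shell
hadoop jar giraph-jar-with-dependencies.jar \
  org.apache.giraph.benchmark.PageRankBenchmark \
  -Dmapred.child.java.opts=-Xmx4g \
  ...
```

Out-of-core options or a smaller worker count with more memory each are the usual alternatives when the cluster has no spare RAM.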


On 5/14/14, 3:13 AM, Arun Kumar wrote:
Hi, when I run a Giraph job against 1 GB of data, I get the exception 
below after some time. Can somebody tell me what the issue is?
14/05/14 01:54:01 INFO job.JobProgressTracker: Data from 14 workers - 
Compute superstep 2: 0 out of 4847571 vertices computed; 0 out of 196 
partitions computed; min free memory on worker 6 - 0.81MB, average 11.56MB
14/05/14 01:54:03 INFO zookeeper.ClientCnxn: Unable to read additional 
data from server sessionid 0x145f9cff031000f, likely server has closed 
socket, closing socket connection and attempting reconnect
14/05/14 01:54:04 INFO zookeeper.ClientCnxn: Opening socket connection 
to server mercado-12.hpl.hp.com/15.25.119.147:22181. Will not attempt 
to authenticate using SASL (unknown error)
14/05/14 01:54:04 WARN zookeeper.ClientCnxn: Session 0x145f9cff031000f 
for server null, unexpected error, closing socket connection and 
attempting reconnect

java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at 
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
at 
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
14/05/14 01:54:06 INFO zookeeper.ClientCnxn: Opening socket connection 
to server mercado-12.hpl.hp.com/15.25.119.147:22181. Will not attempt 
to authenticate using SASL (unknown error)
14/05/14 01:54:06 WARN zookeeper.ClientCnxn: Session 0x145f9cff031000f 
for server null, unexpected error, closing socket connection and 
attempting reconnect

java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at 
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
at 
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
14/05/14 01:54:06 WARN zk.ZooKeeperExt: exists: Connection loss on 
attempt 0, waiting 5000 msecs before retrying.
org.apache.zookeeper.KeeperException$ConnectionLossException: 
KeeperErrorCode = ConnectionLoss for 
/_hadoopBsp/job_201405140108_0003/_workerProgresses
at 
org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at 
org.apache.zookeeper.KeeperException.create(KeeperException.java:51)

at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)
at org.apache.giraph.zk.ZooKeeperExt.exists(ZooKeeperExt.java:360)
at 
org.apache.giraph.job.JobProgressTracker$2.run(JobProgressTracker.java:87)

at java.lang.Thread.run(Thread.java:745)
14/05/14 01:54:08 INFO zookeeper.ClientCnxn: Opening socket connection 
to server mercado-12.hpl.hp.com/15.25.119.147:22181. Will not attempt 
to authenticate using SASL (unknown error)
14/05/14 01:54:08 WARN zookeeper.ClientCnxn: Session 0x145f9cff031000f 
for server null, unexpected error, closing socket connection and 
attempting reconnect

java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at 
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
at 
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)

14/05/14 01:54:09 INFO mapred.JobClient:  map 93% reduce 0%
14/05/14 01:54:10 INFO zookeeper.ClientCnxn: Opening socket connection 
to server mercado-12.hpl.hp.com/15.25.119.147:22181. Will not attempt 
to authenticate using SASL (unknown error)
14/05/14 01:54:10 WARN zookeeper.ClientCnxn: Session 0x145f9cff031000f 
for server null, unexpected error, closing socket connection and 
attempting reconnect

java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at 
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
at 
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
14/05/14 01:54:12 INFO zookeeper.ClientCnxn: Opening socket connection 
to server mercado-12.hpl.hp.com/15.25.119.147:22181. Will not attempt 
to authenticate using SASL (unknown error)
14/05/14 01:54:12 WARN zookeeper.ClientCnxn: Session 0x145f9cff031000f 
for server 

Re: How to schedule a Giraph job.

2014-04-26 Thread Avery Ching
You can schedule a Giraph job with any MapReduce job scheduler (it is 
just a map-only job).


On 4/26/14, 4:30 AM, yeshwanth kumar wrote:

Hi, I am looking for a Giraph job scheduler, just like Oozie.

Can we schedule a Giraph job using Oozie?

-yeshwanth.




Please welcome our newest PMC member, Maja!

2014-04-22 Thread Avery Ching
Maja has been working on Giraph for over a year and is one of our 
biggest contributors.  Adding her to the Giraph PMC in recognition of 
her impressive work is long overdue.


Some of her major contributions include composable computation, sharded 
aggregators, Hive I/O, support for massive messages, as well as lots of 
bugfixes and code reviews.  She has also been responsible for several 
major performance improvements (e.g. message store specialization and 
message cache improvements).  We are very lucky to have her 
working with us on this project.


Avery


Blogpost: Large-scale graph partitioning with Apache Giraph

2014-04-22 Thread Avery Ching

Hi Giraphers,

Recently, a few internal Giraph users at Facebook published a really 
cool blog post on how we partition huge graphs (1.15 billion people and 
150 billion friendships - 300B directed edges).


https://code.facebook.com/posts/274771932683700/large-scale-graph-partitioning-with-apache-giraph/

Avery


New committer: Pavan Kumar

2014-04-22 Thread Avery Ching
The Project Management Committee (PMC) for Apache Giraph has asked Pavan 
Kumar to become a committer and we are pleased to announce that he 
has accepted. Here are some of Pavan's contributions:


GIRAPH-858: tests fail for hadoop_facebook because of dependency issues
(pavanka via aching)

GIRAPH-854: fix for test fail due to GIRAPH-840 (pavanka via majakabiljo)

GIRAPH-840: Upgrade to netty 4 (pavanka via majakabiljo)

GIRAPH-843: remove rexter from hadoop_facebook profile (pavanka via aching)

GIRAPH-838: setup time & total time counters also include time spent 
waiting for machines (pavanka via majakabiljo)


GIRAPH-839: NettyWorkerAggregatorRequestProcessor tries to reuse request 
objects (pavanka via majakabiljo)


GIRAPH-830: directMemory used in netty message (pavanka via aching)

GIRAPH-823: upgrade hiveio to version 0.21 from older version 0.20 
(pavanka via majakabiljo)


GIRAPH-821: proper handling of NegativeArraySizeException for all 
ByteArray backed messagestores (pavanka via majakabiljo)


GIRAPH-820: add a configuration option to skip creating source vertices 
present only in edge input (pavanka via majakabiljo)


Pavan has been actively writing and reviewing code. His Netty4 upgrade 
brought a HUGE performance improvement to Giraph trunk (everyone should 
try it out!).  We are very excited to have Pavan take a larger role in 
the Giraph community!


Thanks Pavan,
Avery


Re: how to change graph

2014-04-16 Thread Avery Ching
Yes, this is one of the great things about Giraph (not many other graph 
computation frameworks allow graph mutation).  See the Computation class, 
e.g.:


  /**
   * Sends a request to create a vertex that will be available during the
   * next superstep.
   *
   * @param id Vertex id
   * @param value Vertex value
   * @param edges Initial edges
   */
  void addVertexRequest(I id, V value, OutEdges<I, E> edges) throws 
IOException;


  /**
   * Sends a request to create a vertex that will be available during the
   * next superstep.
   *
   * @param id Vertex id
   * @param value Vertex value
   */
  void addVertexRequest(I id, V value) throws IOException;

  /**
   * Request to remove a vertex from the graph
   * (applied just prior to the next superstep).
   *
   * @param vertexId Id of the vertex to be removed.
   */
  void removeVertexRequest(I vertexId) throws IOException;

  /**
   * Request to add an edge of a vertex in the graph
   * (processed just prior to the next superstep)
   *
   * @param sourceVertexId Source vertex id of edge
   * @param edge Edge to add
   */
  void addEdgeRequest(I sourceVertexId, Edge<I, E> edge) throws 
IOException;


  /**
   * Request to remove all edges from a given source vertex to a given 
target

   * vertex (processed just prior to the next superstep).
   *
   * @param sourceVertexId Source vertex id
   * @param targetVertexId Target vertex id
   */
  void removeEdgesRequest(I sourceVertexId, I targetVertexId)
throws IOException;
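A key point about these methods is that they are requests: mutations made during superstep S are buffered and only applied just before superstep S+1. A toy model (not Giraph code; all names here are made up) of that deferred-mutation semantics:

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

/** Toy model of deferred graph mutation: requests made during a
 *  superstep are only applied between supersteps. */
public class MutationDemo {
    final Map<Long, Set<Long>> outEdges = new HashMap<>(); // vertex -> targets
    final Queue<long[]> pendingEdgeRemovals = new ArrayDeque<>();

    void addVertex(long id) { outEdges.putIfAbsent(id, new HashSet<>()); }
    void addEdge(long src, long dst) { outEdges.get(src).add(dst); }

    /** Analogue of removeEdgesRequest: record only, do not mutate yet. */
    void removeEdgesRequest(long src, long dst) {
        pendingEdgeRemovals.add(new long[] {src, dst});
    }

    /** The framework would call this between supersteps. */
    void resolveMutations() {
        while (!pendingEdgeRemovals.isEmpty()) {
            long[] req = pendingEdgeRemovals.poll();
            outEdges.get(req[0]).remove(req[1]);
        }
    }

    public static void main(String[] args) {
        MutationDemo g = new MutationDemo();
        g.addVertex(1L);
        g.addVertex(2L);
        g.addEdge(1L, 2L);
        g.removeEdgesRequest(1L, 2L);
        // Still visible during the current superstep...
        System.out.println(g.outEdges.get(1L).contains(2L)); // prints true
        g.resolveMutations();
        // ...and gone in the next one.
        System.out.println(g.outEdges.get(1L).contains(2L)); // prints false
    }
}
```

This is why a removed edge can still receive a message in the same superstep the request was made: the topology does not change until the superstep boundary.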


On 4/16/14, 7:23 AM, Akshay Trivedi wrote:

Hi,
I want to do some computation on a graph and delete some edges between
supersteps. Can this be done using Giraph? I have heard of the
MutableVertex class but I don't know whether it can be used to delete
edges. Also, is MutableVertex an abstract class that has to be
implemented?

Regards,
Akshay




Re: how to change graph

2014-04-16 Thread Avery Ching

They should all be implemented. =)

On 4/16/14, 9:32 PM, Akshay Trivedi wrote:

Does removeVertexRequest(I vertexId) have to be implemented? Is there
any pre-defined class for this?

On Wed, Apr 16, 2014 at 8:33 PM, Avery Ching ach...@apache.org wrote:

Yes, this is one of the great things about Giraph (not many other graph
computation frameworks allow graph mutation).  See the Computation class
(i.e.)

   /**
* Sends a request to create a vertex that will be available during the
* next superstep.
*
* @param id Vertex id
* @param value Vertex value
* @param edges Initial edges
*/
   void addVertexRequest(I id, V value, OutEdges<I, E> edges) throws
IOException;

   /**
* Sends a request to create a vertex that will be available during the
* next superstep.
*
* @param id Vertex id
* @param value Vertex value
*/
   void addVertexRequest(I id, V value) throws IOException;

   /**
* Request to remove a vertex from the graph
* (applied just prior to the next superstep).
*
* @param vertexId Id of the vertex to be removed.
*/
   void removeVertexRequest(I vertexId) throws IOException;

   /**
* Request to add an edge of a vertex in the graph
* (processed just prior to the next superstep)
*
* @param sourceVertexId Source vertex id of edge
* @param edge Edge to add
*/
   void addEdgeRequest(I sourceVertexId, Edge<I, E> edge) throws IOException;

   /**
* Request to remove all edges from a given source vertex to a given
target
* vertex (processed just prior to the next superstep).
*
* @param sourceVertexId Source vertex id
* @param targetVertexId Target vertex id
*/
   void removeEdgesRequest(I sourceVertexId, I targetVertexId)
 throws IOException;



On 4/16/14, 7:23 AM, Akshay Trivedi wrote:

Hi,
I wanted to do some computation on graph and delete some edges between
supersteps. Can this be done using giraph?? I have heard of
MutableVertex class but I dont know whether it can be used to delete
edges. Also is MutableVertex abstract class and has to be
implemented??

Regards,
Akshay






Re: Child processes still running after successful job

2014-04-11 Thread Avery Ching
Corona - 
https://www.facebook.com/notes/facebook-engineering/under-the-hood-scheduling-mapreduce-jobs-more-efficiently-with-corona/10151142560538920


On 4/11/14, 8:14 AM, chadi jaber wrote:


Hi avery

What do you mean by your version of hadoop ?

Best regards,
Chadi

 Date: Fri, 11 Apr 2014 07:35:44 -0700
 From: ach...@apache.org
 To: user@giraph.apache.org
 Subject: Re: Child processes still running after successful job

 Unfortunately we don't face this issue since our version of Hadoop 
kills

 processes after a job is complete. If you can do a jstack, you can
 probably figure out where this is hanging and submit a patch to fix it.

 On 4/11/14, 4:24 AM, Yi Lu wrote:
  HI Chadi,
 
  I also have this problem. My solution is to write a Python script to
  kill the memory-hungry processes on each slave machine. :)

 
  I hope there is a better solution.
  ​





Re: PageRank on custom input

2014-04-09 Thread Avery Ching

Hi Vikesh,

You just need to write an input format or use an existing one. You can 
specify any number and combination of VertexInputFormat and 
EdgeInputFormat formats as per your needs.


Please see giraph-core/src/main/java/org/apache/giraph/io/formats for 
some examples.
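The core of a custom input format is just per-line parsing handed to Giraph's reader plumbing. A hypothetical sketch (the text format and class name are made up here, not one of the shipped formats) of the kind of work a VertexInputFormat reader does for a whitespace-separated adjacency list:

```java
import java.util.Arrays;

/** Hypothetical per-line parser for a whitespace-separated adjacency
 *  list: "srcId neighbor1 neighbor2 ...". A custom VertexInputFormat's
 *  record reader would do this and then build the vertex object. */
public class AdjacencyLineParser {

    /** First element is the vertex id, the rest are out-neighbors. */
    public static long[] parse(String line) {
        return Arrays.stream(line.trim().split("\\s+"))
                     .mapToLong(Long::parseLong)
                     .toArray();
    }

    public static void main(String[] args) {
        long[] fields = parse("1 4 7 9");
        System.out.println("vertex " + fields[0] + " has "
                + (fields.length - 1) + " out-edges");
        // prints: vertex 1 has 3 out-edges
    }
}
```

For PageRank specifically, the value and edge types would then be Writable wrappers (e.g. DoubleWritable values, FloatWritable edge weights) matching the Computation's type parameters.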


Avery

On 4/7/14, 9:57 PM, Vikesh Khanna wrote:

Hi,

We want to run a PageRank job (similar to PageRankBenchmark) for 
custom input graph. Is there an example for this? Giraph's website has 
a page for this but it is incomplete - 
http://giraph.apache.org/pagerank.html


Thanks,
Vikesh Khanna,
Masters, Computer Science (Class of 2015)
Stanford University






Re: voteToHalt vs removeVertexRequest

2014-04-07 Thread Avery Ching
Pretty much.  But when you remove the vertex, you won't be able to dump 
its output (not that all applications need to).


Avery

On 4/7/14, 9:38 AM, Liannet Reyes wrote:

Hi,

Because of my algorithm I am able to detect when a vertex won't be 
used anymore; which would be more appropriate: voteToHalt or 
removeVertexRequest?


I imagine that by removing the vertices I can free some memory, and 
although it has some cost in execution time, that is not a big deal 
since the graph gets smaller each time. Am I right?


Regards,

Liannet






Re: Giraph job hangs indefinitely and is eventually killed by JobTracker

2014-04-03 Thread Avery Ching
My guess is that you aren't getting your resources.  It would be very 
helpful to see the master log; you can find it while the job is running 
by looking at the Hadoop counters on the job UI page.


Avery

On 4/3/14, 12:49 PM, Vikesh Khanna wrote:

Hi,

I am running the PageRank benchmark under giraph-examples from 
giraph-1.0.0 release. I am using the following command to run the job 
(as mentioned here 
https://cwiki.apache.org/confluence/display/GIRAPH/Quick+Start+Guide)


vikesh@madmax 
/lfs/madmax/0/vikesh/usr/local/giraph/giraph-examples/src/main/java/org/apache/giraph/examples 
$ $HADOOP_HOME/bin/hadoop jar 
$GIRAPH_HOME/giraph-core/target/giraph-1.0.0-for-hadoop-0.20.203.0-jar-with-dependencies.jar 
org.apache.giraph.benchmark.PageRankBenchmark -e 1 -s 3 -v -V 5000 
-w 30



However, the job gets stuck at map 9% and is eventually killed by the 
JobTracker on reaching the mapred.task.timeout (default 10 minutes). I 
tried increasing the timeout to a very large value, and the job went 
on for over 8 hours without completion. I also tried the 
ShortestPathsBenchmark, which also fails the same way.



Any help is appreciated.




*Machine details:*

Linux version 2.6.32-279.14.1.el6.x86_64 
(mockbu...@c6b8.bsys.dev.centos.org) (gcc version 4.4.6 20120305 (Red 
Hat 4.4.6-4) (GCC) ) #1 SMP Tue Nov 6 23:43:09 UTC 2012


Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 64
On-line CPU(s) list: 0-63
Thread(s) per core: 1
Core(s) per socket: 8
CPU socket(s): 8
NUMA node(s): 8
Vendor ID: GenuineIntel
CPU family: 6
Model: 47
Stepping: 2
CPU MHz: 1064.000
BogoMIPS: 5333.20
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 24576K
NUMA node0 CPU(s): 1-8
NUMA node1 CPU(s): 9-16
NUMA node2 CPU(s): 17-24
NUMA node3 CPU(s): 25-32
NUMA node4 CPU(s): 0,33-39
NUMA node5 CPU(s): 40-47
NUMA node6 CPU(s): 48-55
NUMA node7 CPU(s): 56-63


I am using a pseudo-distributed Hadoop cluster on a single machine 
with 64-cores.





Thanks,
Vikesh Khanna,
Masters, Computer Science (Class of 2015)
Stanford University






Re: GSoC 2014 - Strongly Connected Components

2014-03-17 Thread Avery Ching

I think this would be great.  Thanks Mirko.

Avery

On 3/16/14, 12:26 PM, Gianluca Righetto wrote:

Hi,

Thank you both for your comments and support.
Mirko, I'm glad you'd like to be the mentor of this project, we just 
need to confirm this is OK with GSoC and Apache, just to avoid any 
issues down the road.

Avery, what do you think about this?

Thanks,
Gianluca Righetto

On 15.03.2014 at 09:34, Mirko Kämpf wrote:


Hi Gianluca,

thanks for sharing your ideas and sending your proposal.
Your approach sounds promising and I am very interested
in supporting your work.

I am not an official member of the Apache Giraph project
at the moment, so my question goes to Avery:
Would it be possible for me to become a mentor for Gianluca's project?

Best wishes
Mirko




On Fri, Mar 14, 2014 at 10:19 PM, Avery Ching ach...@apache.org wrote:


This is a great idea.  Unfortunately, I'm a little bandwidth
limited, but I hope someone can help mentor you!


On 3/14/14, 1:26 PM, Gianluca Righetto wrote:

Hello everyone,

I've been working with Giraph for some time now and I'd like
to make some contributions back to the project through Google
Summer of Code.
I wrote a project proposal to implement an algorithm for
finding Strongly Connected Components in a graph, based on
recently published research papers. The main idea of the
algorithm is to find clusters (or groups) in the graph and
it's arguably more insightful than the currently available
Connected Components algorithm.
So, if there's any Apache member interested in mentoring this
project, please, feel free to contact me.
And any kind of feedback will be greatly appreciated.

You can find the document in Google Drive here:
http://goo.gl/1fqqui

Thanks,
Gianluca Righetto





--
--
Mirko Kämpf

*Trainer* @ Cloudera

tel: +49 *176 20 63 51 99*
skype: *kamir1604*
mi...@cloudera.com







Re: Java Process Memory Leak

2014-03-17 Thread Avery Ching

Hi Young,

Our Hadoop instance (Corona) kills processes after they finish executing 
so we don't see this.  You might want to do a jstack to see where it's 
hung up and figure out the issue.


Thanks

Avery

On 3/17/14, 7:56 AM, Young Han wrote:

Hi all,

With Giraph 1.0.0, I've noticed an issue where the Java process 
corresponding to the job loiters around indefinitely even after the 
job completes (successfully). The process consumes memory but not CPU 
time. This happens on both a single machine and clusters of machines 
(in which case every worker has the issue). The only way I know of 
fixing this is killing the Java process manually---restarting or 
stopping Hadoop does not help.


Is this some known bug or a configuration issue on my end?

Thanks,
Young




Re: GSoC 2014 - Strongly Connected Components

2014-03-14 Thread Avery Ching
This is a great idea.  Unfortunately, I'm a little bandwidth limited, 
but I hope someone can help mentor you!


On 3/14/14, 1:26 PM, Gianluca Righetto wrote:

Hello everyone,

I've been working with Giraph for some time now and I'd like to make some 
contributions back to the project through Google Summer of Code.
I wrote a project proposal to implement an algorithm for finding Strongly 
Connected Components in a graph, based on recently published research papers. 
The main idea of the algorithm is to find clusters (or groups) in the graph and 
it's arguably more insightful than the currently available Connected Components 
algorithm.
So, if there's any Apache member interested in mentoring this project, please, 
feel free to contact me.
And any kind of feedback will be greatly appreciated.

You can find the document in Google Drive here: http://goo.gl/1fqqui

Thanks,
Gianluca Righetto




Re: DataStreamer Exception - LeaseExpiredException

2014-01-10 Thread Avery Ching
This looks more like the Zookeeper/YARN issues mentioned in the past.  
Unfortunately, I do not have a YARN instance to test this with.  Does 
anyone else have any insights here?


On 1/10/14 1:48 PM, Kristen Hardwick wrote:
Hi all, I'm requesting help again! I'm trying to get this 
SimpleShortestPathsComputation example working, but I'm stuck again. 
Now the job begins to run and seems to work until the final step (it 
performs 3 supersteps), but the overall job is failing.


In the master, among other things, I see:

...
14/01/10 15:04:17 INFO master.MasterThread: setup: Took 0.87 seconds.
14/01/10 15:04:17 INFO master.MasterThread: input superstep: Took 
0.708 seconds.
14/01/10 15:04:17 INFO master.MasterThread: superstep 0: Took 0.158 
seconds.
14/01/10 15:04:17 INFO master.MasterThread: superstep 1: Took 0.344 
seconds.
14/01/10 15:04:17 INFO master.MasterThread: superstep 2: Took 0.064 
seconds.

14/01/10 15:04:17 INFO master.MasterThread: shutdown: Took 0.162 seconds.
14/01/10 15:04:17 INFO master.MasterThread: total: Took 2.31 seconds.
14/01/10 15:04:17 INFO yarn.GiraphYarnTask: Master is ready to commit 
final job output data.
14/01/10 15:04:18 INFO yarn.GiraphYarnTask: Master has committed the 
final job output data.

...

To me, that looks promising - like the job was successful. However, in 
the WORKER_ONLY containers, I see these things:


...
14/01/10 15:04:17 INFO graph.GraphTaskManager: cleanup: Starting for 
WORKER_ONLY
14/01/10 15:04:17 WARN bsp.BspService: process: Unknown and 
unprocessed event 
(path=/_hadoopBsp/giraph_yarn_application_1389300168420_0024/_applicationAttemptsDir/0/_superstepDir/1/_addressesAndPartitions, 
type=NodeDeleted, state=SyncConnected)
14/01/10 15:04:17 INFO worker.BspServiceWorker: processEvent : 
partitionExchangeChildrenChanged (at least one worker is done sending 
partitions)
14/01/10 15:04:17 WARN bsp.BspService: process: Unknown and 
unprocessed event 
(path=/_hadoopBsp/giraph_yarn_application_1389300168420_0024/_applicationAttemptsDir/0/_superstepDir/1/_superstepFinished, 
type=NodeDeleted, state=SyncConnected)
14/01/10 15:04:17 INFO netty.NettyClient: stop: reached wait 
threshold, 1 connections closed, releasing NettyClient.bootstrap 
resources now.
14/01/10 15:04:17 INFO worker.BspServiceWorker: processEvent: Job 
state changed, checking to see if it needs to restart
14/01/10 15:04:17 INFO bsp.BspService: getJobState: Job state already 
exists 
(/_hadoopBsp/giraph_yarn_application_1389300168420_0024/_masterJobState)
14/01/10 15:04:17 INFO yarn.GiraphYarnTask: [STATUS: task-1] 
saveVertices: Starting to save 2 vertices using 1 threads
14/01/10 15:04:17 INFO worker.BspServiceWorker: saveVertices: Starting 
to save 2 vertices using 1 threads
14/01/10 15:04:17 INFO worker.BspServiceWorker: processEvent: Job 
state changed, checking to see if it needs to restart
14/01/10 15:04:17 INFO bsp.BspService: getJobState: Job state already 
exists 
(/_hadoopBsp/giraph_yarn_application_1389300168420_0024/_masterJobState)
14/01/10 15:04:17 INFO bsp.BspService: getJobState: Job state path is 
empty! - 
/_hadoopBsp/giraph_yarn_application_1389300168420_0024/_masterJobState

14/01/10 15:04:17 ERROR zookeeper.ClientCnxn: Error while calling watcher
java.lang.NullPointerException
at java.io.StringReader.<init>(StringReader.java:50)
at org.json.JSONTokener.<init>(JSONTokener.java:66)
at org.json.JSONObject.<init>(JSONObject.java:402)
at 
org.apache.giraph.bsp.BspService.getJobState(BspService.java:716)
at 
org.apache.giraph.worker.BspServiceWorker.processEvent(BspServiceWorker.java:1563)

at org.apache.giraph.bsp.BspService.process(BspService.java:1095)
at 
org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
at 
org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
14/01/10 15:04:17 WARN bsp.BspService: process: Unknown and 
unprocessed event 
(path=/_hadoopBsp/giraph_yarn_application_1389300168420_0024/_vertexInputSplitsAllReady, 
type=NodeDeleted, state=SyncConnected)
14/01/10 15:04:17 WARN bsp.BspService: process: Unknown and 
unprocessed event 
(path=/_hadoopBsp/giraph_yarn_application_1389300168420_0024/_applicationAttemptsDir/0/_superstepDir/2/_addressesAndPartitions, 
type=NodeDeleted, state=SyncConnected)
14/01/10 15:04:17 INFO worker.BspServiceWorker: processEvent : 
partitionExchangeChildrenChanged (at least one worker is done sending 
partitions)
14/01/10 15:04:17 WARN bsp.BspService: process: Unknown and 
unprocessed event 
(path=/_hadoopBsp/giraph_yarn_application_1389300168420_0024/_applicationAttemptsDir/0/_superstepDir/2/_superstepFinished, 
type=NodeDeleted, state=SyncConnected)

...
14/01/10 15:04:17 WARN hdfs.DFSClient: DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): 
No lease on 
/user/spry/Shortest/_temporary/1/_temporary/attempt_1389300168420_0024_m_01_1/part-m-1: 
File 

Re: Giraph 1.0.0 - Netty port allocation

2013-11-22 Thread Avery Ching
The port logic is a bit complex, but all encapsulated in 
NettyServer.java (see below).


If nothing else is running on those ports and you really only have one 
giraph worker per port you should be good to go.  Can you look at the 
logs for the worker that is trying to start a port other than base port 
+ taskId?



int taskId = conf.getTaskPartition();
int numTasks = conf.getInt("mapred.map.tasks", 1);
// Number of workers + 1 for master
int numServers = conf.getInt(GiraphConstants.MAX_WORKERS, numTasks) + 1;

int portIncrementConstant =
    (int) Math.pow(10, Math.ceil(Math.log10(numServers)));
int bindPort = GiraphConstants.IPC_INITIAL_PORT.get(conf) + taskId;
int bindAttempts = 0;
final int maxIpcPortBindAttempts =
    MAX_IPC_PORT_BIND_ATTEMPTS.get(conf);

final boolean failFirstPortBindingAttempt =
    GiraphConstants.FAIL_FIRST_IPC_PORT_BIND_ATTEMPT.get(conf);

// Simple handling of port collisions on the same machine while
// preserving debugability from the port number alone.
// Round up the max number of workers to the next power of 10 and use
// it as a constant to increase the port number with.
while (bindAttempts < maxIpcPortBindAttempts) {
  this.myAddress = new InetSocketAddress(localHostname, bindPort);
  if (failFirstPortBindingAttempt && bindAttempts == 0) {
    if (LOG.isInfoEnabled()) {
      LOG.info("start: Intentionally fail first " +
          "binding attempt as giraph.failFirstIpcPortBindAttempt " +
          "is true, port " + bindPort);
    }
    ++bindAttempts;
    bindPort += portIncrementConstant;
    continue;
  }

  try {
    Channel ch = bootstrap.bind(myAddress);
    accepted.add(ch);

    break;
  } catch (ChannelException e) {
    LOG.warn("start: Likely failed to bind on attempt " +
        bindAttempts + " to port " + bindPort, e);
    ++bindAttempts;
    bindPort += portIncrementConstant;
  }
}
if (bindAttempts == maxIpcPortBindAttempts || myAddress == null) {
  throw new IllegalStateException(
      "start: Failed to start NettyServer with " +
      bindAttempts + " attempts");
}
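As a hedged illustration of why bound ports can escape a narrow firewall window, the retry arithmetic from the snippet above can be modeled in isolation. This is a sketch, not Giraph code; the 30000 base port matches Giraph's default initial IPC port, and the task ids and server counts below are illustrative assumptions:

```java
public class PortMath {
    // Initial bind port: base IPC port plus the global task id.
    static int initialPort(int basePort, int taskId) {
        return basePort + taskId;
    }

    // Retry step: number of servers rounded up to the next power of ten,
    // mirroring the portIncrementConstant computation above.
    static int portIncrement(int numServers) {
        return (int) Math.pow(10, Math.ceil(Math.log10(numServers)));
    }

    public static void main(String[] args) {
        // 100 workers + 1 master = 101 servers -> increment of 1000.
        // Task 95 first tries 30095; a single bind collision pushes it to
        // 31095, far outside a 30000-30100 firewall window.
        int base = 30000;
        int first = initialPort(base, 95);
        int retry = first + portIncrement(101);
        System.out.println(first + " then " + retry);
    }
}
```

Note that even without collisions, the task id is global across the cluster, so with ~100 workers the first-attempt ports already span the whole 30000-30100 range regardless of which host each worker lands on.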



On 11/22/13 9:15 AM, Larry Compton wrote:
My teammates and I are running Giraph on a cluster where a firewall is 
configured on each compute node. We had 100 ports opened on the 
compute nodes, which we thought would be more than enough to 
accommodate a large number of workers. However, we're unable to go 
beyond about 90 workers with our Giraph jobs, due to Netty ports being 
allocated outside of the range (30000-30100). We're not sure why this 
is happening. We shouldn't be running more than one worker per compute 
node, so we were assuming that only port 30000 would be used, but 
we're routinely seeing Giraph try to use ports greater than 30100 when 
we request close to 100 workers. This leads us to believe that a 
simple one-up numbering scheme is being used that doesn't take the 
host into consideration, although this is only speculation.


Is there a way around this problem? Our system admins understandably 
balked at opening 1000 ports.


Larry




Re: workload used to measure Giraph performance number

2013-10-09 Thread Avery Ching

  
  
Hi Wei,
  
  For best performance, please be sure to tune the GC settings, use
  Java 7, tune the number of cores used for computation,
  communication, etc. and the combiner.
  
  We also have some numbers on our recent Facebook blog post. 
  
https://www.facebook.com/notes/facebook-engineering/scaling-apache-giraph-to-a-trillion-edges/10151617006153920
  
  Avery
  
  On 10/8/13 7:43 PM, Wei Zhang wrote:


Hi Sebastian,

Thanks a lot for the help! Sorry for the late response!

At this point, I would only need a random graph that complies with 
JsonLongDoubleFloatDoubleVertexInputFormat of Giraph to measure the 
PageRank example (of Giraph) performance. I am wondering how to convert 
the data from Koblenz to such a graph? Is there any pointer for doing 
this? (This is the same kind of question that I raised to Alok on SNAP.)

Thanks!

Wei

P.S.: I forgot to mention in all my previous emails that I just got 
started with distributed graph engines, so please forgive me if my 
questions are too naive.


From: Sebastian Schelter
  s...@apache.org
To: user@giraph.apache.org, 
Date: 10/02/2013 12:41 PM
Subject: Re: workload used to measure Giraph
  performance number
  
  
  
  
  Another option is to use the Koblenz network
  collection [1], which
  offers even more (and larger) datasets than Snap.
  
  Best,
  Sebastian
  
  [1] http://konect.uni-koblenz.de/
  
  
  On 02.10.2013 17:41, Alok Kumbhare wrote:
 There are a number of real (medium sized) graphs at
 http://snap.stanford.edu/data/index.html which we use for similar
 benchmarks. It has a good mix of graph types,
  sparse/dense, ground truth
   graphs (e.g. social networks that follow power law
  distribution etc.). So
   far we have observed that the type of graph has a high
  impact on the
   performance of algorithms that Claudio mentioned.
   
   
   On Wed, Oct 2, 2013 at 8:22 AM, Claudio Martella
  claudio.marte...@gmail.com
   wrote:
   
   Hi Wei,
  
   it depends on what you mean by workload for a batch
  processing system. I
   believe we can split the problem in two: generating a
  realistic graph, and
   using "representative" algorithms.
  
   To generate graphs we have two options in giraph:
  
   1) random graph: you specify the number of vertices
  and the number of
   edges for each vertex, and the edges will connect two
  random vertices. This
   creates a graph with (i) low clustering coefficient, (ii) low average path
   length, (iii) a uniform degree distribution
  
   2) watts strogatz: you specify the number of
  vertices, the number of
   edges, and a rewire probability beta. giraph will
  generate a ring lattice
   (each vertex is connected to k preceeding vertices
  and k following
   vertices) and rewire some of the edges randomly. This
  will create a graph
   with (i) high clustering coefficient, (ii) low
  average path length, (iii)
   poisson-like degree distribution (depends on beta).
  This graph will
   resemble a small world graph such as a social
  network, except for the
   degree distribution which will not be a power law.
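As a rough, framework-free sketch of the Watts–Strogatz construction Claudio describes (plain Java, not Giraph's actual generator; the parameter values are illustrative), build the ring lattice first and then rewire each edge's target with probability beta:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class WattsStrogatz {
    // Build a ring lattice of n vertices, each linked to its k successors,
    // then rewire each edge's target with probability beta.
    static List<int[]> generate(int n, int k, double beta, long seed) {
        Random rnd = new Random(seed);
        List<int[]> edges = new ArrayList<>();
        for (int v = 0; v < n; v++) {
            for (int j = 1; j <= k; j++) {
                int target = (v + j) % n;       // lattice neighbor
                if (rnd.nextDouble() < beta) {
                    target = rnd.nextInt(n);    // rewire to a random vertex
                }
                edges.add(new int[]{v, target});
            }
        }
        return edges;
    }

    public static void main(String[] args) {
        // 1000 vertices, 4 successor edges each, 10% rewiring.
        List<int[]> g = generate(1000, 4, 0.1, 42L);
        System.out.println(g.size()); // always n * k edges
    }
}
```

With beta = 0 this is a pure ring lattice (high clustering, long paths); small beta values add the shortcuts that give the low average path length mentioned above.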
  
   To use representative algorithms you can choose:
  
   1) PageRank: it's a ranking algorithm where all the
  vertices are active
   and send messages along the edges at each superstep
  (hence you'll have O(V)
   active vertices and O(E) messages)
  
   2) Shortest Paths: starting from a random vertex
   you'll visit all the
   vertices in the graph (some multiple times). This
  will have an aggregate
   O(V) active vertices and O(E) messages, but this is
  only a lower bound. In
   general you'll have different areas of the graph
  explored at each superstep,
   and hence potentially a varying workload across
  different supersteps.
  
   3) Connected Components: this will have something
  opposite to (2) as it
   will have many active vertices at the 

Re: zookeeper connection issue while running for second time

2013-10-01 Thread Avery Ching
We did have this error a few times. This can happen due to GC pauses, so 
I would check the worker for long GC issues.  Also, you can increase the 
ZooKeeper timeouts, see


  /** ZooKeeper session millisecond timeout */
  IntConfOption ZOOKEEPER_SESSION_TIMEOUT =
      new IntConfOption("giraph.zkSessionMsecTimeout", MINUTES.toMillis(1),
          "ZooKeeper session millisecond timeout");

Currently, the default is one minute, but in production we set that 
number much, much higher (even greater than a day sometimes) to avoid 
the disconnection.


Hope that helps,
Avery

On 10/1/13 6:27 PM, Jyotirmoy Sundi wrote:

Hi ,
I am able to run apache giraph successfully with around 500M pairs to 
find Connected components. It works great but not always, the issue 
seems to be with the time out zookeeper time out. Some of the 
client(around 5-10 ) out of 100, produces this error and the master 
fails due to this.Do you have any suggestions for this error. Any 
suggestions will be appreaciated.

2013-10-02 01:20:43,651 WARN org.apache.giraph.bsp.BspService: process: 
Disconnected from ZooKeeper (will automatically try to recover) WatchedEvent 
state:Disconnected type:None path:null
2013-10-02 01:20:44,035 INFO org.apache.zookeeper.ClientCnxn: Opening socket 
connection to server had22.rsk.admobius.com/10.240.51.32:2181. Will not attempt to 
authenticate using SASL (Unable to locate a login configuration)
2013-10-02 01:20:44,035 INFO org.apache.zookeeper.ClientCnxn: Socket connection 
established to had22.rsk.admobius.com/10.240.51.32:2181, initiating session
2013-10-02 01:20:44,037 INFO org.apache.zookeeper.ClientCnxn: Unable to 
reconnect to ZooKeeper service, session 0x441604c97412331 has expired, closing 
socket connection
2013-10-02 01:20:44,037 WARN org.apache.giraph.bsp.BspService: process: Got 
unknown null path event WatchedEvent state:Expired type:None path:null
2013-10-02 01:20:44,038 INFO org.apache.zookeeper.ClientCnxn: EventThread shut 
down
2013-10-02 01:21:20,046 INFO 
org.apache.giraph.worker.VertexInputSplitsCallable: readVertexInputSplit: 
Loaded 25 vertices at 1827.2925619484213 vertices/sec 1728790 edges at 
12636.730317550928 edges/sec Memory (free/total/max) = 1745.60M / 2262.19M / 
2730.69M
2013-10-02 01:21:24,788 INFO org.apache.giraph.worker.InputSplitsCallable: 
loadFromInputSplit: Finished loading 
/_hadoopBsp/job_201309260044_1132/_vertexInputSplitDir/601 (v=261131, e=1808572)
2013-10-02 01:21:24,789 ERROR org.apache.giraph.utils.LogStacktraceCallable: 
Execution of callable failed
java.lang.IllegalStateException: markInputSplitPathFinished: KeeperException on 
/_hadoopBsp/job_201309260044_1132/_vertexInputSplitDir/601/_vertexInputSplitFinished
at 
org.apache.giraph.worker.InputSplitsHandler.markInputSplitPathFinished(InputSplitsHandler.java:168)
at 
org.apache.giraph.worker.InputSplitsCallable.loadInputSplit(InputSplitsCallable.java:226)
at 
org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:161)
at 
org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:58)
at 
org.apache.giraph.utils.LogStacktraceCallable.call(LogStacktraceCallable.java:51)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.zookeeper.KeeperException$SessionExpiredException: 
KeeperErrorCode = Session expired for 
/_hadoopBsp/job_201309260044_1132/_vertexInputSplitDir/601/_vertexInputSplitFinished
at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
at org.apache.giraph.zk.ZooKeeperExt.createExt(ZooKeeperExt.java:152)
at 
org.apache.giraph.worker.InputSplitsHandler.markInputSplitPathFinished(InputSplitsHandler.java:159)
... 9 more

--
Best Regards,
Jyotirmoy Sundi
Admobius

San Francisco, CA 94158



On Thu, Sep 26, 2013 at 6:08 PM, Jyotirmoy Sundi sundi...@gmail.com wrote:


Hi ,

I got the connected component working for 1B nodes, but when I run the 
job again, it fails with the below error. Aprt form this in zookeeper the data 
is not cleared in the data directory. For successful jobs the data in zookeper 
from giraph is cleared.

The following errors seems to be coming because the node tries to connect 
to the zookeeper with a session id which is cleared as seens in

Client session timed out, have not heard from server in 68845ms for sessionid 

Re: Exception Already has missing vertex on this worker

2013-09-26 Thread Avery Ching
I think you may have added the same vertex 2x?  That being said, I don't 
see why the code is this way.  It should be fine.  We should file a JIRA.


On 9/26/13 11:02 AM, Yingyi Bu wrote:

Thanks, Lukas!
I think the reason of this exception is that I run the job over part 
of the graph where some target ids do not exist.


Yingyi


On Thu, Sep 26, 2013 at 1:13 AM, Lukas Nalezenec 
lukas.naleze...@firma.seznam.cz wrote:


Hi,
Do you use partition balancing ?
Lukas



On 09/26/13 05:16, Yingyi Bu wrote:

Hi,
I got this exception when I ran a Giraph-1.0.0 PageRank job over a 60 
machine cluster with 28GB input data.  But I got this exception:

java.lang.IllegalStateException: run: Caught an unrecoverable exception 
resolveMutations: Already has missing vertex on this worker for 20464109
at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:102)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
at org.apache.hadoop.mapred.Child.main(Child.java:253)
Caused by: java.lang.IllegalStateException: resolveMutations: Already has 
missing vertex on this worker for 20464109
at 
org.apache.giraph.comm.netty.NettyWorkerServer.resolveMutations(NettyWorkerServer.java:184)
at 
org.apache.giraph.comm.netty.NettyWorkerServer.prepareSuperstep(NettyWorkerServer.java:152)
at 
org.apache.giraph.worker.BspServiceWorker.startSuperstep(BspServiceWorker.java:677)
at 
org.apache.giraph.graph.GraphTaskManager.execute(GraphTaskManager.java:249)
at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:92)
... 7 more


Does anyone know what is the possible cause of this exception?
Thanks!

Yingyi







Re: Exception Already has missing vertex on this worker

2013-09-26 Thread Avery Ching
Hopefully you are using combiners and also re-using objects.  This can 
keep memory usage much lower. Also implementing your own OutEdges can 
make it much more efficient.


How much memory do you have?

Avery

On 9/26/13 12:51 PM, Yingyi Bu wrote:

 I think you may have added the same vertex 2x?
I ran the job over roughly half of the graph and saw this.  However, 
the input is not a connected component, so there might be 
target vertex ids which do not exist.
When I ran the job over the entire graph, I did not see this, but the 
job fails by exceeding the GC limit (trying out-of-core now).


Yingyi



On Thu, Sep 26, 2013 at 12:05 PM, Avery Ching ach...@apache.org wrote:


I think you may have added the same vertex 2x?  That being said, I
don't see why the code is this way.  It should be fine.  We should
file a JIRA.


On 9/26/13 11:02 AM, Yingyi Bu wrote:

Thanks, Lukas!
I think the reason of this exception is that I run the job over
part of the graph where some target ids do not exist.

Yingyi


On Thu, Sep 26, 2013 at 1:13 AM, Lukas Nalezenec
lukas.naleze...@firma.seznam.cz
mailto:lukas.naleze...@firma.seznam.cz wrote:

Hi,
Do you use partition balancing ?
Lukas



On 09/26/13 05:16, Yingyi Bu wrote:

Hi,
I got this exception when I ran a Giraph-1.0.0 PageRank job over a 60 
machine cluster with 28GB input data.  But I got this exception:

java.lang.IllegalStateException: run: Caught an unrecoverable exception 
resolveMutations: Already has missing vertex on this worker for 20464109
at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:102)
at 
org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
at org.apache.hadoop.mapred.Child.main(Child.java:253)
Caused by: java.lang.IllegalStateException: resolveMutations: Already 
has missing vertex on this worker for 20464109
at 
org.apache.giraph.comm.netty.NettyWorkerServer.resolveMutations(NettyWorkerServer.java:184)
at 
org.apache.giraph.comm.netty.NettyWorkerServer.prepareSuperstep(NettyWorkerServer.java:152)
at 
org.apache.giraph.worker.BspServiceWorker.startSuperstep(BspServiceWorker.java:677)
at 
org.apache.giraph.graph.GraphTaskManager.execute(GraphTaskManager.java:249)
at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:92)
... 7 more


Does anyone know what is the possible cause of this exception?
Thanks!

Yingyi










Re: Counter limit

2013-09-09 Thread Avery Ching

If you are running out of counters, you can turn off the superstep counters

  /** Use superstep counters? (boolean) */
  BooleanConfOption USE_SUPERSTEP_COUNTERS =
      new BooleanConfOption("giraph.useSuperstepCounters", true,
          "Use superstep counters? (boolean)");

On 9/9/13 6:43 AM, Claudio Martella wrote:
No, I used a different counters limit on that hadoop version. Setting 
mapreduce.job.counters.limit to a higher number and restarting JT and 
TT worked for me. Maybe 64000 might be too high? Try setting it to 
512. Does not look like the case, but who knows.



On Mon, Sep 9, 2013 at 2:57 PM, Christian Krause m...@ckrause.org wrote:


Sorry, it still doesn't work (I ran into a different problem
before I reached the limit).

I am using Hadoop 0.20.203.0. Is the limit of 120
counters maybe hardcoded?

Cheers
Christian

On 09.09.2013 at 08:29, Christian Krause m...@ckrause.org wrote:

I changed the property name to mapred.job.counters.limit and
restarted it again. Now it works.

Thanks,
Christian


2013/9/7 Claudio Martella claudio.marte...@gmail.com
mailto:claudio.marte...@gmail.com

did you restart TT and JT?


m...@ckrause.org wrote:

Hi,
I've increased the counter limit in mapred-site.xml,
but I still get the error: Exceeded counter limits -
Counters=121 Limit=120. Groups=6 Limit=50.

This is my config:

cat conf/mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
...
  <property>
    <name>mapreduce.job.counters.limit</name>
    <value>64000</value>
  </property>
  <property>
    <name>mapred.task.timeout</name>
    <value>240</value>
  </property>
...
</configuration>

Any ideas?

Cheers,
Christian




-- 
   Claudio Martella

claudio.marte...@gmail.com





--
   Claudio Martella
claudio.marte...@gmail.com




Re: Out of memory with giraph-release-1.0.0-RC3, used to work on old Giraph

2013-09-04 Thread Avery Ching
We have a cache per compute thread, and then W worker caches per compute 
thread.  So the total amount of memory consumed by message caches per 
worker = compute threads * workers * size of cache.  The best thing is to 
tune down the size of the cache from MAX_MSG_REQUEST_SIZE to a size that 
works for your configuration.
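Avery's formula can be sketched as a back-of-the-envelope model. This is a hedged illustration, not Giraph code; the 512 KB default request size is an assumption about GiraphConstants.MAX_MSG_REQUEST_SIZE that should be verified against your Giraph version:

```java
public class MessageCacheMath {
    // Memory (bytes) consumed by message caches on one worker:
    // one cache per (compute thread, destination worker) pair.
    static long cacheMemoryPerWorker(int computeThreads, int workers,
                                     long cacheSizeBytes) {
        return (long) computeThreads * workers * cacheSizeBytes;
    }

    public static void main(String[] args) {
        // 1 compute thread, 100 workers, assumed 512 KB per cache
        // -> roughly 50 MB of cache memory on each worker.
        System.out.println(cacheMemoryPerWorker(1, 100, 512 * 1024));
    }
}
```

The same model explains the "100 workers would require 50 MB per worker by default" figure quoted later in this thread: shrinking the request size scales that footprint down linearly.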


Hope that helps,

Avery

On 9/4/13 3:33 AM, Lukas Nalezenec wrote:


Thanks,
I was not sure if it really works as I described.

 Facebook can't be using it like this if, as described, they have 
billions of vertices and a trillion edges.


Yes, its strange. I guess configuration does not help so much on large 
cluster. What might help are properties of input data.


 So do you, or Avery, have any idea how you might initialize this in 
a more reasonable way, and how???


A fast workaround is to reduce the number of partitions from W^2 to W or 
2*W.  It will help if you don't have a very large number of workers.

I would not change MAX_*_REQUEST_SIZE much since it may hurt performance.
You can do some preprocessing before loading data to Giraph.



How to change Giraph:
The caches could be flushed if the total number of vertices/edges in all 
caches exceeds some threshold. Ideally, this should prevent not only 
OutOfMemory errors but also raising the high-water mark. Not sure if 
preventing a raised HWM is easy to do.
I am going to use almost-prebuilt partitions. For my use case it would 
be ideal to detect when some cache is abandoned and would not be used 
anymore. That would cut memory usage in caches from ~O(n^3) to ~O(n). 
It could be done by counting the number of cache flushes or cache 
insertions, and if some cache was not touched for a long time it would be 
flushed.


There could be separated configuration MAX_*_REQUEST_SIZE for per 
partition caches during loading data.


I guess there should be simple but efficient way how to trace memory 
high-water mark. It could look like:


Loading data: Memory high-water mark: start: 100 Gb end: 300 Gb
Iteration 1 Computation: Memory high-water mark: start: 300 Gb end: 300 Gb
Iteration 1 XYZ 
Iteration 2 Computation: Memory high-water mark: start: 300 Gb end: 300 Gb
.
.
.

Lukas





On 09/04/13 01:12, Jeff Peters wrote:
Thank you Lukas!!! That's EXACTLY the kind of model I was building in 
my head over the weekend about why this might be happening, and why 
increasing the number of AWS instances (and workers) does not solve 
the problem without increasing each worker's VM. Surely Facebook 
can't be using it like this if, as described, they have billions of 
vertices and a trillion edges. So do you, or Avery, have any idea how 
you might initialize this is a more reasonable way, and how???



On Mon, Sep 2, 2013 at 6:08 AM, Lukas Nalezenec 
lukas.naleze...@firma.seznam.cz wrote:


Hi

I wasted few days on similar problem.

I guess the problem was that during loading - if you have got W
workers and W^2 partitions there are W^2 partition caches in each
worker.
Each cache can hold 10 000 vertexes by default.
I had 26 000 000 vertexes, 60 workers - 3600 partitions. It
means that there can be up to 36 000 000 vertexes in caches in
each worker if input files are random.
Workers were assigned 450 000 vertexes but failed when they had
900 000 vertexes in memory.
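Lukas's arithmetic can be checked with a small sketch. This is a model of his description only (the 10,000-vertex default cache capacity and the W^2 partition count are taken from this email, not read from the Giraph source):

```java
public class PartitionCacheMath {
    // Default capacity of one partition input cache, per Lukas's description.
    static final int DEFAULT_CACHE_CAPACITY = 10_000;

    // Worst-case vertices buffered in one worker's input caches when the
    // total partition count is workers^2: one cache per partition,
    // each filled to capacity before being flushed.
    static long worstCaseCachedVertices(int workers) {
        long partitions = (long) workers * workers;
        return partitions * DEFAULT_CACHE_CAPACITY;
    }

    public static void main(String[] args) {
        // 60 workers -> 3600 partition caches per worker
        // -> up to 36,000,000 buffered vertices, matching the thread.
        System.out.println(worstCaseCachedVertices(60));
    }
}
```

The worst case assumes random input order, so every cache can fill before any flush; partition-sorted input would keep far fewer caches active at once.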

Btw: Why default number of partitions is W^2 ?

(I can be wrong)
Lukas



On 08/31/13 01:54, Avery Ching wrote:

Ah, the new caches. =)  These make things a lot faster (bulk
data sending), but do take up some additional memory.  If you
look at GiraphConstants, you can find ways to change the cache
sizes (this will reduce that memory usage).
For example, MAX_EDGE_REQUEST_SIZE will affect the size of the
edge cache, and MAX_MSG_REQUEST_SIZE will affect the size of the
message cache.  The caches are per worker, so 100 workers would
require 50 MB per worker by default.  Feel free to trim it if
you like.

The byte arrays for the edges are the most efficient storage
possible (although not as performant as the native edge stores).

Hope that helps,

Avery

On 8/29/13 4:53 PM, Jeff Peters wrote:

Avery, it would seem that optimizations to Giraph have,
unfortunately, turned the majority of the heap into dark
matter. The two snapshots are at unknown points in a superstep
but I waited for several supersteps so that the activity had
more or less stabilized. About the only thing comparable
between the two snapshots are the vertexes, 192561 X
RecsVertex in the new version and 191995 X Coloring in the
old system. But with the new Giraph 672710176 out of 824886184
bytes are stored as primitive byte arrays. That's probably
indicative of some very fine performance optimization work, but
it makes it extremely difficult to know what's really out
there, and why. I did notice that a number of caches have
appeared

Re: Exception with Large Graphs

2013-08-30 Thread Avery Ching
That error is from the master dying (likely due to the results of 
another worker dying).  Can you do a rough calculation of the size of 
data that you expect to be loaded and check if the memory is enough?


On 8/30/13 11:19 AM, Yasser Altowim wrote:


Guys,

   Can someone please help me with this issue? Thanks.

Best,

Yasser

*From:*Yasser Altowim
*Sent:* Thursday, August 29, 2013 11:16 AM
*To:* user@giraph.apache.org
*Subject:* Exception with Large Graphs

Hi,

 I am implementing an algorithm using Giraph, and I was able 
to run my algorithm on relatively small datasets (64,000,000 vertices 
and 128,000,000 edges). However, when I increase the size of the 
dataset to 128,000,000 vertices and 256,000,000 edges, the job takes 
so much time to load the vertices, and then it gives me the following 
exception.


I have tried to increase the heap size and the task timeout
value in the mapred-site.xml configuration file, and even to vary the
number of workers from 1 to 10, but I am still getting the same exceptions.
I have a cluster of 10 nodes, and each node has 4 GB of RAM.  Thanks
in advance.


2013-08-29 10:22:53,150 INFO 
org.apache.giraph.utils.ProgressableUtils: waitFor: Future result not 
ready yet java.util.concurrent.FutureTask@1a129460 


2013-08-29 10:22:53,151 INFO 
org.apache.giraph.utils.ProgressableUtils: waitFor: Waiting for 
org.apache.giraph.utils.ProgressableUtils$FutureWaitable@30d320e4 


2013-08-29 10:23:07,938 INFO 
org.apache.giraph.worker.VertexInputSplitsCallable: 
readVertexInputSplit: Loaded 7769685 vertices at 14250.953615591572 
vertices/sec 15539370 edges at 28500.77593053654 edges/sec Memory 
(free/total/max) = 680.21M / 3207.44M / 3555.56M


2013-08-29 10:23:14,538 INFO 
org.apache.giraph.worker.VertexInputSplitsCallable: 
readVertexInputSplit: Loaded 8019685 vertices at 14533.557468366102 
vertices/sec 16039370 edges at 29065.97491865343 edges/sec Memory 
(free/total/max) = 906.80M / 3242.75M / 3555.56M


2013-08-29 10:23:21,888 INFO 
org.apache.giraph.worker.InputSplitsCallable: loadFromInputSplit: 
Finished loading 
/_hadoopBsp/job_201308290837_0003/_vertexInputSplitDir/9 (v=1212852, 
e=2425704)


2013-08-29 10:23:37,911 INFO 
org.apache.giraph.worker.InputSplitsHandler: reserveInputSplit: 
Reserved input split path 
/_hadoopBsp/job_201308290837_0003/_vertexInputSplitDir/19, overall 
roughly 7.518797% input splits reserved


2013-08-29 10:23:37,923 INFO 
org.apache.giraph.worker.InputSplitsCallable: getInputSplit: Reserved 
/_hadoopBsp/job_201308290837_0003/_vertexInputSplitDir/19 from 
ZooKeeper and got input split 
'org.apache.giraph.io.formats.multi.InputSplitWithInputFormatIndex@24004559'


2013-08-29 10:23:44,313 INFO 
org.apache.giraph.worker.VertexInputSplitsCallable: 
readVertexInputSplit: Loaded 8482537 vertices at 14585.340134636266 
vertices/sec 16965074 edges at 29169.59449002283 edges/sec Memory 
(free/total/max) = 538.93M / 3186.13M / 3555.56M


2013-08-29 10:23:49,963 INFO 
org.apache.giraph.worker.VertexInputSplitsCallable: 
readVertexInputSplit: Loaded 8732537 vertices at 14870.726503632277 
vertices/sec 17465074 edges at 29740.356341344923 edges/sec Memory 
(free/total/max) = 489.84M / 3222.56M / 3555.56M


2013-08-29 10:34:28,371 INFO 
org.apache.giraph.utils.ProgressableUtils: waitFor: Future result not 
ready yet java.util.concurrent.FutureTask@1a129460 


2013-08-29 10:34:34,847 INFO 
org.apache.giraph.utils.ProgressableUtils: waitFor: Waiting for 
org.apache.giraph.utils.ProgressableUtils$FutureWaitable@30d320e4 


2013-08-29 10:34:34,850 INFO 
org.apache.giraph.comm.netty.handler.RequestDecoder: decode: Server 
window metrics MBytes/sec sent = 0, MBytes/sec received = 0.0161, 
MBytesSent = 0.0002, MBytesReceived = 12.3175, ave sent req MBytes = 
0, ave received req MBytes = 0.0587, secs waited = 765.881


2013-08-29 10:34:35,698 INFO org.apache.zookeeper.ClientCnxn: Client 
session timed out, have not heard from server in 649805ms for 
sessionid 0x140cb1140540006, closing socket connection and attempting 
reconnect


2013-08-29 10:34:42,471 WARN org.apache.giraph.bsp.BspService: 
process: Disconnected from ZooKeeper (will automatically try to 
recover) WatchedEvent state:Disconnected type:None path:null


2013-08-29 10:34:42,472 WARN 
org.apache.giraph.worker.InputSplitsHandler: process: Problem with 
zookeeper, got event with path null, state Disconnected, event type None


2013-08-29 10:34:43,819 INFO org.apache.zookeeper.ClientCnxn: Opening 
socket connection to server slave5.ericsson-magic.net/10.126.72.165:22181


2013-08-29 10:34:44,077 INFO org.apache.zookeeper.ClientCnxn: Socket 
connection established to 
slave5.ericsson-magic.net/10.126.72.165:22181, initiating session


2013-08-29 

Re: Out of memory with giraph-release-1.0.0-RC3, used to work on old Giraph

2013-08-30 Thread Avery Ching
Ah, the new caches. =)  These make things a lot faster (bulk data
sending), but do take up some additional memory.  If you look at
GiraphConstants, you can find ways to change the cache sizes (this will
reduce that memory usage).
For example, MAX_EDGE_REQUEST_SIZE will affect the size of the edge
cache, and MAX_MSG_REQUEST_SIZE will affect the size of the message cache.
The caches are per worker, so 100 workers would require 50 MB per
worker by default.  Feel free to trim it if you like.
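The "100 workers, 50 MB per worker" figure can be reproduced with a rough estimate, assuming the default request size is 512 KB per destination worker (an assumption consistent with the numbers above; check MAX_MSG_REQUEST_SIZE in GiraphConstants for the actual default in your version):

```java
/** Rough per-worker memory estimate for the send caches. */
public class SendCacheMath {
  /** One buffered request per destination worker. */
  public static long cacheBytesPerWorker(int workers, long requestSizeBytes) {
    return (long) workers * requestSizeBytes;
  }

  public static void main(String[] args) {
    // 512 KB is an assumed default request size, not a quoted constant.
    long bytes = cacheBytesPerWorker(100, 512 * 1024);
    System.out.println(bytes / (1024 * 1024) + " MB"); // prints "50 MB"
  }
}
```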


The byte arrays for the edges are the most efficient storage possible
(although not as performant as the native edge stores).


Hope that helps,

Avery

On 8/29/13 4:53 PM, Jeff Peters wrote:
Avery, it would seem that optimizations to Giraph have, unfortunately, 
turned the majority of the heap into dark matter. The two snapshots 
are at unknown points in a superstep but I waited for several 
supersteps so that the activity had more or less stabilized. About the 
only thing comparable between the two snapshots are the vertexes, 
192561 X RecsVertex in the new version and 191995 X Coloring in 
the old system. But with the new Giraph 672710176 out of 824886184 
bytes are stored as primitive byte arrays. That's probably indicative 
of some very fine performance optimization work, but it makes it 
extremely difficult to know what's really out there, and why. I did 
notice that a number of caches have appeared that did not exist 
before, namely SendEdgeCache, SendPartitionCache, SendMessageCache 
and SendMutationsCache.


Could any of those account for a larger per-worker footprint in a 
modern Giraph? Should I simply assume that I need to force AWS to 
configure its EMR Hadoop so that each instance has fewer map tasks but 
with a somewhat larger VM max, say 3GB instead of 2GB?



On Wed, Aug 28, 2013 at 4:57 PM, Avery Ching ach...@apache.org wrote:


Try dumping a histogram of memory usage from a running JVM and see
where the memory is going.  I can't think of anything in
particular that changed...


On 8/28/13 4:39 PM, Jeff Peters wrote:


I am tasked with updating our ancient (circa 7/10/2012) Giraph
to giraph-release-1.0.0-RC3. Most jobs run fine but our
largest job now runs out of memory using the same AWS
elastic-mapreduce configuration we have always used. I have
never tried to configure either Giraph or the AWS Hadoop. We
build for Hadoop 1.0.2 because that's closest to the 1.0.3 AWS
provides us. The 8 X m2.4xlarge cluster we use seems to
provide 8*14=112 map tasks fitted out with 2GB heap each. Our
code is completely unchanged except as required to adapt to
the new Giraph APIs. Our vertex, edge, and message data are
completely unchanged. On smaller jobs, that work, the
aggregate heap usage high-water mark seems about the same as
before, but the committed heap seems to run higher. I can't
even make it work on a cluster of 12. In that case I get one
map task that seems to end up with nearly twice as many
messages as most of the others so it runs out of memory
anyway. It only takes one to fail the job. Am I missing
something here? Should I be configuring my new Giraph in some
way I didn't used to need to with the old one?







Re: Out of memory with giraph-release-1.0.0-RC3, used to work on old Giraph

2013-08-28 Thread Avery Ching
Try dumping a histogram of memory usage from a running JVM and see where 
the memory is going.  I can't think of anything in particular that 
changed...


On 8/28/13 4:39 PM, Jeff Peters wrote:


I am tasked with updating our ancient (circa 7/10/2012) Giraph to 
giraph-release-1.0.0-RC3. Most jobs run fine but our largest job now 
runs out of memory using the same AWS elastic-mapreduce configuration 
we have always used. I have never tried to configure either Giraph or 
the AWS Hadoop. We build for Hadoop 1.0.2 because that's closest to 
the 1.0.3 AWS provides us. The 8 X m2.4xlarge cluster we use seems to 
provide 8*14=112 map tasks fitted out with 2GB heap each. Our code is 
completely unchanged except as required to adapt to the new Giraph 
APIs. Our vertex, edge, and message data are completely unchanged. On 
smaller jobs, that work, the aggregate heap usage high-water mark 
seems about the same as before, but the committed heap seems to run 
higher. I can't even make it work on a cluster of 12. In that case I 
get one map task that seems to end up with nearly twice as many 
messages as most of the others so it runs out of memory anyway. It 
only takes one to fail the job. Am I missing something here? Should I 
be configuring my new Giraph in some way I didn't used to need to with 
the old one?






Re: Workers input splits and MasterCompute communication

2013-08-19 Thread Avery Ching
That makes sense, since the Context doesn't have a real InputSplit (it's 
a Giraph one - see BspInputSplit).


What information are you trying to get out of the input splits? Giraph 
workers can process an arbitrary number of input splits (0 or more), so 
I don't think this will be useful.


You can use Configuration if you need to set some information at runtime.
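A minimal sketch of what passing runtime information through the Configuration could look like; the key name "myapp.inputTag" is purely illustrative, not a Giraph option:

```java
// At job setup time:
GiraphConfiguration conf = new GiraphConfiguration();
conf.set("myapp.inputTag", "run-2013-08-19"); // illustrative key and value

// Later, on any worker (e.g. inside a computation), the same value is
// visible through the job's configuration:
// String tag = getConf().get("myapp.inputTag");
```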

Avery

On 8/19/13 9:14 AM, Marco Aurelio Barbosa Fagnani Lotz wrote:

Hello all :)

I am having problems calling getContext().getInputSplit(); inside the 
compute() method in the workers.


It always returns as if it didn't get any split at all, since 
inputSplit.getLocations() returns without the hosts that should have 
that split as local and inputSplit.getLength() returns 0.


Should there be any initialization to the Workers context so that I 
can get this information?

Is there any way to access the jobContext from the workers or the Master?

Best Regards,
Marco Lotz


*From:* Marco Aurelio Barbosa Fagnani Lotz m.a.b.l...@stu12.qmul.ac.uk
*Sent:* 17 August 2013 20:20
*To:* user@giraph.apache.org
*Subject:* Workers input splits and MasterCompute communication
Hello all :)

In what class the workers actually get the input file splits from the 
file system?


Is it possible for a MasterCompute class object to have
access to/communication with the workers in that job? I thought about
using aggregators, but then I assumed that aggregators actually work
with the vertices' compute() (and related methods) and not with the worker
itself.


By workers I don't mean the vertices in each worker, but the
object that runs the compute for all the vertices in that worker.


Best Regards,
Marco Lotz




Re: New vertex allocation and messages

2013-08-19 Thread Avery Ching
Yes, you can control this behavior with the VertexResolver.  It handles 
all mutations to the graph and resolves them in a user defined way.
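A hedged sketch of such a resolver; the exact resolve(...) signature and generics differ between Giraph versions, so treat this as pseudocode against the VertexResolver interface rather than a definitive implementation:

```java
// Pseudocode sketch (signature varies by Giraph version): drop messages sent
// to a non-existent vertex instead of auto-creating the vertex.
public class NoAutoCreateResolver implements
    VertexResolver<LongWritable, DoubleWritable, FloatWritable, DoubleWritable> {
  @Override
  public Vertex<LongWritable, DoubleWritable, FloatWritable, DoubleWritable> resolve(
      LongWritable vertexId,
      Vertex<LongWritable, DoubleWritable, FloatWritable, DoubleWritable> vertex,
      VertexChanges<LongWritable, DoubleWritable, FloatWritable, DoubleWritable> changes,
      boolean hasMessages) {
    if (vertex == null && changes == null && hasMessages) {
      return null; // no vertex is created; the stray messages are discarded
    }
    return vertex; // simplified: a real resolver would also apply 'changes'
  }
}
```

The resolver class is registered through the configuration (see the vertex resolver option in GiraphConstants for the exact key in your version).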


Avery

On 8/19/13 9:21 AM, Marco Aurelio Barbosa Fagnani Lotz wrote:

Hello all :)

I am programming an application that has to create and destroy a few 
vertices. I was wondering if there is any protection in Giraph to 
prevent a vertex to send a message to another vertex that does not 
exist (i.e. provide a vertex id that is not associated with a vertex yet).


Is there a way to test if the destination vertex exists before sending 
the message to it?


Also, when a vertex is created, is there any sort of load balancing,
or is it always kept in the worker that created it?


Best Regards,
Marco Lotz






Re: MultiVertexInputFormat

2013-08-16 Thread Avery Ching
This is doable in Giraph: you can use as many vertex or edge input
formats as you like (via GIRAPH-639).  You just need to choose
MultiVertexInputFormat and/or MultiEdgeInputFormat.


See VertexInputFormatDescription for vertex input formats

  /**
   * VertexInputFormats description - JSON array containing a JSON array for
   * each vertex input. Vertex input JSON arrays contain one or two elements -
   * first one is the name of vertex input class, and second one is JSON object
   * with all specific parameters for this vertex input. For example:
   * [["VIF1",{"p":"v1"}],["VIF2",{"p":"v2","q":"v"}]]
   */
  public static final StrConfOption VERTEX_INPUT_FORMAT_DESCRIPTIONS =
      new StrConfOption("giraph.multiVertexInput.descriptions", null,
          "VertexInputFormats description - JSON array containing a JSON " +
          "array for each vertex input. Vertex input JSON arrays contain " +
          "one or two elements - first one is the name of vertex input " +
          "class, and second one is JSON object with all specific " +
          "parameters for this vertex input. For example: " +
          "[[\"VIF1\",{\"p\":\"v1\"}],[\"VIF2\",{\"p\":\"v2\",\"q\":\"v\"}]]");

See EdgeInputFormatDescription for edge input formats

  /**
   * EdgeInputFormats description - JSON array containing a JSON array for
   * each edge input. Edge input JSON arrays contain one or two elements -
   * first one is the name of edge input class, and second one is JSON object
   * with all specific parameters for this edge input. For example:
   * [["EIF1",{"p":"v1"}],["EIF2",{"p":"v2","q":"v"}]]
   */
  public static final StrConfOption EDGE_INPUT_FORMAT_DESCRIPTIONS =
      new StrConfOption("giraph.multiEdgeInput.descriptions", null,
          "EdgeInputFormats description - JSON array containing a JSON " +
          "array for each edge input. Edge input JSON arrays contain one " +
          "or two elements - first one is the name of edge input class, " +
          "and second one is JSON object with all specific parameters " +
          "for this edge input. For example: [[\"EIF1\",{\"p\":\"v1\"}], " +
          "[\"EIF2\",{\"p\":\"v2\",\"q\":\"v\"}]]");
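Putting the two pieces together, wiring up two vertex inputs could look roughly like this; the class names org.example.FirstVIF / org.example.SecondVIF and the "path" parameter are hypothetical placeholders, not real Giraph classes:

```java
GiraphConfiguration conf = new GiraphConfiguration();
conf.setVertexInputFormatClass(MultiVertexInputFormat.class);
// JSON array: one entry per input; class name plus optional parameter object.
conf.set("giraph.multiVertexInput.descriptions",
    "[[\"org.example.FirstVIF\",{\"path\":\"/data/a\"}],"
    + "[\"org.example.SecondVIF\",{\"path\":\"/data/b\"}]]");
```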

Hope that helps,

Avery

On 8/16/13 8:45 AM, Yasser Altowim wrote:


Guys, any help with this will be appreciated. Thanks.

*From:*Yasser Altowim [mailto:yasser.alto...@ericsson.com]
*Sent:* Thursday, August 15, 2013 2:07 PM
*To:* user@giraph.apache.org
*Subject:* MultiVertexInputFormat

Hi,

 I am implementing an algorithm using Giraph. My
algorithm needs to read input data from two files, each of which has its own
format. My questions are:


1. How can I use the MultiVertexInputFormat class? Is there any example
that shows how this class can be used?


2. How can I specify this class when running my job using the Giraph
Runner or using a driver class?


Thanks in advance.

*Best,*

*Yasser*





Using Giraph at Facebook

2013-08-14 Thread Avery Ching

Hi Giraphers,

We recently released an article on how we use Giraph at the scale of a
trillion edges at Facebook.  If you're interested, please take a look!


https://www.facebook.com/notes/facebook-engineering/scaling-apache-giraph-to-a-trillion-edges/10151617006153920

Avery


Re: Giraph vs good-old PVM/MPI ?

2013-08-06 Thread Avery Ching
The Giraph/Pregel model is based on bulk synchronous parallel computing, 
where the programmer is abstracted from the details of how the 
parallelization occurs (infrastructure does this for you).  Additionally 
the APIs are built for graph-processing.  Since the computing model is 
well defined (BSP), the infrastructure can checkpoint the state of the 
application at the appropriate time and also handle failures without 
user interaction.


MPI is a much lower-level and generic API, where messages are sent to
processes.  Users must pack/unpack their own messages and deliver
messages to the appropriate data structures.  Users must partition their
own data.  As of MPI 2, the state of a failed process leaves the
application in an undefined state (usually dead).


Hope that helps,

Avery

On 8/6/13 10:19 AM, Yang wrote:
it seems that the paradigm offered by Giraph/Pregel is very similar to
the programming paradigm of PVM, and to a lesser degree, MPI. Using
PVM, we often engage in such iterative cycles where all the nodes
sync on a barrier and then enter the next cycle.

So what are the extra features offered by Giraph/Pregel? I can see
persistence/restarting of tasks, and maybe abstraction of the
user-code-specific part into the API so that users are not concerned
with the actual message passing (message passing is done by the
framework).


Thanks
Yang




Re: Missing classes SendMessageToAllCache / SendWorkerOneToAllMessagesRequest

2013-07-20 Thread Avery Ching

This should be fixed now.

On 7/20/13 12:20 PM, Avery Ching wrote:


My bad. I am out but will fix in a few hours.

On Jul 20, 2013 11:02 AM, Christian Krause m...@ckrause.org wrote:


Hi,
I get these compile errors. Could it be that some classes are missing?

Cheers,
Christian

[ERROR] Failed to execute goal
org.apache.maven.plugins:maven-compiler-plugin:3.0:compile
(default-compile) on project giraph-core: Compilation failure:
Compilation failure:
[ERROR]

/home/christian/giraph-git/giraph-core/target/munged/main/org/apache/giraph/comm/netty/NettyWorkerClientRequestProcessor.java:[24,30]
cannot find symbol
[ERROR] symbol:   class SendMessageToAllCache
[ERROR] location: package org.apache.giraph.comm
[ERROR]

/home/christian/giraph-git/giraph-core/target/munged/main/org/apache/giraph/comm/requests/RequestType.java:[41,5]
cannot find symbol
[ERROR] symbol:   class SendWorkerOneToAllMessagesRequest
[ERROR] location: class org.apache.giraph.comm.requests.RequestType
[ERROR]

/home/christian/giraph-git/giraph-core/target/munged/main/org/apache/giraph/comm/netty/NettyWorkerClientRequestProcessor.java:[132,13]
cannot find symbol
[ERROR] symbol:   class SendMessageToAllCache
[ERROR] location: class
org.apache.giraph.comm.netty.NettyWorkerClientRequestProcessor&lt;I,V,E&gt;





Re: HBase EdgeInputFormat

2013-07-18 Thread Avery Ching
I don't think it will be hard to implement.  Just start with the
HBaseVertexInputFormat and have it extend EdgeInputFormat.  You can look
at TableEdgeInputFormat for an example.  It sounds like a good
contribution to Giraph.


On 7/18/13 1:57 PM, Puneet Jain wrote:

I also need this feature. Will be really helpful.


On Thu, Jul 18, 2013 at 10:49 AM, Ahmet Emre Aladağ
emre.ala...@agmlab.com wrote:


Hi,

Question: Will there be HBaseEdgeInputFormat class or is there a
restriction of HBase thus we can't implement it?

HBaseVertexInputFormat is fine for vertex-centric reading, i.e.
each row in HBase corresponds to one Vertex. But it does not allow
me to create duplicate vertices with the same ID.
Now I have the case where many rows in HBase can correspond to one
Vertex, each representing sets of edges.

Example:
a1 - x y z
a2 - t p
a3 - k

will be

vertex a with edges to x y z t p k

It gives me the intuition that if there existed an
HBaseEdgeInputFormat, I could solve this case. But it doesn't
exist yet.






--
--Puneet




Re: Avro input format available on Giraph?

2013-07-18 Thread Avery Ching
Not that I know of.  Since it is similar to JSON, you might want to take
a look at JsonBase64VertexInputFormat as an example for Avro.  It should be
fairly similar in structure.  Of course, it would be great if you could
contribute it back to Giraph when you're done. =)


Avery

On 7/18/13 4:36 PM, Chuan Lei wrote:

Hello,

I just wonder whether Giraph has an Avro input format reader that can
read Avro input files. If not, could someone let me know where I can
get started? For example, which input format class should I
extend from? Thanks in advance for your help.


Regards,
Chuan L.




Re: MapWritable messages in Giraph

2013-07-04 Thread Avery Ching
Looks like the serialization/deserialization has a problem.  If you want
to see an example of a Trove primitive map, see LongDoubleArrayEdges.

On 7/4/13 7:06 AM, Pasupathy Mahalingam wrote:

Hi,

Thanks Avery Ching.

I get the following exception

java.lang.IllegalStateException: run: Caught an unrecoverable 
exception waitFor: ExecutionException occurred while waiting for 
org.apache.giraph.utils.ProgressableUtils$FutureWaitable@381eb0c6

at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:102)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)

at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.lang.IllegalStateException: waitFor: 
ExecutionException occurred while waiting for 
org.apache.giraph.utils.ProgressableUtils$FutureWaitable@381eb0c6
at 
org.apache.giraph.utils.ProgressableUtils.waitFor(ProgressableUtils.java:151)
at 
org.apache.giraph.utils.ProgressableUtils.waitForever(ProgressableUtils.java:111)
at 
org.apache.giraph.utils.ProgressableUtils.getFutureResult(ProgressableUtils.java:73)
at 
org.apache.giraph.utils.ProgressableUtils.getResultsWithNCallables(ProgressableUtils.java:192)
at 
org.apache.giraph.graph.GraphTaskManager.processGraphPartitions(GraphTaskManager.java:753)
at 
org.apache.giraph.graph.GraphTaskManager.execute(GraphTaskManager.java:273)

at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:92)
... 7 more
Caused by: java.util.concurrent.ExecutionException: 
java.lang.IllegalStateException: next: IOException

at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:232)
at java.util.concurrent.FutureTask.get(FutureTask.java:91)
at 
org.apache.giraph.utils.ProgressableUtils$FutureWaitable.waitFor(ProgressableUtils.java:271)
at 
org.apache.giraph.utils.ProgressableUtils.waitFor(ProgressableUtils.java:143)

... 13 more
Caused by: java.lang.IllegalStateException: next: IOException
at 
org.apache.giraph.utils.ByteArrayVertexIdData$VertexIdDataIterator.next(ByteArrayVertexIdData.java:211)
at 
org.apache.giraph.comm.messages.ByteArrayMessagesPerVertexStore.addPartitionMessages(ByteArrayMessagesPerVertexStore.java:116)
at 
org.apache.giraph.comm.requests.SendWorkerMessagesRequest.doRequest(SendWorkerMessagesRequest.java:72)
at 
org.apache.giraph.comm.netty.NettyWorkerClientRequestProcessor.doRequest(NettyWorkerClientRequestProcessor.java:470)
at 
org.apache.giraph.comm.netty.NettyWorkerClientRequestProcessor.flush(NettyWorkerClientRequestProcessor.java:419)
at 
org.apache.giraph.graph.ComputeCallable.call(ComputeCallable.java:193)
at 
org.apache.giraph.graph.ComputeCallable.call(ComputeCallable.java:70)
at 
org.apache.giraph.utils.LogStacktraceCallable.call(LogStacktraceCallable.java:51)

at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)

at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: ensureRemaining: Only 393 bytes 
remaining, trying to read 8960
at 
org.apache.giraph.utils.UnsafeByteArrayInputStream.ensureRemaining(UnsafeByteArrayInputStream.java:114)
at 
org.apache.giraph.utils.UnsafeByteArrayInputStream.readFully(UnsafeByteArrayInputStream.java:128)
at 
org.apache.giraph.utils.UnsafeByteArrayInputStream.readUTF(UnsafeByteArrayInputStream.java:275)
at 
org.apache.hadoop.io.AbstractMapWritable.readFields(AbstractMapWritable.java:199)

at org.apache.hadoop.io.MapWritable.readFields(MapWritable.java:146)
at org.apache.hadoop.io.MapWritable.readFields(MapWritable.java:167)
at 
org.apache.giraph.utils.ByteArrayVertexIdMessages.readData(ByteArrayVertexIdMessages.java:76)
at 
org.apache.giraph.utils.ByteArrayVertexIdMessages.readData(ByteArrayVertexIdMessages.java:34)
at 
org.apache.giraph.utils.ByteArrayVertexIdData$VertexIdDataIterator.next(ByteArrayVertexIdData.java:209)

... 12 more

It would be great to learn how you use writable maps based on Trove/
FastUtil. A sample usage, if you can share one, would be very helpful.


Rgds
Pasupathy


On Wed, Jul 3, 2013 at 10:19 PM, Avery Ching ach...@apache.org wrote:


We don't use MapWritable.  Internally we have a bunch of writable
maps based on Trove or FastUtil for speed.  What's your full
exception stack trace?


On 7/2/13 1:24 AM, Pasupathy Mahalingam wrote:

Hi,

I'm trying to send

Re: Bi-directional and multigraphs

2013-07-03 Thread Avery Ching
You can easily add bi-directional edges.  When you load the edge, simply 
also load the reciprocal edge.  I.e. if you add a-b, also add b-a.
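The reciprocal-edge trick is independent of any Giraph class; it can be sketched on a plain edge list (names here are illustrative, and in Giraph this logic would live in the input format's reader while parsing each line):

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: turn a directed edge list into a bi-directional one. */
public class Symmetrize {
  /** For every edge (a, b), also emit the reciprocal edge (b, a). */
  public static List<long[]> addReciprocal(List<long[]> edges) {
    List<long[]> out = new ArrayList<>(edges.size() * 2);
    for (long[] e : edges) {
      out.add(new long[] {e[0], e[1]});
      out.add(new long[] {e[1], e[0]}); // the reciprocal edge
    }
    return out;
  }

  public static void main(String[] args) {
    List<long[]> edges = new ArrayList<>();
    edges.add(new long[] {1L, 2L});
    List<long[]> both = addReciprocal(edges);
    System.out.println(both.size()); // 2
  }
}
```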


On 7/2/13 1:11 AM, Pascal Jäger wrote:

Hi everyone,

I am currently getting my hands on Giraph, which is why I am trying to
implement a maximum flow algorithm originally designed for MapReduce.

The algorithm requires bi-directional edges.

  * Are bi-directional edges supported in giraph?
  * Where would I find them?

Thanks

Pascal





Re: Failed to compile Giraph for Hadoop YARN

2013-07-03 Thread Avery Ching

Eli, any thoughts?

On 7/3/13 9:27 AM, Chui-Hui Chiu wrote:

Hello,

I tried to compile Giraph-1.1.0-SNAPSHOT for hadoop_2.0.3 or
hadoop_yarn, but both failed.


The error message when the compile command is mvn -Phadoop_yarn
compile is

=
[INFO] 


[INFO] Reactor Summary:
[INFO]
[INFO] Apache Giraph Parent .. SUCCESS 
[1.320s]
[INFO] Apache Giraph Core  FAILURE 
[12.508s]

[INFO] Apache Giraph Examples  SKIPPED
[INFO] 


[INFO] BUILD FAILURE
[INFO] 


[INFO] Total time: 14.473s
[INFO] Finished at: Wed Jul 03 11:05:44 CDT 2013
[INFO] Final Memory: 14M/216M
[INFO] 

[ERROR] Failed to execute goal on project giraph-core: Could not 
resolve dependencies for project 
org.apache.giraph:giraph-core:jar:1.1.0-SNAPSHOT: The following 
artifacts could not be resolved: 
org.apache.hadoop:hadoop-common:jar:SET_HADOOP_VERSION_USING_MVN_DASH_D_OPTION, 
org.apache.hadoop:hadoop-mapreduce-client-common:jar:SET_HADOOP_VERSION_USING_MVN_DASH_D_OPTION, 
org.apache.hadoop:hadoop-mapreduce-client-core:jar:SET_HADOOP_VERSION_USING_MVN_DASH_D_OPTION, 
org.apache.hadoop:hadoop-yarn-common:jar:SET_HADOOP_VERSION_USING_MVN_DASH_D_OPTION, 
org.apache.hadoop:hadoop-yarn-server-common:jar:SET_HADOOP_VERSION_USING_MVN_DASH_D_OPTION, 
org.apache.hadoop:hadoop-yarn-server-resourcemanager:jar:SET_HADOOP_VERSION_USING_MVN_DASH_D_OPTION, 
org.apache.hadoop:hadoop-yarn-server-nodemanager:jar:SET_HADOOP_VERSION_USING_MVN_DASH_D_OPTION, 
org.apache.hadoop:hadoop-yarn-server-tests:jar:tests:SET_HADOOP_VERSION_USING_MVN_DASH_D_OPTION: 
Could not find artifact 
org.apache.hadoop:hadoop-common:jar:SET_HADOOP_VERSION_USING_MVN_DASH_D_OPTION 
in central (http://repo1.maven.org/maven2) - [Help 1]

=


The error message when the compile command is mvn -Phadoop_2.0.3
compile is

=
[INFO] 


[INFO] Reactor Summary:
[INFO]
[INFO] Apache Giraph Parent .. SUCCESS 
[12.695s]
[INFO] Apache Giraph Core  SUCCESS 
[2:10.916s]
[INFO] Apache Giraph Examples  FAILURE 
[2.286s]
[INFO] 


[INFO] BUILD FAILURE
[INFO] 


[INFO] Total time: 2:26.530s
[INFO] Finished at: Wed Jul 03 11:03:25 CDT 2013
[INFO] Final Memory: 34M/348M
[INFO] 

[ERROR] Failed to execute goal on project giraph-examples: Could not 
resolve dependencies for project 
org.apache.giraph:giraph-examples:jar:1.1.0-SNAPSHOT: Could not find 
artifact org.apache.giraph:giraph-core:jar:tests:1.1.0-SNAPSHOT in 
central (http://repo1.maven.org/maven2) - [Help 1]

=

Am I missing anything?


I also noticed that my Maven 3 downloads many files from the maven2
folder on a remote server, with the following prompt.


Downloading: http://repo1.maven.org/maven2/org/apache/hadoop/...

Is this a problem?


Thanks,
Chui-hui




Re: Running Giraph job inside Java code

2013-07-03 Thread Avery Ching
Take a look at PageRankBenchmark; it is a stand-alone Java program that
runs Giraph jobs.


On 7/2/13 4:08 AM, Ahmet Emre Aladağ wrote:
By the way, I have set the corresponding classes in the Giraph
configuration.


GiraphConfiguration giraphConf = new GiraphConfiguration(config);

giraphConf.setZooKeeperConfiguration(
zooKeeperWatcher.getQuorum());
giraphConf.setComputationClass(LinkRankComputation.class);
giraphConf.setMasterComputeClass(LinkRankVertexMasterCompute.class);
giraphConf.setOutEdgesClass(ByteArrayEdges.class);
giraphConf.setVertexInputFormatClass(NutchTableEdgeInputFormat.class);
giraphConf.setVertexOutputFormatClass(NutchTableEdgeOutputFormat.class);
giraphConf.setInt("giraph.pageRank.superstepCount", 40);
giraphConf.setWorkerConfiguration(1, 1, 100.0f);
giraphConf.set(TableInputFormat.INPUT_TABLE, TABLE_NAME);
giraphConf.set(TableOutputFormat.OUTPUT_TABLE, TABLE_NAME);




Re: Array exception when using out-of-core graph

2013-07-03 Thread Avery Ching

Claudio, any thoughts?

On 7/3/13 3:52 AM, Han JU wrote:

Hi,

I've been testing some algorithm using the out-of-core feature, and I
have a strange ArrayIndexOutOfBoundsException.


In my computation class, the vertex value is a custom writable class 
which contains a long[]. And during the computation, when the code 
access this array (say at index 0), the exception is thrown.


Caused by: java.lang.ArrayIndexOutOfBoundsException: 0
at
some.package.ProjectionComputation.compute(ProjectionComputation.java:87)
at

org.apache.giraph.graph.ComputeCallable.computePartition(ComputeCallable.java:226)
at
org.apache.giraph.graph.ComputeCallable.call(ComputeCallable.java:161)
at
org.apache.giraph.graph.ComputeCallable.call(ComputeCallable.java:70)


This happens only if the out-of-core graph is enabled and
maxPartitionsInMemory is lower than the actual number of partitions. The
vertex value class is solid in terms of serialization (proven by unit tests).
The strange thing is that when the exception is thrown, the array 
index is perfectly legal. And I can even print the long value 
retrieved from the array ... So it seems to me that maybe it's not a 
problem within my code.


Any suggestions?

My program is based on trunk.

--
*JU Han*

Software Engineer Intern @ KXEN Inc.
UTC   - Université de Technologie de Compiègne
GI06 - Fouille de Données et Décisionnel

+33 061960
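A common cause of this class of bug is an asymmetry between write() and readFields() in the custom Writable, which out-of-core mode exercises when it spills a partition to disk and reads it back (in-memory runs never hit that path, which would explain why the job only fails with out-of-core enabled). Below is a hedged, Hadoop-free sketch of the round trip a long[] vertex value must survive; the class and field names are illustrative, not from the original code:

```java
import java.io.*;

public class LongArrayValue {
    // Illustrative stand-in for a custom vertex value holding a long[].
    long[] data = new long[0];

    // write() and readFields() must mirror each other exactly:
    // length prefix first, then the elements.
    public void write(DataOutput out) throws IOException {
        out.writeInt(data.length);
        for (long v : data) {
            out.writeLong(v);
        }
    }

    public void readFields(DataInput in) throws IOException {
        int len = in.readInt();   // read the length prefix first
        data = new long[len];     // re-allocate; never reuse a stale array
        for (int i = 0; i < len; i++) {
            data[i] = in.readLong();
        }
    }

    public static void main(String[] args) throws IOException {
        LongArrayValue v = new LongArrayValue();
        v.data = new long[]{7L, 42L};
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        v.write(new DataOutputStream(bos));
        LongArrayValue back = new LongArrayValue();
        back.readFields(new DataInputStream(
            new ByteArrayInputStream(bos.toByteArray())));
        System.out.println(back.data.length);  // prints 2
    }
}
```

If readFields() skips the re-allocation or reads fields in a different order than write() emits them, the deserialized array can come back empty, producing exactly an ArrayIndexOutOfBoundsException at a "legal" index.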




Re: Is Zookeeper a must for Giraph?

2013-06-24 Thread Avery Ching
Zookeeper is required.  That being said, you can have an external 
Zookeeper or Giraph can start one for you.  It's your choice.


Eli is the one to contact regarding Giraph on Hadoop 2.0.5.  Any 
thoughts Eli?


Avery

On 6/24/13 5:22 PM, Chuan Lei wrote:
It is not clear to me whether Zookeeper is required or optional for 
Giraph. I wonder if it is possible to run Giraph without Zookeeper. 
If that is not the case, would a default Zookeeper installation work 
with Giraph? Does anything have to be changed in Zookeeper?


Another question: I got the following error message when I ran the 
PageRankBenchmark program with Giraph on Hadoop-2.0.5. I saw similar 
posts on the mailing list, but there seems to be no clear answer to it 
yet. I would be grateful if someone could answer my question and help 
resolve the issue.


Error: java.lang.IllegalStateException: run: Caught an unrecoverable 
exception java.io.FileNotFoundException: File 
_bsp/_defaultZkManagerDir/job_1372108933881_0002/_zkServer does not 
exist. at 
org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:102) at 
org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:757) at 
org.apache.hadoop.mapred.MapTask.run(MapTask.java:339) at 
org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:158) at 
java.security.AccessController.doPrivileged(Native Method) at 
javax.security.auth.Subject.doAs(Subject.java:396) at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1478) 
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:153) Caused 
by: java.lang.RuntimeException: java.io.FileNotFoundException: File 
_bsp/_defaultZkManagerDir/job_1372108933881_0002/_zkServer does not 
exist. at 
org.apache.giraph.zk.ZooKeeperManager.onlineZooKeeperServers(ZooKeeperManager.java:790) 
at 
org.apache.giraph.graph.GraphTaskManager.startZooKeeperManager(GraphTaskManager.java:357) 
at 
org.apache.giraph.graph.GraphTaskManager.setup(GraphTaskManager.java:188) 
at org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:60) at 
org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:90) ... 7 
more Caused by: java.io.FileNotFoundException: File 
_bsp/_defaultZkManagerDir/job_1372108933881_0002/_zkServer does not 
exist. at 
org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:405) 
at 
org.apache.giraph.zk.ZooKeeperManager.onlineZooKeeperServers(ZooKeeperManager.java:749) 
... 11 more


Regards,
Chuan




Re: Restarting Algorithm Pattern

2013-06-04 Thread Avery Ching
Rather than relying on voteToHalt, you could add an aggregator that keeps 
track of the live vertices, and use another aggregator to store/set 
the configuration value so that the master computation can modify it.  Do the 
logic in the master computation and all should be well.


Avery

On 6/3/13 10:04 PM, David Gainer wrote:


I have an algorithm where I'd like to iterate over the vertices with 
a configuration variable set to some value.  Then, when all the 
vertices vote to halt, I'd like to reduce the configuration variable 
and repeat the inner iteration until some threshold of the 
configuration variable is reached.  I was wondering what the natural 
way of programming that would be.  It seems like a master compute 
situation -- but I didn't see any method for un-halting vertices.  I 
also wasn't sure when a vertex would ever be able to call its own 
wakeup function.


Thanks,

David
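The aggregator-based pattern suggested above can be pictured, outside Giraph, as two nested loops: the inner loop plays the supersteps (run until every vertex has halted), and the outer loop plays the master compute (lower the aggregated parameter and start another round). A hedged, Giraph-free sketch of just that control flow -- the "halts after 3 supersteps" rule below is a placeholder for real per-vertex convergence logic, not anything from the Giraph API:

```java
public class NestedIterationSketch {
    /**
     * Outer loop = master compute lowering an aggregated parameter;
     * inner loop = supersteps that run until all vertices are halted.
     * Returns the total number of supersteps executed.
     */
    public static int totalSupersteps(double param, double threshold,
                                      double step) {
        int supersteps = 0;
        while (param > threshold) {         // master: keep iterating?
            int innerSteps = 0;
            boolean anyActive = true;
            while (anyActive) {             // inner "supersteps"
                supersteps++;
                innerSteps++;
                anyActive = innerSteps < 3; // placeholder halting rule
            }
            param -= step;                  // master lowers the parameter
        }
        return supersteps;
    }

    public static void main(String[] args) {
        // Two outer rounds of three supersteps each.
        System.out.println(totalSupersteps(1.0, 0.0, 0.5));  // prints 6
    }
}
```

In real Giraph code the outer "while" lives in the master compute, which inspects an aggregator of active-vertex counts and updates the parameter aggregator that vertices read at the start of each superstep.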





Re: External Documentation about Giraph

2013-05-31 Thread Avery Ching
Improving our documentation is always very nice.  Thanks for doing this 
you two!


On 5/31/13 7:32 PM, Yazan Boshmaf wrote:

Maria, I can help you with this if you are interested and have the
time. If you are busy, please let me know and I will update the site
docs with a variant of your tutorial. Thanks!

On Thu, May 30, 2013 at 4:13 PM, Roman Shaposhnik r...@apache.org wrote:

On Wed, May 29, 2013 at 2:25 PM, Maria Stylianou mars...@gmail.com wrote:

Hello guys,

This semester I'm doing my master's thesis using Giraph on a daily basis.
In my blog (marsty5.wordpress.com) I wrote some posts about Giraph, some of
the new users may find them useful!
And maybe some of the experienced ones can give me feedback and correct any
mistakes :D
So far, I described:
1. How to set up Giraph
2. What to do next - after setting up Giraph
3. How to run ShortestPaths
4. How to run PageRank

Good stuff! As a shameless plug, one more way
to install Giraph is via Apache Bigtop. All it takes is
hooking one of these files:
 
http://bigtop01.cloudera.org:8080/view/Bigtop-trunk/job/Bigtop-trunk-Repository/label=fedora18/lastSuccessfulBuild/artifact/repo/bigtop.repo
 
http://bigtop01.cloudera.org:8080/view/Bigtop-trunk/job/Bigtop-trunk-Repository/label=opensuse12/lastSuccessfulBuild/artifact/repo/bigtop.repo
to your yum/apt system and typing:
$ sudo yum install hadoop-conf-pseudo giraph

In fact, we're about to release Bigtop 0.6.0 with Hadoop 2.0.4.1
and Giraph 1.0 -- so if anybody's interested in helping us
test this stuff, that would be really appreciated.

Thanks,
Roman.

P.S. There's quite a few other platforms available as well:
 
http://bigtop01.cloudera.org:8080/view/Bigtop-trunk/job/Bigtop-trunk-Repository/




Re: Extra data on vertex

2013-05-07 Thread Avery Ching
Best way is to add it to the vertex value.  The vertex value is meant to 
store any data associated with a particular vertex.


Hope that helps,

Avery

On 5/7/13 7:47 AM, Ahmet Emre Aladağ wrote:

Hi,

1) What's the best way for storing extra data (such as URL) on a 
vertex? I thought this would be through a class variable but I could 
not find the way to access that variable from the neighbor.
For example I'd like to remove the duplicate edges going towards the 
nodes with the same url (Duplicate Removal phase of LinkRank). How 
can I learn my neighbor's url variable: targetUrl?


2) Is removing edges like this a valid approach?


public class LinkRankVertex extends Vertex<IntWritable, FloatWritable,
        NullWritable, FloatWritable> {

    public String url;

    public void removeDuplicateLinks() {
        int targetId;
        String targetUrl;

        Set<String> urls = new HashSet<String>();
        ArrayList<Edge<IntWritable, NullWritable>> edges =
                new ArrayList<Edge<IntWritable, NullWritable>>();

        for (Edge<IntWritable, NullWritable> edge : getEdges()) {
            targetId = edge.getTargetVertexId().get();
            targetUrl = ...??
            if (!urls.contains(targetUrl)) {
                urls.add(targetUrl);
                edges.add(edge);
            }
        }
        setEdges(edges);
    }
}

Thanks,
Emre.
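The set-based filtering in removeDuplicateLinks() can be checked in isolation. Below is a hedged, Giraph-free sketch of that dedup logic; note that in Giraph a vertex cannot read a neighbor's value directly, so the id-to-URL map here stands in for information that would have to arrive via messages in a prior superstep (all names are illustrative):

```java
import java.util.*;

public class DedupByUrl {
    /**
     * Keep only the first edge per target URL. targetUrls maps a
     * target vertex id to its URL; in Giraph this mapping would be
     * assembled from messages sent by the neighbors themselves.
     */
    public static List<Long> dedup(List<Long> targetIds,
                                   Map<Long, String> targetUrls) {
        Set<String> seen = new HashSet<String>();
        List<Long> kept = new ArrayList<Long>();
        for (Long id : targetIds) {
            String url = targetUrls.get(id);
            if (seen.add(url)) {   // add() returns false for duplicates
                kept.add(id);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<Long, String> urls = new HashMap<Long, String>();
        urls.put(1L, "http://a.example");
        urls.put(2L, "http://a.example");  // duplicate URL, dropped
        urls.put(3L, "http://b.example");
        System.out.println(dedup(Arrays.asList(1L, 2L, 3L), urls));
        // prints [1, 3]
    }
}
```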





Re: TestJsonBase64Format failure on 1.0.0

2013-05-07 Thread Avery Ching
)
at

org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.init(ChecksumFileSystem.java:126)
at
org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:283)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:427)
at

org.apache.hadoop.mapreduce.lib.input.LineRecordReader.initialize(LineRecordReader.java:67)
at

org.apache.giraph.io.formats.TextVertexInputFormat$TextVertexReader.initialize(TextVertexInputFormat.java:96)
at

org.apache.giraph.io.formats.JsonBase64VertexInputFormat$JsonBase64VertexReader.initialize(JsonBase64VertexInputFormat.java:71)
at

org.apache.giraph.worker.VertexInputSplitsCallable.readInputSplit(VertexInputSplitsCallable.java:120)
at

org.apache.giraph.worker.InputSplitsCallable.loadInputSplit(InputSplitsCallable.java:220)
at

org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:161)
... 7 more

2013-05-06 09:22:44,485 INFO
org.apache.hadoop.mapred.TaskInProgress: Error from
attempt_201305052325_0013_m_02_0: Task
attempt_201305052325_0013_m_02_0 failed to report status for
602 seconds. Killing!
2013-05-06 09:22:44,485 INFO
org.apache.hadoop.mapred.TaskInProgress: TaskInProgress
task_201305052325_0013_m_02 has failed 1 times.
2013-05-06 09:22:44,485 INFO
org.apache.hadoop.mapred.JobInProgress: Aborting job
job_201305052325_0013
2013-05-06 09:22:44,485 INFO
org.apache.hadoop.mapred.JobInProgress: Killing job
'job_201305052325_0013'

Regards,
- kiru

Kiru Pakkirisamy | webcloudtech.wordpress.com

--- On *Mon, 5/6/13, Kiru Pakkirisamy
/kirupakkiris...@yahoo.com/*wrote:


From: Kiru Pakkirisamy kirupakkiris...@yahoo.com
Subject: Re: Compiling 1.0.0 distribution
To: user@giraph.apache.org, Avery Ching ach...@apache.org
Date: Monday, May 6, 2013, 12:02 AM

Yes, I am trying to run on my Ubuntu laptop. Let me look at
the log files. Thanks for the help. Much appreciated.

Regards,
- kiru

Kiru Pakkirisamy | webcloudtech.wordpress.com

--- On *Sun, 5/5/13, Avery Ching /ach...@apache.org/* wrote:


From: Avery Ching ach...@apache.org
Subject: Re: Compiling 1.0.0 distribution
To: user@giraph.apache.org
Cc: Kiru Pakkirisamy kirupakkiris...@yahoo.com
Date: Sunday, May 5, 2013, 11:51 PM

My guess is that you don't have enough workers to run the
job and the master kills the job (i.e. are you running on
a single machine setup?).  You can try to run first with
one worker (this will take 2 map slots - one for the
master and one for the worker).  You can also look at the
logs from map task 0 to see more clearly what the error was.

Avery

On 5/5/13 11:16 PM, Kiru Pakkirisamy wrote:

Yup, I did a mvn3 install and then a mvn3 compile to get
around that already.
Right now, I am trying to run the PageRank, even after a
few runs I have not had one successful run . The maps
progress decreases in percentage (second time around) !!
I have never seen this before (?)

Regards,
- kiru

Kiru Pakkirisamy | webcloudtech.wordpress.com

--- On *Sun, 5/5/13, Roman Shaposhnik /r...@apache.org/*
wrote:


From: Roman Shaposhnik r...@apache.org
Subject: Re: Compiling 1.0.0 distribution
To: user@giraph.apache.org
Date: Sunday, May 5, 2013, 10:50 PM

To pile on top of that -- you can also run mvn -pl
module-name from the top
level to short-circuit the build to that module (and
yet still honor the dependencies).

Thanks,
Roman.

On Sun, May 5, 2013 at 10:44 PM, Avery Ching
ach...@apache.org wrote:

The easiest way is to compile from the base
directory, which will build everything.

You can build individual directories, but you
have to install the core jars first (i.e. go to
giraph-core and do 'mvn clean install'). Then you
can build the directory of your choice.

Hope that helps,

Avery

On 5/5/13 11:11 AM, Kiru Pakkirisamy wrote:

Hi,
I am unable to compile giraph-examples because
it is not able to reach the core jar files on
the repo. Why doesn't it pick it up from the
root build dir ?

Regards,
- kiru

Kiru Pakkirisamy | webcloudtech.wordpress.com
http

Re: Compiling 1.0.0 distribution

2013-05-06 Thread Avery Ching
My guess is that you don't have enough workers to run the job and the 
master kills the job (i.e. are you running on a single machine setup?).  
You can try to run first with one worker (this will take 2 map slots - 
one for the master and one for the worker).  You can also look at the 
logs from map task 0 to see more clearly what the error was.


Avery

On 5/5/13 11:16 PM, Kiru Pakkirisamy wrote:
Yup, I did a mvn3 install and then a mvn3 compile to get around that 
already.
Right now, I am trying to run the PageRank, even after a few runs I 
have not had one successful run . The maps progress decreases in 
percentage (second time around) !! I have never seen this before (?)


Regards,
- kiru

Kiru Pakkirisamy | webcloudtech.wordpress.com

--- On *Sun, 5/5/13, Roman Shaposhnik /r...@apache.org/* wrote:


From: Roman Shaposhnik r...@apache.org
Subject: Re: Compiling 1.0.0 distribution
To: user@giraph.apache.org
Date: Sunday, May 5, 2013, 10:50 PM

To pile on top of that -- you can also run mvn -pl module-name
from the top
level to short-circuit the build to that module (and yet still
honor the dependencies).

Thanks,
Roman.

On Sun, May 5, 2013 at 10:44 PM, Avery Ching ach...@apache.org
/mc/compose?to=ach...@apache.org wrote:

The easiest way is to compile from the base directory, which
will build everything.

You can build individual directories, but you have to install
the core jars first (i.e. go to giraph-core and do 'mvn clean
install').  Then you can build the directory of your choice.

Hope that helps,

Avery

On 5/5/13 11:11 AM, Kiru Pakkirisamy wrote:

Hi,
I am unable to compile giraph-examples because it is not able
to reach the core jar files on the repo. Why doesn't it pick
it up from the root build dir ?

Regards,
- kiru

Kiru Pakkirisamy | webcloudtech.wordpress.com
http://webcloudtech.wordpress.com








[VOTE] Release Giraph 1.0 (rc0)

2013-04-12 Thread Avery Ching

Fellow Giraphers,

We have our first release candidate since graduating from incubation.  
This is a source release, primarily due to the different versions of 
Hadoop we support with munge (similar to the 0.1 release).  Since 0.1, 
we've made A TON of progress on overall performance, optimizing memory 
use, split vertex/edge inputs, easy interoperability with Apache Hive, 
and a bunch of other areas.  In many ways, this is an almost totally 
different codebase.  Thanks everyone for your hard work!


Apache Giraph has been running in production at Facebook (against 
Facebook's Corona implementation of Hadoop - 
https://github.com/facebook/hadoop-20/tree/master/src/contrib/corona) 
since around last December.  It has proven to be very scalable, 
performant, and enables a bunch of new applications.  Based on the 
drastic improvements and the use of Giraph in production, it seems 
appropriate to bump up our version to 1.0.


While anyone can vote, the ASF requires majority approval from the PMC 
-- i.e., at least three PMC members must vote affirmatively for release, 
and there must be more positive than negative votes. Releases may not be 
vetoed. Before voting +1 PMC members are required to download the signed 
source code package, compile it as provided, and test the resulting 
executable on their own platform, along with also verifying that the 
package meets the requirements of the ASF policy on releases.


Please test this against many other Hadoop versions and let us know how 
this goes!


Release notes:
http://people.apache.org/~aching/giraph-1.0-RC0/RELEASE_NOTES.html

Release artifacts:
http://people.apache.org/~aching/giraph-1.0-RC0/

Corresponding git tag:
https://git-wip-us.apache.org/repos/asf?p=giraph.git;a=shortlog;h=refs/tags/release-1.0-RC0

Signing keys:
http://people.apache.org/keys/group/giraph.asc

The vote runs for 72 hours, until Monday 4pm PST.

Thanks everyone for your patience with this release!

Avery


Re: about fault tolerance in Giraph

2013-03-18 Thread Avery Ching

Hi Yuanyuan,

We haven't tested this feature in a while.  But it should work. What did 
the job report about why it failed?


Avery

On 3/18/13 10:22 AM, Yuanyuan Tian wrote:

Can anyone help me answer the question?

Yuanyuan



From: Yuanyuan Tian/Almaden/IBM@IBMUS
To: user@giraph.apache.org
Date: 03/15/2013 02:05 PM
Subject: about fault tolerance in Giraph




Hi

I was testing the fault tolerance of Giraph on a long-running job. I 
noticed that when one of the workers threw an exception, the whole job 
failed without retrying the task, even though I had turned on 
checkpointing and there were available map slots in my cluster. Why 
wasn't the fault tolerance mechanism working?


I was running a version of Giraph downloaded sometime in June 2012 and 
I used Netty for the communication layer.


Thanks,

Yuanyuan




Re: Congrats to our newest PMC member, Eli Reisman

2013-03-16 Thread Avery Ching

Congrats Eli!

On 3/15/13 9:03 PM, Eli Reisman wrote:
Thanks! I look forward to many more enjoyable toils in the future! 
Send the decoder ring. I'm already wearing the robe ;)





On Fri, Mar 15, 2013 at 2:07 PM, Alessandro Presta alessan...@fb.com 
mailto:alessan...@fb.com wrote:


Well done, Eli!

Sent from my iPhone

On Mar 15, 2013, at 2:04 PM, Jakob Homan jghoman@gmail.com
mailto:jghoman@gmail.com wrote:

 I'm happy to announce that the Apache Giraph PMC has elected Eli
Reisman to
 the PMC in recognition of his sustained and substantial
contributions over
 the past year.  Most recently, he's been toiling away at getting
Giraph
 onto YARN, which is a huge win.

 Congrats, Eli.  Your robe and secret decoder ring are in the mail.

 -Jakob
 on behalf of the Giraph PMC






Re: Zookeeper exception while running SimpleShortestPathsVertexTest

2013-03-09 Thread Avery Ching
I think those are info level logs rather than actual issues.  If your 
job completes successfully, I wouldn't worry about it.


On 3/8/13 12:31 PM, Ameet Kini wrote:

Hi folks,

I am trying to run the SimpleShortestPathsVertexTest example
introduced by the unit testing tool as part of
(https://issues.apache.org/jira/browse/GIRAPH-51) and see the below
zookeeper exception while running the testToyData method. I can run
giraph applications from the command-line and have confirmed that the
worker node can bring up zookeeper ok. Is there a configuration step I
am missing while running the unit test tool?

Thanks,
Ameet



[14:56:25]  INFO: [ZooKeeperServerMain] Starting server
[14:56:25]  INFO: [GiraphJob] run: Since checkpointing is disabled
(default), do not allow any task retries (setting
mapred.map.max.attempts = 0, old value = 4)
[14:56:25]  INFO: [ZooKeeperServer] Server
environment:zookeeper.version=3.3.3-1073969, built on 02/23/2011 22:27
GMT
[14:56:25]  INFO: [ZooKeeperServer] Server environment:host.name=dodo
[14:56:25]  INFO: [ZooKeeperServer] Server environment:java.version=1.6.0_24
[14:56:25]  INFO: [ZooKeeperServer] Server environment:java.vendor=Sun
Microsystems Inc.
[14:56:25]  INFO: [ZooKeeperServer] Server
environment:java.home=/usr/lib/jvm/java-6-openjdk-amd64/jre
[14:56:25]  INFO: [ZooKeeperServer] Server
environment:java.library.path=/usr/lib/jvm/java-6-openjdk-amd64/jre/lib/amd64/server:/usr/lib/jvm/java-6-openjdk-amd64/jre/lib/a\
md64:/usr/lib/jvm/java-6-openjdk-amd64/jre/../lib/amd64:/usr/java/packages/lib/amd64:/usr/lib/x86_64-linux-gnu/jni:/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu:/usr/lib\
/jni:/lib:/usr/lib
[14:56:25]  INFO: [ZooKeeperServer] Server environment:java.io.tmpdir=/tmp
[14:56:25]  INFO: [ZooKeeperServer] Server environment:java.compiler=NA
[14:56:25]  INFO: [ZooKeeperServer] Server environment:os.name=Linux
[14:56:25]  INFO: [ZooKeeperServer] Server environment:os.arch=amd64
[14:56:25]  INFO: [ZooKeeperServer] Server
environment:os.version=3.2.0-29-generic
[14:56:25]  INFO: [ZooKeeperServer] Server environment:user.name=akini
[14:56:25]  INFO: [ZooKeeperServer] Server environment:user.home=/home/akini
[14:56:25]  INFO: [ZooKeeperServer] Server
environment:user.dir=/home/jakini/workspace/giraph_test
[14:56:25]  INFO: [ZooKeeperServer] tickTime set to 2000
[14:56:25]  INFO: [ZooKeeperServer] minSessionTimeout set to 1
[14:56:25]  INFO: [ZooKeeperServer] maxSessionTimeout set to 10
[14:56:25]  WARN: [JobClient] Use GenericOptionsParser for parsing the
arguments. Applications should implement Tool for the same.
[14:56:25]  INFO: [NIOServerCnxn] binding to port 0.0.0.0/0.0.0.0:22182
[14:56:25]  INFO: [FileTxnSnapLog] Snapshotting: 0
[14:56:25]  INFO: [JobClient] Running job: job_201303070954_0007
[14:56:26]  INFO: [JobClient]  map 0% reduce 0%
[14:56:30]  INFO: [NIOServerCnxn] Accepted socket connection from
/127.0.0.1:43076
[14:56:30]  INFO: [NIOServerCnxn] Client attempting to establish new
session at /127.0.0.1:43076
[14:56:30]  INFO: [FileTxnLog] Creating new log file: log.1
[14:56:31]  INFO: [NIOServerCnxn] Established session
0x13d4b936bc3 with negotiated timeout 6 for client
/127.0.0.1:43076
[14:56:31]  INFO: [NIOServerCnxn] Accepted socket connection from
/127.0.0.1:43077
[14:56:31]  INFO: [NIOServerCnxn] Client attempting to establish new
session at /127.0.0.1:43077
[14:56:31]  INFO: [PrepRequestProcessor] Got user-level
KeeperException when processing sessionid:0x13d4b936bc3
type:create cxid:0x1 zxid:0xfffe txntype:un\
known reqpath:n/a Error
Path:/_hadoopBsp/job_201303070954_0007/_masterElectionDir
Error:KeeperErrorCode = NoNode for
/_hadoopBsp/job_201303070954_0007/_masterElectionDir
[14:56:31]  INFO: [NIOServerCnxn] Established session
0x13d4b936bc30001 with negotiated timeout 6 for client
/127.0.0.1:43077
[14:56:31]  INFO: [PrepRequestProcessor] Got user-level
KeeperException when processing sessionid:0x13d4b936bc30001
type:create cxid:0x1 zxid:0xfffe txntype:un\
known reqpath:n/a Error
Path:/_hadoopBsp/job_201303070954_0007/_masterJobState
Error:KeeperErrorCode = NodeExists for
/_hadoopBsp/job_201303070954_0007/_masterJobState
[14:56:31]  INFO: [PrepRequestProcessor] Got user-level
KeeperException when processing sessionid:0x13d4b936bc3
type:create cxid:0xc zxid:0xfffe txntype:un\
known reqpath:n/a Error
Path:/_hadoopBsp/job_201303070954_0007/_applicationAttemptsDir/0
Error:KeeperErrorCode = NoNode for
/_hadoopBsp/job_201303070954_0007/_applicationA\
ttemptsDir/0
[14:56:31]  INFO: [PrepRequestProcessor] Got user-level
KeeperException when processing sessionid:0x13d4b936bc30001
type:create cxid:0x3 zxid:0xfffe txntype:un\
known reqpath:n/a Error
Path:/_hadoopBsp/job_201303070954_0007/_applicationAttemptsDir
Error:KeeperErrorCode = NodeExists for
/_hadoopBsp/job_201303070954_0007/_applicatio\
nAttemptsDir
[14:56:31]  INFO: [PrepRequestProcessor] Got user-level
KeeperException when 

Re: Using HiveGiraphRunner with dependencies

2013-01-17 Thread Avery Ching
Yeah, this is where things get a bit tricky.  You'll have to experiment 
with what works for you, but we are using Hive to launch the job with 
the jar.sh script.  This gets the environment straight from the Hive side.


jar_help () {
  echo "Used for applications that require Hadoop and Hive classpath and environment."

  echo "./hive --service jar <yourjar> <yourclass> HIVE_OPTS <your_args>"
}

Avery

On 1/17/13 4:49 PM, pradeep kumar wrote:


Hi,

Actually, we are trying to use Giraph in our project for graph analysis 
with Hive. So far it was good: the build was successful and the 
shortest-paths example ran fine, but working with Hive has been a real 
issue.  We started with this command line:


hadoop jar giraph-hcatalog-0.2-SNAPSHOT-jar-with-dependencies.jar 
org.apache.giraph.io.hcatalog.HiveGiraphRunner -db default 
-vertexClass org.apache.giraph.vertex.Vertex -vertexInputFormatClass 
org.apache.giraph.io.hcatalog.HCatalogVertexInputFormat 
-vertexOutputFormatClass 
org.apache.giraph.io.hcatalog.HCatalogVertexOutputFormat -w 1 -vi 
testinput -o testoutput -hiveconf 
javax.jdo.option.ConnectionURL=jdbc:mysql://localhost/metastore 
-hiveconf javax.jdo.option.ConnectionDriverName=com.mysql.jdbc.Driver 
-hiveconf javax.jdo.option.ConnectionUserName=root -hiveconf 
javax.jdo.option.ConnectionPassword=root -hiveconf 
datanucleus.autoCreateSchema=false -hiveconf 
datanucleus.fixedDatastore=true


Is this the wrong way of doing it? We are running into an exception 
while doing so.


And if it's wrong,

then any suggestion on how we can proceed would be a great help.

Regards,

Pradeep





Re: Code failing for the large data

2013-01-10 Thread Avery Ching

This looks like 0.1 (still using Hadoop RPC).  Please try trunk instead.

Avery

On 1/10/13 1:09 AM, pankaj Gulhane wrote:

Hi,

My code works on a smaller (very, very small) dataset, but if I use 
the same code on a large dataset it fails.


The following code is a basic implementation of naive PageRank (just 
for testing). When I run it with 4-5 vertices it works properly, but 
when run with thousands of vertices it fails with the following error:


error
java.lang.IllegalStateException: run: Caught an unrecoverable 
exception setup: Offlining servers due to exception...

at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:641)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:668)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:334)
at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1109)

at org.apache.hadoop.mapred.Child.main(Child.java:264)
Caused by: java.lang.RuntimeException: setup: Offlining servers due to 
exception...

at org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:466)
at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:630)
... 7 more
Caused by: java.lang.IllegalStateException: setup: loadVertices failed
at 
org.apache.giraph.graph.BspServiceWorker.setup(BspServiceWorker.java:582)

at org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:458)
... 8 more
Caused by: java.lang.NullPointerException
at 
org.apache.giraph.comm.BasicRPCCommunications.sendPartitionReq(BasicRPCCommunications.java:817)
at 
org.apache.giraph.graph.BspServiceWorker.loadVertices(BspServiceWorker.java:304)
at 
org.apache.giraph.graph.BspServiceWorker.setup(BspServiceWorker.java:575)

... 9 more
/error


code
public class PageRank implements Tool{
/** Configuration from Configurable */
private Configuration conf;
public static String SUPERSTEP_COUNT = 
"PageRankBenchmark.superstepCount";


    public static class PageRankHashMapVertex extends HashMapVertex<
            LongWritable, DoubleWritable, DoubleWritable, DoubleWritable> {

        @Override
        public void compute(Iterator<DoubleWritable> msgIterator) {

            if (getSuperstep() >= 1) {
                double sum = 0;
                while (msgIterator.hasNext()) {
                    sum += msgIterator.next().get();
                }
                DoubleWritable vertexValue =
                    new DoubleWritable((0.15f / getNumVertices())
                        + 0.85f * sum);
                setVertexValue(vertexValue);
            }

            if (getSuperstep() < getConf().getInt(SUPERSTEP_COUNT, 4)) {
                long edges = getNumOutEdges();
                sendMsgToAllEdges(
                    new DoubleWritable(getVertexValue().get() / edges));
            }

            voteToHalt();
        }
    }

@Override
public Configuration getConf() {
return conf;
}

@Override
public void setConf(Configuration conf) {
this.conf = conf;
}

@Override
public int run(String[] args) throws Exception {
GiraphJob job = new GiraphJob(getConf(), getClass().getName());

// job.setJarByClass(getClass());
job.setVertexClass(PageRankHashMapVertex.class);

job.setVertexInputFormatClass(LongDoubleDoubleAdjacencyListVertexInputFormat.class);
job.setVertexOutputFormatClass(IdWithValueTextOutputFormat.class);

job.setWorkerConfiguration(200, 200, 100.0f);
job.setJobName("Testing PG");

job.getConfiguration().setInt(SUPERSTEP_COUNT, 2);


FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));


return (job.run(true) == true ? 0 : 1);
}

public static void main(String[] args) throws Exception {
System.exit(ToolRunner.run(new PageRank(), args));
}
}

/code

Any pointers on the mistake I may be making would be great.

Thanks,
Pankaj

PS: I am running on a cluster with more than 400 mapper slots.
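For reference, the update rule used in the compute() method above -- rank = 0.15/N + 0.85 times the summed incoming rank shares -- can be sanity-checked with a small dense, Giraph-free sketch (all names here are illustrative, and dangling vertices simply send nothing):

```java
import java.util.Arrays;

public class PageRankSketch {
    /**
     * outLinks[v] lists the vertices v links to. Runs the update
     * rank = 0.15/n + 0.85 * (sum of incoming rank shares)
     * for the given number of supersteps.
     */
    public static double[] run(int[][] outLinks, int supersteps) {
        int n = outLinks.length;
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n);              // uniform start
        for (int s = 0; s < supersteps; s++) {
            double[] incoming = new double[n];
            for (int v = 0; v < n; v++) {
                if (outLinks[v].length == 0) {
                    continue;                    // dangling vertex: no messages
                }
                double share = rank[v] / outLinks[v].length;
                for (int target : outLinks[v]) {
                    incoming[target] += share;   // the "message" a target receives
                }
            }
            for (int v = 0; v < n; v++) {
                rank[v] = 0.15 / n + 0.85 * incoming[v];
            }
        }
        return rank;
    }

    public static void main(String[] args) {
        // Two vertices pointing at each other stay at rank 0.5 each.
        System.out.println(Arrays.toString(run(new int[][]{{1}, {0}}, 4)));
    }
}
```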




Re: Breadth-first search

2012-12-11 Thread Avery Ching
We are running several Giraph applications in production using our 
version of Hadoop (Corona) at Facebook.  The part you have to be careful 
about is ensuring you have enough resources for your job to run.  But 
otherwise, we are able to run at FB-scale (i.e. 1 billion+ nodes, many 
more edges).


Avery

On 12/11/12 5:58 AM, Gustavo Enrique Salazar Torres wrote:

Hi:

I implemented a graph algorithm to recommend content to our users. 
Although it works (the implementation uses Mahout), it is very 
inefficient because I have to run many iterations in order to perform 
a breadth-first search on my graph.
I would like to use Giraph for that task. I would like to know if it 
is production ready. I'm running jobs on Amazon EMR.


Thanks in advance.
Gustavo




Re: What a worker really is and other interesting runtime information

2012-11-28 Thread Avery Ching
Oh, forgot one thing.  You need to set the number of partitions as 
well, since each thread works on a single partition at a time.

Try -Dhash.userPartitionCount=<number of threads>
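The effect described here can be illustrated with a tiny, Giraph-free sketch: compute threads pick up whole partitions, so with one partition a 12-thread pool still does all the work on a single thread. The round-robin assignment below only illustrates that capping effect; it is not Giraph's actual partition balancer:

```java
public class PartitionScaling {
    /**
     * Round-robin partitions onto thread slots and count how many
     * slots actually receive work. Effective parallelism is capped
     * by min(partitions, threads).
     */
    public static int busySlots(int partitions, int threads) {
        int[] load = new int[threads];
        for (int p = 0; p < partitions; p++) {
            load[p % threads]++;          // assign partition p to a slot
        }
        int busy = 0;
        for (int l : load) {
            if (l > 0) {
                busy++;
            }
        }
        return busy;
    }

    public static void main(String[] args) {
        System.out.println(busySlots(1, 12));   // one partition: prints 1
        System.out.println(busySlots(24, 12));  // enough partitions: prints 12
    }
}
```

This is why bumping giraph.numComputeThreads without also raising the partition count shows zero speedup, as observed in the thread below.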

On 11/28/12 5:29 AM, Alexandros Daglis wrote:

Dear Avery,

I followed your advice, but the application seems to be totally 
thread-count-insensitive: I literally observe zero scaling of 
performance, while I increase the thread count. Maybe you can point 
out if I am doing something wrong.


- Using only 4 cores on a single node at the moment
- Input graph: 14 million vertices, file size is 470 MB
- Running SSSP as follows: hadoop jar 
target/giraph-0.1-jar-with-dependencies.jar 
org.apache.giraph.examples.SimpleShortestPathsVertex 
-Dgiraph.SplitMasterWorker=false -Dgiraph.numComputeThreads=X input 
output 12 1

where X=1,2,3,12,30
- I notice a total insensitivity to the number of threads I specify. 
Aggregate core utilization is always approximately the same (usually 
around 25-30% = only one of the cores running) and overall execution 
time is always the same (~8 mins)


Why is Giraph's performance not scaling? Is the input size / number of 
workers inappropriate? It's not an IO issue either, because even 
during really low core utilization, time is wasted on idle, not on IO.


Cheers,
Alexandros



On 28 November 2012 11:13, Alexandros Daglis 
alexandros.dag...@epfl.ch mailto:alexandros.dag...@epfl.ch wrote:


Thank you Avery, that helped a lot!

Regards,
Alexandros


On 27 November 2012 20:57, Avery Ching ach...@apache.org
mailto:ach...@apache.org wrote:

Hi Alexandros,

The extra task is for the master process (a coordination
task). In your case, since you are using a single machine, you
can use a single task.

-Dgiraph.SplitMasterWorker=false

and you can try multithreading instead of multiple workers.

-Dgiraph.numComputeThreads=12

The reason why cpu usage increases is due to netty threads to
handle network requests.  By using multithreading instead, you
should bypass this.

Avery


On 11/27/12 9:40 AM, Alexandros Daglis wrote:

Hello everybody,

I went through most of the documentation I could find for
Giraph and also most of the messages in this email list,
but still I have not figured out precisely what a worker
really is. I would really appreciate it if you could help
me understand how the framework works.

At first I thought that a worker has a one-to-one
correspondence to a map task. Apparently this is not
exactly the case, since I have noticed that if I ask for x
workers, the job finishes after having used x+1 map tasks.
What is this extra task for?

I have been trying out the example SSSP application on a
single node with 12 cores. Giving an input graph of ~400MB
and using 1 worker, around 10 GBs of memory are used
during execution. What intrigues me is that if I use 2
workers for the same input (and without limiting memory
per map task), double the memory will be used.
Furthermore, there will be no improvement in performance.
I rather notice a slowdown. Are these observations normal?

Might it be the case that 1 and 2 workers are very few and
I should go to the 30-100 range that is the proposed
number of mappers for a conventional MapReduce job?

Finally, a last observation. Even though I use only 1
worker, I see that there are significant periods during
execution where up to 90% of the 12 cores computing power
is consumed, that is, almost 10 cores are used in
parallel. Does each worker spawn multiple threads and
dynamically balances the load to utilize the available
hardware?

Thanks a lot in advance!

Best,
Alexandros









Re: What a worker really is and other interesting runtime information

2012-11-27 Thread Avery Ching

Hi Alexandros,

The extra task is for the master process (a coordination task). In your 
case, since you are using a single machine, you can use a single task.


-Dgiraph.SplitMasterWorker=false

and you can try multithreading instead of multiple workers.

-Dgiraph.numComputeThreads=12

The reason why cpu usage increases is due to netty threads to handle 
network requests.  By using multithreading instead, you should bypass this.


Avery

On 11/27/12 9:40 AM, Alexandros Daglis wrote:

Hello everybody,

I went through most of the documentation I could find for Giraph and 
also most of the messages in this email list, but still I have not 
figured out precisely what a worker really is. I would really 
appreciate it if you could help me understand how the framework works.


At first I thought that a worker has a one-to-one correspondence to a 
map task. Apparently this is not exactly the case, since I have 
noticed that if I ask for x workers, the job finishes after having 
used x+1 map tasks. What is this extra task for?


I have been trying out the example SSSP application on a single node 
with 12 cores. Giving an input graph of ~400MB and using 1 worker, 
around 10 GBs of memory are used during execution. What intrigues me 
is that if I use 2 workers for the same input (and without limiting 
memory per map task), double the memory will be used. Furthermore, 
there will be no improvement in performance; rather, I notice a 
slowdown. Are these observations normal?


Might it be the case that 1 and 2 workers are very few and I should go 
to the 30-100 range that is the proposed number of mappers for a 
conventional MapReduce job?


Finally, a last observation. Even though I use only 1 worker, I see 
that there are significant periods during execution where up to 90% of 
the 12 cores' computing power is consumed, that is, almost 10 cores are 
used in parallel. Does each worker spawn multiple threads and 
dynamically balance the load to utilize the available hardware?


Thanks a lot in advance!

Best,
Alexandros






Re: java.net.ConnectException: Connection refused

2012-10-17 Thread Avery Ching
The connect exception is fine; it usually takes more than one attempt 
to connect to ZooKeeper.  The reason your job failed is that your Hadoop 
instance does not allow enough simultaneous map tasks.


See http://svn.apache.org/repos/asf/giraph/trunk/README for details on 
running in pseudo-distributed mode.
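Concretely, the usual fix on a Hadoop 1.x pseudo-distributed setup is to raise the number of concurrent map task slots in mapred-site.xml. The value below is only an example; it must cover your worker count plus the one extra master/coordination task.

```xml
<!-- mapred-site.xml (example value): a Giraph job with w workers needs
     w + 1 map tasks to run simultaneously, so allow at least that many. -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>4</value>
</property>
```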


Avery

On 10/17/12 11:09 AM, rodrigo zerbini wrote:

Hello, everybody.

I'm trying to run the shortest paths example with the command below:

hadoop jar 
giraph-0.2-SNAPSHOT-for-hadoop-0.20.203.0-jar-with-dependencies.jar 
org.apache.giraph.GiraphRunner 
org.apache.giraph.examples.SimpleShortestPathsVertex -if 
org.apache.giraph.io.JsonLongDoubleFloatDoubleVertexInputFormat -ip 
shortestPathsInputGraph -of 
org.apache.giraph.io.JsonLongDoubleFloatDoubleVertexOutputFormat -op 
shortestPathsOutputGraph -w 3


However, it didn't work. In the JobTracker I found that some jobs failed. 
I had 4 killed tasks. Below you can see the log of the first task. I 
got a ConnectException. Does anyone have some idea why this 
connection was refused? Thanks in advance.



2012-10-16 17:40:40,788 WARN org.apache.hadoop.util.NativeCodeLoader: 
Unable to load native-hadoop library for your platform... using 
builtin-java classes where applicable
2012-10-16 17:40:42,331 WARN 
org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name ugi 
already exists!
2012-10-16 17:40:44,019 INFO org.apache.hadoop.mapred.Task:  Using 
ResourceCalculatorPlugin : null
2012-10-16 17:40:44,729 INFO org.apache.giraph.graph.GraphMapper: 
setup: Set log level to info
2012-10-16 17:40:44,729 INFO org.apache.giraph.graph.GraphMapper: 
Distributed cache is empty. Assuming fatjar.
2012-10-16 17:40:44,729 INFO org.apache.giraph.graph.GraphMapper: 
setup: classpath @ 
/tmp/hadoop-ro/mapred/local/taskTracker/ro/jobcache/job_201210161739_0001/jars/job.jar
2012-10-16 17:40:45,514 INFO org.apache.giraph.zk.ZooKeeperManager: 
createCandidateStamp: Made the directory 
_bsp/_defaultZkManagerDir/job_201210161739_0001
2012-10-16 17:40:45,531 INFO org.apache.giraph.zk.ZooKeeperManager: 
createCandidateStamp: Creating my filestamp 
_bsp/_defaultZkManagerDir/job_201210161739_0001/_task/practivate.adobe.com 0
2012-10-16 17:40:47,160 INFO org.apache.giraph.zk.ZooKeeperManager: 
getZooKeeperServerList: Got [practivate.adobe.com] 1 hosts from 1 
candidates when 1 required (polling period is 3000) on attempt 0
2012-10-16 17:40:47,233 INFO org.apache.giraph.zk.ZooKeeperManager: 
createZooKeeperServerList: Creating the final ZooKeeper file 
'_bsp/_defaultZkManagerDir/job_201210161739_0001/zkServerList_practivate.adobe.com 0 '
2012-10-16 17:40:48,029 INFO org.apache.giraph.zk.ZooKeeperManager: 
getZooKeeperServerList: For task 0, got file 
'zkServerList_practivate.adobe.com 0 ' (polling period is 3000)
2012-10-16 17:40:48,030 INFO org.apache.giraph.zk.ZooKeeperManager: 
getZooKeeperServerList: Found [practivate.adobe.com, 0] 2 hosts in 
filename 'zkServerList_practivate.adobe.com 0 '
2012-10-16 17:40:48,142 INFO org.apache.giraph.zk.ZooKeeperManager: 
onlineZooKeeperServers: Trying to delete old directory 
/tmp/hadoop-ro/mapred/local/taskTracker/ro/jobcache/job_201210161739_0001/work/_bspZooKeeper
2012-10-16 17:40:48,300 INFO org.apache.giraph.zk.ZooKeeperManager: 
generateZooKeeperConfigFile: Creating file 
/tmp/hadoop-ro/mapred/local/taskTracker/ro/jobcache/job_201210161739_0001/work/_bspZooKeeper/zoo.cfg 
in 
/tmp/hadoop-ro/mapred/local/taskTracker/ro/jobcache/job_201210161739_0001/work/_bspZooKeeper 
with base port 22181
2012-10-16 17:40:48,300 INFO org.apache.giraph.zk.ZooKeeperManager: 
generateZooKeeperConfigFile: Make directory of _bspZooKeeper = true
2012-10-16 17:40:48,300 INFO org.apache.giraph.zk.ZooKeeperManager: 
generateZooKeeperConfigFile: Delete of zoo.cfg = false
2012-10-16 17:40:48,643 INFO org.apache.giraph.zk.ZooKeeperManager: 
onlineZooKeeperServers: Attempting to start ZooKeeper server with 
command 
[/System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home/bin/java, 
-Xmx512m, -XX:ParallelGCThreads=4, -XX:+UseConcMarkSweepGC, 
-XX:CMSInitiatingOccupancyFraction=70, -XX:MaxGCPauseMillis=100, -cp, 
/tmp/hadoop-ro/mapred/local/taskTracker/ro/jobcache/job_201210161739_0001/jars/job.jar, 
org.apache.zookeeper.server.quorum.QuorumPeerMain, 
/tmp/hadoop-ro/mapred/local/taskTracker/ro/jobcache/job_201210161739_0001/work/_bspZooKeeper/zoo.cfg] 
in directory 
/tmp/hadoop-ro/mapred/local/taskTracker/ro/jobcache/job_201210161739_0001/work/_bspZooKeeper
2012-10-16 17:40:48,803 INFO org.apache.giraph.zk.ZooKeeperManager: 
onlineZooKeeperServers: Connect attempt 0 of 10 max trying to connect 
to practivate.adobe.com:22181 with poll msecs = 3000
2012-10-16 17:40:48,946 WARN org.apache.giraph.zk.ZooKeeperManager: 

Re: Giraph with DB system

2012-10-05 Thread Avery Ching

Answers inline.

On 10/5/12 1:58 AM, Gergely Svigruha wrote:

Hi,

I have a few questions regarding Giraph.

1) Is is possible to use Giraph for local traversals in the graph? For 
example if I want to do some computing on the neighbours of the node 
with id xy is it possible to get the reference of the xy vertex (or 
just send a message to it) then send some messages to its neighbours 
etc, but not do any computation on any other vertices?




In my opinion, this is what graph DBs are for, not a large-scale batch 
processing system like Giraph.

2) Is it possible to combine Giraph with HBase or any other DBMS?

Yes, Giraph can use HBase or another DBMS as a backend storage system 
(see giraph-formats-contrib/src/main/java/org/apache/giraph/format/hbase/)
3) Is it possible to run Giraph on a server continuously after the 
graph has been built then process several jobs based on request? Or 
Giraph only can be interpreted in the context of a (one) Hadoop job.



Again, think of Giraph as a batch processing system.


Thanks, and please set me straight if I completely misunderstand something!


Hope this helps!


Greg




Re: Getting SimpleTriangleClosingVertex to run

2012-09-24 Thread Avery Ching

I don't think the types are compatible.

public class SimpleTriangleClosingVertex extends EdgeListVertex<
  IntWritable, SimpleTriangleClosingVertex.IntArrayListWritable,
  NullWritable, IntWritable>

You'll need to use an input format and output format that fits these 
types.  Otherwise the issue is likely to be 
serialization/deserialization here.


On 9/23/12 10:44 PM, Vernon Thommeret wrote:

I'm trying to get the SimpleTriangleClosingVertex to run, but getting
this error:

java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException: IPC
server unable to read call parameters: null
at 
org.apache.giraph.comm.BasicRPCCommunications.sendPartitionRequest(BasicRPCCommunications.java:923)
at 
org.apache.giraph.graph.BspServiceWorker.loadVertices(BspServiceWorker.java:327)
at 
org.apache.giraph.graph.BspServiceWorker.setup(BspServiceWorker.java:604)
at org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:377)
at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:578)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:647)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157)
at org.apache.hadoop.mapred.Child.main(Child.java:264)
Caused by: org.apache.hadoop.ipc.RemoteException: IPC server

This is the diff that causes the issue:

@@ -33,7 +33,7 @@ import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.IntWritable;

  import org.apache.giraph.graph.GiraphJob;
-import org.apache.giraph.graph.IntIntNullIntVertex;
+import org.apache.giraph.examples.SimpleTriangleClosingVertex;
  import org.apache.giraph.io.IntIntNullIntTextInputFormat;
  import org.apache.giraph.io.AdjacencyListTextVertexOutputFormat;

@@ -44,16 +44,12 @@ import org.apache.log4j.Logger;
  /**
   * Simple function to return the in degree for each vertex.
   */
-public class SharedConnectionsVertex extends IntIntNullIntVertex
implements Tool {
+public class SharedConnections implements Tool {

private Configuration conf;
private static final Logger LOG =
Logger.getLogger(SharedConnections.class);

-  public void compute(Iterable<IntWritable> messages) {
-voteToHalt();
-  }
-
@Override
public final int run(final String[] args) throws Exception {
  Options options = new Options();
@@ -71,7 +67,7 @@ public class SharedConnections extends
IntIntNullIntVertex implements Tool {

  GiraphJob job = new GiraphJob(getConf(), getClass().getName());

-job.setVertexClass(SharedConnections.class);
+job.setVertexClass(SimpleTriangleClosingVertex.class);
  job.setVertexInputFormatClass(IntIntNullIntTextInputFormat.class);
  job.setVertexOutputFormatClass(AdjacencyListTextVertexOutputFormat.class);
  job.setWorkerConfiguration(10, 10, 100.0f);

--

I.e. I have a dummy job that just outputs the vertices which works,
but trying to switch the vertex class doesn't seem to work. I'm
running the latest version of Giraph (rev 1388628). Should this work
or should I try something different?

Thanks!
Vernon




Please welcome our newest committer, Maja!

2012-09-21 Thread Avery Ching
The Giraph PMC has voted to extend Maja Kabiljo an offer to be a Giraph 
committer and she has graciously accepted!  Maja has been doing some 
amazing work on out-of-core messaging and improving aggregators.  Here 
is a list of some of her contributions.


  GIRAPH-327: Timesout values in BspServiceMaster.barrierOnWorkerList
  (majakabiljo via ereisman)

  GIRAPH-323: Check if requests are done before calling wait 
(majakabiljo via ereisman)


  GIRAPH-298: Reduce timeout for TestAutoCheckpoint. (majakabiljo via
  aching)

  GIRAPH-317: Add subpackages to comm (Maja Kabiljo via ereisman)

  GIRAPH-313: Open Netty client and server on master. (majakabiljo via
  aching)

  GIRAPH-303: Regression: cleanup phase happens earlier than it
  should. (majakabiljo via apresta)

  GIRAPH-296: TotalNumVertices and TotalNumEdges are not saved in
  checkpoint.  (majakabiljo via apresta)

  GIRAPH-297: Checkpointing on master is done one superstep later
  (majakabiljo via aching).

  GIRAPH-259: TestBspBasic.testBspPageRank is broken (majakabiljo via
  apresta)

  GIRAPH-287: Add option to limit the number of open requests.
  (Maja Kabiljo via jghoman)

  GIRAPH-45: Improve the way to keep outgoing messages (majakabiljo
  via aching).

  GIRAPH-266: Average aggregators don't calculate real average
  (majakabiljo via aching).

  GIRAPH-257: TestBspBasic.testBspMasterCompute is broken (majakabiljo
  via aching).

  GIRAPH-81: Create annotations on provided algorithms for cli
  (majakabiljo via aching).

In the spirit of your first commit, Maja, please take a look at 
https://issues.apache.org/jira/browse/GIRAPH-335 .


Welcome Maja and happy Giraphing!

Avery Ching


Re: reason behind a java.io.EOFException

2012-09-11 Thread Avery Ching
)
at java.lang.Thread.run(Thread.java:680)

On Tue, Sep 11, 2012 at 7:53 AM, Avery Ching ach...@apache.org wrote:

These days we are focusing more on the netty IPC.  Can you try
-Dgiraph.useNetty=true?

Avery


On 9/10/12 2:08 PM, Franco Maria Nardini wrote:

Dear all,

I am working with Giraph 0.2/Hadoop 1.0.3. In particular, I am trying
to execute the following code:

hadoop jar
giraph-0.2-SNAPSHOT-for-hadoop-0.20.203.0-jar-with-dependencies.jar
org.apache.giraph.GiraphRunner \
org.apache.giraph.examples.SimplePageRankVertex \
-w 2 \
-if
org.apache.giraph.examples.SimplePageRankVertex\$SimplePageRankVertexInputFormat
-ip bigGraph.txt \
-of org.apache.giraph.io.IdWithValueTextOutputFormat -op output \
-mc
org.apache.giraph.examples.SimplePageRankVertex\$HDFSBasedPageRankVertexMasterCompute

If I set the number of workers equal to two, one of the mappers produce:

java.lang.RuntimeException: java.io.IOException: Call to
zipottero.local/172.20.10.3:30001 failed on local exception:
java.io.EOFException
 at
org.apache.giraph.comm.BasicRPCCommunications.sendPartitionRequest(BasicRPCCommunications.java:923)
 at
org.apache.giraph.graph.BspServiceWorker.loadVertices(BspServiceWorker.java:327)
 at
org.apache.giraph.graph.BspServiceWorker.setup(BspServiceWorker.java:604)
 at org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:377)
 at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:578)
 at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
 at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:396)
 at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
 at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.io.IOException: Call to
zipottero.local/172.20.10.3:30001 failed on local exception:
java.io.EOFException
 at org.apache.hadoop.ipc.Client.wrapException(Client.java:1107)
 at org.apache.hadoop.ipc.Client.call(Client.java:1075)
 at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
 at $Proxy3.putVertexList(Unknown Source)
 at
org.apache.giraph.comm.BasicRPCCommunications.sendPartitionRequest(BasicRPCCommunications.java:920)
 ... 11 more
Caused by: java.io.EOFException
 at java.io.DataInputStream.readInt(DataInputStream.java:375)
 at
org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:804)
 at org.apache.hadoop.ipc.Client$Connection.run(Client.java:749)

while it perfectly works if the number of workers is set to 1. I am
experiencing the problem both on small and big graphs.

Any idea of the reasons behind this behavior?

Thanks a lot in advance.

Best,

FM
--
Franco Maria Nardini

High Performance Computing Laboratory
Istituto di Scienza e Tecnologie dell’Informazione (ISTI)
Consiglio Nazionale delle Ricerche (CNR)
Via G. Moruzzi, 1
56124, Pisa, Italy

Phone: +39 050 315 3496
Fax: +39 050 315 2040
Mail: francomaria.nard...@isti.cnr.it
Skype: francomaria.nardini
Web: http://hpc.isti.cnr.it/~nardini/






Re: reason behind a java.io.EOFException

2012-09-10 Thread Avery Ching
These days we are focusing more on the netty IPC.  Can you try 
-Dgiraph.useNetty=true?


Avery

On 9/10/12 2:08 PM, Franco Maria Nardini wrote:

Dear all,

I am working with Giraph 0.2/Hadoop 1.0.3. In particular, I am trying
to execute the following code:

hadoop jar giraph-0.2-SNAPSHOT-for-hadoop-0.20.203.0-jar-with-dependencies.jar
org.apache.giraph.GiraphRunner \
org.apache.giraph.examples.SimplePageRankVertex \
-w 2 \
-if 
org.apache.giraph.examples.SimplePageRankVertex\$SimplePageRankVertexInputFormat
-ip bigGraph.txt \
-of org.apache.giraph.io.IdWithValueTextOutputFormat -op output \
-mc 
org.apache.giraph.examples.SimplePageRankVertex\$HDFSBasedPageRankVertexMasterCompute

If I set the number of workers equal to two, one of the mappers produce:

java.lang.RuntimeException: java.io.IOException: Call to
zipottero.local/172.20.10.3:30001 failed on local exception:
java.io.EOFException
at 
org.apache.giraph.comm.BasicRPCCommunications.sendPartitionRequest(BasicRPCCommunications.java:923)
at 
org.apache.giraph.graph.BspServiceWorker.loadVertices(BspServiceWorker.java:327)
at 
org.apache.giraph.graph.BspServiceWorker.setup(BspServiceWorker.java:604)
at org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:377)
at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:578)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.io.IOException: Call to
zipottero.local/172.20.10.3:30001 failed on local exception:
java.io.EOFException
at org.apache.hadoop.ipc.Client.wrapException(Client.java:1107)
at org.apache.hadoop.ipc.Client.call(Client.java:1075)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
at $Proxy3.putVertexList(Unknown Source)
at 
org.apache.giraph.comm.BasicRPCCommunications.sendPartitionRequest(BasicRPCCommunications.java:920)
... 11 more
Caused by: java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:375)
at 
org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:804)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:749)

while it perfectly works if the number of workers is set to 1. I am
experiencing the problem both on small and big graphs.

Any idea of the reasons behind this behavior?

Thanks a lot in advance.

Best,

FM
--
Franco Maria Nardini

High Performance Computing Laboratory
Istituto di Scienza e Tecnologie dell’Informazione (ISTI)
Consiglio Nazionale delle Ricerche (CNR)
Via G. Moruzzi, 1
56124, Pisa, Italy

Phone: +39 050 315 3496
Fax: +39 050 315 2040
Mail: francomaria.nard...@isti.cnr.it
Skype: francomaria.nardini
Web: http://hpc.isti.cnr.it/~nardini/




Re: question about Pagerank example

2012-07-28 Thread Avery Ching
PageRankBenchmark doesn't use an output format.  If you'd like 
to see the output, just add a VertexOutputFormat (one that matches the 
types).  You could start with JsonBase64VertexOutputFormat.


i.e. in PageRankBenchmark.java add

job.setVertexOutputFormatClass( JsonBase64VertexOutputFormat.class);


On 7/27/12 5:10 PM, Amir R Abdolrashidi wrote:

Hi everyone,

I am not sure whether this is the right question, but does anyone 
know if we can see the output of the PageRankBenchmark example that is 
provided in the tutorial?


Thanks

-Amir




Re: Adding rb to approved email addresses?

2012-07-16 Thread Avery Ching
I tried adding the from emails to the d...@giraph.apache.org mailing 
list.  Shouldn't that work?


On 7/16/12 12:17 PM, Jakob Homan wrote:

I don't believe so.  The from list seems reasonable on each one:
-- Forwarded message --
From: Avery Ching avery.ch...@gmail.com
To: Avery Ching avery.ch...@gmail.com
Cc: giraph giraph-...@incubator.apache.org, Alessandro Presta
alessan...@fb.com

On Mon, Jul 16, 2012 at 12:15 PM, Owen O'Malley omal...@apache.org wrote:


On Mon, Jul 16, 2012 at 12:02 PM, Jakob Homan jgho...@gmail.com wrote:

Anyone know what needs to be done to get the automated messages
reviewboard is sending out whitelisted on the dev list?  We're getting
moderation requests for every one...


Usually, if you use reply-all, it will bless that sender. Is each user
showing up as a different sender?

-- Owen





Re: Suggestions on problem sizes for giraph performance benchmarking

2012-07-10 Thread Avery Ching
You should try using the appropriate memory settings (e.g. 
-Dmapred.child.java.opts=-Xms30g -Xmx30g -Xss128k) for a 30 GB heap.  
This depends on how much memory you can get.
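As an illustrative command line (the benchmark class and jar name follow commands that appear later in this archive; the heap sizes, vertex count, and worker count are examples, not recommendations):

```shell
# Give each map task a 30 GB heap; adjust -Xms/-Xmx to your machines.
hadoop jar giraph-jar-with-dependencies.jar \
  org.apache.giraph.benchmark.PageRankBenchmark \
  -Dmapred.child.java.opts="-Xms30g -Xmx30g -Xss128k" \
  -e 1 -s 3 -V 100000000 -w 10
```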


Avery

On 7/9/12 5:57 AM, Amani Alonazi wrote:
Actually, I had the same problem of running out of memory with Giraph 
when trying to implement strongly connected components algorithm on 
Giraph. My input graph is 1 million nodes and 7 million edges.


I'm using cluster of 21 computers.


On Mon, Jul 9, 2012 at 3:44 PM, Benjamin Heitmann 
benjamin.heitm...@deri.org mailto:benjamin.heitm...@deri.org wrote:



Hello Stephen,

sorry for the very late reply.

On 28 Jun 2012, at 02:50, Fleischman, Stephen (ISS SCI - Plano TX)
wrote:


Hello Avery and all:

I have a cluster of 10  two-processor/48 GB RAM servers, upon
which we are conducting Hadoop performance characterization
tests.  I plan to use the Giraph pagerank and simple shortest
path example tests as part of this exercise and would appreciate
guidance on problem sizes for both tests.  I’m looking at paring
down an obfuscated Twitter dataset and it would save a lot of
time if someone has some knowledge on roughly how the time and
memory scales with number of nodes in a graph.




I can provide some suggestions for the kind of algorithm and data
which does currently surpass the scalability of giraph.

While the limits to my knowledge of Giraph and Hadoop are probably
also to blame for this, please see the recent discussions on this
list,
and on JIRA for other indications that the scalability of Giraph
needs improvement:
* post  by Yuanyuan Tian in the thread wierd communication
errors on user@giraph.apache.org mailto:user@giraph.apache.org
* GIRAPH-234 about GC overhead

https://issues.apache.org/jira/browse/GIRAPH-234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

If you want to stretch the limits of Giraph, then you need to try
an algorithm which is conceptually different from PageRank, and
you need a big data set.
If you use an algorithm which has complex application logic (maybe
even domain specific logic), which needs to be embedded in the
algorithm,
then the nodes need to have a lot of state. In addition, such
algorithms probably send around a lot of messages, and each of the
messages might have a payload
which is more complex then one floating point number. In addition,
it helps to have a graph format, which requires strings on the
edges and vertices.
The strings are required for the domain specific business logic
which the graph algorithm needs to follow.

Finally, imagine a data set which has a big loading time, and
where one run of the algorithm only provides results for one user.
The standard Hadoop paradigm is to throw away the graph after
loading it.
So if you have 100s or 1000s of users, then you need a way to
execute the algorithm multiple times in parallel.
Again this will add a lot of state, as each of the vertices will
need to hold one state object for each user who has visited the
vertex.

In my specific case, I had the following data and algorithm:
Data:
* an RDF graph with 10 million vertices and 40 million edges
I used my own import code to map the RDF graph to an undirected
graph with a limit of one edge between any two nodes (so it was
not a multi-graph)
* each vertex and each edge uses a string as an identity to
represent a URI in the RDF graph (required for the business logic
in the algorithm)

Algorithm:
* spreading activation.
You can think of it as depth first search guided by domain
specific logic.
A short introduction here:
https://en.wikipedia.org/wiki/Spreading_activation
The wikipedia article only mentions using spreading activation on
weighted graphs, however I used it on graphs which have additional
types on the edges.
The whole area of using the semantics of the edges to guide the
algorithm is an active research topic, so that's why I can't point
you to a good article on that.
* parallel execution:
I need to run the algorithm once for every user in the system,
however loading the data set takes around 15 minutes alone.
So each node has an array of states, one for each user for which
the algorithm has visited a node.
I experimented with user numbers between 30 and 1000, anything
more did not work for concurrent execution of the algorithm.

Infrastructure:
* a single server with 24 Intel Xeon 2.4 GHz cpus and 96 GB of RAM
* Hadoop 1.0, pseudo-distributed setup
* between 10 and 20 Giraph workers


A few weeks ago I stopped work on my Giraph based implementation,
as Giraph ran out of memory almost immediately after loading and
initialising the data.
I made sure that the Giraph workers do not run out of 

Apache Giraph BOARD report for 7/25 meeting

2012-07-10 Thread Avery Ching

Status report for the Apache Giraph project - July 2012

Giraph is a Bulk Synchronous Parallel framework for writing programs that
analyze large graphs on a Hadoop cluster. Giraph is similar to Google's
Pregel system.

Project Status
--

Releases:
  0.2.0 - expected 7/31
 * Reduce memory consumption
 * Improve support for the Green-Marl project.

The transition to being a full Apache project is nearly complete (still a
few references to incubator on the website).

Community
-

Activity has picked up on Apache Giraph, more contributors seem to be
gaining interest, and we had 24 commits for the month of June.  We should
try to convert some contributors to committers soon.

Mailing lists:
  116 subscribers on dev
  155 subscribers on user


Re: Problem with zookeeper setup

2012-06-19 Thread Avery Ching
If you're running without a real Hadoop instance, you'll need to blow 
away the zk directories after running the first time.  Hope that helps,
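The directories in question are the _bsp paths visible in the logs elsewhere in this archive. A hypothetical cleanup between runs, using the Hadoop 1.x era shell syntax, would be:

```shell
# Remove Giraph's per-job coordination state left over from a
# previous run; adjust the path if you relocated it.
hadoop fs -rmr _bsp
```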


Avery

On 6/19/12 5:39 PM, Jonathan Bishop wrote:

Hi,

I am exploring Giraph 0.1 and was able to download, build, and run all 
the tests - all 58 passed.


I can also run the SimpleShortestPathsVertex test using the supplied 
giraph jar. However, when I copy the java src file into eclipse and 
build my own jar I get the following error which leads me to believe 
that something is going wrong with the ZK setup.



12/06/19 17:31:31 INFO mapred.JobClient: Running job:
job_201206191708_0003
12/06/19 17:31:32 INFO mapred.JobClient:  map 0% reduce 0%
12/06/19 17:32:14 INFO mapred.JobClient: Task Id :
attempt_201206191708_0003_m_00_0, Status : FAILED
java.lang.IllegalStateException: run: Caught an unrecoverable
exception onlineZooKeeperServers: Failed to connect in 10 tries!
at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:641)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at

org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.lang.IllegalStateException:
onlineZooKeeperServers: Failed to connect in 10 tries!
at

org.apache.giraph.zk.ZooKeeperManager.onlineZooKeeperServers(ZooKeeperManager.java:658)
at org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:409)
at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:630)
... 7 more

attempt_201206191708_0003_m_00_0: log4j:WARN No appenders
could be found for logger (org.apache.giraph.zk.ZooKeeperManager).
attempt_201206191708_0003_m_00_0: log4j:WARN Please initialize
the log4j system properly.

BTW, I needed to add the following line to get this to run from my own 
jar file...


job.setJarByClass(SimpleShortestPathsVertex.class);


Not sure if that is related, but it seems that it will not run without 
this (it cannot find SimpleShortestPathsVertex).


Thanks,

Jon Bishop






Re: SimplePageRankVertex implementation, dangling nodes and sending messages to all nodes...

2012-05-29 Thread Avery Ching
We did have a related issue 
(https://issues.apache.org/jira/browse/GIRAPH-155).


On 5/29/12 6:54 AM, Claudio Martella wrote:

I'm not sure messages will need to be sent to them in the first superstep.
They'll be created and used in the second superstep if necessary. If
they are needed in the first superstep, then I guess they'll be put as
a line in the input file.
I agree with you that this is kind of messed up :)


On Tue, May 29, 2012 at 3:23 PM, Sebastian Schelters...@apache.org  wrote:

Oh sorry, I didn't know that discussion. The problem I see is that in
every implementation a user might run into this issue, and I don't
think it's ideal to force users to always run a round of sending empty
messages at the beginning.

Maybe the system should (somehow) automagically do that for the users?
Really seems to be an awkward situation though...

--sebastian



On 29.05.2012 15:03, Claudio Martella wrote:

About the MapReduce job to prepare the input set, I did advocate for
this solution instead of supporting automatic creation of non-existent
vertices implicitly (which I believe adds a logical path in vertex
resolution that has some drawbacks, e.g. you have to check in the
hashmap for the existence of the destination vertex for each message,
which is fine now that it's a hashmap, but it's going to be less
fine when/if we turn to TreeMap for out-of-core).

Unfortunately the other committers preferred going for the path that
helps userland's life, so I guess this solution is not to be
considered here either.

On Tue, May 29, 2012 at 1:48 PM, Sebastian Schelters...@apache.org  wrote:

On 29.05.2012 13:13, Paolo Castagna wrote:

Hi Sebastian

Sebastian Schelter wrote:

Why do you only recompute the pageRank in each second superstep? Can we
not use the aggregated value of the dangling nodes from the last superstep?

I removed the recomputation of PageRank values every second superstep.
However, I needed to use a couple of aggregators for the dangling nodes'
contribution instead of just one: dangling-current and dangling-previous.

Each superstep I need to reset the dangling-current aggregator; at the
same time, I need to know the value of the aggregator at a previous
superstep.

You can save the value from the previous step in a static variable in
the WorkerContext before resetting the aggregator.
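A rough sketch of that idea follows. The exact WorkerContext and aggregator API changed across Giraph versions, so the class name, method names, and the "dangling-current" aggregator name here are all assumptions, not the actual patch:

```java
// Sketch only -- not the actual implementation discussed in the thread.
public class PageRankWorkerContext extends WorkerContext {
  /** Dangling-node contribution aggregated in the previous superstep,
      published to every vertex on this worker via a static field. */
  public static double previousDangling = 0d;

  @Override
  public void preSuperstep() {
    // Stash last superstep's aggregated value before the
    // dangling-current aggregator is reset for this superstep.
    previousDangling =
        ((DoubleWritable) getAggregatedValue("dangling-current")).get();
  }
  // preApplication()/postSuperstep()/postApplication() omitted.
}
```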


I hope it makes sense, let me know if you have a better idea.


Overall I think we're on a good way to a robust, real-world PageRank
implementation, I managed to implement the convergence check with an
aggregator, will post an updated patch soon.

I think I've just done it, have a look [1] and let me know if you would have
done it differently.

Paolo

  [1]
https://github.com/castagna/jena-grande/blob/11f07dd897562f7a4bf8d6e4845128d7f2cdd2ff/src/main/java/org/apache/jena/grande/giraph/pagerank/PageRankVertex.java#L90












Re: Giraph on Hadoop 2.0.0-alpha

2012-05-29 Thread Avery Ching

Did you compile with the appropriate flags?

From the README:

- Apache Hadoop 0.23.1

  You may tell maven to use this version with mvn -Phadoop_0.23 goals.

On 5/25/12 9:24 AM, Roman Shaposhnik wrote:

Hi!

I'm trying to run Giraph trunk on top of Hadoop 2.0.0 and I'm getting
the following error while submitting an example job:
  $ hadoop jar /usr/lib/giraph/giraph-jar-with-dependencies.jar
org.apache.giraph.benchmark.PageRankBenchmark -e 1 -s 3 -V 10 -w 3

Now, if I look at the state of HDFS right after the job fails I see that
the job has created file structure all the way up to
   _bsp/_defaultZkManagerDir/job_1337959594450_0002/
I even see
   _bsp/_defaultZkManagerDir/job_1337959594450_0002/zkServerList_ahmed-laptop 0
so it is unlikely to be file permission problems or anything like that.

Could you, please, suggest a way to debug it from here?

Oh, and here's the exception I'm getting:

2012-05-25 08:31:34,335 INFO [IPC Server handler 16 on 33249]
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Diagnostics report
from attempt_1337959594450_0002_m_01_3: Error:
java.lang.RuntimeException: java.io.FileNotFoundException: File
_bsp/_defaultZkManagerDir/job_1337959594450_0002/_zkServer does not
exist.
   at 
org.apache.giraph.zk.ZooKeeperManager.onlineZooKeeperServers(ZooKeeperManager.java:748)
   at org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:424)
   at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:645)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:725)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:152)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:147)
Caused by: java.io.FileNotFoundException: File
_bsp/_defaultZkManagerDir/job_1337959594450_0002/_zkServer does not
exist.
   at 
org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:365)
   at 
org.apache.giraph.zk.ZooKeeperManager.onlineZooKeeperServers(ZooKeeperManager.java:707)
   ... 9 more

Thanks,
Roman.