Superstep duration increases

2014-05-15 Thread Pascal Jäger
Hi all,

I have implemented a label propagation algorithm to find clusters in a graph.
I just noticed that the time the algorithm takes per superstep keeps 
increasing, and I don’t know why.

The graph is static and the number of messages is the same throughout all 
supersteps.
In every superstep, each node sends its label to its neighbors, which then 
compute their new label from the received messages and send that label on 
in turn.
At the end of each superstep each node writes a nodeID - label pair to an HBase 
table.
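
For readers following along, the superstep described above can be sketched in plain Java, outside Giraph. This is only an illustration, not the poster's implementation: the post does not say how a node picks its label, so the sketch assumes "most frequent received label, ties broken by the smallest label", and the class and method names are hypothetical. It also counts the messages per superstep, which for a static graph equals the sum of the node degrees and so stays constant.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-alone sketch; not Giraph code and not the poster's implementation.
final class LabelPropagationSketch {

    /** Runs one synchronous superstep in place; returns the number of messages sent. */
    static long superstep(Map<Long, List<Long>> adjacency, Map<Long, Long> labels) {
        // Phase 1: every node sends its current label to each neighbor.
        Map<Long, List<Long>> inbox = new HashMap<>();
        long messages = 0;
        for (Map.Entry<Long, List<Long>> e : adjacency.entrySet()) {
            long label = labels.get(e.getKey());
            for (long neighbor : e.getValue()) {
                inbox.computeIfAbsent(neighbor, k -> new ArrayList<>()).add(label);
                messages++;
            }
        }
        // Phase 2: every node that received messages computes its new label.
        for (Map.Entry<Long, List<Long>> e : inbox.entrySet()) {
            labels.put(e.getKey(), mostFrequent(e.getValue()));
        }
        return messages;
    }

    /** Assumed update rule: most frequent label, ties broken by the smallest label. */
    static long mostFrequent(List<Long> received) {
        Map<Long, Integer> counts = new HashMap<>();
        for (long label : received) {
            counts.merge(label, 1, Integer::sum);
        }
        long best = Long.MAX_VALUE;
        int bestCount = 0;
        for (Map.Entry<Long, Integer> c : counts.entrySet()) {
            boolean moreFrequent = c.getValue() > bestCount;
            boolean tieButSmaller = c.getValue() == bestCount && c.getKey() < best;
            if (moreFrequent || tieButSmaller) {
                best = c.getKey();
                bestCount = c.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Tiny path graph 1 - 2 - 3; every node starts with its own id as label.
        Map<Long, List<Long>> adjacency = new HashMap<>();
        adjacency.put(1L, Arrays.asList(2L));
        adjacency.put(2L, Arrays.asList(1L, 3L));
        adjacency.put(3L, Arrays.asList(2L));
        Map<Long, Long> labels = new HashMap<>();
        for (long id : adjacency.keySet()) {
            labels.put(id, id);
        }
        for (int step = 0; step < 5; step++) {
            long start = System.nanoTime();
            long sent = superstep(adjacency, labels);
            long micros = (System.nanoTime() - start) / 1_000;
            System.out.println("superstep " + step + ": " + sent + " messages, " + micros + " us");
        }
    }
}
```

Logging the message count next to the elapsed time per superstep, as `main` does here, is one way to confirm that the slowdown is not caused by a growing message volume.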

Do you have any general hints on where I could look?

I have absolutely no clue where to start.

Thanks for your help!

Regards

Pascal



Re: Error while executing large graph

2014-05-15 Thread Avery Ching

I think this is the key message.

0 out of 196 partitions computed; min free memory on worker 6 - 0.81MB, 
average 11.56MB


Having less than 1 MB free won't work.  Your workers are likely running 
out of memory (OOM), which kills the job.  Can you get more memory for your job?
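
To spot this condition without reading the log by eye, the free-memory figures can be pulled out of the JobProgressTracker line quoted above. The following is a minimal sketch, assuming the progress line has been unwrapped onto a single line and keeps the wording shown in this thread; the class name, method names, and the threshold are hypothetical and not part of Giraph.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the pattern is derived only from the log line quoted in this thread.
final class FreeMemoryCheck {

    private static final Pattern FREE_MEMORY = Pattern.compile(
        "min free memory on worker (\\d+) - ([0-9.]+)MB, average ([0-9.]+)MB");

    /** Returns the reported minimum free memory in MB, or -1 if the line has no such field. */
    static double minFreeMb(String logLine) {
        Matcher m = FREE_MEMORY.matcher(logLine);
        return m.find() ? Double.parseDouble(m.group(2)) : -1.0;
    }

    /** True if the line reports a minimum below the caller-chosen threshold (in MB). */
    static boolean isStarved(String logLine, double thresholdMb) {
        double free = minFreeMb(logLine);
        return free >= 0 && free < thresholdMb;
    }

    public static void main(String[] args) {
        String line = "0 out of 196 partitions computed; "
            + "min free memory on worker 6 - 0.81MB, average 11.56MB";
        System.out.println("min free MB: " + minFreeMb(line));
        System.out.println("below 1 MB: " + isStarved(line, 1.0));
    }
}
```

The 1 MB threshold in `main` simply mirrors the figure mentioned above; a real check would likely use a much larger margin.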


On 5/14/14, 3:13 AM, Arun Kumar wrote:
Hi, when I run a Giraph job against 1 GB of data, I get the exception 
below after some time. Can somebody tell me what the issue is?
14/05/14 01:54:01 INFO job.JobProgressTracker: Data from 14 workers - Compute superstep 2: 0 out of 4847571 vertices computed; 0 out of 196 partitions computed; min free memory on worker 6 - 0.81MB, average 11.56MB
14/05/14 01:54:03 INFO zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x145f9cff031000f, likely server has closed socket, closing socket connection and attempting reconnect
14/05/14 01:54:04 INFO zookeeper.ClientCnxn: Opening socket connection to server mercado-12.hpl.hp.com/15.25.119.147:22181. Will not attempt to authenticate using SASL (unknown error)
14/05/14 01:54:04 WARN zookeeper.ClientCnxn: Session 0x145f9cff031000f for server null, unexpected error, closing socket connection and attempting reconnect

java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
14/05/14 01:54:06 INFO zookeeper.ClientCnxn: Opening socket connection to server mercado-12.hpl.hp.com/15.25.119.147:22181. Will not attempt to authenticate using SASL (unknown error)
14/05/14 01:54:06 WARN zookeeper.ClientCnxn: Session 0x145f9cff031000f for server null, unexpected error, closing socket connection and attempting reconnect

java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
14/05/14 01:54:06 WARN zk.ZooKeeperExt: exists: Connection loss on attempt 0, waiting 5000 msecs before retrying.
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /_hadoopBsp/job_201405140108_0003/_workerProgresses
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
    at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)
    at org.apache.giraph.zk.ZooKeeperExt.exists(ZooKeeperExt.java:360)
    at org.apache.giraph.job.JobProgressTracker$2.run(JobProgressTracker.java:87)
    at java.lang.Thread.run(Thread.java:745)
14/05/14 01:54:08 INFO zookeeper.ClientCnxn: Opening socket connection to server mercado-12.hpl.hp.com/15.25.119.147:22181. Will not attempt to authenticate using SASL (unknown error)
14/05/14 01:54:08 WARN zookeeper.ClientCnxn: Session 0x145f9cff031000f for server null, unexpected error, closing socket connection and attempting reconnect

java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)

14/05/14 01:54:09 INFO mapred.JobClient:  map 93% reduce 0%
14/05/14 01:54:10 INFO zookeeper.ClientCnxn: Opening socket connection to server mercado-12.hpl.hp.com/15.25.119.147:22181. Will not attempt to authenticate using SASL (unknown error)
14/05/14 01:54:10 WARN zookeeper.ClientCnxn: Session 0x145f9cff031000f for server null, unexpected error, closing socket connection and attempting reconnect

java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
14/05/14 01:54:12 INFO zookeeper.ClientCnxn: Opening socket connection to server mercado-12.hpl.hp.com/15.25.119.147:22181. Will not attempt to authenticate using SASL (unknown error)
14/05/14 01:54:12 WARN zookeeper.ClientCnxn: Session 0x145f9cff031000f 
f

Re: resolveMutations error

2014-05-15 Thread Pascal Jäger
I get the error as well but couldn’t figure out why.

Did you check whether all your vertices are still present after the point 
where the exception would have been thrown?

Regards


Pascal

On 14.05.2014 at 17:05, Jyotirmoy Sundi <sundi...@gmail.com> wrote:

Hi,
   Not throwing the exception in resolveMutations worked for me. Is it possible 
to get some understanding of what side effects, if any, this could have? The 
results look as expected.

for (I vertexId : destinations) {
  if (partition.getVertex(vertexId) == null) {
    if (!resolveVertexIndices.put(partitionId, vertexId)) {
      // throw new IllegalStateException(
      //     "resolveMutations: Already has missing vertex on this " +
      //     "worker for " + vertexId);
    }
  }
}
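
To make the effect of the commented-out check easier to reason about, here is a minimal stand-alone sketch of the put-returns-false pattern used above. It assumes, without confirmation from this thread, that resolveVertexIndices behaves like a set-valued multimap whose put returns false when the (partitionId, vertexId) pair is already present, as Guava's set-based multimaps do. The class below is a hypothetical stand-in built on java.util only; it is not Giraph code.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a set-valued multimap keyed by partition id.
final class MissingVertexIndex {

    private final Map<Integer, Set<Long>> byPartition = new HashMap<>();

    /** Returns true if the pair was new, false if it had already been recorded. */
    boolean put(int partitionId, long vertexId) {
        return byPartition.computeIfAbsent(partitionId, k -> new HashSet<>()).add(vertexId);
    }

    /** Returns how many distinct vertex ids are recorded for the partition. */
    int count(int partitionId) {
        return byPartition.getOrDefault(partitionId, Collections.emptySet()).size();
    }

    public static void main(String[] args) {
        MissingVertexIndex index = new MissingVertexIndex();
        long vertexId = 8345381246748292335L; // id taken from the trace in this thread
        System.out.println("first put:  " + index.put(0, vertexId));
        System.out.println("second put: " + index.put(0, vertexId));
        System.out.println("recorded:   " + index.count(0));
    }
}
```

Under that assumption, skipping the throw means a repeated vertex id is silently ignored and still recorded only once. Whether that is safe depends on why the same id shows up twice, which this thread does not settle.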


On Wed, May 14, 2014 at 7:23 AM, Jyotirmoy Sundi <sundi...@gmail.com> wrote:

Hi Folks,

I have recently been seeing this error in my jobs. Can you please shed some 
light on what resolveMutations in NettyWorkerServer.java is trying to achieve?

Trace:

Caused by: java.lang.IllegalStateException: resolveMutations: Already has missing vertex on this worker for 8345381246748292335
    at org.apache.giraph.comm.netty.NettyWorkerServer.resolveMutations(NettyWorkerServer.java:184)
    at org.apache.giraph.comm.netty.NettyWorkerServer.prepareSuperstep(NettyWorkerServer.java:152)
    at org.apache.giraph.worker.BspServiceWorker.startSuperstep(BspServiceWorker.java:677)
    at org.apache.giraph.graph.GraphTaskManager.execute(GraphTaskManager.java:249)
    at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:92)

--
Best Regards,
Jyotirmoy Sundi



