Re: java.net.ConnectException: Connection refused

2014-11-08 Thread Puneet Agarwal
Thanks Xenia. I also managed to solve the issue. Following is how I solved it.
I ran netstat on all the computers of my cluster while running the Giraph job in 
parallel. I then learnt that this process binds to 127.0.0.1 while the other 
machines try to connect on 172.21.xx.xxx. That's why it gives the error 
java.net.ConnectException.
I then opened the /etc/hosts file and found that the hostname of the machine was 
also mapped to 127.0.0.1. I removed that entry and it worked. It took me quite a 
while to resolve this issue.
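For anyone hitting the same problem: the bad /etc/hosts mapping can also be spotted from Java. A minimal sketch (not Giraph code; it just resolves the local hostname the way a worker would, so the class name and messages are my own):

```java
import java.net.InetAddress;

public class HostCheck {
    public static void main(String[] args) throws Exception {
        // Resolve this machine's hostname the same way a remote peer would.
        String host = InetAddress.getLocalHost().getHostName();
        InetAddress addr = InetAddress.getByName(host);
        if (addr.isLoopbackAddress()) {
            // An /etc/hosts entry mapping the hostname to 127.0.0.1 makes the
            // server bind to loopback, so remote workers get "Connection refused".
            System.out.println(host + " resolves to loopback ("
                + addr.getHostAddress() + "); remove the 127.0.0.1 mapping");
        } else {
            System.out.println(host + " resolves to " + addr.getHostAddress());
        }
    }
}
```

If this prints a loopback address, removing the hostname from the 127.0.0.1 line in /etc/hosts (as described above) should fix the refused connections.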

Anyway finally it worked.
Puneet (IIT Delhi, India)
 

 On Tuesday, November 4, 2014 4:09 AM, Xenia Demetriou xenia...@gmail.com 
wrote:
   

 Hi Puneet,

I am not an expert, but I had the same error and I solved it by changing the 
hostnames of the cluster PCs to lowercase, e.g. 
iHadoop3 -> ihadoop3

--
Xenia

2014-11-02 14:08 GMT+02:00 Puneet Agarwal puagar...@yahoo.com:

I have set up a cluster of 4 computers for running my Pregel jobs.

When running a job I often get the error given below. I followed another thread 
in the Giraph forums and learnt that this problem is caused by a firewall 
blocking network traffic. I have stopped the firewall service on all the 
machines (they run RHEL 5.5; I stopped the service using the command 
service iptables stop), but I still get the same error.
Can someone tell me what could be causing connections to be refused on port 
30001 on this computer?
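One quick way to narrow this down is to probe the port directly from another machine in the cluster. A minimal sketch (host and port copied from the error log below; the class name is my own, not part of Giraph):

```java
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortProbe {
    public static void main(String[] args) {
        String host = "iHadoop3"; // taken from the error message
        int port = 30001;
        try (Socket s = new Socket()) {
            // 2-second connect timeout so a firewalled port fails fast.
            s.connect(new InetSocketAddress(host, port), 2000);
            System.out.println("connected: a listener is up on " + host + ":" + port);
        } catch (Exception e) {
            // "Connection refused" here means nothing is listening on that
            // host:port, or the listener is bound to a different interface
            // (e.g. loopback only).
            System.out.println("failed: " + e.getMessage());
        }
    }
}
```

Running this while the job is up distinguishes "nothing listening" (refused immediately) from "firewall dropping packets" (connect times out).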
Regards,
Puneet (IIT Delhi, India)
[Referenced thread: "Re: Problem running the PageRank example in a cluster" on 
mail-archives.apache.org]




Error
===
Using Netty without authentication.
2014-11-02 14:26:24,458 WARN org.apache.giraph.comm.netty.NettyClient: 
connectAllAddresses: Future failed to connect with 
iHadoop3/172.21.208.178:30001 with 0 failures because of 
java.net.ConnectException: Connection refused
2014-11-02 14:26:24,458 INFO org.apache.giraph.comm.netty.NettyClient: Using 
Netty without authentication.
2014-11-02 14:26:24,459 INFO org.apache.giraph.comm.netty.NettyClient: 
connectAllAddresses: Successfully added 0 connections, (0 total connected) 1 
failed, 1 failures total.
2014-11-02 14:26:24,499 WARN 
org.apache.giraph.comm.netty.handler.ResponseClientHandler: exceptionCaught: 
Channel failed with remote address null
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)





   

Issue in Aggregator

2014-11-08 Thread Puneet Agarwal
Hi All,
In my algo, I use an Aggregator which takes a Text value. I have written my own 
custom aggregator class for this, as given below.

public class MyAgg extends BasicAggregator&lt;Text&gt; {...}

This works fine when running on my laptop with one worker. However, when running 
it on the cluster, it sometimes does not return the correctly aggregated value. 
It seems to be returning the locally aggregated value of one of the workers, 
whereas it should have used my logic to decide which of the aggregated values 
sent by the various workers is chosen as the final aggregated value. (In fact I 
have not written such code anywhere, so it is doing the best it can.)

Following is my analysis of this issue:
a. I guess every worker aggregates the values locally.
b. Then there is a global aggregation step, which simply compares the values 
sent by the various workers.
c. For global aggregation it uses the Text.compareTo() method. Text.compareTo() 
is a default Hadoop implementation and does not include the logic of my program.
d. It seems that, because of the above, the value returned by my aggregator on 
the cluster is not actually globally aggregated; the locally aggregated value of 
one of the workers gets taken instead.
If the above analysis is correct, here is how I think I can solve this: I should 
write my own class that implements the Writable interface, and in that class 
also write a compareTo method; as a result, things should start working fine.
If Giraph used the MyAgg class itself to decide which of the values returned by 
the various workers should be taken as the globally aggregated value, this 
problem would not have occurred.
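If the master really does pick one per-worker value by comparison instead of re-applying the aggregator's own logic, the symptom would look exactly like this. As an illustration (plain Java with hypothetical names, not Giraph's actual internals), the global result is only correct when the per-worker partials are reduced with the same aggregate() logic:

```java
import java.util.Arrays;
import java.util.List;

// Toy stand-in for a custom aggregator whose logic is "keep the longest value",
// mirroring what a BasicAggregator<Text> subclass might implement.
public class MergeSketch {
    static String aggregate(String current, String value) {
        // Custom aggregation logic: keep the longer of the two strings.
        return value.length() > current.length() ? value : current;
    }

    public static void main(String[] args) {
        // Locally aggregated values reported by three workers.
        List<String> perWorker = Arrays.asList("ab", "abcd", "abc");
        // Globally correct result: reduce the partials with the SAME logic.
        String global = "";
        for (String v : perWorker) {
            global = aggregate(global, v);
        }
        System.out.println(global); // prints "abcd"
        // Picking one partial by plain string/byte comparison (as
        // Text.compareTo() would) can select a different value entirely.
    }
}
```

This is only meant to show why the globally aggregated value must be produced by the aggregator's own aggregate() method rather than by a generic compareTo().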

I seek your guidance on whether my analysis is correct.
- Puneet (IIT Delhi, India)



Re: Help with Giraph on Yarn

2014-11-08 Thread Alessandro Negro
Hi Tripti,
finally I was able to run the test successfully. It was a permissions issue, 
since I was running as user ale rather than as yarn.
Let me say that I am now able to run Giraph examples on YARN 2.5.1. This is the 
final result:

14/11/08 16:24:00 INFO yarn.GiraphYarnClient: Completed Giraph: 
org.apache.giraph.examples.SimpleShortestPathsComputation: SUCCEEDED, total 
running time: 0 minutes, 21 seconds.


Many thanks for your support,
Alessandro

On 06 Nov 2014, at 15:16, Tripti Singh tri...@yahoo-inc.com wrote:

 I don't know if you have access to this node. But if you do, you could check 
 whether the file is indeed there and you have access to it.
 
 Sent from my iPhone
 
 On 06-Nov-2014, at 6:12 pm, Alessandro Negro alenegr...@yahoo.it wrote:
 
 You are right it works, but now I receive the following error:
 
 SLF4J: Class path contains multiple SLF4J bindings.
 SLF4J: Found binding in 
 [jar:file:/private/tmp/hadoop-yarn/nm-local-dir/usercache/ale/appcache/application_1415264041937_0009/filecache/10/giraph-examples-1.1.0-SNAPSHOT-for-hadoop-2.5.1-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
 SLF4J: Found binding in 
 [jar:file:/opt/yarn/hadoop-2.5.1/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
 SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
 explanation.
 SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
 2014-11-06 13:15:37.120 java[10158:1803] Unable to load realm info from 
 SCDynamicStore
 Exception in thread pool-4-thread-1 java.lang.IllegalStateException: Could 
 not configure the containerlaunch context for GiraphYarnTasks.
 at 
 org.apache.giraph.yarn.GiraphApplicationMaster.getTaskResourceMap(GiraphApplicationMaster.java:391)
 at 
 org.apache.giraph.yarn.GiraphApplicationMaster.access$500(GiraphApplicationMaster.java:78)
 at 
 org.apache.giraph.yarn.GiraphApplicationMaster$LaunchContainerRunnable.buildContainerLaunchContext(GiraphApplicationMaster.java:522)
 at 
 org.apache.giraph.yarn.GiraphApplicationMaster$LaunchContainerRunnable.run(GiraphApplicationMaster.java:479)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:744)
 Caused by: java.io.FileNotFoundException: File does not exist: 
 hdfs://hadoop-master:9000/user/yarn/giraph_yarn_jar_cache/application_1415264041937_0009/giraph-examples-1.1.0-SNAPSHOT-for-hadoop-2.5.1-jar-with-dependencies.jar
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1072)
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1064)
 at 
 org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1064)
 at org.apache.giraph.yarn.YarnUtils.addFileToResourceMap(YarnUtils.java:153)
 at org.apache.giraph.yarn.YarnUtils.addFsResourcesToMap(YarnUtils.java:77)
 at 
 org.apache.giraph.yarn.GiraphApplicationMaster.getTaskResourceMap(GiraphApplicationMaster.java:387)
 ... 6 more
 
 
 That explains the other error I receive in the task:
 Could not find or load main class org.apache.giraph.yarn.GiraphYarnTask
 
 
 Thanks,
 
 On 06 Nov 2014, at 13:07, Tripti Singh tri...@yahoo-inc.com wrote:
 
 Why are you adding two jars? The examples jar ideally contains the core 
 library, so everything should be available with just the one examples jar 
 included.
 
 Sent from my iPhone
 
 On 06-Nov-2014, at 4:33 pm, Alessandro Negro alenegr...@yahoo.it wrote:
 
 Hi,
 now it seems better; I needed to add: 
 
 giraph-1.1.0-SNAPSHOT-for-hadoop-2.5.1-jar-with-dependencies.jar,giraph-examples-1.1.0-SNAPSHOT-for-hadoop-2.5.1-jar-with-dependencies.jar
 
 
 Now it seems that, after a lot of cycles, it fails with this error:
 
 Could not find or load main class org.apache.giraph.yarn.GiraphYarnTask
 
 But in this case the error appears in task-3-stderr.log, not in 
 gam-stderr.log, where there is the following error:
 
 SLF4J: Class path contains multiple SLF4J bindings.
 SLF4J: Found binding in 
 [jar:file:/private/tmp/hadoop-yarn/nm-local-dir/usercache/ale/appcache/application_1415264041937_0006/filecache/12/giraph-1.1.0-SNAPSHOT-for-hadoop-2.5.1-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
 SLF4J: Found binding in 
 [jar:file:/private/tmp/hadoop-yarn/nm-local-dir/usercache/ale/appcache/application_1415264041937_0006/filecache/10/giraph-examples-1.1.0-SNAPSHOT-for-hadoop-2.5.1-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
 SLF4J: Found binding in 
 [jar:file:/opt/yarn/hadoop-2.5.1/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
 SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
 explanation.
 SLF4J: Actual binding is of type 

storeCheckpoint() in worker can be slow when a slow worker is present

2014-11-08 Thread Vincentius Martin
Hi all,

I have a question related to my last experience using Giraph.

In the Giraph worker code, I see a line like this:

getServerData().getCurrentMessageStore().writePartition(verticesOutputStream,
partition.getId());

To the best of my knowledge, while executing this line a worker writes some of 
its partition map entries to the output stream. However, in the presence of a 
slow worker, the execution of this line in other workers also gets slower.

I see that the execution gets faster again after the slow worker has
finished its storeCheckpoint process.

From what I know, each worker uses its own message store and writes to a 
different output stream. Why, then, can a slow storeCheckpoint process in one 
worker affect another worker's checkpoint process?
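One plausible explanation (a guess, not confirmed from the Giraph source): checkpointing ends at a superstep synchronization barrier, so even though each worker writes its own stream, no worker proceeds until all of them have finished. A toy sketch of that effect with plain java.util.concurrent (hypothetical class and worker names, no Giraph code):

```java
import java.util.concurrent.CyclicBarrier;

// Toy model: workers checkpoint independently but must meet at a barrier
// before the next superstep, so the slowest worker sets the pace for all.
public class BarrierSketch {
    public static void main(String[] args) throws Exception {
        int workers = 3;
        CyclicBarrier barrier = new CyclicBarrier(workers,
            () -> System.out.println("all workers checkpointed; superstep continues"));
        for (int i = 0; i < workers; i++) {
            final int id = i;
            new Thread(() -> {
                try {
                    // Worker 0 is artificially slow at writing its checkpoint.
                    Thread.sleep(id == 0 ? 500 : 50);
                    System.out.println("worker " + id + " finished storeCheckpoint");
                    barrier.await(); // fast workers block here for the slow one
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            }).start();
        }
    }
}
```

If Giraph's checkpoint path has a barrier like this, the slowdown you observe in the other workers would be waiting time at the barrier rather than slow writes of their own.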

Thanks