Re: java.net.ConnectException: Connection refused
Thanks, Xenia. I also managed to solve the issue. Here is how: I ran netstat on all the computers of my cluster while running the Giraph job in parallel. I then learnt that this process listens on 127.0.0.1, while the other machines try to connect on 172.21.xx.xxx. That is why it gives java.net.ConnectException. I then opened the /etc/hosts file and found that the hostname of the machine was also mapped to 127.0.0.1. I removed that entry and it worked. It took me quite a while to resolve this issue, but finally it worked.

Puneet
IIT Delhi, India

On Tuesday, November 4, 2014 4:09 AM, Xenia Demetriou <xenia...@gmail.com> wrote:

Hi Puneet, I am not an expert, but I had the same error and I solved it by changing the hostnames of the cluster PCs to lowercase, e.g. iHadoop3 -> ihadoop3.

-- Xenia

2014-11-02 14:08 GMT+02:00 Puneet Agarwal <puagar...@yahoo.com>:

I have set up a cluster of 4 computers for running my Pregel jobs. When running a job I often get the error given below. I followed another thread on the Giraph list ("Re: Problem running the PageRank example in a cluster", on mail-archives.apache.org) and learnt that this problem is caused by the firewall stopping network traffic. I have stopped the firewall service on all the machines. These machines have RHEL 5.5, and I stopped the service using the command `service iptables stop`, but I still get the same error. Can someone tell me what could be causing connections to be refused on port 30001 on this computer?

Regards
Puneet (IIT Delhi, India)

Error
=====
Using Netty without authentication.
2014-11-02 14:26:24,458 WARN org.apache.giraph.comm.netty.NettyClient: connectAllAddresses: Future failed to connect with iHadoop3/172.21.208.178:30001 with 0 failures because of java.net.ConnectException: Connection refused
2014-11-02 14:26:24,458 INFO org.apache.giraph.comm.netty.NettyClient: Using Netty without authentication.
2014-11-02 14:26:24,459 INFO org.apache.giraph.comm.netty.NettyClient: connectAllAddresses: Successfully added 0 connections, (0 total connected) 1 failed, 1 failures total.
2014-11-02 14:26:24,499 WARN org.apache.giraph.comm.netty.handler.ResponseClientHandler: exceptionCaught: Channel failed with remote address null
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
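The /etc/hosts fix described in the reply above amounts to something like the following sketch. The hostname `ihadoop1` is illustrative, and the LAN address is left as the poster wrote it; the exact entries on the affected machine are not shown in the thread.

```
# /etc/hosts BEFORE: the machine's own hostname resolves to loopback,
# so the Giraph Netty server binds to 127.0.0.1 and remote workers
# get "Connection refused" when they dial the 172.21.x.x address.
127.0.0.1   localhost ihadoop1

# /etc/hosts AFTER: the hostname maps only to the LAN address.
127.0.0.1   localhost
172.21.xx.xxx   ihadoop1
```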
Issue in Aggregator
Hi All,

In my algorithm I use an aggregator that takes a Text value. I have written a custom aggregator class for this, as given below:

public class MyAgg extends BasicAggregator<Text> {...}

This works fine when running on my laptop with one worker. However, when running on the cluster, it sometimes does not return the correctly aggregated value. It seems to return the locally aggregated value of one of the workers, whereas it should have used my logic to decide which of the aggregated values sent by the various workers is chosen as the final aggregated value. (In fact I have not written such code anywhere, so it is doing the best it can.)

My analysis of the issue is as follows:
a. I guess every worker aggregates the values locally.
b. Then there is a global aggregation step, which simply compares the values sent by the various aggregators.
c. For global aggregation it uses the Text.compareTo() method. Text.compareTo() is a default Hadoop implementation and does not include the logic of my program.
d. It seems that because of the above, the value returned by my aggregator on the cluster is not globally aggregated; the locally aggregated value of one of the workers gets taken instead.

If the above analysis is correct, I think I can solve it as follows: I should write my own class that implements the Writable interface, and in this class also write a compareTo() method, as a result of which things will start working. If Giraph were using the MyAgg class itself to decide which of the values returned by the various workers is taken as the globally aggregated value, this problem would not have occurred.

I seek your guidance on whether my analysis is correct.

- Puneet
IIT Delhi, India
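The distinction the analysis above hinges on can be illustrated without Giraph. Below is a minimal, self-contained sketch (the class, the "keep the longest value" rule, and the worker values are all hypothetical) contrasting a custom aggregate() rule with a plain compareTo() maximum; it shows how the two can disagree, which is the symptom described above.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a custom aggregator rule: keep the longest
// string. Merging per-worker results with compareTo() (the lexicographic
// maximum) can give a different answer than re-applying the custom rule.
public class AggregatorDemo {
    // the custom aggregation rule: keep the longer of two values
    static String aggregate(String current, String value) {
        return value.length() > current.length() ? value : current;
    }

    public static void main(String[] args) {
        // locally aggregated values from three hypothetical workers
        List<String> workerResults = Arrays.asList("zz", "abcdef", "mmmm");

        // global result when the custom rule is re-applied across workers
        String viaRule = "";
        for (String v : workerResults) {
            viaRule = aggregate(viaRule, v);
        }

        // global result when compareTo() simply picks the maximum
        String viaCompareTo = workerResults.stream().max(String::compareTo).get();

        System.out.println(viaRule);      // prints "abcdef" (longest value)
        System.out.println(viaCompareTo); // prints "zz" (lexicographically largest)
    }
}
```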
Re: Help with Giraph on Yarn
Hi Tripti,

Finally I was able to run the test successfully. It was a permissions issue, since I was running as user ale, not as yarn. Let me say that I am now able to run Giraph examples on YARN 2.5.1. This is the final result:

14/11/08 16:24:00 INFO yarn.GiraphYarnClient: Completed Giraph: org.apache.giraph.examples.SimpleShortestPathsComputation: SUCCEEDED, total running time: 0 minutes, 21 seconds.

Many thanks for your support,
Alessandro

On 06/Nov/2014, at 15:16, Tripti Singh <tri...@yahoo-inc.com> wrote:

I don't know if you have access to this node, but if you do, you could check whether the file is indeed there and whether you have access to it.

Sent from my iPhone

On 06-Nov-2014, at 6:12 pm, Alessandro Negro <alenegr...@yahoo.it> wrote:

You are right, it works, but now I receive the following error:

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/private/tmp/hadoop-yarn/nm-local-dir/usercache/ale/appcache/application_1415264041937_0009/filecache/10/giraph-examples-1.1.0-SNAPSHOT-for-hadoop-2.5.1-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/yarn/hadoop-2.5.1/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2014-11-06 13:15:37.120 java[10158:1803] Unable to load realm info from SCDynamicStore
Exception in thread "pool-4-thread-1" java.lang.IllegalStateException: Could not configure the container launch context for GiraphYarnTasks.
        at org.apache.giraph.yarn.GiraphApplicationMaster.getTaskResourceMap(GiraphApplicationMaster.java:391)
        at org.apache.giraph.yarn.GiraphApplicationMaster.access$500(GiraphApplicationMaster.java:78)
        at org.apache.giraph.yarn.GiraphApplicationMaster$LaunchContainerRunnable.buildContainerLaunchContext(GiraphApplicationMaster.java:522)
        at org.apache.giraph.yarn.GiraphApplicationMaster$LaunchContainerRunnable.run(GiraphApplicationMaster.java:479)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)
Caused by: java.io.FileNotFoundException: File does not exist: hdfs://hadoop-master:9000/user/yarn/giraph_yarn_jar_cache/application_1415264041937_0009/giraph-examples-1.1.0-SNAPSHOT-for-hadoop-2.5.1-jar-with-dependencies.jar
        at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1072)
        at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1064)
        at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
        at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1064)
        at org.apache.giraph.yarn.YarnUtils.addFileToResourceMap(YarnUtils.java:153)
        at org.apache.giraph.yarn.YarnUtils.addFsResourcesToMap(YarnUtils.java:77)
        at org.apache.giraph.yarn.GiraphApplicationMaster.getTaskResourceMap(GiraphApplicationMaster.java:387)
        ... 6 more

That justifies the other error I receive in the task:

Could not find or load main class org.apache.giraph.yarn.GiraphYarnTask

Thanks,

On 06/Nov/2014, at 13:07, Tripti Singh <tri...@yahoo-inc.com> wrote:

Why are you adding two jars?
The examples jar ideally contains the core library, so everything should be available with just the one examples jar included.

Sent from my iPhone

On 06-Nov-2014, at 4:33 pm, Alessandro Negro <alenegr...@yahoo.it> wrote:

Hi, now it seems better. I needed to add:

giraph-1.1.0-SNAPSHOT-for-hadoop-2.5.1-jar-with-dependencies.jar, giraph-examples-1.1.0-SNAPSHOT-for-hadoop-2.5.1-jar-with-dependencies.jar

Now it seems that after a lot of cycles it fails with this error:

Could not find or load main class org.apache.giraph.yarn.GiraphYarnTask

But in this case the error appears in task-3-stderr.log, not in gam-stderr.log, where there is the following error:

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/private/tmp/hadoop-yarn/nm-local-dir/usercache/ale/appcache/application_1415264041937_0006/filecache/12/giraph-1.1.0-SNAPSHOT-for-hadoop-2.5.1-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/private/tmp/hadoop-yarn/nm-local-dir/usercache/ale/appcache/application_1415264041937_0006/filecache/10/giraph-examples-1.1.0-SNAPSHOT-for-hadoop-2.5.1-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/yarn/hadoop-2.5.1/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type
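The check Tripti suggests, together with the permissions problem that turned out to be the cause, can be verified from the command line. A sketch, assuming the HDFS path from the stack trace above and a standard Hadoop 2.x client (these commands need access to the cluster, so they are illustrative only):

```shell
# Does the jar actually exist in the YARN jar cache for this application?
hdfs dfs -ls hdfs://hadoop-master:9000/user/yarn/giraph_yarn_jar_cache/application_1415264041937_0009/

# Who owns the cache directory, and can the submitting user (here: ale) read it?
hdfs dfs -ls -d /user/yarn/giraph_yarn_jar_cache
```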
storeCheckpoint() in worker can be slow when a slow worker is present
Hi all,

I have a question from my recent experience with Giraph. In the Giraph worker code I see a line like this:

getServerData().getCurrentMessageStore().writePartition(verticesOutputStream, partition.getId());

To the best of my knowledge, while executing this line a worker writes some of its partition-map entries to an output stream. However, when a slow worker is present, the execution of this line in the other workers also gets slower. I see that execution gets faster again after the slow worker has finished its storeCheckpoint() process. As far as I know, each worker uses its own message store and writes to a different output stream. So why can a slow storeCheckpoint() in one worker affect another worker's checkpoint process?

Thanks