storeCheckpoint() in worker can be slow when a slow worker is present
Hi all, I have a question based on my recent experience using Giraph. In the Giraph worker code, I see a line like this:

getServerData().getCurrentMessageStore().writePartition(verticesOutputStream, partition.getId());

To the best of my knowledge, while executing this line a worker writes some of its partition-map entries to an output stream. However, when a slow worker is present, executing this line in another worker also gets slower. I see that execution speeds up again after the slow worker has finished its storeCheckpoint process.

From what I know, each worker uses its own message store and writes to a different output stream. So why can a slow storeCheckpoint process in one worker affect another worker's checkpoint process?

Thanks
Re: Issue in Aggregator
Hi Puneet, It's unclear to me what you're wanting in terms of aggregator behavior. Are you saying you want an aggregator whose final output is the aggregated value for just one particular worker? With an aggregator you should at least make sure the operations you're performing are commutative; that is, the order in which items are aggregated should not matter unless that order is explicitly dealt with somehow. Otherwise you'll get unpredictable results.

Best,
Matthew Saltz

On 08/11/2014 15:05, "Puneet Agarwal" wrote:
> Hi All,
> In my algo, I use an Aggregator that takes a Text value. I have written
> my custom aggregator class for this, as given below.
>
> public class MyAgg extends BasicAggregator<Text> {
>     ...
> }
>
> This works fine when running on my laptop with one worker.
> However, when running it on the cluster, it sometimes does not return the
> correctly aggregated value. It seems to be returning the locally aggregated
> value of one of the workers, while it should have used my logic to decide
> which of the aggregated values sent by the various workers should be chosen
> as the final aggregated value. (In fact I have not written such code
> anywhere, so it is doing the best it can.)
>
> Following is my analysis of this issue.
> a. I guess every worker aggregates the values locally.
> b. Then there is a global aggregation step, which simply compares the
> values sent by the various aggregators.
> c. For global aggregation it uses the Text.compareTo() method. This is the
> default Hadoop implementation and does not include the logic of my program.
> d. It seems that, because of the above, the value returned by my
> aggregator on the cluster is actually not globally aggregated; instead the
> locally aggregated value of one of the workers gets taken.
>
> If the above analysis is correct, here is how I think I can solve this:
> I should write my own class that implements the Writable interface. In this
> class I would also write a compareTo method, and as a result things will
> start working fine.
>
> If it used the MyAgg class itself to decide which of the values returned
> by the various workers should be taken as the globally aggregated value,
> this problem would not have occurred.
>
> I seek your guidance on whether my analysis is correct.
>
> - Puneet
> IIT Delhi, India
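Matthew's commutativity point can be illustrated with a small, Giraph-free sketch (the class and value names below are hypothetical, and a real Giraph aggregator would extend BasicAggregator&lt;Text&gt; instead of using a static helper):

```java
// Hypothetical sketch of a commutative combine rule for a Text-style aggregator.
// max() under lexicographic order satisfies max(a, b) == max(b, a), so the
// worker-local partial aggregates can be merged in any order with the same result.
public class LexMax {
    // Combine two partial aggregates; in Giraph this logic would live in
    // BasicAggregator.aggregate().
    public static String combine(String a, String b) {
        return a.compareTo(b) >= 0 ? a : b;
    }

    public static void main(String[] args) {
        // Each worker first aggregates its own values locally...
        String w1 = combine("apple", "pear");   // "pear"
        String w2 = combine("zebra", "fig");    // "zebra"
        // ...and the partials can then be merged in either order.
        System.out.println(combine(w1, w2));    // prints "zebra"
        System.out.println(combine(w2, w1));    // prints "zebra"
    }
}
```

If the combine operation were not commutative (for example, "always keep the first value seen"), the final result would depend on which worker's partial arrived first, which matches the unpredictable behavior described in the thread.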
Re: Help with Giraph on Yarn
Hi Tripti, I was finally able to run the test successfully. It was a permissions issue, since I was running as ale rather than as yarn. Let me say that I'm now able to run Giraph examples on YARN 2.5.1. This is the final result:

14/11/08 16:24:00 INFO yarn.GiraphYarnClient: Completed Giraph: org.apache.giraph.examples.SimpleShortestPathsComputation: SUCCEEDED, total running time: 0 minutes, 21 seconds.

Many thanks for your support,
Alessandro

On 06 Nov 2014, at 15:16, Tripti Singh wrote:
> I don't know if you have access to this node. But if you do, you could check
> whether the file is indeed there and you have access to it.
>
> Sent from my iPhone
>
> On 06-Nov-2014, at 6:12 pm, "Alessandro Negro" wrote:
>
>> You are right, it works, but now I receive the following error:
>>
>> SLF4J: Class path contains multiple SLF4J bindings.
>> SLF4J: Found binding in [jar:file:/private/tmp/hadoop-yarn/nm-local-dir/usercache/ale/appcache/application_1415264041937_0009/filecache/10/giraph-examples-1.1.0-SNAPSHOT-for-hadoop-2.5.1-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>> SLF4J: Found binding in [jar:file:/opt/yarn/hadoop-2.5.1/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
>> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
>> 2014-11-06 13:15:37.120 java[10158:1803] Unable to load realm info from SCDynamicStore
>> Exception in thread "pool-4-thread-1" java.lang.IllegalStateException: Could not configure the containerlaunch context for GiraphYarnTasks.
>> at org.apache.giraph.yarn.GiraphApplicationMaster.getTaskResourceMap(GiraphApplicationMaster.java:391)
>> at org.apache.giraph.yarn.GiraphApplicationMaster.access$500(GiraphApplicationMaster.java:78)
>> at org.apache.giraph.yarn.GiraphApplicationMaster$LaunchContainerRunnable.buildContainerLaunchContext(GiraphApplicationMaster.java:522)
>> at org.apache.giraph.yarn.GiraphApplicationMaster$LaunchContainerRunnable.run(GiraphApplicationMaster.java:479)
>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> at java.lang.Thread.run(Thread.java:744)
>> Caused by: java.io.FileNotFoundException: File does not exist: hdfs://hadoop-master:9000/user/yarn/giraph_yarn_jar_cache/application_1415264041937_0009/giraph-examples-1.1.0-SNAPSHOT-for-hadoop-2.5.1-jar-with-dependencies.jar
>> at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1072)
>> at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1064)
>> at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>> at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1064)
>> at org.apache.giraph.yarn.YarnUtils.addFileToResourceMap(YarnUtils.java:153)
>> at org.apache.giraph.yarn.YarnUtils.addFsResourcesToMap(YarnUtils.java:77)
>> at org.apache.giraph.yarn.GiraphApplicationMaster.getTaskResourceMap(GiraphApplicationMaster.java:387)
>> ... 6 more
>>
>> That explains the other error I receive in the task: Could not find or load main class org.apache.giraph.yarn.GiraphYarnTask
>>
>> Thanks,
>>
>> On 06 Nov 2014, at 13:07, Tripti Singh wrote:
>>
>>> Why are you adding two jars?
>>> The example jar ideally contains the core library, so everything should be
>>> available with just the one examples jar included.
>>>
>>> Sent from my iPhone
>>>
>>> On 06-Nov-2014, at 4:33 pm, "Alessandro Negro" wrote:

Hi, now it seems better. I needed to add:

giraph-1.1.0-SNAPSHOT-for-hadoop-2.5.1-jar-with-dependencies.jar,giraph-examples-1.1.0-SNAPSHOT-for-hadoop-2.5.1-jar-with-dependencies.jar

Now it seems that after a lot of cycles it fails with this error: Could not find or load main class org.apache.giraph.yarn.GiraphYarnTask. But in this case the error appears in task-3-stderr.log, not in gam-stderr.log, where there is the following error:

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/private/tmp/hadoop-yarn/nm-local-dir/usercache/ale/appcache/application_1415264041937_0006/filecache/12/giraph-1.1.0-SNAPSHOT-for-hadoop-2.5.1-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/private/tmp/hadoop-yarn/nm-local-dir/usercache/ale/appcache/application_1415264041937_0006/filecache/10/giraph-examples-1.1.0-SNAPSHOT-for-hadoop-2.5.1-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/yarn/hadoop-2.5.1/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
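To illustrate Tripti's single-jar point, a submission along these lines passes only the examples jar, both as the launch jar and as the YARN resource. This is a command-line sketch from memory, not taken from this thread: the computation and format class names are real Giraph 1.1 examples, but the input/output paths are assumptions, and the option names (in particular -yj) should be checked against GiraphRunner's help output for your version:

```shell
# Hypothetical single-jar Giraph-on-YARN submission; paths are placeholders.
hadoop jar giraph-examples-1.1.0-SNAPSHOT-for-hadoop-2.5.1-jar-with-dependencies.jar \
  org.apache.giraph.GiraphRunner \
  org.apache.giraph.examples.SimpleShortestPathsComputation \
  -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
  -vip /user/ale/input/tiny_graph.txt \
  -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
  -op /user/ale/output/shortestpaths \
  -w 1 \
  -yj giraph-examples-1.1.0-SNAPSHOT-for-hadoop-2.5.1-jar-with-dependencies.jar
```

Since the same jar is the only one shipped to the containers, GiraphYarnTask should find its classes without the duplicate-SLF4J-binding noise that two overlapping jars produce.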
Issue in Aggregator
Hi All,
In my algo, I use an Aggregator that takes a Text value. I have written my custom aggregator class for this, as given below.

public class MyAgg extends BasicAggregator<Text> {
    ...
}

This works fine when running on my laptop with one worker. However, when running it on the cluster, it sometimes does not return the correctly aggregated value. It seems to be returning the locally aggregated value of one of the workers, while it should have used my logic to decide which of the aggregated values sent by the various workers should be chosen as the final aggregated value. (In fact I have not written such code anywhere, so it is doing the best it can.)

Following is my analysis of this issue.
a. I guess every worker aggregates the values locally.
b. Then there is a global aggregation step, which simply compares the values sent by the various aggregators.
c. For global aggregation it uses the Text.compareTo() method. This is the default Hadoop implementation and does not include the logic of my program.
d. It seems that, because of the above, the value returned by my aggregator on the cluster is actually not globally aggregated; instead the locally aggregated value of one of the workers gets taken.

If the above analysis is correct, here is how I think I can solve this: I should write my own class that implements the Writable interface. In this class I would also write a compareTo method, and as a result things will start working fine.

If it used the MyAgg class itself to decide which of the values returned by the various workers should be taken as the globally aggregated value, this problem would not have occurred.

I seek your guidance on whether my analysis is correct.

- Puneet
IIT Delhi, India
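A Hadoop-free sketch of the fix proposed above (all names here are hypothetical, not from the thread): a value type whose compareTo encodes the program's own ordering rather than Text's raw byte-wise order. A real Giraph value class would additionally implement Writable's write/readFields for serialization:

```java
// Hypothetical value type: ordered by an application-level score, with the
// label only as a tie-breaker, instead of plain lexicographic comparison.
public class ScoredValue implements Comparable<ScoredValue> {
    final String label;
    final long score;

    ScoredValue(String label, long score) {
        this.label = label;
        this.score = score;
    }

    @Override
    public int compareTo(ScoredValue other) {
        // Custom logic: compare by score first, then by label for ties.
        int byScore = Long.compare(this.score, other.score);
        return byScore != 0 ? byScore : this.label.compareTo(other.label);
    }

    public static void main(String[] args) {
        ScoredValue a = new ScoredValue("w1-result", 10);
        ScoredValue b = new ScoredValue("w2-result", 3);
        // Lexicographically "w1-result" < "w2-result", but the custom
        // ordering ranks a higher because its score is larger.
        System.out.println(a.compareTo(b) > 0);  // prints "true"
    }
}
```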
Re: java.net.ConnectException: Connection refused
Thanks Xenia. I also managed to solve the issue. Here is how.

I ran netstat on all the computers in my cluster while running the Giraph job in parallel. I learnt that this process runs on 127.0.0.1 while the other machines try to connect on 172.21.xx.xxx. That is why it gives the java.net.ConnectException error. I then opened the /etc/hosts file and found that the machine's hostname was also mapped to 127.0.0.1. I removed that entry and it worked.

It took me quite a while to resolve this issue, but finally it worked.

Puneet
IIT Delhi, India

On Tuesday, November 4, 2014 4:09 AM, Xenia Demetriou wrote:

Hi Puneet, I am not an expert, but I had the same error and I solved it by changing the hostnames of the cluster PCs to lowercase, e.g. iHadoop3 -> ihadoop3.

--
Xenia

2014-11-02 14:08 GMT+02:00 Puneet Agarwal:

I have set up a cluster of 4 computers for running my Pregel jobs. When running a job I often get the error given below. I followed another thread on the Giraph forums and learnt that this problem is caused by a firewall stopping network traffic. I have stopped the firewall service on all the machines; they run RHEL 5.5 and I stopped the service using the command "service iptables stop". But I still get the same error. Can someone tell me what could be causing this service to be blocked on port 30001 on this computer?

Regards
Puneet (IIT Delhi, India)

[Linked thread: "Re: Problem running the PageRank example in a cluster" — this is the output of the command on all servers: Chain INPUT (policy ACCEPT) target prot opt source destination ACCEPT tcp -- anywhere anywhere state NEW tcp dpts:3:30010 ACCEPT tcp -- anywhere anywhere ...]

Error
===
Using Netty without authentication.
2014-11-02 14:26:24,458 WARN org.apache.giraph.comm.netty.NettyClient: connectAllAddresses: Future failed to connect with iHadoop3/172.21.208.178:30001 with 0 failures because of java.net.ConnectException: Connection refused
2014-11-02 14:26:24,458 INFO org.apache.giraph.comm.netty.NettyClient: Using Netty without authentication.
2014-11-02 14:26:24,459 INFO org.apache.giraph.comm.netty.NettyClient: connectAllAddresses: Successfully added 0 connections, (0 total connected) 1 failed, 1 failures total.
2014-11-02 14:26:24,499 WARN org.apache.giraph.comm.netty.handler.ResponseClientHandler: exceptionCaught: Channel failed with remote address null
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
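For readers hitting the same error, the /etc/hosts change Puneet describes looks roughly like this. The hostname and address are taken from the log in this thread, but the exact file lines are a reconstruction, not a copy of his file:

```
# Before (problematic): the machine's real hostname is also bound to loopback,
# so the Giraph worker listens on 127.0.0.1 while peers try 172.21.208.178:30001.
127.0.0.1      localhost ihadoop3

# After (working): the hostname maps only to the routable cluster address.
127.0.0.1      localhost
172.21.208.178 ihadoop3
```

The symptom to look for is exactly what netstat showed here: the local process bound to 127.0.0.1:30001 while remote workers connect to the 172.21.x.x address.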