Re: Giraph and Fair Scheduler

2013-05-07 Thread Ramani, Arun
Hi Avery,

The following is the error of one of the failed tasks:


May 7, 2013 2:34:26 PM org.jboss.netty.channel.socket.nio.NioServerSocketPipelineSink
WARNING: Failed to accept a connection.
java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:657)
at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:943)
at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1336)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.start(AbstractNioWorker.java:179)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.register(AbstractNioWorker.java:141)
at org.jboss.netty.channel.socket.nio.NioServerSocketPipelineSink$Boss.registerAcceptedChannel(NioServerSocketPipelineSink.java:277)
at org.jboss.netty.channel.socket.nio.NioServerSocketPipelineSink$Boss.run(NioServerSocketPipelineSink.java:239)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:679)
13/05/07 14:34:26 INFO worker.BspServiceWorker: startSuperstep: Master(hostname=lvshdc5dn0020.qa.paypal.com, MRtaskID=44, port=30044)
13/05/07 14:34:26 INFO worker.BspServiceWorker: startSuperstep: Ready for computation on superstep -1 since worker selection and vertex range assignments are done in /_hadoopBsp/job_201305061811_0012/_applicationAttemptsDir/0/_superstepDir/-1/_addressesAndPartitions
May 7, 2013 2:34:26 PM org.jboss.netty.channel.DefaultChannelPipeline
WARNING: An exception was thrown by a user handler while handling an exception event ([id: 0x45c3e9ba] EXCEPTION: java.lang.OutOfMemoryError: unable to create new native thread)
java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:657)
at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:943)
at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1325)
at org.jboss.netty.handler.execution.MemoryAwareThreadPoolExecutor.doUnorderedExecute(MemoryAwareThreadPoolExecutor.java:452)
at org.jboss.netty.handler.execution.MemoryAwareThreadPoolExecutor.doExecute(MemoryAwareThreadPoolExecutor.java:445)
at org.jboss.netty.handler.execution.MemoryAwareThreadPoolExecutor.execute(MemoryAwareThreadPoolExecutor.java:437)
at org.jboss.netty.handler.execution.ExecutionHandler.handleUpstream(ExecutionHandler.java:172)
at org.jboss.netty.handler.codec.frame.FrameDecoder.exceptionCaught(FrameDecoder.java:378)
at org.apache.giraph.comm.netty.ByteCounter.handleUpstream(ByteCounter.java:116)
at org.jboss.netty.channel.Channels.fireExceptionCaught(Channels.java:533)
at org.jboss.netty.channel.Channels$7.run(Channels.java:507)
at org.jboss.netty.channel.socket.ChannelRunnableWrapper.run(ChannelRunnableWrapper.java:41)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.processEventQueue(AbstractNioWorker.java:373)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:254)
at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:35)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:679)
May 7, 2013 2:34:26 PM org.jboss.netty.channel.DefaultChannelPipeline
WARNING: An exception was thrown by a user handler while handling an exception 
event ([id: 0x45c3e9

Thanks

Arun Ramani

From: Avery Ching <ach...@apache.org>
Reply-To: "user@giraph.apache.org" <user@giraph.apache.org>
Date: Tuesday, May 7, 2013 2:29 PM
To: "user@giraph.apache.org" <user@giraph.apache.org>
Subject: Re: Giraph and Fair Scheduler

Can you check the logs of the failed task and report what the error is?

Avery

On 5/7/13 2:26 PM, Ramani, Arun wrote:
Hi Avery,

I am setting "minsharepreemptiontimeout" to 5 sec and my Giraph job could not 
even wait for 5 secs to get its slots. Let me explain the scenario below:

Assume, Cluster capacity is 150
Queue A (min share –10 maps) - I submit a sleep job with 100 map tasks. Cluster 
is empty, and hence the first job submitted to Queue A will take the entire 100 
map tasks.
Queue B (Giraph pool with min share – 140 maps) - Now my job 1 is running with 
100 tasks occupied. I submit a giraph shortestpathfirst example job with 100 
workers to Queue B. Queue B has "minsharepreemptiontimeout" to 5 sec". So, it 
will first schedule 50 tasks since first job only took 100 tasks and cluste

Re: Giraph and Fair Scheduler

2013-05-07 Thread Avery Ching

Can you check the logs of the failed task and report what the error is?

Avery

On 5/7/13 2:26 PM, Ramani, Arun wrote:

Hi Avery,

I am setting "minsharepreemptiontimeout" to 5 sec and my Giraph job 
could not even wait for 5 secs to get its slots. Let me explain the 
scenario below:


Assume the cluster capacity is 150 map slots.
Queue A (min share – 10 maps): I submit a sleep job with 100 map 
tasks. The cluster is empty, so this first job takes 100 map slots.
Queue B (Giraph pool, min share – 140 maps): job 1 is now running 
with 100 slots occupied. I submit a giraph shortestpathfirst example 
job with 100 workers to Queue B, which has 
"minsharepreemptiontimeout" set to 5 sec. The scheduler first gives 
it the 50 free slots, since the first job took only 100 of the 
cluster's 150. Meanwhile, within 5 sec, 50 more tasks should be 
preempted from Queue A and given to the Giraph job. I see this 
happening; however, the job fails with an "Unable to create native 
thread" error.


Please let me know if "giraph.maxMasterSuperstepWaitMsecs" will help 
in this scenario.


Thanks so much
Arun Ramani

From: Avery Ching <ach...@apache.org>
Date: Tuesday, May 7, 2013 2:19 PM
To: "user@giraph.apache.org" <user@giraph.apache.org>
Cc: "Ramani, Arun(aramani)" <aram...@paypal.com>
Subject: Re: Giraph and Fair Scheduler

Oh, I see.  You can change the timeout of how long the giraph job 
waits for tasks before giving up.  Try setting 
giraph.maxMasterSuperstepWaitMsecs to a higher number.  The default is 
10 minutes.


Avery

On 5/7/13 2:10 PM, Ramani, Arun wrote:

Hi Avery,

I am not preempting tasks out of the giraph pool. I have configured 
pre-emption so that any job submitted to giraph pool will get its min 
share. Any suggestion on how to make this work?


Thanks so much in advance.

Arun Ramani

From: Avery Ching <ach...@apache.org>
Reply-To: "user@giraph.apache.org" <user@giraph.apache.org>
Date: Tuesday, May 7, 2013 7:25 AM
To: "user@giraph.apache.org" <user@giraph.apache.org>
Subject: Re: Giraph and Fair Scheduler

Can you disable the preemption for the giraph pool?  It's not great 
to preempt those tasks.


Avery

On 5/6/13 6:37 PM, Ramani, Arun wrote:

Hi,

I am running Fair scheduler with many applications in hadoop stack 
in my cluster (like pig, hive, hbase etc). I have dedicated a pool 
for Giraph and want to run giraph along with those other 
applications. I have configured pre-emption and set the 
"minsharepreemptiontimeout=5" (sec – for the jobs submitted to this 
pool to wait to get the min share).


I am trying to run giraph in this mode. I see that jobs from other 
pools are getting pre-empted to give the giraph job's pool its 
configured min share but my job fails with "Unable to create native 
thread" error. This same job passes if the slots are available 
immediately without having to wait for the tasks from other queues 
to be pre-empted. I also tried to tweak the 
"giraph.minPercentResponded=50.0f". My Giraph job still fails. 
Please help in this scenario.


Basically, I wanted to know how to configure giraph to wait for a 
threshold for the slots to be available for it through pre-emption.


Thanks
Arun Ramani








Re: Giraph and Fair Scheduler

2013-05-07 Thread Ramani, Arun
Hi Avery,

I am setting "minsharepreemptiontimeout" to 5 sec and my Giraph job could not 
even wait for 5 secs to get its slots. Let me explain the scenario below:

Assume the cluster capacity is 150 map slots.
Queue A (min share – 10 maps): I submit a sleep job with 100 map tasks. The 
cluster is empty, so this first job takes 100 map slots.
Queue B (Giraph pool, min share – 140 maps): job 1 is now running with 100 
slots occupied. I submit a giraph shortestpathfirst example job with 100 
workers to Queue B, which has "minsharepreemptiontimeout" set to 5 sec. The 
scheduler first gives it the 50 free slots, since the first job took only 100 
of the cluster's 150. Meanwhile, within 5 sec, 50 more tasks should be 
preempted from Queue A and given to the Giraph job. I see this happening; 
however, the job fails with an "Unable to create native thread" error.
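
The slot arithmetic in the scenario above can be sketched as follows. This is a back-of-the-envelope illustration only: the class and method names are hypothetical (they are not Giraph or Hadoop API), and the extra slot for the master assumes the "workers + 1" rule of thumb that Avery gives in the TestJsonBase64Format thread applies to this cluster as well.

```java
// Back-of-the-envelope slot accounting for the scenario above.
// All names are hypothetical; nothing here is Giraph or Hadoop API.
final class PreemptionScenario {

    /** Map slots a Giraph job occupies: one per worker plus one for the master (assumed). */
    static int requiredMapSlots(int workers) {
        return workers + 1;
    }

    /** Slots that are free right away, before any preemption happens. */
    static int freeSlots(int clusterCapacity, int slotsInUse) {
        return Math.max(0, clusterCapacity - slotsInUse);
    }

    /** Slots that still have to be reclaimed from other pools. */
    static int slotsToPreempt(int required, int free) {
        return Math.max(0, required - free);
    }

    public static void main(String[] args) {
        int capacity = 150;      // cluster capacity from the scenario
        int usedByQueueA = 100;  // sleep job already running in Queue A
        int giraphWorkers = 100; // workers requested by the Giraph job

        int required = requiredMapSlots(giraphWorkers);
        int free = freeSlots(capacity, usedByQueueA);
        int toPreempt = slotsToPreempt(required, free);

        System.out.println("required=" + required
                + " free=" + free + " toPreempt=" + toPreempt);
    }
}
```

Under these assumptions the job needs 101 map slots, so 51 rather than 50 would have to come from preemption before every worker can start.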

Please let me know if "giraph.maxMasterSuperstepWaitMsecs" will help in this 
scenario.

Thanks so much
Arun Ramani

From: Avery Ching <ach...@apache.org>
Date: Tuesday, May 7, 2013 2:19 PM
To: "user@giraph.apache.org" <user@giraph.apache.org>
Cc: "Ramani, Arun(aramani)" <aram...@paypal.com>
Subject: Re: Giraph and Fair Scheduler

Oh, I see.  You can change the timeout of how long the giraph job waits for 
tasks before giving up.  Try setting giraph.maxMasterSuperstepWaitMsecs to a 
higher number.  The default is 10 minutes.

Avery

On 5/7/13 2:10 PM, Ramani, Arun wrote:
Hi Avery,

I am not preempting tasks out of the giraph pool. I have configured pre-emption 
so that any job submitted to giraph pool will get its min share. Any suggestion 
on how to make this work?

Thanks so much in advance.

Arun Ramani

From: Avery Ching <ach...@apache.org>
Reply-To: "user@giraph.apache.org" <user@giraph.apache.org>
Date: Tuesday, May 7, 2013 7:25 AM
To: "user@giraph.apache.org" <user@giraph.apache.org>
Subject: Re: Giraph and Fair Scheduler

Can you disable the preemption for the giraph pool?  It's not great to preempt 
those tasks.

Avery

On 5/6/13 6:37 PM, Ramani, Arun wrote:
Hi,

I am running Fair scheduler with many applications in hadoop stack in my 
cluster (like pig, hive, hbase etc). I have dedicated a pool for Giraph and 
want to run giraph along with those other applications. I have configured 
pre-emption and set the "minsharepreemptiontimeout=5" (sec – for the jobs 
submitted to this pool to wait to get the min share).

I am trying to run giraph in this mode. I see that jobs from other pools are 
getting pre-empted to give the giraph job's pool its configured min share but 
my job fails with "Unable to create native thread" error. This same job passes 
if the slots are available immediately without having to wait for the tasks 
from other queues to be pre-empted. I also tried to tweak the 
"giraph.minPercentResponded=50.0f". My Giraph job still fails. Please help in 
this scenario.

Basically, I wanted to know how to configure giraph to wait for a threshold for 
the slots to be available for it through pre-emption.

Thanks
Arun Ramani




Re: Giraph and Fair Scheduler

2013-05-07 Thread Avery Ching
Oh, I see.  You can change the timeout of how long the giraph job waits 
for tasks before giving up.  Try setting 
giraph.maxMasterSuperstepWaitMsecs to a higher number. The default is 10 
minutes.
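
A minimal sketch of how a larger wait could be expressed in milliseconds. Only the property name giraph.maxMasterSuperstepWaitMsecs and the 10-minute default come from the message above. The helper class is hypothetical, the 30-minute value is an arbitrary example, and passing the result as a -D option is an assumption that holds only if the job is launched through a runner that accepts Hadoop generic options.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper: builds the property string for a longer master wait.
final class MasterWaitOption {

    // Property name quoted from the thread; the value is in milliseconds.
    static final String KEY = "giraph.maxMasterSuperstepWaitMsecs";

    /** Formats "key=value" with the wait converted from minutes to milliseconds. */
    static String property(long minutes) {
        return KEY + "=" + TimeUnit.MINUTES.toMillis(minutes);
    }

    public static void main(String[] args) {
        // The default mentioned in the thread: 10 minutes.
        System.out.println("default: " + property(10));
        // A higher value; 30 minutes is an arbitrary choice for illustration.
        System.out.println("raised:  -D" + property(30));
    }
}
```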


Avery

On 5/7/13 2:10 PM, Ramani, Arun wrote:

Hi Avery,

I am not preempting tasks out of the giraph pool. I have configured 
pre-emption so that any job submitted to giraph pool will get its min 
share. Any suggestion on how to make this work?


Thanks so much in advance.

Arun Ramani

From: Avery Ching <ach...@apache.org>
Reply-To: "user@giraph.apache.org" <user@giraph.apache.org>
Date: Tuesday, May 7, 2013 7:25 AM
To: "user@giraph.apache.org" <user@giraph.apache.org>
Subject: Re: Giraph and Fair Scheduler

Can you disable the preemption for the giraph pool?  It's not great to 
preempt those tasks.


Avery

On 5/6/13 6:37 PM, Ramani, Arun wrote:

Hi,

I am running Fair scheduler with many applications in hadoop stack in 
my cluster (like pig, hive, hbase etc). I have dedicated a pool for 
Giraph and want to run giraph along with those other applications. I 
have configured pre-emption and set the 
"minsharepreemptiontimeout=5" (sec – for the jobs submitted to this 
pool to wait to get the min share).


I am trying to run giraph in this mode. I see that jobs from other 
pools are getting pre-empted to give the giraph job's pool its 
configured min share but my job fails with "Unable to create native 
thread" error. This same job passes if the slots are available 
immediately without having to wait for the tasks from other queues to 
be pre-empted. I also tried to tweak the 
"giraph.minPercentResponded=50.0f". My Giraph job still fails. Please 
help in this scenario.


Basically, I wanted to know how to configure giraph to wait for a 
threshold for the slots to be available for it through pre-emption.


Thanks
Arun Ramani






Re: Giraph and Fair Scheduler

2013-05-07 Thread Ramani, Arun
Hi Claudio,

Which timeout are you referring to?

Thanks
Arun Ramani

From: Claudio Martella <claudio.marte...@gmail.com>
Reply-To: "user@giraph.apache.org" <user@giraph.apache.org>
Date: Tuesday, May 7, 2013 5:23 AM
To: "user@giraph.apache.org" <user@giraph.apache.org>
Subject: Re: Giraph and Fair Scheduler

it's probably an obvious question, but have you tried increasing the timeout?


On Tue, May 7, 2013 at 3:37 AM, Ramani, Arun <aram...@paypal.com> wrote:
Hi,

I am running Fair scheduler with many applications in hadoop stack in my 
cluster (like pig, hive, hbase etc). I have dedicated a pool for Giraph and 
want to run giraph along with those other applications. I have configured 
pre-emption and set the "minsharepreemptiontimeout=5" (sec – for the jobs 
submitted to this pool to wait to get the min share).

I am trying to run giraph in this mode. I see that jobs from other pools are 
getting pre-empted to give the giraph job's pool its configured min share but 
my job fails with "Unable to create native thread" error. This same job passes 
if the slots are available immediately without having to wait for the tasks 
from other queues to be pre-empted. I also tried to tweak the 
"giraph.minPercentResponded=50.0f". My Giraph job still fails. Please help in 
this scenario.

Basically, I wanted to know how to configure giraph to wait for a threshold for 
the slots to be available for it through pre-emption.

Thanks
Arun Ramani



--
   Claudio Martella
   claudio.marte...@gmail.com


Re: Giraph and Fair Scheduler

2013-05-07 Thread Ramani, Arun
Hi Avery,

I am not preempting tasks out of the giraph pool. I have configured pre-emption 
so that any job submitted to giraph pool will get its min share. Any suggestion 
on how to make this work?

Thanks so much in advance.

Arun Ramani

From: Avery Ching <ach...@apache.org>
Reply-To: "user@giraph.apache.org" <user@giraph.apache.org>
Date: Tuesday, May 7, 2013 7:25 AM
To: "user@giraph.apache.org" <user@giraph.apache.org>
Subject: Re: Giraph and Fair Scheduler

Can you disable the preemption for the giraph pool?  It's not great to preempt 
those tasks.

Avery

On 5/6/13 6:37 PM, Ramani, Arun wrote:
Hi,

I am running Fair scheduler with many applications in hadoop stack in my 
cluster (like pig, hive, hbase etc). I have dedicated a pool for Giraph and 
want to run giraph along with those other applications. I have configured 
pre-emption and set the "minsharepreemptiontimeout=5" (sec – for the jobs 
submitted to this pool to wait to get the min share).

I am trying to run giraph in this mode. I see that jobs from other pools are 
getting pre-empted to give the giraph job's pool its configured min share but 
my job fails with "Unable to create native thread" error. This same job passes 
if the slots are available immediately without having to wait for the tasks 
from other queues to be pre-empted. I also tried to tweak the 
"giraph.minPercentResponded=50.0f". My Giraph job still fails. Please help in 
this scenario.

Basically, I wanted to know how to configure giraph to wait for a threshold for 
the slots to be available for it through pre-emption.

Thanks
Arun Ramani



Re: TestJsonBase64Format failure on 1.0.0

2013-05-07 Thread Avery Ching
Here's an easy way to calculate the number of map slots required: 
workers + 1.  So, for instance, if you only have 2 map slots, you can 
run at most one worker. If you're running on your laptop, feel free to 
change the number of map slots to what you need.
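
The rule of thumb above, written out as a tiny sketch. The class and method names are hypothetical and only restate "workers + 1"; they are not part of Giraph.

```java
// Hypothetical helper illustrating the "workers + 1" rule of thumb.
final class MapSlotRule {

    /** Map slots needed to run the given number of workers (one extra for the master). */
    static int slotsFor(int workers) {
        if (workers < 1) {
            throw new IllegalArgumentException("need at least one worker");
        }
        return workers + 1;
    }

    /** Largest number of workers that fits in the given number of map slots. */
    static int maxWorkers(int mapSlots) {
        return Math.max(0, mapSlots - 1);
    }

    public static void main(String[] args) {
        System.out.println(maxWorkers(2)); // 2 map slots leave room for 1 worker
        System.out.println(slotsFor(4));   // 4 workers need 5 map slots
    }
}
```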


Hope that helps,

Avery

On 5/7/13 11:40 AM, Kiru Pakkirisamy wrote:

Avery et al,
Please let me know if I should use Giraph only on a multi-node cluster 
(what is the min # of nodes ?)


Regards,
- kiru

Kiru Pakkirisamy | webcloudtech.wordpress.com

--- On Mon, 5/6/13, Kiru Pakkirisamy wrote:



From: Kiru Pakkirisamy 
Subject: TestJsonBase64Format failure on 1.0.0
To: user@giraph.apache.org
Date: Monday, May 6, 2013, 9:40 AM

I got over my compilation issues (thanks - @Avery, @Roman).
Now, I am trying to run the tests and one particular test is failing.
I want to get to the bottom of this, because I am unable to run
the PageRank example. Maybe it is because I have only one
tasktracker (?) (Apache pseudo-cluster on my Ubuntu laptop).

Regards,
- kiru

2013-05-06 09:12:32,197 INFO org.apache.hadoop.mapred.JobTracker: Adding task (MAP) 'attempt_201305052325_0013_m_03_0' to tip task_201305052325_0013_m_03, for tracker 'tracker_kiru-N53SV:localhost/127.0.0.1:42265'
2013-05-06 09:12:35,198 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201305052325_0013_m_02_0: java.lang.IllegalStateException: run: Caught an unrecoverable exception waitFor: ExecutionException occurred while waiting for org.apache.giraph.utils.ProgressableUtils$FutureWaitable@482d59a3
at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:102)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.lang.IllegalStateException: waitFor: ExecutionException occurred while waiting for org.apache.giraph.utils.ProgressableUtils$FutureWaitable@482d59a3
at org.apache.giraph.utils.ProgressableUtils.waitFor(ProgressableUtils.java:151)
at org.apache.giraph.utils.ProgressableUtils.waitForever(ProgressableUtils.java:111)
at org.apache.giraph.utils.ProgressableUtils.getFutureResult(ProgressableUtils.java:73)
at org.apache.giraph.utils.ProgressableUtils.getResultsWithNCallables(ProgressableUtils.java:192)
at org.apache.giraph.worker.BspServiceWorker.loadInputSplits(BspServiceWorker.java:276)
at org.apache.giraph.worker.BspServiceWorker.loadVertices(BspServiceWorker.java:323)
at org.apache.giraph.worker.BspServiceWorker.setup(BspServiceWorker.java:506)
at org.apache.giraph.graph.GraphTaskManager.execute(GraphTaskManager.java:230)
at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:92)
... 7 more
Caused by: java.util.concurrent.ExecutionException: java.lang.IllegalStateException: call: IOException
at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:232)
at java.util.concurrent.FutureTask.get(FutureTask.java:91)
at org.apache.giraph.utils.ProgressableUtils$FutureWaitable.waitFor(ProgressableUtils.java:271)
at org.apache.giraph.utils.ProgressableUtils.waitFor(ProgressableUtils.java:143)
... 15 more
Caused by: java.lang.IllegalStateException: call: IOException
at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:172)
at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:58)
at org.apache.giraph.utils.LogStacktraceCallable.call(LogStacktraceCallable.java:51)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.FileNotFoundException: /tmp/_giraphTests/testContinue/_logs (Is a directory)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(FileInputStream.java:120)
at org.apache.hadoop.fs.RawLocalFileSystem$TrackingFileInputStream.<init>(RawLocalFileSystem.java:71)
at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileInputStream.<init>(RawLocalFileSystem.java:107)
at org.apache.hadoop.fs.RawLocalFileSystem.open(RawLocalFileSystem.java:177)
at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:126)
at

Re: TestJsonBase64Format failure on 1.0.0

2013-05-07 Thread Kiru Pakkirisamy
Avery et al,
Please let me know if I should use Giraph only on a multi-node cluster 
(what is the min # of nodes?)

Regards,

- kiru



Kiru Pakkirisamy | webcloudtech.wordpress.com

--- On Mon, 5/6/13, Kiru Pakkirisamy  wrote:

From: Kiru Pakkirisamy 
Subject: TestJsonBase64Format failure on 1.0.0
To: user@giraph.apache.org
Date: Monday, May 6, 2013, 9:40 AM

I got over my compilation issues (thanks - @Avery, @Roman). Now, I am trying to 
run the tests and one particular test is failing. I want to get to the bottom of 
this, because I am unable to run the PageRank example. Maybe it is because I 
have only one tasktracker (?) (Apache pseudo-cluster on my Ubuntu laptop).

Regards,
- kiru

2013-05-06 09:12:32,197 INFO org.apache.hadoop.mapred.JobTracker: Adding task (MAP) 'attempt_201305052325_0013_m_03_0' to tip task_201305052325_0013_m_03, for tracker 'tracker_kiru-N53SV:localhost/127.0.0.1:42265'
2013-05-06 09:12:35,198 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201305052325_0013_m_02_0: java.lang.IllegalStateException: run: Caught an unrecoverable exception waitFor: ExecutionException occurred while waiting for org.apache.giraph.utils.ProgressableUtils$FutureWaitable@482d59a3
at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:102)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.lang.IllegalStateException: waitFor: ExecutionException occurred while waiting for org.apache.giraph.utils.ProgressableUtils$FutureWaitable@482d59a3
at org.apache.giraph.utils.ProgressableUtils.waitFor(ProgressableUtils.java:151)
at org.apache.giraph.utils.ProgressableUtils.waitForever(ProgressableUtils.java:111)
at org.apache.giraph.utils.ProgressableUtils.getFutureResult(ProgressableUtils.java:73)
at org.apache.giraph.utils.ProgressableUtils.getResultsWithNCallables(ProgressableUtils.java:192)
at org.apache.giraph.worker.BspServiceWorker.loadInputSplits(BspServiceWorker.java:276)
at org.apache.giraph.worker.BspServiceWorker.loadVertices(BspServiceWorker.java:323)
at org.apache.giraph.worker.BspServiceWorker.setup(BspServiceWorker.java:506)
at org.apache.giraph.graph.GraphTaskManager.execute(GraphTaskManager.java:230)
at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:92)
... 7 more
Caused by: java.util.concurrent.ExecutionException: java.lang.IllegalStateException: call: IOException
at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:232)
at java.util.concurrent.FutureTask.get(FutureTask.java:91)
at org.apache.giraph.utils.ProgressableUtils$FutureWaitable.waitFor(ProgressableUtils.java:271)
at org.apache.giraph.utils.ProgressableUtils.waitFor(ProgressableUtils.java:143)
... 15 more
Caused by: java.lang.IllegalStateException: call: IOException
at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:172)
at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:58)
at org.apache.giraph.utils.LogStacktraceCallable.call(LogStacktraceCallable.java:51)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.FileNotFoundException: /tmp/_giraphTests/testContinue/_logs (Is a directory)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(FileInputStream.java:120)
at org.apache.hadoop.fs.RawLocalFileSystem$TrackingFileInputStream.<init>(RawLocalFileSystem.java:71)
at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileInputStream.<init>(RawLocalFileSystem.java:107)
at org.apache.hadoop.fs.RawLocalFileSystem.open(RawLocalFileSystem.java:177)
at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:126)
at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:283)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:427)
at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.initialize(LineRecordReader.java:67)
at org.apache.giraph.io.formats.TextVertexInputFormat$TextVertexReader.initialize(TextVertexInputFormat.java:96)
at org.apache.giraph.io.formats.JsonBase64VertexInputFormat$JsonBase64VertexReader.initialize(JsonBase64VertexInputFormat.java:71)
at org.apache.giraph.

Re: Extra data on vertex

2013-05-07 Thread Claudio Martella
Keep in mind that you cannot access a neighbor's value directly from a
vertex. What you are proposing now is possible because you are using the
vertex id to store your information (URL), which makes sense in the context
of a web page.
As soon as you store data in the vertex value, as Avery suggests, you
will have to rely on messages to inform the neighbors of the value.
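
A minimal sketch of that message-based exchange, written as a plain-Java simulation rather than real Giraph code. Every type and method name below is hypothetical, and the three-step request/reply scheme is one possible way to do it, not something prescribed in this thread: each vertex announces itself to its out-neighbors, each neighbor replies with its URL, and the sender then keeps one edge per distinct URL.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Plain-Java simulation of message passing; none of these types are Giraph classes.
final class UrlDedupSketch {

    /** A vertex holding its URL as its value plus the ids of its out-neighbors. */
    static final class V {
        final int id;
        final String url;
        List<Integer> targets;

        V(int id, String url, List<Integer> targets) {
            this.id = id;
            this.url = url;
            this.targets = new ArrayList<>(targets);
        }
    }

    /** Runs the three simulated supersteps and rewrites each vertex's target list. */
    static void removeDuplicateLinks(Map<Integer, V> graph) {
        // Superstep 0: every vertex tells its out-neighbors who is asking.
        Map<Integer, List<Integer>> requests = new HashMap<>();
        for (V v : graph.values()) {
            for (int t : v.targets) {
                requests.computeIfAbsent(t, k -> new ArrayList<>()).add(v.id);
            }
        }
        // Superstep 1: every vertex answers each requester with its own URL.
        // replies maps requester id -> (target id -> target URL).
        Map<Integer, Map<Integer, String>> replies = new HashMap<>();
        for (Map.Entry<Integer, List<Integer>> e : requests.entrySet()) {
            V target = graph.get(e.getKey());
            if (target == null) {
                continue; // edge points at a vertex that does not exist
            }
            for (int requester : e.getValue()) {
                replies.computeIfAbsent(requester, k -> new HashMap<>())
                       .put(target.id, target.url);
            }
        }
        // Superstep 2: keep the first edge seen for each distinct target URL.
        for (V v : graph.values()) {
            Map<Integer, String> known = replies.getOrDefault(v.id, new HashMap<>());
            Set<String> seen = new HashSet<>();
            List<Integer> kept = new ArrayList<>();
            for (int t : v.targets) {
                String url = known.get(t);
                // Edges whose target never replied are kept untouched.
                if (url == null || seen.add(url)) {
                    kept.add(t);
                }
            }
            v.targets = kept;
        }
    }

    public static void main(String[] args) {
        Map<Integer, V> g = new LinkedHashMap<>();
        g.put(1, new V(1, "http://a.example", Arrays.asList(2, 3, 4)));
        g.put(2, new V(2, "http://b.example", Collections.<Integer>emptyList()));
        g.put(3, new V(3, "http://b.example", Collections.<Integer>emptyList()));
        g.put(4, new V(4, "http://c.example", Collections.<Integer>emptyList()));
        removeDuplicateLinks(g);
        System.out.println(g.get(1).targets); // vertices 2 and 3 share a URL
    }
}
```

In an actual Giraph job each phase would presumably be one superstep, and the message type would need to be able to carry a vertex id and a URL.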


On Tue, May 7, 2013 at 4:47 PM, Ahmet Emre Aladağ wrote:

> Hi,
>
> 1) What's the best way for storing extra data (such as URL) on a vertex? I
> thought this would be through a class variable but I could not find the way
> to access that variable from the neighbor.
> For example I'd like to remove the duplicate edges going towards the nodes
> with the "same url" (Duplicate Removal phase of LinkRank). How can I learn
> my neighbor's url variable: targetUrl?
>
> 2) Is removing edges like this a valid approach?
>
>
> public class LinkRankVertex extends Vertex NullWritable, FloatWritable> {
>
>     public String url;
>
>     public void removeDuplicateLinks() {
>         int targetId;
>         String targetUrl;
>
>         Set urls = new HashSet();
>         ArrayListEdges edges = new ArrayListEdges();
>
>         for (Edge edge : getEdges()) {
>             targetId = edge.getTargetVertexId().get();
>             targetUrl = ...??
>             if (!urls.contains(targetUrl)) {
>                 urls.add(targetUrl);
>                 edges.add(edge);
>             }
>         }
>         setEdges(edges);
>     }
> }
>
> Thanks,
> Emre.
>
>


-- 
   Claudio Martella
   claudio.marte...@gmail.com


Re: Extra data on vertex

2013-05-07 Thread Avery Ching
Best way is to add it to the vertex value.  The vertex value is meant to 
store any data associated with a particular vertex.
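
A minimal sketch of what such a vertex value could look like when it carries a URL next to a rank score. The class name and fields are hypothetical. In a real job the class would also implement org.apache.hadoop.io.Writable, whose write and readFields methods have the same shape as the ones below; that interface is left out here so the sketch compiles with the standard library alone.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical vertex value carrying a URL next to a rank score.
// A real job would add "implements Writable" (org.apache.hadoop.io.Writable).
final class UrlRankValue {
    private String url = "";
    private float rank;

    UrlRankValue() { } // no-arg constructor, needed to deserialize into an empty object

    UrlRankValue(String url, float rank) {
        this.url = url;
        this.rank = rank;
    }

    String getUrl() { return url; }
    float getRank() { return rank; }

    /** Serializes the fields in a fixed order. */
    public void write(DataOutput out) throws IOException {
        out.writeUTF(url);
        out.writeFloat(rank);
    }

    /** Reads the fields back in the same order they were written. */
    public void readFields(DataInput in) throws IOException {
        url = in.readUTF();
        rank = in.readFloat();
    }

    /** Serializes a value to bytes and reads it back, as a framework would between steps. */
    static UrlRankValue roundTrip(UrlRankValue value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            value.write(new DataOutputStream(bytes));
            UrlRankValue copy = new UrlRankValue();
            copy.readFields(new DataInputStream(
                    new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        UrlRankValue copy = roundTrip(new UrlRankValue("http://example.org/", 0.15f));
        System.out.println(copy.getUrl() + " " + copy.getRank());
    }
}
```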


Hope that helps,

Avery

On 5/7/13 7:47 AM, Ahmet Emre Aladağ wrote:

Hi,

1) What's the best way for storing extra data (such as URL) on a 
vertex? I thought this would be through a class variable but I could 
not find the way to access that variable from the neighbor.
For example I'd like to remove the duplicate edges going towards the 
nodes with the "same url" (Duplicate Removal phase of LinkRank). How 
can I learn my neighbor's url variable: targetUrl?


2) Is removing edges like this a valid approach?


public class LinkRankVertex extends Vertex {

    public String url;

    public void removeDuplicateLinks() {
        int targetId;
        String targetUrl;

        Set urls = new HashSet();
        ArrayListEdges edges = new ArrayListEdges();

        for (Edge edge : getEdges()) {
            targetId = edge.getTargetVertexId().get();
            targetUrl = ...??
            if (!urls.contains(targetUrl)) {
                urls.add(targetUrl);
                edges.add(edge);
            }
        }
        setEdges(edges);
    }
}

Thanks,
Emre.





Extra data on vertex

2013-05-07 Thread Ahmet Emre Aladağ

Hi,

1) What's the best way for storing extra data (such as URL) on a vertex? 
I thought this would be through a class variable but I could not find 
the way to access that variable from the neighbor.
For example I'd like to remove the duplicate edges going towards the 
nodes with the "same url" (Duplicate Removal phase of LinkRank). How can 
I learn my neighbor's url variable: targetUrl?


2) Is removing edges like this a valid approach?


public class LinkRankVertex extends Vertex {

    public String url;

    public void removeDuplicateLinks() {
        int targetId;
        String targetUrl;

        Set urls = new HashSet();
        ArrayListEdges edges = new ArrayListEdges();

        for (Edge edge : getEdges()) {
            targetId = edge.getTargetVertexId().get();
            targetUrl = ...??
            if (!urls.contains(targetUrl)) {
                urls.add(targetUrl);
                edges.add(edge);
            }
        }
        setEdges(edges);
    }
}

Thanks,
Emre.



Re: Giraph and Fair Scheduler

2013-05-07 Thread Avery Ching
Can you disable the preemption for the giraph pool?  It's not great to 
preempt those tasks.


Avery

On 5/6/13 6:37 PM, Ramani, Arun wrote:

Hi,

I am running Fair scheduler with many applications in hadoop stack in 
my cluster (like pig, hive, hbase etc). I have dedicated a pool for 
Giraph and want to run giraph along with those other applications. I 
have configured pre-emption and set the 
"minsharepreemptiontimeout=5" (sec – for the jobs submitted to this 
pool to wait to get the min share).


I am trying to run giraph in this mode. I see that jobs from other 
pools are getting pre-empted to give the giraph job's pool its 
configured min share but my job fails with "Unable to create native 
thread" error. This same job passes if the slots are available 
immediately without having to wait for the tasks from other queues to 
be pre-empted. I also tried to tweak the 
"giraph.minPercentResponded=50.0f". My Giraph job still fails. Please 
help in this scenario.


Basically, I wanted to know how to configure giraph to wait for a 
threshold for the slots to be available for it through pre-emption.


Thanks
Arun Ramani




Re: Giraph and Fair Scheduler

2013-05-07 Thread Claudio Martella
it's probably an obvious question, but have you tried increasing the
timeout?


On Tue, May 7, 2013 at 3:37 AM, Ramani, Arun  wrote:

>  Hi,
>
>  I am running Fair scheduler with many applications in hadoop stack in my
> cluster (like pig, hive, hbase etc). I have dedicated a pool for Giraph and
> want to run giraph along with those other applications. I have configured
> pre-emption and set the "minsharepreemptiontimeout=5" (sec – for the
> jobs submitted to this pool to wait to get the min share).
>
>  I am trying to run giraph in this mode. I see that jobs from other pools
> are getting pre-empted to give the giraph job's pool its configured min
> share but my job fails with "Unable to create native thread" error. This
> same job passes if the slots are available immediately without having to
> wait for the tasks from other queues to be pre-empted. I also tried to
> tweak the "giraph.minPercentResponded=50.0f". My Giraph job still fails.
> Please help in this scenario.
>
>  Basically, I wanted to know how to configure giraph to wait for a
> threshold for the slots to be available for it through pre-emption.
>
>  Thanks
> Arun Ramani
>



-- 
   Claudio Martella
   claudio.marte...@gmail.com