Re: Problem processing large graph

2014-09-11 Thread Tripti Singh
Hi Matthew,
I am running it on a cluster, so the hardware resources are shared. I don't
think I can control that beyond asking for a bigger container, which I have
already done (a 4 GB container).
Am I missing some configuration? Do we have an alternative solution?
Also, it was working fine with an older version of Hadoop and Giraph. The system
has since been upgraded to hadoop2, and Giraph was rebuilt with the hadoop_yarn profile.

Thanks,
Tripti
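
[Editor's note: a sketch on sizing under the hadoop_yarn profile, not from the
thread. In Giraph 1.1's pure-YARN mode the per-task heap is controlled by
giraph.yarn.task.heap.mb, and as far as I recall the container request is sized
from that same value, so enlarging the queue's container without raising the
Giraph setting may not change what the workers actually get. Option and class
names are assumptions to verify against your build (GiraphRunner -h); the jar
name and paths are placeholders.]

  # Hypothetical invocation: raise the per-task heap (and container request).
  hadoop jar giraph-examples-1.1.0-for-hadoop-2-jar-with-dependencies.jar \
    org.apache.giraph.GiraphRunner \
    org.apache.giraph.examples.ConnectedComponentsComputation \
    -vif org.apache.giraph.io.formats.IntIntNullTextInputFormat \
    -vip /user/tripti/input/graph \
    -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
    -op /user/tripti/output/cc \
    -w 200 \
    -ca giraph.yarn.task.heap.mb=4096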


Re: Problem processing large graph

2014-09-11 Thread Matthew Saltz
Hi Tripti,

How many machines are you running on? The ideal configuration would be one
worker per machine and one separate machine for the master. If you're using
more mappers than machines, then you're using more resources than necessary,
and fixing that could help.

Best,
Matthew
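
[Editor's note: to make this concrete, a sketch with assumed numbers. On a
20-node allocation one might request 19 workers and let Giraph run the master
in its own task, which giraph.SplitMasterWorker=true (the default) already
does. Flags as in Giraph 1.1's GiraphRunner; jar name, paths, and class choices
are placeholders as in the earlier sketch.]

  # Hypothetical sizing for a 20-node allocation: 19 workers + 1 master task.
  hadoop jar giraph-examples-1.1.0-for-hadoop-2-jar-with-dependencies.jar \
    org.apache.giraph.GiraphRunner org.apache.giraph.examples.ConnectedComponentsComputation \
    -vif org.apache.giraph.io.formats.IntIntNullTextInputFormat -vip /user/tripti/input/graph \
    -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op /user/tripti/output/cc \
    -w 19 \
    -ca giraph.SplitMasterWorker=true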


Re: Problem processing large graph

2014-09-11 Thread Tripti Singh
Hi Avery,
Thanks for your reply.
I did adjust the heap and container sizes to higher values (3072 MB and 4096 MB,
respectively), and I am not running the out-of-core option either.
I am intermittently able to run the job with 200 mappers. At other times, part
of the data is processed while the rest gets stalled.
FYI, I am using Netty without authentication.
One thing I have noticed, however, is that the job mostly runs successfully
when the queue is almost idle or comparatively free. On most occasions when the
queue is running more tasks or is over-allocated, my job stalls even when the
required number of containers has been allocated. Looking at the logs, I mostly
find it stalling at superstep 0 or 1 after finishing superstep -1, or sometimes
even at -1.
Could there be some shared resource in the queue that the job runs short of
when the queue is loaded, and can I configure some other value to make it run?

Tripti Singh
Tech Yahoo, Software Sys Dev Eng
P: +91 080.30516197  M: +91 9611907150
Yahoo Software Development India Pvt. Ltd Torrey Pines Bangalore 560 071
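
[Editor's note on the stall point: superstep -1 is Giraph's input superstep,
where every worker first has to locate and register with ZooKeeper, and the
worker log later in this thread ("getZooKeeperServerList ... got file 'null'")
shows tasks polling for the ZooKeeper server list. By default Giraph spawns its
own ZooKeeper inside one of the job's tasks, which competes for resources on a
loaded queue. A sketch worth testing, assuming the cluster has an external
ZooKeeper quorum available; the hostnames are placeholders:]

  # Point the job at an existing ZooKeeper quorum instead of the embedded one.
  hadoop jar giraph-examples-1.1.0-for-hadoop-2-jar-with-dependencies.jar \
    org.apache.giraph.GiraphRunner org.apache.giraph.examples.ConnectedComponentsComputation \
    -vif org.apache.giraph.io.formats.IntIntNullTextInputFormat -vip /user/tripti/input/graph \
    -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op /user/tripti/output/cc \
    -w 200 \
    -ca giraph.zkList=zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181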




Re: Problem processing large graph

2014-09-03 Thread Avery Ching

Hi Tripti,

Is there a chance you can use higher-memory machines so you don't run out of
core?  We do it this way at Facebook.  We haven't tested the out-of-core
option.


Avery



Problem processing large graph

2014-08-31 Thread Tripti Singh
Hi,
I am able to successfully build the hadoop_yarn profile for running Giraph 1.1.
I am also able to do a test run of Connected Components on a small dataset.
However, I am seeing two issues while running on a bigger dataset with 400
mappers:

  1.  I am unable to use the out-of-core graph option. It errors out saying that
it cannot read an INIT partition. (Sorry, I don't have the log at hand, but I
will share it after I run that again.)
I expect that if the out-of-core option is fixed, I should be able to run the
workflow with fewer mappers. (See the sketch at the end of this message.)
  2.  To run the workflow anyway, I removed the out-of-core option and adjusted
the heap size. This also runs with the smaller dataset but fails with the huge
one.
Worker logs are mostly empty. Non-empty logs end like this:
mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
[STATUS: task-374] setup: Beginning worker setup.
setup: Log level remains at info
[STATUS: task-374] setup: Initializing Zookeeper services.
mapred.job.id is deprecated. Instead, use mapreduce.job.id
job.local.dir is deprecated. Instead, use mapreduce.job.local.dir
[STATUS: task-374] setup: Setting up Zookeeper manager.
createCandidateStamp: Made the directory _bsp/_defaultZkManagerDir/giraph_yarn_application_1407992474095_708614
createCandidateStamp: Made the directory _bsp/_defaultZkManagerDir/giraph_yarn_application_1407992474095_708614/_zkServer
createCandidateStamp: Creating my filestamp _bsp/_defaultZkManagerDir/giraph_yarn_application_1407992474095_708614/_task/gsta33201.tan.ygrid.yahoo.com 374
getZooKeeperServerList: For task 374, got file 'null' (polling period is 3000)

The master log has statements for launching containers, opening proxies, and
processing events, like this:
Opening proxy : gsta31118.tan.ygrid.yahoo.com:8041
Processing Event EventType: QUERY_CONTAINER for Container 
container_1407992474095_708614_01_000314
……

I am not using SASL authentication.
Any idea what might be wrong?

Thanks,
Tripti.
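
[Editor's note on issue 1: the out-of-core option referenced above is normally
enabled with configuration options along these lines. A sketch only:
giraph.useOutOfCoreGraph and giraph.maxPartitionsInMemory are the option names
as I recall them from Giraph 1.1, and the partition cap is an assumed value to
tune, not a recommendation; jar name and paths are placeholders as in the
earlier sketches.]

  # Hypothetical: spill partitions to local disk instead of exhausting heap.
  hadoop jar giraph-examples-1.1.0-for-hadoop-2-jar-with-dependencies.jar \
    org.apache.giraph.GiraphRunner org.apache.giraph.examples.ConnectedComponentsComputation \
    -vif org.apache.giraph.io.formats.IntIntNullTextInputFormat -vip /user/tripti/input/graph \
    -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op /user/tripti/output/cc \
    -w 400 \
    -ca giraph.useOutOfCoreGraph=true \
    -ca giraph.maxPartitionsInMemory=10

[If the INIT-partition error reappears with these set, the full stack trace
would show whether the failure is in the out-of-core path itself, which per
Avery's reply above is not well tested, or in local-disk configuration.]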