Re: Review Request: Move part of the graph out-of-core when memory is low

2012-08-22 Thread Maja Kabiljo

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5987/#review10614
---


Alessandro, can you please try the same test with a lot of vertices (you only 
had 10 vertices here)? If vertices are visited in random order in 
prepareSuperstep, I think you should see many more disk operations.

> Also, addPartition() would then have to read all vertex ids even when the 
> partition is in memory, which would make it way slower in the standard use 
> case.
What do you mean by this?


http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/DiskBackedPartitionStore.java


You could end up with more than maxInMemoryPartitions.


- Maja Kabiljo


On Aug. 21, 2012, 11:02 p.m., Alessandro Presta wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/5987/
> ---
> 
> (Updated Aug. 21, 2012, 11:02 p.m.)
> 
> 
> Review request for giraph and Avery Ching.
> 
> 
> Description
> ---
> 
> I gave this another shot. This time it plays nicely with input superstep: I 
> replaced both the temporary partitions and the worker partition map with a 
> common data structure, PartitionStore, held in ServerData. This is similar to 
> what we now have for messages.
> 
> Partition is now thread-safe so that we can have concurrent calls to 
> putVertex().
> 
> SimplePartitionStore is backed by a concurrent hash map (nothing new here, 
> except that we skip copying partitions to the worker).
> 
> DiskBackedPartitionStore can hold up to a user-defined number of partitions 
> in memory, and spill the remaining ones to disk. Each partition is stored in 
> a separate file.
> Adding vertices to an out-of-core partition consists in appending them to the 
> file, which makes processing vertex requests relatively fast.
> We use a ReadWriteLock for each partition: performing operations on a 
> partition held in memory only requires a read-lock (since Partition is 
> thread-safe), while creating a new partition, moving it in and out of core or 
> appending vertices requires a write-lock (we can't have concurrent writes).
> 
> Also note that this breaks Hadoop RPC: I preferred to keep it clean (this 
> also shows what code we get rid of) instead of working around it. I suppose 
> the Netty security patch will be completed soon. If not, I will restore RPC 
> compatibility.
> 
> More here: 
> https://issues.apache.org/jira/browse/GIRAPH-249?focusedCommentId=13435280&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13435280
> 
> 
> This addresses bug GIRAPH-249.
> https://issues.apache.org/jira/browse/GIRAPH-249
> 
> 
> Diffs
> -
> 
>   
> http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java
>  1375453 
>   
> http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java
>  1375453 
>   
> http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyWorkerClientServer.java
>  1375453 
>   
> http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyWorkerServer.java
>  1375453 
>   
> http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/SendVertexRequest.java
>  1375453 
>   
> http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/ServerData.java
>  1375453 
>   
> http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/WorkerServer.java
>  1375453 
>   
> http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java
>  1375453 
>   
> http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/GiraphJob.java
>  1375453 
>   
> http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/GraphMapper.java
>  1375453 
>   
> http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/DiskBackedPartitionStore.java
>  PRE-CREATION 
>   
> http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/HashWorkerPartitioner.java
>  1375453 
>   
> http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/Partition.java
>  1375453 
>   
> http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/PartitionStore.java
>  PRE-CREATION 
>   
> http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/SimplePartitionStore.java
>  PRE-CREATION 
>   
> http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/grap

[jira] [Updated] (GIRAPH-293) Should aggregators be checkpointed?

2012-08-22 Thread Maja Kabiljo (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maja Kabiljo updated GIRAPH-293:


Attachment: GIRAPH-293.patch

Forgot to mark as 'Patch Available' last time.

I'm just rebasing it now. Do I need to upload a new patch when only line numbers 
and revision ids have changed?

> Should aggregators be checkpointed?
> ---
>
> Key: GIRAPH-293
> URL: https://issues.apache.org/jira/browse/GIRAPH-293
> Project: Giraph
>  Issue Type: Bug
>Reporter: Alessandro Presta
>Assignee: Maja Kabiljo
> Attachments: GIRAPH-293.patch, GIRAPH-293.patch
>
>
> As I understand it, we don't include aggregators in checkpoints because they are 
> kept in ZooKeeper.
> One of our bootcampers is working on fixing TestManualCheckpoint, which 
> currently involves starting a new job from a checkpoint from a previous job*.
> If this is functionality we want going forward, then persistent aggregators 
> should be checkpointed.
> [*] That test relies on the fact that either aggregators are checkpointed or 
> they are always reset at each superstep. Neither of these is happening, but the 
> error is cancelled out by the fact that we are not actually resuming from a 
> checkpoint, but re-running the job from scratch.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (GIRAPH-293) Should aggregators be checkpointed?

2012-08-22 Thread Alessandro Presta (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13439414#comment-13439414
 ] 

Alessandro Presta commented on GIRAPH-293:
--

If there's no merge conflict, you don't need to post the new one.

> Should aggregators be checkpointed?
> ---
>
> Key: GIRAPH-293
> URL: https://issues.apache.org/jira/browse/GIRAPH-293
> Project: Giraph
>  Issue Type: Bug
>Reporter: Alessandro Presta
>Assignee: Maja Kabiljo
> Attachments: GIRAPH-293.patch, GIRAPH-293.patch
>
>
> As I understand it, we don't include aggregators in checkpoints because they are 
> kept in ZooKeeper.
> One of our bootcampers is working on fixing TestManualCheckpoint, which 
> currently involves starting a new job from a checkpoint from a previous job*.
> If this is functionality we want going forward, then persistent aggregators 
> should be checkpointed.
> [*] That test relies on the fact that either aggregators are checkpointed or 
> they are always reset at each superstep. Neither of these is happening, but the 
> error is cancelled out by the fact that we are not actually resuming from a 
> checkpoint, but re-running the job from scratch.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (GIRAPH-249) Move part of the graph out-of-core when memory is low

2012-08-22 Thread Alessandro Presta (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Presta updated GIRAPH-249:
-

Attachment: GIRAPH-249.patch

Updated to match review https://reviews.apache.org/r/5987/diff/10/

> Move part of the graph out-of-core when memory is low
> -
>
> Key: GIRAPH-249
> URL: https://issues.apache.org/jira/browse/GIRAPH-249
> Project: Giraph
>  Issue Type: Improvement
>Reporter: Alessandro Presta
>Assignee: Alessandro Presta
> Attachments: GIRAPH-249.patch, GIRAPH-249.patch, GIRAPH-249.patch, 
> GIRAPH-249.patch, GIRAPH-249.patch, GIRAPH-249.patch, GIRAPH-249.patch, 
> GIRAPH-249.patch, GIRAPH-249.patch, GIRAPH-249.patch, GIRAPH-249.patch
>
>
> There has been some talk about Giraph's scaling limitations due to keeping 
> the whole graph and messages in RAM.
> We need to investigate methods to fall back to disk when running out of 
> memory, while gracefully degrading performance.
> This issue is for graph storage. Messages should probably be a separate 
> issue, although the interplay between the two is crucial.
> We should also discuss what our primary goals are here: completing a job 
> (albeit slowly) instead of failing when the graph is too big, while still 
> encouraging memory optimizations and high-memory clusters; or restructuring 
> Giraph to be as efficient as possible in disk mode, making it almost a 
> standard way of operating.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (GIRAPH-249) Move part of the graph out-of-core when memory is low

2012-08-22 Thread Alessandro Presta (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Presta updated GIRAPH-249:
-

Attachment: GIRAPH-249.patch

https://reviews.apache.org/r/5987/diff/11/

> Move part of the graph out-of-core when memory is low
> -
>
> Key: GIRAPH-249
> URL: https://issues.apache.org/jira/browse/GIRAPH-249
> Project: Giraph
>  Issue Type: Improvement
>Reporter: Alessandro Presta
>Assignee: Alessandro Presta
> Attachments: GIRAPH-249.patch, GIRAPH-249.patch, GIRAPH-249.patch, 
> GIRAPH-249.patch, GIRAPH-249.patch, GIRAPH-249.patch, GIRAPH-249.patch, 
> GIRAPH-249.patch, GIRAPH-249.patch, GIRAPH-249.patch, GIRAPH-249.patch, 
> GIRAPH-249.patch
>
>
> There has been some talk about Giraph's scaling limitations due to keeping 
> the whole graph and messages in RAM.
> We need to investigate methods to fall back to disk when running out of 
> memory, while gracefully degrading performance.
> This issue is for graph storage. Messages should probably be a separate 
> issue, although the interplay between the two is crucial.
> We should also discuss what our primary goals are here: completing a job 
> (albeit slowly) instead of failing when the graph is too big, while still 
> encouraging memory optimizations and high-memory clusters; or restructuring 
> Giraph to be as efficient as possible in disk mode, making it almost a 
> standard way of operating.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (GIRAPH-293) Should aggregators be checkpointed?

2012-08-22 Thread Maja Kabiljo (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13439475#comment-13439475
 ] 

Maja Kabiljo commented on GIRAPH-293:
-

https://reviews.apache.org/r/6731/diff/#index_header
Sadly it can't show which code was moved from one file to another.

> Should aggregators be checkpointed?
> ---
>
> Key: GIRAPH-293
> URL: https://issues.apache.org/jira/browse/GIRAPH-293
> Project: Giraph
>  Issue Type: Bug
>Reporter: Alessandro Presta
>Assignee: Maja Kabiljo
> Attachments: GIRAPH-293.patch, GIRAPH-293.patch
>
>
> As I understand it, we don't include aggregators in checkpoints because they are 
> kept in ZooKeeper.
> One of our bootcampers is working on fixing TestManualCheckpoint, which 
> currently involves starting a new job from a checkpoint from a previous job*.
> If this is functionality we want going forward, then persistent aggregators 
> should be checkpointed.
> [*] That test relies on the fact that either aggregators are checkpointed or 
> they are always reset at each superstep. Neither of these is happening, but the 
> error is cancelled out by the fact that we are not actually resuming from a 
> checkpoint, but re-running the job from scratch.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (GIRAPH-211) Add secure authentication to Netty IPC

2012-08-22 Thread Alessandro Presta (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13439481#comment-13439481
 ] 

Alessandro Presta commented on GIRAPH-211:
--

Any update on this? I don't know the schedule for cutting the 0.2 release, but 
I'd say this is a prerequisite: not only do we get rid of a lot of cruft, but the 
API also currently includes methods (putMessages/getMessages) that are used only 
by the Hadoop RPC implementation.

> Add secure authentication to Netty IPC
> --
>
> Key: GIRAPH-211
> URL: https://issues.apache.org/jira/browse/GIRAPH-211
> Project: Giraph
>  Issue Type: Improvement
>Reporter: Eugene Koontz
>Assignee: Eugene Koontz
> Fix For: 0.2.0
>
> Attachments: GIRAPH-211.patch, GIRAPH-211-proposal.txt
>
>
> Gianmarco De Francisci Morales asked on the user list:
> bq. I am getting the exception in the subject when running my giraph program
> bq. on a cluster with Kerberos authentication.
> This leads to the idea of having Kerberos authentication supported within 
> GIRAPH. Hopefully it would use our fast GIRAPH-37 IPC, but could also 
> interoperate with Hadoop security.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (GIRAPH-310) Simplify vertex initialization

2012-08-22 Thread Alessandro Presta (JIRA)
Alessandro Presta created GIRAPH-310:


 Summary: Simplify vertex initialization
 Key: GIRAPH-310
 URL: https://issues.apache.org/jira/browse/GIRAPH-310
 Project: Giraph
  Issue Type: Improvement
Reporter: Alessandro Presta
Assignee: Alessandro Presta


Once we get rid of Hadoop RPC, there is little sense in having the user define 
Vertex#initialize() in the API. The only method users should have to implement is 
Vertex#putEdges(), and initialize(id, value, edges) can be defined in terms of 
that.
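
Roughly, the idea is something like this (illustrative sketch only, signatures 
simplified and imports omitted; Edge is the existing Giraph edge type, and the 
message type M is unused here):

public abstract class Vertex<I extends WritableComparable, V extends Writable,
    E extends Writable, M extends Writable> {
  private I id;
  private V value;

  /** The only method users would still have to implement. */
  public abstract void putEdges(Iterable<Edge<I, E>> edges);

  /** Defined once in the base class, in terms of putEdges(). */
  public void initialize(I id, V value, Iterable<Edge<I, E>> edges) {
    this.id = id;
    this.value = value;
    putEdges(edges);
  }
}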

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (GIRAPH-311) Master halting in superstep 0 is ignored by workers

2012-08-22 Thread Maja Kabiljo (JIRA)
Maja Kabiljo created GIRAPH-311:
---

 Summary: Master halting in superstep 0 is ignored by workers
 Key: GIRAPH-311
 URL: https://issues.apache.org/jira/browse/GIRAPH-311
 Project: Giraph
  Issue Type: Bug
Reporter: Maja Kabiljo


As one of our users noticed, if master.compute() halts the computation in 
superstep 0, workers ignore it and the application doesn't terminate.
This happens because the return value of BspServiceWorker.finishSuperstep() is 
ignored during setup.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




Re: Giraph Job "Task attempt_* failed to report status" Problem

2012-08-22 Thread Vishal Patel
After several supersteps, sometimes a worker thread dies (say it ran out of
memory). ZooKeeper waits for ~10 minutes (600 seconds) and then decides that
the worker is not responsive and fails the entire job. At this point, if you
have a checkpoint saved, it will resume from there; otherwise you have to
start from scratch.

If you run the job again it should successfully finish (or it might error
at some other superstep / worker combination).

Vishal



On Tue, Aug 21, 2012 at 10:12 PM, Amani Alonazi
wrote:

> Hi all,
>
> I'm running a minimum spanning tree compute function on a Hadoop cluster (20
> machines). After a certain number of supersteps (e.g. superstep 47 for a graph
> of 4,194,304 vertices and 181,566,970 edges), the execution time increased
> dramatically. This is not the only problem: the job was also killed with "Task
> attempt_* failed to report status for 601 seconds. Killing!"
>
> I disabled the checkpoint feature by setting
> "CHECKPOINT_FREQUENCY_DEFAULT = 0" in GiraphJob.java. I don't need to write
> any data to disk, neither snapshots nor output. I tested the algorithm on a
> sample graph of 7 vertices and it works well.
>
> Is there any way to profile or debug a Giraph job?
> In the Giraph stats, is the "Aggregate finished vertices" counter the number of
> vertices which voted to halt? Also, is the "sent messages" counter per
> superstep or the total number of messages?
> If a vertex votes to halt, will it be reactivated upon receiving messages?
>
> Thanks a lot!
>
> Best,
> Amani AlOnazi
> MSc Computer Science
> King Abdullah University of Science and Technology
> Kingdom of Saudi Arabia
>
>
> --
> This message and its contents, including attachments are intended solely
> for the original recipient. If you are not the intended recipient or have
> received this message in error, please notify me immediately and delete
> this message from your computer system. Any unauthorized use or
> distribution is prohibited. Please consider the environment before printing
> this email.


Re: Review Request: Move part of the graph out-of-core when memory is low

2012-08-22 Thread Alessandro Presta


> On Aug. 22, 2012, 8:55 a.m., Maja Kabiljo wrote:
> > Alessandro, can you please try the same test with a lot of vertices (you 
> > only had 10 vertices here)? If vertices are visited in random order in the 
> > prepareSuperstep, I think you should have much more disk operations.
> > 
> > > Also, addPartition() would then have to read all vertex ids even when the 
> > > partition is in memory, which would make it way slower in the standard 
> > > use case.
> > What do you mean by this?

Yeah, 10 was an unfortunate choice (more partitions than vertices!), I guess 
last night I was really too tired :P
Here's what I see with 1000 vertices, 999 edges/vertex (I also tried 10 
edges/vertex and got the same pattern):
http://pastebin.com/raw.php?i=jGBzaZA8
So we're loading each out-of-core partition twice. I get this same result with 
different numbers of in-memory partitions. I added some logging and it looks 
like MessageStore#getDestinationVertices() is returning vertices grouped by 
partition. Do you have any idea why? I wonder if it's because of hashing 
(messages are stored in a hash-map indexed by vertex id, and partitions are 
formed by hashing vertex ids).
An adversarial configuration could make us load partitions back and forth in a 
random fashion.
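
(For context, hash partitioning assigns a vertex to a partition roughly like 
this - an illustrative sketch, not the exact partitioner code - which uses the 
same hashCode() that the message hash-map uses for its keys:)

static <I> int getPartitionId(I vertexId, int partitionCount) {
  // partitions are formed by hashing vertex ids
  return Math.abs(vertexId.hashCode() % partitionCount);
}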

Regarding addPartition, I mean that whenever we add a partition in memory, we 
currently just move the reference (fast), whereas if we needed to keep track of 
vertex ids we would have to copy them all into a global map. Anyway, hold on, 
I'll see if I can do something about this. I'm mainly concerned with code calling 
Partition#putVertex() directly, though; I see no way to disallow it.


> On Aug. 22, 2012, 8:55 a.m., Maja Kabiljo wrote:
> > http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/DiskBackedPartitionStore.java,
> >  lines 235-236
> > 
> >
> > You could end up with more than maxInMemoryPartitions.

I see, we could have two threads concurrently write to inMemoryPartitions. 
Fixed by synchronizing on inMemoryPartitions and returning early upon 
successful put().
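
Concretely, the fix is along these lines (a hypothetical sketch, not the exact 
patch; inMemoryPartitions and maxInMemoryPartitions are the map and limit 
discussed above):

// Decide and insert atomically, so two threads can't both grab the last
// in-memory slot.
private boolean tryAddInMemory(Integer partitionId,
    Partition<I, V, E, M> partition) {
  synchronized (inMemoryPartitions) {
    if (inMemoryPartitions.size() < maxInMemoryPartitions) {
      inMemoryPartitions.put(partitionId, partition);
      return true;   // caller returns early, nothing to spill
    }
  }
  return false;      // caller falls back to writing the partition out to disk
}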


- Alessandro


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5987/#review10614
---


On Aug. 21, 2012, 11:02 p.m., Alessandro Presta wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/5987/
> ---
> 
> (Updated Aug. 21, 2012, 11:02 p.m.)
> 
> 
> Review request for giraph and Avery Ching.
> 
> 
> Description
> ---
> 
> I gave this another shot. This time it plays nicely with input superstep: I 
> replaced both the temporary partitions and the worker partition map with a 
> common data structure, PartitionStore, held in ServerData. This is similar to 
> what we now have for messages.
> 
> Partition is now thread-safe so that we can have concurrent calls to 
> putVertex().
> 
> SimplePartitionStore is backed by a concurrent hash map (nothing new here, 
> except that we skip copying partitions to the worker).
> 
> DiskBackedPartitionStore can hold up to a user-defined number of partitions 
> in memory, and spill the remaining ones to disk. Each partition is stored in 
> a separate file.
> Adding vertices to an out-of-core partition consists in appending them to the 
> file, which makes processing vertex requests relatively fast.
> We use a ReadWriteLock for each partition: performing operations on a 
> partition held in memory only requires a read-lock (since Partition is 
> thread-safe), while creating a new partition, moving it in and out of core or 
> appending vertices requires a write-lock (we can't have concurrent writes).
> 
> Also note that this breaks Hadoop RPC: I preferred to keep it clean (this 
> also shows what code we get rid of) instead of working around it. I suppose 
> the Netty security patch will be completed soon. If not, I will restore RPC 
> compatibility.
> 
> More here: 
> https://issues.apache.org/jira/browse/GIRAPH-249?focusedCommentId=13435280&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13435280
> 
> 
> This addresses bug GIRAPH-249.
> https://issues.apache.org/jira/browse/GIRAPH-249
> 
> 
> Diffs
> -
> 
>   
> http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java
>  1375453 
>   
> http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java
>  1375453 
>   
> http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyWorkerClientServer.java
>  1375453 
>   
> http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyWork

Re: Review Request: Move part of the graph out-of-core when memory is low

2012-08-22 Thread Alessandro Presta

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5987/
---

(Updated Aug. 22, 2012, 10:34 a.m.)


Review request for giraph and Avery Ching.


Changes
---

This fixes the potential problem with maxPartitionsInMemory.
Will have a look into keeping vertex ids and stats in PartitionStore before 
finalizing.


Description
---

I gave this another shot. This time it plays nicely with input superstep: I 
replaced both the temporary partitions and the worker partition map with a 
common data structure, PartitionStore, held in ServerData. This is similar to 
what we now have for messages.

Partition is now thread-safe so that we can have concurrent calls to 
putVertex().

SimplePartitionStore is backed by a concurrent hash map (nothing new here, 
except that we skip copying partitions to the worker).

DiskBackedPartitionStore can hold up to a user-defined number of partitions in 
memory, and spill the remaining ones to disk. Each partition is stored in a 
separate file.
Adding vertices to an out-of-core partition consists in appending them to the 
file, which makes processing vertex requests relatively fast.
We use a ReadWriteLock for each partition: performing operations on a partition 
held in memory only requires a read-lock (since Partition is thread-safe), 
while creating a new partition, moving it in and out of core or appending 
vertices requires a write-lock (we can't have concurrent writes).
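
To make the locking scheme concrete, here is a very rough sketch (class and 
method names are made up, not the actual DiskBackedPartitionStore code; it 
assumes vertices serialize themselves as Writables):

import java.io.DataOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import org.apache.hadoop.io.Writable;

class PartitionLockingSketch {
  /** One ReadWriteLock per partition id. */
  private final ConcurrentMap<Integer, ReadWriteLock> partitionLocks =
      new ConcurrentHashMap<Integer, ReadWriteLock>();

  private ReadWriteLock lockFor(Integer partitionId) {
    ReadWriteLock lock = new ReentrantReadWriteLock();
    ReadWriteLock existing = partitionLocks.putIfAbsent(partitionId, lock);
    return existing == null ? lock : existing;
  }

  /** In-memory partition: a read-lock is enough, Partition itself is thread-safe. */
  void withInMemoryPartition(Integer partitionId, Runnable operation) {
    ReadWriteLock lock = lockFor(partitionId);
    lock.readLock().lock();
    try {
      operation.run();  // e.g. partition.putVertex(vertex)
    } finally {
      lock.readLock().unlock();
    }
  }

  /** Out-of-core partition: appending vertices needs the exclusive write-lock. */
  void appendToOutOfCorePartition(Integer partitionId, String partitionFile,
      Iterable<? extends Writable> vertices) throws IOException {
    ReadWriteLock lock = lockFor(partitionId);
    lock.writeLock().lock();
    try {
      DataOutputStream out = new DataOutputStream(
          new FileOutputStream(partitionFile, true /* append */));
      try {
        for (Writable vertex : vertices) {
          vertex.write(out);
        }
      } finally {
        out.close();
      }
    } finally {
      lock.writeLock().unlock();
    }
  }
}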

Also note that this breaks Hadoop RPC: I preferred to keep it clean (this also 
shows what code we get rid of) instead of working around it. I suppose the 
Netty security patch will be completed soon. If not, I will restore RPC 
compatibility.

More here: 
https://issues.apache.org/jira/browse/GIRAPH-249?focusedCommentId=13435280&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13435280


This addresses bug GIRAPH-249.
https://issues.apache.org/jira/browse/GIRAPH-249


Diffs (updated)
-

  
http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java
 1375843 
  
http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java
 1375843 
  
http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyWorkerClientServer.java
 1375843 
  
http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyWorkerServer.java
 1375843 
  
http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/SendVertexRequest.java
 1375843 
  
http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/ServerData.java
 1375843 
  
http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/WorkerServer.java
 1375843 
  
http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java
 1375843 
  
http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/GiraphJob.java
 1375843 
  
http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/GraphMapper.java
 1375843 
  
http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/DiskBackedPartitionStore.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/HashWorkerPartitioner.java
 1375843 
  
http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/Partition.java
 1375843 
  
http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/PartitionStore.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/SimplePartitionStore.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/WorkerGraphPartitioner.java
 1375843 
  
http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/TestMutateGraphVertex.java
 1375843 
  
http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/comm/ConnectionTest.java
 1375843 
  
http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/comm/RequestFailureTest.java
 1375843 
  
http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/comm/RequestTest.java
 1375843 
  
http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/graph/partition/TestPartitionStores.java
 PRE-CREATION 

Diff: https://reviews.apache.org/r/5987/diff/


Testing
---

mvn verify and pseudo-distributed mode tests with both SimplePartitionStore and 
DiskBackedPartitionStore


Thanks,

Alessandro Presta



Re: Review Request: Move part of the graph out-of-core when memory is low

2012-08-22 Thread Maja Kabiljo


> On Aug. 22, 2012, 8:55 a.m., Maja Kabiljo wrote:
> > Alessandro, can you please try the same test with a lot of vertices (you 
> > only had 10 vertices here)? If vertices are visited in random order in the 
> > prepareSuperstep, I think you should have much more disk operations.
> > 
> > > Also, addPartition() would then have to read all vertex ids even when the 
> > > partition is in memory, which would make it way slower in the standard 
> > > use case.
> > What do you mean by this?
> 
> Alessandro Presta wrote:
> Yeah, 10 was an unfortunate choice (more partitions than vertices!), I 
> guess last night I was really too tired :P
> Here's what I see with 1000 vertices, 999 edges/vertex (I also tried 10 
> edges/vertex and got the same pattern):
> http://pastebin.com/raw.php?i=jGBzaZA8
> So we're loading each out-of-core partition twice. I get this same result 
> with different numbers of in-memory partitions. I added some logging and it 
> looks like MessageStore#getDestinationVertices() is returning vertices 
> grouped by partition. Do you have any idea why? I wonder if it's because of 
> hashing (messages are stored in a hash-map indexed by vertex id, and 
> partitions are formed by hashing vertex ids).
> An adversarial configuration could make us load partitions back and forth 
> in a random fashion.
> 
> Regarding addPartition, I mean that whenever we add a partition in 
> memory, we currently simply move the reference (fast), whereas if we need to 
> keep track of vertex ids we would have to copy them all in a global map. 
> Anyway hold on, I'll see if I can do something about this. I'm mainly 
> concerned with code calling Partition#putVertex() directly though, I see no 
> way to disallow it.

Oh right, it's because SimpleMessageStore keeps messages grouped by partition, 
and then when you call getDestinationVertices it just appends all of them. If 
you try using out-of-core messaging, you should see very different results.
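
Conceptually, that method behaves roughly like this (not the actual 
SimpleMessageStore code; I, M and messagesByPartition stand for the store's 
generics and internal per-partition map):

Iterable<I> getDestinationVertices() {
  List<I> result = Lists.newArrayList();
  // messages are kept per partition, so the ids come out grouped by partition
  for (Map<I, Collection<M>> partitionMessages : messagesByPartition.values()) {
    result.addAll(partitionMessages.keySet());
  }
  return result;
}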

I see now what you are saying for addPartition, but it doesn't look like a big 
deal to me.


- Maja


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5987/#review10614
---


On Aug. 22, 2012, 10:34 a.m., Alessandro Presta wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/5987/
> ---
> 
> (Updated Aug. 22, 2012, 10:34 a.m.)
> 
> 
> Review request for giraph and Avery Ching.
> 
> 
> Description
> ---
> 
> I gave this another shot. This time it plays nicely with input superstep: I 
> replaced both the temporary partitions and the worker partition map with a 
> common data structure, PartitionStore, held in ServerData. This is similar to 
> what we now have for messages.
> 
> Partition is now thread-safe so that we can have concurrent calls to 
> putVertex().
> 
> SimplePartitionStore is backed by a concurrent hash map (nothing new here, 
> except that we skip copying partitions to the worker).
> 
> DiskBackedPartitionStore can hold up to a user-defined number of partitions 
> in memory, and spill the remaining ones to disk. Each partition is stored in 
> a separate file.
> Adding vertices to an out-of-core partition consists in appending them to the 
> file, which makes processing vertex requests relatively fast.
> We use a ReadWriteLock for each partition: performing operations on a 
> partition held in memory only requires a read-lock (since Partition is 
> thread-safe), while creating a new partition, moving it in and out of core or 
> appending vertices requires a write-lock (we can't have concurrent writes).
> 
> Also note that this breaks Hadoop RPC: I preferred to keep it clean (this 
> also shows what code we get rid of) instead of working around it. I suppose 
> the Netty security patch will be completed soon. If not, I will restore RPC 
> compatibility.
> 
> More here: 
> https://issues.apache.org/jira/browse/GIRAPH-249?focusedCommentId=13435280&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13435280
> 
> 
> This addresses bug GIRAPH-249.
> https://issues.apache.org/jira/browse/GIRAPH-249
> 
> 
> Diffs
> -
> 
>   
> http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java
>  1375843 
>   
> http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java
>  1375843 
>   
> http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyWorkerClientServer.java
>  1375843 
>   
> http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyWorkerServer.java
>  1375843 
>   
> http://svn.apache.org/repos/asf/giraph/trunk/src/main/jav

Re: Review Request: Move part of the graph out-of-core when memory is low

2012-08-22 Thread Alessandro Presta


> On Aug. 22, 2012, 8:55 a.m., Maja Kabiljo wrote:
> > Alessandro, can you please try the same test with a lot of vertices (you 
> > only had 10 vertices here)? If vertices are visited in random order in the 
> > prepareSuperstep, I think you should have much more disk operations.
> > 
> > > Also, addPartition() would then have to read all vertex ids even when the 
> > > partition is in memory, which would make it way slower in the standard 
> > > use case.
> > What do you mean by this?
> 
> Alessandro Presta wrote:
> Yeah, 10 was an unfortunate choice (more partitions than vertices!), I 
> guess last night I was really too tired :P
> Here's what I see with 1000 vertices, 999 edges/vertex (I also tried 10 
> edges/vertex and got the same pattern):
> http://pastebin.com/raw.php?i=jGBzaZA8
> So we're loading each out-of-core partition twice. I get this same result 
> with different numbers of in-memory partitions. I added some logging and it 
> looks like MessageStore#getDestinationVertices() is returning vertices 
> grouped by partition. Do you have any idea why? I wonder if it's because of 
> hashing (messages are stored in a hash-map indexed by vertex id, and 
> partitions are formed by hashing vertex ids).
> An adversarial configuration could make us load partitions back and forth 
> in a random fashion.
> 
> Regarding addPartition, I mean that whenever we add a partition in 
> memory, we currently simply move the reference (fast), whereas if we need to 
> keep track of vertex ids we would have to copy them all in a global map. 
> Anyway hold on, I'll see if I can do something about this. I'm mainly 
> concerned with code calling Partition#putVertex() directly though, I see no 
> way to disallow it.
> 
> Maja Kabiljo wrote:
> Oh right, it's because SimpleMessageStore keeps messages grouped by 
> partition, and then when you call getDestinationVertices it just appends all 
> of them. If you try using out-of-core messaging, you should see much 
> different results.
> 
> I see now what you are saying for addPartition, but it doesn't look like 
> a big deal to me.

Having to keep all the vertex ids in memory as Writables seems like a big 
overhead to me (both in memory and in the time needed to keep it updated on 
every single operation). I think it would be better to choose a wise access 
pattern instead.
I see that MessageStoreByPartition has a getPartitionDestinationVertices(int 
partitionId). Can't we make that a top-level requirement, so that we can do:

for each partition {
    for each destination vertex in partition {
        resolve vertex
    }
}
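
In Java terms, something like this (getPartitionDestinationVertices() is the 
existing MessageStoreByPartition method; the partition-store accessors and 
resolveVertex() are just illustrative glue):

for (Integer partitionId : partitionStore.getPartitionIds()) {
  Partition<I, V, E, M> partition = partitionStore.getPartition(partitionId);
  for (I vertexId : messageStore.getPartitionDestinationVertices(partitionId)) {
    resolveVertex(partition, vertexId);  // create/remove the vertex as needed
  }
  partitionStore.putPartition(partitionId, partition);  // may go back out of core
}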


- Alessandro


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5987/#review10614
---


On Aug. 22, 2012, 10:34 a.m., Alessandro Presta wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/5987/
> ---
> 
> (Updated Aug. 22, 2012, 10:34 a.m.)
> 
> 
> Review request for giraph and Avery Ching.
> 
> 
> Description
> ---
> 
> I gave this another shot. This time it plays nicely with input superstep: I 
> replaced both the temporary partitions and the worker partition map with a 
> common data structure, PartitionStore, held in ServerData. This is similar to 
> what we now have for messages.
> 
> Partition is now thread-safe so that we can have concurrent calls to 
> putVertex().
> 
> SimplePartitionStore is backed by a concurrent hash map (nothing new here, 
> except that we skip copying partitions to the worker).
> 
> DiskBackedPartitionStore can hold up to a user-defined number of partitions 
> in memory, and spill the remaining ones to disk. Each partition is stored in 
> a separate file.
> Adding vertices to an out-of-core partition consists in appending them to the 
> file, which makes processing vertex requests relatively fast.
> We use a ReadWriteLock for each partition: performing operations on a 
> partition held in memory only requires a read-lock (since Partition is 
> thread-safe), while creating a new partition, moving it in and out of core or 
> appending vertices requires a write-lock (we can't have concurrent writes).
> 
> Also note that this breaks Hadoop RPC: I preferred to keep it clean (this 
> also shows what code we get rid of) instead of working around it. I suppose 
> the Netty security patch will be completed soon. If not, I will restore RPC 
> compatibility.
> 
> More here: 
> https://issues.apache.org/jira/browse/GIRAPH-249?focusedCommentId=13435280&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13435280
> 
> 
> This addresses bug GIRAPH-249.
> https://issues.apache.org/jira/browse/GIRAPH-249
> 
> 
> Diffs
> -
> 
>   
> http://svn.apache.org/repos/asf/gira

Re: Review Request: Move part of the graph out-of-core when memory is low

2012-08-22 Thread Maja Kabiljo


> On Aug. 22, 2012, 8:55 a.m., Maja Kabiljo wrote:
> > Alessandro, can you please try the same test with a lot of vertices (you 
> > only had 10 vertices here)? If vertices are visited in random order in the 
> > prepareSuperstep, I think you should have much more disk operations.
> > 
> > > Also, addPartition() would then have to read all vertex ids even when the 
> > > partition is in memory, which would make it way slower in the standard 
> > > use case.
> > What do you mean by this?
> 
> Alessandro Presta wrote:
> Yeah, 10 was an unfortunate choice (more partitions than vertices!), I 
> guess last night I was really too tired :P
> Here's what I see with 1000 vertices, 999 edges/vertex (I also tried 10 
> edges/vertex and got the same pattern):
> http://pastebin.com/raw.php?i=jGBzaZA8
> So we're loading each out-of-core partition twice. I get this same result 
> with different numbers of in-memory partitions. I added some logging and it 
> looks like MessageStore#getDestinationVertices() is returning vertices 
> grouped by partition. Do you have any idea why? I wonder if it's because of 
> hashing (messages are stored in a hash-map indexed by vertex id, and 
> partitions are formed by hashing vertex ids).
> An adversarial configuration could make us load partitions back and forth 
> in a random fashion.
> 
> Regarding addPartition, I mean that whenever we add a partition in 
> memory, we currently simply move the reference (fast), whereas if we need to 
> keep track of vertex ids we would have to copy them all in a global map. 
> Anyway hold on, I'll see if I can do something about this. I'm mainly 
> concerned with code calling Partition#putVertex() directly though, I see no 
> way to disallow it.
> 
> Maja Kabiljo wrote:
> Oh right, it's because SimpleMessageStore keeps messages grouped by 
> partition, and then when you call getDestinationVertices it just appends all 
> of them. If you try using out-of-core messaging, you should see much 
> different results.
> 
> I see now what you are saying for addPartition, but it doesn't look like 
> a big deal to me.
> 
> Alessandro Presta wrote:
> Having to keep all the vertex ids in memory as Writables seems a big 
> overhead to me (both memory, and time to keep it updated for every single 
> operation). I think it would be better to choose a wise access pattern 
> instead.
> I see that MessageStoreByPartition has a 
> getPartitionDestinationVertices(int partitionId). Can't we make that a 
> top-level requirement so that we can do:
> 
> for each partition {
>for each destination vertex in partition {
>resolve vertex
>}
> }

Right, we can do that - ServerData.getCurrentMessageStore() returns 
MessageStoreByPartition. Looking at the code again, I remembered that the outer 
out-of-core message store also appends the vertices of its inner per-partition 
message stores, so it should be O(p) reads there as well.

Still, for most applications I think it would be a smaller overhead to keep the 
vertex ids in memory than to have twice as many partition reads from disk. But I 
agree that this could be avoided with some code redesign, maybe doing the vertex 
resolutions and computations for a partition together. Anyway, since this can 
only be 2x, and not much more as I first thought, and since in-core there is no 
overhead, I'm ok with it now. Just something to think about in the future.


- Maja


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5987/#review10614
---


On Aug. 22, 2012, 10:34 a.m., Alessandro Presta wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/5987/
> ---
> 
> (Updated Aug. 22, 2012, 10:34 a.m.)
> 
> 
> Review request for giraph and Avery Ching.
> 
> 
> Description
> ---
> 
> I gave this another shot. This time it plays nicely with input superstep: I 
> replaced both the temporary partitions and the worker partition map with a 
> common data structure, PartitionStore, held in ServerData. This is similar to 
> what we now have for messages.
> 
> Partition is now thread-safe so that we can have concurrent calls to 
> putVertex().
> 
> SimplePartitionStore is backed by a concurrent hash map (nothing new here, 
> except that we skip copying partitions to the worker).
> 
> DiskBackedPartitionStore can hold up to a user-defined number of partitions 
> in memory, and spill the remaining ones to disk. Each partition is stored in 
> a separate file.
> Adding vertices to an out-of-core partition consists in appending them to the 
> file, which makes processing vertex requests relatively fast.
> We use a ReadWriteLock for each partition: performing operatio

Re: Review Request: Move part of the graph out-of-core when memory is low

2012-08-22 Thread Alessandro Presta


> On Aug. 22, 2012, 8:55 a.m., Maja Kabiljo wrote:
> > Alessandro, can you please try the same test with a lot of vertices (you 
> > only had 10 vertices here)? If vertices are visited in random order in the 
> > prepareSuperstep, I think you should have much more disk operations.
> > 
> > > Also, addPartition() would then have to read all vertex ids even when the 
> > > partition is in memory, which would make it way slower in the standard 
> > > use case.
> > What do you mean by this?
> 
> Alessandro Presta wrote:
> Yeah, 10 was an unfortunate choice (more partitions than vertices!), I 
> guess last night I was really too tired :P
> Here's what I see with 1000 vertices, 999 edges/vertex (I also tried 10 
> edges/vertex and got the same pattern):
> http://pastebin.com/raw.php?i=jGBzaZA8
> So we're loading each out-of-core partition twice. I get this same result 
> with different numbers of in-memory partitions. I added some logging and it 
> looks like MessageStore#getDestinationVertices() is returning vertices 
> grouped by partition. Do you have any idea why? I wonder if it's because of 
> hashing (messages are stored in a hash-map indexed by vertex id, and 
> partitions are formed by hashing vertex ids).
> An adversarial configuration could make us load partitions back and forth 
> in a random fashion.
> 
> Regarding addPartition, I mean that whenever we add a partition in 
> memory, we currently simply move the reference (fast), whereas if we need to 
> keep track of vertex ids we would have to copy them all in a global map. 
> Anyway hold on, I'll see if I can do something about this. I'm mainly 
> concerned with code calling Partition#putVertex() directly though, I see no 
> way to disallow it.
> 
> Maja Kabiljo wrote:
> Oh right, it's because SimpleMessageStore keeps messages grouped by 
> partition, and then when you call getDestinationVertices it just appends all 
> of them. If you try using out-of-core messaging, you should see much 
> different results.
> 
> I see now what you are saying for addPartition, but it doesn't look like 
> a big deal to me.
> 
> Alessandro Presta wrote:
> Having to keep all the vertex ids in memory as Writables seems a big 
> overhead to me (both memory, and time to keep it updated for every single 
> operation). I think it would be better to choose a wise access pattern 
> instead.
> I see that MessageStoreByPartition has a 
> getPartitionDestinationVertices(int partitionId). Can't we make that a 
> top-level requirement so that we can do:
> 
> for each partition {
>for each destination vertex in partition {
>resolve vertex
>}
> }
> 
> Maja Kabiljo wrote:
> Right, we can do that - ServerData.getCurrentMessageStore() returns 
> MessageStoreByPartition. Looking at the code again, I remembered that the 
> outer out-of-core message store also appends vertices of its inner partition 
> message stores, so it should be O(p) reads there also.
> 
> Still, for most of applications I think it would be smaller overhead to 
> keep vertex ids in memory than to have to have twice as many partition reads 
> from disk. But I agree that this could be avoided with some code redesign, 
> maybe doing vertex resolutions and computations for partition together. 
> Anyway, since this can be only 2x and not much more as I first thought, and 
> with in-core there is no overhead, I'm ok with it now. Just something to 
> think about in the future.

Exactly what I was thinking: first of all, let's make sure the access pattern is 
optimal. Then we can try packing the different phases together, so that we load 
each partition once, do vertex resolutions and compute, then move on to the next 
partition.
Note that we should do the same for mutations, but we should first treat 
mutation requests like messages and group them by partition (we can also argue 
that a mutation-intensive algorithm would need the same out-of-core 
functionality that we have for messages).

Will submit an updated patch shortly.


- Alessandro


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5987/#review10614
---


On Aug. 22, 2012, 10:34 a.m., Alessandro Presta wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/5987/
> ---
> 
> (Updated Aug. 22, 2012, 10:34 a.m.)
> 
> 
> Review request for giraph and Avery Ching.
> 
> 
> Description
> ---
> 
> I gave this another shot. This time it plays nicely with input superstep: I 
> replaced both the temporary partitions and the worker partition map with a 
> common data structure, PartitionStore, held in ServerData. This is similar to 
> what we now have for messages.
> 

Re: Review Request: Move part of the graph out-of-core when memory is low

2012-08-22 Thread Alessandro Presta

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5987/
---

(Updated Aug. 22, 2012, 12:36 p.m.)


Review request for giraph and Avery Ching.


Changes
---

Iterate over messages by partition in prepareSuperstep().


Description
---

I gave this another shot. This time it plays nicely with input superstep: I 
replaced both the temporary partitions and the worker partition map with a 
common data structure, PartitionStore, held in ServerData. This is similar to 
what we now have for messages.

Partition is now thread-safe so that we can have concurrent calls to 
putVertex().

SimplePartitionStore is backed by a concurrent hash map (nothing new here, 
except that we skip copying partitions to the worker).

DiskBackedPartitionStore can hold up to a user-defined number of partitions in 
memory, and spill the remaining ones to disk. Each partition is stored in a 
separate file.
Adding vertices to an out-of-core partition consists in appending them to the 
file, which makes processing vertex requests relatively fast.
We use a ReadWriteLock for each partition: performing operations on a partition 
held in memory only requires a read-lock (since Partition is thread-safe), 
while creating a new partition, moving it in and out of core or appending 
vertices requires a write-lock (we can't have concurrent writes).

Also note that this breaks Hadoop RPC: I preferred to keep it clean (this also 
shows what code we get rid of) instead of working around it. I suppose the 
Netty security patch will be completed soon. If not, I will restore RPC 
compatibility.

More here: 
https://issues.apache.org/jira/browse/GIRAPH-249?focusedCommentId=13435280&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13435280


This addresses bug GIRAPH-249.
https://issues.apache.org/jira/browse/GIRAPH-249


Diffs (updated)
-

  
http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java
 1375843 
  
http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java
 1375843 
  
http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyWorkerClientServer.java
 1375843 
  
http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyWorkerServer.java
 1375843 
  
http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/SendVertexRequest.java
 1375843 
  
http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/ServerData.java
 1375843 
  
http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/WorkerServer.java
 1375843 
  
http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java
 1375843 
  
http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/GiraphJob.java
 1375843 
  
http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/GraphMapper.java
 1375843 
  
http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/DiskBackedPartitionStore.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/HashWorkerPartitioner.java
 1375843 
  
http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/Partition.java
 1375843 
  
http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/PartitionStore.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/SimplePartitionStore.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/WorkerGraphPartitioner.java
 1375843 
  
http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/TestMutateGraphVertex.java
 1375843 
  
http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/comm/ConnectionTest.java
 1375843 
  
http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/comm/RequestFailureTest.java
 1375843 
  
http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/comm/RequestTest.java
 1375843 
  
http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/graph/partition/TestPartitionStores.java
 PRE-CREATION 

Diff: https://reviews.apache.org/r/5987/diff/


Testing
---

mvn verify and pseudo-distributed mode tests with both SimplePartitionStore and 
DiskBackedPartitionStore


Thanks,

Alessandro Presta



Re: Review Request: Move part of the graph out-of-core when memory is low

2012-08-22 Thread Alessandro Presta

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5987/
---

(Updated Aug. 22, 2012, 1 p.m.)


Review request for giraph and Avery Ching.


Changes
---

Replaced one last usage of MapMaker with Maps.newConcurrentMap.
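
That is (both are Guava; the map's value type below is just for illustration):

// before
ConcurrentMap<Integer, Partition<I, V, E, M>> partitions = new MapMaker().makeMap();
// after
ConcurrentMap<Integer, Partition<I, V, E, M>> partitions = Maps.newConcurrentMap();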


Description
---

I gave this another shot. This time it plays nicely with input superstep: I 
replaced both the temporary partitions and the worker partition map with a 
common data structure, PartitionStore, held in ServerData. This is similar to 
what we now have for messages.

Partition is now thread-safe so that we can have concurrent calls to 
putVertex().

SimplePartitionStore is backed by a concurrent hash map (nothing new here, 
except that we skip copying partitions to the worker).

DiskBackedPartitionStore can hold up to a user-defined number of partitions in 
memory, and spill the remaining ones to disk. Each partition is stored in a 
separate file.
Adding vertices to an out-of-core partition consists in appending them to the 
file, which makes processing vertex requests relatively fast.
We use a ReadWriteLock for each partition: performing operations on a partition 
held in memory only requires a read-lock (since Partition is thread-safe), 
while creating a new partition, moving it in and out of core or appending 
vertices requires a write-lock (we can't have concurrent writes).

Also note that this breaks Hadoop RPC: I preferred to keep it clean (this also 
shows what code we get rid of) instead of working around it. I suppose the 
Netty security patch will be completed soon. If not, I will restore RPC 
compatibility.

More here: 
https://issues.apache.org/jira/browse/GIRAPH-249?focusedCommentId=13435280&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13435280


This addresses bug GIRAPH-249.
https://issues.apache.org/jira/browse/GIRAPH-249


Diffs (updated)
-

  
http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java
 1375843 
  
http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java
 1375843 
  
http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyWorkerClientServer.java
 1375843 
  
http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyWorkerServer.java
 1375843 
  
http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/SendVertexRequest.java
 1375843 
  
http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/ServerData.java
 1375843 
  
http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/WorkerServer.java
 1375843 
  
http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java
 1375843 
  
http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/GiraphJob.java
 1375843 
  
http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/GraphMapper.java
 1375843 
  
http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/DiskBackedPartitionStore.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/HashWorkerPartitioner.java
 1375843 
  
http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/Partition.java
 1375843 
  
http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/PartitionStore.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/SimplePartitionStore.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/WorkerGraphPartitioner.java
 1375843 
  
http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/TestMutateGraphVertex.java
 1375843 
  
http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/comm/ConnectionTest.java
 1375843 
  
http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/comm/RequestFailureTest.java
 1375843 
  
http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/comm/RequestTest.java
 1375843 
  
http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/graph/partition/TestPartitionStores.java
 PRE-CREATION 

Diff: https://reviews.apache.org/r/5987/diff/


Testing
---

mvn verify and pseudo-distributed mode tests with both SimplePartitionStore and 
DiskBackedPartitionStore


Thanks,

Alessandro Presta



Review Request: Fixing checkpointing for aggregators

2012-08-22 Thread Maja Kabiljo

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6731/
---

Review request for giraph.


Description
---

Making aggregators work correctly with checkpointing - saving the aggregator 
name, class, value and whether it's persistent. Apart from that, I moved the 
aggregator-handling code out of BspServiceWorker and BspServiceMaster into 
separate classes, since I think it's cleaner that way and those two classes 
already do too many different things. That is why the patch looks big. Later, 
with GIRAPH-273, the AggregatorHandler classes should become more independent 
of the BspServices.
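
A minimal sketch of saving one aggregator entry to a checkpoint, assuming a
plain DataOutput format (names and layout here are hypothetical, not the
actual patch code):

    import java.io.DataOutput;
    import java.io.IOException;
    import org.apache.hadoop.io.Writable;

    public class AggregatorCheckpointSketch {
      /** Write one aggregator: its name, class, persistence flag and value. */
      public static void writeAggregator(DataOutput out, String name,
          Class<?> aggregatorClass, boolean persistent, Writable value)
          throws IOException {
        out.writeUTF(name);
        out.writeUTF(aggregatorClass.getName());
        out.writeBoolean(persistent);
        value.write(out);
      }
    }

    // Restoring would read the name, instantiate the class by reflection,
    // read the flag and then call readFields() on the aggregated value.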


This addresses bug GIRAPH-293.
https://issues.apache.org/jira/browse/GIRAPH-293


Diffs
-

  
http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/bsp/CentralizedServiceMaster.java
 1375970 
  
http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java
 1375970 
  
http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/AggregatorHandler.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/BspService.java
 1375970 
  
http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceMaster.java
 1375970 
  
http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java
 1375970 
  
http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/GraphMapper.java
 1375970 
  
http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/MasterAggregatorHandler.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/WorkerAggregatorHandler.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/TestAggregatorsHandling.java
 1375970 
  
http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/graph/TestAggregatorsHandling.java
 PRE-CREATION 

Diff: https://reviews.apache.org/r/6731/diff/


Testing
---

I added tests for aggregator serialization and for manually restarting from a 
checkpoint (the latter also relies on the recent GIRAPH-296 and GIRAPH-298 
working). The patch passes mvn verify and tests in pseudo-distributed mode.


Thanks,

Maja Kabiljo



Re: Review Request: Move part of the graph out-of-core when memory is low

2012-08-22 Thread Maja Kabiljo

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5987/#review10621
---


Looks good to me now. Again, great work!

- Maja Kabiljo


On Aug. 22, 2012, 1 p.m., Alessandro Presta wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/5987/
> ---
> 
> (Updated Aug. 22, 2012, 1 p.m.)
> 
> 
> Review request for giraph and Avery Ching.
> 
> 
> Description
> ---
> 
> I gave this another shot. This time it plays nicely with input superstep: I 
> replaced both the temporary partitions and the worker partition map with a 
> common data structure, PartitionStore, held in ServerData. This is similar to 
> what we now have for messages.
> 
> Partition is now thread-safe so that we can have concurrent calls to 
> putVertex().
> 
> SimplePartitionStore is backed by a concurrent hash map (nothing new here, 
> except that we skip copying partitions to the worker).
> 
> DiskBackedPartitionStore can hold up to a user-defined number of partitions 
> in memory, and spill the remaining ones to disk. Each partition is stored in 
> a separate file.
> Adding vertices to an out-of-core partition consists in appending them to the 
> file, which makes processing vertex requests relatively fast.
> We use a ReadWriteLock for each partition: performing operations on a 
> partition held in memory only requires a read-lock (since Partition is 
> thread-safe), while creating a new partition, moving it in and out of core or 
> appending vertices requires a write-lock (we can't have concurrent writes).
> 
> Also note that this breaks Hadoop RPC: I preferred to keep it clean (this 
> also shows what code we get rid of) instead of working around it. I suppose 
> the Netty security patch will be completed soon. If not, I will restore RPC 
> compatibility.
> 
> More here: 
> https://issues.apache.org/jira/browse/GIRAPH-249?focusedCommentId=13435280&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13435280
> 
> 
> This addresses bug GIRAPH-249.
> https://issues.apache.org/jira/browse/GIRAPH-249
> 
> 
> Diffs
> -
> 
>   
> http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java
>  1375843 
>   
> http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java
>  1375843 
>   
> http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyWorkerClientServer.java
>  1375843 
>   
> http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyWorkerServer.java
>  1375843 
>   
> http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/SendVertexRequest.java
>  1375843 
>   
> http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/ServerData.java
>  1375843 
>   
> http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/WorkerServer.java
>  1375843 
>   
> http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java
>  1375843 
>   
> http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/GiraphJob.java
>  1375843 
>   
> http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/GraphMapper.java
>  1375843 
>   
> http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/DiskBackedPartitionStore.java
>  PRE-CREATION 
>   
> http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/HashWorkerPartitioner.java
>  1375843 
>   
> http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/Partition.java
>  1375843 
>   
> http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/PartitionStore.java
>  PRE-CREATION 
>   
> http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/SimplePartitionStore.java
>  PRE-CREATION 
>   
> http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/WorkerGraphPartitioner.java
>  1375843 
>   
> http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/TestMutateGraphVertex.java
>  1375843 
>   
> http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/comm/ConnectionTest.java
>  1375843 
>   
> http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/comm/RequestFailureTest.java
>  1375843 
>   
> http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/comm/RequestTest.java
>  1375843 
>   
> http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apa

[jira] [Updated] (GIRAPH-301) InputSplit Reservations are clumping, leaving many workers asleep while others process too many splits and get overloaded.

2012-08-22 Thread Eli Reisman (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Reisman updated GIRAPH-301:
---

Attachment: GIRAPH-301-7.patch

Just a quick rebase, no changes.

> InputSplit Reservations are clumping, leaving many workers asleep while others 
> process too many splits and get overloaded.
> -
>
> Key: GIRAPH-301
> URL: https://issues.apache.org/jira/browse/GIRAPH-301
> Project: Giraph
>  Issue Type: Improvement
>  Components: bsp, graph, zookeeper
>Affects Versions: 0.2.0
>Reporter: Eli Reisman
>Assignee: Eli Reisman
>  Labels: patch
> Fix For: 0.2.0
>
> Attachments: GIRAPH-301-1.patch, GIRAPH-301-2.patch, 
> GIRAPH-301-3.patch, GIRAPH-301-4.patch, GIRAPH-301-5.patch, 
> GIRAPH-301-6.patch, GIRAPH-301-7.patch
>
>
> With recent additions to the codebase, users here have noticed many workers 
> are able to load input splits extremely quickly, and this has altered the 
> behavior of Giraph during INPUT_SUPERSTEP when using the current algorithm 
> for split reservations. A few workers process multiple splits (often 
> overwhelming Netty and getting GC errors as they attempt to offload too much 
> data too quickly) while many (often most) of the others just sleep through the 
> superstep, never successfully participating at all.
> Essentially, the current algo is:
> 1. scan the input split list, skipping nodes that are marked "Finished"
> 2. grab the first unfinished node in the list (reserved or not) and check its 
> reserved status.
> 3. if not reserved, attempt to reserve & return it if successful.
> 4. if the first one you check is already taken, sleep for way too long and 
> only wake up if another worker finishes a split, then contend with that 
> worker for another split, while the majority of the split list might sit 
> idle, not actually checked or claimed by anyone yet.
> This does not work. By making a few simple changes (and acknowledging that ZK 
> reads are cheap, only writes are not) this patch gets every worker involved 
> and keeps them in the game, ensuring that the INPUT_SUPERSTEP passes quickly 
> and painlessly, and spreads the memory load the split readers bear more 
> evenly so that Netty is not overwhelmed. If the giraph.splitmb and -w options 
> are set correctly, behavior is now exactly as one would expect it to be.
> This also results in the INPUT_SUPERSTEP passing more quickly, and lets a job 
> survive the INPUT_SUPERSTEP for a given data load on fewer Hadoop memory 
> slots.
>  
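
A minimal sketch of the scan-every-split idea described above; SplitNode and
its methods are hypothetical stand-ins for the cheap ZooKeeper reads and the
single reservation write, not the code in the attached patch:

    import java.util.List;

    /** Hypothetical view of one input split's state in ZooKeeper. */
    interface SplitNode {
      boolean isFinished();   // cheap ZK read
      boolean isReserved();   // cheap ZK read
      boolean tryReserve();   // ZK write; false if another worker won the race
    }

    public class SplitReservationSketch {
      /**
       * Scan the whole list instead of blocking on the first reserved entry:
       * reads are cheap, so check every candidate and only pay for a write
       * when a split actually looks free.
       */
      public static SplitNode reserveNext(List<SplitNode> splits) {
        for (SplitNode split : splits) {
          if (split.isFinished() || split.isReserved()) {
            continue;          // skip finished or already-claimed splits
          }
          if (split.tryReserve()) {
            return split;      // this worker owns the split now
          }
          // Lost the race for this one; keep scanning instead of sleeping.
        }
        return null;           // nothing left to reserve on this pass
      }
    }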

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (GIRAPH-211) Add secure authentication to Netty IPC

2012-08-22 Thread Eugene Koontz (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13439703#comment-13439703
 ] 

Eugene Koontz commented on GIRAPH-211:
--

Hi Alessandro, I am working actively on it and hope to have a patch ready in 
the next day or so.
-Eugene

> Add secure authentication to Netty IPC
> --
>
> Key: GIRAPH-211
> URL: https://issues.apache.org/jira/browse/GIRAPH-211
> Project: Giraph
>  Issue Type: Improvement
>Reporter: Eugene Koontz
>Assignee: Eugene Koontz
> Fix For: 0.2.0
>
> Attachments: GIRAPH-211.patch, GIRAPH-211-proposal.txt
>
>
> Gianmarco De Francisci Morales asked on the user list:
> bq. I am getting the exception in the subject when running my giraph program
> bq. on a cluster with Kerberos authentication.
> This leads to the idea of having Kerberos authentication supported within 
> GIRAPH. Hopefully it would use our fast GIRAPH-37 IPC, but could also 
> interoperate with Hadoop security.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (GIRAPH-211) Add secure authentication to Netty IPC

2012-08-22 Thread Alessandro Presta (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13439739#comment-13439739
 ] 

Alessandro Presta commented on GIRAPH-211:
--

Awesome, looking forward to it!

> Add secure authentication to Netty IPC
> --
>
> Key: GIRAPH-211
> URL: https://issues.apache.org/jira/browse/GIRAPH-211
> Project: Giraph
>  Issue Type: Improvement
>Reporter: Eugene Koontz
>Assignee: Eugene Koontz
> Fix For: 0.2.0
>
> Attachments: GIRAPH-211.patch, GIRAPH-211-proposal.txt
>
>
> Gianmarco De Francisci Morales asked on the user list:
> bq. I am getting the exception in the subject when running my giraph program
> bq. on a cluster with Kerberos authentication.
> This leads to the idea of having Kerberos authentication supported within 
> GIRAPH. Hopefully it would use our fast GIRAPH-37 IPC, but could also 
> interoperate with Hadoop security.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (GIRAPH-312) Giraph needs an admin script

2012-08-22 Thread Eli Reisman (JIRA)
Eli Reisman created GIRAPH-312:
--

 Summary: Giraph needs an admin script
 Key: GIRAPH-312
 URL: https://issues.apache.org/jira/browse/GIRAPH-312
 Project: Giraph
  Issue Type: New Feature
  Components: conf and scripts, zookeeper
Affects Versions: 0.2.0
Reporter: Eli Reisman
Assignee: Eli Reisman
Priority: Minor
 Fix For: 0.2.0


Our zookeeper instances have very long uptimes on our cluster, and failed job 
trees are never cleaned from memory. There is a separate shell script to do 
this, but it's not picky about which node trees it erases, and on some systems 
some Giraph users may not have access to it.

This patch will add a shell script to activate a new class which will use 
Giraph conf file options or our normal -Dgiraph.XYZ command-line opts to get 
the ZK quorum info, and clean out the remnants of old failed and killed jobs 
from ZooKeeper's memory. They do pile up over time.

This led to the larger idea that Giraph needs a general giraph-admin shell 
script as a home for stuff like this. Jakob suggested it would be a good idea 
to put this into such a script since then admin groups can be created so that 
not every Giraph client can run it. This script currently only has code to 
start up the zk cleaner, but more options can be added to it as JIRAs for new 
features crop up.
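
A minimal sketch of the kind of ZooKeeper cleanup such a script could drive
(the arguments and the recursive delete here are illustrative assumptions, not
the actual class added by the patch):

    import java.util.List;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooKeeper;

    public class ZkCleanerSketch {
      /** Delete a node tree bottom-up, e.g. the leftovers of a failed job. */
      static void deleteRecursively(ZooKeeper zk, String path)
          throws KeeperException, InterruptedException {
        List<String> children = zk.getChildren(path, false);
        for (String child : children) {
          deleteRecursively(zk, path + "/" + child);
        }
        zk.delete(path, -1);  // -1 means "any version"
      }

      public static void main(String[] args) throws Exception {
        // args[0]: ZK quorum (from the Giraph conf or -Dgiraph.XYZ options)
        // args[1]: root of the stale job tree to remove
        ZooKeeper zk = new ZooKeeper(args[0], 30000, new Watcher() {
          @Override
          public void process(WatchedEvent event) { /* no-op */ }
        });
        try {
          deleteRecursively(zk, args[1]);
        } finally {
          zk.close();
        }
      }
    }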



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (GIRAPH-312) Giraph needs an admin script

2012-08-22 Thread Eli Reisman (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Reisman updated GIRAPH-312:
---

Attachment: GIRAPH-312-1.patch

> Giraph needs an admin script
> 
>
> Key: GIRAPH-312
> URL: https://issues.apache.org/jira/browse/GIRAPH-312
> Project: Giraph
>  Issue Type: New Feature
>  Components: conf and scripts, zookeeper
>Affects Versions: 0.2.0
>Reporter: Eli Reisman
>Assignee: Eli Reisman
>Priority: Minor
> Fix For: 0.2.0
>
> Attachments: GIRAPH-312-1.patch
>
>
> Our zookeeper instances have very long uptimes on our cluster, and failed job 
> trees are never cleaned from memory. There is a separate shell script to do 
> this, but it's not picky about which node trees it erases, and on some systems 
> some Giraph users may not have access to it.
> This patch will add a shell script to activate a new class which will use 
> Giraph conf file options or our normal -Dgiraph.XYZ command-line opts to get 
> the ZK quorum info, and clean out the remnants of old failed and killed jobs 
> from ZooKeeper's memory. They do pile up over time.
> This led to the larger idea that Giraph needs a general giraph-admin shell 
> script as a home for stuff like this. Jakob suggested it would be a good idea 
> to put this into such a script since then admin groups can be created so that 
> not every Giraph client can run it. This script currently only has code to 
> start up the zk cleaner, but more options can be added to it as JIRAs for new 
> features crop up.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira