[jira] [Commented] (GIRAPH-11) Improve the graph distribution of Giraph

2011-09-09 Thread Severin Corsten (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-11?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13101590#comment-13101590
 ] 

Severin Corsten commented on GIRAPH-11:
---

What hashing algorithm do you want to use for the hash partitioning? Just a 
hash of the vertex id, something more complex, or is it the user's choice?

How do you want to solve the messaging between the vertices when using hash 
partitioning? Will you store a Map of Vertex -> Worker, or provide the hash 
algorithm to the workers so that they can identify the destination worker by 
themselves?
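
The two options in the question above can be sketched as follows. This is only an illustration, not Giraph code: the class name, the method names, and the use of long vertex ids and integer worker numbers are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: two ways a worker could find the owner of a vertex. */
final class DestinationLookupSketch {

    /** Option 1: every worker holds an explicit vertex-to-worker table. */
    static int ownerFromTable(Map<Long, Integer> vertexToWorker, long vertexId) {
        Integer worker = vertexToWorker.get(vertexId);
        if (worker == null) {
            throw new IllegalArgumentException("Unknown vertex id: " + vertexId);
        }
        return worker;
    }

    /** Option 2: every worker applies the same hash function; no table is stored. */
    static int ownerFromHash(long vertexId, int numWorkers) {
        // floorMod keeps the result in [0, numWorkers) even for negative hash codes.
        return Math.floorMod(Long.hashCode(vertexId), numWorkers);
    }

    public static void main(String[] args) {
        Map<Long, Integer> table = new HashMap<>();
        table.put(42L, 1);
        System.out.println(ownerFromTable(table, 42L));
        System.out.println(ownerFromHash(42L, 4));
    }
}
```

The table costs memory proportional to the number of vertices on every worker but allows arbitrary placement; the shared function costs nothing to store but fixes the placement to whatever the function computes.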

> Improve the graph distribution of Giraph
> 
>
> Key: GIRAPH-11
> URL: https://issues.apache.org/jira/browse/GIRAPH-11
> Project: Giraph
> Issue Type: Improvement
> Reporter: Avery Ching
> Assignee: Avery Ching
>
> Currently, Giraph assumes that the data from the VertexInputFormat is sorted. 
>  If the user data is not sorted by the vertex id, they must first run a 
> MapReduce or Pig job to generate a sorted dataset.  This is often a bit 
> inconvenient.
> Giraph graph partitioning is currently range based and there are some 
> advantages and disadvantages of this approach.  The proposal of this JIRA 
> would be to allow for both range and hash based partitioning and provide more 
> flexibility to the user.
> Design goals for the graph distribution:
> * Allow vertices to be ordered or unordered
> * Ability to repartition
> * Select the partitioning scheme based on user needs (i.e. hash or range 
> based)
> * Ability to provide user-specific hints about partitions
> Hash-based partitioning
> * Good vertex balancing across ranges for random data
> * Bad at vertex id locality
> Range-based partitioning
> * Good at vertex id locality
> * Ability to split ranges easily
> * Can cause hotspots for hot ranges

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (GIRAPH-11) Improve the graph distribution of Giraph

2011-09-09 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-11?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13101593#comment-13101593
 ] 

Avery Ching commented on GIRAPH-11:
---

The hash partitioning will be based on hashCode() by default, but the user can 
implement something else based on the vertex id if they like.  I am designing 
it to support both hash-based and hash-range-based partitioning.  In a pure 
hash-based distribution, you should get great load balancing.  In a 
hash-range-based distribution, the user could possibly get some locality 
benefits without changing anything from the hash-based partitioning.  Then 
finally, there should be a way for the user to do a pure range-based split of 
the id space, but this requires the most work by the user to specify their 
division of the id space (depends on the type).

The hash based and hash-range based schemes will be implemented by default and 
will be selectable by users.  The range based scheme will be a partial 
implementation since we require users to do the id range partitioning.  
Additionally, we will provide the API for users to implement their own graph 
partitioning scheme.

Let me know what you think.
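
As a rough illustration of the pure range-based case described above, where the user has to supply the division of the id space, here is a minimal sketch. It assumes long vertex ids and integer partition numbers; the class and method names are hypothetical and are not part of any Giraph API.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of a user-specified range split of a long id space. */
final class RangeSplitSketch {
    // Maps the inclusive lower bound of each range to its partition number.
    private final TreeMap<Long, Integer> lowerBoundToPartition = new TreeMap<>();

    /** Registers the range that starts at lowerBound (inclusive). */
    void addRange(long lowerBound, int partition) {
        lowerBoundToPartition.put(lowerBound, partition);
    }

    /** Finds the range with the greatest lower bound that is <= vertexId. */
    int partitionFor(long vertexId) {
        Map.Entry<Long, Integer> entry = lowerBoundToPartition.floorEntry(vertexId);
        if (entry == null) {
            throw new IllegalArgumentException("No range covers vertex id " + vertexId);
        }
        return entry.getValue();
    }

    public static void main(String[] args) {
        RangeSplitSketch split = new RangeSplitSketch();
        split.addRange(0L, 0);       // ids 0..999
        split.addRange(1_000L, 1);   // ids 1000..49999
        split.addRange(50_000L, 2);  // ids 50000 and up
        System.out.println(split.partitionFor(999L));
        System.out.println(split.partitionFor(1_000L));
    }
}
```

Splitting a hot range in this sketch amounts to adding one more lower bound, which matches the "ability to split ranges easily" point in the issue description.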





[jira] [Commented] (GIRAPH-27) Mutable static global state in Vertex.java should be refactored

2011-09-09 Thread Jakob Homan (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-27?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13101602#comment-13101602
 ] 

Jakob Homan commented on GIRAPH-27:
---

+1. reviewboard on an iPad on hotel wifi sucks.

> Mutable static global state in Vertex.java should be refactored
> ---
>
> Key: GIRAPH-27
> URL: https://issues.apache.org/jira/browse/GIRAPH-27
> Project: Giraph
> Issue Type: Improvement
> Components: graph
> Affects Versions: 0.70.0
> Reporter: Jake Mannix
> Assignee: Jake Mannix
> Attachments: GIRAPH-27.patch, GIRAPH-27.patch
>
>
> Vertex.java has a bunch of static methods for getting/setting global graph 
> state (total number of vertices, edges, a reference to the GraphMapper, etc). 
> This should be refactored into a GraphState object, to which every Vertex can 
> hold a reference (yes, a tiny bit more memory per Vertex, but in comparison 
> to what's already in there...)
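
The shape of the refactoring proposed in the description can be sketched roughly as below. This is not the committed patch: the class names, fields, and accessors are hypothetical and only stand in for the state the description mentions (vertex and edge totals).

```java
/** Hypothetical holder for job-wide state that was previously static. */
final class GraphStateSketch {
    private long numVertices;
    private long numEdges;

    long getNumVertices() { return numVertices; }
    long getNumEdges() { return numEdges; }

    void setTotals(long numVertices, long numEdges) {
        this.numVertices = numVertices;
        this.numEdges = numEdges;
    }
}

/** Hypothetical vertex that reads global state through an injected reference. */
final class VertexSketch {
    // One extra reference per vertex replaces the static fields.
    private final GraphStateSketch graphState;

    VertexSketch(GraphStateSketch graphState) {
        this.graphState = graphState;
    }

    long getNumVertices() { return graphState.getNumVertices(); }
    long getNumEdges() { return graphState.getNumEdges(); }
}
```

With the state held in an object, two graphs in the same JVM (for example, two unit tests) no longer overwrite each other's totals, which is the usual motivation for removing mutable statics.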





[jira] [Commented] (GIRAPH-11) Improve the graph distribution of Giraph

2011-09-09 Thread Severin Corsten (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-11?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13101682#comment-13101682
 ] 

Severin Corsten commented on GIRAPH-11:
---

Can you clarify the difference between hash based and hash range based? Is the 
difference just the partition operator, which is modulo for hash based and a 
range function for hash range based?

I like the idea of using the hashCode() function and giving the user control 
over the algorithm used. I think that better locality leads to better 
performance, because messages between co-located vertices don't need to be 
sent over the network and no lookups have to be performed.





[jira] [Resolved] (GIRAPH-25) NPE in BspServiceMaster when failing a job

2011-09-09 Thread Avery Ching (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-25?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Avery Ching resolved GIRAPH-25.
---

Resolution: Fixed

Not sure if I am supposed to close this issue, or the reporter should, but I'll 
close it since it's been committed.  Please reopen if there is an issue.

> NPE in BspServiceMaster when failing a job
> --
>
> Key: GIRAPH-25
> URL: https://issues.apache.org/jira/browse/GIRAPH-25
> Project: Giraph
> Issue Type: Bug
> Reporter: Dmitriy V. Ryaboy
> Assignee: Dmitriy V. Ryaboy
> Priority: Minor
> Attachments: GIRAPH-25.2.patch, GIRAPH-25.patch
>
>
> When BspServiceMaster times out waiting for all workers to check in, it dies 
> with a NullPointerException.
> This can perhaps be handled a bit more gracefully.
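
The thread does not say where the NullPointerException originates, so the following is only a generic sketch of "handling it more gracefully": it assumes, purely for illustration, that a wait call yields null when the timeout expires, and it replaces the null dereference with a descriptive failure. All names here are hypothetical and are not taken from BspServiceMaster.

```java
import java.util.List;

/** Hypothetical sketch: fail with a clear message instead of an NPE. */
final class WorkerCheckInSketch {

    /** Stand-in for a wait that returns null when the timeout expires. */
    static List<String> waitForWorkers(List<String> checkedIn, int required) {
        return (checkedIn != null && checkedIn.size() >= required) ? checkedIn : null;
    }

    /** Checks the result before use and reports how many workers were seen. */
    static List<String> requireWorkers(List<String> checkedIn, int required) {
        List<String> workers = waitForWorkers(checkedIn, required);
        if (workers == null) {
            int seen = (checkedIn == null) ? 0 : checkedIn.size();
            throw new IllegalStateException(
                "Only " + seen + " of " + required
                    + " workers checked in before the timeout");
        }
        return workers;
    }
}
```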





[jira] [Commented] (GIRAPH-11) Improve the graph distribution of Giraph

2011-09-09 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-11?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13101710#comment-13101710
 ] 

Avery Ching commented on GIRAPH-11:
---

Regarding the difference between hash based and hash range based, it refers to 
how the hash code is assigned to a partition.  The application dev will 
implement hashCode() for their vertex id and then the assignment of the 
hashCode() to a partition can be hashed (i.e. hashCode() % # partitions) or 
range based ([0, a), [a, b), ... etc).  Hope that's more clear.  Code will 
help.  It's coming soon, by mid next week I hope.
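
Until the real code is posted, the two assignments described above can be sketched like this. It is an illustration under assumptions, not the planned implementation: the names are hypothetical, and the hash-range variant assumes equal-width intervals over the whole int hash space, whereas the comment leaves the boundaries (a, b, ...) open.

```java
/** Hypothetical sketch of the two ways to map a hash code to a partition. */
final class HashAssignmentSketch {

    /** Hash based: hashCode() % number of partitions, kept non-negative. */
    static int hashPartition(Object vertexId, int numPartitions) {
        return Math.floorMod(vertexId.hashCode(), numPartitions);
    }

    /**
     * Hash-range based: split the int hash space into numPartitions
     * contiguous, equal-width intervals and return the interval index.
     */
    static int hashRangePartition(Object vertexId, int numPartitions) {
        // Shift the hash into [0, 2^32) so the intervals start at zero.
        long shifted = (long) vertexId.hashCode() - Integer.MIN_VALUE;
        long span = 1L << 32;
        return (int) (shifted * numPartitions / span);
    }

    public static void main(String[] args) {
        // Integer ids 5 and 6 have adjacent hash codes (5 and 6).
        System.out.println(hashPartition(5, 4) + " " + hashPartition(6, 4));
        System.out.println(hashRangePartition(5, 4) + " " + hashRangePartition(6, 4));
    }
}
```

With the modulo assignment, ids whose hash codes are adjacent land in different partitions; with the range assignment they land in the same one, which is where the possible locality benefit comes from when hashCode() preserves the order of the ids.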





[jira] [Commented] (GIRAPH-25) NPE in BspServiceMaster when failing a job

2011-09-09 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-25?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13101711#comment-13101711
 ] 

Dmitriy V. Ryaboy commented on GIRAPH-25:
-

I think the committer usually resolves the issue.

Thanks for taking the patch! I'm going to try and break Giraph in a few more 
ways this weekend :-)





[jira] [Commented] (GIRAPH-27) Mutable static global state in Vertex.java should be refactored

2011-09-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-27?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13101950#comment-13101950
 ] 

Hudson commented on GIRAPH-27:
--

Integrated in Giraph-trunk-Commit #3 (See 
[https://builds.apache.org/job/Giraph-trunk-Commit/3/])
GIRAPH-27: Mutable static global state in Vertex.java should be
refactored. jake.mannix via aching.

aching : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1167420
Files : 
* /incubator/giraph/trunk/CHANGELOG
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/bsp/CentralizedService.java
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BasicVertex.java
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspService.java
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspUtils.java
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GraphMapper.java
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GraphState.java
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/MutableVertex.java
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/Vertex.java
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/VertexRange.java
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/VertexResolver.java
* /incubator/giraph/trunk/src/test/java/org/apache/giraph/TestBspBasic.java






[jira] [Commented] (GIRAPH-25) NPE in BspServiceMaster when failing a job

2011-09-09 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-25?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13101962#comment-13101962
 ] 

Avery Ching commented on GIRAPH-25:
---

Thanks for the advice.  I'll be doing the same this weekend =).
