Re: Review Request: Improve concurrency of putMsg / putMsgList

2012-04-24 Thread Claudio Martella

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4852/#review7169
---


Looks good to me. It wouldn't hurt to see a stress test to check performance, 
although I wouldn't expect this to be slower than the synchronized block. Also, 
I agree that moving the messages directly from inMessages to the Vertex 
could be a memory win.

- Claudio


On 2012-04-24 06:11:38, Bo Wang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/4852/
> ---
> 
> (Updated 2012-04-24 06:11:38)
> 
> 
> Review request for giraph.
> 
> 
> Summary
> ---
> 
> Use ConcurrentHashMap and ConcurrentLinkedQueue to allow concurrent access to 
> the message map. The ConcurrentHashMap uses the default concurrencyLevel. 
> Tuning this value may yield some performance gain.
> 
> 
> This addresses bug GIRAPH-185.
> https://issues.apache.org/jira/browse/GIRAPH-185
> 
> 
> Diffs
> -
> 
>   
> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java
>  1328747 
> 
> Diff: https://reviews.apache.org/r/4852/diff
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Bo
> 
>
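
A minimal sketch of the approach the summary describes, not the actual patch: a ConcurrentHashMap keyed by vertex id, with one ConcurrentLinkedQueue per vertex, so that putMsg never locks the whole map. The class names ConcurrentMessageStore and ConcurrentMessageStoreDemo, the methods sendAll and messageCount, and the String message type are assumptions made for illustration. The third constructor argument is the concurrencyLevel the summary mentions (16 was the default at the time; newer JDKs treat it only as a sizing hint).

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ConcurrentMap;

// Hypothetical stand-in for the message store; names are not taken from the patch.
class ConcurrentMessageStore<I, M> {
    // Constructor arguments: initialCapacity, loadFactor, concurrencyLevel.
    private final ConcurrentMap<I, Queue<M>> transientInMessages =
        new ConcurrentHashMap<I, Queue<M>>(16, 0.75f, 16);

    /** Adds one message for a vertex without taking a lock on the whole map. */
    void putMsg(I vertexId, M msg) {
        Queue<M> queue = transientInMessages.get(vertexId);
        if (queue == null) {
            Queue<M> fresh = new ConcurrentLinkedQueue<M>();
            // putIfAbsent returns the existing queue if another thread got there first.
            Queue<M> existing = transientInMessages.putIfAbsent(vertexId, fresh);
            queue = (existing != null) ? existing : fresh;
        }
        queue.add(msg);
    }

    /** Counts the messages queued for one vertex (traverses the queue). */
    int messageCount(I vertexId) {
        Queue<M> queue = transientInMessages.get(vertexId);
        return queue == null ? 0 : queue.size();
    }
}

class ConcurrentMessageStoreDemo {
    /** Starts the given number of threads, each sending perThread messages; returns the total stored. */
    static int sendAll(int threads, final int perThread) {
        final ConcurrentMessageStore<Integer, String> store =
            new ConcurrentMessageStore<Integer, String>();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(new Runnable() {
                public void run() {
                    for (int i = 0; i < perThread; i++) {
                        store.putMsg(i % 10, "m" + i); // spread messages over 10 vertex ids
                    }
                }
            });
            workers[t].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        int total = 0;
        for (int v = 0; v < 10; v++) {
            total += store.messageCount(v);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sendAll(4, 1000)); // no message is lost: prints 4000
    }
}
```

The get-then-putIfAbsent pattern avoids allocating a new queue on every call while still being safe when two threads race to create the queue for the same vertex.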



[jira] [Commented] (GIRAPH-185) Improve concurrency of putMsg / putMsgList

2012-04-24 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13260416#comment-13260416
 ] 

jirapos...@reviews.apache.org commented on GIRAPH-185:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4852/#review7169
---


Looks good to me. It wouldn't hurt to see a stress test to check performance, 
although I wouldn't expect this to be slower than the synchronized block. Also, 
I agree that moving the messages directly from inMessages to the Vertex 
could be a memory win.

- Claudio





> Improve concurrency of putMsg / putMsgList
> --
>
> Key: GIRAPH-185
> URL: https://issues.apache.org/jira/browse/GIRAPH-185
> Project: Giraph
> Issue Type: Improvement
> Components: graph
> Affects Versions: 0.2.0
> Reporter: Bo Wang
> Assignee: Bo Wang
> Fix For: 0.2.0
>
> Attachments: GIRAPH-185.patch, GIRAPH-185.patch
>
> Original Estimate: 2h
> Remaining Estimate: 2h
>
> Currently in putMsg / putMsgList, a synchronized block is used to protect 
> the whole transientInMessages when adding a new message. This lock blocks 
> other concurrent calls to putMsg / putMsgList and increases the response time. 
> We should use fine-grained locks to allow high concurrency in message 
> communication.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (GIRAPH-170) Workflow for loading RDF graph data into Giraph

2012-04-24 Thread Benjamin Heitmann (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13260512#comment-13260512
 ] 

Benjamin Heitmann commented on GIRAPH-170:
--

Thanks guys for your comments!

Paolo: I will take a look at Jena RIOT for inferencing. 

Sebastian: I did not know that it is possible to assign one mapper to each core 
in Hadoop; I will try that for sure. Also, my algorithm only uses part of 
the graph when it runs, so that might be the simplest explanation for the 
observed behavior. 

Claudio: Thanks for the suggestion. I will investigate the issue further and 
provide an update when I know what's going on. 

> Workflow for loading RDF graph data into Giraph
> ---
>
> Key: GIRAPH-170
> URL: https://issues.apache.org/jira/browse/GIRAPH-170
> Project: Giraph
> Issue Type: New Feature
> Reporter: Dan Brickley
> Priority: Minor
>
> W3C RDF provides a family of Web standards for exchanging graph-based data. 
> RDF uses sets of simple binary relationships, labeling nodes and links with 
> Web identifiers (URIs). Many public datasets are available as RDF, including 
> the "Linked Data" cloud (see http://richard.cyganiak.de/2007/10/lod/ ). Many 
> such datasets are listed at http://thedatahub.org/
> RDF has several standard exchange syntaxes. The oldest is RDF/XML. A simple 
> line-oriented format is N-Triples. A format aligned with RDF's SPARQL query 
> language is Turtle. Apache Jena and Any23 provide software to handle all 
> these; http://incubator.apache.org/jena/ http://incubator.apache.org/any23/
> This JIRA leaves open the strategy for loading RDF data into Giraph. There 
> are various possibilities, including exploitation of intermediate 
> Hadoop-friendly stores, or pre-processing with e.g. Pig-based tools into a 
> more Giraph-friendly form, or writing custom loaders. Even a HOWTO document 
> or implementor notes here would be an advance on the current state of the 
> art. The BluePrints Graph API (Gremlin etc.) has also been aligned with 
> various RDF datasources.
> Related topics: multigraphs https://issues.apache.org/jira/browse/GIRAPH-141 
> touches on the issue (since we can't currently easily represent fully general 
> RDF graphs since two nodes might be connected by more than one typed edge). 
> Even without multigraphs it ought to be possible to bring RDF-sourced data
> into Giraph, e.g. perhaps some app is only interested in say the Movies + 
> People subset of a big RDF collection.
> From Avery in email: "a helper VertexInputFormat (and maybe 
> VertexOutputFormat) would certainly [despite GIRAPH-141] still help"





[jira] [Commented] (GIRAPH-170) Workflow for loading RDF graph data into Giraph

2012-04-24 Thread Benjamin Heitmann (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13260517#comment-13260517
 ] 

Benjamin Heitmann commented on GIRAPH-170:
--

In addition, just for the record: 

There is actually a Jira issue about enabling the TextInputFormat class to 
modify a vertex which it has already created. (That was pointed out in an email 
on the list.) 

That jira issue is here: https://issues.apache.org/jira/browse/GIRAPH-155
"Allow creation of graph by adding edges that span multiple workers"






[jira] [Commented] (GIRAPH-185) Improve concurrency of putMsg / putMsgList

2012-04-24 Thread Bo Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13260749#comment-13260749
 ] 

Bo Wang commented on GIRAPH-185:


Thanks for looking at this, Claudio. I may do some perf tests and post the 
results. I also found that inMessages and the vertex message list can be merged 
to save memory and time. I will create a JIRA issue and work on it.





[jira] [Created] (GIRAPH-188) Merge inMessages and vertex message list

2012-04-24 Thread Bo Wang (JIRA)
Bo Wang created GIRAPH-188:
--

 Summary: Merge inMessages and vertex message list
 Key: GIRAPH-188
 URL: https://issues.apache.org/jira/browse/GIRAPH-188
 Project: Giraph
  Issue Type: Improvement
  Components: graph
Affects Versions: 0.2.0
Reporter: Bo Wang
Assignee: Bo Wang
 Fix For: 0.2.0


Currently, received messages are first stored in transientInMessages, then 
merged and moved to inMessages, and finally copied to the vertex message 
list. The copy from inMessages to the vertex message list can be avoided to save 
time and space.
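
One possible way to avoid that last copy, sketched under assumptions: the vertex adopts the list that already holds its messages instead of copying the elements into a list of its own. SketchVertex, MessageHandOff, copyMessages, adoptMessages and deliver are hypothetical names for illustration only; they are not Giraph classes or methods, and the eventual GIRAPH-188 change may look quite different.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified vertex; Giraph's real vertex classes differ.
class SketchVertex {
    private List<String> msgList = new ArrayList<String>();

    // Copying hand-off: every message reference is written a second time.
    void copyMessages(List<String> incoming) {
        msgList.clear();
        msgList.addAll(incoming);
    }

    // Merged hand-off: the vertex adopts the list that already holds the messages.
    void adoptMessages(List<String> incoming) {
        msgList = incoming;
    }

    List<String> getMessages() {
        return msgList;
    }
}

class MessageHandOff {
    // Moves each pending list out of inMessages and into its vertex without copying elements.
    static void deliver(Map<Long, List<String>> inMessages, Map<Long, SketchVertex> vertices) {
        for (Map.Entry<Long, SketchVertex> entry : vertices.entrySet()) {
            List<String> pending = inMessages.remove(entry.getKey());
            if (pending != null) {
                entry.getValue().adoptMessages(pending);
            }
        }
    }

    public static void main(String[] args) {
        Map<Long, List<String>> inMessages = new HashMap<Long, List<String>>();
        List<String> pending = new ArrayList<String>();
        pending.add("hello");
        inMessages.put(1L, pending);

        Map<Long, SketchVertex> vertices = new HashMap<Long, SketchVertex>();
        vertices.put(1L, new SketchVertex());

        deliver(inMessages, vertices);
        // The vertex now holds the very same list instance; nothing was copied.
        System.out.println(vertices.get(1L).getMessages() == pending);
    }
}
```

The trade-off is ownership: once a list is handed over, nothing else may keep writing to it, which is why the sketch removes the entry from inMessages in the same step.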





Re: Review Request: Improve concurrency of putMsg / putMsgList

2012-04-24 Thread Avery Ching

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4852/#review7185
---



http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java


Bo, I'm a little leery about converting the List and ArrayList to 
LinkedList and ConcurrentLinkedQueue.  I believe that linked lists will use 
more memory than the array list due to the double links (forward and backward). 
 Also, is ConcurrentLinkedQueue supposed to outperform a synchronized 
ArrayList?  I haven't seen much on that.

The ConcurrentHashMap changes look good.


- Avery









Re: Review Request: Improve concurrency of putMsg / putMsgList

2012-04-24 Thread Bo Wang


> On 2012-04-24 20:53:33, Avery Ching wrote:
> > http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java,
> >  lines 776-777
> > 
> >
> > Bo, I'm a little leery about converting the List and ArrayList to 
> > LinkedList and ConcurrentLinkedQueue.  I believe that linked lists will use 
> > more memory than the array list due to the double links (forward and 
> > backward).  Also, is ConcurrentLinkedQueue supposed to outperform a 
> > synchronized ArrayList?  I haven't seen much on that.
> > 
> > The ConcurrentHashMap changes look good.

Avery, thanks for the comments. I just measured the sizes of these classes; 
the estimates are below. 

java.util.ArrayList: 149 bytes
java.util.LinkedList: 101 bytes
java.util.concurrent.ConcurrentLinkedQueue: 118 bytes

The tool I used is a program from the link below.
http://www.javapractices.com/topic/TopicAction.do?Id=83

In terms of performance, here is a benchmark.
http://www.javacodegeeks.com/2010/09/java-best-practices-queue-battle-and.html

In its test #1 (adding elements), ConcurrentLinkedQueue performed slightly 
better than LinkedList. In test #3 (iteration), LinkedList outperformed 
ConcurrentLinkedQueue. I think the most time-consuming part is add; 
iteration is also heavily used, but without concurrent access. 


- Bo









Re: Review Request: Improve concurrency of putMsg / putMsgList

2012-04-24 Thread Avery Ching


> On 2012-04-24 20:53:33, Avery Ching wrote:
> > http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java,
> >  lines 776-777
> > 
> >
> > Bo, I'm a little leery about converting the List and ArrayList to 
> > LinkedList and ConcurrentLinkedQueue.  I believe that linked lists will use 
> > more memory than the array list due to the double links (forward and 
> > backward).  Also, is ConcurrentLinkedQueue supposed to outperform a 
> > synchronized ArrayList?  I haven't seen much on that.
> > 
> > The ConcurrentHashMap changes look good.
> 
> Bo Wang wrote:
> Avery, thanks for the comments. I just measured the sizes of these 
> classes; the estimates are below. 
> 
> java.util.ArrayList: 149 bytes
> java.util.LinkedList: 101 bytes
> java.util.concurrent.ConcurrentLinkedQueue: 118 bytes
> 
> The tool I used is a program from the link below.
> http://www.javapractices.com/topic/TopicAction.do?Id=83
> 
> In terms of performance, here is a benchmark.
> 
> http://www.javacodegeeks.com/2010/09/java-best-practices-queue-battle-and.html
> 
> In its test #1 (adding elements), ConcurrentLinkedQueue performed slightly 
> better than LinkedList. In test #3 (iteration), LinkedList outperformed 
> ConcurrentLinkedQueue. I think the most time-consuming part is add; 
> iteration is also heavily used, but without concurrent access. 
> 
>

Thanks for the response, Bo.

I assume those numbers are for the empty data structures.  I was referring 
to the incremental cost of adding elements (messages) to the data structures.  
Performance isn't a concern to me (unless we call size() somewhere).


- Avery
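
On the size() caveat: the ConcurrentLinkedQueue Javadoc notes that size() is not a constant-time operation, because counting the elements requires traversing the queue. If a count were ever needed on a hot path, one option is to keep a separate counter next to the queue. The sketch below is only an illustration; CountedQueue is a hypothetical wrapper, not something from the patch.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical wrapper that keeps a running count next to the queue.
class CountedQueue<M> {
    private final ConcurrentLinkedQueue<M> queue = new ConcurrentLinkedQueue<M>();
    private final AtomicInteger count = new AtomicInteger();

    void add(M msg) {
        queue.add(msg);
        count.incrementAndGet();
    }

    M poll() {
        M msg = queue.poll();
        if (msg != null) {
            count.decrementAndGet();
        }
        return msg;
    }

    // Constant time; may lag briefly behind the queue while other threads are updating it.
    int size() {
        return count.get();
    }

    // isEmpty() on the underlying queue does not need a full traversal.
    boolean isEmpty() {
        return queue.isEmpty();
    }

    public static void main(String[] args) {
        CountedQueue<String> q = new CountedQueue<String>();
        q.add("a");
        q.add("b");
        q.poll();
        System.out.println(q.size()); // prints 1
    }
}
```

When only an emptiness check is needed, calling isEmpty() rather than comparing size() to zero sidesteps the traversal entirely.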









Re: Review Request: Improve concurrency of putMsg / putMsgList

2012-04-24 Thread Avery Ching


> On 2012-04-24 20:53:33, Avery Ching wrote:
> > http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java,
> >  lines 776-777
> > 
> >
> > Bo, I'm a little leery about converting the List and ArrayList to 
> > LinkedList and ConcurrentLinkedQueue.  I believe that linked lists will use 
> > more memory than the array list due to the double links (forward and 
> > backward).  Also, is ConcurrentLinkedQueue supposed to outperform a 
> > synchronized ArrayList?  I haven't seen much on that.
> > 
> > The ConcurrentHashMap changes look good.
> 
> Bo Wang wrote:
> Avery, thanks for the comments. I just measured the sizes of these 
> classes; the estimates are below. 
> 
> java.util.ArrayList: 149 bytes
> java.util.LinkedList: 101 bytes
> java.util.concurrent.ConcurrentLinkedQueue: 118 bytes
> 
> The tool I used is a program from the link below.
> http://www.javapractices.com/topic/TopicAction.do?Id=83
> 
> In terms of performance, here is a benchmark.
> 
> http://www.javacodegeeks.com/2010/09/java-best-practices-queue-battle-and.html
> 
> In its test #1 (adding elements), ConcurrentLinkedQueue performed slightly 
> better than LinkedList. In test #3 (iteration), LinkedList outperformed 
> ConcurrentLinkedQueue. I think the most time-consuming part is add; 
> iteration is also heavily used, but without concurrent access. 
> 
>
> 
> Avery Ching wrote:
> Thanks for the response, Bo.
> 
> I assume those numbers are for the empty data structures.  I was 
> referring to the incremental cost of adding elements (messages) to the data 
> structures.  Performance isn't a concern to me (unless we call size() 
> somewhere).

By the incremental cost, I mean the memory cost, sorry.


- Avery
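
A back-of-the-envelope version of that incremental memory cost, under loudly stated assumptions: a 64-bit HotSpot JVM with compressed references, so a 12-byte object header, 4-byte references and 8-byte object alignment. None of these figures come from the thread, and other JVMs or settings will give different numbers. An ArrayList pays one reference slot per element (plus whatever spare capacity its backing array holds), a LinkedList node carries three references (element, next, previous), and a ConcurrentLinkedQueue node carries two (item, next). PerElementCost is a hypothetical helper name.

```java
// Rough per-element overhead, excluding the message object itself.
class PerElementCost {
    static final int HEADER_BYTES = 12; // assumed: object header with compressed class pointers
    static final int REF_BYTES = 4;     // assumed: compressed references
    static final int ALIGN_BYTES = 8;   // assumed: default object alignment

    // Rounds an object size up to the next multiple of the alignment.
    static int align(int bytes) {
        return (bytes + ALIGN_BYTES - 1) / ALIGN_BYTES * ALIGN_BYTES;
    }

    // Size of a node object that holds only reference fields.
    static int nodeBytes(int referenceFields) {
        return align(HEADER_BYTES + referenceFields * REF_BYTES);
    }

    public static void main(String[] args) {
        System.out.println("ArrayList slot:             " + REF_BYTES + " bytes");
        System.out.println("LinkedList node:            " + nodeBytes(3) + " bytes");
        System.out.println("ConcurrentLinkedQueue node: " + nodeBytes(2) + " bytes");
    }
}
```

Under these assumptions both linked structures cost 24 bytes per message against 4 bytes for an array slot, which is the per-element difference being asked about; measuring on the target JVM would be needed to confirm it.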


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4852/#review7185
---


On 2012-04-24 06:11:38, Bo Wang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/4852/
> ---
> 
> (Updated 2012-04-24 06:11:38)
> 
> 
> Review request for giraph.
> 
> 
> Summary
> ---
> 
> Use ConcurrentHashMap and ConcurrentLinkedQueue to allow concurrent access to 
> the message map. The concurrencyLevel of ConcurrentHashMap uses the default 
> value. There may be some performance gain by tuning this value.
> 
> 
> This addresses bug GIRAPH-185.
> https://issues.apache.org/jira/browse/GIRAPH-185
> 
> 
> Diffs
> -
> 
>   
> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java
>  1328747 
> 
> Diff: https://reviews.apache.org/r/4852/diff
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Bo
> 
>



[jira] [Commented] (GIRAPH-185) Improve concurrency of putMsg / putMsgList

2012-04-24 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13261054#comment-13261054
 ] 

jirapos...@reviews.apache.org commented on GIRAPH-185:
--



bq.  On 2012-04-24 20:53:33, Avery Ching wrote:
bq.  > 
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java,
 lines 776-777
bq.  > 
bq.  >
bq.  > Bo, I'm a little leery about converting the List and ArrayList to 
LinkedList and ConcurrentLinkedQueue.  I believe that linked lists will use 
more memory than the array list due to the double links (forward and backward). 
Also, is ConcurrentLinkedQueue supposed to outperform a synchronized 
ArrayList?  I haven't seen much on that.
bq.  > 
bq.  > The ConcurrentHashMap changes look good.
bq.  
bq.  Bo Wang wrote:
bq.  Avery, thanks for the comments. I just measured the sizes of these 
classes; the estimates are below. 
bq.  
bq.  java.util.ArrayList: 149 bytes
bq.  java.util.LinkedList: 101 bytes
bq.  java.util.concurrent.ConcurrentLinkedQueue: 118 bytes
bq.  
bq.  The tool I was using is a program from the link below.
bq.  http://www.javapractices.com/topic/TopicAction.do?Id=83
bq.  
bq.  In terms of performance, here is a benchmark.
bq.  
http://www.javacodegeeks.com/2010/09/java-best-practices-queue-battle-and.html
bq.  
bq.  In its test #1 (adding elements), ConcurrentLinkedQueue performed 
slightly better than LinkedList. In test #3 (iterator), LinkedList outperformed 
ConcurrentLinkedQueue. I think the most time-consuming part is add; 
iteration is also heavily used, but it involves no concurrent access. 
bq.  
bq.
bq.  
bq.  Avery Ching wrote:
bq.  Thanks for the response Bo.
bq.  
bq.  Those numbers are for the empty data structures, I'm assuming.  I was 
referring to the incremental cost of adding elements (messages) to the data 
structures.  The performance isn't a concern to me (unless we call size() 
somewhere).

By the incremental cost, I mean the memory cost, sorry.
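
On the size() caveat in the quoted reply: ConcurrentLinkedQueue.size() traverses the whole queue, so it is not a constant-time call. If a count were ever needed on a hot path, one option would be to keep a separate counter next to the queue. The sketch below is only an illustration of that idea; CountedQueue and its method names are hypothetical and are not part of the patch under review.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical wrapper: tracks the element count separately so callers
// never need the O(n) ConcurrentLinkedQueue.size().
class CountedQueue<M> {
  private final Queue<M> queue = new ConcurrentLinkedQueue<M>();
  private final AtomicInteger count = new AtomicInteger();

  /** Add a message and bump the counter. */
  void add(M msg) {
    queue.add(msg);
    count.incrementAndGet();
  }

  /** Remove the head message, if any, and lower the counter. */
  M poll() {
    M msg = queue.poll();
    if (msg != null) {
      count.decrementAndGet();
    }
    return msg;
  }

  /** Count as seen by the counter; may lag briefly behind concurrent adds. */
  int approximateSize() {
    return count.get();
  }
}
```

The counter and the queue are updated in two steps, so under concurrent writers the value is only approximate at any instant; it is exact once the writers have finished.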


- Avery


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4852/#review7185
---


On 2012-04-24 06:11:38, Bo Wang wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4852/
bq.  ---
bq.  
bq.  (Updated 2012-04-24 06:11:38)
bq.  
bq.  
bq.  Review request for giraph.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  Use ConcurrentHashMap and ConcurrentLinkedQueue to allow concurrent access 
to the message map. The concurrencyLevel of ConcurrentHashMap uses the default 
value. There may be some performance gain by tuning this value.
bq.  
bq.  
bq.  This addresses bug GIRAPH-185.
bq.  https://issues.apache.org/jira/browse/GIRAPH-185
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java
 1328747 
bq.  
bq.  Diff: https://reviews.apache.org/r/4852/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Bo
bq.  
bq.



> Improve concurrency of putMsg / putMsgList
> --
>
> Key: GIRAPH-185
> URL: https://issues.apache.org/jira/browse/GIRAPH-185
> Project: Giraph
> Issue Type: Improvement
> Components: graph
> Affects Versions: 0.2.0
> Reporter: Bo Wang
> Assignee: Bo Wang
> Fix For: 0.2.0
>
> Attachments: GIRAPH-185.patch, GIRAPH-185.patch
>
> Original Estimate: 2h
> Remaining Estimate: 2h
>
> Currently in putMsg / putMsgList, a synchronized block is used to protect 
> the whole transientInMessages when adding a new message. This lock blocks 
> other concurrent calls to putMsg / putMsgList and increases the response time. 
> We should use fine-grained locks to allow high concurrency in message 
> communication.
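
The idea in the description can be sketched roughly as follows. This is not the GIRAPH-185 patch itself: MessageStoreSketch, putMsg as written here, totalMessages and demo are hypothetical names for illustration, and only ConcurrentHashMap, ConcurrentLinkedQueue and putIfAbsent are real JDK APIs. Instead of one lock around the whole map, each vertex gets its own concurrent queue, created at most once with putIfAbsent.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ConcurrentMap;

// Hypothetical stand-in for the per-vertex incoming message map.
class MessageStoreSketch<I, M> {
  private final ConcurrentMap<I, Queue<M>> inMessages =
      new ConcurrentHashMap<I, Queue<M>>();

  /** Append a message for a vertex without locking the whole map. */
  void putMsg(I vertexId, M msg) {
    Queue<M> queue = inMessages.get(vertexId);
    if (queue == null) {
      Queue<M> fresh = new ConcurrentLinkedQueue<M>();
      // putIfAbsent returns the existing queue if another thread won the race.
      queue = inMessages.putIfAbsent(vertexId, fresh);
      if (queue == null) {
        queue = fresh;
      }
    }
    queue.add(msg);
  }

  /** Total number of stored messages (walks every queue). */
  int totalMessages() {
    int total = 0;
    for (Queue<M> queue : inMessages.values()) {
      total += queue.size();
    }
    return total;
  }

  /** Four writer threads, 1000 messages each, spread over ten vertex ids. */
  static int demo() {
    final MessageStoreSketch<Integer, String> store =
        new MessageStoreSketch<Integer, String>();
    Thread[] writers = new Thread[4];
    for (int t = 0; t < writers.length; t++) {
      writers[t] = new Thread(() -> {
        for (int i = 0; i < 1000; i++) {
          store.putMsg(i % 10, "msg");
        }
      });
      writers[t].start();
    }
    for (Thread writer : writers) {
      try {
        writer.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException(e);
      }
    }
    return store.totalMessages();
  }
}
```

Calling MessageStoreSketch.demo() should account for every message the writers added, since no add is lost even when two threads create the queue for the same vertex at once.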

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (GIRAPH-172) Javadoc for BasicVertex:compute link to compute is broken

2012-04-24 Thread Abhishek Srivastava (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Srivastava updated GIRAPH-172:
---

Attachment: GIRAPH-172.patch

Fixed the broken javadoc link in BasicVertex.

Testing Done: 
 - mvn compile
 - mvn test

> Javadoc for BasicVertex:compute link to compute is broken
> -
>
> Key: GIRAPH-172
> URL: https://issues.apache.org/jira/browse/GIRAPH-172
> Project: Giraph
> Issue Type: Bug
> Reporter: Jakob Homan
> Priority: Trivial
> Labels: newbie
> Attachments: GIRAPH-172.patch
>
>
> In BasicVertex the JavaDoc link to #compute can't be resolved:
> {code}
>   /**
>    * Release unnecessary resources (will be called after vertex returns from
>    * {@link #compute()})
>    */
>   abstract void releaseResources();
> {code}
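
For context, a {@link #compute()} reference only resolves if a zero-argument compute method exists; when the method takes parameters, the link has to name the parameter types. The sketch below assumes compute takes an Iterator of messages, which the report does not actually show, and VertexSketch is a hypothetical stand-in for BasicVertex. It is not necessarily what the attached patch does.

```java
import java.util.Iterator;

// Hypothetical stand-in for BasicVertex; the compute signature is assumed.
abstract class VertexSketch<M> {
  /**
   * Process the messages delivered to this vertex.
   *
   * @param msgIterator iterator over the incoming messages
   */
  abstract void compute(Iterator<M> msgIterator);

  /**
   * Release unnecessary resources (will be called after vertex returns from
   * {@link #compute(Iterator)})
   */
  abstract void releaseResources();
}
```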





[jira] [Updated] (GIRAPH-173) BspCase:getNumWorkers javadoc refers to non-existent parameter

2012-04-24 Thread Abhishek Srivastava (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Srivastava updated GIRAPH-173:
---

Attachment: GIRAPH-173.patch

Fixed broken comment for getNumWorkers.

Testing Done: 

 - mvn compile
 - mvn test

> BspCase:getNumWorkers javadoc refers to non-existent parameter
> --
>
> Key: GIRAPH-173
> URL: https://issues.apache.org/jira/browse/GIRAPH-173
> Project: Giraph
> Issue Type: Bug
> Reporter: Jakob Homan
> Priority: Trivial
> Labels: newbie
> Attachments: GIRAPH-173.patch
>
>
> {code}
>   /**
>    * Get the number of workers used in the BSP application
>    *
>    * @param numProcs number of processes to use
>    */
>   public int getNumWorkers() {
>     return numWorkers;
>   }
> {code}
> numProcs is a lie...
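
One plausible shape for the corrected comment is to drop the stale @param tag and document the return value instead. This is only a sketch: BspCaseSketch and its constructor are hypothetical, and the attached patch may word the Javadoc differently.

```java
// Hypothetical stand-in for BspCase, reduced to the getter in question.
class BspCaseSketch {
  private final int numWorkers;

  BspCaseSketch(int numWorkers) {
    this.numWorkers = numWorkers;
  }

  /**
   * Get the number of workers used in the BSP application
   *
   * @return number of workers to use
   */
  public int getNumWorkers() {
    return numWorkers;
  }
}
```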
