[jira] [Updated] (GIRAPH-12) Investigate communication improvements

2011-10-01 Thread Hyunsik Choi (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-12?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyunsik Choi updated GIRAPH-12:
---

Attachment: GIRAPH-12_3.patch

Avery,

Thank you for your comments. Yes, I agree that the aggregate wasted memory 
should be considered =)

In my latest comment, I investigated the actual thread stack usage with a 
Sleep class (https://gist.github.com/1249761). When I created 2000 Sleep 
threads with the default thread stack size set to 4096 KB ('-Xss4096k'), the 
memory usage of the process and all of its threads was only 46 megabytes.

So each individual thread appears to use much less stack memory than the 
default thread stack size. The default thread stack size determines the size 
of the virtual memory area reserved for each stack, not the resident memory 
size. The stack memory a thread actually uses seems to depend only on its 
local variables and nested function invocations.
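To reproduce an experiment like this, here is a minimal Java sketch. It is not the Sleep class from the gist; the class name `SleepThreads`, the latch-based parking, and reading resident memory with `ps` are assumptions. It starts N threads, each requesting a 4 MB stack through the `Thread(ThreadGroup, Runnable, String, long)` constructor (which the JVM may treat only as a hint), and keeps them parked so the process's resident memory can be inspected from outside. `ProcessHandle` requires Java 9 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

public class SleepThreads {
  /**
   * Starts n threads that each request stackSizeBytes of stack and then park
   * on the release latch. Returns once every thread is running.
   */
  static List<Thread> startSleepers(int n, long stackSizeBytes, CountDownLatch release)
      throws InterruptedException {
    CountDownLatch started = new CountDownLatch(n);
    List<Thread> threads = new ArrayList<>(n);
    for (int i = 0; i < n; i++) {
      Runnable sleeper = () -> {
        started.countDown();
        try {
          release.await(); // park; only a shallow call stack is touched
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      };
      // The stackSize argument is a hint; the JVM may round or ignore it.
      Thread t = new Thread(null, sleeper, "sleeper-" + i, stackSizeBytes);
      t.setDaemon(true);
      t.start();
      threads.add(t);
    }
    started.await(); // wait until every thread has started
    return threads;
  }

  public static void main(String[] args) throws InterruptedException {
    int n = args.length > 0 ? Integer.parseInt(args[0]) : 2000;
    CountDownLatch release = new CountDownLatch(1);
    List<Thread> threads = startSleepers(n, 4096L * 1024, release);
    System.out.println(threads.size() + " threads parked, pid "
        + ProcessHandle.current().pid());
    // While this sleeps, check resident memory from a shell: ps -o rss= -p <pid>
    Thread.sleep(30_000);
    release.countDown();
    for (Thread t : threads) {
      t.join();
    }
  }
}
```

Comparing the RSS reported by `ps` with N × 4 MB indicates whether the reserved stack size or the pages actually touched dominate the footprint.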

As a result, I suspect that the memory problem is usually caused by outgoing 
messages kept in memory =)

Anyway, I have attached the patch. The main difference from the previous patch 
is that, if unset, the thread pool size defaults to the number of workers - 1. 
I also added more comments.
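A sketch of this defaulting rule, assuming the setting is read from a `java.util.Properties` object; the property name `example.msg.flush.threads` and the class `FlushPoolSizing` are hypothetical and do not come from the patch. The clamp to at least one thread is an addition made here, because `Executors.newFixedThreadPool` rejects a size of zero, which workers - 1 would produce with a single worker.

```java
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class FlushPoolSizing {
  // Hypothetical property name, used only for illustration.
  static final String POOL_SIZE_KEY = "example.msg.flush.threads";

  /** Returns the configured pool size, or (workers - 1) when unset. */
  static int poolSize(Properties conf, int numWorkers) {
    String value = conf.getProperty(POOL_SIZE_KEY);
    if (value != null) {
      return Integer.parseInt(value.trim());
    }
    // Default: one thread per remote worker, but never fewer than one thread.
    return Math.max(1, numWorkers - 1);
  }

  /** Creates a fixed-size pool shared by all outgoing connections. */
  static ExecutorService newFlushPool(Properties conf, int numWorkers) {
    return Executors.newFixedThreadPool(poolSize(conf, numWorkers));
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    System.out.println(poolSize(conf, 400)); // 399
    conf.setProperty(POOL_SIZE_KEY, "16");
    System.out.println(poolSize(conf, 400)); // 16
    ExecutorService pool = newFlushPool(conf, 400);
    pool.shutdown();
  }
}
```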

The unit tests pass against a real Hadoop cluster.

> Investigate communication improvements
> --
>
> Key: GIRAPH-12
> URL: https://issues.apache.org/jira/browse/GIRAPH-12
> Project: Giraph
>  Issue Type: Improvement
>  Components: bsp
>Reporter: Avery Ching
>Assignee: Hyunsik Choi
>Priority: Minor
> Attachments: GIRAPH-12_1.patch, GIRAPH-12_2.patch, GIRAPH-12_3.patch
>
>
> Currently every worker starts up a thread to communicate with every other 
> worker.  Hadoop RPC is used for communication.  For instance, if there are 
> 400 workers, each worker will create 400 threads.  This ends up using a lot 
> of memory, even with the option  
> -Dmapred.child.java.opts="-Xss64k".  
> It would be good to investigate using frameworks like Netty, or rolling our 
> own, to improve this situation.  By moving away from Hadoop RPC, we would 
> also make it easier to stay compatible with different Hadoop versions.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (GIRAPH-45) Improve the way to keep outgoing messages

2011-10-01 Thread Hyunsik Choi (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-45?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyunsik Choi updated GIRAPH-45:
---

Description: 
As discussed in GIRAPH-12 (http://goo.gl/CE32U), I think there is a potential 
out-of-memory problem when the rate of message generation is higher than the 
rate of message flushing (or the network bandwidth).

To overcome this problem, we need a more eager strategy for message flushing or 
some approach to spill messages to disk.

The link below is Dmitriy's suggestion.
https://issues.apache.org/jira/browse/GIRAPH-12?focusedCommentId=13116253&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13116253

  was:
As discussed in GIRAPH-12(http://goo.gl/CE32U), I think that there is a 
potential problem to cause out of memory may occur when the rate of message 
generation is higher than the rate of message flush (or network bandwidth).

To overcome this problem, we need more eager strategy for message flushing or 
some approach to spill messages into disk.

The below link is Dmitriy's suggestion.
https://issues.apache.org/jira/browse/GIRAPH-12?focusedCommentId=13116253&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13116253


> Improve the way to keep outgoing messages
> -
>
> Key: GIRAPH-45
> URL: https://issues.apache.org/jira/browse/GIRAPH-45
> Project: Giraph
>  Issue Type: Improvement
>  Components: bsp
>Reporter: Hyunsik Choi
>
> As discussed in GIRAPH-12 (http://goo.gl/CE32U), I think there is a potential 
> out-of-memory problem when the rate of message generation is higher than the 
> rate of message flushing (or the network bandwidth).
> To overcome this problem, we need a more eager strategy for message flushing or 
> some approach to spill messages to disk.
> The link below is Dmitriy's suggestion.
> https://issues.apache.org/jira/browse/GIRAPH-12?focusedCommentId=13116253&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13116253
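As an illustration of the second option (spilling to disk), here is a hypothetical per-destination buffer; the class name `SpillingMessageBuffer`, the `String` message type, and the in-memory limit are assumptions, not Giraph code. Messages beyond `maxInMemory` are appended to a temporary file with `DataOutputStream.writeUTF` (which limits each message to 65,535 encoded bytes), and `drain()` returns everything in insertion order so that a sender could then flush it over the network.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.Closeable;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Keeps up to maxInMemory messages on the heap and spills the rest to a temp file. */
public class SpillingMessageBuffer implements Closeable {
  private final int maxInMemory;
  private final List<String> inMemory = new ArrayList<>();
  private Path spillFile;            // created lazily on the first spill
  private DataOutputStream spillOut;
  private int spilledCount;

  public SpillingMessageBuffer(int maxInMemory) {
    this.maxInMemory = maxInMemory;
  }

  public void add(String message) throws IOException {
    if (inMemory.size() < maxInMemory) {
      inMemory.add(message);
      return;
    }
    if (spillOut == null) {
      spillFile = Files.createTempFile("outgoing-msgs", ".spill");
      spillOut = new DataOutputStream(new BufferedOutputStream(Files.newOutputStream(spillFile)));
    }
    spillOut.writeUTF(message); // overflow goes to disk, preserving order
    spilledCount++;
  }

  public int spilledCount() {
    return spilledCount;
  }

  /** Returns all buffered messages (memory first, then disk) and resets the buffer. */
  public List<String> drain() throws IOException {
    List<String> all = new ArrayList<>(inMemory);
    inMemory.clear();
    if (spillOut != null) {
      spillOut.close(); // flush buffered bytes before reading them back
      try (DataInputStream in =
          new DataInputStream(new BufferedInputStream(Files.newInputStream(spillFile)))) {
        for (int i = 0; i < spilledCount; i++) {
          all.add(in.readUTF());
        }
      }
      Files.delete(spillFile);
      spillOut = null;
      spillFile = null;
      spilledCount = 0;
    }
    return all;
  }

  @Override
  public void close() throws IOException {
    if (spillOut != null) {
      spillOut.close();
      Files.deleteIfExists(spillFile);
      spillOut = null;
      spillFile = null;
      spilledCount = 0;
    }
    inMemory.clear();
  }
}
```

A more eager flushing strategy would instead trigger a network send when the in-memory limit is reached; the two approaches can also be combined.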





[jira] [Updated] (GIRAPH-79) Change the menu layout of the site

2011-11-13 Thread Hyunsik Choi (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-79?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyunsik Choi updated GIRAPH-79:
---

Attachment: GIRAPH-79_1.patch

> Change the menu layout of the site
> --
>
> Key: GIRAPH-79
> URL: https://issues.apache.org/jira/browse/GIRAPH-79
> Project: Giraph
>  Issue Type: Task
>  Components: site
>Reporter: Hyunsik Choi
>  Labels: site
> Attachments: GIRAPH-79_1.patch
>
>
> The current site has the basic menu layout generated by the Maven site plugin.
> This layout is too restrictive to accommodate new content.
> I would like to suggest the following menu layout.
> http://people.apache.org/~hyunsik/giraph/site/index.html
> Although the layout includes most of the existing content, it has two additional 
> categories, Giraph and Documentation. I think that this layout is simpler and 
> makes it easier to add new content.
> Does anyone have any other suggestions?





[jira] [Updated] (GIRAPH-79) Change the menu layout of the site

2011-11-13 Thread Hyunsik Choi (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-79?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyunsik Choi updated GIRAPH-79:
---

Attachment: GIRAPH-79_2.patch

I placed the project reports as a subitem under Project Information.

http://people.apache.org/~hyunsik/giraph/site2/index.html

> Change the menu layout of the site
> --
>
> Key: GIRAPH-79
> URL: https://issues.apache.org/jira/browse/GIRAPH-79
> Project: Giraph
>  Issue Type: Task
>  Components: site
>Reporter: Hyunsik Choi
>  Labels: site
> Attachments: GIRAPH-79_1.patch, GIRAPH-79_2.patch
>
>
> The current site has the basic menu layout generated by the Maven site plugin.
> This layout is too restrictive to accommodate new content.
> I would like to suggest the following menu layout.
> http://people.apache.org/~hyunsik/giraph/site/index.html
> Although the layout includes most of the existing content, it has two additional 
> categories, Giraph and Documentation. I think that this layout is simpler and 
> makes it easier to add new content.
> Does anyone have any other suggestions?





[jira] [Updated] (GIRAPH-68) Implement a Graph Generator

2011-11-16 Thread Hyunsik Choi (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-68?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyunsik Choi updated GIRAPH-68:
---

Attachment: GIRAPH-68_1.patch

I have attached the patch. The GraphGenerator class writes generated graph data 
into a specified HDFS directory using the existing Input/OutputFormats.
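The patch itself writes through Giraph's Input/OutputFormats into HDFS. As a self-contained stand-in, the sketch below (the class `ToyGraphGenerator` and its text format are hypothetical) writes a seeded pseudo-random graph to a local file, one line per vertex in the form `id<TAB>n1,n2,...`, with a fixed number of distinct out-edges per vertex and no self-loops.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashSet;
import java.util.Random;
import java.util.Set;
import java.util.stream.Collectors;

public class ToyGraphGenerator {
  /** Writes numVertices adjacency lines, each with edgesPerVertex distinct targets. */
  static void generate(Path out, int numVertices, int edgesPerVertex, long seed)
      throws IOException {
    if (edgesPerVertex >= numVertices) {
      throw new IllegalArgumentException("edgesPerVertex must be < numVertices");
    }
    Random rnd = new Random(seed); // fixed seed gives a reproducible graph
    try (BufferedWriter w = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
      for (int id = 0; id < numVertices; id++) {
        // Draw distinct targets, skipping self-loops.
        Set<Integer> targets = new LinkedHashSet<>();
        while (targets.size() < edgesPerVertex) {
          int t = rnd.nextInt(numVertices);
          if (t != id) {
            targets.add(t);
          }
        }
        String edges = targets.stream().map(String::valueOf).collect(Collectors.joining(","));
        w.write(id + "\t" + edges);
        w.newLine();
      }
    }
  }

  public static void main(String[] args) throws IOException {
    Path out = Files.createTempFile("graph", ".txt");
    generate(out, 10, 3, 42L);
    Files.readAllLines(out).forEach(System.out::println);
  }
}
```

A real generator would plug a VertexOutputFormat in place of the text writer, so that the same InputFormat can read the data back in a benchmark.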

> Implement a Graph Generator
> ---
>
> Key: GIRAPH-68
> URL: https://issues.apache.org/jira/browse/GIRAPH-68
> Project: Giraph
>  Issue Type: New Feature
>  Components: benchmark
>Affects Versions: 0.70.0
>Reporter: Hyunsik Choi
>Assignee: Hyunsik Choi
> Attachments: GIRAPH-68_1.patch
>
>
> To provide users with benchmark environments and to thoroughly test the 
> input/output system of Giraph, we need a graph generator. We will enable the 
> graph generator to generate various kinds of graph data sets by specifying a 
> VertexInputFormat and a VertexOutputFormat.





[jira] [Updated] (GIRAPH-68) Implement a Graph Generator

2011-11-17 Thread Hyunsik Choi (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-68?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyunsik Choi updated GIRAPH-68:
---

Attachment: GIRAPH-68_2.patch

Avery,

Thank you for the review.

I think the GraphGenerator is necessary to test the IO-related subsystems as a 
whole. For example, *InputFormats and Partitioners can be exercised with 
generated data sets instead of PseudoRandomVertexInputFormat.

As you mentioned, I modified PageRank/RandomMessageBenchmark to use a specified 
InputFormat and input path. If the input format and input path are not given, 
they behave as in the current implementation, using 
PseudoRandomVertexInputFormat.
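A sketch of this fallback, assuming the options arrive as a string map; the option names `inputFormat` and `inputPath` and the class `BenchmarkInputChoice` are hypothetical. Rejecting the case where only one of the two is given is a design choice made here, not something the patch is known to do.

```java
import java.util.Collections;
import java.util.Map;

public class BenchmarkInputChoice {
  static final String DEFAULT_FORMAT = "PseudoRandomVertexInputFormat";

  /** The input a benchmark run will use. */
  static final class Choice {
    final String inputFormat;
    final String inputPath; // null means vertices are generated on the fly

    Choice(String inputFormat, String inputPath) {
      this.inputFormat = inputFormat;
      this.inputPath = inputPath;
    }
  }

  static Choice resolve(Map<String, String> opts) {
    String format = opts.get("inputFormat"); // hypothetical option names
    String path = opts.get("inputPath");
    if (format == null && path == null) {
      // Neither given: keep the existing pseudo-random behavior.
      return new Choice(DEFAULT_FORMAT, null);
    }
    if (format == null || path == null) {
      throw new IllegalArgumentException("inputFormat and inputPath must be given together");
    }
    return new Choice(format, path);
  }

  public static void main(String[] args) {
    Choice c = resolve(Collections.<String, String>emptyMap());
    System.out.println(c.inputFormat); // PseudoRandomVertexInputFormat
  }
}
```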

> Implement a Graph Generator
> ---
>
> Key: GIRAPH-68
> URL: https://issues.apache.org/jira/browse/GIRAPH-68
> Project: Giraph
>  Issue Type: New Feature
>  Components: benchmark
>Affects Versions: 0.70.0
>Reporter: Hyunsik Choi
>Assignee: Hyunsik Choi
> Attachments: GIRAPH-68_1.patch, GIRAPH-68_2.patch
>
>
> To provide users with benchmark environments and to thoroughly test the 
> input/output system of Giraph, we need a graph generator. We will enable the 
> graph generator to generate various kinds of graph data sets by specifying a 
> VertexInputFormat and a VertexOutputFormat.
