[ https://issues.apache.org/jira/browse/GIRAPH-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15723824#comment-15723824 ]

ASF GitHub Bot commented on GIRAPH-1125:
----------------------------------------

GitHub user heslami opened a pull request:

    https://github.com/apache/giraph/pull/12

    [GIRAPH-1125] Memory estimation mechanism for more efficient OOC execution

    **Thanks to Dionysios Logothetis for helping a lot with this diff.**
    
    The new out-of-core mechanism is designed with adaptivity in mind, meaning 
that we want out-of-core processing to kick in only when it is necessary. In 
other words, when the data (graph, messages, and mutations) fits entirely in 
memory, we want to take advantage of the whole memory, and when memory is 
short at some stage, only the minimal amount of data should go out of core (to 
disk). This ensures good performance for the out-of-core mechanism.
    
    To satisfy the adaptivity goal, we need to know how much memory is in use 
at each point in time. The default out-of-core mechanism (ThresholdBasedOracle) 
gets memory information from the JVM's built-in methods (Runtime's 
freeMemory()). This measure is inaccurate (and pessimistic), because it does 
not distinguish garbage that has not yet been collected from live data. Using 
the JVM's default methods, the OOC mechanism behaves pessimistically and moves 
data out of core even when it is not necessary. For instance, consider the case 
where there is a lot of garbage on the heap but GC has not run for a while. In 
this case, the default OOC pushes data to disk and, immediately after a major 
GC, brings it back to memory. This makes the default out-of-core mechanism 
inefficient: even when out-of-core is enabled but the data fits entirely in 
memory, the job still goes out of core even though it is not necessary.
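    As a rough illustration, the kind of JVM query a threshold-based policy 
relies on looks like the sketch below (simplified, not the exact Giraph code). 
The "used" figure it produces still counts garbage that GC has not yet 
reclaimed:

    // Simplified sketch of a Runtime-based memory check (not the exact Giraph code).
    // The "used" value includes dead objects that GC has not yet reclaimed, so it
    // overestimates live data and makes a threshold-based policy act pessimistically.
    Runtime runtime = Runtime.getRuntime();
    long maxMemory = runtime.maxMemory();                            // heap limit (-Xmx)
    long usedMemory = runtime.totalMemory() - runtime.freeMemory();  // allocated minus free
    double usedFraction = (double) usedMemory / maxMemory;           // what a threshold check sees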
    
    To address this issue, we need a mechanism that more accurately determines 
how much of the heap is filled with non-garbage data, and we need to change the 
Oracle (OOC policy) to take advantage of that more accurate memory usage 
estimate.
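    In rough terms, the policy check then compares an estimate of live data 
against the heap limit instead of the raw Runtime numbers. A hypothetical 
sketch (the method name and watermark parameter are illustrative, not the 
actual Oracle interface):

    // Hypothetical helper (not Giraph's Oracle API): decide whether to start
    // offloading based on an estimate of live data rather than raw Runtime numbers.
    static boolean shouldOffload(long estimatedLiveDataBytes, double highWatermark) {
      long maxHeap = Runtime.getRuntime().maxMemory();
      return (double) estimatedLiveDataBytes / maxHeap > highWatermark;
    }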
    
    In this diff, we introduce a mechanism to estimate the amount of memory 
used by non-garbage data on the heap at each point in time. The estimation is 
based on the fact that Giraph is essentially a data-parallel system: several 
types of threads exist, and each type performs the same computation on 
different pieces of data. More specifically, we have compute/input threads, 
communication (receiving) threads, and OOC-IO threads. In a normal, uniform 
execution, threads of the same type behave similarly and contribute similarly 
to the memory footprint (for example, different compute threads contribute 
roughly equally). In the proposed approach, we track a measure of progress for 
each type of thread and use linear regression to estimate memory usage from 
those measures.
    
    The measure of progress for compute threads is the total number of vertices 
they have collectively processed so far in the superstep, the measure for 
communication threads is the total number of bytes the worker has received so 
far, and the measure for IO threads is the amount of data read from and written 
to disk so far in the superstep. These measures are reset at the beginning of 
each superstep. We sample them whenever a full GC happens (the point at which 
we have an accurate measurement of non-garbage data on the heap) and fit a 
linear model of used memory. We then use that model to estimate memory usage at 
any time from the current progress measures. A sketch of this idea follows.
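    The following is a minimal, illustrative sketch of the idea (class and 
method names are hypothetical, not the actual Giraph implementation): samples 
of the progress measures and the post-GC live-heap size are recorded at every 
full GC, an ordinary-least-squares model is fit over them, and the model is 
then used to predict live memory between GCs.

    // Illustrative sketch only; names are hypothetical and this is not the actual
    // Giraph implementation. At each full GC we record the progress measures and
    // the live-heap size, fit used = b0 + b1*vertices + b2*bytesIn + b3*bytesIo
    // by ordinary least squares, and later predict memory from the current measures.
    import java.util.ArrayList;
    import java.util.List;

    public class MemoryEstimatorSketch {
      /** One observation taken right after a full GC. */
      private static final class Sample {
        final double vertices;   // vertices processed so far in this superstep
        final double bytesIn;    // bytes received by this worker so far
        final double bytesIo;    // bytes read/written by OOC-IO threads so far
        final double usedMemory; // live heap measured right after the full GC

        Sample(double vertices, double bytesIn, double bytesIo, double usedMemory) {
          this.vertices = vertices;
          this.bytesIn = bytesIn;
          this.bytesIo = bytesIo;
          this.usedMemory = usedMemory;
        }
      }

      private final List<Sample> samples = new ArrayList<>();
      private double[] coef; // [intercept, cVertices, cBytesIn, cBytesIo]

      /** Record a sample at a full-GC point and refit the linear model. */
      public void onFullGc(double vertices, double bytesIn, double bytesIo, double usedMemory) {
        samples.add(new Sample(vertices, bytesIn, bytesIo, usedMemory));
        if (samples.size() >= 4) { // need at least as many samples as unknowns
          coef = fitLeastSquares();
        }
      }

      /** Estimate live-data memory from the current progress measures, or -1 if no model yet. */
      public double estimate(double vertices, double bytesIn, double bytesIo) {
        if (coef == null) {
          return -1; // caller falls back to raw Runtime numbers
        }
        return coef[0] + coef[1] * vertices + coef[2] * bytesIn + coef[3] * bytesIo;
      }

      /** Ordinary least squares via the normal equations (X^T X) b = X^T y. */
      private double[] fitLeastSquares() {
        final int n = 4;
        double[][] a = new double[n][n + 1]; // augmented matrix [X^T X | X^T y]
        for (Sample s : samples) {
          double[] x = {1.0, s.vertices, s.bytesIn, s.bytesIo};
          for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
              a[i][j] += x[i] * x[j];
            }
            a[i][n] += x[i] * s.usedMemory;
          }
        }
        // Gauss-Jordan elimination with partial pivoting.
        for (int col = 0; col < n; col++) {
          int pivot = col;
          for (int row = col + 1; row < n; row++) {
            if (Math.abs(a[row][col]) > Math.abs(a[pivot][col])) {
              pivot = row;
            }
          }
          double[] tmp = a[col]; a[col] = a[pivot]; a[pivot] = tmp;
          if (a[col][col] == 0) {
            continue; // degenerate samples; this coefficient stays at 0
          }
          for (int row = 0; row < n; row++) {
            if (row != col) {
              double factor = a[row][col] / a[col][col];
              for (int c = col; c <= n; c++) {
                a[row][c] -= factor * a[col][c];
              }
            }
          }
        }
        double[] b = new double[n];
        for (int i = 0; i < n; i++) {
          b[i] = a[i][i] == 0 ? 0.0 : a[i][n] / a[i][i];
        }
        return b;
      }
    }

    Fitting only at full-GC points is the key design choice: that is the only 
moment when the heap measurement reflects live data alone, so the regression is 
trained on clean observations and can then be evaluated cheaply from the 
progress counters at any other time.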

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/heslami/giraph ooc-memory-estimation

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/giraph/pull/12.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #12
    
----
commit 4d2b7c96da94aa26cddf75655dd6cb7d93bf9b1d
Author: Hassan Eslami <has...@wirelessprv-10-193-225-240.near.illinois.edu>
Date:   2016-12-05T23:37:29Z

    [GIRAPH-1125] Memory estimation mechanism for more efficient OOC execution
    
----


> Add memory estimation mechanism to out-of-core
> ----------------------------------------------
>
>                 Key: GIRAPH-1125
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-1125
>             Project: Giraph
>          Issue Type: Improvement
>            Reporter: Hassan Eslami
>            Assignee: Hassan Eslami
>
> The new out-of-core mechanism is designed with adaptivity in mind, meaning 
> that we want out-of-core processing to kick in only when it is necessary. In 
> other words, when the data (graph, messages, and mutations) fits entirely in 
> memory, we want to take advantage of the whole memory, and when memory is 
> short at some stage, only the minimal amount of data goes out of core (to 
> disk). This ensures good performance for the out-of-core mechanism.
> To satisfy the adaptivity goal, we need to know how much memory is in use at 
> each point in time. The default out-of-core mechanism (ThresholdBasedOracle) 
> gets memory information from the JVM's built-in methods (Runtime's 
> freeMemory()). This measure is inaccurate (and pessimistic), because it does 
> not distinguish garbage that has not yet been collected from live data. Using 
> the JVM's default methods, the OOC mechanism behaves pessimistically and 
> moves data out of core even when it is not necessary. For instance, consider 
> the case where there is a lot of garbage on the heap but GC has not run for 
> a while. In this case, the default OOC pushes data to disk and, immediately 
> after a major GC, brings it back to memory. This makes the default 
> out-of-core mechanism inefficient: even when out-of-core is enabled but the 
> data fits entirely in memory, the job still goes out of core even though it 
> is not necessary.
> To address this issue, we need a mechanism that more accurately determines 
> how much of the heap is filled with non-garbage data, and we need to change 
> the Oracle (OOC policy) to take advantage of that more accurate memory usage 
> estimate.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
