Hassan Eslami created GIRAPH-1125:
-------------------------------------

             Summary: Add memory estimation mechanism to out-of-core
                 Key: GIRAPH-1125
                 URL: https://issues.apache.org/jira/browse/GIRAPH-1125
             Project: Giraph
          Issue Type: Improvement
            Reporter: Hassan Eslami
            Assignee: Hassan Eslami


The new out-of-core mechanism is designed with adaptivity in mind, meaning
that we want the out-of-core mechanism to kick in only when it is necessary.
In other words, when all of the data (graph, messages, and mutations) fits in
memory, we want to take advantage of the entire memory. And when memory runs
short at some stage, only the minimal amount of data needed goes out of core
(to disk). This ensures good performance for the out-of-core mechanism.

To satisfy the adaptivity goal, we need to know how much memory is in use at
each point in time. The default out-of-core policy (ThresholdBasedOracle)
gets its memory information from the JVM's built-in methods (Runtime's
freeMemory()). This method is inaccurate (and pessimistic): it does not
distinguish live data from garbage that has not yet been purged by GC, so
uncollected garbage is counted as used memory. Relying on these JVM methods,
OOC behaves pessimistically and moves data out of core even when it is not
necessary. For instance, consider the case where there is a lot of garbage on
the heap, but GC has not happened for a while. In this case, the default OOC
pushes data to disk, and immediately after a major GC it brings the data back
into memory. This makes the default out-of-core mechanism inefficient: if
out-of-core is enabled but the data fits entirely in memory, the job still
goes out of core even though doing so is unnecessary.
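
To make the pessimism concrete, here is a minimal sketch of a heap-usage
estimate built only on Runtime. The class and method names are hypothetical
and are not taken from ThresholdBasedOracle; the point is only that
totalMemory() - freeMemory() includes garbage that GC has not yet reclaimed.

```java
public final class RuntimeHeapEstimate {
    private RuntimeHeapEstimate() { }

    /**
     * Fraction of the maximum heap that appears occupied.
     * Uncollected garbage is counted as "used", so this is an upper bound
     * on the live data, not an estimate of it.
     */
    public static double usedFraction(long totalMemory, long freeMemory,
                                      long maxMemory) {
        return (double) (totalMemory - freeMemory) / maxMemory;
    }

    /** Same computation, fed from the running JVM. */
    public static double currentUsedFraction() {
        Runtime rt = Runtime.getRuntime();
        return usedFraction(rt.totalMemory(), rt.freeMemory(), rt.maxMemory());
    }

    public static void main(String[] args) {
        double before = currentUsedFraction();
        // Allocate short-lived arrays; they become garbage immediately,
        // yet they may still be counted as used until the next GC.
        for (int i = 0; i < 1_000; i++) {
            byte[] garbage = new byte[64 * 1024];
            garbage[0] = 1;
        }
        double after = currentUsedFraction();
        System.out.printf("used fraction before=%.3f after=%.3f%n",
                          before, after);
    }
}
```

A policy that compares such a value against a fixed threshold will start
offloading whenever garbage accumulates, regardless of how much live data
there is.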

To address this issue, we need a mechanism that estimates more accurately how
much of the heap is filled with non-garbage data. Consequently, we need to
change the Oracle (OOC policy) to take advantage of this more accurate memory
usage estimate.
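
One possible way to approximate the amount of non-garbage data, shown here
only as a sketch and not as the approach this issue will necessarily take, is
to read the heap usage recorded right after the most recent collection via
the standard java.lang.management API. The class name, the shouldOffload
helper, and its threshold parameter are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

public final class LiveHeapEstimate {
    private LiveHeapEstimate() { }

    /**
     * Sums, over all heap memory pools, the bytes in use right after the
     * most recent GC of that pool. Pools that do not support this
     * measurement return null from getCollectionUsage() and are skipped.
     */
    public static long liveBytesAfterLastGc() {
        long total = 0L;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue;
            }
            MemoryUsage afterGc = pool.getCollectionUsage();
            if (afterGc != null) {
                total += afterGc.getUsed();
            }
        }
        return total;
    }

    /** Hypothetical policy check: offload only if live data exceeds the threshold. */
    public static boolean shouldOffload(long liveBytes, long maxBytes,
                                        double threshold) {
        return liveBytes > threshold * maxBytes;
    }

    public static void main(String[] args) {
        long live = liveBytesAfterLastGc();
        long max = Runtime.getRuntime().maxMemory();
        System.out.println("live bytes after last GC: " + live);
        System.out.println("offload at 0.8 threshold: "
                           + shouldOffload(live, max, 0.8));
    }
}
```

Because these figures are only refreshed when a collection runs, a real
estimator would also have to account for data allocated since the last GC;
that part is left out of this sketch.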



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
