[
https://issues.apache.org/jira/browse/GIRAPH-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15773389#comment-15773389
]
ASF GitHub Bot commented on GIRAPH-1125:
----------------------------------------
GitHub user asfgit closed the pull request at:
https://github.com/apache/giraph/pull/12
> Add memory estimation mechanism to out-of-core
> ----------------------------------------------
>
> Key: GIRAPH-1125
> URL: https://issues.apache.org/jira/browse/GIRAPH-1125
> Project: Giraph
> Issue Type: Improvement
> Reporter: Hassan Eslami
> Assignee: Hassan Eslami
>
> The new out-of-core mechanism is designed with adaptivity in mind: we want
> it to kick in only when necessary. In other words, when all the data (graph,
> messages, and mutations) fits in memory, we want to take advantage of the
> entire memory. And when memory runs short at some stage, only the minimal
> necessary amount of data should go out of core (to disk). This ensures good
> performance for the out-of-core mechanism.
> To satisfy the adaptivity goal, we need to know how much memory is in use at
> each point in time. The default out-of-core mechanism (ThresholdBasedOracle)
> gets its memory information from the JVM's built-in methods (Runtime's
> freeMemory()). This method is inaccurate (and pessimistic): it counts garbage
> that GC has not yet reclaimed as used memory. Relying on the JVM's default
> methods, OOC behaves pessimistically and moves data out of core even when it
> is not necessary. For instance, consider the case where there is a lot of
> garbage on the heap but GC has not run for a while. In this case, the
> default OOC pushes data to disk and, immediately after a major GC, brings
> the data back into memory. This makes the default out-of-core mechanism
> inefficient: if out-of-core is enabled but the data fits entirely in memory,
> the job still goes out of core even though doing so is not necessary.
> To address this issue, we need a mechanism that determines more accurately
> how much of the heap is filled with non-garbage data. Consequently, we need
> to change the Oracle (OOC policy) to take advantage of this more accurate
> memory usage estimate.
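
The gap between the two kinds of estimate described in the issue can be illustrated with a minimal Java sketch. This is not the code from the pull request: the class HeapUsageEstimate and its method names are hypothetical, and summing per-pool post-collection usage is only one assumed way to approximate live data (pools are collected at different times, so the sum is approximate). It uses only standard JDK calls: Runtime's totalMemory()/freeMemory()/maxMemory() and MemoryPoolMXBean.getCollectionUsage().

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

/** Hypothetical helper, not part of Giraph: contrasts two heap-usage estimates. */
class HeapUsageEstimate {

  /** Pessimistic estimate: counts every allocated byte, including uncollected garbage. */
  static long usedIncludingGarbage() {
    Runtime rt = Runtime.getRuntime();
    return rt.totalMemory() - rt.freeMemory();
  }

  /**
   * Approximation of live data: sums, over heap memory pools, the usage the JVM
   * recorded right after the most recent collection of each pool.
   * Returns -1 if no heap pool supports this statistic.
   */
  static long usedAfterLastGc() {
    long total = 0;
    boolean found = false;
    for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
      if (pool.getType() != MemoryType.HEAP) {
        continue;
      }
      MemoryUsage usage = pool.getCollectionUsage(); // null if unsupported for this pool
      if (usage != null) {
        total += usage.getUsed();
        found = true;
      }
    }
    return found ? total : -1;
  }

  /** Fraction of the maximum heap occupied, given a used-bytes estimate. */
  static double fractionOfMaxHeap(long usedBytes) {
    return (double) usedBytes / Runtime.getRuntime().maxMemory();
  }

  public static void main(String[] args) {
    byte[][] garbage = new byte[16][];
    for (int i = 0; i < garbage.length; i++) {
      garbage[i] = new byte[1 << 20]; // 1 MiB each, dropped below
    }
    garbage = null; // now unreachable, but still counted as used until GC runs
    System.out.println("used incl. garbage: " + usedIncludingGarbage());
    System.gc(); // only a hint; the JVM may ignore it
    System.out.println("used after last GC: " + usedAfterLastGc());
  }
}
```

A threshold-based policy that compared fractionOfMaxHeap(usedAfterLastGc()) against its limits, rather than the freeMemory()-based figure, would avoid spilling data merely because garbage has accumulated. The trade-off is that the post-GC figure is stale between collections, so a real policy would presumably also have to account for data allocated since the last GC.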
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)