[ https://issues.apache.org/jira/browse/GIRAPH-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15781296#comment-15781296 ]
ASF GitHub Bot commented on GIRAPH-1125:
----------------------------------------

Github user dlogothetis commented on the issue:

    https://github.com/apache/giraph/pull/12

    Somehow the file MemoryEstimatorOracle.java wasn't committed. I pulled the most recent changes and can see the commit in the log (see below), but the file is missing.

    commit f5b685efa09b539b1f95925405723f7ac7b1dcea
    Author: Hassan Eslami <hesl...@apache.org>
    Date:   Fri Dec 23 12:03:37 2016 -0600

        GIRAPH-1125 Closes #12


> Add memory estimation mechanism to out-of-core
> ----------------------------------------------
>
>                 Key: GIRAPH-1125
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-1125
>             Project: Giraph
>          Issue Type: Improvement
>            Reporter: Hassan Eslami
>            Assignee: Hassan Eslami
>
> The new out-of-core mechanism is designed with adaptivity in mind: we want
> the out-of-core mechanism to kick in only when it is necessary. In other
> words, when all of the data (graph, messages, and mutations) fits in memory,
> we want to take advantage of the entire memory. And when memory runs short
> in some stage, only the minimal amount of data needed goes out of core (to
> disk). This ensures good performance for the out-of-core mechanism.
> To satisfy the adaptivity goal, we need to know how much memory is used at
> each point in time. The default out-of-core mechanism (ThresholdBasedOracle)
> gets memory information from the JVM's internal methods (Runtime's
> freeMemory()). This method is inaccurate (and pessimistic): it does not
> account for garbage that has not yet been purged by GC. Using the JVM's
> default methods, OOC behaves pessimistically and moves data out of core even
> when it is not necessary. For instance, consider the case where there is a
> lot of garbage on the heap, but GC has not happened for a while. In this
> case, the default OOC pushes data to disk and, immediately after a major GC,
> brings the data back to memory. This makes the default out-of-core mechanism
> inefficient. If out-of-core is used but the data can fit entirely in memory,
> the job still goes out of core even though doing so is not necessary.
> To address this issue, we need a mechanism that tells us more accurately how
> much of the heap is filled with non-garbage data. Consequently, we need to
> change the Oracle (OOC policy) to take advantage of a more accurate memory
> usage estimate.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
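
For illustration only, here is a minimal Java sketch of the distinction the issue description draws between the Runtime-based reading and an estimate of live (non-garbage) heap data. It is not taken from MemoryEstimatorOracle.java or any other Giraph source; the class and method names are hypothetical, and it uses only the standard java.lang.management API. It assumes that heap-pool usage recorded after the most recent collection is an acceptable stand-in for "non-garbage data", which may not match what the patch actually does.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

// Hypothetical illustration class; not part of Giraph.
class HeapUsageSketch {

    // Used heap as seen through Runtime. This counts garbage that has not
    // been collected yet, so it can overstate how much live data is held.
    static long usedByRuntime() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    // Sum of heap-pool usage as recorded right after the most recent GC of
    // each pool. Pools that do not support collection usage return null and
    // are skipped. Before any GC has run, the reported values may be zero.
    static long usedAfterLastGc() {
        long total = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue;
            }
            MemoryUsage afterGc = pool.getCollectionUsage();
            if (afterGc != null) {
                total += afterGc.getUsed();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("Runtime-based used heap (bytes): " + usedByRuntime());
        System.out.println("Used heap after last GC (bytes): " + usedAfterLastGc());
    }
}
```

When a lot of garbage has accumulated since the last collection, the first number can be much larger than the second, which is the gap the description says makes a threshold on freeMemory() pessimistic.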