[jira] [Commented] (GIRAPH-1160) Fix memory estimation in MemoryEstimatorOracle

2017-09-21 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/GIRAPH-1160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16174863#comment-16174863 ]

ASF GitHub Bot commented on GIRAPH-1160:


Github user asfgit closed the pull request at:

https://github.com/apache/giraph/pull/49


> Fix memory estimation in MemoryEstimatorOracle
> -
>
> Key: GIRAPH-1160
> URL: https://issues.apache.org/jira/browse/GIRAPH-1160
> Project: Giraph
> Issue Type: Bug
> Reporter: Dionysios Logothetis
>
> Method MemoryEstimatorOracle.calculateRegression() exits early if the number
> of valid columns to use for the regression differs from the total number of
> columns. This is wrong: the regression can run on the valid columns alone.
> As a result, the memory estimate is highly inaccurate.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (GIRAPH-1160) Fix memory estimation in MemoryEstimatorOracle

2017-09-19 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/GIRAPH-1160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16172284#comment-16172284 ]

ASF GitHub Bot commented on GIRAPH-1160:


GitHub user dlogothetis opened a pull request:

https://github.com/apache/giraph/pull/49

Fix bug in memory estimation

Method MemoryEstimatorOracle.calculateRegression() exits early if the number of
valid columns to use for the regression differs from the total number of
columns. This is wrong: the regression can still run on the valid columns alone.
Because of the early exit, memory estimation is never used in practice, and OOC
starts spilling only when memory usage gets very high.
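
To illustrate the idea of regressing on the valid columns alone, here is a
minimal, self-contained sketch. It is not the Giraph code and not the patch in
this pull request: the class ValidColumnRegression, its methods, and the
`valid` flag array are hypothetical names chosen for illustration, and the
solver (normal equations with Gaussian elimination) is an assumption made to
keep the example free of external dependencies.

```java
import java.util.Arrays;

// Hypothetical illustration only; names and solver are assumptions, not Giraph's API.
class ValidColumnRegression {

  /** Returns a copy of data restricted to the columns flagged as valid. */
  public static double[][] selectColumns(double[][] data, boolean[] valid) {
    int count = 0;
    for (boolean v : valid) {
      if (v) {
        count++;
      }
    }
    double[][] out = new double[data.length][count];
    for (int r = 0; r < data.length; r++) {
      int c = 0;
      for (int j = 0; j < valid.length; j++) {
        if (valid[j]) {
          out[r][c++] = data[r][j];
        }
      }
    }
    return out;
  }

  /** Least squares (no intercept) via the normal equations (X^T X) b = X^T y. */
  public static double[] leastSquares(double[][] x, double[] y) {
    int n = x.length;
    int p = x[0].length;
    // Augmented matrix [X^T X | X^T y].
    double[][] a = new double[p][p + 1];
    for (int i = 0; i < p; i++) {
      for (int r = 0; r < n; r++) {
        for (int j = 0; j < p; j++) {
          a[i][j] += x[r][i] * x[r][j];
        }
        a[i][p] += x[r][i] * y[r];
      }
    }
    // Forward elimination with partial pivoting.
    for (int col = 0; col < p; col++) {
      int pivot = col;
      for (int r = col + 1; r < p; r++) {
        if (Math.abs(a[r][col]) > Math.abs(a[pivot][col])) {
          pivot = r;
        }
      }
      double[] tmp = a[col];
      a[col] = a[pivot];
      a[pivot] = tmp;
      if (Math.abs(a[col][col]) < 1e-12) {
        throw new IllegalArgumentException("Selected columns are linearly dependent");
      }
      for (int r = col + 1; r < p; r++) {
        double f = a[r][col] / a[col][col];
        for (int c = col; c <= p; c++) {
          a[r][c] -= f * a[col][c];
        }
      }
    }
    // Back substitution.
    double[] b = new double[p];
    for (int i = p - 1; i >= 0; i--) {
      double s = a[i][p];
      for (int j = i + 1; j < p; j++) {
        s -= a[i][j] * b[j];
      }
      b[i] = s / a[i][i];
    }
    return b;
  }

  /** Fits on the valid columns only; invalid columns get a coefficient of 0. */
  public static double[] fit(double[][] data, double[] y, boolean[] valid) {
    double[][] x = selectColumns(data, valid);
    if (x.length == 0 || x[0].length == 0) {
      throw new IllegalArgumentException("No rows or no valid columns");
    }
    double[] reduced = leastSquares(x, y);
    double[] full = new double[valid.length];
    int c = 0;
    for (int j = 0; j < valid.length; j++) {
      if (valid[j]) {
        full[j] = reduced[c++];
      }
    }
    return full;
  }

  public static void main(String[] args) {
    // Column 1 is constant zero, so it is flagged invalid and skipped.
    double[][] data = {{1, 0, 2}, {2, 0, 1}, {3, 0, 5}, {4, 0, 3}};
    double[] y = {8, 7, 21, 17}; // y = 2 * col0 + 3 * col2
    boolean[] valid = {true, false, true};
    System.out.println(Arrays.toString(fit(data, y, valid)));
  }
}
```

In this sketch, a mismatch between the number of valid columns and the total
number of columns is not treated as an error; the fit simply proceeds on the
reduced matrix, and only the case with no valid columns at all is rejected.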

This is also fixed in https://github.com/apache/giraph/pull/34, but I want
to make these changes one at a time so that each can be tested in isolation.

Tests:
- mvn clean install
- Snapshot tests, including a snapshot test that uses OOC.
- Ran 3 production jobs and verified that this reduces data spills and that
jobs finish faster. The max % spilled is reduced by more than 40%.

JIRA: https://issues.apache.org/jira/browse/GIRAPH-1160




You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dlogothetis/giraph fix_mem_est

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/giraph/pull/49.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #49


commit f5a124beef6b65bf8f9178120fefc1360566fda6
Author: Dionysios Logothetis 
Date:   2017-09-19T14:47:56Z

Fix bug in memory estimation



