[ https://issues.apache.org/jira/browse/FLINK-2121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14566522#comment-14566522 ]
ASF GitHub Bot commented on FLINK-2121:
---------------------------------------

GitHub user ggevay opened a pull request:

    https://github.com/apache/flink/pull/752

    [FLINK-2121] Fix the summation in FileInputFormat.addFilesInDir

    Removed the length parameter, and made the length calculation start
    from 0 instead. I also added a second inner dir to the test, so now
    it catches this problem with any directory listing order.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ggevay/flink dirSizeFix

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/752.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #752

----
commit 7fc86ce10ddc640126c7da8265403a815a30c2d2
Author: Gabor Gevay <gga...@gmail.com>
Date:   2015-05-31T11:27:15Z

    [FLINK-2121] Fix the recursive summation in FileInputFormat.addFilesInDir

----

> FileInputFormat.addFilesInDir miscalculates total size
> ------------------------------------------------------
>
>                 Key: FLINK-2121
>                 URL: https://issues.apache.org/jira/browse/FLINK-2121
>             Project: Flink
>          Issue Type: Bug
>          Components: Core
>            Reporter: Gabor Gevay
>            Assignee: Gabor Gevay
>            Priority: Minor
>
> In FileInputFormat.addFilesInDir, the length variable should start from 0,
> because the return value is always used by adding it to the length (instead
> of just assigning it). So with the current version, the length before the
> call is counted twice in the result.
> mvn verify caught this for me now. This hasn't been seen before because
> testGetStatisticsMultipleNestedFiles catches it only if it gets the
> listing of the outer directory in a certain order. Concretely, if the
> inner directory is seen before the other file in the outer directory, then
> length is 0 at that point, so the bug doesn't show. But if the other file is
> seen first, then its size is added twice to the total result.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
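
The order dependence described in the issue can be reproduced with a small, self-contained sketch. This is not Flink's code: `DirSizeSketch`, `Node`, `buggySize` and `fixedSize` are hypothetical names, and an in-memory tree stands in for the file system. The sketch only assumes the shape the issue describes: the recursive call receives the running length, and the caller then adds the returned value to that same length.

```java
import java.util.Arrays;
import java.util.List;

public class DirSizeSketch {

    // Hypothetical in-memory stand-in for a file or directory (not a Flink type).
    public static final class Node {
        final long size;            // size in bytes; unused for directories
        final List<Node> children;  // null for plain files

        private Node(long size, List<Node> children) {
            this.size = size;
            this.children = children;
        }

        public static Node file(long size) {
            return new Node(size, null);
        }

        public static Node dir(Node... children) {
            return new Node(0L, Arrays.asList(children));
        }

        boolean isDir() {
            return children != null;
        }
    }

    // Buggy shape: the running total is passed down AND the returned total
    // is added to it, so whatever was counted before the call is counted again.
    public static long buggySize(Node dir, long length) {
        for (Node child : dir.children) {
            if (child.isDir()) {
                length += buggySize(child, length);
            } else {
                length += child.size;
            }
        }
        return length;
    }

    // Fixed shape: no length parameter; every call starts from 0 and
    // returns only the size of its own subtree.
    public static long fixedSize(Node dir) {
        long length = 0;
        for (Node child : dir.children) {
            if (child.isDir()) {
                length += fixedSize(child);
            } else {
                length += child.size;
            }
        }
        return length;
    }

    public static void main(String[] args) {
        // Same content (5 + 10 + 20 = 35 bytes), two listing orders.
        Node fileFirst = Node.dir(Node.file(5), Node.dir(Node.file(10), Node.file(20)));
        Node dirFirst = Node.dir(Node.dir(Node.file(10), Node.file(20)), Node.file(5));

        System.out.println(buggySize(fileFirst, 0)); // 40: the 5-byte file is counted twice
        System.out.println(buggySize(dirFirst, 0));  // 35: length was 0 at the recursive call
        System.out.println(fixedSize(fileFirst));    // 35
        System.out.println(fixedSize(dirFirst));     // 35
    }
}
```

With a single inner directory, the buggy version is wrong only when a file is listed before that directory. With two inner directories, the second one is always visited with a non-zero running length (as long as the first is non-empty), which is presumably why adding a second inner dir makes the test catch the problem in any listing order.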