[
https://issues.apache.org/jira/browse/PIG-3891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15483786#comment-15483786
]
Nandor Kollar commented on PIG-3891:
------------------------------------
Attached patch:
- executed TestMultiStorage in Tez mode; after minor adjustments it passes in
Tez mode too. When I ran the tests, it seemed that in Tez mode the statistics
written to the console are not collected via FileBasedOutputSizeReader: I saw
the correct values there even without the fix. In MR mode, however, the
console output was incorrect without the recursive traversal fix in
FileBasedOutputSizeReader. I'm not sure what changes I should make in
MRJobStats and JobStats; those tests passed in both Tez and MR mode.
- in TestMultiStorage I added asserts for getMultiStoreCounters
- renamed the method, added a comment
- in addition, I fixed typos in methods of TestMRJobStats.java
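For reference, the recursive traversal fix amounts to summing the sizes of all regular files below the output location rather than only its direct children. Below is a minimal sketch of that idea, not the code from the attached patch. It assumes java.nio.file on the local filesystem as a stand-in for the Hadoop FileSystem API that FileBasedOutputSizeReader actually uses. The class name RecursiveOutputSize and the sample directory layout are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical illustration (local filesystem stand-in), not the code from the patch.
public class RecursiveOutputSize {

    /**
     * Sums the sizes of all regular files under dir, descending into
     * sub-directories such as the per-key directories MultiStorage creates.
     * Directory entries themselves are not counted.
     */
    static long totalBytes(Path dir) throws IOException {
        try (Stream<Path> paths = Files.walk(dir)) {
            return paths.filter(Files::isRegularFile)
                        .mapToLong(RecursiveOutputSize::sizeOf)
                        .sum();
        }
    }

    // Files.size throws a checked IOException, which a stream lambda cannot
    // propagate directly, so wrap it in an unchecked exception.
    private static long sizeOf(Path file) {
        try {
            return Files.size(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Sample layout: <out>/a/a-0 and <out>/b/b-0, nothing at the top level
        Path out = Files.createTempDirectory("multistorage-out");
        Path a = Files.createDirectories(out.resolve("a"));
        Path b = Files.createDirectories(out.resolve("b"));
        Files.write(a.resolve("a-0"), "hello".getBytes(StandardCharsets.UTF_8)); // 5 bytes
        Files.write(b.resolve("b-0"), "abc".getBytes(StandardCharsets.UTF_8));   // 3 bytes
        System.out.println(totalBytes(out)); // prints 8
    }
}
```

Walking the whole tree makes the result independent of how deeply a storer nests its part files, at the cost of listing every sub-directory.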
[~rohini] could you please take a look at the latest patch?
> FileBasedOutputSizeReader does not calculate size of files in sub-directories
> -----------------------------------------------------------------------------
>
> Key: PIG-3891
> URL: https://issues.apache.org/jira/browse/PIG-3891
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.12.0
> Reporter: Rohini Palaniswamy
> Assignee: Nandor Kollar
> Attachments: PIG-3891-1.patch, PIG-3891-2.patch, PIG-3891-3.patch,
> PIG-3891-4.patch
>
>
> FileBasedOutputSizeReader only includes files in the top-level output
> directory, so if files are stored under subdirectories (e.g.
> MultiStorage), it does not report the bytes written correctly.
> 0.11 shows the correct number of total bytes written, so this is a
> regression. A quick look at the code shows that
> JobStats.addOneOutputStats() in 0.11 does not iterate recursively either,
> and its code is the same as FileBasedOutputSizeReader. Need to investigate
> where the correct value comes from in 0.11 and fix it in 0.12.1/0.13.
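To illustrate the reported behaviour, here is a minimal sketch, assuming a MultiStorage-like layout (output/<key>/<part file>). It counts only the direct children of the output directory, as the report says FileBasedOutputSizeReader does. java.nio.file is used as a local stand-in for the Hadoop FileSystem API, and the class name TopLevelOutputSize is hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration of the reported (non-recursive) behaviour.
public class TopLevelOutputSize {

    /** Counts only regular files that are direct children of dir. */
    static long topLevelBytes(Path dir) throws IOException {
        long bytes = 0;
        try (DirectoryStream<Path> children = Files.newDirectoryStream(dir)) {
            for (Path child : children) {
                if (Files.isRegularFile(child)) {
                    bytes += Files.size(child);
                }
                // Sub-directories are not descended into, so files stored
                // under them (e.g. by MultiStorage) are never counted.
            }
        }
        return bytes;
    }

    public static void main(String[] args) throws IOException {
        // Sample layout: <out>/a/a-0 holding 5 bytes, nothing at the top level
        Path out = Files.createTempDirectory("multistorage-out");
        Path a = Files.createDirectories(out.resolve("a"));
        Files.write(a.resolve("a-0"), "hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(topLevelBytes(out)); // prints 0, although 5 bytes were written
    }
}
```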
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)